Some hardening, but also I couldnt resist a bit of refactor ;D
Tightens malformed-pak handling so the loader fails with a
diagnostic instead of walking past the buffer.
- obj_desc_t stores nchildren; get_child<T>(i) dbg->fatals on
out-of-bounds i. Pins a NULL-deref reachable via
skin_reader_t::register_obj on a 0-child MENU node.
- node_body wraps one node's fread'd bytes in a bounds-checked
cursor that carries the reader's type name. Each reader's
read_node starts with
auto p = node_body(fp, node.size, get_type_name());
if (!p) return NULL;
and the existing decode_*(p) calls keep working via free
decode_uint*(node_body&) overloads. The raw-pointer
decode_uint*(char*&) overloads stay; pakset_manager.cc still
uses them for the 5 node-header reads. Fatal on overrun
reports offset / total / need / have and the reader's name.
- read_node_info caps a large-record node.size at remaining
file bytes so a truncated pak doesn't drive a multi-GB
malloc. obj_named_desc_t::get_name picks up the same
NULL-check the sibling get_copyright already has;
register_desc<> skips when get_name returns NULL.
One gap not closed here: image_t::alloc(decode_uint32(p)) still
takes its length straight from the buffer. Same
allocation-size class as the read_node_info cap but at the
reader level. Happy to follow up if useful.
Did you check of how much this additionally fseek changes the loading of paks? Because this is done many million times.
Also, simutrans would crash on a malformed pak later anyway, as there is no system in place to unload paks.
Also, the nchildren entry adds 8 bytes to every node in memory (for 64 bit builds), which is a big memory penalty cachewise for something on load time. One may discuss this for debug builds though.
Objs without names cannot exist since this is the fundamental mechanism as how simutrans refers to objects. I have added code to makeobj to explicitely fail on building instead of crashing.
Personally, auto variables are evil. In C you should always know your variables. Also, your node_body leaks memory as the char is never deleted. The array_tpl is allocated on the stack and hence freed afterwards.
I am not opposed to changing decode_xxx to do bounds check (although it still will fail to fatal without a chance to output the faulty pack name). However, that could be done with the array_tpl too ...
So again a mixed response, sorry. Also, for debug node children check only, you would need to rely on the preprocessor ...
Thanks for the review. I appreciate you spending your time on it.
Do you think it is worth guarding against malformed paks? Because if not - then my changes are really not useful. I think simutrans must guard against malformed paks. pak is not a part of the program, it is runtime data, that is determined by users actions and environment (internet). Pakset downloader has closed list of source URLs hardcoded, but it is http and nobody can know for sure what will be read from unencrypted HTTP connection.
Edit: I'm asking this to know should I improve on the codestyle and performance in my patch, or is this pointless, because it solves a nonexistent problem.
refined and more focused version
I think guarding against malformed paks is not needed. It may crash simutrans
BUT- Simutrans does not run with any special privileges (not much to gain)
- Userbase is small, we have very few servers around ... (not much to gain)
- It would take a lot of effort for an exploit which will likely not work in the next version or with a slightly different compiler (i.e. the steam executable are built differently with different compilers from the normal and the flatpak releases) So any exploit would only work on a very small number.
- The target audience of Simutrans is too small. A bad actor would be noticed quickly. Since we have full control over the serverlist, sources and distribution, we could quickly exclude such sites.
The paksetdownloadeder routines download a zip file (or use an external installer on windows and shell scripts on linux/Mac), and the client game is either a bz, Z, or usually a zstd file. Targeting the bz2lib, libzstd or zlib/libz seems way more easier and likely to succeed, since hosting a fake server and sending out fake game files using a common exploit in those libaries seems the likeliest way if one ever wants to online attack simutrans.
I think your patch is fine; I would just replace auto by node_body_t (and add a _t to the class to keep with simutrans style). The type is known after all.
Furthermore, now with this in place it opens up for further optimisations. Liek allocing on the stack or using a static buffer. Bot might lead to considerably speeding up the pakset loading as the countless allocations and deletions are quite slow.
new patch has no auto. it appears in the file itself, but is removed immediately. the file contains three or so commits. should i squash them?
squashed version with node_body_t