News:

Congratulations!
 You've won the News Item Lottery! Your prize? Reading this news item! :)

pak loader: bounds-check descriptor reads

Started by janry, May 28, 2026, 10:23:09 PM

Previous topic - Next topic

0 Members and 4 Guests are viewing this topic.

janry

Some hardening, but also I couldnt resist a bit of refactor ;D

Tightens malformed-pak handling so the loader fails with a
diagnostic instead of walking past the buffer.

- obj_desc_t stores nchildren; get_child<T>(i) dbg->fatals on
  out-of-bounds i.  Pins a NULL-deref reachable via
  skin_reader_t::register_obj on a 0-child MENU node.

- node_body wraps one node's fread'd bytes in a bounds-checked
  cursor that carries the reader's type name.  Each reader's
  read_node starts with
    auto p = node_body(fp, node.size, get_type_name());
    if (!p) return NULL;
  and the existing decode_*(p) calls keep working via free
  decode_uint*(node_body&) overloads.  The raw-pointer
  decode_uint*(char*&) overloads stay; pakset_manager.cc still
  uses them for the 5 node-header reads.  Fatal on overrun
  reports offset / total / need / have and the reader's name.

- read_node_info caps a large-record node.size at remaining
  file bytes so a truncated pak doesn't drive a multi-GB
  malloc.  obj_named_desc_t::get_name picks up the same
  NULL-check the sibling get_copyright already has;
  register_desc<> skips when get_name returns NULL.

One gap not closed here: image_t::alloc(decode_uint32(p)) still
takes its length straight from the buffer.  Same
allocation-size class as the read_node_info cap but at the
reader level.  Happy to follow up if useful.

prissi

Did you check of how much this additionally fseek changes the loading of paks? Because this is done many million times.

Also, simutrans would crash on a malformed pak later anyway, as there is no system in place to unload paks.

Also, the nchildren entry adds 8 bytes to every node in memory (for 64 bit builds), which is a big memory penalty cachewise for something on load time. One may discuss this for debug builds though.

Objs without names cannot exist since this is the fundamental mechanism as how simutrans refers to objects. I have added code to makeobj to explicitely fail on building instead of crashing.

Personally, auto variables are evil. In C you should always know your variables. Also, your node_body leaks memory as the char is never deleted. The array_tpl is allocated on the stack and hence freed afterwards.

I am not opposed to changing decode_xxx to do bounds check (although it still will fail to fatal without a chance to output the faulty pack name). However, that could be done with the array_tpl too ...

So again a mixed response, sorry. Also, for debug node children check only, you would need to rely on the preprocessor ...

janry

Thanks for the review. I appreciate you spending your time on it.

Do you think it is worth guarding against malformed paks? Because if not - then my changes are really not useful. I think simutrans must guard against malformed paks. pak is not a part of the program, it is runtime data, that is determined by users actions and environment (internet). Pakset downloader has closed list of source URLs hardcoded, but it is http and nobody can know for sure what will be read from unencrypted HTTP connection.

Edit: I'm asking this to know should I improve on the codestyle and performance in my patch, or is this pointless, because it solves a nonexistent problem.

janry

refined and more focused version

prissi

#4
I think guarding against malformed paks is not needed. It may crash simutrans BUT
  • Simutrans does not run with any special privileges (not much to gain)
  • Userbase is small, we have very few servers around ... (not much to gain)
  • It would take a lot of effort for an exploit which will likely not work in the next version or with a slightly different compiler (i.e. the steam executable are built differently with different compilers from the normal and the flatpak releases) So any exploit would only work on a very small number.
  • The target audience of Simutrans is too small. A bad actor would be noticed quickly. Since we have full control over the serverlist, sources and distribution, we could quickly exclude such sites.

The paksetdownloadeder routines download a zip file (or use an external installer on windows and shell scripts on linux/Mac), and the client game is either a bz, Z, or usually a zstd file. Targeting the bz2lib, libzstd or zlib/libz seems way more easier and likely to succeed, since hosting a fake server and sending out fake game files using a common exploit in those libaries seems the likeliest way if one ever wants to online attack simutrans.

I think your patch is fine; I would just replace auto by node_body_t (and add a _t to the class to keep with simutrans style). The type is known after all.

Furthermore, now with this in place it opens up for further optimisations. Liek allocing on the stack or using a static buffer. Bot might lead to considerably speeding up the pakset loading as the countless allocations and deletions are quite slow.

janry

new patch has no auto. it appears in the file itself, but is removed immediately. the file contains three or so commits. should i squash them?

janry

squashed version with node_body_t

prissi

I did a lot of tests. Using a static buffer, the routine is almost as fast as the original routine, just about 8% slower (of course in debugmode, so with release builds, it might be even faster).

Comitted the slightly modified routines using the static buffer.

ceeac

There was still typo for the uint64 overread check, this is fixed in r11990.

janry

Some performance gain by checking whole pixel block range at once.