The International Simutrans Forum

Development => Patches & Projects => Topic started by: janry on May 28, 2026, 10:23:09 PM

Title: pak loader: bounds-check descriptor reads
Post by: janry on May 28, 2026, 10:23:09 PM
Some hardening, but also I couldnt resist a bit of refactor ;D

Tightens malformed-pak handling so the loader fails with a
diagnostic instead of walking past the buffer.

- obj_desc_t stores nchildren; get_child<T>(i) dbg->fatals on
  out-of-bounds i.  Pins a NULL-deref reachable via
  skin_reader_t::register_obj on a 0-child MENU node.

- node_body wraps one node's fread'd bytes in a bounds-checked
  cursor that carries the reader's type name.  Each reader's
  read_node starts with
    auto p = node_body(fp, node.size, get_type_name());
    if (!p) return NULL;
  and the existing decode_*(p) calls keep working via free
  decode_uint*(node_body&) overloads.  The raw-pointer
  decode_uint*(char*&) overloads stay; pakset_manager.cc still
  uses them for the 5 node-header reads.  Fatal on overrun
  reports offset / total / need / have and the reader's name.

- read_node_info caps a large-record node.size at remaining
  file bytes so a truncated pak doesn't drive a multi-GB
  malloc.  obj_named_desc_t::get_name picks up the same
  NULL-check the sibling get_copyright already has;
  register_desc<> skips when get_name returns NULL.

One gap not closed here: image_t::alloc(decode_uint32(p)) still
takes its length straight from the buffer.  Same
allocation-size class as the read_node_info cap but at the
reader level.  Happy to follow up if useful.
Title: Re: pak loader: bounds-check descriptor reads
Post by: prissi on May 29, 2026, 03:37:39 AM
Did you check of how much this additionally fseek changes the loading of paks? Because this is done many million times.

Also, simutrans would crash on a malformed pak later anyway, as there is no system in place to unload paks.

Also, the nchildren entry adds 8 bytes to every node in memory (for 64 bit builds), which is a big memory penalty cachewise for something on load time. One may discuss this for debug builds though.

Objs without names cannot exist since this is the fundamental mechanism as how simutrans refers to objects. I have added code to makeobj to explicitely fail on building instead of crashing.

Personally, auto variables are evil. In C you should always know your variables. Also, your node_body leaks memory as the char is never deleted. The array_tpl is allocated on the stack and hence freed afterwards.

I am not opposed to changing decode_xxx to do bounds check (although it still will fail to fatal without a chance to output the faulty pack name). However, that could be done with the array_tpl too ...

So again a mixed response, sorry. Also, for debug node children check only, you would need to rely on the preprocessor ...
Title: Re: pak loader: bounds-check descriptor reads
Post by: janry on May 29, 2026, 09:28:09 AM
Thanks for the review. I appreciate you spending your time on it.

Do you think it is worth guarding against malformed paks? Because if not - then my changes are really not useful. I think simutrans must guard against malformed paks. pak is not a part of the program, it is runtime data, that is determined by users actions and environment (internet). Pakset downloader has closed list of source URLs hardcoded, but it is http and nobody can know for sure what will be read from unencrypted HTTP connection.

Edit: I'm asking this to know should I improve on the codestyle and performance in my patch, or is this pointless, because it solves a nonexistent problem.
Title: Re: pak loader: bounds-check descriptor reads
Post by: janry on May 29, 2026, 07:15:00 PM
refined and more focused version
Title: Re: pak loader: bounds-check descriptor reads
Post by: prissi on May 30, 2026, 09:02:43 AM
I think guarding against malformed paks is not needed. It may crash simutrans BUT

The paksetdownloadeder routines download a zip file (or use an external installer on windows and shell scripts on linux/Mac), and the client game is either a bz, Z, or usually a zstd file. Targeting the bz2lib, libzstd or zlib/libz seems way more easier and likely to succeed, since hosting a fake server and sending out fake game files using a common exploit in those libaries seems the likeliest way if one ever wants to online attack simutrans.

I think your patch is fine; I would just replace auto by node_body_t (and add a _t to the class to keep with simutrans style). The type is known after all.

Furthermore, now with this in place it opens up for further optimisations. Liek allocing on the stack or using a static buffer. Bot might lead to considerably speeding up the pakset loading as the countless allocations and deletions are quite slow.
Title: Re: pak loader: bounds-check descriptor reads
Post by: janry on May 30, 2026, 09:33:28 AM
new patch has no auto. it appears in the file itself, but is removed immediately. the file contains three or so commits. should i squash them?
Title: Re: pak loader: bounds-check descriptor reads
Post by: janry on May 30, 2026, 09:42:43 AM
squashed version with node_body_t