News:

Simutrans.com Portal
Our Simutrans site. You can find everything about Simutrans from here.

pak loader: bounds-check descriptor reads

Started by janry, May 28, 2026, 10:23:09 PM

Previous topic - Next topic

0 Members and 2 Guests are viewing this topic.

janry

Some hardening, but also I couldnt resist a bit of refactor ;D

Tightens malformed-pak handling so the loader fails with a
diagnostic instead of walking past the buffer.

- obj_desc_t stores nchildren; get_child<T>(i) dbg->fatals on
  out-of-bounds i.  Pins a NULL-deref reachable via
  skin_reader_t::register_obj on a 0-child MENU node.

- node_body wraps one node's fread'd bytes in a bounds-checked
  cursor that carries the reader's type name.  Each reader's
  read_node starts with
    auto p = node_body(fp, node.size, get_type_name());
    if (!p) return NULL;
  and the existing decode_*(p) calls keep working via free
  decode_uint*(node_body&) overloads.  The raw-pointer
  decode_uint*(char*&) overloads stay; pakset_manager.cc still
  uses them for the 5 node-header reads.  Fatal on overrun
  reports offset / total / need / have and the reader's name.

- read_node_info caps a large-record node.size at remaining
  file bytes so a truncated pak doesn't drive a multi-GB
  malloc.  obj_named_desc_t::get_name picks up the same
  NULL-check the sibling get_copyright already has;
  register_desc<> skips when get_name returns NULL.

One gap not closed here: image_t::alloc(decode_uint32(p)) still
takes its length straight from the buffer.  Same
allocation-size class as the read_node_info cap but at the
reader level.  Happy to follow up if useful.

prissi

Did you check of how much this additionally fseek changes the loading of paks? Because this is done many million times.

Also, simutrans would crash on a malformed pak later anyway, as there is no system in place to unload paks.

Also, the nchildren entry adds 8 bytes to every node in memory (for 64 bit builds), which is a big memory penalty cachewise for something on load time. One may discuss this for debug builds though.

Objs without names cannot exist since this is the fundamental mechanism as how simutrans refers to objects. I have added code to makeobj to explicitely fail on building instead of crashing.

Personally, auto variables are evil. In C you should always know your variables. Also, your node_body leaks memory as the char is never deleted. The array_tpl is allocated on the stack and hence freed afterwards.

I am not opposed to changing decode_xxx to do bounds check (although it still will fail to fatal without a chance to output the faulty pack name). However, that could be done with the array_tpl too ...

So again a mixed response, sorry. Also, for debug node children check only, you would need to rely on the preprocessor ...

janry

Thanks for the review. I appreciate you spending your time on it.

Do you think it is worth guarding against malformed paks? Because if not - then my changes are really not useful. I think simutrans must guard against malformed paks. pak is not a part of the program, it is runtime data, that is determined by users actions and environment (internet). Pakset downloader has closed list of source URLs hardcoded, but it is http and nobody can know for sure what will be read from unencrypted HTTP connection.

Edit: I'm asking this to know should I improve on the codestyle and performance in my patch, or is this pointless, because it solves a nonexistent problem.

janry

refined and more focused version