pak files and their images

Started by ojii, October 11, 2012, 09:38:11 AM

ojii

Can anyone shed some light on the pak file format?


More specifically, how would I go about trying to read the image data in a pak file (for preview purposes)?


I've been trying to read the source, but since I am not proficient in C++, that turned out not to be very useful.

prissi

There are some programs around that read pak files. However, pak files change with every revision. Extracting the images is easier; just have a look at the image reader and the patch directory.

Ters

For simple things like vehicles, you might get away with just finding an image inside the file somewhere. For other stuff, you possibly need to combine the matching front and back images to get a meaningful image of the thing. With multi-tile buildings, it's even more "fun", as you need to combine the images for the right tiles in sequence. It's not impossible, I've done it, but I consider myself fluent in C++ (though what I wrote was in Java, which I code professionally).

ojii

Okay, thanks for the answers. Looks like this will be a lot harder than I thought. I really wish Simutrans used a format for its files (pak/savegames) that was documented and doesn't change on each revision...

Dwachs

Quote from: ojii on October 12, 2012, 08:27:52 AM
I really wish Simutrans used a format for its files (pak/savegames) that was documented and doesn't change on each revision...
The code is the documentation of the pak file format. If the code changes, the file format changes. Of course it is backward compatible: new Simutrans versions can read old pak files.

What kind of program/service are you intending to do? Where would you like to see these previews?
Parsley, sage, rosemary, and maggikraut.

ojii

I'm almost done writing a tool that can manage pak files. One thing that would be super cool would be to have a preview of the paks. I'll just finish the tool without that though.

ojii

FTR: Here's what I mean by a format that "doesn't change on each revision": a format that can be (partially) read by old readers even if the file is a newer version.

How can this be done?

Using a format that goes a little like this (just an example, hope you get the idea):



int: length of header
... headers
int: length of content 1
... content 1
int: length of content 2
... content 2
...




New headers would be added at the *end* of the headers, so older versions would just ignore those bits and hope that they can still figure out what the file does.


The stuff about content 1/content 2 is in case there's more than one bit of information in a file (which in the case of paks would probably be true).


The headers would, e.g., say: this is a vehicle, it has 4 images, and the images start at bytes a, b, c and d.


The same kind of logic could be applied to savegames to allow tools to be written outside the simutrans codebase to do interesting and hopefully useful things with it.
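
To make the idea concrete, here is a minimal Python sketch of the length-prefixed layout outlined above. It is only an illustration of the proposal, not the actual pak format: the 32-bit little-endian length fields and the names `write_record` and `read_record` are assumptions made up for this example.

```python
import struct
from typing import List, Sequence, Tuple


def write_record(header: bytes, contents: Sequence[bytes]) -> bytes:
    """Serialize a length-prefixed header followed by length-prefixed contents."""
    out = struct.pack("<I", len(header)) + header
    for blob in contents:
        out += struct.pack("<I", len(blob)) + blob
    return out


def read_record(data: bytes, known_header_len: int) -> Tuple[bytes, List[bytes]]:
    """Read a record, keeping only the header bytes this reader understands."""
    (header_len,) = struct.unpack_from("<I", data, 0)
    pos = 4
    header = data[pos:pos + min(header_len, known_header_len)]
    pos += header_len  # skip newer header fields appended at the end
    contents = []
    while pos < len(data):
        (length,) = struct.unpack_from("<I", data, pos)
        pos += 4
        contents.append(data[pos:pos + length])
        pos += length
    return header, contents


# A newer writer appends two extra header bytes; an older reader that only
# knows the first two bytes still finds every content block.
newer_file = write_record(b"\x01\x04" + b"\xAA\xBB", [b"image-0", b"image-1"])
header, contents = read_record(newer_file, known_header_len=2)
```

The old reader never looks at the extra header bytes; it relies purely on the stored lengths to find where the contents begin.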

Combuijs

That would help a bit, but only a bit. For instance if we get a new way type (represented by a number I presume) then the format does not change, old pak files can be read by new software, but new pak files can't be read by old software.

It is always very difficult to predict in your software what your future changes will be and it is next to impossible to cater for all possible eventualities.

Instead, you should never write your own reader, but use the reader available. That is far from easy if you are not using C++ for your tool (as in this case), but it is a far better solution. Better to ask for interface facilities on the reader and writer (a COM interface? I almost do not dare to ask...). Or write them yourself...
Bob Marley: No woman, no cry

Programmer: No user, no bugs



VS

Is there a reason why you can't use the sources for this information?

My projects... Tools for messing with Simutrans graphics. Graphic archive - templates and some other stuff for painters. Development logs for most recent information on what is going on. And of course pak128!

Dwachs

Quote from: ojii on October 12, 2012, 09:51:07 AM
I'm almost done writing a tool that can manage pak files.
Manage as in 'File Manager' ?

If you had one free wish, what would you like to have? A pak-to-png converter?

About your second question: the pak files have a tree-like structure. Each node in the tree has a four-letter identifier (TEXT, XREF, IMAG, etc.), so you could scan the pak files and extract some fields while ignoring others.
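
As a rough illustration of scanning such a tree for one node type, here is a small Python sketch. The in-memory representation (a tuple of type, payload and children), the function name `find_nodes` and the sample tree are all made up for this example; only the identifiers TEXT, XREF and IMAG come from the description above.

```python
from typing import Iterator, Tuple

# Hypothetical in-memory node: (four-letter type, payload bytes, child nodes)
Node = Tuple[str, bytes, list]


def find_nodes(node: Node, wanted: str) -> Iterator[Node]:
    """Yield every node of the wanted type, depth first, ignoring the rest."""
    node_type, _payload, children = node
    if node_type == wanted:
        yield node
    for child in children:
        yield from find_nodes(child, wanted)


# Made-up sample tree; real pak files contain many more node types.
tree = ("ROOT", b"", [
    ("TEXT", b"Example", []),
    ("XREF", b"", []),
    ("IMAG", b"\x00\x01", []),
])
images = [payload for _type, payload, _children in find_nodes(tree, "IMAG")]
```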

ojii

Manage as in a website that knows about paks and their relations, and a command line tool (GUI will follow) to upload, install, activate, deactivate and remove paks.

Works quite nicely so far. The CLI needs a little overhaul to be easier to use and the website needs a real HTML frontend (right now it's pretty much just an API).

The reason behind this is that I got annoyed at how paks and especially their addons are handled. I mainly use pak128.japan, but with a ton of addons. Now some of those I want to remove but I have no way of knowing which files belong to which addon. The tool I wrote keeps track of that and I can deactivate a single addon to a pak, while leaving the pak active.

Ters

In a sense, the pak file format hasn't changed in a long time. The internal structure in the individual nodes, and which nodes can be present does change. And it has to, or new features would be impossible.

One problem when dealing with individual pak files, is that they are interlinked. As far as I can tell, a pak file for a vehicle might contain no images at all, just references to images that must be loaded from another pak file.

I did consider making a web front end for my pak browser, but it doesn't look like Java web servers for rent are anywhere near as easy to find as PHP, or even ASP.

greenling

ojii
I like your idea.
This tool would give me an overview of my addons.
Opening hours 20:00 - 23:00
(In Night from friday on saturday and saturday on sunday it possibly that i be keep longer in Forum.)
I am The Assistant from Pakfilearcheologist!
Working on a big Problem!

VS

Quote from: Ters on October 12, 2012, 04:30:09 PM
One problem when dealing with individual pak files, is that they are interlinked. As far as I can tell, a pak file for a vehicle might contain no images at all, just references to images that must be loaded from another pak file.
Not true, fortunately. The interlinking refers mostly to goods and constraints (I think?), where you have a name as a string and must resolve it to a pointer to another object. It happens after the paks are loaded, and is the number one nightmare in the codebase, due to heavy templatization... or so I remember. It's "that dreaded XREF resolver". What could confuse you is that objects' images are loaded into a global list and can be reused, but only after loading again. In paks they must be completely present.

As far as I can tell, the pak format is hierarchical but without explicit start/end markers, like PNG has. At the same time, the numerical fields (speed, capacity, ...) are not tagged in any way, so you must know the length of a node a priori. (Please correct this if I'm wrong!) This makes it impossible to write an application that can skip unknown data.

Knowing PNG's internals fairly well, I must agree with ojii that PAKs' structure is very poor in this aspect, and that one can do a much better job. However, that point is pointless as paks are here to stay in their current form.

Okay, okay, let's not be only negative ;) The hierarchy in a pak file is not arbitrary, you can easily tell to what "level" a node type belongs. IIRC all nodes are versioned, so having a database of versions and lengths could help with parsing at least into proper structure, ignoring "raw" data. And, since image node format does not change often, extracting these could be done easily.


Ters

Makeobj doesn't xref all over the place, but the format and loading allows it. There are also some dummy xrefs that indicate no data if I remember correctly.

All nodes contain a length field, so skipping whole nodes is possible. I did that a lot until I had parsers for all node types in place. Within a "besch" node, you must have knowledge about the individual fields for all supported versions, but as fields don't have variable size (it would then be a subnode), you can skip the fields you're not interested in.

The header of each node is 4 bytes of type (mostly descriptive ASCII characters), a 16 bit integer with the number of nested child nodes, a 16 bit integer with the length of the node (not including children), and possibly a 32 bit integer with the length of the node if the length didn't fit in the 16 bit value (the 16 bit field is then 0xFFFF). Next follows the payload for the node, then child nodes.

For some reason, a pak file doesn't start immediately with the nodes, but with some junk bytes one must seek past.
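
A minimal Python sketch of the node layout described above, for anyone who wants to try it. Assumptions not stated in this thread: the integers are little-endian, the type bytes decode as Latin-1, and the caller already knows the offset of the first node (the leading bytes are not interpreted here). `Node` and `read_node` are names invented for this example.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    type: str
    payload: bytes
    children: List["Node"] = field(default_factory=list)


def read_node(data: bytes, pos: int) -> Tuple[Node, int]:
    """Parse one node at pos; return it and the offset just past its subtree."""
    # 4 bytes of type, 16-bit child count, 16-bit payload length
    # (little-endian is an assumption of this sketch).
    raw_type, child_count, size = struct.unpack_from("<4sHH", data, pos)
    pos += 8
    if size == 0xFFFF:
        # Length did not fit in 16 bits; a 32-bit length follows.
        (size,) = struct.unpack_from("<I", data, pos)
        pos += 4
    payload = data[pos:pos + size]
    pos += size
    node = Node(raw_type.decode("latin-1"), payload)
    # Child nodes follow the payload, one after another.
    for _ in range(child_count):
        child, pos = read_node(data, pos)
        node.children.append(child)
    return node, pos


# Synthetic test data, not a real pak file: a root with one TEXT child,
# preceded by two made-up leading bytes that the caller skips.
leaf = b"TEXT" + struct.pack("<HH", 0, 3) + b"abc"
root = b"ROOT" + struct.pack("<HH", 1, 0) + leaf
tree, end = read_node(b"??" + root, 2)
```

Because every node carries its own length, a reader can keep the payload of unknown node types as raw bytes and still walk the whole tree.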

VS

So I stand corrected. Then... the format is self-descriptive enough and writing a parser is easy.

I wish I could vote -1 for myself :D


Ters

Parsing the data is reasonably easy. Interpreting what it means can be a bit harder, especially for buildings (and factories, which contain a building node). Buildings have the most complex structure of image nodes, with layouts, tiles, foreground/background, seasons and animation frames. They are also very diverse, where the meaning of some fields depends on which type of building it is.

prissi

And not to forget the six-index list of list of 2d list of 2d list ...

Starting a new pak format, I would probably also go for HDF (of which PNG is only a subset), especially since I wrote a parser for it years ago. The actual changes to the pak writer/reader would even be small. Still, in the end it would be a list of integers, to which the software has to give a meaning.

But makeobj can read paks; that is what dump actually does. Using the reader for certain objects to create an instance is then rather trivial (really). That would load any revision correctly.