The International Simutrans Forum

Development => Extension Requests => Topic started by: The Hood on November 26, 2011, 05:17:07 PM

Title: Duplicate images saved only once by makeobj
Post by: The Hood on November 26, 2011, 05:17:07 PM
As discussed in the pak128.Britain forum (http://forum.simutrans.com/index.php?topic=4454.msg80445;topicseen#msg80445) the current implementation of makeobj does not recognise (AFAIK) when one or more dats point to the same image file (correct me if I'm wrong here).  This results in very large pak files (compared to the original png) and presumably also eats into the limit on number of image definitions allowed.  It strikes me as inefficient and I'm hoping this could be improved (it would particularly reduce the download size for pak128.Britain, and maybe other paksets too).  Would it be possible for makeobj to detect this somehow and only save the image file once in the pak file?  This could either be multiple definitions within the same dat file or even multiple definitions within the same batched pak file.
Title: Re: Duplicate images saved only once by makeobj
Post by: wlindley on November 26, 2011, 05:24:35 PM
Where is the canonical definition for the .pak file format?
Title: Re: Duplicate images saved only once by makeobj
Post by: Dwachs on November 26, 2011, 05:26:21 PM
These duplicate images should be detected upon loading the pak. So at least the memory consumption of simutrans (or the maximum number of loaded images) is not affected.
Title: Re: Duplicate images saved only once by makeobj
Post by: The Hood on November 26, 2011, 05:34:35 PM
OK, so that's one less problem, but I would still think it makes more sense to do this on pak creation rather than pak loading - how big a change would this be?
Title: Re: Duplicate images saved only once by makeobj
Post by: Dwachs on November 26, 2011, 05:39:59 PM
somewhere between 0 and 1 ... :-X
Title: Re: Duplicate images saved only once by makeobj
Post by: jamespetts on November 27, 2011, 10:33:38 PM
Quote from: wlindley on November 26, 2011, 05:24:35 PM
Where is the canonical definition for the .pak file format?

In the open-source code, I suspect will be the answer.
Title: Re: Duplicate images saved only once by makeobj
Post by: prissi on November 27, 2011, 11:08:15 PM
The pak files are autobuilt more or less by the objects themselves, so there is no canonical definition.

Since every image saves itself, it would require some hacks to discover identical images in a consistent way, and additional logic to copy them (or remove them) in case of pak merging.
Title: Re: Duplicate images saved only once by makeobj
Post by: VS on November 28, 2011, 08:41:32 AM
To expand Prissi's answer a bit: Pak file contents are hierarchically arranged, XML could be a good analogy. You can't really share items deep down in the tree, unless you sidestep from the format (not a tree anymore!).

It's wasteful of disk space, but compressing such files works at least for transfer :)
Title: Re: Duplicate images saved only once by makeobj
Post by: Fabio on November 28, 2011, 09:25:33 AM
If the problem is related (mostly) to winter images, could it be possible to define along BackImage and FrontImage a WinterImage (and maybe SummerImage) layer(s) containing only the seasonal differences?
Title: Re: Duplicate images saved only once by makeobj
Post by: prissi on November 28, 2011, 09:28:03 AM
One could of course change the image definition to an index into a list which comes last in any pak file. Then upon loading (like the xrefs) those would be loaded afterwards. When paks are split or merged, those indices need to be renumbered, or even merged or duplicated respectively.

However, the system is not so easy to follow, to put it mildly.

If somebody starts working there, one could also change the image definition from pointers into 2D lists to something more sensible, like an array of indices. That way lookups of images would not need to traverse down a lot of lists, and there would always be an array entry.
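
A minimal sketch of the index-list idea above, assuming a simplified model in which a pak is just a list of unique pixel blobs plus, per object, a list of indices into it. The names `ImageTable` and `merge` are hypothetical and do not correspond to anything in the Simutrans sources; the point is only to show why merging paks forces renumbering.

```python
class ImageTable:
    """Hypothetical per-pak image list; objects refer to images by index."""

    def __init__(self):
        self.images = []      # unique pixel blobs, in insertion order
        self._index_of = {}   # pixel blob -> index in self.images

    def add(self, pixels: bytes) -> int:
        """Return the index for pixels, storing them only if they are new."""
        if pixels not in self._index_of:
            self._index_of[pixels] = len(self.images)
            self.images.append(pixels)
        return self._index_of[pixels]


def merge(table_a, refs_a, table_b, refs_b):
    """Merge two paks: build one table and renumber every object's indices.

    refs_a / refs_b map an object name to its list of image indices.
    """
    merged = ImageTable()
    new_refs = {}
    for table, refs in ((table_a, refs_a), (table_b, refs_b)):
        for name, indices in refs.items():
            # Look up the old pixels, then re-add them to get the new index.
            new_refs[name] = [merged.add(table.images[i]) for i in indices]
    return merged, new_refs
```

Splitting a pak would be the reverse operation: each resulting pak gets its own table, and any image referenced from both sides has to be duplicated.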
Title: Re: Duplicate images saved only once by makeobj
Post by: wlindley on November 28, 2011, 11:24:33 AM
Seems like the pak format could use a makeover. 

The first step should be actually documenting the current file format. 

Anyone with experience in this department, if you could help write the new wiki page (http://en.wiki.simutrans.com/index.php/Pak_File_Format), using the besch/writer/*.cc files as a guide, let's see what we can do to move to a more compressed, more capable pak format.

(A particularly nice side benefit of documenting the pak format will be that alternate makeobj-like programs could be written in Perl (or Python, if you must), perhaps even including a program that converts the binary format back into the component PNGs and text definitions, for all those add-ons whose sources have been lost.)
Title: Re: Duplicate images saved only once by makeobj
Post by: prissi on November 28, 2011, 12:29:27 PM
It does not make any sense to document the pak format beyond the most basic object struct, as any renewed object could (and did) easily redefine the order and number of fields. It would be a snapshot of the pak format for simutrans 111.0 r49xx or so. Do not waste time on this, especially as there are routines to read and write paks. And there are programs to retrieve images, and the dats are in the text output when loading simutrans with -debug 3.
Title: Re: Duplicate images saved only once by makeobj
Post by: wlindley on November 28, 2011, 01:17:33 PM
It makes perfect sense to document the format -- unless it's supposed to remain some deep dark secret like Microsoft's Word Document format. 

Simutrans can read old .pak files, right?  So let's document it. 

And we keep running into this same discussion because anyone who wants to help code needs and wants to know how things work.  So let's stop pushing people away and tell them how it works.

Indeed the concept that "we shouldn't document a file format" is the best reason for replacing an obscured binary format with a text format one, or at least something more like TIFF which has known tag-groups and is extensible.  In fact, why not just have Simutrans itself read the dat and png files?

See section 5.1, "The Importance of Being Textual", in Eric S. Raymond's "The Art of UNIX Programming" for a discussion of why binary formats like paks cause more problems than they are worth; specifically, the preamble about nroff: a direct analogue to this situation.
Title: Re: Duplicate images saved only once by makeobj
Post by: Combuijs on November 28, 2011, 01:25:54 PM
The routines to read and write pak-files are the perfect format documentation. No need for separate documentation which always will be three versions behind.

Having said that, I wonder how fast it would be to import directly from .png and .dat files. Computers are really fast these days, might be feasible...
Title: Re: Duplicate images saved only once by makeobj
Post by: Fabio on November 28, 2011, 01:30:20 PM
And most paksets are now open, but not pak96 iirc.
Maybe paks could be bzipped (.pak.bz2) and uncompressed during loading.
Title: Re: Duplicate images saved only once by makeobj
Post by: prissi on November 28, 2011, 01:39:31 PM
Simutrans has had the ability to read directly from the png and dat (actually, this was the normal way very early). This ability to directly put the structures together exists, and it would not be too difficult (only tedious) to add this again. But it opens countless complications and hassles if images do not stay with their definitions. Especially when using subfolders and the like.

As hard disk space is not the limit, bz2-ing will only slow down loading. But that could be done easily.
Title: Re: Duplicate images saved only once by makeobj
Post by: Fabio on November 28, 2011, 01:57:53 PM
If we wanted an open pak format reducing risks, a bz2 file should contain both dat and png. They would be shipped together and use less disk space.
Title: Re: Duplicate images saved only once by makeobj
Post by: prissi on November 28, 2011, 08:05:03 PM
bz2 is a format that explicitly only contains a single block of data ...
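
Because bz2 compresses a single stream, shipping a dat and a png together needs a container around them first. A minimal sketch using a tar archive compressed with bz2 follows; the `bundle` and `unbundle` helpers and the file names in the usage note are hypothetical, and nothing here is an existing Simutrans feature.

```python
import io
import tarfile


def bundle(files: dict) -> bytes:
    """Pack {name: bytes} into a single in-memory .tar.bz2 blob."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)          # tar needs the size up front
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unbundle(blob: bytes) -> dict:
    """Unpack a .tar.bz2 blob back into {name: bytes}."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:bz2") as tar:
        return {
            member.name: tar.extractfile(member).read()
            for member in tar.getmembers()
            if member.isfile()
        }
```

For example, `unbundle(bundle({"demo.dat": b"obj=building\n", "demo.png": png_bytes}))` returns the same two entries, so a loader could read definitions and images from one compressed file.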
Title: Re: Duplicate images saved only once by makeobj
Post by: Ashley on November 28, 2011, 10:56:26 PM
.png images can store metadata I believe, maybe we could put the text components of the .pak file into the PNG metadata? That way the game objects would literally just be a PNG format file with additional metadata within it.

The entire serialisation model employed by Simutrans could do with a rework IMO, I found this when working on the networking code too, it's a huge job though, rather daunting!
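
A minimal sketch of the PNG metadata idea, assuming the dat text is stored in a standard `tEXt` chunk placed just before `IEND`. The keyword used in the usage note is made up for illustration, and only the standard library is used; this is not an existing makeobj or Simutrans feature.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for every chunk in a PNG byte string."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def add_text(png: bytes, keyword: str, text: str) -> bytes:
    """Return a copy of png with a tEXt chunk inserted before IEND."""
    payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    out = [PNG_SIGNATURE]
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"IEND":
            out.append(make_chunk(b"tEXt", payload))
        out.append(make_chunk(chunk_type, data))
    return b"".join(out)


def read_text(png: bytes) -> dict:
    """Collect all tEXt chunks as {keyword: text}."""
    result = {}
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            result[keyword.decode("latin-1")] = text.decode("latin-1")
    return result
```

Calling `add_text(png_bytes, "simutrans-dat", dat_text)` and later `read_text(...)` round-trips the definition text. Note that `tEXt` is limited to Latin-1, and an image editor that rewrites the file is free to drop the chunk.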
Title: Re: Duplicate images saved only once by makeobj
Post by: Fabio on November 29, 2011, 07:46:25 AM
Quote from: Timothy on November 28, 2011, 10:56:26 PM
.png images can store metadata I believe, maybe we could put the text components of the .pak file into the PNG metadata? That way the game objects would literally just be a PNG format file with additional metadata within it.

This would be *great*
Title: Re: Duplicate images saved only once by makeobj
Post by: VS on November 29, 2011, 08:40:17 AM
I thought about PNG, too, but many programs just drop chunks instead of doing the right thing. Different extension would be needed, if only to prevent this. edit:  Pak files are almost universally larger than png.

Serialization... not my thing to judge, but there is certain logic ;)

edit2: To address one of the original questions - duplicates in paks are stored in memory as one image after loading, so no harm there.
Title: Re: Duplicate images saved only once by makeobj
Post by: The Hood on December 11, 2011, 02:30:12 PM
A more pragmatic question associated with this now:

Quote from: Dwachs on November 26, 2011, 05:26:21 PM
These duplicate images should be detected upon loading the pak. So at least the memory consumption of simutrans (or the maximum number of loaded images) is not affected.

When checking for duplicate images does simutrans check for duplicate image references (i.e. same png file and location within png) or actual duplicate images (i.e. two identical sprites in different places/pngs)?  If the former, I'm assuming it's therefore best to reference the same png file/location for lots of different objects, something pak128.Britain doesn't always do but probably could save a lot of images/memory by doing this?
Title: Re: Duplicate images saved only once by makeobj
Post by: Dwachs on December 11, 2011, 02:56:48 PM
Quote from: The Hood on December 11, 2011, 02:30:12 PM
When checking for duplicate images does simutrans check for duplicate image references (i.e. same png file and location within png) or actual duplicate images (i.e. two identical sprites in different places/pngs)?
The latter, pak files contain only pixels, no references to png files.
Title: Re: Duplicate images saved only once by makeobj
Post by: The Hood on December 11, 2011, 03:11:48 PM
So just to double check, if image.0.0 is identical to image.1.1 and I reference them both, simutrans still recognises these as duplicates?
Title: Re: Duplicate images saved only once by makeobj
Post by: VS on December 11, 2011, 03:29:09 PM
1) A PNG file can contain anything, but only the parts of pictures that are referenced from DATs are saved to the PAK.
2) When loading a PAK, identical images are removed. Identity is determined by content checksum.

So, yes.
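
A minimal sketch of checksum-based deduplication at load time, as described in point 2. Which checksum Simutrans actually uses is not stated in this thread, so SHA-256 is only a stand-in, and `load_images` is a hypothetical name.

```python
import hashlib


def load_images(images):
    """Keep one copy of each distinct image; return (store, slots).

    store holds the unique pixel blobs, and slots[i] is the index in store
    that the i-th loaded image resolved to.
    """
    store = []
    slot_by_digest = {}   # checksum of pixel data -> index into store
    slots = []
    for pixels in images:
        digest = hashlib.sha256(pixels).digest()   # stand-in checksum
        if digest not in slot_by_digest:
            slot_by_digest[digest] = len(store)
            store.append(pixels)
        slots.append(slot_by_digest[digest])
    return store, slots
```

With this scheme it does not matter where the pixels came from: two identical sprites cut from different PNGs hash to the same value and share one slot.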