Duplicate images saved only once by makeobj

Started by The Hood, November 26, 2011, 05:17:07 PM

The Hood

As discussed in the pak128.Britain forum (http://forum.simutrans.com/index.php?topic=4454.msg80445;topicseen#msg80445), the current implementation of makeobj does not recognise (AFAIK) when two or more dats point to the same image file (correct me if I'm wrong here). This results in very large pak files (compared to the original png) and presumably also eats into the limit on the number of image definitions allowed. It strikes me as inefficient and I'm hoping this could be improved (it would particularly reduce the download size for pak128.Britain, and maybe other paksets too). Would it be possible for makeobj to detect this somehow and only save the image once in the pak file? The duplicates could be either multiple definitions within the same dat file or even multiple definitions within the same batched pak file.

wlindley

Where is the canonical definition for the .pak file format?

Dwachs

These duplicate images should be detected upon loading the pak. So at least the memory consumption of Simutrans (or the maximum number of loaded images) is not affected.
Parsley, sage, rosemary, and maggikraut.

The Hood

OK, so that's one less problem, but I would still think it makes more sense to do this on pak creation rather than pak loading - how big a change would this be?

Dwachs

jamespetts

Quote from: wlindley on November 26, 2011, 05:24:35 PM
Where is the canonical definition for the .pak file format?

The open-source code, I suspect, will be the answer.

prissi

The pak files are autobuilt more or less by the objects themselves, so there is no canonical definition.

Since every image saves itself, it would require some hacks to discover identical images in a consistent way, and additional logic to copy them (or remove them) in case of pak merging.

VS

To expand Prissi's answer a bit: pak file contents are hierarchically arranged; XML could be a good analogy. You can't really share items deep down in the tree unless you step outside the format (not a tree anymore!).

It's wasteful of disk space, but compressing such files works at least for transfer :)

My projects... Tools for messing with Simutrans graphics. Graphic archive - templates and some other stuff for painters. Development logs for most recent information on what is going on. And of course pak128!

Fabio

If the problem is related (mostly) to winter images, would it be possible to define, alongside BackImage and FrontImage, a WinterImage (and maybe SummerImage) layer containing only the seasonal differences?

prissi

One could of course change the image definition to an index into a list which comes last in any pak file. Then upon loading (like the xrefs) those would be loaded afterwards. When paks are split or merged, those indices need to be renumbered, and the images merged or duplicated, respectively.
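A minimal sketch of that idea, not the actual pak format: images are stored once in a table and objects refer to them by index, and merging two such tables means renumbering the indices of the second one. The names build_image_table and merge_tables are hypothetical, and images are treated as plain byte strings.

```python
def build_image_table(objects):
    """Store each distinct image once and replace per-object images by indices.

    objects: dict mapping an object name to a list of image byte strings.
    Returns (table, refs): the list of unique images and, per object,
    the list of indices into that table.
    """
    table = []      # unique images, each stored once
    index_of = {}   # image bytes -> position in table
    refs = {}
    for name, images in objects.items():
        indices = []
        for image in images:
            if image not in index_of:
                index_of[image] = len(table)
                table.append(image)
            indices.append(index_of[image])
        refs[name] = indices
    return table, refs


def merge_tables(table_a, refs_a, table_b, refs_b):
    """Merge two image tables; indices coming from b are renumbered."""
    table = list(table_a)
    index_of = {image: i for i, image in enumerate(table)}
    remap = []      # old index in table_b -> new index in merged table
    for image in table_b:
        if image not in index_of:
            index_of[image] = len(table)
            table.append(image)
        remap.append(index_of[image])
    refs = dict(refs_a)
    for name, indices in refs_b.items():
        refs[name] = [remap[i] for i in indices]
    return table, refs
```

Splitting a pak would be the reverse step: copy the referenced images into the new table and renumber, which duplicates any image used on both sides of the split.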

However, the system is not so easy to follow, to put it mildly.

If somebody starts working there, one could also change the image definition from pointers into 2D lists to something more sensible, like an array of indices. That way lookups of images would not need to traverse down a lot of lists, and there would always be an array entry.
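As a rough illustration only, a flat array of indices could look like the sketch below. ImageGrid and NO_IMAGE are made-up names, and the real Simutrans structures are C++ and laid out differently; the point is just that every (row, column) slot exists and a lookup is one array access.

```python
NO_IMAGE = -1  # hypothetical sentinel meaning "no image in this slot"


class ImageGrid:
    """Fixed-size 2D grid of image indices stored in one flat list."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        # Every slot exists from the start, so lookups never hit a missing list.
        self.indices = [NO_IMAGE] * (rows * cols)

    def _offset(self, row, col):
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise IndexError("slot outside the grid")
        return row * self.cols + col

    def set(self, row, col, image_index):
        self.indices[self._offset(row, col)] = image_index

    def get(self, row, col):
        # Single array access instead of walking nested lists.
        return self.indices[self._offset(row, col)]
```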

wlindley

Seems like the pak format could use a makeover. 

The first step should be actually documenting the current file format. 

Anyone with experience in this department, if you could help write the new wiki page, using the besch/writer/*.cc files as a guide, let's see what we can do to move to a more compressed, more capable pak format.

(A particularly nice side benefit of documenting the pak format will be that alternate makeobj-like programs could be written in Perl (or Python, if you must), perhaps even including a program that turns the binary format back into the component PNGs and text definitions, for all those add-ons whose sources have been lost.)

prissi

It does not make any sense to document the pak format beyond the most basic object struct, as any renewed object could (and did) easily redefine the order and number of fields. It would be a snapshot of the pak format for Simutrans 111.0 r49xx or so. Do not waste time on this, especially as there are routines to read and write paks. And there are programs to retrieve images, and the dats are in the text output when loading Simutrans with -debug 3.

wlindley

It makes perfect sense to document the format -- unless it's supposed to remain some deep dark secret like Microsoft's Word Document format. 

Simutrans can read old .pak files, right?  So let's document it. 

And we keep running into this same discussion because anyone who wants to help code needs and wants to know how things work.  So let's stop pushing people away and tell them how it works.

Indeed, the concept that "we shouldn't document a file format" is the best reason for replacing an obscure binary format with a textual one, or at least something more like TIFF, which has known tag-groups and is extensible. In fact, why not just have Simutrans itself read the dat and png files?

See section 5.1, "The Importance of Being Textual", in Eric S. Raymond's "The Art of UNIX Programming" for a discussion of why binary formats like paks cause more problems than they are worth; specifically, the preamble about nroff: a direct analogue to this situation.

Combuijs

The routines to read and write pak files are the perfect format documentation. No need for separate documentation, which will always be three versions behind.

Having said that, I wonder how fast it would be to import directly from .png and .dat files. Computers are really fast these days, might be feasible...
Bob Marley: No woman, no cry

Programmer: No user, no bugs



Fabio

And most paksets are now open, but not pak96 iirc.
Maybe paks could be bzipped (.pak.bz2) and uncompressed during loading.

prissi

Simutrans has had the ability to read directly from the png and dat (actually, this was the normal way very early on). This ability to directly put the structures together exists, and it would not be too difficult (only tedious) to add it again. But it opens countless complications and hassles if images do not stay with their definitions, especially when using subfolders and the like.

As hard disk space is not the limit, bz2-ing will only slow down loading. But that could be done easily.
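For what it's worth, a sketch of how transparent bz2 loading could work, using Python's standard bz2 module. This is not how Simutrans loads paks; pack_pak and load_pak are hypothetical names, and the loader simply checks for the "BZh" magic that starts every bz2 stream.

```python
import bz2


def pack_pak(pak_bytes):
    """Compress the raw contents of a pak file as a single bz2 stream."""
    return bz2.compress(pak_bytes)


def load_pak(data):
    """Return raw pak contents, decompressing first if the data is bz2."""
    if data[:3] == b"BZh":  # bz2 streams begin with this magic
        return bz2.decompress(data)
    return data             # assume it is already an uncompressed pak
```

The extra decompression pass on every load is the slowdown mentioned above; the saving is only in download and disk size.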

Fabio

If we wanted an open pak format reducing risks, a bz2 file should contain both dat and png. They would be shipped together and use less disk space.

prissi

bz2 is a format that explicitly only contains a single block of data ...

Ashley

.png images can store metadata I believe, maybe we could put the text components of the .pak file into the PNG metadata? That way the game objects would literally just be a PNG format file with additional metadata within it.
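A small sketch of what that could look like, using only the standard library. PNG's tEXt chunk holds a Latin-1 keyword and text, and every chunk is length, type, data and a CRC over type plus data. The keyword "simutrans-dat" and the example dat text are purely illustrative assumptions, not an existing convention.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png):
    """Yield (offset, type, data) for every chunk in a PNG byte string."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield pos, png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def add_text(png, keyword, text):
    """Insert a tEXt chunk (keyword, NUL, text) just before IEND."""
    payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    chunk = make_chunk(b"tEXt", payload)
    for pos, chunk_type, _ in iter_chunks(png):
        if chunk_type == b"IEND":
            return png[:pos] + chunk + png[pos:]
    raise ValueError("no IEND chunk found")


def read_text(png, keyword):
    """Return the text stored under keyword, or None if absent."""
    key = keyword.encode("latin-1")
    for _, chunk_type, data in iter_chunks(png):
        if chunk_type == b"tEXt":
            found, _, value = data.partition(b"\x00")
            if found == key:
                return value.decode("latin-1")
    return None


def tiny_png():
    """Build a valid 1x1 8-bit grayscale PNG to experiment with."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))
```

Usage would be along the lines of add_text(tiny_png(), "simutrans-dat", "obj=building\nname=Example") followed by read_text on the result. Text outside Latin-1 would need the iTXt chunk instead, which this sketch does not handle.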

The entire serialisation model employed by Simutrans could do with a rework IMO, I found this when working on the networking code too, it's a huge job though, rather daunting!

Fabio

Quote from: Timothy on November 28, 2011, 10:56:26 PM
.png images can store metadata I believe, maybe we could put the text components of the .pak file into the PNG metadata? That way the game objects would literally just be a PNG format file with additional metadata within it.

This would be *great*

VS

I thought about PNG, too, but many programs just drop chunks instead of doing the right thing. A different extension would be needed, if only to prevent this. edit: Pak files are almost universally larger than png.

Serialization... not my thing to judge, but there is certain logic ;)

edit2: To address one of the original questions - duplicates in paks are stored in memory as one image after loading, so no harm there.

The Hood

A more pragmatic question associated with this now:

Quote from: Dwachs on November 26, 2011, 05:26:21 PM
These duplicate images should be detected upon loading the pak. So at least the memory consumption of Simutrans (or the maximum number of loaded images) is not affected.

When checking for duplicate images, does Simutrans check for duplicate image references (i.e. same png file and location within png) or actual duplicate images (i.e. two identical sprites in different places/pngs)? If the former, I'm assuming it's best to reference the same png file/location for lots of different objects, something pak128.Britain doesn't always do but which could probably save a lot of images/memory?

Dwachs

Quote from: The Hood on December 11, 2011, 02:30:12 PM
When checking for duplicate images does simutrans check for duplicate image references (i.e. same png file and location within png) or actual duplicate images (i.e. two identical sprites in different places/pngs)?
The latter, pak files contain only pixels, no references to png files.

The Hood

So just to double check: if image.0.0 is identical to image.1.1 and I reference them both, Simutrans still recognises these as duplicates?

VS

1) A PNG file can contain anything, but only the parts of pictures that are referenced from DATs are saved to the PAK.
2) When loading a PAK, identical images are removed. Identity is determined by content checksum.

So, yes.
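Roughly, the load-time step in 2) can be sketched like this. It is an illustration only: which checksum Simutrans actually uses is not stated here, so SHA-256 from Python's hashlib stands in for it, and register_images is a hypothetical name.

```python
import hashlib


def register_images(images):
    """Keep one copy of each distinct image, identified by content checksum.

    images: list of pixel data as byte strings, in load order.
    Returns (store, ids): the unique images kept in memory and, for each
    loaded image, the id of the stored copy it maps to.
    """
    store = []   # images actually kept
    seen = {}    # checksum -> id in store
    ids = []
    for pixels in images:
        digest = hashlib.sha256(pixels).digest()  # stand-in checksum
        if digest not in seen:
            seen[digest] = len(store)
            store.append(pixels)
        ids.append(seen[digest])
    return store, ids
```

Because only pixel content is compared, it makes no difference whether two identical sprites came from the same place in one png or from different pngs.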