The International Simutrans Forum

 

Author Topic: Duplicate images saved only once by makeobj  (Read 5232 times)

Offline The Hood

  • Devotee
  • *
  • Posts: 2889
  • pak128.Britain developer
Duplicate images saved only once by makeobj
« on: November 26, 2011, 05:17:07 PM »
As discussed in the pak128.Britain forum (http://forum.simutrans.com/index.php?topic=4454.msg80445;topicseen#msg80445), the current implementation of makeobj does not recognise (AFAIK) when two or more dat definitions point to the same image in a png (correct me if I'm wrong here).  This results in very large pak files (compared to the original png) and presumably also eats into the limit on the number of image definitions allowed.  It strikes me as inefficient and I'm hoping this could be improved (it would particularly reduce the download size for pak128.Britain, and maybe other paksets too).  Would it be possible for makeobj to detect this somehow and only save the image once in the pak file?  The duplicates could be either multiple definitions within the same dat file or even multiple definitions within the same batched pak file.

Offline wlindley us

  • Devotee
  • *
  • Posts: 975
    • Hacking for fun and profit since 1977
  • Languages: EN, DE
Re: Duplicate images saved only once by makeobj
« Reply #1 on: November 26, 2011, 05:24:35 PM »
Where is the canonical definition for the .pak file format?

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4594
  • Languages: EN, DE, AT
Re: Duplicate images saved only once by makeobj
« Reply #2 on: November 26, 2011, 05:26:21 PM »
These duplicate images are detected upon loading the pak. So at least the memory consumption of Simutrans (and the maximum number of loaded images) is not affected.

Offline The Hood

  • Devotee
  • *
  • Posts: 2889
  • pak128.Britain developer
Re: Duplicate images saved only once by makeobj
« Reply #3 on: November 26, 2011, 05:34:35 PM »
OK, so that's one less problem, but I would still think it makes more sense to do this on pak creation rather than pak loading - how big a change would this be?

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4594
  • Languages: EN, DE, AT
Re: Duplicate images saved only once by makeobj
« Reply #4 on: November 26, 2011, 05:39:59 PM »
somewhere between 0 and 1 ... :-X

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Devotee
  • *
  • Posts: 18721
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Duplicate images saved only once by makeobj
« Reply #5 on: November 27, 2011, 10:33:38 PM »
Where is the canonical definition for the .pak file format?

In the open-source code, I suspect will be the answer.

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 9518
  • Languages: De,EN,JP
Re: Duplicate images saved only once by makeobj
« Reply #6 on: November 27, 2011, 11:08:15 PM »
The pak files are autobuilt more or less by the objects themselves, so there is no canonical definition.

Since every image saves itself, it would require some hacks to discover identical images in a consistent way, and additional logic to copy them (or remove them) in case of pak merging.

Offline VS

  • Senior Plumber (Devotee)
  • Devotee
  • *
  • Posts: 4855
  • Vladimír Slávik
    • VS's Simutrans site
  • Languages: CS,EN
Re: Duplicate images saved only once by makeobj
« Reply #7 on: November 28, 2011, 08:41:32 AM »
To expand Prissi's answer a bit: Pak file contents are hierarchically arranged; XML could be a good analogy. You can't really share items deep down in the tree unless you step outside the format (not a tree anymore!).

It's wasteful of disk space, but compressing such files works at least for transfer :)

Offline Fabio

  • Devotee
  • Administrator
  • *
  • Posts: 2898
  • The Pak128 Guy
    • Visit me on Facebook
  • Languages: EN, IT, RO, FR
Re: Duplicate images saved only once by makeobj
« Reply #8 on: November 28, 2011, 09:25:33 AM »
If the problem is related (mostly) to winter images, would it be possible to define, alongside BackImage and FrontImage, a WinterImage (and maybe SummerImage) layer containing only the seasonal differences?

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 9518
  • Languages: De,EN,JP
Re: Duplicate images saved only once by makeobj
« Reply #9 on: November 28, 2011, 09:28:03 AM »
One could of course change the image definition to an index into a list which comes last in any pak file. Then upon loading (like the xrefs) those would be loaded afterwards. When paks are split or merged, those indices need to be renumbered, and the lists merged or duplicated respectively.

However, the system is not so easy to follow, to put it mildly.

If somebody starts working there, one could also change the image definition from pointers into 2D lists to something more sensible, like an array of indices. That way lookups of images would not need to traverse down a lot of lists, and there would always be an array entry.
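
The index-list idea above can be sketched as follows. This is only an illustration under assumptions, not the real pak layout: the dict structure, the `merge_paks` name and the object names are all hypothetical. Each pak keeps one image list, objects refer to images by index, and merging two paks renumbers the second pak's indices while dropping images that are already present.

```python
def merge_paks(pak_a, pak_b):
    """Merge two hypothetical paks of the form
    {"images": [bytes, ...], "objects": {name: [image_index, ...]}}.

    Images from pak_b that already exist in pak_a are not copied;
    the indices used by pak_b's objects are renumbered accordingly.
    Objects in pak_b replace same-named objects from pak_a.
    """
    images = list(pak_a["images"])
    # pixel data -> index in the merged image list
    position = {pixels: i for i, pixels in enumerate(images)}
    objects = {name: list(refs) for name, refs in pak_a["objects"].items()}

    for name, refs in pak_b["objects"].items():
        new_refs = []
        for old_index in refs:
            pixels = pak_b["images"][old_index]
            if pixels not in position:
                position[pixels] = len(images)
                images.append(pixels)
            new_refs.append(position[pixels])
        objects[name] = new_refs

    return {"images": images, "objects": objects}


# Both paks contain the same "snow" sprite; it is stored once after merging.
pak_a = {"images": [b"summer", b"snow"], "objects": {"station": [0, 1]}}
pak_b = {"images": [b"snow", b"depot"], "objects": {"depot": [1, 0]}}
merged = merge_paks(pak_a, pak_b)
```

Splitting a pak would be the reverse: copy every image an object refers to into the new pak and renumber again, which is where shared images end up duplicated.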

Offline wlindley us

  • Devotee
  • *
  • Posts: 975
    • Hacking for fun and profit since 1977
  • Languages: EN, DE
Re: Duplicate images saved only once by makeobj
« Reply #10 on: November 28, 2011, 11:24:33 AM »
Seems like the pak format could use a makeover. 

The first step should be actually documenting the current file format. 

Anyone with experience in this department, if you could help write the new wiki page, using the besch/writer/*.cc files as a guide, let's see what we can do to move to a more compressed, more capable pak format.

(A particularly nice side benefit of documenting the pak format, will be that alternate makeobj-like programs could be written in Perl, (or Python, if you must), perhaps even including a program that returns the binary format back to the component PNGs and text definitions, for all those add-ons whose sources have been lost.)

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 9518
  • Languages: De,EN,JP
Re: Duplicate images saved only once by makeobj
« Reply #11 on: November 28, 2011, 12:29:27 PM »
It does not make any sense to document the pak format beyond the most basic object struct, as any revised object could (and did) easily redefine the order and number of fields. It would be a snapshot of the pak format for simutrans 111.0 r49xx or so. Do not waste time on this, especially as there are routines to read and write paks. And there are programs to retrieve images, and the dats appear in the text output when loading simutrans with -debug 3.

Offline wlindley us

  • Devotee
  • *
  • Posts: 975
    • Hacking for fun and profit since 1977
  • Languages: EN, DE
Re: Duplicate images saved only once by makeobj
« Reply #12 on: November 28, 2011, 01:17:33 PM »
It makes perfect sense to document the format -- unless it's supposed to remain some deep dark secret like Microsoft's Word Document format. 

Simutrans can read old .pak files, right?  So let's document it. 

And we keep running into this same discussion because anyone who wants to help code needs and wants to know how things work.  So let's stop pushing people away and tell them how it works.

Indeed, the idea that "we shouldn't document a file format" is the best reason for replacing an obscure binary format with a text-based one, or at least with something more like TIFF, which has known tag-groups and is extensible.  In fact, why not just have Simutrans itself read the dat and png files?

See section 5.1, "The Importance of Being Textual", in Eric S. Raymond's "The Art of UNIX Programming" for a discussion of why binary formats like paks cause more problems than they are worth; specifically, the preamble about nroff: a direct analogue to this situation.

Offline Combuijs

  • Web Team
  • Devotee
  • *
  • Posts: 1392
  • Maintainer of maps.simutrans.com
    • Combuijs
  • Languages: EN, NL
Re: Duplicate images saved only once by makeobj
« Reply #13 on: November 28, 2011, 01:25:54 PM »
The routines to read and write pak-files are the perfect format documentation. No need for separate documentation which will always be three versions behind.

Having said that, I wonder how fast it would be to import direct from .png and .dat files. Computers are really fast these days, might be feasible...

Offline Fabio

  • Devotee
  • Administrator
  • *
  • Posts: 2898
  • The Pak128 Guy
    • Visit me on Facebook
  • Languages: EN, IT, RO, FR
Re: Duplicate images saved only once by makeobj
« Reply #14 on: November 28, 2011, 01:30:20 PM »
And most paksets are now open, but not pak96 iirc.
Maybe paks could be bzipped (.pak.bz2) and uncompressed during loading.

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 9518
  • Languages: De,EN,JP
Re: Duplicate images saved only once by makeobj
« Reply #15 on: November 28, 2011, 01:39:31 PM »
Simutrans has had the ability to read directly from the png and dat (actually, this was the normal way very early on). The ability to put the structures together directly still exists, and it would not be too difficult (only tedious) to add this again. But it opens countless complications and hassles if images do not stay with their definitions, especially when using subfolders and the like.

As hard disk space is not the limit, bz2-ing will only slow down loading. But that could be done easily.

Offline Fabio

  • Devotee
  • Administrator
  • *
  • Posts: 2898
  • The Pak128 Guy
    • Visit me on Facebook
  • Languages: EN, IT, RO, FR
Re: Duplicate images saved only once by makeobj
« Reply #16 on: November 28, 2011, 01:57:53 PM »
If we wanted an open pak format reducing risks, a bz2 file should contain both dat and png. They would be shipped together and use less disk space.

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 9518
  • Languages: De,EN,JP
Re: Duplicate images saved only once by makeobj
« Reply #17 on: November 28, 2011, 08:05:03 PM »
bz2 is a format that explicitly only contains a single block of data ...
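
Since bz2 compresses a single stream, several files have to be bundled into one container first. A minimal sketch of the usual workaround, using tar as the container (the `pack` and `unpack` names and the file names are only illustrative; nothing here is part of Simutrans or makeobj):

```python
import bz2
import io
import tarfile


def pack(files):
    """Bundle {name: bytes} into one tar archive, then bzip2 the whole archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return bz2.compress(buf.getvalue())


def unpack(blob):
    """Reverse of pack(): decompress the single bz2 stream, then read the tar."""
    raw = bz2.decompress(blob)
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}


# Placeholder contents standing in for a dat file and its png.
source = {"station.dat": b"obj=building\n", "station.png": b"\x89PNG placeholder"}
restored = unpack(pack(source))
```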

Offline Ashley

  • Coder/Patcher
  • Devotee
  • *
  • Posts: 1288
    • entropy.me.uk
Re: Duplicate images saved only once by makeobj
« Reply #18 on: November 28, 2011, 10:56:26 PM »
.png images can store metadata I believe, maybe we could put the text components of the .pak file into the PNG metadata? That way the game objects would literally just be a PNG format file with additional metadata within it.
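
PNG does allow this through ancillary text chunks such as `tEXt`. A minimal sketch using only the standard library, assuming a hypothetical keyword `simutrans-dat` and made-up dat content; it builds a 1x1 PNG, stores the text just before `IEND`, and reads it back. Any tool that rewrites the file without copying ancillary chunks would lose the text.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(kind, data):
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def read_chunks(png):
    """Yield (type, data) for every chunk in a PNG byte string."""
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        yield kind, png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def add_text(png, keyword, text):
    """Return a copy of png with a tEXt chunk inserted right before IEND."""
    out = PNG_SIGNATURE
    for kind, data in read_chunks(png):
        if kind == b"IEND":
            payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
            out += chunk(b"tEXt", payload)
        out += chunk(kind, data)
    return out


def get_text(png, keyword):
    """Return the text stored under keyword, or None if there is none."""
    for kind, data in read_chunks(png):
        if kind == b"tEXt":
            key, _, value = data.partition(b"\x00")
            if key == keyword.encode("latin-1"):
                return value.decode("latin-1")
    return None


# A 1x1, 8-bit grayscale image: IHDR, one IDAT, IEND.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x00")  # filter byte 0 followed by one pixel
plain = PNG_SIGNATURE + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND", b"")

dat_text = "obj=building\nname=ExampleStation\n"  # illustrative dat content
tagged = add_text(plain, "simutrans-dat", dat_text)
```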

The entire serialisation model employed by Simutrans could do with a rework IMO, I found this when working on the networking code too, it's a huge job though, rather daunting!

Offline Fabio

  • Devotee
  • Administrator
  • *
  • Posts: 2898
  • The Pak128 Guy
    • Visit me on Facebook
  • Languages: EN, IT, RO, FR
Re: Duplicate images saved only once by makeobj
« Reply #19 on: November 29, 2011, 07:46:25 AM »
.png images can store metadata I believe, maybe we could put the text components of the .pak file into the PNG metadata? That way the game objects would literally just be a PNG format file with additional metadata within it.

This would be *great*

Offline VS

  • Senior Plumber (Devotee)
  • Devotee
  • *
  • Posts: 4855
  • Vladimír Slávik
    • VS's Simutrans site
  • Languages: CS,EN
Re: Duplicate images saved only once by makeobj
« Reply #20 on: November 29, 2011, 08:40:17 AM »
I thought about PNG, too, but many programs just drop chunks instead of doing the right thing. A different extension would be needed, if only to prevent this. edit:  Pak files are almost universally larger than png.

Serialization... not my thing to judge, but there is certain logic ;)

edit2: To address one of the original questions - duplicates in paks are stored in memory as one image after loading, so no harm there.
« Last Edit: November 29, 2011, 08:48:44 AM by VS »

Offline The Hood

  • Devotee
  • *
  • Posts: 2889
  • pak128.Britain developer
Re: Duplicate images saved only once by makeobj
« Reply #21 on: December 11, 2011, 02:30:12 PM »
A more pragmatic question associated with this now:

These duplicate images are detected upon loading the pak. So at least the memory consumption of Simutrans (and the maximum number of loaded images) is not affected.

When checking for duplicate images does simutrans check for duplicate image references (i.e. same png file and location within png) or actual duplicate images (i.e. two identical sprites in different places/pngs)?  If the former, I'm assuming it's therefore best to reference the same png file/location for lots of different objects, something pak128.Britain doesn't always do but probably could save a lot of images/memory by doing this?

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4594
  • Languages: EN, DE, AT
Re: Duplicate images saved only once by makeobj
« Reply #22 on: December 11, 2011, 02:56:48 PM »
When checking for duplicate images does simutrans check for duplicate image references (i.e. same png file and location within png) or actual duplicate images (i.e. two identical sprites in different places/pngs)?
The latter, pak files contain only pixels, no references to png files.

Offline The Hood

  • Devotee
  • *
  • Posts: 2889
  • pak128.Britain developer
Re: Duplicate images saved only once by makeobj
« Reply #23 on: December 11, 2011, 03:11:48 PM »
So just to double check, if image.0.0 is identical to image.1.1 and I reference them both, simutrans still recognises these as duplicates?

Offline VS

  • Senior Plumber (Devotee)
  • Devotee
  • *
  • Posts: 4855
  • Vladimír Slávik
    • VS's Simutrans site
  • Languages: CS,EN
Re: Duplicate images saved only once by makeobj
« Reply #24 on: December 11, 2011, 03:29:09 PM »
1) A PNG file can contain anything, but only the parts of the picture that are referenced from DATs are saved to the PAK.
2) When loading a PAK, identical images are removed. Identity is determined by content checksum.

So, yes.
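
A minimal sketch of load-time deduplication by content checksum, as described in point 2. The thread does not say which checksum Simutrans uses, so SHA-256 and the `dedupe_images` name are assumptions for illustration only.

```python
import hashlib


def dedupe_images(images):
    """Collapse images with identical pixel data.

    Returns (unique_images, remap), where remap[i] is the index in
    unique_images that the i-th input image now refers to.
    """
    seen = {}    # checksum -> index in unique
    unique = []
    remap = []
    for pixels in images:
        key = hashlib.sha256(pixels).digest()  # assumed checksum, not Simutrans' own
        index = seen.get(key)
        if index is None:
            index = len(unique)
            seen[key] = index
            unique.append(pixels)
        remap.append(index)
    return unique, remap


# The first and third sprites have identical pixels (think image.0.0 and
# image.1.1 cut from different places), so only two images are kept.
sprites = [b"\x01\x02\x03", b"\x09\x09", b"\x01\x02\x03"]
unique, remap = dedupe_images(sprites)
```

Because only pixel content is compared, it makes no difference whether the two sprites came from the same png location or from different files.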