Hi, I have been collecting nightly build information for some years.
(I post organized nightly build information on my website in Japanese.)
I tried build r7411, which could load images over 65534,
but build r7252 could not load images over 65534.
I have two hypotheses: either "Simutrans simply does not load the surplus images in excess of 65534"
or "Simutrans can load more than 65534 images since r7411".
Please tell me when (from which build) Simutrans can load images over 65534.
65534 what? Is this sprite pixel area? Or is it the number of separate sprite images? It sounds like a limit caused by using a uint16_t variable/parameter.
The nightly builds follow the SVN closely, with the exception of a few months ago when a make error resulted in the Windows nightly failing to build. Each commit to the SVN contains a brief description of what was changed (although I agree it often lacks detail).
Quote from: DrSuperGood on December 11, 2014, 02:40:42 AM
65534 what? Is this sprite pixel area? Or is it the number of separate sprite images? It sounds like a limit caused by using a uint16_t variable/parameter.
The nightly builds follow the SVN closely, with the exception of a few months ago when a make error resulted in the Windows nightly failing to build. Each commit to the SVN contains a brief description of what was changed (although I agree it often lacks detail).
I don't know whether it is sprite pixel area or the number of separate sprite images.
I only know that you can use fewer than 65534 PNG files.
Sure, I know that 65534 comes from the maximum value of a uint16_t.
A Japanese developer has rewritten the uint16_t to uint32_t in order to enable the use of more than 65534 add-on images.
This was discussed in a private developers' forum post that it might be a good idea to move to public discussion. It was agreed to expand the number of images, but no final action was taken.
Switching that 16-bit id to 32-bit is indeed a solution, but it raises memory consumption in-game.
http://forum.simutrans.com/index.php?topic=10105.0
Quote from: Markohs on December 11, 2014, 08:40:58 AM
This was discussed in a private developers' forum post that it might be a good idea to move to public discussion. It was agreed to expand the number of images, but no final action was taken.
Switching that 16-bit id to 32-bit is indeed a solution, but it raises memory consumption in-game.
http://forum.simutrans.com/index.php?topic=10105.0
I can't see that thread; I get the message 'Specified thread or bulletin board, seems not found or is not allowed to view.'
Yep, it's private.
To make it short, we discussed two approaches to solve this issue, but I think switching the uint16 to uint32 was accepted as a solution. We'll see what prissi has to say about this, but I guess he will just change it to uint32 soon.
I agree with switching uint16 to uint32, too.
I have sometimes been plagued by Simutrans not being able to load more than 65534 images, and I could not keep playing the same save data (a game I had played for over two years).
If uint16 is switched to uint32, what issues would arise?
Do any paksets have over 65534 sprite images in them? Or is this some limit on the rasterizer algorithm?
I have played some pretty well developed games in both experimental (which might have already changed that for all I know) and pak64 and never encountered such a limit.
No, paksets themselves stay within that limit for the moment. But if you add a lot of addons (basically all you can get in the same format) you run out of images.
Every image is only stored once, so duplicates are filtered out.
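For illustration, roughly what that duplicate filtering means (a made-up C++ sketch, not the actual loader or makeobj code; all names here are invented):

#include <cstdint>
#include <unordered_map>
#include <vector>

// Sketch only: hand out one id per distinct pixel buffer, so byte-identical
// images end up sharing a single slot instead of each taking an id.
typedef uint16_t image_id;                 // the 16-bit id under discussion
typedef std::vector<uint8_t> pixels_t;

struct pixel_hash {                        // FNV-1a over the raw pixel bytes
    size_t operator()(const pixels_t &p) const {
        uint64_t h = 0xcbf29ce484222325ULL;
        for (size_t i = 0; i < p.size(); ++i) { h ^= p[i]; h *= 0x100000001b3ULL; }
        return (size_t)h;
    }
};

class image_pool {
    std::vector<pixels_t> images;                           // one entry per unique image
    std::unordered_map<pixels_t, image_id, pixel_hash> seen;
public:
    image_id add(const pixels_t &img) {
        auto it = seen.find(img);
        if (it != seen.end()) return it->second;            // duplicate: reuse its id
        image_id id = (image_id)images.size();
        images.push_back(img);
        seen[img] = id;
        return id;
    }
};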
Quote from: Manche on December 11, 2014, 10:28:42 AM
If uint16 is switched to uint32, what issues would arise?
Two things first: the game will use more memory, and, somewhat related, it will run more slowly. (It will also run slower simply because you have many images.)
At some point, Simutrans might start crashing because it runs out of memory. Switching to uint32 means players can have more images than regular Simutrans has room for in memory (even virtual memory; the actual amount of RAM on the computer doesn't directly matter).
Quote from: DrSuperGood on December 11, 2014, 02:00:02 PM
I have played some pretty well developed games in both experimental (which might have already changed that for all I know) and pak64 and never encountered such a limit.
You seem like a logistics-minded type of player, given your focus on industry demand and supply. There are players who just want things to look pretty, to the point of obsession, and don't care much about how vehicles run, as long as they move and bring some life to the scenery they've created. And some are probably a mix of both. No two players seem to look at and treat Simutrans the same.
Quote from: Ters on December 11, 2014, 04:39:04 PM
Two things first: the game will use more memory, and, somewhat related, it will run more slowly. (It will also run slower simply because you have many images.)
At some point, Simutrans might start crashing because it runs out of memory. Switching to uint32 means players can have more images than regular Simutrans has room for in memory (even virtual memory; the actual amount of RAM on the computer doesn't directly matter).
Are there no problems other than those two issues?
So I think a solution to the memory problem is to prepare a 64-bit edition separate from the traditional edition.
A 64-bit application can use more memory than a 32-bit one.
But preparing a 64-bit edition may be a burden for developers and contributors.
Quote from: DrSuperGood on December 11, 2014, 02:00:02 PM
Do any paksets have over 65534 sprite images in them? Or is this some limit on the rasterizer algorithm?
I have played some pretty well developed games in both experimental (which might have already changed that for all I know) and pak64 and never encountered such a limit.
You can face the 65534 problem if you add all of the Japanese authors' add-ons.
Japanese users have made very many add-ons.
Incidentally, I always use pak64 with many Japanese vehicles (including some vehicles from other countries, too).
Quote
So I think a solution to the memory problem is to prepare a 64-bit edition separate from the traditional edition.
A 64-bit application can use more memory than a 32-bit one.
But preparing a 64-bit edition may be a burden for developers and contributors.
This has been discussed many times. The conclusion was that migrating to full 64bit would slow the game down more than the extra memory and faster fixed point calculations it would bring are worth, specifically due to the larger instruction and pointer sizes. That said, there should be 64 bit builds available, as the game is not particularly coupled to 32bit (it is just that 64bit is not officially supported). Experimental, for example, is distributed as 64bit.
Also some inline assembly code would need to be rewritten in order for the game to perform well. Experimental 64bit suffers from abysmal performance due to some graphic calls not being optimized with inline assembly.
There is a compromise which James pulled out for Experimental when the server game ran into the standard 32bit memory limit. It is possible to compile a 32bit build with extended addressing (large-address aware), which raises the virtual address space limit from 2GB to up to 4GB. In fact, a 32bit extended-address process can allocate about as much memory as most household PCs have installed anyway. Since it is still 32bit, the performance is very similar to normal 32bit builds.
Quote
You can face the 65534 problem if you add all of the Japanese authors' add-ons.
Japanese users have made very many add-ons.
Incidentally, I always use pak64 with many Japanese vehicles (including some vehicles from other countries, too).
How many of those sprites are actually in use at any given time? For example, an old steam locomotive in 2050 is unlikely to be used except for specific decorative reasons. The solution in this case would be to dynamically load sprites at run time rather than loading all sprites at startup. This could add resource stalls, but it would reduce the number of sprites massively, as any sprite not actively in use would not be in memory.
This could be further extended to only visible sprites. The chance of someone having 65534 different sprites in view at any given time is practically nil. Obviously the code would be more complicated for this and would probably have some large overhead.
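A very rough sketch of that on-demand idea (illustration only, nothing like Simutrans' actual loader; every name below is invented):

#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Pixel data stays on disk until a sprite is first drawn, so images that are
// never used cost (almost) no RAM.
struct sprite {
    std::vector<uint8_t> pixels;           // decoded pixel data
};

// Hypothetical reader; a real one would seek into the pak file and decode.
static sprite load_sprite_from_pak(const std::string & /*pak_path*/, uint32_t /*offset*/)
{
    return sprite();                       // stub for the sketch
}

class lazy_sprite_cache {
    struct entry {
        std::string pak_path;              // where the data lives on disk
        uint32_t    offset;
        std::unique_ptr<sprite> loaded;    // stays empty until first use
    };
    std::vector<entry> entries;            // indexed by image id
public:
    uint32_t register_sprite(const std::string &pak_path, uint32_t offset) {
        entry e;
        e.pak_path = pak_path;
        e.offset   = offset;
        entries.push_back(std::move(e));
        return (uint32_t)(entries.size() - 1);
    }
    const sprite &get(uint32_t id) {       // the possible "resource stall" happens here
        entry &e = entries[id];
        if (!e.loaded) {
            e.loaded.reset(new sprite(load_sprite_from_pak(e.pak_path, e.offset)));
        }
        return *e.loaded;
    }
};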
When I brought up that at some point Simutrans would run out of memory, I was thinking far ahead. 64-bit memory space and swapping images in and out of memory are solutions to problems that are still some distance away.
A little more effort is needed than just making this id 32 bit, as almost every object in the map stores this id and would hence increase in size by 4 bytes. So it would have some effect on memory (like another 16 MB of RAM or so for a game with 4 million objects, which is not much really). Also, some routines need further changes. (I forgot whether this was finished or not.)
The only way to get over the maximum of 65535 for now is putting two paksets in the same folder, like pak128.Britain and pak128. However, the various pak128 sets are not made to be mixed, since their sizes do not match up.
pak64 has about 6372 images. That leaves room for roughly 8000 different train cars. I am pretty sure the depot will become impossible to use before that. (Not to mention that there are not 8000 cars on the Japanese website; there are about 18 pages with 20 entries each and on average fewer than 8 different cars per entry, i.e. ~3000 in total.) Not to mention that Simutrans merges identical images (like left and right or back views), so the actual limit is even higher.
Some other sets:
pak128.britain 31256
pak128.japan 4406
pak128 36932
pak96.comic 7166
pak192.comic 10575
Even with pak128 you would need more than 3500 additional train cars (assuming all have 8 completely different views). For pak128 there are 14 pages, again with about 20 entries each and assuming 16 different cars per entry: yes, if you put everything into a game (ignoring that several are there twice or three times, are from many different places, and so on), you might just be able to reach this limit.
We also have 64 bit nightlies for Linux, since compatibility on Linux with 32 bit is really, really bad. (Why is there a Linux standard distribution if nobody uses those components anymore?) But that has nothing to do with this id.
If someone shows us a sensible mix of addons that reaches this number, we will increase it.
Since some freight cars have more than 8 different images (4/8 empty, 4/8 freight A, 4/8 freight B, 4/8 freight C, ...), the number of cars required for pak128 to reach the limit is somewhat less. It's still a lot of vehicles. But there are perhaps other add-ons that use image slots like crazy. Buildings can, with multiple seasons, layouts and animations, require more images than vehicles.
Just widen the id to 32bit; we are not in the year '95 anymore. The increase in memory usage is not huge, we will survive
Quote
The increase in memory usage is not huge, we will survive
It is not so much the memory usage as the performance decrease, since extra bytes of memory have to be read, etc. Apparently some people still run Simutrans on systems with quite restricted resources.
The number of bytes read is almost certainly more important than the number of bytes used, as image ids tend to hang around in besch objects, of which there are fairly few. Besch objects are, however, read very often.
I constantly run out of images. I'd say it is a pressing problem.
You can easily run out of images with a few building sets and vehicles (prissi mentions the Japanese website - you can easily get up to 65535 images assuming you have double-height ways, buildings, station facilities and so on; it's very easy. You won't even get halfway through what is available before you reach that point.)
I just looked on the wiki, and an Intel i5 cache line is 64 bytes wide, and a Core 2 Duo's too. You are arguing over just 2 bytes. Older systems can't even run Windows 7. Aren't you taking compatibility with old systems a bit too far here? No modern CPU will really be much affected by this 2-byte increase.
I doubt it will make a big difference in performance, really. But I haven't benchmarked it, so I might be wrong.
This is just my point of view; I respect that others don't see things the way I do, of course.
Quote
I just looked on the wiki, and an Intel i5 cache line is 64 bytes wide, and a Core 2 Duo's too. You are arguing over just 2 bytes. Older systems can't even run Windows 7. Aren't you taking compatibility with old systems a bit too far here? No modern CPU will really be much affected by this 2-byte increase.
The target specs for Simutrans, I thought, were somewhere between a late Pentium 3 and an early Pentium 4. I have played on many servers in the last year or so that were powered by Pentium 4 processors. Although a few years old now, Intel i3, i5 and i7 are still very modern processors.
Personally I agree with you and even believe migrating to 64bit would not be as bad as people make out (all the fixed point mathematics would be a lot faster and the graphic code should not be that much slower once re-optimized). However, I think a lot of Simutrans users do not have a 64bit OS.
Well, I agree, but I was not really demanding 64 bit builds here, I was just expressing that turning image_id from 16-bit to 32-bit shouldn't be a big problem. :) A 64-bit build adds the problem that all pointers double in size (from 4 bytes to 8); that might impact performance more than the image_id change. :)
Regarding "between late Pentium 3 & early Pentium 4": Wikipedia (http://en.wikipedia.org/wiki/Pentium_4) states that Pentium 4s were introduced almost 15 years ago. I can't imagine many people regularly updating their games and running hardware that's at least 7 years old.
64bit aside, could we make a poll (maybe even site-wide) asking for people's specs? I see few people on maps of 128² tiles and unaltered paks.
We have, or very recently had, a prominent player using an AMD K6 (possibly K6-2) and Windows 2000 (or maybe he had just gotten Windows XP). We noticed this when the nightlies accidentally started requiring SSE. I think Simutrans is quite popular in countries where average wages are a quarter (or less) of what they are in the western world. (Although there are quite a few Windows XP machines around even in wealthy Norway. I even did Windows 98 support three years ago. Simutrans no longer supports that, if only because of Unicode.) Pentium IIIs and AMD K6-2s might be old even for them now, but dropping Pentium 4 will probably alienate many players. Players whose voices might not be heard on an English-speaking board.
It's not that we would drop support for those old processors. It's just that Simutrans would require *slightly* more memory and I/O bandwidth if we turn the id from 16 to 32 bit. This won't break anything. I also see some machines with Windows XP here in Spain/Catalonia, of course. But they are not really common, and remember even Microsoft has dropped support for that OS. It's nothing spectacular; let's accept that those machines are obsolete.
Come on, it's just converting a number from 16-bit to 32-bit; this should not be so dramatic.
It probably is not as simple as changing a uint16 to uint32 in a declaration. Every piece of code that manipulates the variable would need to be revised to make sure that uint16 bottlenecks are not introduced elsewhere, unless a special type definition was already used for everything that manipulates images. For future maintainability it might be a good idea to bring in a custom type definition for that field so that future migrations are easier.
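A minimal sketch of what such a central type definition could look like (in the spirit of simimg.h, but these exact lines are only an assumption, not the real file):

#include <cstdint>

// Declare the image id in exactly one place, so a later migration only has
// to touch this alias plus the code that serializes or bit-packs it.
typedef uint16_t image_id;                  // today: 16-bit, hence the ~65534 image limit
// typedef uint32_t image_id;               // the proposed widening would go here

// Hypothetical sentinel for "no image"; reserving special values like this
// is why the usable count ends up a little below the full 65536.
static const image_id IMG_EMPTY = (image_id)~0u;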
It's not hard; I've changed it in the past. Only some things need to be changed in simgraph.cc/h and not much more, iirc. Maybe simimg.h too, but I remember that piece of code was more or less decoupled from the rest of the code; it was designed more or less okay.
Quote from: Markohs on December 12, 2014, 11:23:54 PM
It's not that we would drop support for those old processors. It's just that Simutrans would require *slightly* more memory and I/O bandwidth if we turn the id from 16 to 32 bit. This won't break anything. I also see some machines with Windows XP here in Spain/Catalonia, of course. But they are not really common, and remember even Microsoft has dropped support for that OS. It's nothing spectacular; let's accept that those machines are obsolete.
Come on, it's just converting a number from 16-bit to 32-bit; this should not be so dramatic.
If Simutrans stops running at playable speeds for some users, it is certainly dramatic for them. I can't believe that some people need threading to render Simutrans at acceptable speeds. When that is the case, the size of image ids might be just as important. Then there's the issue that Simutrans with SDL is very slow, which SDL2 apparently cures, without any good explanation that I'm aware of. Changing the size of image ids should only be done after someone with a slow system verifies that it works OK.
Quote from: Ters on December 13, 2014, 08:38:12 AM
I can't believe that some people need threading to render Simutrans at acceptable speeds.
That's maybe the root of the discussion. Believe me, it's quite a speedup for many people, and for people with old computers like me (Core 2).
Supporting legacy machines is right.
But is not supporting newer machines right?
And is everybody using legacy machines?
I suggest creating a separate Win9x (and 32-bit machine) item for the nightlies (and also the stable release).
Now, after all, is it still uncertain when Simutrans will be able to load images over 65534?
Current support is obviously pretty critical. However, current systems are generally backwards compatible, so that is not a problem.
The problem comes down to system libraries and compilers. For example, Simutrans uses a highly dated specification of C++ despite G++/Windows compilers/ARM compilers all being available for the latest specifications. The reason, I have been told, is that old operating system versions (which cannot run builds from such compilers) need to be supported.
For the most part, the cost of raising a 16bit field to 32bit on a 32bit architecture should be minimal even on oldish systems. Performance will be lowered, but probably not in a critical way.
We can assume that players using older systems will be running a pak64 sized pakset and using smaller(ish) maps. I wonder what the impact would be to a player using pak64 and a 256x256 map? It may be a very small impact to them, with a huge upside to future Simutrans artists who like to combine images from multiple sources.
Then the next step will be to overcome the 2GB RAM barrier...
Quote from: Manche on December 13, 2014, 03:46:50 PM
Supporting legacy machines is right.
But is not supporting newer machines right?
With 16-bit image ids, both old and new machines will still work. Players just can't have as much eye candy. With 32-bit image ids, old machines might not be able to run Simutrans anymore, while those with new machines can stuff the game full of more eye candy than they can ever have time to download.
So it's not about supporting old machines versus supporting new machines, but about whether it is worth risking support for old machines just to allow someone else to install more optional add-ons, most of which only offer greater visual diversity.
Though it is important to note that the two largest paksets are already half of the limit. Add some custom addons and they're easily over 40,000 images.
I think the best question to answer at this point is what the actual impact to an old machine running Simutrans would be. I suspect it's not great.
The Intel 386, the first chip (on the IBM PC) capable of doing 32-bit arithmetic, dates from 1988. I never cease to be surprised at how attached to the past the code is. We are *almost* in 2015 and we are discussing whether we can afford turning a number from 16bit to 32bit.
What's more surprising is that this is not the first time our players have found this limit an obstacle, and we developers are still ignoring the problem. I can't understand that.
I think the limit isn't affecting the number of different objects so much as the number of different images a single object can use.
If pak128.britain had as many objects as pak128, but with rotations and winter and everything used as it is now, I'm pretty sure they would have above 65534 images. While it only affects add-ons right now, it's entirely possible for a well-developed pakset to reach that number as well.
This will increase the size of any structure that stores the id by 4 bytes (or even 8 bytes) due to padding, i.e. the maps will require 20% more memory. Since iterating through the ground during a season change is one of the biggest "stops" during a network game, this will then require 20% more bandwidth (and, since the map is too large to fit into most caches, also 20% more time).
As said, no pakset uses much more than 36000 images, i.e. about half of the limit. And 95% of the Japanese addons are passenger trains, i.e. without freight images.
Having said this, if this is an issue for players, then let's start to increase it. However, just increasing the number to 32 bit is not enough.
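To make the padding effect concrete, a tiny standalone example (made-up structs, not Simutrans' actual ground tile):

#include <cstdint>
#include <cstdio>

// Widening one member from 16 to 32 bits can grow the struct by more than
// 2 bytes once the compiler inserts alignment padding.
struct tile16 {
    uint8_t  flags;
    uint16_t img;      // 16-bit image id
    uint8_t  slope;
};                     // typically 6 bytes

struct tile32 {
    uint8_t  flags;
    uint32_t img;      // 32-bit image id forces 4-byte alignment
    uint8_t  slope;
};                     // typically 12 bytes

int main()
{
    printf("16-bit id: %u bytes, 32-bit id: %u bytes\n",
           (unsigned)sizeof(tile16), (unsigned)sizeof(tile32));
    return 0;
}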
Quote from: Markohs on December 13, 2014, 08:13:53 PM
The Intel 386, the first chip (on the IBM PC) capable of doing 32-bit arithmetic, dates from 1988. I never cease to be surprised at how attached to the past the code is. We are *almost* in 2015 and we are discussing whether we can afford turning a number from 16bit to 32bit.
This has nothing to do with 32-bit computing. Simutrans has always been a 32-bit program. What we're talking about here is the number of bytes in crucial data structures, or rather how many such data structures can be crammed into caches. Why does Linux use UTF-8, and Windows UTF-16, when UTF-32 makes things so much simpler? UTF-32 would be the only right thing for a 32-bit computer like the 386 and later, right? (There are other reasons why Linux and Windows don't use UTF-32 beyond just saving memory, but it shows that just because the program is running on a 32-bit computer, it doesn't have to use 32-bit data types. Unlike Windows, Linux actually has a 32-bit wchar_t, but it's rarely used.)
Anyway, I found a way to keep the size of the ground the same while almost all other relevant stuff does not store image_id anymore. So it seems there is little penalty to be paid for a 32-bit image_id.
Quote from: Ters on December 13, 2014, 09:50:18 PM
This has nothing to do with 32-bit computing. Simutrans has always been a 32-bit program. What we're talking about here is the number of bytes in crucial data structures, or rather how many such data structures can be crammed into caches. Why does Linux use UTF-8, and Windows UTF-16, when UTF-32 makes things so much simpler? UTF-32 would be the only right thing for a 32-bit computer like the 386 and later, right? (There are other reasons why Linux and Windows don't use UTF-32 beyond just saving memory, but it shows that just because the program is running on a 32-bit computer, it doesn't have to use 32-bit data types. Unlike Windows, Linux actually has a 32-bit wchar_t, but it's rarely used.)
When you get to the point where a 16-bit field is packed into a struct, it is probably padded to 32 bits anyway, and accessing it as 16 bits could end up being slower than 32 bits. Also, this change can cause padding changes that actually *reduce* or *don't affect* memory bandwidth at all; it's up to the compiler. We can't really say without actually benchmarking it, and it can depend on different compilers too. What I'm trying to express in my comment is that arguing about this, when it's just 2 extra bytes per memory structure, on a structure that's probably already 34 or so bytes big, is ridiculous in my opinion. I can understand trying to keep this structure to a minimum, but having reached the point where this *is really affecting some players*, maybe it's time to change it. Of course having a 32-bit computer doesn't force you to use 32-bit wide characters or anything similar; we know what we are talking about.
You all talk about how we may have users on a K6 or Pentium 4, running Windows XP, Windows 95, Haiku, SunOS, or a fricking X terminal. OK, those people exist, but how many of our players have a machine and OS from this millennium? How many of them have a 64-bit CPU? I can assure you, far more than those who use old computers. And well, nothing is forcing them to update to the latest version of Simutrans. I can't expect to have a computer from the nineties and still use software made nowadays. Those people will probably have a hard time opening Simutrans with pak128 anyway, no?
When you posted about how this could have something to do with dropping support for the Pentium 4 or K6, you made a point very similar to the one I made when I cited the i386.
Anyway, it's good to hear prissi found a way to avoid this possible problem and keep the change to the point where it affects performance much less.
Quote
When you get to the point where a 16-bit field is packed into a struct, it is probably padded to 32 bits anyway, and accessing it as 16 bits could end up being slower than 32 bits. Also, this change can cause padding changes that actually *reduce* or *don't affect* memory bandwidth at all; it's up to the compiler. We can't really say without actually benchmarking it, and it can depend on different compilers too. What I'm trying to express in my comment is that arguing about this, when it's just 2 extra bytes per memory structure, on a structure that's probably already 34 or so bytes big, is ridiculous in my opinion. I can understand trying to keep this structure to a minimum, but having reached the point where this *is really affecting some players*, maybe it's time to change it. Of course having a 32-bit computer doesn't force you to use 32-bit wide characters or anything similar; we know what we are talking about.
It all depends on the minimal atomic size of the architecture. Generally, individual bytes and units like 16, 32 and 64 bits can be read and written very efficiently as they are addressable. Padding is used for alignment on some architectures (some of which can only read whole words, such as some ARM instruction sets). Structs are aligned according to a combination of the cache and the memory model.
It is important to note that modern compilers often have quite a strict memory model to help with multi-threading. While this can be useful to prevent multi-threading related errors arising from the underlying platform, it does result in inefficiency. For example, bit fields might be ignored completely so as to guarantee that all members can be manipulated by different threads without a conflict occurring. However, if you know such a struct is only going to be manipulated by a single thread at any time, both performance and memory can be improved by forcing the struct to pack tightly.
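A small sketch of what forcing tight packing looks like (GCC/Clang attribute syntax; MSVC would use #pragma pack instead - again just an illustration, not Simutrans code):

#include <cstdint>
#include <cstdio>

// Default layout: the compiler may insert padding for alignment.
struct loose {
    uint8_t  flags;
    uint32_t img;
    uint8_t  slope;
};                                     // typically 12 bytes

// Forcing byte packing saves memory, but unaligned access can be slower on
// some CPUs, and taking pointers or references to packed members is risky.
struct __attribute__((packed)) tight {
    uint8_t  flags;
    uint32_t img;
    uint8_t  slope;
};                                     // 6 bytes

int main()
{
    printf("loose: %u bytes, tight: %u bytes\n",
           (unsigned)sizeof(loose), (unsigned)sizeof(tight));
    return 0;
}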
Quote from: Markohs on December 14, 2014, 02:55:50 AM
And well, nothing is forcing them to update to the latest version of Simutrans. I can't expect to have a computer from the nineties and still use software made nowadays. Those people will probably have a hard time opening Simutrans with pak128 anyway.
Thank you for putting it in words. No one has to get the most recent builds. Players on weaker systems won't be playing on a 5000x4000 map with 200 cities and 200 industries anyway.
If we make a good stable release before this implementation and label it as the last version meant for old systems (due to memory consumption), we could set a new baseline for which processors are supported.
Quote from: Markohs on December 14, 2014, 02:55:50 AM
When you get to the point where a 16-bit field is packed into a struct, it is probably padded to 32 bits anyway, and accessing it as 16 bits could end up being slower than 32 bits.
Lots of effort has been put into Simutrans so that fields are packed as tightly as possible in important data structures, to the point that some fields don't even span an entire byte. Such things are inherently slower, yes, but it is my understanding that profiling has shown that the gains from having to transfer less data over the system bus outweigh the cost of a few more instructions.
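For anyone unfamiliar with sub-byte fields, this is the kind of packing meant (made-up fields, not the real data structures):

#include <cstdint>
#include <cstdio>

// Several small values share one byte via bit fields; the compiler emits
// extra shift/mask instructions to read them, in exchange for a smaller
// struct and therefore more tiles per cache line.
struct tile_flags {
    uint8_t slope    : 4;   // 0..15
    uint8_t is_water : 1;
    uint8_t has_way  : 1;
    uint8_t season   : 2;   // 0..3
};                          // 1 byte instead of 4

int main()
{
    tile_flags t = { 7, 0, 1, 2 };
    printf("sizeof(tile_flags) = %u, season = %u\n",
           (unsigned)sizeof(t), (unsigned)t.season);
    return 0;
}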
Quote from: Markohs on December 14, 2014, 02:55:50 AM
You all talk about how we may have users on a K6 or Pentium 4, running Windows XP, Windows 95, Haiku, SunOS, or a fricking X terminal. OK, those people exist, but how many of our players have a machine and OS from this millennium? How many of them have a 64-bit CPU? I can assure you, far more than those who use old computers.
I value one person not being able to play Simutrans at all over a million persons not being able to install 100+ add-ons. One does not need 70000 images to play Simutrans. That is luxury. (Well, Simutrans itself is perhaps luxury, so it's luxury squared.)
Quote from: Markohs on December 14, 2014, 02:55:50 AM
And well, nothing is forcing them to update to the latest version of Simutrans.
Yes, there are: bugfixes, access to the multiplayer servers. Some other new features might also be of interest, even if not exactly necessary.
Quote from: prissi on December 13, 2014, 11:44:01 PM
Anyway, I found a way to keep the size of the ground the same while almost all other relevant stuff does not store image_id anymore. So it seems there is little penalty to be paid for a 32-bit image_id.
Well, that's a real contribution to this discussion. grund_t is even more critical than anything I was thinking of. Isn't bild_t/bild_besch_t an issue, then? But now that I look at it, I wonder if having the original image data itself inline might be far more detrimental to performance. Although these things are perhaps too scattered in memory anyway for it to make any difference.
If there is an increase in memory consumption, then this also decreases the maximum size of maps that can be played. That affects everyone who has developed maps which stretch their computer's capabilities - whether that's a 1024x1024 map on a Pentium III or a 4096x4096 map on a more recent computer.
Quote
If there is an increase in memory consumption, then this also decreases the maximum size of maps that can be played. That affects everyone who has developed maps which stretch their computer's capabilities - whether that's a 1024x1024 map on a Pentium III or a 4096x4096 map on a more recent computer.
It all depends on by how much. For example, JIT2 added ~4 bytes more per input/output and possibly 12 bytes more per factory for control logic enums (might be more or less depending on what the compiler does). No one complained about that, because in a map with a thousand-odd industries it might only add a few kilobytes here and there, hardly a major increase in memory.
If this change raises the memory cost of some obscure structures, then it will not make much of a difference. The main concern was that it would raise the memory cost of highly bulky structures such as the ground itself by a considerable amount; however, that seems to not really be the case.
An alternative solution to simply raising it could have been separate image pools depending on the type of an object. It is extremely unlikely someone would use train images as buildings, for example, or ground images for ways. Even with 16 bits, doing such a thing would probably more than double the number of images you could use.
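In code, the per-category idea could look something like this (purely a sketch of the concept, names invented, not anything from a patch):

#include <cstdint>

// Keep the 16-bit index, but give each object category its own pool, so
// vehicles, buildings, ways etc. no longer compete for one 65534 budget.
enum image_pool_t {
    POOL_GROUND, POOL_WAY, POOL_BUILDING, POOL_VEHICLE, POOL_MISC, POOL_COUNT
};

struct image_handle {
    uint8_t  pool;    // which pool the index refers to (an image_pool_t value)
    uint16_t index;   // index within that pool, still 16 bits
};

// A lookup would then be done per pool, e.g. something like
//   display_image(POOL_VEHICLE, handle.index, ...);
// so every category gets its own ~65k range of indices.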
Quote from: DrSuperGood on December 14, 2014, 01:22:46 PM
An alternative solution to simply raising it could have been separate image pools depending on the type of an object.
This is exactly what Dwachs proposed the last time this was discussed, and is IMHO the way to go.
Quote from: DrSuperGood on December 14, 2014, 01:22:46 PM
An alternative solution to simply raising it could have been separate image pools depending on the type of an object.
Quote from: TurfIt on December 14, 2014, 04:09:03 PM
This is exactly what Dwachs proposed the last time this was discussed, and is IMHO the way to go.
If prissi has taken care of the most critical part, as he believes, then this sounds like a solution that is about as costly as what is left of the problem.
Quote from: TurfIt on December 14, 2014, 04:09:03 PM
This is exactly what Dwachs proposed the last time this was discussed, and is IMHO the way to go.
I agree that's the best solution, but it requires a lot more coding; who will implement that? If prissi found an acceptable solution, that's good news.
Also, implementing this kind of big change in the game tends to raise a tedious and long battle of opinions, and often ends in nothing. Unless prissi is the one implementing it, in which case it's just accepted as is.
So if prissi has found an acceptable solution, let's just rejoice, and be happy our players won't have a 64K image limit, and that's it. :)
EDIT:
Quote from: Ters on December 14, 2014, 09:32:29 AM
I value one person not being able to play Simutrans at all over a million persons not being able to install 100+ add-ons. One does not need 70000 images to play Simutrans. That is luxury. (Well, Simutrans itself is perhaps luxury, so it's luxury squared.)
That's a difference in the personal views you and I have of the subject, of course. My opinion is just the opposite. Everything has its limits, of course, but I think the image limit was too tight.
Quote from: Ters on December 14, 2014, 09:32:29 AM
Yes, there are: bugfixes, access to the multiplayer servers. Some other new features might also be of interest, even if not exactly necessary.
That person is probably already playing on a computer that doesn't get OS bugfix patches, don't you think? I value compatibility with older systems too, but I think it should not stop our game from gaining new features or removing limits, *if they are reasonable*. And in my opinion this was an issue worth it.
In my opinion, keeping memory structures small is more important for giving our players the possibility of bigger maps at acceptable performance, and for thinking about whether this game can some day run on portable devices with much less hardware than an average PC. I couldn't care less about legacy systems beyond a reasonable point, which in the Microsoft world I'd put at Windows XP today.
The presented solution to the image limit of classifying vehicles and buildings as different image types (each with its own 64k limit) makes a lot of sense and should resolve the image limit barrier for many years to come with a marginal impact to performance.
While raw computing speeds seem to be slowing down in their exponential growth rate, the amount of RAM available to players seems to be continuing to expand at an impressive rate. This will lead to more and more players wanting to play with large paksets and massive maps, and may open up the possibility of more players undertaking massive simulation efforts such as Carl's UK map. But it also means that more and more players will bump up against the 2GB memory limit. I have no idea of how easily this can be overcome or whether this limit is hardwired in to the code.
Quote from: Sarlock on December 15, 2014, 01:11:26 AM
The presented solution to the image limit of classifying vehicles and buildings as different image types (each with its own 64k limit) makes a lot of sense and should resolve the image limit barrier for many years to come with a marginal impact to performance.
It will only double the capacity, and one of the two may run full quite quickly, while the other has room to spare. And it requires dragging the different besch-types down into the deepest rendering code where it doesn't really belong design-wise. If proven "safe", just changing the size of image_id will be a lot easier to implement and the problem won't resurface anytime soon.
Quote from: Sarlock on December 15, 2014, 01:11:26 AM
While raw computing speeds seem to be slowing down in their exponential growth rate, the amount of RAM available to players seems to be continuing to expand at an impressive rate. This will lead to more and more players wanting to play with large paksets and massive maps, and may open up the possibility of more players undertaking massive simulation efforts such as Carl's UK map. But it also means that more and more players will bump up against the 2GB memory limit. I have no idea of how easily this can be overcome or whether this limit is hardwired in to the code.
Well, you also need more processing power to play large maps, not just more memory.
As for the 2 GB limit, there are several layers to that. On a normal 32-bit Windows system, there is nothing to do about it. 32-bit Windows executables can be marked as large-address aware, which moves the limit to 3 GB, but only on server and enterprise editions of Windows from what I remember. On a 64-bit Windows, the limit for such large-address aware programs is apparently moved to 4 GB for all editions of Windows. Linux also has the ability to move the 2 GB limit to 3 GB, but this requires compiling the kernel with this option enabled. I don't know if it requires a boot switch as well, like Windows does, or some special flag in the executable. I'm also not sure what the limit is for 32-bit programs on 64-bit Linux. As for Macs and other OSes, I have no idea.
A 64-bit executable will of course push the 2 GB limit far away into the distance (on the time-scale of computer evolution), but as has been discussed elsewhere, and even slightly in this discussion, there are several issues, great and small, to overcome if 64-bit is going to handle bigger maps.
Quote
A 64-bit executable will of course push the 2 GB limit far away into the distance (on the time-scale of computer evolution), but as has been discussed elsewhere, and even slightly in this discussion, there are several issues, great and small, to overcome if 64-bit is going to handle bigger maps.
EDIT: I was talking nonsense...
The LAA approach seems the most immediately viable solution to raise the memory limit and is what James used with Experimental when the server game ran into the 2GB process limit.
Extended 32-bit version? 32-bit is 32-bit.
Quote
Extended 32-bit version? 32-bit is 32-bit.
I was referring to the LAA build you mentioned, which was what Experimental used to up the limit to 4GB. It gets confusing at times due to how much garbage there is on the internet about memory limits, specifically because most 64bit applications are built as large-address aware, so considerably larger virtual address space limits get quoted than a 32bit process built that way actually gets. This should buy at least double the memory, which is sufficient for most purposes (7000*5000 maps).
There are apparently other ways to bypass the limit while still maintaining 32bit support but these are highly platform dependent so I would not recommend them.
It should be noted that 32bit builds of Linux can allocate up to 64GB of memory. The processes are of course still bound by a 4GB virtual address space maximum (of which the OS might use some).
The only really viable solution is a slow migration towards a 64bit build.
But we can't be running out of memory just yet. No home computer can process 2 GB 20+ times per second.
Quote
But we can't be running out of memory just yet. No home computer can process 2 GB 20+ times per second.
Not all of it is processed every second. Terrain is mostly static from what I can tell so can be viewed as a bulk data structure which is randomly accessed.
Simutrans Experimental hit the 2GB memory limit with its server game, a massive 7000*5000 tile map with over 10,000 convoys and 1,000 stops. However, I do admit that it generally hit that limit when re-loading after a save (so people could join), which might mean that for multiplayer games there is a spike in memory usage when reloading (not sure by how much, however).
Quote from: DrSuperGood on December 15, 2014, 05:53:19 PM
Not all of it is processed every second. Terrain is mostly static from what I can tell so can be viewed as a bulk data structure which is randomly accessed.
Climate change will take a huge toll, I guess. Path finding for ships can be somewhat heavy as well. And I assume that as maps grow, there will also be more cities, roads and vehicles. I did also write 2 GB to leave room for rarely accessed data, assuming large address aware. (Switching to 64-bit won't help those who can't enable LAA.) Trees might be a big space-waster in big maps.
Quote
(Switching to 64-bit won't help those who can't enable LAA.)
As far as I am aware, anyone using a 64bit-compatible OS automatically supports LAA due to how 64bit mode operates (they are already using an extended paging mode). The only reason disabling LAA for 64bit builds is supported, I guess, would be compatibility (an application that does not support >2GB memory due to some hacky or poorly written parts, such as pointers treated as signed instead of unsigned).
Unless climate change is new, the snow grow/recede graphic effect is not that costly (the Experimental server has it on).
Quote from: DrSuperGood on December 15, 2014, 06:33:34 PM
As far as I am aware, anyone using a 64bit-compatible OS automatically supports LAA due to how 64bit mode operates (they are already using an extended paging mode). The only reason disabling LAA for 64bit builds is supported, I guess, would be compatibility (an application that does not support >2GB memory due to some hacky or poorly written parts, such as pointers treated as signed instead of unsigned).
True, true, but I was referring to those using 32-bit OSes and either can't move the limit to 3 GB (non-enterprise Windows) or don't know how (reconfiguring kernel, switching to another kernel). Not that switching to 64-bit will help those who can and know how to move the 2 GB limit either, if they are stuck with a 32-bit OS.
If people are stuck using XP, I would strongly advise that they either upgrade to a more modern Windows version (if they need compatibility) or consider using Linux or other free OSes. All new OS installs should be 64bit in this day and age. Apple even dropped 32bit support a while ago. Only really legacy hardware (without 64-bit support) should get a 32bit OS.
Got a rather new (although really cheap) laptop with Windows 8 in 32-bit. Even Windows 10 is announced for the x86 architecture - why would they still create 32bit versions if not for people to use?
Quote
Got a rather new (although really cheap) laptop with Windows 8 in 32-bit. Even Windows 10 is announced for the x86 architecture - why would they still create 32bit versions if not for people to use?
There are two reasons. The first is 16bit process support: if you are running in 64bit mode, the processor cannot run 16bit processes unless another operating system is mounted as a virtual machine which runs in 32bit mode. The second is driver support, since drivers need to be specifically built to target 64bit systems, which some companies may not offer. There is also a minor reason of licensing, since they can sell 32bit builds cheaper than 64bit builds as they offer less.
As far as I am aware, all reasonably recent AMD and Intel processors support 64bit OSes and have for probably a decade now. Compatibility was the main reason the move was not embraced earlier, with XP64 being notorious for having practically no drivers and horrendous 32bit support.
Some very cheap processors are not fully 64-bit capable (Atom). But these netbooks are most likely not capable of a 5000x5000 map anyway...
Apparently the switch "--large-address-aware" just sets a flag in the executable header, so I added this to the makefile.
Does anybody know how many games actually require 64-bit? (I hardly buy games, so I don't know.) It would be rather odd for Simutrans to require it, if no or few headline games require it.
Quote from: prissi on December 15, 2014, 09:00:28 PM
Apparently the switch "--large-address-aware" just sets a flag in the executable header, so I added this to the makefile.
That's just what it does. It's up to the OS to decide if it can/will do anything about it. I don't think it hurts, because if Simutrans' large-address awareness were a lie, the 64-bit builds would most likely have problems already.
Apart from that, setting the flag screws up the linker completely, i.e. it finds no standard libs anymore. Does MinGW get anything right?
Anyway, input on this is warmly encouraged.
I've been running LAA for a few years on MinGW with no issues. Have in config.default:
LDFLAGS = -static-libgcc -static-libstdc++ -Wl,--large-address-aware
Resultant 32bit sim.exe can allocate up to the full 4GB on Win7 x64. And no issues running on 32bit WinXP or 32bit Win7, with a 2GB limit (3GB possible if you muck with the boot settings).
Quote from: DrSuperGood on December 15, 2014, 05:53:19 PM
Not all of it is processed every second. Terrain is mostly static from what I can tell so can be viewed as a bulk data structure which is randomly accessed.
The working set of Simutrans is quite large; it tends to completely blow the caches away, hence it benefits greatly from faster memory. I see a ~70% speedup changing from DDR3-1333 to DDR3-2400. I expect that if you created a map consuming 3GB, then populated it with an extra 1GB of objects (factories, convois, halts, buildings, etc.), there's not currently any home computer capable of actually running it at an acceptable pace.
Atm all commercial games ship 32-bit versions. Just a few also supply 64-bit versions. StarCraft 2 and World of Warcraft are 32-bit. But in practice these games won't run on non-64-bit systems because those lack CPU speed and memory. They might run, but almost unplayably. SimCity 2013 and Dragon Age are in the same situation.
So, 32-bit builds are the standard, but sub-par processors like the Pentium 4, Atom or K6 will have a hard time running games made in the last 4 years (maybe even more). Even a Core 2 Duo might struggle to run those games at medium quality.
My impression about memory usage is that if all games keep themselves under 2 GB, maybe 3 GB like you mention, Simutrans should do the same. This should be done via paging and on-demand loading of images and, if possible, of parts of the map (this doesn't make much sense in Simutrans, except maybe for trees).
I'd keep Simutrans a 32-bit program by default and optimize it to use the least amount of memory. Maybe when 64-bit programs are the de facto standard binary for Windows, we can switch to that, and then we'll be in much better shape than now because we'll already be using less memory.
I know it might sound strange that I favoured changing the image id to 32 bits and now say Simutrans should remain 32-bit (with LAA) and keep memory usage low. I defend extending the image id to 32 bits because it easily solves a problem our users have *now*. But at the same time I think the game needs mechanisms to keep memory usage low, so we can give our players bigger maps and a faster Simutrans.
What TurfIt commented above is very interesting, btw. He has profiled our code often.
I'd like to see on-demand image loading investigated, to see if it's feasible and saves enough RAM, plus some thought on whether we can swap some memory structures out when the viewport is not looking at them. Reading TurfIt, it looks like not, unless we make substantial design changes in the game.
Quote
Does anybody know how many games actually require 64-bit? (I hardly buy games, so I don't know.) It would be rather odd for Simutrans to require it, if no or few headline games require it.
Most modern Blizzard Entertainment games are migrating to 64bit builds.
World of Warcraft recently had a 64bit build added to it. This was needed due to the size of the MMORPG.
Heroes of the Storm Alpha (soon beta) is available in both 32bit and 64bit builds. It is only available as a 64bit build on Mac OS.
StarCraft II: Legacy of the Void will add a 64bit build (as it is built on the same engine as Heroes of the Storm).
Other companies are also following this trend.
Galactic Civilizations 3, an upcoming game from Stardock currently in alpha, is only available as a 64bit build; they have stated 32bit will never be supported as it would hold the project back.
Quote
Atm all commercial games ship 32-bit versions.
Not anymore. As stated above, Blizzard is offering 64bit builds for all their new products. Stardock is now making games only targeting 64bit OSes. Additionally, all games built for the Xbox One and PlayStation 4 are probably 64bit builds, as those consoles use standard AMD processors (and have over 4GB of memory), so they can easily be ported as 64bit builds.
Quote
these games won't run on non-64-bit systems because those lack CPU speed
CPU speed has nothing to do with it. It is mostly for the memory as it is very easy to hit 2GB when using a lot of high quality assets which are processed by the GPU. For a very long time people ran 32bit OSes on fully 64bit capable hardware purely because XP64 was what can only be described as "PoS". Eventually, with Windows 7 and memory often being more than 4GB, they migrated to widespread commercial sale of 64bit OSes. Mac migrated long ago and is pretty much only offered as 64bit. I am sure many people who use 32bit Linux do so without realising they could actually be using 64bit Linux.
Quote
Apart from that, setting the flag screws up the linker completely, i.e. it finds no standard libs anymore. Does MinGW get anything right?
Anyway, input on this is warmly encouraged.
You need libraries that were built with LAA support. Specifically pthread, if I recall (probably because it does a lot of low level stuff, as Windows does not inherently support pthread). With pthread for Windows there are two versions; the one the nightly server builds with is "pthreadGC2.lib". You need "pthreadGCE2.lib", which is the LAA version of pthread. The other libraries should not matter, as I have no problem using them to build Experimental and Standard. Do note you will have to change the DLL you bundle with Simutrans for Windows distributions from "pthreadGC2.dll" to "pthreadGCE2.dll".
To recap why Simutrans does not officially support 64bit builds, the topic result was two issues. First, some key pieces of code use inline x86 assembly for performance, which must be disabled for 64bit builds, resulting in considerably worse performance (much lower than one would expect). Secondly, some pieces of code might be using pointers in an unsafe way (at some stage there was a typecast to int somewhere, if I recall), which will mean errors once 4GB is exceeded anyway.
Quote from: DrSuperGood on December 16, 2014, 12:44:13 AM
To recap why Simutrans does not officially support 64bit builds, the topic result was two issues. First, some key pieces of code use inline x86 assembly for performance, which must be disabled for 64bit builds, resulting in considerably worse performance (much lower than one would expect). Secondly, some pieces of code might be using pointers in an unsafe way (at some stage there was a typecast to int somewhere, if I recall), which will mean errors once 4GB is exceeded anyway.
I remember the two issues being the assembly thing and that, with 64-bit pointers, performance-critical data structures will no longer be cache-line aligned. Data structures will also grow due to extra padding caused by pointers being placed together with other 32-bit fields, aligned on four bytes, but not on eight. (Similar to our original issue here, just way more widespread.) Your second issue is of course also critical, if such things still happen in the code. If they do, 64-bit builds live dangerously, and can only work because memory allocations apparently avoid higher virtual addresses until necessary.
As for the assembly, I haven't been convinced that letting the compiler do its own thing will be slower than the current hand-written assembly, once the compiler is allowed to make use of MMX and SSE (or even just 64-bit registers). An x64 build, unlike a 32-bit build, can rely on these extensions being available. I had no problems that a driver update didn't solve on my first 64-bit computer 6+ years ago. Then again, it wasn't the biggest of maps, and the screen was pre-HD.
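As a rough illustration of "letting the compiler do its own thing": a generic pixel loop like the one below (not Simutrans' simgraph code) is something the compiler is free to vectorize with SSE/AVX at normal optimization levels (e.g. g++ -O3), since an x86-64 build can assume those extensions exist - no inline assembly needed:

#include <cstddef>
#include <cstdint>

// Plain C++, no inline assembly and no intrinsics: copy source pixels over
// destination pixels, skipping a "transparent" colour key. On an x86-64
// target the compiler may turn this into SSE/AVX compare-and-blend code.
void blit_with_transparency(uint16_t *dst, const uint16_t *src,
                            size_t n, uint16_t transparent_key)
{
    for (size_t i = 0; i < n; ++i) {
        if (src[i] != transparent_key) {
            dst[i] = src[i];
        }
    }
}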
Quote from: DrSuperGood on December 16, 2014, 12:44:13 AM
Not anymore. As stated above, Blizzard is offering 64bit builds for all their new products. Stardock is now making games only targeting 64bit OSes. Additionally, all games built for the Xbox One and PlayStation 4 are probably 64bit builds, as those consoles use standard AMD processors (and have over 4GB of memory), so they can easily be ported as 64bit builds.
Yes, DrSuperGood, but those are optional 64-bit builds; they keep supporting 32-bit. And I guess that will still be the target for some years. I'm pretty sure their main concern is the 32-bit builds; that's what the bigger part of their players use.
Supporting 64-bit as we do, as additional builds, is the way to go; it's not time to move to 64-bit yet. It's better to optimize our builds for a 32-bit memory scheme and keep us under 2-3 GB.
Quote from: DrSuperGood on December 16, 2014, 12:44:13 AM
CPU speed has nothing to do with it. It is mostly for the memory as it is very easy to hit 2GB when using a lot of high quality assets which are processed by the GPU. For a very long time people ran 32bit OSes on fully 64bit capable hardware purely because XP64 was what can only be described as "PoS".
Well, if we are thinking of a PC with 4GB of memory, it's still not time to move to 64-bit games, since that extra 1GB of memory is probably devoted to the disk cache and multitasking background programs (web browsers, maybe background music, and the ability to alt-tab in a reasonable time). When I referred to CPU power in relation to 64-bit support, it's because as the processor business timeline advanced, processors added 64-bit support and increased CPU throughput, via faster clock rates, faster memory requirements (DDR2), and other technical advances. All commercial games keep supplying 32-bit versions, and I'm pretty sure the main concerns of all developers are with the performance of those versions; the 64-bit build is just an extra to prepare for the future, *now*. Maybe in 2-4 years we'll be in the scenario you are referring to, with 64-bit builds being the way to go.
Quote
To recap why Simutrans does not officially support 64bit builds, the topic result was two issues. First, some key pieces of code use inline x86 assembly for performance, which must be disabled for 64bit builds, resulting in considerably worse performance (much lower than one would expect). Secondly, some pieces of code might be using pointers in an unsafe way (at some stage there was a typecast to int somewhere, if I recall), which will mean errors once 4GB is exceeded anyway.
Well, but that assembly can be rewritten to a 64-bit equivalent, no? I suspect the poorer performance also has much to do with the extra memory bandwidth required to move those pointers, now turned 64-bit, together with the increase in size of certain memory structures.
Anyway, the only way of really solving this is using hardware acceleration for the display; that's quite a huge amount of work that would need to be devoted to it, including a rework of how Simutrans manages its internal data structures.
Quote from: Ters on December 16, 2014, 05:56:51 AM
As for the assembly, I haven't been convinced that letting the compiler do its own thing will be slower than the current hand-written assembly, once the compiler is allowed to make use of MMX and SSE (or even just 64-bit registers). An x64 build, unlike a 32-bit build, can rely on these extensions being available. I had no problems that a driver update didn't solve on my first 64-bit computer 6+ years ago. Then again, it wasn't the biggest of maps, and the screen was pre-HD.
I have to agree with Ters here. I believe the compiler will generate even better code than hand-written assembly most of the time, if it's given enough information and we don't force what we think are optimizations onto our code. The compiler will do better than a human most of the time. Even so, it looks like the assembler lines in simgraph give quite a significant performance boost according to benchmarks; I can't deny that.
Supporting both 32-bit and 64-bit can be very difficult. Optimizing for one might be mutually exclusive with optimizing for the other.
Quote
Well, but that assembly can be rewritten to a 64-bit equivalent, no? I suspect the poorer performance also has much to do with the extra memory bandwidth required to move those pointers, now turned 64-bit, together with the increase in size of certain memory structures.
Anyway, the only way of really solving this is using hardware acceleration for the display; that's quite a huge amount of work that would need to be devoted to it, including a rework of how Simutrans manages its internal data structures.
I looked at the assembly yesterday and came to the following conclusions. Unless you build using GCC it is always turned off since the structure used is not supported by Visual C++. Porting it to support Visual C++ should be pretty easy, as it has its own inline assembly syntax (which is slightly more readable, as it is not processed like a string). Visual C++ ignores all inline assembly when building for an x64 target (64bit build), so it would need to be disabled for such a target anyway. Assembly can still be used in 64bit builds, but you need to link it in as opposed to compiling with it. I am not sure what GCC does to the assembly when building for 64bit.
QuoteI have to agree with Ters here: I believe the compiler will generate even better code than hand-written assembly most of the time, if it's given enough information and we don't force what we think are optimizations onto our code. The compiler will do better than a human most of the time. Even so, it looks like the assembler lines in simgraph are quite a significant performance boost, going by the benchmarks; I can't deny that.
It all depends on whether the compiler can see past some of the explicitly written behaviour and infer the intended behaviour. I am trying to test the performance so I can compare 32- and 64-bit builds and see where performance improves/worsens.
Quote from: DrSuperGood on December 16, 2014, 04:55:53 PM
I looked at the assembly yesterday and came to the following conclusions. Unless you build using GCC it is always turned off since the structure used is not supported by Visual C++.
I think that is pretty much the reason why releases for all platforms are made using GCC, even though most Windows developers seem to have Visual C++ now. (I don't have Visual Studio, but I do have the Windows SDK, which includes the entire compiler toolchain.) Unless being able to depend only on msvcrt.dll, which comes bundled with Windows, is the main reason, because that is actually a bad thing to do.
The VisualC++ project was badly out of date. With a bit of messing around I have finally got SDL2.0 builds working (which are not even offered as nightlies).
The compiler reported an error with "sdl_sound.cc". Specifically it makes reference to the function "printf" yet fails to "#include <stdio.h>" anywhere so the function is never declared.
SDL was never ever built with MSVC, only GDI.
And yesterday -Wl,--large-address-aware did not work, but today it does?! Anyway, committed.
QuoteSDL was never ever built with MSVC, only GDI.
Yes, however I have SDL2 building and working apparently fine. As far as I am aware no build is offered for SDL2 for Windows ATM (only SDL aka SDL1).
I might upload the project files later since they could be useful to people wanting to build with MSVC.
Quote from: DrSuperGood on December 16, 2014, 10:26:34 PM
The compiler reported an error with "sdl_sound.cc". Specifically it makes reference to the function "printf" yet fails to "#include <stdio.h>" anywhere so the function is never declared.
stdio.h is one of those headers that almost always gets dragged in by some other header. Which headers pull in others varies from vendor to vendor, or even version to version.
Quote from: DrSuperGood on December 16, 2014, 10:26:34 PM
With a bit of messing around I have finally got SDL2.0 builds working (which are not even offered as nightlies).
I think Mac nightlies use SDL 2.0. That's pretty much the entire reason there is SDL 2.0 support. On everything else, SDL 1.2 is just as fast as SDL 2.0, or even slightly faster and/or with lesser requirements. SDL 1.2 is also well-known and tested.
Quotestdio.h is one of those headers that almost always gets dragged in by some other header. Which headers pull in others varies from vendor to vendor, or even version to version.
It is still good practice to include things you use where you use them for this reason. You do not want to couple your code to a header you have no control over (as must be happening here).
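A trivial sketch of the point (the file name and message are made up): include the header for what you call, instead of hoping another header drags <stdio.h> in.
Code:
// Hypothetical sdl_sound-style helper: <cstdio> is included explicitly
// because printf is used here, rather than relying on e.g. <SDL.h>.
#include <cstdio>

static void report_sound_error(const char *what)
{
    printf("sdl_sound: %s\n", what);
}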
QuoteI think Mac nightlies use SDL 2.0. That's pretty much the entire reason there is SDL 2.0 support. On everything else, SDL 1.2 is just as fast as SDL 2.0, or even slightly faster and/or with lesser requirements. SDL 1.2 is also well-known and tested.
I am getting similar performance for both GDI and SDL 2.0 as far as frame updates go, which is surprisingly higher than the current release and nightly versions for Windows. There is some issue with SDL 2.0, however, that causes maps to load extremely slowly (about 3-5 times longer to load a map compared with GDI). I still need to get an SDL1 build going for comparisons.
EDIT:
It also appears that a 64bit build using GDI achieves higher frame rates than the standard Windows distribution (20-23 range in what is 15-17 with standard distribution). This is most strange...
Quote from: DrSuperGood on December 17, 2014, 01:56:14 PM
It is still good practice to include things you use where you use them for this reason. You do not want to couple your code to a header you have no control over (as must be happening here).
As for me personally, I don't remember which C header provides what (beyond printf and FILE stuff being in stdio.h), so I just try to use the function, and if it isn't defined, I have to google what to include. I find the STL headers somewhat easier to remember, or deduce.
Quote from: DrSuperGood on December 17, 2014, 01:56:14 PM
It also appears that a 64bit build using GDI achieves higher frame rates than the standard Windows distribution (20-23 range in what is 15-17 with standard distribution). This is most strange...
It could be because the compiler can use instructions moving bigger chunks of data when targeting x86-64. Maybe the newer C runtimes are faster than the old msvcrt.dll as well. Standard distributions also have a bit more debugging in them, which might have some effect. Microsoft puts much of the debug information in a separate file.
Here is a patch. It changes image ids to 32-bit, except for the ground images, which are by construction at the lower end of the spectrum. Not much memory is wasted for my 64-bit build.
Here are the sizes of structures suspected to increase in size (64-bit builds).
After the patch: curiously, vehikel_t and stadtauto_t do not increase in size - maybe there was already enough padding space wasted.
Message: Debug: size of structures
Message: sizes: weg_t: 48
Message: sizes: stadtauto_t: 88
Message: sizes: grund_t: 32
Message: sizes: vehikel_t: 112
Before patch:
Message: Debug: size of structures
Message: sizes: weg_t: 40
Message: sizes: stadtauto_t: 88
Message: sizes: grund_t: 32
Message: sizes: vehikel_t: 112
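Roughly, the widening amounts to something like the following sketch; treat the exact type name and sentinel value here as assumptions rather than a quote of the actual patch.
Code:
#include <cstdint>

// Before: at most 65534 usable image numbers.
// typedef uint16_t image_id;
// #define IMG_LEER ((image_id)0xFFFFu)        // "no image" sentinel

// After: the id, and its sentinel, widen to 32 bit.
typedef uint32_t image_id;
#define IMG_LEER ((image_id)0xFFFFFFFFu)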
QuoteAs for me personally, I don't remember which C header provides what (beyond printf and FILE stuff being in stdio.h), so I just try to use the function, and if it isn't defined, I have to google what to include. I find the STL headers somewhat easier to remember, or deduce.
Yeh it can get annoying at times.
QuoteIt could be because the compiler can use instructions moving bigger chunks of data when targeting x86-64. Maybe the newer C runtimes are faster than the old msvcrt.dll as well. Standard distributions also have a bit more debugging in them, which might have some effect. Microsoft puts much of the debug information in a separate file.
It might also have to do with SSE2 instructions being enabled by default on the more recent versions of the MSVC compilers. I am not sure if the standard GCC builds have them disabled for compatibility with old processors. In any case, the performance difference I am noticing in frame rate between the standard GCC build and my self-made MSVC build, when running a well developed pak64 map at maximum view distance (you can see 100s of convoys), is quite substantial. The performance difference between x86 and x64 builds in this case is comparatively trivial.
QuoteMessage: Debug: size of structures
Message: sizes: weg_t: 40
Message: sizes: stadtauto_t: 88
Message: sizes: grund_t: 32
Message: sizes: vehikel_t: 112
The amount of memory spent on convoys, factories etc. is trivial. A simple 256*256 map (small for some servers) has 65,536 instances of grund_t, yet might max out at only 1,000-5,000 convoys (if lucky). Ways are also very important, since you can easily accumulate tens of thousands of tiles of way in a well developed map. This is how JIT2 could get away with increasing the size of factories without anyone even noticing lol.
What really needs to be done is to look at the performance difference between x86 and x64 builds and try to improve it where possible. It could well be that the only non-trivial difference occurs on very large maps, in which case there really is no reason the project should remain officially x86 only.
Quote from: Dwachs on December 17, 2014, 05:36:25 PM
curiously, vehikel_t and stadtauto_t do not increase in size - maybe there was already enough padding space wasted.
vehikel_t doesn't increase in size because there is a 32-bit field (speed_limit) following the image_id (bild in vehikel_basis_t). There is therefore, by default, 2 bytes of padding between bild and speed_limit. Changing image_id to 32-bit just fills this padding, as you suspected. It's the same with stadtauto_t, except that the 32-bit field is weg_next.
As for grund_t, summing up the sizes of the fields brings me to 22, not 32. If my calculations are correct, that should mean there are two free bytes available up to a size of 24 (it requires some field reordering), except for brueckenboden_t and maybe wasser_t (depending on how large the compiler treats an enum). Was it this prissi saw, or the image_id16 trick? (22 or 24 bytes is not cache line aligned, which I thought grund_t was. 32 is right for this.)
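To make the padding point concrete, a small sketch with hypothetical layouts (only the field names bild and speed_limit are borrowed from the discussion above):
Code:
#include <cstdint>
#include <cstdio>

struct veh_old {            // 16-bit image id
    uint16_t bild;          // 2 bytes
                            // 2 bytes of padding inserted by the compiler
    uint32_t speed_limit;   // must start on a 4-byte boundary
};

struct veh_new {            // 32-bit image id simply fills that padding
    uint32_t bild;
    uint32_t speed_limit;
};

int main()
{
    // Both print 8 on typical x86/x86-64 targets.
    printf("old: %zu, new: %zu\n", sizeof(veh_old), sizeof(veh_new));
    return 0;
}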
Quote from: DrSuperGood on December 17, 2014, 05:52:13 PM
It might also have to do with SSE2 instructions being enabled by default on the more recent versions of the MSVC compilers. I am not sure if the standard GCC builds have them disabled for compatibility with old processors.
SSE2 was among the things I meant with "instructions moving bigger chunks". I was originally more specific in what I wrote, but changed it. GCC has an option for specifying the target CPU, plus individual options for MMX, SSE, SSE2 and so on. It's not 100% clear whether the former option controls the defaults for the latter, but I think it does. The target CPU for Simutrans nightlies is Pentium II or III (it might be Pentium II because wernieman became tired of going down in tiny steps to figure out how to make the nightly run on someone's machine, each step taking a day to verify), so no SSE for sure there.
Quote from: Ters on December 17, 2014, 07:27:47 PM
As for grund_t, summing up the sizes of the fields brings me to 22, not 32. If my calculations are correct, that should mean there are two free bytes available up to a size of 24 (it requires some field reordering), except for brueckenboden_t and maybe wasser_t (depending on how large the compiler treats an enum). Was it this prissi saw, or the image_id16 trick? (22 or 24 bytes is not cache line aligned, which I thought grund_t was. 32 is right for this.)
The numbers I posted were for a 64-bit build. There is a pointer in dataobj that throws the padding off if it is 8 bytes. Also, the pointer to the virtual table is 8 bytes.
The images in ground can be all over the range; even addons like tunnels with their own ground could throw this off, or you could overlay a foundation. I started a patch to use a second lookup table, but it quickly went overboard, so for now I just increased the image_id.
Moreover, the images are generated after all other images are loaded, so they most likely do not end up in the 65535 range.
MSVC never does packed structures. You can enable it, but then you lose the ability to load files via stdio, since the FILE structure (and many more) also get packed. I gave up on packing structures with MSVC.
The grund is returned from the freelist. In principle, one can add more freelists to get 2-byte granularity, if the compiler supports it.
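For anyone who has not seen the freelist prissi mentions, a very small sketch of the idea (far simpler than the real freelist.cc, all names invented): one list of recycled chunks per size class, so finer granularity just means keeping more size classes.
Code:
#include <cstdlib>

// Minimal single-size freelist: get() reuses a previously put() chunk if
// one is available, otherwise falls back to malloc. The real code carves
// chunks out of larger blocks instead of calling malloc per object.
class freelist_sketch {
    struct node { node *next; };
    node *head = nullptr;
    std::size_t chunk_size;
public:
    explicit freelist_sketch(std::size_t size)
        : chunk_size(size < sizeof(node) ? sizeof(node) : size) {}
    void *get() {
        if (head) { node *n = head; head = n->next; return n; }
        return std::malloc(chunk_size);
    }
    void put(void *p) {
        node *n = static_cast<node *>(p);
        n->next = head;
        head = n;
    }
};
// 2-byte granularity would mean keeping one such list per 2-byte size
// class instead of per 4- or 8-byte class.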
QuoteMSVC never does packed structures. You can enable it, but then you lose the ability to load files via stdio, since the FILE structure (and many more) also get packed. I gave up on packing structures with MSVC.
The issue is that technically packed structs are kind of optional in this day and age. Although the C/C++ standard does mention that they have to compile (be supported by the language), it does not explicitly define their physical behaviour and memory model (only their logical behaviour is defined, which the compiler still obeys), instead leaving it up to the compiler developers to decide. Since they are inherently incompatible with synchronization and have their own performance penalties, Microsoft decided to ignore them. I was actually quite surprised a while back to find that GCC does obey them.
Quote from: prissi on December 17, 2014, 10:15:19 PM
MSVC never does packed structures. You can enable it, but then you lose the ability to load files via stdio, since the FILE structure (and many more) also get packed. I gave up on packing structures with MSVC.
#pragma pack allows fine-grained control over structure packing. It's so widely used that GCC actually understands this Microsoft extension in addition to its own packing directives.
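A quick sketch of the directive with made-up fields; the same source is accepted by both MSVC and GCC:
Code:
#include <cstdint>

struct loose {            // default layout: 1 + 1 pad + 2 + 1 + 1 pad = 6 bytes
    uint8_t  flags;
    uint16_t image;
    uint8_t  slope;
};

#pragma pack(push, 1)
struct tight {            // packed layout: 1 + 2 + 1 = 4 bytes,
    uint8_t  flags;       // but 'image' is now misaligned at offset 1
    uint16_t image;
    uint8_t  slope;
};
#pragma pack(pop)

static_assert(sizeof(loose) == 6, "typical unpacked layout");
static_assert(sizeof(tight) == 4, "packed layout");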
Quote from: prissi on December 17, 2014, 10:15:19 PM
The grund is returned from the freelist. In principle, one can add more freelists to get 2-byte granularity, if the compiler supports it.
If you're referring to the two free bytes at the end of grund_t, then you might save memory that way, but I think the two bytes going from 22 to 24 would still be cheaper than going from 24 to 26, as the latter would start using a new native machine word (what x86 for legacy reasons calls a double-word). This is however just speculation.
Quote#pragma pack allows fine-grained control over structure packing. It's so widely used that GCC actually understands this Microsoft extension in addition to its own packing directives.
Hmmm, I might try to look into applying this to some of the objects. The memory saving could raise performance further for MSVC builds.
I can confirm there is a noticeable performance penalty for using x64 builds. In my test map this was about 2-3 fps. The map was very well developed but pretty small so the penalty could increase with larger maps.
I have been trying to benchmark the game to discover where the bottlenecks are occurring (what works faster and slower in x64), but I cannot get the instrumentation to attach. I am guessing it's because all the dependent libraries need to be built with instrumentation, which is a lot of work and annoying.
Quote from: DrSuperGood on December 18, 2014, 01:48:00 PM
Hmmm, I might try to look into applying this to some of the objects. The memory saving could raise performance further for MSVC builds.
Most objects in Simutrans have fields that pack perfectly already, as long as pointers are 32-bit. Forcing packing otherwise will lead to lots of misaligned reads, which might be worse than not packing. 32- and 64-bit builds might require different arrangements of fields to be optimal, although solutions that suit both might also be possible. The latter depends somewhat on how important it is to keep related fields close together.
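A sketch of what rearranging fields (rather than force-packing them) buys, with invented fields and assuming typical x86/x86-64 alignment:
Code:
#include <cstdint>

struct wasteful {         // 1 + 3 pad + 4 + 2 + 2 tail pad = 12 bytes
    uint8_t  kind;
    uint32_t image;
    uint16_t flags;
};

struct rearranged {       // 4 + 2 + 1 + 1 tail pad = 8 bytes, everything aligned
    uint32_t image;
    uint16_t flags;
    uint8_t  kind;
};

static_assert(sizeof(wasteful) == 12 && sizeof(rearranged) == 8,
              "assumes typical x86/x86-64 alignment rules");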
With MSVC, using packed structures requires that any pointer to such a structure is declared __unaligned (e.g. a planquadrat_t pointer is declared as __unaligned planquadrat_t *p;).
EDIT: However, it compiles normally with the patch. Maybe this is left over from the ARM and MIPS support days of MSVC. (And it saves 5% of memory.)
Benchmark results: 32bit GCC build.
| r7427 - 16bit | r7428 - 32bit | r7427 w/ Dwachs 32bit patch |
sizes: | | | |
koord | 4 | 4 | 4 |
koord3d | 5 | 5 | 5 |
ribi_t::ribi | 1 | 1 | 1 |
| | | |
karte_t | 6408 | 6408 | 6408 |
planquadrat_t | 12 | 12 | 12 |
grund_t | 24 | 24 | 24 |
boden_t | 24 | 24 | 24 |
wasser_t | 24 | 28 | 24 |
| | | |
obj_t | 12 | 12 | 12 |
baum_t | 16 | 16 | 16 |
gebaeude_t | 32 | 32 | 32 |
senke_t | 52 | 56 | 56 |
stadtauto_t | 64 | 68 | 68 |
roadsign_t | 32 | 36 | 36 |
wolke_t | 20 | 20 | 20 |
movingobj_t | 44 | 48 | 48 |
fussaenger_t | 40 | 44 | 44 |
weg_t | 32 | 36 | 36 |
| | | |
convoihandle_t | 2 | 2 | 2 |
convoi_t | 1016 | 1016 | 1016 |
convoi_sync_t | 104 | 104 | 104 |
vehikelbasis_t | 28 | 32 | 32 |
vehikel_t | 84 | 88 | 88 |
automobil_t | 84 | 88 | 88 |
waggon_t | 84 | 88 | 88 |
monorail_waggon_t | 84 | 88 | 88 |
maglev_waggon_t | 84 | 88 | 88 |
narrowgauge_waggon_t | 84 | 88 | 88 |
schiff_t | 84 | 88 | 88 |
aircraft_t | 120 | 128 | 128 |
| | | |
fabrik_t | 1568 | 1568 | 1568 |
ware_t | 12 | 12 | 12 |
| | | |
haltestelle_t | 896 | 896 | 896 |
halthandle_t | 2 | 2 | 2 |
route_t | 12 | 12 | 12 |
linehandle_t | 2 | 2 | 2 |
schedule_t | 12 | 12 | 12 |
| | | |
spieler_t | 348 | 348 | 348 |
finance_t | 116120 | 116120 | 116120 |
| | | |
timings: (usec per sync_step, step, or frame) | | | |
sync_step() sync | 498 | 504 | 500 |
step() | 404 | 405 | 405 |
sync_step() display | 26972 | 26789 | 26898 |
Structures larger. Simulation marginally slower. Screen display marginally faster :o. Overall insignificant difference.
sync_step() timings per object type show citycars and people taking the biggest hit. When I shrunk stadtauto_t from 72 to 64 bytes there was a ~25% speedup. This change putting it back to 68 bytes is showing ~15% slower again. Obviously these results are from some unreleased work rearranging these structures - trunk is larger and slower...
Quote from: prissi on December 18, 2014, 10:45:55 PM
With MSVC, using packed structures requires that any pointer to such a structure is declared __unaligned (e.g. a planquadrat_t pointer is declared as __unaligned planquadrat_t *p;).
From what I can read on MSDN, __unaligned means that the structure itself isn't aligned. When I think of packed structures, it's the fields inside the structure that are packed; the structure as a whole would remain aligned. If there is a structure packed inside another structure, pointers to it must however be marked __unaligned. The freelist might need some adapting to operate on multiples of 8 instead of 4 for 64-bit builds.
Rearranging the fields to avoid wasting space on padding should be a far better idea than force-packing. Unaligned access would cause anything from mild (typically x86) to severe (most everything else, including x86 in some cases) performance implications. Manual packing has been done successfully for Simutrans so far, from what I've read, including in TurfIt's post right above.
Quote from: TurfIt on December 19, 2014, 01:43:55 AM
Structures larger. Simulation marginally slower. Screen display marginally faster :o. Overall insignificant difference.
sync_step() timings per object type show citycars and people taking the biggest hit. When I shrunk stadtauto_t from 72 to 64 bytes there was a ~25% speedup. This change putting it back to 68 bytes is showing ~15% slower again. Obviously these results are from some unreleased work rearranging these structures - trunk is larger and slower...
Apart from wasser_t, it doesn't seem like Dwachs' image_id16 gains anything should it actually work. There might be a lot of water, though. On the other hand, I would expect this to affect rendering most. The differences are so small that they could be nothing but noise from other processes on the machine for all I know. I also don't know if this is a low, middle or high end computer. If it's in the higher end, the question of how low end computers are affected remains.
I think we can forget my patch. prissi's brute-force 32-bit approach did not increase grund_t for my 64-bit builds. Nothing to gain by going to 16-bit image ids.
@TurfIt: can you post your patch for rearranging data structures?
btw, the 65534 image limit is still in simgraph16.cc ;)
I've done some more experiments, and see now that the size of a structure is rounded up to a multiple of the size of the biggest primitive field. Earlier, this didn't seem to be the case when I did a quick test. Must have been a stale file somewhere.
I also see from what has been committed that prissi seems to be thinking of the size of structures that are embedded in other structures, and that packing will indeed not only affect the padding between fields, but also at the end of the structure. My comments about __unaligned should still be valid, though, as long as we ensure that nested structures are properly aligned in the composite structure. Things only get problematic once these packed structures are also used by themselves in arrays.
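A tiny sketch of that rounding rule and the tail padding it implies (illustrative only):
Code:
#include <cstdint>

struct rounded {
    uint32_t a;   // largest member: 4-byte alignment for the whole struct
    uint8_t  b;   // 1 byte used, then 3 bytes of tail padding
};

static_assert(alignof(rounded) == 4, "struct alignment follows largest member");
static_assert(sizeof(rounded) == 8, "size rounds up to a multiple of that alignment");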
pragma pack(1) works fine on Intel; but, for instance, a 16-bit word at an uneven address on Motorola or ARM will cause a bus error or an address error under some processors/modes.
Anyway, since this has gone in, maybe move it under Incorporated for now.
Quote from: prissi on December 19, 2014, 11:38:54 PM
pragma pack(1) works fine on Intel; but, for instance, a 16-bit word at an uneven address on Motorola or ARM will cause a bus error or an address error under some processors/modes.
I just thought the compiler could remember by itself that the structure has been packed without regard for alignment. Furthermore, Simutrans should not put 16-bit words on uneven addresses. I figured the purpose of packing koord3d was to make its size 5 bytes rather than 6, so that a structure containing a koord3d, a uint8 and a sint16 (in that order) would be 8 bytes, not 10. Yet nothing in this structure would be improperly aligned, nor would it need forced packing to remove empty spaces. A struct of a uint8, a sint16 and a koord3d would however normally be bad, whether packed or not.
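To put numbers on that, a sketch mirroring the layout described; the types here are stand-ins, not the real koord3d:
Code:
#include <cstdint>

#pragma pack(push, 1)
struct coord5 {           // packed stand-in for koord3d: 2 + 2 + 1 = 5 bytes
    int16_t x, y;
    int8_t  z;
};
#pragma pack(pop)

struct composite {
    coord5  pos;          // offsets 0..4
    uint8_t flag;         // offset 5
    int16_t value;        // offset 6: still evenly aligned
};

static_assert(sizeof(coord5) == 5, "packed coordinate");
static_assert(sizeof(composite) == 8, "8 bytes instead of the unpacked 10");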
Will those settings be in the next version of pak128 that comes out? I'm having this problem now. :P
QuoteWill those settings be in the next version of pak128 that comes out? I'm having this problem now. :P
This is a Simutrans problem, not a problem with the paksets it uses. You can try downloading the nightly builds of Simutrans now and testing whether the problem has been fixed.
I see no noticeable performance drop on my Intel Pentium E2200 dual-core 64-bit Linux machine. On it, Simutrans is compiled as 64-bit, which means alignment is sub-optimal anyway. GCC is however allowed to use the full instruction set available on such a basic x86-64 CPU, which includes SSE3. It only has a 1280x1024 screen, which means Simutrans only deals with 1280x915 pixels. Yet when following a train through dense forest at normal zoom level in a well developed 1024x1024 pak64 map, things still move along at 25 FPS, almost 5 simloops, with idle time to spare and only about 50% utilization of a single core (no multi-threading).
I have applied the commits from the SVN to Experimental: I have been worried for some time about Pak128.Britain-Ex reaching the limit, as it has so many vehicles, many with multiple livery variants. Do the changes on the SVN actually extend the limit? That is not entirely clear from reading this thread. I see that Dwachs' patch goes further than that, but he later points out that the limit is still in simgraph16.cc, and his patch only changes an error message in that file.
I'm not sure what Dwachs was referring to. There are however several functions in simgraph16.cc, or perhaps rather simgraph.h, that do not use image_id as the type for image numbers. unsigned should however be a 32-bit type.
So, is the 65534 image limit now gone in Standard (in the latest SVN/nightlies)?
Quote from: jamespetts on December 25, 2014, 10:46:06 PM
So, is the 65534 image limit now gone in Standard (in the latest SVN/nightlies)?
This is in Incorporated patches, is it not? Some bugs might linger, though. I am also not sure anyone has tried loading more than 65534 images yet.
Ahh, I was unsure whether the incorporated patch actually had that effect, given Dwachs' comments.
If people encounter any bugs they can be fixed there and then. For now it seems stable so any progress towards removing the limit is better than none.