[Closed] New Simutrans-Experimental server (bridgewater-brunel.me.uk)

Started by jamespetts, January 14, 2012, 09:03:19 PM

jamespetts

AEO,

it might be frequency, but it won't be comfort, since comfort is not taken into account in routing (it would be too computationally expensive to do so). It is waiting time that makes frequency affect passenger routing choices.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

jamespetts

Hmm - by my testing this evening, it looks as though we might have a simrand desync issue connected to private cars. It has not shown up previously because there were hitherto very few private cars. The desyncs that we have been getting are now confirmed as simrand desyncs (rather than the server/client timing desyncs that we were getting in the early days): connecting locally to a game (Sdog's 173) in which private cars are abundant causes a near-instantaneous simrand desync, whereas connecting to an archive of the server map from 1857 is stable.

This will take some investigating. If anybody in the meantime would like to look into the code to find the source of this problem, I should be extremely grateful. A pointer for those new to the network code: simrand desyncs occur when the random number generator (simrand, which is deterministic for any given seed; the seed is shared between the server and clients) is called a different number of times on the client and on the server over the same number of steps. This, in turn, is caused by the code path that the program follows diverging somewhere between client and server; normally, the two are supposed to run exactly in parallel (apart from the GUI).
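A minimal sketch of the mechanism just described, in C++ (the language of the Simutrans code). The generator below is a hypothetical stand-in (a 32-bit xorshift), not simrand's actual algorithm; the point is only that two copies seeded identically stay in step for exactly as long as they are called the same number of times.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for simrand: a 32-bit xorshift generator.
// Deterministic for a given seed, like simrand, but NOT simrand's algorithm.
struct toy_rng {
    std::uint32_t state;
    explicit toy_rng(std::uint32_t seed) : state(seed != 0 ? seed : 1u) {}  // xorshift needs a non-zero state
    std::uint32_t next() {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }
};

// One simulated step. How many random numbers it draws depends on game state;
// `extra_draw` stands for any state that differs between client and server.
void simulate_step(toy_rng& rng, bool extra_draw) {
    rng.next();          // a draw that both machines make
    if (extra_draw) {
        rng.next();      // a draw only one machine makes: the code paths diverged
    }
}
```

One extra call on one side is enough: from then on every value differs between the two machines, even if the diverging branch never runs again.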

Deviations in the code path are caused principally by two sorts of things: (1) variables not being loaded/saved properly, so that a value saved on the server is not transmitted to the client, or is lost or improperly adjusted after being transmitted, leaving the variable different on the client and server and ultimately causing the code paths to diverge; and (2) undefined behaviour in the code (reading beyond the end of an array, etc.), causing truly random rather than the desired controlled pseudo-random behaviour. This sort of bug is notoriously difficult to find, because it is very hard to pin down where it happens.
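To illustrate cause (1), here is a minimal sketch assuming a combined load/save routine in which the same rdwr() call writes when saving and reads when loading, as in the Simutrans code quoted further down. The classes toy_file and toy_stop are hypothetical, and the include_waiting_time flag merely models a version of rdwr() that forgets one field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy archive: one rdwr call writes when saving, reads when loading.
class toy_file {
public:
    explicit toy_file(bool saving) : saving_(saving) {}

    void rdwr_short(std::uint16_t& v) {
        if (saving_) {
            buf_.push_back(v);
        } else {
            v = buf_.at(pos_++);
        }
    }

    // Hand the saved values to a fresh loader, as if sent to a client.
    toy_file as_loader() const {
        toy_file f(false);
        f.buf_ = buf_;
        return f;
    }

private:
    bool saving_;
    std::vector<std::uint16_t> buf_;
    std::size_t pos_ = 0;
};

struct toy_stop {
    std::uint16_t passengers = 0;
    std::uint16_t waiting_time = 0;

    // include_waiting_time == false models a buggy rdwr that forgets a field.
    void rdwr(toy_file& f, bool include_waiting_time) {
        f.rdwr_short(passengers);
        if (include_waiting_time) {
            f.rdwr_short(waiting_time);
        }
    }
};
```

Nothing fails on load with the buggy version; the client simply starts with a default value where the server has a real one, and the divergence only shows up later, when some decision depends on that field.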

In this instance, I have confirmed that it appears in versions of Simutrans-Experimental between the current 10.x branch and 10.8 inclusive. The code for private car journeys, where I very strongly suspect that the problem lies, is found in simcity.cc. The code for actually moving the graphics on the map is elsewhere, but I suspect at present that this element is not problematic.

To test network server/client connectivity (which works well for checking simrand desyncs), run both a server and client on your local machine, and connect to the loopback interface (127.0.0.1) in the network connect dialogue.

I shall work on this (as a priority) when I get time, but if anyone were able to narrow it down, I should be very grateful. In the meantime, apologies most sincere for the disruption to your games.

ӔO

Quote from: jamespetts on February 09, 2012, 12:13:47 AM
AEO,

it might be frequency, but it won't be comfort, since comfort is not taken into account in routing (it would be too computationally expensive to do so). It is waiting time that makes frequency affect passenger routing choices.
On closer observation, what seems to be happening is that the express trains are too frequent and do not wait for passengers to board before they turn around. The branch line, on the other hand, waits for a long time, which allows passengers to board it.
My Sketchup open project sources
various projects rolled up: http://dl.dropbox.com/u/17111233/Roll_up.rar

Colour safe chart:

prissi

Are you still using ptrhashtables for sync_step or in any other location? If so, this will almost inevitably lead to fast desyncs.
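Why pointer-keyed tables cause trouble can be sketched as follows. Iteration order follows the key, and memory addresses differ between server and client, so the two machines may visit the same objects in a different order (and therefore draw random numbers in a different order). In this hypothetical illustration, addresses are simulated as plain integers so the outcome is reproducible, and an ordered std::map stands in for a hash table; a hash table's bucket order depends on the pointer value in the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct sim_object {
    std::string name;
    std::uint16_t id;      // stable id: the same on every machine
    std::uintptr_t addr;   // simulated address: differs between machines
};

// Visit order if the sync step iterates a table keyed by address.
std::vector<std::string> order_by_address(const std::vector<sim_object>& objs) {
    std::map<std::uintptr_t, std::string> table;
    for (const auto& o : objs) table[o.addr] = o.name;
    std::vector<std::string> order;
    for (const auto& entry : table) order.push_back(entry.second);
    return order;
}

// Visit order if the table is keyed by the stable id instead.
std::vector<std::string> order_by_id(const std::vector<sim_object>& objs) {
    std::map<std::uint16_t, std::string> table;
    for (const auto& o : objs) table[o.id] = o.name;
    std::vector<std::string> order;
    for (const auto& entry : table) order.push_back(entry.second);
    return order;
}
```

Keying by something that is identical on every machine (here the id) removes the dependence on where each machine happened to allocate the objects.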

Dwachs

Quote from: jamespetts on February 08, 2012, 11:04:50 PM
Might I ask with respect to the server_frames_per_step setting - which is more conservative: a higher or lower number?
You mean server_sync_steps_between_checks? The lower the number, the more often the server sends sync information and the earlier a desync will be detected.
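The trade-off can be put in numbers with a simple model, assuming (hypothetically) that a check happens every N steps, at steps N, 2N, 3N and so on, and that a desync first appears at step d. The function names below are illustrative only and are not Simutrans settings or functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: checks at steps interval, 2*interval, 3*interval, ...
// Returns the first check at or after desync_step (desync_step >= 1, interval >= 1).
std::uint32_t first_detecting_check(std::uint32_t desync_step, std::uint32_t interval) {
    return ((desync_step + interval - 1) / interval) * interval;   // round up to a multiple
}

// Steps that run on diverged state before the check that notices it.
std::uint32_t detection_lag(std::uint32_t desync_step, std::uint32_t interval) {
    return first_detecting_check(desync_step, interval) - desync_step;
}
```

With N = 4, a desync at step 5 goes unnoticed until step 8; with N = 1 it is caught at once, at the cost of sending sync information every step.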
Parsley, sage, rosemary, and maggikraut.

jamespetts

A brief update on the desync issues: on a local test server, with the saved game loaded automatically when Simutrans-Experimental starts, connecting to it does not desync. However, when loading a game manually, including that very same game re-saved manually, it desyncs almost instantly on connexion. This suggests to me an error in the load/save routine somewhere.

To answer Prissi's question - I vanquished the ptrhashtables from sync steps a long time ago, so it's not that, alas. Dwachs - no, I know what server_frames_between_checks is (I presume that that is what you meant by "server_sync_steps_between_checks"?); I was asking about this parameter:


# In network mode, there will be a fixed number of screen updates before a step.
# Reasonable values should result in 2-5 steps per second.
server_frames_per_step = 4
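Taking the comment in the configuration file at face value (a fixed number of screen updates per step), the step rate is simply the frame rate divided by server_frames_per_step. The sketch below does only that arithmetic; the actual frame rate depends on the machine and is not given in the thread, and the function names are hypothetical.

```cpp
#include <cassert>

// Per the comment in the configuration file: a fixed number of screen updates
// (frames) happen before each step, so the rates are related by simple division.
double steps_per_second(double frames_per_second, int server_frames_per_step) {
    return frames_per_second / server_frames_per_step;
}

// Inverse: the frame rate needed to reach a target step rate.
double frames_per_second_for(double target_steps_per_second, int server_frames_per_step) {
    return target_steps_per_second * server_frames_per_step;
}
```

With server_frames_per_step = 4, the suggested 2 to 5 steps per second corresponds to roughly 8 to 20 screen updates per second.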

jamespetts

Right, I think that I have narrowed down the problem: it appears to have arisen first in version 10.8, and seems associated with the fix for the quickstone errors. It occurs only in saved games which, when loaded with version 10.7 or earlier, fail with quickstone errors. Any game that 10.7 can load without quickstone errors will, from my testing so far, work without desyncing, even when loaded in later versions.

I wonder, therefore, whether I have not fixed this issue fully. I think that it was Dwachs who spotted the initial problem - do you have any thoughts as to what I might have missed in fixing this that might cause desyncs?

Edit: I suspect that this problem might be related to this crash, which occurs in the following code (largely the same as in Standard, apart from layout and the Experimental file version):


// will restore halthandle_t after loading
    if(file->get_version() > 110005)
    {
        if(file->is_saving())
        {
            uint16 halt_id = self.is_bound() ? self.get_id() : 0;
            file->rdwr_short(halt_id);
        }
        else
        {
            uint16 halt_id;
            file->rdwr_short(halt_id);
            self.set_id(halt_id);
            if(file->get_experimental_version() >= 10 || file->get_experimental_version() == 0)
            {
                self = halthandle_t(this, halt_id);
            }
            else
            {
                self = halthandle_t(this);
            }
        }
    }
    else
    {
        if (file->is_loading())
        {
            self = halthandle_t(this);
        }
    }


The crash happens when self = halthandle_t(this, halt_id); is called with halt_id == 0. The problem, then, seems to come from the fact that halts are sometimes saved with a zero id, which the saving code explicitly permits: uint16 halt_id = self.is_bound() ? self.get_id() : 0;

It would seem sensible for rdwr not to be called on a halt whose "self" halthandle is not bound; indeed, something has probably gone wrong to cause the "self" handle to be unbound in the first place (any thoughts on what that might be?), but perhaps the save routine should not do something that will inevitably cause a crash on loading in this case?

Any thoughts on this development much appreciated!

Edit 2: I wonder whether this problem might be related to this code to save waiting times:


for(short i = 0; i < max_catg_count_file; i ++)
        {
            if(file->is_saving())
            {
                uint16 halts_count;
                halts_count = waiting_times[i].get_count();
                file->rdwr_short(halts_count);
           
                inthashtable_iterator_tpl<uint16, waiting_time_set > iter(waiting_times[i]);

                halthandle_t halt;
                while(iter.next())
                {
                    uint16 id = iter.get_current_key();

                    if(file->get_experimental_version() >= 10)
                    {
                        file->rdwr_short(id);
                    }
                    else
                    {
                        halt.set_id(id);
                        koord save_koord = koord::invalid;
                        if(halt.is_bound())
                        {
                            save_koord = halt->get_basis_pos();
                        }
                        save_koord.rdwr(file);
                    }
                   
                    uint8 waiting_time_count = iter.get_current_value().times.get_count();
                    file->rdwr_byte(waiting_time_count);
                    ITERATE(iter.get_current_value().times,i)
                    {
                        // Store each waiting time
                        uint16 current_time = iter.access_current_value().times.get_element(i);
                        file->rdwr_short(current_time);
                    }

                    if(file->get_experimental_version() >= 9)
                    {
                        waiting_time_set wt = iter.get_current_value();
                        file->rdwr_byte(wt.month);
                    }
                }
                halt.set_id(0);
            }

            else
            {
                uint16 halts_count;
                file->rdwr_short(halts_count);
                halthandle_t halt;
                for(uint16 k = 0; k < halts_count; k ++)
                {
                    if(file->get_experimental_version() >= 10)
                    {
                        uint16 id;
                        file->rdwr_short(id);
                        halt.set_id(id);
                    }
                    else
                    {
                        koord halt_position;
                        halt_position.rdwr(file);
                        halt = welt->get_halt_koord_index(halt_position);
                    }   

                    if(halt.is_bound())
                    {
                        fixed_list_tpl<uint16, 16> list;
                        uint8 month;
                        waiting_time_set set;
                        uint8 waiting_time_count;
                        file->rdwr_byte(waiting_time_count);
                        for(uint8 j = 0; j < waiting_time_count; j ++)
                        {
                            uint16 current_time;
                            file->rdwr_short(current_time);
                            list.add_to_tail(current_time);
                        }
                        if(file->get_experimental_version() >= 9)
                        {
                            file->rdwr_byte(month);
                        }
                        else
                        {
                            month = 0;
                        }
                        set.month = month;
                        set.times = list;
                        waiting_times[i].put(halt.get_id(), set);
                    }
                    else
                    {
                        // The list was not properly saved.
                        uint8 waiting_time_count;
                        file->rdwr_byte(waiting_time_count);
                        for(uint8 j = 0; j < waiting_time_count; j ++)
                        {
                            uint16 current_time;
                            file->rdwr_short(current_time);
                        }
                       
                        if(file->get_experimental_version() >= 9)
                        {
                            uint8 month;
                            file->rdwr_byte(month);
                        }
                    }
                }
                halt.set_id(0);
            }
        }


This creates halthandles and assigns IDs to them at loading/saving time. When I comment out the branch for Experimental version 10 or higher, the game fails on loading with a quickstone error at the self = halthandle_t(this, halt_id) line, complaining that the number in halt_id has already been assigned. However, it does not always do this: if the game is loaded paused and then saved without being unpaused, it works fine. Crucially, when I do this, I also do not get desyncs. Thoughts on the relevance of this would be appreciated, too.
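The loading branch above still reads and discards the waiting times when the halt cannot be found ("The list was not properly saved"). A minimal sketch of why that matters, assuming a hypothetical flat record layout of [halt id, count, times...]: every record must be consumed in full, whether or not it is kept, or every later read starts in the wrong place. The names toy_reader and load_waiting_times are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

struct toy_reader {
    std::vector<std::uint16_t> data;
    std::size_t pos = 0;
    std::uint16_t next() { return data.at(pos++); }
};

std::map<std::uint16_t, std::vector<std::uint16_t>> load_waiting_times(
        toy_reader& in, std::uint16_t record_count,
        const std::set<std::uint16_t>& live_halts) {
    std::map<std::uint16_t, std::vector<std::uint16_t>> result;
    for (std::uint16_t k = 0; k < record_count; ++k) {
        const std::uint16_t id = in.next();
        const std::uint16_t count = in.next();
        std::vector<std::uint16_t> times;
        for (std::uint16_t j = 0; j < count; ++j) {
            times.push_back(in.next());   // read even if the record will be dropped
        }
        if (live_halts.count(id) != 0) {
            result[id] = times;           // keep only records for halts that exist
        }
    }
    return result;
}
```

If the loop skipped the inner reads for an unknown halt, the next record's id would be taken from the middle of the discarded list, and the damage would spread to everything read afterwards.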

Dwachs

Your excerpt regarding saving of waiting times seems to be unrelated to the quickstone errors. If you only access the ids of a halthandle, you only change an index into the pointer table held by the quickstone. Only constructor calls such as 'self = halthandle_t(...)' actually touch the pointer table.

jamespetts

So creating a new halthandle object then assigning an ID to it does not affect the pointer table (which is, I assume, where a list of all the IDs is kept such that the program knows if one tries to assign the same ID twice)?

Dwachs

It depends on the constructor. This code

halt = halthandle_t();
halt.set_id(23908);

would not do any harm to the pointer table and would not contribute to the fatal error you observe. On the other hand, the code

halt = halthandle_t(this);

would create a new entry in the pointer table and potentially create such errors.

During reloading of a savegame, all halthandle calls that register station pointers should be of one and the same type, either

halt = halthandle_t(this);

or

halt = halthandle_t(this, halt_id);

If you mix both types of calls during one loading operation, you will get those errors sooner or later.
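This can be illustrated with a toy handle table. It is a hypothetical model, not the real quickstone_tpl: register_next() plays the role of halthandle_t(this) (take the next free slot), register_at() that of halthandle_t(this, halt_id) (claim the saved slot), and slot 0 is treated as reserved for unbound handles, which is also why an id of 0 read back from a savegame cannot be registered. A bare set_id() has no counterpart here, since it only changes an index and registers nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy handle table, NOT the real quickstone_tpl.
// Slot 0 is reserved to mean "unbound".
class toy_table {
public:
    explicit toy_table(std::size_t size) : slots_(size, nullptr) {}

    // Like halthandle_t(this): take the lowest free non-zero slot.
    std::uint16_t register_next(const void* obj) {
        for (std::size_t id = 1; id < slots_.size(); ++id) {
            if (slots_[id] == nullptr) {
                slots_[id] = obj;
                return static_cast<std::uint16_t>(id);
            }
        }
        return 0;   // table full
    }

    // Like halthandle_t(this, halt_id): claim the saved slot, or fail if
    // the id is 0 (reserved), out of range, or already taken.
    bool register_at(std::uint16_t id, const void* obj) {
        if (id == 0 || id >= slots_.size() || slots_[id] != nullptr) {
            return false;
        }
        slots_[id] = obj;
        return true;
    }

private:
    std::vector<const void*> slots_;
};
```

Restoring every stop by its saved id succeeds in any order; letting one stop take "the next free slot" first can steal the id that a later stop needs, which is the collision Dwachs describes.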

Junna

May I suggest that the server be paused or switched off while this issue is worked on, to prevent time passing whenever someone manages to get on for a short while when most cannot? Lindley is likely to go bankrupt otherwise.

wlindley

Indeed, I have been able to connect for a few seconds now and then, this week.  Enough time to say Arrrgh but not long enough to do anything about it.

jamespetts

Ohh dear. I'm sorry about this - I am currently staying with my parents and don't have the login password for the server with me. I shall just have to ask people politely not to log in until I get this fixed, I think.

ӔO

Don't worry about Lindley's lines; I fixed that. It now makes a positive income.

By the way, it seems to desync on the first try every time, but on the 2nd and 3rd tries there is a chance that you will stay connected for a long time. On the 3rd try, though, it may crash, and on the 4th try a crash seems almost guaranteed. Also, it does not seem possible for more than one player to play at once.

I was able to log in long enough to add a temporary line alongside lindley's where it was losing money the most, due to a build up of about 15,000 pax.

jamespetts

Hmm - the desync bug is not caused by the quickstone bug after all. I have managed to fix the quickstone crashes in my 10.x branch (I think), with the exception of the one in Sdog's game no. 176 (in which the saved game is corrupted - hopefully, the game will no longer generate corrupted saved games, however); but the desyncs persist.

Testing back to version 10.5, loading Sdog's game 162 then connecting to it allows the game to run without desyncs. However, if I then save 162 locally on the server and re-load it again, a client connecting to it will desync on connexion. This is all without producing the quickstone errors that, it seems, entirely coincidentally correlated with the desync errors in my earlier tests. I shall have to continue to look at other possible causes rather painstakingly, but it might take a while. In the meantime, if anyone would like to assist by looking through the code for any possible sync-related errors, I should be most grateful!

wlindley

Quote from: ӔO on February 12, 2012, 03:31:22 PM
I was able to log in long enough to add a temporary line alongside lindley's where it was losing money the most, due to a build up of about 15,000 pax.

Much obliged, although I believe the cricket fans might rather object to an elevated railway right through the pitch. :o

Also, the latest compile (with the convoy ID update) seems to have solved the desync errors here.

jamespetts

Quote from: wlindley on February 13, 2012, 01:39:55 PM
Also, the latest compile (with the convoy ID update) seems to have solved the desync errors here.

Hmm, that's odd for two reasons: (1) it didn't solve the desync errors in my tests; and (2) I never pushed a version where this change had any effect, as it is only operative if the Experimental version number is 11 or higher.

Can you elaborate on how you tested this?

wlindley

With the version of 5 February, I get a desync after 1 or 2 seconds.  The version compiled this morning stays connected for as long as I have left it running.

ӔO

Quote from: wlindley on February 13, 2012, 01:39:55 PM
Much obliged, although I believe the cricket fans might rather object to an elevated railway right through the pitch. :o

Also, the latest compile (with the convoy ID update) seems to have solved the desync errors here.
yeah, just tell me when you don't need it anymore. I don't have any interest in expanding, but would rather improve my lines for now.

jamespetts

Quote from: wlindley on February 14, 2012, 01:00:58 AM
With the version of 5 February, I get a desync after 1 or 2 seconds.  The version compiled this morning stays connected for as long as I have left it running.

Compiled from the 10.x branch? You are able to connect to the server with that...? They're not the same version. Hmm - very odd.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

jamespetts

I have made significant progress in tracking down the desync issue. Loading Sdog's game "173" and setting a breakpoint at line no. 1167 in simhalt.cc (in the current 10.x code) with the condition "TEST_id == 1039 && TEST_id_dest == 2018" reveals the following. Seemingly at random, when the game is loaded and the routes are re-calculated with "reroute_goods()", passengers at Brighton Stop bound for Mansfield Small Cricket Ground Stop will sometimes route via Brighton Town Hall Stop, with a journey time ("TEST_ROUTE") of 4496 (i.e., 449.6 minutes, or about 7:30h), but at other times will route via Brighton Railway Station, with a journey time of 4175 (417.5 minutes, or about 6:58h). When run as a client/server pair, the server will usually retain the longer route, whereas the client will usually re-route to the shorter one. The first leg of the shorter route is made on foot using the walking connexions setting, so pedestrians will be generated (using the random number generator) during the route recalculation phase of loading, and the random seeds will be desynchronised before even the first step.
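The conversions above can be checked with a small helper, assuming (as the post states) that the journey-time figures are in tenths of a minute. The function name is hypothetical; it rounds to the nearest whole minute before formatting as h:mm.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Assumes journey times are stored in tenths of a minute, as the post states.
// Rounds to the nearest whole minute (halves round up) and formats as h:mm.
std::string tenths_to_hmm(std::uint32_t tenths_of_minutes) {
    const std::uint32_t minutes = (tenths_of_minutes + 5) / 10;
    char buf[32];
    std::snprintf(buf, sizeof buf, "%u:%02u",
                  static_cast<unsigned>(minutes / 60),
                  static_cast<unsigned>(minutes % 60));
    return std::string(buf);
}
```

For example, 4496 rounds to 450 minutes (7:30) and 4175 rounds to 418 minutes (6:58).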

It is not currently clear why this divergence is occurring. Passengers should always take the fastest route. Checking those particular stations on the map, both routes appear valid: one involves taking a 'bus a short distance to a very nearby stop, then a longer 'bus journey to a dock, a ship across a large lake, and then another 'bus to the final destination; the other involves walking to a local station, taking a slow train to the same dock, and following the same route as before thereafter.

The difficulty with tracking down the issue is that it only appears on larger maps, yet it is very difficult to isolate particular instances of anything in the debugger on a large map when there are so many different things. Any coding assistance in tracking this down would be most welcome.

Edit: Further testing has confirmed that the walking connexions feature is not itself to blame for this issue: disabling the feature entirely still results in near instant desyncs on the 173 map.

Edit 2: A copy of Sdog's game no. 173 is available here for testing.

sdog

In fact, I had some very strange passenger routing occurring. About 30k passengers at Shrewsbury Station were frequently rerouted, seemingly at random, via completely different destinations. These included Sheffield (1000 tiles to the east), Taunton (500+ tiles to the NW), a stop within Shrewsbury and Shrewsbury Abbey station, and sometimes the passengers were distributed over a very large number of destinations, none exceeding a few thousand. The routes fluctuated within a couple of game seconds (every time I unpaused and re-paused, I saw a different destination).

James, please bear in mind that the total travel time is likely determined by the waiting time at completely overcrowded stations. If that changes, e.g. because a convoy departs somewhere, the waiting times can change instantly. The map has a very densely interconnected network, which makes it a complex system (note how it is initially described by simple equations). My guess is that we are seeing chaos here!

jamespetts

Ahh, but my tests should give deterministic results, because exactly the same saved game is being loaded on each occasion, and the calculation of paths is always performed immediately after loading. However complicated the network, therefore, the results should be identical every time.

Edit: Further testing appears to show that waiting times might be the issue, although it is not entirely clear, as I cannot immediately link the discrepancy to this particular route.

At Bristol station, the only route to Bristol East Stop is by line no. 208, a 'bus route. When the game is loaded, the waiting time for Bristol East Stop is sometimes unknown (which defaults to about 2 minutes) and sometimes 17 minutes and 18 seconds. This discrepancy correlates exactly with the discrepancy in passenger routing from Brighton Stop to Mansfield Small Cricket Ground Stop: when passengers walk to the station and take the train, the waiting time between Bristol and Bristol East is unknown, whereas when they take the 'bus to the Town Hall stop and change for another 'bus to the dock, the waiting time is 17:18.
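A hypothetical sketch of how a single waiting-time value can split client and server: if one side has the recorded wait and the other falls back to a default, the cheaper route can differ, and if only one candidate route starts on foot, only one side generates pedestrians and so draws random numbers. All numbers, the flat figure of three draws, and the names are invented for illustration; this is not the actual routing code, and it is not meant to show that this was the mechanism here, which the post itself could not yet confirm.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical default used when no waiting time has been recorded
// (the post says about 2 minutes; 20 tenths of a minute is assumed here).
constexpr std::uint32_t default_wait_tenths = 20;

struct route {
    std::uint32_t travel_tenths;               // time spent moving or walking
    std::optional<std::uint32_t> wait_tenths;  // recorded wait, if known
    bool starts_on_foot;                       // walking leg spawns pedestrians
};

std::uint32_t cost(const route& r) {
    return r.travel_tenths + r.wait_tenths.value_or(default_wait_tenths);
}

// Choose the cheaper route and return how many random numbers that choice
// consumes: a hypothetical flat 3 if pedestrians are spawned, else 0.
int random_draws_for_choice(const route& a, const route& b) {
    const route& chosen = (cost(a) <= cost(b)) ? a : b;
    return chosen.starts_on_foot ? 3 : 0;
}
```

With the recorded wait, the walking route costs 4273 against 4200 for the 'bus and loses; with the wait missing it costs 4120 and wins, so the two sides consume different numbers of random values.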

Edit: Additionally, it seems, saving and loading has a predictable effect on this. Starting the game afresh (with the demo map automatically loaded), then loading 173.sve, the waiting time from Bristol Station to Bristol East will always be unknown. Re-loading 173.sve without quitting the game and starting again will always give the 17:18 waiting time for Bristol East.

Junna

Quote from: jamespetts on February 19, 2012, 11:37:39 AM
Edit: Additionally, it seems, saving and loading has a predictable effect on this. Starting the game afresh (with the demo map automatically loaded), then loading 173.sve, the waiting time from Bristol Station to Bristol East will always be unknown. Re-loading 173.sve without quitting the game and starting again will always give the 17:18 waiting time for Bristol East.

This would explain why it always desyncs the first time you join the online game and why you can manage to stay on when you re-load from there.

jamespetts

I think that I have - eventually - solved this: see the 10.x branch. The problem was with waiting times not loading correctly after being saved.
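Since the culprit turned out to be waiting times not loading correctly after being saved, a save/load round-trip check is one cheap way to catch that class of bug: save, load, and require the loaded data to equal the original. The sketch below assumes a hypothetical flat layout and a record type loosely modelled on waiting_time_set; it is not the actual Simutrans-Experimental fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical record loosely modelled on waiting_time_set.
struct wait_record {
    std::vector<std::uint16_t> times;
    std::uint8_t month = 0;
    bool operator==(const wait_record& other) const {
        return times == other.times && month == other.month;
    }
};
using wait_table = std::map<std::uint16_t, wait_record>;  // keyed by halt id

// Layout: [entry count, then per entry: id, n, t1..tn, month].
std::vector<std::uint16_t> save(const wait_table& table) {
    std::vector<std::uint16_t> out;
    out.push_back(static_cast<std::uint16_t>(table.size()));
    for (const auto& [id, rec] : table) {
        out.push_back(id);
        out.push_back(static_cast<std::uint16_t>(rec.times.size()));
        out.insert(out.end(), rec.times.begin(), rec.times.end());
        out.push_back(rec.month);
    }
    return out;
}

wait_table load(const std::vector<std::uint16_t>& in) {
    wait_table table;
    std::size_t pos = 0;
    const std::uint16_t count = in.at(pos++);
    for (std::uint16_t k = 0; k < count; ++k) {
        const std::uint16_t id = in.at(pos++);
        const std::uint16_t n = in.at(pos++);
        wait_record rec;
        for (std::uint16_t j = 0; j < n; ++j) {
            rec.times.push_back(in.at(pos++));
        }
        rec.month = static_cast<std::uint8_t>(in.at(pos++));
        table[id] = rec;
    }
    return table;
}
```

Running load(save(x)) == x on a real savegame's data, and comparing save() output before and after a reload, would flag a field that is dropped or mangled on the way through.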

ӔO

jamespetts

Yes, you will be able to return to world domination once I have deployed this latest version! ;-)

ӔO

Looking at everyone's finances, network scale and how crowded it is around the cities, I don't think it's possible for any one of the major three to outcompete the others, except perhaps if two of the three cooperate, and even then I doubt it.

We all earn so much interest that we would have to suffer substantial losses for there to be any hope of bankruptcy; by substantial, I mean something along the lines of at least $50 million a year.


Oh, for anyone starting up a new company: there are still plenty of cities and towns that need connecting, and it is possible to make a profit even with only $250,000 in starting money. There are quite a few clusters of cities in the north-east and south-east corners that can easily result in profits. The LMS push-pull sets are more than adequate for this, and the diesel buses are not bad either.

Milko

Hello

Quote from: jamespetts on February 19, 2012, 05:16:01 PM
I think that I have - eventually - solved this: see the 10.x branch. The problem was with waiting times not loading correctly after being saved.

James, the killer bug!  :) Great!

Giuseppe

jamespetts

The server has now been restarted with 10.10, and no longer seems to desync. Thank you everyone for your patience!
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

rsdworker


jamespetts

sdog

What's the passenger level in the server game? (I'm still afraid even to look at it; online play is too addictive for me.)

I notice that even in my game with the old, lower passenger level, there aren't any villages small enough to have a use for the low-capacity trains, like that single-unit DMU (class 101? I can't check right now).

jamespetts

It is currently 12, although it will be reduced to 11 for the next release of the pakset, as well as reducing the journey time tolerance (which will reduce passenger numbers further).

sdog

Oh yes, journey times don't play a role at all any more. In late 2009, I think, you had the best balance. Profitability was on a very narrow edge: local transport had to be very well optimised for low travel times, or else massive refunds could quickly drop a profitable company into unprofitability. (In about 20 tries, I managed to succeed for more than a decade only three times, and for a few decades only once.)


Furthermore, saturation of lines is a much bigger problem than profitability. Car usership didn't change a lot, as population growth over-compensated for it.


(sorry, this is not related to bridgewater brunel, but is related to the topic)