
Log reports division by zero errors

Started by Matthew, May 12, 2023, 06:52:34 AM


Matthew

With logging turned on, you frequently see this message:

ERROR: float32e8_t::operator / (const float32e8_t & x) const: Division by zero in: 5000 / 0
For help with this error or to file a bug report please see the Simutrans forum:
https://forum.simutrans.com

Over the course of 3 hours on the Bridgewater-Brunel (B-B) server, I got the message 15,000 times, an average of roughly 80 per real-life minute.

I believe this has been occurring ever since ceeac kindly added this error message in this commit.

I know that it's a Very Bad Thing if a division by zero reaches the CPU, and that floating-point discrepancies are a common cause of desyncs, but it appears that Simutrans catches the zeros and prevents that from happening (thank you, Bernd Gabriel, for this contribution!). So I don't know whether this is actually a bug to be fixed or a sign that the safety mechanism is working as intended.

If it were an ordinary function, I'd add a parameter so that the error message also reports the calling function. But I believe this error is triggered when an expression uses the overloaded operator/ with our custom floating-point type, and tracing back from an overloaded operator is still beyond my level.
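An overloaded binary operator can't take an extra parameter, but one common C++ workaround is to record the caller's context somewhere the operator can read it. Here is a minimal sketch of that idea, assuming a hypothetical thread-local label stack (`arith_context`, `arith_context_t`) and a stand-in type `toy_float_t` in place of float32e8_t. None of these names exist in Simutrans, and the fallback value returned on a zero divisor is also an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical per-thread stack of labels naming the code that is currently
// doing arithmetic. Nothing like this exists in Simutrans; it is a sketch.
thread_local std::vector<const char*> arith_context;

// RAII guard: pushes a label on construction and pops it on destruction,
// so the stack is restored on every exit path, including early returns.
class arith_context_t {
public:
    explicit arith_context_t(const char* label) { arith_context.push_back(label); }
    ~arith_context_t() {
        assert(!arith_context.empty());
        arith_context.pop_back();
    }
    arith_context_t(const arith_context_t&) = delete;
    arith_context_t& operator=(const arith_context_t&) = delete;
};

// Stand-in for float32e8_t: only the division operator matters here.
struct toy_float_t {
    double value;
};

// Most recent error text, kept so the behaviour can be checked.
thread_local std::string last_division_error;

toy_float_t operator/(toy_float_t a, toy_float_t b) {
    if (b.value == 0.0) {
        // Name the innermost labelled region, if any, in the error message.
        const char* where = arith_context.empty() ? "unknown" : arith_context.back();
        char buf[200];
        std::snprintf(buf, sizeof buf, "Division by zero in: %g / 0 (context: %s)", a.value, where);
        last_division_error = buf;
        std::fprintf(stderr, "ERROR: %s\n", buf);
        return toy_float_t{0.0}; // hypothetical fallback; the real type may differ
    }
    return toy_float_t{a.value / b.value};
}

// Usage: label a region, and any zero division inside it names that region.
std::string demo() {
    arith_context_t ctx("example_convoy_physics"); // hypothetical label
    toy_float_t r = toy_float_t{5000.0} / toy_float_t{0.0};
    (void)r;
    return last_division_error;
}
```

The cost is that someone has to place the labels by hand, so in practice you would only wrap the code you already suspect.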

When I looked into this earlier, I decided to leave the problem until I'd learned more C++, but since it's also puzzling neroden, I thought I'd start a thread with the very little I know.
(Signature being tested) If you enjoy playing Simutrans, then you might also enjoy watching Japan Railway Journal
Available in English and simplified Chinese
If you like playing Simutrans, you might also want to watch Japan Railway Journal (English, also with simplified Chinese subtitles).

neroden

Thanks.  This assures me that these were pre-existing.  Tracing through an overloaded operator is definitely a PITA.
However, float32e8_t is not a widely used type, so we can search the entire codebase (with grep) for places where it's used.

It's used in vehicle.cc for the age of the vehicle; however, only "pow" is used there. "pow" uses exp2 and log2, and log2 uses the division operator. log2 does have a check for a zero mantissa, though I am not sure the check is sufficient. We would want to add detection code in the log2 routine to see whether the errors are coming from here; I don't remember my floating-point roundoff well enough to know how to write that check. This is in any case fairly unlikely.
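A rough sketch of the kind of detection code described here, assuming plain `double` with the standard `std::log2` and `std::exp2` as stand-ins for float32e8_t's own routines (the real log2 works on the mantissa and exponent directly). The hypothetical `checked_log2` counts and reports any non-positive argument along with a caller label, and the hypothetical `pow_via_log2` builds pow as `exp2(y * log2(x))`, the composition described above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Counter for the hypothetical detection code: how often log2 was asked
// for a non-positive argument.
static int log2_bad_argument_count = 0;

// Stand-in for float32e8_t's log2. double is used only to keep the sketch
// short; the detection check is the point, not the arithmetic.
double checked_log2(double x, const char* caller) {
    if (!(x > 0.0)) { // catches 0, negative values and NaN
        ++log2_bad_argument_count;
        std::fprintf(stderr, "log2(%g) requested by %s\n", x, caller);
        return -HUGE_VAL; // hypothetical fallback: -infinity, as std::log2(0) gives
    }
    return std::log2(x);
}

// pow composed from exp2 and log2.
double pow_via_log2(double x, double y, const char* caller) {
    return std::exp2(y * checked_log2(x, caller));
}
```

With a label such as `"vehicle_age"` passed from the call site, the log shows whether the vehicle-age code ever reaches the zero case.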

It's used in simunits.cc, but that doesn't appear to call the division operator at all and never sets anything to zero.  Should be fine.

It's used in player/finance.cc, but this only divides by stated constants which are not zero, so that's OK.

It's used in dataobj/settings.cc, but these values should mostly never be 0, *except* possibly the output of ticks_to_seconds.  Nobody ever divides by the output of ticks_to_seconds, so this should be fine.

It's used in simconvoi.h in several places where values are set to 0.  This is worth looking into.
It's used in simconvoi.cc in several places where values are set to 0.  This is worth looking into.
It's used in gui/convoi_detail_t.cc in several places where values are set to 0.  This is worth looking into.
It's used in descriptor/vehicle_desc.h in several places where values are set to 0.  This is worth looking into.
It's used in descriptor/vehicle_desc.cc in several places where values are set to 0.  This is worth looking into.
It's used in descriptor/reader/vehicle_reader.cc in several places.  Likely to trigger during game loading?
It's used heavily in convoy.h, and again, values are set to zero.
It's used heavily in convoy.cc, and again, values are set to zero.

I would look at the convoy code. It's almost certainly the convoy code. Find the parts of the convoy code (in the eight files above) that use the float type, find all the divisions, and instrument them with a check.
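One way to instrument each division without changing what it computes is a small helper that reports the source location of a zero divisor and then performs the division as before. This is a minimal sketch: `checked_div` and `CHECKED_DIV` are hypothetical names, and the standard `__FILE__`/`__LINE__` macros supply the call site. It assumes the type being divided already tolerates a zero divisor, as float32e8_t's guarded operator/ does, so it should not be used with built-in integers.

```cpp
#include <cassert>
#include <cstdio>

// Number of zero divisors seen so far (hypothetical instrumentation).
static int zero_divisor_reports = 0;

// Reports the call site of a zero divisor, then divides exactly as the
// original expression would. T must handle a zero divisor itself
// (double or a guarded type; not int, where x / 0 is undefined behaviour).
template <typename T>
T checked_div(const T& numerator, const T& denominator, const char* file, int line) {
    if (denominator == T(0)) {
        ++zero_divisor_reports;
        std::fprintf(stderr, "zero divisor at %s:%d\n", file, line);
    }
    return numerator / denominator;
}

// The macro fills in the call site automatically.
#define CHECKED_DIV(a, b) checked_div((a), (b), __FILE__, __LINE__)
```

Each suspect expression such as `a / b` in the convoy code would become `CHECKED_DIV(a, b)`, so the log line points straight at the file and line responsible.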

jamespetts

As Nathanael may recall, the float32e8_t code was written by Bernd Gabriel in about 2012, after we discovered that we could not make floating-point arithmetic synchronise over the network when the client and server run on different architectures (e.g., Windows and Linux respectively). It is used mainly in the physics code, although it may since have been introduced in other places where we need sync-safe floating point. I would strongly suspect that the division by zero errors are coming from the physics code, as this is likely to be where it is used most intensively.
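To illustrate why a software type sidesteps the cross-platform problem, here is a toy mantissa-and-exponent number whose division uses only integer operations, so it yields identical bits on any compiler or CPU. It is a sketch, not float32e8_t: the layout, the truncating rounding, and the choice to return zero after reporting a division by zero are all assumptions of this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Toy sync-safe number (NOT the real float32e8_t):
// value = (negative ? -1 : 1) * mantissa * 2^exponent, integer arithmetic only.
struct soft_float_t {
    uint32_t mantissa = 0; // normalised: top bit set, or 0 for the value zero
    int32_t exponent = 0;
    bool negative = false;

    static soft_float_t from_int(int32_t v) {
        soft_float_t r;
        if (v == 0) return r;
        r.negative = v < 0;
        uint64_t m = r.negative ? uint64_t(-int64_t(v)) : uint64_t(v);
        int32_t e = 0;
        while (m < (uint64_t(1) << 31)) { m <<= 1; --e; } // normalise
        r.mantissa = uint32_t(m);
        r.exponent = e;
        return r;
    }

    // For display and testing only; game logic would never do this.
    double to_double() const {
        double d = std::ldexp(double(mantissa), exponent);
        return negative ? -d : d;
    }
};

static int division_by_zero_count = 0;

soft_float_t operator/(const soft_float_t& a, const soft_float_t& b) {
    if (b.mantissa == 0) {
        // Guard: report and return zero instead of dividing (assumed fallback).
        ++division_by_zero_count;
        std::fprintf(stderr, "ERROR: Division by zero in: %g / 0\n", a.to_double());
        return soft_float_t{};
    }
    if (a.mantissa == 0) return soft_float_t{};
    // Both mantissas lie in [2^31, 2^32), so the quotient lies in [2^31, 2^33).
    uint64_t q = (uint64_t(a.mantissa) << 32) / b.mantissa;
    int32_t e = a.exponent - b.exponent - 32;
    if (q >= (uint64_t(1) << 32)) { q >>= 1; ++e; } // renormalise, truncating
    soft_float_t r;
    r.mantissa = uint32_t(q);
    r.exponent = e;
    r.negative = a.negative != b.negative;
    return r;
}
```

Truncation is chosen here only because it is simple and deterministic; any rounding rule works for sync purposes as long as every client applies the same one.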
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

jamespetts

I have tried testing with a debug build running a saved game from the Bridgewater-Brunel server from last month with a breakpoint on the divide by zero detection code, but this has failed: the breakpoint is never hit.

Matthew

Quote from: jamespetts on May 13, 2023, 12:30:23 PM
I have tried testing with a debug build running a saved game from the Bridgewater-Brunel server from last month with a breakpoint on the divide by zero detection code, but this has failed: the breakpoint is never hit.

I tried to replicate this and discovered that I don't get the division-by-zero error logs in single-player mode, only when connected to a server.

Perhaps they could be caused by one of the calculations that is performed differently in network mode or by the checksum calculations?

If you need a server running a debug version, I could set up Williams-Webster for that.

jamespetts

Quote from: Matthew on May 14, 2023, 08:35:46 AM
I tried to replicate this and discovered that I don't get the division-by-zero error logs in single-player mode, only when connected to a server.

Perhaps they could be caused by one of the calculations that is performed differently in network mode or by the checksum calculations?

If you need a server running a debug version, I could set up Williams-Webster for that.

The only occasion on which I have been able to reproduce this is when Simutrans-Extended has just been started: the division by zero is generated when trying to set meters per tile (somewhere) before the paksets have been fully loaded. If this is the only place where it occurs, this can probably be treated as a spurious report rather than a serious problem. Can anyone confirm whether this can be reproduced under other conditions?

neroden

Ohhhh, if it's strictly on startup there's probably a way to fix it.
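One possible shape for such a fix, as a sketch only: a hypothetical settings class (`unit_settings_t`, not the real Simutrans-Extended code) that treats a meters-per-tile value of 0 as "pakset not loaded yet" and skips computing the values derived from it instead of dividing. The derived quantity here, tiles per kilometre scaled by 1000, is invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: a derived conversion factor that depends on the
// pakset's meters-per-tile value. Before the pakset is loaded that value
// may still be 0, so the derived factor is marked invalid instead of
// being computed by dividing by zero.
class unit_settings_t {
public:
    void set_meters_per_tile(uint32_t meters) {
        meters_per_tile = meters;
        recalc_derived();
    }

    bool derived_values_valid() const { return derived_valid; }

    // Tiles per kilometre, scaled by 1000 to stay in integer arithmetic.
    uint32_t get_tiles_per_km_x1000() const {
        assert(derived_valid && "called before the pakset set meters per tile");
        return tiles_per_km_x1000;
    }

private:
    void recalc_derived() {
        if (meters_per_tile == 0) {
            // Startup case: skip the division and wait for the real value.
            derived_valid = false;
            return;
        }
        tiles_per_km_x1000 = 1000u * 1000u / meters_per_tile;
        derived_valid = true;
    }

    uint32_t meters_per_tile = 0;
    uint32_t tiles_per_km_x1000 = 0;
    bool derived_valid = false;
};
```

The early call with 0 then becomes a silent no-op, and the values are filled in once the pakset supplies the real figure.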