Experimental on Linux

Started by gyom, April 16, 2010, 08:10:57 PM

gyom

Hi,

I tried to run Simutrans Experimental 7.3 on Linux (Ubuntu) but I keep getting a segfault.

gdb output:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000613063 in display_ddd_box_clip(short, short, short, short, unsigned short, unsigned short) ()

I tried the i386 and amd64 versions and tried deleting settings.xml, but nothing works.

I use the build from http://www.43-1.org/~simutrans/simutrans-exp/i386/

Has anyone managed to run STE on Linux?

jamespetts

Gyom,

thank you for your report. Judging by your memory address, you are on a 64-bit system, yes? In that case, you should use the 64-bit version. I have not seen any crashes in that function before. Indeed, that function is not unique to Simutrans-Experimental (that part of the program is exactly the same as in Standard). Have you tried running Standard on Linux?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

sdog

on my ubuntu 64-bit system the 32-bit version runs, unlike the 64-bit version.

neroden

Quote from: gyom on April 16, 2010, 08:10:57 PM
Hi,

I tried to run Simutrans Experimental 7.3 on Linux (Ubuntu) but I keep getting a segfault.

gdb output:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000613063 in display_ddd_box_clip(short, short, short, short, unsigned short, unsigned short) ()

I tried the i386 and amd64 versions and tried deleting settings.xml, but nothing works.

I use the build from http://www.43-1.org/~simutrans/simutrans-exp/i386/

Has anyone managed to run STE on Linux?


I run it on Linux (Debian) all the time -- but I have a 32-bit system.

Looks like we may have a problem with 64-bit cleanliness in the code.  This is going to be tedious to find because there's lots of slightly sloppy integer type usage.  I'm trying to clean that up, but it will take a long time.

If you could give a full backtrace ('bt' in gdb) it might help.

sdog


Starting program: ~/simuexp/simutrans-exp-64-2010-04-11-fd32563
[Thread debugging using libthread_db enabled]
Reading low level config data ...
parse_simuconf() at config/simuconf.tab: Reading simuconf.tab successful!
Preparing display ...
Screen Flags: requested=10, actual=10
Loading font 'font/prop.fnt'
font/prop.fnt sucessfully loaded as old format prop font!
Init done.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000613063 in display_ddd_box_clip(short, short, short, short, unsigned short, unsigned short) ()
(gdb) bt
#0  0x0000000000613063 in display_ddd_box_clip(short, short, short, short, unsigned short, unsigned short) ()
#1  0x000000000048e011 in button_t::zeichnen(koord) ()
#2  0x00000000004d15c7 in gui_container_t::zeichnen(koord) ()
#3  0x00000000004a18b3 in gui_scrollpane_t::zeichnen(koord) ()
#4  0x00000000004d15c7 in gui_container_t::zeichnen(koord) ()
#5  0x00000000004f9be3 in pakselector_t::zeichnen(koord, koord) ()
#6  0x000000000059ad39 in simu_main(int, char**) ()
#7  0x0000000000616664 in main ()


thanks for the howto in:
http://forum.simutrans.com/index.php?topic=4871.msg48080#msg48080

gyom

Thanks, Sdog! After reading your post I downloaded the i386 version again, and indeed it runs fine!
I must have mixed it up with the amd64 one while copying  :-X
My backtrace for the amd64 build is exactly the same as Sdog's.

Thank you all for your answers !

At last I can try the new version !  :)

neroden


#0  0x0000000000613063 in display_ddd_box_clip(short, short, short, short, unsigned short, unsigned short) ()
#1  0x000000000048e011 in button_t::zeichnen(koord) ()
#2  0x00000000004d15c7 in gui_container_t::zeichnen(koord) ()
#3  0x00000000004a18b3 in gui_scrollpane_t::zeichnen(koord) ()
#4  0x00000000004d15c7 in gui_container_t::zeichnen(koord) ()
#5  0x00000000004f9be3 in pakselector_t::zeichnen(koord, koord) ()
#6  0x000000000059ad39 in simu_main(int, char**) ()
#7  0x0000000000616664 in main ()


Hmm, looks like these are compiled without debugging information.  :-(  A build with debugging information would give us *line numbers*.

A build with debug information is not actually much slower; it just makes the files a bit larger.  To whoever does the automatic builds, could you perhaps build with DEBUG=1 to get us debug information in the official builds of experimental?  It seems to need a lot of debugging, so....

I'm guessing there's heavy inlining going on because there's nothing suspicious in the named routine, but there *is* suspicious stuff in the routines it calls: display_fb_internal and display_vl_internal.   The first thing to try is a recompile with -DUSE_C activated, because I bet that x86 assembly language code has embedded assumptions about the size of int.  If that fails, then the USE_C version probably has embedded assumptions too.
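To make the USE_C idea concrete, here is a minimal sketch of how such a switch can select between hand-written assembly and portable code. It is not the actual simgraph16.cc code: the helper name fill_run is hypothetical, and only the USE_C macro comes from this thread (__i386__ is a macro GCC predefines on 32-bit x86). The portable loop is chosen whenever USE_C is defined or the target is not 32-bit x86, so the assembly is never compiled for amd64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: fill `count` 16-bit pixels with one colour.
// Build with -DUSE_C to force the portable path on every architecture.
static void fill_run(uint16_t *dst, uint16_t colour, size_t count)
{
#if defined(USE_C) || !defined(__i386__)
	// Portable path: no assumptions about register or pointer width.
	for (size_t i = 0; i < count; i++) {
		dst[i] = colour;
	}
#else
	// 32-bit x86 only: "rep stosw" stores AX to [EDI], ECX times.
	__asm__ volatile("rep stosw"
		: "+D"(dst), "+c"(count)
		: "a"(colour)
		: "memory");
#endif
}
```

Calling fill_run(buffer, colour, n) behaves the same on both paths; the point of the guard is that a 64-bit build only ever sees the plain loop.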

Standard is probably broken on amd64 too.

ansgar

Quote from: neroden on April 17, 2010, 08:45:04 AM
To whoever does the automatic builds, could you perhaps build with DEBUG=1 to get us debug information in the official builds of experimental?
Done. Debugging information should be included starting with the next build.

jamespetts

Ahh - I had taken debugging information out on the "official" builds because I wanted a clean release build for people to use that is as optimised as possible, without (for example) lots of assert checks slowing things down. Indeed, I specifically use the #ifdef DEBUG check to give additional (and not so user friendly) information in the GUI on occasions. Certainly, the Windows version is compiled as a "release" build in MSVC++.
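As an illustration of the kind of #ifdef DEBUG check described above, here is a minimal sketch. The function debug_label and its parameters are hypothetical and not taken from the Simutrans source; the sketch only shows that a DEBUG macro changes what the program does, which is a separate matter from whether the binary carries symbol information for gdb.

```cpp
#include <cassert>
#include <string>

// Hypothetical GUI helper: in a DEBUG build the label also shows an
// internal id; in a release build the player only sees the name.
inline std::string debug_label(const std::string &name, int internal_id)
{
#ifdef DEBUG
	return name + " (id " + std::to_string(internal_id) + ")";
#else
	(void)internal_id; // unused in release builds
	return name;
#endif
}
```

Compiled with -DDEBUG, debug_label("Loco", 7) yields "Loco (id 7)"; without it, just "Loco".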

What, I think, we could really do with is having nightly builds for Experimental on all platforms: the nightly builds would have the debugging information turned on, and the release builds would have it turned off.

As to the original poster's query - if I recall correctly, I think that there are 64-bit Linux builds of Standard. Perhaps some testing could be done of the Standard version to check whether this is a problem there, too? If it is, perhaps this topic needs to be moved out of the "Experimental" section of the board.

neroden

Quote from: jamespetts on April 17, 2010, 12:26:18 PM
Ahh - I had taken debugging information out on the "official" builds because I wanted a clean release build for people to use that is as optimised as possible, without (for example) lots of assert checks slowing things down. Indeed, I specifically use the #ifdef DEBUG check to give additional (and not so user friendly) information in the GUI on occasions. Certainly, the Windows version is compiled as a "release" build in MSVC++.

What, I think, we could really do with is having nightly builds for Experimental on all platforms: the nightly builds would have the debugging information turned on, and the release builds would have it turned off.

As to the original poster's query - if I recall correctly, I think that there are 64-bit Linux builds of Standard. Perhaps some testing could be done of the Standard version to check whether this is a problem there, too? If it is, perhaps this topic needs to be moved out of the "Experimental" section of the board.

Agreed.  This would be very helpful.  Could one of the people with AMD64, where experimental is failing, try the "Standard" 64-bit build, so we know whether the problem is here or there?

sdog

as far as i remember it was the same issue in standard.
i'm just waiting for the nightly page to come online again, then i'll try.

sdog

Quote
Could one of the people with AMD64, where experimental is failing, try the "Standard" 64-bit build, so we know whether the problem is here or there?

sim-linux64_2010-04-27_v102.3_r3185

runs

jamespetts

SDog,

thank you very much for the test there. I don't currently have a Linux system on which to test Simutrans-Experimental, so this might be very hard for me to track down. Is anyone with a 64-bit Linux platform and the ability to compile from source able to assist here? If somebody could compile a build with all the debugging information turned on, it would be useful to have a full backtrace.

The difficult thing about this bug is that it appears to occur in code untouched by Experimental (and thus code with which I am entirely unfamiliar). Can anyone with 64-bit Linux confirm whether the 32-bit version runs satisfactorily? Simutrans does not, to my knowledge, benefit in any way from being compiled in 64-bit: indeed, there are no 64-bit Windows builds for this reason, but it has been suggested that a 64-bit build can be more stable on 64-bit Linux than a 32-bit build.

Finally, can anyone confirm whether this bug still occurs with the latest devel branch? If it works in the most recent versions of Simutrans-Standard, it is possible that some recent change to the code in Standard helped to fix the problem that was present earlier - I do recall that there have been some changes to the code recently that deal with platform/architecture issues.

sdog

i suppose it's not very urgent, but it would still be good to track down the problem. whether i can help depends mostly on the makefile. can you point me to the github url? (just found it, overlooked it two times.) give me also some time to get used to git again.

jamespetts

Sdog,

thank you very much for volunteering to help - much appreciated  :-)

steffen

Hi,
"me too" lol. I'm running Gentoo amd64 and the 64bit version of STE segfaults whilst the 32bit version runs fine (ok so far I'm only at the start screen). Do you still need backtraces for this?

I have a little request too - any chance the names of the downloads could be made more informative? Ideal would be a name like "simutrans-experimental-amd64-2010-05-02" (same for the experimental Pak btw, the download filename is without version number). That way it would be much easier to track which version people used when encountering problems.

Here's the MD5 for the version that segfaults:
fd38cb53b873621aeba0c3dd2866ee22  simutrans-exp-latest

And thanks very much for this amazing game :)

jamespetts

Steffen,

glad that you enjoy Simutrans-Experimental! Ansgar is in charge of the Linux builds, so a request to change the names should be directed to him. As to the crashes - a backtrace would indeed be useful, as it would be good to get a 64-bit version running in case people have problems running the 32-bit version (although note that there is no performance advantage to the 64-bit version as I understand it: if Simutrans is using anywhere near 4 GB of memory, something is wrong in any case).

I should also be interested in your experiences of the stability of the 32-bit version on 64-bit Linux, to see whether a 64-bit version really is needed at all.

neroden

Quote from: steffen on May 05, 2010, 03:56:10 PM
I have a little request too - any chance the names of the downloads could be made more informative? Ideal would be a name like "simutrans-experimental-amd64-2010-05-02" (same for the experimental Pak btw, the download filename is without version number). That way it would be much easier to track which version people used when encountering problems.
Actually, if you poke around a little, you'll find that the files are already available with dates.  simutrans-experimental-latest is a link to the dated one.

Perhaps we should simply point people to

http://www.43-1.org/~simutrans/simutrans-exp/i386/
http://www.43-1.org/~simutrans/simutrans-exp/amd64/

And ask them to get the files with the latest dates?

ansgar

Starting with the next build (simutrans|makeobj)-exp-latest should redirect to the files including a date.  This way the browser should save the files that way as well.

steffen

Quote from: ansgar on May 07, 2010, 03:09:54 PM
Starting with the next build (simutrans|makeobj)-exp-latest should redirect to the files including a date.  This way the browser should save the files that way as well.
Brilliant, thanks :)

Quote from: jamespetts on May 05, 2010, 04:10:36 PM
[snip] As to the crashes - a backtrace would indeed be useful, as it would be good to get a 64-bit version running in case people have problems running the 32-bit version (although note that there is no performance advantage to the 64-bit version as I understand it: if Simutrans is using anywhere near 4 GB of memory, something is wrong in any case).

I should also be interested in your experiences of the stability of the 32-bit version on 64-bit Linux, to see whether a 64-bit version really is needed at all.
Well, so far the 32-bit version is running fine for me. I did have reproducible segfaults when making a map with a large number of cities with large numbers of inhabitants, but I want to a) use the git version and b) track it down a bit more precisely before I report that.

As for performance, I agree on the 4 GB issue, but doesn't x64 also have substantially more registers (and bigger ones at that) whilst disposing of at least a little bit of the ancient cruft that we have carried around since the 8086? I'm no expert with C/C++, but I thought the compiler takes care of this more or less automatically, so my gut feeling would be that making Simutrans 64-bit safe would be good. Also, there are some 64-bit arches that do not have hardware support for running 32-bit code. And whilst 4 GB for a single program may seem obscene now, I think in a few years we'll look at the issue differently. So my vote, realising that I don't get one, goes to continuing the effort for a 64-bit version :)
Stand by for the backtrace, this game is just so addictive..

jamespetts

Steffen,

ahh, yes, it would be good to make it 64-bit compatible for portability, if nothing else. It's not my main priority at present (and it's hard for me to test, as I only have a 32-bit machine), but if you (or anyone else) can find the problem and propose a sensible solution, I'll happily fix it :-)

neroden

We definitely want simutrans to be 64-bit clean.  The thing is, I have no idea what part isn't 64-bit-clean.  Nothing is jumping out at me, and I don't have a 64-bit machine to test on.  There are all kinds of sloppy integer conversions throughout the code, and sometime I'm going to go through and clean up as many as I possibly can, but that's going to take a long time, so if we can find the actual cause of the problem, that would be best.
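As an illustration of what "sloppy integer conversions" can mean in practice, here is a minimal sketch. The helper names truncate_sloppy and fits_in_u32 are hypothetical and not from the Simutrans source. The general point is that fixed-width types keep the same size on i386 and amd64, while an unchecked narrowing conversion silently drops the upper bits.

```cpp
#include <cassert>
#include <cstdint>

// Fixed-width types have the same size on i386 and amd64.
static_assert(sizeof(int32_t) == 4, "int32_t is always 4 bytes");
static_assert(sizeof(uint64_t) == 8, "uint64_t is always 8 bytes");

// Hypothetical example of a sloppy conversion: the upper 32 bits of
// the value are silently discarded.
inline uint32_t truncate_sloppy(uint64_t value)
{
	return static_cast<uint32_t>(value);
}

// Checked variant: reports whether the value survives the narrowing.
inline bool fits_in_u32(uint64_t value)
{
	return value <= UINT32_MAX;
}
```

A value such as 0x100000001 comes back from truncate_sloppy as 1, which is exactly the kind of change that goes unnoticed until it is used as a size or an index.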

steffen

Ok, it seems the self-compiled version runs just fine. I confirmed with file that it's a 64-bit version.
Can you confirm that the 2010-04-11-fd32563 build is the same as the latest git from http://github.com/jamespetts/simutrans-experimental (last date of change is April 10, and the commit ID conspicuously starts with fd32563)? I'm really confused. Does anyone have any ideas?

jamespetts

Are you compiling from the -devel branch or the -master branch?

steffen

Good question. I never told git specifically, so I would assume it's the master branch. Here's the command I used: git clone http://github.com/jamespetts/simutrans-experimental.git
Which branch is used to create the automatic builds?

jamespetts

The Master branch is used to create the automatic builds - if you are building the master branch, you will get Simutrans-Experimental 7.3.

steffen

Then I'm really confused: how can it be that the autobuild segfaults but my own build works? What exact settings (and GCC and library versions) are used to make the autobuild? I could try rebuilding with those to reduce the number of possible causes.

jamespetts

This is indeed odd. Perhaps Ansgar can assist...?

ansgar

The autobuild is done in Debian Lenny which has these libraries:

libsdl1.2-dev 1.2.13-2
libsdl-mixer1.2-dev 1.2.8-4
zlib1g-dev 1:1.2.3.3.dfsg-12
libpng12-dev 1.2.27-2+lenny2
libbz2-dev 1.0.5-1
gcc 4:4.3.2-2

And this config.simutrans-exp.

sdog

i tried to compile devel on a x86_64

pulled it that way:
$ git pull <url> devel

$ gcc --version
gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3


i'm using a config.default similar to ansgar's (thanks, your post helped me a lot):
same flags, no ccache.

it failed at dataobj/einstellungen.o
dataobj/einstellungen.cc:979: error: cast from 'const char*' to 'uint32' loses precision

changed that to uint64, that compiles
next error at

===> CXX simgraph16.cc
simgraph16.cc: In function 'void rezoom_img(image_id)':
simgraph16.cc:1290: warning: comparison between signed and unsigned integer expressions
simgraph16.cc: Assembler messages:
simgraph16.cc:2370: Error: suffix or operands invalid for `jmp'
make: *** [simgraph16.o] Error 1


(continued)
that's the same point where main branch didn't compile for me last time i tried.
it looks quite like assembler to me. i already have quite some problems reading c++, but here i'm completely lost.

google is my friend, someone else found that too:
http://bugs.gentoo.org/show_bug.cgi?id=291285
didn't really help me though...

seems like prissi can:
http://archive.forum.simutrans.com/topic/06875.0/index.html

and i overlooked it in ansgar's config.default:

ifneq ($(shell dpkg-architecture -qDEB_BUILD_ARCH),i386)
 FLAGS += -DUSE_C
endif

i read it as "if equal" and ignored it.
so now, compiling with FLAGS += -DUSE_C

which works

minor interruption, to get libbz2-dev
linked without error, testing now

it runs
no errors

that's enough for me today, it's late and i'll go chopping* some heads off in mount and blade,
please let me know what further testing of the binary you need.


here's the binary and the log file for download:
http://dl.dropbox.com/u/1876190/simexp-64-8.0-devel
http://dl.dropbox.com/u/1876190/simexp-64-8.0-devel.log


*i should rather use the passive here: getting my head chopped off. still relaxing.


update
couldn't resist trying a bit and caused this:

Program received signal SIGSEGV, Segmentation fault.
0x000000000059e72c in vector_tpl<koord>::remove_at (this=0x7fffffffb910,
   pos=0, preserve_order=false)
   at boden/wege/../../dataobj/../tpl/vector_tpl.h:213
213 data[pos] = data[count-1];


error is reproducible, just loaded my savegame (the one i uploaded previously), and waited.
gdb output:
http://dl.dropbox.com/u/1876190/gdb.txt

another update
most of the problems here have been discussed in detail in this thread, only a few days ago:
http://forum.simutrans.com/index.php?topic=5066.msg49722#msg49722

sdog

Quote from: steffen on May 09, 2010, 10:44:43 PM
Then I'm really confused: how can it be that the autobuild segfaults but my own build works? What exact settings (and GCC and library versions) are used to make the autobuild? I could try rebuilding with those to reduce the number of possible causes.
perhaps ccache causes the problem?

this bug, "ccache sometimes returns 32bit objects to a 64bit build", is rather old, but maybe it is still worth checking ccache?
http://bugs.gentoo.org/show_bug.cgi?id=196243



steffen

Good idea but I don't have ccache installed so that can't be it :(
I'm not sure if you understood me right: my own build appears to work fine (it loads with the experimental Britain pak, it loads my save game originally made with the i386 autobuild, and I tried running it for a couple of minutes), but the autobuild segfaults before it even asks me which pak I want to use.
So we can exclude my run environment, the actual code and the pak as sources of the problem with the autobuild. (I refer to the problem that it doesn't even launch, not the problem in vector_tpl.h.)

Now off the top of my head I can think of these remaining suspects:
- lack of DEBUG in the 04-11 autobuild causes the problem
- some other difference between sdog's and my config vs. the autobuild config causes it
- the versions of GCC/libraries used in the autobuild cause it
- the versions of GCC/libs used to compile the autobuild are incompatible with the versions installed on sdog's and my systems.

Not knowing much about C, the last two are my favourites right now, so I'm emerging the old libraries.

sdog: I think it might be worth opening a new thread for the segfault you discovered while running your own build. I think it's safe to assume that it is a different fault than whatever is causing the autobuild to segfault on startup.

neroden

Quote from: steffen on May 11, 2010, 05:51:22 AM
Good idea but I don't have ccache installed so that can't be it :(
No, no, the autobuild might have ccache installed, and that may be causing breakage there.

steffen

Quote from: neroden on May 11, 2010, 06:20:45 AM
No, no, the autobuild might have ccache installed, and that may be causing breakage there.
Ah ok, yes that could be a cause. But the bug linked appears to be specific to Gentoo's handling of having two-in-one platforms, and was marked fixed (by update to an eclass, which are again Gentoo-specific) more than a year ago. But I guess until proven otherwise we should consider that an option.

Ansgar: Is ccache active on the autobuild system?

neroden

Quote from: sdog on May 10, 2010, 04:36:17 AM
it failed at dataobj/einstellungen.o
dataobj/einstellungen.cc:979: error: cast from 'const char*' to 'uint32' loses precision

changed that to uint64, that compiles
Is this bug still happening with -DUSE_C?  If so please rerun it with the latest devel and give me a new line number, since the old one seems a bit out of date.  This is the sort of thing I should be able to solve.
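For reference, the compiler error quoted above is the usual symptom of storing a pointer in a 32-bit integer on a platform with 64-bit pointers. Without knowing what line 979 of einstellungen.cc actually does, the following is only a sketch of the portable way to hold a pointer value in an integer: the standard type uintptr_t rather than a fixed uint32 or uint64. The helper names pointer_to_integer and integer_to_pointer are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// uintptr_t is defined to be wide enough to hold a pointer value,
// whether pointers are 4 bytes (i386) or 8 bytes (amd64).
static_assert(sizeof(uintptr_t) >= sizeof(const char *),
	"uintptr_t must be able to hold a pointer");

// Hypothetical helpers: round-trip a pointer through an integer
// without losing any bits.
inline uintptr_t pointer_to_integer(const char *p)
{
	return reinterpret_cast<uintptr_t>(p);
}

inline const char *integer_to_pointer(uintptr_t v)
{
	return reinterpret_cast<const char *>(v);
}
```

Changing the cast to uint64 makes the error go away on amd64, but uintptr_t expresses the intent on every architecture. If the original code only wanted a number parsed from the string rather than the address itself, the real fix would be different.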

Quote
update
couldn't resist trying a bit and caused this:
Code:

Program received signal SIGSEGV, Segmentation fault.
0x000000000059e72c in vector_tpl<koord>::remove_at (this=0x7fffffffb910,
    pos=0, preserve_order=false)
    at boden/wege/../../dataobj/../tpl/vector_tpl.h:213
213 data[pos] = data[count-1];


error is reproducible, just loaded my savegame (the one i uploaded previously), and waited.
Oh hell, I found a logic bug in vector_tpl in experimental -- nasty off-by-one error.  Don't roll your own container classes, folks!

Patch is on the jp-devel branch of my git repo (git://github.com/neroden/simutrans)

If that doesn't fix it, or if the bug is in standard (which doesn't contain the off-by-one error) then I need two pieces of information from gdb: the output of 'print count' and 'bt'.
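To show why the value of count matters at the crashing line (data[pos] = data[count-1]), here is a minimal sketch of a bounds-checked remove_at. It is a simplified, hypothetical stand-in and not the Simutrans vector_tpl or the actual patch: the class name small_vector, the fixed capacity and the bool return value are all assumptions made for illustration. If count were 0 or pos were out of range, an unchecked version would read or write outside the array.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified container; NOT the Simutrans vector_tpl.
template <typename T, size_t Capacity>
class small_vector
{
	T data[Capacity];
	size_t count = 0;

public:
	// Appends v; returns false when the vector is already full.
	bool append(const T &v)
	{
		if (count == Capacity) {
			return false;
		}
		data[count++] = v;
		return true;
	}

	size_t get_count() const { return count; }

	const T &operator[](size_t i) const
	{
		assert(i < count);
		return data[i];
	}

	// Removes the element at pos. Returns false, without touching
	// memory, when pos is out of range (this covers an empty vector).
	bool remove_at(size_t pos, bool preserve_order = true)
	{
		if (pos >= count) {
			return false;
		}
		if (preserve_order) {
			// Shift the tail down by one slot.
			for (size_t i = pos; i + 1 < count; i++) {
				data[i] = data[i + 1];
			}
		} else {
			// Cheap removal: move the last element into the gap.
			data[pos] = data[count - 1];
		}
		count--;
		return true;
	}
};
```

With this guard, remove_at(0, false) on an empty vector simply returns false, whereas the unguarded statement would index data[count-1] with count equal to 0.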