Topic: Unicode paths

Offline Ters

  • Coder/patcher
  • Devotee
  • *
  • Posts: 4806
  • Total likes: 191
  • Helpful: 108
  • Languages: EN, NO
Unicode paths
« on: April 02, 2017, 12:40:35 PM »
I've posted an earlier version of this patch here, but since that discussion is about something that has been completed, I am starting a new topic for it.

The purpose of this patch is to convert the internal UTF-8 strings Simutrans uses throughout to UTF-16 when compiling for Windows, and then call the wide Windows API functions, or similar Windows-specific extensions of the C and gzip APIs. There are some hacks in place already, but they do things very oddly and only in a few places.

I've been running Simutrans with these changes for a year, and I've never noticed problems. However, I have only done sporadic testing of running Simutrans in a directory with a non-ASCII name. There might also be recent problems due to merging in other changes. And the fact that Simutrans reads Unicode paths correctly does not mean that it can display the names correctly if the font does not contain the glyphs.

Although prissi was against it, the implementation is still located in simio.cc and not simsys.c, because some of the modified code is shared between simutrans and makeobj, and dragging simsys into makeobj just causes problems.

The patch is based on revision 8169.

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 8795
  • Total likes: 319
  • Helpful: 229
  • Languages: De,EN,JP
Re: Unicode paths
« Reply #1 on: April 02, 2017, 02:50:42 PM »
Simutrans works well with Unicode paths. I tested it with Japanese (for instance with a Japanese user name). It works, and Japanese file names work too: they are saved correctly and load fine. However, at least my version of bzip2 cannot open a file with UTF-16 and needs to use the short name anyway. And Linux and Mac use UTF-8, as does Simutrans internally for all display actions. So where is the advantage?

Using Windows-specific extensions in the non-OS-dependent part does not sound like a great idea to me if it does not fix a real problem.

The display problem will not be solved by UTF-16, since the standard font does not have the needed characters. Internally everything is UTF-8 already (which would allow even for more characters than UTF-16). Changing to freefont lib will solve the display problem; then you will see the correct name even in another language.

Offline Ters

Re: Unicode paths
« Reply #2 on: April 02, 2017, 06:08:51 PM »
Well, there is some strange code involving short path names and creation of non-existent files. My code just does things straight. And bzip2 does not open any files in Simutrans; it just operates on files already opened by fopen. From what I can tell, it has no idea what the file name is.

Yes, Linux and Mac use UTF-8. I said this was for Windows only; however, all path handling goes through the new functions to keep the platform conditional compilation contained in one place (plus simsys_*.cc). Windows uses either "ANSI" or UTF-16 in its APIs. Since Simutrans is all UTF-8 internally, one must convert to one or the other. "ANSI" is deprecated. Most new parts of the API only get a Unicode implementation.
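For anyone wondering what that conversion involves, here is a hand-written sketch in portable C++. It is illustrative only: the function name `utf8_to_utf16_sketch` is made up, it assumes well-formed UTF-8 and does no validation, and on Windows one would normally let `MultiByteToWideChar` do this instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Illustrative only: decode well-formed UTF-8 and emit UTF-16 code units.
// Malformed input is not detected; real code must validate.
std::u16string utf8_to_utf16_sketch(const std::string &in)
{
	std::u16string out;
	std::size_t i = 0;
	while (i < in.size()) {
		const unsigned char c = static_cast<unsigned char>(in[i]);
		std::uint32_t cp;
		std::size_t extra;
		// The lead byte tells how many continuation bytes follow.
		if (c < 0x80)      { cp = c;        extra = 0; }
		else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }
		else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }
		else               { cp = c & 0x07; extra = 3; }
		++i;
		// Each continuation byte contributes six more bits.
		for (std::size_t k = 0; k < extra && i < in.size(); ++k, ++i) {
			cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
		}
		if (cp < 0x10000) {
			out.push_back(static_cast<char16_t>(cp));
		}
		else {
			// Code points above U+FFFF become a surrogate pair.
			cp -= 0x10000;
			out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
			out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
		}
	}
	return out;
}
```

A Japanese character such as U+65E5 takes three bytes in UTF-8 and one code unit in UTF-16, while a code point above U+FFFF takes four bytes in UTF-8 and two code units in UTF-16.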

Offline prissi

Re: Unicode paths
« Reply #3 on: April 03, 2017, 03:32:00 AM »
gzopen did not read Unicode filenames when I tested it last time. It choked on real Japanese characters. Maybe I need to test it again. (It is rarely used nowadays, only for network games, since saving defaults to bzip2. Are you sure it does the correct thing with UTF-8 characters?)

Why the different code in win32_sound.cc?

And why change searchfolder? That has worked well for a long time.

Offline Ters

Re: Unicode paths
« Reply #4 on: April 03, 2017, 06:12:13 AM »
Quote from prissi: "gzopen did not read Unicode filenames when I tested it last time."

Exactly! (That is, on Windows.) That is what this patch is all about. It uses gzopen_w instead.

Quote from prissi: "Why the different code in win32_sound.cc?"

The only difference is that I didn't bother going through dr_fopen since this file is platform specific anyway. However, I did use dr_fopen in simsys_w.cc, so it's not a big deal.

Quote from prissi: "And why changing the searchfolder? That worked well for a long time?"

All I did there was change the ifdefs to check for WIN32 and not MSC_VER. There is no reason why your Unicode improvements from 2015 should be for MSVC only.
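As a rough illustration of the difference between the two checks (not code from the patch): the sketch below uses the compiler-predefined macros `_WIN32` and `_MSC_VER`, which I assume are what the names above refer to, and both function names are hypothetical. `_WIN32` is defined for any compiler targeting Windows, including MinGW, whereas `_MSC_VER` is defined only by MSVC.

```cpp
#include <string>

// Hypothetical helper: reports which path-handling branch was compiled in.
std::string path_api_in_use()
{
#if defined(_WIN32)
	// Taken by both MSVC and MinGW builds that target Windows.
	return "wide Windows API (UTF-16)";
#else
	return "POSIX API (UTF-8)";
#endif
}

// Hypothetical helper: reports the compiler, to show the narrower check.
std::string compiler_in_use()
{
#if defined(_MSC_VER)
	return "MSVC";
#elif defined(__MINGW32__)
	return "MinGW";
#else
	return "other";
#endif
}
```

Guarding the Unicode code with the compiler check would leave a MinGW build on the non-Unicode branch even though it targets the same operating system.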

Offline prissi

Re: Unicode paths
« Reply #5 on: April 03, 2017, 08:26:07 AM »
Oh, searchfolder indeed works only for MSVC; the MinGW builds just display garbage names. That is another very longstanding bug in the 102.2.2 release. It must have been there for a long time. The Japanese community should have complained!

Offline prissi

Re: Unicode paths
« Reply #6 on: October 17, 2017, 02:01:32 AM »
Since the DSG patch was submitted, this is also solved. Sorry, I lost it off the radar. Anyway, both patches were quite similar in their main function, i.e. providing dr_... functions for all file operations needed.

Offline Ters

Re: Unicode paths
« Reply #7 on: October 17, 2017, 05:18:09 AM »
I'll be in for a fun merge job next time I fetch the latest code. Oh, well.

Offline prissi

Re: Unicode paths
« Reply #8 on: October 17, 2017, 05:50:27 AM »
Sorry, but I think most differences were rather in simsys.

Offline Ters

Re: Unicode paths
« Reply #9 on: October 17, 2017, 04:06:34 PM »
That might be where I potentially have other changes. They may have been reverted earlier, I don't remember. Otherwise, I could have just reverted everything first.