Substitutions in makeobj

Leartin · December 18, 2020, 12:11:14 PM

EDIT: Since I'm still actively working on this, information in this first post is outdated.

A while ago I suggested placeholders for makeobj and was told to write a patch. I did. And it even works for me

But I have no idea what I'm doing, so it might be a terrible hackish mess that could blow up the computer at any time

WHAT IT DOES:
It allows to define several lines of text which you want to be in several different objects in the same dat file.
you write "substart=x" to start a substitution block corresponding to the character 'x'. (there can be more characters in the line, but only the first one matters)
Then, you write the lines you want to use repeatedly in the file.
Finally, you write "sub-end" to end the substitution block

If at any later time in the file, you write the line "%x" (again, more characters in the line are possible, but don't matter), it will be replaced by the lines you wrote in the substitution block.
Here is an example. You can define the characteristics of one real life vehicle in a substitution block. Then create several objects using that vehicle as a basis, but for different goods.

Code Select


substart=x
obj=vehicle
waytype=road
[...] //more lines like weight, power, speed, length,...
sub-end

name=coalvehicle
%x
emptyimage[s,w,e,n,ne,sw,nw,se]=./image/coalvehicle.0.<$0>
freight=coal
Freightimage[s,w,e,n,ne,sw,nw,se]=./image/coalvehicle.1.<$0>
---
name=mailvehicle
%x
emptyimage[s,w,e,n,ne,sw,nw,se]=./image/mailvehicle.0.<$0>
freight=post
Freightimage[s,w,e,n,ne,sw,nw,se]=./image/mailvehicle.1.<$0>
---
name=coolvehicle
%x
emptyimage[s,w,e,n,ne,sw,nw,se]=./image/coolvehicle.0.<$0>
freight=icecream
Freightimage[s,w,e,n,ne,sw,nw,se]=./image/coolvehiclef.1.<$0>

The words "substart" and "sub-end" are just what I picked for testing, this could be any string, and could also be the same string (eg. both are "substitution" or "placeholder")
There is no set limit for how many placeholder can be used.

CODE CHANGES
The changes are restricted to the tabfile. The tabfile class got a map added in which lines from the substitution block can be stored, which is cleared when the current file is closed.
In tabfile_t::read, after the line is read, it's stored in the map if we are in a substitution block. (unless it's the end of the substitution block, which isn't stored)
Otherwise, we first check if the line starts with '%' - if so, it's a substitution marker, and all lines stored for that marker are fed to the previously existing code, rather then the line we just got. If the line does not start with '%', it's just fed to the old code as usual.
Finally, after the old code seperated a line in parameter and value, we check if the parameter equals the string that starts a substitution block. In that case, we remember that we are in a substitution block and skip all the code about shorthands and putting the key-value-pair in the object.

What I would have liked to do, but couldn't
I would have preferred if the substitution markers were strings, but I couldn't figure that out. Since strings are harder to compare, I guess I'd need a hash table or something. I prefer chars over indizes, since they can convey more meaning.
I would also have liked to run the lines of the substitution block through the shorthand-stuff once and only stored the key-value pair. It's probably the same issue as with the strings above, I wouldn't know how to store them.

Things that work/don't break the code
substitution blocks are completely independent of objects. They don't need to be seperated via '-' and can even be within an object. That shouldn't ever be done, the idea is to have all substitutions at the top of the file, but it works.
If a substitution block was not defined, but a marker is placed in an object, it's simply ignored. Well, technically, all it's lines are fed to the next code part, which are 0. A warning could be added in this case.
As an inherent part of makeobj, the order of lines matters, since only one of two lines with the same parameter will actually be stored in the object. This means that depending on where the marker is in the object definition, either the object-specific values have priority or the values in the substitution block. Basically, one can use this to create custom default-values for objects.
if the end of a substitution block is marked even though it never began, that will be ignored with a warning.

Things that don't work and can cause issues
If a substitution block were to start within a substitution block, the code would see it only when placing it at a markers location, but would then abort that process in favor of reading following lines as belonging to that new substitution block. This could be prevented, but I'm not sure that would be better (whatever the creator intended wouldn't work, perhaps better to mess up big then to mess up unnoticable)
If someone chose '#' as their character, they presumably copy that substitution block after every regular line, since I chose it to represent regular lines. Of course it can be changed to any other character, I just didn't want to use a special one like '\n' in case that causes other issues.

What I'd like to know
On the one hand, it would be nice if people could play around with this and see if it works for them. Should add a makeobj, but I did this on Linux so if someone could provide a windows-makeobj with this patch for testing would be great.
On the other hand, people who know C++ could maybe check if what I did has any value and can be salvaged to an actually worthwhile patch.

prissi · December 18, 2020, 01:25:23 PM

I am not so sold on %, since this is used a commends in some dat files I saw, %s could even occur in the translation tab files. Looking for

Code Select


<replace name="a1">
text to be inserted
</replace>
...
<insert name=a1>

would be also quite straight forward and more human friedly.

Apart from these technicalities, can you elaborate more on a real usage case? Because your example on top seems the classic case for different freight images (or liveries in extended) instead of filling the depot with similar vehicles. (And it could be easily done with copy and paste or using a preprocessor. )[/code]

Leartin · December 18, 2020, 04:27:52 PM

Quote from: prissi on December 18, 2020, 01:25:23 PM
Apart from these technicalities, can you elaborate more on a real usage case? Because your example on top seems the classic case for different freight images (or liveries in extended) instead of filling the depot with similar vehicles. (And it could be easily done with copy and paste or using a preprocessor. )

I'd like to see you try and create a vehicle that can transport coal, post and icecream, three different good categories. In p192c, there are 16 Opel Blitz vehicles which share 11 lines. They are 8 pairs (front and hanger) for 8 different good categories (some of them special goods). Clearly, they SHOULD all share the same introduction date, length, weight,... and just differ in freight, payload etc. Of course it's a workaround - if Simutrans allowed for one vehicle to be refitted for different goods categories, there would only be 2 objects and a lot less copypaste.

The liveries you mention for Extended don't exist in Standard, so people often create duplicated of the same object with the same stats and just different images. Mostly as addons, as they indeed spam the depot. Still, since Simutrans Standard has no liveries, using this method to create a lot of very similar vehicles is a usecase, and if people want to fill their depots with them, why not?

Another is constraints. When you have a group of vehicles that belongs together, just have them all in one file and write the constraints only once. If you add another vehicle to the group, you only need to add it once in the constraints instead of once per vehicle already in the group. That could be especially useful for semitrailer trucks. Of course, a better way to solve that issue would be to change the game code and allow constraint groups, but that doesn't exist.

Another is freight definitions. There are 13 goods in the piece good category in p192c (after half was split off to container goods). If different graphics are used for each of these goods, then each good needs to be assigned a number in that vehicle. The images can be defined via shorthand, so that's just one line, but the number can't. So that's 13 repeated lines for each of those vehicles that one could save.

Another is buildings. They are not functional, but exist for visual variety. As such, almost every building in p192c exists in several colorshifted variants. Identical values except for name, graphic, and chance. Ideally, the game would allow for variants to just be alternative graphics within the same game object (similar to liveries), but since that doesn't exist, one needs several objects for the same thing.

Increased readability. Certainly, you can't know when looking at an object what exactly the substitution marker stands for. But chances are you don't need to, precisely because they are the same for many/all objects in the file. If it's the file for narrowgauge coal vehicles, then you already know from context that they are all narrowgauge vehicles for buik good with the bulk definitions. But the values you can actually see defined in the object are the ones that make it unique. You remove a whole lot of noise, like with the shorthand dat, to make it easier to see the important parts.

And yes - naturally, it's just a search and replace function which could be done with a preprocessor or copy-paste in a text editor. But Simutrans has no preprocessor, and would it help the community if pakset repositories started to have non-standard .dat files because they just happen to use some third party tool to preprocess them for makeobj? Imagine: You found Simutrans, you want to help, you download makeobj and try to pak a file from your favorite pakset - but it fails since you need a preprocessor you find somewhere else, might not be available for your platform etc. pp.
I'm very much against deviating from regular .dat files as the "source files". I think they should be the ones humans edit when they want to change something, not some other files behind the scene that generate the .dat files. 'cause if .dat files were generated and not intended to be manually altered, they shouldn't be readable to begin with.

If that's not yet clear enough: Think of translation files. They are humanly readable, so naturally people would just edit them. Now, most paksets would tell them not to do that and rather use the translator, so there is some unity in that. But if one pakset saw .dat as editable sourcefiles and another as something you shouldn't ever touch,it just becomes chaotic.

As for the syntax you suggest - sure, I can try to adapt to that. It makes some sense given that <> is also what's replaced in shorthand dats. But I hope someone could look at what I did so far to tell me if it's an acceptable approach.

prissi · December 19, 2020, 12:19:25 PM

Since I did not touch the tabfile ever (this predates my involvement), I am also not fluent with this part of code.

If you work with more than single letter (as indicated above), you will anyway arrive at the use of hashtables, so I would go for a hashtable.

Leartin · December 20, 2020, 03:00:01 PM

Alright, so I use hashtables now to work with words.

The syntax looks like this now:

Code Select


<substitution>=a1
text to be inserted
</substitution>
...
<insert>=a1

Two changes in comparison to prissis suggestion:
1) I reverted "replace" with "substitution". I'd like to have a noun here, since it refers to an 'object' - the following lines - rather than an 'action' (like "insert"). Not that there are any plans to include that, but "replace", to me, evokes the feeling that whatever follows should be replaced.

Code Select


<substitution>=a1
text to be inserted
</substitution>
<replace>=a1
sometext
</replace>
...
sometext

In this case, "sometext" would be replaced with the substitution a1, "some text to be inserted". Well, at least that's how it feels to me and why I chose it. Again: Not that such a 'replace' would come from me or should be done.

2) I changed the position of '>'. I see no technical purpose to it, such that all it did would be requiring an extra step to delete it, or it would just be part of the name ( it becomes a1> in both cases). I think making it part of the parameter is cleaner and more in line with .dat files. - and I already have a case where it matters.

It functions mostly the same, just with strings instead of chars.
In the code, I refer to them as directives to double down on the preprocessor comparison.

I added a bunch of warnings for illegal things (like anything starting with '<' inside the substitution block)
New functionality: '<forget>' clears the table. '<forget>=a1' removes a1 from the table. That was easy to do in comparison, and I can thing of some cases where I'd use it at least (but rather specific).

What I still need to do: I think I should replace the "default string" with a null-pointer for regular lines such that I don't need a comparison (since null is false).

Something else I might implement, since it should be easy now:

Code Select


<substitution>=a1
text <%> inserted
</substitution>
...
<insert>=a1; to be

Quote from: prissi on December 19, 2020, 12:19:25 PMSince I did not touch the tabfile ever (this predates my involvement), I am also not fluent with this part of code.

I think I understand what the code does well enough. I'm more worried about all the things any C-programmer would know, but I don't, since I never formally learned it. Eg. whether clearing the hashtable that contains char-pointer-vectors is actually enough, or if I'd have to iterate through and clear every vector, or whether I'd even have to manually delete the data pointed to within each vector.

Besides that: Could you point me to any piece of Simutrans-Code you'd consider well-documented? I think right now there are too many comments as I have to remind myself of basic things, if I'd know how much they should convey, I could comment the rest of the code in the tabfile as well.

prissi · December 21, 2020, 01:05:48 AM

The syntax is a little weird to the people used to html and the like, but your version will avoid any troubles with existing files.

I think parts submitted by Dwachs usually have better comments than mine. And I think there are rarely too many comments, as long as they are not wrong or require a lot of effort if the code is ever changed again.

Also, I never learned C++ either - Simutrans was started to learn C++ by Hajo in 1997 ...

Anyway, I will check it (and then commit, if fitting) this week.

Leartin · December 21, 2020, 10:23:13 AM

Quote from: prissi on December 21, 2020, 01:05:48 AM
The syntax is a little weird to the people used to html and the like, but your version will avoid any troubles with existing files.

I agree, but it's also weird to cap off a line in a file where linebreaks have meaning, since you never need the closing '>'. I don't think I would look it up and ignore everything after it, nor would I fail the code if it was missing, so it really has no business existing. In that regard, it might be better to use a different character where closure isn't expected.
It's better to discuss this beforehand, since it's really easy to change, but shouldn't be changed after it's implemented.

Quote from: prissi on December 21, 2020, 01:05:48 AM
And I think there are rarely too many comments, as long as they are not wrong or require a lot of effort if the code is ever changed again.

I cleaned up a bit, describing several lines of code at once rather than each individually.

The new version compares less strings by using an empty string as default, so only the first character is checked for regular lines - I suppose that should speed it up, given how often it happens.

I'm not quite happy with inline replacement. The placeholder I use now is <?> ($, % and * are used by the shorthand code, if everything works there is no conflict, but if you forget to specify a replacement it aborts with a misleading error message). So, something like this is possible:

Code Select


<substitution>=bboe1100
name=bboe1100_<?>
obj=vehicle
waytype=track
...
emptyimage[s,w,e,n,ne,sw,nw,se]=./image/bboe1100_<?>.0.<$0>
</substitution>
---
<insert>=bboe1100;player
---
<insert>=bboe1100;green
---
<insert>=bboe1100;blue
---
<insert>=bboe1100;brown
---
<insert>=bboe1100;orange

(But certainly not feasable for paksets, just as addons for model-railway players who want to choose their crocodile color.)

prissi · December 21, 2020, 12:10:01 PM

I think I would rather go with <insert a=xyz> instead of an extra ; The latter feels weird.

An then

Code Select

<substitute a=xyz>Text which should follow with leading break</substitute> for single line text entries, i.e. not ignoreing the text after >

On the other hand, you will use this featuremuch more than me, so it has to work for you.

Leartin · December 21, 2020, 01:55:54 PM

Right now, I just seperate at '=' same as the normal key-value-pairs. For replacements, I only need to do the same thing again with ';'. Simple, straight-forward code.

For what you suggest, I'd have to seperate at ' ' - which I'm not sure is a good idea. Otherwise, I'd have to check the 'second character' if I encounter '<' to find out which directive it should be to compare and skip the right amount of characters. eg. if(line[0]='<' && line[1]='s'), I expect "<substitute", so I compare the first 11 characters of line with "<substitute", and if that works out, I repoint to line[12] as the beginning of the name, THEN look for an equal sign from there and snip to get the name, then...
Not sure I want </substitute> at the end of a line either, since it means I'd have to check every line I put in there for that tag at the end.

Like, I could do it, but it feels hackish.

I have a different idea though. Leave "insert" with one parameter, and create new directives "<replace0>" to "<replace9>". Any instances of <?0> to <?9> would be replaced by whatever is currently stored via these directives. (I think <?> would automatically be <?0>, so perhaps also <replace> as synonym for <replace0>)
The shorthand code already looks up and finds '<', so that's where this would need to be placed - getting rid of my very weird way of finding <?>. Since I didn't check the shorthand code too closely, can't say how long that will take - but I'm certain such replacements shouldn't be too big a deal.

Leartin · December 21, 2020, 07:05:41 PM

Coding style question:
I'm currently outsourcing the replacement thingies. If I'm not totally wrong, the following two codes should be equivalent (let's assume they are, I tested neither yet).

Code Select


	for(char *s = line; *s; s++) {
		if(!*s == '<') {
			continue;
		}
		char *t = s;
		t++;
		if(!*t == '?') {
			continue;
		}
		t++;
		if(!*t) {
			dbg->warning("tabfile_t::replace", "Line ends in incomplete '<?'. Replacement not possible. \"%s\"", line);
			continue;
		}
		char x = *t;
		if(  !(  x=='>'  ||  (  x>='0'  &&  x<='9'  )  )  ) {
			dbg->warning("tabfile_t::replace", "Found '<?n' with n not in [0-9;>]. Replacement not possible. \"%s\"", line);
			continue;
		}
		x = (x=='>' ? '0' : x);
		t++;
		if(!*t == '>'){
			dbg->warning("tabfile_t::replace", "Found '<?n' without closing '>'. Replacement not possible. \"%s\"", line);
			continue;
		}
		[...more stuff...]
	}

Code Select


	for(char *s = line; *s; s++) {
		if(*s == '<') {
			char *t = s;
			t++;
			if(*t == '?') {
				t++;
				if(*t) {
					char x = *t;
					if(*t == '>') {
						x='0';
						found_placeholder = true;
					}
					else {
						t++;
						if(*t == '>') {
							if(x>=48  &&  x<58){
								found_placeholder = true;
							}
							else{
								dbg->error("tabfile_t::replace", "Found <?n> with n not in [0;9]; Replacement not possible. \"%s\"", line);
							}
						}
						else {
							dbg->error("tabfile_t::replace", "Found unclosed '<?'. Replacement not possible. \"%s\"", line);
						}
					}
				}
				else{
					dbg->error("tabfile_t::replace", "Line ends in incomplete '<?'. Replacement not possible. \"%s\"", line);
				}
			}
		}
		if(found_placeholder) {
			[...more stuff...]
			found_placeholder = false;
		}
	}

Personally, I think the first version looks much cleaner and has the benefit of warnings close to the comparisons that cause them. However, it makes excessive use of "continue"-statements, which seem to be condemned almost as much as goto on various internet sources. Disregarding whether the code works, which version would be preferable?

And since I'm here: Is a simple "x-48" acceptable to turn a char x between '0' and '9' to an int? Should it be written as "x-'0'" just in case?

Leartin · December 22, 2020, 04:56:12 PM

So, the current status of this project.
There are now two destinct, but similar functionalities.

One is substitutions, where several lines are inserted:
With <substitution>=xyz; all following lines are stored under the name "xyz", until </substitution> appears. With <insert>=xyz, these lines are recalled. <forget>=xyz clears xyz, just <forget> clears all substitutions.

The other is replace, which allows inline-replacements.
With <replaceX>=xyz, where X is a digit, xyz is stored at the value X. In each line, any instance of <?X> is replaced by xyz stored at X. Omitting X equalls X=0. <replaceX> places an empty string at X, such that <?X> will be replaced by nothing (=removal).

Terminology (how I think of these things, to ease further communication):
substitution block: Lines between <substitution> and </substitution>.
directive: Parameters in '<>', since they are similar in functionality to a preprocessor.
placeholder: any '<?X>'
insertion-marker: '<insert>'.

Issues:
If one were to use a placeholder at the start of the line, it would be mistaken for an unknown directive and ignored. Since placeholders are intended to be used for values, not for parameters, I don't mind this, and think the occuring warning is enough.
It's possible to define a placeholder in the replacement string, such that it gets replaced over and over again indefinitelly. Not sure if I need to do anything about it, since it seems hard to do that accidentially.

Unlike the info-dump of the shorthand code which prints out every line in it's extended form, I chose not to print anything except for warnings.
May I assume that it's actually a mistake in the shorthand code, which should have used dbg->message() for those lines and NOT print them unless in debug mode? In that case, I would add additional messages as well, eg. ("Line %i was replaced by substitution block \"%s\"", current_line_number, subname) [And we would finally be able to see warnings again without endless scrolling xD]

My tests so far were rather rudimentary. I have one file in which I try all kinds of things I can think of, but it's definitelly not enough for practical use.

Leartin · December 29, 2020, 10:33:23 AM

No functional change, just adjusted messages. The expansion of the shorthand files now only appears in verbose mode, along with lines whenever substitutions/replacements happen. (Even if substitutions are denied, this would be very much needed to make warnings useful again)

I won't change anything else for now, as I'm still unsure whether what I did so far is acceptable.

prissi · December 29, 2020, 01:17:23 PM

Sorry, I did not find time for code review. Still stuck with 30 min per evening ...

Leartin · January 01, 2021, 07:45:31 PM

Again, no functional change - But I noticed that the patch did not apply to the current state of the repository due to a change by DWachs 2 weeks ago, so I updated it. [No rush though]

News:

Substitutions in makeobj