AI bug fixing experiment - early stages

jamespetts · January 02, 2026, 11:46:28 AM

Hello. I have not been around here much recently, but I have been spending some time working with AI for coding lately and wondered how capable it would be with a difficult and complex codebase such as Simutrans-Extended. I have been experimenting with the open weights model GLM 4.7. I asked it to identify and fix the bug causing the occasional thread deadlock on the server, which I have been aware of for a long time but unable to track down. Here is a transcript of the complete conversation.

It identified the problem as an old bug from Standard (albeit more likely to affect Extended because it is more computationally intensive) that has already been fixed in Standard, and proposed the following patch based on the Standard fix:


--- a/src/network/network.cc
+++ b/src/network/network.cc
@@ -XXX,Y +XXX,Y @@
    // Processing loop
    while(running) {

-       // Old Extended Style: Hold lock during execution
-       s->lock();
-       network_command_t *nwc = network_read_command(s);
-       if (nwc) {
-           nwc->execute(welt);
-       }
-       s->unlock();
+       // Standard Style: Unlock immediately after read
+       s->lock();
+       network_command_t *nwc = network_read_command(s);
+       s->unlock();
+
+       // Execute command (World Lock only)
+       if (nwc) {
+           nwc->execute(welt);
+           // Cleanup command object immediately after execution
+           delete nwc;
+       }
    }

I have not had time to see whether this works yet, but, if it does, it is promising. The difficulty that I have had recently in engaging with Simutrans-Extended is the amount of time required to complete the next features as against the amount of time that I now have available. If current AI can reduce the former, it may be more feasible to work on the codebase again. I shall have to apply this patch when I get home from my Christmas holiday and see whether this really does fix the deadlocking issue.
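For anyone unfamiliar with the general idea behind the proposed patch (hold the lock only while reading the next command, then execute it with the lock released), here is a minimal, self-contained sketch of that pattern in standard C++. Every name in it (`Command`, `CommandQueue`, `drain`) is a hypothetical stand-in invented for illustration; none of it is taken from the Simutrans or Simutrans-Extended sources, and it says nothing about whether the patch itself is correct.

```cpp
#include <memory>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical command: adds `value` to the world state and may ask for
// a follow-up command to be queued while it is being executed.
struct Command {
    int value;
    bool enqueue_followup;
};

// Hypothetical queue guarded by a plain (non-recursive) mutex.
class CommandQueue {
public:
    void push(Command c) {
        std::lock_guard<std::mutex> guard(mutex_);
        pending_.push(std::make_unique<Command>(c));
    }

    // Take the next command while holding the lock; nullptr when empty.
    // The lock is released when `guard` goes out of scope on return.
    std::unique_ptr<Command> pop() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (pending_.empty()) {
            return nullptr;
        }
        std::unique_ptr<Command> next = std::move(pending_.front());
        pending_.pop();
        return next;
    }

private:
    std::mutex mutex_;
    std::queue<std::unique_ptr<Command>> pending_;
};

// Drain the queue. Each command runs after pop() has released the lock,
// so a command that calls push() does not try to re-lock a held mutex.
int drain(CommandQueue &queue, int &world) {
    int executed = 0;
    while (std::unique_ptr<Command> cmd = queue.pop()) {
        world += cmd->value;
        if (cmd->enqueue_followup) {
            queue.push(Command{1, false});
        }
        ++executed;
    }
    return executed;
}
```

If the execution step ran while the queue's mutex was still held, the `push()` inside `drain()` would attempt to lock a `std::mutex` the same thread already owns, which is undefined behaviour and in practice usually hangs. Using `std::unique_ptr` also removes the need for the manual `delete` seen in the patch.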

Isaac Eiland-Hall · January 02, 2026, 03:34:00 PM

In my experience messing about with AI, there is a real limit to the amount of context you can give it to work with. You also need to know what you're doing. For example, I've had it create simple one-page CRUD applications (simple database stuff to track things, like a to-do or medication tracker or things like that). It puts in absolutely zero security.

I trusted it to make my holiday music player, but that didn't involve a database.

If it proves to be helpful and find problems, that's awesome, and I wouldn't downplay that help. I'd just be extremely cautious about using code it generates. I personally feel it works better when it identifies problems, tells me what they are, and then proposes a fix - in general English terms, rather than writing code. Let me write the code - because it's hard enough to understand the code I've written in the first place.

I don't want to be discouraging about AI in the slightest - especially if it proves useful and helpful. Just slightly cautious and sharing my actual experience with it.

prissi · January 04, 2026, 08:01:24 AM

In my experience AI can find glaring mistakes, but with typical context window sizes it will introduce more bugs on codebases the size of Simutrans (or even a tenth of it), if the "bug fix" compiles at all. Taking a look at the code it reports is a good idea, but that is as far as I would trust it.

Especially things like deadlocks (which are a variant of the halting problem) cannot be solved by AI.

But it said all that.

Also, Standard has no threading for network. However, if you assume a deadlock, both threads waiting on the freelist seems much more likely, simply because it is called a lot all over the place and extending it is pretty normal.

TransshipmentEnvoy · January 13, 2026, 03:45:37 AM

Will you try proprietary models as well, e.g. Claude Sonnet/Opus 4.5 or GPT-5.2? Maybe also coding agents like Claude Code / OpenCode.

prissi · January 13, 2026, 01:29:37 PM

These all have context windows too small to really fix a bug which may lurk somewhere. What the AI can do is add new functionality, with a related tool and probably even GUI code. A lot of that is just repeating patterns with different layouts, something the AIs should be very good at understanding.

jamespetts · January 14, 2026, 12:19:50 AM

Unfortunately, on further investigation, it transpires that the AI had hallucinated the code and the patch cannot apply because the codebase is entirely different from what it was patching. This particular AI is very inconsistent in what it can search. I will have to investigate Copilot when I have some more time.

I have had some significant success with AI assisted coding with writing a complex set of scripts for JMRI and actually modifying JMRI itself, but I was using Copilot for that.
