DSx86 - Blog

June 27th, 2010 - Self-modifying code & game-specific fixes

IRQ handling using self-modifying code!

Well, the big architectural change in the next version is that I finally got the self-modifying version of my IRQ handling code to work! This is something that I have wanted to do pretty much since the start of this project. I tried to use SMC back in December last year, but I could not make it work reliably at that time. Now at the beginning of my summer vacation I decided to look into this again, and now it looks like I got it to work reliably! The rewritten IRQ handling I did at the end of May probably helped to make this work, as the IRQ code was now in one place and easier to change.

Ever since the beginning the main opcode dispatcher loop of DSx86 has looked like this:

loop:
    ldr     r1,[sp, #SP_IRQFLAG]    @ Get the IRQFlag value (set to 0 if we need to handle an IRQ, else 0xFF)
    ldrb    r0,[r12],#1             @ Load opcode byte to r0, increment r12 by 1
    mov     r2, r3                  @ Clear segment override, r2 = r3 = physical DS:0000
    and     r1, r0                  @ AND the opcode value with the IRQFlag value, result is either just the opcode or 0
    bic     r9, #0xFF               @ Clear segment override flags from low byte of r9
    ldr     pc,[sp, r1, lsl #2]     @ Jump to the opcode handler (or opcode 0 if r1 == 0)
	
@ ------------------- 00 = ADD r/m8, r8 -------------------------------
op_00:
    @-------
    @ Check if we are to handle an interrupt instead of op_00
    @-------
    mrs     r1,cpsr                 @ Save flags to r1
    cmp     r0, #0                  @ Do we really need to handle opcode 00 instead of an IRQ?
    bne     IRQStart                @ Nope, we need to handle an IRQ instead.
    @-------
    @ Handle opcode 00. No need to restore processor flags, as the following "adds" will change them anyways.
    @-------
    ...
That is, I first read a mask from the stack which states whether the code should jump to the IRQ handler next or keep on handling opcodes. Then I read the opcode byte, make the effective segment register r2 point to the start of the DS segment, mask the opcode byte with the IRQ mask, and clear the flags that tell whether we had a segment override prefix. Finally I load the program counter from the opcode handler address table (in the stack), which causes a jump to the opcode handler. If SP_IRQFLAG is zero, the jump goes to the opcode 0x00 handler, which first checks whether the opcode actually was a zero, and jumps to the IRQStart handler if it wasn't.
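The control flow of this masked dispatch can be modelled in a few lines of Python. This is only an illustrative sketch, not the DSx86 code: the function name and the returned handler names are made up for the example, and only the 0x00 / 0xFF flag values and the "opcode 0 slot doubles as the IRQ entry" trick come from the listing above.

```python
# Hypothetical model of the masked dispatch; names are illustrative only.
IRQ_PENDING = 0x00   # SP_IRQFLAG value when an IRQ must be handled
IRQ_IDLE = 0xFF      # SP_IRQFLAG value during normal execution


def dispatch(opcode, irq_flag):
    """Return the name of the handler the masked dispatcher would reach."""
    index = opcode & irq_flag          # the "and r1, r0" step
    if index != 0:
        return "op_%02X" % index       # normal jump through the handler table
    # Table slot 0 is op_00, which must tell a real 0x00 opcode from an IRQ.
    if opcode != 0:
        return "IRQStart"
    return "op_00"
```

Note that a genuine 0x00 opcode is still executed even when the flag is zero, which matches the "cmp r0, #0" check at the start of op_00.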

The annoying thing in this code is that I need to perform a memory read for the IRQ mask and a masking operation for every single opcode, even though the IRQs happen extremely rarely (from the CPU speed point of view). That is 4 extra CPU cycles that go to waste for every single opcode. Really annoying and frustrating when coding an emulator that should run as fast as possible.

However, now that I finally managed to make the SMC version robust, the main opcode loop looks like this:

loop:
    ldrb    r1,[r12],#1             @ Load opcode byte to r1, increment r12 by 1
    mov     r2, r3                  @ Clear segment override, r2 = r3 = physical DS:0000
    bic     r9, #0xFF               @ Clear segment override flags from low byte of r9
    .global SM_IRQFLAG
SM_IRQFLAG:                         @ SELF_MODIFIED CODE!
    ldr     pc,[sp, r1, lsl #2]     @ Jump to the opcode handler (or load r0 register if IRQ)
    b       IRQStart                @ Jump to IRQ handler if we did not jump above.

I got rid of the IRQFlag in the stack and the mask operation, so the main dispatcher loop does not have any code relating to IRQ handling. Instead, I replace the opcode at the SM_IRQFLAG address with a different one when the code needs to jump into IRQStart. The two opcodes that can be at SM_IRQFLAG are as follows:

#define IRQ_ON      0xE79D0101      @ ldr       r0,[sp, r1, lsl #2]
#define IRQ_OFF     0xE79DF101      @ ldr       pc,[sp, r1, lsl #2]
That is, the opcode loads the r0 register (which is used as a scratch register in DSx86) instead of the program counter when IRQ handling should start, so the program flow continues to the "b IRQStart" branch instruction. The IRQStart routine then restores the IRQ_OFF opcode to SM_IRQFLAG and performs the other steps needed when beginning to handle an IRQ. This change removed the 4 extra CPU cycles from the handling of every opcode, so DSx86 became 8% faster than before just by this small change!
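The two constants differ only in the destination register field of the ARM instruction word, which can be checked with a small encoder. The sketch below is a Python illustration (the helper name is made up); it assumes the standard ARM-mode encoding of a word load with a scaled register offset and condition code AL.

```python
def encode_ldr_scaled(rd, rn, rm, shift):
    """Encode 'ldr rd,[rn, rm, lsl #shift]' (ARM mode, condition AL)."""
    # 0xE79: cond=AL, register offset, pre-indexed, add offset, word, load
    base = 0xE7900000
    return base | (rn << 16) | (rd << 12) | (shift << 7) | rm


SP, PC = 13, 15
IRQ_ON = encode_ldr_scaled(0, SP, 1, 2)    # ldr r0,[sp, r1, lsl #2]
IRQ_OFF = encode_ldr_scaled(PC, SP, 1, 2)  # ldr pc,[sp, r1, lsl #2]

# Only bits 12..15 (the destination register) differ between the two words.
DIFFERENCE = IRQ_ON ^ IRQ_OFF
```

So raising or clearing the IRQ request amounts to rewriting one 32-bit word in the dispatcher.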

As you might remember, at the end of May the Norton Sysinfo displayed the DSx86 speed as 10.6 times original PC. After this change the speed is up to 11.5 times original PC, which is even faster than the original Nov 12th, 2009 Sysinfo measurement of 11.3 times original PC on real hardware and 11.6 on No$GBA. At that time the code was still missing most of the current features, so it is no surprise it ran much faster then than it has been running recently, before this SMC change.

Game-specific fixes

I have been following the compatibility wiki closely. It is very interesting and motivating to see how the compatibility of DSx86 improves with each version, and how thoroughly the testers test and report the problems. So, I decided to focus on the games not working in version 0.15 as reported on the wiki, and especially on the games that have their own pages there. I believe if testers have spent time in creating game-specific pages, they would probably like to see those games actually working! :-)

However, I started by looking further into the problem in Gods, and found out that the flickering problem is caused by the game accessing the VGA VRAM with the ES segment register value of 0x9FFC, instead of something between 0xA000 and 0xAFFF which is how I detect access to VGA VRAM. I added some hacks to the opcodes that the game uses with this segment register value, which fixed the problem in this game, but the annoying thing is that these hacks will slow down these opcodes in every software that uses these opcodes. Luckily the opcodes in Gods were mostly some reasonably uncommon ones, so this should not be much of a problem.
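To see why a segment-range check misses this case, here is a small Python sketch of real-mode address arithmetic. It is an illustration only: the function names are invented, and the 64KB window at 0xA0000 and the 20-bit address wrap are assumptions of the example rather than details taken from DSx86.

```python
VGA_START = 0xA0000
VGA_END = 0xB0000    # exclusive end of the assumed 64KB VRAM window


def linear(seg, off):
    """Real-mode linear address, wrapped to 20 bits."""
    return ((seg << 4) + off) & 0xFFFFF


def segment_looks_like_vram(seg):
    """The cheap test: only looks at the segment register value."""
    return 0xA000 <= seg <= 0xAFFF


def address_is_vram(seg, off):
    """The exact test: looks at the final linear address."""
    return VGA_START <= linear(seg, off) < VGA_END
```

With ES = 0x9FFC the segment test fails, yet any offset of 0x40 or more already lands inside the VRAM window, which is the situation described for Gods. The exact test needs the full address calculation on every access, which is where the slowdown comes from.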

I also looked into the graphics problem in Silpheed, and found out that it does pretty much the same thing as Gods, in accessing the VGA VRAM with ES segment pointing to 0x9F00. This game however used the absolutely most common opcodes, so I really hate to hack these opcodes and make them up to 10 times slower just to make this one game work properly. This actually was the thing that made me look into the IRQ handling again, I thought that if I could make every single opcode run faster than before, perhaps then I would not feel so bad about making the common ones run 10 times slower. I haven't yet made these hacks, as I am still looking into possible other options or workarounds to not have to slow down the most common opcodes. So, Silpheed might not work properly yet in the next version.

After those games I then started looking into the games with their own pages in the compatibility wiki, the first one being A-Train. It seemed to sort of work, but the scenery graphics were strangely monochrome, even though other things on the screen seemed to use the correct palette. After quite a bit of debugging I finally noticed that the game sometimes uses "EGA Register Interface Library" calls to change the EGA registers, and sometimes it accesses the registers directly. I had just ignored all EGA Register Interface Library BIOS calls in DSx86, assuming that if a game wants to use them it would first query whether such exist. A-Train did not query the existence, but instead blindly used the calls, which did nothing in DSx86 and thus it always wrote to the same bit plane in EGA VRAM and thus the graphics got monochrome. I implemented the EGA RIL calls that A-Train needed, and the scenery began to look correct. There are still some minor graphics issues that I need to look into, but it is mostly OK now. The game uses 640x480 VGA mode, so fitting it properly into the DS 256x192 screen will be rather awkward.

 

The next game I looked into was Alcatraz. It had some serious palette issues and also other graphics problems. It is actually quite curious how it feels like I have fixed the palette handling in DSx86 half a dozen times already, and still I constantly run into new games that use a broken palette! Very strange.. Anyways, I haven't yet figured out what the problem in Alcatraz is, so I will continue looking into this. At first I just wanted to see how it works, as the compatibility wiki only shows information for DSx86 version 0.14.

Next I checked Buck Rogers - Countdown to Doomsday, which looped in trying to write and read port 0x2BB. I don't know what the game thinks that port should contain, but in any case DSx86 has nothing useful in that port so I just ignored the access, and after that the game seemed to work fine. Well, it has the same "Unexpected save error 3" problem as the other Buck Rogers game if the save game is not setup in the config file, but this is not a problem of DSx86.

The next game I checked was LHX Attack Chopper. It turned out to need quite a few new EGA opcodes, but it did not have any other problems (not counting the horrible PC beeper sound effects) so it was pretty easy to fix. It should work fine in the next version, though I have only flown the chopper around a little bit and gotten shot down. :-)

 

Perhaps the most interesting game I tested was Ugh!, as it uncovered a problem in the DSx86 keyboard handling routines. The game did not recognize cursor key presses, and after I debugged the keyboard IRQ handler of the game I realized that DSx86 does not send the extended keyboard prefix 0xE0 which the game expects. DSx86 emulates the old 83-key PC keyboard and not the currently standard 102-key extended keyboard. However, since games (like Ugh!) might expect to communicate with the extended keyboard, I decided to add the 0xE0 prefix byte to the extended keys, including the cursor keys, to DSx86 keyboard routines. So, from the next version onwards, the correct key map in the DSx86.ini file for cursor keys should look like the following:

KEY_UP=E048
KEY_DOWN=E050
KEY_LEFT=E04B
KEY_RIGHT=E04D
However, the old plain 48, 50, 4B and 4D scancodes should work fine in the games where they currently work; it just means that the keyboard does not look like an extended keyboard to those games. By the way, this change also enables some new keys to be mapped, like the Right Control key E01D, Right Alt key E038 and Keypad Enter E01C, which don't exist in the touchpad keyboard. Also, if for some reason some game stops recognizing the new cursor keys, you might try overriding the new extended keys in the DSx86.ini with the old one-byte versions for that game.
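As an illustration of what the two-byte values mean, the following Python sketch splits a key map value into the bytes a game's keyboard handler would see. The function name and the parsing rules are assumptions made for the example; how DSx86 actually reads DSx86.ini is not shown in this post.

```python
def parse_scancode(value):
    """Split a hex string such as 'E048' into the list of scancode bytes."""
    if len(value) not in (2, 4):
        raise ValueError("expected 2 or 4 hex digits, got %r" % value)
    return [int(value[i:i + 2], 16) for i in range(0, len(value), 2)]


# Extended cursor-up: prefix byte 0xE0 followed by the plain scancode 0x48.
extended_up = parse_scancode("E048")
# Old 83-key style cursor-up: just the one byte.
plain_up = parse_scancode("48")
```

The last byte is the same in both forms, which is why games that ignore the prefix keep working with either setting.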

Finally, I started looking into the problems in Castle Adventure. The unsupported INT call was due to a missing FCB file handling operation, but when I added that the game still did not work properly. After some debugging I noticed that the FCB structure I used in my FCB handling routines was not correct, so I fixed that, but still the game has problems. The strange thing is that the problems differ in each environment I try to test it. In iDeaS the game progresses to the first room, but does not show the player character and does not take any commands. In my real DS Lite it states "String formula too complex in line 5055" when pressing 'p'. In the bundled version of DSx86 running in No$GBA it just states that the file "CASTLE.RAN" is missing and exits. All in all, something very strange is going on with this game, so this still needs some debugging. I suspect my FCB file functions in general are not very robust yet.

Anyways, my summer vacation is starting now, so next week I can work on DSx86 quite a bit. I haven't yet decided whether I continue fixing the games on the compatibility wiki or start implementing new features already, but we shall see. It is summer and I have no obligations, so I'll do whatever feels interesting at the time. :-)

June 20th, 2010 - Version 0.15 released!

This version has the following major changes:

Since the last blog post I have also debugged a couple of new games. I spent many hours (including the whole of yesterday) debugging the memory allocation problem in Abandoned Places, but still could not figure out why it skips allocating a 64KB block of memory (that it does allocate in DOSBox) and then crashes when it clears this block that it never allocated in the first place! It is quite difficult to trace something that does not happen, so I still need to think up new ways to approach this problem.

Another game that I only started debugging today is Gods. It seemed to hang at the beginning with a black screen, and it didn't even respond to the user breakpoint request. Luckily this was not a problem when running it in iDeaS, and with it I was able to find out that my VGA Display Status Register emulation was still not correct. The bit that returns whether the display is in Horizontal Blank should also return the Vertical Blank time. When I added that, I got Gods to progress up to the start menu, but the graphics were completely garbled. After some debugging I noticed that it goes to 640x350 EGA mode, but then directly writes new values to the CRTC registers so that the display size is actually 320x200! My screen blitting functions are based on the graphics mode byte, so the game used the graphics memory like it was 320x200 pixels while my blitting routines used it like a 640x350 layout. Btw, the game does not look correct on DOSBox either (at least on my rather old version), but as DOSBox uses the real CRTC registers it is not as broken as it was in DSx86.

I added a quick hack to the CRTC register setting so that if the mode is 640x350 but the new CRTC register value tells the horizontal size to be 320, I change the graphics mode to 320x200. This allowed the game to display a properly-sized screen, but there are still some severe problems with scrolling and paging, so the game is not playable yet. I believe I need to change my blitting method completely to read the current CRTC register values instead of the graphics mode, to properly handle situations where the game accesses the CRTC registers directly.
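The idea of the hack can be sketched in Python as follows. This is not the DSx86 code: the names are invented, and the sketch assumes that the register in question is the CRTC Horizontal Display End register (which holds the number of displayed character columns minus one, with 8 pixels per column in the planar EGA modes) and that the BIOS mode numbers are 0x10 for 640x350 and 0x0D for 320x200.

```python
MODE_640x350 = 0x10   # BIOS mode number, 640x350 16-colour EGA
MODE_320x200 = 0x0D   # BIOS mode number, 320x200 16-colour EGA


def displayed_width(h_display_end, char_width=8):
    """Pixel width implied by the Horizontal Display End register value."""
    return (h_display_end + 1) * char_width


def effective_mode(bios_mode, h_display_end):
    """Pick the mode the blitter should use after a direct CRTC write."""
    if bios_mode == MODE_640x350 and displayed_width(h_display_end) == 320:
        return MODE_320x200
    return bios_mode
```

A blitter driven directly by the CRTC values would use displayed_width() (and the corresponding vertical registers) instead of mapping back to a mode number at all.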

Anyways, this version is yet another fix version without any major new features. My summer vacation starts within a week (Yippee!), so after next week I'll start working on the bigger issues, like improving the AdLib emulation, implementing the missing SB DSP commands, adding better screen scaling methods, improving the mouse emulation etc etc. I am very much looking forward to being able to really focus on adding the missing features to DSx86.

Thanks for your interest in DSx86, and please send me debug logs and update the Compatibility Wiki as you test games on this new version! Here below are pictures of the Great Escape and Berlin 1948, which should now run in DSx86.

 

June 13th, 2010 - Fixes and improvements

No more LOADFIX needed in the next version!

The next version will have at least one major improvement: No need for the LOADFIX command any more! Or at least I hope it won't be needed. I'll still leave the command available in case some game works better with it, for some reason. This would be rather strange, though.

While debugging Moonstone a couple of weeks ago I noticed that the reason it hangs after the intro was that the MAIN.EXE it tries to run after the intro suffered from the "Packed file corrupt" problem. Usually running LOADFIX before running such a game will help, but with Moonstone it did not improve the situation: MAIN.EXE still did not launch, and this time it complained "Not enough memory". I stopped working on it for a while, but began thinking of ways to get rid of the need for the LOADFIX feature completely (see my Jan 17th, 2010 entry for more info about the LOADFIX problem, if you are interested).

The solution suddenly occurred to me last Friday, and yesterday I spent the whole morning reorganizing the internal memory emulation in DSx86 to use my new idea. Since the problem is caused by buggy EXEPACK decoders that wrap the segment address below zero (to 0xFFF0 and such), I thought, what if I move the emulated BIOS segment (0xF000..0xFFFF) of DSx86 before the actual RAM memory area (0x0000..0x9FFF)? That way the segment wrap would not cause any problems, as addresses like 0xFFF0:0xFF00 will then automatically point to the correct place in the DOS RAM area without any need for special handling of such a situation! Gotta love such sudden coding ideas that both make the code simpler than before and automatically handle difficult situations. :-)
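A small Python model shows why this layout makes the wrap harmless. It is a sketch under stated assumptions, not the DSx86 implementation: it assumes one contiguous host buffer with a 64KB BIOS block followed directly by the 640KB of DOS RAM, and the function name is made up.

```python
BIOS_SIZE = 0x10000   # 64KB block for segments 0xF000..0xFFFF
RAM_SIZE = 0xA0000    # 640KB of DOS RAM, segments 0x0000..0x9FFF


def host_offset(seg, off):
    """Offset into a host buffer laid out as [BIOS 64KB][DOS RAM 640KB]."""
    if seg >= 0xF000:
        base = (seg - 0xF000) << 4        # BIOS block sits at the start
    elif seg < 0xA000:
        base = BIOS_SIZE + (seg << 4)     # DOS RAM follows right after it
    else:
        raise ValueError("segment 0x%04X is not in this buffer" % seg)
    return base + off


wrapped = host_offset(0xFFF0, 0xFF00)   # what a buggy EXEPACK decoder uses
intended = host_offset(0x0000, 0xFE00)  # where a wrapping 8086 would land
```

Both expressions give the same host offset, so an access through the "negative" segment simply runs off the end of the BIOS block into the start of DOS RAM, exactly as the 20-bit address wrap would on a real 8086.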

I had earlier had the BIOS 0xF000 segment as a separate data block of exactly 64KB in size, with various BIOS tables and other static data already located in the "final" locations within the block. The DOS RAM in turn is a bss block of 640KB. So, implementing this new idea meant that I had to copy all the BIOS static data into the bss memory block after DSx86.nds has been loaded, and it also means that the memory footprint of DSx86 will increase somewhat (as the static data is in memory twice, once in the data area and once in the bss area). However, as I don't have the extra blank space in the 64KB data block taking space from the DSx86.nds file itself, the file size got smaller. This change also will improve the DSx86 internal reboot, as the BIOS area will be recreated during a soft reboot, so it won't get corrupted as easily.

This was a rather extensive and a little bit scary change, so there is a chance that some games that need something specific from the BIOS (that I had forgotten to copy to the new BIOS area) might not work in the upcoming version. I hope I remembered to copy everything (and did not make any typos!), though.

EGA smooth scrolling fixed

The EGA horizontal smooth scrolling issue will also be fixed in the upcoming version. Late last Sunday I already thought I got it fixed, after I changed my code so that the EGA starting address values that the game sets up will only get used after one full frame has passed. With that change both Supaplex and Heimdall scrolled very smoothly. However, on Monday I then tested Commander Keen 4, which had always scrolled smoothly (even though it does not sync to the vertical retrace signal), but it scrolled now so badly it was actually nauseating! So, back to the drawing board..

The actual problem is caused by the fact that on the real EGA/VGA card, the screen start position (which scrolls by 8 pixels at a time) can be setup whenever the display is active, and it will take effect after the next vertical retrace time. The pixelwise smooth scrolling register however takes effect immediately, so it needs to be set during the vertical retrace period. Both Supaplex and Heimdall used a system where they first wait for a vertical retrace period, then wait for the screen active period to start, then set the coarse start position, then wait for the screen retrace period to start, and then set the pixelwise scrolling register.

However, in DSx86 I need to use the current values of both of those registers when I start blitting the screen, and this happens during the vertical blank time of the NDS display. It finally occurred to me, that from the PC game's point of view, the vertical blank period when I blit the screen is actually the active display time! So, perhaps if I simply swap the VBlank bit of the NDS display status register when reporting the current status to the PC game... And voilà, that fixed the scrolling in all the games! Ha, yet another simple and clean fix, I just needed to do some thinking to figure it out. :-)
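In Python, the inversion could be modelled roughly like this. The sketch is hypothetical: it assumes the game polls the VGA Input Status Register 1, where bit 3 signals vertical retrace and bit 0 signals that the display is disabled, and it ignores horizontal blanking entirely.

```python
VRETRACE = 0x08           # bit 3: vertical retrace in progress
DISPLAY_DISABLED = 0x01   # bit 0: display not active (any retrace)


def pc_display_status(nds_in_vblank):
    """Status byte reported to the PC game, inverted relative to the NDS."""
    if nds_in_vblank:
        # The emulator blits now, using the current register values, so
        # from the game's point of view this is the active display period.
        return 0x00
    return VRETRACE | DISPLAY_DISABLED
```

With this mapping, a register the game writes while it sees "vertical retrace" is already in place before the next blit happens.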

Game-specific fixes

Here is a short list of games that should run in the next version, and a description of the fixes I have made to DSx86 to make them run properly.

Miscellaneous

I have also fixed various other issues mentioned in the debug logs I have received. Currently for example History Line 1914-1918 progresses further than before. I have also tested various other games but haven't yet figured out the problems in them. I should also soon look into the save game problem that affects several games, including Supaplex. The problem might have something to do with my using the Unix-style file function wrappers over the libFAT functions, which are not fully compatible with the way DOS file access is supposed to work. I guess I should at some point start accessing the libFAT functions directly (or just include a custom version of libFAT), but I really would rather not have to do that.

By the way, I yesterday downloaded the new version 1.0.3.6 of iDeaS, and have been using it to debug DSx86. The new version has fixed the problems I had with it in the previous version, so now I can use it fully to debug these badly behaving games. For example the problems in Alone in the Dark were reasonably easy to find using iDeaS. It is rather slow, but the cool feature of it is that it has DLDI support, so I can run and debug any game on my PC's hard disk with DSx86 running in iDeaS!

Big thanks again to all of you who have been sending me the debug logs, and updating the Compatibility Wiki with new information! It is very fun and encouraging to notice how DSx86 has created a community of people who test games and work on the wiki pages. I really did not anticipate anything like this when I started working on DSx86!

June 6th, 2010 - Version 0.14 released!

This version has a lot of minor fixes and improvements. There are about 50 new graphics (and normal) opcodes supported, including the 386-specific long conditional jumps. The problematic debugger breakpoint interrupt INT 03 has been adjusted so that it is silently ignored if a DOS program does not handle it and if the DSx86 inbuilt debugger is not active. I also fixed the IRQ handling, and added a debugger E command, as mentioned in the previous blog post. If you are feeling very adventurous, you can (at least in theory) use the new E command to replace an unsupported opcode with a NOP opcode (hex code 0x90) and then possibly continue a game that has crashed with an unsupported opcode. This however needs a lot of knowledge about the x86 assembly language, and is not "supported" by me, so do it at your own risk! I'm just mentioning that it is possible. :-)

Since last weekend I have been searching and downloading a lot of games that have been marked as not working (or partially working) on the compatibility wiki, and I have been testing many of them, mainly looking for easy-to-fix problems. I have left the more difficult problems for the coming weeks, and have focused on games that seem to need only minor fixes to run properly. So far I have been working on the following games, which currently seem to work:

I haven't actually played any of these any further; I've moved on to the next game once I have gotten the actual game to start. I'll leave the actual play testing to you who have reported the games on the compatibility wiki in the first place. Below are screen copies of Heimdall and Hocus Pocus, mainly because I thought they are two very pretty-looking games. :-)

 

This morning I tried to fix the jerky horizontal "smooth" scrolling in various games (Crystal Caves, Heimdall, Supaplex, etc), but I could not find the actual problem yet. I have studied the interaction of the VGA start address setting, horizontal panning register setting, and VSync/VBlank intervals in DOSBox, and have experimented with various timings in DSx86, but no change has so far solved the problem. I need to leave this fix for the next version, as it looks like I need to debug it more thoroughly than I had anticipated. It is interesting that the only game I know that does not suffer from this problem is Commander Keen 4, and it does not sync the scrolling to the screen vertical retrace signal at all!

Anyways, hopefully this new version again runs a few more games than the previous one, please send debug logs again and update the compatibility wiki for the games that have started working (or behave differently). Thanks again for your interest in DSx86!

May 30th, 2010 - Bug fixing

Firstly, in case you haven't noticed this yet, DSx86 now has a Compatibility WIKI page where each of you can mark games that work or don't work on the current DSx86 version. Special thanks to Master_Thief for setting up that wiki page and for testing the majority of games on it! With the wiki in place there is no need for you to report working games to me any more. :-) Please send me bug reports about games that don't work, though, especially with a debug log if applicable, so that I can attempt to fix the problems before releasing the next version.

The past week was mainly spent fixing various bugs in the emulation code. I also added several missing (mainly graphics) opcodes that have been mentioned in the debug logs I have received, thanks again for those! The biggest (and scariest) change was that I rewrote the IRQ handling (again!) to fix a problem that was revealed by a game called Galactix.

The problem in Galactix was that it usually only played the first note of the game music, and then later it would hang at a random time during the intro. I had noticed that when it did hang, it was always running a tight loop that checked whether a certain memory location had a zero. It obviously expected this memory location to change value (based on a timer, most likely), but when it had hung the memory location value never changed. After some debugging and tracing the memory access I got a pretty clear understanding of the timer handling behaviour of Galactix, and after some thought I suddenly got an idea (or rather a theory) about what might be wrong in DSx86.

Galactix uses a "Task Service Manager" (that is part of an error string very close to the memory location it checked, so I believe that is the name of the utility code that handles the timer interrupt in the game) which works by incrementing a variable every time a timer interrupt occurs, and then calling various functions in a function pointer table, each of which has its own incrementing variable, with adders that are not always one. For example, one of the functions might only get called every tenth timer interrupt. Now, the main timer interrupt increments the overall timer interrupt counter at the very start of the timer interrupt, and then each of these functions is tested to see if its counter is equal to the global counter value. If it is, the function gets called and gets its counter incremented by the adder value, to be ready for the next call, and so on.

Every time I use something similar in my own code, I have a habit of checking whether the counter value is equal to or larger than the value I am waiting for, just to be sure I don't miss a timer increment. This brings us to my theory: I thought that perhaps some of those functions take such a long time to run that a new timer interrupt happens and starts running the timer code again, first incrementing the global counter. In this situation it would be possible for the functions to miss their next timer tick, and stay waiting for it for the next (almost) 65536 timer ticks.
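The cost of a missed tick is easy to reproduce with a toy model. The Python sketch below is not Galactix's code: the function and parameter names are invented, and it assumes a 16-bit global counter that wraps around, as implied by the 65536 figure.

```python
def ticks_until_called(counter, due, tolerant):
    """Count timer ticks until a task whose next call is at `due` fires.

    `counter` is the 16-bit global tick counter before the next interrupt.
    With tolerant=False the task fires only on exact equality; with
    tolerant=True it also fires when the counter has already passed `due`
    (a plain unsigned comparison, so counter wraparound is not handled).
    """
    ticks = 0
    while True:
        counter = (counter + 1) & 0xFFFF
        ticks += 1
        hit = counter >= due if tolerant else counter == due
        if hit:
            return ticks


# The global counter has slipped one past the task's deadline of 100.
strict_wait = ticks_until_called(101, 100, tolerant=False)
tolerant_wait = ticks_until_called(101, 100, tolerant=True)
```

With the exact comparison the task has to wait for the counter to wrap all the way around, while the "equal or larger" form recovers on the very next tick.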

I decided to implement a new feature in the DSx86 debugger with which you can change a byte in memory using the syntax E AAAA:BBBB CC, where AAAA:BBBB is the segment and offset of the byte and CC is the new byte value. I used this new debugger feature to change the JE opcode to a JAE opcode in Galactix, and indeed, after this change the music played fine and the intro did not crash. (Well, not immediately, it crashed after a couple of loops, but that was not unexpected as the IRQ handlers are not meant to be re-entrant.) This test did prove my theory that I did not handle the "End-Of-Interrupt" command properly, but instead allowed new interrupts to re-enter the interrupt routine.
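For readers who want to see what such a command involves, here is a minimal Python sketch of parsing and applying an E command. It is an assumption-laden illustration, not the DSx86 debugger: the function names, the error handling and the flat 1MB bytearray are all invented, and only the "E AAAA:BBBB CC" syntax comes from the description above. The opcode bytes used in the example are the standard x86 short-jump encodings (0x74 for JE, 0x73 for JAE).

```python
def parse_e_command(line):
    """Parse 'E AAAA:BBBB CC' into (linear_address, byte_value)."""
    parts = line.split()
    if len(parts) != 3 or parts[0].upper() != "E":
        raise ValueError("expected: E AAAA:BBBB CC")
    seg_text, off_text = parts[1].split(":")
    seg, off, value = int(seg_text, 16), int(off_text, 16), int(parts[2], 16)
    if seg > 0xFFFF or off > 0xFFFF or value > 0xFF:
        raise ValueError("value out of range")
    return ((seg << 4) + off) & 0xFFFFF, value


def apply_e_command(memory, line):
    """Write the byte given by an E command into a flat memory image."""
    address, value = parse_e_command(line)
    memory[address] = value


memory = bytearray(0x100000)         # 1MB real-mode address space
memory[0x12350] = 0x74               # pretend a short JE lives here
apply_e_command(memory, "E 1234:0010 73")   # turn it into a short JAE
```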

So, I spent some time implementing proper IRQ handling to avoid this situation in the DSx86 IRQ code, and I also (finally) moved all the interrupt-related stuff to a new source code file. Now it is not scattered around the emulator code but in a single place, so if I still need to make changes to it, it will be easier (and not as scary!). After this rewritten IRQ handling Galactix looks to be running fine.

I also fixed the graphics problems in Catacomb Abyss. I had not implemented the split screen handling in the EGA modes, only in the Mode-X graphics modes, and Catacomb Abyss wanted to use a split screen for the low part of the screen. I also found a problem in the EGA Write Mode 2 handling with 16-bit opcodes. After these fixes Catacomb Abyss began to look correct. Also, the LOADFIX command I added into the previous version only worked with EXE programs (silly me), which is why it did not help with Ultima 4, for example. I implemented it also for COM programs and now Ultima 4 goes to the actual game properly.

 

Finally, I managed to fix the problems preventing the Sysinfo program from Norton Utilities 8.0 from starting. I had used Sysinfo 5.0 in my own tests, which did not need some of the additional DOS functions that Sysinfo 8.0 needed. In the next version Sysinfo 8.0 will also run. The picture below is actually here so you can see why I don't plan to support 386/486 opcodes with DSx86. :-) There is not much point when the games will run that slowly. Well, I might support some of the 386 opcodes that can "accidentally" be left in the 286 programs, like long conditional jumps, but I won't add support for 32-bit registers, for example.

I have recently been using the iDeaS emulator to test DSx86 on my PC, with the R4 DLDI patch so I can actually run the DOS games in it. The emulator is pretty slow, but I have now learned to use its debugger, so it has helped me in debugging the problems in DSx86. It does have some bugs, though, which I have reported to the author. Hopefully the next version will run better.

Finally, in case you are interested (and even if you aren't :-), here is a picture of the access statistics for my DSx86 pages. Quite a steady increase of visits, and as I have a 15GB monthly bandwidth limit by my web provider, I soon need to start keeping an eye on the bandwidth, and then perhaps start zipping the files hosted here if the limit starts approaching. Looking at the logs is of course one motivation boost to keep working on DSx86, it is nice to see how more and more people find DSx86 interesting. :-)

May 23rd, 2010 - Version 0.13 released!

This version actually has the longest list of changes in any version yet, but the great majority of the changes are new graphics opcodes and support for new INT calls and port I/O (most of which are actually just silently ignored, as in DOSBox). However, there are also some bigger changes.

I still did not manage to fix the problems causing some games to execute data instead of code. Debugging and finding these problems seems to get more difficult with every release, as I get the easy problems fixed. I will continue looking into these in future versions.

I also did not have time to improve the screen scaling features, I'll see if I can add those to the next version. Please send me again the debug logs, those have been very helpful in my improving the compatibility of DSx86!

May 16th, 2010 - New Commands & Bug Fixing

The past week I have been focusing on fixing bugs that make some games behave erratically. I have also added many missing opcodes based on the debug logs you have been sending, thanks for those! I have downloaded various games that seem to have some more severe problems, and have been debugging and testing those. Oh, and I also noticed that my version of Windows 2.03 had been hacked to run on DOS versions above 4.00, which is why it runs fine here, but if you use the original version it won't run in your DSx86. Sorry about that.

The next version of DSx86 will have several new built-in commands:

So, Windows 2.03 should run properly in the next version if you give the "VER 4" command before "WIN". Or, if you want to make Windows 2.03 run in the current DSx86 0.12, you can also hack it yourself. This is what the DOS FC /B command reports for my hacked version:
Comparing files WIN200.BIN and ..\WIN2\WIN2\WIN200.BIN
0000A926: EB 76
That is, if you replace the byte 0x76 at offset 0xA926 with byte 0xEB, Windows 2.03 will run on DOS versions above 4.00.
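
Since the patch is a single byte, it can also be scripted. Here is a minimal sketch in C, assuming the file has already been read into memory. The helper name patch_byte and its parameters are made up for this example and are not part of DSx86 or DOS. It refuses to touch the image unless the byte at the given offset has the expected original value, which protects against patching a different build of the file.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: replace one byte in an in-memory file image.
   Returns 0 on success, -1 if the offset is out of range or the byte
   found at that offset is not the expected original value. */
int patch_byte(uint8_t *image, size_t size, size_t offset,
               uint8_t expected, uint8_t replacement)
{
    if (offset >= size)
        return -1;
    if (image[offset] != expected)
        return -1;
    image[offset] = replacement;
    return 0;
}
```

For WIN200.BIN the call would be patch_byte(image, size, 0xA926, 0x76, 0xEB). On x86, 0x76 is a conditional short jump (JBE) and 0xEB is an unconditional short jump, so the patch presumably makes the version check always take the same branch.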

Last Thursday was a holiday, so I spent it working on DSx86. I had gotten Microsoft Flight Simulator 3 mostly working, but it had a problem where it never cleared the previous position of the instrument panel needles, so that the instruments got full of the needle images after a little while. I debugged and debugged this problem, compared the drawing between DSx86 and DOSBox for hours! At some point it looked like the EGA Read Mode 1 was not working properly, but I had just fixed this in the previous version, so I ignored that and looked for the problem elsewhere. After almost 6 hours(!) of debugging, I finally got to a point where the problem could not sensibly be anywhere else than in the EGA Read Mode 1 handling. And when I finally figured it out, it was one of those hit-your-head-with-your-palm moments: When I did the Read Mode 1 handling fix, I had fixed the 8-bit version, but forgot that I had a separate 16-bit code version, which also had the same problem! So, if you are a programmer, don't copy the same code for two different routines like I did, but make it a separate function, or at least a macro!

Well, after this fix FS3 seems to be working fine now. It is one of the nostalgic games for me, as way back when I had an Amstrad PC1640 (which was not quite fast enough to run it) my father had a laptop with a 286 processor, so I used to play FS3 on his machine whenever he did not need his PC for something more useful. :-)

This weekend I have continued studying games that behave erratically. I know I have sort of promised to improve the mouse features and possibly implement better scaling functions for the next version, but I have been somewhat distracted by these misbehaving games. I find it more interesting finding problems in my core emulation code than adding or improving features that already mostly work. I'll try to focus on these mouse and scaling issues during the next weekend before I release the next version.

A couple of other games that had mostly graphics problems are Star Control 2 (which had a wrong palette) and God of Thunder, which displayed a garbled title screen and had graphics issues in the game itself (after I had added the missing graphics opcodes). The palette problem in SC2 was caused by it using the VGA Palette Read Index register to set up the VGA Palette Write Index! I only noticed from the DOSBox sources that the read index actually affects the write index as well. So, now the palette is OK in the SC2 intro.

God of Thunder in turn used several different Mode-X modes, for example the title image uses mode 320x400, for which I hadn't coded support for the VGA Offset register (as the game actually sets the screen up as 328x400 pixels). The main game screen uses 320x240 mode, where I had a bug in the VGA Line Compare Register handling so that the split screen handling (the lower part with the health etc.) did not work properly but showed a copy of the top of the actual game area. Both of these will be fixed in the next version. To keep the aspect ratio somewhat sensible I actually prescale the 320x400 screen down to 320x200 when displaying it (which is why the title screen image below is smaller than the game image).

 

I am currently looking into problems in Abandoned Places and Jumpman 2, which both attempt to execute data instead of code. I haven't yet figured out the problem, but as this type of a problem is pretty severe and can affect various games, I really want to get this fixed. After those fixes I'll probably start working on the mouse support and scaling stuff.

May 9th, 2010 - Version 0.12 released!

This version has a long list of changes, but they are mostly minor fixes and improvements. Here is the list:

RyouArashi has created a small DSx86 configuration program that runs in DOS, so you can run it inside DSx86! Makes it easy to configure the settings for various games when you can do everything on your Nintendo DS. The link to the Google Code page for this software is on my Download page.

I have spent a lot of time debugging some more difficult problems, especially on Saturday, but was not able to fix most of them. It seems that DSx86 is currently not compatible with R4 Slot-1 DS cards, for some reason. I tried to use the iDeaS emulator with DSx86 patched with the R4 DLDI driver, and I get the same problems as some R4 users have reported. The most frustrating thing is that I cannot get either the iDeaS or the DSx86 built-in debugger to work properly, so I cannot debug the problem! So, looks like this version of DSx86 will still not work properly with R4 devices, sorry.

I also debugged various other games that have been reported misbehaving, like Jimmy White's Whirlwind Snooker and Moonstone. The first one has a similar problem to the one that causes the "Packed file corrupt" message, that is, the segment register wraps downward to 0xFDE0 or something like that, and it will then start writing data to invalid locations in memory, corrupting some DSx86 internal data areas as well. This would probably need a proper LOADFIX implementation to work. I haven't yet figured out the problem in Moonstone where it hangs the touchscreen. Usually such hanging is caused by ARM7 crashing, but it still plays audio fine after this, so I'm not sure what happens there. I have narrowed down the situation where this happens, so a couple of hours worth of debugging should tell me where the problem is.

There are many issues with the mouse emulation, especially with the touchpad mouse emulation, but those I have not looked into at all yet. I hope to make a lot of improvements to the mouse emulation in the next version. Also, Windows 2.03 wants to use a PS/2 mouse, which I don't support yet, so don't be surprised that the mouse does not work if you try running Windows 2.03 on this version.

Thanks to all of you who sent me debug logs, this time I think I actually managed to implement fixes from nearly all of the logs I have received. There are still issues needing fixing from the earlier logs I have received, but I'm starting to catch up so that my TODO list does not keep growing larger and larger with each release!

May 2nd, 2010 - Bug fixes and minor improvements

Last week I decided to finally look into the INT03 anti-debugger technique used by games like Castle Master. It took a lot of debugging and comparing the emulation progress opcode by opcode between DOSBox and DSx86, but I finally figured out the problem in DSx86. The major issue was that I did not support the Trap Flag properly, and after I added that (which was not all that simple), there still remained some issues with the possible hardware IRQ happening in the middle of a trap interrupt etc, but I finally managed to implement all the features that anti-debugger trick needs to work. Thus, Castle Master should be running in the next version, same as many other games that have so far given "Unsupported Opcode" errors where the opcode is "INT 03".
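
One reason the Trap Flag interacts with hardware IRQs is the way an 8086 enters any interrupt: it pushes FLAGS, CS and IP, clears both the Trap Flag and the Interrupt Flag, and then loads the new CS:IP from the vector table at address vector*4. The sketch below shows just that entry sequence in C. It is not the DSx86 code (which is ARM assembly), and the Cpu struct and function names are invented for this illustration.

```c
#include <stdint.h>

#define FLAG_TF 0x0100u  /* Trap Flag: bit 8 of FLAGS */
#define FLAG_IF 0x0200u  /* Interrupt Flag: bit 9 of FLAGS */

/* Hypothetical CPU state for this sketch; mem points at 1MB of emulated RAM. */
typedef struct {
    uint16_t flags, cs, ip, ss, sp;
    uint8_t *mem;
} Cpu;

/* Real-mode physical address: segment*16 + offset, wrapped to 20 bits. */
static uint32_t phys(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

/* Push a 16-bit value on the emulated stack (little-endian). */
static void push16(Cpu *c, uint16_t value)
{
    c->sp = (uint16_t)(c->sp - 2);
    c->mem[phys(c->ss, c->sp)] = (uint8_t)(value & 0xFF);
    c->mem[phys(c->ss, (uint16_t)(c->sp + 1))] = (uint8_t)(value >> 8);
}

/* Read a little-endian 16-bit value from a physical address. */
static uint16_t read16(const Cpu *c, uint32_t addr)
{
    return (uint16_t)(c->mem[addr] | (c->mem[addr + 1] << 8));
}

/* Enter interrupt 'vector' (1 = single step, 3 = breakpoint, and so on). */
void enter_interrupt(Cpu *c, uint8_t vector)
{
    push16(c, c->flags);            /* the stacked FLAGS keep TF as it was */
    push16(c, c->cs);
    push16(c, c->ip);
    c->flags = (uint16_t)(c->flags & ~(FLAG_TF | FLAG_IF));
    c->ip = read16(c, (uint32_t)vector * 4);
    c->cs = read16(c, (uint32_t)vector * 4 + 2);
}
```

Because the stacked FLAGS still contain the original Trap Flag, the IRET at the end of the handler switches single-stepping back on, which is what a game's own INT 01 or INT 03 handler can rely on.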

This weekend was spent fixing various bugs and adding missing graphics opcodes and some INT functions. Nothing major, but a lot of minor fixes. I did start work on supporting VGA 640x480x16 mode, though, and I remembered that my earlier tests with Windows 2.03 halted when it wanted to use that graphics mode. So, I coded just the essentials for that mode and used the 640x200 -mode screen refresh routine (so only the top part of the 640x480 screen will be visible) and then tested how far I can get with Windows 2.03. Somewhat to my surprise, after adding a couple of new EGA graphics opcodes I got Windows to load!

It was interesting that Windows did not actually need any more files than what you see in the picture above (the TEST.COM is my own small tester program that I use with the actual test programs), so I could test it in No$GBA with the bundled version of DSx86. I doubt Windows will run in any usable way yet in the next version, but it should start at least. Certainly it sounds funny to state that "I can run Windows on my Nintendo DS"! :-)

After I got Windows to start I decided to study the SB digital audio problems in Master of Orion. I got the audio to play correct data instead of noise (I had assumed a certain order of the DMA commands that start the SB digital audio DMA transfer, and MOO used a different order, so the start address got corrupted) and I also spent quite a bit of time getting the Port I/O problem fixed. The game actually reads the DMA ports to determine how far the current DMA transfer has progressed, so after some thought I decided that it might be best if I use the FIFO to send that information from ARM7 (which actually performs the SB DMA audio transfer emulation) back to ARM9 at some suitable intervals. That was a bit difficult to get working reliably, but now it seems to work. There is still a strange problem where MOO stops the animation in DSx86 waiting for the complete audio to finish, while in DOSBox it starts new digital audio playing as soon as the animation reaches the suitable phase. I suspect it has something to do with the DMA progress check, but that works fine now as far as I can tell. This I still need to debug further.

I have also received many new debug logs, thanks for those! Many of the issues will be fixed in the next version, but some will still remain for future versions.

Apr 25th, 2010 - Version 0.11 Beta released!

This version includes the major changes mentioned in the previous blog post, and also various other improvements and fixes. Here is a list of the changes:

There are still a lot of issues in my TODO list which I did not have time to fix for this version, so I will continue working on these. Please keep sending me log files from this version for games that have issues!

Apr 18th, 2010 - Master of Orion, more EMS, and Touchpad Mouse

The next version 0.11 of DSx86 will run Master of Orion! Last week I worked on Mode-X and EMS handling improvements, and I got MOO to start the actual game. I haven't played it much yet, but it seems to run fine.

 

After I got MOO to run, I noticed that the current D-Pad mouse emulation is rather awkward. It looks like MOO is very mouse-oriented and has clickable buttons on the screen for all the commands and other options, so on Saturday morning I decided to see how much work it would be to swap the screens and use the touchscreen for the mouse handling. MOO is such a big game that I could not easily test it in No$GBA (by including all its files into the DSx86 project), so I searched for other mouse-specific games in my DOS test game directory. I found Electranoid, which I had tried to run some versions ago, but it needed mouse emulation and more EMS memory than I had made available back then. However, now I got it to run, as I have increased the EMS memory shown to DOS.

I had used a static block of 512KB for EMS emulation until now, and I noticed that I have about 1.5 megabytes of heap space available even after that. So, I decided to remove the static EMS block, and instead use 1.5MB of the heap for my EMS memory, which still leaves about 512KB of heap free for system and libFAT stuff. I don't want to use all of the heap space just in case those functions allocate memory for their internal buffers or stuff like that. Anyways, the total EMS memory will be 1.5MB on DSx86 version 0.11, of which about 1.3MB is free after 4DOS has taken its swap space.
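
To put that number into EMS terms: standard EMS works in 16KB pages that are mapped into a 64KB page frame (four slots), so 1.5MB comes to 96 pages. The C sketch below shows that arithmetic and a simple page frame lookup. The post does not describe how DSx86 actually organizes this, so the EmsFrame type and the function names are assumptions made for the example.

```c
#include <stdint.h>

#define EMS_PAGE_SIZE   0x4000u                 /* 16KB per EMS page */
#define EMS_FRAME_PAGES 4u                      /* 64KB page frame = 4 slots */
#define EMS_TOTAL_BYTES (1536u * 1024u)         /* 1.5MB taken from the heap */
#define EMS_TOTAL_PAGES (EMS_TOTAL_BYTES / EMS_PAGE_SIZE)

/* Hypothetical mapping state: logical page in each frame slot, -1 = unmapped. */
typedef struct {
    int logical[EMS_FRAME_PAGES];
} EmsFrame;

void ems_init(EmsFrame *f)
{
    for (unsigned i = 0; i < EMS_FRAME_PAGES; i++)
        f->logical[i] = -1;
}

/* Map (or, with logical == -1, unmap) a logical page into a frame slot.
   Returns 0 on success, -1 on an invalid slot or page number. */
int ems_map(EmsFrame *f, unsigned slot, int logical)
{
    if (slot >= EMS_FRAME_PAGES)
        return -1;
    if (logical < -1 || logical >= (int)EMS_TOTAL_PAGES)
        return -1;
    f->logical[slot] = logical;
    return 0;
}

/* Translate an offset inside the 64KB page frame to an offset inside the
   EMS heap block, or -1 if that part of the frame is not mapped. */
long ems_translate(const EmsFrame *f, uint32_t frame_offset)
{
    if (frame_offset >= EMS_FRAME_PAGES * EMS_PAGE_SIZE)
        return -1;
    int logical = f->logical[frame_offset / EMS_PAGE_SIZE];
    if (logical < 0)
        return -1;
    return (long)logical * (long)EMS_PAGE_SIZE
         + (long)(frame_offset % EMS_PAGE_SIZE);
}
```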

With that much EMS memory I could still fit Electranoid into the DSx86.nds memory space and was able to test my new touchpad mouse functions with No$GBA. I got the new mouse emulation to work pretty well after just a few hours of work. This made Master of Orion much more playable, as you can just click on a UI button without needing to move the mouse cursor first.

I made the SELECT button behave as a toggle between the keyboard and mouse usage of the bottom screen, so you can swap the screens easily during gameplay, whenever you need to type your name or stuff like that into the game. If you have mapped the SELECT key to a PC key, the SELECT key will emulate the mapped PC key when in keyboard mode, but will swap the screens back when in touchpad mouse mode. The current D-Pad mouse emulation mode is still available, for those games that might work better with that kind of mouse handling.

Today (Sunday) I have been refactoring the internal DOS file functions of DSx86 (a scary task!) to use a proper System File Table. This is needed for file handle duplication, among other things, which in turn seems to be needed by 4DOS when running more complicated BAT files. I just got the new functions to work sufficiently so that the bundled 4DOS starts when running in No$GBA, but I haven't yet dared to test it on my real hardware, using libFAT. I'll continue this work during the next week, as such an extensive change requires quite a bit of testing. I have also received quite a few debug logs from version 0.10, thanks for those! Fixing the issues in those will keep me busy for the next week and weekend.

Apr 11th, 2010 - Version 0.10 Beta released!

This is not version 0.08 alpha, but version 0.10 beta! I decided to increase the version number and move DSx86 forward from the Alpha state with this version, as it has quite a lot of changes and new features. This also means it probably has quite a lot of new bugs as well, so everything might not work quite right yet. But, this version does run Wolfenstein 3D, at least! :-)

Here is the almost complete list of changes I made into this version (direct from my updated TODO list). If you recognize the game name (in parentheses), feel free to test the game again in this version, you should get further in the game now!

The new Mode-X support requires its own complete set of graphics opcodes, pretty much similarly to the EGA handling. I have only added very few of those opcodes, so it is extremely likely that any other games besides those I have tested that use Mode-X will crash with an unsupported opcode error. Please just send me the log file for all such games, and I'll add those opcodes to the next version! Btw, if a game displayed several small screens in the previous DSx86 version, it most likely used Mode-X, so please test it again in this version.

The past couple of days I have been working on improving the EMS memory support, as it is currently the only major obstacle in making Master of Orion (which uses Mode-X) run in DSx86. I actually tested MOO by giving it the full megabyte of EMS in DSx86 and got it to display all of the intro and go into the initial menu, but the graphics were pretty badly garbled as it kept receiving wrong data from my broken EMS handling. Sadly, so far I haven't been able to fix my EMS implementation and am having a hard time figuring out what exactly goes wrong there, so this version still has the limited amount of EMS enabled. Also, some games (like MOO and Wolfenstein 3D) want to check the availability of EMS using the EMMXXXX0 device, which is not yet supported in DSx86. That's why Wolfenstein 3D does not see the EMS memory (in case you were wondering :-).

I still have quite a lot of issues remaining in my TODO list, and I'll keep on working on the EMS memory problem, but feel free to send me your log files and information about other issues in this version again! Also, let me know if you find the above style of reporting fixed issues by game useful.

Apr 4th, 2010 - Mouse work

The last couple of days I have been working on the mouse support. I decided to try emulating the mouse cursor using a sprite, and as I had not done anything with sprites on a Nintendo DS before this, it took me a while to learn the stuff needed to work with the sprite engine. I got the mouse cursor sprite to work, but I am not quite sure yet whether using a sprite is the best method to do this, as I am using a scaled background and I would need to scale the mouse sprite as well. Also, it looks like most games use their own software mouse cursor, so my spending a lot of time on the sprite cursor might go somewhat to waste. But, at least I learned how to do stuff with the Nintendo DS sprites. :-)

Most of the mouse operations are currently supported, both in graphics modes (using the sprite) and in text modes (where I draw the mouse cursor similarly to how a real mouse driver does it, by changing the character cell background color). Both of these still need a little bit of work, especially for the scaled screen modes, but in principle the mouse handling works, so it will be included in version 0.08.

After I got the mouse to work somewhat, I moved on to the other big issue I want to work on during my Easter vacation, which is adding support for higher-resolution graphics modes. I started with the CGA 640x200 monochrome mode, as it is the easiest and I can use it to develop the new screen zoom and scale methods that I need. My current plan is to continue using the current 512x256 pixel background even in the high resolution modes, as all of these modes use screen blitting from the virtual screen memory to the physical VRAM. In the zoomed mode I can simply copy a 256x192 pixel window from the virtual screen memory, and when the screen is scaled I think I'll interpolate from the input 640 pixels in a row to 320 output pixels in the DS VRAM, and then scale this to the 256-pixel wide LCD screen. I could perhaps also set up a 1024x512 pixel background and then copy the whole 640x480 pixels (in the highest-resolution VGA mode) and then use the hardware scaling to scale this to 256x192, but I don't think that will result in a very clear display and might also be slower.
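
As an illustration of the 640-to-320 step, here is one possible way to do it in C for the monochrome mode: each pair of neighbouring 1-bit pixels is collapsed into one output pixel whose value is the number of lit pixels in the pair (0, 1 or 2), which could then index a three-entry black/grey/white palette. This is only a sketch of the idea; the function name, the buffer layout and the three-level palette are assumptions, not the actual DSx86 blitting code.

```c
#include <stdint.h>

/* Halve one 640-pixel monochrome row to 320 pixels.
   in:  80 bytes, 8 pixels per byte, most significant bit = leftmost pixel.
   out: 320 bytes, each 0, 1 or 2 = number of lit pixels in the source pair. */
void halve_mono_row(const uint8_t in[80], uint8_t out[320])
{
    for (int i = 0; i < 320; i++) {
        unsigned byte  = in[i / 4];            /* 4 output pixels per input byte */
        unsigned shift = 6u - 2u * (unsigned)(i % 4);
        unsigned pair  = (byte >> shift) & 3u; /* two neighbouring source pixels */
        out[i] = (uint8_t)((pair >> 1) + (pair & 1u));
    }
}
```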

Anyways, I'm on vacation for the whole next week, so I can experiment with different methods to make the high-res modes supported. I am also testing Windows 2.03 (as it uses the 640x200 CGA mode in the logo page), but I doubt I'll get Windows actually running in the next version yet. It uses very low-level DOS and EMS-memory calls, so I need to improve those quite a bit before Windows will run.

Oh, when I debugged Windows I noticed I had a pretty severe bug in the mov reg16,[BP+SI] and mov reg16,[BP+DI] opcodes, they used BX register instead of BP for the memory access! Strange that my tester program had not detected this, but this may have caused all sorts of weird behaviour in various games. This will be fixed in 0.08 version.
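
For reference, this is how the four register-pair addressing forms of the ModRM byte (mod=00, r/m=0..3) work on an 8086, written as a small C sketch. The Regs struct and the function names are invented for the example. Note that the BP-based forms also default to the SS segment instead of DS, which is a second way in which mixing up BX and BP reads the wrong memory.

```c
#include <stdint.h>

/* Hypothetical register file holding only what this sketch needs. */
typedef struct {
    uint16_t bx, bp, si, di, ds, ss;
} Regs;

/* 16-bit offset for ModRM mod=00, r/m=0..3 (only these four forms are handled):
   0 = [BX+SI], 1 = [BX+DI], 2 = [BP+SI], 3 = [BP+DI]. */
uint16_t ea_offset(const Regs *r, unsigned rm)
{
    switch (rm & 3u) {
    case 0:  return (uint16_t)(r->bx + r->si);
    case 1:  return (uint16_t)(r->bx + r->di);
    case 2:  return (uint16_t)(r->bp + r->si);
    default: return (uint16_t)(r->bp + r->di);
    }
}

/* Default segment when there is no override prefix: SS for BP forms, else DS. */
uint16_t ea_default_segment(const Regs *r, unsigned rm)
{
    return (rm & 2u) ? r->ss : r->ds;
}

/* Physical address = segment*16 + offset, wrapped to 20 bits. */
uint32_t ea_physical(const Regs *r, unsigned rm)
{
    uint32_t seg = ea_default_segment(r, rm);
    return ((seg << 4) + ea_offset(r, rm)) & 0xFFFFFu;
}
```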

Mar 28th, 2010 - Version 0.07 released!

This version has the following major changes/improvements:

I added all the DOS functions that Norton Sysinfo needs, so I can now run it on the real hardware and check the CPU emulation speed. I also added (partially faked) disk parameter block handling, which reads the data from the SD card partition and presents it like it was a hard disk (I have a 2GB SD card in my DS Lite). It looks to me like Sysinfo reads the media descriptor byte at the wrong offset, it should be 0xF8 and I have put 0xF8 to the offset I believe is correct, yet Sysinfo displays it as 0xFF.

 

The CPU speed bar changes between 10.6, 10.7 and 10.8 depending on whether the screen update mode is 60 FPS, 30 FPS or 15 FPS. Back in November last year the CPU speed showed 11.3 when running on real hardware, so my CPU core has gotten a little bit slower while I have been adding features to it. I'll probably look into optimizing it further after I have added the most important missing features.

Plans for the next two weeks (which include my Easter vacation) are to add mouse support, and then I would like to implement more graphics modes. Probably the VGA ModeX would be the most useful, although the higher-resolution EGA/VGA modes might be somewhat simpler to implement, as I can use the same opcodes, only the screen blitting function needs to be changed.

Mar 21st, 2010 - DOS improvements & bug fixes

This weekend I worked on the various unsupported DOS interrupts that I have noticed on the debug logs I have been receiving. So far I have added about half of the dozen or so unsupported DOS features on my TODO list. During the past week I debugged some games that behaved strangely in DSx86, and found and fixed several bugs in the code while doing this:

I haven't started on the mouse emulation yet, and I doubt I'll add that for the next version. My first priority is to get more games running properly and without weird crashes. Perhaps I'll work on the mouse emulation on my Easter vacation. For now, I'll continue adding the missing DOS features and fixing other problems in the debug logs I have received.

Mar 14th, 2010 - Version 0.06 released!

This version contains some user interface changes in addition to various internal changes:

I planned to have mouse support in this version, but it turned out to be a much bigger issue than I had thought. I can't add a partial support, as that might make games that currently run crash into debugger with an unsupported mouse INT 33 function. So I'll need to code this properly, and I think that would take a couple of weekends. So, perhaps in the next version, but no promises.

I also tested a couple of new games, Swap as was mentioned in the previous blog post, Simcity demo, and WORLD CLASS LEADER BOARD GOLF by Access Software. Simcity still has a "division by zero" problem, which does not happen in DOSBox, so that still needs some work. The golf game seems to work fine, though. It uses "REALSOUND" speaker sounds, which I believe means digitized sounds, and those are not supported properly in DSx86 yet. I might add support for those if I find a simple way to do that, but currently it just plays static.

There have been quite a few unsupported INT call problems in the debug logs that I have received, however for these I have not done anything in this version. I looked at the types of INT calls they were about, and noticed that a great majority of them are using various DOS features that I haven't supported yet in DSx86. These will be my focus for the next version, along with the mouse support.

Mar 7th, 2010 - EGA refactoring

First off, I just yesterday noticed that I had made a mistake when I prepared version 0.05 for release: I had forgotten to disable the conditionally included Memory Watch code from the CPU emulator! I had used the Memory Watch code to break into the debugger if an opcode writes to the BIOS F000:0000 address (which my buggy EGA code did at some point), but then forgot to turn the feature back off after I fixed the EGA code. This Memory Watch code causes a noticeable slowdown to the whole emulation, as it checks a memory address after every single opcode is executed! So, if you are wondering why 0.05 feels slower than 0.04, this is the cause. Sorry about that, I'll try to be more careful in the future. Oh, and another thing, it seems Commander Keen 4 has an unsupported EGA opcode in the actual game. I only tested the demo, and it does not have this issue, so I did not notice it. This should be fixed in the next release, though.

During the past week I did not have time to work on DSx86 at all, as we got about 30cm more snow here, and all my free time went to shoveling snow from my yard! I only got to working on DSx86 this Friday, and immediately began a complete rewrite of the EGA graphics code. I decided to code proper EGA read and write subroutines, which handle all the various read/write modes, functions, and Set/Reset features of the EGA hardware. The separate EGA opcodes then call these subroutines as needed. This code will run somewhat slower than the previous EGA code that had everything inlined to each of the opcodes, but now the subroutines are in ITCM so perhaps the difference is not all that great. And adding all the features to all the separate opcodes would have made the code quite huge. This improved EGA implementation now finally got rid of the graphics glitches in Duke Nukem 2.

I used DOSBox sources as a reference, and noticed that DOSBox actually has a buggy implementation of Write Mode 3: It should rotate the host byte before all the other operations, but DOSBox does not have the rotate code in the Write Mode 3 handler. It is possible that no game ever uses Write Mode 3, so this bug might never actually manifest itself. In fact, I decided to copy the bug into DSx86, as that saves a reasonable amount of code. :-)

I have now added pretty much all of the unsupported EGA opcodes from the debug logs I have received for versions 0.04 and 0.05. I also debugged the game Swap, in which the cursor left a trail of corrupt graphics when it was moved. I found out that the game used opcode mov al,[si] to read the original data below the cursor from the EGA memory. This was a problem as I have only supported EGA memory access with the ES segment (or with the DS segment in the string opcodes). I coded a special case to the mov reg8,r/m8 opcode for when the DS segment points to the graphics memory, and that fixed the problem in Swap. However, I'll need to rethink my whole approach to graphics memory handling in the future, as there will surely be many more cases where the DS segment is used with various opcodes to access the graphics memory.

I also downloaded Street Fighter II and SimCity demo, which both have graphics issues, and will continue debugging them to see what causes these problems.

Currently I am working on the INT 03 problem that happens in games like Castle Master. Simply supporting that opcode did not fix the issue in Castle Master, so I am investigating this further. It looks to me like the INT 03 (which is the debugger breakpoint interrupt) is used in the game to detect whether it is running inside a debugger, which in turn is most likely some sort of anti-hacking feature in the game.

I'll continue working on these issues for the next week, and hopefully get 0.06 ready for release then next weekend. Thanks again to all of you who keep testing DSx86 and sending me the debug logs!

Feb 28th, 2010 - Version 0.05 released!

Finally I got version 0.05 released! The biggest changes since 0.04 are the improved EGA support (so that Commander Keen 4 runs), and I also added the blitted screen update mode for CGA graphics mode. In earlier versions only the Direct mode was supported, now you can select whether you want to use blitted (60 FPS, 30 FPS or 15 FPS) or direct screen update.

Speaking of screen update modes, I don't think I have properly described what the differences between those direct and blitted modes are, and how they relate to the different graphics modes that DSx86 supports, so I think it is time I do that. First, the difference:

Next, let's look at the different graphics modes supported by DSx86, and how they use those screen update modes:

In the future I will continue working on the EGA mode, and at some point I'll expand this to the VGA Mode-X style graphics modes (as used by Wolfenstein 3D and Master of Orion, for example). Those modes also need a combination of direct and blitted mode, as they use the VGA registers and the full 256KB of VGA VRAM. They are faster to blit to the screen though, as they use the same 8-bit pixels as the Nintendo DS VRAM, so no conversion is needed. I'm also thinking of switching to a "dirty buffer" approach with the text mode support, so that I could drop the direct mode support and switch to blitted mode, without killing all the performance.

Anyways, hope you like the new version, and sorry I haven't had time to go through all the debug logs you have sent me. I'll continue working on those reports for the 0.06 version, and please keep on sending the log files from 0.05!

Feb 21st, 2010 - Commander Keen 4

Okay, during last week I got my PC issues solved, so that by Thursday I was back in business with coding DSx86. I had managed to get Commander Keen 4 to start, but the display was completely garbled as soon as it went into EGA mode. This weekend I concentrated my coding efforts on improving the EGA support enough to make CK4 run properly.

The first big issue was the initial two-way horizontally scrolling huge "COMMANDER KEEN" text at the very beginning of the game intro. I debugged the EGA register access the game uses, and noticed that it changes the EGA Offset register (which determines the length of a screen row in video memory, in two-byte units, with a normal value of 0x14) to 0x7C, which corresponds to a 1984 pixels wide display! No wonder my fixed 320x200 display blitting code could not handle that. After I fixed the blitting code to properly handle that register, I got the scrolling text to display, but there were quite a few additional problems remaining. The vertically scrolling "An ID Software Production" text jumped around and left extra pixels around the screen. When starting the game, every second frame was completely garbled. The whole system hung when the Star Wars-style scrolling text should have appeared, etc.
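
The numbers above can be checked with a little arithmetic. In the planar 16-color EGA modes each byte of a plane holds 8 pixels, and the Offset register value times two gives the row length in bytes, so 0x14 gives 40 bytes (320 pixels) and 0x7C gives 248 bytes (1984 pixels). A small C sketch of this, with function names made up for the example and assuming that factor-of-two addressing:

```c
#include <stdint.h>

/* Row length in bytes per plane, assuming the Offset register counts
   2-byte units as in the standard 16-color EGA graphics modes. */
unsigned ega_row_bytes(uint8_t offset_reg)
{
    return offset_reg * 2u;
}

/* Virtual screen width in pixels: 8 pixels per byte in each plane. */
unsigned ega_row_pixels(uint8_t offset_reg)
{
    return ega_row_bytes(offset_reg) * 8u;
}

/* Byte address (within a plane) of pixel (x, y), given the display start
   address. A blitter has to step by ega_row_bytes() per row rather than
   by a fixed 40 bytes. */
unsigned ega_pixel_byte(uint8_t offset_reg, unsigned start,
                        unsigned x, unsigned y)
{
    return start + y * ega_row_bytes(offset_reg) + x / 8u;
}
```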

It took me pretty much the whole weekend to get most of these problems fixed. CK4 used a lot of various EGA tricks that I hadn't coded at all yet, for example:

All these are standard EGA/VGA features which I would have needed to add at some point in any case, so CK4 was a pretty good testbench for these somewhat advanced EGA features.

 

However, when I tested the version of DSx86 that runs CK4 fine, I noticed that none of the EGA games that ran on 0.04 version run any more! This is why I can not release this version yet, I'll need to check which of my changes broke some earlier working code.

It'll probably take me all of next week to make sure the new EGA code works properly, and I also need to fix more of the issues in the log files I have received (thanks again for those!). I hope to have version 0.05 available during the next weekend, unless something unexpected happens (again).

Feb 14th, 2010 - New PC

I received the parts I ordered for the new PC last Friday, so this weekend was spent on building it. Sadly it is still not quite finished, and thus I did not have any time to work on DSx86. I did not have any big problems hardware-wise when building the new PC, but various software and driver issues turned up which required me to change my original plans many times over.

I'm hoping to get the PC issues fixed during the next week, so that I could get back to working on DSx86 next weekend. There are some larger bugs in my EGA routines which turned up when I tested Commander Keen 4, so those I'll then continue working on when I get back to DSx86 coding.

Feb 7th, 2010 - INT and CGA fixes

Since my main development PC is somewhat broken, I installed the latest devkitARM on my laptop. I have been using the same old devkitARM version for DSx86 development since I started the project (as I hadn't dared to upgrade in case DSx86 stops working), but I had only a couple of very minor issues when compiling DSx86 with the latest development kit (r27). Also, it seems that the disk access (SD card access, actually) using libFAT is much faster in this version than in the old version I had been using, so the 0.05 version of DSx86 should load games faster than before.

My laptop is quite old and not very ergonomical to work on, so I didn't spend as much time coding DSx86 as I usually do on weekends. I concentrated my efforts on finding a couple of problems in the old Sopwith 1 CGA game. I had noticed that it does not go to graphics mode at all on DSx86, so I debugged it in DOSBox in trying to find out what method it uses to go to graphics mode. Turns out it uses IRET opcode to call the BIOS INT 10h address! I hadn't coded any support for such a weird calling method, I only checked the INT and JMP opcodes to determine if the address is in the reserved area in the BIOS where all the emulated interrupt handlers of DSx86 are.

This problem made me realize that my current method of trapping the interrupt addresses is not very robust, so I changed the method completely. Now I use the undocumented opcode 0xD6 (SETALC) as a flag to mark a software interrupt entry. If the CS:IP is between F000:0000 and F000:00FF when this opcode 0xD6 is handled, it means an entry into an emulated software interrupt; otherwise the opcode behaves like the SETALC opcode normally does. This method is quite similar to how DOSBox handles the software interrupts, except that DOSBox uses the unassigned modrm bytes of opcode 0xFE to flag software interrupt entries. Having a single-byte opcode as a flag suited the DSx86 architecture better.
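
As a rough illustration of the 0xD6 check described above (not the actual DSx86 code, which is written in ARM assembly), the decision could be sketched in C like this. The Cpu struct, the function name, and the assumption that the stored IP is the offset of the 0xD6 byte itself are all hypothetical; the sketch also only compares the segment value against F000 and ignores other segment:offset pairs that alias the same physical address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU state holding only the fields this sketch needs. */
typedef struct {
    uint16_t cs;     /* code segment of the 0xD6 opcode */
    uint16_t ip;     /* offset of the 0xD6 opcode itself (assumption) */
    uint8_t  al;     /* AL register */
    bool     carry;  /* x86 Carry flag */
} Cpu;

/* Returns true when opcode 0xD6 marks an entry into an emulated software
   interrupt, or false when it was executed as a normal SETALC. */
bool handle_opcode_d6(Cpu *cpu)
{
    if (cpu->cs == 0xF000 && cpu->ip <= 0x00FF) {
        return true;                      /* inside F000:0000..F000:00FF */
    }
    /* Normal SETALC behaviour: AL = 0xFF if Carry is set, else 0x00. */
    cpu->al = cpu->carry ? 0xFF : 0x00;
    return false;
}
```

With this shape an IRET, JMP, CALL or INT that lands in the reserved area is trapped the same way, because the check happens when the byte at the target address is executed, not when the jump is made.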

After that fix the game went into CGA graphics mode fine, but there was still a problem with a REP MOVSB command it used to draw the land. I noticed it used segment address B900 instead of the CGA graphics memory start address B800 to move the data, and this made me realize that I had a problem in the way I handle the graphics memory addressing. I assumed (stupidly) that the segment register always points to the start of the graphics memory and the actual location is determined solely by the index register (SI, DI or BX). Of course this does not need to be the case, so I had to fix all my CGA memory addressing so that it is based on the correct physical "segment*16+offset" address. I believe I still have this same problem in the other graphics modes besides CGA, so I need to fix this before releasing version 0.05. Anyways, after those fixes Sopwith 1 runs on DSx86. However, the game does not seem to have any adjustment for the speed of the PC, so it runs really fast and is pretty much unplayable, same as in DOSBox. I don't think that is much of a problem, though, as there is a Nintendo DS version of the game.
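
To make the "segment*16+offset" point above concrete, here is a minimal C sketch of the address calculation. The function names and the 16KB CGA window size are assumptions of this example, not taken from the DSx86 sources.

```c
#include <stdint.h>

#define CGA_BASE 0xB8000u   /* physical start of CGA graphics memory (B800:0000) */
#define CGA_SIZE 0x4000u    /* 16KB of CGA memory (assumed window size) */

/* Real-mode physical address, wrapped to 20 bits as on an 8086. */
uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Byte offset into CGA memory, or -1 if the access falls outside it. */
int32_t cga_offset(uint16_t segment, uint16_t offset)
{
    uint32_t phys = phys_addr(segment, offset);
    if (phys < CGA_BASE || phys >= CGA_BASE + CGA_SIZE)
        return -1;
    return (int32_t)(phys - CGA_BASE);
}
```

For example, cga_offset(0xB900, 0x0000) and cga_offset(0xB800, 0x1000) both give 0x1000, which is why the segment value cannot be ignored.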

I also went through a couple of the log files I have received from 0.04 (thanks for those!) and fixed the easy issues. I noticed a couple of unsupported graphics opcodes in the logs where I could not determine which graphics mode this happened in, so I need to add a current graphics mode indicator to the log information for the next version. Most of the issues in the latest log files are not fixed yet, though, so this work I'll continue during the next week, while waiting for the new parts to arrive for my development PC.

Feb 3rd, 2010 - Unscheduled version 0.04 release

Last weekend, actually about an hour after I wrote the blog post below, my development PC started acting up. All the programs crashed one after another, until I got a blue screen. When I tried to reboot, it wouldn't start at all. I removed the extra SSD RAID pair I had in it, and then it booted, but again got a blue screen after a little while.

I ordered parts for a new PC (as this one is about 5 years old already) as these symptoms have not improved since. This is also why I decided to release the 0.04 version today ahead of schedule, in case this PC stops working completely.

The new version has the changes mentioned in the previous blog post, but they are not very well tested due to my PC problems. I hope this version works better than 0.03, though.

Jan 31st, 2010 - Maintenance work

This weekend I went through most of the log files I have received since releasing the 0.03 version, and fixed (or added support for) a lot of the issues in them. Thanks to all of you who have been sending me the log files, they are very useful in my attempt to improve the compatibility of DSx86!

I also worked on support for two games, Duke Nukem 1 and the original CGA Elite. I've had requests to support both of those, so I decided to work on those two specifically. Both are currently running; DN1 was only missing a couple of minor things, while Elite needed a bit more work. I hadn't supported the DOS "buffered input" call (which Elite needs for its copy protection system) at all yet, and adding it was rather difficult. It needs an active keyboard interrupt to feed it the keypresses, and the interrupt architecture in DSx86 is not re-entrant, so I could not use the existing DOS interrupt handler coded in C for the ARM. I actually had to code parts of the DOS interrupt handler in x86 ASM to handle the buffered keyboard input. It felt a bit silly to code stuff in x86 ASM for an x86 emulator, but that was the easiest way to get it to work. The routine is still missing some standard features (like INS key handling), but it is working well enough that I was able to type the copy protection words from the manual and go into the actual game.

  

I was thinking of releasing the 0.04 version today, but then decided against it as I haven't had time to test the new changes properly, and I also haven't had time to go thru every log file I have received. I'll use the next week to make sure I didn't break anything with the new changes, and possibly test a few more new games, and then release the new version next weekend.

Jan 24th, 2010 - Version 0.03 released!

For the last week I have been working on the EGA support and also on the few remaining unsupported modrm bytes. For the EGA support testing I have been using Duke Nukem 2, which is currently quite playable. My EGA support is still far from complete, and thus the game has still some visual glitches, but they don't seem to affect the gameplay.

For some reason Duke Nukem 2 seems to take FOREVER to load, and it has a tendency to crash at the end of the initial intro (I haven't yet figured out why this happens), so you might want to hit Enter at the "NEO LA: THE FUTURE" screen to skip the intro and go to the main menu immediately. Also the music lags pretty heavily at the intro, and it feels quite sluggish overall, but luckily the actual gameplay seems to run at a reasonable speed.

I spent a long time debugging a problem with the EGA palette, until I realized that DN2 uses the VGA palette registers also in the 16-color EGA mode. I hadn't realized the VGA graphics work like that, so I kept debugging the wrong places in the code. I also had to add two new SoundBlaster DSP commands, for playing 2-bit ADPCM samples and for playing silence. A bit strange that there is actually an SB command for playing silence for a certain amount of time, but I guess it can be used for some audio synchronization stuff. Anyways, DN2 is now working well enough that I thought it was time to release version 0.03.

I have also added nearly all of the 8086 opcodes and their modrm bytes; there are only a few remaining. Many of the 80286 opcodes/modrm bytes are still missing, though, and you might also get "Unsupported opcode" errors if a game uses the ES register to point to the graphics screen, as I have only coded support for those modrm bytes that I have encountered in the games I have tested. If the "Unsupported opcode" error has an opcode with "es:" in it, then this is the cause.

The next version should have practically all the normal opcodes/modrm bytes supported, and I hope to improve the graphics support as well. The current 16-color routines should work with only minor changes on all the 16-color modes, including 640x350 and 640x480 modes, so adding support to those modes for the next version might also be possible. Whether games that use those modes will be playable within a 256x192 window is another matter, though.

Jan 17th, 2010 - LOADFIX, EGA and opcode work

Packed File Corrupt handling

Today I got fed up with the Packed File Corrupt problem, and decided to see if I could handle the required "LOADFIX" functionality automatically within DSx86. I looked at a couple of games that cause this behaviour, and used a hex editor and debugger to see if I could find a pattern in their header that would detect the use of the buggy /EXEPACK linker switch that causes this problem. I found the following things in common with the EXE headers of the problem games:

Practically all EXE packers have zero RelocationItems, so that alone is not a sufficient indicator of buggy EXEPACK code, but I didn't find all of the above header settings in any packed EXE file that doesn't have the "Packed File Corrupt" problem.

I made a small code change to DSx86 so that it detects the above EXE header signature, and allocates 64KB of RAM before allocating the memory for the program and running it. I also added new code to FreeProcessMemory() so that when the process exits it checks whether an extra block of memory was allocated, and frees that as well when freeing the actual memory of the process. The end result was that the programs that have that EXE header signature only get 580KB of RAM, and don't give the "Packed File Corrupt" message any more.

I might need to adjust this detection algorithm in the future, to check also the start of the actual code (which seems to always be nearly identical to this):

1C59:0012 8CC0          MOV     AX,ES
1C59:0014 051000        ADD     AX,0010
1C59:0017 0E            PUSH    CS
1C59:0018 1F            POP     DS
1C59:0019 A30400        MOV     [0004],AX
1C59:001C 03060C00      ADD     AX,[000C]
1C59:0020 8EC0          MOV     ES,AX
1C59:0022 8B0E0600      MOV     CX,[0006]
1C59:0026 8BF9          MOV     DI,CX
1C59:0028 4F            DEC     DI
1C59:0029 8BF7          MOV     SI,DI
1C59:002B FD            STD
1C59:002C F3            REPZ
1C59:002D A4            MOVSB
1C59:002E 8B160E00      MOV     DX,[000E]
1C59:0032 50            PUSH    AX
1C59:0033 B83800        MOV     AX,0038
1C59:0036 50            PUSH    AX
1C59:0037 CB            RETF
Detecting this code before actually loading the EXE into memory is slightly more difficult than just looking at the EXE header, so I hope the current change will fix at least most of the problems.
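
As a hedged sketch of what such a code check could look like, the opcode bytes from the listing above (from MOV AX,ES through REPZ MOVSB) can be compared against the bytes at the program entry point. The function name and the choice to compare exactly these 28 bytes are assumptions of this example; since the stub is only described as "nearly identical" between games, a real check would probably need to tolerate some variation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Opcode bytes copied from the disassembly listing, 1C59:0012..1C59:002D. */
static const uint8_t exepack_stub[] = {
    0x8C, 0xC0,             /* MOV  AX,ES      */
    0x05, 0x10, 0x00,       /* ADD  AX,0010    */
    0x0E,                   /* PUSH CS         */
    0x1F,                   /* POP  DS         */
    0xA3, 0x04, 0x00,       /* MOV  [0004],AX  */
    0x03, 0x06, 0x0C, 0x00, /* ADD  AX,[000C]  */
    0x8E, 0xC0,             /* MOV  ES,AX      */
    0x8B, 0x0E, 0x06, 0x00, /* MOV  CX,[0006]  */
    0x8B, 0xF9,             /* MOV  DI,CX      */
    0x4F,                   /* DEC  DI         */
    0x8B, 0xF7,             /* MOV  SI,DI      */
    0xFD,                   /* STD             */
    0xF3, 0xA4              /* REPZ MOVSB      */
};

/* Returns true if the code at the entry point starts with the stub bytes. */
bool looks_like_exepack_stub(const uint8_t *entry_code, size_t length)
{
    if (length < sizeof exepack_stub)
        return false;
    return memcmp(entry_code, exepack_stub, sizeof exepack_stub) == 0;
}
```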

Galactic Battle

I had a plan to work this weekend on adding support for EGA, specifically mode 0x0D (320x200 with 16 colors). Last week I searched for a small game that would use that mode, and I downloaded a couple of games that weren't suitable (had a lot of extra files) until I found Galactic Battle which seemed to be just what I was looking for. It is a small Space Invaders clone that uses mode 0x0D and PC Speaker sounds.

It did have the "Packed File Corrupt" problem, and I didn't have the above fix in the code at that time, but that was easy to work around by starting 4DOS without swapping. Anyways, this Friday I then began working on emulating the 16-color mode. During the last week I had been thinking about ways to emulate this mode, and I did come up with a solution that I thought might work, so I began coding it. I managed to get the code working pretty well already on Saturday, and I took the screen copy above of Galactic Battle running in DSx86. It was quite playable, perhaps just a little bit slow.

EGA emulation

The 16-color modes are a lot more complex than any graphics mode I have so far supported. MCGA 320x200 with 256 colors is the easiest, as each pixel is simply a byte that is an index to a palette, exactly like the bitmapped background modes in Nintendo DS. The CGA mode was a little bit more complex, as it has 2 bits per pixel, but that was easily handled via a look-up table (LUT).

However, EGA and VGA 16-color modes are a different beast entirely. They use four separate memory planes, a byte in each plane has 8 neighbouring pixels, and each plane contains one bit of the 4-bit color. All these four planes share the same memory address (in segment 0xA000), and the plane that is being read/written is determined by writing a certain mask to a certain EGA/VGA I/O register. Thus, writing for example 4 neighbouring pixels of different colors might need 4 writes to the same memory position, and most likely also four reads and some bit masking so that the other 4 pixels in the same bytes don't get overwritten. A pretty complex scenario to emulate (and especially to do it fast!).
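
The planar layout described above can be modelled in a few lines of C. This is only a simplified sketch of the real hardware: it handles the plane write mask alone and ignores the latches, the bit mask register and the other write modes. The struct and function names are made up for this example.

```c
#include <stdint.h>

#define PLANE_SIZE 0x10000u   /* 64KB per plane */

typedef struct {
    uint8_t plane[4][PLANE_SIZE]; /* 0 = blue, 1 = green, 2 = red, 3 = intensity */
    uint8_t map_mask;             /* bit n set = plane n receives CPU writes */
} PlanarEga;

/* A CPU byte write goes to the same offset in every enabled plane. */
void planar_write(PlanarEga *ega, uint16_t offset, uint8_t value)
{
    for (int p = 0; p < 4; p++)
        if (ega->map_mask & (1u << p))
            ega->plane[p][offset] = value;
}

/* Colour (0..15) of pixel x (0 = leftmost) within the byte at 'offset',
   built from one bit in each of the four planes. */
uint8_t planar_pixel(const PlanarEga *ega, uint16_t offset, int x)
{
    uint8_t bit = (uint8_t)(0x80u >> x);
    uint8_t colour = 0;
    for (int p = 0; p < 4; p++)
        if (ega->plane[p][offset] & bit)
            colour |= (uint8_t)(1u << p);
    return colour;
}
```

Note how reading back a single pixel touches four separate memory locations, which is the cost that matters when the whole screen has to be converted on every blit.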

The real EGA has 4 times 64KB planes totalling 256KB of RAM, and I didn't want to spend more than this 256KB of RAM in my emulator. I also didn't want to assign less than this amount of RAM to the EGA/VGA memory, as many games use page flipping and assume that this much memory is available. So, I needed some way to make the memory behave like four different planes of 64KB each, but I also needed a way to blit this fast into the Nintendo DS VRAM, which is organized as 8 bits per pixel (that is, each byte is a separate pixel).

The straightforward method to emulate this might have been to allocate 64KB of RAM for each of the four planes, and use these planes like the original EGA/VGA uses them, each byte contains 8 pixels and the combination of the planes having a pixel set would determine the output color. However, I thought that building each output pixel (while blitting the screen) from a single bit in four different memory locations would pretty much kill the performance. There would be no practical way to use the ldmia opcode to load several words from the source buffer, and splicing each input bit to a separate output byte sounded like a really slow operation as well.

The idea I had during the week was that perhaps I could swap the way the memory is organized in the emulated EGA/VGA memory. I wanted to have all data that is needed to build an output pixel as close together in the source RAM as possible, and I also wanted the source data (when blitting) to have at least some resemblance to the output byte-per-pixel organization. So, I thought that keeping the 4 bits that are used to determine the color together might make everything faster. In the real EGA/VGA it takes 32 bits to contain 8 pixels, and I could also fit 8 pixels into 32 bits (a word) even if I used 4 bits for each pixel. So, in my current implementation each byte is actually a word, and each bit is actually a 4-bit color value.

I use a LUT to convert from an input byte (for example during a write to EGA/VGA RAM using a stosb opcode) to a word, which is then masked with a write mask based on a value written to the EGA/VGA register that controls which planes are accessed when a byte is written to EGA/VGA VRAM.

To make this emulated RAM fast to blit to the screen, I also interleaved the pixel positions so that the 4-bit pixels in a word are organized as 73625140. That allowed me to easily reorganize the word to two separate words containing 4-bit pixels 03020100 and 70605040 (or 8-bit pixels numbered 3210 and 7654) which can then be written to Nintendo DS VRAM. I copied the EGA palette to all the 16 16-color blocks so that I don't even need to clear the extra bits from the bytes, I can write the data as-is like ?3?2?1?0 and ?7?6?5?4 (after a shift right by 4 bits).

This is a snippet of the blitting code. I read 4 words and write them to 8 words, as each 4-bits-per-pixel input value is converted to an 8-bits-per-pixel output value:

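	ldmia   r1!, {r3,r5,r7,r9}		@ Load 4 words = 4*8 = 32 pixels
	mov	r10, r9, lsr #4			@ r10 = 4th word >> 4 (its pixels 4..7 now in the low nibbles)
	mov	r8, r7, lsr #4			@ r8 = 3rd word >> 4
	mov	r6, r5, lsr #4			@ r6 = 2nd word >> 4
	mov	r4, r3, lsr #4			@ r4 = 1st word >> 4
	stmia   r0!, {r3-r10}			@ Store 8 words = 32 pixels, one byte per pixel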
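
For readers who do not read ARM assembly, the same conversion can be sketched in C as below. This is an illustration only, not the DSx86 code; the function names are invented, and output_pixel() masks off the stray upper nibble purely to show which colour ends up where (the real code leaves it in place and relies on the replicated palette).

```c
#include <stddef.h>
#include <stdint.h>

/* Converts 'count' packed words (8 pixels of 4 bits each) into 2*count
   output words holding one pixel per byte. The upper nibble of each output
   byte is left as-is, like in the assembly snippet. */
void blit_words(const uint32_t *src, uint32_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t w = src[i];
        dst[2 * i]     = w;        /* bytes ?3 ?2 ?1 ?0 */
        dst[2 * i + 1] = w >> 4;   /* bytes ?7 ?6 ?5 ?4 */
    }
}

/* Colour of output pixel n (0 = leftmost), taking byte 0 of each word as
   its lowest 8 bits (little-endian order, as on the Nintendo DS). */
uint8_t output_pixel(const uint32_t *dst, size_t n)
{
    uint32_t word = dst[n / 4];
    return (uint8_t)((word >> (8u * (n % 4))) & 0x0Fu);
}
```

For example, a source word of 0x84736251 (pixel 0 has colour 1, pixel 1 has colour 2, and so on up to pixel 7 with colour 8) becomes the two output words 0x84736251 and 0x08473625.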

Here is an illustration of the changed memory layout. Hopefully you can make sense of it, as it is pretty difficult to explain clearly.

    Memory organization
    -------------------

    EGA/VGA:
    - Pixels in bits of a byte: 01234567 (where leftmost is the highest bit)
    - Colors:
        - Plane 0: BBBBBBBB (each bit set means the corresponding pixel has a blue component)
        - Plane 1: GGGGGGGG (each bit set means the corresponding pixel has a green component)
        - Plane 2: RRRRRRRR (each bit set means the corresponding pixel has a red component)
        - Plane 3: IIIIIIII (each bit set means the corresponding pixel has an intensity component)

    DSx86:
    - Pixels in bits of a word: 77773333666622225555111144440000
    - Colors in bits of a word: IRGBIRGBIRGBIRGBIRGBIRGBIRGBIRGB

    Example 1, setting two middle pixels to bright white:
    -----------------------------------------------------

    Input byte: 0x18 = 0b00011000 (highest bit is the leftmost pixel on screen)
    Write Mask: 0x0F = 0b1111 (all color components active)

    Original EGA memory result:
    Plane 0 (blue):      0b00011000
    Plane 1 (green):     0b00011000
    Plane 2 (red):       0b00011000
    Plane 3 (intensity): 0b00011000

    DSx86 memory result:
    Emulated RAM: 0b00001111000000000000000011110000

    Example 2, setting the two leftmost pixels to red:
    --------------------------------------------------

    Input byte: 0xC0 = 0b11000000 (highest bit is the leftmost pixel on screen)
    Write Mask: 0x04 = 0b0100 (red color component active)

    Original EGA memory result:
    Plane 0 (blue):      0b00000000
    Plane 1 (green):     0b00000000
    Plane 2 (red):       0b11000000
    Plane 3 (intensity): 0b00000000

    DSx86 memory result:
    Emulated RAM: 0b00000000000000000000010000000100
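
The two examples above can be reproduced with a small C sketch of the look-up table and the plane masking. This is an illustration under simplifying assumptions, not the DSx86 implementation: the names are invented, and the write is modelled as "enabled planes take the new byte, other planes keep their old bits", ignoring the other EGA/VGA write modes and registers.

```c
#include <stdint.h>

static uint32_t expand_lut[256];

/* Nibble position (0 = lowest 4 bits of the word) of pixel p, where p = 0 is
   the leftmost pixel. Matches the 7 3 6 2 5 1 4 0 interleaving, which lists
   the pixels from the highest nibble down to the lowest. */
static unsigned nibble_of_pixel(unsigned p)
{
    return (p & 3u) * 2u + (p >> 2);
}

/* Fills the table that expands an input byte (1 bit per pixel) into a word
   with all four colour bits set for every pixel whose bit is set. */
void build_expand_lut(void)
{
    for (unsigned value = 0; value < 256; value++) {
        uint32_t word = 0;
        for (unsigned p = 0; p < 8; p++)
            if (value & (0x80u >> p))
                word |= (uint32_t)0xF << (4u * nibble_of_pixel(p));
        expand_lut[value] = word;
    }
}

/* plane_mask: bit 0 = blue, bit 1 = green, bit 2 = red, bit 3 = intensity. */
uint32_t packed_write(uint32_t old_word, uint8_t value, uint8_t plane_mask)
{
    uint32_t mask32 = (plane_mask & 0x0Fu) * 0x11111111u; /* repeat mask in every nibble */
    return (old_word & ~mask32) | (expand_lut[value] & mask32);
}
```

After build_expand_lut(), packed_write(0, 0x18, 0x0F) gives 0x0F0000F0 and packed_write(0, 0xC0, 0x04) gives 0x00000404, which are the binary values shown in Example 1 and Example 2.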
	

Opcode work

During the week I added many of the missing modrm bytes. I started from the beginning (opcode 0x00 = ADD r/m8,r8) and systematically added every single modrm variation. I am currently at opcode 0x38; all the smaller-numbered opcodes have all their modrm variations handled. This was mostly copy/paste work, as for example all AND, OR and XOR opcodes behave exactly the same, only the actual operation in my opcode handlers differs.

This copy/pasting meant that the size of my CPU emulation source code increased quite a bit. Currently it has 46,507 rows and is about 1.38 megabytes in size. In version 0.02 it had 35,287 rows and was 1.08 MB, so it has grown by over 10,000 rows since then, and I still have a lot of modrm variations to add. I'm starting to worry about possible macro or label limits in the GNU Assembler, but I can of course split the file into several smaller files if needed. I'd like to keep the file as a single entity, though, as it currently has only a few well-defined dependencies on other files.

Jan 10th, 2010 - Version 0.02 released

Okay, I decided to release version 0.02 now that I have fixed the most annoying problems in 0.01. There is still a lot to do with the opcode support, and then the next big project is to add support for the EGA graphics mode. I also need to improve the current graphics mode support, as there seem to be problems with the palette handling, at least. Space Quest 4 is running but for some reason it thinks DSx86 has a monochrome display so it uses a greyscale palette, and Bad Blood is showing only a few of the colors, with all other colors being black.

This version seems to be running Leisure Suit Larry 3 fine, and I haven't had the save game corruption problem recently either. As you can see from the picture below, I've played it for long enough that I have over 500 points, so it will run at least that far without problems. I took the picture from the About page, as I thought it was funny to actually play and enjoy a game that is just a bit over 20 years old. :-) Also Solar Winds runs now with Sound Blaster digital sound effects enabled, as I added the ADPCM sample support.

You can download the new version from the Downloads page. Check out the release notes for more info about the changes, and please send me the debug logs when you run into problems in the games you test. Thanks!

Jan 3rd, 2010 - Happy New Year!

Alpha version 0.01 feedback

Big thanks to every one of you who tested the 0.01 alpha version and reported the problems you found, either by sending me the full DSx86dbg.log files or by posting on the GBADEV forum thread! The logs showed pretty clearly that DSx86 is still missing too many opcodes for it to support other games besides those I have specifically supported. Sorry about that, I thought it might be able to run some other simple games already. Also, the logs showed that quite a few games want to use the graphics mode 0x0D (320x200 16-color EGA mode). I guess this should be the next graphics mode DSx86 will support.

Opcode tester program progress

Pretty soon after I released the 0.01 version, I noticed some serious problems with Leisure Suit Larry 3. After I fixed the missing opcode in 0.01, it still would not recognize any user input, instead it always said "Bad Said Spec" whatever I tried to type. I could not find the problem by debugging the code (and I tried for hours), so I then decided that now would be a good time to finally improve my opcode tester program. Until now the program had only tested for typos in the memory access, so that my opcode handlers don't change the wrong memory location (addressing with [SI] instead of [DI], for example).

I worked on the tester program for two full days, and added pretty complete tests for all arithmetic, logical, and move opcodes. It tests all modrm byte variations that DSx86 supports, and is smart enough to skip the variations I haven't yet coded. It checks for the correct result of the operation, and also for correct CPU flag changes.

When running the improved tester program I found about half a dozen bugs in my opcode handlers, including bugs in "TEST" opcode, which the original tester program had skipped completely, as it doesn't change any memory.

Now I feel much more confident that the new opcodes and modrm byte variations I add will work properly even when I don't currently know of a game that would use them. Thus, I have been adding quite a few new opcodes these past days since the 0.01 release, for example all "INC", "DEC" and "LEA" variations are now fully supported, same as all arithmetic 16-bit operations with immediate values, and most of the "MOV" opcodes as well. Still a lot of work to do with the remaining missing opcodes, but it is much nicer to keep adding them now when I can test the new variations immediately.

Auxiliary Carry and Parity flags

One big difference between the ARM and x86 processors is that ARM has neither the Auxiliary Carry flag (which works like the normal Carry flag, but for the bottom 4 bits of a byte) nor the Parity flag (which tells whether the least significant byte of the result of the last arithmetic operation has an even number of set bits). I had hoped that games would not use these flags, but now I have noticed them both being used in games.
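
To pin down what these two flags mean, here is a small C sketch of how an emulator could compute them for 8-bit operations. The function names are made up for this example and this is not how DSx86 itself does it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Parity flag: set when the low byte of the result has an even number of
   set bits. The upper byte of a 16-bit result is ignored. */
bool parity_flag(uint16_t result)
{
    uint8_t b = (uint8_t)result;
    b = (uint8_t)(b ^ (b >> 4));   /* fold 8 bits down to 4 */
    b = (uint8_t)(b ^ (b >> 2));   /* fold 4 bits down to 2 */
    b = (uint8_t)(b ^ (b >> 1));   /* fold 2 bits down to 1 */
    return (b & 1u) == 0;
}

/* Auxiliary Carry after ADD a,b: carry out of the low 4 bits. */
bool aux_carry_add(uint8_t a, uint8_t b)
{
    return ((a & 0x0Fu) + (b & 0x0Fu)) > 0x0Fu;
}

/* Auxiliary Carry after SUB or CMP a,b: borrow needed by the low 4 bits. */
bool aux_carry_sub(uint8_t a, uint8_t b)
{
    return (a & 0x0Fu) < (b & 0x0Fu);
}
```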

The AC flag is used in Leisure Suit Larry 3. When drawing data to the screen, LSL3 uses both the Carry flag (for the upper 4 bits) and the AC flag (for the lower 4 bits) to determine whether the new data is in front of or behind whatever is currently on the screen. It uses code like this:

	CMP	BL,[DI+7FF0]
	LAHF
	TEST	AH,10
That is, it compares the current value in BL with the value in memory, then loads the AH register with the resulting flags, and then tests for the bit in AH that has received the AC flag. A somewhat complex method, but since the x86 instruction set does not contain a conditional jump that would check the AC flag, this is the most straightforward method available.

Since the ARM processor does not have anything similar, I had to code a game-specific special case to handle this. I added some code to the handler of the LAHF opcode (which is quite rare), so that it checks whether the next opcode is "TEST AH,10" (which could have no other purpose but to test the AC flag), and if so, it determines if the previous opcode is "CMP BL,[DI+7FF0]", and if so, it performs the compare again, but this time by shifting the values 4 bits left, so the resulting real carry will get the value that the AC flag would get in x86. This new carry is then stored into the AH register bit 0x10, so the following test will give the correct result. This seemed to solve the problem completely for Leisure Suit Larry 3.
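
The shift trick can be illustrated in C as follows. This is only a sketch of the idea with invented names; it models the x86-style borrow of an 8-bit compare directly, whereas the real code works with ARM registers and the ARM carry flag.

```c
#include <stdbool.h>
#include <stdint.h>

#define AH_AUX_CARRY 0x10u   /* bit of AH that LAHF fills with the AC flag */

/* Borrow of an 8-bit compare after both operands are shifted left by 4 bits.
   The upper nibbles fall off the top, so only the lower nibbles take part,
   and the ordinary carry (borrow) becomes what the AC flag would have been. */
bool aux_carry_by_shift(uint8_t left, uint8_t right)
{
    uint8_t l = (uint8_t)(left << 4);
    uint8_t r = (uint8_t)(right << 4);
    return l < r;   /* x86-style carry (borrow) of "CMP l,r" */
}

/* Stores the recomputed flag into bit 0x10 of the flags byte headed for AH,
   so that a following "TEST AH,10" sees the right value. */
uint8_t patch_ah(uint8_t ah, uint8_t bl, uint8_t mem)
{
    if (aux_carry_by_shift(bl, mem))
        return (uint8_t)(ah | AH_AUX_CARRY);
    return (uint8_t)(ah & ~AH_AUX_CARRY);
}
```

For instance, comparing 0x12 with 0x05 borrows from the upper nibble (2 is less than 5), so patch_ah(0x02, 0x12, 0x05) returns 0x12 with the AC bit set.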

Based on a log file I received, The Incredible Machine uses the parity flag. However, on closer examination of the code I believe this is actually a bug in the game. The game has code like this:

1E30:60EB 83F90A          cmp  cx,000A
1E30:60EE 7C0C            jl   60FC ($+c)
1E30:60F0 0BFF            or   di,di
1E30:60F2 7A02            jpe  60F6 ($+2)
1E30:60F4 AA              stosb
1E30:60F5 49              dec  cx
1E30:60F6 D1E9            shr  cx,1
1E30:60F8 F3AB            repe stosw 
1E30:60FA D0D1            rcl  cl,1
1E30:60FC F3AA            repe stosb
The way I interpret that is that it first checks whether it has less than 10 bytes to move. If so, it jumps to simply moving them byte-by-byte. If there are more bytes, it first checks whether the low byte of the target address has an even number of bits set (?!), and if so, it jumps to moving the data by word access, else it first moves the leading byte, and then the remaining bytes using word access.

What the author probably meant was to check whether the target address is even, and jump to word access if so. That would have been done by code somewhat like this:

	test	di,0001
	jz	60F6
That is, by testing whether the lowest bit of the address is set or not. Luckily for the author, x86 works fine with unaligned word access (unlike ARM), so this is a benign bug that causes no ill effects. What is annoying, though, is that I need to add a special case into DSx86 to handle code that shouldn't be there in the first place!
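
The difference between the two checks is easy to see in a small C sketch (the function names are made up for this example):

```c
#include <stdbool.h>
#include <stdint.h>

/* What "or di,di / jpe" branches on: the low byte of DI has an even number
   of set bits. The high byte does not affect the Parity flag. */
bool jpe_taken(uint16_t di)
{
    unsigned bits = 0;
    for (unsigned i = 0; i < 8; i++)
        bits += (di >> i) & 1u;
    return (bits % 2u) == 0;
}

/* What "test di,0001 / jz" branches on: DI is an even address. */
bool jz_taken(uint16_t di)
{
    return (di & 1u) == 0;
}
```

With DI = 0x0003 the game's check jumps straight to the word stores even though the address is odd, and with DI = 0x0002 it stores a leading byte first, which turns an aligned address into an unaligned one. The two checks agree for some addresses (0x0000, for example), but only by coincidence.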

SB ADPCM support

I also added the 4-bit ADPCM and 2.6-bit ADPCM audio support to DSx86 SoundBlaster emulation, both of which are needed by Solar Winds. It was reasonably easy to do, again using the DOSBox sources as a reference. My version does not sound quite similar yet, though, so perhaps I still have something wrong in my implementation. At least it does not crash, which is always the main thing when coding for ARM7. :-)

Annoying save game bug

I have been testing Leisure Suit Larry 3 pretty extensively, as it used to be one of my favourite games back in the early 90's. That was mostly because I had a Roland LAPC sound card (basically an MT-32 synth built into a PC ISA card), which the game supported perfectly. It even loaded new patches to the synth to make the music sound really really good (much much better than the music sounded on any General MIDI synth, for example, not to mention AdLib/SoundBlaster cards). I think the music plays surprisingly well even on my poor AdLib emulation in DSx86. It obviously does not sound great, like with the LAPC card, but it is close enough that it brings back memories. :-)

Currently there is a really annoying problem with save games of LSL3. For some strange reason DSx86 occasionally corrupts the save game directory file "LSL3sg.dir". This is pretty much the worst file to get corrupted, as losing it means you lose access to all of your save games at once.

What is curious is that usually when this corruption happens, I can see the save game names at the top row of the DSx86dbg.log file, and the LSL3sg.dir file contains only "---------------", which should be at the start of the DSx86dbg.log file! I don't see how it could swap the file pointers, especially as the DSx86dbg.log file had not been open when I saved the game! This still needs a lot of looking into.

Future plans

Well, today my two-week Christmas vacation ends, so tomorrow I need to again start working "for real". I would like to release version 0.02 with a lot of added opcodes reasonably soon, but I don't yet know when that might be. Probably in a week or two. The next big thing to add would then be the support for the 320x200 EGA 16-color mode, but that will be for version 0.03 at the earliest.

Previous blog entries

See here for DSx86 blog entries from 2009.

