During the past week I finally had some time to work on DS2x86, spending an hour or two on it almost every day. My goal was to improve the graphics transfer and blitting code so that the high resolution modes also reach at least a 30 fps screen refresh rate. I managed to reach speeds of over 60 fps in Zoom mode in all the standard graphics modes, but in the 640x480 Scale mode and in the weird Mode-X resolutions the blitting speed sadly still does not reach 60 fps. In any case, all graphics transfer is now faster than it was in the original DSTwo transfer system. Below is a table showing the current graphics refresh rates in the various resolutions, in both Zoom and Scale modes.
| Mode | Zoom | Scale | Notes |
|---|---|---|---|
| 80x25 Text | >200 fps | >200 fps | Varies by the number of changed characters |
| 320x200 CGA | 229 fps | 229 fps | |
| 640x200 CGA | 222 fps | 229 fps | |
| 320x200 EGA | 128 fps | 128 fps | When logical screen width = 320 pixels |
| 320x200 EGA | 119 fps | 119 fps | When logical screen width > 320 pixels |
| 640x200 EGA | 156 fps | 70 fps | |
| 640x350 EGA | 156 fps | 80 fps | In Scale mode 320x175 bytes transferred |
| 640x480 VGA | 156 fps | 58 fps | In Scale mode 320x240 bytes transferred |
| 320x200 MCGA | 70 fps | 70 fps | |
| 360x240 Mode-X | 56 fps | 56 fps | Used in "Settlers" |
| 320x480 Mode-X | 58 fps | 58 fps | Used in "LineWars II", 320x240 bytes transferred |
To reach these speeds in the >=350 scanline Scale modes, I had to combine two adjacent scanlines on the MIPS side before sending them as a single scanline. This will cause the graphics quality to suffer, since I don't want to go to 16-bit color mode. There I could do some palette averaging, but that would again drop the refresh rate down to around 40 fps. In the Mode-X 320x400 and 320x480 modes this joining of two scanlines is done also in the Zoom mode, to improve the aspect ratio, so those modes will suffer the most. Luckily, those are rather uncommon screen modes, so not many games are affected. In the 640x??? Zoom modes I only transfer the 256x192 pixels of the visible screen area, so those will run much faster than the corresponding Scale modes.
While testing various games with the new graphics transfer routines, I noticed a problem in Chaos Engine. It went into the game fine, but then immediately the screen got filled with seemingly random multicolour pixels. I tested the game with the previous 0.23 version, but the problem was present in there as well! So, it was not caused by the new transfer system, but something more serious. I tested also with DS2x86 version 0.22, in which the game worked fine, so the problem was obviously caused by the "major internal rewrite" that I did for version 0.23.
Next I tried to stop the game into the debugger immediately after the problem began to occur, and almost by chance I managed to get inside some graphics drawing routine, where I immediately noticed that the game uses the FS segment register. It is rather uncommon for real-mode games to use the FS and GS segment registers, and since in version 0.23 I had changed the FS and GS registers to be handled differently, this immediately made me suspect the new handling for those segment registers. The FS register had a value 0x03D2, and the graphics code tested whether a byte in that segment was 0xFF. When I looked at the address of that byte, I noticed that it actually seems to point to code and not data! This was very suspicious, so I checked what the FS register value was in DS2x86 version 0.22 when running that code, and there the FS register value was 0x3D2F!
So, now it was just a matter of determining which of my opcode handlers causes the FS segment register to shift 4 bit positions to the right. Pretty soon I found that the problem was in the software interrupt (INT opcode) handler. It was supposed to shift both the FS and GS registers 4 bit positions right (to adjust from the effective to the actual segment register value when calling the interrupt), but due to a copy-paste bug I shifted the FS register two times and did not shift the GS register at all! So, whenever a software interrupt was called, both the FS and GS registers got invalid values!
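The effect of that copy-paste bug can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not the actual MIPS handler: the function names are made up for illustration, and it assumes that the "effective" value is simply the actual segment value shifted left by 4 bits, as the description above suggests. The GS value used below is an arbitrary example.

```python
SHIFT = 4  # assumption: effective value = actual segment value << 4


def int_entry_buggy(fs_effective, gs_effective):
    """Hypothetical model of the broken INT handler."""
    fs = fs_effective >> SHIFT
    fs = fs >> SHIFT       # copy-paste bug: this line should have shifted GS
    gs = gs_effective      # GS is never adjusted
    return fs, gs


def int_entry_fixed(fs_effective, gs_effective):
    """Hypothetical model of the corrected INT handler."""
    return fs_effective >> SHIFT, gs_effective >> SHIFT


# FS = 0x3D2F is the value seen in the working version.
buggy_fs, buggy_gs = int_entry_buggy(0x3D2F << SHIFT, 0x1234 << SHIFT)
fixed_fs, fixed_gs = int_entry_fixed(0x3D2F << SHIFT, 0x1234 << SHIFT)
print(hex(buggy_fs), hex(fixed_fs))
```

With FS = 0x3D2F the buggy path ends up with 0x03D2, which is exactly the wrong value seen in the debugger.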
Since those registers are rarely used in real mode, and the protected mode handling for those registers was not broken, this only caused problems in a few games, Chaos Engine being one of them. After I fixed that, I tested also Elder Scrolls: Arena, which had also started to misbehave during the same time, dropping back to DOS with a "Memory list blown" error message. This FS and GS segment register fix did seem to fix that game also, so in the next 0.33 version Arena should again be playable.
I haven't yet had time to recode the SoundBlaster support, so that is what I plan to work on next. I would like to be able to implement all the ADPCM digitized audio modes, and fix the timing and skipping problems in the current system. This will be rather a lot of work, so I am not sure if I will have time to do all of that during the next week, but we shall see. I would also like to soon have time to work on the game compatibility again, but it looks like that will have to wait for a little while longer. In any case, thanks for your test reports for the previous 0.32 version, I will eventually get around to fixing the problems!
This version only has a few minor improvements. I have only had a couple of days to work on DS2x86 during the previous two week period. As it will probably take me another two weeks to get the graphics blitting improved, I decided to release a new version today, even though it has not had much work done compared to the previous version. This is what is new:
The lack of progress is mostly due to an extended electrical blackout that occurred last Monday, and which (indirectly) caused two of my three computers to die. My laptop (which is almost 9 years old and still had the original battery) ran its battery empty during the blackout, and it looks like this finally caused the battery to fail completely. The machine does not run without a battery, so I had to order a new battery for it. Somewhat surprisingly some online stores still sell (and have in stock) batteries suitable for a laptop that old!
The bigger problem was that the motherboard in my HTPC also died. It was less than two years old, so it should have lasted longer, but of course it only had a one-year warranty. So, I had to spend all my free time last week with first familiarizing myself with the current status of suitable hardware for a HTPC machine, then online shopping for parts, modifying my silent cooling system to fit a new socket type, and then finally building and configuring the new machine. That left no time for me to work on DS2x86, as I only got the new machine up and running yesterday evening.
I hope to finally get back on track with working on DS2x86 during the next week. I first want to improve the screen blitting speed and quality in the higher-resolution modes, and then I really need to re-implement the SoundBlaster digital audio support. Sorry that it will take so long for me to get around to these, but sometimes unexpected complications arise.
The last week has again seen only some slow progress with DS2x86. There were a couple of snow storms, causing blackouts and the need to shovel snow again last week. That meant I only had two evenings where I was able to work on DS2x86 at all. I spent those fixing the mouse behaviour in the new transfer system, so that now both the D-Pad mouse and the Touchscreen mouse seem to work in DS2x86 just like they do in the original DSx86.
Currently I am attempting to improve the screen blitting in the high resolution modes (640x???). I will need to move some of the blitting code back to the MIPS side to be able to avoid sending unnecessary data via the slow card interface. This will cause a slight slowdown to the CPU emulation, but since most of the more CPU-hungry games will use the 256-color modes (where the 640x resolution is not available), this should not cause much of a problem.
There are still problems in the SoundBlaster emulation, but it looks like those will be very difficult to fix using my current transfer method. I think I will still need to rethink the SB digital audio transfer system one more time, to be able to handle all the different methods that the SB card can play digital audio. I have not yet figured out a system that would support all the needs, so I will probably leave this improvement for the later versions.
Thanks for your test reports about the problems in the previous version again! Sorry I probably won't get around to improving the specific problems in the various games before I get the new transfer system working better. This new transfer system was a major architectural change, so it will still take a few releases to get it working properly.
Okay, this version has various fixes to bugs introduced in the previous version. Sadly I did not have time to fix all the bugs in the SoundBlaster handling, as it took me many days to find and fix a problem that made Supaplex lose digital sounds immediately when the game began. I finally managed to fix this problem, but the game still loses the digital sounds occasionally for a little while. Anyways, here are the fixes in this version:
I will try to continue improving the SoundBlaster features and other still missing features during the upcoming weeks. My time to work on DS2x86 has been somewhat limited recently, as it is winter and we get quite a lot of snow, which also sometimes causes electrical black-outs. So, I need to spend time shoveling snow and just waiting for the electricity to return instead of working on DS2x86! Hopefully things improve in a couple of months as spring comes. :-)
Since the release of 0.30 last week, I have been working on the remaining problems in the new transfer system. These are the problems that I have now managed to fix, and which will be included in the 0.31 version. The current plan is to release DS2x86 version 0.31 next Sunday. I hope to still add some additional fixes and improvements during the next week.
The new EGA mode 0x0D blitting code now has two working modes. If the logical screen layout is 320 pixels wide (so no horizontal scrolling or additional trickery is used), the blitting speed is 6.7 ms (149 fps), and if the logical screen width is more than 320, a separate transfer code is used on the MIPS side. This code only sends 8 extra pixels per screen row (to handle possible smooth pixel panning function), and thus the blitting speed drops only slightly, to 7.9 ms (126 fps). This change got rid of the screen tearing problem in Supaplex and Commander Keen 4 intro.
The EGA and VGA graphics cards have an option to jump back to the beginning of the graphics VRAM memory at a certain scanline (when the card is drawing the image on the monitor). This is activated by giving the EGA/VGA line compare register a scanline number that is less than the number of screen rows. There is also a bit in another register that tells the graphics card to reset the pixel panning to zero in this situation. The pixel panning register is used to shift the screen image 0..7 pixels left during the graphics VRAM scanning and drawing onto the monitor. Since the screen image start address needs to be at a byte boundary, and each byte in the 16-color modes contains 8 adjacent pixels, the pixel panning register is needed when smoothly panning the image horizontally by less than 8 pixels at a time.
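As a small illustration of why the pixel panning register is needed, here is a sketch of how a horizontal scroll position in pixels splits into a byte-aligned start address plus a 0..7 pixel panning value. The function and parameter names are hypothetical and are not taken from the DS2x86 sources; `bytes_per_row` is the logical screen width in bytes per plane.

```python
def split_scroll(x, bytes_per_row, y=0):
    """Split a scroll position into (start byte offset, pixel panning).

    In the 16-color modes one byte covers 8 adjacent pixels, so the start
    address can only move in 8-pixel steps; the remainder goes into the
    pixel panning value.
    """
    start_offset = y * bytes_per_row + x // 8
    pixel_pan = x % 8
    return start_offset, pixel_pan


# Scrolling 13 pixels right on a 40-byte-wide (320 pixel) logical screen:
print(split_scroll(13, 40))
```

Here a 13-pixel scroll becomes a start offset of 1 byte and a pixel panning value of 5.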
In DS2x86 version 0.30 I simplified the screen blitting code (compared to DSx86 and previous DS2x86 versions) so that I don't handle the pixel panning value in the code (by shifting the pixels before blitting them to the screen), but instead I use the Nintendo DS graphics background registers to emulate the pixel panning, much like the actual EGA/VGA card does it. However, it only occurred to me last weekend that I can also handle the line compare pixel panning reset using Nintendo DS hardware! Since the NDS graphics features include a VCount interrupt, I can use that to get an interrupt at the line compare scanline, and reset the NDS background register horizontal position to zero! The end result is exactly the same as the EGA/VGA card behaviour, with much of the functionality done by the NDS graphics hardware! This is a change I plan to port back to the original DSx86, as it will simplify the EGA blitting code there as well. This change made the Supaplex bottom score panel stay put while the upper area is panning.
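The line compare reset described above can be modelled per scanline roughly like this. It is only a simplified sketch with hypothetical names: the real work is done by an NDS VCount interrupt that rewrites the background scroll register, and the exact scanline at which real EGA/VGA hardware switches (an off-by-one detail) is not modelled here.

```python
def pan_per_scanline(rows, pixel_pan, line_compare, reset_pan=True):
    """Return the horizontal pan value in effect on each scanline.

    Above the line compare scanline the normal pixel panning applies;
    from that scanline onwards it is forced to zero when the
    "reset panning" bit is set.
    """
    result = []
    for line in range(rows):
        if reset_pan and line >= line_compare:
            result.append(0)       # what the VCount interrupt would set
        else:
            result.append(pixel_pan)
    return result


# Tiny 6-row "screen" with a fixed panel starting at row 4:
print(pan_per_scanline(6, 3, 4))
```

The rows above the split keep panning while the rows from the split onwards stay at zero, which is the behaviour that keeps a score panel in place.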
Supaplex also helped me in finding the problem in my AdLib emulation. The music seemed to skip a lot of notes during the beginning. I logged the AdLib notes in DOSBox to a file, and also wrote code to log the notes that the MIPS side sends to the ARM9 to a file on the SD card, and noticed that there were no differences. The exact same notes get sent with nearly identical timing from the MIPS to the ARM9 side. And since the ARM7 uses the same code as the original DSx86 (which works fine), it was easy to figure out that the problem must be in the new ARM9 code. And there the problem indeed was. I had a minor bug in the buffering scheme, where the last command in the buffer was never sent from ARM9 to ARM7 until the buffer got additional data from MIPS to ARM9. In Supaplex music there are places where only one instrument is playing, and in these places the game sends only three commands: Note Off, Note Frequency, and Note On. So, when the last new command never got sent, this in effect made the ARM7 see the music as a sequence of Note On, Note Off, Note Frequency commands, so there was no sound output.
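The buffering bug is easy to reproduce in a toy model. The class below is a hypothetical stand-in for the ARM9 forwarding code, not the real implementation: in buggy mode it always holds back the last queued command until more data arrives.

```python
class Forwarder:
    """Toy model of the ARM9 buffer that forwards AdLib commands to ARM7."""

    def __init__(self, buggy):
        self.buggy = buggy
        self.pending = []

    def receive(self, commands):
        """Queue commands from MIPS, return the ones forwarded to ARM7."""
        self.pending.extend(commands)
        # Bug: the last command in the buffer is never sent right away.
        keep = 1 if (self.buggy and self.pending) else 0
        cut = len(self.pending) - keep
        sent, self.pending = self.pending[:cut], self.pending[cut:]
        return sent


note = ["Note Off", "Note Frequency", "Note On"]
arm9 = Forwarder(buggy=True)
print(arm9.receive(note))   # first note: the Note On is held back
print(arm9.receive(note))   # next note: old Note On arrives with the new Note Off
```

In the second batch the delayed Note On is immediately followed by the next Note Off, so the note is cut before it can be heard.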
After those fixes I then began debugging the Warcraft BSOD crash problem. It is somewhat weird, as the location of the crash (as reported by the BSOD texts) seems to jump all over the MIPS code area. What is even more strange, the location often seems to point to code that can not crash, that is, it has only some simple arithmetic operations or such. So, my first theory was that perhaps this is some interrupt routine re-entrancy problem in the new more accurate SoundBlaster IRQ emulation. I checked the Warcraft SB emulation code (which I reverse engineered some time ago, when an earlier DS2x86 version had problems with it), and noticed that it sets up an auto-init DMA audio transfer with a buffer length of 2 samples! That is, the SoundBlaster will send an IRQ after every 2 samples have been played! As the playing frequency was 22 kHz, this meant that my emulation code began getting over 11000 IRQs per second!
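The IRQ rate follows directly from the sample rate and the auto-init block length. A one-line sketch, where the exact sample rate of 22050 Hz is my assumption for "22 kHz" and the function name is made up:

```python
def irqs_per_second(sample_rate_hz, samples_per_block):
    """One IRQ is raised each time a whole auto-init block has been played."""
    return sample_rate_hz / samples_per_block


# Assumed 22050 Hz playback with a 2-sample block:
print(irqs_per_second(22050, 2))
```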
I experimented by forcibly limiting the auto-init IRQ frequency, but rather annoyingly, even at an IRQ frequency of 366 Hz (24000000/65536) the BSOD problem remained. Only at a frequency of 183 Hz (24000000/(2*65536)) did I get rid of the BSOD problem. This made me realize that the IRQ speed itself can not be the actual cause for the BSOD, as for example Windows sets the PC timer to run at 1000 Hz, which is also emulated similarly using a hardware IRQ at that speed, and it does work fine. Finally I realized that my buffer copying code inside the IRQ handler expects the pointers to be word-aligned, and with the transfer buffer length of only 2 samples, the pointer was actually only halfword-aligned! Since Warcraft only uses this buffer setup when testing for a SoundBlaster, it is not so important to play the correct samples, and thus I forcibly aligned the pointers to be word-aligned. This got rid of the BSOD, but still the SB audio does not work quite correctly in Warcraft. I'll continue working on this problem during the next week. There are still various other problems in the new transfer code as well, which I also hope to be able to fix and/or implement during the upcoming weeks. But, you can expect at least the above fixes to be included in the next version.
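Both the limited IRQ frequencies and the alignment problem are simple arithmetic, sketched below. The helper names are hypothetical, a word is taken to be 4 bytes and a halfword 2 bytes, and rounding the pointer down (rather than up) is my assumption about what "forcibly aligned" means. The example address is arbitrary.

```python
def limited_irq_hz(clock_hz, divider):
    """IRQ frequency when the rate is derived from a clock and a divider."""
    return clock_hz / divider


def is_aligned(address, size):
    """True if the address is a multiple of the given power-of-two size."""
    return address & (size - 1) == 0


def align_down(address, size=4):
    """Clear the low bits so the address becomes size-aligned."""
    return address & ~(size - 1)


print(round(limited_irq_hz(24_000_000, 65536)))      # 366
print(round(limited_irq_hz(24_000_000, 2 * 65536)))  # 183

# A pointer that has advanced by one 2-byte block from a word boundary:
ptr = 0x1000 + 2
print(is_aligned(ptr, 2), is_aligned(ptr, 4), hex(align_down(ptr)))
```

The pointer in the example is halfword-aligned but not word-aligned, which is the situation a word-based copy loop can not handle.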
Happy New Year! It is again time to start a new blog page, as I like to have my blog pages not contain more than half a year's worth of blog posts. Makes it faster for you to read/download the latest entries as well.
This is the first DS2x86 version to use my own completely rewritten transfer system between the MIPS and ARM processors for the DSTwo flash cart. Pretty much nothing in the new transfer code is copied directly from the SuperCard SDK sources. I have used the ideas from their sources, but the actual code is quite completely rewritten. The main differences are that now it is the ARM side that controls when and what data to transfer, the graphics are transferred in the native format of the emulated PC graphics memory and drawn on the ARM9 side, and the AdLib sound is fully generated on the ARM7 side.
I have not had time to implement all the features I had coded for the original transfer system, so not everything is fully working yet. Also, note that there are no compatibility improvements in this version, so if a game did not work in version 0.25, it most certainly will not work in this version either. Rather the opposite, if a game did work on 0.25, it might not work in this version! Here below is a list of features still missing from this version, so any game that needs one or more of these might not work properly:
However, there are obviously also some advantages in using the new transfer system. Here is a list of the current advantages, and as I get around to improving the still missing and buggy features, this list will hopefully get longer and the above list will get shorter:
The original SDK transfer system always transferred the graphics data using 16-bit color video buffers, so that transferring one screen frame needed 256x192x2 = 98304 bytes to get transferred. Since the card interface runs at 4.2MHz speed, transferring this much data took 23.4 milliseconds, and the frames could be transferred at a maximum speed of 42.7 fps. These numbers are the theoretical maximum, the real speed was somewhat less because of the need to transfer also audio data and various commands.
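These numbers can be checked with a few lines of Python. The sketch assumes that the 4.2 MHz card interface moves one byte per cycle, so about 4.2 million bytes per second; that reading is consistent with the figures above, and the function name is just for illustration.

```python
def frame_transfer(width, height, bytes_per_pixel, bytes_per_second):
    """Return (frame size in bytes, transfer time in ms, max frames/second)."""
    size = width * height * bytes_per_pixel
    milliseconds = size / bytes_per_second * 1000
    fps = 1000 / milliseconds
    return size, milliseconds, fps


# 256x192 screen in 16-bit color over an assumed 4.2 million bytes/second:
size, ms, fps = frame_transfer(256, 192, 2, 4_200_000)
print(size, round(ms, 1), round(fps, 1))
```

This gives 98304 bytes, about 23.4 ms and about 42.7 fps, matching the theoretical maximum quoted above.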
In the new system I transfer only as much data as is needed, so the amount is usually much less than in the original system, except in the high-resolution modes like 640x480, where I might need to transfer more data per frame than in the original SDK. Also, since the data needs to be transferred in 1024-byte blocks without direct random access, I can not currently skip bytes that are outside of the visible area if the game has set up the graphics memory to have a logical screen width larger than the visible screen width. For example, Commander Keen 4 uses a logical screen width of 1260 pixels in the EGA 320x200 mode during the intro scroller, so that I need to transfer almost 1000 extra pixels for each screen row! This will drop the blitting framerate down to 24 fps, which will cause visible tearing of the screen image. Luckily the game itself uses the normal 320x200 mode.
Here below is a table showing the most common graphics modes, and the maximum (theoretical) frame rate in each of them using the new transfer method:
| Mode | Original SDK | New method | Max FPS | Notes |
|---|---|---|---|---|
| 80x25 Text | 23.4 ms | 0.8..4.5 ms | >200 fps | Depending on changed characters |
| 320x200 CGA | 23.4 ms | 4.1 ms | 244 fps | |
| 640x200 CGA | 23.4 ms | 4.2 ms | 238 fps | |
| 320x200 EGA | 23.4 ms | 6.7..40.9 ms | 24..149 fps | Depending on logical screen width |
| 640x200 EGA | 23.4 ms | 13.1 ms | 76 fps | |
| 640x350 EGA | 23.4 ms | 22.9 ms | 43 fps | |
| 640x480 VGA | 23.4 ms | 31.2 ms | 32 fps | |
| 320x200 MCGA | 23.4 ms | 13.1 ms | 76 fps | |
| 360x240 Mode-X | 23.4 ms | 17.8 ms | 56 fps | Used in "Settlers" |
| 320x480 Mode-X | 23.4 ms | 35.0 ms | 28 fps | Used in "LineWars II" |
For the coming week or two, I plan to still work on the new transfer system, fine tuning it and adding the missing features. First, I hope to come up with a faster way to transfer the screen graphics in the high resolution modes, and in special EGA modes that use a very wide logical screen (like Commander Keen 4 or Supaplex). After that, I need to improve and fix the SoundBlaster audio handling, and then look into the hardware mouse cursor and other mouse features. Also, I can now add all the keyboard-related features (visible upper/lower case changes, key flash when clicked, etc) that are in the original DSx86 but have so far not been possible to implement into DS2x86. I don't plan to work on any game-specific compatibility improvements until I have improved the new transfer system still quite a bit further.
Please do report any games that seriously misbehave using this new transfer system, or any core functionality that is missing and that I have not mentioned above, though! It helps me in improving the new system if I have a list of games to test, so I don't need to guess and hope that my change fixes something. Thanks again for your interest in DS2x86!