Happy new year 2013! As I hinted at in my previous blog post, I have now started work on an Android port of DSx86. It has a working name of ax86, and thus I created a subdomain ax86.patrickaalto.com for it. I will be writing blog posts about the development of ax86 on those pages. This will probably be the last DSx86 blog post, at least for now. I released the first public beta version of DSx86 on the 29th of December 2009, so just a few days over 3 years ago. Big thanks to all of you who have been following my blog and progress of DSx86, and I welcome you all to the ax86.patrickaalto.com site!
Merry Xmas and a Happy New Year! To celebrate Xmas, Raiden Legacy for Android is now on sale (40% off) until December 27th! Get it now from Google Play!
I have a two week Xmas vacation starting now, and during this time I plan to make up my mind about what I will do with DSx86. At the moment it looks like it might make sense to port DSx86 to Android. I looked at the various x86 emulators available for Android, and there does not seem to be a free and fast version available. There is a free aDosBox, which however is not optimized for Android at all, so it runs really slow. The other DosBox ports and other x86 emulators seem to not be free, so there might be room for a free DSx86 port. My port should be considerably faster than aDosBox (and might even be faster than the commercial ports), but it will most likely lack in compatibility. But in any case there seems to exist a niche where my x86 emulator might fit into nicely.
Another interesting piece of hardware to port DSx86 to would be the Raspberry Pi. It is considerably slower than current Android devices, so there would be a greater demand for a fast x86 emulator. This device would also need no keyboard (or mouse) emulation via touchscreen, so I would not need to spend time working on those. This device runs Linux (which is not all that far removed from the Android platform), so it might also be possible to target both Android and Raspberry Pi (and possibly even other Linux-based hardware running on an ARM processor) using mostly the same emulator core.
In any case, the first step would be to rewrite all hardware-specific stuff (mostly timer and interrupt-related) in DSx86 to be compatible with an underlying operating system that prohibits direct hardware access. I believe it will take me several weeks just to do that, so only after that will I need to look into the actual hardware to port DSx86 to. After I make the decision, I will probably create a new subdomain and begin writing a completely new blog under that subdomain. The name of the port might be something like ax86 (for Android) or Pix86 (for Raspberry Pi) or something similar.
Happy New Year to all of you reading this blog, next year I will probably have something more specific to tell about the future of DSx86. :-)
Today DotEmu is releasing Raiden Legacy for iOS and Android! What has this got to do with DSx86, you ask? Quite a bit, actually. DotEmu has licensed my x86 emulation core from DSx86, to be used in their Raiden Legacy mobile game to run the Raiden Fighters series arcade games.
DotEmu originally contacted me in April of this year, asking whether the x86 emulation core in DSx86 would be open for commercial licensing. I thought this was a very intriguing idea, and thus we began discussions about what exactly they need and how good a match my emulation core would be for their needs. It turned out that they needed to emulate Seibu SPI arcade machine hardware on iOS and Android mobile phones (meaning on ARM processors). The Seibu SPI hardware consists of an Intel 386 processor (running at 25MHz) that handles the actual game logic, a Zilog Z80 processor that handles music and audio processing and timing, a Yamaha YMF271 audio chip that generates the actual audio, and a custom graphics processing chip. My DSx86 emulation core only supported the 286 processor at that time, so it was not well suited to this project. However, DotEmu did not need my emulation core immediately, so we agreed that I could spend up to two months porting the 386 emulation features from my MIPS core back to the original ARM emulation core.
I began this porting project in April (this was actually the "additional project" I mentioned in my Apr 29th, 2012 blog post). I again wanted to use the test-driven development (TDD) method when porting the actual 386-specific features from MIPS to ARM, so my first step was to port my improved unit test program from DS2x86 back to ARM architecture. While porting it I further improved it quite a bit, so that it now has very thorough tests for both the 16-bit and 32-bit opcode versions, and also for the 16-bit and 32-bit memory addressing modes. For several opcodes it now contains such exhaustive tests that practically all possible input combinations and values are being tested. As I was porting and testing this, I found several problems in my original 286 emulation core, which I then later fixed in the original DSx86 version and mentioned in my Jun 24th, 2012 blog post.
After the unit tests seemed to work fine with the new 386-enabled core, I spent some time creating a version of DSx86 that could run in 386 protected mode. The Raiden Fighters games did not use any advanced features like paging or task switching, so I was able to leave all such features out of my new core. This also meant that I could only test some very simple 386-mode programs in "DSx386" (as I called my test version). I decided to again use my Trekmo demo, as it takes very little memory and only needs pretty much the same features as the Raiden Fighters games. On the 26th of May (so still well within the two months period) I finally got Trekmo running in DSx386!
Trekmo of course ran horribly slow, as adding the 386-specific features meant that I had to remove some of the speed hacks I had used in my original core. The emulation speed dropped to about 70% of the original DSx86 speed, and Trekmo ran at 3.5 frames per second. The core is also so much bigger than the 286-only core that there is practically no extended or expanded memory free. Luckily Trekmo does not need any.
The next steps after I got Trekmo running were to adjust my emulation core so that it would run in the iOS and Android environments, and then integrate my core with the emulation framework that DotEmu uses. This work took pretty much the whole of June, as there were various issues that we needed to solve (like the hardware timers not being available, ASM syntax differences between iOS and Android, and so on). By the end of June my core was finally emulating the Raiden Fighters games! At that time my summer vacation was just starting, so I asked whether DotEmu needed me to work on any other aspect of the Raiden Legacy project. We decided that I could take a look at the YMF271 audio emulation, to see if I could optimize that code.
The Yamaha YMF271 chip is practically an AdLib audio chip on steroids, so I thought that I could probably use many of the same ideas that I used in the AdLib emulation I had coded for DSx86. In DSx86 I can run 9 audio channels, each with up to 2 operators (so the total operator count is 18), on the 33MHz ARM processor. As YMF271 has a total of 48 operators, I thought that it should be possible to keep the CPU usage needs below 100MHz, which means that audio emulation should not take more than 10% of a 1GHz ARM processor of a mobile phone.
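The estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes that the cost scales linearly with the operator count, and that the 18-operator AdLib emulation uses up the whole 33MHz processor (a worst case). The function and constant names are made up for the illustration.

```python
# Rough CPU budget estimate for YMF271 emulation, scaled from the AdLib figures.
# Assumptions (not measured): cost grows linearly with the operator count, and
# the 18-operator AdLib emulation consumes the entire 33 MHz CPU.

ADLIB_OPERATORS = 9 * 2      # 9 channels with up to 2 operators each
ADLIB_CPU_MHZ = 33           # clock of the ARM processor running the AdLib code
YMF271_OPERATORS = 48
PHONE_CPU_MHZ = 1000         # a 1 GHz mobile phone processor


def estimated_mhz(operators: int) -> float:
    """Scale the AdLib cost linearly to the given operator count."""
    return ADLIB_CPU_MHZ * operators / ADLIB_OPERATORS


ymf271_mhz = estimated_mhz(YMF271_OPERATORS)
share = ymf271_mhz / PHONE_CPU_MHZ
print(f"{ymf271_mhz:.0f} MHz, {share:.1%} of the phone CPU")
```

Under these assumptions the 48 operators need about 88MHz, which stays under the 100MHz (10% of 1GHz) budget mentioned above.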
I began by running Raiden Fighters in a Windows version of MAME (Multiple Arcade Machine Emulator) with a debugger attached. This way I could check how those games actually use the YMF271 chip, and get an understanding of how that chip behaves. I found out that the games mostly use the PCM audio features (playing 8-bit samples from ROM), but they also use quite a few of the different FM audio algorithm versions. The YMF271 chip can generate FM sounds using one of four different 2-operator algorithms, or one of 16 different 4-operator algorithms. The differences come from the different ways that these operators are connected to each other to produce the final waveform.
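To illustrate what "different ways of connecting the operators" means, here is a minimal Python sketch of two generic 2-operator connections: one where the first operator modulates the phase of the second, and one where both operators are simply mixed. This is the textbook form of FM (phase modulation) synthesis, not the actual YMF271 algorithms or the MAME code; the sample rate, function names and parameters are all assumptions made for the example.

```python
import math

SAMPLE_RATE = 44100  # assumed output rate for this sketch, not a YMF271 value


def sine_phase(freq_hz: float, i: int) -> float:
    """Phase (in radians) of a sine oscillator at sample index i."""
    return 2.0 * math.pi * freq_hz * i / SAMPLE_RATE


def fm_serial(carrier_hz, mod_hz, mod_index, count):
    """Operator 1 (modulator) feeds into the phase of operator 2 (carrier)."""
    return [
        math.sin(sine_phase(carrier_hz, i)
                 + mod_index * math.sin(sine_phase(mod_hz, i)))
        for i in range(count)
    ]


def fm_parallel(freq1_hz, freq2_hz, count):
    """Both operators are audible on their own and are mixed together."""
    return [
        0.5 * (math.sin(sine_phase(freq1_hz, i))
               + math.sin(sine_phase(freq2_hz, i)))
        for i in range(count)
    ]


serial = fm_serial(440.0, 220.0, 2.0, 1024)
parallel = fm_parallel(440.0, 220.0, 1024)
```

The same two oscillators produce clearly different waveforms depending on the connection, and the 4-operator algorithms extend the same idea to more complicated chains and mixes.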
After I understood how the YMF271 chip works, I began coding a test framework in devKitPro, to be able to test my optimizations easily using the No$GBA emulator. I could also have used the Android development environment and Android emulator, but working with those would have been a lot slower. I ported the PCM algorithm and each of the FM algorithm C routines from MAME to my test program. I then used some example sounds (from Raiden Fighters) as input to the algorithms, generating a 1024-byte sample buffer. When that was working and I was able to generate some sample data, I then began implementing my ASM algorithms. I used many ideas from my AdLib emulation code, and at each step I tested the resulting sample data from my ASM implementation against the sample data from the C implementation. This way I could make sure my algorithm generates exactly the same output as the original C code.
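The testing approach described above (render the same input with the reference code and with the optimized code, then compare the buffers) can be sketched in Python like this. The two render functions are hypothetical stand-ins: the "reference" is a naive PCM volume scaler and the "optimized" one uses a lookup table. They are not the MAME routines or my ASM code, and the unsigned 8-bit samples and 16.16 fixed-point stepping are simplifications.

```python
def render_reference(rom: bytes, step_fp: int, volume: int, count: int) -> bytes:
    """Straightforward version: one multiply per output sample."""
    out = bytearray()
    pos = 0  # 16.16 fixed-point read position into the sample ROM
    for _ in range(count):
        sample = rom[(pos >> 16) % len(rom)]   # unsigned 8-bit sample
        out.append((sample * volume) >> 8)     # volume is 0..255
        pos += step_fp
    return bytes(out)


def render_optimized(rom: bytes, step_fp: int, volume: int, count: int) -> bytes:
    """Table-driven version: volume scaling precomputed for all 256 inputs."""
    table = bytes((s * volume) >> 8 for s in range(256))
    out = bytearray(count)
    pos = 0
    for i in range(count):
        out[i] = table[rom[(pos >> 16) % len(rom)]]
        pos += step_fp
    return bytes(out)


def first_mismatch(expected: bytes, actual: bytes) -> int:
    """Return the index of the first differing byte, or -1 if identical."""
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    return -1


rom = bytes(range(256)) * 4                    # made-up sample data
ref = render_reference(rom, 0x18000, 200, 1024)
opt = render_optimized(rom, 0x18000, 200, 1024)
print("first mismatch:", first_mismatch(ref, opt))
```

Reporting the index of the first mismatch (instead of just pass or fail) makes it much easier to find which step of the optimized routine went wrong.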
Below is a table showing the CPU cycle counts for creating 1024 stereo samples (on ARMv5 architecture, when running the test program in No$GBA), for various algorithms. The PCM algorithm is the simplest, and I was able to determine that playing PCM audio never uses either amplitude or frequency modulation (LFO), so I was able to skip that code completely. That is why the table shows the same 39,168 CPU cycles for each LFO variation (ams meaning amplitude modulation and pms meaning frequency modulation, named after the corresponding variables in the C code).
| Algorithm  | no LFO  | ams=0,pms=0 | ams>0,pms=0 | ams=0,pms>0 | ams>0,pms>0 | C-code min | C-code max | Speedup  |
|------------|---------|-------------|-------------|-------------|-------------|------------|------------|----------|
| update_pcm | 39 168  | 39 168      | 39 168      | 39 168      | 39 168      | 1 310 080  | 1 310 080  | 32x      |
| 2fm_alg0   | 92 416  | 109 952     | 163 200     | 159 104     | 212 480     | 2 400 384  | 2 643 328  | 12x..24x |
| 2fm_alg3   | 118 912 | 126 336     | 179 584     | 176 512     | 228 736     | 2 586 240  | 2 835 200  | 12x..22x |
| 4fm_alg0   | 169 088 | 187 904     | 294 528     | 291 072     | 396 288     | 4 557 568  | 5 046 528  | 12x..27x |
| 4fm_alg14  | 202 368 | 220 672     | 327 296     | 318 976     | 429 312     | 4 930 432  | 5 424 640  | 12x..24x |
As you can see, I was able to improve the PCM algorithm so that it runs 32 times faster! Or to put it in another way, my ASM code can generate one stereo sample for every 38.25 CPU cycles used, while the C code in MAME takes over 1279 CPU cycles to accomplish the same task. These measurements are taken on ARMv5 architecture, which does not have a floating point coprocessor. The ARMv7 architecture used in many Android and iOS mobile phones has floating point support, and as the original C code used some floating point calculations, the real life improvement is not quite as big. But even with that taken into account, my code is still considerably faster also on ARMv7 architecture.
The FM algorithm speedup was between 12 and 27 times, depending on the LFO usage. I also added a compile define that commented out all the LFO support (the no LFO column in the table) in case DotEmu wanted to get the best possible performance at the expense of some accuracy. It is not very easy to hear the difference whether the LFO is in use or not, especially when using the (usually low quality) built-in speakers of a mobile phone, so commenting out the whole LFO will allow the other more important parts of the emulation to have more CPU power.
After I had sped up the audio emulation, I asked whether DotEmu needed my help with optimizing the graphics routines. They had already created their own heavily optimized routines in C, and after looking at them I realized that converting them to ASM would only bring a very modest speedup. I decided to try optimizing them anyways, mostly just as a learning experience. In the end I was able to improve the speed of these routines only by something like 20% (so that my ASM code spent 0.8 times the CPU cycles of the C code to perform the same task), so this improvement was not terribly important. Since even a small improvement is still better than nothing, we decided to use my improved graphics routines anyways.
By September my work on Raiden Legacy was pretty much done, so I began focusing on my other hobbies, while DotEmu continued with the user interface and such work on Raiden Legacy. However, then at the beginning of November DotEmu reported that they had found a problem in my core that only affected a few mobile phones, including some versions of Samsung Galaxy S2 and S3. After a short time of gameplay, the Raiden Legacy process would crash, and the crash always happened inside my 386 emulation core. This was a bit of a nasty surprise, as the release date of the project was looming near so the timing of this problem was pretty bad.
I began debugging the problem on one such affected device that DotEmu loaned to me, and indeed, for some peculiar reason my core always crashed after a short time in the game with the crash message signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000001. The fault address pointed to a special flag I have used in my core for the first RAM page memory access, and indeed it looked like a function pointer in the x86 code was pointing to a null address and thus the game jumped to zero address, which was obviously not correct behaviour.
I added various trace features to my core, and also ran the same game in the Windows version of MAME, comparing the register and memory values before the crash happened. This took a while, as the game executed the same routine quite a few times without any problems, before then suddenly there was an invalid value in a certain memory address. I hunted for the location where this memory address gets its value, and found that it uses the 32-bit x86 XCHG opcode to set the memory value.
My 32-bit XCHG opcode handler looked like the following (where \reg is the ARM register emulating one of the eight x86 general purpose registers, and r2 is the actual memory address):
    1:
        swpb    r0, \reg, [r2]          @ r0 = first byte swapped
        add     r2, #1
        lsr     \reg, #8
        swpb    r1, \reg, [r2]          @ r1 = second byte swapped
        add     r2, #1
        lsr     \reg, #8
        swpb    r3, \reg, [r2]          @ r3 = third byte swapped
        add     r2, #1
        lsr     \reg, #8
        swpb    \reg, \reg, [r2]        @ \reg = highest byte swapped
        orr     r0, r1, lsl #8
        orr     r0, r3, lsl #16
        orr     \reg, r0, \reg, lsl #24

When looking at this implementation, I remembered that I had read in the ARM Architecture Reference Manual that the SWP and SWPB opcodes can be used for semaphores and other such hardware-specific stuff, so I thought that perhaps they have some special restrictions. I googled for "swp swpb armv7", and the first hit was about adding SWP/SWPB emulation for ARMv7 processors to the Linux ARM kernel. Now that was interesting! I had not realized that the SWP and SWPB opcodes were deprecated already on ARMv6 architecture, and they are disabled completely on ARMv7 architecture (as implemented by for example the Samsung Exynos SoC in the Samsung Galaxy S2 and S3 mobile phones)!
I decided to replace the SWPB opcodes using plain LDRB/STRB opcodes in my core, as it is not nice to use deprecated CPU features even if they seemed to work most of the time. My new XCHG implementation looks like the following:
    1:
        ldrb    r0, [r2]
        strb    \reg, [r2]
        lsr     \reg, #8
        ldrb    r1, [r2, #1]
        strb    \reg, [r2, #1]
        lsr     \reg, #8
        orr     r0, r1, lsl #8
        ldrb    r1, [r2, #2]
        strb    \reg, [r2, #2]
        lsr     \reg, #8
        orr     r0, r1, lsl #16
        ldrb    r1, [r2, #3]
        strb    \reg, [r2, #3]
        orr     \reg, r0, r1, lsl #24

I tested this new implementation on the problem device, and my core did not crash any more! So, it looks like the SWPB opcode was emulated on some level (either by hardware or by the Android Linux kernel) on the Samsung Exynos processor, and there seems to be something wrong with this emulation. The lesson to be learned here is that you should not assume that CPU features and opcodes that work on earlier architecture models are available on the newer architecture. And also, if you are coding for ARMv7 architecture, make sure you don't use SWP/SWPB opcodes, or you might experience weird problems on some hardware.
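For readers who do not read ARM assembly, here is a small Python model of what the byte-wise XCHG handler does: it exchanges a 32-bit register value with four little-endian bytes in memory, one load and one store per byte. The function name and the bytearray standing in for emulated RAM are just illustrative. Note that this sequence is not atomic, unlike a real x86 XCHG with a memory operand (which is implicitly locked); the sketch assumes a single-threaded emulator where that does not matter.

```python
def xchg32_bytewise(memory: bytearray, addr: int, reg: int) -> int:
    """Swap a 32-bit register value with 4 little-endian bytes in memory.

    Returns the new register value (the old memory contents).
    """
    old = 0
    for i in range(4):
        old |= memory[addr + i] << (8 * i)            # like LDRB: read old byte
        memory[addr + i] = (reg >> (8 * i)) & 0xFF    # like STRB: write new byte
    return old


ram = bytearray([0x78, 0x56, 0x34, 0x12, 0xAA])
new_reg = xchg32_bytewise(ram, 0, 0xDEADBEEF)
print(hex(new_reg), ram.hex())
```

After the call the register holds the old memory value 0x12345678, the first four bytes of memory hold 0xDEADBEEF in little-endian order, and the fifth byte is untouched.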
All in all, it has been very interesting and fun working on this project, and getting an inside glimpse at how mobile games are being made. This project suited me very well, as I was able to focus on the areas of the project that I found interesting (namely assembly optimizations), while letting DotEmu handle the (in my mind) boring stuff. :-) I hope you enjoy playing this game as much as I have enjoyed coding various bits and pieces of it!
This is just a short update about my VHS digitizing project. I am currently digitizing the eighth VHS tape (of about 30 tapes I plan to digitize), so it will still take a couple of months before I have digitized everything. With the first ten or so tapes I have a pretty good idea about what I want to save and what not, but the remaining tapes I have not yet looked at properly, so it is possible there is a lot of stuff I can simply skip. But even so, I doubt I will get back to my programming hobbies before the end of the year.
I do have some DSx86-related news waiting for a proper time (which is not up to me) to announce, so I might have some interesting news to write about even before I have finished my VHS digitizing project. Also, I have not yet made up my mind about whether to simply continue with Nintendo DS programming, or try porting DSx86 to some other platforms, but it looks like I still have a couple of months to think about this. :-)
I am still on my coding break. I decided to write this blog post just to quickly let you know what it is I am doing currently, instead of DSx86. I am digitizing my large collection of old VHS tapes. Many of them of course are quite irrelevant as they contain movies that I have since purchased in DVD or Blu-Ray format, but there is a lot of material that I would like to digitize properly.
I got my first VCR (a Sharp VC-488) back in 1985, and have been recording stuff since that time. My second VCR was a JVC HR-D530EH, which I got in August 1988, to be able to copy stuff from tape to tape. Thus many of my tapes are actually second-generation recordings, sadly. After those VCRs I have had a couple of other machines (JVC HR-D980 purchased in April 1992, Panasonic NV HD 670 in April 1998) up to the latest Super-VHS JVC HR-S8600EU that I bought in May 2001.
I got my first Digi-TV card in May 2004, so after that time I pretty much stopped recording new VHS tapes from TV programs. Then in the summer of 2005 a thunderstorm killed my S-VHS VCR, and I decided not to have it repaired. I still had the Panasonic VCR and could use it to view the great majority of my VHS tapes. I only had a couple of S-VHS tapes and thought that I won't miss those that much. The time of VHS tapes was pretty much over and all the TV channels had switched over to digital.
Then in 2006 I borrowed a friend's "one-click VHS-to-DVD" system and digitized the most important tapes using my Panasonic VCR, but sadly the quality left much to be desired. I wanted to quickly get the tapes digitized and return the device, so I did not spend much time with the digitizing. After that I gave the Panasonic VCR away, so I did not have any working VCR any more (or so I thought).
Then this summer, inspired partly by a friend of mine who inherited a lot of VHS tapes from his father and is in the process of digitizing them, I got interested in an idea of re-digitizing my old VHS tapes, hopefully this time with a better quality. After visiting DigitalFAQ I found out that JVC HR-S9600 (which is a sister model of my thunderstricken HR-S8600) is one of the best VCR machines to use for digitizing old VHS tapes. So, I called a video repair shop and asked whether they still repair old VCRs, and they agreed to take a look at it.
In the meantime I found my old VCRs (both the Sharp VC-488 and the JVC HR-D530EH) in my attic, where they had both spent more than 15 years, exposed to temperatures of -30 degrees Celsius during winters. I decided to see whether the 24-year-old JVC VCR would work at all (or whether it would simply blow a fuse or something when connected to mains electricity), and to my surprise it still worked fine! I originally moved it to the attic because it began eating tapes when fast-forwarding with picture, but other than that problem it was still working fine at that time.
The best thing about this old VCR still working is that between 1990 and 1992 I used this machine to record a lot of music (mainly from borrowed CDs) in LongPlay format (6 hours per E-180 VHS cassette). I have not been able to digitize them properly because the VCR head noise/distortion has been quite noticeable when playing the tapes in any other VCR machine. However, now that I can use the same machine as the one the tapes were recorded with, there is no noticeable distortion!
The video repair shop did not manage to fix my dead JVC HR-S8600 S-VHS VCR, but I just found a used one on eBay and purchased that. My plan is to possibly combine the electrical components from the eBay machine with the mechanical parts of my own little-used machine, especially if the machine I purchased turns out to be very worn-out. I had only used my machine for about 3 years before it died, so the mechanical parts should be in good condition.
Anyways, this is what I am currently doing, digitizing old VHS tapes, some of them containing only music and some also video. It is very time-consuming work, trying to come up with optimal settings and filters, and fixing various dropouts and such from the digitized audio. No time to work on DSx86 while I am doing this. :-)
Nothing much to report, I am still on my "coding break" and focusing on my other hobbies that I have been neglecting during the past three years I have been working on DSx86. I do have some DSx86-related news, though. Sverx, who already optimized my smooth screen scaling algorithms once before, came up with an even faster method of handling this scaling. I will update the DSx86 screen scaling algorithm to use his latest invention in the next DSx86 version (when I get back to coding :-). Meanwhile you can read more details about his scaling method in his blog. Thanks again Sverx!
I have not done anything to DS2x86 during the past week. I am rather busy at work again, and I have to admit that I am starting to lose interest in working on DS2x86. It seems that I am currently pretty much the only homebrew developer programming for the SuperCard DSTwo environment. As other programmers have moved on to other platforms, also the users either have already moved on or will move on soon. Looking at the recent posts on the Supercard SDK forum it is pretty evident that without BassAceGold and myself there would not have been much happening in that scene for the past several months.
The whole NDS homebrew scene is also long past its peak, I think it was actually on the decline already 3 years ago when I began working on DSx86. Hard to imagine I have already worked on it for more than three years. I suppose smart phones have largely replaced the dedicated handheld gaming devices nowadays. That is somewhat sad as architecturally Nintendo DS is a pretty neat device to program for. I do want to continue working on DSx86, but I am starting to think that perhaps the Nintendo DS hardware does not offer any more the kinds of interesting challenges and learning experiences that keep me interested in my hobby projects.
It might be interesting to look into Android programming, and possibly even port DSx86 to Android smartphones at some point. The potential user base would be huge, and my x86 ARM assembler code would be simple to port to ARM-based Android phones. I'm not sure how useful an x86 emulator would be on a smart phone that does not have a keyboard, and also any reasonably new smart phone has enough power to run a port of DOSBox, so perhaps my porting DSx86 to that platform would be rather redundant.
But in any case, I feel that after three years of working on DSx86 and DS2x86 I need to take a break. I have been toying with the idea of attempting to port my old LineWars II game to Android environment. After all, I began my Nintendo DS homebrew coding "career" by first porting LineWars II to it, and only after that I started on DSx86. Perhaps it would be a good idea to move into Android programming using a similar approach. If I indeed want to start coding for Android, which I have not decided yet. I'll take a break from hobby programming, at least while I am very busy at work, and then see what would be my next hobby programming project. Or whether I find some new interesting things to do for DSx86, as that is certainly also possible.
Again a week with nothing much happening on the DSx86 front. I am rather busy at work after my vacation, and after getting home from work, watching the Olympic games interests me more than coding DS2x86. I did however do some tests to see what changes I would need to make to get Wing Commander Armada running. It looks like the biggest problem with it is that it uses VCPI (Virtual Control Program Interface), which is a part of EMM (Expanded Memory Manager) features. My current EMS support in DS2x86 is missing the VCPI features, so currently Wing Commander Armada simply drops back to DOS with a message "EMS driver is not VCPI compliant".
To make my inbuilt EMS driver support VCPI, I will need to change the way I currently handle the EMS and XMS memory. Currently I have reserved a separate memory area for EMS memory (the memory that for example 4DOS uses for swapping, accessed using the EMS page frame at 0xE000 segment) and for XMS memory (the extended memory above 1MB). Almost the first thing that Wing Commander Armada does with the VCPI features is to call function 0xDE06 (Get Physical Address of Page) for the EMS page frame 0xE000. This is currently a problem, as my EMS memory (being completely separated from the XMS memory area) does not have any "physical address" that I could return to the game!
Another problem with my inbuilt EMS/XMS manager is that it is very stupid in the way it allocates memory. Currently I only have a value telling how much of the memory has been allocated, so if any game allocates many memory blocks and then frees some other blocks besides the last one, these don't actually get freed. I want to fix this problem as well when combining the EMS and XMS memory handling to use the same memory area. This is quite a big rewrite of several routines, and it also changes many low-level memory access routines, so I will need to do this very carefully not to break anything. I hope to start working on this during the next week, when there are no more Olympics to distract me. :-)
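One common way to avoid the "only the last block can be freed" limitation is to keep a list of free ranges and merge adjacent ranges when a block is released. The Python sketch below shows that general technique (first-fit allocation with coalescing) over a single shared memory area. It is only an illustration of the idea, not the DS2x86 implementation; the class and method names are made up.

```python
class BlockAllocator:
    """First-fit allocator over one shared memory area, with coalescing."""

    def __init__(self, size: int):
        self.free = [(0, size)]   # sorted list of (start, length) free ranges
        self.used = {}            # start offset -> length of allocated block

    def alloc(self, length: int):
        """Return the start offset of a new block, or None if nothing fits."""
        for i, (start, free_len) in enumerate(self.free):
            if free_len >= length:
                if free_len == length:
                    del self.free[i]
                else:
                    self.free[i] = (start + length, free_len - length)
                self.used[start] = length
                return start
        return None

    def release(self, start: int) -> None:
        """Free a block and merge it with any adjacent free ranges."""
        length = self.used.pop(start)
        self.free.append((start, length))
        self.free.sort()
        merged = []
        for s, l in self.free:
            if merged and merged[-1][0] + merged[-1][1] == s:
                merged[-1] = (merged[-1][0], merged[-1][1] + l)
            else:
                merged.append((s, l))
        self.free = merged


heap = BlockAllocator(1024)
a = heap.alloc(100)
b = heap.alloc(200)
c = heap.alloc(300)
heap.release(b)          # free a block in the middle
d = heap.alloc(150)      # reuses the hole left by b
print(a, b, c, d, heap.free)
```

With a simple "amount allocated" counter the hole left by the middle block would be lost, while here the next allocation that fits reuses it.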
It would also be interesting to experiment using an actual EMM386 or similar driver instead of building this functionality inside DS2x86, but I fear this might be even more work. This would easily escalate to my having to support full config.sys and autoexec.bat handling, and that is quite a big change. At some point I might try to do that, but currently I feel this is a bit too much work to start working on while I am busy at my daytime job.
The last week was my first working week after my summer vacation, and I did not have much time to work on DS2x86 during the evenings. The extra project I worked on before my summer vacation also needed some additional work done, so that too decreased my free time. Thus I only got to work on DS2x86 this weekend, and there has not been much progress. I have managed to find the code that somehow sets up the page table with wrong values, and have been comparing the behaviour with that of DOSBox. The problem here is that the behaviour seems to differ quite a bit, and I'm not sure which differences are supposed to be there (as the memory organisation is slightly different between DOSBox and DS2x86) and which are symptoms of something going wrong. The problem is that it needs a huge amount of work to determine the cause of every single difference.
It is starting to feel like I would spend my time more productively trying to get some other software besides Windows 3.11 working in DS2x86. I will probably work on some games for a while before getting back to Windows 3.11 support. One game I would like to get working is Wing Commander Armada. It does need paging and is running the actual game in Virtual 86 mode, so it will need some of the same enhancements I have already done for Windows. I think I will spend the next couple of days trying to get it running. There are also other games that would need various fixes, and in any case it would be nice to be able to release a new version at some point!
Sadly, today marks the last day of my summer vacation. The four weeks went by pretty fast again. During this week I have not worked much on DSx86 or DS2x86. I had planned some other things I wanted to do during my summer vacation, and of course I had not gotten around to those before it suddenly was the last vacation week and I really had to do them!
One interesting task I did was to replace the CMOS battery of my old Acer Travelmate 803 laptop. It had started to behave erratically with the system time, so I assumed that the CMOS battery was dead. Strangely the clock had not completely stopped working, instead after I set it to the correct time, it kept time for a little while (for a few hours), then suddenly jumped a couple of hours backwards and then stopped running.
I had purchased the laptop in May 2003, so it is over 9 years old. It just has been working so well that I have not had a need to replace it with a more modern machine. I have replaced the hard disk with a 32GB SSD disk, and I have also upgraded the RAM to 1.5GB, so the machine is reasonably fast (and what is more important to me, it is dead silent when running in Max Battery mode, regardless of CPU usage).
Anyways, I had found a blog post describing some important information about the CMOS battery, like the type (CR1220) and location (underneath the motherboard!) of it. Replacing it meant pretty much disassembling the whole thing, changing the battery, and then trying to put everything back together again. Surprisingly, no screws or other parts were left over after I had done this, and the machine even seems to work (and the clock keeps correct time)!
I was also given a heads up that NeoTeam is holding a Neo Coding Compo 2012, and it looks like also existing projects (like DSx86) are allowed to participate. Thus, I am thinking about possibly taking part in that competition. I would need to add a splash screen to DSx86, and I also would like to do some enhancements to it (I don't think the point of the competition is to simply add a splash screen and be done with it). I have not yet decided whether to take part, but I might.
Anyways, from now on I don't have all that much time to work on DSx86-related stuff, but I still try to continue working on the Windows 3.11 support for DS2x86. It would be fun to get that working, even though at the moment it seems very difficult and frustrating. But, the more difficult it is to achieve, the sweeter it feels when you finally get it working!
Thanks again for your interest in DSx86 (and for reading my blog!) :-)
Yet another week where I have been slowly trying to get Windows 3.11 to progress further. The progress just seems to get slower and slower, this time all the progress I have managed to get done fits within a few lines of ASM code! Here is the part of the code (in the KRNL386.EXE of Windows 3.11) I have been working on, with the problem locations numbered (1., 2. and 3.):
    8DC8:C357  push cx
               push es
               mov  ax,1687
               int  2F               DOS Protected-Mode Interface (DPMI) - INSTALLATION CHECK
               or   ax,ax
               jnz  C43C             Jump to error if DPMI not installed
               xor  bh,bh
               cmp  cl,03            CL = processor type (02h=80286, 03h=80386, 04h=80486)
               jb   C43C             Jump to error if CPU < 386
               mov  bl,04
               je   C373
               mov  bl,08
    8DC8:C373  mov  ,bx              Save processor type flag to variable
               mov  ,di              ES:DI = DPMI mode-switch entry point
               mov  [114A],es
               pop  ax
               add  ax,0010
               mov  es,ax
               add  si,ax
               xor  ax,ax
               call far word         1. 2. Call the DPMI mode switch entry point: Switch to protected mode, ring 3
    009F:C38D  jc   C43C             Jump to error if mode switch failed
               mov  ax,cs
               and  al,07
               cmp  al,07
               jne  C411             Jump to error if code selector is not ring 3 LDT selector
               mov  bx,cs
               mov  ax,000A
               call 2B42 ($+67A1)    DPMI 0.9+ - CREATE ALIAS DESCRIPTOR
    009F:C3A1  mov  [05B0],ax        Save returned alias descriptor selector for CS
               mov  bx,ds
               mov  ds,ax
               mov  ,bx              Save original DS selector to new alias data segment
               mov  ds,bx
               push es
               push si
               mov  ax,168A
               mov  si,114C          DS:SI = "MS-DOS",0
               int  2F               3. DPMI 0.9+ - GET VENDOR-SPECIFIC API ENTRY POINT
    009F:C3B8  cmp  al,8A
               je   C411             Jump to error "KERNEL: Inadequate DPMI Server" if call not supported
               ...
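The "and al,07 / cmp al,07" test in the listing relies on the layout of a protected mode segment selector: bits 0-1 are the requested privilege level (RPL), bit 2 is the table indicator (0 = GDT, 1 = LDT), and the remaining bits are the descriptor index. A small Python sketch of that decoding follows (the function names are made up for the illustration):

```python
def decode_selector(selector: int):
    """Split a protected mode selector into (index, uses_ldt, rpl)."""
    rpl = selector & 0x3             # bits 0-1: requested privilege level
    uses_ldt = bool(selector & 0x4)  # bit 2: table indicator, 1 = LDT
    index = selector >> 3            # bits 3-15: descriptor table index
    return index, uses_ldt, rpl


def is_ring3_ldt(selector: int) -> bool:
    """Same test as 'and al,07 / cmp al,07' in the listing."""
    return (selector & 0x07) == 0x07


print(decode_selector(0x009F), is_ring3_ldt(0x009F))
print(decode_selector(0x8DC8), is_ring3_ldt(0x8DC8))
```

The value 009F (the code segment shown in the listing after the mode switch) decodes to index 19, LDT, ring 3, so it passes the check. The real mode segment value 8DC8 would fail the check if it were interpreted as a selector, as its low three bits are zero.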
Here are the problems I have been having, with the numbers corresponding to the code listing above:
Okay, I have finally made a little progress with the Windows 3.11 support. There is still no end in sight for the changes I need to make, but at least I have managed to get somewhere. I found out the reason for the invalid VxD dynamic link call. I found a list of the device numbers from a Microsoft Knowledge Base article, and the device number 1 means the Virtual Machine Manager. The service number 0x2484 (9348) is larger than the number of services available in VMM, so this is why the blue screen occurred. That had nothing to do with VGA registers. The reason why I suspected some VGA register problem was that I got unsupported I/O port calls that did not happen in DOSBox, but it turned out that those happened while Windows was abruptly switching back to text mode to display the blue screen error message! So the VGA register problem was a symptom, not a cause.
Anyways, after quite a bit of debugging I then found where the invalid service number call happened. Windows 3.11 uses the INT 20 software interrupt to perform those dynamic VxD calls. The calls are coded so that the interrupt opcode CD20 is followed by first the service number (in two bytes) and then the device number (in two bytes). Below are two screen copies from the debugger illustrating what the problem was in DS2x86. On the left is the problem situation, where the INT 20 opcode at offset 8028DFFD is followed by the service number. Here the service number happens to be split across two 4KB pages, with the low byte being at offset 8028DFFF and the high byte at 8028E000 (which is in a different physical memory page). As Windows uses virtual memory, these two pages may not be adjacent in the physical memory, but my movzx opcode (which Windows uses to read the service number and device number within the INT 20 handler) did not handle this situation. It simply calculated the physical start address of the 16-bit value (from the offset 8028DFFF) and then read two bytes from that address. This caused the high byte to be read from whatever page happened to physically follow the current page in RAM, and this page happened to have byte 0x24 at the first offset of the page. Reading the other parameter (the device number 0x0001) in turn worked correctly, as it calculated the physical address from offset 8028E001 and correctly read the value from there, as shown on the right hand debug screen copy.
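The page split problem described above can be reproduced with a small model. This is only a rough Python sketch, not the actual DS2x86 code: the page table contents, the physical page numbers and all the function names are made up for illustration, and I have assumed that the correct high byte of the service number is 0x00. Only the linear addresses and the byte values 0x84, 0x24 and 0x0001 come from the description above.

```python
PAGE_SIZE = 0x1000

# Hypothetical mapping from linear page number to physical page number.
# The two linear pages are adjacent, but the physical pages are not.
page_table = {0x8028D: 0x00123, 0x8028E: 0x00457}

phys_mem = {}  # sparse physical memory: physical address -> byte


def translate(linear):
    """Translate a linear address to a physical address via the page table."""
    page, offset = divmod(linear, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


def write_byte(linear, value):
    phys_mem[translate(linear)] = value & 0xFF


def read16_naive(linear):
    """Buggy: translate once, then read two physically consecutive bytes."""
    phys = translate(linear)
    return phys_mem.get(phys, 0) | (phys_mem.get(phys + 1, 0) << 8)


def read16_split_aware(linear):
    """Translate each byte separately when the value crosses a page boundary."""
    if (linear % PAGE_SIZE) + 2 <= PAGE_SIZE:
        return read16_naive(linear)  # fast path: both bytes are in one page
    low = phys_mem.get(translate(linear), 0)
    high = phys_mem.get(translate(linear + 1), 0)
    return low | (high << 8)


# INT 20 (CD 20) at 8028DFFD, service number at 8028DFFF, device number at 8028E001.
for i, b in enumerate([0xCD, 0x20, 0x84, 0x00, 0x01, 0x00]):
    write_byte(0x8028DFFD + i, b)
# The page that physically follows the first one happens to start with 0x24.
phys_mem[0x00124 * PAGE_SIZE] = 0x24

print(hex(read16_naive(0x8028DFFF)))        # service number, read across the split
print(hex(read16_split_aware(0x8028DFFF)))  # same read with the page split check
print(hex(read16_split_aware(0x8028E001)))  # device number, fully in the second page
```

With this setup the naive read produces the bogus service number 0x2484, while the split-aware read returns 0x84. The extra boundary test on the fast path is the price that every multi-byte memory access would have to pay.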
By the way, the screen copies above display another interesting (or annoying, depending on whether you are attempting to debug it!) behaviour in Windows 3.11. In many cases Windows replaces this (slow) interrupt call with an indirect function call after it has been executed once. In other words, Windows seems to use a lot of self-modifying code! For example, the opcode at offset 8028DFCE was originally a similar INT 20 call to device 0001 service 0084 (opcode bytes CD 20 84 00 01 00) but it has been replaced by a call near word [800118D0] (opcode bytes FF 15 D0 18 01 80) after it was once executed. Both of these seem to be some simulated DOS interrupt calls (as they follow a mov eax, 00000021 opcode, which loads EAX register with 0x21, which is the DOS interrupt number). You can perhaps imagine how difficult and frustrating it is to debug code that keeps changing itself while you run it!
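To make the byte patterns above a bit more concrete, here is a small Python sketch that decodes the original INT 20 call and builds the indirect call that replaced it. The function names are my own inventions for this illustration, and this only models the opcode bytes, not how Windows actually performs the patching.

```python
import struct


def decode_vxd_call(code):
    """Decode 'CD 20 <service:16> <device:16>' into (device, service)."""
    if code[:2] != b"\xCD\x20":
        raise ValueError("not an INT 20 dynamic link call")
    service, device = struct.unpack_from("<HH", code, 2)
    return device, service


def encode_indirect_call(pointer_address):
    """Encode 'call near [pointer_address]' as FF 15 <address:32>."""
    return b"\xFF\x15" + struct.pack("<I", pointer_address)


original = bytes.fromhex("CD 20 84 00 01 00")
device, service = decode_vxd_call(original)
patched = encode_indirect_call(0x800118D0)

print(f"device {device:04X} service {service:04X}")
print(patched.hex(" ").upper())
```

Both encodings are exactly six bytes long, which is what makes it possible to overwrite the interrupt call in place without moving any of the surrounding code.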
In any case, adding a page split check to the movzx opcode handling fixed this problem, but the bigger issue remaining is that there are still a lot of other opcodes that may have the same problem. This is the reason why paging is so difficult (or more accurately, slow) to support. I would need to have a check for this split page handling in every opcode that accesses more than a single byte of RAM, but this will of course make the code much slower. Perhaps eventually I will decide to have two versions of DS2x86, one which does not support paging but is much faster, and another with full paging support but running much slower.
The next problem I ran into was that the code jumped to a real-mode address 8C80:0000, but the processor was in protected mode. So when the code there began with opcodes PUSH CS followed by POP DS, the DS register was loaded with an invalid selector. After some more debugging I realized that the address 8C80:0000 is jumped to when the WIN386.EXE loads and executes KRNL386.EXE (using the DOS LOAD AND/OR EXECUTE INT21 call). The DOS calls cannot be run in actual protected mode, only real mode (or VM86 mode). So, if the processor was in actual protected mode after that call, there was certainly a problem in my implementation of it. The call should clear the condition flags when launching the new program, but I had mistakenly made it clear all flags. This meant that the VM86 flag also got cleared, and the processor went into actual protected mode. After fixing that problem the code progressed a little bit further.
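The difference between the two behaviours is easy to show with bit masks. This is a minimal Python sketch using the standard x86 EFLAGS bit positions. The function names are hypothetical, and I am assuming here that "condition flags" means the six arithmetic status flags, so this is not the actual DS2x86 code.

```python
# Standard x86 EFLAGS bit positions.
CF, PF, AF, ZF, SF, OF = 0x0001, 0x0004, 0x0010, 0x0040, 0x0080, 0x0800
IF = 0x0200      # interrupt enable flag
VM = 0x00020000  # virtual 8086 mode flag (bit 17)

# Assumption: the condition flags are the arithmetic status flags.
CONDITION_FLAGS = CF | PF | AF | ZF | SF | OF


def clear_all_flags(eflags):
    """The mistaken version: every flag is lost, including VM."""
    return 0


def clear_condition_flags(eflags):
    """The fixed version: only the status flags are cleared."""
    return eflags & ~CONDITION_FLAGS


eflags = VM | IF | ZF | CF  # example state inside a VM86 task

print(bool(clear_all_flags(eflags) & VM))        # is the VM flag still set?
print(bool(clear_condition_flags(eflags) & VM))
```

In the first case the VM bit is gone, so the emulated processor would continue in actual protected mode. In the second case the VM and IF bits survive and only ZF and CF are cleared.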
The current problem is that Windows 3.11 hangs with the screen in text mode. Looking at the code where it hangs, this seems to be some sort of serious error handler, as the code drops into text mode, prints a message (which in this case is simply an empty string) and then goes into a tight loop. There is a JNE opcode that jumps to itself, so there is no chance that the code will progress further from that point. So, what I am currently trying to determine is where and why KRNL386.EXE decides that something is so badly wrong that it needs to halt the system. Again I need to compare the behaviour with DOSBox starting from the beginning of KRNL386.EXE loading and executing, so this will again probably take many days to figure out and solve.
The past week was my first summer vacation week, and after spending much of my free time with the extra project I have been working on, I wanted to have some actual vacation time for a change. So, I did not code anything for DS2x86 until today, when I finally got a bit bored with simply being lazy. Today I began working again on the Windows 3.11 support for DS2x86. No major progress yet, as I am still debugging it to see why the VGA graphics register handling differs between DOSBox and DS2x86. I believe I first need to determine the cause of this difference before I can progress further, as the problems I am currently having seem to be related to the way that Windows 3.11 accesses the graphics card.
I did however hack together a simple text-mode screen copy routine, mostly just to be able to get some screen copies of my progress to my blog. :-) This is what the actual error message looks like. I believe this is caused by some VGA register handling difference.
So, I will continue working on this issue, and hopefully I will eventually get Windows 3.11 to actually start in DS2x86!
Okay, here is the new fixed version of DSx86, which has the problems I mentioned in the previous blog post fixed. A couple of the problems were actually not in DSx86 but in the tester program, so this list is a little bit different to the list in the previous blog post. Anyways, here are the changes:
My summer vacation is starting now, so I should now have more time to work on DSx86 and DS2x86. I am still working on getting Windows 3.11 running in DS2x86, but I am somewhat stuck with it. I need to compare the behaviour to DOSBox, which is rather tedious and time-consuming work. But, I hope to now finally get some progress done with that, as I can focus on it properly. In any case, thanks again for your interest in DSx86 and DS2x86!