DSx86 - Blog

Feb 8th, 2019 - Status update

Wow, it has been a while since I last posted a blog post here! Regarding my 3DSx86 project, sadly there has not been any progress since 2016. I have been busy with my other projects. In case you are interested in learning what I am currently working on, I decided to announce my current project in this blog as well.

I am working on porting my old LineWars II game to mobile virtual reality! The game is called LineWars VR, and it will be released within this year (2019) on Oculus Go and Samsung Gear VR. If you have been following my DSx86 progress from the start, you may remember that it was the launcher program LW2.COM for my LineWars II game that I used as test software when originally developing DSx86, and there is still the DSx86_LW2.nds program available for download here, which runs my LineWars II game within my emulator core.

You can follow my progress with LineWars VR from my dedicated web pages for that project at http://linewars.patrickaalto.com/LWVR.html. I just released an introductory trailer video for LineWars VR on YouTube. Thanks again for your interest in my projects!

Feb 21st, 2016 - 3DSx86 progress

For the past couple of weeks I have been very busy with cleaning up my old apartment, so work on 3DSx86 has been on the back burner. I did manage to implement MCGA mode blitting last weekend, and noticed that indeed writing directly to the frame buffer is too slow to be usable. The framerate was not the full 60fps, and looked like it might not even be 30fps. So, this weekend I have been working on switching to GPU-assisted screen blitting.

The problem I am facing is that I can not use the latest citro3d library for accessing the GPU, because that relies on the event system (the popInterrupt() routine I mentioned in my previous blog post), which is not usable from syscore. So, I began rewriting much of the citro3d functionality into my program. However, after spending many hours on this I still did not manage to make it work: I only got a black screen, and most of the time 3DSx86 simply hung.

Next, I noticed that blargSnes does not actually use citro3d, but a simple blargGL wrapper over the low-level GPU code in the ctrulib library. This would be closer to what I need, so I began porting my GPU calls to use blargGL. However, I did not manage to make this work either. There are no errors, but the GPU commands just seem to have no effect and the screen stays black.

My plan for the immediate future is to create a separate small test program using blargGL, to make sure I use it correctly, and when I get that to work, I will then copy that code to 3DSx86 and call it from the syscore, to see if calling the GPU stuff from the syscore is the problem or if I simply made some other mistake in my code.

This project will not progress further during this weekend, as tomorrow my old apartment needs to be empty so I will still need to spend today cleaning it up. Hopefully starting tomorrow I will have some more time to work on 3DSx86.

Feb 7th, 2016 - 3DSx86 progress

During the week I have worked on 3DSx86 whenever I have had some free time. It has been quite interesting working on my emulation project again. After I got the text mode screen blitting routine working last weekend, the next step was to get 4DOS.COM to actually start up in my emulation core. This is what I began working on Monday as soon as I got home from work.

Using the syscore

The architecture of my emulation core is such that the main CPU emulation runs asynchronously as fast as it can in a never-ending loop, expecting outside events to tell it when to quit or run things like IRQ emulation. On bare metal environments I have used hardware IRQs to handle this, and on systems that run under a multitasking operating system I have used a separate thread to handle these outside events. I had read that on 3DS the threading is co-operative, so that my main thread should periodically call some svc-routine to allow other threads to run. However, I don't have any location in my code where it would make sense to call some thread yielding routine, so instead I decided to look into using the syscore to handle all the other stuff besides the main CPU emulation. I had read that the syscore only gives about 30% time to the user thread, but even that would mean an effectively 80MHz CPU, much faster than the 66MHz processor that handled both CPU emulation and screen blitting in DSx86.

I moved my screen blitting code to a new thread that I configured to run on the syscore, and that seemed to work otherwise fine, except that calling gspWaitForVBlank() seemed to hang the system. It did not hang completely, though, as pressing Start (my exit key) curiously did exit the program fine. Without syncing to VBlank everything worked fine, except that the cursor blinked slower than I would have expected. Running on an 80MHz CPU my text mode blitting routine (which uses a dirty buffer approach to only draw the characters that have changed) should be able to handle a 400x240 frame buffer (or more specifically an 80x25x2 dirty buffer) easily hundreds or even thousands of times per second.
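The dirty buffer idea mentioned above can be illustrated with a minimal sketch. This is not the actual 3DSx86 routine (which is written in ARM assembly); the names text_blit, text_blit_invalidate and draw_cell_fn are hypothetical, and the 16-bit cell layout (character in the low byte, attribute in the high byte) is the usual PC text mode layout, assumed here.

```c
#include <stddef.h>
#include <stdint.h>

#define TEXT_COLS  80
#define TEXT_ROWS  25
#define CELL_COUNT (TEXT_COLS * TEXT_ROWS)

/* Hypothetical callback that draws one character cell on the host screen. */
typedef void (*draw_cell_fn)(int col, int row, uint8_t ch, uint8_t attr);

/* Shadow copy of the cells as they were last drawn (2 bytes per cell). */
static uint16_t shadow[CELL_COUNT];

/* When set, the next blit redraws every cell regardless of the shadow copy. */
static int force_full = 1;

/* Request a full redraw, for example after a video mode change. */
void text_blit_invalidate(void)
{
    force_full = 1;
}

/* Compare the emulated text memory against the shadow copy and draw only
 * the cells that differ. Returns the number of cells that were redrawn. */
int text_blit(const uint16_t *vram, draw_cell_fn draw)
{
    int drawn = 0;
    for (int i = 0; i < CELL_COUNT; i++) {
        uint16_t cell = vram[i];
        if (!force_full && cell == shadow[i])
            continue;                      /* unchanged, nothing to draw */
        shadow[i] = cell;
        if (draw != NULL)
            draw(i % TEXT_COLS, i / TEXT_COLS,
                 (uint8_t)(cell & 0xFF), (uint8_t)(cell >> 8));
        drawn++;
    }
    force_full = 0;
    return drawn;
}
```

On a mostly static screen the loop touches only the 2000 shadow entries per frame and draws just the cursor cell, which is why such a routine should be cheap even on a slow core.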

I experimented with gfxFlushBuffers() and gfxSwapBuffers() calls, and noticed that if I leave both of them out, my routine blinks the cursor really fast, but the cursor does not always get completely drawn. Adding gfxFlushBuffers() back slows down the blinking somewhat, but the cursor looks correct. However, adding the call to gfxSwapBuffers() slows everything down to a crawl. That is strange, as I don't even use (or need) double buffering! I have no idea what the gfxSwapBuffers() does, but it seems to take a lot of time. I ended up leaving it out, as everything seemed to work nice and fast without it. It remains to be seen if I need to add it back at some point. Anyways, now that I had managed to move the screen blitting to the syscore, I was ready to work on the main thread, trying to get 4DOS.COM to run.

Getting 4DOS.COM to start up

I first coded calls to all the environment setup routines (memory and BIOS emulation initialization and such) and checked that these worked without crashing the system. That was pretty easy. Next I FTP-transferred the 4DOS.COM to my SD card and added the StartShell() call, which loads the program into memory and sets up all the emulated x86 registers ready for starting the core to run. However, all I got was a "file not found" error.

It took me a while to track down the problem. The first issue was that I had given the parameter to StartShell() as "\\3ds\\3DSx86\\4DOS.COM", as I had forgotten that my emulation core always uses Linux-style paths for the host environment ("/3ds/3DSx86/4DOS.COM"). Silly me.. But even after fixing that, my program simply crashed. I had to keep adding debug printing until I found out that it crashed when it called my DosExeLoader() routine. I had added debug prints both just before the call and as the first line in this routine, and the first print showed up but the second did not. That was a bit strange, until I looked more closely at how I had coded my routine. It started like this:

int DosExeLoader(char *namep, EXEC_BLK *exp, u16 mode, u16 fd)
{
    u16 mem, env, start_seg, asize = 0;
    u16 exe_size;
    u16 RelocTbl[32768];
    u16 fcbcode;
    int i, bytes;
    u8  *phys;
    int val;
	
    ...
}

It allocates a 64-kilobyte temporary memory block from stack for the relocation table handling. This made me wonder how much stack space the devkitARM (or ctrulib) gives programs by default. After some searching in the ctrulib sources I found out that the stack size is only 32 kilobytes! That is really too small a stack for 3DSx86, as my core puts things like the main opcode table and the page mapping table, in addition to temporary variables like this 64-kilobyte RelocTbl table, on the stack. So, how do I change the default stack size?

I noticed the ctrulib sources used a .weak directive for the __stacksize__ variable. I had not seen such a directive before and did not know what that meant. I looked it up from the GCC documentation, and learned that it tells the linker to use a "strong" symbol instead of this "weak" symbol if the object modules contain duplicate definitions of the same symbol. Normally all symbols are strong so defining duplicate symbols would cause an error. Well, I decided to define my own __stacksize__ variable and gave it a value of 128 kilobytes (for a quick test), compiled my program and tested it again, and it did not crash! It even loaded 4DOS.COM into memory without errors, great!
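As a rough sketch of the override described above: a single ordinary (strong) definition in the program is enough, and the linker then picks it over the weak default in the library. The 32-bit type is my assumption here and should be checked against the ctrulib sources; the two size constants are hypothetical names that just restate the numbers from the text.

```c
#include <stdint.h>

/* Numbers from the text: the default stack is 32 KB, while the relocation
 * table alone (32768 entries of 16 bits each) needs 64 KB. */
enum {
    DEFAULT_STACK_BYTES = 32 * 1024,
    RELOC_TABLE_BYTES   = 32768 * sizeof(uint16_t)
};

/* Strong definition that replaces the weak default at link time.
 * 128 KB was the quick test value; the type is assumed to be 32-bit. */
uint32_t __stacksize__ = 128 * 1024;
```

The constants make the original crash easy to see: one 64 KB local array can not fit into a 32 KB stack, no matter what else the routine does.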

The next step was then to start up my emulation core. First, I moved the exit key press check to the syscore thread, so that I had a way of exiting my program cleanly. Next I added the call to run_core(), FTP-copied my new .3dsx file to the device, and launched it. Yay, 4DOS.COM started up! That was easier than I expected, it worked on the first try! I guess I have coded my pax86 core to be pretty robust and portable, if it is this easy to get to run on a new platform (as long as the CPU architecture is supported). Of course I wanted to take a picture of 4DOS.COM starting up in 3DSx86 on my old Nintendo 3DS!

Getting SYSINFO to run

Okay, now that 4DOS.COM runs, the next step would be to get Norton Sysinfo to run, so that I can get a sense of the emulation performance. The SYSINFO.EXE has been my first test program for a new platform ever since I started porting DSx86 to other platforms. It needs the emulated timer interrupts to work in order to measure the CPU speed, so this was the next logical step in my 3DSx86 work.

At first I looked into svcCreateTimer() and related calls, but I could not immediately figure out how to use them. Next, I checked the values that svcGetSystemTick() returns, and noticed that it seems to increment at the native 268MHz speed, so it would easily have sufficient resolution for my needs. However, since it looked like I would need to handle both screen blitting and timer IRQ emulation using the single syscore thread, I still needed a way to sync the screen blitting to the VBlank time. If calling the gspWaitForVBlank() does not work from the syscore thread, perhaps there is a way to directly access the same information?

I again looked into the ctrulib sources, and found a popInterrupt() routine that seemed to actually perform the stuff that the gspWaitForVBlank() call relies on. Since that routine is declared static I could not directly call it (nor access the gspEventData shared memory variable that it uses), so I had to create my own routine and also my own version of the gspEventData variable. After coding this and adding a loop calling my version of popInterrupt() to look for the VBlank event, I was able to sync my syscore screen blitting to the VSync interval, without my needing to call any svc-routines from the main core!

Next, I added some timer code based on the svcGetSystemTick(), for example while looping and waiting for the next VBlank interval I check whether it is time to launch the next emulated timer IRQ. I then copied the SYSINFO.EXE to my SD card via FTP, and changed my launch code to run it instead of 4DOS.COM (as I don't have proper keyboard input emulation yet). Starting my 3DSx86.3dsx on my Nintendo 3DS then started up SYSINFO.EXE and I was greeted by the Norton Sysinfo main screen! However, to be able to progress to the CPU speed test page in Norton Sysinfo I needed at least Enter key support.
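The tick-based timer check can be sketched roughly like this. It is only an illustration, not the actual 3DSx86 code: the timer_state structure and the function names are hypothetical, the host tick rate is rounded to 268MHz, and the value for "now" is assumed to come from svcGetSystemTick(). The PC timer constants (1193182Hz input clock, divisor 0 meaning 65536) are the standard ones, giving the familiar 18.2Hz default rate.

```c
#include <stdint.h>

#define HOST_TICK_HZ        268000000ULL  /* assumed, rounded tick rate      */
#define PIT_INPUT_HZ        1193182ULL    /* PC timer chip input clock       */
#define PIT_DEFAULT_DIVISOR 65536ULL      /* a programmed 0 means 65536      */

typedef struct {
    uint64_t next_due;  /* host tick value at which the next IRQ0 is due */
    uint64_t period;    /* host ticks between two emulated timer IRQs    */
} timer_state;

/* Convert an emulated timer divisor into a period in host ticks. */
uint64_t timer_period_ticks(uint32_t pit_divisor)
{
    uint64_t div = pit_divisor ? pit_divisor : PIT_DEFAULT_DIVISOR;
    return (HOST_TICK_HZ * div) / PIT_INPUT_HZ;
}

/* Start counting from the given host tick value. */
void timer_init(timer_state *t, uint64_t now, uint32_t pit_divisor)
{
    t->period   = timer_period_ticks(pit_divisor);
    t->next_due = now + t->period;
}

/* Called from the VBlank wait loop: returns 1 when it is time to launch
 * the next emulated timer IRQ, and schedules the following one. */
int timer_poll(timer_state *t, uint64_t now)
{
    if (now < t->next_due)
        return 0;
    t->next_due += t->period;   /* keep the long-term rate exact */
    return 1;
}
```

Advancing next_due by the period (instead of setting it to now plus the period) avoids accumulating drift when the poll happens a little late.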

So, the next step was to add a simple Enter key sending via emulated keyboard IRQ when I press the Nintendo A button. However, when I tested this, my program simply exited. After some more tests I noticed that it also exited after running for a little while even when I did not press any keys. I realized that this was because I had not implemented thread locking to my IRQ handling variables. On systems that run on a single core (like in DSx86) I don't need to protect access to those variables, but on multi-core platforms I need to have mutex locking around the IRQ handling variables, else I get a race condition which causes my emulation to exit (by design).

I looked into the simple locking systems in ctrulib, and noticed it had a LightLock locking system which seemed to be sufficient for my needs. I added some LightLocks around my critical IRQ variables, and after that I was able to go to the CPU speed test page in SYSINFO!
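To illustrate the kind of locking involved, here is a small portable sketch. It does not use LightLock itself: a C11 atomic_flag spinlock stands in for it so that the example compiles anywhere, and the irq_pending variable and the function names are hypothetical, not the actual 3DSx86 ones. On the 3DS the two lock helpers would presumably map to the LightLock lock and unlock calls.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for the LightLock: a simple C11 spinlock. */
static atomic_flag irq_lock = ATOMIC_FLAG_INIT;

/* Shared between the syscore thread (raises IRQs) and the CPU emulation
 * thread (takes them): one bit per pending emulated IRQ line. */
static uint16_t irq_pending;

static void lock_irq(void)
{
    while (atomic_flag_test_and_set_explicit(&irq_lock, memory_order_acquire)) {
        /* spin until the other thread releases the lock */
    }
}

static void unlock_irq(void)
{
    atomic_flag_clear_explicit(&irq_lock, memory_order_release);
}

/* Syscore side: mark an emulated IRQ line (0..15) as pending. */
void irq_raise(int irq)
{
    lock_irq();
    irq_pending |= (uint16_t)(1u << irq);
    unlock_irq();
}

/* Emulation side: take the lowest-numbered pending IRQ, or -1 if none. */
int irq_take(void)
{
    int found = -1;
    lock_irq();
    for (int i = 0; i < 16; i++) {
        if (irq_pending & (1u << i)) {
            irq_pending &= (uint16_t)~(1u << i);
            found = i;
            break;
        }
    }
    unlock_irq();
    return found;
}
```

Without the lock, the read-modify-write of irq_pending on one core can overwrite a bit that the other core set in between, which is the kind of race condition described above.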

I took a picture of this as well, but then thought that perhaps it would be time to add a simple screen capture routine so that I can get screenshots from the actual Nintendo 3DS screen contents. I ported the screen capture code from my rpix86, and was able to capture the full 400x240 screen when running the SYSINFO CPU speed test. I cropped the image to 400x200 as the bottom 40 rows are not used in the 80x25 text mode.

So, my 3DSx86 runs at slightly less than 3 times the speed of DSx86, or at 1/10 of the actual Nintendo 3DS host CPU speed. That is pretty much as expected. There are several reasons why 3DSx86 does not run at 4 times the speed of DSx86, even though the CPU in Nintendo 3DS is 4 times faster (and even has two cores).

DSx86 runs on "bare metal", while 3DSx86 runs under a lightweight operating system.
DSx86 uses self-modifying code to speed up the emulation. This is not allowed in user mode on the Nintendo 3DS. I am not sure if this might be possible when running in privileged mode.
DSx86 has a lot of the most common code in the ITCM (Instruction Tightly Coupled Memory) of the CPU, so it never needs to access the cache or main RAM when executing that code. Nintendo 3DS does not seem to allow putting user code into ITCM.
The CPU emulation core in 3DSx86 is much bigger, as it contains all the 386 opcodes. I did experiment with adding 386 opcodes to DSx86 back in 2012, and noticed that the emulation speed dropped considerably (to about 70% of the normal DSx86 speed) as a result of this. Larger code means more frequent cache misses.

Taking all of the above into consideration, 3DSx86 actually runs at a pretty good speed! Not fast enough to run Doom at any playable speed, but it should run some less demanding 386 games fine.

Next steps

Next I need to work on the keyboard emulation, so that I can begin testing other games, and then immediately after that I need to implement the graphics mode blitting routines. I think I will also need to revisit my VBlank handling system, as the current version is prone to breaking if/when anything in the ctrulib changes. Then there are a lot of other minor issues I need to fix and/or implement before I can release 3DSx86, but things seem to be progressing nicely at least so far.

Jan 31st, 2016 - Work on 3DSx86 started!

What has happened since my last DSx86 blog post?

Okay, it has been a while since I last wrote anything on my DSx86 blog. I stopped working on DSx86 and DS2x86 in the summer of 2012, as I worked on the Raiden Legacy project by DotEmu at that time. After that project I began porting my emulation core to Android and iOS, and then in the beginning of 2013 I started porting it to Raspberry Pi for rpix86. Then later in 2013 I ported my DS2x86 version to GCW-Zero and released my zerox86 emulator.

By the end of 2013 Retro Infinity had licensed my emulation core and wanted me to port it also to the Windows Phone 8 platform, so I began working on that. By the end of 2014 I had my "pax86" emulation core running on various ARM architecture devices (Android, iOS, WP8, Nintendo DS, Raspberry Pi) and on several MIPS architecture devices (DS2x86, GCW-Zero). I have been waiting for Retro Infinity to actually release some games using my emulation core, but that has not happened yet. Last year I decided to work on a Raspberry Pi-based robot, and created my Piro robot.

However, in the autumn of 2015 I began having issues with my health, for example I had very frequent and severe asthma attacks, and finally in December I was actually hospitalized for over a week while the doctors tried to find out what was wrong with me. Finally they came up with a diagnosis: I suffered from Churg-Strauss Syndrome (CSS). This is a rather rare disease, which is why it required some thorough tests before the doctors could confirm the diagnosis. I got proper medication and my health has improved considerably during the last month. So much so that I am again feeling well enough to continue working on my hobby projects. :)

Homebrew on the Nintendo 3DS

I had not followed the Nintendo 3DS homebrew scene for almost a year. I remember seeing some preliminary hacking success stories in the beginning of 2015, but had forgotten about it during the year. A couple of weeks ago I got a couple of emails asking whether I had any plans to port DSx86 over to Nintendo 3DS. This reminded me to take a new look at the current state of the homebrew development possibilities. I noticed that devkitPro supports Nintendo 3DS, so it looked like now might actually be a good time to start working on "3DSx86".

The first step was to look into requirements for homebrew development on Nintendo 3DS. I had no idea what firmware version my old Nintendo 3DS (which I had purchased in 2011 or so) had, or what sort of hardware (flash cart or similar) is required to run homebrew code on it. I found a list of Homebrew Exploits on the 3dbrew.org site. I decided to try the Browserhax exploit, and after a couple of tries my Nintendo 3DS (which seems to be on 9.5.something firmware) was running the exploit! At that time I had not copied anything to the SD card, so it did not progress very far. After I educated myself further using the 3dbrew web pages, I copied Menuhax to the SD card, then ran the Browserhax, and this combination then resulted in my Nintendo 3DS booting into homebrew mode whenever I keep the left shoulder button pressed while powering it up. Easy, simple, and no need for any additional hardware! I also installed the homebrew starter kit, so I got ftBRONY for transferring my 3DSx86 (and other software) onto my Nintendo 3DS.

Work on 3DSx86

After installing devkitARM and compiling some of the 3DS example programs, copying them to my Nintendo 3DS and confirming that they work, I began building a simple test program for the initial 3DSx86 tests. I started with just a main program, my text mode screen blit assembler routine, and an 8x8 font BMP file. I had been using Grit in my DSx86 Makefile to convert BMP files to data assets for my project, but the Grit stuff was missing from the Nintendo 3DS template Makefile. Luckily cearn had written a good manual about how to add Grit to a devkitARM Makefile, so using that, and also checking my DSx86 Makefiles, I got that part working pretty quickly.

What did not work quickly was getting my simple test code to actually run! It has been a while since I worked on ARM Assembly, and it felt like my coding skills had gotten quite rusty in the meantime. You would not believe how many typos and other mistakes I could fit into a couple of dozen lines of code! Anyways, after hunting down bugs for a couple of hours I finally got something of my own making to show on the Nintendo 3DS screen! Originally I used an 8x8 font for my test, but the 400-pixel wide screen could only fit 50 characters horizontally. In DSx86 I used a 6x8 font to fit 42 characters on the 256-pixel wide screen, so I thought about switching over to that font, until I realized that 400 pixels would fit exactly 80 characters if I used a 5-pixel wide font! So, I created a 5x8 font (which is slightly less readable than the very neat 6x8 font) and used that in my test program. That worked pretty well, and I got my "pax86" banner to show! Here is a picture showing my Nintendo 3DS running my test program, with a working blinking cursor and all! :)
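The font width arithmetic above is simple enough to write down as a tiny helper. The function name text_columns is hypothetical and only serves to restate the numbers from the text.

```c
/* Number of whole character columns that fit on a screen of the given
 * pixel width when each glyph is glyph_width pixels wide. */
int text_columns(int screen_width, int glyph_width)
{
    if (glyph_width <= 0)
        return 0;                 /* guard against a nonsensical font width */
    return screen_width / glyph_width;
}
```

With this, text_columns(400, 8) gives 50, text_columns(256, 6) gives 42 (with 4 pixels left over), and text_columns(400, 5) gives exactly the 80 columns that the DOS text mode needs.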

After I got that simple code to run, I then copied all my "pax86" source files to the 3DSx86 project source directory and tried to compile the whole system. The timer stuff failed to compile at all (that is hardly surprising, as it has very platform-specific code, so I need to create a new version for Nintendo 3DS), but most of the other code compiled almost as-is. When I originally created the "pax86" version of my emulation core I tried to make it reasonably portable, and I seem to have succeeded pretty well. The code in 3DSx86 will be a port of pax86, which is a port of rpix86, which is a port of DSx86. So 3DSx86 will not be a direct port from DSx86. :)

Anyways, that's it for this blog post, I'll continue working on the port whenever I have time. I am currently in the process of cleaning out my old apartment as I moved into a new apartment immediately after I got home from the hospital just before Xmas, and I am also still making some renovations in the new apartment. So I am busy with other things besides 3DSx86 (or even programming), but I try to keep working on 3DSx86 as well.

Previous blog entries

See here for blog entries from July 2012 to June 2014.
See here for blog entries from January-June 2012.
See here for blog entries from July-December 2011.
See here for blog entries from January-June 2011.
See here for blog entries from July-December 2010.
See here for blog entries from January-June 2010.
See here for blog entries from 2009.
