I developed a from-scratch operating system as part of my PhD thesis. We initially ran it on QEMU because of its speed and simplicity, but the truth is that while QEMU would run something developed on real hardware, it would also run a lot more. It was very, very loose about what it accepted, and when you developed for QEMU you generally couldn't take the result and run it on real hardware without days' worth of debugging to figure out where QEMU did things just a little more loosely than the actual hardware did.
We ended up using Bochs for that purpose, which is a lot slower but a lot more faithful (and unfortunately lacks a lot of secondary hardware that's important, but it will take you a long time in development before you hit that particular limitation).
I partially disagree. QEMU, and simulators in general, can really speed up development, but for the reason you cited you need to run your code on real hardware periodically. That is my first suggestion. Second, minimize assembly-language code: this reduces the temptation to use esoteric hardware-specific features that QEMU may not handle! In general, micro-optimizations should be left for much later, if ever. Third, as soon as possible, set things up to load your kernel over the network or a serial link (or JTAG). Particularly if you use a debugger, you will be running the same kernel over and over again to debug something. Fourth, if you plan to use a debugger like gdb, provide support for remote debugging. That is not a lot of code (basically support for communication, breakpoints, peek and poke).
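To give a feel for how small the communication part of a gdb stub is, here is a minimal sketch in C of the packet framing used by GDB's remote serial protocol: each packet is sent as `$<payload>#<xx>`, where `xx` is the sum of the payload bytes modulo 256 in two hex digits. The function names `rsp_checksum` and `rsp_frame` are my own (hypothetical, not from any stub), and the sketch assumes the payload contains no characters that need escaping. In that protocol, peek and poke correspond to the `m` and `M` packets and breakpoints to `Z`/`z`, but handling those is left out here.

```c
#include <stddef.h>
#include <string.h>

/* Checksum used by GDB's remote serial protocol: the sum of all
 * payload bytes, modulo 256. */
unsigned char rsp_checksum(const char *payload, size_t len)
{
    unsigned char sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (unsigned char)(sum + (unsigned char)payload[i]);
    return sum;
}

/* Hypothetical helper: frame `payload` as "$<payload>#<xx>" into `out`
 * (NUL-terminated). Returns the frame length without the NUL, or -1 if
 * `out` is too small. Assumes the payload needs no escaping. */
int rsp_frame(const char *payload, size_t len, char *out, size_t out_size)
{
    static const char hex[] = "0123456789abcdef";

    /* '$' + payload + '#' + two hex digits + NUL */
    if (out_size < len + 5)
        return -1;

    unsigned char sum = rsp_checksum(payload, len);
    out[0] = '$';
    memcpy(out + 1, payload, len);
    out[len + 1] = '#';
    out[len + 2] = hex[sum >> 4];
    out[len + 3] = hex[sum & 0x0f];
    out[len + 4] = '\0';
    return (int)(len + 4);
}
```

For example, framing the reply `OK` yields `$OK#9a`. A real stub would then push those bytes out over whatever link you have (serial, network) and wait for gdb's `+` acknowledgement.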
Actually, this need for a good debug tool of some kind is a key point. In the 80s I wrote an operating system for the 6809, but I did have access to an ICE (in-circuit emulator). I, and I'm sure many others, wanted to write something for that new 386 SX the second it came out. I thought Linus Torvalds had access to a 386 emulator at university, but I can't find a reference. I did find this, which describes how he debugged without one:
(he would have hated the machines I use these days, they take 6 minutes to boot)
"The worst part is starting off: after you have even a minimal system you can use printf etc, but moving to protected mode on a 386 isn't fun, especially if you at first don't know the architecture very well. It's distressingly easy to reboot the system at this stage: if the 386 notices something is wrong, it shuts down and reboots - you don't even get a chance to see what's wrong.
Printf() isn't very useful - a reboot also clears the screen, and anyway, you have to have access to video-mem, which might fail if your segments are incorrect etc. Don't even think about debuggers: no debugger I know of can follow a 386 into protected mode. A 386 emulator might do the job, or some heavy hardware, but that isn't usually feasible.
What I used was a simple killing-loop: I put in statements like
die:
jmp die
at strategic places. If it locked up, you were ok, if it rebooted, you knew at least it happened before the die-loop. Alternatively, you might use the sound io ports for some sound-clues, but as I had no experience with PC hardware, I didn't even use that. I'm not saying this is the only way: I didn't start off to write a kernel, I just wanted to explore the 386 task-switching primitives etc, and that's how I started off (in about April-91).
After you have a minimal system up and can use the screen for output, it gets a bit easier, but that's when you have to enable interrupts. Bang, instant reboot, and back to the old way. All in all, it took about 2 months for me to get all the 386 things pretty well sorted out so that I no longer had to count on avoiding rebooting at once, and having the basic things set up (paging, timer-interrupt and a simple task-switcher to test out the segments etc)."
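The reasoning behind the die-loop ("if it locked up, you were ok, if it rebooted, it happened before the die-loop") can be turned into a systematic search. The C sketch below is my own model of that, not something from the quoted post: it assumes boot is a fixed sequence of steps, that exactly one step causes the reboot, and that you can rebuild with the die-loop placed just before any step. The callback type `hangs_fn` and the functions `first_faulting_step` and `simulated_hangs_before` are hypothetical names; on a real machine the "callback" is you, rebuilding and watching whether the box hangs or reboots.

```c
#include <stdbool.h>

/* Hypothetical probe: returns true if the machine hangs in the die-loop
 * when the loop is placed just before boot step `step`, and false if it
 * reboots before getting there. */
typedef bool (*hangs_fn)(int step, void *ctx);

/* Binary search for the first step that causes a reboot.
 * Returns its index in [0, n_steps), or n_steps if no step faults.
 * Assumes a die-loop before step 0 is always reached. */
int first_faulting_step(int n_steps, hangs_fn hangs_before, void *ctx)
{
    int lo = 0;            /* die-loop before step `lo` is reached    */
    int hi = n_steps + 1;  /* virtual position that is never reached  */

    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (hangs_before(mid, ctx))
            lo = mid;      /* steps 0..mid-1 are fine                 */
        else
            hi = mid;      /* the reboot happened before step `mid`   */
    }
    return lo;
}

/* Simulated machine for illustration only: `ctx` points at the index
 * of the step that makes the CPU reboot. */
bool simulated_hangs_before(int step, void *ctx)
{
    return step <= *(int *)ctx;
}
```

With 8 steps this needs at most 4 rebuild-and-boot cycles instead of up to 8, which matters when each reboot is slow.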