A few days ago, monocasa suggested I should look at the 8086's HLT instruction, so here it is. Let me know if you have other suggestions for parts of the 8086 that would be interesting to read about.
Prefixes and (probably related) address generation.
String instructions would be cool. I suspect the microcode has a few features used only by them, in collaboration with the decode ROMs. For example, REP is probably little more than one of the conditions that the microcode can test.
>For example REP is probably little more than one of the conditions that the microcode can test.
Indeed it is, and the same internal flag set by the REP prefix is also used by the multiplication and division subroutines to keep track of the sign bit. With the prefix, (I)MUL and (I)DIV will return a negated result on the 8086.
The 80186 only does this for IDIV. Mul/div on that processor is already mostly handled by dedicated logic, so the microcode is a lot shorter (and no longer uses any subroutines).
However, MUL and IMUL jump to a common exit point that can put the result into either DX:AX or an arbitrary register (for the three-operand form of IMUL), and again they use the REP flag to keep that extra bit of state. If it is set, the register is selected by the middle (reg) field of the ModRM byte. The usefulness of that undocumented "feature" is somewhat limited, since that field is also part of the opcode, so the only choices of register are AH, CH, SP and BP.
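To make the register list concrete, here is a minimal Python sketch that maps the middle ModRM field to a register name for the group-3 opcodes F6 (byte) and F7 (word), where /4 is MUL and /5 is IMUL. The function name `rep_mul_dest` is hypothetical, and the model simply assumes, as the comment describes, that the REP-flagged exit path reads that field as a register number; it is not derived from the 80186 microcode itself.

```python
from typing import Optional

# Standard 8086 encodings of the 3-bit ModRM "reg" field.
BYTE_REGS = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
WORD_REGS = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

# In group-3 opcodes the same field is an opcode extension.
GROUP3_EXT = {4: "MUL", 5: "IMUL"}


def rep_mul_dest(opcode: int, modrm: int) -> Optional[str]:
    """Register a REP-prefixed MUL/IMUL would write, assuming (hypothetically)
    that the shared exit path reads the middle ModRM field as a register."""
    reg = (modrm >> 3) & 0b111           # middle ModRM field
    if opcode not in (0xF6, 0xF7) or reg not in GROUP3_EXT:
        return None                      # not a group-3 MUL/IMUL
    return (BYTE_REGS if opcode == 0xF6 else WORD_REGS)[reg]


for opcode in (0xF6, 0xF7):
    for reg, name in GROUP3_EXT.items():
        modrm = 0xC0 | (reg << 3)        # mod=11, rm=000 (register operand AL/AX)
        print(f"{opcode:02X} /{reg} {name:<4} -> {rep_mul_dest(opcode, modrm)}")
```

Only four outcomes are possible because the same three bits must also select MUL or IMUL, and the byte/word opcode bit then picks between the 8-bit and 16-bit register tables.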
Also, BOUND uses the flag as a 1-bit "loop counter" and will fail to check the upper bound when prefixed with REP.
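A small behavioral model may help show why a 1-bit counter drops the upper check. This Python sketch is only an illustration of the described quirk, not the 80186 microcode: `bound_check` and its `rep` parameter are hypothetical names, and modeling REP as "the counter bit is already set on entry" is an assumption. BOUND compares signed 16-bit values and raises interrupt 5 when the index is out of range; here that is represented by returning `False`.

```python
def to_signed16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def bound_check(index: int, lower: int, upper: int, rep: bool = False) -> bool:
    """Model of BOUND: True if lower <= index <= upper (signed), False where
    the CPU would raise INT 5. The REP behaviour is a hypothetical model."""
    index, lower, upper = (to_signed16(v) for v in (index, lower, upper))
    last_pass = rep                  # hypothetical 1-bit loop counter
    checking_upper = False           # first pass tests the lower bound
    while True:
        in_range = index <= upper if checking_upper else index >= lower
        if not in_range:
            return False             # real hardware would raise INT 5 here
        if last_pass:
            return True              # counter bit already set: loop ends
        last_pass = True             # flip the bit and go round once more
        checking_upper = True
```

Without the prefix the loop runs twice (lower, then upper). If the flag is already set when BOUND starts, the loop ends after the lower-bound pass, so an index above the upper bound slips through while one below the lower bound is still caught.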
The 80286 and everything newer either ignore the prefix or handle it similarly to 0Fh: each prefixed opcode is now a separate instruction with its own microcode.
Yes, halt is sort of redundant, and processors like the 6502 omitted it. I think halt was historically popular because it let you indicate to the operator that the computer was halted, rather than stuck in an infinite loop. Peripheral devices could also detect the halt state.
On Z80 home computers, HALT was often used to synchronise some piece of code with a hardware interrupt like VSYNC for 'race-the-beam' stuff (because an interrupt takes the CPU out of the HALT state).
I believe by the time the processor has halted, the IP has advanced, so when the interrupt returns it will resume executing after the HLT. An infinite loop would resume executing the loop.
Yes, and by issuing STI followed by HLT you can ensure that after the interrupt returns, the IP is always pointing past the HLT. I think this is a feature that Intel had to patch because they got it wrong in the first steppings.
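The difference between HLT and an infinite loop can be sketched with a toy machine in Python. It assumes that interrupts are recognized only between instructions and that the pushed return address is simply the current IP; it does not model the one-instruction interrupt delay after STI. The instruction lengths match the real 8086 encodings (NOP = 90h, HLT = F4h, `JMP $` = EB FEh), but `step` and `ip_pushed_by_interrupt` are hypothetical helpers, not a real emulator API.

```python
# Toy program: address -> (mnemonic, length in bytes).
PROGRAM = {
    0x100: ("NOP", 1),
    0x101: ("HLT", 1),
    0x102: ("NOP", 1),
}

SPIN = {0x100: ("JMP $", 2)}   # infinite loop: a short jump to itself


def step(program: dict, ip: int) -> tuple:
    """Execute one instruction; return (new_ip, halted)."""
    op, length = program[ip]
    next_ip = ip + length            # IP advances as the instruction is fetched
    if op == "HLT":
        return next_ip, True         # halts with IP already past the HLT
    if op == "JMP $":
        return ip, False             # jumps back to its own address
    return next_ip, False


def ip_pushed_by_interrupt(program: dict, ip: int, max_steps: int = 10) -> int:
    """Run until halted (or max_steps), then return the IP an interrupt
    would push; IRET would resume execution there."""
    for _ in range(max_steps):
        ip, halted = step(program, ip)
        if halted:
            break
    return ip
```

With `PROGRAM` the interrupt returns to 0x102, the instruction after the HLT, whereas with `SPIN` it returns to 0x100 and the loop simply carries on.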
You mention inheriting little endian from the Datapoint. If that constraint was not there, would a big endian 8086 be materially different in any way? For example could parts be simpler or fewer gates used?
That's a good question. I'm completely guessing, but the Datapoint probably used 0x00 and 0xff as HALT opcodes so that if you ended up in uninitialized or missing memory, the processor would halt. Maybe 0x01 was the "intentional" halt instruction.
Ah, that's a really good point. Having 0x00 be a NOP, or worse, an instruction that is actually valid and does something, would be much worse for debugging, because after the fact it would be extremely hard to figure out how you got there.
It's worth noting, though, that this likely wasn't much of a consideration at the time. Networks, for one, were barely a thing, at least on systems so tiny that they'd use an 8086. And even where they existed, they tended to be extremely trusting until well into the 90s.
On modern CPUs that actually can't run at full speed for thermal reasons, they're critically important (though a complicated dance with MWAIT and a ton of drivers has supplanted HLT on x86 devices).
On microprocessors of the time, they're indeed a little useless. None of the logic was going to disable the internal clock; this was decades before the introduction of gateable power wells, etc.
But on bigger hardware where DMA was common, a halted CPU could be relied on not to issue needless requests to the memory bus, so other clients like I/O devices (SMP was in its infancy in the 1970s too) would see lower contention and higher throughput. I'm sure that was part of the thinking. The IBM PC itself tended not to contend on its bus much (CGA and MDA had their own framebuffers and floppy DMA was mostly a joke), but maybe there were other 8086 implementations that cared.
You get a fun reminder of that if you run MWC's Coherent in a VM today. Coherent's idle loop/task does not issue HLT, so you can happily watch the CPU core the VM is running on burn away for no good reason.
Sometimes such 'redundant' instructions happen because of incomplete instruction decoding. For instance, the ED-prefixed instruction block on the Z80 has:
- 8x NEG
- 8x RETI/RETN (named differently but same behaviour)
- 4x IM0, 2x IM1 and 2x IM2
- and a whopping 178 opcodes in the ED block decode to a NOP (no operation)
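The counts in this list can be reproduced with a short Python sketch. It uses the common convention of splitting a Z80 opcode byte into x (bits 7-6), y (bits 5-3) and z (bits 2-0) and classifies the second byte of each ED-prefixed opcode from the commonly published opcode tables, not from the silicon. Like the list, it counts ED 4E and ED 6E (often written "IM 0/1") as IM 0. The name `decode_ed` and the "other"/"block" labels are illustrative only.

```python
from collections import Counter

# Interrupt mode selected by ED x=1, z=6 for each y; y=1 and y=5 are the
# undocumented "IM 0/1" encodings, counted here as IM 0.
IM_MODES = [0, 0, 1, 2, 0, 0, 1, 2]


def decode_ed(op: int) -> str:
    """Classify the second byte of an ED-prefixed Z80 opcode."""
    x, y, z = op >> 6, (op >> 3) & 7, op & 7
    if x == 1:
        if z == 4:
            return "NEG"                 # all eight y values
        if z == 5:
            return "RETI" if y == 1 else "RETN"
        if z == 6:
            return f"IM{IM_MODES[y]}"
        if z == 7 and y >= 6:
            return "NOP"                 # ED 77 and ED 7F
        return "other"                   # IN/OUT (C), ADC/SBC HL, LD (nn), LD I/R, RRD/RLD
    if x == 2 and z <= 3 and y >= 4:
        return "block"                   # LDI/CPI/INI/OUTI and their D/R/DR forms
    return "NOP"                         # everything else acts as a NOP


counts = Counter(decode_ed(op) for op in range(256))
print(sorted(counts.items()))
```

The 178 NOPs come from the whole x=0 and x=3 quarters (128 opcodes), 48 unused slots among the block instructions in x=2, and ED 77 and ED 7F.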
https://news.ycombinator.com/item?id=34495317