Reverse-engineering the Intel 8086 processor's HALT circuits

kens · on Jan 26, 2023

A few days ago, monocasa suggested I should look at the 8086's HLT instruction, so here it is. Let me know if you have other comments on what part of the 8086 would be interesting to read about.

https://news.ycombinator.com/item?id=34495317

bonzini · on Jan 26, 2023

Prefixes and (probably related) address generation.

String instructions would be cool; I suspect the microcode has a few features used only by them, in collaboration with the decode ROMs. For example REP is probably little more than one of the conditions that the microcode can test.

rep_lodsb · on Jan 27, 2023

>For example REP is probably little more than one of the conditions that the microcode can test.

Indeed it is, and the same internal flag set by the REP prefix is also used by the multiplication and division subroutines to keep track of the sign bit. With the prefix, (I)MUL and (I)DIV will return a negated result on the 8086.

The 80186 only does this for IDIV. Mul/div on that processor is already mostly handled by dedicated logic, so the microcode is a lot shorter (and no longer uses any subroutines).

However, MUL and IMUL jump to a common exit point that can put the result into either DX:AX or an arbitrary register (for the three-operand form of IMUL), and again they used the REP flag to keep that extra bit of state. If it is set, the register is selected by the middle ModRM field. The usefulness of that undocumented "feature" is somewhat limited, since that is also part of the opcode, so the only choices for register are AH, CH, SP and BP.
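To make the encoding arithmetic above concrete, here is a minimal Python sketch, assuming the standard x86 ModRM layout (mod in bits 7-6, the middle field in bits 5-3, r/m in bits 2-0) and the documented F6/F7 opcode group, where the middle field values 4 and 5 select MUL and IMUL. The helper name `rep_destination` is hypothetical, and the sketch only shows which register name the middle field corresponds to; it does not model the microcode.

```python
# Register names indexed by the 3-bit middle ModRM field (documented x86 encoding).
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

# In the F6/F7 group the same field is part of the opcode: /4 is MUL, /5 is IMUL.
MUL_OPS = {4: "MUL", 5: "IMUL"}


def modrm_reg(modrm: int) -> int:
    """Extract the middle (reg) field, bits 5-3, of a ModRM byte."""
    return (modrm >> 3) & 0b111


def rep_destination(opcode: int, modrm: int) -> tuple[str, str]:
    """Hypothetical helper: return (operation, register named by the middle field).

    Assumes opcode F6 is the byte form and F7 the word form.
    """
    reg = modrm_reg(modrm)
    if opcode not in (0xF6, 0xF7) or reg not in MUL_OPS:
        raise ValueError("not a MUL/IMUL encoding in the F6/F7 group")
    table = REG16 if opcode & 1 else REG8
    return MUL_OPS[reg], table[reg]


# Enumerate every possibility: only AH, CH, SP and BP can appear.
choices = sorted({rep_destination(op, reg << 3)[1]
                  for op in (0xF6, 0xF7) for reg in MUL_OPS})
```

Because the middle field is pinned to 4 or 5 by the opcode itself, `choices` contains only the four registers listed above.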

Also, BOUND uses the flag as a 1-bit "loop counter" and will fail to check the upper bound when prefixed with REP.

The 80286 and everything newer either ignore the prefix or handle it similar to 0Fh: each prefixed opcode is now a separate instruction with its own microcode.

Stratoscope · on Jan 27, 2023

First time I've ever said this on HN, but just this once...

Username checks out.

And thank you for the interesting comments on these historical implementations!

pwg · on Jan 26, 2023

Another suggestion, from the previous thread: https://news.ycombinator.com/item?id=34495797

monocasa · on Jan 29, 2023

I'm just now seeing this; thanks so much for so thoroughly indulging my curiosity! It was a fantastic read.

pifm_guy · on Jan 26, 2023

So why didn't they implement the HLT instruction as simply a 'jump to self' infinite loop?

Then no special logic would be needed, no extra states, etc.

Sure - there would be no power savings, and the memory bus wouldn't be idle, but were either of those a requirement in 1970?

kens · on Jan 26, 2023

Yes, halt is sort of redundant and processors like the 6502 omitted it. I think the historical popularity of halt was because you could indicate to the operator that the computer was halted, rather than in an infinite loop. Peripheral devices could also detect the halt state.

flohofwoe · on Jan 27, 2023

On Z80 home computers, HALT was often used to synchronise some piece of code with a hardware interrupt like VSYNC for 'race-the-beam' stuff (because an interrupt takes the CPU out of the HALT state).

caf · on Jan 26, 2023

I believe by the time the processor has halted, the IP has advanced, so when the interrupt returns it will resume executing after the HLT. An infinite loop would resume executing the loop.
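The difference can be illustrated with a toy Python model. This is only a sketch under the assumption described above (IP has already advanced when the processor halts); the `run` function and the tiny instruction names are hypothetical and do not reflect the real 8086 bus or microcode behavior.

```python
def run(program, interrupt_at, max_steps=20):
    """Toy model, not the 8086: return the IP that an interrupt arriving
    at step `interrupt_at` would save, i.e. where execution resumes
    after the handler returns. Returns None if no interrupt arrives."""
    ip = 0
    halted = False
    for step in range(max_steps):
        if step == interrupt_at:
            # The interrupt saves the current IP; the return restores it.
            return ip
        if halted:
            continue  # halted: nothing is fetched until an interrupt
        op = program[ip]
        if op == "HLT":
            ip += 1        # assumption: IP moves past HLT before halting
            halted = True
        elif op == "JMP_SELF":
            pass           # the jump targets itself, so IP does not move
        else:
            ip += 1        # any other instruction just falls through
    return None


resume_after_hlt = run(["HLT", "NOP"], interrupt_at=5)
resume_after_loop = run(["JMP_SELF", "NOP"], interrupt_at=5)
```

In this model `resume_after_hlt` points at the instruction following HLT, while `resume_after_loop` still points at the jump, so the loop would simply continue after the handler returns.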

bonzini · on Jan 26, 2023

Yes, and by issuing STI followed by HLT you can ensure that after the interrupt returns the IP is always pointing past HLT. I think this is a feature that Intel had to patch because they got it wrong in the first steppings.

billforsternz · on Jan 27, 2023

1976 not 1970, in 1970 a CPU as advanced as the 8086 was science fiction (things were moving fast back then).

tadfisher · on Jan 27, 2023

Yeah, in 1970 the first mass-manufactured integrated circuits were flying to/landing on the Moon. The pace of technology back then was unreal.

rogerbinns · on Jan 26, 2023

You mention inheriting little endian from the Datapoint. If that constraint was not there, would a big endian 8086 be materially different in any way? For example could parts be simpler or fewer gates used?

myself248 · on Jan 27, 2023

Was HLT used much in regular programming, like to say "I've got nothing to do unless an interrupt comes in, might as well just HLT", perhaps?

Or, as I understand other processors of the time might not even bother to include such an instruction, was it sort of optional or rarely used?

blueflow · on Jan 27, 2023

In the IBM PC, HLT was used to wait for keyboard input, which came in as hardware interrupt and resumed execution.

flohofwoe · on Jan 27, 2023

Same on Z80 home computers like the Amstrad CPC to synchronise execution with the (video) raster or vsync interrupt.

jchw · on Jan 26, 2023

Question: why were there three HALT opcodes? Do they simply fill otherwise unused opcode encodings?

kens · on Jan 26, 2023

That's a good question. I'm completely guessing, but the Datapoint probably used 0x00 and 0xff as HALT opcodes so that if you ended up in uninitialized or missing memory the processor would halt. Maybe 0x01 was the "intentional" halt instruction.

jchw · on Jan 26, 2023

Ah, that's a really good point. Having 0x00 be a NOP or, maybe worse, an instruction that actually is valid and does something, would be a hell of a lot worse for debugging, because after the fact it'd be extremely hard to figure out how you got there.

pcwalton · on Jan 26, 2023

It's also bad for security. IIRC code execution is easier on MIPS because 0x0 is a NOP.

anyfoo · on Jan 26, 2023

It's worth noting, though, that this likely wasn't much of a consideration at all at the time. Networks for one were barely a thing, at least on systems so tiny that they'd use an 8086. And even when they were, they tended to be extremely trusting until way into the 90s.

jchw · on Jan 26, 2023

Definitely. Lot easier to heap spray when most of the memory is a free nopslide.

rep_lodsb · on Jan 27, 2023

>Maybe 0x01 was the "intentional" halt instruction.

00 would be "increment A", 01 "decrement A". I would guess these weren't supported because of how that register is connected to the ALU?

FF = "load memory from memory", same as the HLT on the 8080.

ajross · on Jan 26, 2023

On modern CPUs that actually can't run at full speed for thermal reasons, they're critically important (though a complicated dance with MWAIT and a ton of drivers has supplanted HLT on x86 devices).

On microprocessors of the time, they're indeed a little useless. None of the logic was going to disable the internal clock, this was decades before the introduction of gateable power wells, etc...

But on the bigger hardware where DMA was common, a halted CPU could be relied on not to be issuing needless requests to the memory bus and other clients like I/O devices (SMP was in its infancy in the 70's too) would have lower contention and higher throughput. I'm sure that was part of the thinking. The IBM PC itself tended not to contend on its bus much (CGA and MDA had their own framebuffers and floppy DMA was mostly a joke), but maybe there were other 8086 implementations that cared.

anyfoo · on Jan 26, 2023

You get a fun reminder of that if you run MWC's Coherent in a VM today. Coherent's idle loop/task does not issue HLT, so you can happily see the CPU core the VM is running on burning away for no good reason.

flohofwoe · on Jan 26, 2023

Sometimes such 'redundant' instructions happen because of incomplete instruction decoding. For instance the ED-prefixed instruction block on the Z80 has:

- 8x NEG

- 8x RETI/RETN (named differently but same behaviour)

- 4x IM0, 2x IM1 and 2x IM2

- and a whopping 178 opcodes in the ED block decode to a NOP (no operation)

ok123456 · on Jan 26, 2023

Probably an artifact of the Datapoint's instruction decoder unit.