First, dw is used to create a 16-bit ("word") value. It won't hold a 32-bit value. You'd need to use dd to store a 32-bit "dword", or use a pair of 16-bit values.
When you multiply a pair of 32-bit values, the result can need up to 64 bits (e.g. 0xFFFFFFFF * 0xFFFFFFFF = 0xFFFFFFFE00000001). On an actual 8086 (as opposed to real-mode code running on an 80386 or later, which can still use 32-bit registers), MUL can only multiply two 16-bit values, giving a 32-bit result in DX:AX. This means you'd want to treat each 32-bit value as a pair of 16-bit values.
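Why 64 bits are always enough (and sometimes necessary): for n-bit unsigned operands the largest product is the square of the largest value, which is just under 2^{2n}. With n = 32 this gives the example value above:

$$
(2^{32}-1)^2 = 2^{64} - 2^{33} + 1 = \texttt{0xFFFFFFFE00000001} < 2^{64}
$$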
If A is split into A_low (the lowest 16 bits of the first 32-bit number) and A_high (the highest 16 bits of the first 32-bit number), and B is split into B_low and B_high in the same way, then:
A * B = A_low * B_low
+ ( A_high * B_low ) << 16
+ ( A_low * B_high ) << 16
+ ( A_high * B_high ) << 32
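The identity above can be checked with a small C model. This is a sketch, not the answer's own code: `mul16` is a hypothetical helper standing in for the 8086's 16 x 16 -> 32-bit MUL, and `mul32_by_halves` (also hypothetical) sums the four partial products exactly as in the formula. C is used only because it is easy to test; the operand widths are the point.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for 8086 MUL r/m16: 16 x 16 -> 32-bit product (DX:AX).
   Casting before multiplying avoids signed-int overflow from integer promotion. */
static uint32_t mul16(uint16_t x, uint16_t y) {
    uint32_t p = (uint32_t)x * (uint32_t)y;
    assert(p <= 0xFFFE0001u); /* largest possible 16 x 16 product */
    return p;
}

/* A * B = A_low*B_low + (A_high*B_low << 16) + (A_low*B_high << 16) + (A_high*B_high << 32) */
static uint64_t mul32_by_halves(uint32_t a, uint32_t b) {
    uint16_t a_low = (uint16_t)(a & 0xFFFFu), a_high = (uint16_t)(a >> 16);
    uint16_t b_low = (uint16_t)(b & 0xFFFFu), b_high = (uint16_t)(b >> 16);

    /* All terms are non-negative and sum to the true product, so no 64-bit overflow. */
    return (uint64_t)mul16(a_low, b_low)
         + ((uint64_t)mul16(a_high, b_low) << 16)
         + ((uint64_t)mul16(a_low, b_high) << 16)
         + ((uint64_t)mul16(a_high, b_high) << 32);
}
```

Here the uint64_t additions hide the carries between 16-bit chunks; in assembly those carries have to be propagated by hand with ADC.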
The code might look like this (NASM syntax):
section .data
first: dw 0x5678, 0x1234 ;0x12345678
second: dw 0xDEF0, 0x9ABC ;0x9ABCDEF0
result: dw 0, 0, 0, 0 ;0x0000000000000000
section .text
mov ax,[first] ;ax = A_low
mul word [second] ;dx:ax = A_low * B_low
mov [result],ax
mov [result+2],dx ;Result = A_low * B_low
mov ax,[first+2] ;ax = A_high
mul word [second] ;dx:ax = A_high * B_low
add [result+2],ax
adc [result+4],dx ;Result = A_low * B_low
; + (A_high * B_low) << 16
mov ax,[first] ;ax = A_low
mul word [second+2] ;dx:ax = A_low * B_high
add [result+2],ax
adc [result+4],dx ;Result = A_low * B_low
; + (A_high * B_low) << 16
; + (A_low * B_high) << 16
adc word [result+6], 0 ; carry could propagate into the top chunk
mov ax,[first+2] ;ax = A_high
mul word [second+2] ;dx:ax = A_high * B_high
add [result+4],ax
adc [result+6],dx ;Result = A_low * B_low
; + (A_high * B_low) << 16
; + (A_low * B_high) << 16
; + (A_high * B_high) << 32
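To follow the carry handling step by step, here is a C model of the listing above (a sketch; `add16` and `mul32_8086` are hypothetical names). Each `add16` call plays the role of one ADD or ADC on a 16-bit word of `result` and returns the new carry flag; the arrays hold the words in the same little-endian order as the dw declarations.

```c
#include <assert.h>
#include <stdint.h>

/* ADD/ADC on a 16-bit word: *dst += src + carry_in; returns the new carry flag. */
static unsigned add16(uint16_t *dst, uint16_t src, unsigned carry_in) {
    uint32_t sum = (uint32_t)*dst + src + carry_in;
    *dst = (uint16_t)sum;
    return (unsigned)(sum >> 16);
}

/* first/second: {low word, high word}; result: 4 words, lowest first. */
static void mul32_8086(const uint16_t first[2], const uint16_t second[2], uint16_t result[4]) {
    uint32_t p;   /* plays the role of DX:AX after each MUL */
    unsigned cf;  /* plays the role of the carry flag */

    /* The NASM code gets this from result being declared as zeros in .data. */
    result[0] = result[1] = result[2] = result[3] = 0;

    p = (uint32_t)first[0] * second[0];               /* mul word [second] */
    result[0] = (uint16_t)p;                          /* mov [result],ax */
    result[1] = (uint16_t)(p >> 16);                  /* mov [result+2],dx */

    p = (uint32_t)first[1] * second[0];               /* mul word [second] */
    cf = add16(&result[1], (uint16_t)p, 0);           /* add [result+2],ax */
    cf = add16(&result[2], (uint16_t)(p >> 16), cf);  /* adc [result+4],dx */
    assert(cf == 0);                                  /* can't carry out here */

    p = (uint32_t)first[0] * second[1];               /* mul word [second+2] */
    cf = add16(&result[1], (uint16_t)p, 0);           /* add [result+2],ax */
    cf = add16(&result[2], (uint16_t)(p >> 16), cf);  /* adc [result+4],dx */
    cf = add16(&result[3], 0, cf);                    /* adc word [result+6],0 */

    p = (uint32_t)first[1] * second[1];               /* mul word [second+2] */
    cf = add16(&result[2], (uint16_t)p, 0);           /* add [result+4],ax */
    cf = add16(&result[3], (uint16_t)(p >> 16), cf);  /* adc [result+6],dx */
    assert(cf == 0);                                  /* full product fits in 64 bits */
}
```

Calling it with {0x5678, 0x1234} and {0xDEF0, 0x9ABC} mirrors the data in the listing; the four result words, read lowest first, form the 64-bit product.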
We don't need adc word [result+6], 0 after the second step ([first+2] * [second]) because its high half (DX) is at most 0xFFFE. [result+4] is still zero at that point (the code relies on result being zero-initialized in .data, which is also why it only works the first time it runs), so adc [result+4],dx produces at most 0xFFFE + 1 = 0xFFFF and can't wrap around and carry out.
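The bound on the high half comes from the largest possible 16 x 16 product; adding it and an incoming carry to a zero word still stays below 2^16:

$$
\texttt{0xFFFF} \times \texttt{0xFFFF} = (2^{16}-1)^2 = 2^{32} - 2^{17} + 1 = \texttt{0xFFFE0001}
\quad\Rightarrow\quad
0 + \texttt{0xFFFE} + 1 = \texttt{0xFFFF} < 2^{16}
$$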
(It could instead be done as adc dx, 0 / mov [result+4], dx, to avoid depending on that part of result already being zero. Similarly, adc into a zeroed register could be used for the first write to [result+6], making the code usable without first zeroing result. That register has to be zeroed with mov, or before the add, because xor reg,reg clears CF.)
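A C model of that zero-independent variant might look like this (a sketch; `add16`, `mul32_8086_nozero` and the choice of CX as the scratch register are hypothetical). Every word of `result` is written with a plain store before it is ever read, so it can hold garbage on entry.

```c
#include <assert.h>
#include <stdint.h>

/* ADD/ADC on a 16-bit word: *dst += src + carry_in; returns the new carry flag. */
static unsigned add16(uint16_t *dst, uint16_t src, unsigned carry_in) {
    uint32_t sum = (uint32_t)*dst + src + carry_in;
    *dst = (uint16_t)sum;
    return (unsigned)(sum >> 16);
}

/* first/second: {low word, high word}; result: 4 words, lowest first, any initial contents. */
static void mul32_8086_nozero(const uint16_t first[2], const uint16_t second[2], uint16_t result[4]) {
    uint32_t p;   /* DX:AX after each MUL */
    uint16_t dx;  /* DX on its own */
    uint16_t cx;  /* hypothetical scratch register for the first write to [result+6] */
    unsigned cf;  /* carry flag */

    p = (uint32_t)first[0] * second[0];               /* mul word [second] */
    result[0] = (uint16_t)p;                          /* mov [result],ax */
    result[1] = (uint16_t)(p >> 16);                  /* mov [result+2],dx */

    p = (uint32_t)first[1] * second[0];               /* mul word [second] */
    cf = add16(&result[1], (uint16_t)p, 0);           /* add [result+2],ax */
    dx = (uint16_t)(p >> 16);
    cf = add16(&dx, 0, cf);                           /* adc dx,0 */
    assert(cf == 0);                                  /* dx <= 0xFFFE, so no wrap */
    result[2] = dx;                                   /* mov [result+4],dx */

    p = (uint32_t)first[0] * second[1];               /* mul word [second+2] */
    cf = add16(&result[1], (uint16_t)p, 0);           /* add [result+2],ax */
    cf = add16(&result[2], (uint16_t)(p >> 16), cf);  /* adc [result+4],dx */
    cx = 0;                                           /* mov cx,0 (mov leaves CF alone) */
    add16(&cx, 0, cf);                                /* adc cx,0 */
    result[3] = cx;                                   /* mov [result+6],cx */

    p = (uint32_t)first[1] * second[1];               /* mul word [second+2] */
    cf = add16(&result[2], (uint16_t)p, 0);           /* add [result+4],ax */
    cf = add16(&result[3], (uint16_t)(p >> 16), cf);  /* adc [result+6],dx */
    assert(cf == 0);                                  /* full product fits in 64 bits */
}
```

The cost is one extra register and a couple of instructions; the benefit is that the routine can be called repeatedly on the same result buffer.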
If you are actually using an 80386 or later, then it's much simpler, because MUL can take a 32-bit operand and leave the whole 64-bit product in EDX:EAX:
section .data
first: dd 0x12345678
second: dd 0x9ABCDEF0
result: dd 0, 0 ;0x0000000000000000
section .text
mov eax,[first] ;eax = A
mul dword [second] ;edx:eax = A * B
mov [result],eax
mov [result+4],edx ;Result = A * B
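For comparison, a C model of the 80386 version (a sketch; `mul_dword` and `mul32_386` are hypothetical names). One 32 x 32 -> 64-bit MUL leaves the product split across EDX:EAX, and the two stores put EAX at result and EDX at result+4, so no carry handling is needed at all.

```c
#include <assert.h>
#include <stdint.h>

/* Model of MUL r/m32 (unsigned): EDX:EAX = EAX * src. */
static void mul_dword(uint32_t *edx, uint32_t *eax, uint32_t src) {
    uint64_t p = (uint64_t)*eax * src;
    *eax = (uint32_t)p;           /* low dword */
    *edx = (uint32_t)(p >> 32);   /* high dword */
    assert(*edx <= 0xFFFFFFFEu);  /* (2^32-1)^2 = 0xFFFFFFFE00000001 */
}

/* result[0] is the dword at result, result[1] the dword at result+4. */
static void mul32_386(uint32_t first, uint32_t second, uint32_t result[2]) {
    uint32_t eax = first, edx;    /* mov eax,[first] */
    mul_dword(&edx, &eax, second);/* mul dword [second] */
    result[0] = eax;              /* mov [result],eax */
    result[1] = edx;              /* mov [result+4],edx */
}
```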