Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

assembly - Creating an x86 assembler program that converts an integer to a 16-bit binary string of 0's and 1's

As the question suggests, I have to write a MASM program to convert an integer to binary. I have tried many different approaches, but none of them worked. The final code I'm working on is below. I get a memory access violation when I debug it in Visual Studio.

Any help with solving the error, and with whether I'm on the right track or not, will be greatly appreciated. The first listing is my C++ code, which passes a char array to an .asm file to be converted to binary.

#include <iostream>
using namespace std;
extern "C"
{
  int intToBin(char*);
}

int main()
{
  char str[17] = { NULL };
  for (int i = 0; i < 16; i++)
  {
    str[i] = '0';
  }

  cout << "Please enter an integer number :";
  cin >>str;
  intToBin(str);
  cout << "The equivalent binary is: " << str << endl;
  return 0;
}

and the .asm file is the following:

.686
.model small
.code 

_intToBin PROC       ;name of function
  start:    

    push ebp ; save base pointer
    mov ebp, esp ; establish stack frame

    mov eax, [ebp+8] ; storing char value into eax
    mov ebx, [ebp+12]; address offset of char array
    mov edx,32768 ;storing max 16-bit binary in edx
    mov ecx,17  ; since it's 16 bits, we do the loop 17 times


  nextBite:
    test eax,edx        ;testing if eax is equal to edx
    jz storeZero        ;if it is 0 is to be moved into bl

    mov bl,'1'          ;if not 1 is moved into bl
    jmp storeAscBit     ;then jump to store ascii bit

  storeZero:
    mov bl,'0'          ;moving 0 into bl register

  storeAscBit:
    mov [di ],bl        ;moving bl (either '0' or '1') into [di]
    inc edx             ;increasing edx stack by 1 point to go to next bit
    shr edx,1           ;shifting right 1 time so the 1 comes to second
    loop nextBite       ; do the whole step again

  EndifReach:   
    pop ebp
_intToBin ENDP
 END

1 Answer

This is a high-level answer that explains some terms.

Part 1 - integer numbers and their encoding in a computer

An integer value is just an integer value; in math it's a purely abstract thing. The number "5" is not what you see on the monitor. That is the digit 5, a graphical image (or "glyph") representing the value 5 in base-10 (decimal) format for humans (and some trained animals) who can recognize that glyph pattern. The value 5 itself is purely abstract.

When you use int in C++, it's not completely abstract; it's a lot more hard-wired into the metal. It's a 32-bit integer value (on most platforms).

Still, that abstract description is much closer to the truth than imagining the value in its human decimal format.

int a = 12345; // decimal number

Here a contains the value 12345, not the format. It's not aware that it was entered as a decimal string in the source code.

int a = 0x3039; // hexadecimal number

will compile into exactly the same machine code; for the CPU it's the same thing, still (a == 12345). And finally:

int a = 0b0011000000111001; // binary number

is again the same thing. It's still the same value 12345, just written in a different format.
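A minimal C++ sketch of that claim, assuming a C++14 compiler (binary literals are not available before that). The constant names are made up for illustration; the checks run at compile time, so no conversion happens at run time at all.

```cpp
#include <cstdint>

// Three spellings of one value: the compiler parses each literal
// into the same bit pattern before any machine code exists.
constexpr std::int32_t fromDecimal = 12345;
constexpr std::int32_t fromHex     = 0x3039;
constexpr std::int32_t fromBinary  = 0b0011000000111001; // needs C++14

static_assert(fromDecimal == fromHex, "decimal and hex literals denote the same value");
static_assert(fromHex == fromBinary, "hex and binary literals denote the same value");
```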

The last, binary form is closest to what the CPU uses to store the value. It is stored in 32 bits (low/high voltage cells/wires), so if you measured the voltage on a particular cell/wire, you would see the "0" voltage level on the top 18 bits, then 2 bits with the "1" voltage level, and then the rest as in the binary format above, with the two least significant bits being "0" and "1".

Also, most of the CPU circuitry is not aware of the particular value of a particular bit; that is again an "interpretation" of the 0/1, done by the code. Many CPU operations like add or sub work "from right to left" over all bits, without being aware that the bit currently being processed represents, for example, the value 2^13 in the final integer (that's the 14th least significant bit).
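To illustrate the "right to left" idea, here is a rough C++ model of a ripple-carry addition on 16-bit values. It is only a sketch of the principle, not a description of how any particular CPU implements add; rippleAdd is a hypothetical name. Note that the loop never uses the weight of the current bit, only the two input bits and the carry.

```cpp
#include <cstdint>

// Adds two 16-bit values one bit at a time, from the least significant
// bit upwards, passing a carry along like a ripple-carry adder.
constexpr std::uint16_t rippleAdd(std::uint16_t a, std::uint16_t b)
{
    std::uint16_t sum = 0;
    unsigned carry = 0;
    for (int bit = 0; bit < 16; ++bit) {
        unsigned x = (a >> bit) & 1u;
        unsigned y = (b >> bit) & 1u;
        unsigned s = x ^ y ^ carry;            // sum bit for this position
        carry = (x & y) | (carry & (x ^ y));   // carry into the next position
        sum = static_cast<std::uint16_t>(sum | (s << bit));
    }
    return sum; // the final carry is dropped, so the result wraps modulo 2^16
}
```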

It's only when you take those bits and calculate a string with the decimal/hexadecimal/binary representation of their values that you give those "1"s their value. That's when it becomes the text "12345".

If you treat those 32 bits in a different way, say as the ON/OFF states of the lights on an LED display panel, then that is what they will be: once you send them from the CPU to the display, the panel will turn on the corresponding LEDs, not caring that those bits also form the value 12345 when treated as an int.

Only very few CPU instructions work in a way that requires them to be aware of the particular value of a particular bit.

Part 2 - input, output and arguments of C/C++ functions

You want to "convert decimal integer (input) to binary."

So let's reason about what the input is and what the output is. The input is taken from std::cin, so the user will enter a string.

Yet if you do:

int inputNum;
std::cin >> inputNum;

You will end up with an already converted integer value (32 bits, see above), or with std::cin in an invalid state when the user does not enter a correct number (handling that is probably not part of your task).

If you have the number in an int, the conversion to binary has already been done by the C++ standard library, when it encoded the user's input string as a 32-bit integer.

Now you can create an asm function with the C prototype:

void formatToBinary(uint16_t value, char result[17]);

That means you will give it a uint16_t (unsigned 16-bit) integer value and a pointer to 17 reserved bytes in memory, where you will write the '0' and '1' ASCII characters and terminate them with another byte of value 0 (for a rough description of this one, follow my first link in the comments under your question).
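Before writing the assembly, it may help to have a C++ reference with the same prototype to compare results against. The following is one possible sketch, not the required solution; it pre-fills the buffer with '0' (as the question's main() does) and then flips the positions whose bit is set.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// C++ reference for the prototype above: 16 ASCII digits plus a terminator.
void formatToBinary(std::uint16_t value, char result[17])
{
    assert(result != nullptr);
    std::memset(result, '0', 16);          // start with sixteen '0' characters
    for (int i = 0; i < 16; ++i) {
        // Bit 15 goes to result[0], so the string reads most significant bit first.
        if ((value >> (15 - i)) & 1) {
            result[i] = '1';
        }
    }
    result[16] = '\0';                     // the byte 0, not the character '0'
}
```

Calling formatToBinary(12345, buffer) with a char buffer[17] leaves "0011000000111001" in the buffer.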

If you must take the input as a string, i.e.

char str[17];
std::cin >> str;

Then (after the input "12345") str will hold bytes with the values '1' (49 in decimal), '2', '3', '4', '5', 0. (Note that the last one is zero, NOT the ASCII digit '0', which has the value 48.)

You will first need to convert these ASCII bytes into an integer value (in C++ atoi may help, or one of a few other conversion/formatting functions). For ASM, search SO for questions on "how to enter integer".
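For orientation, this is roughly what such a conversion does by hand, and the same loop translates almost line by line into assembly. It is a simplified sketch: parseDecimal is a hypothetical name, it stops at the first non-digit, and it does not detect overflow or handle a sign.

```cpp
#include <cassert>
#include <cstdint>

// Converts a run of ASCII decimal digits to the integer value they spell.
std::uint32_t parseDecimal(const char* text)
{
    assert(text != nullptr);
    std::uint32_t value = 0;
    for (; *text >= '0' && *text <= '9'; ++text) {
        // Shift the digits collected so far one decimal place to the left,
        // then add the new digit ('0' is 48, so subtracting it gives 0..9).
        value = value * 10 + static_cast<std::uint32_t>(*text - '0');
    }
    return value;
}
```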

Once you have converted it to an integer value, you can proceed the same way as described a bit above (at that point it's already encoded in 16 or 32 bits, so outputting a string representation of it should be easy).

You may still run into some tricky parts, for example if you don't want to output leading zeroes, but all of that should be easy once you understand how this works.
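As an example of one such tricky part, here is one way to suppress leading zeroes, sketched in C++ with a hypothetical helper name. The only special case is the value 0, which should still print a single "0".

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Formats value in binary without leading zeroes; zero itself gives "0".
std::string toBinaryTrimmed(std::uint16_t value)
{
    std::string digits;
    bool started = false;
    for (int bit = 15; bit >= 0; --bit) {
        bool one = ((value >> bit) & 1) != 0;
        // Start writing at the first 1 bit; bit 0 is always written.
        started = started || one || bit == 0;
        if (started) {
            digits.push_back(one ? '1' : '0');
        }
    }
    assert(!digits.empty());
    return digits;
}
```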

In this case your ASM function prototype may be just void convertToBinary(char*); so that the string pointer is reused as both input and output.
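A C++ sketch of that in-place variant, under the assumption that the caller provides at least 17 bytes (as char str[17] in the question does). The important detail is the order: the decimal text has to be parsed before the first output character is written, because the output overwrites the input. Using std::strtoul here is just one convenient choice.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Reads decimal text from str, then overwrites str with 16 binary digits.
void convertToBinary(char* str)
{
    assert(str != nullptr);
    // Parse first: the input digits are destroyed once writing starts.
    // Values above 65535 are not rejected; the cast keeps the low 16 bits.
    auto value = static_cast<std::uint16_t>(std::strtoul(str, nullptr, 10));
    std::memset(str, '0', 16);
    for (int i = 0; i < 16; ++i) {
        if ((value >> (15 - i)) & 1) {
            str[i] = '1';
        }
    }
    str[16] = '\0';
}
```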

Your int intToBin(char*); looks weird, because it means the ASM will return an int... but why? That's an integer value, not bound to any particular format, so it's binary/octal/decimal/hexadecimal at the same time; it depends on how you display it. So you don't need it. You need only the string representing the value in binary form, which is that char*. And you don't give the function the number you entered (unless it takes it from the string).


From the task description and your skill level, I think you are allowed to convert the input into an int right in C++ (i.e. std::cin >> int_variable;).


BTW, if you fully understand what is happening to values in the computer, and how CPU instructions work on them, you can often come up with many different ways to achieve a result. For example, Jose's conversion to binary is written in the simple way an Assembly newcomer would write it (he wrote it like that to make it easier for you to understand):

           mov eax, num   // THE NUMBER.
           lea edi, bin   // POINT TO VARIABLE "BIN".
           mov ecx, 32    // NUMBER IS 32 BITS.
        conversion:
            shl eax, 1     // GET LEFTMOST BIT.
            jc  bit1       // IF EXTRACTED BIT == 1
            mov byte ptr [edi], '0'
            jmp skip
        bit1:
            mov byte ptr [edi], '1'
        skip :
            inc edi   // NEXT POSITION IN "BIN".
            loop conversion

It's still a bit fragile. For example, he initializes "bin" so that it contains 32 spaces and the 33rd value is zero (the null terminator of a C string). The code then modifies exactly 32 bytes, so the 33rd zero is still there and still works. If you adjusted his code to skip leading zeroes, it would "break" by displaying the remaining part of the buffer, as he doesn't set the null terminator explicitly.

This is a common way to code in Assembly for performance: being aware of exactly everything that happens, and not setting values which are already set, etc. While you are learning, I would suggest working in a "defensive" way instead, doing some wasteful things that act as a safety net in case of a mistake. So I would add mov byte ptr [edi],0 after the loop to set the terminator explicitly.

But it is actually not very fast, as it uses branching. The CPU doesn't like that: decoding new instructions is a costly task, so when it is not sure which instructions will be executed, it simply decodes ahead along one path. In case of a wrong guess it throws that work out and decodes the correct path, which means a pause of several cycles in execution until the first instruction of the new path is fully decoded and ready to execute.

So when coding for performance, you want to avoid hard-to-predict branches (the final loop is easy for the CPU to predict, as it always loops until the final exit when ecx reaches 0). One of many possible ways in this case is:

   mov edx, num
   lea edi, bin
   mov ah,'0'/2   // for fast init of al later
   // '0' is 48 (even), '0'/2 will work (24)
   mov ecx, 32    // countdown counter
conversion:
   mov al,ah      // al = '0'/2
   shl edx, 1     // most significant bit into CF
   adc al,al      // al = '0'/2 + '0'/2 + CF = '0' or '1'
   stosb          // store the '0' or '1' to [edi++]
   dec ecx        // manually written "loop"
   jnz conversion // (it is faster on modern CPUs)
   mov [edi],ch   // explicit set of null-terminator
       // (ch == 0, because here ecx == 0)

As you can see, there is now no branching except for the loop, so CPU branch prediction will handle this much more smoothly and the performance should be considerably better.
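The same branch-free idea can be written in C++ for comparison: the extracted bit (0 or 1) is added directly to '0' instead of being tested. This is only a sketch with a hypothetical function name, and whether a compiler turns it into code similar to the assembly above is not guaranteed.

```cpp
#include <cassert>
#include <cstdint>

// Writes 32 binary digits and a terminator; the loop body has no
// data-dependent jump, only the loop counter is branched on.
void toBinary32(std::uint32_t value, char out[33])
{
    assert(out != nullptr);
    for (int i = 0; i < 32; ++i) {
        out[i] = static_cast<char>('0' + (value >> 31)); // top bit is 0 or 1
        value <<= 1;                                     // next bit moves to the top
    }
    out[32] = '\0'; // explicit terminator, as in the assembly version
}
```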


A dword variant for discussion with Cody (NASM syntax, 32-bit target):

; .data
binNumber   times 36 db 0

; .text
numberToBin:
    mov     edx,0x12345678
    lea     edi,[binNumber]
    mov     ecx, 32/4       ; countdown counter
n2b_conversion:
    mov     eax,0b11000000110000001100000011000
      ; ^ will become '0'/'1' for each of four bits
    shl     edx,1
    rcr     eax,8
    shl     edx,1
    rcr     eax,8
    shl     edx,1
    rcr     eax,8
    shl     edx,1
    rcr     eax,8
      ; here was "or eax,'0000'" => no more needed.
    stosd
    dec     ecx
    jnz     n2b_conversion
    mov     [edi],dl        ; null terminator
    ret

I didn't profile it; I only verified that it returns the correct result.
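For readers who find the shl/rcr sequence hard to follow, here is a C++ model of the same "four digits per 32-bit store" idea. It is not a translation of the rcr trick itself: it simply ORs each bit into a word preloaded with "0000". It assumes a little-endian target (as x86 is), and toBinary32Packed is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Builds four ASCII digits in one 32-bit word, then stores the word at once.
void toBinary32Packed(std::uint32_t value, char out[33])
{
    assert(out != nullptr);
    for (int group = 0; group < 8; ++group) {
        std::uint32_t word = 0x30303030u;        // "0000" in ASCII
        for (int k = 0; k < 4; ++k) {
            // The earliest extracted bit lands in the lowest byte,
            // which little-endian memory order stores first.
            word |= (value >> 31) << (8 * k);
            value <<= 1;
        }
        std::memcpy(out + 4 * group, &word, 4);
    }
    out[32] = '\0';
}
```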

