Friday 30 March 2018

Basic Structure of Assembly Language

Prepared by: Saad Aslam



  • Assembly language programs divide roughly into five sections
    • header
    • equates
    • data
    • body
    • closing

The Header
  • The header contains various directives which do not produce machine code
  • Sample header:
%TITLE "Sample Header"
.8086
.model small
.stack 256
Named Constants
  • Symbolic names associated with storage locations represent addresses
  • Named constants are symbols created to represent specific values determined by an expression
  • Named constants can be numeric or string
  • Some named constants can be redefined
  • No storage is allocated for these values
Equates
  • Constant values are known as equates
  • Sample equate section:
Count EQU 10
Element EQU 5
Size = Count * Element
MyString EQU "Maze of twisty passages"
Size = 0
  • = is used for numeric values only
  • Cannot change value of EQU symbol
  • EQUated symbols are not variables
  • EQU expressions are evaluated where used; = expressions are evaluated where defined
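As a minimal sketch of the difference between EQU and =, the fragment below redefines a symbol created with = and then uses it in a data definition. The names Total and limit are hypothetical and are not part of the sample above.

```asm
Count   EQU 10            ; numeric EQU: Count cannot be given a new value later
Total   =   Count * 2     ; Total is 20 at this point
Total   =   Total + 1     ; legal: a symbol defined with = may be redefined (now 21)

        .data
limit   DB  Total         ; assembles one byte holding 21 (15h)
```

Neither Count nor Total occupies memory; only the DB line reserves storage.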
The Data Segment
  • Begins with the .data directive
  • Two kinds of variables, initialized and uninitialized.
  • Initialized variables take up space in the program's code file
  • Declare uninitialized variables after initialized ones so they do not take up space in the program's code file
Reserving space for variables
  • Sample DATA SEGMENT
.data
numRows    DB 25
numColumns DB ?
videoBase  DW 0800h
  • DB (define byte) and DW (define word) are common data-definition directives
  • The symbols associated with variables are called labels
  • Strings may be declared using the DB directive:
aTOm DB "ABCDEFGHIJKLM"
Program Data and Storage
  • Pseudo-ops to define data or reserve storage
    • DB - byte(s)
    • DW - word(s)
    • DD - doubleword(s)
    • DQ - quadword(s)
    • DT - tenbyte(s)
  • These directives require one or more operands
    • define memory contents
    • specify amount of storage to reserve for run-time data
Defining Data
  • Numeric data values
    • 100 - decimal
    • 100b - binary
    • 100h - hexadecimal
    • '100' - ASCII
    • "100" - ASCII
  • Use the appropriate DEFINE directive (byte, word, etc.)
  • A list of values may be used - the following creates 4 consecutive words
DW 40Ch,10b,-13,0
  • A ? represents an uninitialized storage location
DB 255,?,-128,'X'
Naming Storage Locations
  • Names can be associated with storage locations
ANum DB -4
     DW 17
ONE  LABEL WORD
UNO  DW 1
X    DD ?
  • These names are called variables
  • ANum refers to a byte storage location, initialized to FCh
  • The next word has no associated name
  • ONE and UNO refer to the same word
  • X is an uninitialized doubleword
Arrays
  • Any consecutive storage locations of the same size can be called an array
X  DW  040Ch,10b,-13,0
Y  DB  'This is an array'
Z  DD  -109236, 0FFFFFFFFh, -1, 100b
  • Components of X are at X, X+2, X+4, X+6
  • Components of Y are at Y, Y+1, ..., Y+15
  • Components of Z are at Z, Z+4, Z+8, Z+12
DUP
  • Allows a sequence of storage locations to be defined or reserved
  • Only used as an operand of a define directive
DB  40 DUP(?)
DW  10h DUP(0)
DB  3 DUP("ABC")
DB  4 DUP(3 DUP (0,1), 2 DUP('$'))
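To make the sizes concrete, here are the same four definitions again with hypothetical names (buffer, zeros, abc, pattern) and comments that work out what each one reserves. This is an illustrative sketch, not additional required syntax.

```asm
buffer  DB  40 DUP(?)        ; 40 uninitialized bytes
zeros   DW  10h DUP(0)       ; 16 words (32 bytes), each 0000h
abc     DB  3 DUP("ABC")     ; 9 bytes: "ABCABCABC"
pattern DB  4 DUP(3 DUP(0,1), 2 DUP('$'))
                             ; one pass is 0,1,0,1,0,1,'$','$' (8 bytes),
                             ; repeated 4 times = 32 bytes
```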
Word Storage
  • Word, doubleword, and quadword data are stored in reverse byte order (in memory)
Directive      Bytes in Storage
DW 256         00 01
DD 1234567h    67 45 23 01
DQ 10          0A 00 00 00 00 00 00 00
X DW 35DAh     DA 35
Low byte of X is at X, high byte of X is at X+1
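A short sketch of how the byte order shows up in code: the two bytes of X are read separately. The BYTE PTR operator is not covered in these notes; it is assumed here as the usual way to override a variable's word size.

```asm
        .data
X       DW  35DAh               ; stored in memory as DA 35
        .code
        ; assumes DS already points at the data segment
        mov al, BYTE PTR X      ; AL = 0DAh (low byte, at address X)
        mov ah, BYTE PTR X+1    ; AH = 35h  (high byte, at address X+1)
                                ; AX now holds 35DAh again
```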
The Program Body
  • Also known as the code segment
  • Divided into four columns: labels, mnemonics, operands, and comments
  • Labels name the positions of variables and instructions; mnemonics name the instructions and directives themselves
  • Operands are required by most assembly language instructions
  • Comments aid in remembering the purpose of various instructions
An example
Label    Mnemonic   Operand     Comment
---------------------------------------------------------
         .data
exCode   DB         0          ;A byte variable
myWord   DW         ?          ;Uninitialized word var.
         .code
MAIN    PROC
         mov        ax,@data   ;Initialize DS to address
         mov        ds,ax      ; of data segment
         jmp        Exit       ;Jump to Exit label
         mov        cx,10      ;This line skipped!
Exit:    mov        ah,04Ch    ;DOS function: Exit prog
         mov        al, exCode ;Return exit code value
         int        21h        ;Call DOS. Terminate prog
MAIN     ENDP                  ;End Program
         END        MAIN       ; and specify entry point
The Label Field
  • Labels mark places in a program which other instructions and directives reference
  • Instruction labels in the code segment end with a colon (procedure names declared with PROC do not)
  • Labels in the data segment never end with a colon
  • Labels can be from 1 to 31 characters long and may consist of letters, digits, and the special characters ? . @ _ $ %
  • If a period is used, it must be the first character
  • Labels must not begin with a digit
  • The assembler is case insensitive by default
Legal and Illegal Labels
  • Examples of legal names
    • COUNTER1
    • @character
    • SUM_OF_DIGITS
    • $1000
    • DONE?
    • .TEST
  • Examples of illegal names
    • TWO WORDS (contains a blank)
    • 2abc (begins with a digit)
    • A45.28 (the period is not the first character)
    • YOU&ME (contains an illegal character)
The Mnemonic Field
  • For an instruction, the operation field contains a symbolic operation code (opcode)
  • The assembler translates a symbolic opcode into a machine language opcode
  • Examples are: ADD, MOV, SUB
  • In an assembler directive, the operation field contains a directive (pseudo-op)
  • Pseudo-ops are not translated into machine code; they tell the assembler to do something
The Operand Field
  • For an instruction, the operand field specifies the data to be acted on by the instruction. An instruction may have zero, one, or two operands.
NOP             ;no operands -- does nothing
INC AX          ;one operand -- adds 1 to the contents of AX
ADD WORD1,2     ;two operands -- adds 2 to the contents
                ; of memory word WORD1
  • In a two-operand instruction, the first operand is the destination operand. The second operand is the source operand.
  • For an assembler directive, the operand field usually contains more information about the directive.
The Comment Field
  • A semicolon marks the beginning of a comment field
  • The assembler ignores anything typed after the semicolon on that line
  • It is almost impossible to understand an assembly language program without good comments
  • Good programming practice dictates a comment on almost every line
Good and Bad Comments
  • Don't say something obvious, like
MOV CX,0 ;move 0 to CX
  • Instead, put the instruction into the context of the program
MOV CX,0 ;CX counts terms, initially 0
  • An entire line can be a comment, or be used to create visual space in a program
;
; Initialize registers
;
    MOV AX,0
    MOV BX,0
The Closing
  • The last lines of an assembly language program are the closing
  • Indicates to assembler that it has reached the end of the program and where the entry point is
MAIN  ENDP      ;End of program
      END MAIN  ; entry point for linker use
  • END is a pseudo-op; the single "operand" is the label specifying the beginning of execution, usually the first instruction after the .code pseudo-op
Assembling a Program
  • The source file of an assembly language program is usually named with an extension of .asm
edit myprog.asm
  • The source file is processed (assembled) by the assembler (TASM) to produce an object file (.obj)
tasm myprog
  • This produces myprog.obj
  • The object file must be linked by the linker (TLINK) to produce an executable file (.exe)
tlink myprog
  • This produces myprog.exe
Dealing with Errors
  • TASM will report the line number and give an error message for each error it finds
  • Sometimes it is helpful to have a listing file (.lst), created by using TASM with the -l option
  • The .lst file contains a complete listing of the program, along with line numbers, object code bytes, and the symbol table
Using the Debugger
  • Useful for logic errors that the assembler misses
  • See the text for a complete tutorial
  • You do not need to use the TDH386.SYS driver or the TD386.EXE debugger with the latest version of the assembler
  • To use the debugger on myprog.asm
tasm /zi myprog
tlink /v myprog
td myprog
.COM and .EXE files
  • The .COM code file format is a relic of the first version of MS-DOS
  • Not recommended for general purposes
  • All code, data, and the stack occupy one 64K segment (Borland's "tiny" model)
  • .EXE code files are more efficient in use of RAM
  • Data and code occupy separate segments
  • The programmer is responsible for setting up the data and code segments properly
Ending a Program
  • All programs, upon termination, must return control back to another program -- the operating system
  • Under MS-DOS, this is COMMAND.COM
  • This is done with a DOS system call: INT 21h with AH = 4Ch and the exit code in AL
Data Transfer Instructions
  • MOV destination,source
    • reg, reg
    • mem, reg
    • reg, mem
    • mem, immed
    • reg, immed
  • Sizes of both operands must be the same
  • reg can be any non-segment register; IP cannot be used as an operand
  • MOVs between a segment register and memory or a 16-bit register are possible
Examples
  • mov ax, word1
    • "Move word1 to ax"
    • Contents of register ax are replaced by the contents of the memory location word1
  • xchg ah, bl
    • Swaps the contents of ah and bl
  • Illegal: mov word1, word2
    • can't have both operands be memory locations
Sample MOV Instructions
b    db  4Fh
w    dw  2048
mov bl,dh
mov ax,w
mov ch,b
mov al,255
mov w,-100
mov b,0
  • When a variable is created with a define directive, it is assigned a default size attribute (byte, word, etc)
  • You can assign a size attribute using LABEL
LoByte LABEL BYTE
aWord  DW    97F2h
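The following sketch shows one way the two names might be used. Both refer to the same address, but each carries its own size attribute. It assumes DS has already been initialized as in the earlier example.

```asm
        .data
LoByte  LABEL BYTE              ; byte-sized name for the next location
aWord   DW    97F2h             ; stored in memory as F2 97
        .code
        ; assumes DS already points at the data segment
        mov bx, aWord           ; BX = 97F2h (word access)
        mov al, LoByte          ; AL = 0F2h  (byte access to the same address)
```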
Addresses with Displacements
b    db  4Fh, 20h, 3Ch
w    dw  2048, -100, 0
mov bx, w+2
mov b+1, ah
mov ah, b+5
mov dx, w-3
  • Type checking is still in effect
  • The assembler computes an address based on the expression
  • NOTE: These are address computations done at assembly time

mov al,b-1
loads the byte at address b-1; it does not subtract 1 from the value stored at b
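To contrast the two kinds of arithmetic, this sketch first uses a displacement (computed by the assembler) and then changes a value at run time. The data line repeats the definition of b from above; DS is assumed to be initialized.

```asm
        .data
b       db  4Fh, 20h, 3Ch
        .code
        ; assumes DS already points at the data segment
        mov al, b+1             ; AL = 20h: the byte at address b+1
        mov al, b               ; AL = 4Fh: the value stored at b
        dec al                  ; AL = 4Eh: run-time arithmetic on the value
```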
eXCHanGe
  • XCHG destination,source
    • reg, reg
    • reg, mem
    • mem, reg
  • MOV and XCHG cannot perform memory to memory moves
  • This provides an efficient means to swap the operands
    • No temporary storage is needed
    • Sorting often requires this type of operation
    • This works only with the general registers
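Because XCHG cannot take two memory operands, swapping two variables still has to go through a register. A minimal sketch, using the hypothetical word variables word1 and word2:

```asm
        .data
word1   DW  1234h
word2   DW  5678h
        .code
        ; assumes DS already points at the data segment
        mov  ax, word1          ; AX = 1234h
        xchg ax, word2          ; word2 = 1234h, AX = 5678h
        mov  word1, ax          ; word1 = 5678h
```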
Arithmetic Instructions
ADD dest, source
SUB dest, source
INC dest
DEC dest
NEG dest
  • Operands must be of the same size
  • source can be a general register, memory location, or constant
  • dest can be a register or memory location
    • except operands cannot both be memory
ADD and INC
  • ADD is used to add the contents of
    • two registers
    • a register and a memory location
    • a register and a constant
  • INC is used to add 1 to the contents of a register or memory location
Examples
  • add ax, word1
    • "Add word1 to ax"
    • Contents of register ax and memory location word1 are added, and the sum is stored in ax
  • inc ah
    • Adds one to the contents of ah
  • Illegal: add word1, word2
    • can't have both operands be memory locations
SUB, DEC, and NEG
  • SUB is used to subtract the contents of
    • one register from another register
    • a register from a memory location, or vice versa
    • a constant from a register
  • DEC is used to subtract 1 from the contents of a register or memory location
  • NEG is used to negate the contents of a register or memory location
Examples
  • sub ax, word1
    • "Subtract word1 from ax"
    • Contents of memory location word1 are subtracted from the contents of register ax, and the difference is stored in ax
  • dec bx
    • Subtracts one from the contents of bx
  • Illegal: sub byte1, byte2
    • can't have both operands be memory locations
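The examples above do not show NEG, so here is a small sketch of NEG and DEC on registers, with the resulting values noted in the comments. The starting values are chosen only for illustration.

```asm
        mov ax, 5
        neg ax                  ; AX = 0FFFBh, the two's complement of 5 (-5)
        neg ax                  ; AX = 5 again
        mov bl, 1
        dec bl                  ; BL = 0
        dec bl                  ; BL = 0FFh (-1 as a signed byte)
```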
Type Agreement of Operands
  • The operands of two-operand instructions must be of the same type (byte or word)
    • mov ax, bh     ;illegal
    • mov ax, byte1  ;illegal
    • mov ah,'A'     ;legal -- moves 41h into ah
    • mov ax,'A'     ;legal -- moves 0041h into ax
Translation of HLL Instructions
  • B = A

mov ax,A
mov B,ax
    • memory-memory moves are illegal
  • A = B - 2*A

mov ax,B
sub ax,A
sub ax,A
mov A,ax
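As one more illustrative translation, A = 5 - A can be written in at least two ways. Both sketches assume A is a word variable. The second avoids the register by negating A in place and then adding the constant.

```asm
        ; version 1: compute in a register
        mov ax, 5               ; AX = 5
        sub ax, A               ; AX = 5 - A
        mov A, ax               ; A  = 5 - A

        ; version 2 (alternative to version 1): work on memory directly
        neg A                   ; A = -A
        add A, 5                ; A = 5 - A
```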
Program Segment Structure
  • Data Segments
    • Storage for variables
    • Variable addresses are computed as offsets from start of this segment
  • Code Segment
    • contains executable instructions
  • Stack Segment
    • used to set aside storage for the stack
    • Stack addresses are computed as offsets into this segment
  • Segment directives
.DATA
.CODE
.STACK size
Memory Models
  • .Model memory_model
    • tiny: code+data <= 64K (.com program)
    • small: code<=64K, data<=64K, one of each
    • medium: code may exceed 64K (multiple code segments); data<=64K, one data segment
    • compact: code<=64K, one code segment; data may exceed 64K (multiple data segments)
    • large: multiple code and data segments
    • huge: allows individual arrays to exceed 64K
    • flat: no segments, 32-bit addresses, protected mode only (80386 and higher)
Program Skeleton
.MODEL small
.STACK 100h
.DATA
;declarations
.CODE
MAIN PROC
;main proc code
;return to DOS
MAIN ENDP
;other procs (if any) go here
end MAIN
 
  • Select a memory model
  • Define the stack size
  • Declare variables
  • Write code
    • organize into procedures
  • Mark the end of the source file
    • define the entry point
Input and Output Using 8086 Assembly Language
  • Most input and output is not done directly via the I/O ports, because
    • port addresses vary among computer models
    • it's much easier to program I/O with the service routines provided by the manufacturer
  • There are BIOS routines (which we'll look at later) and DOS routines for handling I/O (using interrupt number 21h)
Interrupts
  • The interrupt instruction is used to cause a software interrupt (system call)
    • An interrupt interrupts the current program and executes a subroutine, eventually returning control to the original program
    • Interrupts may be caused by hardware or software
  • int interrupt_number     ;software interrupt
Output to Monitor
  • DOS Interrupts : interrupt 21h
    • This interrupt invokes one of many support routines provided by DOS
    • The DOS function is selected via AH
    • Other registers may serve as arguments
  • AH = 2, DL = ASCII of character to output
    • Character is displayed at the current cursor position, the cursor is advanced, AL = DL
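A minimal sketch of function 2 in use: it prints a question mark and then moves the cursor to the start of the next line. AH is reloaded before each call as a precaution; the fragment is meant to sit inside a program like the ones shown elsewhere in these notes.

```asm
        mov ah, 2               ; DOS function 02h: display character
        mov dl, '?'             ; DL = ASCII code of the character (3Fh)
        int 21h                 ; '?' appears at the cursor position
        mov ah, 2
        mov dl, 0Dh             ; carriage return
        int 21h
        mov ah, 2
        mov dl, 0Ah             ; line feed
        int 21h
```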
Output a String
  • Interrupt 21h, function 09h
    • DX = offset to the string (in data segment)
    • The string is terminated with the '$' character
  • To place the address of a variable in DX, use one of the following
    • lea   DX,theString        ;load effective address
    • mov   DX, offset theString ;immediate data
Print String Example
%TITLE "First Program -- HELLO.ASM"
        .8086
        .MODEL   small
        .STACK   256
        .DATA
msg     DB      "Hello, World!$"
        .CODE
MAIN    PROC
        mov     ax,@data     ;Initialize DS to address
        mov     ds,ax        ; of data segment
        lea     dx,msg       ;get message
        mov     ah,09h       ;display string function
        int     21h          ;display message
Exit:   mov     ah,4Ch       ;DOS function: Exit program
        mov     al,0         ;Return exit code value
        int     21h          ;Call DOS. Terminate program
MAIN    ENDP                 ;End of program
        END     MAIN         ; entry point
Input a Character
  • Interrupt 21h, function 01h
  • Filtered input with echo
    • This function returns the next character in the keyboard buffer (waiting if necessary)
    • The character is echoed to the screen
    • AL will contain the ASCII code if a character key was pressed
      • AL=0 if a non-character key (such as a function or arrow key) was pressed
An Example Program
%TITLE "Case Conversion"
    .8086
    .MODEL small
    .STACK 256
    .DATA
MSG1     DB 'Enter a lower case letter: $'
MSG2     DB 0Dh,0Ah,'In upper case it is: '
CHAR     DB ?,'$'
exCode   DB 0
    .CODE
MAIN    PROC
;initialize ds
    mov     ax,@data     ; Initialize DS to address
    mov     ds,ax        ; of data segment
;print user prompt
    mov     ah,9         ; display string fcn
    lea     dx,MSG1      ; get first message
    int     21h          ; display it
;input a character and convert to upper case
    mov     ah,1         ; read char fcn
    int     21h          ; input char into AL
    sub     al,20h       ; convert to upper case
    mov     CHAR,al      ; and store it
;display on the next line
    mov     dx,offset MSG2 ; get second message
    mov     ah,9         ; display string function
    int     21h          ; display message and upper case
;return to DOS
Exit:
    mov     ah,4Ch       ; DOS function: Exit program
    mov     al,exCode    ; Return exit code value
    int     21h          ; Call DOS. Terminate program
MAIN ENDP
    END     MAIN        ; End of program; MAIN is the entry point

