=======================================
Assignment #06 - Relocation and Linking
=======================================
- Ian! D. Allen - idallen@idallen.ca - www.idallen.com

On the "LMCLink.htm" page in the course notes you will find an LMC main
program and two subroutines "Pause" and "Dble".

Questions based on this web page:

1)  Provide a list of the contents of each of the (relocated) three
    external tables that would be associated with the object code for each
    of the three modules Main, Pause, and Dble.  (That means 3 tables
    for each of the three modules: 9 tables in all; some of the tables
    may be empty.)  The tables should reflect the relocated locations
    of the modules loaded in the given order: Main, Pause, Dble.
    Hint: All this information is already worked out in the web page.
ANSWER:

  The whole link process involves three steps:
   (1) copying module code into memory
   (2) relocating the code and tables for the new memory locations
   (3) resolving external references using public symbols

  Looking at each of the modules, we can build the nine tables:

  MAIN (loads starting at 00, length: 16):
    Public Symbol Table:
      no public symbols given (an omission - MAIN should be defined)
      (if there were a "MAIN" public symbol, it would be at 00)
    External Reference Table:
      PAUSE  02
      ARG    04
      DBLE   05
    Relocatable Instruction Table:
      00, 03, 06, 07, 08, 09, 11

  PAUSE (loads starting at 16, length: 9):
    Public Symbol Table:
      PAUSE  (01 + 16 =) 17
    External Reference Table:
      no external references
    Relocatable Instruction Table:
      ([01, 02, 03, 05, 06] + 16 =) 17, 18, 19, 21, 22 

  DBLE (loads starting at 25, length: 8):
    Public Symbol Table:
      DBLE   (01 + 25 =) 26
      ARG    (02 + 25 =) 27
    External Reference Table:
      PAUSE  (06 + 25 =) 31
    Relocatable Instruction Table:
      ([01, 03, 04, 07] + 25 =) 26, 28, 29, 32

============================================================================

2)  Assuming the modules were loaded in the new order Dble, Pause, Main
    (with the Dble module loading starting at location zero, followed by
    Pause and then by the Main module), generate the nine new relocated
    tables that would be associated with the object code for each of
    the three modules Dble, Pause, and Main.   (That means 3 tables for
    each of the three modules: 9 tables in all; some of the tables may
    be empty.)  The tables should reflect the relocated locations of
    the modules loaded in the given order: Dble, Pause, Main.
ANSWER:

We load in the order DBLE, PAUSE, MAIN:

  DBLE starts at 00 and is 8 instructions, which means Pause loads at 08.
  PAUSE starts at 08 and is 9 instructions, which means Main loads at 17.

Having loaded the modules in memory, the linker notes the actual memory
address where each module was loaded and compares it with the memory
address where the module was originally built to start.

Here's a summary of where the modules were originally built to load,
and where the linker actually loaded them in memory:

    MODULE NAME   BUILT TO LOAD AT     ACTUALLY LOADED STARTING AT ADDRESS
    -----------   ----------------     -----------------------------------
    DBLE          00                   00
    PAUSE         00                   08
    MAIN          00                   17

The difference is the amount by which the addresses in the module need
to be relocated:

  Relocation Offsets for each module:

    DBLE:  built to start at "00", actually loaded at 00
           - relocated by 00 - 00 = +00

    PAUSE: built to start at "00", actually loaded at 08
           - relocated by 08 - 00 = +08

    MAIN:  built to start at "00", actually loaded at 17
           - relocated by 17 - 00  = +17

Now that we have the relocation offset for each of the three modules,
we can relocate all the addresses that need relocation for each module.

Looking at each of the modules, we can build the nine tables.  Since all
the tables for each module contain addresses inside the modules, all the
addresses inside all the tables must be relocated when a module loads
at a non-zero memory address.  Here are the nine tables in relocated form:

  DBLE (loads starting at 00, length: 8):
    Public Symbol Table (after relocation):
      DBLE   (01 + 00 =) 01
      ARG    (02 + 00 =) 02
    External Reference Table (after relocation):
      PAUSE  (06 + 00 =) 06
    Relocatable Instruction Table (after relocation):
      ([01, 03, 04, 07] + 00 =) 01, 03, 04, 07

  PAUSE (loads starting at 08, length: 9):
    Public Symbol Table (after relocation):
      PAUSE  (01 + 08 =) 09
    External Reference Table (after relocation):
      no external references
    Relocatable Instruction Table (after relocation):
      ([01, 02, 03, 05, 06] + 08 =) 09, 10, 11, 13, 14

  MAIN (loads starting at 17, length: 16):
    Public Symbol Table (after relocation):
      no public symbols given (an omission - MAIN should be defined)
      (if there were a "MAIN" public symbol, it would be at 00+17=17)
    External Reference Table (after relocation):
      PAUSE  (02 + 17 =) 19
      ARG    (04 + 17 =) 21
      DBLE   (05 + 17 =) 22
    Relocatable Instruction Table (after relocation):
      ([00, 03, 06, 07, 08, 09, 11] + 17 =) 17, 20, 23, 24, 25, 26, 28

The above nine tables are now completely relocated and correctly refer
to the three modules where the linker loaded them into memory.
  
============================================================================

3)  Using the tables from the previous question, generate the relocated
    and linked object code for the Main module loaded at its new address.
    Show where the Main module actually loads into memory, and show
    the original object code and the modified (relocated and linked)
    code for the Main module in its new location.
ANSWER:

We'll do the relocation and linking for all three modules, since it
isn't much more work.  From the previous questions, we already have
the nine tables we need to do the actual relocation and linking of all
three modules.  We have to imitate the operation of the Linker, which
means we have three basic steps:

     1) Copy the modules into memory and observe where each gets loaded.
     2) Relocate the tables and the instruction codes in memory.
     3) Resolve external addresses for the code in memory.

-----------------------------------
STEP ONE - COPY MODULES INTO MEMORY
-----------------------------------

First the linker copies all the object code into memory in the given
order - DBLE, PAUSE, MAIN - without any modifications:

     LOCATION    Contents
    ---------    --------
    ------------------------ start of DBLE module (00)
     00:         000      
     01:         903        <-- this is where the DBLE public label is
     02:         000         
     03:         102
     04:         302
     05:         600
     06:         000        <-- this is "CALL PAUSE"
     07:         900
    ------------------------ end of DBLE, start of PAUSE module (08)
     08:         000
     09:         107        <-- this is where the PAUSE public label is
     10:         408
     11:         207
     12:         801
     13:         901
     14:         900
     15:         000
     16:         001
    ------------------------ end of PAUSE, start of MAIN module (17)
     17:         113        <-- if MAIN had a label "MAIN", it would be here
     18:         600
     19:         000        <-- this is "CALL PAUSE"
     20:         113
     21:         200        <-- this is "STO ARG"
     22:         000        <-- this is "CALL DBLE"
     23:         113
     24:         314
     25:         213
     26:         415
     27:         802
     28:         900
     29:         700
     30:         000
     31:         001
     32:         010

The above list shows the machine instructions loaded sequentially in
memory, one after the other, before any linking or relocation is done.
I've added some comments to show where some key instructions and
locations are.  MAIN didn't have a public label "MAIN"; if it had,
the label would be pointing to the first executable instruction of MAIN.

Be careful to distinguish the different uses of the names DBLE and PAUSE:

    The *module* DBLE is 8 instructions long and is loaded starting at
    address 00.  The *label* DBLE is a symbolic public name for a single
    address inside the DBLE module.  They are not the same.

    The *module* PAUSE is 9 instructions long and is loaded starting
    at address 08.  The *label* PAUSE is a symbolic public name for a
    single address inside the PAUSE module.  They are not the same.

Next: Relocating the code we copied into memory and fixing the tables.

---------------------------------------
STEP TWO - RELOCATE THE CODE AND TABLES
---------------------------------------

We have two places where this relocation work must be done: (1) the
three tables associated with each module and (2) the code for the
module that we just loaded into memory.  (Recall: The tables contain
relocatable addresses because the addresses in the tables refer to
addresses inside the modules.  When the modules move in memory, the
values in the tables must be relocated to refer to the new locations.)

We've already relocated the addresses in the tables in the previous
question.  We'll need the tables to help us find the relocatable addresses
in the code in memory.

Again, be sure about the difference between the name of a module and
the name of a label inside that module:

    The Public Symbol "DBLE" is now located at mailbox 01; the whole
    *routine* that makes up DBLE was loaded starting at mailbox 00.
    Module DBLE is loaded at mailbox 00, with public label "DBLE" at
    mailbox 01.

    The Public Symbol "PAUSE" is now located at mailbox 09, the whole
    *routine* that makes up PAUSE was loaded starting at mailbox 08.
    Module PAUSE is loaded at mailbox 08, with public label "PAUSE"
    at mailbox 09.

    Module MAIN loaded at mailbox 17, with no public label.  (If there were
    such a label, "MAIN", it would be at mailbox 17, the start of the code.)

Having relocated all the tables, we must now relocate the addresses for
the actual code for each module that is in memory, using the Relocatable
Instruction Table for each module to tell us which instructions in
memory to relocate.  Using those three Relocatable Instruction tables,
we generate the following 16 changes to the object code in memory:

Using the addresses in three Relocatable Instruction Tables:

    LOC  BEFORE  AFTER  REASON FOR CHANGE
    ---  ------  -----  -------------------------------------------
    01:  903     903    ; added 00 for relocation of DBLE from 00 to 00
    03:  102     102    ; same
    04:  302     302    ; same
    07:  900     900    ; same
    --------------------------
    09:  107     115    ; added 08 for relocation of PAUSE from 00 to 08
    10:  408     416    ; same
    11:  207     215    ; same
    13:  901     909    ; same
    14:  900     908    ; same
    --------------------------
    17:  113     130    ; added 17 for relocation MAIN from 00 to 17
    20:  113     130    ; same
    23:  113     130    ; same
    24:  314     331    ; same
    25:  213     230    ; same
    26:  415     432    ; same
    28:  900     917    ; same

Those are the 16 instructions that needed relocating, as given by the
three Relocatable Instruction tables.

Next: Resolving external references.

------------------------------------------
STEP THREE - RESOLVING EXTERNAL REFERENCES
------------------------------------------

Only MAIN and DBLE have an External References table - module PAUSE does
not refer to any labels outside the module.

Let's resolve the external references for DBLE first:

The (relocated) table of External References for DBLE shows "PAUSE 06".
The linker must fix location 06 in memory with the real address of
the PAUSE Public Symbol at link time.  The linker looks in all the
Public Symbol tables for some symbol named "PAUSE".  It finds a Public
Symbol named "PAUSE" in the Public Symbol Table for the module PAUSE;
in that table the label "PAUSE" is shown at location 09.

When the linker runs, it will use the relocated External Reference
information "PAUSE 06" to go to memory location 06 (inside DBLE)
and it will change the "000" at location 06 to "009" ("CALL PAUSE"),
because it now knows that the Public Symbol "PAUSE" is located at
memory address 09.  (Note that the linker is resolving a reference to
the *label* PAUSE at location 09, not the location of the whole PAUSE
module at location 08.)

This is the change the linker makes to the machine code for DBLE loaded
in memory:

  LOC  BEFORE  AFTER  REASON FOR CHANGE
  ---  ------  -----  ----------------------------------
  06:  000     009    ; address of Public Symbol "PAUSE" linked in (added)

The Pause module has no external references to resolve.

Let's resolve the external references for MAIN next:

The (relocated) table of External References for MAIN has three entries:

  PAUSE  19
  ARG    21
  DBLE   22

The linker must fix location 19 in memory with the real address of the
"PAUSE" Public Symbol at link time.  It must fix location 21 by adding
in the address of the "ARG" Public Symbol.  It must fix location 22 by
adding in the address of the "DBLE" Public Symbol.  It goes looking for
these symbols in all the Public Symbol tables:

    The linker looks in all the Public Symbol tables for some symbol
    named "PAUSE".  It finds a Public Symbol named "PAUSE" in the Public
    Symbol Table for the module PAUSE; in that table the label "PAUSE"
    is shown at location 09.

    The linker looks in all the Public Symbol tables for some symbol named
    "ARG".  It finds a Public Symbol named "ARG" in the Public Symbol
    Table for module DBLE; in that table the label "ARG" is shown at
    location 02.

    The linker looks in all the Public Symbol tables for some symbol named
    "DBLE".  It finds a Public Symbol named "DBLE" in the Public Symbol
    Table for module DBLE; in that table the label "DBLE" is shown at
    location 01.

When the linker runs, it will use the (relocated) External Reference
information "PAUSE 19" to go to memory location 19 (inside MAIN) and it
will change the "000" at that location to "009" ("CALL PAUSE"), because
it now knows that the label "PAUSE" is located at memory address 09.
The linker will also go to memory location 21 and add the address where
it found the Public Symbol "ARG" (location 02).  The linker will also
go to memory location 22 and add the address where it found the Public
Symbol "DBLE" (location 01).

Using that new, (relocated) External References table for MAIN, we
make these next three changes to the machine codes loaded in memory:

  LOC  BEFORE  AFTER  REASON FOR CHANGE
  ---  ------  -----  ----------------------------------
  19:  000     009    ; address of "PAUSE" (09) linked in (added)
  21:  200     202    ; address of "ARG" (02) linked in (added)
  22:  000     001    ; address of "DBLE" (01) linked in (added)

Note how the addresses of PAUSE and DBLE are simply *added* to the
address fields of the instructions.  In most cases, the existing
address field is zero (a direct reference to the external symbol).

We have made four more changes to the in-memory codes to resolve the
four external references.  That's it for resolving External References.

We're done.  We have relocated nine tables and 16 instructions in
memory, and we have made 4 more changes to instructions in memory to
resolve external symbols.

The final result in memory, all three modules fully Linked (4 changes)
and Relocated (16 changes) looks like this (20 changed instructions
total):

R = Relocated Instruction
E = External Symbol resolved (linked)

Original  New      Reason for change
-------   -------  -------------------------------------------
-------   -------  ; start of DBLE
00: 000   00: 000
01: 903   01: 903  ; R added 00 for relocation from 00 to 00
02: 000   02: 000         
03: 102   03: 102  ; R added 00 for relocation from 00 to 00
04: 302   04: 302  ; R added 00 for relocation from 00 to 00
05: 600   05: 600
06: 000   06: 009  ; E address of "PAUSE" (09) linked in (added)
07: 900   07: 900  ; R added 00 for relocation from 00 to 00
-------   -------  ; start of PAUSE
00: 000   08: 000
01: 107   09: 115  ; R added 08 for relocation from 00 to 08
02: 408   10: 416  ; R added 08 for relocation from 00 to 08
03: 207   11: 215  ; R added 08 for relocation from 00 to 08
04: 801   12: 801
05: 901   13: 909  ; R added 08 for relocation from 00 to 08
06: 900   14: 908  ; R added 08 for relocation from 00 to 08
07: 000   15: 000
08: 001   16: 001
-------   -------  ; start of MAIN
00: 113   17: 130  ; R added 17 for relocation from 00 to 17
01: 600   18: 600
02: 000   19: 009  ; E address of "PAUSE" (09) linked in (added)
03: 113   20: 130  ; R added 17 for relocation from 00 to 17
04: 200   21: 202  ; E address of "ARG" (02) linked in (added)
05: 000   22: 001  ; E address of "DBLE" (01) linked in (added)
06: 113   23: 130  ; R added 17 for relocation from 00 to 17
07: 314   24: 331  ; R added 17 for relocation from 00 to 17
08: 213   25: 230  ; R added 17 for relocation from 00 to 17
09: 415   26: 432  ; R added 17 for relocation from 00 to 17
10: 802   27: 802
11: 900   28: 917  ; R added 17 for relocation from 00 to 17
12: 700   29: 700
13: 000   30: 000
14: 001   31: 001
15: 010   32: 010        

That's all.  Assuming we start running this code at the beginning of MAIN
(loction 17), the above code will now run as a coherhent unit, with all
the address references correct.