Analysis of scatter-loading mechanism and address-independent compilation of microcomputer (3)

Created at 2020-12-10 22:01:43
//Find the corresponding locations in the disassembly (.S) file (the .map file tells you where to look)

//1. The code for the main function in main.c (recall the $Super$$main mechanism mentioned earlier: the main that the compiler sees is not the same as the main that we see)
//main as seen by the compiler = $Sub$$main + $Super$$main + main; the toolchain combines these three, and KEIL exposes all three names to us
//Of these, $Super$$main contains no code of its own; it only serves to jump back to the original main. The real work is done in $Sub$$main and main
//Note: every section whose name starts with i. (such as i.$Sub$$main or i.main) holds a function compiled from a .c/.s file
i.main
$Super$$main
    0x00030868:    b51c        ..      PUSH     {r2-r4,lr}
    0x0003086a:    482c        ,H      LDR      r0,[pc,#176] ; [0x3091c] = 0x1b0c
    0x0003086c:    4448        HD      ADD      r0,r0,r9
    0x0003086e:    6801        .h      LDR      r1,[r0,#0]
    0x00030870:    2000        .       MOVS     r0,#0
    0x00030872:    4788        .G      BLX      r1
    0x00030874:    4829        )H      LDR      r0,[pc,#164] ; [0x3091c] = 0x1b0c
    0x00030876:    4448        HD      ADD      r0,r0,r9
    0x00030878:    6841        Ah      LDR      r1,[r0,#4]
    0x0003087a:    2000        .       MOVS     r0,#0
    0x0003087c:    4788        .G      BLX      r1
;; Omit other codes
$d
    0x0003091c:    00001b0c    ....    DCD    6924
;; Omit other codes
    
//2. The dynamic initialization code generated for main.c (it performs the final step of rwpi at run time: materializing the actual addresses in RAM)
.text
__sta__dyninit
    0x00001d40:    4803        .H      LDR      r0,[pc,#12] ; [0x1d50] = 0x21ac3
    0x00001d42:    4478        xD      ADD      r0,r0,pc
    0x00001d44:    4903        .I      LDR      r1,[pc,#12] ; [0x1d54] = 0x1b0c
    0x00001d46:    4449        ID      ADD      r1,r1,r9
    0x00001d48:    6008        .`      STR      r0,[r1,#0]
    0x00001d4a:    6048        H`      STR      r0,[r1,#4]
    0x00001d4c:    4770        pG      BX       lr
$d
    0x00001d4e:    0000        ..      DCW    0
    0x00001d50:    00021ac3    ....    DCD    137923
    0x00001d54:    00001b0c    ....    DCD    6924
;; Omit other codes
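
The two listings above fit together: __sta__dyninit computes a pc-relative function address and stores it into two r9-relative slots, and main later loads those slots and calls through them. The following is a minimal host-side sketch of that data flow, not the real runtime code. The constants are copied from the listings; `rw_region`, `load_bias` and the function names are hypothetical names introduced only for this illustration.

```c
#include <stdint.h>

#define DYNINIT_FUNC_LITERAL 0x00021ac3u /* DCD at 0x1d50 */
#define RW_SLOT_OFFSET       0x00001b0cu /* DCD at 0x1d54 and at 0x3091c */
#define PC_AT_ADD_R0         0x00001d46u /* "ADD r0,r0,pc" sits at 0x1d42; pc reads as +4 */

/* Hypothetical stand-in for the RW region whose base address is held in r9. */
static uint32_t rw_region[0x800];

/* Model of __sta__dyninit. load_bias is a hypothetical parameter: the distance
 * between the address the image runs at and the address it was linked for. */
static void sta_dyninit_model(uint32_t *r9, uint32_t load_bias)
{
    /* LDR r0,[pc,#12] ; ADD r0,r0,pc */
    uint32_t r0 = DYNINIT_FUNC_LITERAL + (PC_AT_ADD_R0 + load_bias);
    /* LDR r1,[pc,#12] ; ADD r1,r1,r9 */
    uint32_t *r1 = r9 + RW_SLOT_OFFSET / 4;

    r1[0] = r0; /* STR r0,[r1,#0] */
    r1[1] = r0; /* STR r0,[r1,#4] */
}

/* Model of what main does before each "BLX r1": load slot 0 or slot 1. */
static uint32_t main_load_slot(const uint32_t *r9, unsigned index)
{
    /* LDR r0,[pc,#...] ; ADD r0,r0,r9 ; LDR r1,[r0,#4*index] */
    const uint32_t *r0 = r9 + RW_SLOT_OFFSET / 4;
    return r0[index];
}
```

With `load_bias` equal to 0, both slots end up holding 0x00023809 (0x21ac3 + 0x1d46), an odd value because it is a Thumb function pointer. If the image is moved, the stored value moves by the same amount, which is exactly why the assignment has to happen at run time.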


//3. How __sta__dyninit gets called
//__main -> __rt_entry -> __rt_lib_init -> __cpp_initialize__aeabi_
//This call chain runs after __scatterload
.text
__cpp_initialize__aeabi_
    0x00002598:    b570        p.      PUSH     {r4-r6,lr}
    0x0000259a:    4c06        .L      LDR      r4,[pc,#24] ; [0x25b4] = 0x35ae0
    0x0000259c:    447c        |D      ADD      r4,r4,pc
    0x0000259e:    4d06        .M      LDR      r5,[pc,#24] ; [0x25b8] = 0x35b10
    0x000025a0:    447d        }D      ADD      r5,r5,pc
    0x000025a2:    e003        ..      B        0x25ac ; __cpp_initialize__aeabi_ + 20
    0x000025a4:    6820         h      LDR      r0,[r4,#0]
    0x000025a6:    4420         D      ADD      r0,r0,r4
    0x000025a8:    4780        .G      BLX      r0
    0x000025aa:    1d24        $.      ADDS     r4,r4,#4
    0x000025ac:    42ac        .B      CMP      r4,r5
    0x000025ae:    d1f9        ..      BNE      0x25a4 ; __cpp_initialize__aeabi_ + 12
    0x000025b0:    bd70        p.      POP      {r4-r6,pc}
$d
    0x000025b2:    0000        ..      DCW    0
    0x000025b4:    00035ae0    .Z..    DCD    219872
    0x000025b8:    00035b10    .[..    DCD    219920
;; Omit other codes
.init_array
Region$$Table$$Limit
SHT$$INIT_ARRAY$$Base
    0x00038080:    fffc8161    a...    DCD    4294738273
.init_array
    0x00038084:    fffc81a1    ....    DCD    4294738337
.init_array
    0x00038088:    fffc838d    ....    DCD    4294738829
.init_array
    0x0003808c:    fffc83bd    ....    DCD    4294738877
.init_array
    0x00038090:    fffc8425    %...    DCD    4294738981
.init_array
    0x00038094:    fffc8d3d    =...    DCD    4294741309
.init_array
    0x00038098:    fffc9109    ....    DCD    4294742281
.init_array
    0x0003809c:    fffc9481    ....    DCD    4294743169
.init_array
    0x000380a0:    fffc95fd    ....    DCD    4294743549
.init_array
    0x000380a4:    fffc96fd    ....    DCD    4294743805
.init_array
    0x000380a8:    fffc9881    ....    DCD    4294744193
.init_array
    0x000380ac:    fffc9c29    )...    DCD    4294745129
.init_array
    0x000380b0:    fffc9c91    ....    DCD    4294745233    ;;Note: this is the entry for the data that my main.c needs to have initialized dynamically
.init_array
SHT$$INIT_ARRAY$$Limit

//4. Analysis
a. The image is compiled and linked as address-independent code, so the address at which our functions will actually run is unknown at build time. The scatter loading we saw earlier therefore cannot simply "relocate" data to initialize these variables in ram; the addresses are only known once the .bin file has been loaded into flash and is executing.
b. So the linker links in a function __cpp_initialize__aeabi_() (located in "../clib/arm_runtime.c" in the library provided by KEIL).
The __scatter() responsible for scatter loading is located in "../clib/angel/scatter.s", which is an assembly file, so __scatter() does not need a stack and can run early. __cpp_initialize__aeabi_() must wait until the stack has been initialized, so it has to run after __scatter().
c. Next we look at how __cpp_initialize__aeabi_() reaches the __sta__dyninit() of each file.
It is the same routine as before, but this time we write the equivalent c code directly (the function was originally compiled from c, and the assembly shows the traces clearly:
on entry it saves the registers it is going to use together with lr, it keeps its loop variables in r4 and r5 and uses r0 as a scratch register, and on return it pops the saved lr straight into pc).


//Model of the .init_array table that the linker places between
//SHT$$INIT_ARRAY$$Base (0x00038080) and SHT$$INIT_ARRAY$$Limit (0x000380b4)
const uint32_t FuncTable[] =
{
    0xfffc8161,
    0xfffc81a1,
    0xfffc838d,
    0xfffc83bd,
    0xfffc8425,
    0xfffc8d3d,
    0xfffc9109,
    0xfffc9481,
    0xfffc95fd,
    0xfffc96fd,
    0xfffc9881,
    0xfffc9c29,
    0xfffc9c91, //The offset of __sta__dyninit() in main.c relative to "here" (the address of this entry)
};
void __cpp_initialize__aeabi_(void)
{
    const uint32_t *start;
    const uint32_t *end;
    void (*func)(void);
    
    //In Thumb state, reading pc returns the address of the current instruction +4 (a legacy of the pipeline)
    start = (const uint32_t *)(0x00035ae0 + 0x000025a0); //=0x00038080  //r4
                                                         //=&FuncTable[0]
    end   = (const uint32_t *)(0x00035b10 + 0x000025a4); //=0x000380b4  //r5
                                                         //=&FuncTable[13], one past the last entry
    while(start != end)
    {
        //Each entry stores an offset relative to the address of that entry itself;
        //the 32-bit addition wraps around, so a "negative" offset points backwards
        func = (void (*)(void))((uint32_t)start + *start);
        func();
        
        start++; //advance by 4 bytes to the next entry
    }
}
Let us check the last entry:
.init_array
    0x000380b0: fffc9c91 .... DCD 4294745233 ;; Note: this is the entry for the data that my main.c needs to have initialized dynamically
Both values are in two's complement, so they can simply be added modulo 2^32: 0xfffc9c91 + 0x000380b0 = 0x00001d41, which is the start address of __sta__dyninit() (0x00001d40) with bit[0]=1.
As for bit[0]=0/1: on ARM, bit[0] of a branch target selects the instruction set, and bit[0]=1 means Thumb. Cortex-M executes only Thumb code, so every function pointer has bit[0]=1.
d. This completes the dynamic initialization of ram. In summary, the actual address of a function is only known at run time, so these variables can only be assigned at run time.
e. Using such a variable in the main function is then very simple: add the base address held in r9 to the variable's offset and access it.
f. Dynamic loading will be explained in detail in the address-independent compilation section. Here we only walk through the process; it is enough to know that all data is allocated and initialized in a well-defined way, and that this work is finished before the main function is executed.
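
The offset arithmetic from step c can be checked on a PC for every entry of the table. The following is a small sketch under the assumption that the table sits at its link-time address 0x00038080, as in the listing; the helper names are hypothetical and exist only for this check.

```c
#include <stddef.h>
#include <stdint.h>

#define INIT_ARRAY_BASE 0x00038080u /* SHT$$INIT_ARRAY$$Base in the listing */

/* The 13 self-relative offsets copied from the .init_array listing. */
static const uint32_t init_array[] = {
    0xfffc8161u, 0xfffc81a1u, 0xfffc838du, 0xfffc83bdu, 0xfffc8425u,
    0xfffc8d3du, 0xfffc9109u, 0xfffc9481u, 0xfffc95fdu, 0xfffc96fdu,
    0xfffc9881u, 0xfffc9c29u, 0xfffc9c91u,
};

enum { INIT_ARRAY_COUNT = sizeof init_array / sizeof init_array[0] };

/* Link-time address of entry i (4 bytes per entry). */
static uint32_t entry_address(size_t i)
{
    return INIT_ARRAY_BASE + 4u * (uint32_t)i;
}

/* Value handed to BLX: entry address + stored offset, wrapping modulo 2^32. */
static uint32_t entry_target(size_t i)
{
    return (uint32_t)(entry_address(i) + init_array[i]);
}

/* Address of the first instruction: the target with the Thumb bit cleared. */
static uint32_t entry_code_address(size_t i)
{
    return entry_target(i) & ~1u;
}
```

For the last entry, `entry_target(12)` gives 0x00001d41 and `entry_code_address(12)` gives 0x00001d40, matching the address of __sta__dyninit in the listing. `entry_address(INIT_ARRAY_COUNT)` gives 0x000380b4, the value loaded into r5 as the end marker.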

2. Summary of scatter loading mechanism

Strictly speaking, scatter-loading refers only to __scatter(). I am used to counting dynamic loading as part of scatter-loading as well, because both of them initialize ram. How you draw the line does not really matter.

1.png
2.png

Note:

-KEIL is configured through scatter-loading files. Generally, rw, zi, heap and stack (where zi+heap+stack=.bss) are allocated in one go. The stack is placed after the heap because on arm the stack generally grows downward, while the heap grows upward and is rarely used up. The two grow toward each other, so the heap and stack can share the space between them to the greatest extent.
-The above is an analysis of KEIL. I did not analyze IAR or GCC specifically, but the essence is the same.
-The above only analyzes the scatter-loading process when address-independent compilation is not used (my sample code does use address-independent compilation, but there is no difference in essence; you can also try it yourself to deepen your understanding).

3. Address Independent Compilation

1. ROPI introduction

ROPI mainly concerns const variables and the data used by the scatter loader for initialization (that is, the content placed in flash; functions are generally not counted here).
The official explanation is:

ROPI = Read-Only Position Independence. This concerns everything that is readonly in the ELF output from the linker. Note that this includes const data and data initializers, i.e. typically everything that is put in FLASH.

It mainly makes the following 3 points:
1. const data: variables qualified with const;
2. Data initializers: the data used for initialization (as I understand it, this is the dynamic initializer __sta__dyninit());
3. Functions;
Together these 3 points cover everything in flash ("everything that is put in FLASH").

So there is no problem on bare metal, because on bare metal we generally specify where in flash our program runs (take a .hex file as a simple example: each record carries an address. This is also why a .hex file can be converted to a .bin file, while the reverse is most likely not possible without extra information: a .bin file is the pure image content with no address information).
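
To make the remark about .hex records concrete, here is a minimal sketch of a parser for a single Intel HEX record (`:LLAAAATT<data>CC`), showing that the load address is part of every record. It assumes well-formed ASCII input without a trailing newline; the type and function names are hypothetical, and the sample record used in the note below was constructed for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  count;     /* LL: number of data bytes */
    uint16_t address;   /* AAAA: 16-bit load address of the first data byte */
    uint8_t  type;      /* TT: 00 = data, 01 = end of file, ... */
    uint8_t  data[255];
} hex_record;

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

static int hex_byte(const char *s)
{
    int hi = hex_nibble(s[0]);
    int lo = hex_nibble(s[1]);
    return (hi < 0 || lo < 0) ? -1 : ((hi << 4) | lo);
}

/* Returns 0 on success, -1 if the record is malformed or the checksum is wrong. */
static int parse_hex_record(const char *line, hex_record *out)
{
    size_t len = strlen(line);
    if (len < 11 || line[0] != ':') return -1;

    int count = hex_byte(line + 1);
    if (count < 0) return -1;

    size_t total = 4 + (size_t)count + 1;  /* LL AAAA TT + data + CC */
    if (len < 1 + 2 * total) return -1;

    uint8_t raw[4 + 255 + 1];
    uint8_t sum = 0;
    for (size_t i = 0; i < total; i++) {
        int b = hex_byte(line + 1 + 2 * i);
        if (b < 0) return -1;
        raw[i] = (uint8_t)b;
        sum = (uint8_t)(sum + b);
    }
    if (sum != 0) return -1;               /* all bytes including CC must sum to 0 */

    out->count   = raw[0];
    out->address = (uint16_t)((raw[1] << 8) | raw[2]);
    out->type    = raw[3];
    memcpy(out->data, raw + 4, (size_t)count);
    return 0;
}
```

Parsing `":0400100001020304E2"` yields 4 data bytes to be placed at address 0x0010. A .bin file would contain only the bytes 01 02 03 04, so the address has to come from somewhere else.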
Once an operating system is involved, things are different: where the program runs is decided by the operating system alone. Moreover, programs are usually stored on media such as disks or external flash and are only fetched when they are needed, into either internal ram or internal flash.
The prerequisite for ROPI is that the positions of functions and variables inside the program image are fixed relative to each other; otherwise none of this works.
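
The prerequisite above can be demonstrated with a small simulation: as long as the distance between the code and its const data is fixed, a pc-relative read keeps working wherever the image is placed, while a read through the link-time absolute address does not. This is a sketch on a simulated flash array, not real ROPI code; all names, sizes and offsets are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    FLASH_SIZE   = 256,
    IMAGE_SIZE   = 64,
    LINK_BASE    = 0,   /* where the linker assumed the image would sit */
    CODE_OFFSET  = 8,   /* position of the reading "instruction" inside the image */
    CONST_OFFSET = 40   /* position of the const word inside the image */
};

#define CONST_VALUE 0x12345678u

static uint8_t flash[FLASH_SIZE];

/* Erase the simulated flash and place the image at the given base. */
static void load_image(size_t base)
{
    uint32_t value = CONST_VALUE;
    memset(flash, 0xFF, sizeof flash);
    memset(flash + base, 0, IMAGE_SIZE);
    memcpy(flash + base + CONST_OFFSET, &value, sizeof value);
}

static uint32_t read_word(size_t address)
{
    uint32_t v;
    memcpy(&v, flash + address, sizeof v);
    return v;
}

/* Non-ROPI style: the address of the const was fixed at link time. */
static uint32_t read_absolute(void)
{
    return read_word(LINK_BASE + CONST_OFFSET);
}

/* ROPI style: only the distance from the current pc to the const is fixed. */
static uint32_t read_pc_relative(size_t base)
{
    size_t pc = base + CODE_OFFSET;
    return read_word(pc + (CONST_OFFSET - CODE_OFFSET));
}
```

With the image at its link address both reads return the const. After `load_image(128)` only `read_pc_relative(128)` still finds it, while `read_absolute()` reads erased flash.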
