AddressIndependent ScatterLoading

Analysis of scatter-loading mechanism and address-independent compilation of microcomputer (4)

Created at 2020-12-10 22:04:42

a.const data

//1.code in c language
const long TestData[4] = {1, 2, 3, 4};
int main( void )
{
    return (int)TestData[0];
}

//2.The following is the corresponding assembly code
i.main
$Super$$main
    0x000062cc:    4801        .H      LDR      r0,[pc,#4] ; [0x62d4] = 0x5456
    0x000062ce:    4478        xD      ADD      r0,r0,pc
    0x000062d0:    6800        .h      LDR      r0,[r0,#0]
    0x000062d2:    4770        pG      BX       lr
$d
    0x000062d4:    00005456    VT..    DCD    21590
;;Omit other codes

//3.The location of const data in flash
.constdata
TestData
    0x0000b728:    00000001    ....    DCD    1
    0x0000b72c:    00000002    ....    DCD    2
    0x0000b730:    00000003    ....    DCD    3
    0x0000b734:    00000004    ....    DCD    4

Analysis shows that 0x000062cc~0x000062d0 implements the access to TestData[0].
Using pc together with the $d table, the value of r0 after the instruction at 0x000062ce executes works out to 0x5456 + pc = 0x0000b728, the address of TestData[0]. Here pc reads as the address of the ADD instruction plus 4, i.e. 0x000062d2.

So the offset is relative to the pc at the instruction that uses it, not to the start of the $d table or anything else.
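
The calculation above can be checked with a few lines of C. This is only a sketch: ropi_target() is a hypothetical helper name, not part of any toolchain, and it assumes Thumb state, where reading pc yields the address of the current instruction plus 4.

```c
#include <assert.h>
#include <stdint.h>

/* In Thumb state, reading pc yields the instruction's own address + 4. */
#define THUMB_PC_READ_OFFSET 4u

/* Hypothetical helper: the address produced by "ADD r0,r0,pc" when r0
 * holds the literal that was loaded from the $d table. */
static uint32_t ropi_target(uint32_t add_insn_addr, uint32_t literal)
{
    /* Thumb instructions are halfword aligned. */
    assert((add_insn_addr & 1u) == 0u);
    return literal + add_insn_addr + THUMB_PC_READ_OFFSET;
}
```

With the values from the listing, ropi_target(0x000062ce, 0x5456) gives 0x0000b728, the address of TestData in .constdata.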

b.data initializers

.text
__sta__dyninit
    0x00001d40:    4803        .H      LDR      r0,[pc,#12] ; [0x1d50] = 0x21ac3
    0x00001d42:    4478        xD      ADD      r0,r0,pc
    0x00001d44:    4903        .I      LDR      r1,[pc,#12] ; [0x1d54] = 0x1b0c
    0x00001d46:    4449        ID      ADD      r1,r1,r9
    0x00001d48:    6008        .`      STR      r0,[r1,#0]
    0x00001d4a:    6048        H`      STR      r0,[r1,#4]
    0x00001d4c:    4770        pG      BX       lr
$d
    0x00001d4e:    0000        ..      DCW    0
    0x00001d50:    00021ac3    ....    DCD    137923
    0x00001d54:    00001b0c    ....    DCD    6924

;; Omit other codes

i.StartThread_Entry
StartThread_Entry
    0x00023808:    b510        ..      PUSH     {r4,lr}
    0x0002380a:    4604        .F      MOV      r4,r0
    0x0002380c:    f7e5fcaa    ....    BL       CheckSysLimit ; 0x9164

Analysis shows that 0x00001d40~0x00001d42 obtains the start address of the StartThread_Entry() function.
Using pc together with the $d table, the value of r0 after the instruction at 0x00001d42 executes works out to 0x21ac3 + pc = 0x00023809 (pc reads as 0x00001d46), which is the start address of StartThread_Entry() with bit[0] = 1 (the Thumb state bit).
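
The role of bit[0] can be illustrated in C. The helper names below are hypothetical, and the assert assumes a Thumb-only core, where every valid function pointer has bit[0] set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: is the pointer value marked as Thumb code? */
static int is_thumb(uint32_t fn_ptr)
{
    return (int)(fn_ptr & 1u);
}

/* Hypothetical helper: the address where the first instruction is stored.
 * Assumes a Thumb-only core, so bit[0] must be set in the pointer. */
static uint32_t code_address(uint32_t fn_ptr)
{
    assert(is_thumb(fn_ptr));
    return fn_ptr & ~1u;
}
```

For the pointer value 0x00023809 computed above, code_address() gives 0x00023808, which is where the listing places the first instruction of StartThread_Entry().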

c.Function calls

//1. Call the function example in the library
    0x000308d6:    a023        #.      ADR      r0,{pc}+0x8e ; 0x30964
    0x000308d8:    f7d1fae4    ....    BL       __2printf ; 0x1ea4

//2. Call user-defined program example
    0x000242c8:    f004fa04    ....    BL       api_InitSysWatchVariable ; 0x286d4
    0x000242cc:    f004f9b0    ....    BL       api_InitLcdVariable ; 0x28630

//3. The address of the called function
i.api_InitLcdVariable
api_InitLcdVariable
    0x00028630:    2001        .       MOVS     r0,#1
;; Omit other codes

Whether a library function or a user-defined function is called, pc does not appear explicitly as a base address (there is no extra ±pc).
In fact, though, the BL instruction itself is a relative jump. The target is calculated as follows:

First of all, this BL is a Thumb instruction, and a Thumb instruction is 2 bytes. The BL here is 4 bytes long, which is easy to misread: the long jump is actually a combination of two 2-byte instructions. In each of them the top 5 bits are the opcode, where bit[11] selects the high or low half of the offset, and the low 11 bits carry the offset field.

Therefore:

Take the bytes two at a time and check bit[11]: bit[11]=0 is the high offset, bit[11]=1 is the low offset;
Strip the high 5 bits, leaving the 11-bit field;
Shift the high offset left by 12 bits and the low offset left by 1 bit, then add them (the high field is signed, so a backward jump needs sign extension first);
Add the address of the BL instruction, then add 4;
Done.

Then for the following instructions:
0x000242cc:    f004f9b0    ....    BL       api_InitLcdVariable ; 0x28630

1. First identify the address offset field in each halfword
0xf004, bit[11]=0, is the high offset of the address
0xf9b0, bit[11]=1, is the low offset of the address
2. Take the lower 11 bits of each
0xf004 -> 0x004
0xf9b0 -> 0x1b0
3. Combine them
(0x004<<12) + (0x1b0<<1) = 0x004000 + 0x360 = 0x004360
4. Add the address of the instruction, plus 4
0x004360 + (0x000242cc + 4) = 0x28630 (the address of the api_InitLcdVariable function)
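
The steps above can be sketched in C. thumb_bl_target() is a hypothetical helper name. It assumes the classic two-halfword Thumb BL encoding, in which the top 5 bits of the halfwords are 11110 and 11111, and it sign-extends the high field so that backward jumps also work.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: target of a classic two-halfword Thumb BL.
 * insn_addr is the address of the first halfword. */
static uint32_t thumb_bl_target(uint32_t insn_addr, uint16_t hw1, uint16_t hw2)
{
    /* Top five bits: 11110 marks the high half, 11111 the low half. */
    assert((hw1 & 0xF800u) == 0xF000u);
    assert((hw2 & 0xF800u) == 0xF800u);

    /* Strip the top 5 bits; the high field is a signed 11-bit value. */
    int32_t high = (int32_t)(hw1 & 0x07FFu);
    if (high & 0x400)
        high -= 0x800;
    int32_t low = (int32_t)(hw2 & 0x07FFu);

    /* high << 12 plus low << 1, written as multiplications so that a
     * negative high field is handled without shifting a negative value. */
    int32_t offset = high * 4096 + low * 2;

    /* pc reads as the address of the BL plus 4. */
    return insn_addr + 4u + (uint32_t)offset;
}
```

For the instruction at 0x000242cc (f004 f9b0) this gives 0x28630. The other BL instructions in the listings check out the same way, including the backward jump at 0x000308d8 (f7d1 fae4) to __2printf at 0x1ea4.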

Summary: a function call is address-independent by nature because BL is pc-relative, while access to ro data and the dynamic initialization of pointers are made address-independent by adding a pc offset.

2.rwpi Introduction

Rwpi is easier to understand: it concerns the use of rw data, or simply the use of data in ram.
The location of this data in ram is also unknown before runtime; only at runtime is it known where the os has placed it in ram.
The handling is simple: when the os allocates the ram, it obtains the base address of the block, saves it in one place, and that base is added on every access.

The official explanation:

RWPI = Read-Write Position Independence. This concerns everything that is readwrite in the ELF output from the linker.

Everything that is readwrite includes:
1. rw data
2. zi data

The rwpi mechanism is:

-At compile time, every variable access is changed to r9+offset. The base is taken as 0 when the image is built, so the offset of each variable follows directly from its address.
-At runtime, the os first allocates a block of ram, saves its base address in r9, and then performs the usual scatter-loading operations.
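
The mechanism can be imitated on a host machine in plain C. This is only a sketch: the global sb stands in for r9, and the offsets, region size, and function names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical build-time offsets of two variables inside the region. */
#define OFF_COUNTER 0x00u
#define OFF_FLAGS   0x04u
#define REGION_SIZE 0x08u

/* Stand-in for r9: the base of the region, known only at run time. */
static unsigned char *sb;

/* What the os/loader does: record where the region was placed. */
static void rwpi_attach(unsigned char *region)
{
    assert(region != NULL);
    sb = region;
}

/* Every variable read becomes a load from base + offset. */
static uint32_t rwpi_load(uint32_t offset)
{
    uint32_t v;
    assert(sb != NULL && offset + sizeof v <= REGION_SIZE);
    memcpy(&v, sb + offset, sizeof v);
    return v;
}

/* Every variable write becomes a store to base + offset. */
static void rwpi_store(uint32_t offset, uint32_t value)
{
    assert(sb != NULL && offset + sizeof value <= REGION_SIZE);
    memcpy(sb + offset, &value, sizeof value);
}
```

The same code works unchanged on two different regions: attaching another block changes only the base, while every offset stays as it was fixed at build time.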

a.rw data processing

b.zi data processing

Use a program as an example:

//1.Code in C language
long TestData1[4] = {1, 2, 3, 4};
long TestData2[4];

int main( void )
{
    long dump = TestData2[0] + TestData1[0];
    return (int)dump;
}

//2. The following is the corresponding assembly code
i.main
$Super$$main
    0x000062cc:    4903        .I      LDR      r1,[pc,#12] ; [0x62dc] = 0x3640
    0x000062ce:    4449        ID      ADD      r1,r1,r9
    0x000062d0:    6809        .h      LDR      r1,[r1,#0]
    0x000062d2:    4a03        .J      LDR      r2,[pc,#12] ; [0x62e0] = 0x1a48
    0x000062d4:    444a        JD      ADD      r2,r2,r9
    0x000062d6:    6812        .h      LDR      r2,[r2,#0]
    0x000062d8:    1888        ..      ADDS     r0,r1,r2
    0x000062da:    4770        pG      BX       lr
$d
    0x000062dc:    00003640    @6..    DCD    13888
    0x000062e0:    00001a48    H...    DCD    6728

To analyze the code, assume r9 = 0 and read it together with the .map file.
We can conclude that:

-The lines 0x000062cc~0x000062d0 access ram address r9+0x00003640, which corresponds to TestData2[0]; this address is clearly in .bss;

-The lines 0x000062d2~0x000062d6 access ram address r9+0x00001a48, which corresponds to TestData1[0]; this address is in .data;

//.map file
TestData1        0x0000d600   Data          16  main.o(.data)

//Correspondence between loading view and running view
        0x0000bb67:    XXXXXXXX    ....    DCD    XXXXXX ;; base address in the .bin file
Region$$Table$$Base
        0x0000bb68:    00000051    Q...    DCD    81
        0x0000bb6c:    00000002    ....    DCD    2
        0x0000bb70:    00001a74    t...    DCD    6772
        0x0000bb74:    0000ba6b    k...    DCD    47723

//It is easy to get the position of .data in .bin
** Section #3 'data' (SHT_PROGBITS) [SHF_ALLOC + SHF_WRITE]
    Size   : 1992 bytes (alignment 4)
    Address: 0x0000bbb8

//Then do the conversion
0x0000d600 (TestData1 in the .map file) - 0x0000bbb8 = 0x1a48
This value is the r9 offset of &TestData1[0]; //the global variable with an initial value
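
The conversion can be written as a small C sketch. Both helper names are hypothetical, and the r9 value in the usage note is an invented example, not taken from the listings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: offset of a variable from the start of the data
 * section, given its address in the .map file and the section address. */
static uint32_t rwpi_offset(uint32_t map_addr, uint32_t data_base)
{
    assert(map_addr >= data_base);
    return map_addr - data_base;
}

/* Hypothetical helper: where the variable ends up once r9 is known. */
static uint32_t rwpi_runtime_addr(uint32_t r9, uint32_t offset)
{
    return r9 + offset;
}
```

rwpi_offset(0x0000d600, 0x0000bbb8) gives 0x1a48, the literal seen at 0x000062e0. If the os happened to set r9 to 0x20000000, TestData1 would then sit at 0x20001a48.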

3. Why a const function pointer array cannot be used with address-independent compilation

As the previous examples show, both rwpi and ropi only change how variables are accessed: an access supplies a base address and adds an offset to it. There are two kinds of base address: r9 for ram space and pc for flash space. Nothing else is done.

Now let us analyze the build process:

-In the compilation phase, each file is translated (its contents are assigned to code, ro, rw, and zi as appropriate, forming different sections).
-For an external symbol, as long as it is declared, a relocation table (the $d table seen earlier) is generated after the assembly code of each function, and the instructions that reference the symbol point to entries in this table. At the same time a relocation section is generated, which serves as the reference for relocation in the link phase.
-In the early stage of linking, the matching sections of all files are merged into one section each (that is, all rw sections are merged into a single rw section, and so on).
-In the later stage of linking, the $d entries created during compilation are patched according to the relocation section.

When a const function pointer array is used, the array obviously has to be placed in flash. In principle the compilation phase can succeed (judging only by the tasks that belong to the compilation phase, not by what the actual compiler does): the initial values of the array are not yet known, so the compiler generates relocation entries and puts placeholder data in their place, which is perfectly feasible.
The early stage of linking, which merges all the sections, is also no problem, because it is only concatenation.
The last stage of linking is relocation. In theory this is fine too, and for ordinary variables it is, but for function pointers this step runs into trouble:

-Under address-independent compilation, following the ropi scheme, every const variable is accessed as pc+offset.
-So the code that uses the const variable can always reach it correctly, but the function pointer value it reads is wrong: the value written at relocation time is not the address the function has at runtime. And flash cannot be rewritten at load time the way ram can, so the stored value cannot be corrected, and nothing adds pc to it when it is used.
-Function pointers can, however, be kept in ram (global variables without const); the absolute address of the function is then assigned to the pointer by dynamic initialization (see __sta__dyninit() earlier in this post).

So the root of the problem is the special nature of flash.

In short, under address-independent compilation a const variable must not hold data that can only be determined at runtime, such as a function pointer or the address of a variable. This applies to addresses in ro, rw, and zi alike.
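
The ram alternative can be sketched in C as a non-const table that is filled in by code at startup. All names here are hypothetical. Whether the compiler can generate such an initialization by itself (as __sta__dyninit suggests) depends on the toolchain and its options, so this sketch assigns the entries explicitly.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*Handler)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }
static int negate(int x)  { return -x; }

#define HANDLER_COUNT 3u

/* Not const: the table lives in ram, so it can be written after the
 * image has been placed at its final address. */
static Handler handlers[HANDLER_COUNT];

/* Run once at startup. The addresses are taken by executing code, which
 * is how a position-independent build obtains them relative to pc. */
static void handlers_init(void)
{
    handlers[0] = add_one;
    handlers[1] = twice;
    handlers[2] = negate;
}

/* Look up a handler by index and call it. */
static int dispatch(size_t idx, int arg)
{
    assert(idx < HANDLER_COUNT && handlers[idx] != NULL);
    return handlers[idx](arg);
}
```

After handlers_init() has run, dispatch(0, 41) returns 42. The cost compared with a const table is the ram used by the table and the one-time initialization call.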

4. Solution

Therefore the compiler does not accept a const function pointer array and reports an error directly, so it cannot be used as is. Call sites get no special treatment: the value stored in such a variable is an absolute address that is used directly at the call site, and pc is never added to it (unless we obtain pc manually at the point of use, but that still has to be compiled by the compiler, with the linker then filling in an offset relative to a known position).
So for now we can only imitate the dynamic loading mechanism of ram: rewrite the const function pointer array and obtain the function pointer through a helper function (exploiting the fact that function calls themselves use relative addresses).

#include <stdio.h>

void test1(char a) {
    printf("this is func:test1 %c\n", a);
}
void test2(char a) {
    printf("this is func:test2 %c\n", a);
}
void test3(char a) {
    printf("this is func:test3 %c\n", a);
}

typedef struct
{
    char     c;
    int     f;
}_T_Test;

const _T_Test test[3] = 
{
    {    'a',    1, },
    {    'b',     2, },
    {    'c',     3, },
};

typedef void (*Func)(char);
Func Select(int idx)
{
    switch(idx)
    {
        case 1:
            return test1;
        case 2:
            return test2;
        case 3:
            return test3;
        default:
            return 0;
    }
}

int main(void)
{
    Select(test[0].f)(test[0].c);
    Select(test[1].f)(test[1].c);
    Select(test[2].f)(test[2].c);
    
    return 0;
}

If you know a better method, you are welcome to share it.

Summary

Scatter loading is essentially the "relocation" of data from non-volatile storage such as flash or rom into ram. Different types of data are handled differently, and some even need to be loaded dynamically.
Dynamic loading cannot be done without an OS (note that the OS on a microcontroller is not Windows after all; its capabilities are limited, but the principles are the same).
Address-independent compilation combines two techniques, ropi and rwpi. In essence they make variable access address-independent (function calls are inherently address-independent). The handling of ro, rw, and zi data, and even of function pointer arrays, rests on the same principle: use relative addressing, because the relative positions of code and data stay unchanged.