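Title: ARM Mode and THUMB Mode: Some Practical Engineering Issues
Created: 2019-01-30 15:43
Updated: 2019-01-31 10:43
Link: https://scz.617.cn/unix/201901301543.txt
Contents:
☆ Introduction to ARM Mode and THUMB Mode
☆ Determining the Current CPU Mode in GDB
☆ Switching between ARM Mode and THUMB Mode
1) Switching schemes
2) arm_thumb_switch_1.s
3) ELF for the ARM Architecture
4) Switching between ARM mode and THUMB mode in GDB
5) Specifying ARM mode or THUMB mode in IDA
☆ Pitfalls in ARM Assembly Programming
☆ References
☆ Discussion with Readers
☆ Introduction to ARM Mode and THUMB Mode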
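The ARM architecture has a CPSR register whose bit 5 is the T bit (Thumb state flag): 1 means THUMB mode, 0 means ARM mode. The two modes differ in many ways. For reverse engineering it is enough to think of ARM mode as using 4-byte instructions throughout, while THUMB mode uses 2-byte instruction encodings wherever possible. This description is far from rigorous, but it does not affect the big picture. Only 32-bit ARM is considered here.
A piece of code can start with ARM-mode instructions, then change the T bit in CPSR to enter THUMB mode, execute a series of instructions there, and then change the T bit again to return to ARM mode. The two modes can be freely mixed.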
☆ Determining the Current CPU Mode in GDB
How do you tell in GDB whether the CPU is currently in ARM mode or THUMB mode?
See:
《ARMv5 Architecture Reference Manual》
A1.1.3 Status registers (P31)
A2.5 Program status registers (P49)
A2.5.8 The T and J bits (P53)
Bit 5 of the CPSR register is the T bit (Thumb state flag): 1 means THUMB mode, 0 means ARM mode.
In GDB you can inspect CPSR directly:
(gdb) p/x $cpsr&0x20
$1 = 0x20
A result of 0x20 means THUMB mode; a result of 0 means ARM mode.
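The same mask test can be written as a tiny Python helper, for example to post-process register dumps. This is only a sketch: the function name cpu_state and the sample CPSR values are made up for illustration, and the J bit is assumed to be 0.

```python
T_BIT = 1 << 5  # CPSR bit 5, the Thumb state flag


def cpu_state(cpsr: int) -> str:
    """Return "THUMB" if the T bit is set in a CPSR value, else "ARM".

    Assumption: the J bit is 0, so only ARM and Thumb states are considered.
    """
    return "THUMB" if cpsr & T_BIT else "ARM"


if __name__ == "__main__":
    # Hypothetical CPSR values, chosen only to differ in bit 5.
    for value in (0x60000010, 0x60000030):
        print(hex(value), cpu_state(value))
```

This mirrors "p/x $cpsr&0x20": only bit 5 is examined, and every other CPSR bit is ignored.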
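☆ Switching between ARM Mode and THUMB Mode
1) Switching schemes
See:
《ARMv5 Architecture Reference Manual》
A2.6 Exceptions
A2.8.1 Unaligned instruction fetches (P76)
A3.3 Branch instructions (P113)
A3.10.1 CPSR value (P127)
A4.1.10 BX (P170)
A6.1.1 Entering Thumb state (P496)
A6.1.2 Exceptions (P497)
A6.3.3 Branch with exchange (P501)
A7.1.19 BX (P548)
A7.1.49 POP (P598)
Exception handling always runs in ARM mode; when the handler returns, the T bit is restored from SPSR.
BX
There are several ways to modify the T bit in CPSR. The most common is the BX instruction, which always writes the T bit regardless of the current mode. Its pseudo-operation is:
CPSR T bit = Rm[0]
PC = Rm AND 0xFFFFFFFE
Much real-world code uses BX to switch from ARM mode to THUMB mode. The first line of the pseudo-operation above misleads many people into believing that bit 0 of the PC register determines the CPU mode. In fact only the T bit of CPSR determines the CPU mode. BX merely allows Rm[0] to be 1 as a way to set the T bit; when Rm is loaded into PC, Rm[0] is masked off by the AND, so bit 0 of PC is always 0 in either mode.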
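The two-line BX pseudo-operation can be modelled in a few lines of Python. This is a sketch of the pseudo-operation only, not of a real CPU; the function name bx is an illustrative choice, and the sample addresses are the ones that appear later in arm_thumb_switch_1.

```python
from typing import Tuple


def bx(rm: int) -> Tuple[int, int]:
    """Model "BX Rm": return (new T bit, new PC).

    T is copied from bit 0 of Rm; PC receives Rm with bit 0 cleared.
    """
    t_bit = rm & 1
    pc = rm & 0xFFFFFFFE
    return t_bit, pc


if __name__ == "__main__":
    # Rm with bit 0 set: T becomes 1 (THUMB), PC itself stays even.
    print(bx(0x10071))
    # Rm with bit 0 clear: T becomes 0 (ARM).
    print(bx(0x10080))
```

The point of the model is that the low bit of Rm ends up in the T bit and never in PC.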
There are many concrete ways to switch between ARM mode and THUMB mode. For people writing shellcode, here are two:
ARM->THUMB
.arm
add r0,pc,#1
bx r0
.thumb
THUMB->ARM
.thumb
.align 2
mov r0,pc
bx r0
.arm
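Neither snippet tries to avoid '\0' bytes or to stay within the printable character range. In the THUMB-to-ARM case, ".align 2" is deliberately placed before "mov r0,pc" rather than directly before ".arm". mov+bx occupy 4 bytes in total, so as long as mov sits on a 4-byte boundary, ".arm" does too. If ".align 2" were placed directly before ".arm", the value in r0 would not be guaranteed to be 4-byte aligned, and "bx r0" would not be guaranteed to land on the ARM-mode code.
If this seems full of pitfalls, see:
《ARMv5 Architecture Reference Manual》
A2.4.3 Register 15 and the program counter (P47)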
When an instruction reads the PC, the value read depends on which instruction set it comes from:
For an ARM instruction, the value read is the address of the instruction plus 8 bytes. Bits [1:0] of this value are always zero, because ARM instructions are always word-aligned.
For a Thumb instruction, the value read is the address of the instruction plus 4 bytes. Bit [0] of this value is always zero, because Thumb instructions are always halfword-aligned.
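The two PC-read rules quoted above are what make both switching snippets work. The following Python sketch checks the arithmetic with the addresses from the arm_thumb_switch_1 disassembly; the function names are illustrative and the model covers only a plain read of PC ("add r0,pc,#1" in ARM state, "mov r0,pc" in Thumb state).

```python
def pc_read(insn_addr: int, thumb: bool) -> int:
    """Value seen when an instruction at insn_addr reads PC."""
    return insn_addr + (4 if thumb else 8)


def arm_to_thumb_target(add_addr: int) -> int:
    """r0 after "add r0,pc,#1" executed in ARM state at add_addr."""
    return pc_read(add_addr, thumb=False) + 1


def thumb_to_arm_target(mov_addr: int) -> int:
    """r0 after "mov r0,pc" executed in Thumb state at mov_addr."""
    return pc_read(mov_addr, thumb=True)


if __name__ == "__main__":
    # ARM->THUMB: add at 0x10068, bx at 0x1006c, Thumb code at 0x10070.
    print(hex(arm_to_thumb_target(0x10068)))
    # THUMB->ARM: mov at 0x1007c (4-byte aligned), bx at 0x1007e,
    # ARM code at 0x10080.
    print(hex(thumb_to_arm_target(0x1007C)))
    # If mov sat at 0x1007e instead, r0 would not be word-aligned.
    print(thumb_to_arm_target(0x1007E) % 4)
```

The last case shows why the alignment of "mov r0,pc" matters: the value handed to "bx r0" is only word-aligned when mov itself starts on a 4-byte boundary.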
2) arm_thumb_switch_1.s
$ vi arm_thumb_switch_1.s
.syntax divided
.arch armv5te
.section .text
.globl _start
_start:
.arm
mov r2,#14
adr r1,msg_0
mov r0,#1
mov r7,#4
svc #0
add r0,pc,#1
bx r0
.thumb
mov r2,#12
add r1,pc,#0x38
add r1,#2
mov r0,#1
mov r7,#4
svc #0x2f
mov r0,pc
bx r0
.align 2
.arm
mov r2,#10
adr r1,msg_2
mov r0,#1
mov r7,#4
svc #0
eor r0,r0,r0
mov r7,#1
svc #0
msg_0:
.ascii "Hello, world.\n"
msg_1:
.ascii "thumb mode.\n"
msg_2:
.ascii "arm mode.\n"
$ as -o arm_thumb_switch_1.o arm_thumb_switch_1.s
$ ld -N -o arm_thumb_switch_1 arm_thumb_switch_1.o
$ ./arm_thumb_switch_1
Hello, world.
thumb mode.
arm mode.
arm_thumb_switch_1 starts in ARM mode and calls:
write( stdout, msg_0, 14 )
It then uses BX to change the T bit in CPSR and enter THUMB mode, where it calls:
write( stdout, msg_1, 12 )
It uses BX again to change the T bit in CPSR and return to ARM mode, where it calls:
write( stdout, msg_2, 10 )
_exit( 0 )
3) ELF for the ARM Architecture
$ file -b arm_thumb_switch_1
ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, not stripped
I deliberately did not pass "-s" to ld. With a stripped binary the experiments below would turn out differently.
$ readelf -Wa arm_thumb_switch_1
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           ARM
  Version:                           0x1
  Entry point address:               0x10054
  Start of program headers:          52 (bytes into file)
  Start of section headers:          684 (bytes into file)
  Flags:                             0x5000200, Version5 EABI, soft-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         1
  Size of section headers:           40 (bytes)
  Number of section headers:         6
  Section header string table index: 5
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00010054 000054 000070 00 WAX  0   0  4
  [ 2] .ARM.attributes   ARM_ATTRIBUTES  00000000 0000c4 00001b 00      0   0  1
  [ 3] .symtab           SYMTAB          00000000 0000e0 000130 10      4  11  4
  [ 4] .strtab           STRTAB          00000000 000210 00006b 00      0   0  1
  [ 5] .shstrtab         STRTAB          00000000 00027b 000031 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  y (purecode), p (processor specific)
There are no section groups in this file.
Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000054 0x00010054 0x00010054 0x00070 0x00070 RWE 0x4
 Section to Segment mapping:
  Segment Sections...
   00     .text
There is no dynamic section in this file.
There are no relocations in this file.
There are no unwind sections in this file.
Symbol table '.symtab' contains 19 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00010054     0 SECTION LOCAL  DEFAULT    1
     2: 00000000     0 SECTION LOCAL  DEFAULT    2
     3: 00000000     0 FILE    LOCAL  DEFAULT  ABS arm_thumb_switch_1.o
     4: 00010054     0 NOTYPE  LOCAL  DEFAULT    1 $a     // arm code
     5: 000100a0     0 NOTYPE  LOCAL  DEFAULT    1 msg_0
     6: 00010070     0 NOTYPE  LOCAL  DEFAULT    1 $t     // thumb code
     7: 00010080     0 NOTYPE  LOCAL  DEFAULT    1 $a     // arm code
     8: 000100ba     0 NOTYPE  LOCAL  DEFAULT    1 msg_2
     9: 000100a0     0 NOTYPE  LOCAL  DEFAULT    1 $d     // literal data
    10: 000100ae     0 NOTYPE  LOCAL  DEFAULT    1 msg_1
    11: 000100c4     0 NOTYPE  GLOBAL DEFAULT    1 _bss_end__
    12: 000100c4     0 NOTYPE  GLOBAL DEFAULT    1 __bss_start__
    13: 000100c4     0 NOTYPE  GLOBAL DEFAULT    1 __bss_end__
    14: 00010054     0 NOTYPE  GLOBAL DEFAULT    1 _start
    15: 000100c4     0 NOTYPE  GLOBAL DEFAULT    1 __bss_start
    16: 000100c4     0 NOTYPE  GLOBAL DEFAULT    1 __end__
    17: 000100c4     0 NOTYPE  GLOBAL DEFAULT    1 _edata
    18: 000100c4     0 NOTYPE  GLOBAL DEFAULT    1 _end
No version information found in this file.
Attribute Section: aeabi
File Attributes
  Tag_CPU_name: "5TE"
  Tag_CPU_arch: v5TE
  Tag_ARM_ISA_use: Yes
  Tag_THUMB_ISA_use: Thumb-1
$ objdump -d arm_thumb_switch_1
arm_thumb_switch_1: file format elf32-littlearm
Disassembly of section .text:
00010054 <_start>:
10054: e3a0200e mov r2, #14
10058: e28f1040 add r1, pc, #64 ; 0x40
1005c: e3a00001 mov r0, #1
10060: e3a07004 mov r7, #4
10064: ef000000 svc 0x00000000
10068: e28f0001 add r0, pc, #1
1006c: e12fff10 bx r0
10070: 220c movs r2, #12
10072: a10e add r1, pc, #56 ; (adr r1, 100ac <msg_0+0xc>)
...
000100a0 <msg_0>:
...
000100ae <msg_1>:
...
000100ba <msg_2>:
...
objdump recognizes the THUMB-mode code in the middle (0x10070) because the symbol table contains a few special symbols ($a, $t):
$ objdump --sym --special-syms arm_thumb_switch_1
arm_thumb_switch_1: file format elf32-littlearm
SYMBOL TABLE:
00010054 l    d  .text  00000000 .text
00000000 l    d  .ARM.attributes        00000000 .ARM.attributes
00000000 l    df *ABS*  00000000 arm_thumb_switch_1.o
00010054 l       .text  00000000 $a     // arm code
000100a0 l       .text  00000000 msg_0
00010070 l       .text  00000000 $t     // thumb code
00010080 l       .text  00000000 $a     // arm code
000100ba l       .text  00000000 msg_2
000100a0 l       .text  00000000 $d     // literal data
000100ae l       .text  00000000 msg_1
000100c4 g       .text  00000000 _bss_end__
000100c4 g       .text  00000000 __bss_start__
000100c4 g       .text  00000000 __bss_end__
00010054 g       .text  00000000 _start
000100c4 g       .text  00000000 __bss_start
000100c4 g       .text  00000000 __end__
000100c4 g       .text  00000000 _edata
000100c4 g       .text  00000000 _end
arm_thumb_switch_1 keeps the $a and $t symbols emitted by as, and objdump relies on them to recognize the THUMB-mode code in the middle (0x10070).
For the $a, $t and $d symbols in the .symtab section, see:
《ELF for the ARM Architecture》
《ARM Mapping Symbols》
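How a tool can use these mapping symbols is easy to sketch: the most recent $a/$t/$d at or below an address decides how the bytes there are interpreted. The Python below is a minimal model of that idea, not objdump's actual implementation; the addresses are the ones from the symbol table above, and the function name region_kind is hypothetical.

```python
import bisect

# (address, mapping symbol) pairs taken from the symbol table above.
MAPPING_SYMBOLS = [
    (0x10054, "$a"),
    (0x10070, "$t"),
    (0x10080, "$a"),
    (0x100A0, "$d"),
]

KIND = {"$a": "arm", "$t": "thumb", "$d": "data"}


def region_kind(addr, symbols=MAPPING_SYMBOLS):
    """Return "arm", "thumb" or "data" for addr, or None if no mapping
    symbol precedes it (for example in a stripped binary)."""
    ordered = sorted(symbols)
    starts = [start for start, _ in ordered]
    index = bisect.bisect_right(starts, addr) - 1
    if index < 0:
        return None
    return KIND[ordered[index][1]]


if __name__ == "__main__":
    for addr in (0x10054, 0x10070, 0x1007E, 0x10080, 0x100A4):
        print(hex(addr), region_kind(addr))
```

With an empty symbol list the function returns None for every address, which corresponds to the situation after strip.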
$ cp arm_thumb_switch_1 arm_thumb_switch_1_strip
$ strip arm_thumb_switch_1_strip
$ file -b arm_thumb_switch_1_strip
ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, stripped
arm_thumb_switch_1_strip has been stripped and contains no $a or $t.
$ objdump --sym --special-syms arm_thumb_switch_1_strip
arm_thumb_switch_1_strip: file format elf32-littlearm
SYMBOL TABLE:
no symbols
objdump can no longer recognize the THUMB-mode code in the middle:
$ objdump -d arm_thumb_switch_1_strip
arm_thumb_switch_1_strip: file format elf32-littlearm
Disassembly of section .text:
00010054 <.text>:
10054: e3a0200e mov r2, #14
10058: e28f1040 add r1, pc, #64 ; 0x40
1005c: e3a00001 mov r0, #1
10060: e3a07004 mov r7, #4
10064: ef000000 svc 0x00000000
10068: e28f0001 add r0, pc, #1
1006c: e12fff10 bx r0
10070: a10e220c tstge lr, ip, lsl #4
10074: 20013102 andcs r3, r1, r2, lsl #2
10078: df2f2704 svcle 0x002f2704
1007c: 47004678 smlsdxmi r0, r8, r6, r4
10080: e3a0200a mov r2, #10
10084: e28f102e add r1, pc, #46 ; 0x2e
10088: e3a00001 mov r0, #1
1008c: e3a07004 mov r7, #4
10090: ef000000 svc 0x00000000
10094: e0200000 eor r0, r0, r0
10098: e3a07001 mov r7, #1
1009c: ef000000 svc 0x00000000
100a0: 6c6c6548 cfstr64vs mvdx6, [ip], #-288 ; 0xfffffee0
100a4: 77202c6f strvc r2, [r0, -pc, ror #24]!
100a8: 646c726f strbtvs r7, [ip], #-623 ; 0xfffffd91
100ac: 68740a2e ldmdavs r4!, {r1, r2, r3, r5, r9, fp}^
100b0: 20626d75 rsbcs r6, r2, r5, ror sp
100b4: 65646f6d strbvs r6, [r4, #-3949]! ; 0xfffff093
100b8: 72610a2e rsbvc r0, r1, #188416 ; 0x2e000
100bc: 6f6d206d svcvs 0x006d206d
100c0: 0a2e6564 beq 0xba9658
The THUMB-mode code at 0x10070 has been disassembled as ARM-mode code. Note that the presence or absence of $a and $t only affects how ELF tools disassemble the file; it has no effect on actual execution.
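The mis-disassembly at 0x10070 is just a different grouping of the same bytes. A small Python sketch makes this visible: the four little-endian bytes at 0x10070 read either as one ARM word (0xa10e220c, the "tstge" line above) or as two Thumb halfwords (0x220c and 0xa10e, the "movs r2, #12" and "add r1, pc, #56" encodings in the unstripped listing). The helper names are illustrative.

```python
import struct


def as_arm_word(data: bytes) -> int:
    """Interpret 4 bytes as one little-endian 32-bit ARM instruction."""
    return struct.unpack("<I", data)[0]


def as_thumb_halfwords(data: bytes) -> list:
    """Interpret 4 bytes as two little-endian 16-bit Thumb instructions."""
    return list(struct.unpack("<2H", data))


if __name__ == "__main__":
    # Memory bytes at 0x10070..0x10073 of arm_thumb_switch_1.
    raw = bytes.fromhex("0c220ea1")
    print(hex(as_arm_word(raw)))
    print([hex(h) for h in as_thumb_halfwords(raw)])
```

Nothing in the bytes themselves says which grouping is right; that information has to come from mapping symbols, the T bit, or the analyst.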
4) Switching between ARM mode and THUMB mode in GDB
set arm fallback-mode (arm|thumb|auto)
gdb uses the symbol table, when available, to determine whether
instructions are ARM or Thumb. This command controls gdb's default
behavior when the symbol table is not available. The default is auto,
which causes gdb to use the current execution mode (from the T bit in
the CPSR register).
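GDB first looks for $a and $t. When they cannot be found, it picks the mode according to this setting. If the setting is auto, GDB takes the T bit from CPSR to determine the mode.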
set arm force-mode (arm|thumb|auto)
This command overrides use of the symbol table to determine whether
instructions are ARM or Thumb. The default is auto, which causes gdb
to use the symbol table and then the setting of "set arm fallback-mode".
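If this setting is auto, GDB follows the symbol table and then "set arm fallback-mode"; otherwise it ignores the symbol table entirely and forces the specified mode. If you are reversing raw firmware that is not in ELF format, forget about the symbol table.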
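The precedence between the two settings, as described in the quoted help text, can be summarized in a short Python model. This is a simplification written from the documentation above, not taken from GDB's source; the function name and the symbol_mode parameter (what the symbol table says, or None when it says nothing) are assumptions made for illustration.

```python
def disassembly_mode(force_mode, symbol_mode, fallback_mode, cpsr):
    """Pick "arm" or "thumb" the way the help text describes.

    force_mode, fallback_mode: "arm", "thumb" or "auto".
    symbol_mode: "arm", "thumb", or None when no symbol information exists.
    cpsr: current CPSR value, used only as the last resort.
    """
    if force_mode != "auto":
        return force_mode              # symbol table is ignored entirely
    if symbol_mode is not None:
        return symbol_mode             # $a / $t found
    if fallback_mode != "auto":
        return fallback_mode
    return "thumb" if cpsr & 0x20 else "arm"


if __name__ == "__main__":
    # Stripped binary, both settings auto, CPU in THUMB state.
    print(disassembly_mode("auto", None, "auto", 0x20))
    # Same situation, but ARM disassembly is forced.
    print(disassembly_mode("arm", None, "auto", 0x20))
```

For a stripped binary with both settings left at auto, the result depends only on the T bit, which is why disassembling code of the other mode needs an explicit override.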
If the CPU is in THUMB mode but the code at some address is actually ARM-mode code, you can run "set arm force-mode arm" and then "x/5i".
$ gdb -q -nx ./arm_thumb_switch_1_strip
(gdb) starti
Starting program: /tmp/arm_thumb_switch_1_strip
Program stopped.
0x00010054 in ?? ()
(gdb) display/5i $pc
1: x/5i $pc
=> 0x10054: mov r2, #14
   0x10058: add r1, pc, #64 ; 0x40
   0x1005c: mov r0, #1
   0x10060: mov r7, #4
   0x10064: svc 0x00000000
(gdb) p/x $cpsr&0x20
$1 = 0x0
The CPU is currently in ARM mode. Try to disassemble the THUMB-mode code at 0x10070:
(gdb) x/8i 0x10070
0x10070: tstge lr, r12, lsl #4
0x10074: andcs r3, r1, r2, lsl #2
0x10078: svcle 0x002f2704
0x1007c: ;
(gdb) tb *0x1007e
Temporary breakpoint 1 at 0x1007e
(gdb) c
Continuing.
Hello, world.
thumb mode.
Temporary breakpoint 1, 0x0001007e in ?? ()
1: x/5i $pc
=> 0x1007e: bx r0
   0x10080: movs r0, #10
   0x10082: b.n 0x107c6
   0x10084: asrs r6, r5, #32
   0x10086: b.n 0x105a8
(gdb) p/x $cpsr&0x20
$2 = 0x20
(gdb) i r r0
r0 0x10080 65664
The CPU is currently in THUMB mode. Try to disassemble the ARM-mode code at 0x10080:
(gdb) x/5i 0x10080
   0x10080: movs r0, #10
   0x10082: b.n 0x107c6
   0x10084: asrs r6, r5, #32
   0x10086: b.n 0x105a8
   0x10088: movs r1, r0
(gdb) set arm force-mode arm
(gdb) x/5i 0x10080
   0x10080: mov r2, #10
   0x10084: add r1, pc, #46 ; 0x2e
   0x10088: mov r0, #1
   0x1008c: mov r7, #4
   0x10090: svc 0x00000000
"set arm force-mode" and "set arm fallback-mode" both default to auto. If you have reached some address while debugging and "x/5i $pc" looks right, there is nothing to set by hand; GDB picks the mode automatically from the T bit in CPSR.
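Without using the settings above, GDB offers an odd trick for viewing THUMB-mode code while in ARM mode:
$ gdb -q -nx ./arm_thumb_switch_1_strip
(gdb) starti
(gdb) x/8i 0x10070+1
   0x10071: movs r2, #12
   0x10073: add r1, pc, #56 ; (adr r1, 0x100ac)
   0x10075: adds r1, #2
   0x10077: movs r0, #1
   0x10079: movs r7, #4
   0x1007b: svc 47 ; 0x2f
   0x1007d: mov r0, pc
   0x1007f: bx r0
You may not find this in the GDB manual. Note that every displayed address has its lowest bit set. To stress it once more: the lowest bit of the PC register is NOT used to determine the CPU mode. This is only a display trick in GDB; the lowest bit of the real addresses is 0.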
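5) Specifying ARM mode or THUMB mode in IDA
Place the cursor on the instruction and press Alt-G to open the "Segment Register Value" dialog, then set T/Value:
0 ARM mode; the disassembly window shows "CODE32"
1 THUMB mode; the disassembly window shows "CODE16"
☆ Pitfalls in ARM Assembly Programming
Consider this section homework for readers who never run out of curiosity. A few small questions about arm_thumb_switch_1.s:
1) What happens if ".syntax divided" is changed to ".syntax unified"?
2) What happens if ".arch armv5te" is changed to ".arch armv6t2"?
3) What is the purpose of directives such as ".arm" and ".thumb"?
4) If _start begins with ".thumb" right away, how should the ld command be written?
5) Why are the two lines starting with "add r1,pc,#0x38" not written directly as "add r1,pc,#0x3a"?
6) What is the purpose of ".align 2"?
☆ References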
[1] 《ARMv5 Architecture Reference Manual》 https://developer.arm.com/docs/ddi0100/latest/armv5-architecture-reference-manual
[2] 《ELF for the ARM Architecture》 http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044e/IHI0044E_aaelf.pdf
[3] 《ARM Mapping Symbols》 https://sourceware.org/binutils/docs/as/ARM_002dDependent.html
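☆ Discussion with Readers
2019-01-31 10:05
p5yGh0:
In practice I ran into breakpoints that would not hit across ARM modes. I came up with a workaround: plant each mode's own nop instructions there to raise a SIGILL, then take over the signal to get the effect of a breakpoint. The implementation looks like:
define set_thumb_sigbkpt
set $opcode_ill = 0xbebe
set $opcode_bus = 0xde10
set *(unsigned short *)$arg0 = $opcode_ill
set *(unsigned short *)($arg0+2) = $opcode_bus
end
handle all nostop
handle all pass
handle SIGBUS stop nopass
handle SIGILL stop nopass noprint
scz:
In my article the cross-mode breakpoint did hit. Do you have a minimal test case?
p5yGh0:
I will try to construct one later. The environment is rather unusual and that may be the cause: the debuggee runs on Android, attached to a special build of gdbserver. When gdb attaches, the program is inside the epoll API, which is ARM code, and I want it to stop in a Thumb .so. I tried switching modes at the time without much success, then found that raising an exception was fairly reliable, so I have been using that.
scz:
In a case like this, cross-compiling the latest GDBServer yourself may be enough.
What instructions are those two shorts? They are not nops, are they?
$ rasm2 -a arm -b 16 -o 0 -D "be be 10 de"
0x00000000 2 bebe bkpt 0xbe
0x00000002 2 10de udf 0x10
$ rasm2 -a arm -b 16 -o 0 -D "be be de 10"
0x00000000 2 bebe bkpt 0xbe
0x00000002 2 de10 asrs r6, r3, 3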
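The two rasm2 runs differ only in byte order, which matters because the GDB script stores 0xbebe and 0xde10 as little-endian unsigned shorts. The following Python sketch regroups the byte strings into Thumb halfwords and classifies them by their top byte, using the 16-bit Thumb encodings BKPT (0xBExx) and UDF (0xDExx). The function names are illustrative, and the classifier knows only these two encodings.

```python
import struct


def thumb_halfwords(hex_bytes: str) -> list:
    """Turn a string such as "be be 10 de" (bytes in memory order) into
    little-endian 16-bit values."""
    data = bytes.fromhex(hex_bytes.replace(" ", ""))
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def classify(halfword: int) -> str:
    """Very small classifier: only BKPT and UDF are recognized."""
    top = halfword >> 8
    if top == 0xBE:
        return "bkpt"
    if top == 0xDE:
        return "udf"
    return "other"


if __name__ == "__main__":
    for text in ("be be 10 de", "be be de 10"):
        values = thumb_halfwords(text)
        print(text, [hex(v) for v in values], [classify(v) for v in values])
```

Under this reading the script writes a BKPT followed by a UDF, which matches the first rasm2 run; neither is a nop.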
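On an IoT device I debugged arm_thumb_switch_1_strip with a gdb 8.2 that I cross-compiled myself:
gdb-8.2 -q -nx ./arm_thumb_switch_1_strip
starti
set arm force-mode thumb
tb *0x1007e
c
starti stops at e_entry in ARM mode. GDB's disassembly mode is then forced to THUMB; 0x1007e is THUMB-mode code, a breakpoint is set on it, and after c this cross-mode breakpoint hits normally.
If you set the cross-mode breakpoint after starti without forcing GDB's disassembly mode, you get a SIGSEGV:
gdb-8.2 -q -nx ./arm_thumb_switch_1_strip
starti
tb *0x1007e
c
Program received signal SIGSEGV, Segmentation fault.
0x00000e00 in ?? ()
The following sequence triggers a SIGSEGV as well:
gdb-8.2 -q -nx ./arm_thumb_switch_1_strip
starti
set arm force-mode thumb
tb *0x1007e
set arm force-mode auto
c
Program received signal SIGSEGV, Segmentation fault.
0x00000e00 in ?? ()
The key point: assuming the current mode is ARM, before setting a cross-mode breakpoint on THUMB-mode code you must force GDB's disassembly mode to THUMB, and you must not change that mode between setting the breakpoint and c.
Both of the following sequences trigger a SIGILL:
gdb-8.2 -q -nx ./arm_thumb_switch_1_strip
starti
set arm force-mode thumb
tb *0x1007e
c
tb *0x1009c
c
Program received signal SIGILL, Illegal instruction.
0x000100a0 in ?? ()
gdb-8.2 -q -nx ./arm_thumb_switch_1_strip
starti
set arm force-mode thumb
tb *0x1007e
c
set arm force-mode auto
tb *0x1009c
c
Program received signal SIGILL, Illegal instruction.
0x000100a0 in ?? ()
The following sequence works normally:
gdb-8.2 -q -nx ./arm_thumb_switch_1_strip
starti
set arm force-mode thumb
tb *0x1007e
c
set arm force-mode arm
tb *0x1009c
c
si
The key point: to set a cross-mode breakpoint on code of a given CPU mode, play it safe. First force GDB's disassembly mode to the target mode, set the breakpoint, then c, without changing GDB's disassembly mode in between.
In short, on my side everything about cross-mode breakpoints is explainable and has a workaround, and I cannot reproduce what you saw. Try a newer GDBServer first.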