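Title: Example of Modifying the pycdc Source for Python 3.9
Created: 2023-05-10 16:43 Updated: 2023-05-12 11:29 Link: https://scz.617.cn/python/202305101643.txt
Contents:
☆ Background
☆ A Brief Look at the pycdc Source
☆ Examples of Added/Modified Instructions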
1) CALL_FUNCTION_KW
2) CALL_FUNCTION_EX/DICT_MERGE
3) SETUP_FINALLY
4) JUMP_IF_NOT_EXC_MATCH
5) RERAISE
☆ Afterword
☆ Background
pycdc is an open-source Python decompiler written in C++, with partial support for Python 3.9:
Decompyle++ A Python Byte-code Disassembler/Decompiler https://github.com/zrax/pycdc
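I have used it a few times and made small patches along the way. I do not know C++; this article is a memo to myself and certainly contains mistakes, so treat it as a reference and do not trust it blindly.
☆ A Brief Look at the pycdc Source
See ASTree.cpp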
void decompyle(PycRef
void print_src(PycRef
cur_indent++;
print_block(blk, mod);
cur_indent--;
}
break;
...
}
See ASTNode.h
There are two important data types, ASTNode and ASTBlock. ASTNode has the following types:
NODE_INVALID, NODE_NODELIST, NODE_OBJECT, NODE_UNARY, NODE_BINARY, NODE_COMPARE, NODE_SLICE, NODE_STORE, NODE_RETURN, NODE_NAME, NODE_DELETE, NODE_FUNCTION, NODE_CLASS, NODE_CALL, NODE_IMPORT, NODE_TUPLE, NODE_LIST, NODE_SET, NODE_MAP, NODE_SUBSCR, NODE_PRINT, NODE_CONVERT, NODE_KEYWORD, NODE_RAISE, NODE_EXEC, NODE_BLOCK, NODE_COMPREHENSION, NODE_LOADBUILDCLASS, NODE_AWAITABLE, NODE_FORMATTEDVALUE, NODE_JOINEDSTR, NODE_CONST_MAP, NODE_ANNOTATED_VAR, NODE_CHAINSTORE, NODE_TERNARY
ASTBlock has the following types:
BLK_MAIN, BLK_IF, BLK_ELSE, BLK_ELIF, BLK_TRY, BLK_CONTAINER, BLK_EXCEPT, BLK_FINALLY, BLK_WHILE, BLK_FOR, BLK_WITH, BLK_ASYNCFOR
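There are many kinds of ASTNode, and the decompiled result is organized with ASTNode as the basic unit. Some are simple, such as ASTReturn; others are complex, such as ASTBlock. Both ASTReturn and ASTBlock derive from ASTNode. An ASTBlock corresponds to constructs such as a try block or an except block.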
RERAISE_test.py
def func () :
    try :
        x = 51201314
    except :
        pass
RERAISE_test.pycdump.asm
 11           0 SETUP_FINALLY            8 (to 10)

 12           2 LOAD_CONST               1 (51201314)
              4 STORE_FAST               0 (x)
              6 POP_BLOCK
              8 JUMP_FORWARD            12 (to 22)

 13     >>   10 POP_TOP
             12 POP_TOP
             14 POP_TOP

 14          16 POP_EXCEPT
             18 JUMP_FORWARD             2 (to 22)
             20 RERAISE
        >>   22 LOAD_CONST               0 (None)
             24 RETURN_VALUE
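The "(to N)" annotations in the listing can be checked by hand. A minimal sketch, assuming the Python 3.9 convention that every instruction is 2 bytes wide and that a relative delta is counted in bytes from the instruction that follows; the helper name relative_target is made up for this note:

```python
INSTR_SIZE = 2  # bytes per instruction (wordcode, Python 3.6 and later)


def relative_target(offset: int, delta: int) -> int:
    """Return the absolute target of a relative jump in 3.9 bytecode.

    The delta is measured from the instruction after the jump itself.
    """
    return offset + INSTR_SIZE + delta


# (offset, opname, delta) copied from the RERAISE_test listing
jumps = [
    (0, "SETUP_FINALLY", 8),
    (8, "JUMP_FORWARD", 12),
    (18, "JUMP_FORWARD", 2),
]

targets = {offset: relative_target(offset, delta) for offset, _, delta in jumps}
for offset, opname, delta in jumps:
    print(f"{offset:>3} {opname:<14} {delta:>3} (to {targets[offset]})")
```

SETUP_FINALLY at offset 0 therefore points at offset 10, the first POP_TOP of the except handler, and both JUMP_FORWARD instructions land on offset 22.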
When the decompiled result is organized, it looks roughly like this:
NODE_NODELIST (1)
NODE_BLOCK (25) - BLK_CONTAINER (5) // func()
NODE_BLOCK (25) - BLK_TRY (4) // try
NODE_BLOCK (25) - BLK_EXCEPT (6) // except
NODE_RETURN (8) // return
The final decompiled output:
def func():
    try:
        x = 51201314
    except:
        pass
    return None
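To make the node/block relationship concrete, here is a small Python model of it. This is not pycdc code: the classes SketchNode and SketchBlock and the function render are hypothetical, and the nesting of the tree is my assumption for illustration. It only shows how a tree of blocks and leaf nodes can be printed with an indent counter, in the spirit of the cur_indent++ / print_block / cur_indent-- sequence in print_src.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchNode:
    """Leaf node holding one already-rendered source line (hypothetical)."""
    text: str


@dataclass
class SketchBlock(SketchNode):
    """Block node: a header line plus child nodes (hypothetical)."""
    children: List[SketchNode] = field(default_factory=list)


def render(node: SketchNode, indent: int = 0) -> List[str]:
    """Render a node; a block renders its children one indent level deeper."""
    lines = ["    " * indent + node.text]
    if isinstance(node, SketchBlock):
        # An empty block body is printed as "pass".
        body = node.children or [SketchNode("pass")]
        for child in body:
            lines.extend(render(child, indent + 1))
    return lines


tree = SketchBlock("def func():", [
    SketchBlock("try:", [SketchNode("x = 51201314")]),
    SketchBlock("except:", []),
    SketchNode("return None"),
])

source = "\n".join(render(tree))
print(source)
```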
BuildFromCode() organizes the decompiled result and parses each instruction individually:
PycRef
if (stack_hist.size()) {
fputs("Warning: Stack history is not empty!\n", stderr);
while (stack_hist.size()) {
stack_hist.pop();
}
}
if (blocks.size() > 1) {
fputs("Warning: block stack is not empty!\n", stderr);
while (blocks.size() > 1) {
PycRef<ASTBlock> tmp = blocks.top();
blocks.pop();
blocks.top()->append(tmp.cast<ASTNode>());
}
}
cleanBuild = true;
return new ASTNodeList(defblock->nodes());
}
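If "pycdc some.pyc" prints "Unsupported opcode: XXX", the XXX instruction is not supported. This warning is emitted by BuildFromCode().
The "XXX" in "Unsupported opcode: XXX" is the opname. For an instruction that takes an operand, "XXX" does not include the "_A" suffix, whereas the corresponding label in the switch/case does.
Fixes for "Unsupported opcode: XXX" go into the switch/case in BuildFromCode().
BuildFromCode() maintains an internal stack named blocks, which holds the ASTBlock objects assembled during parsing.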
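The tail of BuildFromCode() shown earlier folds any block still left on that stack into its parent. A rough Python rendering of just that loop, assuming a list used as a stack with the top at the end; PendingBlock and flush_unclosed are names invented for this sketch, not pycdc identifiers:

```python
class PendingBlock:
    """Stand-in for an ASTBlock: a kind tag plus the nodes appended so far."""

    def __init__(self, kind: str):
        self.kind = kind
        self.nodes = []

    def append(self, node) -> None:
        self.nodes.append(node)


def flush_unclosed(blocks: list) -> int:
    """Fold every unclosed block into the block below it.

    Mirrors the 'block stack is not empty' cleanup: pop the top block and
    append it to the new top, until only the bottom block remains.
    Returns how many blocks were folded.
    """
    folded = 0
    while len(blocks) > 1:
        tmp = blocks.pop()
        blocks[-1].append(tmp)
        folded += 1
    return folded


blocks = [PendingBlock("BLK_MAIN"), PendingBlock("BLK_TRY"), PendingBlock("BLK_EXCEPT")]
folded = flush_unclosed(blocks)
print(folded, blocks[0].nodes[0].kind, blocks[0].nodes[0].nodes[0].kind)
```

Note the side effect: blocks that were siblings on the stack end up nested inside each other, which is one way a missed pattern turns into oddly nested output.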
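☆ Examples of Added/Modified Instructions
pycdc leaves quite a few Python 3.9 instructions unsupported. Some others look supported but actually target older Python versions rather than 3.9. All of these need work; only some are recorded here.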
1) CALL_FUNCTION_KW
In ASTree.cpp, search for
case Pyc::CALL_FUNCTION_A
case Pyc::CALL_FUNCTION_KW_A
The original implementation targets Python 3.5 and earlier and is incompatible with 3.9; see
https://docs.python.org/3.4/library/dis.html
https://docs.python.org/3.9/library/dis.html
CALL_FUNCTION(argc) 3.4
Calls a function. The low byte of argc indicates the number of positional parameters, the high byte the number of keyword parameters. On the stack, the opcode finds the keyword parameters first. For each keyword argument, the value is on top of the key. Below the keyword parameters, the positional parameters are on the stack, with the right-most parameter on top. Below the parameters, the function object to call is on the stack. Pops all function arguments, and the function itself off the stack, and pushes the return value.
CALL_FUNCTION_KW(argc) 3.4
Calls a function. The low byte of argc indicates the number of positional parameters, the high byte the number of keyword parameters. The top element on the stack contains the keyword arguments dictionary, followed by explicit keyword and positional arguments.
CALL_FUNCTION(argc) 3.9
Calls a callable object with positional arguments. argc indicates the number of positional arguments. The top of the stack contains positional arguments, with the right-most argument on top. Below the arguments is a callable object to call. CALL_FUNCTION pops all arguments and the callable object off the stack, calls the callable object with those arguments, and pushes the return value returned by the callable object.
Changed in version 3.6: This opcode is used only for calls with positional arguments.
CALL_FUNCTION_KW(argc) 3.9
Calls a callable object with positional (if any) and keyword arguments. argc indicates the total number of positional and keyword arguments. The top element on the stack contains a tuple with the names of the keyword arguments, which must be strings. Below that are the values for the keyword arguments, in the order corresponding to the tuple. Below that are positional arguments, with the right-most parameter on top. Below the arguments is a callable object to call. CALL_FUNCTION_KW pops all arguments and the callable object off the stack, calls the callable object with those arguments, and pushes the return value returned by the callable object.
Changed in version 3.6: Keyword arguments are packed in a tuple instead of a dictionary, argc indicates the total number of arguments.
In practice "case Pyc::CALL_FUNCTION_A" can be left unmodified: the operand is never very large, so kwparams comes out as 0 and the code also works for 3.9.
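The reason is plain arithmetic. A sketch of the pre-3.6 decoding described in the 3.4 documentation above (low byte = positional count, high byte = keyword count); the function name split_argc_pre36 is made up, and pycdc's own expression may be written differently:

```python
def split_argc_pre36(argc: int):
    """Split a Python <= 3.5 CALL_FUNCTION operand.

    Returns (positional_count, keyword_count).
    """
    pparams = argc & 0xFF
    kwparams = (argc >> 8) & 0xFF
    return pparams, kwparams


# A 3.5-style operand: 2 positional arguments and 1 keyword argument.
old_style = split_argc_pre36(0x0102)

# A 3.9 CALL_FUNCTION operand is just the positional count. As long as it
# stays below 256, the old decoding yields the same count and 0 keywords.
new_style = split_argc_pre36(2)

print(old_style, new_style)
```

The shortcut stops holding for a 3.9 call with 256 or more positional arguments, which is rare enough that I ignore it.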
"case Pyc::CALL_FUNCTION_KW_A" needs to be modified. A simple example follows:
case Pyc::CALL_FUNCTION_KW_A :
{
ASTCall::kwparam_t kwparamList;
ASTCall::pparam_t pparamList;
/*
 * The arguments of this instruction come from two sources: the operand
 * and the stack
 */
PycRef
CALL_FUNCTION_KW_test.py
import dis
def func () :
    func_4( 0, b=1 )
dis.dis( func )
Disassembly of func()
 11           0 LOAD_GLOBAL              0 (func_4)
              2 LOAD_CONST               1 (0)
              4 LOAD_CONST               2 (1)
              6 LOAD_CONST               3 (('b',))
              8 CALL_FUNCTION_KW         2
             10 POP_TOP
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE
The reader is assumed to be familiar with Python bytecode, so no further explanation is given.
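For anyone who wants to cross-check the listing anyway, the 3.9 stack layout can be replayed in a few lines of Python. This is a simulation of the documented behaviour, not pycdc code; decode_call_function_kw is a hypothetical helper, and the callable is represented by its name as a string:

```python
def decode_call_function_kw(stack: list, argc: int):
    """Pop a 3.9 CALL_FUNCTION_KW frame off `stack` (top of stack at the end).

    Layout from top to bottom: tuple of keyword names, keyword values,
    positional values, callable. argc is the total number of arguments.
    """
    names = stack.pop()
    n_kw = len(names)
    n_pos = argc - n_kw
    # Values are popped right-most first, so reverse to restore call order.
    kw_values = [stack.pop() for _ in range(n_kw)][::-1]
    pos_values = [stack.pop() for _ in range(n_pos)][::-1]
    func = stack.pop()
    return func, pos_values, dict(zip(names, kw_values))


# Stack contents just before offset 8 in the listing.
stack = ["func_4", 0, 1, ("b",)]
func, args, kwargs = decode_call_function_kw(stack, 2)

parts = [repr(a) for a in args] + [f"{k}={v!r}" for k, v in kwargs.items()]
call_text = f"{func}({', '.join(parts)})"
print(call_text)
```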
2) CALL_FUNCTION_EX/DICT_MERGE
CALL_FUNCTION_EX_test.py
import dis
def func () :
    func_6_0( 0, **{"a":1, "b":2} )
    func_6_1( *[1, 2, 3] )
dis.dis( func )
 11           0 LOAD_GLOBAL              0 (func_6_0)
              2 LOAD_CONST               6 ((0,))
              4 BUILD_MAP                0
              6 LOAD_CONST               2 (1)
              8 LOAD_CONST               3 (2)
             10 LOAD_CONST               4 (('a', 'b'))
             12 BUILD_CONST_KEY_MAP      2
             14 DICT_MERGE               1
             16 CALL_FUNCTION_EX         1
             18 POP_TOP

 12          20 LOAD_GLOBAL              1 (func_6_1)
             22 BUILD_LIST               0
             24 LOAD_CONST               5 ((1, 2, 3))
             26 LIST_EXTEND              1
             28 CALL_FUNCTION_EX         0
             30 POP_TOP
             32 LOAD_CONST               0 (None)
             34 RETURN_VALUE
https://docs.python.org/3.9/library/dis.html
CALL_FUNCTION_EX(flags) 3.9
Calls a callable object with variable set of positional and keyword arguments. If the lowest bit of flags is set, the top of the stack contains a mapping object containing additional keyword arguments. Before the callable is called, the mapping object and iterable object are each "unpacked" and their contents passed in as keyword and positional arguments respectively. CALL_FUNCTION_EX pops all arguments and the callable object off the stack, calls the callable object with those arguments, and pushes the return value returned by the callable object.
New in version 3.6.
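The two calls in the listing above can be replayed on a toy stack. A minimal sketch of what DICT_MERGE and CALL_FUNCTION_EX do at run time according to the 3.9 documentation; dict_merge and call_function_ex are hypothetical helpers, the callable is represented by its name, and the error text is only an approximation of CPython's:

```python
def dict_merge(stack: list, i: int) -> None:
    """DICT_MERGE(i): pop TOS and merge it into the dict i entries down.

    Unlike a plain dict.update, a duplicate key is an error.
    """
    extra = stack.pop()
    target = stack[-i]
    duplicates = target.keys() & extra.keys()
    if duplicates:
        raise TypeError(f"multiple values for keyword argument {sorted(duplicates)[0]!r}")
    target.update(extra)


def call_function_ex(stack: list, flags: int):
    """CALL_FUNCTION_EX(flags): pop the optional mapping, the iterable, the callable."""
    kwargs = dict(stack.pop()) if flags & 1 else {}
    args = tuple(stack.pop())
    func = stack.pop()
    return func, args, kwargs


# First call: stack contents just before DICT_MERGE at offset 14.
stack = ["func_6_0", (0,), {}, {"a": 1, "b": 2}]
dict_merge(stack, 1)
first = call_function_ex(stack, 1)

# Second call: stack contents just before CALL_FUNCTION_EX at offset 28.
stack = ["func_6_1", [1, 2, 3]]
second = call_function_ex(stack, 0)

print(first)
print(second)
```

A decompiler does not execute anything, but it has to pop the same entries in the same order to rebuild the call expression.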
DICT_MERGE and CALL_FUNCTION_EX are not supported and need to be added:
/*
 * DICT_MERGE(i) calls dict.update(TOS1[-i], TOS), raises an exception for
 * duplicate keys. Used to build dicts. New in version 3.9.
 */
case Pyc::DICT_MERGE_A :
{
/*
 * Handle the operand simplistically; values other than 1 are not
 * considered for now
 */
if ( operand != 1 )
{
fprintf( stderr, "Unsupported operand found for DICT_MERGE\n" );
break;
}
PycRef
case Pyc::CALL_FUNCTION_EX_A :
{
ASTCall::kwparam_t kwparamList;
ASTCall::pparam_t pparamList;
PycRef
3) SETUP_FINALLY
See
https://docs.python.org/3.4/library/dis.html
https://docs.python.org/3.9/library/dis.html
SETUP_EXCEPT(delta) 3.4
Pushes a try block from a try-except clause onto the block stack. delta points to the first except block.
SETUP_FINALLY(delta) 3.4
Pushes a try block from a try-except clause onto the block stack. delta points to the finally block.
SETUP_FINALLY(delta) 3.9
Pushes a try block from a try-finally or try-except clause onto the block stack. delta points to the finally block or the first except block.
3.9 has no SETUP_EXCEPT; everything is folded into SETUP_FINALLY. Merge these two cases in ASTree.cpp:
case Pyc::SETUP_EXCEPT_A
case Pyc::SETUP_FINALLY_A
This is only a stopgap and leaves a pile of side effects.
4) JUMP_IF_NOT_EXC_MATCH
See
https://docs.python.org/3.9/library/dis.html
JUMP_IF_NOT_EXC_MATCH(target) 3.9
Tests whether the second value on the stack is an exception matching TOS, and jumps if it is not. Pops two values from the stack.
New in version 3.9.
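In runtime terms the instruction behaves roughly as below. This is a sketch of the documented semantics only; jump_if_not_exc_match is a hypothetical helper that returns the offset execution would continue at, with the stack modelled as a list whose top is the last element:

```python
def jump_if_not_exc_match(stack: list, target: int, next_offset: int) -> int:
    """Pop two values and decide where execution continues.

    TOS is the class (or tuple of classes) named in `except X:`; the value
    below it is the exception being handled, as a class or an instance.
    """
    match = stack.pop()
    exc = stack.pop()
    if isinstance(exc, type):
        matched = issubclass(exc, match)
    else:
        matched = isinstance(exc, match)
    return next_offset if matched else target


# KeyError is a LookupError: fall through into the handler body.
hit = jump_if_not_exc_match([KeyError, LookupError], target=40, next_offset=12)

# KeyError is not a ValueError: jump to the next except clause.
miss = jump_if_not_exc_match([KeyError, ValueError], target=40, next_offset=12)

# `except (ValueError, LookupError):` pushes a tuple of classes.
tuple_hit = jump_if_not_exc_match(
    [KeyError("k"), (ValueError, LookupError)], target=40, next_offset=12
)

print(hit, miss, tuple_hit)
```

For decompilation the interesting parts are the value at TOS, which becomes the expression after "except", and the jump target, which marks where this except clause ends.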
case Pyc::JUMP_IF_NOT_EXC_MATCH_A :
{
PycRef
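pycdc has to do a great deal of pattern matching and assembles the decompiled result from empirical conclusions. The instruction preceding JUMP_IF_NOT_EXC_MATCH is not necessarily LOAD_NAME; the example above handles only that one case.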
5) RERAISE
See
https://docs.python.org/3.9/library/dis.html
RERAISE 3.9
Re-raises the exception currently on top of the stack.
New in version 3.9.
case Pyc::RERAISE :
{
// stack.pop();
// stack.pop();
// stack.pop();
PycRef
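It pops the stack three times right at the start. That may be wrong: this is decompilation, not execution, and POP_EXCEPT does nothing at all during decompilation. But I really could not be bothered to test every case, so it will have to do.
☆ Afterword
pycdc handles the instructions of all Python versions together, which has many drawbacks. For example, when an instruction changes between Python versions, pycdc has quite likely never been tested against that case, yet it emits no warning; once the decompiled result turns into a mess, the root cause is hard to locate. pycdc is weighed down by backward compatibility and its code is becoming less and less maintainable; it feels as if the author is fighting a losing battle.
The greatest difficulty in developing a Python decompiler is CFG pattern recognition. It is pure grunt work that needs carefully prepared test cases, and coverage slips at the slightest lapse. It is a job for someone who does it purely for love.
I have not made any major modifications to pycdc.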