标题: 逆向工程中面对并理解编译器对除法的优化
创建: 2016-03-31 12:47 更新: 2024-11-06 14:30 链接: https://scz.617.cn/misc/201603311247.txt
参看:
Understanding of MSVS C++ compiler optimizations - Hans Passant [2014-07-08] http://stackoverflow.com/questions/24628899/understanding-of-msvs-c-compiler-optimizations
这是我看过的最简洁的解答:
Processors are not very good at dividing, an idiv can take between 11 and 18 cycles. As opposed to shifts and multiplies, they usually only take a single cycle.
So the optimizer replaced your division by a multiplication using fixed-point math, taking advantage of a 32-bit multiply producing a 64-bit result into edx:eax. Back-of-the-envelope:
n / 100 = n * 0.32 / 32 = n * (0.32 * pow(2,32)) / pow(2,32) / 32
Those divisions are very cheap, just a right-shift. And the multiplier becomes:
0.32 * pow(2,32) = 1374389534.72 ~= 1374389535 = 0x51eb851f (向上取整)
n / 10 = n * 0.4 / 4 = n * (0.4 * pow(2,32)) / pow(2,32) / 4 0.4 * pow(2,32) = 1717986918.4 ~= 1717986919 = 0x66666667 (向上取整)
n / 5 = n * 2 / 10 = n * (2 * 0.4 * pow(2,32)) / pow(2,32) / 4 2 * 0.4 * pow(2,32) = 3435973836.8 ~= 3435973837 = 0xcccccccd (向上取整)
n / 3 = n * (2/3) / 2 = n * ((2/3) * pow(2,32)) / pow(2,32) / 2 (2/3) * pow(2,32) ~= 2863311530.667 ~= 2863311531 = 0xaaaaaaab (向上取整)
n / 7 = n * (4/7) / 4 = n * ((4/7) * pow(2,32)) / pow(2,32) / 4 (4/7) * pow(2,32) ~= 2454267026.2857 ~= 2454267026 = 0x92492492 (向下取整)
有多种优化方式,比如:
n / 10 = n * 0.4 / 4 = n * (0.4 * pow(2,32)) / pow(2,32) / 4 n / 10 = n / 5 / 2 = n * (0.8 * pow(2,32)) / pow(2,32) / 8
n / 5 = n / 10 * 2 = n * (0.4 * pow(2,32)) / pow(2,32) / 2 n / 5 = n / 10 * 2 = n * (0.8 * pow(2,32)) / pow(2,32) / 4
n / 7 = n * (4/7) / 4 = n * (4/7 * pow(2,32)) / pow(2,32) / 4 n / 7 = n * (2/7) / 2 = n * (2/7 * pow(2,32)) / pow(2,32) / 2 n / 7 = n * (1/7) / 1 = n * (1/7 * pow(2,32)) / pow(2,32) / 1
保底的优化:
n / c = n * (1/c) / 1 = n * ((1/c) * pow(2,32)) / pow(2,32) magic = (1/c) * pow(2,32)
一般形式:
假设c不是2的幂,b为小于c的2的幂,d为b的对数,d不要求为符合要求的最大值
n / c = n * (b/c) / b = n * ((b/c) * pow(2,32)) / pow(2,32) / b magic = (b/c) * pow(2,32) pow(2,d) = b d = int(math.log2(b)) n / c = n * magic / pow(2,32) / b = (n * magic) >> 32 >> d
示例:
c = 125 b = 8 d = int(math.log2(b)) = 3 magic = (b/c) * pow(2,32) = 274877906.944 ~= 274877907 = 0x10624dd3 (向上取整) n / 125 = n * magic / pow(2,32) / b = (n * 0x10624dd3) >> 32 >> 3
前面只是介绍了编译器对除法优化的基本原理,并不是严格的数学推导。这里涉及一 个magic如何取整的问题,前述例子中4/7的情形是向下取整。前文没有解释如何取整, 并不是四舍五入,理解其本质需要一些初等代数的知识,下文讲了一些,但也不太如 意。
快速立即除法的乘法实现(通用算法) - HAM [2009-10-31] http://blog.csdn.net/concreteham/article/details/4750740
如果拿0x51eb851f做关键字进行百度搜索,能找到一些从汇编角度讲解这种优化的中 文文章,其中有一篇被四处转载的,还做什么误差分析,挺扯的。
小H找了一篇讲除法优化原理及相关数学证明的文章:
Division by constant unsigned integers https://rubenvannieuwpoort.nl/posts/division-by-constant-unsigned-integers
Theorem 3 (round-up method) Theorem 6
但我看不下去,脑子已经不好使了,有兴趣且有能力的,可以一试。
有一些现成的程序,在指定除数的情况下给出优化结果:
http://win32assembly.programminghorizon.com/files/magicnumber.zip http://www.masmforum.com/archive2012/6717_MagicNumber64.zip
推荐后者,将32位无符号X除以125优化成乘法运算,其中一种方案如下:
mov eax,X mov edx,010624DD3h ; ; 乘法结果保存在edx:eax,高位在edx,低位在eax ; mul edx ; ; 不关心乘法结果中的eax ; shr edx,03h ; edx = quotient
这是编译器对除法的优化,一般人这辈子都不需要关心这种优化。但是,对于逆向工 程来说,可能必须面对并理解这种优化,现有F5无法将优化结果还原成人类可读方式。
话说回来,从功利性实践角度讲,不理解优化原理也无所谓,在逆向输出中看到神密 常量,放狗搜之即可,一般秒中。
罗列这种神密常量没有意义,尤其同一除数存在多种优化方式的情况下,下面这些仅 仅是便于查找:
0xaaaaaaab 3 2/3 0xcccccccd 5 4/5 0x92492492 7 4/7 0x49249249 7 2/7 0x24924925 7 1/7 0x38e38e39 9 0x66666667 10 4/10 0x51eb851f 100 32/100 0x10624dd3 1000
有本书可以看看:
《Hacker's Delight 2nd Edition》
2024-11-05
《逆向工程中面对并理解编译器对除法的优化》 https://scz.617.cn/misc/201603311247.txt
《逆向工程中面对并理解编译器对除法的优化(2)》 https://scz.617.cn/misc/202007071258.txt