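Yangcheng Cup OddCode Writeup (Unicorn Emulated Debugging + Solving)
This article is a featured post from the Kanxue forum
Kanxue forum author ID: 34r7hm4n
First, congratulations to 0x401 Team on taking first place in a CTF for the first time; along the way, a junior teammate and I cleared every reverse-engineering challenge. The team owes today's result to the hard work of its members. That said, I have to admit there was another reason: many strong teams had their firepower drawn away by RCTF, which ran at the same time, so we still need to keep working hard.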

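Ever since the unicorn challenge in the last Qiangwang Cup I have been very interested in unicorn, but I never had a real problem to solve with it. This Yangcheng Cup finally gave me one, so I took the chance to learn unicorn. OddCode is full of junk instructions and garbage jumps, which makes manual analysis nearly impossible; emulating it with unicorn is far more convenient.
During the contest I stayed up until 5 a.m. to get this challenge solved (the challenge author's upper/lower-case trap cost me 3 hours). The solution in this article is the one I optimized after the contest.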
1
Overview: the 32-bit mode part
First, this is a 32-bit executable:

There is no main function; execution begins directly at the start function, which opens with code that validates the input format:

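Next comes a very strange far jump. I stared at it in IDA for a long time without understanding it, and only after debugging with WinDbg did I realize that IDA's disassembly was wrong: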

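It is actually a far jump to the address 33:2E5310. A far jump has an implicit side effect: it loads the code segment register CS with the segment selector of the target segment. Here, once the far jump has executed, CS is set to 0x33: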

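This touches on something I learned while tinkering with a homemade operating system: on 64-bit Windows, a 32-bit (WoW64) program can switch between 32-bit and 64-bit mode by changing the code segment register. When CS is 0x33 the CPU executes instructions in 64-bit mode, and when CS is 0x23 it executes them in 32-bit mode. After this far jump the program lands at 2E5310 (which is simply the next instruction) with the CPU now in 64-bit mode, so all the code that follows must be decoded as 64-bit code.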
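To make the two CS values less magical, here is a minimal sketch that splits a segment selector into its architectural fields (descriptor index, table indicator, requested privilege level). The helper name `decode_selector` is made up for illustration. Both values turn out to be ring-3 GDT selectors that differ only in the descriptor they point at; whether that descriptor describes a 32-bit or a 64-bit code segment is decided by the descriptor itself, which the operating system sets up.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit x86 segment selector into its three fields."""
    return {
        "index": selector >> 3,                          # bits 3-15: descriptor index
        "table": "LDT" if selector & 0b100 else "GDT",   # bit 2: table indicator (TI)
        "rpl": selector & 0b11,                          # bits 0-1: requested privilege level
    }


# The two CS values seen in this challenge.
for cs in (0x23, 0x33):
    print(hex(cs), decode_selector(cs))
```

So 0x23 selects GDT entry 4 and 0x33 selects GDT entry 6, both with RPL 3.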
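A typical application of this technique is in malware; see: A Detailed Analysis of the Heaven's Gate Technique (https://www.freebuf.com/articles/web/209983.html)
After switching to 64-bit mode, the function sub_2E1010 is executed: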

The next piece of code sets CS back to 0x23, switching back to 32-bit mode:

Finally, the return value of sub_2E1010 decides whether the input is correct, so sub_2E1010 is the key to this challenge:

2
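Overview: the 64-bit mode part
Let's first go back to this part: the two lea instructions before the far jump effectively pass the arguments, storing input in esi and key in edi: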

key is a 16-byte array, presumably used later to encrypt or verify the input:

The code from sub_2E1010 onward has to be viewed in 64-bit mode:


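From here on there is a large amount of garbage code and junk instructions. After a long stretch of manual analysis I still could not find the key code, so I decided to write an emulated debugger with unicorn to help me locate it.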
3
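Emulated debugging with unicorn
I first saw the idea of building a debugger on top of unicorn in this article: Unicorn, a Powerful Tool for Assembly and Disassembly (https://www.52pojie.cn/thread-1026209-1-1.html). The debugger code used there appears to come from 無名俠: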

Let's write a simple debugger of our own to emulate the 64-bit code, and implement a tracer that records the trajectory of executed code blocks:
from unicorn import *
from unicorn.x86_const import *

ADDRESS = 0x2E1000        # address where the program is loaded
INPUT_ADDRESS = 0x2E701D  # address of the input
KEY_ADDRESS = 0x2E705C    # address of the 16-byte key

with open('OddCode.exe', 'rb') as file:
    file.seek(0x400)
    X64_CODE = file.read(0x4269)  # read the code

class Unidbg:

    def __init__(self, flag):
        mu = Uc(UC_ARCH_X86, UC_MODE_64)
        # base address 0x2E1000, map 16 MB of memory
        mu.mem_map(ADDRESS, 0x1000000)
        mu.mem_write(ADDRESS, X64_CODE)
        mu.mem_write(INPUT_ADDRESS, flag)  # write an arbitrary flag
        mu.mem_write(KEY_ADDRESS, b'\x90\xF0\x70\x7C\x52\x05\x91\x90\xAA\xDA\x8F\xFA\x7B\xBC\x79\x4D')
        # initialize the registers to their state just before the switch to
        # 64-bit mode; the values can be obtained by dynamic debugging
        mu.reg_write(UC_X86_REG_RAX, 1)
        mu.reg_write(UC_X86_REG_RBX, 0x51902D)
        mu.reg_write(UC_X86_REG_RCX, 0xD86649D8)
        mu.reg_write(UC_X86_REG_RDX, 0x2E701C)
        mu.reg_write(UC_X86_REG_RSI, INPUT_ADDRESS)  # input argument
        mu.reg_write(UC_X86_REG_RDI, KEY_ADDRESS)    # key argument
        mu.reg_write(UC_X86_REG_RBP, 0x6FFBBC)
        mu.reg_write(UC_X86_REG_RSP, 0x6FFBAC)
        mu.reg_write(UC_X86_REG_RIP, 0x2E1010)
        mu.hook_add(UC_HOOK_CODE, self.trace)  # hook every instruction to record the block trace
        self.mu = mu
        self.except_addr = 0  # address expected if execution stays sequential
        self.traces = []      # trajectory of executed code blocks

    def trace(self, mu, address, size, data):
        if address != self.except_addr:
            self.traces.append(address)
        self.except_addr = address + size

    def start(self):
        try:
            self.mu.emu_start(0x2E1010, -1)
        except UcError:
            # emulation eventually faults; the trace collected so far is what we want
            pass
        print([hex(addr) for addr in self.traces])

Unidbg(b'SangFor{00000000000000000000000000000000}').start()
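The script seeks to file offset 0x400 and maps what it reads at 0x2E1000, so any address printed by the tracer can be converted back to a position in OddCode.exe. A minimal sketch of that arithmetic follows; the constants are the ones used in the script, the helper name `va_to_file_offset` is made up for illustration, and the load address is the one observed in the author's debugging session, so it may differ on another run.

```python
TEXT_VA = 0x2E1000        # address the script maps the code at (ADDRESS)
TEXT_RAW_OFFSET = 0x400   # file offset the script seeks to
TEXT_SIZE = 0x4269        # number of bytes the script reads


def va_to_file_offset(va: int) -> int:
    """Convert an emulated address inside the code region to a file offset."""
    rel = va - TEXT_VA
    if not 0 <= rel < TEXT_SIZE:
        raise ValueError(f"{hex(va)} is outside the code bytes read from the file")
    return TEXT_RAW_OFFSET + rel


# Entry point of the emulated function.
print(hex(va_to_file_offset(0x2E1010)))
```

This is handy when you want to inspect or patch a traced address in a hex editor rather than in the emulator.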
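unicorn can hook code-block execution directly, but that is thrown off by the junk instructions. So here we hook instruction execution instead, and decide whether a jump to another block has occurred by checking whether the current address equals the previous instruction's address plus its length:
    def trace(self, mu, address, size, data):
        if address != self.except_addr:
            self.traces.append(address)
        self.except_addr = address + size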
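The same rule can be tried without unicorn at all. The sketch below replays a hand-written list of (address, size) pairs through the identical check; the addresses are invented for illustration and `block_starts` is a hypothetical helper name.

```python
def block_starts(instructions):
    """Return the addresses where execution did not simply fall through.

    `instructions` is an iterable of (address, size) pairs in execution order,
    i.e. the same two values a UC_HOOK_CODE callback receives.
    """
    starts = []
    expected = None  # address we would see if execution stayed sequential
    for address, size in instructions:
        if address != expected:
            starts.append(address)
        expected = address + size
    return starts


# Made-up execution order: three sequential instructions, a jump away,
# two sequential instructions, then a jump back.
executed = [
    (0x1000, 2),
    (0x1002, 3),
    (0x1005, 2),
    (0x2000, 1),
    (0x2001, 5),
    (0x1007, 1),
]
print([hex(a) for a in block_starts(executed)])
```

One limitation of this rule: a jump whose target happens to be the very next instruction is indistinguishable from falling through, so it does not start a new entry in the trace.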
Emulation fails with an unexplained error partway through, so I simply wrapped it in a try. The trace printed at the end is:
['0x2e1010', '0x2e3634', '0x2e3e1d', '0x2e389c', '0x2e3d9e', '0x2e3b8e', '0x2e37ae', '0x2e3f3a', '0x2e4ee5', '0x2e51ad', '0x2e45f9', '0x2e4e03', '0x2e3c8f', '0x2e4cf1', '0x2e4e96', '0x2e3d49', '0x2e3641', '0x2e4ca8', '0x2e49fd', '0x2e5109', '0x2e4e16', '0x2e382a', '0x2e48f1', '0x2e3ec2', '0x2e4567', '0x2e3a7e', '0x2e4ae0', '0x2e3718', '0x2e402f', '0x2e4ba1', '0x2e4263', '0x2e4441', '0x2e4af2', '0x2e42f7', '0x2e5163', '0x2e3dd1', '0x2e49b7', '0x2e4907', '0x2e4ddb', '0x2e2896', '0x2e2e08', '0x2e35a4', '0x2e2bd2', '0x2e32a2', '0x2e2cf2', '0x2e296d', '0x2e2eb6', '0x2e3391', '0x2e2f9b', '0x2e2ff8', '0x2e2b83', '0x2e3082', '0x2e2ab3', '0x2e333e', '0x2e2ee9', '0x2e2bc5', '0x2e3519', '0x2e3447', '0x2e31a1', '0x2e33fa', '0x2e2bba', '0x2e3623', '0x2e2b95', '0x2e2e99', '0x2e308d', '0x2e33a0', '0x2e3473', '0x2e35ac', '0x2e2b21', '0x2e2980', '0x2e341d', '0x2e31d4', '0x2e32ab', '0x2e30e2', '0x2e289c', '0x2e2acb', '0x2e30f4', '0x2e34f8', '0x2e3176', '0x2e2e5d', '0x2e2cfe', '0x2e2bfb', '0x2e2f15', '0x2e2c6e', '0x2e2ea5', '0x2e305d', '0x2e2f91', '0x2e3267', '0x2e3210', '0x2e324a', '0x2e330f', '0x2e32d9', '0x2e2e78', '0x2e2924', '0x2e34d5', '0x2e2c19', '0x2e3121', '0x2e2907', '0x2e2a75', '0x2e332e', '0x2e2dc9', '0x2e2edc', '0x2e353d', '0x2e2c2f', '0x2e2cd4', '0x2e28e4', '0x2e2b6c', '0x2e3481', '0x2e294b', '0x2e2b40', '0x2e2e83', '0x2e2f4d', '0x2e31f8', '0x2e4df6', '0x2e4177', '0x2e496d', '0x2e37a1', '0x2e3a3a', '0x2e4d76', '0x2e3e38', '0x2e45bc', '0x2e3f86', '0x2e3df5', '0x2e4242', '0x2e3aee', '0x2e5039', '0x2e3ff8', '0x2e4cb9', '0x2e48a1', '0x2e4135', '0x2e3d05', '0x2e4bd9', '0x2e3c0e', '0x2e5133', '0x2e42d7', '0x2e4bff', '0x2e39fe', '0x2e50a8', '0x2e4a2f', '0x2e4e6a', '0x2e43f6', '0x2e401d', '0x2e43a1', '0x2e4b95', '0x2e37d5', '0x2e404d', '0x2e37c6', '0x2e46b3', '0x2e5120', '0x2e5013', '0x2e5075', '0x2e4673', '0x2e45e1', '0x2e3ba2', '0x2e4802', '0x2e481c', '0x2e38d6', '0x2e4f11', '0x2e4494', '0x2e41f1', '0x2e3853', '0x2e504d', '0x2e4529', '0x2e50df', '0x2e3671', '0x2e3968', 
'0x2e3741', '0x2e4074', '0x2e368e', '0x2e4ffb', '0x2e4c86', '0x2e491f', '0x2e432b', '0x2e3e8c', '0x2e3f97', '0x2e38e5', '0x2e44bc', '0x2e444e', '0x2e3a48', '0x2e39c9', '0x2e46d2', '0x2e3982', '0x2e3eed', '0x2e4682', '0x2e3d7c', '0x2e3eb6', '0x2e3c25', '0x2e4390', '0x2e462c', '0x2e4957', '0x2e4a0c', '0x2e486e', '0x2e493b', '0x2e4479', '0x2e4760', '0x2e4ed5', '0x2e4eb6', '0x2e4d52', '0x2e39a8', '0x2e41bb', '0x2e4e48', '0x2e39b4', '0x2e513e', '0x2e41a4', '0x2e473a', '0x2e4abe', '0x2e47d8', '0x2e4650', '0x2e51b7', '0x2e4367', '0x2e3b75', '0x2e3c63', '0x2e4542', '0x2e487f', '0x2e4b79', '0x2e4ccc', '0x2e3cc8', '0x2e4d28', '0x2e36f1', '0x2e4a7b', '0x2e3cd3', '0x2e3e98', '0x2e4f28', '0x2e3847', '0x2e38ac', '0x2e365c', '0x2e454f', '0x2e3944', '0x2e4105', '0x2e4506', '0x2e4bb6', '0x2e3893', '0x2e4c71', '0x2e3839', '0x2e4f3b', '0x2e3bca', '0x2e3795', '0x2e3b16', '0x2e40c9', '0x2e3d3c', '0x2e3afe', '0x2e5230', '0x2e419c']
Analyzing this many code blocks by hand one at a time is not realistic, so we add another hook on reads of the input and the key, which helps us find the code blocks containing the instructions that access them. Add:
        # in __init__:
        mu.hook_add(UC_HOOK_MEM_READ, self.hook_mem_read)

    # new method of Unidbg:
    def hook_mem_read(self, mu, access, address, size, value, data):
        # the input is 41 bytes long and the key is 16 bytes long
        if INPUT_ADDRESS <= address < INPUT_ADDRESS + 41:
            print(f'Read input[{address - INPUT_ADDRESS}] at {hex(mu.reg_read(UC_X86_REG_RIP))}')
        if KEY_ADDRESS <= address < KEY_ADDRESS + 16:
            print(f'Read key[{address - KEY_ADDRESS}] at {hex(mu.reg_read(UC_X86_REG_RIP))}')
Output:
Read input[8] at 0x2e326d
Read input[8] at 0x2e3214
Read input[8] at 0x2e3219
Read input[9] at 0x2e324a
Read input[9] at 0x2e3254
Read input[9] at 0x2e325e
Read key[0] at 0x2e3a3e
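The memory-access hook gives us several important pieces of information:
- the addresses of the instructions that read the input
- the address of the instruction that reads the key
- the input appears to be encrypted and compared in groups of 2 bytes
- once the current comparison fails, the program does not go on to compare the rest
The third and fourth points are foreshadowing: later we will exploit them to brute-force the flag.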
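The third and fourth observations can be read straight off the hook output. As a small sketch, the snippet below parses the log lines printed above and lists which offsets of each buffer were touched; `touched_indices` is a hypothetical helper, and the log text is copied from the run shown earlier.

```python
import re

LOG = """\
Read input[8] at 0x2e326d
Read input[8] at 0x2e3214
Read input[8] at 0x2e3219
Read input[9] at 0x2e324a
Read input[9] at 0x2e3254
Read input[9] at 0x2e325e
Read key[0] at 0x2e3a3e
"""


def touched_indices(log: str, buffer_name: str) -> list:
    """Return the sorted, de-duplicated offsets of `buffer_name` seen in the log."""
    pattern = re.compile(rf"Read {re.escape(buffer_name)}\[(\d+)\] at 0x[0-9a-f]+")
    return sorted({int(m.group(1)) for m in pattern.finditer(log)})


print("input offsets:", touched_indices(LOG, "input"))
print("key offsets:  ", touched_indices(LOG, "key"))
```

With an all-zero guess, only the first two characters after `SangFor{` and the first key byte are ever read, which is consistent with a two-character group being checked against one key byte and the check stopping at the first mismatch.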
Next, look at the pieces of code that access the input. They read the first byte into al and the second byte into bl:



...
From here, following the trace we printed earlier a little further, we also find code like this:


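This shows that the program does convert each two-character hexadecimal input into the corresponding byte value.
Now look at the code block that accesses the key: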
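For reference, this is what the textbook version of that conversion looks like: a minimal sketch with hypothetical helper names, not a decompilation of the challenge. Note that the flag recovered later in this writeup mixes upper- and lower-case letters, which suggests the binary's own routine does not treat the two cases identically, whereas the version below does.

```python
def hex_digit_value(ch: str) -> int:
    """Value of a single hexadecimal digit character (case-insensitive)."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "F":
        return ord(ch) - ord("A") + 10
    if "a" <= ch <= "f":
        return ord(ch) - ord("a") + 10
    raise ValueError(f"not a hex digit: {ch!r}")


def hex_pair_to_byte(high: str, low: str) -> int:
    """Combine two hex digit characters into one byte: high nibble, then low nibble."""
    return (hex_digit_value(high) << 4) | hex_digit_value(low)


print(hex(hex_pair_to_byte("A", "7")))
```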

Let's modify the trace function again, using the capstone disassembly engine to find the addresses of the cmp and test instructions that get executed:
    # needs `from capstone import *` and, in __init__,
    # self.md = Cs(CS_ARCH_X86, CS_MODE_64)
    def trace(self, mu, address, size, data):
        disasm = self.md.disasm(bytes(mu.mem_read(address, size)), address)
        for i in disasm:
            mnemonic = i.mnemonic
            if mnemonic == 'cmp' or mnemonic == 'test':
                print(f'Instruction {mnemonic} at {hex(address)}')
        if address != self.except_addr:
            self.traces.append(address)
        self.except_addr = address + size
Output:
Instruction cmp at 0x2e3ca1
Instruction cmp at 0x2e4de8
Instruction cmp at 0x2e326d
Read input[8] at 0x2e326d
Instruction cmp at 0x2e3214
Read input[8] at 0x2e3214
Read input[8] at 0x2e3219
Instruction cmp at 0x2e324a
Read input[9] at 0x2e324a
Instruction cmp at 0x2e3254
Read input[9] at 0x2e3254
Read input[9] at 0x2e325e
Instruction test at 0x2e4177
Read key[0] at 0x2e3a3e
Instruction cmp at 0x2e38e7
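Only one cmp instruction is executed after the key is read, at address 2E38E7. Its code is shown below; it is very likely the comparison against the encrypted flag, and when the comparison succeeds the jnz is not taken: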

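So by counting how many times execution reaches address 2E38EF, which only happens when a comparison succeeds, we can tell how many bytes have compared successfully, and use that to brute-force the flag.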
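The idea is a classic early-exit oracle attack. The sketch below shows it on a toy checker instead of the emulated binary: `SECRET`, `count_matches` and `recover` are all invented for illustration, with `count_matches` standing in for "number of times the success address was hit".

```python
from itertools import product

HEX_CHARS = "0123456789abcdef"
SECRET = "c0ffee42"  # toy secret, not related to the challenge


def count_matches(guess: str) -> int:
    """Toy stand-in for the emulated check.

    Compares 2-character groups in order and stops at the first mismatch,
    returning how many groups compared equal.
    """
    hits = 0
    for pos in range(0, len(SECRET), 2):
        if guess[pos:pos + 2] != SECRET[pos:pos + 2]:
            break
        hits += 1
    return hits


def recover(length: int) -> str:
    """Recover the secret one group at a time using only the hit count."""
    guess = ["0"] * length
    for group in range(length // 2):
        for a, b in product(HEX_CHARS, repeat=2):
            guess[2 * group] = a
            guess[2 * group + 1] = b
            # One more successful comparison than before means this group is right.
            if count_matches("".join(guess)) >= group + 1:
                break
        else:
            raise RuntimeError(f"no candidate matched for group {group}")
    return "".join(guess)


print(recover(len(SECRET)))
```

Because each group is confirmed independently, the cost grows linearly with the number of groups rather than exponentially with the length of the secret.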
4
Brute-forcing the flag
Modify the trace function once more:
    def trace(self, mu, address, size, data):
        '''
        disasm = self.md.disasm(mu.mem_read(address, size), address)
        for i in disasm:
            mnemonic = i.mnemonic
            if mnemonic == 'cmp' or mnemonic == 'test':
                print(f'Instruction {mnemonic} at {hex(address)}')
        '''
        if address != self.except_addr:
            self.traces.append(address)
        self.except_addr = address + size
        if address == 0x2E38EF:
            self.hit += 1
            #print(f'hit {self.hit}')
            if self.hit == self.except_hit:
                self.success = True
                mu.emu_stop()
The flag brute-forcing function get_flag:
def get_flag(flag, except_hit):
    for i in b'1234567890abcdefABCDEF':
        for j in b'1234567890abcdefABCDEF':
            flag[8 + (except_hit - 1) * 2] = i
            flag[8 + (except_hit - 1) * 2 + 1] = j
            if Unidbg(bytes(flag), except_hit).solve():
                return
The character set chosen here is b'1234567890abcdefABCDEF', which includes lowercase letters as well as uppercase ones. During the contest I analyzed the encryption flow by hand from the traces and lost several hours to the upper/lower-case issue. The brute-force results are:
SangFor{A7000000000000000000000000000000}
SangFor{A7A40000000000000000000000000000}
SangFor{A7A4A000000000000000000000000000}
SangFor{A7A4A0C0000000000000000000000000}
SangFor{A7A4A0C0B10000000000000000000000}
SangFor{A7A4A0C0B10B00000000000000000000}
SangFor{A7A4A0C0B10Baf000000000000000000}
SangFor{A7A4A0C0B10Bafa70000000000000000}
SangFor{A7A4A0C0B10Bafa77600000000000000}
SangFor{A7A4A0C0B10Bafa776F5000000000000}
SangFor{A7A4A0C0B10Bafa776F55F0000000000}
SangFor{A7A4A0C0B10Bafa776F55FF400000000}
SangFor{A7A4A0C0B10Bafa776F55FF4F8000000}
SangFor{A7A4A0C0B10Bafa776F55FF4F8C60000}
SangFor{A7A4A0C0B10Bafa776F55FF4F8C6E800}
SangFor{A7A4A0C0B10Bafa776F55FF4F8C6E849}
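A quick back-of-the-envelope check shows why the group-by-group approach is feasible. The sketch below only does the counting, using the 22-character set and the 16 two-character groups from the script; the variable names are made up for illustration.

```python
CHARSET = b'1234567890abcdefABCDEF'  # character set used by get_flag
GROUPS = 16                          # 32 flag characters, checked 2 at a time

per_group = len(CHARSET) ** 2                   # candidates tried for one group
worst_case_runs = per_group * GROUPS            # emulations needed group by group
naive_candidates = len(CHARSET) ** (2 * GROUPS) # candidates if all 32 chars were guessed at once

print(f"candidates per group: {per_group}")
print(f"worst-case emulator runs: {worst_case_runs}")
print(f"naive search space has {len(str(naive_candidates))} decimal digits")
```

At most a few thousand short emulations are needed, which is why the early-exit behaviour of the check matters so much here.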

5
Full exploit
from unicorn import *
from unicorn.x86_const import *
from capstone import *

ADDRESS = 0x2E1000        # address where the program is loaded
INPUT_ADDRESS = 0x2E701D  # address of the input
KEY_ADDRESS = 0x2E705C    # address of the 16-byte key

with open('OddCode.exe', 'rb') as file:
    file.seek(0x400)
    X64_CODE = file.read(0x4269)  # read the code

class Unidbg:

    def __init__(self, flag, except_hit):
        self.except_hit = except_hit  # number of successful comparisons we are waiting for
        self.hit = 0
        self.success = False
        mu = Uc(UC_ARCH_X86, UC_MODE_64)
        # base address 0x2E1000, map 16 MB of memory
        mu.mem_map(ADDRESS, 0x1000000)
        mu.mem_write(ADDRESS, X64_CODE)
        mu.mem_write(INPUT_ADDRESS, flag)  # write the candidate flag
        mu.mem_write(KEY_ADDRESS, b'\x90\xF0\x70\x7C\x52\x05\x91\x90\xAA\xDA\x8F\xFA\x7B\xBC\x79\x4D')
        # initialize the registers to their state just before the switch to
        # 64-bit mode; the values can be obtained by dynamic debugging
        mu.reg_write(UC_X86_REG_RAX, 1)
        mu.reg_write(UC_X86_REG_RBX, 0x51902D)
        mu.reg_write(UC_X86_REG_RCX, 0xD86649D8)
        mu.reg_write(UC_X86_REG_RDX, 0x2E701C)
        mu.reg_write(UC_X86_REG_RSI, INPUT_ADDRESS)  # input argument
        mu.reg_write(UC_X86_REG_RDI, KEY_ADDRESS)    # key argument
        mu.reg_write(UC_X86_REG_RBP, 0x6FFBBC)
        mu.reg_write(UC_X86_REG_RSP, 0x6FFBAC)
        mu.reg_write(UC_X86_REG_RIP, 0x2E1010)
        mu.hook_add(UC_HOOK_CODE, self.trace)  # hook every instruction to record the block trace
        #mu.hook_add(UC_HOOK_MEM_READ, self.hook_mem_read)
        self.mu = mu
        self.except_addr = 0  # address expected if execution stays sequential
        self.traces = []      # trajectory of executed code blocks
        self.md = Cs(CS_ARCH_X86, CS_MODE_64)

    def trace(self, mu, address, size, data):
        '''
        disasm = self.md.disasm(mu.mem_read(address, size), address)
        for i in disasm:
            mnemonic = i.mnemonic
            if mnemonic == 'cmp' or mnemonic == 'test':
                print(f'Instruction {mnemonic} at {hex(address)}')
        '''
        if address != self.except_addr:
            self.traces.append(address)
        self.except_addr = address + size
        if address == 0x2E38EF:
            self.hit += 1
            #print(f'hit {self.hit}')
            if self.hit == self.except_hit:
                self.success = True
                mu.emu_stop()

    def hook_mem_read(self, mu, access, address, size, value, data):
        # the input is 41 bytes long and the key is 16 bytes long
        if INPUT_ADDRESS <= address < INPUT_ADDRESS + 41:
            print(f'Read input[{address - INPUT_ADDRESS}] at {hex(mu.reg_read(UC_X86_REG_RIP))}')
        if KEY_ADDRESS <= address < KEY_ADDRESS + 16:
            print(f'Read key[{address - KEY_ADDRESS}] at {hex(mu.reg_read(UC_X86_REG_RIP))}')

    def solve(self):
        try:
            self.mu.emu_start(0x2E1010, -1)
        except UcError:
            # emulation faults on a wrong guess; only self.success matters
            pass
        return self.success

def get_flag(flag, except_hit):
    for i in b'1234567890abcdefABCDEF':
        for j in b'1234567890abcdefABCDEF':
            flag[8 + (except_hit - 1) * 2] = i
            flag[8 + (except_hit - 1) * 2 + 1] = j
            if Unidbg(bytes(flag), except_hit).solve():
                return

flag = bytearray(b'SangFor{00000000000000000000000000000000}')
for i in range(1, 17):
    get_flag(flag, i)
    print(flag.decode())
Kanxue ID: 34r7hm4n
https://bbs.pediy.com/user-home-910514.htm