<nav id="guoca"></nav><xmp id="guoca">
  • <xmp id="guoca">

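    Reading the Source of Dobby, an Instruction-Level Tool

    VSole 2022-07-29 16:55:04

    Dobby has two features: inline hooking and instruction instrumentation. The two work on much the same principle, so this article focuses on instruction instrumentation.

    Instruction instrumentation means planting a probe on any instruction (at a function entry or inside a function body). When execution reaches that instruction, the callback we registered runs first, and control then returns to the original instruction stream. Usage: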

    int res_instrument = DobbyInstrument((void *) addr, offset_name_handler);

    // offset_name_handler is our custom callback.
    // RegisterContext is the register context; HookEntryInfo carries the
    // information needed about the hook, such as the hooked address.
    void offset_name_handler(RegisterContext *ctx, const HookEntryInfo *info);

    typedef struct _RegisterContext {
      uint32_t dummy_0;
      uint32_t dummy_1;

      uint32_t dummy_2;
      uint32_t sp;

      union {
        uint32_t r[13];
        struct {
          uint32_t r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12;
        } regs;
      } general;

      uint32_t lr;
    } RegisterContext;

    // HookEntryInfo holds the hook address and id.
    typedef struct _HookEntryInfo {
      int hook_id;
      union {
        void *target_address;
        void *function_address;
        void *instruction_address;
      };
    } HookEntryInfo;
    

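    1. How It Works

    As the saying goes, one good conversation beats ten years of reading, and one good diagram beats ten conversations. A diagram explains this best, so I drew diagrams as I worked through the code.

    The instrumented instruction is replaced with: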

    ------------------------------------------------------------------------------
    process 6089
    0x9d639d32 nop
    0x9d639d34 ldr.w pc, [pc, #-0x0]
    0x9d639d38  // address 0xcea0a0ac
    0x9d639d38
               0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
    00000000  ac a0 a0 ce                                      ....
    ------------------------------------------------------------------------------
    

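    ARM processors are pipelined: fetch, decode and execute proceed in parallel, so pc points at the instruction being fetched rather than the one being executed. In ARM mode pc reads as the address of the current instruction + 8; in Thumb mode it reads as the current address + 4. When the ldr above executes, pc therefore equals the address of the ldr + 4, so ldr.w pc, [pc, #-0x0] loads the word that immediately follows it into pc, which is a jump.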
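
    To make the arithmetic concrete, here is a minimal sketch of how the address read by a pc-relative ldr can be computed. The helper names are hypothetical and are not part of Dobby; the rounding to a multiple of 4 follows the Align(PC, 4) rule that the ARM architecture uses for literal loads.

```cpp
#include <cstdint>

// Value the core reports when an instruction reads pc:
// instruction address + 8 in ARM state, + 4 in Thumb state.
constexpr uint32_t pc_read_value(uint32_t insn_addr, bool thumb) {
  return insn_addr + (thumb ? 4u : 8u);
}

// Address that `ldr rt, [pc, #imm]` loads from. The base is the pc value
// rounded down to a multiple of 4; imm may be negative.
constexpr uint32_t literal_address(uint32_t insn_addr, bool thumb, int32_t imm) {
  return (pc_read_value(insn_addr, thumb) & ~3u) + static_cast<uint32_t>(imm);
}
```

    For the patched site above, literal_address(0x9d639d34, true, 0) gives 0x9d639d38, the word that holds 0xcea0a0ac.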

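    This kind of jump can reach anything a register can hold: 32 bits, i.e. 4 GB. As far as I know the virtual address space of a (32-bit) Linux process is also 4 GB, so the jump can reach any address in the process. Where does it land? At prologue_dispatch_bridge, 0xcea0a0ac.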

    0xcea0a0ac ldr ip, [pc]
    0xcea0a0b0 ldr pc, [pc]
    0xcea0a0b4  // address 0xa2305b80
    0xcea0a0b8  // address 0xcea0a000
    0xcea0a0b4
               0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
    00000000  80 5b 30 a2 00 a0 a0 ce
    

    It does two things: first, it loads 0xa2305b80 into the ip register; second, it jumps to 0xcea0a000. Note that these are ARM-mode instructions, so the pc offset is 8.
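
    The two values come straight from the hexdump, read as little-endian 32-bit words. A small sketch of that decoding follows; read_le32 and kLiteralPool are illustrative names, not Dobby identifiers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Read a 32-bit little-endian word from a byte dump at the given offset.
template <std::size_t N>
constexpr uint32_t read_le32(const std::array<uint8_t, N> &bytes, std::size_t offset) {
  return static_cast<uint32_t>(bytes[offset]) |
         (static_cast<uint32_t>(bytes[offset + 1]) << 8) |
         (static_cast<uint32_t>(bytes[offset + 2]) << 16) |
         (static_cast<uint32_t>(bytes[offset + 3]) << 24);
}

// The eight bytes shown at 0xcea0a0b4 in the dump above.
constexpr std::array<uint8_t, 8> kLiteralPool = {0x80, 0x5b, 0x30, 0xa2,
                                                 0x00, 0xa0, 0xa0, 0xce};
```

    read_le32(kLiteralPool, 0) is the word loaded into ip and read_le32(kLiteralPool, 4) is the word loaded into pc.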

    Here 0xcea0a000 is the first half of the closure bridge.

    0xcea0a000 sub sp, sp, #0x38
    0xcea0a004 str lr, [sp, #0x34]
    0xcea0a008 str ip, [sp, #0x30]
    0xcea0a00c str fp, [sp, #0x2c]
    0xcea0a010 str sl, [sp, #0x28]
    0xcea0a014 str sb, [sp, #0x24]
    0xcea0a018 str r8, [sp, #0x20]
    0xcea0a01c str r7, [sp, #0x1c]
    0xcea0a020 str r6, [sp, #0x18]
    0xcea0a024 str r5, [sp, #0x14]
    0xcea0a028 str r4, [sp, #0x10]
    0xcea0a02c str r3, [sp, #0xc]
    0xcea0a030 str r2, [sp, #8]
    0xcea0a034 str r1, [sp, #4]
    0xcea0a038 str r0, [sp]
    0xcea0a03c add r0, sp, #0x38
    0xcea0a040 sub sp, sp, #8
    0xcea0a044 str r0, [sp, #4]
    0xcea0a048 sub sp, sp, #8
    0xcea0a04c mov r0, sp
    0xcea0a050 mov r1, ip
    0xcea0a054 bl #0xcea0a05c
    0xcea0a058 b #0xcea0a064
    0xcea0a05c ldr pc, [pc, #-4]
    0xcea0a060  // address 0x9d2b43e1
    0xcea0a060
               0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
    00000000  e1 43 2b 9d
    

    The value 0x9d2b43e1 stored at 0xcea0a060 is the high-level handler, which in turn calls our custom handler. Here it is:

    instrument_call_forward_handler

    void instrument_call_forward_handler(RegisterContext *ctx, HookEntry *entry) {
      DynamicBinaryInstrumentRouting *route = (DynamicBinaryInstrumentRouting *)entry->route;
      if (route->handler) {
        DBICallTy handler;
        HookEntryInfo entry_info;
        entry_info.hook_id = entry->id;
        entry_info.instruction_address = entry->instruction_address;
        handler = (DBICallTy)route->handler;
        (*handler)(ctx, (const HookEntryInfo *)&entry_info);
      }

      // set prologue bridge next hop address with origin instructions that have been relocated(patched)
      set_routing_bridge_next_hop(ctx, entry->relocated_origin_instructions);
    }
    

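    Besides calling our handler, this handler does one sneaky thing, which we will come back to.

    To recap the closure bridge: it first saves the register state. At 0xcea0a054 a bl jumps to 0xcea0a05c, which loads the address of the high-level handler with ldr and calls it. Note that bl puts the address of the next instruction, 0xcea0a058, into lr, so when the called function finishes it returns to the address saved in lr, i.e. 0xcea0a058 b #0xcea0a064. Let's look at what is at 0xcea0a064.

    Second half of the closure bridge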

    0xcea0a064 add sp, sp, #8
    0xcea0a068 add sp, sp, #8
    0xcea0a06c pop {r0}
    0xcea0a070 pop {r1}
    0xcea0a074 pop {r2}
    0xcea0a078 pop {r3}
    0xcea0a07c pop {r4}
    0xcea0a080 pop {r5}
    0xcea0a084 pop {r6}
    0xcea0a088 pop {r7}
    0xcea0a08c pop {r8}
    0xcea0a090 pop {sb}
    0xcea0a094 pop {sl}
    0xcea0a098 pop {fp}
    0xcea0a09c pop {ip}
    0xcea0a0a0 pop {lr}
    0xcea0a0a4 mov pc, ip
    

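    Nothing unusual happens here: the registers saved in the first half are popped and the stack is rebalanced. The one unusual part is the final mov pc, ip, which jumps to the address held in the ip register. So what address does ip hold? Remember the sneaky thing mentioned above?

    The last statement of instrument_call_forward_handler: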

    // set prologue bridge next hop address with origin instructions that have been relocated(patched)
    set_routing_bridge_next_hop(ctx, entry->relocated_origin_instructions);

    void set_routing_bridge_next_hop(RegisterContext *ctx, void *address) {
      *reinterpret_cast<void **>(&ctx->general.regs.r12) = address;
    }
    

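    It writes entry->relocated_origin_instructions into the saved r12 slot of the register context, so the second half of the bridge pops that value into ip (r12). entry->relocated_origin_instructions is the location of the original instructions after relocation.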
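
    The trick only works because the stack frame built by the first half of the bridge has the same layout as RegisterContext. The sketch below checks that layout at compile time; the struct mirrors the definition shown in the usage section, while set_next_hop is a hypothetical stand-in that takes a 32-bit value so that it also compiles on a 64-bit host.

```cpp
#include <cstddef>
#include <cstdint>

// Same field order as the RegisterContext shown earlier (32-bit ARM).
struct RegisterContext {
  uint32_t dummy_0;
  uint32_t dummy_1;
  uint32_t dummy_2;
  uint32_t sp;
  union {
    uint32_t r[13];
    struct {
      uint32_t r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12;
    } regs;
  } general;
  uint32_t lr;
};

// The bridge stores r0..r12 and lr in a 0x38-byte block, then pushes two
// 8-byte blocks below it (saved sp, alignment padding). Relative to the
// final sp, the saved sp is at +12, r0 at +16 and lr at +16 + 0x34.
static_assert(offsetof(RegisterContext, sp) == 12, "saved sp slot");
static_assert(offsetof(RegisterContext, general) == 16, "r0 slot");
static_assert(offsetof(RegisterContext, lr) == 16 + 0x34, "lr slot");
static_assert(sizeof(RegisterContext) == 0x38 + 8 + 8, "whole frame");

// Hypothetical 32-bit stand-in for set_routing_bridge_next_hop: overwrite
// the saved r12 so that `pop {ip}` followed by `mov pc, ip` lands on it.
inline void set_next_hop(RegisterContext *ctx, uint32_t address) {
  ctx->general.regs.r12 = address;
}
```

    The r12 slot sits 12 words above r0, which is the [sp, #0x30] slot written by str ip in the first half of the bridge.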

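    Because the original instructions were overwritten with an ldr pc, [pc, ...] plus an address word, the overwritten instructions are fixed up and stored at entry->relocated_origin_instructions (instruction fixing is covered further below). After the fixed-up original instructions have run, control jumps back to the instructions that follow the patched ones and continues from there. The process looks roughly like this:

    Original instructions

    Because the patch needs at least 8 bytes and the original instructions here are Thumb, four instructions were overwritten and had to be fixed up.

    Relocated instructions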

    ------------------------------------------------------------------------------
    process 6089
    0xcea0a0c0 nop
    0xcea0a0c2 nop
    0xcea0a0c4 push {r0, r1, r2, lr}
    0xcea0a0c6 nop
    0xcea0a0c8 cbz r0, #0xcea0a0cc
    0xcea0a0ca nop
    0xcea0a0cc b.w #0xcea0a0d0
    0xcea0a0d0 ldr.w pc, [pc, #0x14]   // 0xcea0a0d0 + 0x14 + thumb_pc_offset(4) = 0xcea0a0e8, i.e. 0x9d639d45
    0xcea0a0d4 nop
    0xcea0a0d6 nop
    0xcea0a0d8 add r2, sp, #8
    0xcea0a0da nop
    0xcea0a0dc str r1, [r2, #-0x4]!
    0xcea0a0e0 ldr.w pc, [pc, #-0x0]   // likewise, 0x9d639d3d
    0xcea0a0e4  // address 0x9d639d3d
    0xcea0a0e8  // address 0x9d639d45
    0xcea0a0e4
               0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
    00000000  3d 9d 63 9d 45 9d 63 9d                          =.c.E.c.
    ------------------------------------------------------------------------------
    

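    The fix-up logic is this: pc-relative instructions are redirected with an ldr so that they still reach the correct location, while pc-independent instructions are copied over unchanged. Here the push, the add and the str.w are copied directly. The inserted nops are there for 4-byte alignment. I don't know why that alignment is needed; as far as I recall, Thumb instructions only need 2-byte alignment.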

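    Among the original instructions only cbz is pc-relative. Its semantics: test whether r0 is zero and, if so, branch to the given location. In this example that is the current address + 0x10, i.e. offset 25ED44. Dobby's fix is to rewrite the cbz so that, if r0 is 0, an ldr (0xcea0a0d0 ldr.w pc, [pc, #0x14]) jumps back to offset 25ED44 (0x9d639d45).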

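    If r0 is not 0, an ldr (0xcea0a0e0 ldr.w pc, [pc, #-0x0]) jumps to the instruction after the patched ones, here offset 25ED3C (0x9d639d3d). Both addresses have 1 added because the original code is Thumb: the ARM core uses the lowest bit of the branch target to choose between ARM and Thumb state, and 1 selects Thumb. That completes the logic of Dobby's instruction instrumentation.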
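
    As a worked example of why cbz needs fixing, the sketch below decodes the branch target of a 16-bit Thumb CBZ/CBNZ. The function names are hypothetical. The encoding 0xb130 used in the note is an assumption: it is the cbz r0 encoding whose displacement matches the "current address + 0x10" case described above, not a value taken from the original binary.

```cpp
#include <cstdint>

// True for the 16-bit Thumb CBZ/CBNZ encodings (1011 op 0 i 1 imm5 Rn).
constexpr bool is_cbz_or_cbnz(uint16_t instr) {
  return (instr & 0xf500) == 0xb100;
}

// Branch target of a CBZ/CBNZ located at insn_addr: pc reads as
// insn_addr + 4, and the displacement is the zero-extended i:imm5:'0'.
constexpr uint32_t cbz_target(uint16_t instr, uint32_t insn_addr) {
  const uint32_t imm5 = (instr >> 3) & 0x1f;
  const uint32_t i = (instr >> 9) & 1;
  return insn_addr + 4 + ((i << 6) | (imm5 << 1));
}
```

    With the assumed encoding, cbz_target(0xb130, 0x9d639d34) is 0x9d639d44. Once the instruction is copied to 0xcea0a0c8 the same bits would point somewhere near 0xcea0a0d8 instead, which is why the displacement cannot simply be copied.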

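    2. Code Walkthrough

    First, a linked list of HookEntry structures is traversed. This list stores the information for every instrumentation: each instrumented instruction gets its own HookEntry, which is added to the list. Walking the list tells whether the instruction about to be instrumented has already been instrumented.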
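
    The duplicate check can be pictured with the following sketch. It is not Dobby's implementation: HookEntrySketch and HookEntryListSketch are hypothetical types, and std::list stands in for Dobby's own list.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

// Hypothetical record: one node per instrumented instruction.
struct HookEntrySketch {
  int id;
  uintptr_t instruction_address;
};

class HookEntryListSketch {
 public:
  // Walk the list; nullptr means the address has not been instrumented.
  const HookEntrySketch *find(uintptr_t address) const {
    for (const HookEntrySketch &entry : entries_) {
      if (entry.instruction_address == address) return &entry;
    }
    return nullptr;
  }

  // Register a new entry unless the address is already in the list.
  bool add(uintptr_t address) {
    assert(address != 0);
    if (find(address) != nullptr) return false;
    entries_.push_back({next_id_++, address});
    return true;
  }

 private:
  std::list<HookEntrySketch> entries_;
  int next_id_ = 0;
};
```

    A second add() for the same address returns false, which corresponds to refusing to instrument an instruction twice.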

    route->DispatchRouting(); is the key method and does almost all of the instrumentation work. It calls two methods, BuildDynamicBinaryInstrumentRouting() and GenerateRelocatedCode(trampoline_buffer_->getSize()).

    void DynamicBinaryInstrumentRouting::DispatchRouting() {
      BuildDynamicBinaryInstrumentRouting();

      // generate relocated code which size == trampoline size
      GenerateRelocatedCode(trampoline_buffer_->getSize());
    }
    

    BuildDynamicBinaryInstrumentRouting()

    void DynamicBinaryInstrumentRouting::BuildDynamicBinaryInstrumentRouting() {
      // create closure trampoline jump to prologue_routing_dispath with the `entry_` data
      ClosureTrampolineEntry *closure_trampoline;

      void *handler = (void *)instrument_routing_dispatch;
    #if __APPLE__
    #if __has_feature(ptrauth_calls)
      handler = __builtin_ptrauth_strip(handler, ptrauth_key_asia);
    #endif
    #endif

      closure_trampoline = ClosureTrampoline::CreateClosureTrampoline(entry_, handler);
      this->SetTrampolineTarget(closure_trampoline->address);
      DLOG(0, "[closure bridge] Carry data %p ", entry_);
      DLOG(0, "[closure bridge] Create prologue_dispatch_bridge %p", closure_trampoline->address);

      // generate trampoline buffer, run before `GenerateRelocatedCode`
      GenerateTrampolineBuffer(entry_->target_address, GetTrampolineTarget());
    }
    

    Here, closure_trampoline = ClosureTrampoline::CreateClosureTrampoline(entry_, handler); generates the assembly of prologue_dispatch_bridge. Within it, _ EmitAddress((uint32_t)get_closure_bridge()) is the key call: this is where the closure_bridge instructions are generated.

    ClosureTrampolineEntry *ClosureTrampoline::CreateClosureTrampoline(void *carry_data, void *carry_handler) {
      ClosureTrampolineEntry *entry = nullptr;
      entry = new ClosureTrampolineEntry;

    #ifdef ENABLE_CLOSURE_TRAMPOLINE_TEMPLATE
    #define CLOSURE_TRAMPOLINE_SIZE (7 * 4)
      // use closure trampoline template code, find the executable memory and patch it.
      Code *code = Code::FinalizeCodeFromAddress(closure_trampoline_template, CLOSURE_TRAMPOLINE_SIZE);
    #else
    // use assembler and codegen modules instead of template_code
    #include "TrampolineBridge/ClosureTrampolineBridge/AssemblyClosureTrampoline.h"
    #define _ turbo_assembler_.
      TurboAssembler turbo_assembler_(0);

      PseudoLabel entry_label;
      PseudoLabel forward_bridge_label;

      _ Ldr(r12, &entry_label);
      _ Ldr(pc, &forward_bridge_label);
      _ PseudoBind(&entry_label);
      _ EmitAddress((uint32_t)entry);
      _ PseudoBind(&forward_bridge_label);
      _ EmitAddress((uint32_t)get_closure_bridge());

      AssemblyCodeChunk *code = nullptr;
      code = AssemblyCodeBuilder::FinalizeFromTurboAssembler(&turbo_assembler_);

      entry->address = (void *)code->raw_instruction_start();
      entry->size = code->raw_instruction_size();
      entry->carry_data = carry_data;
      entry->carry_handler = carry_handler;

      delete code;
      return entry;
    #endif
    }

    void *get_closure_bridge() {

      // if already initialized, just return.
      if (closure_bridge)
        return closure_bridge;

    // check if enable the inline-assembly closure_bridge_template
    #if ENABLE_CLOSURE_BRIDGE_TEMPLATE
      extern void closure_bridge_tempate();
      closure_bridge = closure_bridge_template;
    // otherwise, use the Assembler build the closure_bridge
    #else
    #define _ turbo_assembler_.
      TurboAssembler turbo_assembler_(0);

      _ sub(sp, sp, Operand(14 * 4));
      _ str(lr, MemOperand(sp, 13 * 4));
      _ str(r12, MemOperand(sp, 12 * 4));
      _ str(r11, MemOperand(sp, 11 * 4));
      _ str(r10, MemOperand(sp, 10 * 4));
      _ str(r9, MemOperand(sp, 9 * 4));
      _ str(r8, MemOperand(sp, 8 * 4));
      _ str(r7, MemOperand(sp, 7 * 4));
      _ str(r6, MemOperand(sp, 6 * 4));
      _ str(r5, MemOperand(sp, 5 * 4));
      _ str(r4, MemOperand(sp, 4 * 4));
      _ str(r3, MemOperand(sp, 3 * 4));
      _ str(r2, MemOperand(sp, 2 * 4));
      _ str(r1, MemOperand(sp, 1 * 4));
      _ str(r0, MemOperand(sp, 0 * 4));

      // store sp
      _ add(r0, sp, Operand(14 * 4));
      _ sub(sp, sp, Operand(8));
      _ str(r0, MemOperand(sp, 4));

      // stack align
      _ sub(sp, sp, Operand(8));

      _ mov(r0, Operand(sp));
      _ mov(r1, Operand(r12));

      _ CallFunction(ExternalReference((void *)intercept_routing_common_bridge_handler));

      // stack align
      _ add(sp, sp, Operand(8));

      // restore sp placeholder stack
      _ add(sp, sp, Operand(8));

      _ ldr(r0, MemOperand(sp, 4, PostIndex));
      _ ldr(r1, MemOperand(sp, 4, PostIndex));
      _ ldr(r2, MemOperand(sp, 4, PostIndex));
      _ ldr(r3, MemOperand(sp, 4, PostIndex));
      _ ldr(r4, MemOperand(sp, 4, PostIndex));
      _ ldr(r5, MemOperand(sp, 4, PostIndex));
      _ ldr(r6, MemOperand(sp, 4, PostIndex));
      _ ldr(r7, MemOperand(sp, 4, PostIndex));
      _ ldr(r8, MemOperand(sp, 4, PostIndex));
      _ ldr(r9, MemOperand(sp, 4, PostIndex));
      _ ldr(r10, MemOperand(sp, 4, PostIndex));
      _ ldr(r11, MemOperand(sp, 4, PostIndex));
      _ ldr(r12, MemOperand(sp, 4, PostIndex));
      _ ldr(lr, MemOperand(sp, 4, PostIndex));

      // auto switch A32 & T32 with `least significant bit`, refer `docs/A32_T32_states_switch.md`
      _ mov(pc, Operand(r12));

      AssemblyCodeChunk *code = AssemblyCodeBuilder::FinalizeFromTurboAssembler(&turbo_assembler_);
      closure_bridge = (void *)code->raw_instruction_start();

      DLOG(0, "[closure bridge] Build the closure bridge at %p", closure_bridge);
    #endif
      return (void *)closure_bridge;
    }
    

    BuildDynamicBinaryInstrumentRouting() also calls GenerateTrampolineBuffer(entry_->target_address, GetTrampolineTarget());. This method generates the TrampolineBuffer, i.e. the instructions used to patch the original ones (the second small box in the flowchart).

    bool InterceptRouting::GenerateTrampolineBuffer(void *src, void *dst) {
      CodeBufferBase *trampoline_buffer = NULL;
      // if near branch trampoline plugin enabled
      if (RoutingPluginManager::near_branch_trampoline) {
        RoutingPluginInterface *plugin = NULL;
        plugin = reinterpret_cast<RoutingPluginInterface *>(RoutingPluginManager::near_branch_trampoline);
        if (plugin->GenerateTrampolineBuffer(this, src, dst) == false) {
          DLOG(0, "Failed enable near branch trampoline plugin");
        }
      }

      if (this->GetTrampolineBuffer() == NULL) {
        trampoline_buffer = GenerateNormalTrampolineBuffer((addr_t)src, (addr_t)dst);
        this->SetTrampolineBuffer(trampoline_buffer);

        DLOG(0, "[trampoline] Generate trampoline buffer %p -> %p", src, dst);
      }
      return true;
    }
    

    GenerateRelocatedCode(trampoline_buffer_->getSize())

    bool InterceptRouting::GenerateRelocatedCode(int tramp_size) {
      // generate original code
      AssemblyCodeChunk *origin = NULL;
      origin = AssemblyCodeBuilder::FinalizeFromAddress((addr_t)entry_->target_address, tramp_size);
      origin_ = origin;

      // generate the relocated code
      AssemblyCodeChunk *relocated = NULL;
      relocated = AssemblyCodeBuilder::FinalizeFromAddress(0, 0);
      relocated_ = relocated;

      void *relocate_buffer = NULL;
      relocate_buffer = entry_->target_address;

      GenRelocateCodeAndBranch(relocate_buffer, origin, relocated);
      if (relocated->raw_instruction_start() == 0)
        return false;

      // set the relocated instruction address
      entry_->relocated_origin_instructions = (void *)relocated->raw_instruction_start();
      DLOG(0, "[insn relocate] origin %p - %d", origin->raw_instruction_start(), origin->raw_instruction_size());
      DLOG(0, "[insn relocate] relocated %p - %d", relocated->raw_instruction_start(), relocated->raw_instruction_size());

      // save original prologue
      memcpy((void *)entry_->origin_chunk_.chunk_buffer, (void *)origin_->raw_instruction_start(),
             origin_->raw_instruction_size());
      entry_->origin_chunk_.chunk.re_init_region_range(origin_);
      return true;
    }
    

    Here GenRelocateCodeAndBranch(relocate_buffer, origin, relocated); is the key call: it generates the relocated code and places it in the memory that the relocated pointer refers to.

    void GenRelocateCodeAndBranch(void *buffer, AssemblyCodeChunk *origin, AssemblyCodeChunk *relocated) {
      CodeBuffer *code_buffer = new CodeBuffer(64);

      ThumbTurboAssembler thumb_turbo_assembler_(0, code_buffer);
    #define thumb_ thumb_turbo_assembler_.
      TurboAssembler arm_turbo_assembler_(0, code_buffer);
    #define arm_ arm_turbo_assembler_.

      Assembler *curr_assembler_ = NULL;

      AssemblyCodeChunk origin_chunk;
      origin_chunk.init_region_range(origin->raw_instruction_start(), origin->raw_instruction_size());

      bool entry_is_thumb = origin->raw_instruction_start() % 2;
      if (entry_is_thumb) {
        origin->re_init_region_range(origin->raw_instruction_start() - THUMB_ADDRESS_FLAG, origin->raw_instruction_size());
      }

      LiteMutableArray relo_map(8);

    relocate_remain:
      addr32_t execute_state_changed_pc = 0;

      bool is_thumb = origin_chunk.raw_instruction_start() % 2;
      if (is_thumb) {
        curr_assembler_ = &thumb_turbo_assembler_;

        buffer = (void *)((addr_t)buffer - THUMB_ADDRESS_FLAG);

        addr32_t origin_code_start_aligned = origin_chunk.raw_instruction_start() - THUMB_ADDRESS_FLAG;
        // remove thumb address flag
        origin_chunk.re_init_region_range(origin_code_start_aligned, origin_chunk.raw_instruction_size());

        gen_thumb_relocate_code(&relo_map, &thumb_turbo_assembler_, buffer, &origin_chunk, relocated,
                                &execute_state_changed_pc);
        if (thumb_turbo_assembler_.GetExecuteState() == ARMExecuteState) {
          // relocate interrupt as execute state changed
          if (execute_state_changed_pc < origin_chunk.raw_instruction_start() + origin_chunk.raw_instruction_size()) {
            // re-init the origin
            int relocate_remain_size =
                origin_chunk.raw_instruction_start() + origin_chunk.raw_instruction_size() - execute_state_changed_pc;
            // current execute state is ARMExecuteState, so not need `+ THUMB_ADDRESS_FLAG`
            origin_chunk.re_init_region_range(execute_state_changed_pc, relocate_remain_size);

            // update buffer
            buffer = (void *)((addr_t)buffer + (execute_state_changed_pc - origin_code_start_aligned));

            // add nop to align ARM
            if (thumb_turbo_assembler_.pc_offset() % 4)
              thumb_turbo_assembler_.t1_nop();
            goto relocate_remain;
          }
        }
      } else {
        curr_assembler_ = &arm_turbo_assembler_;

        gen_arm_relocate_code(&relo_map, &arm_turbo_assembler_, buffer, &origin_chunk, relocated,
                              &execute_state_changed_pc);
        if (arm_turbo_assembler_.GetExecuteState() == ThumbExecuteState) {
          // relocate interrupt as execute state changed
          if (execute_state_changed_pc < origin_chunk.raw_instruction_start() + origin_chunk.raw_instruction_size()) {
            // re-init the origin
            int relocate_remain_size =
                origin_chunk.raw_instruction_start() + origin_chunk.raw_instruction_size() - execute_state_changed_pc;
            // current execute state is ThumbExecuteState, add THUMB_ADDRESS_FLAG
            origin_chunk.re_init_region_range(execute_state_changed_pc + THUMB_ADDRESS_FLAG, relocate_remain_size);

            // update buffer
            buffer = (void *)((addr_t)buffer + (execute_state_changed_pc - origin_chunk.raw_instruction_start()));
            goto relocate_remain;
          }
        }
      }

      // TODO:
      // if last instr is unlink branch, skip
      // dkl: jump back to just after the instrumentation point and continue executing
      addr32_t rest_instr_addr = origin_chunk.raw_instruction_start() + origin_chunk.raw_instruction_size();
      if (curr_assembler_ == &thumb_turbo_assembler_) {
        // Branch to the rest of instructions
        thumb_ AlignThumbNop();
        thumb_ t2_ldr(pc, MemOperand(pc, 0));
        // Get the real branch address
        thumb_ EmitAddress(rest_instr_addr + THUMB_ADDRESS_FLAG);
      } else {
        // Branch to the rest of instructions
        CodeGen codegen(&arm_turbo_assembler_);
        // Get the real branch address
        codegen.LiteralLdrBranch(rest_instr_addr);
      }

      // Realize all the Pseudo-Label-Data
      thumb_turbo_assembler_.RelocBind();

      // Realize all the Pseudo-Label-Data
      // dkl: the ldr instructions that were linked to labels earlier are fixed up here
      arm_turbo_assembler_.RelocBind();

      // Generate executable code
      {
        // assembler without specific memory address
        AssemblyCodeChunk *cchunk;
        cchunk = MemoryArena::AllocateCodeChunk(code_buffer->getSize());
        if (cchunk == nullptr)
          return;

        thumb_turbo_assembler_.SetRealizedAddress(cchunk->address);
        arm_turbo_assembler_.SetRealizedAddress(cchunk->address);

        // fixup the instr branch into trampoline(has been modified)
        reloc_label_fixup(origin, &relo_map, &thumb_turbo_assembler_, &arm_turbo_assembler_);

        AssemblyCodeChunk *code = NULL;
        code = AssemblyCodeBuilder::FinalizeFromTurboAssembler(curr_assembler_);
        relocated->re_init_region_range(code->raw_instruction_start(), code->raw_instruction_size());
        delete code;
      }

      // thumb
      if (entry_is_thumb) {
        // add thumb address flag
        relocated->re_init_region_range(relocated->raw_instruction_start() + THUMB_ADDRESS_FLAG,
                                        relocated->raw_instruction_size());
      }

      // clean
      {
        thumb_turbo_assembler_.ClearCodeBuffer();
        arm_turbo_assembler_.ClearCodeBuffer();

        delete code_buffer;
      }
    }
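
    The function above keeps adding and removing THUMB_ADDRESS_FLAG. A minimal sketch of that bookkeeping is shown below. It assumes the flag is the value 1 (the lowest address bit), which matches the % 2 test in the code, and the helper names are hypothetical.

```cpp
#include <cstdint>

// Assumed to play the role of THUMB_ADDRESS_FLAG: bit 0 of a branch target.
constexpr uint32_t kThumbAddressFlag = 1;

// An odd branch target selects Thumb state on an interworking branch.
constexpr bool is_thumb_address(uint32_t address) {
  return (address & kThumbAddressFlag) != 0;
}

// Real location of the instruction bytes in memory.
constexpr uint32_t strip_thumb_flag(uint32_t address) {
  return address & ~kThumbAddressFlag;
}

// Value to store in a literal so that `ldr pc, ...` stays in Thumb state.
constexpr uint32_t add_thumb_flag(uint32_t address) {
  return address | kThumbAddressFlag;
}
```

    For example, add_thumb_flag(0x9d639d3c) yields the 0x9d639d3d literal seen in the relocated code, and strip_thumb_flag(0x9d639d45) gives back the instruction address 0x9d639d44.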
    

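    That was getting long-winded, so let's focus on instruction fixing. In our example the instructions to fix are Thumb-1 instructions, which end up in the function below. I have omitted the fixes for other instructions and show only the cbz case, without going into every detail.

    The rough idea is to jump with ldr pc, [pc, xxx]. When the ldr is first emitted, xxx is only a placeholder. After all the relocated instructions have been generated, these ldr instructions are fixed up, because the addresses they load are stored after all the instructions. The flowchart and the disassembly of each block above show the same thing: the addresses sit at the end of the instructions.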
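
    The two-pass idea can be sketched with a toy ARM-mode assembler. This is not Dobby's TurboAssembler: LiteralPoolSketch and its methods are hypothetical, it only handles ldr rt, [pc, #imm], and it assumes the offset fits in 12 bits. The opcodes 0xe59f0000 (add offset) and 0xe51f0000 (subtract offset) are the standard A32 encodings of a pc-relative ldr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class LiteralPoolSketch {
 public:
  // Pass 1: emit `ldr rt, [pc, #0]` as a placeholder and remember the value.
  void ldr_literal(unsigned rt, uint32_t value) {
    assert(rt < 16);
    pending_.push_back({words_.size(), value});
    words_.push_back(0xe59f0000u | (rt << 12));
  }

  // Pass 2: append each literal after the code and patch its ldr.
  void bind_literals() {
    for (const Pending &p : pending_) {
      const std::size_t literal_index = words_.size();
      words_.push_back(p.value);
      // pc reads as the address of the ldr + 8 in ARM state.
      const int32_t offset =
          static_cast<int32_t>((literal_index - p.ldr_index) * 4) - 8;
      uint32_t &insn = words_[p.ldr_index];
      if (offset >= 0) {
        insn |= static_cast<uint32_t>(offset);
      } else {
        // Clear the U bit (bit 23) to subtract the offset instead.
        insn = (insn & ~(1u << 23)) | static_cast<uint32_t>(-offset);
      }
    }
    pending_.clear();
  }

  const std::vector<uint32_t> &words() const { return words_; }

 private:
  struct Pending {
    std::size_t ldr_index;  // index of the placeholder ldr in words_
    uint32_t value;         // literal the ldr should load
  };
  std::vector<Pending> pending_;
  std::vector<uint32_t> words_;
};
```

    Emitting ldr ip and ldr pc with the two literals 0xa2305b80 and 0xcea0a000 reproduces the ldr ip, [pc] / ldr pc, [pc] pair of prologue_dispatch_bridge, and a single ldr pc followed directly by its literal becomes ldr pc, [pc, #-4].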

    static void Thumb1RelocateSingleInstr(ThumbTurboAssembler *turbo_assembler, LiteMutableArray *thumb_labels,
                                          int16_t instr, addr32_t from_pc, addr32_t to_pc,
                                          addr32_t *execute_state_changed_pc_ptr) {
      bool is_instr_relocated = false;

      _ AlignThumbNop();

      uint32_t val = 0, op = 0, rt = 0, rm = 0, rn = 0, rd = 0, shift = 0, cond = 0;
      int32_t offset = 0;

      int32_t op0 = 0, op1 = 0;
      op0 = bits(instr, 10, 15);
      // [F3.2.3 Special data instructions and branch and exchange]
      if (op0 == 0b010001) {
        op0 = bits(instr, 8, 9);
        // [Add, subtract, compare, move (two high registers)]
        if (op0 != 0b11) {
          int rs = bits(instr, 3, 6);
          // rs is PC register
          if (rs == 15) {
            val = from_pc;

            uint16_t rewrite_inst = 0;
            rewrite_inst = (instr & 0xff87) | LeftShift((VOLATILE_REGISTER.code()), 4, 3);

            ThumbRelocLabelEntry *label = new ThumbRelocLabelEntry(val, false);
            _ AppendRelocLabelEntry(label);

            _ T2_Ldr(VOLATILE_REGISTER, label);
            _ EmitInt16(rewrite_inst);

            is_instr_relocated = true;
          }
        }
      }

      // (fixes for other instruction classes omitted here)

      // compare branch (cbz, cbnz)
      if ((instr & 0xf500) == 0xb100) {
        uint16_t imm5 = bits(instr, 3, 7);
        uint16_t i = bit(instr, 9);
        uint32_t offset = (i << 6) | (imm5 << 1);
        val = from_pc + offset;
        rn = bits(instr, 0, 2);

        // ThumbTurboAssembler's data_labels_ records every ThumbRelocLabelEntry. Each entry holds the
        // address to jump to and is bound to its jump instruction. Once a suitable memory slot has been
        // found for the address, everything is fixed up together.
        // That is: before the fix-up `ldr pc, xxx`; after the fix-up `ldr pc, [pc, offset]`,
        // where pc + offset is the memory that stores the jump address.
        ThumbRelocLabelEntry *label = new ThumbRelocLabelEntry(val + 1, true);
        _ AppendRelocLabelEntry(label);

        // imm5 = bits(0x4 >> 1, 1, 5);
        // dkl: fix
        imm5 = bits(0, 1, 5);
        i = bit(0x4 >> 1, 6);

        _ EmitInt16((instr & 0xfd07) | imm5 << 3 | i << 9);
        _ t1_nop(); // manual align
        _ t2_b(0);
        // This label holds the address to jump to, and the jump uses `ldr pc`. The label is also bound
        // to the instruction through the PseudoLabelInstruction struct, so all the information needed
        // for the jump is already there. All that remains is to store the jump address in a suitable
        // place and patch the ldr. The fix-up seems to be done in one pass later, in
        // thumb_turbo_assembler_.RelocBind().
        _ T2_Ldr(pc, label);

        is_instr_relocated = true;
      }

      // if the instr do not needed relocate, just rewrite the origin
      if (!is_instr_relocated) {
    #if 0
        if (from_pc % Thumb2_INST_LEN)
            _ t1_nop();
    #endif
        _ EmitInt16(instr);
      }
    }
    

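    That concludes the code walkthrough. Fixing up code is mostly a matter of decoding instructions, which is the slightly tedious part.

    3. Takeaways

    The main gain is having read a complete code base from end to end. I also picked up some engineering techniques, such as a C++ linked-list trick.

    First define a generic list head:

    Then the concrete data node:

    The advantage is that the list can be traversed with NodHead pointers alone, and when the data is needed the NodHead pointer is cast to EntryNod. A pointer to a struct is the address of its first member, so the NodHead and EntryNod pointers have the same value. This gives a generic linked-list template that works for any list: just change EntryNod.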
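
    The pattern described above can be sketched as follows. Only the names NodHead and EntryNod come from the text; the fields and helper functions are assumptions made for illustration, not Dobby's actual definitions.

```cpp
#include <cassert>

// Generic list head: knows nothing about the payload.
struct NodHead {
  NodHead *next;
};

// Concrete node: the head must be the first member so that a pointer to
// the node and a pointer to its head hold the same address.
struct EntryNod {
  NodHead head;
  int hook_id;  // hypothetical payload
};

// Link a node at the front of the list.
inline void push_front(NodHead *&list, EntryNod *node) {
  assert(node != nullptr);
  node->head.next = list;
  list = &node->head;
}

// Convert a head pointer back to the node that contains it.
inline EntryNod *to_entry(NodHead *head) {
  return reinterpret_cast<EntryNod *>(head);
}

// Generic traversal that only touches NodHead.
inline int count_nodes(const NodHead *list) {
  int n = 0;
  for (const NodHead *p = list; p != nullptr; p = p->next) ++n;
  return n;
}
```

    count_nodes works for any node type that starts with a NodHead, while to_entry is the only place that needs to know the concrete type.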

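    The second takeaway is a few classic macros. For example, ## pastes tokens together. Another is the macro below, which recovers a pointer to the enclosing object from the class type, the member name and the address of the member; see the container_of macro (https://blog.csdn.net/lezardfu/article/details/44916167).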

    #define offsetof(t, d) __builtin_offsetof(t, d)

    #define container_of(ptr, type, member)                        \
      ({                                                           \
        const __typeof(((type *)0)->member) *__mptr = (ptr);       \
        (type *)((char *)__mptr - offsetof(type, member));         \
      })
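
    A usage sketch may help. The version below avoids the GNU statement-expression syntax so that it compiles with any C++ compiler; CONTAINER_OF, RouteSketch and route_from_slot are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Portable variant of the idea: subtract the member's offset from the
// member's address to get the address of the enclosing object.
#define CONTAINER_OF(ptr, type, member) \
  (reinterpret_cast<type *>(reinterpret_cast<char *>(ptr) - offsetof(type, member)))

// Hypothetical owner type with an embedded member.
struct RouteSketch {
  int id;
  int handler_slot;
};

// Given only a pointer to handler_slot, recover the owning RouteSketch.
inline RouteSketch *route_from_slot(int *slot) {
  assert(slot != nullptr);
  return CONTAINER_OF(slot, RouteSketch, handler_slot);
}
```

    This is useful when a callback only receives a pointer to an embedded member but needs the object that owns it.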
    

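    Dobby also has its own memory allocation module. It records each block it allocates together with its attributes, and when memory is requested it first checks whether an already allocated block still has space available, which avoids frequent allocations.

    4. Problems Encountered While Using Dobby

    I ran into three problems in total. The first: if the target instruction happens to be executing at the moment it is being instrumented, things go wrong. There are two fixes. One is to instrument as soon as the so is loaded. The other is to interrupt the process with an exception, install a custom signal handler, and do the instrumentation inside the handler.

    I used the first approach and instrumented as soon as the so was loaded. On Android, every so is ultimately loaded by the linker's do_dlopen, and do_dlopen calls

    soinfo* si = find_library(ns, translated_name, flags, extinfo, caller);

    The soinfo pointer is available at this point, and once you have soinfo you have everything.

    So hooking this function is enough. In practice, on AOSP 10 this function is inlined, so I instrumented the si->increment_ref_count(); call inside find_library and took the soinfo pointer from there.

    The second problem is mprotect. The original instructions have to be patched, but their memory is normally read-only, so mprotect is needed to make it writable. mprotect works on whole pages. Dobby changes the permissions of the page that contains the instruction to instrument, which is fine in most cases.

    Occasionally, though, the instrumented instruction sits at the very end of a page. Since the patch needs at least 8 bytes, it then spans two pages while Dobby has changed only one, so this needs attention.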
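
    One way to handle the page-spanning case is to compute the page-aligned range that covers the whole patch before changing permissions. The sketch below shows only that arithmetic; PageRange and pages_covering are hypothetical names, and the page size is passed in by the caller.

```cpp
#include <cstdint>

// Page-aligned range that must be made writable.
struct PageRange {
  uintptr_t start;   // address of the first page
  uintptr_t length;  // multiple of the page size
};

// Smallest page-aligned range covering [addr, addr + size).
// page_size must be a power of two.
constexpr PageRange pages_covering(uintptr_t addr, uintptr_t size, uintptr_t page_size) {
  return PageRange{
      addr & ~(page_size - 1),
      ((addr + size + page_size - 1) & ~(page_size - 1)) - (addr & ~(page_size - 1))};
}
```

    An 8-byte patch at 0x9d639ffc with 4 KB pages yields start 0x9d639000 and length 0x2000, i.e. two pages. The resulting start and length can then be passed to mprotect, with the page size taken from sysconf(_SC_PAGESIZE).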

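    The third problem is SIGILL. It arises when instruction fixing does not produce correct assembly and execution jumps to the wrong place. It has to be fixed case by case against the source, which is why I read the Dobby source in the first place.

    5. Summary

    Among today's reverse-engineering tools, IDA is the king of static analysis and frida is (probably) the king of dynamic analysis. But frida works at function granularity, which is not fine enough. Used together with Dobby, it enables instruction-level dynamic analysis.

    A debugger can achieve the same goal, but it tends to introduce many other problems. I started out with gdb and hit plenty of them: when gdb paused the process, some Android broadcasts timed out and my process was killed; or I accidentally touched the screen, the input response timed out, and the process was killed again. Sometimes gdb fails to recognize Thumb instructions and the mode has to be set by hand. It was not a pleasant experience. gdb does have memory watchpoints, though, so sometimes there is probably no way around using it.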

    This work is licensed under a CC license. Reprints must credit the author and link to this article.