phdays.com, phdays.ru
Moscow, 2012
Day 2, Track 3, 16:00
Dmitry Evdokimov, Light and dark side of code instrumentation

Static binary instrumentation
  tools
    Dyninst
    EEL
    ATOM
    PEBIL
    ERESI
    TAU
    Vulcan
    BIRD
    slan (4514N)

Debuggers
  sw/hw breakpoints (hardware breakpoints are limited to 4, so software breakpoints are used most of the time)
  scripting
    windbg + pykd
    ollydbg + python = Immunity Debugger
    gdb + pythongdb
  python libs: buggery, IDAPython, immlib, lldb, PyDbg, PyDbgEng, pygdb, python-ptrace, vtrace, WinAppDbg
  debugger and application work at the same level
    e.g.: better to do this kind of instrumentation ...

Dynamic binary instrumentation
  aka virtual code integration
  a way to control and analyse a process by inserting our own code into a process that is already in memory
  DBA tools: small plugins (Windows: DLLs, *nix: .so libraries)
  DBA tools consist of:
    instrumentation routines
      executed just once; they find the place where we need to add our code
      at this stage the instrumentation inserts our code
    analysis routines
      called whenever the place detected above is reached (can be called multiple times)
  compared to debuggers, there is no need to switch context

Modes
  user mode vs kernel mode

Mode of work
  start to finish
  attach

Mode of execution
  there is a graph: JIT vs PROBE
  interpretation mode
    Valgrind; useful for heavy and slow analysis (memory leaks in huge processes like Oracle DB, etc.)
  probe mode (MORE performance)
    instruction overwrite
  JIT mode (MORE functionality)
    binary -> disasm -> instrumentation of the disassembly -> recompile
    the original code is never executed, only an instrumented equivalent

DBI Frameworks
  DBI::Intro from the ZeroNights conf
  Frameworks
    PIN (Intel)
    DynamoRIO (HP)
    DynInst (Maryland & Wisconsin Universities)
    Valgrind (FOSS worldwide)
    Nirvana (MS)
  command line example given

Levels of granularity
  instruction
  basic block
  trace/superblock
  function
    requires symbols, otherwise better to use instruction level
  section
  events
  binary image

Self-modifying code and DBI
  if the code modifies itself, the code cache of the DBI engine contains NOT the code that is supposed to execute, but the old code that the malicious code has replaced
  how to detect
    write-protected code pages
    checking store addresses
    inserting extra code

Overhead
  O = X + Y
  X = N * Z
  Y = K + L
  O = tool overhead
  N = number of times the function is called
  Y = analysis routines overhead
  TODO

Rewriting instructions
  fixed-length instructions (ARM)
  variable-length instructions (x86, x64)
  graph with distribution by instruction length (TODO)
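
Why instruction length matters for probe mode (instruction overwrite) can be sketched in a few lines of Python. This is not from the talk; it is a minimal illustration under stated assumptions: the probe is a 5-byte x86 near jump (opcode E9 plus a 32-bit offset), the ARM case is classic 32-bit ARM with 4-byte instructions and a 4-byte branch (no Thumb), and the function name and the instruction lengths in the example are hypothetical. Overwriting must never split an instruction, so every instruction the probe touches has to be relocated whole.

```python
# Assumption: the probe is an x86 near jump (E9 + rel32), which is 5 bytes.
X86_NEAR_JMP_SIZE = 5
# Assumption: classic 32-bit ARM, where a branch is one 4-byte instruction.
ARM_BRANCH_SIZE = 4


def instructions_to_relocate(lengths, probe_size):
    """Return (count, covered_bytes) for the whole instructions a probe overwrites.

    lengths    -- byte lengths of consecutive instructions at the probe site
    probe_size -- size in bytes of the jump written over the original code
    """
    covered = 0
    for count, length in enumerate(lengths, start=1):
        if length <= 0:
            raise ValueError("instruction lengths must be positive")
        covered += length
        if covered >= probe_size:
            # Everything up to here must be copied elsewhere before overwriting.
            return count, covered
    raise ValueError("not enough code bytes at the probe site")


# Hypothetical x86 function start:
#   push ebp (1 byte), mov ebp, esp (2 bytes), sub esp, 0x10 (3 bytes), ...
x86_lengths = [1, 2, 3, 5]
count, covered = instructions_to_relocate(x86_lengths, X86_NEAR_JMP_SIZE)
padding = covered - X86_NEAR_JMP_SIZE  # leftover bytes after the jump
print(count, covered, padding)  # prints: 3 6 1

# Fixed-length case: the probe replaces exactly one instruction, no leftover bytes.
print(instructions_to_relocate([4, 4, 4], ARM_BRANCH_SIZE))  # prints: (1, 4)
```

With variable-length instructions the tool has to decode the code at the probe site first, and a probe may not fit at all if the target is shorter than the jump; with fixed-length instructions the replacement is always one for one.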