[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-77707":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":41,"readmeContent":42,"aiSummary":43,"trendingCount":15,"starSnapshotCount":15,"syncStatus":16,"lastSyncTime":44,"discoverSource":45},77707,"mkPIVM","D7EAD\u002FmkPIVM","D7EAD","Generate polymorphic, position-independent virtual machines (PIVMs) from arbitrary x86\u002Fx64 shellcode.","",null,"C++",425,14,5,0,2,105,3.53,"MIT License",false,"main",true,[24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40],"cobalt-strike","edr","evasion","exploit","exploitation","havoc","malware","metasploit","mythic","obfuscation","red-team","research","reverse-engineering","shellcode","sliver","virtual-machine","virtualization","2026-06-12 02:03:44","\u003Cp align=\"center\">\n  \u003Cimg src=\".\u002Fimages\u002Fmkpivm.png\" width=\"30%\" height=\"30%\">\n  \u003Cbr>\n  \u003Ca href=\".\u002Fresearch\u002Fmkpivm-research.pdf\">Read\u003C\u002Fa> the research paper.\n\u003C\u002Fp>\n\n**mkPIVM** is a polymorphic position-independent shellcode virtualizer for Windows x86 and x64.\n\nFeed it raw shellcode. It emits another raw blob: a small virtual machine that interprets a lifted, encrypted-at-rest version of your original instructions. The output is itself position-independent code and runs anywhere the original shellcode would, from a remote-thread loader to a code cave detour. 
Every per-seed knob varies independently: cipher family, register slot layout, opcode-to-handler permutation, dispatcher topology, junk-gadget pattern, IR obfuscation insertion points. Two builds from the same input share fewer than a hundred coincidental bytes out of tens of kilobytes.\n\nWhy: native shellcode is signature-trivial. Wrapping it in a per-instance VM with a per-instance cipher leaves nothing useful at rest, and lifting the instructions to bytecode puts another wall between disk bytes and any disassembler that knows what x86 looks like. As far as I can tell from a literature sweep, no public tool ships exactly this pipeline: raw PIC in, raw polymorphic VM PIC out. So, I mentioned that in the research paper it demanded. To be honest, if I am right about no one having done this (publicly) before, and I am pretty confident, I am surprised. _Nonetheless, enjoy._\n\n## Related Work\n> Coming soon...\n\n## Quick start\n\n```\nmkpivm.exe shellcode.bin --arch x64 -o out.bin\n```\n\nYour PIVM is hot and ready. That's the simplest path. Several other modes vary how aggressively the original instructions get virtualized, whether the output is a standalone blob or a patched PE, and whether the lift runs at all.\n\n# Showcase\n\nI got the receipts. You can see a video of mkPIVM in action below, fully virtualizing a Meterpreter stager (vanilla btw), injecting into explorer.exe, and us capturing a callback. Of course this is just an example, and mkPIVM can be applied to much more, assuming the instructions in the shellcode are supported. If they aren't, make an Issue, send me the shellcode, I got you.\n\nSee it [here](https:\u002F\u002Fgithub.com\u002FD7EAD\u002FmkPIVM\u002Fraw\u002Frefs\u002Fheads\u002Fmain\u002Fmedia\u002Fmkpivm-showcase.mp4). 
Hosted in .\u002Fmedia, can't embed sadly.\n\nHere is the VirusTotal report for that exact virtualized sample (as of 05\u002F21\u002F2026).\n\n\u003Cimg src=\".\u002Fimages\u002Fvt.png\">\n\n...and the packed version, not even virtualized, notably higher entropy.\n\n\u003Cimg src=\".\u002Fimages\u002Fpacked.png\">\n\nCareful attention went into the entropy of this tool's output. Outside of packing mode, the resulting shellcode has lower entropy than typical Windows system DLLs such as ntdll.dll or kernel32.dll. The entropy comparison:\n\n| File | Bytes | Entropy |\n|------|-------|---------|\n| `p_m64.bin` | 3,969 | **7.1181** |\n| `msvcrt.dll` | 699,888 | 6.5319 |\n| `wininet.dll` | 2,724,528 | 6.4934 |\n| `shell32.dll` | 7,839,992 | 6.3639 |\n| `kernel32.dll` | 836,232 | 6.3597 |\n| `crypt32.dll` | 1,538,632 | 6.3010 |\n| `rpcrt4.dll` | 1,162,672 | 6.2405 |\n| `ntdll.dll` | 2,522,104 | 6.1934 |\n| `v.bin` | 29,229 | **6.0442** |\n\n## Modes at a glance\n\n| Mode | Flags | What changes |\n|------|-------|--------------|\n| Default | none | Lift the whole input. Everything virtualized. |\n| Packer | `--pack` | Don't lift. Wrap input as encrypted data, decrypt at runtime, jump in. |\n| Hybrid | `--ranges A:B,...` | Lift only the chosen byte ranges. Rest stays native. |\n| Stacked | `--pack --ranges A:B` | Build the hybrid blob, then pack-wrap it. |\n| Detour | `--embed-into PE --at RVA` | Take a pre-built blob, embed into a PE, patch a jmp at the chosen RVA. |\n| Scan | `--scan` | Print eligible `--ranges` candidates from the input's CFG, then exit. |\n\nEvery mode honors `--seed`, `--arch`, `--input-format`, and `--format`. See the per-mode sections below for the build pipeline and the runtime flow.\n\n## Default virtualization\n\nThe lifter walks the whole CFG and lowers every instruction to a custom IR. The IR goes through two obfuscation passes, then through codecs that encode each insn into the per-seed bytecode shape. 
The block table, handler table, and data island are encrypted with the same per-byte stream cipher as the bytecode. At runtime the prologue decrypts those three regions in place and the dispatcher loop fetches bytecode bytes one at a time, decrypting and dispatching to a handler that does the work.\n\n### Build pipeline\n\nSteps the build performs to turn raw shellcode into the emitted virtualized blob, end to end.\n\n```mermaid\nflowchart TB\n    A[shellcode.bin] --> CFG[CFGBuilder: identify blocks via Zydis disasm + recursive descent]\n    SEED[seed u64] --> VMC[VMConfig: pick cipher kind, reg perm, opcode map, dispatcher topology]\n    CFG --> LIFT[LifterRegistry: lower each block to IR]\n    LIFT --> RBT[resolve_branch_targets: link BR_CC\u002FBR\u002FCALL_VM\u002FLOOP_DEC to target_block_id; synthesize JMP_NATIVE block for any out-of-range jcc]\n    RBT --> OBF1[obfuscate_ir_dead_inject: 20% per insn-gap, IMM Tmp2\u002FTmp3 random]\n    OBF1 --> OBF2[obfuscate_ir_opaque_predicates: 25% per block, split block with IMM Tmp3=0 + TEST + BR_CC NZ random_block, never taken]\n    OBF2 --> ENC[BytecodeBuilder: each codec emits its variant for this seed]\n    ENC --> CDATA[compact data island: bytes not covered by any CFG block]\n    CDATA --> PROMO[promote LEA-fixup target VAs back into data island if CFG put them in code]\n    ENC --> BTAB[build block table: va_off to bytecode_off pairs]\n    PROMO --> ECIPH[encrypt data island with cipher_init]\n    BTAB --> ECIPH2[encrypt block table with cipher_init]\n    ENC --> ECIPH3[encrypt bytecode per-block with cipher_init reset at each block start]\n    VMC --> STUB[VMCodeGen::emit_full: prologue, state init, dispatcher tail, handlers, sbox_inv, exit handler]\n    STUB --> HTAB[handler table: 256 entries, each a 32-bit offset from handler_base; encrypted at rest]\n    ECIPH --> ASM[finalize: stub + trampolines + sbox_inv + bytecode + data island + block table]\n    ECIPH2 --> ASM\n    ECIPH3 --> ASM\n    HTAB --> ASM\n    
ASM --> OUT[out.bin]\n```\n\n### Runtime flow\n\nWhat the emitted blob does when the host loader calls into it at byte 0, from prologue through dispatcher fetches and the various termination paths.\n\n```mermaid\nflowchart TB\n    ENTRY[blob entry at byte 0] --> PROL[prologue: push NV regs in seed-shuffled order, pushfq, allocate frame, lea state_ptr]\n    PROL --> SI[emit_state_init body, gated by init_flag byte]\n    SI --> ZERO[zero VM regs via seed-permuted xor source]\n    ZERO --> CINIT[set cipher_state register, store cipher_init at cipher_extra+48]\n    CINIT --> SBOX[copy sbox_inv table from blob to cipher_extra+256]\n    SBOX --> NONCE[derive runtime nonce: rdtsc XOR cipher_init, store at cipher_extra+80]\n    NONCE --> DDI[decrypt data island in place via emit_fetch_byte_dec loop]\n    DDI --> DBT[decrypt block table in place]\n    DBT --> DHT[decrypt handler table in place]\n    DHT --> XHT[XOR each handler table entry with the runtime nonce so memory image is process-variant]\n    XHT --> SET_INIT[set init_flag = 1 to gate subsequent vm_entry invocations]\n    SET_INIT --> DISP[dispatcher tail]\n    DISP --> FB[fetch one byte via emit_fetch_byte_dec, decrypts using cipher_state]\n    FB --> HLU[\"load handler offset: mov scratch_b_32, [handler_base + op*4]\"]\n    HLU --> UXOR[\"xor scratch_b_32, [state_ptr + cipher_extra + runtime_nonce_off]\"]\n    UXOR --> SXD[movsxd to 64 bit, lea handler_addr = handler_base + offset, jmp handler]\n    SXD --> H[handler body: fetch operands via stream cipher, perform op, advance state]\n    H --> NEXT{terminator?}\n    NEXT -- no --> DISP\n    NEXT -- BR\u002FBR_CC --> CRESET[cipher_reset to cipher_init, advance ip to target block]\n    CRESET --> DISP\n    NEXT -- CALL_VM --> CRESET\n    NEXT -- JMP_NATIVE imm --> MARSH[\"marshal VM regs to host regs, rsp = VM_RSP, jmp [target_slot]\"]\n    NEXT -- CALL_NATIVE --> PUSHTR[push trampoline addr to VM_RSP, marshal, jmp target]\n    NEXT -- RET_VM --> POPSS[pop shadow 
stack, jmp to popped trampoline addr]\n    POPSS --> CRESET\n    NEXT -- exit_handler reached --> EPI[restore NV regs, ret to caller of vm_entry]\n```\n\nThe runtime nonce trick is the key anti-memory-scan move: the handler table sits in memory XOR'd with a value derived from `rdtsc XOR cipher_init`, so two processes loading the same blob hold byte-different handler tables. The dispatcher unmasks at lookup time with one extra XOR.\n\n## Packer mode: `--pack`\n\nThe opposite tradeoff. The lifter does not run. The original shellcode goes into the data island encrypted, and the IR is a single synthetic `JMP_NATIVE imm=0`. The prologue intentionally skips the data-island decrypt that default mode runs eagerly. Instead, the first time the lone JMP_NATIVE handler fires, it decrypts the data island in place, sets a marker byte, then transfers control to byte 0 of the now-plaintext shellcode. The shellcode runs natively from there.\n\nWorks on any shellcode, regardless of lifter coverage. Useful for stageless cobalt, exotic syscall-heavy payloads, anything too large or weird to virtualize. 
Loses the per-instruction virtualization defense but keeps the full per-seed VM polymorphism on the wrapper.\n\n### Build pipeline\n\nHow `--pack` wraps the original shellcode as an encrypted data island behind a one-instruction synthetic IR program.\n\n```mermaid\nflowchart TB\n    A[shellcode.bin] --> SKIP[skip CFG build, skip lift]\n    SEED[seed u64] --> VMC[VMConfig polymorphism axes as in default mode]\n    SKIP --> SYN[synthesize 1-insn IRProgram: one block with JMP_NATIVE Imm 0 Width Q]\n    SYN --> SETF[VMConfig::set_pack_mode true, set_data_island_size shellcode.size]\n    SETF --> ENC[BytecodeBuilder encodes the 1 insn into ~35 bytes]\n    ENC --> CDATA[data island = entire shellcode bytes verbatim]\n    CDATA --> ECIPH[encrypt shellcode bytes with cipher_init as the data island]\n    ENC --> ECIPH2[encrypt bytecode + block table + handler table]\n    VMC --> STUB[VMCodeGen::emit_full]\n    STUB --> FLAG[data_island_init_flag byte starts at 0 because pack_mode true]\n    STUB --> ASM[finalize layout]\n    ECIPH --> ASM\n    ECIPH2 --> ASM\n    FLAG --> ASM\n    ASM --> OUT[out.bin]\n```\n\n### Runtime flow\n\nWhat the pack wrapper does on first entry, including the gated lazy decrypt of the data island and the single synthetic JMP_NATIVE that hands control to the now-plaintext shellcode.\n\n```mermaid\nflowchart TB\n    ENTRY[blob entry] --> PROL[prologue + state init]\n    PROL --> SKIP_DI[skip data island decrypt because data_island_size gated on pack_mode false at build]\n    SKIP_DI --> DBT[decrypt block table in place]\n    DBT --> DHT[decrypt handler table in place + runtime nonce XOR]\n    DHT --> DISP[dispatcher fetches first opcode = JMP_NATIVE]\n    DISP --> NH[emit_native_handler entry]\n    NH --> GATE{data_island_init_flag == 0?}\n    GATE -- yes --> SAVE[save cipher_state and ip to kPreDecryptCsOff\u002FIpOff slots]\n    SAVE --> RESET[reset cipher_state to cipher_init]\n    RESET --> LOOP[decrypt data island in place byte by byte]\n    
LOOP --> RESTORE[restore cipher_state and ip from saved slots]\n    RESTORE --> SETFLAG[set data_island_init_flag = 1]\n    GATE -- no --> CONT[skip the decrypt body]\n    SETFLAG --> CONT\n    CONT --> FETCH_TGT[fetch JMP_NATIVE operand: tag = 0, imm = 0]\n    FETCH_TGT --> COMPUTE[target = data_island_base + imm = start of decrypted shellcode]\n    COMPUTE --> MARSH[marshal all VM regs to host regs, rsp = VM_RSP seeded with exit_handler]\n    MARSH --> JMP[\"jmp [target_slot]\"]\n    JMP --> NATIVE[original shellcode runs natively]\n    NATIVE --> RET{shellcode does ret?}\n    RET -- yes --> POPSS[ret pops exit_handler from VM shadow stack]\n    POPSS --> EPI[exit_handler restores NV regs, rets to original caller]\n    RET -- ExitProcess --> DEAD[process terminates]\n```\n\nThe deferred data-island decrypt is pack-mode-only. In default and hybrid mode the lifted code does VM LOAD\u002FSTORE on data-island bytes mid-execution and needs them plaintext from the start, so the prologue handles it eagerly. In pack mode there is exactly one consumer of the data island, the native escape after the synthetic JMP_NATIVE, so the decrypt can wait until then.\n\n## Hybrid mode: `--ranges A:B,C:D`\n\nTargeted virtualization. Pick byte ranges in the input that should be lifted. Everything else stays as the original native bytes in the output. At each range start the lifter patches a 5-byte `jmp rel32` to a `vm_entry_K` stub appended after the native shellcode region. External native code may only re-enter a lifted range through the patched start byte. Mid-range bytes are int3-filled, so any native branch that targets a mid-range byte is rejected at scan time. Lifted code that exits a range to bytes outside it becomes `JMP_NATIVE` or `CALL_NATIVE`.\n\nUse `--scan` first to find eligible candidates. 
Scan classifies ranges with gaps, with no `ret`-terminated block, with body shorter than 5 bytes, or with external native branches into mid-range bytes as near-miss rather than eligible.\n\n### Build pipeline\n\nHow `--ranges` lifts only the chosen byte ranges and patches the native shellcode in place so control redirects into the appended VM entry stubs.\n\n```mermaid\nflowchart TB\n    A[shellcode.bin] --> RP[parse_ranges from --ranges flag]\n    RP --> CFG[CFGBuilder with set_lifted_ranges restriction]\n    CFG --> LIFT[lift_program: only blocks whose start_va is inside any range]\n    LIFT --> XCG[branches\u002Fcalls leaving the range become JMP_NATIVE\u002FCALL_NATIVE]\n    XCG --> OBF[IR obfuscation passes same as default]\n    OBF --> ENC[encode bytecode]\n    ENC --> BTAB[block table for RET_VM lookup]\n    SEED[seed u64] --> VMC[VMConfig]\n    VMC --> STUB[VMCodeGen::emit_range_mode: one vm_entry_K per range + dispatcher + handlers]\n    STUB --> NSEC[start with verbatim copy of the native shellcode bytes]\n    NSEC --> PATCH[at each range start, write 5-byte jmp rel32 to vm_entry_K]\n    PATCH --> INT3[fill remaining bytes of the displaced run with int3]\n    INT3 --> APPEND[append: VM stub + sbox_inv + bytecode + data island + block table]\n    APPEND --> OUT[out.bin]\n```\n\n### Runtime flow\n\nControl flow through a mixed native-and-VM blob, including the entry into VM dispatch when native code hits a patched range start and the native-escape paths used when lifted code branches back out.\n\n```mermaid\nflowchart TB\n    ENTRY[blob byte 0 = native shellcode prologue] --> NAT[native shellcode runs]\n    NAT --> HIT{control reaches a patched range start?}\n    HIT -- no --> NAT\n    HIT -- yes --> JMP[jmp rel32 to vm_entry_K]\n    JMP --> RPROL[range prologue: push NV regs, allocate frame, lea state_ptr]\n    RPROL --> MARSHIN[marshal host volatile regs to VM slots, NV regs preserved by Win64 ABI]\n    MARSHIN --> SI[state init gated by init_flag for 
first-time decrypts]\n    SI --> DISP[dispatcher loop]\n    DISP --> HND[handler]\n    HND --> NXT{terminator?}\n    NXT -- BR\u002FBR_CC inside range --> DISP\n    NXT -- range ret reached --> CLEANUP[restore NV regs, pop frame, ret pops native retaddr from host stack]\n    NXT -- JMP_NATIVE to byte outside range --> MIDEXIT[mid-exec cleanup: overwrite caller retaddr slot with target, jmp to exit_handler]\n    NXT -- CALL_NATIVE to API --> APITAIL[push trampoline addr to VM_RSP, jmp resolved API]\n    CLEANUP --> NAT\n    MIDEXIT --> EH[exit_handler unwinds NV pushes, ret lands at the target we wrote]\n    EH --> NAT\n    APITAIL --> APIRET[API rets to trampoline in stub which resumes VM dispatch]\n    APIRET --> DISP\n```\n\nCoroutine-style ranges, the ones with no `ret` that exit via a tail-jmp, are documented via `--coroutines` for `--scan` output. The flag does not change codegen at present. Range mode is best fit for self-contained leaf functions. Helper functions that depend on a specific caller-supplied register state cannot be lifted standalone; the API ends up called with garbage args.\n\n## Stacked mode: `--pack --ranges A:B,C:D`\n\nRun the hybrid build, then pack-wrap the result. The outer VM decrypts the inner hybrid blob in place and jumps to byte 0 of it. From there execution proceeds exactly like standalone hybrid mode, except the entire blob including the chosen-range bytecode is encrypted at rest. 
The two layers compose cleanly because the inner blob is a self-contained PIC region.\n\n### Build pipeline\n\nHow stacked mode recursively invokes the packager: an inner `--ranges` build first, then an outer `--pack` wrap of that build.\n\n```mermaid\nflowchart TB\n    A[shellcode.bin] --> INNER[invoke package_shellcode recursively in range-only mode]\n    INNER --> RBLOB[range-mode bytes in memory]\n    RBLOB --> WRAP[invoke package_shellcode again in pack-only mode with RBLOB as input]\n    WRAP --> OUT[out.bin: outer pack VM wrapping the inner range-mode blob]\n```\n\n### Runtime flow\n\nHow the outer pack VM hands control to the inner range-mode VM, each running on its own independent VMState frame.\n\n```mermaid\nflowchart TB\n    E[blob entry] --> OPROL[outer pack prologue + state init]\n    OPROL --> ODISP[outer dispatcher fetches synthetic JMP_NATIVE]\n    ODISP --> LAZY[lazy decrypt of inner range-mode blob via deferred data-island decrypt]\n    LAZY --> JN[outer JMP_NATIVE imm 0 sets target = inner blob byte 0]\n    JN --> INAT[inner native shellcode runs]\n    INAT --> IHIT{inner patched range start hit?}\n    IHIT -- yes --> IVM[inner range vm_entry_K]\n    IHIT -- no --> INAT\n    IVM --> IRDISP[inner dispatcher loop, separate VMState frame]\n    IRDISP --> INAT\n```\n\nThe inner VM and outer VM share no state. They are two independent VMs that happen to live in the same blob. The outer one's only job is to gate the inner one behind a decryption layer.\n\n## Detour mode: `--embed-into target.exe --at RVA`\n\nDifferent shape from the others. Input is raw shellcode, conventionally a VM blob already emitted by mkPIVM in another mode, though any PIC bytes will work. 
Output is a patched PE.\n\nThe tool parses `target.exe`, locates the chosen RVA in an executable section, disassembles enough instructions there to cover 5 bytes, refuses if the displaced run contains RIP-relative addressing or relative control flow that cannot survive being moved, then adds a fresh RWX section containing a wrapper plus the VM blob. The wrapper preserves caller state, transfers to the VM blob, restores state, executes the displaced original bytes, and jumps back to the byte after the patch. A 5-byte `jmp rel32` at the chosen RVA points at the wrapper.\n\nTwo sub-modes for how the wrapper transfers to the VM:\n\n### Threaded sub-mode, the default\n\nTwo diagrams follow. The first is the build-time PE patching pipeline that injects the wrapper section, fixes up references, and writes the 5-byte jmp at the chosen RVA. The second is the runtime control flow when the host process eventually reaches that RVA.\n\n```mermaid\nflowchart TB\n    BLOB[vm_blob.bin pre-built from any other mode] --> READ[read target.exe bytes]\n    TGT[target.exe] --> READ\n    READ --> PEHDR[parse DOS header, NT headers, sections, OptionalHeader, BASERELOC dir]\n    PEHDR --> ARCHCHK[arch from PE32\u002FPE32+ magic must match blob arch]\n    ARCHCHK --> LOC[locate RVA inside an executable section]\n    LOC --> DA[Zydis disassemble at RVA, accumulate insns until total length >= 5]\n    DA --> VALID{any displaced insn has RIP-relative mem operand or is a rel32 branch?}\n    VALID -- yes --> FAIL[error, pick a different RVA]\n    VALID -- no --> IATSCAN[scan IMAGE_DIRECTORY_ENTRY_IMPORT for kernel32!CreateThread]\n    IATSCAN --> IATFOUND{found?}\n    IATFOUND -- no --> FAIL2[error, suggest --detour-inline or different target]\n    IATFOUND -- yes --> EMITW[emit threaded wrapper bytes]\n    EMITW --> APPEND[concatenate wrapper + vm_blob into new section content]\n    APPEND --> ARCHBR{arch?}\n    ARCHBR -- x64 --> X64FIX[fix wrapper rel32s: lea r8 to vm_blob; call qword ptr 
rip+iat_disp32]\n    ARCHBR -- x86 --> X86FIX[fix wrapper abs32s: push vm_blob_va; call dword ptr iat_va; then append combined IMAGE_BASE_RELOCATION table with new HIGHLOW entries for those abs32s]\n    X64FIX --> RJMP[compute jmp_rel32 from wrapper-tail back to RVA + displaced_len]\n    X86FIX --> RJMP\n    RJMP --> SEC[allocate next aligned VA + raw offset, write IMAGE_SECTION_HEADER with RWX + CNT_CODE]\n    SEC --> BUMP[bump NumberOfSections, SizeOfImage, zero CheckSum]\n    BUMP --> X86RELOC{x86?}\n    X86RELOC -- yes --> RDIR[update DataDirectory BASERELOC to new combined table]\n    X86RELOC -- no --> WJ[write 5-byte jmp rel32 at RVA, NOP-fill remaining displaced bytes]\n    RDIR --> WJ\n    WJ --> WRITE[serialize patched bytes to output path]\n    WRITE --> OUT[patched.exe]\n```\n\n```mermaid\nflowchart TB\n    HOST[host main reaches patched RVA] --> RJMP[jmp rel32 to wrapper in new section]\n    RJMP --> PUSH[pushfq, push all volatile regs and rbp]\n    PUSH --> SAVE_RSP[mov rbp, rsp; and rsp, -16; sub rsp, 0x38 for shadow + spill alignment]\n    SAVE_RSP --> ARGS[xor ecx,ecx; xor edx,edx; lea r8, vm_blob_rel32; xor r9d,r9d; spill 0 at rsp+0x20, 0 at rsp+0x28]\n    ARGS --> CT[\"call qword ptr [rip + CreateThread_iat_disp32]\"]\n    CT --> WTHREAD[worker thread starts running vm_blob]\n    CT --> REST[mov rsp, rbp; pop volatiles; popfq]\n    REST --> DISP[execute the displaced original bytes verbatim]\n    DISP --> RJMP2[jmp rel32 back to RVA + N_displaced]\n    RJMP2 --> HOST_CONT[host main continues normally]\n    WTHREAD --> VMBODY[vm_blob runs concurrently: stager beacons, mbox shows dialog, whatever]\n```\n\nThe threaded sub-mode is the right shape for stagers and beacons. Host main never blocks. The worker thread inherits whatever lifecycle the payload needs. 
If the payload calls `ExitProcess`, the whole process dies, but for a non-terminating payload like every C2 beacon, the host runs forever in parallel with it.\n\n### Inline sub-mode: `--detour-inline`\n\nSame wrapper structure but the wrapper does a direct `call vm_blob` instead of `CreateThread`. Host main thread blocks until the VM returns. Useful when the target lacks `CreateThread` in its IAT, or as a fallback when the threaded base-reloc path cannot apply against a particular target.\n\n```mermaid\nflowchart TB\n    HOST[host main reaches patched RVA] --> RJMP[jmp rel32 to wrapper]\n    RJMP --> SAVE[save flags + volatile regs + align]\n    SAVE --> CALL[call rel32 vm_blob synchronously]\n    CALL --> BLOCK[host main thread blocks here for the duration]\n    BLOCK --> RET{vm_blob returns?}\n    RET -- via ret --> REST[restore rsp, pop volatiles, popfq]\n    RET -- via ExitProcess in payload --> DEAD[process terminates, no further code runs]\n    REST --> DISP[execute displaced original bytes]\n    DISP --> RJMP2[jmp rel32 back to RVA + N_displaced]\n    RJMP2 --> HOST_CONT[host continues]\n```\n\n## Scan mode: `--scan`\n\nNo output file. Builds the CFG from the input shellcode and prints `--ranges` candidates to stderr. Each candidate is classified as eligible, coroutine, near-miss, or internal, where internal means shadowed by a larger eligible candidate that already covers it. 
Use this to pick range arguments before invoking the tool again in hybrid mode.\n\n```mermaid\nflowchart TB\n    INP[shellcode.bin] --> CFGB[CFGBuilder + recursive descent over the whole input]\n    CFGB --> ITER[for each block that is a call_target or jmp_target]\n    ITER --> BFS[BFS the reachable subgraph from this entry]\n    BFS --> SPAN[compute min_va..max_va span]\n    SPAN --> CONT_CHK{every byte in span belongs to a visited block or to a known insn boundary?}\n    CONT_CHK -- yes --> RET_CHK{any visited block ends with ret?}\n    CONT_CHK -- no --> NM1[near-miss: fragmented or mid-fn data]\n    RET_CHK -- yes --> EXT_CHK{any non-visited block has a successor pointing inside the span?}\n    RET_CHK -- no --> CORO[coroutine candidate, no ret terminator]\n    EXT_CHK -- yes --> NM2[near-miss: native branches to mid-range bytes]\n    EXT_CHK -- no --> SIZE_CHK{span >= 5 bytes for the entry patch?}\n    SIZE_CHK -- yes --> ELIGIBLE[eligible: prints with --ranges hint]\n    SIZE_CHK -- no --> NM3[near-miss: body too short]\n    ELIGIBLE --> DEDUP[shadow dedupe: drop entries whose reachable set is fully contained in an earlier eligible's reachable set]\n    CORO --> DEDUP\n    NM1 --> DEDUP\n    NM2 --> DEDUP\n    NM3 --> DEDUP\n    DEDUP --> PRINT[stderr table sorted by category]\n    PRINT --> END[exit 0]\n```\n\n## Polymorphism axes per seed\n\nListed roughly in order of how much they affect the static signature.\n\n* Cipher family. One of ARX, LcgSub, SBoxAdd, FeistelByte. Picks both the bytecode encryption used at build time and the inline decrypt emitted in the dispatcher fetch path. Encrypt and decrypt match by construction.\n* Register slot layout. The VMState contains `reg_count` slots, sized 24 to 32 per seed. The 16 architectural GPRs plus 4 Tmp regs get a fresh permutation of slot indices each seed.\n* Opcode-to-handler mapping. Each codec family is assigned a random opcode byte each seed. 
The 256-entry handler table indexes by opcode; the encoded bytecode references the mapped opcodes.\n* Dispatcher topology. Threaded or central, picked per seed.\n* Prologue self-locate strategy. `call $+5; pop`, `lea reg, [rip+0]`, or a jmp\u002Fcall shuffle.\n* Handler register temporaries. BR_CC handler permutes its 6 temporary roles across the volatile pool. Store handler does the same.\n* Junk gadget density. 0 to 3, controls inter-handler garbage emission.\n* IR obfuscation pass decisions. Dead-IR injection and opaque predicates each use their own sub-RNG so their decisions are deterministic per seed without disturbing other axes.\n* Bytecode encryption initial state. Random 64-bit per seed.\n\n## Tested payloads\n\nThis tool was validated against several Cobalt Strike, MSF, Sliver, and numerous other shellcode samples.\n\n## Build\n\nRequires Visual Studio 2022, CMake 3.21 or newer, and vcpkg. The CMake project pulls Zydis and fmt through vcpkg's manifest mode.\n\n```\ncmake -S . -B build\ncmake --build build --config Release --target mkpivm\n```\n\n## Known limits\n\n* Range mode for cobalt stagers does not work as a standalone leaf. The stager's helper functions depend on caller-supplied register state that the runner does not provide. Full virtualization or `--pack` are the routes that actually beacon.\n* x86 threaded detour requires the target to either lack ASLR or accept added base relocation entries, which the tool emits when present. If the target's BASERELOC data dir is malformed or absent, the tool falls back to inline mode.\n* The lifter does not currently cover SSE\u002FAVX register moves, atomics, CMPXCHG, RDMSR, or privileged instructions. Pack mode is the workaround for shellcodes using those.\n* Authenticode signatures on `--embed-into` output are invalidated. The PE checksum is zeroed.\n* Output blobs require RWX at runtime because in-place decryption writes back to the blob's own pages. Most loaders that run shellcode allocate RWX anyway. 
There is no plan to move to RX-only because the trade buys RX at the cost of a PEB walk and a `VirtualAlloc` call, which is probably a worse signature than the RWX page it replaces.\n* Virtualizing stageless payloads via default mode is not supported yet, as they are horribly complicated to wrap in the VM; prefer `--pack --ranges` (stacked mode).\n\n# Notes\n* This project is largely proof-of-concept research. If it is well-received, I will extend it as requested and welcome contributions. That said, it seemed stable for the tested samples.\n* If your shellcode doesn't work, and you don't want to place it in an Issue, then unfortunately I can't help you. This was tested on Sliver, Cobalt Strike 4.12, MSF, Havoc, and a few other undisclosed samples.\n* In my testing, injection of the VM into live processes worked fine. However, when it comes to embedding into PEs, this was not tested with commercial software like MS Word, only synthetic tests, but it probably works. If not, will fix.\n* As of 5\u002F20\u002F2026, it appears the repo is being swarmed with bots. So, that's wonderful. Please ignore all blank GitHub accounts.\n\n# Contributing\nThis is some cool shit, get real. If you want to contribute new ideas, fix your own bugs, submit Issues for me to fix, so on and so forth, go for it.\n","mkPIVM is a tool that generates polymorphic, position-independent virtual machines (PIVMs) from arbitrary x86\u002Fx64 shellcode. Its core function is converting raw shellcode into a small virtual machine that interprets and executes the encrypted, obfuscated instructions. Technical features include multiple cipher families, register layout randomization, permutation of the opcode-to-handler mapping, dispatcher topology variation, and junk code insertion, which make the output of each build highly unpredictable. It suits scenarios such as security research, red-team exercises, and malware analysis that need to bypass endpoint detection and response (EDR), antivirus software, or other security measures.","2026-06-11 03:55:54","CREATED_QUERY"]