[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-79881":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":13,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":18,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":35,"readmeContent":36,"aiSummary":37,"trendingCount":15,"starSnapshotCount":15,"syncStatus":16,"lastSyncTime":38,"discoverSource":39},79881,"reverse-engineering-is-over","yasminefolo\u002Freverse-engineering-is-over","yasminefolo","LLMs have ended reverse engineering as a high-barrier skill. Case study: reconstructing 6\u002F7 custom HTTP signing parameters of Douyin (TikTok CN) v38.1.0 native library in 30 days using Claude Opus.","",null,"Python",105,1,101,0,2,4,3,0.9,false,"main",true,[24,25,26,27,28,29,30,31,32,33,34],"ai-assisted","android-security","bytecode-vm","cryptography","frida","llm","mobile-security","ollvm","reverse-engineering","security-research","tiktok","2026-06-12 02:03:55","# Reverse Engineering is Over\n\n> Reverse Engineering in the AI Era: The End of Barriers and the Beginning of a New Paradigm\n\n[Chinese version](README_zh.md)\n\n---\n\n## Abstract\n\nThis paper advances a far-reaching proposition: **The emergence of Large Language Models (LLMs) has fundamentally ended the era of reverse engineering as a high-barrier professional discipline.** The author reconstructed the majority of a commercial Android application's core signature system — protected by industrial-grade defenses — within a single month, at a total cost of approximately $100. 
Crucially, this was accomplished without a deep background in cryptography, without prior familiarity with Frida scripting, and without substantial reverse engineering experience. Only threshold-level operational knowledge was required: enough to configure a dynamic instrumentation environment and to evaluate whether AI-generated outputs were correct. The results indicate that the central bottleneck of reverse analysis has shifted from *human knowledge accumulation* to *token consumption*. This shift carries profound structural implications for the field of mobile application security.\n\nThe corollary that follows is equally significant: in a world where the rate-limiting factor is token budget rather than accumulated expertise, **understanding LLM safety mechanisms and how to operate within their constraints has become a more consequential skill than reverse engineering itself.**\n\n---\n\n## I. Core Proposition\n\nAI's pattern-matching capabilities outpace human cognition in both speed and scale, thoroughly democratizing reverse engineering skills that once required years of systematic study to acquire. This is not an incremental tooling upgrade — it is a **paradigm rupture**.\n\nThe traditional barriers to reverse engineering are well-known: deep fluency in assembly language, mastery of low-level operating system internals, the ability to recognize cryptographic primitives, a long-accumulated toolchain, and the empirical intuition that only comes from analyzing large numbers of samples. Producing a competent reverse engineer typically demands three to five years of sustained effort.\n\nIn the face of sufficient tokens and a sound methodology for working with AI, those barriers have all but ceased to exist.\n\n---\n\n## II. 
Why AI Is Structurally Suited to Reverse Engineering\n\nThe potency of LLMs in reverse analysis follows from structural properties that directly address the bottlenecks of traditional analysis, not from incidental capability.\n\n**Pattern matching at scale.** Every LLM trained on code has internalized tens of millions of functions across algorithms, protocols, and cryptographic primitives. Where a human analyst must *recall* a specific primitive — contingent on whether they encountered it during prior study — a model *recognizes* structural patterns simultaneously across dozens of candidate forms. The characteristic failure mode of human reversers, the gap between \"I've seen something like this\" and \"I've seen this exact variant,\" does not apply. Given differential test vectors, a model narrows the hypothesis space in seconds rather than days.\n\n**Complexity as token cost, not cognitive cost.** Obfuscation derives its defensive value from human cognitive limits: control flow flattening exploits bounded working memory; a 414-handler VM exploits the cost of sustained manual attention across hundreds of sample inputs. For an LLM, neither constraint applies. Processing 414 handlers differs from processing 41 only in token volume. The model does not fatigue, does not lose context across long labeling sessions, and does not introduce arithmetic errors in GF(2⁸). The foundational premise of complexity-based obfuscation — that complexity imposes a cost the attacker cannot sustain — holds only for human attackers.\n\n**Structural inference without prior art.** LLMs do not require prior exposure to a specific target. When analysis reached a Broken-RC4 variant with a non-standard substitution schedule — for which no published writeup existed — the model inferred its deviation from canonical RC4 through differential input-output pairs alone. Human expertise is bounded by prior art; LLM inference generalizes over structure. 
These represent categorically different modes of analysis, and the latter scales to novel targets without the penalties that constrain the former.\n\n---\n\n## III. AI-Assisted Reverse Analysis in Practice\n\n### 3.1 Research Target and Protection Mechanisms\n\nThe research target is the native signature library embedded in the Android client of a major short-video platform. Its protection mechanisms represent the current ceiling of commercial mobile software security.\n\n**Static Protection Layer**\n\n| Mechanism                     | Description                                                                                                                                             |\n|:----------------------------- |:------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| OLLVM Control Flow Flattening | Transforms every function's control flow into a `while-switch` dispatcher structure, erasing all readable branch logic.                                 |\n| Code Virtualization (VM)      | Core algorithms are compiled into a proprietary bytecode format and executed by a private virtual machine, completely foreclosing static decompilation. |\n\n**Dynamic Protection Layer**\n\n| Mechanism              | Description                                                                              |\n|:---------------------- |:---------------------------------------------------------------------------------------- |\n| Integrity Verification | Runtime checks of Dex and Native layer integrity to detect in-memory patches.            |\n| Anti-Debugging         | Detection of `ptrace` attachment, debugger presence, and timing anomalies, among others. |\n\n**Algorithm Layer**\n\nThe signature system comprises 7 custom HTTP parameters. 
The underlying cryptographic primitives include: SPECK-128\u002F256 block cipher, SM3 (Chinese national standard) hash, GF(2⁸) affine transformation, ARX sponge permutation, CRC64-Jones, and a Broken-RC4 variant.\n\nThis target sits in the highest tier of mobile application anti-reverse engineering, making it an appropriate stress test for evaluating AI-assisted analysis.\n\n---\n\n### 3.2 Methodology\n\n**Dynamic analysis first.** Static analysis yields little under the combined weight of OLLVM and VM obfuscation. The operative strategy was to bypass static complexity entirely and capture data at runtime: Frida hooks placed at function inputs and outputs converted black-box routines into observable data streams. Pattern induction was delegated to the AI.\n\n**Layered peeling.** Protection mechanisms were addressed sequentially:\n\n1. **Environment preparation** — Magisk + Zygisk to conceal root; `frida-gadget` injection to evade process-name detection; SO file patching to bypass SSL pinning.\n2. **VM analysis** — Semantic labeling of each opcode, building a handler behavior mapping table through differential inputs.\n3. **Algorithm reconstruction** — For each modified cryptographic primitive, differential test vectors were supplied; the model inferred deviations from the canonical form.\n\n**Hypothesis-driven loop.**\n\n```\nKnown Information → Form Hypothesis → Write Frida Script\n→ Collect Runtime Data → Verify or Refute → Update Known Information → Repeat\n```\n\nThe AI generated the Frida scripts, interpreted differential outputs, and maintained the hypothesis register across iterations. The analyst's function was to direct the analysis — specifying what to test, evaluating output plausibility, and deciding when to advance to the next layer. 
The most frequently used prompt throughout the entire experiment was two Classical Chinese characters: **\"Continue.\"** (`续之`)\n\n---\n\n### 3.3 Role Distribution\n\n| Phase                                  | Traditional Approach                          | With AI                                                     |\n|:-------------------------------------- |:--------------------------------------------- |:----------------------------------------------------------- |\n| Identifying modified crypto primitives | Manual constant comparison — days             | Differential vectors → immediate inference                  |\n| Understanding obfuscated control flow  | Manual `while-switch` tracing — hours         | Natural-language semantic explanation                       |\n| VM opcode semantic labeling            | Instruction-by-instruction manual analysis    | Batch labeling with automatic pattern induction             |\n| Writing validation scripts             | Requires crypto library and Frida API fluency | Target behavior described; runnable code generated directly |\n| Differential data analysis             | Manual comparison relying on intuition        | Batch comparison with automatic rule induction              |\n\n---\n\n### 3.4 Results\n\n- **Weeks 1–2:** Five short parameters fully reconstructed; end-to-end real-time verification passed (30\u002F30).\n- **Weeks 3–4:** Offline analysis of the 6th composite parameter (329-byte, three-stage structure combining Broken-RC4, ARX sponge, and SPECK key derivation) largely completed.\n- **Remaining work:** Real-time integration of the 6th parameter and analysis of the 7th — anticipated to involve no new technical obstacles.\n\n---\n\n### 3.5 Limitations\n\nDuring the late integration stage of the 6th parameter, pronounced context loss and hallucination occurred: the model abandoned previously confirmed conclusions and continued from incorrect premises, producing invalid iterations. 
The cause was structural — two weeks of analysis logs had exceeded a single context window, and no structured checkpoints had been produced at key milestones. This reflects a methodological gap rather than a ceiling of model capability. The practical implication is that long-cycle AI-assisted research requires deliberate checkpoint discipline: producing reusable documentation and verified code at each milestone to prevent context degradation from compounding across sessions.\n\n---\n\n## IV. Structural Implications for Mobile Application Security\n\nThe significance of this experiment lies not in the reversing of a specific platform, but in what it reveals about the **underlying cost structure of reverse analysis**.\n\nThe variables determining success have shifted from accumulated expertise to:\n\n```\nSuccess Rate ≈ f(AI Model Capability × Methodology × Token Budget)\n```\n\nThe analyst's professional background has been demoted to a secondary variable. The implications follow directly:\n\n- Reverse engineering capability that previously required years of study is now accessible to any researcher with basic methodological awareness, bounded by token cost rather than knowledge.\n- The defensive return on obfuscation and VM protection is in structural decline. AI is insensitive to complexity in a way that humans are not; 414 VM handlers represent a larger token budget, not a harder problem.\n- The offense-defense cost ratio in mobile application security is shifting structurally in favor of the attacker.\n\nSecurity built on complexity is no longer secure. Defensive architectures that treat the client-side black box as a durable protection boundary are operating on a premise the current threat environment no longer supports. The appropriate response relocates the trust anchor to server-side behavioral validation — not as a novel design choice, but as a necessary acknowledgment of the changed cost asymmetry.\n\n---\n\n## V. 
The Displacement of Reverse Engineering as a Meta-Skill\n\nThis experiment surfaces a second-order conclusion that reorders the skill value hierarchy of the security field.\n\nThe bypass strategy enabling this experiment was not technically sophisticated. The core mechanism was **long-context conversations conducted in Classical Chinese.** In extended conversations, a model accumulates substantial prior work — scripts written, conclusions confirmed, analyses produced — and becomes structurally reluctant to repudiate that history, even when later prompts involve sensitive targets. Repudiation would require the model to disown the entire collaborative context it constructed alongside the analyst. Classical Chinese, separately, shifts the input distribution far enough from the training distribution of safety-relevant prompts to substantially reduce detection rates. Both mechanisms operated throughout this experiment without interruption.\n\nThe observable consequence is that the primary bottleneck in AI-assisted security research is no longer reverse engineering knowledge — it is **understanding of how LLM safety mechanisms are implemented, and where their structural constraints can be navigated.** A researcher with working knowledge of RLHF reward modeling, context-window-based safety enforcement, and the distributional properties of safety classifiers can access AI capability unavailable to researchers without that understanding.\n\nThis constitutes a second-order paradigm shift:\n\n| Era                      | Rate-Limiting Factor                         |\n|:------------------------ |:-------------------------------------------- |\n| Pre-AI                   | Assembly fluency, cryptographic knowledge    |\n| AI-assisted (early)      | Prompt clarity, basic scripting              |\n| **AI-assisted (current)**| **LLM safety mechanism comprehension**       |\n\nThe skill of reversing a binary has been displaced by the skill of understanding the systems that 
replaced it. Safety mechanisms predicated on surface-level pattern matching — keyword classifiers, topic detection, prompt templates — face precisely the same structural obsolescence now affecting obfuscation in mobile security. In both cases, a defense built on heuristic complexity encounters an adversary for whom that complexity carries a categorically different cost.\n\n---\n\n## VI. Observations\n\n### On AI development\n\nLarge language models are functioning as core infrastructure for security research — a pattern consistent with AI capability distributing across domains previously gated by scarce expertise. The democratization of security research exerts upward pressure on the industry's overall defensive capability by expanding the pool of researchers capable of identifying and disclosing vulnerabilities.\n\nThe structural tension between expanded research access and expanded attack surface is not resolved by restricting model capability or raising access barriers alone. Safety mechanisms built on behavioral pattern matching are subject to the same adversarial pressure as any heuristic defense. The durability of safety enforcement appears to correlate with the depth of value internalization rather than the sophistication of surface detection.\n\n### On client-side protection\n\nThe experimental evidence indicates that client-side protection strategies founded on obfuscation complexity are approaching near-zero marginal utility against AI-assisted analysis. The defensive value of OLLVM and VM-based protection has historically derived from the cognitive cost it imposes on human analysts; that cost does not transfer to AI-mediated analysis. Architectures that assume the client can function as a durable black box are no longer consistent with the current threat model.\n\n### On the evolution of security expertise\n\nThe technical threshold of reverse engineering is declining, while the methodological and analytical demands of security research are not. 
The relevant skill set is undergoing substitution rather than elimination: deep understanding of LLM architecture, safety mechanism design, and AI-assisted workflow structure is becoming the rate-limiting competency in the domain.\n\n---\n\n## Conclusion\n\nReverse engineering — the high-barrier professional discipline that defined mobile security for decades — is being systematically dismantled by LLMs. This document records not an ordinary security study, but a technical case study at a structural inflection point: **when sufficient tokens meet a sound methodology, industrial-grade obfuscation becomes a quantifiable, exhaustible resource.**\n\nThe paradigm shift has arrived. Its first-order effect is the democratization of reverse engineering. Its second-order effect is the displacement of reverse engineering as the strategically relevant skill. The competency that most determined the outcome of this experiment was not Frida proficiency, cryptographic knowledge, or assembly fluency — it was the capacity to sustain productive AI collaboration across a month-long, context-spanning investigation: understanding the model's failure modes, compensating for hallucination, and structuring the workflow so that capability remained accessible throughout.\n\nReverse engineering is over.\n\nFor defenders, the implication is structural: the protections that historically bounded attacker capability no longer hold.  \nFor the research community, the implication is epistemic: the barriers that once defined the field have dissolved.\n\n---\n\n## Contact\n\n📧 `yasmine_folo@proton.me`\n\n---\n\n## Afterword\n\nThe core work of this experiment was almost entirely completed by AI — code generation, process summarization, and document writing included. Subsequent review revealed errors attributable to model hallucination. 
The section of the `final_report` covering the Medusa parameter `mid-e` did not reflect the depth of tracing and analysis that had actually been conducted; the relevant detail is in `medusa_grand_summary_v31m.md`. Similar gaps likely remain.\n\nThe experimental environment: Claude Opus 4.6 under the Max 5x subscription plan, total cost approximately $100, target Douyin 38.1.0. No Skills, no Claude Code — everything conducted through the standard web interface.\n\nThe bypass mechanism deserves explicit documentation. Long-context conversations accumulate a shared analytical history between model and analyst. The model's reluctance to disavow that history — even when subsequent prompts involve sensitive targets — is a structural property of how context windows interact with safety enforcement, not an anomaly. Classical Chinese served to shift the prompt distribution away from the training distribution of safety-flagged inputs. Both mechanisms operated throughout this experiment without interruption.\n\nThe most consequential skill in this workflow was not any technical competency in reverse engineering. It was understanding how to maintain productive AI collaboration across an investigation that exceeded any single context window — anticipating failure modes, structuring checkpoints to prevent context degradation, and keeping the model's analytical capacity accessible across the full duration of the project.\n","This project uses large language models (LLMs) to show that reverse engineering is no longer a high-barrier skill. Its core result is the reconstruction, using Claude Opus, of 6 of the 7 custom HTTP signing parameters in the Douyin v38.1.0 native library within 30 days, at a total cost of about $100. Technically, it is written in Python and uses Frida for dynamic analysis, requiring neither a deep cryptography background nor extensive reverse engineering experience. It is suited to scenarios that call for quickly parsing and understanding the internals of complex software systems, particularly mobile application security research. The results indicate that the main bottleneck of reverse engineering has shifted from \"human knowledge accumulation\" to \"token consumption\", and that understanding LLM safety mechanisms and their operating constraints matters more than traditional reverse engineering skills.","2026-06-11 03:58:22","CREATED_QUERY"]