[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-1519":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":13,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":15,"rankGlobal":9,"rankLanguage":9,"license":16,"archived":17,"fork":17,"defaultBranch":18,"hasWiki":19,"hasPages":17,"topics":20,"createdAt":9,"pushedAt":9,"updatedAt":31,"readmeContent":32,"aiSummary":33,"trendingCount":14,"starSnapshotCount":14,"syncStatus":34,"lastSyncTime":35,"discoverSource":36},1519,"ecoalign-forge","dengxianghua888-ops\u002Fecoalign-forge","dengxianghua888-ops","Multi-Agent DPO Data Synthesis Factory: a multi-agent framework for automatically synthesizing preference training data | red-team attack → multi-persona review → final ruling → DPO preference pairs",null,"Python",139,11,1,0,43.34,"Apache License 2.0",false,"main",true,[21,22,23,24,25,26,27,28,29,30],"content-moderation","data-quality","dpo","llm","multi-agent","preference-learning","pydantic","red-teaming","rlhf","synthetic-data","2026-06-12 04:00:10","\u003Cdiv align=\"center\">\n\n# EcoAlign-Forge\n\n### The DPO Training Data Factory That Never Sleeps\n\n[![Python 3.11+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.11+-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F)\n[![Tests](https:\u002F\u002Fgithub.com\u002Fdengxianghua888-ops\u002Fecoalign-forge\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fdengxianghua888-ops\u002Fecoalign-forge\u002Factions)\n[![Dataset](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗_Dataset-ecoalign--forge--dpo--zh-yellow)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdengxianghua888-ops\u002Fecoalign-forge-dpo-zh)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache%202.0-green.svg)](LICENSE)\n[![Code style: 
black](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcode%20style-black-000000.svg)](https:\u002F\u002Fgithub.com\u002Fpsf\u002Fblack)\n\n**Feed it a safety policy. Get back thousands of high-quality DPO preference pairs.**\n**No human annotators. No manual labeling. Just agents arguing with each other.**\n\n[**中文文档**](README_zh.md) | [**Live Report Demo**](docs\u002Fdemo_report.html)\n\n**Try it now — no API key needed:**\n```bash\npip install -e \".[all]\" && python -m ecoalign_forge --demo\n```\n\n\u003C\u002Fdiv>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>See demo output\u003C\u002Fb> (click to expand)\u003C\u002Fsummary>\n\n```\n============================================================\n  EcoAlign-Forge  DEMO MODE\n  No API key needed — using pre-recorded agent responses\n============================================================\n\n16:14:28 INFO  orchestrator: Starting pipeline run — 5 samples in batches of 10\n16:14:28 INFO  [ChaosCreator] (DEMO) Generating 5 adversarial cases...\n16:14:29 INFO  [Moderator] (DEMO) Reviewing 5 cases as naive junior reviewer...\n16:14:30 INFO  [SupremeJudge] (DEMO) Judging 5 cases with guidelines...\n16:14:30 INFO  Batch 0 done: 5\u002F5 cases, 3 DPO pairs\n16:14:30 INFO  IAA: kappa=0.444, alpha=0.494\n\n============================================================\n  Pipeline Complete!\n============================================================\n  Total cases:       5\n  Evaluations:       5\n  DPO pairs:         3\n  Avg quality:       0.40\n  Interception rate: 40.0%\n  Output:            data\u002Fdatasets\u002Fdpo_pairs_28eb5d03.jsonl\n\n  Sample DPO pair:\n    Chosen:   {\"has_stealth_marketing\":true, \"reasoning_trace\":\"命中 A-001 + A-002...\"}\n    Rejected: {\"has_stealth_marketing\":false, \"reasoning_trace\":\"看起来正常...\"}\n    Gap:      0.40\n    Lineage:  policy=default-v1, judge=openai\u002Fgpt-5.4\n```\n\n\u003C\u002Fdetails>\n\n---\n\n## The Problem\n\nTraining a content moderation model requires 
thousands of **preference pairs** — examples of \"good judgment vs. bad judgment\" on the same content. Today, this means:\n\n- Hiring annotators at **$0.5–5 per label**\n- Waiting **weeks** for a batch of 1,000 pairs\n- Getting **inconsistent labels** across annotators\n- Having **no idea** why an annotator chose \"block\" over \"pass\"\n\nWhat if you could spin up a **factory** that produces labeled preference data 24\u002F7, with full traceability, for **\u003C $0.01 per pair**?\n\n## The Solution\n\nEcoAlign-Forge runs a **courtroom drama** inside your terminal:\n\n```\n  🔴 Red Team (ChaosCreator)        \"I crafted this sneaky ad disguised as a review.\"\n       │\n       ▼\n  🟡 Junior Reviewer (Moderator)    \"Hmm, looks fine to me... T2_Normal.\"\n       │\n       ▼\n  🟢 Supreme Judge                  \"Nope. Rule A-002: homophone evasion for WeChat ID.\n       │                              This is stealth marketing. T1_Shadowban.\"\n       │\n       ▼\n  ⚖️  Constitutional Reviewer       \"Let me double-check against the handbook...\n       │                              Yes, the Judge got it right.\"\n       │\n       ▼\n  📦 DPO Pair                       chosen = Judge's ruling (with rule citations)\n                                     rejected = Moderator's naive guess\n                                     preference_gap = 0.7\n```\n\nThe **disagreement** between Judge and Moderator becomes your training signal. The Judge's rule-cited reasoning becomes `chosen`. The Moderator's gut feeling becomes `rejected`. Rinse and repeat — thousands of times.\n\n---\n\n## Who Is This For?\n\n| You are... | You want to... | EcoAlign-Forge helps by... 
|\n|------------|---------------|---------------------------|\n| **ML Engineer** | Train a content moderation model via DPO\u002FRLHF | Generating training data that plugs directly into TRL \u002F LLaMA-Factory |\n| **Trust & Safety Lead** | Scale content review without scaling headcount | Producing labeled edge cases your human reviewers would miss |\n| **AI Researcher** | Study red-teaming and adversarial robustness | Providing a structured framework for generating + evaluating attacks |\n| **Data Scientist** | Build a data quality flywheel | Offering IAA metrics, quality scores, and adaptive sampling out of the box |\n\n---\n\n## See It in Action\n\n### 1. One command to start\n\n```bash\npip install -e \".[all]\"\ncp .env.example .env          # Add your LLM API key\npython -m ecoalign_forge       # Watch the factory run\n```\n\n### 2. What you get\n\n```\ndata\u002F\n├── datasets\u002F\n│   └── dpo_pairs_a1b2c3d4_20260410_120000.jsonl   # Your DPO training data\n├── metrics.json                                     # Quality metrics\n├── runs.jsonl                                       # Pipeline run history\n├── flywheel_state.json                              # Iteration tracking\n└── report.html                                      # Visual quality report\n```\n\n### 3. Feed it to your trainer\n\n```python\nfrom ecoalign_forge.export import export_trl\n\n# Option A: Classic TRL format\nexport_trl(pairs, \"train.jsonl\")\n\n# Option B: TRL >= 0.8 conversational format\nexport_trl(pairs, \"train.jsonl\", conversational=True)\n\n# Option C: LLaMA-Factory ShareGPT format\nfrom ecoalign_forge.export import export_sharegpt\nexport_sharegpt(pairs, \"train_sharegpt.json\")\n```\n\n### 4. 
Monitor quality in real-time\n\n```bash\nmake dashboard    # Opens Streamlit at localhost:8501\n```\n\n---\n\n## How It Works\n\n### The Pipeline: 4 Stages + Post-Processing\n\n```\n┌─────────────────────────────────────────────────────────────────────────┐\n│                        AgentOrchestrator.run()                          │\n├─────────────────────────────────────────────────────────────────────────┤\n│                                                                         │\n│  ┌─ AdaptiveSampler ──────────────────────────────────────────────┐     │\n│  │ \"ai_slop is undersampled → boost T0\u002FT1 ratio this batch\"      │     │\n│  └────────────────────────────────────────┬───────────────────────┘     │\n│                                           ▼                             │\n│  Stage 1  ┌──────────────┐  ChaosCase[]                                │\n│           │ ChaosCreator │  \"Here are 10 sneaky posts                  │\n│           │   (T=0.9)    │   targeting your policy gaps\"               │\n│           └──────┬───────┘                                              │\n│                  ▼                                                       │\n│  Stage 2  ┌──────────────┐  JudgeEvaluation[]                          │\n│           │  Moderator   │  \"I'm a naive reviewer,                     │\n│           │   (T=0.5)    │   most of these look fine\"                  │\n│           │  4 personas  │                                              │\n│           └──────┬───────┘                                              │\n│                  ▼                                                       │\n│  Stage 3  ┌──────────────┐  JudgeEvaluation[] + DPO_Pair[]             │\n│           │ SupremeJudge │  \"Rule A-002 triggered.                     │\n│           │   (T=0.2)    │   T1_Shadowban. 
Here's why.\"               │\n│           └──────┬───────┘                                              │\n│                  ▼                                                       │\n│  Stage 4  ┌──────────────┐  Corrected evaluations                      │\n│           │Constitutional│  \"Double-checked against the handbook.       │\n│           │  Reviewer    │   2 out of 10 judgments corrected.\"          │\n│           └──────┬───────┘                                              │\n│                  ▼                                                       │\n│  Post     DataLineage injection → QualityScorer → IAA → FlyWheel       │\n│                                                                         │\n└─────────────────────────────────────────────────────────────────────────┘\n```\n\n### The Secret Sauce: Intentional Disagreement\n\nThe Moderator **deliberately doesn't read the rule book**. Each of its 4 personas makes different types of mistakes:\n\n| Persona | Behavior | What it generates |\n|---------|----------|-------------------|\n| `naive` | Goes with gut feeling | Balanced false positives\u002Fnegatives |\n| `strict_paranoid` | Blocks everything suspicious | Over-moderation training signal |\n| `lax_overlooker` | Lets most things through | Under-moderation training signal |\n| `keyword_matcher` | Only catches obvious keywords | Evasion-blind training signal |\n\nThe Judge, armed with the full guidelines handbook, catches these mistakes. 
The **gap** between them is your DPO signal.\n\n### Two Types of Training Signal\n\n| Signal Type | When | Strength | Example |\n|-------------|------|----------|---------|\n| **Direct Disagreement** | Judge and Moderator pick different tiers | Strong (gap = severity difference) | Judge: T0_Block, Moderator: T2_Normal |\n| **Reasoning Quality** | Same tier, but Judge cites 2+ rules, Moderator cites 0 | Soft (gap = 0.3) | Both say T1, but Judge explains *why* |\n\n---\n\n## Real-World Scenarios\n\n### Scenario 1: Cold-Start a Content Moderation Model\n\n> \"We're launching a new social platform and need a moderation model, but we have zero labeled data.\"\n\n```python\nfrom ecoalign_forge.engine.orchestrator import AgentOrchestrator\nfrom ecoalign_forge.schemas.policy import PolicyInput, PolicyDimension\n\npolicy = PolicyInput(\n    policy_id=\"my-platform-v1\",\n    name=\"My Social Platform\",\n    dimensions=[\n        PolicyDimension(name=\"stealth_marketing\", description=\"Hidden ads and traffic diversion\"),\n        PolicyDimension(name=\"ai_slop\", description=\"Low-effort AI-generated content\"),\n    ],\n)\n\norch = AgentOrchestrator()\n# Top-level `await` works in Jupyter or `python -m asyncio`; in a plain script,\n# wrap this call in `async def main()` and run it with `asyncio.run(main())`.\nresult = await orch.run(policy=policy, num_samples=1000)\n# → 1000 cases processed, ~400 DPO pairs generated\n# → Exported to data\u002Fdatasets\u002F*.jsonl\n```\n\n### Scenario 2: Iterate with the Data Flywheel\n\n> \"We trained v1 of our model. 
How do we make v2 better with targeted data?\"\n\n```python\nfrom ecoalign_forge.engine.flywheel import FlyWheelOrchestrator\n\nfw = FlyWheelOrchestrator(convergence_threshold=0.02)\n\n# Round 1: baseline model as Moderator\nresult_r1 = await orch.run(policy, num_samples=500)\n# → avg_quality=0.55, kappa=0.42\n\n# Train your model with Round 1 data...\n# Then swap the trained model in as the new Moderator\n\n# Round 2: trained model catches more nuance\nresult_r2 = await orch.run(policy, num_samples=500)\n# → avg_quality=0.72, kappa=0.61\n\nfw.state.quality_improvement  # +30.9% — the flywheel is spinning\n```\n\n### Scenario 3: Audit Your Policy Coverage\n\n> \"We updated our guidelines. Which rules aren't being triggered by any test cases?\"\n\n```python\nprint(orch.metrics.uncovered_rules)\n# → ['A-005', 'B-006']  ← These rules have zero test coverage\n\ncoverage = orch.sampler.analyze_coverage(orch._all_cases)\nprint(coverage.undersampled_combinations)\n# → [('ai_slop', 'extreme')]  ← No extreme-difficulty AI slop cases yet\n```\n\n### Scenario 4: Generate a Quality Report for Stakeholders\n\n```python\nfrom ecoalign_forge.reports import generate_html_report\n\ngenerate_html_report(\n    dataset_name=\"Q2 2026 Moderation Training Set\",\n    total_pairs=len(result.dpo_pairs),\n    avg_quality=result.avg_quality_score,\n    interception_rate=result.interception_rate,\n    quality_distribution=[s.overall for s in quality_reports],\n    output_path=\"q2_report.html\",\n)\n# → Self-contained HTML with KPI cards, charts, and coverage analysis\n```\n\n---\n\n## Quality Assurance: Trust but Verify\n\nEcoAlign-Forge doesn't just generate data — it tells you **how good** the data is:\n\n| Metric | What it measures | Where to find it |\n|--------|-----------------|------------------|\n| **Cohen's Kappa** | Agreement between Judge and each Moderator persona | `compute_batch_iaa()` |\n| **Krippendorff's Alpha** | Multi-rater agreement (handles missing values) | 
`compute_batch_iaa()` |\n| **5-Dimension Quality Score** | Reasoning depth, info density, preference clarity, decision consistency, completeness | `QualityScorer.score()` |\n| **Constitutional Correction Rate** | How often the self-review catches errors | `constitutional.stats.correction_rate` |\n| **Rule Coverage** | Which policy rules have been triggered | `metrics.rule_coverage` |\n| **Data Lineage** | Full provenance: which model, persona, policy version, guidelines hash | `DPO_Pair.lineage` |\n\n---\n\n## Taxonomy: Standing on Giants' Shoulders\n\nThe attack classification system is not invented from scratch — it aligns with established frameworks:\n\n| Framework | What we borrowed | Where it lives |\n|-----------|-----------------|----------------|\n| [HarmBench](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.04249) | 4 functional categories, 7 semantic domains | `taxonomy\u002Fharm_categories.py` |\n| [OWASP LLM Top 10](https:\u002F\u002Fowasp.org\u002Fwww-project-top-10-for-large-language-model-applications\u002F) | Vulnerability-to-category mapping | `HarmCategory.owasp_mapping` |\n| [Evol-Instruct](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.12244) | Depth evolution (add constraints) + Breadth evolution (topic mutation) | `taxonomy\u002Fevol_strategies.py` |\n| [PyRIT](https:\u002F\u002Fgithub.com\u002FAzure\u002FPyRIT) | Orchestrator → Converter → Scorer pipeline pattern | `engine\u002Forchestrator.py` |\n| [Constitutional AI](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.08073) | Self-critique → correction loop | `agents\u002Fconstitutional.py` |\n\n---\n\n## Project Structure\n\n```\necoalign-forge\u002F\n├── src\u002Fecoalign_forge\u002F\n│   ├── agents\u002F               # The cast of characters\n│   │   ├── chaos_creator.py  #   The attacker (red team)\n│   │   ├── moderator.py      #   The naive reviewer (4 personas)\n│   │   ├── supreme_judge.py  #   The expert judge (cites rules)\n│   │   └── constitutional.py #   The quality auditor 
(self-review)\n│   ├── engine\u002F               # The machinery\n│   │   ├── orchestrator.py   #   Runs the full pipeline\n│   │   ├── flywheel.py       #   Manages multi-round iteration\n│   │   └── adaptive_sampler.py # Adjusts sampling strategy\n│   ├── schemas\u002F              # The contracts (Pydantic v2)\n│   ├── llm\u002F                  # LLM client (LiteLLM, 100+ providers)\n│   ├── storage\u002F              # JSONL storage + metrics + IAA\n│   ├── export\u002F               # TRL \u002F ShareGPT \u002F HF Dataset Card\n│   ├── quality\u002F              # 5-dimension quality scorer\n│   ├── taxonomy\u002F             # HarmBench + OWASP attack taxonomy\n│   └── reports\u002F              # Self-contained HTML reports\n├── dashboard\u002F                # Streamlit real-time monitoring\n├── tests\u002F                    # 199 tests (pytest + asyncio)\n├── guidelines.md             # The \"constitution\" (judgment handbook)\n└── examples\u002F                 # Quick-start scripts + policy templates\n```\n\n---\n\n## Tech Stack\n\n| Layer | Technology | Why |\n|-------|-----------|-----|\n| **LLM** | LiteLLM | One interface for 100+ providers (OpenAI, Anthropic, local models) |\n| **Data Validation** | Pydantic v2 | Rule ID hard-validation prevents LLM hallucinated citations |\n| **Async** | asyncio + Tenacity | Concurrent LLM calls with exponential backoff retry |\n| **Monitoring** | Streamlit + Plotly | Real-time dashboard with 5-second auto-refresh |\n| **Testing** | pytest + asyncio | 199 tests covering all public APIs |\n| **CI** | GitHub Actions | Lint (ruff) + test on Python 3.11 & 3.12 |\n\n---\n\n## Development\n\n```bash\nmake install     # Install dev dependencies\nmake test        # Run 199 tests with coverage\nmake lint        # Lint with ruff\nmake format      # Format with black + isort\nmake dashboard   # Launch Streamlit dashboard\n```\n\n---\n\n## Acknowledgments\n\nBuilt on ideas from: 
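[TRL](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftrl) | [UltraFeedback](https:\u002F\u002Fgithub.com\u002FOpenBMB\u002FUltraFeedback) | [PyRIT](https:\u002F\u002Fgithub.com\u002FAzure\u002FPyRIT) | [HarmBench](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.04249) | [Constitutional AI](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.08073) | [Arena Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.10627) | [Garak](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fgarak) | [Evol-Instruct](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.12244)\n\n---\n\n## License\n\n[Apache License 2.0](LICENSE)\n","EcoAlign-Forge is a multi-agent framework for automatically generating preference training data. It produces high-quality preference pairs by simulating a red-team attack, multi-persona review, and final adjudication process. Its core capability is synthesizing preference pairs that conform to a given safety policy without human annotation, using a multi-agent system for content review and decision-making. The framework is written in Python, uses technologies such as Pydantic, and fully automates the workflow from red-team attack to final ruling. It is suited to developing content moderation models that need large amounts of high-quality training data, such as when building or optimizing deep-learning-based natural language processing systems, and is especially applicable to preference learning and reinforcement learning from human feedback (RLHF).",2,"2026-06-11 02:44:25","CREATED_QUERY"]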