[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80923":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":15,"stars30d":13,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":16,"rankGlobal":10,"rankLanguage":10,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":20,"hasPages":18,"topics":21,"createdAt":10,"pushedAt":10,"updatedAt":37,"readmeContent":38,"aiSummary":39,"trendingCount":15,"starSnapshotCount":15,"syncStatus":13,"lastSyncTime":40,"discoverSource":41},80923,"PersonaLingo","orzcls\u002FPersonaLingo","orzcls","AI-powered personalized IELTS corpus distillation — install-only skill (zero-backend Agent loop) + runnable backend with QMD-RAG. 7-stage pipeline producing schema-validated corpus.json & static profile site.","",null,"Python",34,2,32,0,1.43,"MIT License",false,"main",true,[22,23,24,25,26,27,28,29,30,31,32,33,34,35,36],"agent-skill","anthropic","corpus","distillation","ielts","ielts-learning","install-only","llm","mbti","openai","persona","personalization","qmd-rag","rag","skills","2026-06-12 02:04:08","# PersonaLingo v2\n\n> AI-Powered Personalized IELTS Speaking Corpus Generator with RAG & Memory System\n\n## ✨ Features\n\n- **Smart Corpus Generation** — 5-step LLM-driven pipeline (Persona → Anchors → Bridges → Vocabulary → Patterns)\n- **Dual LLM Support** — OpenAI & Anthropic with seamless switching\n- **QMD RAG Engine** — Query-Match-Decide 3-layer retrieval (Query Expansion + BM25\u002FTF-IDF dual-channel + LLM Reranking)\n- **Dynamic Topic Bank** — P1\u002FP2 IELTS topics with season\u002Fcategory filtering\n- **NotebookLM-style Chat** — Conversational corpus maintenance with style learning\n- **Material Upload** — Parse .txt\u002F.md\u002F.docx\u002F.pdf to enrich your corpus\n- 
**Smart Notes** — Auto-generated learning notes & Mermaid mind maps\n- **Band Score Strategies** — Differentiated output for 6.0\u002F6.5\u002F7.0\u002F7.5+ targets\n- **Skill Export** — Export as Markdown or JSON for AI agent integration\n\n## 🏗️ Architecture\n\n```mermaid\ngraph TB\n    subgraph Frontend[\"Frontend (Vue 3 + Vite + Tailwind)\"]\n        UI[User Interface]\n        Router[Vue Router]\n        Store[Pinia Store]\n        I18N[i18n Internationalization]\n    end\n\n    subgraph Backend[\"Backend (FastAPI + Python)\"]\n        API[REST API Layer]\n\n        subgraph CoreServices[\"Core Services\"]\n            CG[\"Corpus Generator\u003Cbr\u002F>5-step Pipeline\"]\n            CE[\"Conversation Engine\u003Cbr\u002F>NotebookLM-style\"]\n            TM[\"Topic Manager\u003Cbr\u002F>P1+P2+P3\"]\n            NE[\"Note Generator\u003Cbr\u002F>Mermaid Mindmap\"]\n            SE[\"Skill Exporter\u003Cbr\u002F>MD+JSON\"]\n        end\n\n        subgraph RAGLayer[\"QMD RAG Engine\"]\n            QE[\"Query Expansion\u003Cbr\u002F>Semantic Expansion\"]\n            MS[\"Multi-signal Match\u003Cbr\u002F>BM25 + TF-IDF + RRF\"]\n            RR[\"Reranker\u003Cbr\u002F>LLM Reranking\"]\n        end\n\n        subgraph LLMLayer[\"LLM Adapter Layer\"]\n            OA[OpenAI Compatible]\n            AN[Anthropic Claude]\n        end\n\n        TK[\"Token Manager\u003Cbr\u002F>Threshold Alert + Auto Compression\"]\n        SL[\"Style Learner\u003Cbr\u002F>Style Learning\"]\n    end\n\n    subgraph Data[\"Data Layer\"]\n        DB[(SQLite)]\n        Topics[\"Topic Bank JSON\u003Cbr\u002F>86 topics\"]\n        Vocab[\"Idiomatic Vocab\u003Cbr\u002F>132 words\"]\n        QTypes[\"Question Types\u003Cbr\u002F>7 categories\"]\n    end\n\n    UI --> Router --> API\n    Store --> API\n    API --> CoreServices\n    CG --> RAGLayer\n    CE --> RAGLayer\n    CE --> TK\n    CE --> SL\n    RAGLayer --> LLMLayer\n    CG --> LLMLayer\n    CoreServices --> DB\n    TM --> Topics\n    
CG --> Vocab\n    CG --> QTypes\n```\n\n## 🔍 QMD RAG Engine\n\nPersonaLingo features a custom-built **QMD (Query-Match-Decide)** 3-layer retrieval-augmented architecture, achieving high-quality corpus retrieval with zero external model dependencies:\n\n### Three-Layer Architecture\n\n| Layer | Function | Implementation |\n|-------|----------|----------------|\n| **Q - Query Expansion** | Query expansion | LLM semantic expansion + synonym rule fallback |\n| **M - Multi-signal Match** | Multi-signal matching | BM25 (term frequency) + TF-IDF (semantic) + RRF fusion ranking |\n| **D - Decide\u002FRerank** | Intelligent reranking | LLM relevance scoring + rule fallback |\n\n### Workflow\n\n```\nUser Query → [Q Layer] Expand into multiple search terms\n           → [M Layer] BM25 + TF-IDF dual-channel retrieval → RRF fusion\n           → [D Layer] LLM reranking → Top-K results\n```\n\n### Design Philosophy\n\n- **Lightweight**: No dependency on embedding models or vector databases — pure algorithms + LLM API\n- **Graceful Degradation**: Each layer has fallback mechanisms; degrades to pure rule-based retrieval without LLM\n- **Fast Mode**: Provides `search_fast()` for conversation scenarios, skipping Q\u002FD layers for rapid retrieval\n\n## 🛠️ Tech Stack\n\n| Layer | Technology |\n|-------|-----------|\n| Frontend | Vue 3, Vite, Tailwind CSS, Pinia, Mermaid.js |\n| Backend | FastAPI, Python 3.11+, aiosqlite |\n| LLM | OpenAI API, Anthropic Claude API |\n| Database | SQLite (async) |\n| Search | QMD RAG (BM25 + TF-IDF + RRF fusion, pure Python) |\n| File Parsing | python-docx, PyPDF2 |\n| Deployment | Docker, nginx |\n\n## 📸 Screenshots\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"docs\u002Fscreenshots\u002Fv3_06_questionnaire_entry.png\" width=\"260\"\u002F>\n  \u003Cimg src=\"docs\u002Fscreenshots\u002Fv3_10_chat_extractor.png\" width=\"260\"\u002F>\n  \u003Cimg src=\"docs\u002Fscreenshots\u002Fv3_11_topics_filter_P1_active.png\" 
width=\"260\"\u002F>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Csub>User Profiling Questionnaire &nbsp;|&nbsp; AI Conversation Engine &nbsp;|&nbsp; Topic Browser\u003C\u002Fsub>\n\u003C\u002Fp>\n\n## 🤖 Skills Integration\n\nPersonaLingo ships two independent skill delivery modes. Pick the one that matches your agent setup.\n\n### Mode 1 · Install-only (recommended, zero backend)\n\nOne-line install to any skill-compatible agent (Claude \u002F Qoder \u002F Cursor \u002F Cline \u002F ...). No project code, no server. The agent itself drives **questionnaire → guided conversation → 7-step distillation → personal profile → static corpus site** using only the shipped prompt assets.\n\n```bash\nnpx skills add orzcls\u002FPersonaLingo\n```\n\nWhat actually lands in `.agents\u002Fskills\u002Fpersonalingo\u002F` (the CLI auto-discovers `skills\u002Fpersonalingo\u002F` and copies only that subdirectory — backend\u002Ffrontend\u002Fdocs are never pulled):\n\n```\nSKILL.md\nskill.json\nskill-assets\u002F\n  ├── questionnaire.json\n  ├── conversation-guide.md\n  ├── distill-protocol.md\n  ├── corpus-schema.json        # mirrors backend models\u002Fcorpus.py field shapes\n  ├── band-strategies.json      # 1:1 copy of backend\u002Fapp\u002Fdata\u002Fband_strategies.json\n  ├── fallback-topics.json      # executable Stage 5 fallback\n  ├── fallback-vocabulary.json  # executable Stage 6 fallback (4 bands × 10 items)\n  ├── fallback-patterns.json    # executable Stage 7 fallback (8 MBTI-agnostic patterns)\n  ├── profile-template.md\n  └── site-template.html\n```\n\n> **Backend architecture equivalence**: install-only `corpus.json` is consumable by the same downstream logic as runnable-export output. 
`learner_profile` \u002F `capability_framework` \u002F `anchors` \u002F `bridges` \u002F `vocabulary` \u002F `patterns` \u002F `practices` \u002F `band_strategy` all mirror backend [`models\u002Fcorpus.py`](backend\u002Fapp\u002Fmodels\u002Fcorpus.py) & [`services\u002F{learner_researcher,capability_framework,corpus_generator}.py`](backend\u002Fapp\u002Fservices) exact field shapes. Stage 3–7 prompts **must** inject `band_strategy` from `band-strategies.json`.\n\nPer-learner outputs are written to the agent's working directory:\n\n```\ncorpus\u002F\u003Ccorpus_id>\u002F\n  ├── answers.json\n  ├── dialogue.md\n  ├── corpus.json      # validated against skill-assets\u002Fcorpus-schema.json\n  ├── profile.md\n  └── site\u002Findex.html  # open directly in browser\n```\n\nFull runtime spec: [SKILL.md](SKILL.md).\n\n### Mode 2 · Runnable Export (requires running this project)\n\nUse the full backend + frontend to generate a persistent, QMD-RAG-powered skill pack backed by SQLite, then export a zip that a downstream agent consumes.\n\n```bash\n# Start backend (see Quick Start below), then:\ncurl -X POST http:\u002F\u002Flocalhost:9849\u002Fapi\u002Fdistill\u002Fdiagnose\ncurl -X POST \"http:\u002F\u002Flocalhost:9849\u002Fapi\u002Fdistill\u002Frun?questionnaire_id={id}&include_research=true\"\ncurl  http:\u002F\u002Flocalhost:9849\u002Fapi\u002Fdistill\u002Fskill\u002F{corpus_id}\u002Frunnable\u002Fdownload -o skill.zip\n```\n\nExported pack contents: `Skill.md` · `corpus.json` · `runtime_protocol.md` · `prompts\u002F`. 
See [skills\u002FRUNNABLE_MODE.md](skills\u002FRUNNABLE_MODE.md).\n\n### Mode comparison\n\n| Capability | Install-only | Runnable Export |\n|---|---|---|\n| Backend dependency | None | Python 3.11+ backend at `:9849` |\n| Install command | `npx skills add orzcls\u002FPersonaLingo` | `git clone` + `docker-compose up` |\n| Questionnaire \u002F dialogue \u002F distill | Agent-internal loop | Backend API + Vue UI |\n| Dynamic IELTS topic bank sync | No | Yes (seasonal auto-sync) |\n| QMD RAG retrieval | No | BM25 + TF-IDF + RRF + LLM rerank |\n| Style learning persistence | Session-only | SQLite persisted |\n| Static corpus site output | Yes (`site\u002Findex.html`) | No (use frontend pages) |\n| Best for | Drop-in agent install | Tutoring platforms \u002F long-running learners |\n\n## 🚀 Quick Start\n\n### Prerequisites\n\n- Python 3.11+\n- Node.js 18+\n- OpenAI or Anthropic API key\n\n### Backend\n\n```bash\ncd backend\npip install -r requirements.txt\ncp .env.example .env\n# Edit .env with your API keys\npython run.py\n# Server runs at http:\u002F\u002Flocalhost:9849\n```\n\n### Frontend\n\n```bash\ncd frontend\nnpm install\nnpm run dev\n# Opens at http:\u002F\u002Flocalhost:5273\n```\n\n### Docker\n\n```bash\ndocker-compose up -d\n# Frontend: http:\u002F\u002Flocalhost:5273\n# Backend API: http:\u002F\u002Flocalhost:9849\n```\n\n### Windows (No Docker)\n\nDouble-click `start.bat` or run in PowerShell:\n\n```powershell\n.\\start.ps1\n```\n\nThis will install dependencies and start both services:\n- Backend: http:\u002F\u002Flocalhost:9849\n- Frontend: http:\u002F\u002Flocalhost:5273\n\n## 📁 Project Structure\n\n```\nPersonaLingo\u002F\n├── backend\u002F\n│   ├── app\u002F\n│   │   ├── data\u002F              # SQLite DB & JSON data files\n│   │   ├── db\u002F                # Database CRUD & schemas\n│   │   ├── models\u002F            # Pydantic models\n│   │   ├── routers\u002F           # API route handlers\n│   │   ├── services\u002F          # Core business 
logic\n│   │   │   ├── llm_adapter.py        # Multi-provider LLM interface\n│   │   │   ├── corpus_generator.py   # 5-step generation pipeline\n│   │   │   ├── corpus_rag.py         # QMD RAG engine (BM25+TF-IDF+RRF)\n│   │   │   ├── qmd_engine.py         # QMD 3-layer engine (Q\u002FM\u002FD)\n│   │   │   ├── conversation_engine.py # Chat with style learning\n│   │   │   ├── note_generator.py     # Notes & mindmap generation\n│   │   │   ├── material_parser.py    # File upload processing\n│   │   │   ├── topic_manager.py      # Topic bank management\n│   │   │   ├── skill_exporter.py     # Export to MD\u002FJSON\n│   │   │   └── token_manager.py      # Token counting & limits\n│   │   ├── config.py          # App configuration\n│   │   ├── database.py        # Async DB setup\n│   │   └── main.py            # FastAPI app entry\n│   ├── .env.example\n│   ├── Dockerfile\n│   ├── requirements.txt\n│   └── run.py\n├── frontend\u002F\n│   ├── src\u002F\n│   │   ├── api\u002F               # API client\n│   │   ├── components\u002F        # Vue components\n│   │   │   ├── chat\u002F          # Chat interface\n│   │   │   ├── corpus\u002F        # Corpus management\n│   │   │   ├── notes\u002F         # Notes viewer\n│   │   │   ├── questionnaire\u002F # User profiling\n│   │   │   └── topics\u002F        # Topic browser\n│   │   ├── router\u002F            # Vue Router\n│   │   ├── stores\u002F            # Pinia state management\n│   │   └── views\u002F             # Page views\n│   ├── Dockerfile\n│   ├── nginx.conf\n│   └── package.json\n├── skills\u002F                    # Exported AI agent skills\n├── docker-compose.yml\n└── README.md\n```\n\n## 🎯 Core Workflows\n\n### 1. Corpus Generation (Three-Stage Distill · v3.0)\n\n> **v3.0 Upgrade**: Inspired by `huashu-nuwa`'s three-stage pattern (Deep Research → Thinking Framework → Runnable Skill), the distill pipeline is front-loaded with two extra stages. 
The original 5 steps expand into **7 steps**, and a new \"Runnable Skill Pack\" is produced as the third-stage artifact. Stage 1\u002F2 failures gracefully fall back to the legacy 5-step path (backward compatible).\n\n```\nQuestionnaire + Materials + Conversations + Topics\n  → [Stage 1] Deep Research (learner_profile)\n  → [Stage 2] Capability Framework distillation\n  → [Stage 3] User Persona → Anchor Stories → Topic Bridges\n             → Vocabulary Upgrade → Pattern Templates\n  → [Delivery] Corpus + Runnable Skill Pack (4 artifacts)\n```\n\n**Three-Stage API**: `POST \u002Fapi\u002Fdistill\u002Fdiagnose` · `POST \u002Fapi\u002Fdistill\u002Frun` · `GET \u002Fapi\u002Fdistill\u002Fskill\u002F{id}\u002Frunnable[\u002Fdownload]`\n\n### 2. Conversation Maintenance\n\n```\nUser Message → RAG Context Retrieval → LLM Response\n→ Corpus Extraction → Style Learning → Corpus Update\n```\n\n### 3. Skill Export\n\n```\nCorpus Data → Workflow Documentation → MD\u002FJSON Export → AI Agent Integration\n```\n\n## 📄 License\n\nMIT\n\n## 🙏 Acknowledgments\n\n- Distillation pipeline architecture inspired by [nuwa-skill](https:\u002F\u002Fgithub.com\u002Falchaincyf\u002Fnuwa-skill) — the \"Deep Research → Mental Framework → Runnable Skill\" paradigm for distilling human expertise into AI-native skill packages.\n","PersonaLingo is an AI-based personalized IELTS speaking corpus generator that uses a seven-stage pipeline to produce a schema-validated corpus JSON file and a static profile site. Its core features include an LLM-driven five-step smart corpus generation flow, seamless switching between OpenAI and Anthropic, and a distinctive QMD retrieval-augmented generation engine (combining query expansion, multi-signal matching, and LLM reranking). It also provides a dynamic topic bank, NotebookLM-style conversational corpus maintenance, material upload and parsing, and auto-generated learning notes and mind maps, and it outputs differentiated learning strategies for different target band scores. The project suits learners who need to raise their IELTS speaking score, especially those who want personalized content to prepare more efficiently.","2026-06-11 04:02:50","CREATED_QUERY"]