[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81049":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":11,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":14,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":15,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":16,"fork":16,"defaultBranch":17,"hasWiki":18,"hasPages":16,"topics":19,"createdAt":9,"pushedAt":9,"updatedAt":20,"readmeContent":21,"aiSummary":22,"trendingCount":14,"starSnapshotCount":14,"syncStatus":23,"lastSyncTime":24,"discoverSource":25},81049,"ds-trainer","fpolles\u002Fds-trainer","fpolles","Fully offline CLI tool to practice data science technical interviews. Covers SQL, Python\u002FPandas, statistics, ML, algorithms, probability, and case studies.       Multiple question types including auto-evaluated code and SQL. Filter by domain and difficulty. Python 3.10+.",null,"Python",28,4,1,0,39.1,false,"main",true,[],"2026-06-12 04:01:31","# ds-trainer\n\nPractice data scientist technical assessments — fully offline.\n\n---\n\n## Quick start\n\n```bash\n# 1. Create and activate a virtual environment\npython3 -m venv .venv\nsource .venv\u002Fbin\u002Factivate   # Windows: .venv\\Scripts\\activate\n\n# 2. Install (editable, so you can add your own questions later)\npip install -e .\n\n# 3. Run a default 5-question session across all domains\nds-trainer train\n\n# 4. 
Focus on what you need\nds-trainer train --domain sql --difficulty hard --count 3\nds-trainer train --type fill_in_code --domain python\n```\n\n> **No install?** If you skip `pip install -e .`, you can still run the tool from inside the `logic` directory using:\n> ```bash\n> python -m ds_trainer train\n> ```\n> Note: the dependencies (`rich`, `pandas`, `numpy`) must still be installed for this to work.\n\n---\n\n## What it covers\n\nCompanies that hire data scientists typically test these areas:\n\n| Domain | Topics covered |\n|---|---|\n| **SQL** | JOINs, aggregations, HAVING, window functions (RANK, ROW_NUMBER, moving average), CTEs, correlated subqueries |\n| **Python \u002F Pandas** | Boolean indexing, missing data imputation, `groupby`\u002F`transform`, rolling windows, `pivot_table`, MultiIndex |\n| **Statistics** | p-values, CLT, A\u002FB test design, two-sample t-test, multiple comparisons, bootstrap CI, bias-variance tradeoff |\n| **ML** | Overfitting, class imbalance, cross-validation, feature importance, gradient boosting, Ridge + GridSearchCV, data leakage |\n| **Algorithms** | Two-sum, sliding window, binary search, group anagrams, max profit (single pass) |\n| **Case Studies** | North Star metric, DAU drops, funnel analysis, churn prediction lifecycle, RFM segmentation, cold-start |\n| **Probability** | Bayes' theorem, base-rate fallacy, expected value, independence, Monty Hall, birthday problem, Markov chains, law of total expectation |\n\n**Difficulty levels:** easy \u002F medium \u002F hard (filter with `--difficulty`)\n\n---\n\n## Exercise types\n\n| Type | How to answer |\n|---|---|\n| `multiple_choice` | Type a letter: `A`, `B`, `C`, or `D` |\n| `fill_in_code` | Paste a complete function body; submit with a blank line |\n| `sql_challenge` | Write a SQL query; submit with a blank line |\n| `explain_concept` | Write your answer in plain text; submit with a blank line |\n| `take_home` | A dataset CSV is generated locally; work in your editor, press 
Enter when done |\n\nPython code and SQL queries are evaluated automatically. Concept questions and take-home projects are self-graded — compare your answer to the model solution shown after you submit.\n\n---\n\n## Navigation keys\n\nDuring any session:\n\n| Key | Action |\n|---|---|\n| `h` | Show the next available hint |\n| `s` | Skip this question (counts as wrong) |\n| `q` | Quit the session early (summary is still shown) |\n\n---\n\n## Full CLI reference\n\n```\nds-trainer train   [--domain DOMAIN] [--difficulty LEVEL] [--type TYPE]\n                   [--count N] [--no-shuffle]\nds-trainer list    [--domain DOMAIN] [--difficulty LEVEL] [--type TYPE]\nds-trainer stats\nds-trainer --version\n```\n\n**Options:**\n\n| Flag | Values | Default |\n|---|---|---|\n| `--domain`, `-d` | `sql` `python` `statistics` `ml` `algorithms` `case_studies` `probability` `all` | `all` |\n| `--difficulty`, `-l` | `easy` `medium` `hard` `all` | `all` |\n| `--type`, `-t` | `multiple_choice` `fill_in_code` `explain_concept` `sql_challenge` `take_home` `all` | `all` |\n| `--count`, `-n` | integer | `5` |\n| `--no-shuffle` | flag | (shuffle by default) |\n\n**Examples:**\n\n```bash\n# See what questions are available before committing\nds-trainer list --domain ml --difficulty medium\n\n# Check how many questions exist per domain × difficulty\nds-trainer stats\n\n# Pure coding interview prep\nds-trainer train --type fill_in_code --count 10\n\n# SQL-only hard grind\nds-trainer train --domain sql --difficulty hard --no-shuffle\n```\n\n---\n\n## Architecture\n\n```\nds_trainer\u002F\n├── cli.py           # argparse entry point; dispatches train \u002F list \u002F stats\n├── models.py        # Question, Session, SessionResult dataclasses + enums\n├── registry.py      # load_all(), filter_questions(), sample_questions()\n├── runner.py        # interactive loop, render_question(), evaluate(), show_summary()\n├── evaluators.py    # eval_code() and eval_sql() — sandboxed execution engines\n├── 
domains\u002F\n│   ├── sql.py           # 12 questions\n│   ├── python_pandas.py # 10 questions\n│   ├── statistics.py    # 10 questions\n│   ├── ml.py            # 10 questions\n│   ├── algorithms.py    # 8 questions\n│   └── case_studies.py  # 9 questions\n└── data\u002F\n    └── generators.py    # CSV dataset generators for take-home exercises\n```\n\n**Key design decisions:**\n\n- **Single flat `Question` dataclass** — no inheritance hierarchy; the `ExerciseType` enum carries the type tag and `__post_init__` validates required fields per type via `match\u002Fcase`.\n- **Sandboxed code execution** — `eval_code` uses `exec()` with a whitelist of ~30 safe builtins (no `open`, no `__import__`). A `threading.Thread` with a 10-second timeout prevents infinite loops.\n- **In-memory SQLite** — `eval_sql` runs both the user query and the model answer against a fresh `sqlite3.connect(\":memory:\")` and compares result sets as `frozenset[tuple]` (order-independent).\n- **Marker dicts** — pandas DataFrames and sklearn arrays in test cases are stored as serialisable dicts (`{\"__pandas_df__\": True, \"data\": {...}}`) and resolved to real objects just before evaluation.\n- **Plain Python question banks** — questions are `Question(...)` literals in domain files, not YAML\u002FJSON; they diff cleanly and are syntax-highlighted in any editor.\n\n---\n\n## Development setup\n\n```bash\npip install -e \".[dev]\"\n\n# Run all tests\npytest tests\u002F -v\n\n# With coverage\npytest tests\u002F --cov=ds_trainer --cov-report=term-missing\n```\n\nPython 3.10+ required (uses `match\u002Fcase` throughout).\n\n---\n\n## Contributing questions\n\n1. Open the relevant domain file under `ds_trainer\u002Fdomains\u002F`.\n2. Add a `Question(...)` literal to the `QUESTIONS` list.\n3. Follow the ID convention: `{domain_abbrev}_{difficulty_abbrev}_{serial}` — e.g. `sql_m_007`.\n4. Run `ds-trainer stats` to confirm the new question appears.\n5. 
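Run `pytest tests\u002F`; the domain smoke tests will verify that the model answer passes.\n\nFor **FILL_IN_CODE** questions, always include `test_cases` so the answer can be auto-evaluated. The `model_answer` field is shown to the user after they submit and is also used by the test suite to verify correctness.\n\nFor **TAKE_HOME** questions, add a generator to `ds_trainer\u002Fdata\u002Fgenerators.py` decorated with `@register(\"your_key\")` and reference it via `dataset_generator=\"your_key\"`.\n\n---\n\n## License\n\nMIT\n","ds-trainer is a fully offline command-line tool for practicing the technical questions asked in data science interviews, covering SQL, Python\u002FPandas, statistics, machine learning, algorithms, probability, and case studies. It supports multiple question types, including auto-evaluated code and SQL queries, and lets users filter questions by domain and difficulty. It is suited to preparing for the technical assessment stage of data scientist interviews, building skills through hands-on practice. It is built on Python 3.10+, is easy to install, and can be extended with custom questions.",2,"2026-06-11 04:03:19","CREATED_QUERY"]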