[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80881":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":12,"lastSyncTime":27,"discoverSource":28},80881,"datacontext","data-context-hq\u002Fdatacontext","data-context-hq","Open-source runtime attribution and context observability for data access by AI agents and applications",null,"Python",38,2,1,0,3,4,9,1.43,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:04:08","[![Tests](https:\u002F\u002Fgithub.com\u002Fdata-context-hq\u002Fdatacontext\u002Factions\u002Fworkflows\u002Ftests.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fdata-context-hq\u002Fdatacontext\u002Factions\u002Fworkflows\u002Ftests.yml)\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fdatacontext?cacheSeconds=3600)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fdatacontext\u002F)\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fdatacontext?cacheSeconds=3600)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fdatacontext\u002F)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fl\u002Fdatacontext?cacheSeconds=3600)](https:\u002F\u002Fgithub.com\u002Fdata-context-hq\u002Fdatacontext\u002Fblob\u002Fmain\u002FLICENSE)\n[![Discussions](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGitHub-Discussions-2ea44f)](https:\u002F\u002Fgithub.com\u002Fdata-context-hq\u002Fdatacontext\u002Fdiscussions)\n[![Roadmap](https:\u002F\u002Fim
g.shields.io\u002Fbadge\u002FRoadmap-DataContext-blue)](https:\u002F\u002Fgithub.com\u002Fdata-context-hq\u002Fdatacontext\u002Fblob\u002Fmain\u002FROADMAP.md)\n# DataContext\n#### Runtime attribution for data access in Python\n\n[Why](#why-datacontext) | [How It Works](#how-it-works) | [Quick Start](#quick-start) | [Event Shape](#event-shape) | [Production Behavior](#production-behavior) | [Roadmap](https:\u002F\u002Fgithub.com\u002Fdata-context-hq\u002Fdatacontext\u002Fblob\u002Fmain\u002FROADMAP.md)\n\nDataContext helps developers answer a simple question:\n    \n> Which code path, request, job, or agent caused this query?\n\nDataContext gives developers and platform teams more context for understanding data access patterns and improving how production services use databases and data platforms.\n\nDataContext is early and intentionally small. The core event model is designed to stay stable, while integrations and APIs will evolve with real-world usage.\n\n## Install\n\n```bash\npip install datacontext\n```\n\nOptional OpenTelemetry support:\n\n```bash\npip install \"datacontext[otel]\"\n```\n\nOptional SQLAlchemy support:\n\n```bash\npip install \"datacontext[sqlalchemy]\"\n```\n\nOptional PostgreSQL support:\n\n```bash\npip install \"datacontext[postgres]\"\n```\n\nOptional BigQuery support:\n\n```bash\npip install \"datacontext[bigquery]\"\n```\n\nOptional Dagster support:\n\n```bash\npip install \"datacontext[dagster]\"\n```\n\nOptional Snowflake support:\n\n```bash\npip install \"datacontext[snowflake]\"\n```\n\nOptional dbt support:\n\n```bash\npip install \"datacontext[dbt]\"\n```\n\n## Quick Start\n\nConfigure DataContext at an explicit data-access boundary:\n\n```python\nimport datacontext\n\ndatacontext.configure(\n    service_name=\"checkout-api\",\n    environment=\"production\",\n    instruments=[\n        datacontext.instrument_function(\n            target=\"app.db.execute\",\n            query_arg=\"query\",\n            db_system=\"postgres\",\n  
          client=\"internal-db-wrapper\",\n        )\n    ],\n)\n```\n\nAfter configuration, calls to `app.db.execute(...)` emit one completed query event when the function returns or raises.\n\nWrappers preserve return values and re-raise original exceptions unchanged. If DataContext fails, your application should not.\n\nEmitted event:\n\n```json\n{\n  \"event_name\": \"datacontext.query\",\n  \"timestamp\": \"2026-05-15T10:31:04.203Z\",\n  \"started_at\": \"2026-05-15T10:31:04.182Z\",\n  \"ended_at\": \"2026-05-15T10:31:04.203Z\",\n  \"service_name\": \"checkout-api\",\n  \"environment\": \"production\",\n  \"db_system\": \"postgres\",\n  \"client\": \"internal-db-wrapper\",\n  \"query_fingerprint\": \"sha256:4f5b7f...\",\n  \"query_text\": \"select * from orders where id = ?\",\n  \"duration_ms\": 21.4,\n  \"callsite\": {\n    \"file\": \"checkout.py\",\n    \"path\": \"\u002Fapp\u002Fcheckout.py\",\n    \"line\": 42,\n    \"function\": \"load_cart\",\n    \"stack\": \"checkout:42 load_cart -> routes:88 post_checkout\"\n  },\n  \"status\": \"ok\"\n}\n```\n\n## Why DataContext?\n\nQueries often lose their application context by the time they reach logs, traces, or the data platform itself.\n\nThat makes it hard to answer:\n\n- Which request, job, or agent triggered this query?\n- Which code path caused this unexpected load?\n- Which actor, tenant, or session was involved?\n\nDataContext connects query events to runtime context, source callsites, and OpenTelemetry trace context when available.\n\n## How It Works\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fdata-context-hq\u002Fdatacontext\u002Fmain\u002Fassets\u002Fdatacontext-flow.svg\" alt=\"DataContext query attribution flow\" width=\"680\">\n\u003C\u002Fp>\n\n## Supported Today\n\nDataContext currently supports:\n\n- manual query instrumentation with `trace_query(...)` and `capture_query(...)`,\n- wrapping explicit data-access functions with 
`instrument_function(...)`,\n- SQLAlchemy engine instrumentation through the optional `sqlalchemy` extra,\n- native PostgreSQL connection instrumentation through the optional `postgres` extra,\n- native BigQuery client instrumentation through the optional `bigquery` extra,\n- Dagster execution context attribution through the optional `dagster` extra,\n- dbt execution context attribution through the optional `dbt` extra,\n- native Snowflake connector instrumentation through the optional `snowflake` extra,\n- JSONL, callback, and OpenTelemetry-oriented sinks,\n- correlating query events with runtime context and active OpenTelemetry spans.\n\nOther database drivers are not automatically instrumented yet.\n\n## Planned Integrations\n\nOther database clients, ORMs, and data-platform libraries will be prioritized from real usage.\n\nUse [GitHub Discussions](https:\u002F\u002Fgithub.com\u002Fdata-context-hq\u002Fdatacontext\u002Fdiscussions) or [feature requests](https:\u002F\u002Fgithub.com\u002Fdata-context-hq\u002Fdatacontext\u002Fissues\u002Fnew?template=feature_or_integration_request.md) to share the library, data-access pattern, sync\u002Fasync behavior, and event fields you need.\n\n## Add Runtime Context\n\nDataContext is most useful when queries are connected to runtime context:\n\n```python\nfrom datacontext import context\n\nwith context.use(\n    operation=\"checkout\",\n    actor=\"user:123\",\n    request_id=\"req_abc\",\n    attributes={\"tenant\": \"acme\", \"region\": \"us-east-1\"},\n):\n    run_business_logic()\n```\n\nAny query captured inside the context includes that attribution.\n\n## Event Shape\n\nDataContext emits one final event per query, at finish or error time.\n\nEvery normal event includes:\n\n- `event_name`, `timestamp`, `started_at`, `ended_at`,\n- `service_name`, `environment`, `db_system`, `client`,\n- `query_fingerprint`, `duration_ms`, `callsite`, and `status`.\n\nThe `timestamp` is the event finish time and matches `ended_at`. 
By default, events also include sanitized `query_text`; it can be disabled globally or per captured query. Optional fields are only present when DataContext can derive them or when the caller supplies them.\n\nExample `datacontext.query` event:\n\n```json\n{\n  \"event_name\": \"datacontext.query\",\n  \"timestamp\": \"2026-05-15T10:31:04.203Z\",\n  \"started_at\": \"2026-05-15T10:31:04.182Z\",\n  \"ended_at\": \"2026-05-15T10:31:04.203Z\",\n  \"service_name\": \"checkout-api\",\n  \"environment\": \"production\",\n  \"db_system\": \"postgres\",\n  \"client\": \"internal-db-wrapper\",\n  \"query_fingerprint\": \"sha256:4f5b7f...\",\n  \"query_text\": \"select * from orders where id = ?\",\n  \"duration_ms\": 21.4,\n  \"callsite\": {\n    \"file\": \"checkout.py\",\n    \"path\": \"\u002Fapp\u002Fcheckout.py\",\n    \"line\": 42,\n    \"function\": \"load_cart\",\n    \"stack\": \"checkout:42 load_cart -> routes:88 post_checkout\"\n  },\n  \"status\": \"ok\",\n  \"trace_id\": \"0af7651916cd43dd8448eb211c80319c\",\n  \"span_id\": \"b7ad6b7169203331\",\n  \"trace_flags\": \"01\",\n  \"operation\": \"checkout\",\n  \"actor\": \"user:123\",\n  \"request_id\": \"req_abc\",\n  \"job_id\": \"job_456\",\n  \"session_id\": \"sess_789\",\n  \"rows\": 12,\n  \"db_name\": \"checkout\",\n  \"db_host\": \"postgres.internal\",\n  \"attributes\": {\n    \"tenant\": \"acme\",\n    \"region\": \"us-east-1\"\n  }\n}\n```\n\nOn errors, DataContext emits `status: \"error\"` and includes compact error metadata before re-raising the original exception.\n\n```json\n{\n  \"status\": \"error\",\n  \"error\": {\n    \"type\": \"ValueError\",\n    \"message\": \"boom\"\n  }\n}\n```\n\n## Production Behavior\n\nDataContext is designed to sit on production data-access paths without changing application behavior:\n\n- wrappers preserve return values and re-raise original exceptions,\n- DataContext capture failures fall back to a minimal event,\n- sink failures are logged and dropped,\n- sanitized 
`query_text` is emitted by default, while raw SQL is explicit opt-in,\n- OpenTelemetry trace context is used when present, but DataContext does not configure tracing or exporters.\n\n## Schema Philosophy\n\nDataContext uses a small, stable event shape on purpose.\n\nThe core schema answers the questions teams usually need first:\n\n- what query shape ran,\n- where it came from in code,\n- which runtime context caused it,\n- which trace or span it belongs to.\n\nThe schema is meant to work as JSON logs, warehouse rows, debugging artifacts, or observability events. Team-specific metadata belongs in `attributes`, so teams can extend events without changing the common attribution layer.\n\n## Manual Instrumentation\n\nThe Quick Start approach is the recommended default: configure DataContext once and wrap your existing data-access function. When that does not fit, you can instrument directly at the call site with the lower-level APIs:\n\n```python\nwith datacontext.trace_query(\n    db_system=\"postgres\",\n    client=\"internal-db-wrapper\",\n    query=query,\n):\n    db.execute(query)\n```\n\nUse `capture_query(...)` when timing is already measured by your integration:\n\n```python\ndatacontext.capture_query(\n    db_system=\"postgres\",\n    client=\"internal-db-wrapper\",\n    query=query,\n    started_at=started_at,\n    ended_at=ended_at,\n    duration_ms=duration_ms,\n    status=\"ok\",\n    rows=12,\n)\n```\n\n## SQLAlchemy\n\nSQLAlchemy support is optional and only installed with the `sqlalchemy` extra. Pass an engine to `instrument_sqlalchemy(...)` during configuration:\n\n```python\nimport datacontext\n\ndatacontext.configure(\n    service_name=\"checkout-api\",\n    environment=\"production\",\n    instruments=[\n        datacontext.instrument_sqlalchemy(engine),\n    ],\n)\n```\n\nThe integration listens to SQLAlchemy engine events and emits one DataContext event for each completed or failed statement. 
It also supports async engines by registering listeners on the underlying sync engine.\n\n## PostgreSQL\n\nPostgreSQL support is optional and only installed with the `postgres` extra. It instruments a `psycopg` connection by wrapping connection-level `execute(...)` calls and cursors returned by `cursor()`.\n\n```python\nimport datacontext\nimport psycopg\n\nconn = psycopg.connect(\"postgresql:\u002F\u002Fcheckout@postgres.internal\u002Fcheckout\")\n\ndatacontext.configure(\n    service_name=\"checkout-api\",\n    environment=\"production\",\n)\ndatacontext.instrument_postgres(conn).apply()\n\nwith conn.cursor() as cursor:\n    cursor.execute(\"select * from orders where id = %s\", [order_id])\n```\n\nThe integration emits one DataContext event per completed or failed `execute(...)` or `executemany(...)` call. Events use `db_system: \"postgresql\"`, `client: \"psycopg\"`, and include `db_name`, `db_host`, and `rows` when available from the connection or cursor.\n\n## BigQuery\n\nBigQuery support is optional and only installed with the `bigquery` extra. Pass a `google.cloud.bigquery.Client` to `instrument_bigquery(...)` during configuration:\n\n```python\nfrom google.cloud import bigquery\nimport datacontext\n\nclient = bigquery.Client(project=\"analytics-prod\")\n\ndatacontext.configure(\n    service_name=\"warehouse-loader\",\n    environment=\"production\",\n    instruments=[\n        datacontext.instrument_bigquery(\n            client,\n            labels={\"service\": \"warehouse-loader\"},\n            job_id_prefix=\"warehouse_loader_\",\n        ),\n    ],\n)\n```\n\nThe integration instruments `Client.query_and_wait(...)` and `Client.query(...)`. For `query(...)`, DataContext emits the event when the returned job's `result()` method completes or raises, so the duration follows the waited query rather than only job submission. 
Captured events use `db_system: \"bigquery\"`, `client: \"google-cloud-bigquery\"`, the client project as `db_name`, and BigQuery job metadata under `attributes`.\n\nBigQuery job labels and `job_id_prefix` are opt-in. When configured, labels are injected through `QueryJobConfig`; if the call already passed a `job_config`, DataContext merges labels into it and user-defined labels win on matching keys. `job_id_prefix` is injected for `Client.query(...)` only if the call did not already pass `job_id` or `job_id_prefix`.\n\n## Dagster\n\nDagster support is optional and only installed with the `dagster` extra. DataContext does not replace Dagster observability, materializations, asset lineage, or run state. Dagster remains the source of truth for orchestration identity; DataContext adds Dagster metadata to query events emitted inside assets and ops.\n\nUse the dependency-free context bridge inside a Dagster asset or op:\n\n```python\nimport datacontext as dc\n\n@asset\ndef orders(context):\n    with dc.use_dagster_context(context):\n        run_queries()\n```\n\nWhen Dagster is installed, you can also use the native resource:\n\n```python\nfrom datacontext import DataContextResource\n\n@asset\ndef orders(context, datacontext: DataContextResource):\n    with datacontext.use_context(context):\n        run_queries()\n```\n\nCaptured queries include the Dagster run id as `job_id`, the asset key or op name as `operation`, and Dagster details under `attributes` such as `dagster.run_id`, `dagster.job_name`, `dagster.op_name`, `dagster.asset_key`, and `dagster.partition_key`. Dagster run tags are included only when `include_run_tags=True`.\n\n## Snowflake\n\nSnowflake connector support is optional and only installed with the `snowflake` extra. 
Configure it once before creating or using cursors:\n\n```python\nimport snowflake.connector\n\nimport datacontext\n\ndatacontext.configure(\n    service_name=\"analytics-worker\",\n    environment=\"production\",\n    instruments=[\n        datacontext.instrument_snowflake(),\n    ],\n)\n\nconn = snowflake.connector.connect(\n    account=\"acme-prod\",\n    user=\"loader\",\n    password=\"...\",\n    warehouse=\"analytics_wh\",\n    database=\"analytics\",\n    schema=\"public\",\n)\n\ncursor = conn.cursor()\ncursor.execute(\"select count(*) from orders\")\n```\n\nThe integration wraps `snowflake-connector-python` cursor `execute`, `executemany`, and `execute_async`. It emits `db_system: \"snowflake\"`, `client: \"snowflake-connector-python\"`, `rows` from `cursor.rowcount` when available, and Snowflake metadata under `attributes`, including `snowflake.query_id` from `cursor.sfqid`.\n\nRicher Snowflake cost and performance metrics, such as bytes scanned, partitions scanned, execution time, spill bytes, load percent, and cloud-services credits, come from Snowflake Query History. DataContext does not query Query History inside the synchronous cursor wrapper; join those metrics later by `attributes.snowflake.query_id`.\n\n## dbt\n\ndbt support is optional and only installed with the `dbt` extra. DataContext does not replace dbt artifacts, exposures, lineage, or run results. 
dbt remains the source of truth for transformation identity; DataContext adds dbt metadata to query events emitted inside Python models or other dbt-adjacent execution code.\n\nUse the dependency-free context bridge inside a dbt Python model:\n\n```python\nimport datacontext as dc\n\ndef model(dbt, session):\n    with dc.use_dbt_context(dbt):\n        return run_queries(session)\n```\n\nCaptured queries include the dbt invocation id as `job_id`, the model unique id or relation as `operation`, and dbt details under `attributes` such as `dbt.invocation_id`, `dbt.node.unique_id`, `dbt.node.name`, `dbt.node.resource_type`, `dbt.node.package_name`, `dbt.this`, and `dbt.target.name`.\n\n## Privacy and Query Text\n\nDataContext emits `query_fingerprint` and sanitized `query_text` by default. Raw query text is not emitted unless you explicitly opt in.\n\nTo emit only the fingerprint without sanitized query text, disable query text:\n\n```python\ndatacontext.configure(\n    service_name=\"checkout-api\",\n    environment=\"production\",\n    include_query_text=False,\n)\n```\n\nThe sanitizer uses the same normalization as fingerprinting: it replaces string and numeric literals with `?`, normalizes whitespace, lowercases SQL, and compacts placeholder `IN (...)` lists.\n\nTo include exact raw SQL instead, use the explicit raw-query option:\n\n```python\ndatacontext.capture_query(\n    db_system=\"postgres\",\n    client=\"internal-db-wrapper\",\n    query=query,\n    started_at=started_at,\n    ended_at=ended_at,\n    duration_ms=duration_ms,\n    status=\"ok\",\n    include_raw_query_text=True,\n)\n```\n\n## OpenTelemetry\n\nDataContext uses OpenTelemetry context when it exists. It does not set up tracing, choose exporters, or replace your existing pipeline.\n\nWith an active span, DataContext adds `trace_id`, `span_id`, and `trace_flags` to emitted events. 
It can also attach compact `datacontext.*` attributes to the active span, including query fingerprint, status, duration, operation, and request ID.\n\n## Sinks\n\nThe default sink writes JSON Lines to stdout. You can send events to a file, a callback, or an OpenTelemetry-oriented sink.\n\nConfigure a file sink:\n\n```python\nimport datacontext\nfrom datacontext.sinks import FileJsonlSink\n\ndatacontext.configure(\n    service_name=\"checkout-api\",\n    environment=\"production\",\n    sink=FileJsonlSink(\"datacontext.jsonl\"),\n)\n```\n\nConfigure a callback sink:\n\n```python\nfrom datacontext.sinks import CallbackSink\n\ndatacontext.configure(\n    service_name=\"checkout-api\",\n    environment=\"production\",\n    sink=CallbackSink(lambda event: send_to_pipeline(event)),\n)\n```\n\nSink failures are dropped and logged. They should not block application work.\n\n## Community\n\nUse [GitHub Discussions](https:\u002F\u002Fgithub.com\u002Fdata-context-hq\u002Fdatacontext\u002Fdiscussions) for questions, design feedback, and integration ideas.\n\nUse [GitHub Issues](https:\u002F\u002Fgithub.com\u002Fdata-context-hq\u002Fdatacontext\u002Fissues) for bugs and focused feature requests.\n\n## License\n\nApache-2.0\n","DataContext is an open-source project that provides runtime attribution and context observability for data access by AI agents and applications. Once configured at a data-access boundary, it tracks and records what initiated each query (such as a code path, request, job, or agent), helping developers and platform teams better understand data access patterns and improve how efficiently production services use databases and data platforms. It supports multiple database systems, including PostgreSQL and BigQuery, and can integrate with tools such as OpenTelemetry to strengthen monitoring. It suits use cases that need deep insight into and control over their data traffic, especially enterprise applications that rely on complex data-processing pipelines.","2026-06-11 04:02:40","CREATED_QUERY"]