[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2263":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":15,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":16,"starSnapshotCount":16,"syncStatus":32,"lastSyncTime":33,"discoverSource":34},2263,"ontobricks","databrickslabs\u002Fontobricks","databrickslabs","OntoBricks is a web application that transforms Databricks tables into a materialized knowledge graph. It lets you design ontologies (OWL), map them to Unity Catalog tables via R2RML, materialize triples into a Delta triple store and graph DB, reason over the graph (OWL 2 RL, SWRL, SHACL), and query it through an auto-generated GraphQL API + MCP","",null,"Python",146,29,9,6,0,10,38,18,65.23,"Other",false,"master",[25,26,27,28],"graph","ontology","owl","triple-store","2026-06-12 04:00:14","\u003Cp align=\"center\">\n  \u003Cimg src=\"src\u002Ffront\u002Fstatic\u002Fglobal\u002Fimg\u002Fontobricks-icon.svg\" alt=\"OntoBricks Logo\" width=\"120\" height=\"120\">\n\u003C\u002Fp>\n\n\u003Ch1 align=\"center\">OntoBricks 0.4.0\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>Digital Twin Builder for Databricks\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10+-blue.svg\" alt=\"Python\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Ffastapi-0.109+-green.svg\" alt=\"FastAPI\">\n\u003C\u002Fp>\n\n## Project Description\n\nOntoBricks is a web application that 
transforms Databricks tables into a materialized knowledge graph. It lets you design ontologies (OWL), map them to Unity Catalog tables via R2RML, materialize triples into a Delta-backed triple store and a Lakebase Postgres graph engine, reason over the graph (OWL 2 RL, SWRL, SHACL), and query it through an auto-generated GraphQL API. The entire pipeline — from metadata import to a queryable knowledge graph — can run in four clicks using LLM-powered automation.\n\n## Project Support\n\nPlease note that all projects in the \u002Fdatabrickslabs github account are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.\n\nAny issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.\n\n## Building the Project\n\nOntoBricks uses [uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F) for dependency management. All dependencies are declared in `pyproject.toml`.\n\n```bash\n# Clone the repository\ngit clone \u003Crepository-url>\ncd OntoBricks\n\n# Install dependencies (uv resolves them from pyproject.toml)\nuv sync\n\n# Or use the setup script\nscripts\u002Fsetup.sh\n```\n\n### Prerequisites\n\n- Python 3.10 or higher\n- Databricks workspace access (Databricks Apps must be enabled). Local\n  development uses a Personal Access Token; production uses the App's\n  service principal.\n- A SQL Warehouse (you'll need its ID for local dev).\n- **Databricks Lakebase Autoscaling** project + branch + Postgres\n  database — **required since v0.4.0** for the domain registry\n  (domains, versions, permissions, schedules, global config) and the\n  Graph DB triple store. Provisioned Lakebase instances are **not**\n  supported. 
The Postgres driver (`psycopg[binary]` + `psycopg-pool`)\n  is declared as an optional dependency so volume-only forks can opt\n  out — install with `uv sync --extra lakebase` for any normal\n  deployment.\n- **Unity Catalog Volume** in the catalog\u002Fschema that hosts the\n  triplestore VIEWs (`triplestore_\u003Cdomain>_v\u003Cn>`). The volume is\n  reserved for binary artefacts (`documents\u002F` uploads — domain-scoped\n  attachments imported by the ontology designer).\n- `psql` (libpq client) on `PATH` for the Lakebase permission\n  bootstrap scripts (`brew install libpq && brew link --force libpq`\n  on macOS).\n\n## Deploying \u002F Installing the Project\n\n### Local Development\n\n```bash\n# Configure credentials\ncp .env.example .env\n# Edit .env with your Databricks host, token, and warehouse ID\n\n# Start the application\nscripts\u002Fstart.sh\n# Open http:\u002F\u002Flocalhost:8000\n```\n\n### Deploy to Databricks Apps\n\n```bash\n# Install and authenticate the Databricks CLI (>= 0.250.0)\nbrew install databricks            # or curl -fsSL https:\u002F\u002Fdatabricks.com\u002Finstall.sh | sh\ndatabricks auth login --host https:\u002F\u002F\u003Cworkspace>\n\n# Edit scripts\u002Fdeploy.config.sh (warehouse, registry catalog\u002Fschema,\n# Lakebase project\u002Fbranch\u002Fdatabase — see the file header) and then:\nmake deploy\n# Or directly: scripts\u002Fdeploy.sh\n```\n\n`scripts\u002Fdeploy.sh` generates `app.yaml` from `app.yaml.template` +\n`scripts\u002Fdeploy.config.sh`, validates and deploys the DAB bundle on\ntarget `dev-lakebase`, runs `scripts\u002Fbootstrap-app-permissions.sh`\n(app SP `CAN_MANAGE` on itself), then runs\n`scripts\u002Fbootstrap-lakebase-perms.sh` on the registry \u002F graph \u002F sync\nschemas. 
All steps are idempotent.\n\nAfter the first deploy, bind the **sql-warehouse**, **volume**, and\n**postgres** (Lakebase) resources in the Databricks Apps UI\n(**Compute > Apps > \u003Cyour-app> > Resources**) if the DAB bind did\nnot take. Open the app and click **Settings > Registry > Initialize**\nto create the Lakebase schema; re-run `make bootstrap-lakebase` once\nafterwards so the freshly created schema picks up `USAGE\u002FDML`.\n\n> **Lakebase deploy targets.** Pick a Databricks Lakebase Autoscaling\n> project + branch and a Postgres database, then set the\n> `LAKEBASE_PROJECT`, `LAKEBASE_BRANCH`,\n> `LAKEBASE_DATABASE_RESOURCE_SEGMENT` (the `db-…` id from\n> `databricks postgres list-databases \"projects\u002F\u003Cid>\u002Fbranches\u002F\u003Cbranch>\" -o json`,\n> **not** the Postgres database name shown in the SQL UI), and\n> `LAKEBASE_REGISTRY_SCHEMA` defaults in `scripts\u002Fdeploy.config.sh`.\n> The DAB composes the full Apps `postgres.database` path and binds a\n> `postgres` Apps resource so the runtime auto-injects\n> `PGHOST` \u002F `PGPORT` \u002F `PGDATABASE` \u002F `PGUSER`; the app mints the\n> Lakebase JWT automatically (no user secret required).\n\n> **Upgrading from a pre-v0.4.0 deployment.** Pre-v0.4.0 stored the\n> entire registry as JSON on the Unity Catalog Volume. Run\n> `scripts\u002Fmigrate-registry-to-lakebase.sh` once before upgrading to\n> v0.4.0+ to copy every JSON-shaped artefact (domains, versions,\n> permissions, schedules, global config) into Lakebase. Binary\n> artefacts on the Volume are left untouched.\n\n> **First deploy only:** `make deploy` runs `scripts\u002Fbootstrap-app-permissions.sh` automatically, which grants each app's service principal `CAN_MANAGE` on itself. Without that grant the middleware cannot read the app's own ACL and every first-time visitor — including the deploying `CAN_MANAGE` user — lands on the access-denied page. 
If you deploy via `databricks bundle deploy` directly, run `make bootstrap-perms` once afterwards (it is idempotent).\n\nSee [Deployment Guide](docs\u002Fdeployment.md) for the full checklist including resource configuration and permissions.\n\n## Releasing the Project\n\n1. Ensure all tests pass: `make test`\n2. Update the version in `pyproject.toml`\n3. Commit, tag, and push:\n\n```bash\ngit add -A && git commit -m \"Release vX.Y.Z\"\ngit tag vX.Y.Z\ngit push origin main --tags\n```\n\n4. Deploy the new version: `make deploy`\n\n## Using the Project\n\n### Automated Pipeline (4 clicks)\n\n| Step | Action | What Happens |\n|------|--------|--------------|\n| **1** | **Import Metadata** (Domain > Metadata) | Fetches table and column metadata from Unity Catalog |\n| **2** | **Generate Ontology** (Ontology > Wizard) | LLM designs entities, relationships, and attributes from your metadata |\n| **3** | **Auto-Map** (Mapping > Auto-Map) | LLM generates SQL mappings for every entity and relationship |\n| **4** | **Synchronize** (Digital Twin > Status) | Executes mappings and populates the triple store |\n\n### Domain & registry (0.1.2 UX)\n\n- **Ontology Designer** — the main ontology graph view lives under **Ontology → Designer** (visual canvas + AI Assistant).\n- **Domain Cockpit (Validation)** — **Active Version** shows which registry version is exposed via **API \u002F MCP**; it can differ from the version you have loaded in the editor.\n- **Registry → Browse** — only place to **set the Active (API\u002FMCP) version** for a domain; **Domain → Versions** shows that status as a read-only badge.\n- **New domain** — after **New Domain**, a full-page loading overlay runs until Domain Information finishes its first load.\n- **Domain Information** — triple-store \u002F snapshot \u002F local graph paths update when you **commit** the domain name (blur or change) or change version (aligned with naming rules before save).\n- **Duplicate names** — **Save to Unity Catalog** is 
blocked if the sanitized domain name already exists in the registry (inline check + confirmation before POST).\n- **Navbar** — domain name and version in the top bar refresh after load, save, clear, import, and version switches (browser cache invalidated on those actions).\n\n### Graph DB engine (Settings → Graph DB)\n\nThe **graph** triple-store backend is pluggable; the abstraction (`GraphDBFactory` \u002F `GraphDBBackend`) is preserved so additional engines can be added in the future. Today only one engine ships:\n\n- **Lakebase (Postgres)** — default; **three Postgres objects per domain version** (`*_sync` bulk-data table, `*__app` companion for reasoning\u002Fcohort writes, `g_\u003Cdom>_v\u003Cn>` UNION view for reads) inside a configurable Postgres schema on the **App-bound** Lakebase database (same connection as the optional Lakebase registry backend). Requires the `lakebase` extra (`uv sync --extra lakebase`) so `psycopg` is installed.\n\nEngine-specific options are stored as global JSON (`graph_engine_config`). For Lakebase the supported keys are **`database`** (optional override of `PGDATABASE`), **`schema`** (optional, default `ontobricks_graph`), **`sync_mode`** (`app_managed` default, or `managed_synced` to delegate bulk ingest to a Databricks Lakeflow snapshot pipeline), **`sync_table_mode`** (`snapshot` \u002F `triggered` \u002F `continuous` — `snapshot` is the recommended mode), **`sync_timeout_s`** (default 600), **`sync_uc_catalog`** (UC catalog the synced table is registered in; defaults to the snapshot Delta catalog when unset), and **`sync_uc_schema`** (UC schema segment for the synced-table FQN; defaults to the registry UC schema so the Lakeflow object lands in the same UC namespace as other registry artefacts). 
See `docs\u002Flakebase-graphdb.md` for the full reference.\n\n> **Lakebase permission grants (three schemas).** The app service principal needs `USAGE + DML` on up to three Postgres schemas — each covered by one run of `scripts\u002Fbootstrap-lakebase-perms.sh`:\n>\n> | Schema | When to run | Deploy config var |\n> |---|---|---|\n> | Registry schema (e.g. `ontobricks_registry`) | After `Settings → Registry → Initialize` | `LAKEBASE_BOOTSTRAP_SCHEMA` |\n> | Graph schema (e.g. `ontobricks_graph`) | After first Digital Twin `Build` | `LAKEBASE_GRAPH_SCHEMA` |\n> | Sync schema (e.g. `ontobricks`) | After first Lakeflow snapshot (`managed_synced` only) | `LAKEBASE_SYNC_SCHEMA` |\n>\n> `scripts\u002Fdeploy.sh` calls the bootstrap for all three automatically. If the Graph DB is on a **separate Lakebase instance** from the registry, set `LAKEBASE_GRAPH_PROJECT`, `LAKEBASE_GRAPH_BRANCH`, and `LAKEBASE_GRAPH_DATABASE` in `scripts\u002Fdeploy.config.sh` so the second and third grants target the correct instance.\n\n> **Lakebase build performance.** When the active engine is Lakebase, the Digital Twin build streams warehouse rows in `fetchmany` batches (`SQLWarehouse.iter_rows`) and ingests them via `COPY FROM STDIN` into a per-batch temp table followed by `INSERT … ON CONFLICT DO NOTHING` (and the symmetrical `DELETE … USING` for incremental removes). The FastAPI process never holds the full graph or the full diff: snapshot CTAS and `EXCEPT` execution stay warehouse-side, the app pipes one batch at a time. There is no Volume archive thread — Postgres is the system of record for the graph.\n\n> **Lakebase managed-synced mode.** When `graph_engine_config.sync_mode = \"managed_synced\"`, the bulk R2RML data movement is moved entirely off the app: a Databricks Lakeflow snapshot pipeline keeps a Postgres synced table in lock-step with the R2RML view, and the FastAPI process only orchestrates (`SyncedTableManager.ensure` + `trigger_and_wait`). 
Reasoning + cohort writes stay on the direct PG path through a writable companion table; readers see both via a UNION view (back-compat name). PG layout per graph version: `g_\u003Cdom>_v\u003Cn>_sync` (Lakeflow), `g_\u003Cdom>_v\u003Cn>__app` (app), `g_\u003Cdom>_v\u003Cn>` (UNION view). See `docs\u002Fgraphdb-integration.md §9` for the full architecture.\n\n### Manual Workflow\n\n1. **Design** an ontology visually using the OntoViz canvas, or import OWL\u002FRDFS\u002Findustry standards (FIBO, CDISC, IOF, HL7 FHIR R4\u002FR4B\u002FR5)\n2. **Map** ontology entities to Databricks tables with column-level precision\n3. **Build** the Digital Twin — materializes triples into the triple store (incremental by default)\n4. **Query** through the GraphQL playground or explore the interactive knowledge graph\n5. **Reason** over the graph — run OWL 2 RL inference, SWRL rules, SHACL validation, and constraint checks\n\n### Knowledge Graph Features\n\n- **Two-phase search** — preview matching entities in a flat list, then select specific ones to expand into the full graph with relationships and neighbors\n- **Configurable search depth** — control the maximum traversal depth and entity cap for graph expansion\n- **Right-click \"Expand neighbours\"** — enrich the current graph in place with N-hop neighbours of any selected node (depth follows the right-pane Depth slider, default 2); newly added entities are highlighted and the camera zooms to frame them, with a non-blocking spinner in the canvas top-right while the request runs\n- **Bridge navigation** — follow cross-domain bridges to automatically switch domains and focus on the target entity in the knowledge graph\n- **Data cluster detection** — detect communities in the knowledge graph using Louvain, Label Propagation, or Greedy Modularity algorithms; available client-side (Graphology) for the visible subgraph and server-side (NetworkX) for the full graph; cluster results can be visualized with color-by-cluster mode and 
collapsed into super-nodes\n- **Cohort discovery** — group entities that travel together using rule-based linkage (shared resources via predicates) and compatibility constraints (same-value, value-equals, value-in, value-range); deterministic, explainable cohorts with live counters, why\u002Fwhy-not explainers, and idempotent materialisation as graph triples (`:inCohort`) or Unity Catalog Delta tables. See [`docs\u002Fcohort_discovery.md`](docs\u002Fcohort_discovery.md).\n- **Data quality violation limits** — cap the number of violations displayed per rule (configurable via dropdown, default 10) for faster quality checks\n- **Per-rule progress tracking** — SWRL inference and data quality checks report progress for each individual rule\n\n### AI Assistant\n\nThe **Ontology Designer** view (**Ontology → Designer**) includes a floating AI Assistant (bottom-right of the canvas) that lets you modify your ontology through natural language commands — add entities, remove orphans, list relationships, and more. Conversation history is maintained within the session.\n\n### Navigation & Performance\n\n- **Deep-linked sidebar sections** — shareable URLs, browser Back\u002FForward support\n- **Breadcrumb navigation** — always see your position (Registry > Domain > Ontology > Section)\n- **Keyboard shortcuts** — `Cmd\u002FCtrl+S` save, `Cmd\u002FCtrl+K` search, `?` help overlay\n- **SQL connection pooling** — reusable database connections, no per-query TLS handshake\n- **CSRF protection** — double-submit cookie for all state-changing requests\n- **Structured JSON logging** — set `LOG_FORMAT=json` for production-grade observability\n\n### MCP Integration\n\nOntoBricks exposes the knowledge graph to LLM agents via the [Model Context Protocol](https:\u002F\u002Fmodelcontextprotocol.io\u002F). 
Deploy the companion `mcp-ontobricks` app and connect from Cursor, Claude Desktop, or the Databricks Playground.\n\n### Registry OBX Export \u002F Import (UI)\n\nExport one or more domains directly from **Registry → Browse** to a portable\n`.obx` file with per-domain version-mode selection (Latest \u002F Active \u002F All \u002F\nChoose). Import with per-domain conflict resolution (Skip \u002F Overwrite \u002F Rename).\nNo command line required; ideal for ad-hoc transfers and cross-tenant sharing.\n\n### Registry Import \u002F Export (CLI)\n\nFor automated promotion pipelines use the\n`scripts\u002Fregistry_transfer.sh` command-line tool: export a curated subset\nof domains\u002Fversions from a source registry into a `.zip`, then preview and\ncommit it into the target registry. See\n[Registry Import \u002F Export](docs\u002Fimport-export.md) for the full reference,\nexamples, and a comparison of the OBX UI vs CLI approaches.\n\n### Ontology Pitfalls Detector\n\nDetect 19 structural, logical, and semantic pitfalls (P1.1–P4.7) in your\nontology from the **Ontology → Pitfalls** sidebar panel. Fast graph-only\nchecks run immediately; ML-heavy checks (semantic similarity, NLP naming)\nrequire installing the optional extra:\n\n```bash\nuv sync --extra pitfalls\n```\n\n### Documentation\n\nFull documentation is available in [`docs\u002F`](docs\u002FREADME.md). For a comprehensive feature list and architecture details, see [INFO.md](docs\u002FINFO.md).\n","OntoBricks is a web application that turns Databricks tables into a materialized knowledge graph. It supports designing ontologies (OWL), mapping them to Unity Catalog tables via R2RML, materializing triples into a Delta-backed triple store and a Lakebase Postgres graph database, reasoning over the graph with OWL 2 RL, SWRL, and SHACL, and querying it through an auto-generated GraphQL API. The project is written in Python and uses FastAPI for its backend service. OntoBricks suits scenarios that require building digital twin models or knowledge graphs from Databricks data, especially enterprise applications with data integration, analytics, and semantic reasoning needs.",2,"2026-06-11 02:49:08","CREATED_QUERY"]