[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72217":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":33,"readmeContent":34,"aiSummary":35,"trendingCount":16,"starSnapshotCount":16,"syncStatus":36,"lastSyncTime":37,"discoverSource":38},72217,"nano-graphrag","gusye1234\u002Fnano-graphrag","gusye1234","A simple, easy-to-hack GraphRAG implementation","",null,"Python",3869,416,20,68,0,8,12,30,24,84.86,"MIT License",false,"main",true,[27,28,29,30,31,32],"gpt","gpt-4o","graphrag","learning-by-doing","llm","rag","2026-06-12 04:01:04","\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fgusye1234\u002Fnano-graphrag\">\n    \u003Cpicture>\n      \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fassets.memodb.io\u002Fnano-graphrag-dark.png\">\n      \u003Cimg alt=\"Shows the MemoDB logo\" src=\"https:\u002F\u002Fassets.memodb.io\u002Fnano-graphrag.png\" width=\"512\">\n    \u003C\u002Fpicture>\n  \u003C\u002Fa>\n  \u003Cp>\u003Cstrong>A simple, easy-to-hack GraphRAG implementation\u003C\u002Fstrong>\u003C\u002Fp>\n  \u003Cp>\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython->=3.9.11-blue\">\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fnano-graphrag\u002F\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fnano-graphrag.svg\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fcodecov.io\u002Fgithub\u002Fgusye1234\u002Fnano-graphrag\" > \n     
\u003Cimg src=\"https:\u002F\u002Fcodecov.io\u002Fgithub\u002Fgusye1234\u002Fnano-graphrag\u002Fgraph\u002Fbadge.svg?token=YFPMj9uQo7\"\u002F> \n \t\t\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fpepy.tech\u002Fproject\u002Fnano-graphrag\">\n      \u003Cimg src=\"https:\u002F\u002Fstatic.pepy.tech\u002Fbadge\u002Fnano-graphrag\u002Fmonth\">\n    \u003C\u002Fa>\n  \u003C\u002Fp>\n  \u003Cp>\n  \t\u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FsqCVzAhUY6\">\n      \u003Cimg src=\"https:\u002F\u002Fdcbadge.limes.pink\u002Fapi\u002Fserver\u002FsqCVzAhUY6?style=flat\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fgusye1234\u002Fnano-graphrag\u002Fissues\u002F8\">\n       \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F群聊-wechat-green\">\n    \u003C\u002Fa>\n  \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n😭 [GraphRAG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.16130) is good and powerful, but the official [implementation](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fgraphrag\u002Ftree\u002Fmain) is difficult\u002Fpainful to **read or hack**.\n\n😊 This project provides a **smaller, faster, cleaner GraphRAG**, while retaining the core functionality (see [benchmark](#benchmark) and [issues](#Issues)).\n\n🎁 Excluding `tests` and prompts, `nano-graphrag` is about **1100 lines of code**.\n\n👌 Small yet [**portable**](#Components) (faiss, neo4j, ollama...), [**asynchronous**](#Async) and fully typed.\n\n\n\n> If you're looking for a multi-user RAG solution for long-term user memory, have a look at this project: [memobase](https:\u002F\u002Fgithub.com\u002Fmemodb-io\u002Fmemobase) :)\n\n## Install\n\n**Install from source** (recommended)\n\n```shell\n# clone this repo first\ncd nano-graphrag\npip install -e .\n```\n\n**Install from PyPI**\n\n```shell\npip install nano-graphrag\n```\n\n\n\n## Quick Start\n\n> [!TIP]\n>\n> **Please set your OpenAI API key in the environment: `export OPENAI_API_KEY=\"sk-...\"`.** \n\n> 
[!TIP]\n> If you're using the Azure OpenAI API, refer to [.env.example.azure](.\u002F.env.example.azure) to configure your Azure OpenAI settings. Then pass `GraphRAG(...,using_azure_openai=True,...)` to enable it.\n\n> [!TIP]\n> If you're using the Amazon Bedrock API, please ensure your credentials are properly set through commands like `aws configure`. Then enable it with a configuration like this: `GraphRAG(...,using_amazon_bedrock=True, best_model_id=\"us.anthropic.claude-3-sonnet-20240229-v1:0\", cheap_model_id=\"us.anthropic.claude-3-haiku-20240307-v1:0\",...)`. Refer to an [example script](.\u002Fexamples\u002Fusing_amazon_bedrock.py).\n\n> [!TIP]\n>\n> If you don't have any key, check out this [example](.\u002Fexamples\u002Fno_openai_key_at_all.py) that uses `transformers` and `ollama`. If you would like to use another LLM or embedding model, check [Advances](#Advances).\n\nDownload a copy of A Christmas Carol by Charles Dickens:\n\n```shell\ncurl https:\u002F\u002Fraw.githubusercontent.com\u002Fgusye1234\u002Fnano-graphrag\u002Fmain\u002Ftests\u002Fmock_data.txt > .\u002Fbook.txt\n```\n\nThen run the Python snippet below:\n\n```python\nfrom nano_graphrag import GraphRAG, QueryParam\n\ngraph_func = GraphRAG(working_dir=\".\u002Fdickens\")\n\nwith open(\".\u002Fbook.txt\") as f:\n    graph_func.insert(f.read())\n\n# Perform global graphrag search\nprint(graph_func.query(\"What are the top themes in this story?\"))\n\n# Perform local graphrag search (I think it is the better and more scalable one)\nprint(graph_func.query(\"What are the top themes in this story?\", param=QueryParam(mode=\"local\")))\n```\n\nNext time you initialize a `GraphRAG` from the same `working_dir`, it will reload all the contexts automatically.\n\n#### Batch Insert\n\n```python\ngraph_func.insert([\"TEXT1\", \"TEXT2\",...])\n```\n\n\u003Cdetails>\n\u003Csummary> Incremental Insert\u003C\u002Fsummary>\n\n`nano-graphrag` supports incremental insert; no duplicated computation or data will be added:\n\n```python\nwith 
open(\".\u002Fbook.txt\") as f:\n    book = f.read()\n    half_len = len(book) \u002F\u002F 2\n    graph_func.insert(book[:half_len])\n    graph_func.insert(book[half_len:])\n```\n\n> `nano-graphrag` uses the md5 hash of the content as the key, so there are no duplicated chunks.\n>\n> However, each time you insert, the communities of the graph will be re-computed and the community reports will be re-generated.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary> Naive RAG\u003C\u002Fsummary>\n\n`nano-graphrag` supports naive RAG insert and query as well:\n\n```python\ngraph_func = GraphRAG(working_dir=\".\u002Fdickens\", enable_naive_rag=True)\n...\n# Query\nprint(graph_func.query(\n    \"What are the top themes in this story?\",\n    param=QueryParam(mode=\"naive\"),\n))\n```\n\u003C\u002Fdetails>\n\n\n### Async\n\nFor each method `NAME(...)`, there is a corresponding async method `aNAME(...)`.\n\n```python\nawait graph_func.ainsert(...)\nawait graph_func.aquery(...)\n...\n```\n\n### Available Parameters\n\n`GraphRAG` and `QueryParam` are Python `dataclass`es. Use `help(GraphRAG)` and `help(QueryParam)` to see all available parameters!  
Or check out the [Advances](#Advances) section to see some options.\n\n\n\n## Components\n\nBelow are the components you can use:\n\n| Type            |                             What                             |                       Where                       |\n| :-------------- | :----------------------------------------------------------: | :-----------------------------------------------: |\n| LLM             |                            OpenAI                            |                     Built-in                      |\n|                 |                        Amazon Bedrock                        |                     Built-in                      |\n|                 |                           DeepSeek                           |              [examples](.\u002Fexamples)               |\n|                 |                           `ollama`                           |              [examples](.\u002Fexamples)               |\n| Embedding       |                            OpenAI                            |                     Built-in                      |\n|                 |                        Amazon Bedrock                        |                     Built-in                      |\n|                 |                    Sentence-transformers                     |              [examples](.\u002Fexamples)               |\n| Vector DataBase | [`nano-vectordb`](https:\u002F\u002Fgithub.com\u002Fgusye1234\u002Fnano-vectordb) |                     Built-in                      |\n|                 |        [`hnswlib`](https:\u002F\u002Fgithub.com\u002Fnmslib\u002Fhnswlib)        |         Built-in, [examples](.\u002Fexamples)          |\n|                 |  [`milvus-lite`](https:\u002F\u002Fgithub.com\u002Fmilvus-io\u002Fmilvus-lite)   |              [examples](.\u002Fexamples)               |\n|                 | [faiss](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Ffaiss?tab=readme-ov-file) |              
[examples](.\u002Fexamples)               |\n| Graph Storage   | [`networkx`](https:\u002F\u002Fnetworkx.org\u002Fdocumentation\u002Fstable\u002Findex.html) |                     Built-in                      |\n|                 |                [`neo4j`](https:\u002F\u002Fneo4j.com\u002F)                 | Built-in([doc](.\u002Fdocs\u002Fuse_neo4j_for_graphrag.md)) |\n| Visualization   |                           graphml                            |              [examples](.\u002Fexamples)               |\n| Chunking        |                        by token size                         |                     Built-in                      |\n|                 |                       by text splitter                       |                     Built-in                      |\n\n- `Built-in` means we have that implementation inside `nano-graphrag`. `examples` means we have that implementation inside a tutorial under the [examples](.\u002Fexamples) folder.\n\n- Check [examples\u002Fbenchmarks](.\u002Fexamples\u002Fbenchmarks) to see a few comparisons between components.\n- **Contributions of more components are always welcome.**\n\n## Advances\n\n\n\n\u003Cdetails>\n\u003Csummary>Some setup options\u003C\u002Fsummary>\n\n- `GraphRAG(...,always_create_working_dir=False,...)` will skip the dir-creating step. Use it if you switch all your components to non-file storages.\n\n\u003C\u002Fdetails>\n\n\n\n\u003Cdetails>\n\u003Csummary>Only query the related context\u003C\u002Fsummary>\n\n`graph_func.query` returns the final answer without streaming. \n\nIf you would like to integrate `nano-graphrag` into your project, you can use `param=QueryParam(..., only_need_context=True,...)`, which will only return the retrieved context from the graph, something like:\n\n````\n# Local mode\n-----Reports-----\n```csv\nid,\tcontent\n0,\t# FOX News and Key Figures in Media and Politics...\n1, ...\n```\n...\n\n# Global mode\n----Analyst 3----\nImportance Score: 100\nDonald J. 
Trump: Frequently discussed in relation to his political activities...\n...\n````\n\nYou can integrate that context into your customized prompt.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Prompt\u003C\u002Fsummary>\n\n`nano-graphrag` uses prompts from the `nano_graphrag.prompt.PROMPTS` dict object. You can play with it and replace any prompt inside.\n\nSome important prompts:\n\n- `PROMPTS[\"entity_extraction\"]` is used to extract the entities and relations from a text chunk.\n- `PROMPTS[\"community_report\"]` is used to organize and summarize the graph clusters' descriptions.\n- `PROMPTS[\"local_rag_response\"]` is the system prompt template of the local search generation.\n- `PROMPTS[\"global_reduce_rag_response\"]` is the system prompt template of the global search generation.\n- `PROMPTS[\"fail_response\"]` is the fallback response when nothing is related to the user query.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Customize Chunking\u003C\u002Fsummary>\n\n\n`nano-graphrag` allows you to customize your own chunking method; check out the [example](.\u002Fexamples\u002Fusing_custom_chunking_method.py).\n\nSwitch to the built-in text splitter chunking method:\n\n```python\nfrom nano_graphrag._op import chunking_by_seperators\n\nGraphRAG(...,chunk_func=chunking_by_seperators,...)\n```\n\n\u003C\u002Fdetails>\n\n\n\n\u003Cdetails>\n\u003Csummary>LLM Function\u003C\u002Fsummary>\n\n`nano-graphrag` requires two types of LLM: a great one and a cheap one. The former is used to plan and respond; the latter is used to summarize. 
By default, the great one is `gpt-4o` and the cheap one is `gpt-4o-mini`.\n\nYou can implement your own LLM function (refer to `_llm.gpt_4o_complete`):\n\n```python\nasync def my_llm_complete(\n    prompt, system_prompt=None, history_messages=[], **kwargs\n) -> str:\n  # pop cache KV database if any\n  hashing_kv: BaseKVStorage = kwargs.pop(\"hashing_kv\", None)\n  # the remaining kwargs are passed to the LLM call, for example, `max_tokens=xxx`\n  ...\n  # YOUR LLM calling\n  response = await call_your_LLM(messages, **kwargs)\n  return response\n```\n\nReplace the default one with:\n\n```python\n# Adjust the max token size or the max async requests if needed\nGraphRAG(best_model_func=my_llm_complete, best_model_max_token_size=..., best_model_max_async=...)\nGraphRAG(cheap_model_func=my_llm_complete, cheap_model_max_token_size=..., cheap_model_max_async=...)\n```\n\nYou can refer to this [example](.\u002Fexamples\u002Fusing_deepseek_as_llm.py) that uses [`deepseek-chat`](https:\u002F\u002Fplatform.deepseek.com\u002Fapi-docs\u002F) as the LLM.\n\nYou can refer to this [example](.\u002Fexamples\u002Fusing_ollama_as_llm.py) that uses [`ollama`](https:\u002F\u002Fgithub.com\u002Follama\u002Follama) as the LLM.\n\n#### JSON Output\n\n`nano-graphrag` will use `best_model_func` to output JSON with params `\"response_format\": {\"type\": \"json_object\"}`. However, some open-source models may produce unstable JSON. \n\n`nano-graphrag` introduces a post-process interface for you to convert the response to JSON. The function's signature is:\n\n```python\ndef YOUR_STRING_TO_JSON_FUNC(response: str) -> dict:\n  \"Convert the string response to JSON\"\n  ...\n```\n\nThen pass your own function via `GraphRAG(...,convert_response_to_json_func=YOUR_STRING_TO_JSON_FUNC,...)`.\n\nFor example, you can refer to [json_repair](https:\u002F\u002Fgithub.com\u002Fmangiucugna\u002Fjson_repair) to repair the JSON string returned by the LLM. 
\n\u003C\u002Fdetails>\n\n\n\n\u003Cdetails>\n\u003Csummary>Embedding Function\u003C\u002Fsummary>\n\nYou can replace the default embedding functions with any `_utils.EmbeddingFunc` instance.\n\nFor example, the default one uses the OpenAI embedding API:\n\n```python\n@wrap_embedding_func_with_attrs(embedding_dim=1536, max_token_size=8192)\nasync def openai_embedding(texts: list[str]) -> np.ndarray:\n    openai_async_client = AsyncOpenAI()\n    response = await openai_async_client.embeddings.create(\n        model=\"text-embedding-3-small\", input=texts, encoding_format=\"float\"\n    )\n    return np.array([dp.embedding for dp in response.data])\n```\n\nReplace the default embedding function with:\n\n```python\nGraphRAG(embedding_func=your_embed_func, embedding_batch_num=..., embedding_func_max_async=...)\n```\n\nYou can refer to an [example](.\u002Fexamples\u002Fusing_local_embedding_model.py) that uses `sentence-transformers` to compute embeddings locally.\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>Storage Component\u003C\u002Fsummary>\n\nYou can replace all storage-related components with your own implementations. `nano-graphrag` mainly uses three kinds of storage:\n\n**`base.BaseKVStorage` for storing key-json pairs of data** \n\n- By default we use disk file storage as the backend. 
\n- `GraphRAG(.., key_string_value_json_storage_cls=YOURS,...)`\n\n**`base.BaseVectorStorage` for indexing embeddings**\n\n- By default we use [`nano-vectordb`](https:\u002F\u002Fgithub.com\u002Fgusye1234\u002Fnano-vectordb) as the backend.\n- We also have a built-in [`hnswlib`](https:\u002F\u002Fgithub.com\u002Fnmslib\u002Fhnswlib) storage; check out this [example](.\u002Fexamples\u002Fusing_hnsw_as_vectorDB.py).\n- Check out this [example](.\u002Fexamples\u002Fusing_milvus_as_vectorDB.py) that implements [`milvus-lite`](https:\u002F\u002Fgithub.com\u002Fmilvus-io\u002Fmilvus-lite) as the backend (not available on Windows).\n- `GraphRAG(.., vector_db_storage_cls=YOURS,...)`\n\n**`base.BaseGraphStorage` for storing the knowledge graph**\n\n- By default we use [`networkx`](https:\u002F\u002Fgithub.com\u002Fnetworkx\u002Fnetworkx) as the backend.\n- We have a built-in `Neo4jStorage` for graphs; check out this [tutorial](.\u002Fdocs\u002Fuse_neo4j_for_graphrag.md).\n- `GraphRAG(.., graph_storage_cls=YOURS,...)`\n\nYou can refer to `nano_graphrag.base` to see the detailed interface of each component.\n\u003C\u002Fdetails>\n\n\n\n## FAQ\n\nCheck the [FAQ](.\u002Fdocs\u002FFAQ.md).\n\n\n\n## Roadmap\n\nSee [ROADMAP.md](.\u002Fdocs\u002FROADMAP.md)\n\n\n\n## Contribute\n\n`nano-graphrag` is open to any kind of contribution. 
Read [this](.\u002Fdocs\u002FCONTRIBUTING.md) before you contribute.\n\n\n\n\n## Benchmark\n\n- [benchmark for English](.\u002Fdocs\u002Fbenchmark-en.md)\n- [benchmark for Chinese](.\u002Fdocs\u002Fbenchmark-zh.md)\n- [An evaluation](.\u002Fexamples\u002Fbenchmarks\u002Feval_naive_graphrag_on_multi_hop.ipynb) notebook on a [multi-hop RAG task](https:\u002F\u002Fgithub.com\u002Fyixuantt\u002FMultiHop-RAG)\n\n\n\n## Projects that used `nano-graphrag`\n\n- [Medical Graph RAG](https:\u002F\u002Fgithub.com\u002FMedicineToken\u002FMedical-Graph-RAG): Graph RAG for the Medical Data\n- [LightRAG](https:\u002F\u002Fgithub.com\u002FHKUDS\u002FLightRAG): Simple and Fast Retrieval-Augmented Generation\n- [fast-graphrag](https:\u002F\u002Fgithub.com\u002Fcirclemind-ai\u002Ffast-graphrag): RAG that intelligently adapts to your use case, data, and queries\n- [HiRAG](https:\u002F\u002Fgithub.com\u002Fhhy-huang\u002FHiRAG): Retrieval-Augmented Generation with Hierarchical Knowledge\n\n> Pull requests are welcome if your project uses `nano-graphrag`; it will help others trust this repo ❤️\n\n\n\n## Issues\n\n- `nano-graphrag` does not implement the `covariates` feature of `GraphRAG`.\n- `nano-graphrag` implements global search differently from the original. The original uses a map-reduce-like style to fill all the communities into the context, while `nano-graphrag` only uses the top-K most important and central communities (controlled by `QueryParam.global_max_consider_community`, which defaults to 512 communities).\n\n","nano-graphrag is a simple, easy-to-modify GraphRAG implementation. In about 1,100 lines of code it provides the core functionality, being smaller, faster, and cleaner while retaining the main features of GraphRAG. It supports multiple backends (such as faiss, neo4j, and ollama) and is asynchronous and fully typed. It suits scenarios that need quick integration or customization of GraphRAG, and is especially suitable for developers looking for a GraphRAG library that is easy to understand and modify.",2,"2026-06-11 03:40:54","high_star"]