[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-73181":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":31,"readmeContent":32,"aiSummary":33,"trendingCount":16,"starSnapshotCount":16,"syncStatus":34,"lastSyncTime":35,"discoverSource":36},73181,"pg_textsearch","timescale\u002Fpg_textsearch","timescale","PostgreSQL extension for BM25 relevance-ranked full-text search. Postgres OSS licensed.","",null,"C",3798,109,9,19,0,10,21,67,30,95.32,"PostgreSQL License",false,"main",true,[27,28,29,30],"bm25","c-extension","full-text-search","postgresql","2026-06-12 04:01:07","# pg_textsearch\n\n[![CI](https:\u002F\u002Fgithub.com\u002Ftimescale\u002Fpg_textsearch\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Ftimescale\u002Fpg_textsearch\u002Factions\u002Fworkflows\u002Fci.yml)\n[![Benchmarks](https:\u002F\u002Fgithub.com\u002Ftimescale\u002Fpg_textsearch\u002Factions\u002Fworkflows\u002Fbenchmark.yml\u002Fbadge.svg)](https:\u002F\u002Ftimescale.github.io\u002Fpg_textsearch\u002Fbenchmarks\u002F)\n[![Coverity Scan](https:\u002F\u002Fscan.coverity.com\u002Fprojects\u002F32822\u002Fbadge.svg)](https:\u002F\u002Fscan.coverity.com\u002Fprojects\u002Fpg_textsearch)\n\nModern ranked text search for Postgres.\n\n- Simple syntax: `ORDER BY content \u003C@> 'search terms'`\n- BM25 ranking with configurable parameters (k1, b)\n- Works with Postgres text search configurations (english, french, german, etc.)\n- Expression indexes for JSONB fields, multi-column 
search, and text transformations\n- Partial indexes for scoped search and multilingual tables\n- Fast top-k queries via Block-Max WAND optimization\n- Parallel index builds for large tables\n- Supports partitioned tables\n- Best in class performance and scalability\n\n🚀 **Status**: v1.3.0-dev - Production ready.\n\n![Tapir and Friends](images\u002Ftapir_and_friends_v1.2.0.png)\n\n## Historical note\n\nThe original name of the project was Tapir - **T**extual **A**nalysis for **P**ostgres **I**nformation **R**etrieval.  We still use the tapir as our\nmascot and the name occurs in various places in the source code.\n\n## PostgreSQL Version Compatibility\n\npg_textsearch supports PostgreSQL 17 and 18.\n\n## Installation\n\n### Pre-built Binaries\n\nDownload pre-built binaries from the\n[Releases page](https:\u002F\u002Fgithub.com\u002Ftimescale\u002Fpg_textsearch\u002Freleases).\nAvailable for Linux and macOS (amd64 and arm64), PostgreSQL 17 and 18.\n\n### Build from Source\n\n```sh\ncd \u002Ftmp\ngit clone https:\u002F\u002Fgithub.com\u002Ftimescale\u002Fpg_textsearch\ncd pg_textsearch\nmake\nmake install # may need sudo\n```\n\n## Getting Started\n\npg_textsearch must be loaded via `shared_preload_libraries`. 
Add the following\nto `postgresql.conf` and restart the server:\n\n```\nshared_preload_libraries = 'pg_textsearch'  # add to existing list if needed\n```\n\nThen enable the extension (once per database):\n\n```sql\nCREATE EXTENSION pg_textsearch;\n```\n\nCreate a table with text content\n\n```sql\nCREATE TABLE documents (id bigserial PRIMARY KEY, content text);\nINSERT INTO documents (content) VALUES\n    ('PostgreSQL is a powerful database system'),\n    ('BM25 is an effective ranking function'),\n    ('Full text search with custom scoring');\n```\n\nCreate a pg_textsearch index on the text column\n\n```sql\nCREATE INDEX docs_idx ON documents USING bm25(content) WITH (text_config='english');\n```\n\n## Querying\n\nGet the most relevant documents using the `\u003C@>` operator\n\n```sql\nSELECT * FROM documents\nORDER BY content \u003C@> 'database system'\nLIMIT 5;\n```\n\nNote: `\u003C@>` returns the negative BM25 score since Postgres only supports `ASC` order index scans on operators. Lower scores indicate better matches.\n\nThe index is automatically detected from the column. For explicit index specification:\n```sql\nSELECT * FROM documents\nORDER BY content \u003C@> to_bm25query('database system', 'docs_idx')\nLIMIT 5;\n```\n\nSupported operations:\n- `text \u003C@> 'query'` - Score text against a query (index auto-detected)\n- `text \u003C@> bm25query` - Score text with explicit index specification\n\n### Verifying Index Usage\n\nCheck query plan with EXPLAIN:\n```sql\nEXPLAIN SELECT * FROM documents\nORDER BY content \u003C@> 'database system'\nLIMIT 5;\n```\n\nFor small datasets, PostgreSQL may prefer sequential scans. 
Force index usage:\n```sql\nSET enable_seqscan = off;\n```\n\nNote: Even if EXPLAIN shows a sequential scan, `\u003C@>` and `to_bm25query` always use the index for corpus statistics (document counts, average length) required for BM25 scoring.\n\n### Filtering with WHERE Clauses\n\nThere are two ways filtering interacts with BM25 index scans:\n\n**Pre-filtering** uses a separate index (B-tree, etc.) to reduce rows before scoring:\n```sql\n-- Create index on filter column\nCREATE INDEX ON documents (category_id);\n\n-- Query filters first, then scores matching rows\nSELECT * FROM documents\nWHERE category_id = 123\nORDER BY content \u003C@> 'search terms'\nLIMIT 10;\n```\n\n**Post-filtering** applies the BM25 index scan first, then filters\nresults. Columns without their own index are filtered after the BM25\nscan:\n```sql\nSELECT * FROM documents\nWHERE length(content) > 100\nORDER BY content \u003C@> 'search terms'\nLIMIT 10;\n```\n\n**Performance considerations**:\n\n- **Pre-filtering tradeoff**: If the filter matches many rows (e.g., 100K+), scoring\n  all of them can be expensive. The BM25 index is most efficient when it can use\n  top-k optimization (ORDER BY + LIMIT) to avoid scoring every matching document.\n\n- **Post-filtering tradeoff**: The index returns top-k results *before* filtering.\n  If your WHERE clause eliminates most results, you may get fewer rows than\n  requested. 
Increase LIMIT to compensate, then re-limit in application code.\n\n- **Best case**: Pre-filter with a selective condition (matches \u003C10% of rows), then\n  let BM25 score the reduced set with ORDER BY + LIMIT.\n\nThis is similar to the [filtering behavior in pgvector](https:\u002F\u002Fgithub.com\u002Fpgvector\u002Fpgvector?tab=readme-ov-file#filtering),\nwhere approximate indexes also apply filtering after the index scan.\n\n## Indexing\n\nCreate a BM25 index on your text columns:\n\n```sql\nCREATE INDEX ON documents USING bm25(content) WITH (text_config='english');\n```\n\n### Index Options\n\n- `text_config` - PostgreSQL text search configuration to use (required)\n- `k1` - term frequency saturation parameter (1.2 by default)\n- `b` - length normalization parameter (0.75 by default)\n\n```sql\nCREATE INDEX ON documents USING bm25(content) WITH (text_config='english', k1=1.5, b=0.8);\n```\n\nAlso supports different text search configurations:\n\n```sql\n-- English documents with stemming\nCREATE INDEX docs_en_idx ON documents USING bm25(content) WITH (text_config='english');\n\n-- Simple text processing without stemming\nCREATE INDEX docs_simple_idx ON documents USING bm25(content) WITH (text_config='simple');\n\n-- Language-specific configurations\nCREATE INDEX docs_fr_idx ON french_docs USING bm25(content) WITH (text_config='french');\nCREATE INDEX docs_de_idx ON german_docs USING bm25(content) WITH (text_config='german');\n```\n\n### Expression Indexes\n\nIndex expressions instead of plain columns — useful for JSONB fields,\nmulti-column concatenation, and text transformations:\n\n```sql\n-- JSONB field extraction\nCREATE INDEX ON events USING bm25 ((data->>'description'))\n    WITH (text_config='english');\n\nSELECT * FROM events\nORDER BY (data->>'description') \u003C@> to_bm25query('network error', 'events_expr_idx')\nLIMIT 10;\n\n-- Multi-column search\nCREATE INDEX ON articles USING bm25 ((coalesce(title, '') || ' ' || coalesce(body, '')))\n    WITH 
(text_config='english');\n\n-- Text transformation\nCREATE INDEX ON docs USING bm25 ((lower(content)))\n    WITH (text_config='simple');\n```\n\nThe expression must evaluate to `text` and use only IMMUTABLE functions.\nQueries must repeat the same expression in the `ORDER BY` clause.\n\n### Partial Indexes\n\nIndex a subset of rows by adding a `WHERE` clause. Partial indexes are\nsmaller and faster when queries always target a specific subset:\n\n```sql\nCREATE INDEX ON docs USING bm25 (content)\n    WITH (text_config='english')\n    WHERE status = 'published';\n\nSELECT * FROM docs\nWHERE status = 'published'\nORDER BY content \u003C@> to_bm25query('search terms', 'docs_content_idx')\nLIMIT 10;\n```\n\nPartial indexes require explicit index naming via `to_bm25query()` — the\nimplicit `text \u003C@> 'query'` syntax skips them.\n\nExpression and partial indexes can be combined:\n\n```sql\nCREATE INDEX ON events USING bm25 ((data->>'message'))\n    WITH (text_config='english')\n    WHERE (data->>'severity') = 'error';\n```\n\n### Multilingual Tables\n\nFor tables with documents in multiple languages, create one partial index\nper language, each with the appropriate text search configuration:\n\n```sql\nALTER TABLE docs ADD COLUMN lang CHAR(2) NOT NULL DEFAULT 'en';\n\nCREATE INDEX docs_en_idx ON docs USING bm25 (content)\n    WITH (text_config='english') WHERE lang = 'en';\nCREATE INDEX docs_de_idx ON docs USING bm25 (content)\n    WITH (text_config='german')  WHERE lang = 'de';\nCREATE INDEX docs_fr_idx ON docs USING bm25 (content)\n    WITH (text_config='french')  WHERE lang = 'fr';\n```\n\nEach index applies language-appropriate stemming and stop words. 
Query\nwith the matching predicate and index name:\n\n```sql\nSELECT * FROM docs\nWHERE lang = 'en'\nORDER BY content \u003C@> to_bm25query('databases', 'docs_en_idx')\nLIMIT 10;\n```\n\n## Data Types\n\n### bm25query\n\nThe `bm25query` type represents queries for BM25 scoring with optional index context:\n\n```sql\n-- Create a bm25query with index name (required for WHERE clause and standalone scoring)\nSELECT to_bm25query('search query text', 'docs_idx');\n-- Returns: docs_idx:search query text\n\n-- Embedded index name syntax (alternative form using cast)\nSELECT 'docs_idx:search query text'::bm25query;\n-- Returns: docs_idx:search query text\n\n-- Create a bm25query without index name (only works in ORDER BY with index scan)\nSELECT to_bm25query('search query text');\n-- Returns: search query text\n```\n\n**Note**: In PostgreSQL 18, the embedded index name syntax using single colon (`:`) allows the\nquery planner to determine the index name even when evaluating SELECT clause expressions early.\nThis ensures compatibility across different query evaluation strategies.\n\n#### bm25query Functions\n\nFunction | Description\n--- | ---\nto_bm25query(text) → bm25query | Create bm25query without index name (for ORDER BY only)\nto_bm25query(text, text) → bm25query | Create bm25query with query text and index name\ntext \u003C@> bm25query → double precision | BM25 scoring operator (returns negative scores)\nbm25query = bm25query → boolean | Equality comparison\n\n## Performance\n\npg_textsearch indexes use an on-disk paged memtable (the L0 of an LSM)\nfor efficient writes. The memtable is mutated under standard buffer\nlocks and WAL-logged via `GenericXLog`. 
Like other index types, it is\nfaster to create an index after loading your data.\n\n```sql\n-- Load data first\nINSERT INTO documents (content) VALUES (...);\n\n-- Then create index\nCREATE INDEX docs_idx ON documents USING bm25(content) WITH (text_config='english');\n```\n\n### Parallel Index Builds\n\npg_textsearch supports parallel index builds for faster indexing of large tables.\nPostgres automatically uses parallel workers based on table size and configuration.\n\n```sql\n-- Configure parallel workers (optional, uses server defaults otherwise)\nSET max_parallel_maintenance_workers = 4;\nSET maintenance_work_mem = '256MB';  -- At least 64MB required for parallel builds\n\n-- Create index (parallel workers used automatically for large tables)\nCREATE INDEX docs_idx ON documents USING bm25(content) WITH (text_config='english');\n```\n\n**Note:** The planner requires `maintenance_work_mem >= 64MB` to enable parallel index\nbuilds. With insufficient memory, builds fall back to serial mode silently.\n\nYou'll see a notice when parallel build is used:\n```\nNOTICE:  parallel index build: launched 4 of 4 requested workers\n```\n\nFor partitioned tables, each partition builds its index independently with parallel\nworkers if the partition is large enough. This allows efficient indexing of very\nlarge partitioned datasets.\n\n### Performance Tuning\n\n#### Force-merging segments\n\nThe index stores data in multiple segments across levels (similar to an LSM\ntree). After bulk loads or sustained incremental inserts, multiple segments\nmay accumulate; consolidating them into one improves query speed by reducing\nthe number of segments scanned:\n\n```sql\nSELECT bm25_force_merge('docs_idx');\n```\n\nThis is analogous to Lucene's `forceMerge(1)`. It rewrites all segments into\na single segment and reclaims the freed pages. Best used after large batch\ninserts, not during ongoing write traffic.\n\n#### Use LIMIT with ORDER BY\n\nTop-k queries (`ORDER BY ... 
LIMIT n`) enable Block-Max WAND optimization,\nwhich skips blocks of postings that cannot contribute to the top results.\nWithout a LIMIT clause, the index falls back to scoring all matching\ndocuments up to `pg_textsearch.default_limit`.\n\n```sql\n-- Fast: BMW skips non-competitive blocks\nSELECT * FROM documents ORDER BY content \u003C@> 'search terms' LIMIT 10;\n\n-- Slower: scores up to default_limit documents\nSELECT * FROM documents ORDER BY content \u003C@> 'search terms';\n```\n\n#### Segment compression\n\nCompression is on by default and generally improves both index size and query\nperformance (fewer pages to read). Disable only if you observe that\ndecompression overhead is a bottleneck for your workload:\n\n```sql\nSET pg_textsearch.compress_segments = off;\n```\n\n#### Postgres settings that affect index builds\n\nSetting | Effect\n--- | ---\n`max_parallel_maintenance_workers` | Number of parallel workers for CREATE INDEX (default 2)\n`maintenance_work_mem` | Memory per worker; must be >= 64MB for parallel builds\n\n#### pg_textsearch GUCs\n\nSetting | Default | Description\n--- | --- | ---\n`pg_textsearch.default_limit` | 1000 | Max documents scored when no LIMIT clause is present\n`pg_textsearch.compress_segments` | on | Compress posting blocks in new segments\n`pg_textsearch.segments_per_level` | 8 | Segments per level before automatic compaction (2-64)\n`pg_textsearch.bulk_load_threshold` | 100000 | Terms per transaction before auto-spill (0 = disable)\n`pg_textsearch.memtable_pages_threshold` | 64 | Chain pages before auto-spill (0 = disable)\n\n#### Memtable architecture\n\nStarting in 1.3.0, the L0 memtable lives in the index relation itself\nas a chain of doc-record pages, mutated under standard buffer locks\nand WAL-logged via `GenericXLog`. 
There is no shared-memory memtable,\nno custom WAL resource manager, and no docid-page recovery scaffold.\nPostgreSQL's stock WAL replay (including the single-page reconstruction\nhelper used by online-page-fix tooling) reconstructs every page without\nneeding to load `pg_textsearch.so`. See\n[`docs\u002Fmemtable_v2.md`](docs\u002Fmemtable_v2.md) for the spec.\n\nAuto-spill is governed by two complementary triggers:\n\n- `memtable_pages_threshold` — fires after each insert when the chain\n  has grown past the configured page count. Default 64 pages\n  (~512 KB at 8 KB blocks) keeps query latency bounded since the\n  chain stays small.\n- `bulk_load_threshold` — fires at COMMIT when a single transaction\n  accumulates many terms in the memtable; useful for COPY \u002F bulk\n  INSERT to bound chain-page growth.\n\n```sql\n-- Manual spill (forces the current chain to a new L0 segment)\nSELECT bm25_spill_index('docs_idx');\n```\n\nVACUUM (including autovacuum's insert-threshold path) also spills the\nmemtable when it runs, so the amount of un-spilled state between\n`CREATE INDEX` and the next server restart stays bounded.\n\n**Crash recovery**: The on-disk memtable chain is itself the durable\nrecord. After a crash, stock PostgreSQL replay restores every page;\nno rebuild is needed at first backend open.\n\n**Streaming replication**: All page mutations are replicated via the\nstandard WAL stream. 
Standbys reconstruct every page natively.\n\n## Monitoring\n\n```sql\n-- Check usage statistics for all BM25 indexes\nSELECT s.schemaname, s.relname, s.indexrelname, s.idx_scan, s.idx_tup_read, s.idx_tup_fetch\nFROM pg_stat_user_indexes s\nJOIN pg_class c ON c.oid = s.indexrelid\nJOIN pg_am am ON am.oid = c.relam\nWHERE am.amname = 'bm25';\n```\n\n## Examples\n\n### Basic Search\n\n```sql\nCREATE TABLE articles (id serial PRIMARY KEY, title text, content text);\nCREATE INDEX articles_idx ON articles USING bm25(content) WITH (text_config='english');\n\nINSERT INTO articles (title, content) VALUES\n    ('Database Systems', 'PostgreSQL is a powerful relational database system'),\n    ('Search Technology', 'Full text search enables finding relevant documents quickly'),\n    ('Information Retrieval', 'BM25 is a ranking function used in search engines');\n\n-- Find relevant documents\nSELECT title, content \u003C@> 'database search' as score\nFROM articles\nORDER BY score;\n```\n\nAlso supports different languages and custom parameters:\n\n```sql\n-- Different languages\nCREATE INDEX fr_idx ON french_articles USING bm25(content) WITH (text_config='french');\nCREATE INDEX de_idx ON german_articles USING bm25(content) WITH (text_config='german');\n\n-- Custom parameters\nCREATE INDEX custom_idx ON documents USING bm25(content)\n    WITH (text_config='english', k1=2.0, b=0.9);\n```\n\n\n## Limitations\n\n### No Phrase Queries\n\nThe BM25 index stores term frequencies but not term positions, so it cannot\nnatively evaluate phrase queries like `\"database system\"`. 
You can emulate\nphrase matching by combining BM25 ranking with a post-filter:\n\n```sql\n-- BM25 ranks candidates; subquery over-fetches to account for\n-- post-filter eliminating non-phrase matches\nSELECT * FROM (\n    SELECT *, content \u003C@> 'database system' AS score\n    FROM documents\n    ORDER BY score\n    LIMIT 100  -- over-fetch\n) sub\nWHERE content ILIKE '%database system%'\nORDER BY score\nLIMIT 10;\n```\n\nBecause the post-filter eliminates some results, the inner LIMIT should\nbe larger than the desired result count.\n\n### No Built-in Faceted Search\n\npg_textsearch does not provide dedicated faceting operators, but standard\nPostgres query machinery handles common faceting patterns:\n\n```sql\n-- Filter by category (assumes a B-tree index on category)\nSELECT * FROM documents\nWHERE category = 'engineering'\nORDER BY content \u003C@> 'search terms'\nLIMIT 10;\n\n-- Compute facet counts over top search results\nSELECT category, count(*)\nFROM (\n    SELECT category FROM documents\n    ORDER BY content \u003C@> 'search terms'\n    LIMIT 100\n) matches\nGROUP BY category;\n```\n\n### Insert\u002FUpdate Performance\n\nThe memtable architecture is designed to support efficient writes, but\nsustained write-heavy workloads are not yet fully optimized. For initial\ndata loading, creating the index after loading data is faster than\nincremental inserts. This is an active area of development.\n\n### No Background Compaction\n\nSegment compaction currently runs synchronously during memtable spill\noperations. Write-heavy workloads may observe compaction latency during\nspills. Background compaction is planned for a future release.\n\n### Partitioned Tables\n\nBM25 indexes on partitioned tables use **partition-local statistics**. 
Each\npartition maintains its own:\n- Document count (`total_docs`)\n- Average document length (`avg_doc_len`)\n- Per-term document frequencies for IDF calculation\n\nThis means:\n- Queries targeting a single partition compute accurate BM25 scores using that\n  partition's statistics\n- Queries spanning multiple partitions return scores computed independently per\n  partition, which may not be directly comparable across partitions\n\n**Example**: If partition A has 1000 documents and partition B has 10 documents,\nthe term \"database\" would have different IDF values in each partition. Results\nfrom both partitions would have scores on different scales.\n\n**Recommendations**:\n- For time-partitioned data, query individual partitions when score comparability\n  matters\n- Use partitioning schemes where queries naturally target single partitions\n- Consider this behavior when designing partition strategies for search workloads\n\n```sql\n-- Query single partition (scores are accurate within partition)\nSELECT * FROM docs\nWHERE created_at >= '2024-01-01' AND created_at \u003C '2025-01-01'\nORDER BY content \u003C@> 'search terms'\nLIMIT 10;\n\n-- Cross-partition query (scores computed per-partition)\nSELECT * FROM docs\nORDER BY content \u003C@> 'search terms'\nLIMIT 10;\n```\n\n### Word Length Limit\n\npg_textsearch inherits PostgreSQL's tsvector word length limit of 2047 characters.\nWords exceeding this limit are ignored during tokenization (with an INFO message).\nThis is defined by `MAXSTRLEN` in PostgreSQL's text search implementation.\n\nFor typical natural language text, this limit is never encountered. 
It may affect\ndocuments containing very long tokens such as base64-encoded data, long URLs, or\nconcatenated identifiers.\n\nThis behavior is similar to other search engines:\n- Elasticsearch: Truncates tokens (configurable via `truncate` filter, default 10 chars)\n- Tantivy: Truncates to 255 bytes by default\n\n### Large Documents and Chunked Tokenization\n\npg_textsearch calls Postgres's `to_tsvector` to tokenize document text.\nPostgres caps a single `tsvector`'s lexeme dictionary at 1 MB\n(`MAXSTRPOS`). Documents whose unique-token volume would exceed that cap\nare split into chunks (currently 256 KB) before tokenization, then the\nper-chunk term frequencies are merged.\n\nChunk boundaries are chosen at the last ASCII whitespace inside each\nwindow. This is correct for whitespace-delimited scripts (Latin,\nCyrillic, Greek, Arabic, etc.). For non-whitespace-delimited scripts\n(CJK, Thai, Lao, Khmer), oversize documents are still indexed, but the\nchunk boundary may fall in the middle of what a language-aware tokenizer\nwould treat as a word. In practice this is acceptable because Postgres's\ndefault text-search parser does not emit per-word tokens for those\nscripts anyway. If you use a custom text search configuration with a\nparser that produces word-level tokens for one of these scripts, very\nlarge documents may produce slightly different lexeme counts than a\nsingle-shot tokenization would.\n\n**Workaround for large CJK (or other non-whitespace-delimited)\ndocuments:** split the document into smaller pieces in the application\nlayer and index a `text[]` column instead of `text`. pg_textsearch\nindexes arrays element-by-element and BM25 scores match what you'd get\nfrom concatenating the elements into a single `text` value, so you keep\nranking quality while controlling where chunk boundaries fall. 
Pair\nthis with a CJK-aware text search configuration from an extension such\nas [zhparser](https:\u002F\u002Fgithub.com\u002Famutu\u002Fzhparser) (Chinese) so that each\nchunk gets word-level tokenization:\n\n```sql\nCREATE EXTENSION zhparser;\nCREATE TEXT SEARCH CONFIGURATION chinese_zh (PARSER = zhparser);\nALTER TEXT SEARCH CONFIGURATION chinese_zh\n    ADD MAPPING FOR n,v,a,i,e,l WITH simple;\n\nCREATE TABLE docs (id bigserial PRIMARY KEY, content text[]);\nCREATE INDEX docs_bm25 ON docs USING bm25(content)\n    WITH (text_config='chinese_zh');\n```\n\n### PL\u002FpgSQL and Stored Procedures\n\nThe implicit `text \u003C@> 'query'` syntax relies on planner hooks to automatically\ndetect the BM25 index. These hooks don't run inside PL\u002FpgSQL DO blocks, functions,\nor stored procedures.\n\n**Inside PL\u002FpgSQL**, use explicit index names with `to_bm25query()`:\n\n```sql\n-- This won't work in PL\u002FpgSQL:\n-- SELECT * FROM docs ORDER BY content \u003C@> 'search terms' LIMIT 10;\n\n-- Use explicit index name instead:\nSELECT * FROM docs\nORDER BY content \u003C@> to_bm25query('search terms', 'docs_idx')\nLIMIT 10;\n```\n\nRegular SQL queries (outside PL\u002FpgSQL) support both forms.\n\n## Troubleshooting\n\n```sql\n-- List available text search configurations\nSELECT cfgname FROM pg_ts_config;\n\n-- List BM25 indexes\nSELECT indexname FROM pg_indexes WHERE indexdef LIKE '%USING bm25%';\n```\n\n\n## Installation Notes\n\nIf your machine has multiple Postgres installations, specify the path to `pg_config`:\n\n```sh\nexport PG_CONFIG=\u002FLibrary\u002FPostgreSQL\u002F18\u002Fbin\u002Fpg_config  # or 17\nmake clean && make && make install\n```\n\nIf you get compilation errors, install Postgres development files:\n\n```sh\n# Ubuntu\u002FDebian\nsudo apt install postgresql-server-dev-17  # for PostgreSQL 17\nsudo apt install postgresql-server-dev-18  # for PostgreSQL 18\n```\n\n## Reference\n\n### Index Options\n\nOption | Type | Default | Description\n--- | 
--- | --- | ---\ntext_config | string | required | PostgreSQL text search configuration to use\nk1 | real | 1.2 | Term frequency saturation parameter (0.1 to 10.0)\nb | real | 0.75 | Length normalization parameter (0.0 to 1.0)\n\n### Text Search Configurations\n\nAvailable configurations depend on your Postgres installation:\n```\n# SELECT cfgname FROM pg_ts_config;\n  cfgname\n------------\n simple\n arabic\n armenian\n basque\n catalan\n danish\n dutch\n english\n finnish\n french\n german\n greek\n hindi\n hungarian\n indonesian\n irish\n italian\n lithuanian\n nepali\n norwegian\n portuguese\n romanian\n russian\n serbian\n spanish\n swedish\n tamil\n turkish\n yiddish\n(29 rows)\n```\nFurther language support is available via extensions such as [zhparser](https:\u002F\u002Fgithub.com\u002Famutu\u002Fzhparser).\n\n### Development Functions\n\nThese functions are for debugging and development use only. Their interface may\nchange in future releases without notice. Functions marked with † require\nsuperuser privileges.\n\nFunction | Description\n--- | ---\nbm25_force_merge(index_name) → void | Merge all segments into one (improves query speed)\nbm25_spill_index(index_name) → int4 | Force memtable spill to disk segment\nbm25_dump_index(index_name) † → text | Dump internal index structure (truncated)\nbm25_summarize_index(index_name) † → text | Show index statistics without content\n\nAdditional file-writing debug functions (`bm25_dump_index(text, text)` and\n`bm25_debug_pageviz`) are available in debug builds only (compile with\n`-DDEBUG_DUMP_INDEX`).\n\n```sql\n-- Merge all segments into one (best after bulk loads)\nSELECT bm25_force_merge('docs_idx');\n\n-- Force spill to disk (returns number of entries spilled)\nSELECT bm25_spill_index('docs_idx');\n\n-- Quick overview of index statistics\nSELECT bm25_summarize_index('docs_idx');\n\n-- Detailed dump for debugging (truncated output)\nSELECT bm25_dump_index('docs_idx');\n```\n\n## Extension 
Compatibility\n\npg_textsearch uses fixed LWLock tranche IDs 1001-1008 to support large numbers\nof indexes (e.g., partitioned tables with hundreds of partitions). If you use\nanother Postgres extension that also registers fixed tranche IDs in this range,\nwait event names in `pg_stat_activity` may be incorrect. Core Postgres tranches\nuse IDs below 100. If you encounter a conflict, please\n[open an issue](https:\u002F\u002Fgithub.com\u002Ftimescale\u002Fpg_textsearch\u002Fissues).\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, code style, and\nhow to submit pull requests.\n\n- **Bug Reports**: [Create an issue](https:\u002F\u002Fgithub.com\u002Ftimescale\u002Fpg_textsearch\u002Fissues\u002Fnew?labels=bug&template=bug_report.md)\n- **Feature Requests**: [Request a feature](https:\u002F\u002Fgithub.com\u002Ftimescale\u002Fpg_textsearch\u002Fissues\u002Fnew?labels=enhancement&template=feature_request.md)\n- **General Discussion**: [Start a discussion](https:\u002F\u002Fgithub.com\u002Ftimescale\u002Fpg_textsearch\u002Fdiscussions)\n","pg_textsearch is a PostgreSQL extension that provides BM25-ranked full-text search. It supports BM25-based text relevance ranking through the simple syntax `ORDER BY content \u003C@> 'search terms'` and lets users customize parameters (such as k1 and b). The extension is compatible with Postgres text search configurations for many languages, and supports expression indexes for JSONB fields, multi-column search, and text transformations, as well as partial indexes for scoped search and multilingual tables. It also optimizes fast top-k queries and uses parallel index builds to speed up processing of large tables. It suits applications that need an efficient, scalable text search solution, especially those with high demands on result relevance and performance.",2,"2026-06-11 03:44:22","high_star"]