[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-75416":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":12,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":15,"lastSyncTime":27,"discoverSource":28},75416,"php-fts","olivier-ls\u002Fphp-fts","olivier-ls","A self-contained full-text search engine in pure PHP. No extensions, no dependencies.",null,"PHP",132,3,1,0,2,27,6,1.81,"MIT License",false,"main",true,[],"2026-06-12 02:03:33","# php-fts\n\nA self-contained full-text search engine written in pure PHP.  \nNo extensions. No external services. No dependencies. Just files.\n\n---\n\n## Who is this for?\n\nphp-fts is designed for projects where deploying a dedicated search service is not an option — shared hosting, small VPS, or simply situations where you want to keep your stack minimal and portable.\n\nIf you have access to Elasticsearch, Meilisearch or Typesense and the infrastructure to run them, use those. 
They are more powerful and built for high-traffic, large-scale workloads.\n\nIf you don't — or if you'd rather not — php-fts gives you solid full-text search with ranked results, filters, and tolerant matching, with nothing to install and nothing to configure beyond a directory path.\n\n**It is a good fit if:**\n- You are on shared hosting (OVH, Infomaniak, o2switch, etc.)\n- You want zero infrastructure overhead\n- Your dataset is in the range of hundreds to tens of thousands of documents\n- You index offline or on a schedule, and serve searches at runtime\n\n**It is not a good fit if:**\n- You need real-time indexing under heavy concurrent write load\n- Your dataset is in the millions of documents\n- You need geo search or multi-tenant isolation\n\n---\n\n## Features\n\n- **Full-text search** with trigram indexing — tolerant to typos and partial matches\n- **BM25 + IDF scoring** — industry-standard relevance ranking (same algorithm as Lucene \u002F Elasticsearch)\n- **Per-document score** — exposed in results, usable to build facet counts, sorting, or custom ranking\n- **Field boosting** — weight some fields (e.g. 
title) more than others\n- **Filters** — exact match, comparisons, range, `in`, `not in`, `contains` on array fields\n- **Combined AND \u002F OR filtering** — flexible condition logic\n- **Bulk insertion** — up to 12× faster than individual inserts, single lock for the whole batch, crash-safe\n- **Soft delete** with tombstones — fast deletes, cleaned up on compaction\n- **Atomic update** — soft delete + re-insert in a single lock\n- **Compaction** — rebuilds index files cleanly, removes deleted documents and fragmentation\n- **Fragmentation monitoring** — know when to compact\n- **Binary file storage** — portable across servers, no rebuild needed\n- **O(1) trigram lookup** — fixed-size index (~810 KB), no tree traversal\n- **No extensions required** — runs on any standard PHP 8.1+ installation\n\n---\n\n## Requirements\n\n- PHP **8.1** or higher\n- Read\u002Fwrite access to a directory for index files\n\n---\n\n## Installation\n\n**Via Composer**\n\n```bash\ncomposer require ols\u002Fphp-fts\n```\n\n**Manual install** — if you are not using Composer, copy the `src\u002F` directory into your project and include the autoloader:\n\n```php\nrequire '\u002Fpath\u002Fto\u002Fphp-fts\u002Fsrc\u002Fautoload.php';\n```\n\n---\n\n## Quick start\n\n```php\nuse Ols\\PhpFts\\SearchEngine;\n\n$engine = new SearchEngine();\n$engine->open('.\u002Fsearch_data');\n\n\u002F\u002F Insert a document\n$docId = $engine->insert([\n    'title'       => 'Brown leather shoe',\n    'description' => 'Elegant city shoe in soft leather',\n    'price'       => 129.90,\n    'stock'       => 42,\n    'active'      => true,\n    'category'    => 'Shoes',\n    'brand'       => 'Adidas',\n    'tags'        => ['summer', 'luxury', 'city'],\n]);\n\n\u002F\u002F Search\n$results = $engine->search('leather shoe', limit: 20, boosts: [\n    'title'       => 3.0,\n    'description' => 1.0,\n]);\n\nforeach ($results as $result) {\n    echo $result['document']['title'] . ' — score: ' . $result['score'] . 
PHP_EOL;\n}\n\n$engine->close();\n```\n\n---\n\n## API Reference\n\n### Open \u002F Close\n\n```php\n$engine->open('.\u002Fsearch_data');   \u002F\u002F Creates directory and files if they don't exist\n$engine->close();                 \u002F\u002F Flushes and closes all file handles\n```\n\n### Insert\n\n```php\n\u002F\u002F Single document — returns the doc ID (binary offset, keep it if you need update\u002Fdelete)\n$docId = $engine->insert([\n    'title'  => 'My product',\n    'price'  => 49.90,\n    'active' => true,\n    'tags'   => ['new', 'sale'],\n]);\n\n\u002F\u002F Bulk insert — one lock for the entire batch, significantly faster\n$docIds = $engine->insertBulk([\n    ['title' => 'Product A', 'price' => 29.90],\n    ['title' => 'Product B', 'price' => 59.90],\n]);\n```\n\nSupported field types: `string`, `int`, `float`, `bool`, `array` of strings.\n\n### Search\n\n```php\n$results = $engine->search(\n    query:         'leather shoe',\n    limit:         20,\n    maxCandidates: 5000,\n    boosts:        ['title' => 3.0, 'description' => 1.0],\n    filters:       [...],\n);\n```\n\nEach result:\n\n```php\n[\n    'docId'    => 942222,   \u002F\u002F document identifier\n    'score'    => 43.74,    \u002F\u002F BM25+IDF relevance score, 0-100\n    'document' => [...],    \u002F\u002F original document array\n]\n```\n\nThe `score` field is available on every result and can be used to build facet counts, custom sorting, or relevance thresholds.\n\n### Filters\n\n```php\n$results = $engine->search('shoe', filters: [\n\n    'and' => [\n        ['field' => 'active',   'op' => '=',        'value' => true],\n        ['field' => 'stock',    'op' => '>',         'value' => 0],\n        ['field' => 'price',    'op' => '\u003C=',        'value' => 300],\n        ['field' => 'category', 'op' => 'in',        'value' => ['Shoes', 'Sport']],\n        ['field' => 'tags',     'op' => 'contains',  'value' => 'luxury'],\n    ],\n\n    'or' => [\n        ['field' => 'brand', 
'op' => '=', 'value' => 'Adidas'],\n        ['field' => 'brand', 'op' => '=', 'value' => 'Puma'],\n    ],\n\n]);\n```\n\nBoth `and` and `or` are optional, but at least one must be present.  \nWhen both are used: all AND conditions must pass **and** at least one OR condition must pass.  \nA document missing a filtered field is excluded from results.\n\n| Operator                  | Supported types          |\n|---------------------------|--------------------------|\n| `=` `!=`                  | int, float, bool, string |\n| `>` `>=` `\u003C` `\u003C=`         | int, float               |\n| `in` `not in`             | int, float, string       |\n| `contains` `not contains` | array (document field)   |\n\n### Update \u002F Delete\n\n```php\n\u002F\u002F Atomic update: soft delete + re-insert in a single lock\n$newDocId = $engine->update($docId, ['title' => 'Updated title', 'price' => 149.90]);\n\n\u002F\u002F Soft delete (cleaned up on compaction)\n$engine->delete($docId);\n```\n\n### Maintenance\n\n```php\n$count = $engine->count();               \u002F\u002F Number of live documents\n$rate  = $engine->fragmentationRate();   \u002F\u002F Fragmentation percentage (0 = clean, 100 = all deleted)\n\nif ($engine->fragmentationRate() > 20) {\n    $engine->compact();                  \u002F\u002F Rebuild index files, remove deleted documents\n}\n\n$engine->reset();                        \u002F\u002F Wipe all index files and start fresh\n```\n\n---\n\n## Index files\n\n```\nsearch_data\u002F\n  documents.bin    — serialized documents (JSON, binary format)\n  trigrams.bin     — fixed-size trigram index ~810 KB (37^3 entries, O(1) access)\n  postings.bin     — doc_id lists per trigram\n  tombstones.bin   — deleted doc_ids (cleared on compaction)\n```\n\nFiles are fully portable — copy them between servers without rebuilding.\n\n---\n\n## Scoring\n\nRelevance is computed using **BM25 + IDF**:\n\n- **BM25** — term frequency saturation (a word appearing 10x doesn't score 10x 
higher) and document length normalization. Parameters: k1 = 1.5, b = 0.75 (standard Lucene defaults).\n- **IDF** — a trigram present in every document contributes little; a rare trigram contributes a lot.\n- The final score is normalized between 0 and 100.\n\n---\n\n## Benchmark\n\nBenchmarks were run on two environments:\n\n- **Windows 11** — local machine, NVMe SSD, PHP 8.3\n- **Linux (OVH shared hosting)** — standard shared plan, PHP 8.3\n\n### Insertion\n\n| Volume  | insert() Win | insert() Linux | insertBulk() Win | insertBulk() Linux | Gain Win | Gain Linux |\n|---------|-------------|----------------|------------------|--------------------|----------|------------|\n| 1 000   | 3.23 s      | 8.58 s         | 274 ms           | 167 ms             | 11.8×    | 51.3×      |\n| 5 000   | 17.43 s     | 38.94 s        | 1.35 s           | 952 ms             | 12.9×    | 40.9×      |\n| 10 000  | 35.72 s     | 66.15 s        | 2.97 s           | 1.62 s             | 12×      | 40.8×      |\n| 20 000  | 72.87 s     | 129.09 s       | 5.78 s           | 3.44 s             | 12.6×    | 37.5×      |\n\n> Always prefer `insertBulk()` over `insert()` in production: it acquires a single lock\n> for the entire batch and is consistently faster — up to **12×** on local NVMe,\n> up to **51×** on shared Linux hosting. 
Both are designed for offline or scheduled\n> use; keep them out of critical request paths on high-traffic setups.\n\n### Index size\n\n| Volume  | Index size |\n|---------|------------|\n| 1 000   | 2.32 MB    |\n| 5 000   | 8.14 MB    |\n| 10 000  | 18.97 MB   |\n| 20 000  | 36.58 MB   |\n\n### Search\n\n| Volume  | Median Win | Median Linux | P95 Win  | P95 Linux | P99 Win   | P99 Linux |\n|---------|-----------|--------------|----------|-----------|-----------|-----------|\n| 1 000   | 3.51 ms   | 2.06 ms      | 8.21 ms  | 4.41 ms   | 8.66 ms   | 6.64 ms   |\n| 5 000   | 4.52 ms   | 2.99 ms      | 22.79 ms | 8.7 ms    | 23.62 ms  | 16.89 ms  |\n| 10 000  | 5.92 ms   | 4.02 ms      | 41.89 ms | 15.23 ms  | 44.37 ms  | 33.67 ms  |\n| 20 000  | 7.62 ms   | 4.76 ms      | 62.88 ms | 19.76 ms  | 106.67 ms | 21.39 ms  |\n\n> 200 queries, 10 distinct queries in rotation (including typos and out-of-corpus queries).  \n> Measured with `hrtime()`.\n\n### Compaction\n\n| Volume  | Windows  | Linux    |\n|---------|----------|----------|\n| 1 000   | 571.6 ms | 449.3 ms |\n| 5 000   | 1.25 s   | 897.9 ms |\n| 10 000  | 2.12 s   | 1.63 s   |\n| 20 000  | 4.03 s   | 2.68 s   |\n\n> Compaction rewrites the index from scratch — it is an occasional maintenance operation, not a request-time concern.  \n> Run it when `fragmentationRate()` exceeds your threshold (e.g. 20%).\n\n---\n\n## Example application\n\nThe gif below shows one possible use of php-fts — a product search interface with filters and ranked results, built on top of a fake shoe catalogue.\n\nIt is just an illustration. php-fts is an engine, not an interface. You can use it to power a product search, a documentation search, an admin filter, a CLI tool, or anything else that needs full-text matching over a set of documents.\n\nTo run it locally:\n\n```bash\nphp demo\u002Fseed.php\nphp -S localhost:8000 -t demo\n```\n\n![Demo](docs\u002Fdemo.gif)\n\n> No database. No external service. 
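The filters, scores, and result counts are all computed by the engine.\n\n---\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n","php-fts is a full-text search engine implemented in pure PHP, with no extensions or external dependencies required. Its core features include trigram-indexed full-text search, BM25 + IDF ranking, field boosting, filter support, and bulk insertion. It is particularly well suited to shared hosting environments (such as OVH, Infomaniak, o2switch), small VPS instances, or cases where the stack needs to stay minimal and portable. It fits datasets ranging from hundreds to tens of thousands of documents, where indexing runs offline or on a schedule and searches are served at runtime. It is not suited to workloads that need real-time indexing under heavy concurrent writes, very large datasets, or geo search.","2026-06-11 03:52:41","CREATED_QUERY"]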