[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-73466":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":28,"readmeContent":29,"aiSummary":30,"trendingCount":16,"starSnapshotCount":16,"syncStatus":31,"lastSyncTime":32,"discoverSource":33},73466,"reader","jina-ai\u002Freader","jina-ai","Convert any URL to an LLM-friendly input with a simple prefix https:\u002F\u002Fr.jina.ai\u002F","https:\u002F\u002Fjina.ai\u002Freader",null,"TypeScript",11162,829,52,18,0,132,194,372,396,118.76,"Apache License 2.0",false,"main",[26,27],"llm","proxy","2026-06-12 04:01:09","# Reader\n\n[![codecov](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fjina-ai\u002Freader\u002Fbranch\u002Fmain\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fjina-ai\u002Freader)\n[![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002Fjina-ai\u002Freader)\n\nYour LLMs deserve better input.\n\nReader does two things:\n- **Read**: It converts any URL to an **LLM-friendly** input with `https:\u002F\u002Fr.jina.ai\u002Fhttps:\u002F\u002Fyour.url`. Get improved output for your agent and RAG systems at no cost.\n- **Search**: It searches the web for a given query with `https:\u002F\u002Fs.jina.ai\u002Fyour+query`. 
This allows your LLMs to access the latest world knowledge from the web.\n\nCheck out [the live demo](https:\u002F\u002Fjina.ai\u002Freader#demo).\n\nOr just visit these URLs (**Read**) https:\u002F\u002Fr.jina.ai\u002Fhttps:\u002F\u002Fgithub.com\u002Fjina-ai\u002Freader, (**Search**) https:\u002F\u002Fs.jina.ai\u002FWho%20will%20win%202024%20US%20presidential%20election%3F and see for yourself.\n\n> Feel free to use the Reader API in production. It is free, stable and scalable. We are maintaining it actively as one of the core products of Jina AI. [Check out the rate limits](https:\u002F\u002Fjina.ai\u002Freader#pricing)\n\n\u003Cimg width=\"973\" alt=\"image\" src=\"https:\u002F\u002Fgithub.com\u002Fjina-ai\u002Freader\u002Fassets\u002F2041322\u002F2067c7a2-c12e-4465-b107-9a16ca178d41\">\n\u003Cimg width=\"973\" alt=\"image\" src=\"https:\u002F\u002Fgithub.com\u002Fjina-ai\u002Freader\u002Fassets\u002F2041322\u002F675ac203-f246-41c2-b094-76318240159f\">\n\n> This repository is the open source branch of the codebase behind `https:\u002F\u002Fr.jina.ai` and `https:\u002F\u002Fs.jina.ai`. It runs in stateless or bucket-cached mode; the MongoDB-backed SaaS storage layer is not included here.\n\n## Updates\n\n- **2026-04** — Re-synchronized the open source branch with the SaaS code. The MongoDB-backed storage layer is stripped; the oss branch runs in stateless mode out of the box, with optional MinIO\u002FS3-compatible bucket caching via `docker compose`. See [Local development](#local-development).\n- **2025-12** — Storage layer decoupled and binary file uploads landed. PDFs and MS Office documents (Word, Excel, PowerPoint) can now be POSTed directly via the `file` body field — no need to host them first. See [cookbooks.md](.\u002Fcookbooks.md#pdf-ms-office-and-raw-html-uploads).\n- **2025-03** — Major refactor: Reader is no longer a Firebase application. 
The SaaS migrated off Firestore + Cloud Functions to a Cloud Run image with MongoDB Atlas, removing the platform-coupled bits and unblocking the local-Docker path above.\n- **2024-05** — `s.jina.ai` launched, extending Reader from URL→markdown to search→markdown. PDFs added the same month — any URL ending in `.pdf` is parsed with PDF.js and returned as markdown.\n- **2024-04** — Reader released and `r.jina.ai` went live as Jina AI's first SaaS API for converting URLs to LLM-friendly input.\n\n## What Reader can read\n\n- **Web pages** — rendered with headless Chrome, or fetched lightweight via `curl-impersonate`. Reader picks intelligently between the two.\n- **PDFs** — any URL, parsed with PDF.js. [See this NASA PDF result](https:\u002F\u002Fr.jina.ai\u002Fhttps:\u002F\u002Fwww.nasa.gov\u002Fwp-content\u002Fuploads\u002F2023\u002F01\u002F55583main_vision_space_exploration2.pdf) vs [the original](https:\u002F\u002Fwww.nasa.gov\u002Fwp-content\u002Fuploads\u002F2023\u002F01\u002F55583main_vision_space_exploration2.pdf).\n- **MS Office documents** — Word, Excel, PowerPoint, converted via LibreOffice and then processed as HTML\u002FPDF.\n- **Images** — captioned by a vision-language model, so your downstream text-only LLM gets *just enough* hints to reason about them.\n\n## Usage\n\n### Using `r.jina.ai` for single URL fetching\nSimply prepend `https:\u002F\u002Fr.jina.ai\u002F` to any URL. 
For example, to convert the URL `https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FArtificial_intelligence` to an LLM-friendly input, use the following URL:\n\n[https:\u002F\u002Fr.jina.ai\u002Fhttps:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FArtificial_intelligence](https:\u002F\u002Fr.jina.ai\u002Fhttps:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FArtificial_intelligence)\n\n### [Using `r.jina.ai` for a full website fetching (Google Colab)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1uoBy6_7BhxqpFQ45vuhgDDDGwstaCt4P#scrollTo=5LQjzJiT9ewT)\n\n### Using `s.jina.ai` for web search\nSimply prepend `https:\u002F\u002Fs.jina.ai\u002F` to your search query. Note that if you are using this in the code, make sure to encode your search query first, e.g. if your query is `Who will win 2024 US presidential election?` then your url should look like:\n\n[https:\u002F\u002Fs.jina.ai\u002FWho%20will%20win%202024%20US%20presidential%20election%3F](https:\u002F\u002Fs.jina.ai\u002FWho%20will%20win%202024%20US%20presidential%20election%3F)\n\nBehind the scenes, Reader searches the web, fetches the top 5 results, visits each URL, and applies `r.jina.ai` to it. This is different from many `web search function-calling` in agent\u002FRAG frameworks, which often return only the title, URL, and description provided by the search engine API. If you want to read one result more deeply, you have to fetch the content yourself from that URL. With Reader, `http:\u002F\u002Fs.jina.ai` automatically fetches the content from the top 5 search result URLs for you (reusing the tech stack behind `http:\u002F\u002Fr.jina.ai`). 
This means you don't have to handle browser rendering, blocking, or any issues related to JavaScript and CSS yourself.\n\n### Using `s.jina.ai` for in-site search\nSimply specify `site` in the query parameters such as:\n\n```bash\ncurl 'https:\u002F\u002Fs.jina.ai\u002FWhen%20was%20Jina%20AI%20founded%3F?site=jina.ai&site=github.com'\n```\n\n### [Interactive Code Snippet Builder](https:\u002F\u002Fjina.ai\u002Freader#apiform)\n\nWe highly recommend using the code builder to explore different parameter combinations of the Reader API.\n\n\u003Ca href=\"https:\u002F\u002Fjina.ai\u002Freader#apiform\">\u003Cimg width=\"973\" alt=\"image\" src=\"https:\u002F\u002Fgithub.com\u002Fjina-ai\u002Freader\u002Fassets\u002F2041322\u002Fa490fd3a-1c4c-4a3f-a95a-c481c2a8cc8f\">\u003C\u002Fa>\n\n### Using request headers\n\nYou can control the behavior of the Reader API using request headers. The list below covers the most useful ones — for the full surface with up-to-date defaults and validation rules, see the live API docs at [https:\u002F\u002Fr.jina.ai\u002Fdocs](https:\u002F\u002Fr.jina.ai\u002Fdocs), or the source of truth in [`src\u002Fdto\u002Fcrawler-options.ts`](.\u002Fsrc\u002Fdto\u002Fcrawler-options.ts).\n\n- `x-respond-with` — select the output format.\n  - `markdown` returns markdown *without* going through `readability`\n  - `html` returns `documentElement.outerHTML`\n  - `text` returns `document.body.innerText`\n  - `screenshot` returns the URL of the webpage's screenshot\n  - `pageshot` similar to `screenshot` but tries to capture the whole page instead of just the viewport\n  - `frontmatter` returns **Markdown with a YAML frontmatter block**. The default plain-text response uses a custom `Title: …` \u002F `URL Source: …` header format; `frontmatter` replaces that with a front matter block. 
Example:\n\n    ```bash\n    curl -H 'X-Respond-With: frontmatter' 'https:\u002F\u002Fr.jina.ai\u002Fhttps:\u002F\u002Fexample.com'\n    ```\n\n    ```markdown\n    ---\n    title: \"Example Domain\"\n    description: \"This domain is for use in illustrative examples.\"\n    url: \"https:\u002F\u002Fexample.com\u002F\"\n    ---\n\n    ## Example Domain\n\n    This domain is for use in illustrative examples in documents. ...\n    ```\n\n  - `markdown+frontmatter` — like `frontmatter` but covers the full page without readability filtering.\n- `x-engine` — enforces a fetching engine: `browser` (headless Chrome), `curl` (lightweight, no JS), or `auto` (the default — Combined use of both browser and curl).\n- `x-proxy-url` — route the traffic through your designated proxy.\n- `x-cache-tolerance` — integer seconds; how stale a cached page is acceptable.\n- `x-no-cache: true` — bypass the cached page (lifetime 3600s). Equivalent to `x-cache-tolerance: 0`.\n- `x-target-selector` — a CSS selector. Reader returns content within the matched element instead of the full page. Useful when automatic content extraction misses what you want.\n- `x-wait-for-selector` — a CSS selector. Reader waits until the matched element is rendered before returning. If `x-target-selector` is set, this can be omitted to wait for the same element.\n- `x-timeout` — integer seconds (max 180). When set, Reader will not return early; it waits for network idle or until the timeout is reached.\n- `x-max-tokens` — integer (≥500). Trim the response so it never exceeds this many tokens. Useful as a per-request guardrail when feeding a fixed-size context window — Reader truncates rather than rejects.\n- `x-token-budget` — integer. Reject the request if the resulting content would exceed this many tokens. Use this when *over*-budget output is worse than no output (e.g. cost control). Ignored on the search endpoint.\n- `x-respond-timing` — explicit control over *when* Reader is willing to return. 
Trade off latency against completeness:\n  - `html` — return as soon as the raw HTML lands. No JS execution, no waiting.\n  - `visible-content` — return the moment readable content is parseable. Lowest latency that still produces text.\n  - `mutation-idle` — wait for DOM mutations to settle for ≥0.2s. Good default for SPAs that lazy-render above the fold.\n  - `resource-idle` — wait for content-affecting resources to finish loading (≥0.5s quiet). The default heuristic for content-shaped requests.\n  - `media-idle` — wait for media (images, video, fonts) to also finish. Use with `screenshot` \u002F `pageshot` \u002F `vlm`.\n  - `network-idle` — full `networkidle0`. Slowest, most complete. Implied when `x-timeout` ≥ 20.\n\n  When omitted, Reader picks one based on `x-respond-with`, `x-timeout`, and `x-with-iframe`. See `presumedRespondTiming` in [src\u002Fdto\u002Fcrawler-options.ts](.\u002Fsrc\u002Fdto\u002Fcrawler-options.ts) for the exact rules.\n- `x-with-generated-alt: true` — caption images on the page with a VLM.\n- `x-retain-images` — control how images survive into the output:\n  - `all` (default) — keep `![alt](url)` markdown for every image.\n  - `none` — drop images entirely.\n  - `alt` — keep alt text only, no URLs. Cheap on tokens; useful when the downstream LLM has no use for the image link.\n- `x-retain-links` — control how links survive into the output:\n  - `all` (default) — keep `[text](url)` markdown.\n  - `none` — drop links entirely.\n  - `text` — keep link anchor text only, drop URLs. Best for embedding \u002F semantic-index pipelines where URLs are noise.\n  - `gpt-oss` — emit citations in gpt-oss's `【{id}†...】` format and append a numbered URL footer (also auto-enables `x-with-links-summary`).\n- `x-retain-media` — control how `\u003Cvideo>`, `\u003Caudio>`, and embedded video iframes (`\u003Ciframe>` from YouTube, Vimeo, Bilibili, etc.) appear in the output:\n  - `link` (default) — markdown link, e.g. `[Video 1](url)`. 
Embedded iframes are rewritten to their canonical watch URL. Respects `x-md-link-style`.\n  - `none` — drop media entirely; non-video iframes fall back to their inner text content.\n  - `text` — bare label only, e.g. `Video 1` or `Audio 1`. No URL.\n  - `image` — markdown image syntax, e.g. `![Video 1](url)`.\n  - `html` — the original HTML element with cosmetic attributes (`class`, `id`, `style`, `data-*`, `aria-*`) stripped. Embedded video iframes keep their original embed `src` rather than the canonical watch URL.\n- `x-with-links-summary` \u002F `x-with-images-summary` — append a deduplicated footer of all links \u002F images to the output. Combine with `x-retain-links: text` or `x-retain-images: alt` to get inline anchor\u002Falt text plus *one* canonical URL list at the end — convenient when you want the model to see URLs without paying for them inline. `x-with-links-summary: all` keeps every link instead of only the unique ones.\n- `x-markdown-chunking` — opt-in semantic chunking of the markdown response. Returns a JSON array (or `\u001d`-delimited text) of chunks instead of one blob:\n  - `true` \u002F `h1` … `h5` — heading-based split at the given heading level (e.g. `h3` chunks at `#`, `##`, and `###`).\n  - `structured` \u002F `s1` … `s5` — block-level structured split. `s1` is coarsest, `s5` finest.\n- `x-preset` — apply a pre-packaged option bundle for common scenarios. Preset values only take effect for options the caller does *not* set explicitly (via body or another header). 
See [cookbooks.md](.\u002Fcookbooks.md#using-presets) for examples.\n  - `reader` — for displaying content to human users.\n  - `index` — for semantic indexing \u002F embedding pipelines.\n  - `research` — for AI research agents needing structured, citable output.\n  - `agent` — for AI agents doing everyday browsing tasks.\n  - `spider` — for recursive site crawling with a full link inventory.\n- `x-detach-invisibles` — detach elements with eventual `display:none` before snapshotting. Implies browser engine; disables caching.\n- `x-set-cookie` — forward cookie settings. Requests with cookies are not cached.\n- `x-md-*` — fine-tune markdown output (heading style, bullet markers, link style, etc.). See [src\u002Fdto\u002Fturndown-tweakable-options.ts](.\u002Fsrc\u002Fdto\u002Fturndown-tweakable-options.ts).\n\n### Using `r.jina.ai` for single page application (SPA) fetching\nMany websites nowadays rely on JavaScript frameworks and client-side rendering, usually known as Single Page Applications (SPA). Thanks to [Puppeteer](https:\u002F\u002Fgithub.com\u002Fpuppeteer\u002Fpuppeteer) and headless Chrome, Reader natively supports fetching these websites. However, due to specific approaches some SPAs are developed with, there may be some extra precautions to take.\n\n#### SPAs with hash-based routing\nBy definition of the web standards, content after `#` in a URL is not sent to the server. To mitigate this, use `POST` with the `url` parameter in the body:\n\n```bash\ncurl -X POST 'https:\u002F\u002Fr.jina.ai\u002F' -d 'url=https:\u002F\u002Fexample.com\u002F#\u002Froute'\n```\n\n#### SPAs with preloading contents\nSome SPAs (and even some non-SPAs) show preload content before later loading the main content dynamically. In this case, Reader may capture the preload content instead. 
Two ways to mitigate, plus a combination of both:\n\n```bash\n# wait for network idle or until timeout\ncurl 'https:\u002F\u002Fr.jina.ai\u002Fhttps:\u002F\u002Fexample.com\u002F' -H 'x-timeout: 10'\n\n# wait for a specific element\ncurl 'https:\u002F\u002Fr.jina.ai\u002Fhttps:\u002F\u002Fexample.com\u002F' -H 'x-wait-for-selector: #content'\n\n# combined use of both to wait for a non-existent element (which means waiting for the full timeout duration)\ncurl 'https:\u002F\u002Fr.jina.ai\u002Fhttps:\u002F\u002Fexample.com\u002F' -H 'x-timeout: 30' -H 'x-wait-for-selector: non-existent-element'\n```\n\n### JSON mode\n\nUse the `Accept` header to control the output format:\n\n```bash\ncurl -H \"Accept: application\u002Fjson\" https:\u002F\u002Fr.jina.ai\u002Fhttps:\u002F\u002Fen.m.wikipedia.org\u002Fwiki\u002FMain_Page\n```\n\n### Generated alt\n\nAll images on a page that lack an `alt` attribute can be auto-captioned by a VLM (vision-language model) and formatted as `![Image [idx]: [VLM_caption]](img_URL)`. This should give your downstream text-only LLM *just enough* hints to include those images in reasoning, selection, and summarization:\n\n```bash\ncurl -H \"X-With-Generated-Alt: true\" https:\u002F\u002Fr.jina.ai\u002Fhttps:\u002F\u002Fen.m.wikipedia.org\u002Fwiki\u002FMain_Page\n```\n\n## Cookbooks\n\nFor pipeline-specific recipes — RAG, semantic indexing, deep research, agentic browsing, visual snapshots, PDF\u002FOffice\u002FHTML uploads, and more — see [cookbooks.md](.\u002Fcookbooks.md). Each entry is a short curl example with the header combination that fits the use case and a paragraph explaining the trade-offs.\n\n## Self-host with Docker\n\nA prebuilt image of the open-source branch is published to GitHub Container Registry. It bundles headless Chrome, LibreOffice, and CJK fonts, so you can run Reader without building it yourself.\n\n```bash\ndocker pull ghcr.io\u002Fjina-ai\u002Freader:oss\n```\n\n### Run\n\nThe image exposes two ports:\n\n- `8080` — **h2c** (HTTP\u002F2 cleartext). 
Production-grade, multiplexed; this is what Cloud Run talks to. Plain `curl` won't speak it without `--http2-prior-knowledge`.\n- `8081` — **HTTP\u002F1.1** fallback. Same handler, same routes; use this from anything that doesn't speak h2c.\n\nFor a quick try-out from `curl` or a browser, map the HTTP\u002F1.1 port:\n\n```bash\ndocker run --rm -p 3000:8081 ghcr.io\u002Fjina-ai\u002Freader:oss\n# then: curl http:\u002F\u002Flocalhost:3000\u002Fhttps:\u002F\u002Fexample.com\n```\n\nFor load-testing or production-shape traffic, map the h2c port instead (or both):\n\n```bash\ndocker run --rm -p 3000:8080 -p 3001:8081 ghcr.io\u002Fjina-ai\u002Freader:oss\n```\n\nWith no extra config the container is fully stateless — every request hits the live URL, no cache, no rate limiting. That's the right default for a quick try-out, CI, or throwaway environments.\n\n### Run with caching\n\nPoint Reader at an S3-compatible bucket to cache fetched pages and reuse them across requests:\n\n```bash\ndocker run --rm -p 3000:8081 \\\n  -e GCP_STORAGE_ENDPOINT=https:\u002F\u002Fs3.example.com \\\n  -e GCP_STORAGE_BUCKET=reader-cache \\\n  -e GCP_STORAGE_ACCESS_KEY=... \\\n  -e GCP_STORAGE_SECRET_KEY=... \\\n  ghcr.io\u002Fjina-ai\u002Freader:oss\n```\n\nSee [CONTRIBUTING.md](.\u002FCONTRIBUTING.md#environment-variables) for the full env-var table.\n\n## Local development\n\nRequirements:\n- nvm use\n- Docker *(optional — only if you want a local MinIO bucket cache)*\n\n```bash\ngit clone git@github.com:jina-ai\u002Freader.git\ncd reader\nnpm install\n# Optional, for bucket-cached mode:\ndocker compose up -d\n```\n\nThen either press `F5` in VSCode to launch the debugger, or after setting up the appropriate environment variables:\n\n```bash\nnpm run dev\n```\n\nFor a deeper tour of the codebase — engines, formatting profiles, abuse alleviation, deployment topology — see [architecture.md](.\u002Farchitecture.md). 
For dev workflow, env vars, and tests, see [CONTRIBUTING.md](.\u002FCONTRIBUTING.md).\n\n### Licensed assets\n\nA few non-redistributable artifacts live in `licensed\u002F` and are needed at build\u002Fruntime:\n\n- `GeoLite2-City.mmdb` and `geolite2-asn.mmdb` — MaxMind GeoLite databases (geolocation + ASN lookups).\n- `SourceHanSansSC-Regular.otf` — Source Han Sans (CJK rendering for PDFs\u002Fscreenshots).\n- `gsa_useragents.txt` — user-agent list used by the curl engine.\n\nFetch them in one shot:\n\n```bash\nnpm run assets:download\n```\n\nThe script (`download-external-assets.sh`) is idempotent — it skips files already present and exits 0 even on partial network failure. Set `FORCE_DOWNLOAD_EXTERNAL=1` to overwrite, or `SKIP_DOWNLOAD_EXTERNAL=1` to bypass entirely if you supply your own copies. The repo's CI fetches the same URLs inline; this script exists for local convenience.\n\n## How it works\n[![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002Fjina-ai\u002Freader)\n\n## Having trouble on some websites?\n\nSome sites push back against scrapers — bot challenges, geo blocks, stale CDN edges. A few knobs to try, in roughly increasing order of \"this is bothering me\":\n\n- **Use an API key.** Anonymous traffic is the most aggressively rate-limited and lands in the lowest-trust pool. Authenticated requests get a higher quota and access to features like the internal proxy. Get one at [jina.ai\u002Freader](https:\u002F\u002Fjina.ai\u002Freader#pricing).\n- **Bypass the cache** with `-H 'x-no-cache: true'`. If a stale or already-blocked response got cached, this forces a fresh fetch.\n- **Force the browser engine** with `-H 'x-engine: browser'`. The default `auto` engine prefers the lightweight curl path when it can; some sites only serve real content to a JS-capable browser.\n- **Route through the SaaS proxy** with `-H 'x-proxy: auto'` (key required). 
Reader's hosted proxy pool rotates residential \u002F datacenter IPs and handles common anti-bot challenges automatically. You can also pin a country, e.g. `x-proxy: us` (see [Geo- and locale-sensitive scraping](.\u002Fcookbooks.md#geo--and-locale-sensitive-scraping)).\n- **Bring your own proxy** with `-H 'x-proxy-url: \u003Curl>'`. As a last resort — when even the hosted proxy can't get through — buy a residential or ISP-grade proxy from a third-party provider (BrightData, Thordata, Oxylabs, etc.) and pass the URL directly. Supports `http`, `https`, `socks4`, `socks5`; for auth use `https:\u002F\u002Fuser:pass@host:port`.\n\nIf none of those help, please open an issue with the URL and the headers you tried — we'll take a look.\n\n## License\n\nReader is backed by [Jina AI](https:\u002F\u002Fjina.ai) and licensed under [Apache-2.0](.\u002FLICENSE).\n","jina-ai\u002Freader is a tool that converts any URL into input suited to large language models (LLMs). Its core features are converting web page content into an easy-to-process format via the simple prefix `https:\u002F\u002Fr.jina.ai\u002F`, and searching the web for up-to-date information via `https:\u002F\u002Fs.jina.ai\u002F`. Technically, the project is written in TypeScript and supports direct upload and parsing of PDF and MS Office documents. It suits scenarios that need to improve the performance of LLM-based applications or retrieval-augmented generation systems, such as building smarter chatbots or knowledge management systems. The project is open source and free, offers good stability and scalability, and is actively maintained by Jina AI.",2,"2026-06-11 03:45:41","high_star"]