[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-4078":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":35,"readmeContent":36,"aiSummary":37,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":38,"discoverSource":39},4078,"jsoup","jhy\u002Fjsoup","jhy","jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.","https:\u002F\u002Fjsoup.org",null,"Java",11368,2290,391,3,0,2,5,16,7,45,"MIT License",false,"master",[26,27,28,29,30,31,5,32,33,34],"css","css-selectors","dom","html","java","java-html-parser","parser","xml","xpath","2026-06-12 02:00:58","# jsoup: Java HTML Parser\n\n**jsoup** is a Java library that makes it easy to work with real-world HTML and XML. 
It offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM API methods, CSS, and xpath selectors.\n\n**jsoup** implements the [WHATWG HTML5](https:\u002F\u002Fhtml.spec.whatwg.org\u002Fmultipage\u002F) specification, and parses HTML to the same DOM as modern browsers.\n\n* scrape and [parse](https:\u002F\u002Fjsoup.org\u002Fcookbook\u002Finput\u002Fparse-document-from-string) HTML from a URL, file, or string\n* find and [extract data](https:\u002F\u002Fjsoup.org\u002Fcookbook\u002Fextracting-data\u002Fselector-syntax), using DOM traversal or CSS selectors\n* manipulate the [HTML elements](https:\u002F\u002Fjsoup.org\u002Fcookbook\u002Fmodifying-data\u002Fset-html), attributes, and text\n* [clean](https:\u002F\u002Fjsoup.org\u002Fcookbook\u002Fcleaning-html\u002Fsafelist-sanitizer) user-submitted content against a safe-list, to prevent XSS attacks\n* output tidy HTML\n\njsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree.\n\nSee [**jsoup.org**](https:\u002F\u002Fjsoup.org\u002F) for downloads and the full [API documentation](https:\u002F\u002Fjsoup.org\u002Fapidocs\u002F).\n\n[![Build Status](https:\u002F\u002Fgithub.com\u002Fjhy\u002Fjsoup\u002Fworkflows\u002FBuild\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fjhy\u002Fjsoup\u002Factions?query=workflow%3ABuild)\n\n## Example\nFetch the [Wikipedia](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FMain_Page) homepage, parse it to a [DOM](https:\u002F\u002Fdeveloper.mozilla.org\u002Fen-US\u002Fdocs\u002FWeb\u002FAPI\u002FDocument_Object_Model\u002FIntroduction), and select the headlines from the *In the News* section into a list of [Elements](https:\u002F\u002Fjsoup.org\u002Fapidocs\u002Forg\u002Fjsoup\u002Fselect\u002FElements.html):\n\n```java\nDocument doc = Jsoup.connect(\"https:\u002F\u002Fen.wikipedia.org\u002F\").get();\nlog(doc.title());\nElements 
newsHeadlines = doc.select(\"#mp-itn b a\");\nfor (Element headline : newsHeadlines) {\n  log(\"%s\\n\\t%s\", \n    headline.attr(\"title\"), headline.absUrl(\"href\"));\n}\n```\n[Online sample](https:\u002F\u002Ftry.jsoup.org\u002F~LGB7rk_atM2roavV0d-czMt3J_g), [full source](https:\u002F\u002Fgithub.com\u002Fjhy\u002Fjsoup\u002Fblob\u002Fmaster\u002Fsrc\u002Fmain\u002Fjava\u002Forg\u002Fjsoup\u002Fexamples\u002FWikipedia.java).\n\n## Open source\njsoup is an open source project distributed under the liberal [MIT license](https:\u002F\u002Fjsoup.org\u002Flicense). The source code is available on [GitHub](https:\u002F\u002Fgithub.com\u002Fjhy\u002Fjsoup).\n\n## Getting started\n1. [Download](https:\u002F\u002Fjsoup.org\u002Fdownload) the latest jsoup jar (or add it to your Maven\u002FGradle build)\n2. Read the [cookbook](https:\u002F\u002Fjsoup.org\u002Fcookbook\u002F)\n3. Enjoy!\n\n### Android support\nWhen used in Android projects, [core library desugaring](https:\u002F\u002Fdeveloper.android.com\u002Fstudio\u002Fwrite\u002Fjava8-support#library-desugaring) with the [NIO specification](https:\u002F\u002Fdeveloper.android.com\u002Fstudio\u002Fwrite\u002Fjava11-nio-support-table) should be enabled to support Java 8+ features.\n\n## Development and support\nIf you have any questions on how to use jsoup, or have ideas for future development, please get in touch via [jsoup Discussions](https:\u002F\u002Fgithub.com\u002Fjhy\u002Fjsoup\u002Fdiscussions).\n\nIf you find any issues, please file a [bug](https:\u002F\u002Fjsoup.org\u002Fbugs) after checking for duplicates.\n\nThe [colophon](https:\u002F\u002Fjsoup.org\u002Fcolophon) talks about the history of and tools used to build jsoup.\n\n## Status\njsoup is in general release and is stable.\n\n## Author\njsoup was created and is maintained by [Jonathan Hedley](\u002F\u002Fjhedley.com), its primary author.\n\njsoup is an open-source project, and many contributors have helped improve it over the years. 
You can see their contributions and join the development on [GitHub](https:\u002F\u002Fgithub.com\u002Fjhy\u002Fjsoup\u002Fgraphs\u002Fcontributors).\n\n## Citing jsoup\nIf you use jsoup in research or technical documentation, you can cite it as:\n\n> **Jonathan Hedley & jsoup contributors. jsoup: Java HTML Parser (2009–present).** Available at: https:\u002F\u002Fjsoup.org\n\n```plaintext\n@misc{jsoup,\n  author = {Jonathan Hedley and jsoup contributors},\n  title = {jsoup: Java HTML Parser},\n  year = {2025},\n  url = {https:\u002F\u002Fjsoup.org}\n}\n```\n","jsoup is a Java library for working with real-world HTML and XML. It provides an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM API methods, CSS, and XPath selectors. The project follows the WHATWG HTML5 specification and parses HTML the same way modern browsers do. Core features include scraping and parsing HTML from a URL, file, or string; finding and extracting data through DOM traversal or CSS selectors; modifying HTML elements, attributes, and text; and cleaning user-submitted content against a safelist to prevent XSS attacks. jsoup suits scenarios that require web scraping, content cleaning, and security handling, such as web development and data analysis.","2026-06-11 02:58:17","top_language"]