[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-6844":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":15,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":22,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":32,"readmeContent":33,"aiSummary":34,"trendingCount":16,"starSnapshotCount":16,"syncStatus":35,"lastSyncTime":36,"discoverSource":37},6844,"SwiftSoup","scinfu\u002FSwiftSoup","scinfu","SwiftSoup: Pure Swift HTML Parser, with best of DOM, CSS, and jquery (Supports Linux, iOS, Mac, tvOS, watchOS)","https:\u002F\u002Fscinfu.github.io\u002FSwiftSoup\u002F",null,"Swift",5083,388,54,1,0,6,38.77,"MIT License",false,"master",true,[24,25,26,27,28,29,30,31],"dom","extract","html","html-document","parse","selector","swift","swiftsoup","2026-06-12 02:01:30","\u003Cp align=\"center\" >\n  \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fscinfu\u002FSwiftSoup\u002Fmaster\u002Fswiftsoup.png\" alt=\"SwiftSoup\" title=\"SwiftSoup\">\n\u003C\u002Fp>\n\n![Platform OS X | iOS | tvOS | watchOS | Linux](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fplatform-Linux%20%7C%20OS%20X%20%7C%20iOS%20%7C%20tvOS%20%7C%20watchOS-orange.svg)\n[![SPM compatible](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSPM-compatible-4BC51D.svg?style=flat)](https:\u002F\u002Fgithub.com\u002Fapple\u002Fswift-package-manager)\n![🐧 linux: ready](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%90%A7%20linux-ready-red.svg)\n![Carthage compatible](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCarthage-compatible-4BC51D.svg?style=flat)\n[![Build 
Status](https:\u002F\u002Ftravis-ci.org\u002Fscinfu\u002FSwiftSoup.svg?branch=master)](https:\u002F\u002Ftravis-ci.org\u002Fscinfu\u002FSwiftSoup)\n[![Version](https:\u002F\u002Fimg.shields.io\u002Fcocoapods\u002Fv\u002FSwiftSoup.svg?style=flat)](http:\u002F\u002Fcocoapods.org\u002Fpods\u002FSwiftSoup)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fcocoapods\u002Fl\u002FSwiftSoup.svg?style=flat)](http:\u002F\u002Fcocoapods.org\u002Fpods\u002FSwiftSoup)\n[![Twitter](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Ftwitter-@scinfu-blue.svg?style=flat)](http:\u002F\u002Ftwitter.com\u002Fscinfu)\n\n---\n\nSwiftSoup is a pure Swift library designed for seamless HTML parsing and manipulation across multiple platforms, including macOS, iOS, tvOS, watchOS, and Linux. It offers an intuitive API that leverages the best aspects of DOM traversal, CSS selectors, and jQuery-like methods for effortless data extraction and transformation. Built to conform to the **WHATWG HTML5 specification**, SwiftSoup ensures that parsed HTML is structured just like modern browsers do.\n\n### Key Features:\n- **Parse and scrape** HTML from a URL, file, or string.\n- **Find and extract** data using DOM traversal or CSS selectors.\n- **Modify HTML** elements, attributes, and text dynamically.\n- **Sanitize user-submitted content** using a safe whitelist to prevent XSS attacks.\n- **Generate clean and well-structured HTML** output.\n\nSwiftSoup is designed to handle all types of HTML—whether perfectly structured or messy tag soup—ensuring a logical and reliable parse tree in every scenario.\n\n---\n\n## Swift\nSwift 5 ```>=2.0.0```\n\nSwift 4.2 ```1.7.4```\n\n## Installation\n\n### Cocoapods\nSwiftSoup is available through [CocoaPods](http:\u002F\u002Fcocoapods.org). To install\nit, simply add the following line to your Podfile:\n\n```ruby\npod 'SwiftSoup'\n```\n### Carthage\nSwiftSoup is also available through [Carthage](https:\u002F\u002Fgithub.com\u002FCarthage\u002FCarthage). 
To install\nit, simply add the following line to your Cartfile:\n\n```ruby\ngithub \"scinfu\u002FSwiftSoup\"\n```\n### Swift Package Manager\nSwiftSoup is also available through [Swift Package Manager](https:\u002F\u002Fgithub.com\u002Fapple\u002Fswift-package-manager).\nTo install it, simply add the dependency to your Package.swift file:\n\n```swift\n...\ndependencies: [\n    .package(url: \"https:\u002F\u002Fgithub.com\u002Fscinfu\u002FSwiftSoup.git\", from: \"2.6.0\"),\n],\ntargets: [\n    .target( name: \"YourTarget\", dependencies: [\"SwiftSoup\"]),\n]\n...\n```\n---\n## Usage Examples\n\n### Parse an HTML Document\n\n```swift\nimport SwiftSoup\n\nlet html = \"\"\"\n\u003Chtml>\u003Chead>\u003Ctitle>Example\u003C\u002Ftitle>\u003C\u002Fhead>\n\u003Cbody>\u003Cp>Hello, SwiftSoup!\u003C\u002Fp>\u003C\u002Fbody>\u003C\u002Fhtml>\n\"\"\"\n\nlet document: Document = try SwiftSoup.parse(html)\nprint(try document.title()) \u002F\u002F Output: Example\n```\n\n---\n### Automatic Format Detection\n\n`SwiftSoup.parse(...)` automatically detects XML input by looking for an `\u003C?xml` declaration at the start of the\ncontent. When detected, the XML parser is used; otherwise the HTML parser is applied. 
This means feeds, OPML, and\nother XML documents with a standard XML declaration \"just work\":\n\n```swift\nimport SwiftSoup\n\nlet xml = \"\"\"\n\u003C?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\u003Copml version=\"1.0\">\n  \u003Cbody>\n    \u003Clink>I'm link\u003C\u002Flink>\n    \u003Cimg>I'm img\u003C\u002Fimg>\n  \u003C\u002Fbody>\n\u003C\u002Fopml>\n\"\"\"\n\nlet document = try SwiftSoup.parse(xml) \u002F\u002F auto-detects XML\nprint(try document.select(\"link\").first()?.text()) \u002F\u002F Output: I'm link\nprint(try document.select(\"body > img\").first()?.text()) \u002F\u002F Output: I'm img\n```\n\n### Explicit Parse Modes\n\nUse `parseXML(...)` or `parseHTML(...)` when you want to force a specific parser regardless of the content:\n\n```swift\n\u002F\u002F Force XML parsing (no HTML5 tag normalization)\nlet xmlDoc = try SwiftSoup.parseXML(xmlString)\n\n\u002F\u002F Force HTML parsing (always applies HTML5 rules, even if input has \u003C?xml>)\nlet htmlDoc = try SwiftSoup.parseHTML(htmlString)\n\n\u002F\u002F Explicit parser argument (unchanged from before)\nlet doc = try SwiftSoup.parse(input, baseUri, Parser.xmlParser())\n```\n\n---\n### Parse HTML from a URL\n\nIf Foundation cannot determine a page's text encoding, avoid `String(contentsOf:)` and parse the raw response bytes\ninstead:\n\n```swift\nimport SwiftSoup\n\nlet url = URL(string: \"https:\u002F\u002Fexample.com\")!\nlet document = try SwiftSoup.parse(url)\nprint(try document.title())\n```\n\n---\n## Profiling\n\nSwiftSoup includes a lightweight profiler (gated by a compile-time flag) and a small CLI harness for parsing benchmarks.\n\n### CLI parse benchmark\nThis uses the `SwiftSoupProfile` executable target to parse a fixture corpus and report wall time:\n\n```bash\nswift run -c release SwiftSoupProfile --fixtures \u002Fpath\u002Fto\u002Ffixtures\n```\n\nAdd `--text` to include `Document.text()` in the workload.\n\n### In-code profiler\nThe `Profiler` type is only compiled when the 
`PROFILE` flag is set. Build with:\n\n```bash\nswift run -c release -Xswiftc -DPROFILE SwiftSoupProfile --fixtures \u002Fpath\u002Fto\u002Ffixtures\n```\n\nThen the CLI will print the profiler summary at the end of the run.\n\n---\n\n### Select Elements with CSS Query\n\n```swift\nlet html = \"\"\"\n\u003Chtml>\u003Cbody>\n\u003Cp class='message'>SwiftSoup is powerful!\u003C\u002Fp>\n\u003Cp class='message'>Parsing HTML in Swift\u003C\u002Fp>\n\u003C\u002Fbody>\u003C\u002Fhtml>\n\"\"\"\n\nlet document = try SwiftSoup.parse(html)\nlet messages = try document.select(\"p.message\")\n\nfor message in messages {\n    print(try message.text())\n}\n\u002F\u002F Output:\n\u002F\u002F SwiftSoup is powerful!\n\u002F\u002F Parsing HTML in Swift\n```\n\n---\n\n### Extract Text and Attributes\n\n```swift\nlet html = \"\u003Ca href='https:\u002F\u002Fexample.com'>Visit the site\u003C\u002Fa>\"\nlet document = try SwiftSoup.parse(html)\nlet link = try document.select(\"a\").first()\n\nif let link = link {\n    print(try link.text()) \u002F\u002F Output: Visit the site\n    print(try link.attr(\"href\")) \u002F\u002F Output: https:\u002F\u002Fexample.com\n}\n```\n\n---\n\n### Modify the DOM\n\n```swift\nvar document = try SwiftSoup.parse(\"\u003Cdiv id='content'>\u003C\u002Fdiv>\")\nlet div = try document.select(\"#content\").first()\ntry div?.append(\"\u003Cp>New content added!\u003C\u002Fp>\")\nprint(try document.html())\n\u002F\u002F Output:\n\u002F\u002F \u003Chtml>\u003Chead>\u003C\u002Fhead>\u003Cbody>\u003Cdiv id=\"content\">\u003Cp>New content added!\u003C\u002Fp>\u003C\u002Fdiv>\u003C\u002Fbody>\u003C\u002Fhtml>\n```\n\n---\n\n### Clean HTML for Security (Whitelist)\n\n```swift\nlet dirtyHtml = \"\u003Cscript>alert('Hacked!')\u003C\u002Fscript>\u003Cb>Important text\u003C\u002Fb>\"\nlet cleanHtml = try SwiftSoup.clean(dirtyHtml, Whitelist.basic())\nprint(cleanHtml) \u002F\u002F Output: \u003Cb>Important text\u003C\u002Fb>\n```\n\n```swift\nlet dirtyHtml = #\"\u003Cp 
style=\"color:red; position:absolute\">Styled text\u003C\u002Fp>\"#\nlet whitelist = try Whitelist()\n    .addTags(\"p\")\n    .addAttributes(\"p\", \"style\")\n    .addCSSProperties(\"p\", \"color\")\nlet cleanHtml = try SwiftSoup.clean(dirtyHtml, whitelist)\nprint(cleanHtml) \u002F\u002F Output: \u003Cp style=\"color:red\">Styled text\u003C\u002Fp>\n```\n\n---\n### Use CSS selectors to find elements\n(from [jsoup](https:\u002F\u002Fjsoup.org\u002Fcookbook\u002Fextracting-data\u002Fselector-syntax))\n\n#### Selector overview\n\n- `tagname`: find elements by tag, e.g. `div`\n- `#id`: find elements by ID, e.g. `#logo`\n- `.class`: find elements by class name, e.g. `.masthead`\n- `[attribute]`: elements with attribute, e.g. `[href]`\n- `[^attrPrefix]`: elements with an attribute name prefix, e.g. `[^data-]` finds elements with HTML5 dataset attributes\n- `[attr=value]`: elements with attribute value, e.g. `[width=500]` (also quotable, like `[data-name='launch sequence']`)\n- `[attr^=value]`, `[attr$=value]`, `[attr*=value]`: elements with attributes that start with, end with, or contain the value, e.g. `[href*=\u002Fpath\u002F]`\n- `[attr~=regex]`: elements with attribute values that match the regular expression; e.g. `img[src~=(?i)\\.(png|jpe?g)]`\n- `*`: all elements, e.g. `*`\n- `[*]` selects elements that have any attribute. e.g. `p[*]` finds paragraphs with at least one attribute, and `p:not([*])` finds those with no attributes.\n- `ns|tag`: find elements by tag in a namespace prefix, e.g. `dc|name` finds `\u003Cdc:name>` elements\n- `*|tag`: find elements by tag in any namespace prefix, e.g. `*|name` finds `\u003Cdc:name>` and `\u003Cname>` elements\n- `:empty`: selects elements that have no children (ignoring blank text nodes, comments, etc.); e.g. `li:empty`\n\n#### Selector combinations\n\n- `el#id`: elements with ID, e.g. `div#logo`\n- `el.class`: elements with class, e.g. `div.masthead`\n- `el[attr]`: elements with attribute, e.g. 
`a[href]`\n- Any combination, e.g. `a[href].highlight`\n- `ancestor child`: child elements that descend from ancestor, e.g. `.body p` finds `p` elements anywhere under a block with class \"body\"\n- `parent > child`: child elements that descend directly from parent, e.g. `div.content > p` finds `p` elements; and `body > *` finds the direct children of the body tag\n- `siblingA + siblingB`: finds sibling B element immediately preceded by sibling A, e.g. `div.head + div`\n- `siblingA ~ siblingX`: finds sibling X element preceded by sibling A, e.g. `h1 ~ p`\n- `el, el, el`: group multiple selectors, find unique elements that match any of the selectors; e.g. `div.masthead, div.logo`\n\n#### Pseudo selectors\n\n- `:has(selector)`: find elements that contain elements matching the selector; e.g. `div:has(p)`\n- `:is(selector)`: find elements that match any of the selectors in the selector list; e.g. `:is(h1, h2, h3, h4, h5, h6)` finds any heading element\n- `:not(selector)`: find elements that do not match the selector; e.g. `div:not(.logo)`\n- `:lt(n)`: find elements whose sibling index (i.e. its position in the DOM tree relative to its parent) is less than `n`; e.g. `td:lt(3)`\n- `:gt(n)`: find elements whose sibling index is greater than `n`; e.g. `div p:gt(2)`\n- `:eq(n)`: find elements whose sibling index is equal to `n`; e.g. `form input:eq(1)`\n- Note that the above indexed pseudo-selectors are 0-based, that is, the first element is at index 0, the second at 1, etc\n\n#### Text content pseudo selectors\n\n- `:contains(text)`: find elements that contain (directly or via children) the given normalized text. The search is case-insensitive; e.g. `div:contains(jsoup)`\n- `:containsOwn(text)`: find elements whose own text directly contains the given text. e.g. `p:containsOwn(jsoup)`\n- `:containsData(text)`: selects elements that contain the specified data (e.g. within `\u003Cscript>`, `\u003Cstyle>`, or comments); e.g. 
`script:containsData(jsoup)`\n- `:containsWholeText(text)`: selects elements that contain the exact, non-normalized whole text (case sensitive, preserving whitespace\u002Fnewlines); e.g. `p:containsWholeText(jsoup The Java HTML Parser)`\n- `:containsWholeOwnText(text)`: selects elements whose own text exactly matches the given non-normalized text (case sensitive); e.g. `p:containsWholeOwnText(jsoup The Java HTML Parser)`\n- `:matches(regex)`: find elements whose text matches the specified regular expression; e.g. `div:matches((?i)login)`\n- `:matchesOwn(regex)`: find elements whose own text matches the specified regular expression\n- `:matchesWholeText(regex)`: selects elements whose entire, non-normalized text matches the specified regex; e.g. `div:matchesWholeText(\\d{3}-\\d{2}-\\d{4})`\n- `:matchesWholeOwnText(regex)`: selects elements whose own non-normalized text matches the regex; e.g. `span:matchesWholeOwnText(\\w+)`\n\n#### Structural pseudo selectors\n\n- `:root`: selects the root element of the document (in HTML, the `\u003Chtml>` element); e.g. `:root`\n- `:nth-child(an+b)`: selects elements with an+b–1 preceding siblings; supports expressions like `2n+1` for odd elements; e.g. `tr:nth-child(2n+1)`\n- `:nth-last-child(an+b)`: selects elements with an+b–1 following siblings; e.g. `tr:nth-last-child(-n+2)`\n- `:nth-of-type(an+b)`: selects elements based on their position among siblings of the same type; e.g. `img:nth-of-type(2n+1)`\n- `:nth-last-of-type(an+b)`: selects elements based on their position among siblings of the same type, counting from the end; e.g. `img:nth-last-of-type(2n+1)`\n- `:first-child`: selects elements that are the first child of their parent; e.g. `div > p:first-child`\n- `:last-child`: selects elements that are the last child of their parent; e.g. `ol > li:last-child`\n- `:first-of-type`: selects the first element of its type among its siblings; e.g. 
`dl dt:first-of-type`\n- `:last-of-type`: selects the last element of its type among its siblings; e.g. `tr > td:last-of-type`\n- `:only-child`: selects elements that are the only child of their parent; e.g. `div:only-child`\n- `:only-of-type`: selects elements that are the only element of their type among their siblings; e.g. `span:only-of-type`\n\n#### Optimize repeated queries\n\nSwiftSoup provides automatic caching of parsed CSS queries to speed up repeated queries, and also to speed up parsing related queries.\n\nThe cache is controlled through the static property `QueryParser.cache`. By default, it is initialized with a reasonable size limit.\nYou may replace the cache at any time; however, assigning a new cache instance will discard all previously cached values.\n\n```swift\n\u002F\u002F Remove any cache limits.\nQueryParser.cache = QueryParser.DefaultCache(limit: .unlimited)\n\u002F\u002F Limit to 1000 items. See also documentation for ``QueryParserCache\u002Fset(_:_:)``.\nQueryParser.cache = QueryParser.DefaultCache(limit: .count(1000))\n```\n\nAn alternative is to parse the query upfront and pass an `Evaluator` instead of a query string.\nSince `Evaluator` instances are immutable, they are safe to store in (static) properties or pass across isolation boundaries. \n\n```swift\nlet elements: Elements = …\nlet eval = try QueryParser.parse(\"div > p\")\nfor element in elements {\n    print(try element.select(eval).text())\n}\n```\n\n---\n\n## Author\n\nNabil Chatbi, scinfu@gmail.com\n\nCurrent maintainer: Alex Ehlke, available for hire for SwiftSoup related work or other iOS projects: alex dot ehlke at gmail\n\n## Note\nSwiftSoup was ported to Swift from the Java [Jsoup](https:\u002F\u002Fjsoup.org\u002F) library.\n\n## License\n\nSwiftSoup is available under the MIT license. 
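See the [LICENSE](https:\u002F\u002Fgithub.com\u002Fscinfu\u002FSwiftSoup\u002Fblob\u002Fmaster\u002FLICENSE) file for more info.\n","SwiftSoup is a pure Swift HTML parsing library that supports macOS, iOS, tvOS, watchOS, and Linux. It provides an intuitive API that combines DOM traversal, CSS selectors, and jQuery-like methods, making data extraction and transformation easy and efficient. SwiftSoup conforms to the WHATWG HTML5 specification, so parsed HTML is structured the way modern browsers structure it. Its core features include parsing and scraping HTML from a URL, file, or string; finding and extracting data with DOM traversal or CSS selectors; dynamically modifying HTML elements and their attributes; sanitizing user-submitted content with a safe whitelist to prevent XSS attacks; and generating clean, well-structured HTML output. It suits applications that must handle all kinds of HTML content, whether well-formed or messy, and is especially useful for parsing and processing web content in cross-platform apps.",2,"2026-06-11 03:09:10","top_language"]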