[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-7997":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":15,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":17,"rankGlobal":9,"rankLanguage":9,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":19,"hasPages":19,"topics":21,"createdAt":9,"pushedAt":9,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":15,"starSnapshotCount":15,"syncStatus":16,"lastSyncTime":25,"discoverSource":26},7997,"html-pipeline","gjtorikian\u002Fhtml-pipeline","gjtorikian","HTML processing filters and utilities",null,"Ruby",2328,385,69,1,0,2,29.76,"MIT License",false,"main",[],"2026-06-12 02:01:47","# HTML-Pipeline\n\nHTML processing filters and utilities. This module is a small\nframework for defining CSS-based content filters and applying them to user\nprovided content.\n\n[Although this project was started at GitHub](https:\u002F\u002Fgithub.com\u002Fblog\u002F1311-html-pipeline-chainable-content-filters), they no longer use it. This gem must be considered standalone and independent from GitHub.\n\n- [HTML-Pipeline](#html-pipeline)\n  - [Installation](#installation)\n  - [Usage](#usage)\n    - [More Examples](#more-examples)\n  - [Filters](#filters)\n    - [TextFilters](#textfilters)\n    - [ConvertFilter](#convertfilter)\n    - [Sanitization](#sanitization)\n    - [NodeFilters](#nodefilters)\n  - [Dependencies](#dependencies)\n  - [Documentation](#documentation)\n  - [Instrumenting](#instrumenting)\n  - [Third Party Extensions](#third-party-extensions)\n  - [FAQ](#faq)\n    - [1. Why doesn't my pipeline work when there's no root element in the document?](#1-why-doesnt-my-pipeline-work-when-theres-no-root-element-in-the-document)\n    - [2. 
How do I customize an allowlist for `SanitizationFilter`s?](#2-how-do-i-customize-an-allowlist-for-sanitizationfilters)\n    - [Contributors](#contributors)\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'html-pipeline'\n```\n\nAnd then execute:\n\n```sh\n$ bundle\n```\n\nOr install it yourself with:\n\n```sh\n$ gem install html-pipeline\n```\n\n## Usage\n\nThis library provides a handful of chainable HTML filters to transform user\ncontent into HTML markup. Each filter does some work, and then hands off the\nresults to the next filter. A pipeline has several kinds of filters available to use:\n\n- Multiple `TextFilter`s, which operate on a UTF-8 string\n- A `ConvertFilter`, which turns text into HTML (e.g., Commonmark\u002FAsciidoc -> HTML)\n- A `SanitizationFilter`, which removes dangerous\u002Funwanted HTML elements and attributes\n- Multiple `NodeFilter`s, which operate on a UTF-8 HTML document\n\nYou can assemble each sequence into a single pipeline, or choose to call each filter individually.\n\nAs an example, suppose we want to transform Commonmark source text into HTML:\n\n```\nHey there, @gjtorikian\n```\n\nWith the content, we also want to:\n\n- change every instance of `Hey` to `Hello`\n- strip undesired HTML\n- linkify `@mention`s\n\nWe can construct a pipeline to do all that like this:\n\n```ruby\nrequire 'html_pipeline'\n\nclass HelloJohnnyFilter \u003C HTMLPipeline::TextFilter\n  def call\n    text.gsub(\"Hey\", \"Hello\")\n  end\nend\n\npipeline = HTMLPipeline.new(\n  text_filters: [HelloJohnnyFilter.new],\n  convert_filter: HTMLPipeline::ConvertFilter::MarkdownFilter.new,\n  # note: next line is not needed as sanitization occurs by default;\n  # see below for more info\n  sanitization_config: HTMLPipeline::SanitizationFilter::DEFAULT_CONFIG,\n  node_filters: [HTMLPipeline::NodeFilter::MentionFilter.new]\n)\npipeline.call(user_supplied_text) # recommended: can call pipeline over and over\n```\n\nFilters can 
be custom ones you create (like `HelloJohnnyFilter`), and `HTMLPipeline` additionally provides several helpful ones (detailed below). If you only need a single filter, you can call one individually, too:\n\n```ruby\nfilter = HTMLPipeline::ConvertFilter::MarkdownFilter.new\nfilter.call(text)\n```\n\nFilters combine into a sequential pipeline, and each filter hands its\noutput to the next filter's input. Text filters are\nprocessed first, then the convert filter, sanitization filter, and finally, the node filters.\n\nSome filters take optional `context` and\u002For `result` hash(es). These are\nused to pass around arguments and metadata between filters in a pipeline. For\nexample, if you want to disable footnotes in the `MarkdownFilter`, you can pass an option in the context hash:\n\n```ruby\ncontext = { markdown: { extensions: { footnotes: false } } }\nfilter = HTMLPipeline::ConvertFilter::MarkdownFilter.new(context: context)\nfilter.call(\"Hi **world**!\")\n```\n\nAlternatively, you can construct a pipeline, and pass in a context during the call:\n\n```ruby\npipeline = HTMLPipeline.new(\n  convert_filter: HTMLPipeline::ConvertFilter::MarkdownFilter.new,\n  node_filters: [HTMLPipeline::NodeFilter::MentionFilter.new]\n)\npipeline.call(user_supplied_text, context: { markdown: { extensions: { footnotes: false } } })\n```\n\nPlease refer to the documentation for each filter to understand what configuration options are available.\n\n### More Examples\n\nDifferent pipelines can be defined for different parts of an app. 
Here are a few\nparaphrased snippets to get you started:\n\n```ruby\n# The context hash is how you pass options between different filters.\n# See individual filter source for explanation of options.\ncontext = {\n  asset_root: \"http:\u002F\u002Fyour-domain.com\u002Fwhere\u002Fyour\u002Fimages\u002Flive\u002Ficons\",\n  base_url: \"http:\u002F\u002Fyour-domain.com\"\n}\n\n# Pipeline used for user provided content on the web\nMarkdownPipeline = HTMLPipeline.new(\n  text_filters: [HTMLPipeline::TextFilter::ImageFilter.new],\n  convert_filter: HTMLPipeline::ConvertFilter::MarkdownFilter.new,\n  node_filters: [\n    HTMLPipeline::NodeFilter::HttpsFilter.new,\n    HTMLPipeline::NodeFilter::MentionFilter.new,\n  ],\n  context: context\n)\n\n# Pipelines aren't limited to the web. You can use them for email\n# processing also.\nHtmlEmailPipeline = HTMLPipeline.new(\n  text_filters: [\n    HTMLPipeline::TextFilter::PlainTextInputFilter.new,\n    HTMLPipeline::TextFilter::ImageFilter.new\n  ]\n)\n```\n\n## Filters\n\n### TextFilters\n\n`TextFilter`s must define a method named `call` which is called on the text. `@text`, `@config`, and `@result` are available to use, and any changes made to these ivars are passed on to the next filter.\n\n- `ImageFilter` - converts an image `url` into an `\u003Cimg>` tag\n- `PlainTextInputFilter` - HTML-escapes text and wraps the result in a `\u003Cdiv>`\n\n### ConvertFilter\n\nThe `ConvertFilter` takes text and turns it into HTML. `@text`, `@config`, and `@result` are available to use. A `ConvertFilter` must define a method named `call`, taking one argument, `text`. `call` must return a string representing the new HTML document.\n\n- `MarkdownFilter` - creates HTML from text using [Commonmarker](https:\u002F\u002Fwww.github.com\u002Fgjtorikian\u002Fcommonmarker)\n\n### Sanitization\n\nBecause the web can be a scary place, **HTML is automatically sanitized** after the `ConvertFilter` runs and before the `NodeFilter`s are processed. 
This is to prevent malicious or unexpected input from entering the pipeline.\n\nThe sanitization process takes a configuration hash. See the [Selma](https:\u002F\u002Fwww.github.com\u002Fgjtorikian\u002Fselma) documentation for more information on how to configure these settings. Note that you must configure sanitization correctly if you expect it to work in conjunction with handlers that manipulate HTML.\n\nA default sanitization config is provided by this library (`HTMLPipeline::SanitizationFilter::DEFAULT_CONFIG`). A sample custom sanitization allowlist might look like this:\n\n```ruby\nALLOWLIST = {\n  elements: [\"p\", \"pre\", \"code\"]\n}\n\npipeline = HTMLPipeline.new \\\n  text_filters: [\n    HTMLPipeline::TextFilter::ImageFilter.new,\n  ],\n  convert_filter: HTMLPipeline::ConvertFilter::MarkdownFilter.new,\n  sanitization_config: ALLOWLIST\n\nresult = pipeline.call \u003C\u003C-CODE\nThis is *great*:\n\n    some_code(:first)\n\nCODE\nresult[:output].to_s\n```\n\nThis would produce:\n\n```html\n\u003Cp>This is great:\u003C\u002Fp>\n\u003Cpre>\u003Ccode>some_code(:first)\n\u003C\u002Fcode>\u003C\u002Fpre>\n```\n\nSanitization can be disabled if and only if `nil` is explicitly passed as\nthe config:\n\n```ruby\npipeline = HTMLPipeline.new \\\n  text_filters: [\n    HTMLPipeline::TextFilter::ImageFilter.new,\n  ],\n  convert_filter: HTMLPipeline::ConvertFilter::MarkdownFilter.new,\n  sanitization_config: nil\n```\n\nFor more examples of customizing the sanitization process to include the tags you want, check out [the tests](test\u002Fsanitization_filter_test.rb) and [the FAQ](#faq).\n\n### NodeFilters\n\n`NodeFilter`s can operate either on HTML elements or text nodes using CSS selectors. Each `NodeFilter` must define a method named `selector` which provides an instance of `Selma::Selector`. 
If elements are being manipulated, `handle_element` must be defined, taking one argument, `element`; if text nodes are being manipulated, `handle_text_chunk` must be defined, taking one argument, `text_chunk`. `@config` and `@result` are available to use, and any changes made to these ivars are passed on to the next filter.\n\n`NodeFilter` also has an optional method, `after_initialize`, which is run after the filter initializes. This can be useful in setting up a fresh custom state for `result` to start from each time the pipeline is called.\n\nHere's an example `NodeFilter` that adds a base URL to images that are root relative:\n\n```ruby\nrequire 'uri'\n\nclass RootRelativeFilter \u003C HTMLPipeline::NodeFilter\n\n  SELECTOR = Selma::Selector.new(match_element: \"img\")\n\n  def selector\n    SELECTOR\n  end\n\n  def handle_element(img)\n    # skip images that have no src attribute\n    return if img['src'].nil?\n\n    src = img['src'].strip\n    # only root-relative paths are rewritten against the base URL\n    if src.start_with? '\u002F'\n      img[\"src\"] = URI.join(context[:base_url], src).to_s\n    end\n  end\nend\n```\n\nFor more information on how to write effective `NodeFilter`s, refer to the provided filters and to the underlying library, [Selma](https:\u002F\u002Fwww.github.com\u002Fgjtorikian\u002Fselma).\n\n- `AbsoluteSourceFilter`: replace relative image urls with fully qualified versions\n- `AssetProxyFilter`: replace image links with an encoded link to an asset server\n- `EmojiFilter`: convert `:\u003Cemoji>:` to [emoji](http:\u002F\u002Fwww.emoji-cheat-sheet.com\u002F)\n  - (Note: the included `MarkdownFilter` will already convert emoji)\n- `HttpsFilter`: replace http urls with https versions\n- `ImageMaxWidthFilter`: link to full size image for large images\n- `MentionFilter`: replace `@user` mentions with links\n- `SanitizationFilter`: sanitize user markup using an allowlist\n- `SyntaxHighlightFilter`: apply syntax highlighting to `pre` blocks\n  - (Note: the included `MarkdownFilter` will already apply highlighting)\n- `TableOfContentsFilter`: 
anchor headings with name attributes and generate a Table of Contents as an HTML unordered list linking to the headings\n- `TeamMentionFilter`: replace `@org\u002Fteam` mentions with links\n\n## Dependencies\n\nSince filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem dependencies yourself.\n\nFor example, `SyntaxHighlightFilter` uses [rouge](https:\u002F\u002Fgithub.com\u002Fjneen\u002Frouge)\nto detect and highlight languages; to use the `SyntaxHighlightFilter`, you must add the following to your Gemfile:\n\n```ruby\ngem \"rouge\"\n```\n\n> **Note**\n> See the [Gemfile](\u002FGemfile) `:test` group for any version requirements.\n\nWhen developing a custom filter, call `HTMLPipeline.require_dependency` at the start to ensure that the local machine has the necessary dependency. You can also use `HTMLPipeline.require_dependencies` to provide a list of dependencies to check.\n\nOn a similar note, you must manually require whichever filters you desire:\n\n```ruby\nrequire \"html_pipeline\" # must be included\nrequire \"html_pipeline\u002Fconvert_filter\u002Fmarkdown_filter\" # included because you want to use this filter\nrequire \"html_pipeline\u002Fnode_filter\u002Fmention_filter\" # included because you want to use this filter\n```\n\n## Documentation\n\nFull reference documentation can be [found here](http:\u002F\u002Frubydoc.info\u002Fgems\u002Fhtml-pipeline\u002Fframes).\n\n## Instrumenting\n\nFilters and Pipelines can be set up to be instrumented when called. The pipeline\nmust be set up with an\n[ActiveSupport::Notifications](http:\u002F\u002Fapi.rubyonrails.org\u002Fclasses\u002FActiveSupport\u002FNotifications.html)\ncompatible service object and a name. 
New pipeline objects will default to the\n`HTMLPipeline.default_instrumentation_service` object.\n\n```ruby\n# the AS::Notifications-compatible service object\nservice = ActiveSupport::Notifications\n\n# instrument a specific pipeline\npipeline = HTMLPipeline.new [MarkdownFilter], context\npipeline.setup_instrumentation \"MarkdownPipeline\", service\n\n# or set default instrumentation service for all new pipelines\nHTMLPipeline.default_instrumentation_service = service\npipeline = HTMLPipeline.new [MarkdownFilter], context\npipeline.setup_instrumentation \"MarkdownPipeline\"\n```\n\nFilters are instrumented when they are run through the pipeline. A\n`call_filter.html_pipeline` event is published once any filter finishes; `call_text_filters`\nand `call_node_filters` are published when all of the text filters and all of the node filters have finished, respectively.\nThe `payload` should include the `filter` name. Each filter will trigger its own\ninstrumentation call.\n\n```ruby\nservice.subscribe \"call_filter.html_pipeline\" do |event, start, ending, transaction_id, payload|\n  payload[:pipeline] #=> \"MarkdownPipeline\", set with `setup_instrumentation`\n  payload[:filter] #=> \"MarkdownFilter\"\n  payload[:context] #=> context Hash\n  payload[:result] #=> instance of result class\n  payload[:result][:output] #=> output HTML String\nend\n```\n\nThe full pipeline is also instrumented:\n\n```ruby\nservice.subscribe \"call_text_filters.html_pipeline\" do |event, start, ending, transaction_id, payload|\n  payload[:pipeline] #=> \"MarkdownPipeline\", set with `setup_instrumentation`\n  payload[:filters] #=> [\"MarkdownFilter\"]\n  payload[:doc] #=> HTML String\n  payload[:context] #=> context Hash\n  payload[:result] #=> instance of result class\n  payload[:result][:output] #=> output HTML String\nend\n```\n\n## FAQ\n\n### 1. 
Why doesn't my pipeline work when there's no root element in the document?\n\nTo make a pipeline work on a plain text document, put the `PlainTextInputFilter`\nat the end of your `text_filters` config. This will wrap the content in a `div` so the filters have a root element to work with. If you're passing in an HTML fragment,\nbut it doesn't have a root element, you can wrap the content in a `div`\nyourself.\n\n### 2. How do I customize an allowlist for `SanitizationFilter`s?\n\n`HTMLPipeline::SanitizationFilter::DEFAULT_CONFIG` is the default allowlist used if no `sanitization_config`\nargument is given. The default is a good starting template for\nyou to add additional elements. You can either modify the constant's value, or\ndefine your own config and pass that in, such as:\n\n```ruby\nconfig = HTMLPipeline::SanitizationFilter::DEFAULT_CONFIG.deep_dup\nconfig[:elements] \u003C\u003C \"iframe\" # sure, whatever you want\n```\n\n### Contributors\n\nThanks to all of [these contributors](https:\u002F\u002Fgithub.com\u002Fgjtorikian\u002Fhtml-pipeline\u002Fgraphs\u002Fcontributors).\n\nThis project is a member of the [OSS Manifesto](http:\u002F\u002Fossmanifesto.org\u002F).\n","HTML-Pipeline is a Ruby library for processing HTML content that provides a set of chainable content filters. Its core features include text transformation, format conversion (such as Commonmark\u002FAsciidoc to HTML), content sanitization, and node manipulation. With these filters, users can easily run input through a series of predefined or custom processing steps to produce safe HTML output that meets their needs. The project suits web applications that need to format, convert, and sanitize user-submitted rich text, especially platforms that care about content safety and consistency.","2026-06-11 03:15:31","top_language"]