[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-73221":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":30,"readmeContent":31,"aiSummary":32,"trendingCount":15,"starSnapshotCount":15,"syncStatus":33,"lastSyncTime":34,"discoverSource":35},73221,"pipet","bjesus\u002Fpipet","bjesus","Swiss-army tool for scraping and extracting data from online assets, made for hackers ","",null,"Go",4693,215,20,0,1,30,29,"MIT License",false,"main",[23,24,25,26,27,28,29],"css","curl","gjson","json","playwright","scraper","scraping","2026-06-12 02:03:10","\u003Ch1 align=\"center\">\nPipet\n\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fgoreportcard.com\u002Freport\u002Fgithub.com\u002Fbjesus\u002Fpipet\">\u003Cimg src=\"https:\u002F\u002Fgoreportcard.com\u002Fbadge\u002Fgithub.com\u002Fbjesus\u002Fpipet\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpkg.go.dev\u002Fgithub.com\u002Fbjesus\u002Fpipet\">\u003Cimg src=\"https:\u002F\u002Fpkg.go.dev\u002Fbadge\u002Fgithub.com\u002Fbjesus\u002Fpipet.svg\" alt=\"Go Reference\">\u003C\u002Fa>\n  \u003Cbr\u002F>\na swiss-army tool for scraping and extracting data from online assets, made for hackers\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n\u003Cimg 
src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fe23a40de-c391-46a5-a30c-b825cc02ee8a\" height=\"200\">\n\u003C\u002Fp>\n\nPipet is a command-line web scraper. It supports 3 modes of operation - HTML parsing, JSON parsing, and client-side JavaScript evaluation. It relies heavily on existing tools like curl, and it uses Unix pipes to extend its built-in capabilities.\n\nYou can use Pipet to track a shipment, get notified when concert tickets become available or a stock price changes, or follow any other kind of information that appears online.\n\n# Try it out!\n1. Create a `hackernews.pipet` file containing this:\n```\ncurl https:\u002F\u002Fnews.ycombinator.com\u002F\n.title .titleline\n  span > a\n  .sitebit a\n```\n2. Run `go run github.com\u002Fbjesus\u002Fpipet\u002Fcmd\u002Fpipet@latest hackernews.pipet` or install Pipet and run `pipet hackernews.pipet`\n3. See all of the latest Hacker News stories in your terminal!\n\n\u003Cdetails>\u003Csummary>Use custom separators\u003C\u002Fsummary>\n  \nUse the `--separator` (or `-s`) flag to specify custom separators for text output.  For example, run `pipet -s \"\\n\" -s \"->\" hackernews.pipet` to see each item on a new line, with `->` between the title and the domain.\u003C\u002Fdetails>\n\u003Cdetails>\u003Csummary>Get as JSON\u003C\u002Fsummary>\n  \nUse the `--json` flag to make Pipet collect the results into JSON.  For example, run `pipet --json hackernews.pipet` to get a JSON representation of the above results.\u003C\u002Fdetails>\n\u003Cdetails>\u003Csummary>Render to a template\u003C\u002Fsummary>\n\nAdd a template file called `hackernews.tpl` next to your `hackernews.pipet` file with this content:\n```\n\u003Cul>\n  {{range $index, $item := index (index . 
0) 0}}\n    \u003Cli>{{index $item 0}} ({{index $item 1}})\u003C\u002Fli>\n  {{end}}\n\u003C\u002Ful>\n```\n\nNow run `pipet hackernews.pipet` again and Pipet will automatically detect your template file, and render the results to it.\n\u003C\u002Fdetails>\n\u003Cdetails>\u003Csummary>Use pipes\u003C\u002Fsummary>\n\nUse Unix pipes after your queries, as if they were running in your shell. For example, count the characters in each title (with `wc`) and extract the full article URL (with [htmlq](https:\u002F\u002Fgithub.com\u002Fmgdm\u002Fhtmlq)):\n\n```\ncurl https:\u002F\u002Fnews.ycombinator.com\u002F\n.title .titleline\n  span > a\n  span > a | wc -c\n  .sitebit a\n  .sitebit a | htmlq --attribute href a\n```\n\u003C\u002Fdetails>\n\u003Cdetails>\u003Csummary>Monitor for changes\u003C\u002Fsummary>\n  \nSet an interval and a command to run on change, and have Pipet notify you when something happened. For example, get a notification whenever the Hacker News #1 story is different:\n\n```\ncurl https:\u002F\u002Fnews.ycombinator.com\u002F\n.title .titleline a\n```\n\nRun it with `pipet --interval 60 --on-change \"notify-send {}\" hackernews.pipet`\n\n\u003C\u002Fdetails>\n\n# Installation\n\n## Pre-built\nDownload the latest release from the [Releases](https:\u002F\u002Fgithub.com\u002Fbjesus\u002Fpipet\u002Freleases\u002F) page. `chmod +x pipet` and run `.\u002Fpipet`.\n\n## Compile\nThis installation method requires Go to be installed on your system.\nYou can use Go to install Pipet using `go install github.com\u002Fbjesus\u002Fpipet\u002Fcmd\u002Fpipet@latest`.  
Otherwise you can run it without installing using `go run`.\n\n## Distros\nPackages are currently available for [Arch Linux](https:\u002F\u002Faur.archlinux.org\u002Fpackages\u002Fpipet-git), [Homebrew](https:\u002F\u002Fformulae.brew.sh\u002Fformula\u002Fpipet) and [Nix](https:\u002F\u002Fsearch.nixos.org\u002Fpackages?channel=unstable&show=pipet&from=0&size=50&sort=relevance&type=packages&query=pipet).\n\n# Usage\n\nThe only required argument for Pipet is the path to your `.pipet` file. Other than this, the `pipet` command accepts the following flags:\n\n- `--json`, `-j` - Output as JSON (default: false)\n- `--template value`, `-t value` - Specify a path to a template file. You can also give the template the same name as your `.pipet` file, but with a `.tpl` extension, and it will be auto-detected.\n- `--separator value`, `-s value` - Set a separator for text output (can be used multiple times to set different separators for different levels of data nesting)\n- `--max-pages value`, `-p value` - Maximum number of pages to scrape (default: 3)\n- `--interval value`, `-i value` - Rerun Pipet after X seconds. Use 0 to disable (default: 0)\n- `--on-change value`, `-c value` - A command to run when the Pipet result is new\n- `--verbose`, `-v` - Enable verbose logging (default: false)\n- `--version` - Print the Pipet version\n- `--help`, `-h` - Show help\n\n# Pipet files\nPipet files describe where and how to get the data you are interested in. They are normal text files containing one or more blocks separated by an empty line. Lines beginning with `\u002F\u002F` are ignored and can be used for comments. Every block can have 3 sections:\n\n1. **Resource** - The first line containing the URL and the tool we are using for scraping\n2. **Queries** - The following lines describing the selectors reaching the data we would like to scrape\n3. 
**Next page** - An _optional_ last line starting with `>` describing the selector pointing to the \"next page\" of data\n\nBelow is an example Pipet file.\n\n```\n\u002F\u002F Read Wikipedia's \"On This Day\" and the subject of today's featured article\ncurl https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FMain_Page\ndiv#mp-otd li\n  body\ndiv#mp-tfa > p > b > a\n\n\u002F\u002F Get the weather in Alert, Canada\ncurl https:\u002F\u002Fwttr.in\u002FAlert%20Canada?format=j1\ncurrent_condition.0.FeelsLikeC\ncurrent_condition.0.FeelsLikeF\n\n\u002F\u002F Check how popular the Pipet repo is\nplaywright https:\u002F\u002Fgithub.com\u002Fbjesus\u002Fpipet\nArray.from(document.querySelectorAll('.about-margin .Link')).map(e => e.innerText.trim()).filter(t=> \u002F^\\d\u002F.test(t) )\n```\n\n##  Resource\n\nResource lines can start with either `curl` or `playwright`.\n\n### curl\n\nResource lines starting with `curl` will be executed using curl. This is meant so that you can use your browser to find the request containing the information you are interested in, right click it, choose \"Copy as cURL\", and paste in your Pipet file. This ensures that your headers and cookies are all the same, making it very easy to get data that is behind a login page or hidden from bots. 
For example, this is a perfectly valid first line for a block: `curl 'https:\u002F\u002Fnews.ycombinator.com\u002F' --compressed -H 'User-Agent: Mozilla\u002F5.0 (X11; Linux x86_64; rv:131.0) Gecko\u002F20100101 Firefox\u002F131.0' -H 'Accept: text\u002Fhtml,application\u002Fxhtml+xml,application\u002Fxml;q=0.9,image\u002Favif,image\u002Fwebp,image\u002Fpng,image\u002Fsvg+xml,*\u002F*;q=0.8' -H 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br, zstd' -H 'DNT: 1' -H 'Sec-GPC: 1' -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1' -H 'Sec-Fetch-Dest: document' -H 'Sec-Fetch-Mode: navigate' -H 'Sec-Fetch-Site: none' -H 'Sec-Fetch-User: ?1' -H 'Priority: u=0, i' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache' -H 'TE: trailers'`.\n\n### Playwright\n\nResource lines starting with `playwright` will use a headless browser to navigate to the specified URL. If you don't have a headless browser installed, Pipet will attempt to download one for you.\n\n## Queries\n\nQuery lines define 3 things:\n1. The way to the exact pieces of data you would like to extract (e.g. using CSS selectors)\n2. The data structure your output will use (e.g. every title and URL should be grouped together by item)\n3. The way the data will be processed (e.g. using Unix pipes) before it is printed\n\nPipet uses 3 different query types - for HTML, for JSON, and for when loading pages with Playwright.\n\n### HTML Queries\nHTML Queries use CSS Selectors to select specific elements. Whitespace nesting is used for iterations - parent lines will run as iterators, making their children lines run for each occurrence of the parent selector. This means that you can use nesting to determine the structure of your final output. 
See the following 3 examples:\n\n\u003Cdetails>\u003Csummary>Get only the first title and first URL\u003C\u002Fsummary>\n  \n```\ncurl https:\u002F\u002Fnews.ycombinator.com\u002F\n.title .titleline > a\n.sitebit a\n```\n\n\u003C\u002Fdetails>\u003Cdetails>\u003Csummary>Get all the titles, and then get all URLs\u003C\u002Fsummary>\n  \n```\ncurl https:\u002F\u002Fnews.ycombinator.com\u002F\n.title .titleline\n  span > a\n.title .titleline\n  .sitebit a\n```\n\n\u003C\u002Fdetails>\u003Cdetails>\u003Csummary>Get the title and URL for each story\u003C\u002Fsummary>\n  \n```\ncurl https:\u002F\u002Fnews.ycombinator.com\u002F\n.title .titleline\n  span > a\n  .sitebit a\n```\n\u003C\u002Fdetails>\n\nWhen writing your child selectors, note that the whole document isn't available anymore. Pipet passes only the parent element's HTML to the child iterations.\n\nBy default, Pipet will return the `innerText` of your elements. If you need another piece of data, use Unix pipes. When piping HTML elements, Pipet will pipe the element's complete HTML. For example, you can use `| htmlq --attribute href a` to extract the `href` attribute from links.\n\n### JSON Queries\n\nJSON Queries use the [GJSON syntax](https:\u002F\u002Fgithub.com\u002Ftidwall\u002Fgjson\u002Fblob\u002Fmaster\u002FSYNTAX.md) to select specific elements. Here too, whitespace nesting is used for iterations - parent lines will run as iterators, making their children lines run for each occurrence of the parent selector. If you don't like GJSON, that's okay. For example, you can use `jq` by passing part of the JSON, or all of it, through Unix pipes, like `@this | jq '.[].firstName'`.\n\nWhen using pipes, Pipet will attempt to parse the returned string. 
If it's valid JSON, it will be parsed and injected as an object into the Pipet result.\n\n\u003Cdetails>\u003Csummary>Querying and jq usage\u003C\u002Fsummary>\n  \nThe example below will return the latest water temperature in Amsterdam NDSM, and then pipe the complete JSON to `jq` to combine the coordinates of the reading into one field.\n\n```\ncurl https:\u002F\u002Fwaterinfo.rws.nl\u002Fapi\u002Fdetail\u002Fget?locationSlug=NDSM-werf-(o)(NDS1)&mapType=watertemperatuur\nlatest.data\n@this | jq -r '\"\\(.coordinatex), \\(.coordinatey)\"'\n```\n\n\u003C\u002Fdetails>\u003Cdetails>\u003Csummary>Iterations\u003C\u002Fsummary>\n  \nThis will return times for bus departures. Note the two types of iterations - the first query line is a GJSON query that returns the `ExpectedDepartureTime` for each trip, while the following line iterates over each trip object using the nested lines below it, allowing us to return multiple keys - `ExpectedDepartureTime` & `TripStopStatus`.\n\n```\ncurl http:\u002F\u002Fv0.ovapi.nl\u002Ftpc\u002F30005093\n30005093.Passes.@values.#.ExpectedDepartureTime\n30005093.Passes.@values\n  ExpectedDepartureTime\n  TripStopStatus\n```\n\n\u003C\u002Fdetails>\u003Cdetails>\u003Csummary>CSV export\u003C\u002Fsummary>\n  \nWe can create a simple CSV file by using the previous iteration and configuring a separator\n\n```\ncurl http:\u002F\u002Fv0.ovapi.nl\u002Ftpc\u002F30005093\n30005093.Passes.@values\n  ExpectedDepartureTime\n  TripStopStatus\n```\n\nRun using `pipet -s '\\n' water.pipet > output.csv` to generate a CSV file.\n\n\u003C\u002Fdetails>\n\n### Playwright Queries\n\nPlaywright Queries are different and do not use whitespace nesting. Instead, queries here are simply JavaScript code that will be evaluated after the webpage has loaded. If the JavaScript code returns something that can be serialized as JSON, it will be included in Pipet's output. 
Otherwise, you can write JavaScript that will click, scroll or perform any other action you might want.\n\n\u003Cdetails>\u003Csummary>Simple Playwright example\u003C\u002Fsummary>\n\nThis example will return a string like `80 stars, 2 watching, 2 forks` after visiting the Pipet repo on GitHub.\n\n```\nplaywright https:\u002F\u002Fgithub.com\u002Fbjesus\u002Fpipet\nArray.from(document.querySelectorAll('.about-margin .Link')).map(e => e.innerText.trim()).filter(t=> \u002F^\\d\u002F.test(t) )\n```\n\nNote that if you copy the second line and paste it in your browser console while visiting https:\u002F\u002Fgithub.com\u002Fbjesus\u002Fpipet, you'd get exactly the same result. The reverse is also true - if your code works in the browser, it should work in Pipet too.\n\n\u003C\u002Fdetails>\n\n## Next page\n\nThe Next Page line lets you specify a CSS selector that will be used to determine the link to the next page of data. Pipet will then follow it and execute the same queries over it. For example, see this `hackernews.pipet` file:\n```\ncurl https:\u002F\u002Fnews.ycombinator.com\u002F\n.title .titleline\n  span > a\n  .sitebit a\n> a.morelink\n```\n\nThe Next Page line is currently only available when working with `curl` and HTML files.\n","Pipet is a versatile command-line web scraping tool for hackers, used to extract data from online resources. It supports three modes of operation (HTML parsing, JSON parsing, and client-side JavaScript evaluation) and extends its capabilities by relying on existing tools such as curl and on Unix pipes. Pipet is particularly suited to scenarios that call for automated tracking of information updates, such as shipment tracking, ticket monitoring, and stock price monitoring. In addition, the tool offers flexible output options, including custom separators, JSON formatting, and template rendering, so users can tailor the output to their needs.",2,"2026-06-11 03:44:33","high_star"]