[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72027":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":46,"readmeContent":47,"aiSummary":48,"trendingCount":16,"starSnapshotCount":16,"syncStatus":49,"lastSyncTime":50,"discoverSource":51},72027,"pydoll","autoscrape-labs\u002Fpydoll","autoscrape-labs","Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions. ","https:\u002F\u002Fpydoll.tech\u002F",null,"Python",6898,385,32,16,0,7,19,72,21,94.46,"MIT License",false,"main",true,[27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45],"anti-detection","automation","browser-automation","cdp","chromium","crawler","e2e-tests","fingerprinting","headless","playwright","puppeteer","recaptcha-v3","scraping","selenium","testing","testing-tools","turnstile-solver","web-scraping","webdriver","2026-06-12 04:01:03","\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F2c380638-b04a-4b04-b1c8-2958e4237a94\" alt=\"Pydoll Logo\" \u002F> \u003Cbr>\n\u003C\u002Fp>\n\u003Cp align=\"center\">Async-native, fully typed, built for evasion and performance.\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fautoscrape-labs\u002Fpydoll\u002Fstargazers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fautoscrape-labs\u002Fpydoll?style=social\">\u003C\u002Fa>\n    \u003Ca 
href=\"https:\u002F\u002Fcodecov.io\u002Fgh\u002Fautoscrape-labs\u002Fpydoll\" >\n        \u003Cimg src=\"https:\u002F\u002Fcodecov.io\u002Fgh\u002Fautoscrape-labs\u002Fpydoll\u002Fgraph\u002Fbadge.svg?token=40I938OGM9\"\u002F>\n    \u003C\u002Fa>\n    \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fautoscrape-labs\u002Fpydoll\u002Factions\u002Fworkflows\u002Ftests.yml\u002Fbadge.svg\" alt=\"Tests\">\n    \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fautoscrape-labs\u002Fpydoll\u002Factions\u002Fworkflows\u002Fruff-ci.yml\u002Fbadge.svg\" alt=\"Ruff CI\">\n    \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fautoscrape-labs\u002Fpydoll\u002Factions\u002Fworkflows\u002Fmypy.yml\u002Fbadge.svg\" alt=\"MyPy CI\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-%3E%3D3.10-blue\" alt=\"Python >= 3.10\">\n    \u003Ca href=\"https:\u002F\u002Fdeepwiki.com\u002Fautoscrape-labs\u002Fpydoll\">\u003Cimg src=\"https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg\" alt=\"Ask DeepWiki\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fpydoll.tech\u002F\">Documentation\u003C\u002Fa> &middot;\n    \u003Ca href=\"#getting-started\">Getting Started\u003C\u002Fa> &middot;\n    \u003Ca href=\"#features\">Features\u003C\u002Fa> &middot;\n    \u003Ca href=\"#support\">Support\u003C\u002Fa>\n\u003C\u002Fp>\n\nPydoll automates Chromium-based browsers (Chrome, Edge) by connecting directly to the Chrome DevTools Protocol over WebSocket. **No WebDriver binary, no `navigator.webdriver` flag, no compatibility issues.**\n\nIt combines a high-level API for stealthy automation with low-level CDP access for fine-grained control over network, fingerprinting, and browser behavior. 
And with its new **Pydantic-powered extraction engine**, it maps the DOM directly to structured Python objects, delivering an unmatched Developer Experience (DX).\n\n### Top Sponsors\n\n\u003Ca href=\"https:\u002F\u002Fsubstack.thewebscraping.club\u002Fp\u002Fpydoll-webdriver-scraping?utm_source=github&utm_medium=repo&utm_campaign=pydoll\">\n    \u003Cimg src=\"public\u002Fimages\u002Fbanner-the-webscraping-club.png\" alt=\"The Web Scraping Club\" \u002F>\n\u003C\u002Fa>\n\n\u003Csub>Read a full review of Pydoll on \u003Cb>\u003Ca href=\"https:\u002F\u002Fsubstack.thewebscraping.club\u002Fp\u002Fpydoll-webdriver-scraping?utm_source=github&utm_medium=repo&utm_campaign=pydoll\">The Web Scraping Club\u003C\u002Fa>\u003C\u002Fb>, the #1 newsletter dedicated to web scraping.\u003C\u002Fsub>\n\n### Sponsors\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd>\u003Ca href=\"https:\u002F\u002Fwww.thordata.com\u002F?ls=github&lk=pydoll\">\u003Cimg src=\"public\u002Fimages\u002FThordata-logo.png\" height=\"30\" alt=\"Thordata\" \u002F>\u003C\u002Fa>\u003C\u002Ftd>\n    \u003Ctd>\u003Ca href=\"https:\u002F\u002Fdashboard.capsolver.com\u002Fpassport\u002Fregister?inviteCode=WPhTbOsbXEpc\">\u003Cimg src=\"public\u002Fimages\u002Fcapsolver-logo.png\" height=\"40\" alt=\"CapSolver\" \u002F>\u003C\u002Fa>\u003C\u002Ftd>\n    \u003Ctd>\u003Ca href=\"https:\u002F\u002Fwww.testmuai.com\u002F?utm_medium=sponsor&utm_source=pydoll\">\u003Cimg src=\"public\u002Fimages\u002Flogo-lamda-test.svg\" height=\"30\" width=\"130\" alt=\"LambdaTest\" \u002F>\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\u003Csub>[Learn more about our sponsors](SPONSORS.md) &middot; [Become a sponsor](https:\u002F\u002Fgithub.com\u002Fsponsors\u002Fthalissonvs)\u003C\u002Fsub>\n\n### Why Pydoll\n\n- **Structured extraction**: Define a [Pydantic](https:\u002F\u002Fdocs.pydantic.dev\u002F) model, call `tab.extract()`, get typed and validated data back. 
No manual element-by-element querying.\n- **Async and typed**: Built on `asyncio` from the ground up, 100% type-checked with `mypy`. Full IDE autocompletion and static error checking.\n- **Stealth built in**: Human-like mouse movement, realistic typing, and granular [browser preference](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fconfiguration\u002Fbrowser-preferences\u002F) control for fingerprint management.\n- **Network control**: [Intercept](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fnetwork\u002Finterception\u002F) requests to block ads\u002Ftrackers, [monitor](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fnetwork\u002Fmonitoring\u002F) traffic for API discovery, and make [authenticated HTTP requests](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fnetwork\u002Fhttp-requests\u002F) that inherit the browser session.\n- **Shadow DOM and iframes**: Full support for [shadow roots](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Fdeep-dive\u002Farchitecture\u002Fshadow-dom\u002F) (including closed) and cross-origin iframes. Discover, query, and interact with elements inside them using the same API.\n\n## Installation\n\n```bash\npip install pydoll-python\n```\n\nNo WebDriver binaries or external dependencies required.\n\n## Getting Started\n\n### 1. 
Stateful Automation & Evasion\n\nWhen you need to navigate, bypass challenges, or interact with dynamic UI, Pydoll's imperative API handles it with humanized timing by default.\n\n```python\nimport asyncio\nfrom pydoll.browser import Chrome\nfrom pydoll.constants import Key\n\nasync def google_search(query: str):\n    async with Chrome() as browser:\n        tab = await browser.start()\n        await tab.go_to('https:\u002F\u002Fwww.google.com')\n\n        # Find elements and interact with human-like timing\n        search_box = await tab.find(tag_name='textarea', name='q')\n        await search_box.insert_text(query)\n        await tab.keyboard.press(Key.ENTER)\n\n        first_result = await tab.find(\n            tag_name='h3',\n            text='autoscrape-labs\u002Fpydoll',\n            timeout=10,\n        )\n        await first_result.click()\n        print(f\"Page loaded: {await tab.title}\")\n\nasyncio.run(google_search('pydoll site:github.com'))\n```\n\n### 2. Structured Data Extraction\n\nOnce you reach the target page, switch to the declarative engine. 
Define what you want with a model, and Pydoll extracts it — typed, validated, and ready to use.\n\n```python\nimport asyncio\nfrom pydoll.browser.chromium import Chrome\nfrom pydoll.extractor import ExtractionModel, Field\n\nclass Quote(ExtractionModel):\n    text: str = Field(selector='.text', description='The quote text')\n    author: str = Field(selector='.author', description='Who said it')\n    tags: list[str] = Field(selector='.tag', description='Tags')\n    year: int | None = Field(selector='.year', description='Year', default=None)\n\nasync def extract_quotes():\n    async with Chrome() as browser:\n        tab = await browser.start()\n        await tab.go_to('https:\u002F\u002Fquotes.toscrape.com')\n\n        quotes = await tab.extract_all(Quote, scope='.quote', timeout=5)\n\n        for q in quotes:\n            print(f'{q.author}: {q.text}')  # fully typed, IDE autocomplete works\n            print(q.tags)                    # list[str], not a raw element\n            print(q.model_dump_json())       # pydantic serialization built-in\n\nasyncio.run(extract_quotes())\n```\n\nModels support CSS\u002FXPath auto-detection, HTML attribute targeting, custom transforms, and nested models.\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Nested models, transforms, and attribute extraction\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr>\n\n```python\nfrom datetime import datetime\nfrom pydoll.extractor import ExtractionModel, Field\n\ndef parse_date(raw: str) -> datetime:\n    return datetime.strptime(raw.strip(), '%B %d, %Y')\n\nclass Author(ExtractionModel):\n    name: str = Field(selector='.author-title')\n    born: datetime = Field(\n        selector='.author-born-date',\n        transform=parse_date,\n    )\n\nclass Article(ExtractionModel):\n    title: str = Field(selector='h1')\n    url: str = Field(selector='.source-link', attribute='href')\n    author: Author = Field(selector='.author-card', description='Nested model')\n\narticle = await tab.extract(Article, 
timeout=5)\narticle.author.born.year  # int — types are preserved all the way down\n```\n\u003C\u002Fdetails>\n\n## Features\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Humanized Mouse Movement\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr>\n\nMouse operations produce human-like cursor movement by default:\n\n- **Bezier curve paths** with asymmetric control points\n- **Fitts's Law timing**: duration scales with distance\n- **Minimum-jerk velocity**: bell-shaped speed profile\n- **Physiological tremor**: Gaussian noise scaled with velocity\n- **Overshoot correction**: ~70% chance on fast movements, then corrects back\n\n```python\nawait tab.mouse.move(500, 300)\nawait tab.mouse.click(500, 300)\nawait tab.mouse.drag(100, 200, 500, 400)\n\nbutton = await tab.find(id='submit')\nawait button.click()\n\n# Opt out when speed matters\nawait tab.mouse.click(500, 300, humanize=False)\n```\n\n[Mouse Control Docs](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fautomation\u002Fmouse-control\u002F)\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Shadow DOM Support\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr>\n\nFull Shadow DOM support, including closed shadow roots. 
Because Pydoll operates at the CDP level (below JavaScript), the `closed` mode restriction doesn't apply.\n\n```python\nshadow = await element.get_shadow_root()\nbutton = await shadow.query('.internal-btn')\nawait button.click()\n\n# Discover all shadow roots on the page\nshadow_roots = await tab.find_shadow_roots()\nfor sr in shadow_roots:\n    checkbox = await sr.query('input[type=\"checkbox\"]', raise_exc=False)\n    if checkbox:\n        await checkbox.click()\n```\n\nHighlights:\n- Closed shadow roots work without workarounds\n- `find_shadow_roots()` discovers every shadow root on the page\n- `timeout` parameter for polling until shadow roots appear\n- `deep=True` traverses cross-origin iframes (OOPIFs)\n- Standard `find()`, `query()`, `click()` API inside shadow roots\n\n[Shadow DOM Docs](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Fdeep-dive\u002Farchitecture\u002Fshadow-dom\u002F)\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>HAR Network Recording\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr>\n\nRecord network activity during a browser session and export as HAR 1.2. Replay recorded requests to reproduce exact API sequences.\n\n```python\nfrom pydoll.browser.chromium import Chrome\n\nasync with Chrome() as browser:\n    tab = await browser.start()\n\n    async with tab.request.record() as capture:\n        await tab.go_to('https:\u002F\u002Fexample.com')\n\n    capture.save('flow.har')\n    print(f'Captured {len(capture.entries)} requests')\n\n    responses = await tab.request.replay('flow.har')\n```\n\n[HAR Recording Docs](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fnetwork\u002Fnetwork-recording\u002F)\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Page Bundles\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr>\n\nSave the current page and all its assets (CSS, JS, images, fonts) as a `.zip` bundle for offline viewing. 
Optionally inline everything into a single HTML file.\n\n```python\nawait tab.save_bundle('page.zip')\nawait tab.save_bundle('page-inline.zip', inline_assets=True)\n```\n\n[Screenshots, PDFs & Bundles Docs](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fautomation\u002Fscreenshots-and-pdfs\u002F)\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Hybrid Automation (UI + API)\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr>\n\nUse UI automation to pass login flows (CAPTCHAs, JS challenges), then switch to `tab.request` for fast API calls that inherit the full browser session: cookies, headers, and all.\n\n```python\n# Log in via UI\nawait tab.go_to('https:\u002F\u002Fmy-site.com\u002Flogin')\nawait (await tab.find(id='username')).type_text('user')\nawait (await tab.find(id='password')).type_text('pass123')\nawait (await tab.find(id='login-btn')).click()\n\n# Make authenticated API calls using the browser session\nresponse = await tab.request.get('https:\u002F\u002Fmy-site.com\u002Fapi\u002Fuser\u002Fprofile')\nuser_data = response.json()\n```\n[Hybrid Automation Docs](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fnetwork\u002Fhttp-requests\u002F)\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Network Interception and Monitoring\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr>\n\nMonitor traffic for API discovery or intercept requests to block ads, trackers, and unnecessary resources.\n\n```python\nimport asyncio\nfrom pydoll.browser.chromium import Chrome\nfrom pydoll.protocol.fetch.events import FetchEvent, RequestPausedEvent\nfrom pydoll.protocol.network.types import ErrorReason\n\nasync def block_images():\n    async with Chrome() as browser:\n        tab = await browser.start()\n\n        async def block_resource(event: RequestPausedEvent):\n            request_id = event['params']['requestId']\n            resource_type = event['params']['resourceType']\n\n            if resource_type in ['Image', 'Stylesheet']:\n    
            await tab.fail_request(request_id, ErrorReason.BLOCKED_BY_CLIENT)\n            else:\n                await tab.continue_request(request_id)\n\n        await tab.enable_fetch_events()\n        await tab.on(FetchEvent.REQUEST_PAUSED, block_resource)\n\n        await tab.go_to('https:\u002F\u002Fexample.com')\n        await asyncio.sleep(3)\n        await tab.disable_fetch_events()\n\nasyncio.run(block_images())\n```\n[Network Monitoring](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fnetwork\u002Fmonitoring\u002F) | [Request Interception](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fnetwork\u002Finterception\u002F)\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Browser Fingerprint Control\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr>\n\nGranular control over [browser preferences](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fconfiguration\u002Fbrowser-preferences\u002F): hundreds of internal Chrome settings for building consistent fingerprints.\n\n```python\noptions = ChromiumOptions()\n\noptions.browser_preferences = {\n    'profile': {\n        'default_content_setting_values': {\n            'notifications': 2,\n            'geolocation': 2,\n        },\n        'password_manager_enabled': False\n    },\n    'intl': {\n        'accept_languages': 'en-US,en',\n    },\n    'browser': {\n        'check_default_browser': False,\n    }\n}\n```\n[Browser Preferences Guide](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fconfiguration\u002Fbrowser-preferences\u002F)\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Concurrency, Contexts and Remote Connections\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr>\n\nManage [multiple tabs](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fbrowser-management\u002Ftabs\u002F) and [browser contexts](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fbrowser-management\u002Fcontexts\u002F) (isolated sessions) concurrently. 
Connect to browsers running in Docker or remote servers.\n\n```python\nasync def scrape_page(url, tab):\n    await tab.go_to(url)\n    return await tab.title\n\nasync def concurrent_scraping():\n    async with Chrome() as browser:\n        tab_google = await browser.start()\n        tab_ddg = await browser.new_tab()\n\n        results = await asyncio.gather(\n            scrape_page('https:\u002F\u002Fgoogle.com\u002F', tab_google),\n            scrape_page('https:\u002F\u002Fduckduckgo.com\u002F', tab_ddg)\n        )\n        print(results)\n```\n[Multi-Tab Management](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fbrowser-management\u002Ftabs\u002F) | [Remote Connections](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fadvanced\u002Fremote-connections\u002F)\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Retry Decorator\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr>\n\nThe `@retry` decorator supports custom recovery logic between attempts (e.g., refreshing the page, rotating proxies) and exponential backoff.\n\n```python\nfrom pydoll.decorators import retry\nfrom pydoll.exceptions import ElementNotFound, NetworkError\n\n@retry(\n    max_retries=3,\n    exceptions=[ElementNotFound, NetworkError],\n    on_retry=my_recovery_function,\n    exponential_backoff=True\n)\nasync def scrape_product(self, url: str):\n    # scraping logic\n    ...\n```\n[Retry Decorator Docs](https:\u002F\u002Fpydoll.tech\u002Fdocs\u002Ffeatures\u002Fadvanced\u002Fdecorators\u002F)\n\u003C\u002Fdetails>\n\n---\n\n## Contributing\n\nContributions are welcome. 
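See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.\n\n## Support\n\nIf you find Pydoll useful, consider [sponsoring the project on GitHub](https:\u002F\u002Fgithub.com\u002Fsponsors\u002Fthalissonvs).\n\n## License\n\n[MIT License](LICENSE)\n","Pydoll is a library for automating Chromium-based browsers that delivers realistic interactions without a WebDriver. Its core approach is to connect directly to the Chrome DevTools Protocol over WebSocket, which avoids the `navigator.webdriver` flag and compatibility issues. It also offers a high-level API for stealthy automation and low-level CDP access for fine-grained control over network, fingerprinting, and browser behavior. Pydoll is particularly suited to data-scraping tasks that need to get past anti-bot mechanisms, such as e-commerce price comparison and market research. The project is written in Python, supports version 3.10 and above, and has active community support and detailed documentation.",2,"2026-06-11 03:40:01","high_star"]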