diff --git a/README.md b/README.md
index 54d32498..56f78840 100644
--- a/README.md
+++ b/README.md
@@ -92,9 +92,11 @@ To create, run, and deploy your first Actor step by step, see the [Quick start g
## What are Actors?
-Actors are serverless cloud programs that can do almost anything a human can do in a web browser. They range from small tasks, such as filling in forms or unsubscribing from online services, all the way up to scraping and processing vast numbers of web pages.
+Actors are serverless programs that can do almost anything, from simple scripts and web scrapers to complex automation workflows, AI agents, or even always-on services that expose HTTP endpoints.
-They run either locally or on the [Apify platform](https://docs.apify.com/platform/), where you can run them at scale, monitor them, schedule them, or publish and monetize them. If you're new to Apify, learn [what Apify is](https://docs.apify.com/platform/about) in the platform documentation.
+They can run either locally or on the Apify platform, where you can scale their execution, monitor runs, schedule tasks, integrate them with other services, or even publish and monetize them. If you're new to Apify, learn more about the platform in the [Apify documentation](https://docs.apify.com/platform/about).
+
+For more context, read the [Actor whitepaper](https://whitepaper.actor/).
## Features
@@ -197,7 +199,7 @@ The full SDK documentation lives at **[docs.apify.com/sdk/python](https://docs.a
| [Overview](https://docs.apify.com/sdk/python/docs/overview) | What the SDK is, what Actors are, and how the pieces fit together. |
| [Quick start](https://docs.apify.com/sdk/python/docs/quick-start) | Create, run, and deploy your first Python Actor. |
| [Concepts](https://docs.apify.com/sdk/python/docs/concepts/actor-lifecycle) | Actor lifecycle, input, storages, events, proxy management, interacting with other Actors, webhooks, accessing the Apify API, logging, configuration, and pay-per-event. |
-| [Guides](https://docs.apify.com/sdk/python/docs/guides/beautifulsoup-httpx) | Integrations with BeautifulSoup, Parsel, Playwright, Selenium, Crawlee, Scrapy, Crawl4AI, and Browser Use, plus running a web server and using uv. |
+| [Guides](https://docs.apify.com/sdk/python/docs/guides/beautifulsoup-httpx) | Integrations with BeautifulSoup, Parsel, Playwright, Selenium, Crawlee, Scrapy, Scrapling, Crawl4AI, and Browser Use, plus running a web server and using uv. |
| [Upgrading](https://docs.apify.com/sdk/python/docs/upgrading/upgrading-to-v4) | Migrating between major versions. |
| [API reference](https://docs.apify.com/sdk/python/reference) | Generated reference for every class and method. |
| [Changelog](https://docs.apify.com/sdk/python/docs/changelog) | Release history and breaking changes. |
diff --git a/docs/01_introduction/index.mdx b/docs/01_introduction/index.mdx
index be37d6ac..567e4db1 100644
--- a/docs/01_introduction/index.mdx
+++ b/docs/01_introduction/index.mdx
@@ -9,26 +9,42 @@ import CodeBlock from '@theme/CodeBlock';
import IntroductionExample from '!!raw-loader!./code/01_introduction.py';
-The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides everything you need to build an Actor and run it both locally and on the [Apify platform](https://docs.apify.com/platform). With the SDK, you can:
-
-- Manage the Actor lifecycle: initialization, graceful shutdown, status messages, rebooting, and metamorphing.
-- Work with datasets, key-value stores, and request queues, with automatic local emulation when running outside the platform.
-- Read the Actor input, including automatic decryption of secret fields.
-- React to platform events (system info, migration, abort) and persist state across migrations and restarts.
-- Manage proxies, both [Apify Proxy](https://docs.apify.com/platform/proxy) and your own, with session and tiered-proxy support.
-- Start, call, and abort Actors and tasks, create webhooks, and reach the full Apify API client.
-- Charge users with the pay-per-event pricing model.
-- Integrate with [Crawlee](../guides/crawlee) and [Scrapy](../guides/scrapy), with guides for [Playwright](../guides/playwright) and others.
+The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides everything you need to build an Actor and run it both locally and on the [Apify platform](https://docs.apify.com/platform). It handles the Actor lifecycle, [storage](https://docs.apify.com/platform/storage) access, platform events, [Apify Proxy](https://docs.apify.com/platform/proxy), pay-per-event charging, and more.
{IntroductionExample}
-## What are Actors
+## What are Actors?
+
+Actors are serverless programs that can do almost anything, from simple scripts and web scrapers to complex automation workflows, AI agents, or even always-on services that expose HTTP endpoints.
+
+They can run either locally or on the Apify platform, where you can scale their execution, monitor runs, schedule tasks, integrate them with other services, or even publish and monetize them. If you're new to Apify, learn more about the platform in the [Apify documentation](https://docs.apify.com/platform/about).
+
+For more context, read the [Actor whitepaper](https://whitepaper.actor/).
+
+## Features
+
+- Run the full Actor lifecycle inside `async with Actor:`, covering initialization, exit, failure handling, status messages, rebooting, and metamorphing ([Actor lifecycle](../concepts/actor-lifecycle)).
+- Read Actor input validated against your input schema with `Actor.get_input()`, including automatic decryption of secret fields ([Actor input](../concepts/actor-input)).
+- Read and write datasets, key-value stores, and request queues, locally or on the platform ([Working with storages](../concepts/storages)).
+- React to platform events such as system info, migration, and abort, and persist state across migrations and restarts ([Actor events](../concepts/actor-events)).
+- Route requests through Apify Proxy or your own proxies, with proxy group selection, country targeting, IP rotation, sessions, and tiered proxies ([Proxy management](../concepts/proxy-management)).
+- Start, call, and abort other Actors and tasks, and attach webhooks to run events ([Interacting with other Actors](../concepts/interacting-with-other-actors), [Webhooks](../concepts/webhooks)).
+- Monetize your Actor with pay-per-event charging ([Pay-per-event](../concepts/pay-per-event)).
+- Reach the full [Apify API](https://docs.apify.com/api/v2) through a preconfigured `ApifyClient` ([Accessing the Apify API](../concepts/access-apify-api)).
+
+## What you can build
+
+Almost any Python project can become an Actor, including projects for:
-Actors are serverless cloud programs capable of performing tasks in a web browser, similar to what a human can do. These tasks can range from simple operations, such as filling out forms or unsubscribing from services, to complex jobs like scraping and processing large numbers of web pages.
+- **Web scraping and crawling** - The SDK is fully compatible with [Crawlee](https://crawlee.dev/python), which makes Apify a natural place to deploy and scale your crawlers (see the [Crawlee guide](../guides/crawlee)). It also works with other popular scraping libraries, such as [Scrapy](../guides/scrapy), [Scrapling](../guides/scrapling), or [Crawl4AI](../guides/crawl4ai).
+- **Browser automation** - Drive a real browser with [Playwright](../guides/playwright) or [Selenium](../guides/selenium), or with higher-level tools such as [Browser Use](../guides/browser-use).
+- **Web servers and APIs** - Run a [web server](../guides/running-webserver) inside an Actor to serve HTTP requests, for example to expose your scraper as a live API.
+- **AI agents** - Host agents built with your framework of choice. Ready-made Actor templates cover [PydanticAI](https://apify.com/templates/python-pydanticai), [CrewAI](https://apify.com/templates/python-crewai), [LangGraph](https://apify.com/templates/python-langgraph), [LlamaIndex](https://apify.com/templates/python-llamaindex-agent), and [Smolagents](https://apify.com/templates/python-smolagents).
+- **MCP servers** - Deploy a Python MCP server as an Actor and make its tools available to any MCP client. See the [MCP server](https://apify.com/templates/python-mcp-empty) and [MCP proxy](https://apify.com/templates/python-mcp-proxy) templates.
-Actors can be executed locally or on the [Apify platform](https://docs.apify.com/platform). The Apify platform lets you run Actors at scale and provides features for monitoring, scheduling, publishing, and monetizing them.
+Whatever you build, the Apify SDK doesn't lock you into a particular framework. Bring the libraries you already use, and let Apify run your project in the cloud.
## Quick start
diff --git a/docs/03_guides/07_scrapling.mdx b/docs/03_guides/07_scrapling.mdx
new file mode 100644
index 00000000..8d384df2
--- /dev/null
+++ b/docs/03_guides/07_scrapling.mdx
@@ -0,0 +1,141 @@
+---
+id: scrapling
+title: Adaptive scraping with Scrapling
+description: Build an Apify Actor that scrapes web pages using the Scrapling adaptive web scraping library.
+---
+
+import CodeBlock from '@theme/CodeBlock';
+import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
+
+import ScraplingExample from '!!raw-loader!roa-loader!./code/07_scrapling.py';
+import ScraplingBrowserScraper from '!!raw-loader!./code/07_scrapling_browser.py';
+
+In this guide, you'll learn how to use the [Scrapling](https://scrapling.readthedocs.io/) library for adaptive web scraping in your Apify Actors.
+
+## Introduction
+
+[Scrapling](https://scrapling.readthedocs.io/) is an adaptive web scraping library for Python that combines fetching and parsing behind a single, high-level API. It can fetch a page with fast HTTP requests or with a real browser, parse the result with familiar CSS selectors and XPath, and relocate your selectors automatically when a website's structure changes.
+
+Scrapling is a great fit for Apify Actors:
+
+- A single API exposes a fast HTTP client with browser TLS-fingerprint impersonation, as well as full browser automation for JavaScript-heavy or protected pages.
+- Scrapling can remember the elements you scraped and find them again after a website redesign, so your scrapers keep working with fewer manual fixes.
+- Built-in stealth features (browser impersonation, realistic headers, and automatic Cloudflare Turnstile solving with the browser fetchers) help you avoid being blocked.
+- Elements are selected with CSS selectors (including the `::text` and `::attr()` pseudo-elements) or XPath, with a Scrapy/Parsel-like `.get()` and `.getall()` interface.
+- Every fetcher has an asynchronous variant, which integrates naturally with the asyncio-based Apify SDK.
+
+Scrapling's parser works on its own. The fetchers are an optional extra. To get the HTTP and browser fetchers, install Scrapling with the `fetchers` extra:
+
+```bash
+pip install "scrapling[fetchers]"
+```
+
+## Choosing a fetcher
+
+All of Scrapling's fetchers are importable from `scrapling.fetchers`. Pick the one that matches the website you're scraping:
+
+- **`Fetcher` / `AsyncFetcher`** - Plain HTTP requests via `.get()`, `.post()`, `.put()`, and `.delete()`. Fast and lightweight, with optional browser TLS-fingerprint impersonation (`impersonate`) and realistic headers (`stealthy_headers`). This is the best choice for static pages and APIs, and it doesn't need browser binaries.
+- **`DynamicFetcher` / `DynamicSession`** - Full browser automation based on [Playwright](https://playwright.dev/), for pages that require JavaScript rendering or interaction. Fetch a single page with `DynamicFetcher.fetch()` or its async variant `.async_fetch()`, or keep one browser open across many pages with a session (`AsyncDynamicSession` in async code).
+- **`StealthyFetcher` / `StealthySession`** - A stealth-hardened browser fetcher that can automatically solve Cloudflare Turnstile challenges (`solve_cloudflare=True`). Use it for the most heavily protected websites.
+
+The returned `Response` object is also a Scrapling selector, so you can call `.css()`, `.xpath()`, `.find_all()`, and the other parsing methods on it directly.
+
+The HTTP fetchers work with just the `scrapling[fetchers]` extra. The browser-based fetchers (`DynamicFetcher` and `StealthyFetcher`) additionally need browser binaries, which you download with the `scrapling install` command. See [Running browser-based fetchers](#running-browser-based-fetchers).
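
The rules of thumb above can be condensed into a small decision helper. This is only a sketch: `choose_fetcher` and `FetcherChoice` are hypothetical names, and the two boolean flags are a simplification of how you'd actually assess a target website.

```python
from typing import NamedTuple


class FetcherChoice(NamedTuple):
    """Which Scrapling fetcher to use, and whether it needs `scrapling install`."""

    fetcher: str  # Class name to import from `scrapling.fetchers`.
    needs_browser_binaries: bool


def choose_fetcher(
    *, needs_javascript: bool = False, heavily_protected: bool = False
) -> FetcherChoice:
    """Hypothetical rule of thumb mirroring the fetcher list above."""
    if heavily_protected:
        # Stealth-hardened browser, e.g. for Cloudflare Turnstile challenges.
        return FetcherChoice('StealthyFetcher', needs_browser_binaries=True)
    if needs_javascript:
        # Playwright-based browser for JavaScript rendering or interaction.
        return FetcherChoice('DynamicFetcher', needs_browser_binaries=True)
    # Static pages and APIs: plain HTTP, no browser binaries needed.
    return FetcherChoice('AsyncFetcher', needs_browser_binaries=False)
```

Starting from the cheapest option and escalating only when a site demands it keeps Actors fast and their Docker images small.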
+
+The example Actor in this guide uses the HTTP `AsyncFetcher`, which is the simplest to deploy and pairs well with Apify Proxy.
+
+## Example Actor
+
+The following Actor recursively scrapes data from linked pages on the same site, up to a user-defined maximum depth, starting from the URLs in the Actor input. It uses Scrapling's `AsyncFetcher` to fetch each page through [Apify Proxy](https://docs.apify.com/platform/proxy), and CSS selectors to extract the title, headings, and links.
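
The depth limit follows a breadth-first pattern. To see it in isolation, here's a framework-free sketch that swaps the request queue for a `collections.deque`, the website for a hypothetical `site` dictionary mapping each URL to its links, and the queue's deduplication for a `set`. It assumes the queue hands out requests in first-in, first-out order and mirrors the Actor's `max_depth` check and `max_requests` cap; `crawl_order` is a hypothetical name.

```python
from collections import deque


def crawl_order(
    site: dict[str, list[str]],
    start_urls: list[str],
    *,
    max_depth: int = 1,
    max_requests: int = 50,
) -> list[tuple[str, int]]:
    """Return (url, depth) pairs in the order a FIFO queue would handle them."""
    queue: deque[tuple[str, int]] = deque()
    seen: set[str] = set()  # Stand-in for the request queue's deduplication.

    def enqueue(url: str, depth: int) -> None:
        if url not in seen:
            seen.add(url)
            queue.append((url, depth))

    for url in start_urls:
        enqueue(url, 0)  # Start URLs have crawl depth 0.

    handled: list[tuple[str, int]] = []
    while queue and len(handled) < max_requests:
        url, depth = queue.popleft()
        handled.append((url, depth))
        # Like `enqueue_links`, stop following links once max_depth is reached.
        if depth >= max_depth:
            continue
        for link in site.get(url, []):
            enqueue(link, depth + 1)
    return handled
```

For example, with `site = {'a': ['b', 'c'], 'b': ['d']}` and `max_depth=1`, the crawl handles `a` at depth 0 and `b` and `c` at depth 1, and never reaches `d`.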
+
+The whole Actor fits in a single file. A `scrape_page` helper holds the Scrapling-specific fetching and parsing, while the `main` coroutine handles the [Actor](https://docs.apify.com/platform/actors) lifecycle, reads the input, sets up [Apify Proxy](https://docs.apify.com/platform/proxy) and the [request queue](https://docs.apify.com/platform/storage/request-queue), and drives the crawl:
+
+<RunnableCodeBlock className="language-python" language="python">
+    {ScraplingExample}
+</RunnableCodeBlock>
+
+Note that:
+
+- Keeping the fetching and parsing in `scrape_page` separates the Scrapling-specific code from the Actor's orchestration logic. The function returns the extracted data together with the discovered links, so `main` decides what to store and what to enqueue.
+- The response of `AsyncFetcher.get` is a Scrapling selector, so `response.css('title::text').get()` reads the page title and `response.css('a::attr(href)').getall()` returns every link's `href` in one call.
+- `response.urljoin(link_href)` resolves relative links against the page URL, so you can enqueue them directly.
+- The `impersonate='chrome'` and `stealthy_headers=True` options make the request look like it comes from a real Chrome browser. Combined with Apify Proxy, this reduces the chance of getting blocked.
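
The link handling in `scrape_page` can be tried without fetching anything. The sketch below reproduces its filtering with the standard library, using `urllib.parse.urljoin` as a stand-in for Scrapling's `response.urljoin`; `same_host_links` is a hypothetical helper name.

```python
from urllib.parse import urljoin, urlsplit


def same_host_links(page_url: str, hrefs: list[str]) -> list[str]:
    """Resolve hrefs against page_url and keep absolute http(s) links on the same host."""
    host = urlsplit(page_url).netloc
    links: list[str] = []
    for href in hrefs:
        # Stand-in for Scrapling's `response.urljoin(href)`.
        link_url = urljoin(page_url, href)
        # Drops `mailto:`, `javascript:`, `tel:`, and similar non-HTTP links.
        if not link_url.startswith(('http://', 'https://')):
            continue
        # Exact netloc match: `www.example.com` and `example.com` are different hosts.
        if urlsplit(link_url).netloc == host:
            links.append(link_url)
    return links
```

Because the check compares the whole `netloc`, a link to `www.example.com` found on `example.com` is dropped, as is a link that differs only by port.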
+
+## Adaptive selectors
+
+The example above uses plain CSS selectors. Scrapling can also track the elements you scrape and relocate them when a website changes its markup, so a redesign doesn't immediately break your scraper. This is most useful for scrapers that revisit the same pages over time, rather than one-off crawls.
+
+1. Enable adaptive matching once on the fetcher:
+
+ ```python
+ AsyncFetcher.configure(adaptive=True)
+ ```
+
+2. On the first run, pass `auto_save=True` when you select an element. Scrapling records a fingerprint of that element, keyed by the selector:
+
+ ```python
+ title = response.css('h1.product-title::text', auto_save=True).get()
+ ```
+
+3. On a later run, if the selector no longer matches because the page changed, pass `adaptive=True` with the same selector. Scrapling uses the saved fingerprint to find the element in its new location:
+
+ ```python
+ title = response.css('h1.product-title::text', adaptive=True).get()
+ ```
+
+Scrapling keeps these fingerprints in a local SQLite database. On the Apify platform, the Actor's filesystem doesn't persist between runs, and each run gets a fresh default key-value store. To keep the fingerprints across runs, save the database file to a named [key-value store](https://docs.apify.com/platform/storage/key-value-store) and restore it on startup. For details, see [Scrapling's adaptive parsing documentation](https://scrapling.readthedocs.io/en/latest/parsing/adaptive.html).
+
+## Using Apify Proxy
+
+Running on the Apify platform gives your scraper access to [Apify Proxy](https://docs.apify.com/platform/proxy), which rotates IP addresses to avoid rate limiting and blocking. In the example above, `main` creates a proxy configuration with `Actor.create_proxy_configuration` and passes a fresh proxy URL to `scrape_page` for every request, which forwards it to Scrapling's `proxy` argument.
+
+Scrapling accepts the proxy as a URL string (for example `http://user:pass@proxy.apify.com:8000`), which is what `ProxyConfiguration.new_url` returns. To select specific proxy groups or a country, pass the relevant arguments to `Actor.create_proxy_configuration`. For details, see [Proxy management](../concepts/proxy-management). The browser-based fetchers accept the same `proxy` argument.
+
+## Running browser-based fetchers
+
+`DynamicFetcher` and `StealthyFetcher` drive a real browser, so they need the browser binaries installed with the `scrapling install` command. Locally, run it once after installing the `scrapling[fetchers]` extra:
+
+```bash
+scrapling install
+```
+
+To switch the example from HTTP to a real browser, fetch each page through a browser session instead of `AsyncFetcher`. Opening a fresh browser for every page would be wasteful, so `main` enters an `AsyncDynamicSession` once and reuses it for the whole crawl, while `scrape_page` fetches with `session.fetch`. The parsing API is identical, so the extraction code stays the same:
+
+<CodeBlock className="language-python">
+    {ScraplingBrowserScraper}
+</CodeBlock>
+
+Note that:
+
+- `AsyncDynamicSession` launches one browser and keeps it open across `session.fetch` calls, so the crawl doesn't pay the browser-startup cost on every page.
+- The proxy URL is passed per fetch, so each page can go through a fresh Apify Proxy IP while sharing the same browser.
+
+To run this on the Apify platform, build on top of the [Apify Playwright base image](https://hub.docker.com/r/apify/actor-python-playwright), which already ships a browser together with all of its system-level dependencies. Then run `scrapling install` during the Docker build to download the browser binaries that Scrapling expects:
+
+```docker title="Dockerfile"
+FROM apify/actor-python-playwright:3.14
+
+# Install the Actor's Python dependencies.
+COPY requirements.txt ./
+RUN pip install -r requirements.txt
+
+# Download the browser binaries that Scrapling's browser fetchers need.
+RUN scrapling install
+
+# Copy in the source code and launch the Actor as a module.
+COPY . ./
+CMD ["python", "-m", "src"]
+```
+
+## Conclusion
+
+In this guide, you learned how to use Scrapling in your Apify Actors. You can now fetch pages with Scrapling's HTTP or browser-based fetchers, extract data with its CSS and XPath selectors, route requests through Apify Proxy, and run the whole thing on the Apify platform. To get started with your own scraping tasks, see the [Actor templates](https://apify.com/templates/categories/python). If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Scrapling: Official documentation](https://scrapling.readthedocs.io/)
+- [Scrapling: Fetchers](https://scrapling.readthedocs.io/en/latest/fetching/choosing/)
+- [Scrapling: Parsing and selecting elements](https://scrapling.readthedocs.io/en/latest/parsing/selection/)
+- [Scrapling: Adaptive parsing](https://scrapling.readthedocs.io/en/latest/parsing/adaptive.html)
+- [Scrapling: GitHub repository](https://github.com/D4Vinci/Scrapling)
+- [Apify: Proxy management](https://docs.apify.com/platform/proxy)
diff --git a/docs/03_guides/code/07_scrapling.py b/docs/03_guides/code/07_scrapling.py
new file mode 100644
index 00000000..7195165e
--- /dev/null
+++ b/docs/03_guides/code/07_scrapling.py
@@ -0,0 +1,122 @@
+import asyncio
+from typing import Any
+from urllib.parse import urlsplit
+
+from scrapling.fetchers import AsyncFetcher
+
+from apify import Actor, Request
+from apify.storages import RequestQueue
+
+
+async def scrape_page(
+ url: str,
+ *,
+ proxy_url: str | None = None,
+) -> tuple[dict[str, Any], list[str]]:
+ """Fetch a page with Scrapling's HTTP fetcher and return data and links."""
+ # `impersonate` and `stealthy_headers` make the request look like Chrome.
+ response = await AsyncFetcher.get(
+ url,
+ proxy=proxy_url,
+ impersonate='chrome',
+ stealthy_headers=True,
+ timeout=60,
+ )
+
+ data = {
+ 'url': url,
+ 'title': response.css('title::text').get(),
+ 'h1s': response.css('h1::text').getall(),
+ 'h2s': response.css('h2::text').getall(),
+ 'h3s': response.css('h3::text').getall(),
+ }
+
+ # Keep only absolute links on the same host.
+ links: list[str] = []
+ host = urlsplit(url).netloc
+ for href in response.css('a::attr(href)').getall():
+ link_url = response.urljoin(href)
+ if not link_url.startswith(('http://', 'https://')):
+ continue
+ if urlsplit(link_url).netloc == host:
+ links.append(link_url)
+
+ return data, links
+
+
+async def enqueue_links(
+ request_queue: RequestQueue,
+ links: list[str],
+ *,
+ depth: int,
+ max_depth: int,
+) -> None:
+ """Enqueue the links one level deeper, unless max_depth was reached."""
+ if depth >= max_depth:
+ return
+
+ for link_url in links:
+ Actor.log.info(f'Enqueuing {link_url} ...')
+ request = Request.from_url(link_url)
+ request.crawl_depth = depth + 1
+ await request_queue.add_request(request)
+
+
+async def main() -> None:
+ async with Actor:
+ # Read the Actor input.
+ actor_input = await Actor.get_input() or {}
+ start_urls = actor_input.get('startUrls', [{'url': 'https://crawlee.dev'}])
+ max_depth = actor_input.get('maxDepth', 1)
+
+ if not start_urls:
+ Actor.log.info('No start URLs specified in Actor input, exiting...')
+ await Actor.exit()
+
+ # Set up Apify Proxy and the request queue.
+ proxy_configuration = await Actor.create_proxy_configuration()
+ request_queue = await Actor.open_request_queue()
+
+ # Enqueue the start URLs (crawl depth defaults to 0).
+ for start_url in start_urls:
+ url = start_url.get('url')
+ Actor.log.info(f'Enqueuing start URL: {url}')
+ await request_queue.add_request(Request.from_url(url))
+
+ # Cap the crawl. Raise or remove the limit to follow more pages.
+ max_requests = 50
+ handled_requests = 0
+
+ while handled_requests < max_requests and (
+ request := await request_queue.fetch_next_request()
+ ):
+ handled_requests += 1
+ url = request.url
+ depth = request.crawl_depth
+ Actor.log.info(f'Scraping {url} (depth={depth}) ...')
+
+ try:
+ # Fresh proxy URL per request (None if no proxy).
+ proxy_url = None
+ if proxy_configuration:
+ proxy_url = await proxy_configuration.new_url()
+
+ data, links = await scrape_page(url, proxy_url=proxy_url)
+ await Actor.push_data(data)
+ Actor.log.info(
+ f'Stored data from {url} '
+ f'(title={data["title"]!r}, {len(links)} links found).'
+ )
+ await enqueue_links(
+ request_queue, links, depth=depth, max_depth=max_depth
+ )
+
+ except Exception:
+ Actor.log.exception(f'Cannot extract data from {url}.')
+
+ finally:
+ await request_queue.mark_request_as_handled(request)
+
+
+if __name__ == '__main__':
+ asyncio.run(main())
diff --git a/docs/03_guides/code/07_scrapling_browser.py b/docs/03_guides/code/07_scrapling_browser.py
new file mode 100644
index 00000000..8c9b63b6
--- /dev/null
+++ b/docs/03_guides/code/07_scrapling_browser.py
@@ -0,0 +1,119 @@
+import asyncio
+from typing import Any
+from urllib.parse import urlsplit
+
+from scrapling.fetchers import AsyncDynamicSession
+
+from apify import Actor, Request
+from apify.storages import RequestQueue
+
+
+async def scrape_page(
+ session: AsyncDynamicSession,
+ url: str,
+ *,
+ proxy_url: str | None = None,
+) -> tuple[dict[str, Any], list[str]]:
+ """Fetch a page through the shared browser session and return data and links."""
+ # `network_idle` waits until the page stops making network requests.
+ response = await session.fetch(url, proxy=proxy_url, network_idle=True)
+
+ data = {
+ 'url': url,
+ 'title': response.css('title::text').get(),
+ 'h1s': response.css('h1::text').getall(),
+ 'h2s': response.css('h2::text').getall(),
+ 'h3s': response.css('h3::text').getall(),
+ }
+
+ # Keep only absolute links on the same host.
+ links: list[str] = []
+ host = urlsplit(url).netloc
+ for href in response.css('a::attr(href)').getall():
+ link_url = response.urljoin(href)
+ if not link_url.startswith(('http://', 'https://')):
+ continue
+ if urlsplit(link_url).netloc == host:
+ links.append(link_url)
+
+ return data, links
+
+
+async def enqueue_links(
+ request_queue: RequestQueue,
+ links: list[str],
+ *,
+ depth: int,
+ max_depth: int,
+) -> None:
+ """Enqueue the links one level deeper, unless max_depth was reached."""
+ if depth >= max_depth:
+ return
+
+ for link_url in links:
+ Actor.log.info(f'Enqueuing {link_url} ...')
+ request = Request.from_url(link_url)
+ request.crawl_depth = depth + 1
+ await request_queue.add_request(request)
+
+
+async def main() -> None:
+ async with Actor:
+ # Read the Actor input.
+ actor_input = await Actor.get_input() or {}
+ start_urls = actor_input.get('startUrls', [{'url': 'https://crawlee.dev'}])
+ max_depth = actor_input.get('maxDepth', 1)
+
+ if not start_urls:
+ Actor.log.info('No start URLs specified in Actor input, exiting...')
+ await Actor.exit()
+
+ # Set up Apify Proxy and the request queue.
+ proxy_configuration = await Actor.create_proxy_configuration()
+ request_queue = await Actor.open_request_queue()
+
+ # Enqueue the start URLs (crawl depth defaults to 0).
+ for start_url in start_urls:
+ url = start_url.get('url')
+ Actor.log.info(f'Enqueuing start URL: {url}')
+ await request_queue.add_request(Request.from_url(url))
+
+ # Cap the crawl. Raise or remove the limit to follow more pages.
+ max_requests = 50
+ handled_requests = 0
+
+ # Open the browser once and reuse it for every page in the crawl.
+ async with AsyncDynamicSession(headless=True) as session:
+ while handled_requests < max_requests and (
+ request := await request_queue.fetch_next_request()
+ ):
+ handled_requests += 1
+ url = request.url
+ depth = request.crawl_depth
+ Actor.log.info(f'Scraping {url} (depth={depth}) ...')
+
+ try:
+ # Fresh proxy URL per request (None if no proxy).
+ proxy_url = None
+ if proxy_configuration:
+ proxy_url = await proxy_configuration.new_url()
+
+ data, links = await scrape_page(session, url, proxy_url=proxy_url)
+ await Actor.push_data(data)
+ Actor.log.info(
+ f'Stored data from {url} '
+ f'(title={data["title"]!r}, {len(links)} links found).'
+ )
+ await enqueue_links(
+ request_queue, links, depth=depth, max_depth=max_depth
+ )
+
+ except Exception:
+ Actor.log.exception(f'Cannot extract data from {url}.')
+
+ finally:
+ await request_queue.mark_request_as_handled(request)
+
+
+if __name__ == '__main__':
+ asyncio.run(main())
diff --git a/website/versioned_docs/version-3.4/01_introduction/index.mdx b/website/versioned_docs/version-3.4/01_introduction/index.mdx
index f0675fe8..bd41b55e 100644
--- a/website/versioned_docs/version-3.4/01_introduction/index.mdx
+++ b/website/versioned_docs/version-3.4/01_introduction/index.mdx
@@ -9,26 +9,42 @@ import CodeBlock from '@theme/CodeBlock';
import IntroductionExample from '!!raw-loader!./code/01_introduction.py';
-The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides everything you need to build an Actor and run it both locally and on the [Apify platform](https://docs.apify.com/platform). With the SDK, you can:
-
-- Manage the Actor lifecycle: initialization, graceful shutdown, status messages, rebooting, and metamorphing.
-- Work with datasets, key-value stores, and request queues, with automatic local emulation when running outside the platform.
-- Read the Actor input, including automatic decryption of secret fields.
-- React to platform events (system info, migration, abort) and persist state across migrations and restarts.
-- Manage proxies, both [Apify Proxy](https://docs.apify.com/platform/proxy) and your own, with session and tiered-proxy support.
-- Start, call, and abort Actors and tasks, create webhooks, and reach the full Apify API client.
-- Charge users with the pay-per-event pricing model.
-- Integrate with [Crawlee](../guides/crawlee) and [Scrapy](../guides/scrapy), with guides for [Playwright](../guides/playwright) and others.
+The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides everything you need to build an Actor and run it both locally and on the [Apify platform](https://docs.apify.com/platform). It handles the Actor lifecycle, [storage](https://docs.apify.com/platform/storage) access, platform events, [Apify Proxy](https://docs.apify.com/platform/proxy), pay-per-event charging, and more.
{IntroductionExample}
-## What are Actors
+## What are Actors?
+
+Actors are serverless programs that can do almost anything, from simple scripts and web scrapers to complex automation workflows, AI agents, or even always-on services that expose HTTP endpoints.
+
+They can run either locally or on the Apify platform, where you can scale their execution, monitor runs, schedule tasks, integrate them with other services, or even publish and monetize them. If you're new to Apify, learn more about the platform in the [Apify documentation](https://docs.apify.com/platform/about).
+
+For more context, read the [Actor whitepaper](https://whitepaper.actor/).
+
+## Features
+
+- Run the full Actor lifecycle inside `async with Actor:`, covering initialization, exit, failure handling, status messages, rebooting, and metamorphing ([Actor lifecycle](../concepts/actor-lifecycle)).
+- Read Actor input validated against your input schema with `Actor.get_input()`, including automatic decryption of secret fields ([Actor input](../concepts/actor-input)).
+- Read and write datasets, key-value stores, and request queues, locally or on the platform ([Working with storages](../concepts/storages)).
+- React to platform events such as system info, migration, and abort, and persist state across migrations and restarts ([Actor events](../concepts/actor-events)).
+- Route requests through Apify Proxy or your own proxies, with proxy group selection, country targeting, IP rotation, sessions, and tiered proxies ([Proxy management](../concepts/proxy-management)).
+- Start, call, and abort other Actors and tasks, and attach webhooks to run events ([Interacting with other Actors](../concepts/interacting-with-other-actors), [Webhooks](../concepts/webhooks)).
+- Monetize your Actor with pay-per-event charging ([Pay-per-event](../concepts/pay-per-event)).
+- Reach the full [Apify API](https://docs.apify.com/api/v2) through a preconfigured `ApifyClient` ([Accessing the Apify API](../concepts/access-apify-api)).
+
+## What you can build
+
+Almost any Python project can become an Actor, including projects for:
-Actors are serverless cloud programs capable of performing tasks in a web browser, similar to what a human can do. These tasks can range from simple operations, such as filling out forms or unsubscribing from services, to complex jobs like scraping and processing large numbers of web pages.
+- **Web scraping and crawling** - The SDK is fully compatible with [Crawlee](https://crawlee.dev/python), which makes Apify a natural place to deploy and scale your crawlers (see the [Crawlee guide](../guides/crawlee)). It also works with other popular scraping libraries, such as [Scrapy](../guides/scrapy), [Scrapling](../guides/scrapling), or [Crawl4AI](../guides/crawl4ai).
+- **Browser automation** - Drive a real browser with [Playwright](../guides/playwright) or [Selenium](../guides/selenium), or with higher-level tools such as [Browser Use](../guides/browser-use).
+- **Web servers and APIs** - Run a [web server](../guides/running-webserver) inside an Actor to serve HTTP requests, for example to expose your scraper as a live API.
+- **AI agents** - Host agents built with your framework of choice. Ready-made Actor templates cover [PydanticAI](https://apify.com/templates/python-pydanticai), [CrewAI](https://apify.com/templates/python-crewai), [LangGraph](https://apify.com/templates/python-langgraph), [LlamaIndex](https://apify.com/templates/python-llamaindex-agent), and [Smolagents](https://apify.com/templates/python-smolagents).
+- **MCP servers** - Deploy a Python MCP server as an Actor and make its tools available to any MCP client. See the [MCP server](https://apify.com/templates/python-mcp-empty) and [MCP proxy](https://apify.com/templates/python-mcp-proxy) templates.
-Actors can be executed locally or on the [Apify platform](https://docs.apify.com/platform). The Apify platform lets you run Actors at scale and provides features for monitoring, scheduling, publishing, and monetizing them.
+Whatever you build, the Apify SDK doesn't lock you into a particular framework. Bring the libraries you already use, and let Apify run your project in the cloud.
## Quick start
diff --git a/website/versioned_docs/version-3.4/01_introduction/quick-start.mdx b/website/versioned_docs/version-3.4/01_introduction/quick-start.mdx
index 27738ace..a26a0c83 100644
--- a/website/versioned_docs/version-3.4/01_introduction/quick-start.mdx
+++ b/website/versioned_docs/version-3.4/01_introduction/quick-start.mdx
@@ -107,6 +107,7 @@ To see how you can integrate the Apify SDK with popular scraping libraries and f
- [Browser automation with Selenium](../guides/selenium)
- [Building crawlers with Crawlee](../guides/crawlee)
- [Building crawlers with Scrapy](../guides/scrapy)
+- [Adaptive scraping with Scrapling](../guides/scrapling)
- [LLM-ready scraping with Crawl4AI](../guides/crawl4ai)
- [Browser AI agents with Browser Use](../guides/browser-use)
diff --git a/website/versioned_docs/version-3.4/03_guides/07_scrapling.mdx b/website/versioned_docs/version-3.4/03_guides/07_scrapling.mdx
new file mode 100644
index 00000000..8d384df2
--- /dev/null
+++ b/website/versioned_docs/version-3.4/03_guides/07_scrapling.mdx
@@ -0,0 +1,141 @@
+---
+id: scrapling
+title: Adaptive scraping with Scrapling
+description: Build an Apify Actor that scrapes web pages using the Scrapling adaptive web scraping library.
+---
+
+import CodeBlock from '@theme/CodeBlock';
+import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
+
+import ScraplingExample from '!!raw-loader!roa-loader!./code/07_scrapling.py';
+import ScraplingBrowserScraper from '!!raw-loader!./code/07_scrapling_browser.py';
+
+In this guide, you'll learn how to use the [Scrapling](https://scrapling.readthedocs.io/) library for adaptive web scraping in your Apify Actors.
+
+## Introduction
+
+[Scrapling](https://scrapling.readthedocs.io/) is an adaptive web scraping library for Python that combines fetching and parsing behind a single, high-level API. It can fetch a page with fast HTTP requests or with a real browser, parse the result with familiar CSS selectors and XPath, and relocate your selectors automatically when a website's structure changes.
+
+Scrapling is a great fit for Apify Actors:
+
+- A single API exposes a fast HTTP client with browser TLS-fingerprint impersonation, as well as full browser automation for JavaScript-heavy or protected pages.
+- Scrapling can remember the elements you scraped and find them again after a website redesign. Your scrapers keep working with fewer manual fixes.
+- Built-in stealth features (browser impersonation, realistic headers, and automatic Cloudflare Turnstile solving with `StealthyFetcher`) help you avoid being blocked.
+- Elements are selected with CSS selectors (including the `::text` and `::attr()` pseudo-elements) or XPath, with a Scrapy/Parsel-like `.get()` and `.getall()` interface.
+- Every fetcher has an asynchronous variant, which integrates naturally with the asyncio-based Apify SDK.
+
+Scrapling's parser works on its own, and the fetchers are an optional extra. To get the HTTP and browser fetchers, install Scrapling with the `fetchers` extra:
+
+```bash
+pip install "scrapling[fetchers]"
+```
+
+## Choosing a fetcher
+
+All of Scrapling's fetchers are importable from `scrapling.fetchers`. Pick the one that matches the website you're scraping:
+
+- **`Fetcher` / `AsyncFetcher`** - Plain HTTP requests via `.get()`, `.post()`, `.put()`, and `.delete()`. Fast and lightweight, with optional browser TLS-fingerprint impersonation (`impersonate`) and realistic headers (`stealthy_headers`). This is the best choice for static pages and APIs, and it doesn't need browser binaries.
+- **`DynamicFetcher` / `DynamicSession`** - Full browser automation based on [Playwright](https://playwright.dev/), for pages that require JavaScript rendering or interaction. Fetch a page with `.fetch()` or its async variant `.async_fetch()`. To reuse one browser across many pages in async code, use `AsyncDynamicSession`.
+- **`StealthyFetcher` / `StealthySession`** - A stealth-hardened browser fetcher that can automatically solve Cloudflare Turnstile challenges (`solve_cloudflare=True`). Use it for the most heavily protected websites.
+
+The returned `Response` object is also a Scrapling selector, so you can call `.css()`, `.xpath()`, `.find_all()`, and the other parsing methods on it directly.
+
+The HTTP fetchers work with just the `scrapling[fetchers]` extra. The browser-based fetchers (`DynamicFetcher` and `StealthyFetcher`) additionally need browser binaries, which you download with the `scrapling install` command. See [Running browser-based fetchers](#running-browser-based-fetchers).
+
+The example Actor in this guide uses the HTTP `AsyncFetcher`, which is the simplest to deploy and pairs well with Apify Proxy.
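+
+As a rough rule of thumb, the fetcher list above can be condensed into a small decision helper. This is a hypothetical sketch, not part of Scrapling: `choose_fetcher` and its two flags are illustrative names, and the function only returns the name of the class you would then import from `scrapling.fetchers`.
+
+```python
+def choose_fetcher(*, needs_javascript: bool, heavily_protected: bool) -> str:
+    """Return the name of the Scrapling fetcher class that suits a website.
+
+    Hypothetical helper that encodes this guide's rules of thumb. It doesn't
+    import Scrapling or inspect the website itself.
+    """
+    if heavily_protected:
+        # Stealth-hardened browser that can solve Cloudflare Turnstile.
+        return 'StealthyFetcher'
+    if needs_javascript:
+        # Playwright-based browser for JavaScript rendering or interaction.
+        return 'DynamicFetcher'
+    # Static pages and APIs: plain HTTP, no browser binaries needed.
+    return 'AsyncFetcher'
+```
+
+For example, `choose_fetcher(needs_javascript=True, heavily_protected=False)` returns `'DynamicFetcher'`. Moving down the list trades speed for rendering and stealth, so reach for a browser fetcher only when plain HTTP responses are missing the data you need.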
+
+## Example Actor
+
+The following Actor recursively scrapes data from linked pages on the same site, up to a user-defined maximum depth, starting from the URLs in the Actor input. It uses Scrapling's `AsyncFetcher` to fetch each page through [Apify Proxy](https://docs.apify.com/platform/proxy), and CSS selectors to extract the title, headings, and links.
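+
+Before looking at the full listing, it can help to see the crawl logic in isolation. The following stdlib-only sketch walks a hypothetical in-memory link graph (`links`, which maps each page to the pages it links to) with the same rules as the Actor: start URLs have depth 0, links are followed only while the depth is below `max_depth`, each URL is visited once, and the crawl stops after `max_requests` pages. The sketch assumes first-in, first-out order; the real request queue doesn't strictly guarantee the order in which it returns requests.
+
+```python
+from collections import deque
+
+
+def crawl_order(
+    links: dict[str, list[str]],
+    start_urls: list[str],
+    *,
+    max_depth: int,
+    max_requests: int,
+) -> list[tuple[str, int]]:
+    """Return the (url, depth) pairs that a depth-limited crawl visits, in order."""
+    # Deduplicate start URLs while keeping their order; all start at depth 0.
+    queue = deque((url, 0) for url in dict.fromkeys(start_urls))
+    seen = set(start_urls)  # The request queue also skips already-added URLs.
+    visited: list[tuple[str, int]] = []
+
+    while queue and len(visited) < max_requests:
+        url, depth = queue.popleft()
+        visited.append((url, depth))
+        # Like `enqueue_links`, stop following links at the maximum depth.
+        if depth >= max_depth:
+            continue
+        for link in links.get(url, []):
+            if link not in seen:
+                seen.add(link)
+                queue.append((link, depth + 1))
+
+    return visited
+```
+
+With `links = {'a': ['b', 'c'], 'b': ['a', 'd'], 'c': ['e']}` and a single start URL `'a'`, `max_depth=1` visits `a`, `b`, and `c`, while `max_depth=2` also reaches `d` and `e`.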
+
+The whole Actor fits in a single file. A `scrape_page` helper holds the Scrapling-specific fetching and parsing, and an `enqueue_links` helper adds the discovered links one level deeper until the maximum depth is reached. The `main` coroutine handles the [Actor](https://docs.apify.com/platform/actors) lifecycle, reads the input, sets up [Apify Proxy](https://docs.apify.com/platform/proxy) and the [request queue](https://docs.apify.com/platform/storage/request-queue), and drives the crawl, which is capped at 50 requests:
+
+<RunnableCodeBlock className="language-python" language="python">
+    {ScraplingExample}
+</RunnableCodeBlock>
+
+Note that:
+
+- Keeping the fetching and parsing in `scrape_page` separates the Scrapling-specific code from the Actor's orchestration logic. The function returns the extracted data together with the discovered links, so `main` decides what to store and what to enqueue.
+- The response of `AsyncFetcher.get` is a Scrapling selector, so `response.css('title::text').get()` reads the page title and `response.css('a::attr(href)').getall()` returns every link's `href` in one call.
+- `response.urljoin(href)` resolves relative links against the page URL. `scrape_page` then keeps only `http://` and `https://` links on the same host, so they can be enqueued directly.
+- The `impersonate='chrome'` and `stealthy_headers=True` options make the request look like it comes from a real Chrome browser. Combined with Apify Proxy, they reduce the chance of being blocked.
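+
+The link filtering at the end of `scrape_page` doesn't depend on Scrapling, so you can check its behavior on its own. In this sketch, the standard library's `urllib.parse.urljoin` stands in for `response.urljoin`, and `same_host_links` is a hypothetical name for the extracted logic:
+
+```python
+from urllib.parse import urljoin, urlsplit
+
+
+def same_host_links(page_url: str, hrefs: list[str]) -> list[str]:
+    """Resolve hrefs against page_url and keep http(s) links on the same host."""
+    host = urlsplit(page_url).netloc
+    links: list[str] = []
+    for href in hrefs:
+        # Stand-in for Scrapling's `response.urljoin(href)`.
+        link_url = urljoin(page_url, href)
+        # Drops `mailto:`, `javascript:`, and other non-HTTP links.
+        if not link_url.startswith(('http://', 'https://')):
+            continue
+        if urlsplit(link_url).netloc == host:
+            links.append(link_url)
+    return links
+```
+
+For `page_url='https://crawlee.dev/docs/intro'`, the hrefs `'/python'` and `'quick-start'` resolve to `https://crawlee.dev/python` and `https://crawlee.dev/docs/quick-start`, while `'mailto:team@example.com'` and `'https://github.com/apify'` are dropped. Because the check compares the full host, a subdomain such as `www.crawlee.dev` counts as a different site.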
+
+## Adaptive selectors
+
+The example above uses plain CSS selectors. Scrapling can also track the elements you scrape and relocate them when a website changes its markup, so a redesign doesn't immediately break your scraper. This is most useful for scrapers that revisit the same pages over time, rather than one-off crawls.
+
+1. Enable adaptive matching once on the fetcher:
+
+ ```python
+ AsyncFetcher.configure(adaptive=True)
+ ```
+
+2. On the first run, pass `auto_save=True` when you select an element. Scrapling records a fingerprint of that element, keyed by the selector:
+
+ ```python
+ title = response.css('h1.product-title::text', auto_save=True).get()
+ ```
+
+3. On a later run, if the selector no longer matches because the page changed, pass `adaptive=True` with the same selector. Scrapling uses the saved fingerprint to find the element in its new location:
+
+ ```python
+ title = response.css('h1.product-title::text', adaptive=True).get()
+ ```
+
+Scrapling keeps these fingerprints in a local SQLite database. On the Apify platform, the Actor's filesystem doesn't persist between runs, so to keep the fingerprints across runs, save that database to a [key-value store](https://docs.apify.com/platform/storage/key-value-store) before the run ends and restore it on startup. For details, see [Scrapling's adaptive parsing documentation](https://scrapling.readthedocs.io/en/latest/parsing/adaptive.html).
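+
+The save-and-restore step could look roughly like this. It's a sketch under several assumptions: `db_path` must point at the SQLite file your Scrapling setup actually uses (this guide doesn't state its location, so check Scrapling's storage settings), `DB_KEY` is an arbitrary record name, and `store` is expected to behave like the object returned by `Actor.open_key_value_store()`. Only its `get_value` and `set_value` methods are used, and binary records are assumed to come back as `bytes`.
+
+```python
+from pathlib import Path
+from typing import Any, Protocol
+
+# Hypothetical record name for the database in the key-value store.
+DB_KEY = 'scrapling-adaptive-db'
+
+
+class KeyValueStoreLike(Protocol):
+    """The two key-value store methods this sketch relies on."""
+
+    async def get_value(self, key: str) -> Any: ...
+
+    async def set_value(
+        self, key: str, value: Any, content_type: str | None = None
+    ) -> None: ...
+
+
+async def restore_db(store: KeyValueStoreLike, db_path: Path) -> bool:
+    """Write a previously saved database to db_path. Return False if none exists."""
+    data = await store.get_value(DB_KEY)
+    if data is None:
+        return False
+    db_path.parent.mkdir(parents=True, exist_ok=True)
+    db_path.write_bytes(data)
+    return True
+
+
+async def save_db(store: KeyValueStoreLike, db_path: Path) -> None:
+    """Upload the database file, if it exists, as a binary record."""
+    # Assumption: db_path is wherever your Scrapling setup stores fingerprints.
+    if db_path.exists():
+        await store.set_value(
+            DB_KEY, db_path.read_bytes(), content_type='application/octet-stream'
+        )
+```
+
+Inside `async with Actor:`, you would call `restore_db` with `await Actor.open_key_value_store()` before the first fetch, and `save_db` once scraping is done, so that SQLite has finished writing to the file.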
+
+## Using Apify Proxy
+
+Running on the Apify platform gives your scraper access to [Apify Proxy](https://docs.apify.com/platform/proxy), which rotates IP addresses to avoid rate limiting and blocking. In the example above, `main` creates a proxy configuration with `Actor.create_proxy_configuration` and gets a fresh proxy URL for every request. `scrape_page` then forwards that URL to Scrapling's `proxy` argument.
+
+Scrapling accepts the proxy as a URL string (for example `http://user:pass@proxy.apify.com:8000`), which is what `ProxyConfiguration.new_url` returns. To select specific proxy groups or a country, pass the `groups` and `country_code` arguments to `Actor.create_proxy_configuration`. For details, see [Proxy management](../concepts/proxy-management). The browser-based fetchers accept the same `proxy` argument.
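+
+Because the proxy URL carries the credentials, it's worth masking it before logging which proxy a request used. This hypothetical helper (not part of the Apify SDK or Scrapling) splits a URL in the format shown above with the standard library and hides the password:
+
+```python
+from urllib.parse import urlsplit
+
+
+def describe_proxy_url(proxy_url: str) -> dict[str, str | int | None]:
+    """Split a proxy URL into its parts, masking the password for safe logging."""
+    parts = urlsplit(proxy_url)
+    return {
+        'scheme': parts.scheme,
+        'username': parts.username,
+        'password': '***' if parts.password else None,
+        'host': parts.hostname,
+        'port': parts.port,
+    }
+```
+
+For `http://user:pass@proxy.apify.com:8000`, it returns `{'scheme': 'http', 'username': 'user', 'password': '***', 'host': 'proxy.apify.com', 'port': 8000}`, which you can pass to `Actor.log.debug` instead of the raw URL.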
+
+## Running browser-based fetchers
+
+`DynamicFetcher` and `StealthyFetcher` drive a real browser, so they need the browser binaries installed with the `scrapling install` command. Locally, run it once after installing the `scrapling[fetchers]` extra:
+
+```bash
+scrapling install
+```
+
+To switch the example from HTTP to a real browser, fetch each page through a browser session instead of `AsyncFetcher`. Opening a fresh browser for every page would be wasteful, so `main` enters an `AsyncDynamicSession` once and reuses it for the whole crawl, while `scrape_page` fetches with `session.fetch`. The parsing API is identical, so the extraction code stays the same:
+
+<CodeBlock className="language-python">
+    {ScraplingBrowserScraper}
+</CodeBlock>
+
+Note that:
+
+- `AsyncDynamicSession` launches one browser and keeps it open across `session.fetch` calls, so the crawl doesn't pay the browser-startup cost on every page.
+- The proxy URL is passed per fetch, so each page can go through a fresh Apify Proxy IP while sharing the same browser.
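+
+The benefit of entering the session once is easy to see without a real browser. In this sketch, a hypothetical `FakeBrowserSession` stands in for `AsyncDynamicSession` and only counts how many times a browser would be launched:
+
+```python
+import asyncio
+
+
+class FakeBrowserSession:
+    """Hypothetical stand-in for a browser session that counts launches."""
+
+    launches = 0
+
+    async def __aenter__(self) -> 'FakeBrowserSession':
+        # In a real session, launching the browser is the expensive step.
+        FakeBrowserSession.launches += 1
+        return self
+
+    async def __aexit__(self, *exc_info: object) -> None:
+        pass
+
+    async def fetch(self, url: str) -> str:
+        return f'<html>{url}</html>'
+
+
+async def crawl_with_shared_session(urls: list[str]) -> list[str]:
+    # One browser for the whole crawl, as in the example above.
+    async with FakeBrowserSession() as session:
+        return [await session.fetch(url) for url in urls]
+
+
+async def crawl_with_session_per_page(urls: list[str]) -> list[str]:
+    # The pattern to avoid: a new browser for every page.
+    pages = []
+    for url in urls:
+        async with FakeBrowserSession() as session:
+            pages.append(await session.fetch(url))
+    return pages
+```
+
+For three URLs, `asyncio.run(crawl_with_shared_session(urls))` launches one browser, while `crawl_with_session_per_page` launches three. With a real browser, every extra launch adds startup time to the crawl.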
+
+To run this on the Apify platform, build on top of the [Apify Playwright base image](https://hub.docker.com/r/apify/actor-python-playwright), which already ships a browser together with all of its system-level dependencies. Then run `scrapling install` during the Docker build to download the browser binaries that Scrapling expects:
+
+```docker title="Dockerfile"
+FROM apify/actor-python-playwright:3.14
+
+# Install the Actor's Python dependencies.
+COPY requirements.txt ./
+RUN pip install -r requirements.txt
+
+# Download the browser binaries that Scrapling's browser fetchers need.
+RUN scrapling install
+
+# Copy in the source code and launch the Actor as a module.
+COPY . ./
+CMD ["python", "-m", "src"]
+```
+
+## Conclusion
+
+In this guide, you learned how to use Scrapling in your Apify Actors. You can now fetch pages with Scrapling's HTTP or browser-based fetchers, extract data with its CSS and XPath selectors, route requests through Apify Proxy, and run the whole thing on the Apify platform. To get started with your own scraping tasks, see the [Actor templates](https://apify.com/templates/categories/python). If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy scraping!
+
+## Additional resources
+
+- [Scrapling: Official documentation](https://scrapling.readthedocs.io/)
+- [Scrapling: Fetchers](https://scrapling.readthedocs.io/en/latest/fetching/choosing/)
+- [Scrapling: Parsing and selecting elements](https://scrapling.readthedocs.io/en/latest/parsing/selection/)
+- [Scrapling: Adaptive parsing](https://scrapling.readthedocs.io/en/latest/parsing/adaptive.html)
+- [Scrapling: GitHub repository](https://github.com/D4Vinci/Scrapling)
+- [Apify: Proxy management](https://docs.apify.com/platform/proxy)
diff --git a/website/versioned_docs/version-3.4/03_guides/code/07_scrapling.py b/website/versioned_docs/version-3.4/03_guides/code/07_scrapling.py
new file mode 100644
index 00000000..7195165e
--- /dev/null
+++ b/website/versioned_docs/version-3.4/03_guides/code/07_scrapling.py
@@ -0,0 +1,122 @@
+import asyncio
+from typing import Any
+from urllib.parse import urlsplit
+
+from scrapling.fetchers import AsyncFetcher
+
+from apify import Actor, Request
+from apify.storages import RequestQueue
+
+
+async def scrape_page(
+ url: str,
+ *,
+ proxy_url: str | None = None,
+) -> tuple[dict[str, Any], list[str]]:
+ """Fetch a page with Scrapling's HTTP fetcher and return data and links."""
+ # `impersonate` and `stealthy_headers` make the request look like Chrome.
+ response = await AsyncFetcher.get(
+ url,
+ proxy=proxy_url,
+ impersonate='chrome',
+ stealthy_headers=True,
+ timeout=60,
+ )
+
+ data = {
+ 'url': url,
+ 'title': response.css('title::text').get(),
+ 'h1s': response.css('h1::text').getall(),
+ 'h2s': response.css('h2::text').getall(),
+ 'h3s': response.css('h3::text').getall(),
+ }
+
+ # Keep only absolute links on the same host.
+ links: list[str] = []
+ host = urlsplit(url).netloc
+ for href in response.css('a::attr(href)').getall():
+ link_url = response.urljoin(href)
+ if not link_url.startswith(('http://', 'https://')):
+ continue
+ if urlsplit(link_url).netloc == host:
+ links.append(link_url)
+
+ return data, links
+
+
+async def enqueue_links(
+ request_queue: RequestQueue,
+ links: list[str],
+ *,
+ depth: int,
+ max_depth: int,
+) -> None:
+ """Enqueue the links one level deeper, unless max_depth was reached."""
+ if depth >= max_depth:
+ return
+
+ for link_url in links:
+ Actor.log.info(f'Enqueuing {link_url} ...')
+ request = Request.from_url(link_url)
+ request.crawl_depth = depth + 1
+ await request_queue.add_request(request)
+
+
+async def main() -> None:
+ async with Actor:
+ # Read the Actor input.
+ actor_input = await Actor.get_input() or {}
+ start_urls = actor_input.get('startUrls', [{'url': 'https://crawlee.dev'}])
+ max_depth = actor_input.get('maxDepth', 1)
+
+ if not start_urls:
+ Actor.log.info('No start URLs specified in Actor input, exiting...')
+ await Actor.exit()
+
+ # Set up Apify Proxy and the request queue.
+ proxy_configuration = await Actor.create_proxy_configuration()
+ request_queue = await Actor.open_request_queue()
+
+ # Enqueue the start URLs (crawl depth defaults to 0).
+ for start_url in start_urls:
+ url = start_url.get('url')
+ Actor.log.info(f'Enqueuing start URL: {url}')
+ await request_queue.add_request(Request.from_url(url))
+
+ # Cap the crawl. Raise or remove the limit to follow more pages.
+ max_requests = 50
+ handled_requests = 0
+
+ while handled_requests < max_requests and (
+ request := await request_queue.fetch_next_request()
+ ):
+ handled_requests += 1
+ url = request.url
+ depth = request.crawl_depth
+ Actor.log.info(f'Scraping {url} (depth={depth}) ...')
+
+ try:
+ # Fresh proxy URL per request (None if no proxy).
+ proxy_url = None
+ if proxy_configuration:
+ proxy_url = await proxy_configuration.new_url()
+
+ data, links = await scrape_page(url, proxy_url=proxy_url)
+ await Actor.push_data(data)
+ Actor.log.info(
+ f'Stored data from {url} '
+ f'(title={data["title"]!r}, {len(links)} links found).'
+ )
+ await enqueue_links(
+ request_queue, links, depth=depth, max_depth=max_depth
+ )
+
+ except Exception:
+ Actor.log.exception(f'Cannot extract data from {url}.')
+
+ finally:
+ await request_queue.mark_request_as_handled(request)
+
+
+if __name__ == '__main__':
+ asyncio.run(main())
diff --git a/website/versioned_docs/version-3.4/03_guides/code/07_scrapling_browser.py b/website/versioned_docs/version-3.4/03_guides/code/07_scrapling_browser.py
new file mode 100644
index 00000000..8c9b63b6
--- /dev/null
+++ b/website/versioned_docs/version-3.4/03_guides/code/07_scrapling_browser.py
@@ -0,0 +1,119 @@
+import asyncio
+from typing import Any
+from urllib.parse import urlsplit
+
+from scrapling.fetchers import AsyncDynamicSession
+
+from apify import Actor, Request
+from apify.storages import RequestQueue
+
+
+async def scrape_page(
+ session: AsyncDynamicSession,
+ url: str,
+ *,
+ proxy_url: str | None = None,
+) -> tuple[dict[str, Any], list[str]]:
+ """Fetch a page through the shared browser session and return data and links."""
+ # `network_idle` waits until the page stops making network requests.
+ response = await session.fetch(url, proxy=proxy_url, network_idle=True)
+
+ data = {
+ 'url': url,
+ 'title': response.css('title::text').get(),
+ 'h1s': response.css('h1::text').getall(),
+ 'h2s': response.css('h2::text').getall(),
+ 'h3s': response.css('h3::text').getall(),
+ }
+
+ # Keep only absolute links on the same host.
+ links: list[str] = []
+ host = urlsplit(url).netloc
+ for href in response.css('a::attr(href)').getall():
+ link_url = response.urljoin(href)
+ if not link_url.startswith(('http://', 'https://')):
+ continue
+ if urlsplit(link_url).netloc == host:
+ links.append(link_url)
+
+ return data, links
+
+
+async def enqueue_links(
+ request_queue: RequestQueue,
+ links: list[str],
+ *,
+ depth: int,
+ max_depth: int,
+) -> None:
+ """Enqueue the links one level deeper, unless max_depth was reached."""
+ if depth >= max_depth:
+ return
+
+ for link_url in links:
+ Actor.log.info(f'Enqueuing {link_url} ...')
+ request = Request.from_url(link_url)
+ request.crawl_depth = depth + 1
+ await request_queue.add_request(request)
+
+
+async def main() -> None:
+ async with Actor:
+ # Read the Actor input.
+ actor_input = await Actor.get_input() or {}
+ start_urls = actor_input.get('startUrls', [{'url': 'https://crawlee.dev'}])
+ max_depth = actor_input.get('maxDepth', 1)
+
+ if not start_urls:
+ Actor.log.info('No start URLs specified in Actor input, exiting...')
+ await Actor.exit()
+
+ # Set up Apify Proxy and the request queue.
+ proxy_configuration = await Actor.create_proxy_configuration()
+ request_queue = await Actor.open_request_queue()
+
+ # Enqueue the start URLs (crawl depth defaults to 0).
+ for start_url in start_urls:
+ url = start_url.get('url')
+ Actor.log.info(f'Enqueuing start URL: {url}')
+ await request_queue.add_request(Request.from_url(url))
+
+ # Cap the crawl. Raise or remove the limit to follow more pages.
+ max_requests = 50
+ handled_requests = 0
+
+ # Open the browser once and reuse it for every page in the crawl.
+ async with AsyncDynamicSession(headless=True) as session:
+ while handled_requests < max_requests and (
+ request := await request_queue.fetch_next_request()
+ ):
+ handled_requests += 1
+ url = request.url
+ depth = request.crawl_depth
+ Actor.log.info(f'Scraping {url} (depth={depth}) ...')
+
+ try:
+ # Fresh proxy URL per request (None if no proxy).
+ proxy_url = None
+ if proxy_configuration:
+ proxy_url = await proxy_configuration.new_url()
+
+ data, links = await scrape_page(session, url, proxy_url=proxy_url)
+ await Actor.push_data(data)
+ Actor.log.info(
+ f'Stored data from {url} '
+ f'(title={data["title"]!r}, {len(links)} links found).'
+ )
+ await enqueue_links(
+ request_queue, links, depth=depth, max_depth=max_depth
+ )
+
+ except Exception:
+ Actor.log.exception(f'Cannot extract data from {url}.')
+
+ finally:
+ await request_queue.mark_request_as_handled(request)
+
+
+if __name__ == '__main__':
+ asyncio.run(main())