Soenneker.Playwrights.Crawler
4.0.17
Prefix Reserved
dotnet add package Soenneker.Playwrights.Crawler --version 4.0.17
NuGet\Install-Package Soenneker.Playwrights.Crawler -Version 4.0.17
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="Soenneker.Playwrights.Crawler" Version="4.0.17" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.
<PackageVersion Include="Soenneker.Playwrights.Crawler" Version="4.0.17" />
<PackageReference Include="Soenneker.Playwrights.Crawler" />
For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.
paket add Soenneker.Playwrights.Crawler --version 4.0.17
The NuGet Team does not provide support for this client. Please contact its maintainers for support.
#r "nuget: Soenneker.Playwrights.Crawler, 4.0.17"
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
#:package Soenneker.Playwrights.Crawler@4.0.17
#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.
#addin nuget:?package=Soenneker.Playwrights.Crawler&version=4.0.17
#tool nuget:?package=Soenneker.Playwrights.Crawler&version=4.0.17
The NuGet Team does not provide support for this client. Please contact its maintainers for support.
Soenneker.Playwrights.Crawler
A configurable Playwright crawler for mirroring sites to disk with support for:
- HTML-only or full resource capture
- crawl limits by depth, page count, duration, and storage
- same-host restrictions with optional cross-origin asset capture
- throttling, retries, slow mode, and cooldown behavior
- optional stealth launch/context settings
Installation
dotnet add package Soenneker.Playwrights.Crawler
Register With DI
using Microsoft.Extensions.DependencyInjection;
using Soenneker.Playwrights.Crawler.Registrars;
var services = new ServiceCollection();
services.AddLogging();
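services.AddPlaywrightCrawlerAsSingleton();
// Build the provider that the usage examples resolve IPlaywrightCrawler from.
ServiceProvider serviceProvider = services.BuildServiceProvider();
Use AddPlaywrightCrawlerAsScoped() instead if you prefer a scoped lifetime.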
Basic Usage
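using Microsoft.Extensions.DependencyInjection;
using Soenneker.Playwrights.Crawler.Abstract;
using Soenneker.Playwrights.Crawler.Dtos;
using Soenneker.Playwrights.Crawler.Enums;
// serviceProvider is an IServiceProvider built from the service collection configured above (e.g. via services.BuildServiceProvider()).
IPlaywrightCrawler crawler = serviceProvider.GetRequiredService<IPlaywrightCrawler>();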
PlaywrightCrawlResult result = await crawler.Crawl(new PlaywrightCrawlOptions
{
Url = "https://example.com",
SaveDirectory = @"C:\temp\example",
Mode = PlaywrightCrawlMode.Full,
MaxDepth = 2,
ClearSaveDirectory = true,
SameHostOnly = true
});
Advanced Example
using Soenneker.Playwrights.Crawler.Abstract;
using Soenneker.Playwrights.Crawler.Dtos;
using Soenneker.Playwrights.Crawler.Enums;
using Soenneker.Playwrights.Extensions.Stealth.Options;
// crawler is the IPlaywrightCrawler resolved in the Basic Usage example.
PlaywrightCrawlResult result = await crawler.Crawl(new PlaywrightCrawlOptions
{
Url = "https://example.com",
SaveDirectory = @"C:\temp\example",
Mode = PlaywrightCrawlMode.Full,
MaxDepth = 2,
MaxPages = 50,
MaxStorageBytes = 250_000_000,
MaxDuration = TimeSpan.FromMinutes(10),
SameHostOnly = true,
IgnoreQueryStringsInDuplicateDetection = true,
FormatHtml = true,
IncludeCrossOriginAssets = true,
RewriteCrossOriginAssetUrls = true,
ClearSaveDirectory = true,
OverwriteExistingFiles = true,
Headless = true,
UseStealth = true,
NavigationTimeoutMs = 45_000,
PostNavigationDelayMs = 0,
ContinueOnPageError = true,
StealthLaunchOptions = new StealthLaunchOptions
{
IgnoreDetectableDefaultArguments = true
},
StealthContextOptions = new StealthContextOptions
{
NormalizeDocumentHeaders = true,
EnableCdpDomainHardening = false
},
Policy = new PlaywrightCrawlPolicy
{
GlobalMaxConcurrency = 20,
PerDomainMaxConcurrency = 2,
PerIpMaxConcurrency = 2,
MinimumDelayBetweenRequestsMs = 750,
DelayJitterMaxMs = 500,
RequestTimeoutMs = 30_000,
MaxRetries = 4
}
});
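The package does not document how the Policy settings interact internally. The following is therefore only a hypothetical sketch, written as a .NET top-level program, of one common way to combine them. It uses a global SemaphoreSlim for GlobalMaxConcurrency and one SemaphoreSlim per host for PerDomainMaxConcurrency. Each request waits MinimumDelayBetweenRequestsMs plus a random jitter of up to DelayJitterMaxMs. FetchAsync and NextDelayMs are illustrative names. Retries, per-IP limits, slow mode and cooldown are deliberately left out.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical values copied from the Policy in the Advanced Example.
const int globalMaxConcurrency = 20;
const int perDomainMaxConcurrency = 2;
const int minimumDelayBetweenRequestsMs = 750;
const int delayJitterMaxMs = 500;

var globalGate = new SemaphoreSlim(globalMaxConcurrency);
var domainGates = new ConcurrentDictionary<string, SemaphoreSlim>();

// Bookkeeping so the sketch can report the highest per-host concurrency it observed.
var sync = new object();
var activeByHost = new Dictionary<string, int>();
var peakByHost = new Dictionary<string, int>();

// Minimum delay plus uniform random jitter in [0, delayJitterMaxMs].
int NextDelayMs() => minimumDelayBetweenRequestsMs + Random.Shared.Next(0, delayJitterMaxMs + 1);

async Task FetchAsync(Uri url)
{
    SemaphoreSlim domainGate = domainGates.GetOrAdd(url.Host, _ => new SemaphoreSlim(perDomainMaxConcurrency));

    // Take the per-domain slot first so a busy domain does not hold a global slot while it waits.
    await domainGate.WaitAsync();
    await globalGate.WaitAsync();
    try
    {
        lock (sync)
        {
            int now = activeByHost.GetValueOrDefault(url.Host) + 1;
            activeByHost[url.Host] = now;
            peakByHost[url.Host] = Math.Max(peakByHost.GetValueOrDefault(url.Host), now);
        }

        // Stand-in for the page load: this sketch only waits out the throttle delay.
        await Task.Delay(NextDelayMs());
    }
    finally
    {
        lock (sync) { activeByHost[url.Host]--; }
        globalGate.Release();
        domainGate.Release();
    }
}

Uri[] urls =
{
    new("https://example.com/a"), new("https://example.com/b"),
    new("https://example.com/c"), new("https://example.com/d"),
    new("https://cdn.example.com/app.css"), new("https://cdn.example.com/app.js"),
};

await Task.WhenAll(urls.Select(u => FetchAsync(u)));

foreach ((string host, int peak) in peakByHost)
    Console.WriteLine($"{host}: peak concurrency {peak}");
```

Running it takes a couple of seconds because every simulated request waits at least 750 ms. Each host should report a peak of 2, even though six requests are started at once.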
Modes
HtmlOnly
Saves only rendered HTML documents discovered during the crawl.
Full
Saves:
- rendered HTML documents
- same-origin network resources observed while pages load
- optional cross-origin assets under _external when IncludeCrossOriginAssets = true
- optional rewriting of cross-origin asset URLs in saved HTML when RewriteCrossOriginAssetUrls = true
Key Options
| Option | Description |
|---|---|
| Url | Required absolute http or https root URL. |
| SaveDirectory | Required output directory for mirrored content. |
| MaxDepth | Link depth to follow from the root page. 0 crawls only the starting page. |
| MaxPages | Optional hard cap on visited pages. |
| MaxStorageBytes | Optional hard cap on bytes written to disk. |
| MaxDuration | Optional maximum crawl duration. |
| SameHostOnly | Restricts queued pages to the same host as the root URL. |
| IgnoreQueryStringsInDuplicateDetection | Treats query-string variants as the same page when detecting duplicates. |
| FormatHtml | Formats saved HTML documents with Soenneker.Html.Formatter when true. Defaults to false. |
| IncludeCrossOriginAssets | In Full mode, saves cross-origin resources under _external. |
| RewriteCrossOriginAssetUrls | Rewrites saved HTML so captured cross-origin asset URLs point at the local _external copy. Requires IncludeCrossOriginAssets. |
| ClearSaveDirectory | Deletes the output directory before crawling. |
| OverwriteExistingFiles | Controls whether existing files can be replaced. |
| Headless | Runs Chromium headlessly when true. |
| UseStealth | Enables the Soenneker stealth Playwright extensions. |
| NavigationTimeoutMs | Navigation timeout per page, in milliseconds. |
| PostNavigationDelayMs | Extra delay after navigation, in milliseconds, to allow late assets to settle. |
| ContinueOnPageError | Continues crawling after an individual page fails. |
| Policy | Crawl throttling, retries, concurrency, slow mode, and cooldown configuration. |
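The next example illustrates how MaxDepth, MaxPages, SameHostOnly and IgnoreQueryStringsInDuplicateDetection can combine. It is a hypothetical breadth-first crawl over an in-memory link graph. The traversal order is an assumption, and the package may schedule pages differently. SimulateCrawl and the link graph are made up for this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical link graph standing in for links extracted from rendered pages.
var links = new Dictionary<string, string[]>
{
    ["https://example.com/"] = new[] { "/docs", "/docs?ref=nav", "https://other.example.org/" },
    ["https://example.com/docs"] = new[] { "/docs/getting-started" },
    ["https://example.com/docs/getting-started"] = new[] { "/docs/advanced" },
};

// Breadth-first crawl order under depth, page-count, host and duplicate rules (a sketch, not the package's code).
List<string> SimulateCrawl(string root, int maxDepth, int? maxPages, bool sameHostOnly, bool ignoreQueryStrings)
{
    var rootUri = new Uri(root);
    var seen = new HashSet<string>(StringComparer.Ordinal);
    var visited = new List<string>();
    var frontier = new Queue<(Uri Url, int Depth)>();

    // Duplicate-detection key: always drop the fragment, and drop the query string when requested.
    string Key(Uri u) => u.GetLeftPart(ignoreQueryStrings ? UriPartial.Path : UriPartial.Query);

    seen.Add(Key(rootUri));
    frontier.Enqueue((rootUri, 0));

    while (frontier.Count > 0 && (maxPages is null || visited.Count < maxPages))
    {
        (Uri url, int depth) = frontier.Dequeue();
        visited.Add(url.AbsoluteUri);

        // Depth 0 is the root page, so maxDepth = 0 visits only the starting page.
        if (depth >= maxDepth) continue;

        foreach (string href in links.GetValueOrDefault(url.AbsoluteUri, Array.Empty<string>()))
        {
            var next = new Uri(url, href); // resolve relative links against the current page
            if (sameHostOnly && next.Host != rootUri.Host) continue;
            if (seen.Add(Key(next))) frontier.Enqueue((next, depth + 1));
        }
    }

    return visited;
}

// Prints the root, /docs and /docs/getting-started: /docs?ref=nav is a duplicate and other.example.org is off-host.
foreach (string page in SimulateCrawl("https://example.com/", maxDepth: 2, maxPages: null, sameHostOnly: true, ignoreQueryStrings: true))
    Console.WriteLine(page);
```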
Result
Crawl() returns a PlaywrightCrawlResult, which includes:
- crawl timing (StartedAtUtc, CompletedAtUtc, Duration)
- page counts (PagesDiscovered, PagesVisited)
- file counts (HtmlFilesSaved, AssetFilesSaved)
- total bytes written (BytesWritten)
- stop reasons (StorageLimitReached, DurationLimitReached, PageLimitReached)
- per-file details in Files
- page-level failures in Errors
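The three stop-reason flags correspond to the MaxStorageBytes, MaxDuration and MaxPages limits. The sketch below shows how a crawl loop could decide which limit ended it. It is hypothetical: the check order and the use of >= are assumptions, and StopReason is not part of the package.

```csharp
using System;

// Hypothetical limits matching MaxPages, MaxStorageBytes and MaxDuration in the Advanced Example.
int? maxPages = 50;
long? maxStorageBytes = 250_000_000;
TimeSpan? maxDuration = TimeSpan.FromMinutes(10);

// Names the limit that should end the crawl, using the result's flag names as labels; null means keep going.
// The check order here is arbitrary; the package's own precedence is not documented.
string? StopReason(int pagesVisited, long bytesWritten, TimeSpan elapsed)
{
    if (maxStorageBytes is long byteCap && bytesWritten >= byteCap) return "StorageLimitReached";
    if (maxDuration is TimeSpan timeCap && elapsed >= timeCap) return "DurationLimitReached";
    if (maxPages is int pageCap && pagesVisited >= pageCap) return "PageLimitReached";
    return null;
}

Console.WriteLine(StopReason(12, 40_000_000, TimeSpan.FromMinutes(3)) ?? "still within limits");
Console.WriteLine(StopReason(50, 40_000_000, TimeSpan.FromMinutes(3))); // PageLimitReached
```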
Output Layout
Saved files preserve URL structure so the output can be served by a simple static web server.
Examples:
- https://example.com/ → index.html
- https://example.com/docs/getting-started → docs/getting-started/index.html
- https://cdn.example.com/app.css → _external/cdn.example.com/app.css when cross-origin asset capture is enabled
- the page saved at docs/getting-started/index.html can reference that asset as ../../_external/cdn.example.com/app.css when URL rewriting is enabled
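The mapping in these examples can be approximated with a small function. This is a hypothetical reconstruction that only reproduces the listed examples. ToLocalPath and RelativeReference are illustrative names, and query strings and ports are ignored. The package's real rules may differ in edge cases.

```csharp
using System;
using System.Linq;

// Hypothetical URL-to-file mapping that reproduces the documented examples; not the package's code.
string ToLocalPath(Uri siteRoot, Uri url, bool isHtml)
{
    string path = Uri.UnescapeDataString(url.AbsolutePath).TrimStart('/');

    // Pages become directory-style index.html files so a static server can serve extensionless URLs.
    if (isHtml)
        path = path.Length == 0 ? "index.html" : path.TrimEnd('/') + "/index.html";

    // Assets from other hosts are namespaced under _external/<host>/.
    bool crossOrigin = !string.Equals(url.Host, siteRoot.Host, StringComparison.OrdinalIgnoreCase);
    return crossOrigin ? $"_external/{url.Host}/{path}" : path;
}

// Relative reference from one saved file to another: climb to the mirror root, then descend.
string RelativeReference(string fromPage, string toFile)
{
    int depth = fromPage.Split('/').Length - 1;
    return string.Concat(Enumerable.Repeat("../", depth)) + toFile;
}

var root = new Uri("https://example.com/");
string page = ToLocalPath(root, new Uri("https://example.com/docs/getting-started"), isHtml: true);
string asset = ToLocalPath(root, new Uri("https://cdn.example.com/app.css"), isHtml: false);

Console.WriteLine(page);                           // docs/getting-started/index.html
Console.WriteLine(asset);                          // _external/cdn.example.com/app.css
Console.WriteLine(RelativeReference(page, asset)); // ../../_external/cdn.example.com/app.css
```

Climbing to the mirror root and then descending always yields a valid relative path. It is not always the shortest one, but it keeps the sketch simple.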
Behavior Notes
- Playwright browser installation is ensured automatically before the crawl starts.
- Duplicate detection ignores query strings by default.
- HTML formatting is opt-in and uses Soenneker.Html.Formatter when FormatHtml = true.
- Challenge and captcha-like pages contribute to the crawler's blocking and slow-mode signals.
- Cross-origin URL rewriting only applies to captured cross-origin assets that are actually available on disk.
- Full mode captures resources observed during page loads, but the rewrite pass is limited to captured cross-origin asset URLs rather than a full offline-mirroring transform.
- Some response types are intentionally skipped, such as empty bodies and certain framework/internal fetch endpoints.
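One rewrite rule in these notes is that only captured cross-origin assets that exist on disk get rewritten. That rule can be sketched with a regex over src and href attributes. This illustration rests on assumptions and is not the package's implementation. RewriteCapturedAssets is a hypothetical helper, and the caller supplies the captured-asset map and the on-disk check. A set stands in for a file-existence check so the example stays deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Matches src/href attribute values. Good enough for a sketch; an HTML parser is more robust.
var attributeUrl = new Regex(@"(?<prefix>\b(?:src|href)\s*=\s*)(?<q>[""'])(?<url>[^""']+)\k<q>", RegexOptions.IgnoreCase);

// Rewrites only URLs that were captured AND whose local copy exists; all other URLs stay as they were.
string RewriteCapturedAssets(string html, IReadOnlyDictionary<string, string> capturedAssets, Func<string, bool> existsOnDisk, string pathToMirrorRoot)
{
    return attributeUrl.Replace(html, m =>
    {
        string url = m.Groups["url"].Value;
        if (capturedAssets.TryGetValue(url, out var localPath) && existsOnDisk(localPath))
        {
            string quote = m.Groups["q"].Value;
            return m.Groups["prefix"].Value + quote + pathToMirrorRoot + localPath + quote;
        }
        return m.Value;
    });
}

var captured = new Dictionary<string, string>
{
    ["https://cdn.example.com/app.css"] = "_external/cdn.example.com/app.css",
    ["https://cdn.example.com/missing.js"] = "_external/cdn.example.com/missing.js",
};

// In a real mirror this would be a File.Exists check on the saved path.
var savedFiles = new HashSet<string> { "_external/cdn.example.com/app.css" };

string sampleHtml = "<link href=\"https://cdn.example.com/app.css\">" +
                    "<script src=\"https://cdn.example.com/missing.js\"></script>" +
                    "<img src=\"https://img.example.net/a.png\">";

// The page is assumed to be saved at docs/getting-started/index.html, two levels below the mirror root.
Console.WriteLine(RewriteCapturedAssets(sampleHtml, captured, p => savedFiles.Contains(p), "../../"));
```

The link tag is rewritten to ../../_external/cdn.example.com/app.css. The script keeps its CDN URL because its file was never saved, and the image keeps its URL because it was never captured.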
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies
net10.0
- Soenneker.Html.Formatter (>= 4.0.9)
- Soenneker.Playwrights.Extensions.Stealth (>= 4.0.62)
- Soenneker.Playwrights.Installation (>= 4.0.3)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 4.0.17 | 38 | 4/10/2026 |
| 4.0.16 | 40 | 4/10/2026 |
| 4.0.15 | 39 | 4/9/2026 |
| 4.0.14 | 75 | 4/7/2026 |
| 4.0.13 | 76 | 4/7/2026 |
| 4.0.12 | 75 | 4/7/2026 |
| 4.0.11 | 80 | 4/5/2026 |
| 4.0.10 | 83 | 4/3/2026 |
| 4.0.9 | 87 | 4/3/2026 |
| 4.0.8 | 84 | 4/1/2026 |
| 4.0.7 | 79 | 4/1/2026 |
| 4.0.6 | 80 | 3/31/2026 |
| 4.0.5 | 81 | 3/31/2026 |
| 4.0.3 | 81 | 3/31/2026 |
| 4.0.2 | 87 | 3/27/2026 |
| 4.0.1 | 87 | 3/27/2026 |