Soenneker.Playwrights.Crawler 4.0.17

Soenneker.Playwrights.Crawler

A configurable Playwright crawler for mirroring sites to disk with support for:

  • HTML-only or full resource capture
  • crawl limits by depth, page count, duration, and storage
  • same-host restrictions with optional cross-origin asset capture
  • throttling, retries, slow mode, and cooldown behavior
  • optional stealth launch/context settings

Installation

dotnet add package Soenneker.Playwrights.Crawler

Register With DI

using Microsoft.Extensions.DependencyInjection;
using Soenneker.Playwrights.Crawler.Registrars;

var services = new ServiceCollection();

services.AddLogging();
services.AddPlaywrightCrawlerAsSingleton();

Use AddPlaywrightCrawlerAsScoped() if you prefer a scoped lifetime.

Basic Usage

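using Microsoft.Extensions.DependencyInjection;
using Soenneker.Playwrights.Crawler.Abstract;
using Soenneker.Playwrights.Crawler.Dtos;
using Soenneker.Playwrights.Crawler.Enums;

// serviceProvider is the provider built from the registrations in "Register With DI",
// for example services.BuildServiceProvider().
IPlaywrightCrawler crawler = serviceProvider.GetRequiredService<IPlaywrightCrawler>();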

PlaywrightCrawlResult result = await crawler.Crawl(new PlaywrightCrawlOptions
{
    Url = "https://example.com",
    SaveDirectory = @"C:\temp\example",
    Mode = PlaywrightCrawlMode.Full,
    MaxDepth = 2,
    ClearSaveDirectory = true,
    SameHostOnly = true
});

Advanced Example

using Soenneker.Playwrights.Crawler.Abstract;
using Soenneker.Playwrights.Crawler.Dtos;
using Soenneker.Playwrights.Crawler.Enums;
using Soenneker.Playwrights.Extensions.Stealth.Options;

PlaywrightCrawlResult result = await crawler.Crawl(new PlaywrightCrawlOptions
{
    Url = "https://example.com",
    SaveDirectory = @"C:\temp\example",
    Mode = PlaywrightCrawlMode.Full,
    MaxDepth = 2,
    MaxPages = 50,
    MaxStorageBytes = 250_000_000,
    MaxDuration = TimeSpan.FromMinutes(10),
    SameHostOnly = true,
    IgnoreQueryStringsInDuplicateDetection = true,
    FormatHtml = true,
    IncludeCrossOriginAssets = true,
    RewriteCrossOriginAssetUrls = true,
    ClearSaveDirectory = true,
    OverwriteExistingFiles = true,
    Headless = true,
    UseStealth = true,
    NavigationTimeoutMs = 45_000,
    PostNavigationDelayMs = 0,
    ContinueOnPageError = true,
    StealthLaunchOptions = new StealthLaunchOptions
    {
        IgnoreDetectableDefaultArguments = true
    },
    StealthContextOptions = new StealthContextOptions
    {
        NormalizeDocumentHeaders = true,
        EnableCdpDomainHardening = false
    },
    Policy = new PlaywrightCrawlPolicy
    {
        GlobalMaxConcurrency = 20,
        PerDomainMaxConcurrency = 2,
        PerIpMaxConcurrency = 2,
        MinimumDelayBetweenRequestsMs = 750,
        DelayJitterMaxMs = 500,
        RequestTimeoutMs = 30_000,
        MaxRetries = 4
    }
});
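
The Policy values MinimumDelayBetweenRequestsMs and DelayJitterMaxMs suggest a base delay plus a random jitter, but the README does not say how the two are combined. The following is a minimal sketch under the assumption that the jitter is drawn uniformly from 0 to DelayJitterMaxMs and added to the minimum delay. NextDelay is a hypothetical helper for illustration, not part of the package.

```csharp
using System;

// Hypothetical helper (not the package's API).
// Assumption: delay = minimum delay + uniform jitter in [0, jitterMaxMs].
static TimeSpan NextDelay(int minimumDelayMs, int jitterMaxMs, Random random)
{
    // Random.Next's upper bound is exclusive, so add 1 to include jitterMaxMs.
    int jitterMs = jitterMaxMs > 0 ? random.Next(0, jitterMaxMs + 1) : 0;
    return TimeSpan.FromMilliseconds(minimumDelayMs + jitterMs);
}

var random = new Random();

// With the values from the advanced example, each delay falls between 750 ms and 1250 ms.
for (int i = 0; i < 3; i++)
{
    Console.WriteLine(NextDelay(750, 500, random).TotalMilliseconds);
}
```

Jitter of this kind spreads requests out so they do not arrive at a fixed cadence; the actual distribution the crawler uses may differ.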

Modes

HtmlOnly

Saves only rendered HTML documents discovered during the crawl.

Full

Saves:

  • rendered HTML documents
  • same-origin network resources observed while pages load
  • optional cross-origin assets under _external when IncludeCrossOriginAssets = true
  • optional rewriting of cross-origin asset URLs in saved HTML when RewriteCrossOriginAssetUrls = true
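
The distinction between same-origin and cross-origin resources can be sketched as follows. This is an illustration only: it assumes "same origin" means matching scheme, host, and port, and "same host" means matching host name. IsSameHost and IsSameOrigin are hypothetical helpers, and the package may classify resources differently.

```csharp
using System;

// Hypothetical helpers (not the package's API).

// Same host: only the host name has to match.
static bool IsSameHost(Uri a, Uri b) =>
    string.Equals(a.Host, b.Host, StringComparison.OrdinalIgnoreCase);

// Same origin (assumed definition): scheme, host, and port all match.
static bool IsSameOrigin(Uri a, Uri b) =>
    a.Scheme == b.Scheme && IsSameHost(a, b) && a.Port == b.Port;

var root = new Uri("https://example.com/");

Console.WriteLine(IsSameOrigin(root, new Uri("https://example.com/app.js")));      // True
Console.WriteLine(IsSameOrigin(root, new Uri("https://cdn.example.com/app.css"))); // False
Console.WriteLine(IsSameHost(root, new Uri("http://example.com/legacy")));         // True
```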

Key Options

| Option | Description |
|---|---|
| Url | Required absolute http or https root URL. |
| SaveDirectory | Required output directory for mirrored content. |
| MaxDepth | Link depth to follow from the root page. 0 crawls only the starting page. |
| MaxPages | Optional hard cap on visited pages. |
| MaxStorageBytes | Optional hard cap on bytes written to disk. |
| MaxDuration | Optional maximum crawl duration. |
| SameHostOnly | Restricts queued pages to the same host as the root URL. |
| IgnoreQueryStringsInDuplicateDetection | Treats query-string variants as the same page when detecting duplicates. |
| FormatHtml | Formats saved HTML documents with Soenneker.Html.Formatter when true. Defaults to false. |
| IncludeCrossOriginAssets | In Full mode, saves cross-origin resources under _external. |
| RewriteCrossOriginAssetUrls | Rewrites saved HTML so captured cross-origin asset URLs point at the local _external copy. Requires IncludeCrossOriginAssets. |
| ClearSaveDirectory | Deletes the output directory before crawling. |
| OverwriteExistingFiles | Controls whether existing files can be replaced. |
| Headless | Runs Chromium headlessly when true. |
| UseStealth | Enables the Soenneker stealth Playwright extensions. |
| NavigationTimeoutMs | Navigation timeout per page, in milliseconds. |
| PostNavigationDelayMs | Extra delay after navigation, in milliseconds, to allow late assets to settle. |
| ContinueOnPageError | Continues crawling after an individual page fails. |
| Policy | Crawl throttling, retries, concurrency, slow mode, and cooldown configuration. |
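
Because Url must be an absolute http or https URL, it can be useful to check user-supplied input before starting a crawl. The following is a minimal sketch using System.Uri. IsValidRootUrl is a hypothetical helper, not part of the package, and the crawler's own validation may be stricter or report errors differently.

```csharp
using System;

// Hypothetical pre-check (not the package's API): accepts only absolute
// http or https URLs, matching the requirement stated for the Url option.
static bool IsValidRootUrl(string url)
{
    return Uri.TryCreate(url, UriKind.Absolute, out var uri)
           && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
}

Console.WriteLine(IsValidRootUrl("https://example.com")); // True
Console.WriteLine(IsValidRootUrl("ftp://example.com"));   // False
Console.WriteLine(IsValidRootUrl("docs/getting-started")); // False
```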

Result

Crawl() returns PlaywrightCrawlResult, which includes:

  • crawl timing (StartedAtUtc, CompletedAtUtc, Duration)
  • page counts (PagesDiscovered, PagesVisited)
  • file counts (HtmlFilesSaved, AssetFilesSaved)
  • total bytes written (BytesWritten)
  • stop reasons (StorageLimitReached, DurationLimitReached, PageLimitReached)
  • per-file details in Files
  • page-level failures in Errors

Output Layout

Saved files preserve URL structure so the output can be served by a simple static web server.

Examples:

  • https://example.com/ -> index.html
  • https://example.com/docs/getting-started -> docs/getting-started/index.html
  • https://cdn.example.com/app.css -> _external/cdn.example.com/app.css when cross-origin asset capture is enabled
  • a saved page can reference that asset as ../../_external/cdn.example.com/app.css when URL rewriting is enabled
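
The mapping in the examples above can be sketched as a pair of small functions. This is an illustration inferred from those examples only: ToRelativeFilePath and RelativeReference are hypothetical helpers, the rule that extensionless paths become directories containing index.html is an assumption, and the package's real rules (query strings, unsafe characters, collisions) are not covered.

```csharp
using System;
using System.Linq;

// Hypothetical mapping (not the package's API), inferred from the README examples.
static string ToRelativeFilePath(Uri url, Uri root)
{
    string path = url.AbsolutePath.Trim('/');

    // Cross-origin resources are placed under _external/<host>/...
    if (!string.Equals(url.Host, root.Host, StringComparison.OrdinalIgnoreCase))
        return $"_external/{url.Host}/{path}";

    // The site root maps to index.html.
    if (path.Length == 0)
        return "index.html";

    // Assumption: a last segment without a dot is a page, saved as <path>/index.html.
    string lastSegment = path.Substring(path.LastIndexOf('/') + 1);
    return lastSegment.Contains('.') ? path : path + "/index.html";
}

// Builds a relative reference from a saved page to another saved file
// by climbing one level per directory in the page's path.
static string RelativeReference(string fromPagePath, string toFilePath)
{
    int depth = fromPagePath.Count(c => c == '/');
    return string.Concat(Enumerable.Repeat("../", depth)) + toFilePath;
}

var root = new Uri("https://example.com/");
string page = ToRelativeFilePath(new Uri("https://example.com/docs/getting-started"), root);
string asset = ToRelativeFilePath(new Uri("https://cdn.example.com/app.css"), root);

Console.WriteLine(ToRelativeFilePath(root, root));   // index.html
Console.WriteLine(page);                             // docs/getting-started/index.html
Console.WriteLine(asset);                            // _external/cdn.example.com/app.css
Console.WriteLine(RelativeReference(page, asset));   // ../../_external/cdn.example.com/app.css
```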

Behavior Notes

  • Playwright browser installation is ensured automatically before the crawl starts.
  • Duplicate detection ignores query strings by default.
  • HTML formatting is opt-in and uses Soenneker.Html.Formatter when FormatHtml = true.
  • Challenge and captcha-like pages contribute to the crawler's blocking and slow-mode signals.
  • Cross-origin URL rewriting only applies to captured cross-origin assets that are actually available on disk.
  • Full mode captures resources observed during page loads, but the rewrite pass is limited to captured cross-origin asset URLs rather than a full offline-mirroring transform.
  • Some response types are intentionally skipped, such as empty bodies and certain framework/internal fetch endpoints.
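
Duplicate detection that ignores query strings can be pictured as normalizing each URL to a key before checking a visited set. The sketch below is an assumption about the general technique, not the package's implementation; ToDuplicateKey is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key function (not the package's API).
// The fragment is always dropped; the query string is dropped only when requested.
static string ToDuplicateKey(Uri url, bool ignoreQueryString)
{
    return ignoreQueryString
        ? url.GetLeftPart(UriPartial.Path)   // scheme + authority + path
        : url.GetLeftPart(UriPartial.Query); // scheme + authority + path + query
}

var visited = new HashSet<string>();

// HashSet.Add returns false when the key is already present.
bool first = visited.Add(ToDuplicateKey(new Uri("https://example.com/docs?page=1"), true));
bool second = visited.Add(ToDuplicateKey(new Uri("https://example.com/docs?page=2"), true));

Console.WriteLine($"{first} {second}"); // True False
```

With ignoreQueryString set to false, the two URLs above would produce different keys and both would be treated as distinct pages.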

Target Frameworks

.NET: net10.0 is compatible. The following were computed as compatible: net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows.

| Version | Downloads | Last Updated |
|---|---|---|
| 4.0.17 | 38 | 4/10/2026 |
| 4.0.16 | 40 | 4/10/2026 |
| 4.0.15 | 39 | 4/9/2026 |
| 4.0.14 | 75 | 4/7/2026 |
| 4.0.13 | 76 | 4/7/2026 |
| 4.0.12 | 75 | 4/7/2026 |
| 4.0.11 | 80 | 4/5/2026 |
| 4.0.10 | 83 | 4/3/2026 |
| 4.0.9 | 87 | 4/3/2026 |
| 4.0.8 | 84 | 4/1/2026 |
| 4.0.7 | 79 | 4/1/2026 |
| 4.0.6 | 80 | 3/31/2026 |
| 4.0.5 | 81 | 3/31/2026 |
| 4.0.3 | 81 | 3/31/2026 |
| 4.0.2 | 87 | 3/27/2026 |
| 4.0.1 | 87 | 3/27/2026 |