PageProbe 1.0.7

There is a newer version of this package available.
See the version list below for details.
.NET CLI:
dotnet add package PageProbe --version 1.0.7

Package Manager Console:
NuGet\Install-Package PageProbe -Version 1.0.7
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference:
<PackageReference Include="PageProbe" Version="1.0.7" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Central Package Management (CPM):
Directory.Packages.props: <PackageVersion Include="PageProbe" Version="1.0.7" />
Project file: <PackageReference Include="PageProbe" />
For projects that support Central Package Management (CPM), copy the PackageVersion node into the solution Directory.Packages.props file to version the package, and the PackageReference node into the project file.

Paket CLI:
paket add PageProbe --version 1.0.7

Script and interactive:
#r "nuget: PageProbe, 1.0.7"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

Cake:
Install PageProbe as a Cake Addin
#addin nuget:?package=PageProbe&version=1.0.7
Install PageProbe as a Cake Tool
#tool nuget:?package=PageProbe&version=1.0.7

PageProbe

PageProbe is a .NET web crawling library for monitoring and extracting content from statically generated websites. It lets developers, IT students, and enthusiasts gather and export links, images, metadata, multimedia (video, audio, and iframes), plain text, and price information. The library supports both real-time and scheduled crawling, and it can store results and compare snapshots to detect changes over time.


Features

  • Modular and extensible architecture using interfaces and models
  • Extract links, images, videos, metadata, and plain text from webpages
  • Specialized price crawler with regex-based rule support
  • Export to multiple formats (JSON, CSV, XML, TXT, Markdown)
  • Snapshot comparisons for detecting changes
  • Console and file-based output
  • Asynchronous support for scheduled crawling over time
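
To make the snapshot comparison feature concrete, here is a minimal sketch of how two crawl snapshots could be compared by their links. The Snapshot, SnapshotDiff, and SnapshotComparer types are hypothetical stand-ins written for this example; they are not PageProbe's CrawlResult or CrawlDifferences, whose actual shape may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in types for illustration only;
// these are NOT PageProbe's CrawlResult / CrawlDifferences.
public sealed record Snapshot(string Url, IReadOnlyCollection<string> Links);

public sealed record SnapshotDiff(IReadOnlyList<string> Added, IReadOnlyList<string> Removed)
{
    public bool HasChanges => Added.Count > 0 || Removed.Count > 0;
}

public static class SnapshotComparer
{
    // Returns links found only in the newer snapshot (Added)
    // and links found only in the older snapshot (Removed), each sorted.
    public static SnapshotDiff Compare(Snapshot older, Snapshot newer)
    {
        var before = new HashSet<string>(older.Links, StringComparer.Ordinal);
        var after = new HashSet<string>(newer.Links, StringComparer.Ordinal);

        var added = after.Except(before).OrderBy(l => l, StringComparer.Ordinal).ToList();
        var removed = before.Except(after).OrderBy(l => l, StringComparer.Ordinal).ToList();
        return new SnapshotDiff(added, removed);
    }
}
```

Calling SnapshotComparer.Compare(older, newer) on two snapshots of the same URL yields the added and removed links. The same set-difference approach would extend to images or multimedia entries.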

Installation

1. Install via NuGet

Install-Package PageProbe

2. Clone Repository

git clone https://dev.azure.com/emilberglund/_git/rammeverk_gruppe2

Then add the project to your solution and reference it in your code:

using PageProbe;

.NET Requirement: Ensure you are using .NET 8.0 or newer.


Project Structure

Main Components

🕷 Crawlers
  • BaseCrawler: Core logic for HTML parsing and extraction
  • PriceCrawler: Extends BaseCrawler, specialized for extracting price data using defined regex rules
🧩 Interfaces
  • ILinkCrawler: Link extraction
  • IMediaCrawler: Image/media extraction
  • IMetadataCrawler: Meta tags (title, description, keywords)
  • IMultimediaCrawler: Video/audio/iframe extraction
  • ITextCrawler: Plain text extraction
  • IExporter, IExportDifferences: Export results and diffs
📦 Models
  • CrawlResult: Snapshot of crawled content
  • CrawlDifferences: Differences between crawl snapshots
  • PriceExtractionRule: Regex-based rule for price detection
📁 File Handlers (Exporters)
  • JsonExporter, CsvExporter, MarkdownExporter, TextExporter, XmlExporter: Export to corresponding file formats
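
The PriceExtractionRule entry above describes regex-based price detection. As a rough illustration of that idea, the sketch below applies regex rules to a piece of text and parses the matched amounts. PriceRule and PriceMatcher are hypothetical names invented for this example, and the convention that each pattern exposes a named group called "amount" is an assumption; PriceCrawler's real rules may work differently.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical rule type for illustration only;
// PageProbe's PriceExtractionRule may look different.
public sealed record PriceRule(string Name, string Pattern);

public static class PriceMatcher
{
    // Applies each rule to the text and returns every matched amount that
    // parses as a decimal. Assumes each pattern defines a named group "amount".
    public static List<decimal> Extract(string text, IEnumerable<PriceRule> rules)
    {
        var prices = new List<decimal>();
        foreach (var rule in rules)
        {
            foreach (Match match in Regex.Matches(text, rule.Pattern, RegexOptions.IgnoreCase))
            {
                // Normalize a decimal comma to a decimal point before parsing.
                var raw = match.Groups["amount"].Value.Replace(',', '.');
                if (decimal.TryParse(raw, NumberStyles.AllowDecimalPoint,
                        CultureInfo.InvariantCulture, out var value))
                {
                    prices.Add(value);
                }
            }
        }
        return prices;
    }
}
```

A rule such as new PriceRule("NOK suffix", @"(?<amount>\d+(?:[.,]\d{1,2})?)\s*kr") would pick up amounts written like "249,90 kr". Thousands separators and other currency formats are left out of this sketch and would need their own rules.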

Basic Usage

Crawl a URL and Export the Results

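var crawler = new BaseCrawler();
var url = "https://example.no/";

// Collect each kind of content into a single snapshot of the page
var result = new CrawlResult
{
    Url = url,
    Title = crawler.GetMetaTitle(url),
    Description = crawler.GetMetaDescription(url),
    // The keywords value is comma-separated; drop empty entries and trim whitespace
    Keywords = crawler.GetMetaKeywords(url)
        .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
        .ToList(),
    Links = crawler.GetLinks(url),
    Images = crawler.GetImages(url),
    Multimedia = crawler.GetMultimedia(url),
    Text = crawler.GetText(url)
};

// Export the snapshot as JSON, using "results" as the output name
var exporter = new JsonExporter();
exporter.Export(result, "results");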

Export Asynchronously

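// Export the same snapshot without blocking the calling thread
await exporter.ExportAsync(result, "results-async");

// Print each extracted link to the console
crawler.GetLinks(url).ToList().ForEach(Console.WriteLine);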
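
The overview mentions scheduled crawling but the usage above only shows a single crawl. One possible way to repeat a crawl at a fixed interval is sketched below using the standard PeriodicTimer class from .NET. ScheduledRunner and its parameters are hypothetical names for this example and are not part of PageProbe; the library may ship its own scheduling mechanism.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of PageProbe.
public static class ScheduledRunner
{
    // Runs crawlOnce immediately and then once per interval, stopping after
    // maxRuns runs or when the token is cancelled. Returns the number of runs.
    public static async Task<int> RunAsync(
        Func<Task> crawlOnce, TimeSpan interval, int maxRuns, CancellationToken token = default)
    {
        var runs = 0;
        if (maxRuns <= 0) return runs;

        using var timer = new PeriodicTimer(interval);
        try
        {
            do
            {
                await crawlOnce();
                runs++;
            }
            while (runs < maxRuns && await timer.WaitForNextTickAsync(token));
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the expected way to stop the schedule early.
        }
        return runs;
    }
}
```

The crawlOnce delegate could wrap the crawl and export calls shown above, for example building a CrawlResult and passing it to ExportAsync with a timestamped output name so that successive snapshots can be compared later.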

Full Documentation

See the full documentation for architecture, API reference, and advanced usage.


Build & Test

Build the Solution

  1. Open in Visual Studio
  2. Press Ctrl + Shift + B

Run Tests

dotnet test

Contributing

As this is an internal project:

  • Use feature/* branches
  • Follow .NET conventions
  • Write meaningful commit messages

License

This project is for internal use only and is not distributed publicly.

Compatible and additional computed target framework versions

Product: .NET
Compatible: net8.0
Computed: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version   Downloads   Last updated   Notes
2.0.0     132         5/27/2025
1.0.9     135         5/25/2025
1.0.8     140         5/25/2025
1.0.7     139         5/25/2025
1.0.1     70          5/23/2025
1.0.0     300         5/23/2025      Deprecated