PageProbe 1.0.7
PageProbe
PageProbe is a .NET-based web crawling library designed to monitor and extract content from statically generated websites. It enables developers, IT students, and enthusiasts to gather and export data such as links, media, metadata, multimedia, text, or even price information. The library supports both real-time and scheduled crawling, with the ability to store results and compare differences over time.
Features
- Modular and extensible architecture using interfaces and models
- Extract links, images, videos, metadata, and plain text from webpages
- Specialized price crawler with regex-based rule support
- Export to multiple formats (JSON, CSV, XML, TXT, Markdown)
- Snapshot comparisons for detecting changes
- Console and file-based output
- Asynchronous support for crawling over time
Installation
1. NuGet Package (Recommended)
Install-Package PageProbe
2. Clone Repository
git clone https://dev.azure.com/emilberglund/_git/rammeverk_gruppe2
Then add the project to your solution and reference it in your code:
using PageProbe;
.NET Requirement: Ensure you are using .NET 8.0 or newer.
Project Structure
Main Components
🕷 Crawlers
- BaseCrawler: Core logic for HTML parsing and extraction
- PriceCrawler: Extends BaseCrawler, specialized for extracting price data using defined regex rules
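To make the idea of regex-based price rules concrete, here is a minimal sketch using only the .NET base class library. It does not use PageProbe's PriceCrawler or PriceExtractionRule, whose exact members are not shown in this README. The rule tuple, the pattern, and the ExtractPrices helper are all hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical rule: a label plus a regex with a named group "amount".
// This is NOT PageProbe's PriceExtractionRule; it only illustrates the idea.
var rule = (Name: "NOK price", Pattern: @"(?:kr\.?\s*)(?<amount>\d+(?:[.,]\d{2})?)");

// Apply a pattern to plain text and parse each captured amount as a decimal.
static List<decimal> ExtractPrices(string text, string pattern)
{
    var prices = new List<decimal>();
    foreach (Match match in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
    {
        // Normalize a decimal comma to a dot so invariant-culture parsing works.
        var raw = match.Groups["amount"].Value.Replace(',', '.');
        if (decimal.TryParse(raw, NumberStyles.Number, CultureInfo.InvariantCulture, out var value))
        {
            prices.Add(value);
        }
    }
    return prices;
}

var sample = "Coffee kr 49,90 and tea kr. 35";
var found = ExtractPrices(sample, rule.Pattern);
Console.WriteLine($"{rule.Name}: {found.Count} price(s) found");
```

A real rule set would likely need one pattern per currency and number format, which is presumably why the library models rules as separate objects.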
🧩 Interfaces
- ILinkCrawler: Link extraction
- IMediaCrawler: Image/media extraction
- IMetadataCrawler: Meta tags (title, description, keywords)
- IMultimediaCrawler: Video/audio/iframe extraction
- ITextCrawler: Plain text extraction
- IExporter, IExportDifferences: Export results and diffs
📦 Models
- CrawlResult: Snapshot of crawled content
- CrawlDifferences: Differences between crawl snapshots
- PriceExtractionRule: Regex-based rule for price detection
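The snapshot comparison behind CrawlDifferences can be pictured as a set difference between two crawls. The sketch below is an assumption about the general technique, not the library's implementation. DiffLinks is a hypothetical helper, and the real model may compare more than links.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: reports which links appeared or disappeared
// between two snapshots. Comparison is ordinal (case-sensitive).
static (List<string> Added, List<string> Removed) DiffLinks(
    IEnumerable<string> previous, IEnumerable<string> current)
{
    var before = new HashSet<string>(previous);
    var after = new HashSet<string>(current);

    // Links present now but not before, sorted for stable output.
    var added = after.Except(before).OrderBy(l => l, StringComparer.Ordinal).ToList();
    // Links present before but missing now.
    var removed = before.Except(after).OrderBy(l => l, StringComparer.Ordinal).ToList();
    return (added, removed);
}

var oldLinks = new[] { "https://example.no/a", "https://example.no/b" };
var newLinks = new[] { "https://example.no/b", "https://example.no/c" };

var diff = DiffLinks(oldLinks, newLinks);
Console.WriteLine("Added: " + string.Join(", ", diff.Added));
Console.WriteLine("Removed: " + string.Join(", ", diff.Removed));
```

The same approach extends to images or multimedia URLs. Free text usually needs a line-based or hash-based comparison instead.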
📁 File Handlers (Exporters)
- JsonExporter, CsvExporter, MarkdownExporter, TextExporter, XmlExporter: Export to corresponding file formats
Basic Usage
Crawl a URL and Export the Results
// Create a crawler and choose the page to inspect
var crawler = new BaseCrawler();
var url = "https://example.no/";

// Collect each content type into a single snapshot
var result = new CrawlResult
{
    Url = url,
    Title = crawler.GetMetaTitle(url),
    Description = crawler.GetMetaDescription(url),
    // The keywords meta tag is comma-separated; split it into a list
    Keywords = crawler.GetMetaKeywords(url).Split(',', StringSplitOptions.RemoveEmptyEntries).ToList(),
    Links = crawler.GetLinks(url),
    Images = crawler.GetImages(url),
    Multimedia = crawler.GetMultimedia(url),
    Text = crawler.GetText(url)
};

// Export the snapshot as JSON
var exporter = new JsonExporter();
exporter.Export(result, "results");
Export Asynchronously
await exporter.ExportAsync(result, "results-async");
Show Links in Console
crawler.GetLinks(url).ToList().ForEach(Console.WriteLine);
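For the scheduled crawling mentioned in the introduction, one possible approach is to wrap a crawl in a timer loop. This README does not show PageProbe's own scheduling API, so the sketch below uses only the standard PeriodicTimer type (.NET 6 and later). RunScheduledAsync and its parameters are hypothetical, and the delegate stands in for real crawler and exporter calls.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Runs crawlOnce immediately, then once per interval, until maxRuns is reached.
// Cancelling the token while waiting throws OperationCanceledException.
static async Task<int> RunScheduledAsync(
    Func<Task> crawlOnce, TimeSpan interval, int maxRuns, CancellationToken token = default)
{
    var runs = 0;
    using var timer = new PeriodicTimer(interval);
    do
    {
        await crawlOnce();
        runs++;
    }
    while (runs < maxRuns && await timer.WaitForNextTickAsync(token));
    return runs;
}

var completed = await RunScheduledAsync(
    crawlOnce: () =>
    {
        // Placeholder for crawler and exporter calls such as those shown above.
        Console.WriteLine($"Crawl at {DateTime.UtcNow:O}");
        return Task.CompletedTask;
    },
    interval: TimeSpan.FromMilliseconds(50),
    maxRuns: 3);

Console.WriteLine($"Completed {completed} crawl(s)");
```

In practice the interval would be minutes or hours rather than milliseconds, and each run would store its snapshot so that later runs can be compared against it.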
Full Documentation
The full documentation covers architecture, API reference, and advanced usage.
Build & Test
Build the Solution
- Open the solution in Visual Studio
- Press Ctrl + Shift + B
Run Tests
dotnet test
Contributing
As this is an internal project:
- Use feature/* branches
- Follow .NET conventions
- Write meaningful commit messages
License
This project is for internal use only and not distributed publicly.
Product | Versions (compatible and additional computed target frameworks) |
---|---|
.NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies (net8.0)
- Microsoft.Extensions.Logging (>= 9.0.4)
- Microsoft.Extensions.Logging.Abstractions (>= 9.0.4)
- Microsoft.Extensions.Logging.Console (>= 9.0.4)