Apache.Druid.Querying
0.7.7
See the version list below for details.
dotnet add package Apache.Druid.Querying --version 0.7.7
NuGet\Install-Package Apache.Druid.Querying -Version 0.7.7
<PackageReference Include="Apache.Druid.Querying" Version="0.7.7" />
paket add Apache.Druid.Querying --version 0.7.7
#r "nuget: Apache.Druid.Querying, 0.7.7"
// Install Apache.Druid.Querying as a Cake Addin
#addin nuget:?package=Apache.Druid.Querying&version=0.7.7
// Install Apache.Druid.Querying as a Cake Tool
#tool nuget:?package=Apache.Druid.Querying&version=0.7.7
Apache Druid client library/micro-ORM for .NET 6+, inspired by EF Core.
https://www.nuget.org/packages/Apache.Druid.Querying
Setup
To make your Druid data sources available for querying, create a class deriving from Apache.Druid.Querying.DataSourceProvider. The class represents a collection of data sources available for querying, similarly to how EF Core's DbContext represents a collection of database tables. The class contains the methods Table, Lookup and Inline, which you can use to create instances of Apache.Druid.Querying.DataSource (similar to EF Core's DbSet), which in turn can be used for querying. The instances are thread safe and so can be used for executing multiple queries at the same time. Some of the DataSource-creating methods require a parameter id, which corresponds to the id of the related Druid data source.
The method Table additionally requires a generic parameter TSource depicting a row of your table data, similarly to how EF Core's entities depict database rows. The type's public properties correspond to the data source columns.
By default, TSource property names map 1-to-1 to Druid data source column names. This can be overridden in two ways:
- By decorating TSource with the Apache.Druid.Querying.DataSourceColumnNamingConvention attribute. The convention will be applied to all of TSource's property names.
- By decorating TSource's properties with the Apache.Druid.Querying.DataSourceColumn attribute. The string parameter passed to the attribute will become the data source column name. As most Druid data sources contain the column __time, for convenience there exists the attribute Apache.Druid.Querying.DataSourceTimeColumn, equivalent to Apache.Druid.Querying.DataSourceColumn("__time").
[DataSourceColumnNamingConvention.CamelCase]
public record Edit(
    [property: DataSourceTimeColumn] DateTimeOffset Timestamp,
    bool IsRobot,
    string Channel,
    string Flags,
    bool IsUnpatrolled,
    string Page,
    [property: DataSourceColumn("diffUrl")] string DiffUri,
    int Added,
    string Comment,
    int CommentLength,
    bool IsNew,
    bool IsMinor,
    int Delta,
    bool IsAnonymous,
    string User,
    int DeltaBucket,
    int Deleted,
    string Namespace,
    string CityName,
    string CountryName,
    string? RegionIsoCode,
    int? MetroCode,
    string? CountryIsoCode,
    string? RegionName);

public class WikipediaDataSourceProvider : DataSourceProvider
{
    public WikipediaDataSourceProvider()
    {
        // Druid's example wikipedia edits data source.
        Edits = Table<Edit>("wikipedia");
    }

    public DataSource<Edit> Edits { get; }
}
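As a rough illustration of what a camel-case naming convention does to the Edit property names above, the sketch below converts a few of them with System.Text.Json's JsonNamingPolicy.CamelCase. This is an assumption made for illustration only: ToColumnName is a hypothetical helper, not part of Apache.Druid.Querying, and the library's own conversion may differ in edge cases.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper (not part of Apache.Druid.Querying): approximates a
// camel-case column naming convention with System.Text.Json's built-in policy.
static string ToColumnName(string propertyName)
    => JsonNamingPolicy.CamelCase.ConvertName(propertyName);

// Print the assumed column name for a few Edit properties.
foreach (var property in new[] { "IsRobot", "Channel", "CommentLength", "RegionIsoCode" })
{
    Console.WriteLine($"{property} -> {ToColumnName(property)}");
}
```

Under this assumption IsRobot maps to isRobot and CommentLength to commentLength, while a property such as DiffUri still needs an explicit DataSourceColumn("diffUrl") because its column name is not a simple case change.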
Then connect your data source provider to a dependency injection framework of your choice.
Querying
Choose the query type and the models representing the query's data using nested types of Apache.Druid.Querying.Query<TSource>. Create a query by instantiating the chosen nested type, then set the query data by calling the instance methods. The methods often accept an Expression<Delegate>: given an object representing the input data available at that point in the query and an object representing all possible operations on that input data, you create an object representing the results of your chosen operations. To get an idea of what's possible, it's best to look into the project's tests. The queries have been designed so that as much information as possible is available at compile time. Wherever possible, the query results have been "flattened" so that they are streamed to consumers as soon as possible.
Currently available query types:
- TimeSeries
- TopN
- GroupBy
- Scan (currently missing an option to specify a subset of columns)
- DataSourceMetadata
// Getting DataSourceProvider from dependency injection container.
private static WikipediaDataSourceProvider Wikipedia
    => Services.GetRequiredService<WikipediaDataSourceProvider>();

private record Aggregations(int Count, int TotalAdded);
private record PostAggregations(double AverageAdded);

public void ExampleTimeSeries()
{
    var query = new Query<Edit>
        .TimeSeries
        .WithNoVirtualColumns
        .WithAggregations<Aggregations>
        .WithPostAggregations<PostAggregations>()
        .Order(OrderDirection.Descending)
        // Explicitly stating data types in the methods for the sake of clarity in the example.
        // Query is able to infer them.
        .Aggregations(type => new Aggregations(
            type.Count(),
            type.Sum((Edit edit) => edit.Added)))
        .PostAggregations(type => new PostAggregations(type.Arithmetic(
            ArithmeticFunction.Divide,
            type.FieldAccess(aggregations => aggregations.TotalAdded),
            type.FieldAccess(aggregations => aggregations.Count))))
        .Filter(type => type.Selector(edit => edit.CountryIsoCode, "US"))
        .Interval(new(DateTimeOffset.UtcNow, DateTimeOffset.UtcNow.AddDays(1)))
        .Granularity(Granularity.Hour)
        .Context(new QueryContext.TimeSeries() { SkipEmptyBuckets = true });

    // Use MapQueryToJson to look up query's json representation.
    var json = Wikipedia.Edits.MapQueryToJson(query);
    IAsyncEnumerable<WithTimestamp<Aggregations_PostAggregations<Aggregations, PostAggregations>>> results
        = Wikipedia.Edits.ExecuteQuery(query);
}
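Since ExecuteQuery returns an IAsyncEnumerable, results can be consumed with await foreach as they arrive. The sketch below uses a stand-in async stream instead of a real Druid connection; FakeResults and its tuple shape are hypothetical and only mimic the timestamp-plus-aggregations layout of the result type above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Stand-in for Wikipedia.Edits.ExecuteQuery(query): yields hourly buckets lazily.
// The tuple shape is an assumption made for this sketch, not the library's result type.
static async IAsyncEnumerable<(DateTimeOffset Timestamp, int Count, int TotalAdded)> FakeResults()
{
    var start = new DateTimeOffset(2024, 3, 23, 0, 0, 0, TimeSpan.Zero);
    for (var hour = 0; hour < 3; hour++)
    {
        await Task.Yield(); // Simulates waiting for the next chunk of the response.
        yield return (start.AddHours(hour), 10 * (hour + 1), 250 * (hour + 1));
    }
}

var totalCount = 0;
await foreach (var row in FakeResults())
{
    // Each row is handled as soon as it is produced, without buffering the whole result.
    totalCount += row.Count;
    Console.WriteLine($"{row.Timestamp:u}: count={row.Count}, totalAdded={row.TotalAdded}");
}
Console.WriteLine($"Total count: {totalCount}");
```

With a real data source the loop body stays the same; only the source of the stream changes.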
Data types
In Apache Druid, operations on data have multiple "variants". Which variant to choose in a given query depends on:
- Data type of column used in the operation.
- Expected result of the operation.
For example, to perform a sum over some column's values, you may use:
- doubleSum
- floatSum
- longSum
Most often, though, you want the operation to match your column's data type. For this reason, such operations have been "merged" into one, accepting an optional parameter of type SimpleDataType. Taking the operation Sum as an example:
<table>
<thead>
<tr>
<th>Apache.Druid.Querying</th>
<th>Apache Druid</th>
</tr>
</thead>
<tbody>
<tr>
<td>
query
    .Aggregations(type => new(
        type.Sum(edit => edit.Added, SimpleDataType.Double)));
</td>
<td>
{
  "aggregations": [
    {
      "type": "doubleSum",
      "name": "TotalAdded",
      "fieldName": "added"
    }
  ]
}
</td>
</tr>
<tr>
<td>
query
    .Aggregations(type => new(
        type.Sum(edit => edit.Added, SimpleDataType.Float)));
</td>
<td>
{
  "aggregations": [
    {
      "type": "floatSum",
      "name": "TotalAdded",
      "fieldName": "added"
    }
  ]
}
</td>
</tr>
<tr>
<td>
query
    .Aggregations(type => new(
        type.Sum(edit => edit.Added, SimpleDataType.Long)));
</td>
<td>
{
  "aggregations": [
    {
      "type": "longSum",
      "name": "TotalAdded",
      "fieldName": "added"
    }
  ]
}
</td>
</tr>
</tbody>
</table>
If SimpleDataType has not been specified, the library will infer it from the related property type using the following logic:
<table>
<thead>
<tr>
<th>Property type</th>
<th>Druid data type</th>
</tr>
</thead>
<tbody>
<tr>
<td>string, Guid, char, Uri, Enum</td>
<td>String</td>
</tr>
<tr>
<td>double</td>
<td>Double</td>
</tr>
<tr>
<td>float</td>
<td>Float</td>
</tr>
<tr>
<td>short, int, long, DateTime, DateTimeOffset</td>
<td>Long</td>
</tr>
<tr>
<td>Nullable&lt;T&gt;</td>
<td>Result of type inference on T</td>
</tr>
<tr>
<td>IEnumerable&lt;T&gt;</td>
<td>Array&lt;Result of type inference on T&gt;</td>
</tr>
<tr>
<td>If property type does not match any above types</td>
<td>Complex&lt;json&gt;</td>
</tr>
</tbody>
</table>
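The inference rules in the table can be restated as a small function. The sketch below is a hypothetical re-statement written for illustration; InferDruidType is not the library's API, and the order of the checks (for example, handling string before IEnumerable<T>) is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical re-statement of the table above; not the library's actual code.
static string InferDruidType(Type type)
{
    // Nullable<T> resolves to whatever T resolves to.
    var underlying = Nullable.GetUnderlyingType(type);
    if (underlying is not null)
        return InferDruidType(underlying);

    // string must be checked before IEnumerable<T>, since string is IEnumerable<char>.
    if (type == typeof(string) || type == typeof(Guid) || type == typeof(char)
        || type == typeof(Uri) || type.IsEnum)
        return "String";
    if (type == typeof(double)) return "Double";
    if (type == typeof(float)) return "Float";
    if (type == typeof(short) || type == typeof(int) || type == typeof(long)
        || type == typeof(DateTime) || type == typeof(DateTimeOffset))
        return "Long";

    // IEnumerable<T> (including arrays and lists) resolves to an array of T's inferred type.
    var enumerable = new[] { type }
        .Concat(type.GetInterfaces())
        .FirstOrDefault(i => i.IsGenericType && i.GetGenericTypeDefinition() == typeof(IEnumerable<>));
    if (enumerable is not null)
        return $"Array<{InferDruidType(enumerable.GetGenericArguments()[0])}>";

    // Anything else falls back to a complex JSON value.
    return "Complex<json>";
}

Console.WriteLine(InferDruidType(typeof(int?)));         // Long
Console.WriteLine(InferDruidType(typeof(List<string>))); // Array<String>
Console.WriteLine(InferDruidType(typeof(DayOfWeek)));    // String
Console.WriteLine(InferDruidType(typeof(Version)));      // Complex<json>
```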
Druid expressions
The library accepts Druid expressions in the form of a delegate: given an object representing the data available at that point in the query, you are supposed to return an interpolated string (using $) where each parameter of the string is either:
- a property of the object representing the data, which will get mapped to the appropriate column
- a constant, which will get converted to a string.
Passing any other parameters will result in an InvalidOperationException being thrown upon execution of the query.
Referring to objects representing data
You can refer to objects representing your query data in two ways:
- by their properties, resulting in the library mapping them to Druid columns
- by the object as a whole, resulting in the library mapping the whole object to a column.
This means the following queries will give you equivalent results.
record Aggregations(int AddedSum);

var first = new Query<Edit>
    .TimeSeries
    .WithNoVirtualColumns
    .WithAggregations<Aggregations>()
    .Aggregations(type => new(
        type.Sum(edit => edit.Added)));

var second = new Query<Edit>
    .TimeSeries
    .WithNoVirtualColumns
    .WithAggregations<int>()
    .Aggregations(type => type.Sum(edit => edit.Added));
Query result deserialization
The library deserializes query results using System.Text.Json. The deserializer has been tweaked in the following ways:
- System.Text.Json.JsonSerializerDefaults.Web is applied
- DateTime and DateTimeOffset can additionally be deserialized from unix timestamps
- bool can additionally be deserialized from "true", "false", "True" and "False" string literals in quotes.
Get the tweaked serializer options by calling Apache.Druid.Querying.Json.DefaultSerializerOptions.Create().
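To make the tweaks concrete, here is a minimal sketch of how comparable behaviour could be built with plain System.Text.Json converters. It is not the library's implementation: the converter classes and the Row record are hypothetical, the sketch assumes unix timestamps are expressed in milliseconds, and it accepts any casing that bool.TryParse accepts.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Start from the Web defaults, then add the two hypothetical converters.
var options = new JsonSerializerOptions(JsonSerializerDefaults.Web);
options.Converters.Add(new LenientBoolConverter());
options.Converters.Add(new UnixMillisecondsDateTimeOffsetConverter());

var row = JsonSerializer.Deserialize<Row>(
    "{\"timestamp\": 1711152000000, \"isRobot\": \"True\"}", options)!;
Console.WriteLine($"{row.Timestamp:u} robot={row.IsRobot}");

// Hypothetical result row used only by this sketch.
record Row(DateTimeOffset Timestamp, bool IsRobot);

// Reads JSON booleans as usual and additionally accepts quoted boolean literals.
sealed class LenientBoolConverter : JsonConverter<bool>
{
    public override bool Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        => reader.TokenType switch
        {
            JsonTokenType.True => true,
            JsonTokenType.False => false,
            JsonTokenType.String when bool.TryParse(reader.GetString(), out var value) => value,
            _ => throw new JsonException("Expected a boolean or a boolean string literal.")
        };

    public override void Write(Utf8JsonWriter writer, bool value, JsonSerializerOptions options)
        => writer.WriteBooleanValue(value);
}

// Reads ISO 8601 strings as usual and additionally accepts numeric unix timestamps
// (assumed here to be milliseconds since the epoch).
sealed class UnixMillisecondsDateTimeOffsetConverter : JsonConverter<DateTimeOffset>
{
    public override DateTimeOffset Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        => reader.TokenType == JsonTokenType.Number
            ? DateTimeOffset.FromUnixTimeMilliseconds(reader.GetInt64())
            : reader.GetDateTimeOffset();

    public override void Write(Utf8JsonWriter writer, DateTimeOffset value, JsonSerializerOptions options)
        => writer.WriteStringValue(value);
}
```

In real code, prefer the options returned by DefaultSerializerOptions.Create() so that deserialization matches what the library itself does.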
Product | Compatible and additional computed target framework versions |
---|---|
.NET | net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
- net6.0: No dependencies.
NuGet packages (1)
Showing the top 1 NuGet packages that depend on Apache.Druid.Querying:
Package | Description |
---|---|
Apache.Druid.Querying.Microsoft.Extensions.DependencyInjection | Integrates Apache.Druid.Querying with Microsoft.Extensions.DependencyInjection. |
Version | Downloads | Last updated |
---|---|---|
1.3.1 | 116 | 10/24/2024 |
1.3.0 | 170 | 9/24/2024 |
1.2.0 | 183 | 5/5/2024 |
1.0.3-alpha.0.1 | 54 | 4/30/2024 |
1.0.2 | 125 | 4/29/2024 |
1.0.1 | 121 | 4/24/2024 |
1.0.0 | 141 | 4/7/2024 |
0.7.9 | 138 | 3/30/2024 |
0.7.7 | 141 | 3/23/2024 |
0.7.6 | 136 | 3/23/2024 |
0.7.6-alpha.0.1 | 59 | 3/23/2024 |
0.7.5 | 146 | 3/23/2024 |
0.7.4 | 144 | 3/22/2024 |
0.7.3 | 135 | 3/15/2024 |
0.7.2 | 144 | 3/11/2024 |
0.7.0 | 154 | 2/21/2024 |