realtime enrichment stuff
Signed-off-by: 🐼 Samrose Ahmed 🐼 <[email protected]>
Samrose-Ahmed committed May 5, 2023
1 parent 6f68278 commit 5fdc6c2
Showing 6 changed files with 128 additions and 1 deletion.
Binary file added blog/2023-05-04-realtime-enrichment/enr1.png
Binary file added blog/2023-05-04-realtime-enrichment/enr2.png
88 changes: 88 additions & 0 deletions blog/2023-05-04-realtime-enrichment/index.mdx
@@ -0,0 +1,88 @@
---
title: "New - Streaming realtime enrichment in Matano"
authors: "samrose"
keywords: ["enrichment"]
tags: [announcement]
---

<head>
<meta name="twitter:creator" content="@AhmedSamrose" />
</head>

Matano now supports realtime streaming enrichment for log sources, so you can enrich your data as it is ingested. This lets you add contextual information directly to your data, with no join or lookup needed later on.

<!-- truncate -->

## Enrichment overview

Enrichment in Matano refers to adding contextual information to your data, anything from a user's name to geolocation information based on an IP address. This added context makes your data easier to understand and analyze. Matano already supports enrichment in the form of [enrichment tables](https://www.matano.dev/docs/enrichment/overview).

<div style={{ width: "75%", margin: "0 auto", textAlign: "center" }}>

![](./enr1.png)

_Previously available enrichment mechanisms_

</div>

Previously, enrichment tables were only ingested into Apache Iceberg tables, where you could use them in SQL joins, and made available inside Python detections through a lookup helper method. While these are powerful features, they come with limitations: each requires a separate lookup, which can cause performance issues and ergonomic challenges, especially in SQL, where even a simple lookup requires writing a join.

## How realtime enrichment works

Realtime enrichment works by allowing you to add enrichment data into your data during the _transformation_ step. As a recap, Matano contains an embedded transformation engine that allows you to write transformation scripts using the Vector Remap Language (VRL). This transformation is run in realtime as data is ingested into Matano.

<div style={{ width: "75%", margin: "0 auto", textAlign: "center" }}>

![](./enr2.png)

_Realtime streaming enrichment_

</div>

The new realtime enrichment feature adds a function, `get_enrichment_table_record`, that you can call inside your VRL transformation scripts. This function lets you look up a record in an enrichment table by key.

The realtime enrichment feature is designed to be highly performant. The enrichment data is stored in a highly optimized custom format, and is cached in memory for fast lookups.

Because this lookup happens during the transformation step, the enrichment data is added directly into your data, allowing you to access it directly without having to perform any lookups or joins.
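To make the in-memory lookup idea concrete, here is a minimal Python sketch of the general technique: index the enrichment rows by their lookup key once, then resolve each incoming event with a single dictionary lookup. It is a conceptual model only. Matano's actual storage format and cache are not described in this post, and the names `build_index`, `USERS_ROWS`, and `USERS_BY_ID`, as well as the sample values, are invented for illustration.

```python
from typing import Any

# Hypothetical rows of a `users` enrichment table (sample values are invented).
USERS_ROWS: list[dict[str, Any]] = [
    {"user_id": "u-1", "user_name": "Ada", "user_email": "ada@example.com"},
    {"user_id": "u-2", "user_name": "Lin", "user_email": "lin@example.com"},
]


def build_index(rows: list[dict[str, Any]], key: str) -> dict[Any, dict[str, Any]]:
    """Index rows by a lookup key once, so later lookups avoid scanning."""
    return {row[key]: row for row in rows}


# Built once and kept in memory; every event then costs one dict lookup.
USERS_BY_ID = build_index(USERS_ROWS, "user_id")

events = [{"user_id": "u-2"}, {"user_id": "u-1"}, {"user_id": "u-9"}]

# A miss yields None instead of raising, because dict.get is used.
matches = [USERS_BY_ID.get(event["user_id"]) for event in events]
```

The point of the sketch is the cost model: the index is paid for once, while the per-event work stays constant regardless of how many rows the enrichment table holds.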

## How to use realtime enrichment

To use realtime enrichment, call the `get_enrichment_table_record` function inside your VRL transformation scripts. The function takes two arguments: the name of the enrichment table and the value to look up. It returns a record, which you can use to access the fields of the enrichment table.

For example, let's say we have an enrichment table called `users`, which contains the following fields (the enrichment table has a single lookup key on `user_id`):

- `user_id`
- `user_name`
- `user_email`

We can use the `get_enrichment_table_record` function to look up a user's record, and with it their name, given their user ID, like so:

```go
user_info = get_enrichment_table_record("users", .user_id)
```

## An example of using realtime enrichment

Let's look at a concrete example. Say we have a log source that contains the following fields:

- `user_id`
- `ip_address`
- `timestamp`

We want to enrich this data with the user's name, as well as the user's email address. We have an enrichment table called `users`, with a single lookup key on `user_id`, which contains the following fields:

- `user_id`
- `user_name`
- `user_email`

We can use the `get_enrichment_table_record` function to look up the user's record and add two new fields to our data, `user.name` and `user.email`, like so:

```go
user_info = get_enrichment_table_record("users", .user_id)
.user.name = user_info.user_name
.user.email = user_info.user_email
```
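For readers more familiar with Python than VRL, the following sketch mirrors the three-line transform above on plain dictionaries and shows the event before and after enrichment. It is not Matano code: `lookup_user`, `transform`, and the sample values are hypothetical stand-ins. This post also does not say what the real function returns when no record matches, so the sketch assumes the event is simply left unchanged in that case.

```python
import copy
from typing import Any, Optional

# Hypothetical stand-in for the `users` enrichment table, keyed by user_id.
USERS = {
    "u-1": {"user_id": "u-1", "user_name": "Ada", "user_email": "ada@example.com"},
}


def lookup_user(user_id: Any) -> Optional[dict]:
    """Stand-in for get_enrichment_table_record("users", .user_id)."""
    return USERS.get(user_id)


def transform(event: dict) -> dict:
    """Return a copy of the event with user.name and user.email added."""
    enriched = copy.deepcopy(event)
    user_info = lookup_user(event.get("user_id"))
    if user_info is not None:  # assumption: leave the event as-is on a miss
        user = enriched.setdefault("user", {})
        user["name"] = user_info["user_name"]
        user["email"] = user_info["user_email"]
    return enriched


raw = {"user_id": "u-1", "ip_address": "203.0.113.7", "timestamp": "2023-05-04T12:00:00Z"}
enriched = transform(raw)
```

After the call, `enriched` carries a nested `user` object next to the original `user_id`, `ip_address`, and `timestamp` fields, which is the shape a downstream query would read without any join.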

## Get started

You can start using the realtime enrichment feature today. Read the complete reference documentation [here](https://www.matano.dev/docs/enrichment/realtime-data-enrichment). We'll also be further expanding our enrichment capabilities in the future, including dedicated support for geolocation and IP address enrichment. Stay tuned for more updates!
8 changes: 8 additions & 0 deletions docs/enrichment/lookup-data.md
@@ -5,6 +5,14 @@ sidebar_position: 3

You can use enrichment tables to look up data in the Matano system, including in Python detections.

## Using enrichment tables

You can use enrichment tables in three ways.

* **Query enrichment Iceberg tables** - All enrichment tables are ingested into Apache Iceberg tables. You can directly query and join these enrichment tables during analysis and investigation.
* **Lookup in Python detections** - You can use enrichment tables inside Python detections to look up records. [Read more on enrichment in Python detections](../detections/enrichment).
* **Realtime data enrichment** - You can use enrichment tables to enrich your data directly during the transformation step, using a special VRL function. Read more on [realtime data enrichment](./realtime-data-enrichment).

## Specifying lookup keys

To specify which columns can be used to look up data, use the `lookup_keys` property in your `enrichment.yml`. Matano supports looking up data based on multiple columns. Specify an array of column names in your `enrichment.yml` to enable lookup:
32 changes: 32 additions & 0 deletions docs/enrichment/realtime-data-enrichment.md
@@ -0,0 +1,32 @@
---
title: Realtime data enrichment
sidebar_position: 4
---

You can use enrichment tables to directly enrich incoming data being ingested from a log source into Matano, during the transformation step. This lets you add fields to your data based on an enrichment table lookup.

The enrichment is done using a VRL function named `get_enrichment_table_record`. You can use this function inside your VRL transforms to look up a record in an enrichment table based on a key and add fields to your data.

## Function reference

The `get_enrichment_table_record` function takes two arguments. The first argument is always the name of the enrichment table you want to look up from. The second argument specifies the lookup: it is either a string, for an enrichment table with only one lookup key, or a key-value object, for an enrichment table with multiple lookup keys.

### Look up data for an enrichment table with a single lookup key

To look up a record from an enrichment table that has only one lookup key, pass the enrichment table name as the first argument and the value of the lookup key as the second argument.

For example, if we have a `users` enrichment table that has a sole `user_id` lookup key, we would look up the record in our VRL transform as follows:

```go
user_info = get_enrichment_table_record("users", .userid)
```

### Look up data for an enrichment table with multiple lookup keys

To look up a record from an enrichment table that has multiple lookup keys, pass the enrichment table name as the first argument and, as the second argument, an object whose key is the lookup key name and whose value is the lookup key value.

For example, if we have a `users` enrichment table that has two lookup keys, `user_id` and `user_email`, we would look up the record in our VRL transform as follows:

```go
user_info = get_enrichment_table_record("users", { "user_id": .userid })
```
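The two call shapes described above can be modeled in a few lines of Python. This is a sketch of the documented argument rules only, not of Matano's implementation: `get_record`, `USERS_TABLE`, and the sample rows are hypothetical, and the error handling (raising on an unknown key, or on a bare value for a multi-key table) is an assumption, since this page does not say how such cases are treated.

```python
from typing import Optional, Union

# Hypothetical table definition: rows plus the columns declared as lookup keys.
USERS_TABLE = {
    "lookup_keys": ["user_id", "user_email"],
    "rows": [
        {"user_id": "u-1", "user_name": "Ada", "user_email": "ada@example.com"},
        {"user_id": "u-2", "user_name": "Lin", "user_email": "lin@example.com"},
    ],
}


def get_record(table: dict, lookup: Union[str, dict]) -> Optional[dict]:
    """Model the two call shapes: a bare value or a {key: value} object."""
    keys = table["lookup_keys"]
    if isinstance(lookup, dict):
        unknown = set(lookup) - set(keys)
        if unknown:  # assumption: only declared lookup keys may be used
            raise KeyError(f"not a lookup key: {sorted(unknown)}")
        criteria = lookup
    else:
        if len(keys) != 1:  # assumption: a bare value is ambiguous here
            raise ValueError("a bare value needs a table with exactly one lookup key")
        criteria = {keys[0]: lookup}
    # Linear scan keeps the sketch short; a real engine would use an index.
    for row in table["rows"]:
        if all(row.get(k) == v for k, v in criteria.items()):
            return row
    return None


by_id = get_record(USERS_TABLE, {"user_id": "u-2"})
by_email = get_record(USERS_TABLE, {"user_email": "ada@example.com"})
```

With a multi-key table, either declared key can select the record, which is why the object form names the key explicitly.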
1 change: 0 additions & 1 deletion src/pages/index.tsx
@@ -72,7 +72,6 @@ const SubscribeForm = () => {
onClick={(e) => {
e.preventDefault();
handleSubmit(fields);
console.log("done!");
}}
>
Join the Waitlist