RDoc-2883 OpenTelemetry #1862

Closed
wants to merge 2 commits into from
@@ -0,0 +1,20 @@
[
{
"Path": "telegraf.markdown",
"Name": "Telegraf Plugin",
"DiscussionId": "f59c124a-b94a-4380-bff2-dcb1782ef1f6",
"Mappings": []
},
{
"Path": "prometheus.markdown",
"Name": "Prometheus",
"DiscussionId": "f59c124a-b94a-4380-bff2-dcb1782ef1f6",
Member

same discussionId as in Telegraf? check with @reebhub if this is correct

Member

Member Author

@Danielle9897 can you take care of it?

Member

Danielle9897, Jul 24, 2024

Member

@maciejaszyk just add @Danielle9897 as the collaborator to your repo so she'll be able to fix it here (no need to open another PR)

"Mappings": []
},
{
"Path": "opentelemetry.markdown",
"Name": "OpenTelemetry",
"DiscussionId": "TODO",
"Mappings": []
}
]
@@ -0,0 +1,219 @@
# OpenTelemetry Support
---

{NOTE: }

* OpenTelemetry is a popular monitoring standard designed to help in the inspection and
  administration of networks, infrastructures, databases, etc.

* RavenDB sends metrics data via the OpenTelemetry Protocol (OTLP),
  allowing an OpenTelemetry retriever to collect the data from RavenDB.

* OpenTelemetry support is provided by RavenDB instances both on-premises and in the cloud.

* You can also supply an OpenTelemetry collector with data from RavenDB's Prometheus endpoint.

{NOTE/}

---

{PANEL: OpenTelemetry}

OpenTelemetry is a collection of APIs, SDKs, and tools. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software's performance and behavior. (description via [https://opentelemetry.io](https://opentelemetry.io))

RavenDB utilizes the official OpenTelemetry SDK and allows users to retrieve metrics via the OpenTelemetry protocol.
Member

gregolsky, Jul 24, 2024

Suggested change
- RavenDB utilize official SDK and allows user to retrieve the metrics via OpenTelemetry protocol and much more!
+ RavenDB utilizes the official OpenTelemetry SDK and allows user to retrieve the metrics via OpenTelemetry protocol and much more!

Member

I will be reviewing this article and fixing all phrasing/grammar issues in:
https://issues.hibernatingrhinos.com/issue/RDoc-2944/Review-Open-Telemetry-documentation


{PANEL/}

{PANEL: RavenDB OpenTelemetry Metrics}

{INFO: How to turn on metrics in RavenDB}
To enable metrics in RavenDB, set the configuration option `Monitoring.OpenTelemetry.Enabled` to `true`.
The RavenDB process must be restarted for the change to take effect.
{INFO/}
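
As a minimal sketch of the setting above, assuming the server reads its configuration from a `settings.json` file (the file name and location are an assumption here, not something this article states), the option could be set like this:

```json
{
    "Monitoring.OpenTelemetry.Enabled": true
}
```

The same key could equally be supplied through whichever configuration mechanism your deployment already uses; only the key name and value come from this article.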

{INFO: Identification of nodes in metrics}
RavenDB exposes the node tag in each instrument's instance tag, so that metrics can be attributed to a specific machine.
{INFO/}

RavenDB exposes the following metrics:

| Name | Description |
| :--- | :--- |
| ravendb.server.general | Exposes general information about the server |
| ravendb.server.requests | Exposes information about requests processed by the server |
| ravendb.server.storage | Exposes storage information |
| ravendb.server.gc | Exposes detailed information about the Garbage Collector |
| ravendb.server.resources | Exposes detailed information about resource usage (e.g. CPU) |
| ravendb.server.totaldatabases | Exposes aggregated information about databases on the server |
| ravendb.server.cpucredits | Exposes the status of CPU credits (cloud) |

We also support exposing metrics developed by Microsoft for AspNetCore and the .NET Runtime.
More information can be found in the official documentation:
- [Runtime documentation](https://github.com/open-telemetry/opentelemetry-dotnet-contrib/tree/main/src/OpenTelemetry.Instrumentation.Runtime#metrics)
- [AspNetCore documentation](https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/main/src/OpenTelemetry.Instrumentation.AspNetCore/README.md#metrics)

### Configuring meters
By default, only the most commonly used meters are enabled. This can be controlled via the following configuration options:
Member

Suggested change
- By default, only most commonly used meters are turned on, but this can be controlled via following configuration options:
+ By default, only most commonly used meters are enabled. This can be controlled via following configuration options:



| Configuration name | Meter name | Default value |
| :--- | :--- | :--- |
| Monitoring.OpenTelemetry.Meters.AspNetCore.Enabled | Official AspNetCore instrumentation | false |
| Monitoring.OpenTelemetry.Meters.Runtime.Enabled | Official Runtime instrumentation | false |
| Monitoring.OpenTelemetry.Meters.Server.Storage.Enabled | ravendb.server.storage | true |
| Monitoring.OpenTelemetry.Meters.Server.CPUCredits.Enabled | ravendb.server.cpucredits | false |
Member

gregolsky, Jul 24, 2024

Are we using CPU in all caps here? in the configuration it's spelled "Cpu" everywhere I remember

| Monitoring.OpenTelemetry.Meters.Server.Resources.Enabled | ravendb.server.resources | true |
| Monitoring.OpenTelemetry.Meters.Server.TotalDatabases.Enabled | ravendb.server.totaldatabases | true |
| Monitoring.OpenTelemetry.Meters.Server.Requests.Enabled | ravendb.server.requests | true |
| Monitoring.OpenTelemetry.Meters.Server.GC.Enabled | ravendb.server.gc | false |
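
For illustration, the options in the table above could be combined as follows to turn on two meters that are disabled by default. This is only a sketch: it assumes a `settings.json` configuration file, and the choice of the GC and Runtime meters is arbitrary.

```json
{
    "Monitoring.OpenTelemetry.Enabled": true,
    "Monitoring.OpenTelemetry.Meters.Server.GC.Enabled": true,
    "Monitoring.OpenTelemetry.Meters.Runtime.Enabled": true
}
```

Meters that are not listed keep the default values shown in the table.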

### Meter instruments
Member

are we exposing information about the client certificates expiration?

Member Author

No

| Name | Description | Instrument type |
| :--- | :--- | :--- |
| ravendb.server.cpucredits.alert_raised | CPU Credits Any Alert Raised | Gauge |
| ravendb.server.cpucredits.background.tasks.alert_raised | CPU Credits Background Tasks Alert Raised | Gauge |
| ravendb.server.cpucredits.base | CPU Credits Base | UpDownCounter |
| ravendb.server.cpucredits.consumption_current | CPU Credits Gained Per Second | UpDownCounter |
| ravendb.server.cpucredits.failover.alert_raised | CPU Credits Failover Alert Raised | Gauge |
| ravendb.server.cpucredits.max | CPU Credits Max | UpDownCounter |
| ravendb.server.cpucredits.remaining | CPU Credits Remaining | Gauge |
| ravendb.server.gc.compacted | Specifies if this is a compacting GC or not. | Gauge |
| ravendb.server.gc.concurrent | Specifies if this is a concurrent GC or not. | Gauge |
| ravendb.server.gc.finalizationpendingcount | Gets the number of objects ready for finalization this GC observed. | Gauge |
| ravendb.server.gc.fragmented | Gets the total fragmentation (in MB) when the last garbage collection occurred. | Gauge |
Member

does open telemetry suggests to use explicit units just like prometheus metrics?
See: https://prometheus.io/docs/practices/naming/

Member Author

| ravendb.server.gc.gclohsize | Gets the large object heap size (in MB) after the last garbage collection of given kind occurred. | Gauge |
| ravendb.server.gc.generation | Gets the generation this GC collected. | Gauge |
| ravendb.server.gc.heapsize | Gets the total heap size (in MB) when the last garbage collection occurred. | Gauge |
| ravendb.server.gc.highmemoryloadthreshold | Gets the high memory load threshold (in MB) when the last garbage collection occurred. | Gauge |
| ravendb.server.gc.index | The index of this GC. | Gauge |
| ravendb.server.gc.memoryload | Gets the memory load (in MB) when the last garbage collection occurred. | Gauge |
| ravendb.server.gc.pausedurations1 | Gets the pause durations. First item in the array. | Gauge |
| ravendb.server.gc.pausedurations2 | Gets the pause durations. Second item in the array. | Gauge |
| ravendb.server.gc.pinnedobjectscount | Gets the number of pinned objects this GC observed. | Gauge |
| ravendb.server.gc.promoted | Gets the promoted MB for this GC. | Gauge |
| ravendb.server.gc.timepercentage | Gets the pause time percentage in the GC so far. | Gauge |
| ravendb.server.gc.totalavailablememory | Gets the total available memory (in MB) for the garbage collector to use when the last garbage collection occurred. | Gauge |
| ravendb.server.gc.totalcommited | Gets the total committed MB of the managed heap. | Gauge |
| ravendb.server.general.certificate_server_certificate_expiration_left_seconds | Server certificate expiration left | Gauge |
| ravendb.server.general.cluster.index | Cluster index | UpDownCounter |
| ravendb.server.general.cluster.node.state | Current node state | UpDownCounter |
| ravendb.server.general.cluster.term | Cluster term | UpDownCounter |
| ravendb.server.general.license.cores.max | Server license max CPU cores | Gauge |
| ravendb.server.general.license.cpu.utilized | Server license utilized CPU cores | Gauge |
| ravendb.server.general.license.expiration_left_seconds | Server license expiration left | Gauge |
| ravendb.server.general.license.type | Server license type | Gauge |
| ravendb.server.resources.available_memory_for_processing | Available memory for processing \(in MB\) | Gauge |
| ravendb.server.resources.cpu.machine | Machine CPU usage in % | Gauge |
| ravendb.server.resources.cpu.process | Process CPU usage in % | Gauge |
| ravendb.server.resources.dirty_memory | Dirty Memory that is used by the scratch buffers in MB | Gauge |
| ravendb.server.resources.encryption_buffers.memory_in_pool | Server encryption buffers memory being in pool in MB | Gauge |
| ravendb.server.resources.encryption_buffers.memory_in_use | Server encryption buffers memory being in use in MB | Gauge |
| ravendb.server.resources.io_wait | IO wait in % | Gauge |
| ravendb.server.resources.low_memory_flag | Server low memory flag value | Gauge |
| ravendb.server.resources.machine.assigned_processor_count | Number of assigned processors on the machine | UpDownCounter |
| ravendb.server.resources.machine.processor_count | Number of processors on the machine | UpDownCounter |
| ravendb.server.resources.managed_memory | Server managed memory size in MB | Gauge |
| ravendb.server.resources.thread_pool.available_completion_port_threads | Number of available completion port threads in the thread pool | Gauge |
| ravendb.server.resources.thread_pool.available_worker_threads | Number of available worker threads in the thread pool | Gauge |
| ravendb.server.resources.total_memory | Server allocated memory in MB | Gauge |
| ravendb.server.resources.total.swap_usage | Server total swap usage in MB | Gauge |
| ravendb.server.resources.total.swap.size | Server total swap size in MB | Gauge |
| ravendb.server.resources.unmanaged_memory | Server unmanaged memory size in MB | Gauge |
| ravendb.server.resources.working_set_swap_usage | Server working set swap usage in MB | Gauge |
| ravendb.server.requests.requests.average_duration | Average request time in milliseconds | Gauge |
| ravendb.server.requests.requests.concurrent_requests | Number of concurrent requests | UpDownCounter |
| ravendb.server.requests.requests.per_second | Number of requests per second. | Gauge |
| ravendb.server.requests.tcp.active.connections | Number of active TCP connections | Gauge |
| ravendb.server.requests.total.requests | Total number of requests since server startup | UpDownCounter |
| ravendb.server.storage.storage.disk.ios.read_operations | IO read operations per second | Gauge |
| ravendb.server.storage.storage.disk.ios.write_operations | IO write operations per second | Gauge |
| ravendb.server.storage.storage.disk.queue_length | Queue length | Gauge |
| ravendb.server.storage.storage.disk.read_throughput | Read throughput in kilobytes per second | Gauge |
| ravendb.server.storage.storage.disk.remaining.space | Remaining server storage disk space in MB | Gauge |
| ravendb.server.storage.storage.disk.remaining.space_percentage | Remaining server storage disk space in % | Gauge |
| ravendb.server.storage.storage.disk.write_throughput | Write throughput in kilobytes per second | Gauge |
| ravendb.server.storage.storage.total_size | Server storage total size in MB | Gauge |
| ravendb.server.storage.storage.used_size | Server storage used size in MB | Gauge |
| ravendb.server.totaldatabases.count_stale_indexes | Number of stale indexes in all loaded databases | UpDownCounter |
| ravendb.server.totaldatabases.data.written.per_second | Number of bytes written \(documents, attachments, counters\) in all loaded databases | Gauge |
| ravendb.server.totaldatabases.database.disabled_count | Number of disabled databases | UpDownCounter |
| ravendb.server.totaldatabases.database.encrypted_count | Number of encrypted databases | UpDownCounter |
| ravendb.server.totaldatabases.database.faulted_count | Number of faulted databases | UpDownCounter |
| ravendb.server.totaldatabases.database.loaded_count | Number of loaded databases | UpDownCounter |
| ravendb.server.totaldatabases.database.node_count | Number of databases for current node | UpDownCounter |
| ravendb.server.totaldatabases.database.total_count | Number of all databases | UpDownCounter |
| ravendb.server.totaldatabases.map_reduce.index.mapped_per_second | Number of maps per second for map-reduce indexes \(one minute rate\) in all loaded databases | Gauge |
| ravendb.server.totaldatabases.map_reduce.index.reduced_per_second | Number of reduces per second for map-reduce indexes \(one minute rate\) in all loaded databases | Gauge |
| ravendb.server.totaldatabases.map.index.indexed_per_second | Number of indexed documents per second for map indexes \(one minute rate\) in all loaded databases | Gauge |
| ravendb.server.totaldatabases.number_error_indexes | Number of error indexes in all loaded databases | UpDownCounter |
| ravendb.server.totaldatabases.number_of_indexes | Number of indexes in all loaded databases | UpDownCounter |
| ravendb.server.totaldatabases.number.faulty_indexes | Number of faulty indexes in all loaded databases | UpDownCounter |
| ravendb.server.totaldatabases.writes_per_second | Number of writes \(documents, attachments, counters, time series\) in all loaded databases | Gauge |
Member

does it include time series writes?

Member Author

This is from Snmp description, and it contains. Fix: ravendb/ravendb#19141

Member

So it should be reflected in the docs as well, right?

Member

Danielle9897, Aug 26, 2024

Member

Great. Thanks


{PANEL/}

{PANEL: OpenTelemetry exporters}
{INFO: Exporters}
RavenDB currently supports two options for metrics export:

- OpenTelemetry Protocol
- Console

{INFO/}
### Console
All metrics will be printed to the RavenDB console. This is useful for local development and debugging purposes.

### OpenTelemetry Protocol
The official OpenTelemetry protocol is supported by default. You can export your metrics to any software that supports this protocol. The software suggested by the OpenTelemetry authors is the OpenTelemetry Collector. It gathers all data from RavenDB and lets you configure your favorite tools as retrievers of the metrics.

The best source of knowledge about its capabilities is the official documentation site: [https://opentelemetry.io/docs/collector/](https://opentelemetry.io/docs/collector/)

By default, RavenDB does not override the default values of the OpenTelemetryProtocol exporter. However, customization is available:

| Configuration key | Description | Accepted values |
| :--- | :--- | :--- |
| Monitoring.OpenTelemetry.OpenTelemetryProtocol.Endpoint | Endpoint to which OpenTelemetryProtocol should send data. | string |
| Monitoring.OpenTelemetry.OpenTelemetryProtocol.Protocol | Defines the protocol that OpenTelemetryProtocol should use to send data. | Grpc / HttpProtobuf |
| Monitoring.OpenTelemetry.OpenTelemetryProtocol.Headers | Custom headers | string |
| Monitoring.OpenTelemetry.OpenTelemetryProtocol.ExportProcessorType | Export processor type | Simple / Batch |
| Monitoring.OpenTelemetry.OpenTelemetryProtocol.Timeout | Timeout | int |

{INFO: Setting protocol to HttpProtobuf}
Currently, the official .NET implementation requires the complete path to the collector endpoint. For the OpenTelemetry Collector, the default path is `/v1/metrics`.
For example, the default OpenTelemetry Collector endpoint for `HttpProtobuf` is `http://localhost:4318/v1/metrics`.
{INFO/}
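
A sketch of how the exporter keys from the table above might be set for `HttpProtobuf`, again assuming a `settings.json` configuration file. The endpoint is the collector default quoted above; adjust the host and port to wherever your collector actually listens.

```json
{
    "Monitoring.OpenTelemetry.Enabled": true,
    "Monitoring.OpenTelemetry.OpenTelemetryProtocol.Protocol": "HttpProtobuf",
    "Monitoring.OpenTelemetry.OpenTelemetryProtocol.Endpoint": "http://localhost:4318/v1/metrics"
}
```

Note that the endpoint includes the full `/v1/metrics` path, as required when the protocol is `HttpProtobuf`.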

{PANEL/}

{PANEL: OpenTelemetry Collector}
### Configuring OpenTelemetry protocol in collector

{CODE-BLOCK: json}
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317
      http:
        endpoint: localhost:4318

{CODE-BLOCK/}
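
A receiver only takes effect once it is referenced from a pipeline. The following is a minimal, illustrative collector configuration that wires the `otlp` receiver above into a metrics pipeline. The `debug` exporter is used here purely as an assumption for local testing; in practice you would replace it with the exporter for your monitoring backend.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317
      http:
        endpoint: localhost:4318

exporters:
  # Assumption: print received metrics to the collector's own output
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [debug]
```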


### Prometheus endpoint as data source in collector
OpenTelemetry Collector contributors added support for retrieving metrics from Prometheus. Our Prometheus endpoint provides metrics in a well-known format and works as a plug-in without requiring any custom configuration. An example configuration may look like this:

{CODE-BLOCK:json}
receivers:
  prometheus_simple:
    endpoint: "your_ravendb_server.run"
    metrics_path: "/admin/monitoring/v1/prometheus"
    collection_interval: 10s
    tls:
      cert_file: "D:\\cert.crt"
      key_file: "D:\\key.key"
      insecure: false
      insecure_skip_verify: false
{CODE-BLOCK/}
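
As with any receiver, `prometheus_simple` must be listed in a pipeline before the collector starts scraping. A possible `exporters` and `service` section to append to the receiver configuration above is sketched below. The `debug` exporter is an assumption chosen for local verification only.

```yaml
exporters:
  # Assumption: print scraped metrics to the collector's own output
  debug:

service:
  pipelines:
    metrics:
      # Refers to the prometheus_simple receiver defined above
      receivers: [prometheus_simple]
      exporters: [debug]
```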
{PANEL/}
