RDoc-2883 OpenTelemetry #1862
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
[ | ||
{ | ||
"Path": "telegraf.markdown", | ||
"Name": "Telegraf Plugin", | ||
"DiscussionId": "f59c124a-b94a-4380-bff2-dcb1782ef1f6", | ||
"Mappings": [] | ||
}, | ||
{ | ||
"Path": "prometheus.markdown", | ||
"Name": "Prometheus", | ||
"DiscussionId": "f59c124a-b94a-4380-bff2-dcb1782ef1f6", | ||
"Mappings": [] | ||
}, | ||
{ | ||
"Path": "opentelemetry.markdown", | ||
"Name": "OpenTelemetry", | ||
"DiscussionId": "TODO", | ||
"Mappings": [] | ||
} | ||
] |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,219 @@ | ||||||
# OpenTelemetry Support | ||||||
--- | ||||||
|
||||||
{NOTE: } | ||||||
|
||||||
* OpenTelemetry is a popular monitoring standard designed to help inspect and administer | ||||||
networks, infrastructure, databases, etc. | ||||||
|
||||||
* RavenDB sends metrics via the OpenTelemetry Protocol (OTLP), | ||||||
allowing an OpenTelemetry receiver to collect the data from RavenDB. | ||||||
|
||||||
* OpenTelemetry support is provided by RavenDB instances both on-premises and in the cloud. | ||||||
|
||||||
* The OpenTelemetry Collector can also retrieve data from RavenDB's Prometheus endpoint. | ||||||
|
||||||
{NOTE/} | ||||||
|
||||||
--- | ||||||
|
||||||
{PANEL: OpenTelemetry} | ||||||
|
||||||
OpenTelemetry is a collection of APIs, SDKs, and tools. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software's performance and behavior. (description via [https://opentelemetry.io](https://opentelemetry.io)) | ||||||
|
||||||
RavenDB uses the official OpenTelemetry SDK and lets users retrieve metrics via the OpenTelemetry Protocol. | ||||||
Review comment: I will be reviewing this article and fixing all phrasing/grammar issues in:
||||||
|
||||||
{PANEL/} | ||||||
|
||||||
{PANEL: RavenDB OpenTelemetry Metrics} | ||||||
|
||||||
{INFO: How to turn on metrics in RavenDB} | ||||||
To enable OpenTelemetry metrics in RavenDB, set the configuration option `Monitoring.OpenTelemetry.Enabled` to `true`. | ||||||
The change takes effect only after the RavenDB process is restarted. | ||||||
{INFO/} | ||||||
|
||||||
{INFO: Identification of nodes in metrics} | ||||||
RavenDB exposes the node tag in each instrument's instance tag, so metrics can be attributed to a specific machine. | ||||||
{INFO/} | ||||||
|
||||||
RavenDB exposes the following metrics: | ||||||
|
||||||
| Name | Description | | ||||||
|:----------------------------------| :--- | | ||||||
| ravendb.server.general | Exposes general information about the server | | ||||||
| ravendb.server.requests | Exposes information about requests processed by the server | | ||||||
| ravendb.server.storage | Exposes storage information | | ||||||
| ravendb.server.gc | Exposes detailed information about the Garbage Collector | | ||||||
| ravendb.server.resources | Exposes detailed information about resource usage (e.g. CPU) | | ||||||
| ravendb.server.totaldatabases | Exposes aggregated information about databases on the server | | ||||||
| ravendb.server.cpucredits | Exposes the status of CPU credits (cloud) | | ||||||
|
||||||
RavenDB can also expose the metrics developed by Microsoft for ASP.NET Core and the .NET Runtime. | ||||||
More information is available in the documentation of the corresponding instrumentation libraries: | ||||||
- [Runtime documentation](https://github.com/open-telemetry/opentelemetry-dotnet-contrib/tree/main/src/OpenTelemetry.Instrumentation.Runtime#metrics) | ||||||
- [AspNetCore documentation](https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/main/src/OpenTelemetry.Instrumentation.AspNetCore/README.md#metrics) | ||||||
|
||||||
### Configuring meters | ||||||
By default, only the most commonly used meters are turned on. Use the following configuration options to enable or disable individual meters: | ||||||
|
||||||
|
||||||
|
||||||
| Configuration name | Meter name | Default value | | ||||||
| :--- | :--- | :--- | | ||||||
| Monitoring.OpenTelemetry.Meters.AspNetCore.Enabled | Official AspNetCore instrumentation | false | | ||||||
| Monitoring.OpenTelemetry.Meters.Runtime.Enabled | Official Runtime instrumentation | false | | ||||||
| Monitoring.OpenTelemetry.Meters.Server.Storage.Enabled | ravendb.server.storage | true | | ||||||
| Monitoring.OpenTelemetry.Meters.Server.CPUCredits.Enabled | ravendb.server.cpucredits | false | | ||||||
Review comment: Are we using CPU in all caps here? In the configuration it's spelled "Cpu" everywhere, as far as I remember.
||||||
| Monitoring.OpenTelemetry.Meters.Server.Resources.Enabled | ravendb.server.resources | true | | ||||||
| Monitoring.OpenTelemetry.Meters.Server.TotalDatabases.Enabled | ravendb.server.totaldatabases | true | | ||||||
| Monitoring.OpenTelemetry.Meters.Server.Requests.Enabled | ravendb.server.requests | true | | ||||||
| Monitoring.OpenTelemetry.Meters.Server.GC.Enabled | ravendb.server.gc | false | | ||||||
|
||||||
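The options above could be applied by merging them into the server's settings file. The following is a minimal sketch, assuming the server reads a flat `settings.json` whose keys are the dotted configuration names from the table; the helper names (`merge_settings`, `render_settings`), the `ServerUrl` entry in the usage note, and the particular on/off choices are illustrative assumptions, not recommended values.

```python
import json

# Keys are copied from the configuration table above.
# The chosen values are illustrative only (assumption), not recommendations.
OTEL_SETTINGS = {
    "Monitoring.OpenTelemetry.Enabled": True,
    "Monitoring.OpenTelemetry.Meters.Server.GC.Enabled": True,         # table default: false
    "Monitoring.OpenTelemetry.Meters.Runtime.Enabled": True,           # table default: false
    "Monitoring.OpenTelemetry.Meters.Server.Requests.Enabled": False,  # table default: true
}


def merge_settings(existing: dict, overrides: dict) -> dict:
    """Return a copy of `existing` with `overrides` applied on top."""
    merged = dict(existing)
    merged.update(overrides)
    return merged


def render_settings(existing_json: str) -> str:
    """Merge the OpenTelemetry options into the text of a settings file.

    Assumes the file is a flat JSON object (hypothetical layout).
    An empty string is treated as an empty settings file.
    """
    existing = json.loads(existing_json) if existing_json.strip() else {}
    return json.dumps(merge_settings(existing, OTEL_SETTINGS), indent=4)
```

For example, `render_settings('{"ServerUrl": "http://127.0.0.1:8080"}')` returns JSON text that keeps the existing entry and adds the four OpenTelemetry keys. The RavenDB process still has to be restarted before the new values are used.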
### Meters instruments | ||||||
Review comment: Are we exposing information about client certificate expiration?
Reply: No
||||||
| Name | Description | Instrument type | | ||||||
| :--- | :--- | :--- | | ||||||
| ravendb.server.cpucredits.alert_raised | CPU Credits Any Alert Raised | Gauge | | ||||||
| ravendb.server.cpucredits.background.tasks.alert_raised | CPU Credits Background Tasks Alert Raised | Gauge | | ||||||
| ravendb.server.cpucredits.base | CPU Credits Base | UpDownCounter | | ||||||
| ravendb.server.cpucredits.consumption_current | CPU Credits Gained Per Second | UpDownCounter | | ||||||
| ravendb.server.cpucredits.failover.alert_raised | CPU Credits Failover Alert Raised | Gauge | | ||||||
| ravendb.server.cpucredits.max | CPU Credits Max | UpDownCounter | | ||||||
| ravendb.server.cpucredits.remaining | CPU Credits Remaining | Gauge | | ||||||
| ravendb.server.gc.compacted | Specifies if this is a compacting GC or not. | Gauge | | ||||||
| ravendb.server.gc.concurrent | Specifies if this is a concurrent GC or not. | Gauge | | ||||||
| ravendb.server.gc.finalizationpendingcount | Gets the number of objects ready for finalization this GC observed. | Gauge | | ||||||
| ravendb.server.gc.fragmented | Gets the total fragmentation (in MB) when the last garbage collection occurred. | Gauge | | ||||||
Review comment: Does OpenTelemetry suggest using explicit units, just like Prometheus metrics?
Reply: I don't see it here: https://opentelemetry.io/docs/specs/semconv/general/metrics/
||||||
| ravendb.server.gc.gclohsize | Gets the large object heap size (in MB) after the last garbage collection of given kind occurred. | Gauge | | ||||||
| ravendb.server.gc.generation | Gets the generation this GC collected. | Gauge | | ||||||
| ravendb.server.gc.heapsize | Gets the total heap size (in MB) when the last garbage collection occurred. | Gauge | | ||||||
| ravendb.server.gc.highmemoryloadthreshold | Gets the high memory load threshold (in MB) when the last garbage collection occurred. | Gauge | | ||||||
| ravendb.server.gc.index | The index of this GC. | Gauge | | ||||||
| ravendb.server.gc.memoryload | Gets the memory load (in MB) when the last garbage collection occurred. | Gauge | | ||||||
| ravendb.server.gc.pausedurations1 | Gets the pause durations. First item in the array. | Gauge | | ||||||
| ravendb.server.gc.pausedurations2 | Gets the pause durations. Second item in the array. | Gauge | | ||||||
| ravendb.server.gc.pinnedobjectscount | Gets the number of pinned objects this GC observed. | Gauge | | ||||||
| ravendb.server.gc.promoted | Gets the promoted MB for this GC. | Gauge | | ||||||
| ravendb.server.gc.timepercentage | Gets the pause time percentage in the GC so far. | Gauge | | ||||||
| ravendb.server.gc.totalavailablememory | Gets the total available memory (in MB) for the garbage collector to use when the last garbage collection occurred. | Gauge | | ||||||
| ravendb.server.gc.totalcommited | Gets the total committed MB of the managed heap. | Gauge | | ||||||
| ravendb.server.general.certificate_server_certificate_expiration_left_seconds | Server certificate expiration left | Gauge | | ||||||
| ravendb.server.general.cluster.index | Cluster index | UpDownCounter | | ||||||
| ravendb.server.general.cluster.node.state | Current node state | UpDownCounter | | ||||||
| ravendb.server.general.cluster.term | Cluster term | UpDownCounter | | ||||||
| ravendb.server.general.license.cores.max | Server license max CPU cores | Gauge | | ||||||
| ravendb.server.general.license.cpu.utilized | Server license utilized CPU cores | Gauge | | ||||||
| ravendb.server.general.license.expiration_left_seconds | Server license expiration left | Gauge | | ||||||
| ravendb.server.general.license.type | Server license type | Gauge | | ||||||
| ravendb.server.resources.available_memory_for_processing | Available memory for processing \(in MB\) | Gauge | | ||||||
| ravendb.server.resources.cpu.machine | Machine CPU usage in % | Gauge | | ||||||
| ravendb.server.resources.cpu.process | Process CPU usage in % | Gauge | | ||||||
| ravendb.server.resources.dirty_memory | Dirty Memory that is used by the scratch buffers in MB | Gauge | | ||||||
| ravendb.server.resources.encryption_buffers.memory_in_pool | Server encryption buffers memory held in the pool, in MB | Gauge | | ||||||
| ravendb.server.resources.encryption_buffers.memory_in_use | Server encryption buffers memory in use, in MB | Gauge | | ||||||
| ravendb.server.resources.io_wait | IO wait in % | Gauge | | ||||||
| ravendb.server.resources.low_memory_flag | Server low memory flag value | Gauge | | ||||||
| ravendb.server.resources.machine.assigned_processor_count | Number of assigned processors on the machine | UpDownCounter | | ||||||
| ravendb.server.resources.machine.processor_count | Number of processors on the machine | UpDownCounter | | ||||||
| ravendb.server.resources.managed_memory | Server managed memory size in MB | Gauge | | ||||||
| ravendb.server.resources.thread_pool.available_completion_port_threads | Number of available completion port threads in the thread pool | Gauge | | ||||||
| ravendb.server.resources.thread_pool.available_worker_threads | Number of available worker threads in the thread pool | Gauge | | ||||||
| ravendb.server.resources.total_memory | Server allocated memory in MB | Gauge | | ||||||
| ravendb.server.resources.total.swap_usage | Server total swap usage in MB | Gauge | | ||||||
| ravendb.server.resources.total.swap.size | Server total swap size in MB | Gauge | | ||||||
| ravendb.server.resources.unmanaged_memory | Server unmanaged memory size in MB | Gauge | | ||||||
| ravendb.server.resources.working_set_swap_usage | Server working set swap usage in MB | Gauge | | ||||||
| ravendb.server.requests.requests.average_duration | Average request time in milliseconds | Gauge | | ||||||
| ravendb.server.requests.requests.concurrent_requests | Number of concurrent requests | UpDownCounter | | ||||||
| ravendb.server.requests.requests.per_second | Number of requests per second. | Gauge | | ||||||
| ravendb.server.requests.tcp.active.connections | Number of active TCP connections | Gauge | | ||||||
| ravendb.server.requests.total.requests | Total number of requests since server startup | UpDownCounter | | ||||||
| ravendb.server.storage.storage.disk.ios.read_operations | IO read operations per second | Gauge | | ||||||
| ravendb.server.storage.storage.disk.ios.write_operations | IO write operations per second | Gauge | | ||||||
| ravendb.server.storage.storage.disk.queue_length | Queue length | Gauge | | ||||||
| ravendb.server.storage.storage.disk.read_throughput | Read throughput in kilobytes per second | Gauge | | ||||||
| ravendb.server.storage.storage.disk.remaining.space | Remaining server storage disk space in MB | Gauge | | ||||||
| ravendb.server.storage.storage.disk.remaining.space_percentage | Remaining server storage disk space in % | Gauge | | ||||||
| ravendb.server.storage.storage.disk.write_throughput | Write throughput in kilobytes per second | Gauge | | ||||||
| ravendb.server.storage.storage.total_size | Server storage total size in MB | Gauge | | ||||||
| ravendb.server.storage.storage.used_size | Server storage used size in MB | Gauge | | ||||||
| ravendb.server.totaldatabases.count_stale_indexes | Number of stale indexes in all loaded databases | UpDownCounter | | ||||||
| ravendb.server.totaldatabases.data.written.per_second | Number of bytes written \(documents, attachments, counters\) in all loaded databases | Gauge | | ||||||
|
||||||
| ravendb.server.totaldatabases.database.disabled_count | Number of disabled databases | UpDownCounter | | ||||||
| ravendb.server.totaldatabases.database.encrypted_count | Number of encrypted databases | UpDownCounter | | ||||||
| ravendb.server.totaldatabases.database.faulted_count | Number of faulted databases | UpDownCounter | | ||||||
| ravendb.server.totaldatabases.database.loaded_count | Number of loaded databases | UpDownCounter | | ||||||
| ravendb.server.totaldatabases.database.node_count | Number of databases for current node | UpDownCounter | | ||||||
| ravendb.server.totaldatabases.database.total_count | Number of all databases | UpDownCounter | | ||||||
| ravendb.server.totaldatabases.map_reduce.index.mapped_per_second | Number of maps per second for map-reduce indexes \(one minute rate\) in all loaded databases | Gauge | | ||||||
| ravendb.server.totaldatabases.map_reduce.index.reduced_per_second | Number of reduces per second for map-reduce indexes \(one minute rate\) in all loaded databases | Gauge | | ||||||
| ravendb.server.totaldatabases.map.index.indexed_per_second | Number of indexed documents per second for map indexes \(one minute rate\) in all loaded databases | Gauge | | ||||||
| ravendb.server.totaldatabases.number_error_indexes | Number of error indexes in all loaded databases | UpDownCounter | | ||||||
| ravendb.server.totaldatabases.number_of_indexes | Number of indexes in all loaded databases | UpDownCounter | | ||||||
| ravendb.server.totaldatabases.number.faulty_indexes | Number of faulty indexes in all loaded databases | UpDownCounter | | ||||||
| ravendb.server.totaldatabases.writes_per_second | Number of writes \(documents, attachments, counters, time series\) in all loaded databases | Gauge | | ||||||
Review comment: Does it include time series writes?
Reply: This is from the SNMP description, and it does include them. Fix: ravendb/ravendb#19141
Reply: So it should be reflected in the docs as well, right?
Reply: I have added that in my latest docs PRs:
Reply: Great. Thanks
||||||
|
||||||
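When an instrument from the table above does not show up in a collector, a common first check is whether its meter is enabled by default. The following is a small sketch of that lookup, assuming an instrument belongs to the meter whose name is its prefix; the defaults are copied from the "Configuring meters" table, and the function names are hypothetical. The `ravendb.server.general` meter has no switch in that table, so the sketch returns `None` for it rather than guessing.

```python
from typing import Optional

# Default state per meter, copied from the "Configuring meters" table.
# ravendb.server.general is intentionally absent: the table lists no switch for it.
METER_DEFAULTS = {
    "ravendb.server.storage": True,
    "ravendb.server.cpucredits": False,
    "ravendb.server.resources": True,
    "ravendb.server.totaldatabases": True,
    "ravendb.server.requests": True,
    "ravendb.server.gc": False,
}


def meter_of(instrument: str) -> Optional[str]:
    """Return the meter whose name prefixes the instrument name, if known."""
    for meter in METER_DEFAULTS:
        if instrument.startswith(meter + "."):
            return meter
    return None


def enabled_by_default(instrument: str) -> Optional[bool]:
    """Return the default state of the instrument's meter, or None if unknown."""
    meter = meter_of(instrument)
    return None if meter is None else METER_DEFAULTS[meter]
```

For instance, `enabled_by_default("ravendb.server.gc.heapsize")` is `False`, which means the GC instruments appear only after `Monitoring.OpenTelemetry.Meters.Server.GC.Enabled` is set to `true`.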
{PANEL/} | ||||||
|
||||||
{PANEL: OpenTelemetry exporters} | ||||||
{INFO: Exporters} | ||||||
RavenDB currently supports two options for metrics export: | ||||||
|
||||||
- OpenTelemetry Protocol | ||||||
- Console | ||||||
|
||||||
{INFO/} | ||||||
### Console | ||||||
All metrics are printed to the RavenDB console. This is useful for local development and debugging. | ||||||
|
||||||
### OpenTelemetry Protocol | ||||||
The official OpenTelemetry Protocol is supported by default. You can export your metrics to any software that supports this protocol. The tool suggested by the OpenTelemetry authors is the OpenTelemetry Collector, which gathers the data from RavenDB and lets you configure your preferred tools as consumers of the metrics. | ||||||
|
||||||
The best source of information about its capabilities is the official documentation: [https://opentelemetry.io/docs/collector/](https://opentelemetry.io/docs/collector/) | ||||||
|
||||||
By default, RavenDB does not override the default values of the OpenTelemetry Protocol exporter, but they can be customized with the following options: | ||||||
| Configuration key | Description | Accepted values | | ||||||
| :--- | :--- | :--- | | ||||||
| Monitoring.OpenTelemetry.OpenTelemetryProtocol.Endpoint | Endpoint to which the OpenTelemetry Protocol exporter sends data. | string | | ||||||
| Monitoring.OpenTelemetry.OpenTelemetryProtocol.Protocol | Defines the protocol that OpenTelemetryProtocol should use to send data. | Grpc / HttpProtobuf | | ||||||
| Monitoring.OpenTelemetry.OpenTelemetryProtocol.Headers | Custom headers | string | | ||||||
| Monitoring.OpenTelemetry.OpenTelemetryProtocol.ExportProcessorType | Export processor type | Simple / Batch | | ||||||
| Monitoring.OpenTelemetry.OpenTelemetryProtocol.Timeout | Timeout | int | | ||||||
|
||||||
{INFO: Setting protocol to HttpProtobuf} | ||||||
Currently, the official .NET implementation requires the complete path to the collector endpoint. For the OpenTelemetry Collector, the default path is `/v1/metrics`. | ||||||
For example, the default OpenTelemetry Collector endpoint for `HttpProtobuf` is `http://localhost:4318/v1/metrics`. | ||||||
{INFO/} | ||||||
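The path rule above can be sketched as a small helper that prepares the value for `Monitoring.OpenTelemetry.OpenTelemetryProtocol.Endpoint`. This is an illustration only: the function name `otlp_endpoint` is hypothetical, and the sketch assumes the collector uses the default `/v1/metrics` path mentioned above and that `Grpc` endpoints need no path.

```python
from urllib.parse import urlsplit, urlunsplit

# Default metrics path of the OpenTelemetry Collector, as stated above.
DEFAULT_METRICS_PATH = "/v1/metrics"


def otlp_endpoint(base_url: str, protocol: str) -> str:
    """Return the endpoint value to configure for the given protocol.

    For HttpProtobuf the complete path is required, so the default
    metrics path is appended when it is missing. For any other protocol
    (assumed here to be Grpc) the URL is returned unchanged.
    """
    if protocol != "HttpProtobuf":
        return base_url
    parts = urlsplit(base_url)
    path = parts.path.rstrip("/")
    if path.endswith(DEFAULT_METRICS_PATH):
        return base_url  # already a complete path
    return urlunsplit(
        (parts.scheme, parts.netloc, path + DEFAULT_METRICS_PATH, parts.query, parts.fragment)
    )
```

With this sketch, `otlp_endpoint("http://localhost:4318", "HttpProtobuf")` yields the default endpoint quoted above, while a `Grpc` URL such as `http://localhost:4317` is left as it is.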
|
||||||
{PANEL/} | ||||||
|
||||||
{PANEL: OpenTelemetry Collector} | ||||||
### Configuring OpenTelemetry protocol in collector | ||||||
|
||||||
{CODE-BLOCK: json} | ||||||
receivers: | ||||||
  otlp: | ||||||
    protocols: | ||||||
      grpc: | ||||||
        endpoint: localhost:4317 | ||||||
      http: | ||||||
        endpoint: localhost:4318 | ||||||
|
||||||
{CODE-BLOCK/} | ||||||
|
||||||
|
||||||
### Prometheus endpoint as data source in collector | ||||||
OpenTelemetry Collector contributors have added support for retrieving metrics from Prometheus endpoints. RavenDB's Prometheus endpoint provides metrics in this well-known format, so it works out of the box without any custom configuration. An example configuration may look like this: | ||||||
|
||||||
{CODE-BLOCK:json} | ||||||
receivers: | ||||||
  prometheus_simple: | ||||||
    endpoint: "your_ravendb_server.run" | ||||||
    metrics_path: "/admin/monitoring/v1/prometheus" | ||||||
    collection_interval: 10s | ||||||
    tls: | ||||||
      cert_file: "D:\\cert.crt" | ||||||
      key_file: "D:\\key.key" | ||||||
      insecure: false | ||||||
      insecure_skip_verify: false | ||||||
{CODE-BLOCK/} | ||||||
{PANEL/} | ||||||
|
Review comment: Same DiscussionId as in Telegraf? Check with @reebhub if this is correct.
Reply: @maciejaszyk ?
Reply: @Danielle9897 can you take care of it?
Reply: I have a fix for that. I will include the fix in the PR for https://issues.hibernatingrhinos.com/issue/RDoc-2944/Review-Open-Telemetry-documentation (that PR will replace this one). The root cause is that the 'Monitoring' directory was not created correctly in the docs.json file in version 5.2.
Reply: @maciejaszyk just add @Danielle9897 as a collaborator to your repo so she'll be able to fix it here (no need to open another PR).