Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] No telemetry exported form a Scala notebook when using AzureMonitorExporterBuilder #41856

Open
3 tasks done
greatvovan opened this issue Sep 14, 2024 · 17 comments
Open
3 tasks done
Labels
customer-reported Issues that are reported by GitHub users external to the Azure organization. Monitor - Exporter Monitor OpenTelemetry Exporter needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team OpenTelemetry OpenTelemetry instrumentation question The issue doesn't require a change to the product in order to be resolved. Most issues start as that

Comments

@greatvovan
Copy link

greatvovan commented Sep 14, 2024

Describe the bug
When running a Scala notebook in a Databricks cluster, instrumentation using AzureMonitorExporterBuilder does not produce any output in the App Insights backend.

Exception or Stack Trace
No errors.

To Reproduce

  1. Have a Databricks workspace
  2. Have an App Insights instance
  3. Create a smallest compute cluster with default configuration (I used "Personal Compute" policy).
  4. Install com.azure:azure-monitor-opentelemetry-exporter:1.0.0-beta.30 in the cluster.
  5. Create a Scala notebook
  6. Run the below code in the notebook (copy in a cell and run)

Code Snippet

  val aiConnStr = "..."

  import org.apache.logging.log4j.LogManager
  import org.apache.logging.log4j.Level
  import org.apache.logging.log4j.core.config.builder.api.ConfigurationBuilderFactory
  import org.apache.logging.log4j.core.config.Configurator
  import io.opentelemetry.instrumentation.log4j.appender.v2_17.OpenTelemetryAppender
  import io.opentelemetry.api.common.Attributes
  import io.opentelemetry.api.common.AttributeKey
  import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk
  import com.azure.monitor.opentelemetry.exporter.AzureMonitorExporter
          
  val configBuilder = ConfigurationBuilderFactory.newConfigurationBuilder()
  val configuration = configBuilder
    .add(
      configBuilder
        .newAppender("Console", "CONSOLE")
    )
    .add(
      configBuilder
        .newAppender("Otel", "OpenTelemetry")
    )
    .add(
      configBuilder 
        .newRootLogger(Level.INFO)
        .add(configBuilder.newAppenderRef("Console"))
        .add(configBuilder.newAppenderRef("Otel"))
    )
    .build(false)
  Configurator.initialize(configuration)

  val sdkBuilder = AutoConfiguredOpenTelemetrySdk.builder()

  AzureMonitorExporter.customize(sdkBuilder, aiConnStr)

  val openTelemetry = sdkBuilder.build().getOpenTelemetrySdk()
  OpenTelemetryAppender.install(openTelemetry)
  println("Configured")

  val logger = LogManager.getLogger()

  logger.info("ScalaEA Log outside span")
  println("log sent")

  val attributes = Attributes.of[java.lang.String, java.lang.Long](
    AttributeKey.stringKey("foo"), "bar",
    AttributeKey.longKey("code"), 42L
  )

  val tracer = openTelemetry.getTracer("my-notebook")
  val span = tracer.spanBuilder("My-span-ScalaEA").startSpan()
  val scope = span.makeCurrent()
  try {
    span.addEvent("ScalaEA span event", attributes)
    logger.info("ScalaEA Log from span")
    Thread.sleep(1000L)
  }
  finally {
    scope.close()
    span.end()
  }
  println("trace sent")

  val meter = openTelemetry.getMeter("my-notebook")
  val gauge = meter.gaugeBuilder("my-gauge-ScalaEA").build()

  gauge.set(111, attributes)
  println("metric sent")

  openTelemetry.shutdown()

  println("Done")

Important: run the configuration code (till sdkBuilder.build().getOpenTelemetrySdk()) only once (see #41859). If you ran it and got ConfigurationException, restart the compute cluster (Run -> Restart compute resource). – fixed in version 1.0.0-beta.29.

The above code as a gist: https://gist.github.com/greatvovan/a2148bccf2c0e0c2305e8a35e0779dc3

Alternative (with manual configuration): https://gist.github.com/greatvovan/5e9e3d8ecaf4e210ade619c5b55455b3

Expected behavior
3 entries in traces table (2 logs, 1 event), 1 entry in customMetrics, 1 entry in dependencies table (span) created in App Insights

Screenshots
N/A

Additional context

  • Sending telemetry from a Python notebook on the same cluster works fine, which I am guessing enough to rule out a connectivity problem.
  • The same code works fine as a standalone Scala program and produces the expected telemetry.
  • Documentation used.
  • Based on examples.

Setup (please complete the following information):

  • OS: Linux-like (unknown)
  • IDE: Databricks Notebook
  • Library/Libraries: com.azure:azure-monitor-opentelemetry-exporter:1.0.0-beta.30
  • Java version: 1.8.0.412
  • App Server/Environment: Databricks runtime 15.4 LTS ML (no GPU)
  • Frameworks: No

Information Checklist
Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report

  • Bug Description Added
  • Repro Steps Added
  • Setup information Added

EDIT: Updated the gists to look more like linear Scala programs.
EDIT: Updated the code for version 1.0.0-beta.29 (the gists remain for version 28).
EDIT: Bumped version to 1.0.0-beta.30.

@github-actions github-actions bot added customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Sep 14, 2024
@joshfree joshfree added the OpenTelemetry OpenTelemetry instrumentation label Sep 18, 2024
@joshfree
Copy link
Member

@lmolkova could you please take a look?

@github-actions github-actions bot removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Sep 18, 2024
@lmolkova lmolkova assigned jeanbisutti and unassigned lmolkova Sep 18, 2024
@lmolkova
Copy link
Member

Thank you for reporting this @greatvovan!

I wonder if the problem could be around Scala notebook having different shutdown hooks and exiting before anything is exported.

Could you please try calling openTelemetry.shutdown() and awaiting it to complete to eliminate this? Thank you!

Otherwise, @jeanbisutti @heyams or @trask should be able to investigate further.

@lmolkova lmolkova added the Monitor - Exporter Monitor OpenTelemetry Exporter label Sep 18, 2024
@github-actions github-actions bot added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Sep 18, 2024
@greatvovan
Copy link
Author

@lmolkova I tried openTelemetry.shutdown() as well as Thread.sleep(...) at the end, but it did not change anything.

@juliuswwj
Copy link

juliuswwj commented Oct 9, 2024

the following code snippet works in Scala under Linux, but does NOT work in a Databricks notebook/job with a compute cluster 15.4 LTS ML (includes Apache Spark 3.5.0, Scala 2.12). it seems the issue exists in the Databricks runtime

import io.opentelemetry.api.OpenTelemetry
import io.opentelemetry.api.common.Attributes
import io.opentelemetry.api.common.AttributeKey
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk
import com.azure.monitor.opentelemetry.exporter.AzureMonitorExporter

object Main {
  def test(aiConnStr: String): String = {
    val sdkBuilder = AutoConfiguredOpenTelemetrySdk.builder()

    AzureMonitorExporter.customize(sdkBuilder, aiConnStr);

    val openTelemetry = sdkBuilder.build().getOpenTelemetrySdk()
    println("Auto-configured " + openTelemetry.toString())

    val attributes = Attributes.of[java.lang.String, java.lang.Long](
      AttributeKey.stringKey("foo"), "bar",
      AttributeKey.longKey("code"), 42L
    )

    val tracer = openTelemetry.getTracer("my-notebook")
    val span = tracer.spanBuilder("My-span-ScalaEA").startSpan()
    val scope = span.makeCurrent()
    try {
      span.addEvent("XScalaEA span event", attributes)
      Thread.sleep(1000L)
    }
    finally {
      scope.close()
      span.end()
    }
    println("trace sent")

    val meter = openTelemetry.getMeter("my-notebook")
    val gauge = meter.gaugeBuilder("my-gauge-ScalaEA").build()

    gauge.set(111, attributes)
    println("metric sent")

    openTelemetry.shutdown()

    Thread.sleep(5000L)

    "done"
  }

  def main(args: Array[String]): Unit = {
    test("...")
  }
}
    libraryDependencies += "io.opentelemetry" % "opentelemetry-sdk" % "1.42.1",
    libraryDependencies += "io.opentelemetry" % "opentelemetry-api" % "1.42.1",
    libraryDependencies += "com.azure" % "azure-monitor-opentelemetry-exporter" % "1.0.0-beta.29"

@greatvovan
Copy link
Author

Yes, that's exactly what I am saying in the initial post.
I am not ready to blame the Databricks runtime, though. It should be a regular Java runtime. I'd rather blame the library.
Maybe developers can help us enable logging from the library to stdout to troubleshoot it faster...

@trask
Copy link
Member

trask commented Oct 22, 2024

hi @greatvovan! I was able to reproduce your findings

and it took me a while, but I finally found where the logs go in azure databricks 😅

and saw this:

24/10/22 22:20:52 WARN ChannelInitializer: Failed to initialize a channel. Closing: [id: 0xec533e08]
java.lang.NoSuchMethodError: io.netty.handler.codec.http.HttpClientCodec.<init>(Lio/netty/handler/codec/http/HttpDecoderConfig;ZZ)V

so it looks like databricks is bringing their own (older) version of netty which is conflicting with the version used by Azure SDK libraries

the good news is that Azure SDK libraries support alternate http client implementations, so I think you should be able to exclude com.azure:azure-core-http-netty (which is a transitive dependency of com.azure:azure-monitor-opentelemetry-exporter) and include com.azure:azure-core-http-okhttp instead

@greatvovan
Copy link
Author

Hi @trask! Thank you for investigation, at least now we understand the reason.

In terms of fixing, how can we tell the library to stop using Netty and start using OkHttp? I installed com.azure:azure-core-http-okhttp:1.12.4 but it did nothing. I also tried installing io.netty:netty-all:4.1.114.Final or using com.azure:azure-core-http-netty:1.15.5 as root in hope that it will replace a part of the dependency tree, but with no avail.

I think you should be able to exclude com.azure:azure-core-http-netty

I don't quite understand how I can exclude a library in Databricks, and not sure it is possible, as it may be used internally.

For better diagnostics, do you not think it would be better to check all necessary dependencies in initialization stage, e.g. in AutoConfiguredOpenTelemetrySdkBuilder.build() and throw a clear exception rather than being failing silently?

Also, could you explain how to enable/find the logging you quoted, so that I could possibly try troubleshooting on my own?

@trask
Copy link
Member

trask commented Oct 23, 2024

I think you should be able to exclude com.azure:azure-core-http-netty

so if this was a maven project, here's what I would do:

        <dependency>
            <groupId>com.azure</groupId>
            <artifactId>azure-monitor-opentelemetry-exporter</artifactId>
            <version>1.0.0-beta.29</version>
            <exclusions>
                <exclusion>
                    <groupId>com.azure</groupId>
                    <artifactId>azure-core-http-netty</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>com.azure</groupId>
            <artifactId>azure-core-http-okhttp</artifactId>
            <version>1.12.4</version>
        </dependency>

I didn't try it in Databricks, but it looks like they give you an option to add exclusions:

Image

so I'd try to install com.azure:azure-monitor-opentelemetry-exporter with exclusion com.azure:azure-core-http-okhttp com.azure:azure-core-http-netty

and then also install com.azure:azure-core-http-okhttp

Also, could you explain how to enable/find the logging you quoted, so that I could possibly try troubleshooting on my own?

yeah, open your cluster and go to the Driver logs tab, then look at the log4j logs

@greatvovan
Copy link
Author

greatvovan commented Oct 23, 2024

@trask you saved my day! Or maybe week...

Indeed, there is an option to exclude, which worked as expected.

Now traces and metrics are coming (which is the main thing for me), but logs are not. Not sure if it's still related to library conflicts, I assume the same client is responsible for exporting all kinds of telemetry, and now I don't see such complaints in driver logs anyway.

I activate logging this way:

import org.apache.logging.log4j.{LogManager, Logger, Level}
import org.apache.logging.log4j.core.{LoggerContext}
import org.apache.logging.log4j.core.appender.ConsoleAppender
import org.apache.logging.log4j.core.appender.ConsoleAppender.{Builder => ConsoleBuilder}
import org.apache.logging.log4j.core.layout.PatternLayout
import org.apache.logging.log4j.core.config.builder.api.ConfigurationBuilderFactory
import org.apache.logging.log4j.core.config.{AppenderRef, LoggerConfig}
import io.opentelemetry.instrumentation.log4j.appender.v2_17.OpenTelemetryAppender
import io.opentelemetry.instrumentation.log4j.appender.v2_17.OpenTelemetryAppender.{Builder => OtelBuilder}

val context = LogManager.getContext(false).asInstanceOf[LoggerContext]
val config = context.getConfiguration

val layout = PatternLayout.newBuilder().withPattern("%d{yyyy-MM-dd HH:mm:ss} %-5level %logger{36} - MARKER %msg%n").build()
val consoleAppender = ConsoleAppender.newBuilder().asInstanceOf[ConsoleBuilder[_]]
  .setName("MyConsoleAppender").asInstanceOf[ConsoleBuilder[_]]
  .setLayout(layout).asInstanceOf[ConsoleBuilder[_]]
  .setTarget(ConsoleAppender.Target.SYSTEM_OUT).asInstanceOf[ConsoleBuilder[_]]
  .build()
consoleAppender.start()

val otelAppender = OpenTelemetryAppender.builder().asInstanceOf[OtelBuilder[_]]
  .setName("MyOtelAppender").asInstanceOf[OtelBuilder[_]]
  .setCaptureMapMessageAttributes(true).asInstanceOf[OtelBuilder[_]]
  .setCaptureContextDataAttributes("*").asInstanceOf[OtelBuilder[_]]
  .build()
otelAppender.start()

val rootLoggerConfig = config.getRootLogger
rootLoggerConfig.addAppender(consoleAppender, null, null)
rootLoggerConfig.addAppender(otelAppender, null, null)

context.updateLoggers()

I connect a ConsoleAppender just to have an indication in stdout that my logs are handled.
Then initialize the Exporter and install the Appender.

val sdkBuilder = AutoConfiguredOpenTelemetrySdk.builder()
AzureMonitorExporter.customize(sdkBuilder, aiConnStr)
val openTelemetry = sdkBuilder.build().getOpenTelemetrySdk()
OpenTelemetryAppender.install(openTelemetry)

Cluster libraries:

Image

Logging:

val logger = LogManager.getLogger("My-notebook")
logger.error("My test log")

I see the result in the cell output and cluster's stdout (with formatting I set), so I conclude that logging system is set up fine. Console and Otel appenders are added identically, and should emit messages simultaneously, but not a single log record is coming to the cloud. How can we further troubleshoot the issue?

@jeanbisutti jeanbisutti removed their assignment Oct 23, 2024
@trask
Copy link
Member

trask commented Oct 25, 2024

I think I got it working.

Just change OpenTelemetryAppender.install(openTelemetry) to otelAppender.setOpenTelemetry(openTelemetry).

It seems that OpenTelemetryAppender.install(openTelemetry) doesn't work when configuring things dynamically (as opposed to via log4j2.xml file).

Also, I found it helpful to add this to the top of the notebook:

Configurator.reconfigure();

@greatvovan
Copy link
Author

Indeed, appender's setOpenTelemetry() connected the dots! It is unbelievable, we've got from nothing to all three types of telemetry. Thanks a lot for help.

To be more accurate, though, the difference seems to be not in static vs. dynamic use, but elsewhere. The following code, if put into a standalone Scala program,

  val configBuilder = ConfigurationBuilderFactory.newConfigurationBuilder()
  val configuration = configBuilder
    .add(
      configBuilder
        .newAppender("Console", "CONSOLE")
    )
    .add(
      configBuilder
        .newAppender("Otel", "OpenTelemetry")
        .addAttribute("captureMapMessageAttributes", true)  // Capture extra MapMessage attributes
        .addAttribute("captureContextDataAttributes", "*")  // Capture ThreadContext attributes
    )
    .add(
      configBuilder 
        .newRootLogger(Level.INFO)
        .add(configBuilder.newAppenderRef("Console"))
        .add(configBuilder.newAppenderRef("Otel"))
    )
    .build(false)
  Configurator.initialize(configuration)

  val sdkBuilder = AutoConfiguredOpenTelemetrySdk.builder()
  AzureMonitorExporter.customize(sdkBuilder, aiConnStr)
  val openTelemetry = sdkBuilder.build().getOpenTelemetrySdk()
  OpenTelemetryAppender.install(openTelemetry)

also configures logging in run time, but it is compatible with OpenTelemetryAppender.install(openTelemetry). Unfortunately configuration with ConfigurationBuilderFactory does not work under Databricks, so we can't test install() method because the problem happens in the chain before the appender. This is unfortunate also because Log4j discourages modifying configuration components directly, and my current approach (that worked) goes against declarative methods like ConfigurationBuilderFactory.

@trask
Copy link
Member

trask commented Oct 25, 2024

To be more accurate, though, the difference seems to be not in static vs. dynamic use, but elsewhere

I see different behavior, you can check out my Java repro:

https://github.com/trask/log4j-dynamic-config-example/blob/836e08bd2452296a2e644e0c87eef2989db02612/src/main/java/io/opentelemetry/example/logappender/Application.java#L38-L39

@greatvovan
Copy link
Author

No, I mean, if you use ConfigurationBuilder. Check this out:

https://github.com/greatvovan/log4j-dynamic-config-example/blob/main/src/main/java/io/opentelemetry/example/logappender/Application2.java#L46-L47

It is still dynamic, but OpenTelemetryAppender.install() does its work.

Not sure to which extent it is useful, but I think it's enough to say that the problem lies deeper than static vs dynamic split.

@trask
Copy link
Member

trask commented Oct 26, 2024

@trask
Copy link
Member

trask commented Oct 26, 2024

btw, not sure if this might be interesting / relevant to follow: open-telemetry/opentelemetry-java-instrumentation#12468

@trask
Copy link
Member

trask commented Oct 28, 2024

@greatvovan just checking if it's ok to close this issue, or if you'd like to leave it open? thanks

@greatvovan
Copy link
Author

My issue is resolved, but I am thinking if we can derive moral from the story. Could the library handle the problem of missing/incorrect dependency better?

Following up on this comment, wouldn't it be better to fail more loudly than posting a quiet warning? Was it the intended design to not interfere with the application in case of problems, or we get that as an undesired side effect? My thinking is that if the user added instrumentation, it become an integral part of the program and cannot be easily sacrificed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
customer-reported Issues that are reported by GitHub users external to the Azure organization. Monitor - Exporter Monitor OpenTelemetry Exporter needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team OpenTelemetry OpenTelemetry instrumentation question The issue doesn't require a change to the product in order to be resolved. Most issues start as that
Projects
None yet
Development

No branches or pull requests

6 participants