AWSXRayRecorder trying to modify a non-concurrent collection without exclusive access? #67

Closed
jaybyrrd opened this issue Apr 10, 2019 · 16 comments
@jaybyrrd
Contributor

Hi there, I can't share code from the project, but the context is that we simply call AWSXRayRecorder.Instance.BeginSubsegment("some name"); inside a web API, and it throws the following exception:

Error Message:
 System.InvalidOperationException : Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.
Stack Trace:
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
   at System.Collections.Generic.Dictionary`2.set_Item(TKey key, TValue value)
   at Amazon.XRay.Recorder.Core.Internal.Emitters.JsonSegmentMarshaller..ctor()
   at Amazon.XRay.Recorder.Core.AWSXRayRecorder.get_Instance()
   at <redacted>
   at <redacted>
Error Message:
 System.InvalidOperationException : Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.
Stack Trace:
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
   at System.Collections.Generic.Dictionary`2.set_Item(TKey key, TValue value)
   at Amazon.XRay.Recorder.Core.Internal.Emitters.JsonSegmentMarshaller..ctor()
   at Amazon.XRay.Recorder.Core.AWSXRayRecorder.get_Instance()
   at <redacted>
   at <redacted>
   at <redacted>

This happens inside a dependency-injected class, following this pattern (everything is registered as scoped in DI):
Controller --> Provider --> ProviderWithXraySubsegment

The segments started by the incoming HTTP request are traced just fine, but once we add subsegments, it fails with this kind of error.

This will impact whether or not we use X-Ray at our business. We have also tried using the TraceMethod functions instead of creating subsegments.

@yogiraj07
Contributor

yogiraj07 commented Apr 10, 2019

Hi @jaybyrrd ,
Thanks for posting the issue. From the error stack it's hard to understand what is happening.

  1. What is the framework you are using? .NET45 or .NET Core?
  2. Can you clarify the flow Controller --> Provider --> ProviderWithXraySubsegment a bit? A request goes to the controller, then to a provider (is this DI?), then to ProviderWithXraySubsegment (is this DI?). Am I correct? Is there any asynchronous or multithreaded path?
  3. Can you isolate a reproduction of this error in a small project so we can work on it?
  4. On a side note, can you also let me know your use case and the overall flow you are trying to integrate?

@jaybyrrd
Contributor Author

Hi there,

  1. .NET Core
  2. All classes are dependency injected. Controller -> Scoped Dependency Injected Class -> Scoped Dependency Injected Data Layer. Per scope, the behavior is relatively synchronous, but there is a chance that multiple threads executing the same logic are running concurrently for other requests.
  3. I am working on isolating the issue now. It only breaks some of the time, though, so it is tough. I will walk you through the overall flow.
  4. This is happening on multiple ASP.NET Core endpoints. The general flow is that our controller has a scoped, dependency-injected class, which in turn has another dependency-injected class for our data layer. The data layer uses Dapper. We are trying to trace the time spent in the data layer operations.

As I said, I will try to put together a sandbox project either this evening or during the day, but I hope those details help.

@jaybyrrd
Contributor Author

jaybyrrd commented Apr 11, 2019

Two stack traces with line numbers that I just managed to pull:

System.NullReferenceException : Object reference not set to an instance of an object.
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
   at System.Collections.Generic.Dictionary`2.set_Item(TKey key, TValue value)
   at ThirdParty.LitJson.JsonMapper.RegisterExporter[T](ExporterFunc`1 exporter) in \aws-xray-sdk-dotnet\sdk\src\Core\ThirdParty\LitJson\JsonMapper.cs:line 914
   at Amazon.XRay.Recorder.Core.Internal.Emitters.JsonSegmentMarshaller..ctor() in \aws-xray-sdk-dotnet\sdk\src\Core\Internal\Emitters\JsonSegmentMarshaller.cs:line 41
   at Amazon.XRay.Recorder.Core.Internal.Emitters.UdpSegmentEmitter..ctor() in \aws-xray-sdk-dotnet\sdk\src\Core\Internal\Emitters\UdpSegmentEmitter.cs:line 45
   at Amazon.XRay.Recorder.Core.AWSXRayRecorder..ctor() in \aws-xray-sdk-dotnet\sdk\src\Core\AWSXRayRecorder.netcore.cs:line 47
   at Amazon.XRay.Recorder.Core.AWSXRayRecorderBuilder.Build() in \aws-xray-sdk-dotnet\sdk\src\Core\AwsXrayRecorderBuilder.cs:line 262
   at Amazon.XRay.Recorder.Core.AWSXRayRecorder.get_Instance() in \aws-xray-sdk-dotnet\sdk\src\Core\AWSXRayRecorder.netcore.cs:line 156

Which results from a call to:

AWSXRayRecorder.Instance.TraceMethod("somestring", () => _repository.Execute(parameters));

This second stacktrace happens less often, but it does happen:

System.InvalidOperationException : Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
   at System.Collections.Generic.Dictionary`2.set_Item(TKey key, TValue value)
   at ThirdParty.LitJson.JsonMapper.RegisterExporter[T](ExporterFunc`1 exporter) in \aws-xray-sdk-dotnet\sdk\src\Core\ThirdParty\LitJson\JsonMapper.cs:line 914
   at Amazon.XRay.Recorder.Core.Internal.Emitters.JsonSegmentMarshaller..ctor() in \aws-xray-sdk-dotnet\sdk\src\Core\Internal\Emitters\JsonSegmentMarshaller.cs:line 45
   at Amazon.XRay.Recorder.Core.Internal.Emitters.UdpSegmentEmitter..ctor() in \aws-xray-sdk-dotnet\sdk\src\Core\Internal\Emitters\UdpSegmentEmitter.cs:line 45
   at Amazon.XRay.Recorder.Core.AWSXRayRecorder..ctor() in \aws-xray-sdk-dotnet\sdk\src\Core\AWSXRayRecorder.netcore.cs:line 47
   at Amazon.XRay.Recorder.Core.AWSXRayRecorderBuilder.Build() in \aws-xray-sdk-dotnet\sdk\src\Core\AwsXrayRecorderBuilder.cs:line 262
   at Amazon.XRay.Recorder.Core.AWSXRayRecorder.get_Instance() in \aws-xray-sdk-dotnet\sdk\src\Core\AWSXRayRecorder.netcore.cs:line 156

This came from the same method invocation. It looks like both stack traces die at the exact same line (JsonMapper.cs:line 914), writing to the same IDictionary.
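
Both traces show a dictionary being written from a constructor that several request threads can reach at the same time. A minimal sketch of that pattern with a lock-based guard is below. The names (`ExporterRegistry`, `Marshaller`, `RaceDemo`) are hypothetical stand-ins; this is not the SDK's or LitJson's actual code, only an illustration of the general technique.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a shared static exporter table.
public static class ExporterRegistry
{
    private static readonly Dictionary<Type, Func<object, string>> Exporters =
        new Dictionary<Type, Func<object, string>>();
    private static readonly object Gate = new object();

    public static void Register<T>(Func<T, string> exporter)
    {
        // Dictionary<TKey, TValue> does not support concurrent writers,
        // so every write goes through the same lock.
        lock (Gate)
        {
            Exporters[typeof(T)] = value => exporter((T)value);
        }
    }

    public static int Count
    {
        get { lock (Gate) { return Exporters.Count; } }
    }
}

// Hypothetical marshaller whose constructor registers exporters.
public sealed class Marshaller
{
    public Marshaller()
    {
        ExporterRegistry.Register<int>(v => v.ToString());
        ExporterRegistry.Register<double>(v => v.ToString("R"));
    }
}

public static class RaceDemo
{
    public static int Run()
    {
        // Many threads construct marshallers at once, as concurrent requests would.
        Parallel.For(0, 1000, i => { new Marshaller(); });
        return ExporterRegistry.Count;
    }
}
```

With the `lock` removed, the same loop can intermittently corrupt the dictionary, which is consistent with the failure only showing up some of the time.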

I will probably dig into it more a bit later today.

Jay

jaybyrrd pushed a commit to jaybyrrd/aws-xray-sdk-dotnet that referenced this issue Apr 11, 2019
@jaybyrrd
Contributor Author

I have run all the tests that caught this exception in our software, as well as our integration tests against the deployed API, with this version of the software, and I can no longer reproduce the issue I originally posted about.

I do have concerns about whether the lock I introduced is actually safe: the change was quick, made in one place, and gave only slight consideration to overall usage. It still passes all unit tests locally, though. Please let me know if this works or if we need to do something more involved.

jaybyrrd pushed a commit to jaybyrrd/aws-xray-sdk-dotnet that referenced this issue Apr 11, 2019
@yogiraj07
Contributor

Hi @jaybyrrd ,
Thanks for the PR. Allow me some time to understand the effect of adding a lock in the hot code path. I will post an update soon. I greatly appreciate your efforts.
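
On the concern about a lock in the hot path: one general alternative is to do the registration exactly once while the singleton is built, so later calls take no lock at all. A minimal sketch, assuming hypothetical names (`Recorder`, `LazyDemo`); it is not the SDK's implementation or the change made in the PR.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical recorder; stands in for any lazily created singleton.
public sealed class Recorder
{
    private static int _constructed;

    // ExecutionAndPublication guarantees the factory runs once, even when
    // many threads read Instance at the same time.
    private static readonly Lazy<Recorder> LazyInstance =
        new Lazy<Recorder>(() => new Recorder(), LazyThreadSafetyMode.ExecutionAndPublication);

    // Written only inside the constructor, read-only afterwards.
    private readonly Dictionary<Type, string> _exporters = new Dictionary<Type, string>();

    private Recorder()
    {
        Interlocked.Increment(ref _constructed);
        _exporters[typeof(int)] = "int exporter";
        _exporters[typeof(double)] = "double exporter";
    }

    public static Recorder Instance => LazyInstance.Value;
    public static int ConstructedCount => Volatile.Read(ref _constructed);
    public int ExporterCount => _exporters.Count;
}

public static class LazyDemo
{
    public static int Run()
    {
        // Simulate many requests touching the singleton concurrently.
        Parallel.For(0, 1000, i =>
        {
            var recorder = Recorder.Instance;
            if (recorder.ExporterCount != 2)
                throw new InvalidOperationException("partially built instance");
        });
        return Recorder.ConstructedCount;
    }
}
```

The trade-off is that all registration has to happen during construction; anything that registers exporters later would still need its own synchronization.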

@jaybyrrd
Contributor Author

Sounds good. The change I made was very much a shot in the dark, and while it seems to work, I definitely think a more familiar set of eyes is needed.

@matheusmaximo

I have the same problem here. It looks to be intermittent.

@jaybyrrd
Contributor Author

jaybyrrd commented Apr 30, 2019

@yogiraj07 I have updated the PR and I think it fixes the bug. I am going through the final stages of testing here, but wanted to follow up on this.

@matheusmaximo, if you want, try to pull the code from that PR and see if it solves the intermittent problem for you.

@jaybyrrd
Contributor Author

jaybyrrd commented May 8, 2019

@matheusmaximo the pull request has been merged and you should be good to go once it is released.

@matheusmaximo

Still waiting for a release to check if it works on mine.

@jaybyrrd
Contributor Author

I have created a pull request to create a release commit @yogiraj07

@yogiraj07 yogiraj07 added the bug label May 13, 2019
@yogiraj07
Contributor

@jaybyrrd and @matheusmaximo, I will make a release per schedule soon. Please let me know if this issue is currently blocking you and I will prioritize accordingly.

@srprash
Collaborator

srprash commented May 22, 2019

Hello,
This has been fixed in release 2.6.2. Please refer to the Changelog.
Thanks for being patient.

Prashant.

@srprash srprash closed this as completed May 22, 2019
@matheusmaximo

It worked for me. Thank you all.

@udlose
Contributor

udlose commented Jan 23, 2023

Just FYI, this issue is still not fixed. The root cause is in LitJSON, which AWSXRayRecorder appears to depend on. I have filed a PR with that repo and hopefully it will be merged soon - LitJSON/litjson#142.
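
For readers wondering what a fix at the collection level can look like in general: one common option is to swap the shared `Dictionary` for a `ConcurrentDictionary`, which tolerates concurrent writers without an explicit lock. The sketch below uses hypothetical names (`ConcurrentExporterTable`, `ConcurrentDemo`) and is only an illustration of that technique; it is not claimed to be what the LitJSON PR does.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical exporter table backed by a thread-safe dictionary.
public static class ConcurrentExporterTable
{
    private static readonly ConcurrentDictionary<Type, Func<object, string>> Exporters =
        new ConcurrentDictionary<Type, Func<object, string>>();

    public static void Register<T>(Func<T, string> exporter)
    {
        // Indexer assignment on ConcurrentDictionary is an atomic add-or-replace.
        Exporters[typeof(T)] = value => exporter((T)value);
    }

    public static string Export(object value)
    {
        // Fall back to ToString() when no exporter is registered for the type.
        return Exporters.TryGetValue(value.GetType(), out var exporter)
            ? exporter(value)
            : value.ToString();
    }

    public static int Count => Exporters.Count;
}

public static class ConcurrentDemo
{
    public static string Run()
    {
        // Re-register the same exporters from many threads at once.
        Parallel.For(0, 1000, i =>
        {
            ConcurrentExporterTable.Register<int>(v => "int:" + v);
            ConcurrentExporterTable.Register<bool>(v => v ? "yes" : "no");
        });
        return ConcurrentExporterTable.Export(42) + "," + ConcurrentExporterTable.Export(true);
    }
}
```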

@udlose
Contributor

udlose commented Feb 16, 2024

Just to provide an update here: my PR with the fix in LitJSON (LitJSON/litjson#142) was merged on 11/19/23 - LitJSON/litjson#142 (comment)
