[Bug]: Investigate performance degradation between 2.3.0 and 2.4.0 version #2888

Closed
bbarani opened this issue Nov 14, 2022 · 8 comments
Labels
bug Something isn't working untriaged Issues that have not yet been triaged

Comments

@bbarani
Member

bbarani commented Nov 14, 2022

Describe the bug

Performance test results showed an increase in indexing latency in version 2.4.0 compared to version 2.3.0, as listed here. We need to identify the root cause of the degradation (i.e., whether it is caused by the perf tool setup or by OpenSearch core changes).

To reproduce

The results are published in the 2.4.0 release issue.

Expected behavior

No response

Screenshots

If applicable, add screenshots to help explain your problem.

Host / Environment

No response

Additional context

No response

Relevant log output

No response

@bbarani bbarani added bug Something isn't working untriaged Issues that have not yet been triaged labels Nov 14, 2022
@bbarani bbarani changed the title [Bug]: Investigate performance degradation stats between 2.3.0 and 2.4.0 version [Bug]: Investigate performance degradation between 2.3.0 and 2.4.0 version Nov 14, 2022
@anasalkouz
Member

I can see that latency increased on 2.4 on 10/21/2022. @adnapibar (Rabi), could you check what was merged on that date?

Screen Shot 2022-11-14 at 3 30 05 PM

@adnapibar
Contributor

It doesn't look like there was any significant change.
image

@adnapibar
Contributor

Do we know why there is a significant difference in the p99 between the perf tests run today vs 11/10?
e.g. OpenSearch 2.3.0 vs 2.4.0 Linux (With security): 1.66% (11/10) vs -42.42% (today)
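
For reference, a minimal sketch of how a relative p99 change between two versions could be computed. The formula, the sign convention, and the sample latencies are assumptions for illustration only; the thread does not state how the published percentages were derived.

```python
def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate versus baseline, in percent.

    Assumed convention: a positive result means the candidate value is higher.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


# Hypothetical p99 indexing latencies in milliseconds
# (not taken from the published results).
p99_2_3_0 = 1000.0
p99_2_4_0 = 1200.0

print(f"p99 change 2.3.0 -> 2.4.0: {percent_change(p99_2_3_0, p99_2_4_0):+.2f}%")
```

Comparing two runs only makes sense if both use the same convention and the same baseline, which is one thing to check when the numbers from 11/10 and today disagree this much.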

@andrross
Member

The NYC taxis dataset defines a couple of geo_point fields. This change to add support for GeoJSON was backported to the 2.x branch on October 21, so it lines up with the regression. @heemin32, is there any chance this change resulted in a performance regression during indexing?

The throughput graph shows the change in performance a little more clearly, in my opinion:

Screen Shot 2022-11-14 at 5 04 40 PM

@navneet1v
Contributor

navneet1v commented Nov 15, 2022

The change in question defines a new way of inputting a geo point. It doesn't change the indexing itself.

I will do a deep dive on this.

@heemin32
Contributor

The change includes creating XContent for the array format so that it can be extended correctly by plugins. This might have caused the indexing delay for point data in array format.

opensearch-project/OpenSearch@a282d39

@heemin32
Contributor

The NYC taxis dataset contains geo points in array format. I suspect the degradation comes from the change I made in AbstractPointGeometryFieldMapper.java.

Instead of reverting the entire commit, I am going to revert only the change in AbstractPointGeometryFieldMapper.java, which shouldn't impact existing functionality.
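
To make the array format concrete: OpenSearch accepts a geo_point as an object with lat and lon keys, or as an array in [lon, lat] order (longitude first), among other forms. The sketch below shows the same point in both forms. The field name pickup_location, the coordinates, and the helper to_lat_lon are illustrative assumptions, not code from the commit or the mapper.

```python
from typing import Dict, Sequence, Tuple, Union

GeoPoint = Union[Dict[str, float], Sequence[float]]


def to_lat_lon(value: GeoPoint) -> Tuple[float, float]:
    """Normalize an object-form or array-form geo point to (lat, lon)."""
    if isinstance(value, dict):
        return float(value["lat"]), float(value["lon"])
    # Array form is [lon, lat]: longitude comes first.
    lon, lat = value[0], value[1]
    return float(lat), float(lon)


# Hypothetical documents; the field name and values are made up for illustration.
doc_array = {"pickup_location": [-73.98, 40.75]}
doc_object = {"pickup_location": {"lat": 40.75, "lon": -73.98}}

# Both documents describe the same point.
assert to_lat_lon(doc_array["pickup_location"]) == to_lat_lon(doc_object["pickup_location"])
print(to_lat_lon(doc_array["pickup_location"]))
```

Only documents that use the array form would go through the array parsing path discussed above, which is consistent with the regression showing up on a dataset that stores its points that way.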

@bbarani
Member Author

bbarani commented Nov 17, 2022

Closing this issue, as we have found the root cause of the degradation. More information is available in this PR.

@bbarani bbarani closed this as completed Nov 17, 2022