Setup a nightly runs for semantic search at noaa workload #4731

Closed
martin-gaievski opened this issue May 29, 2024 · 4 comments
Labels
enhancement New Enhancement

Comments

@martin-gaievski
Member

Is your feature request related to a problem? Please describe

Similar to https://opensearch.org/benchmarks/, we would like to set up nightly runs of the semantic search workloads, publish the performance metrics, and make them publicly available.

Describe the solution you'd like

For the nightly runs, we need results for the following configuration:

Cluster configuration:

  • 3 data nodes of type r5.4xlarge or similar, with 16 vCPU cores and 64 GB+ RAM
  • 1 to 3 leader nodes

Two separate runs, one with concurrent_segment_search disabled and one with it enabled.

Workload 1: Concurrent_segment_search disabled:
Workload test procedure:

  • hybrid-query-aggs-full

Workload parameters:

  • number_of_shards:6
  • max_num_segments:8
  • concurrent_segment_search_enabled:'false'

Workload 2: Concurrent_segment_search enabled:
Workload test procedure:

  • hybrid-query-aggs-full

Workload parameters:

  • number_of_shards:6
  • max_num_segments:8
  • concurrent_segment_search_enabled:'true'

Describe alternatives you've considered

No response

Additional context

The workload is part of the opensearch-benchmark-workloads repo, so users should be able to reproduce the results easily.

@martin-gaievski martin-gaievski added enhancement New Enhancement untriaged Issues that have not yet been triaged labels May 29, 2024
@martin-gaievski
Member Author

For reference, we can use the following commands to run the workloads:

workload 1:

opensearch-benchmark execute-test --workload="noaa_semantic_search" --test-procedure=hybrid-query-aggs-full --pipeline=benchmark-only --target-host=http://myserver:80 --kill-running-processes --workload-params="number_of_shards:6,max_num_segments:8,concurrent_segment_search_enabled:'false'"

workload 2:

opensearch-benchmark execute-test --workload="noaa_semantic_search" --test-procedure=hybrid-query-aggs-full --pipeline=benchmark-only --target-host=http://myserver:80 --kill-running-processes --workload-params="number_of_shards:6,max_num_segments:8,concurrent_segment_search_enabled:'true'"
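Since the two commands above differ only in the value of concurrent_segment_search_enabled, a nightly job could build both from one template. The following is a minimal sketch, not the actual nightly setup: the host is the placeholder from the commands above, and `build_cmd`, `TARGET_HOST`, and `DRY_RUN` are hypothetical names introduced here for illustration. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: derive both nightly commands from a single template.
# build_cmd, TARGET_HOST and DRY_RUN are illustrative names, not part of opensearch-benchmark.
set -eu

TARGET_HOST="${TARGET_HOST:-http://myserver:80}"   # placeholder host from the issue

# Print the benchmark command for one concurrent_segment_search setting ("true" or "false").
build_cmd() {
  css_enabled="$1"
  params="number_of_shards:6,max_num_segments:8,concurrent_segment_search_enabled:'${css_enabled}'"
  printf 'opensearch-benchmark execute-test --workload="noaa_semantic_search" --test-procedure=hybrid-query-aggs-full --pipeline=benchmark-only --target-host=%s --kill-running-processes --workload-params="%s"\n' "$TARGET_HOST" "$params"
}

# Workload 1 (disabled) first, then workload 2 (enabled).
for css in false true; do
  cmd="$(build_cmd "$css")"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$cmd"        # default: only show what would be run
  else
    eval "$cmd"        # DRY_RUN=0: actually run the benchmark
  fi
done
```

Running the script with `DRY_RUN=0` executes the two workloads back to back against the same cluster, which assumes opensearch-benchmark is installed and the target host is reachable.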

@rishabh6788 rishabh6788 removed the untriaged Issues that have not yet been triaged label May 29, 2024
@rishabh6788
Collaborator

The nightly runs have been scheduled. We will wait through this week to collect enough data before creating the public dashboards.

@martin-gaievski
Member Author

@rishabh6788 Can we please make a few changes to this setup:

  1. change the EC2 instance type of the data nodes to r5.2xlarge

  2. change the command for running the workload:

opensearch-benchmark execute-test --workload="noaa_semantic_search" --test-procedure=hybrid-query-aggs-full --pipeline=benchmark-only --target-host=http://myserver:80 --kill-running-processes --workload-params="/noaa_semantic_search/params/one_replica_no_concurrent_segment_search.json"

The only difference from what we have today is that the params are referenced from a file instead of being defined as part of the command.

  3. add one more workload run on the same cluster configuration with the following command:
opensearch-benchmark execute-test --workload="noaa_semantic_search" --test-procedure=hybrid-query-aggs-full --pipeline=benchmark-only --target-host=http://myserver:80 --kill-running-processes --workload-params="/noaa_semantic_search/params/one_replica_with_concurrent_segment_search.json"

Related changes are merged to workloads repo: opensearch-project/opensearch-benchmark-workloads#319
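The two file-based commands in this comment can be driven by a single loop over the params files. This is a rough sketch under stated assumptions: the directory prefix is copied as written in the commands above and may need adjusting to wherever the workloads repo is checked out, and `build_file_cmd`, `PARAMS_DIR`, `TARGET_HOST`, and `DRY_RUN` are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run the workload once per params file.
# build_file_cmd, PARAMS_DIR, TARGET_HOST and DRY_RUN are illustrative names.
set -eu

TARGET_HOST="${TARGET_HOST:-http://myserver:80}"           # placeholder host from the issue
PARAMS_DIR="${PARAMS_DIR:-/noaa_semantic_search/params}"   # prefix as written in the issue

# Print the benchmark command for one params file name.
build_file_cmd() {
  printf 'opensearch-benchmark execute-test --workload="noaa_semantic_search" --test-procedure=hybrid-query-aggs-full --pipeline=benchmark-only --target-host=%s --kill-running-processes --workload-params="%s"\n' "$TARGET_HOST" "$PARAMS_DIR/$1"
}

for f in one_replica_no_concurrent_segment_search.json \
         one_replica_with_concurrent_segment_search.json; do
  if [ "${DRY_RUN:-1}" = "1" ]; then
    build_file_cmd "$f"                      # default: only show the command
  elif [ -f "$PARAMS_DIR/$f" ]; then
    eval "$(build_file_cmd "$f")"            # DRY_RUN=0 and file present: run it
  else
    echo "params file not found: $PARAMS_DIR/$f" >&2
    exit 1
  fi
done
```

Checking that each file exists before a real run keeps a nightly job from silently benchmarking with the wrong parameters when a path is misconfigured.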

@rishabh6788
Collaborator

Closing this issue. Please re-open if you have any comments or concerns.

Projects
Status: ✅ Done
Development

No branches or pull requests

2 participants