Setup a nightly runs for semantic search at noaa workload #4731

martin-gaievski · 2024-05-29T16:40:55Z

Is your feature request related to a problem? Please describe

Similar to https://opensearch.org/benchmarks/ , we would like to set up nightly runs that execute the semantic search workloads every night, publish the performance metrics, and make them available to the public.

Describe the solution you'd like

For the nightly runs we need results for the following configuration:

Cluster configuration:

3 data nodes of type r5.4xlarge or similar, with 16 vCPU cores and 64 GB+ RAM
1 to 3 leader nodes

Two separate runs: one with concurrent_segment_search disabled and one with it enabled.

Workload 1: Concurrent_segment_search disabled:
Workload test procedure:

hybrid-query-aggs-full

Workload parameters:

number_of_shards:6
max_num_segments:8
concurrent_segment_search_enabled:'false'

Workload 2: Concurrent_segment_search enabled:
Workload test procedure:

hybrid-query-aggs-full

Workload parameters:

number_of_shards:6
max_num_segments:8
concurrent_segment_search_enabled:'true'
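As an illustration only, the two runs above can be described as a small run matrix in which just the concurrent segment search flag changes. The sketch below is an assumption about how one might model this in a wrapper script. The `NightlyRun` class and the run names are hypothetical and are not part of opensearch-benchmark or the workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NightlyRun:
    """One nightly run of the hybrid-query-aggs-full test procedure (hypothetical helper)."""
    name: str
    concurrent_segment_search: bool
    number_of_shards: int = 6
    max_num_segments: int = 8

    def workload_params(self) -> str:
        # The flag is rendered as a single-quoted string, matching the parameter lists above.
        flag = "true" if self.concurrent_segment_search else "false"
        return (
            f"number_of_shards:{self.number_of_shards},"
            f"max_num_segments:{self.max_num_segments},"
            f"concurrent_segment_search_enabled:'{flag}'"
        )


# The two runs requested in this issue; the names are illustrative.
RUNS = [
    NightlyRun("workload-1", concurrent_segment_search=False),
    NightlyRun("workload-2", concurrent_segment_search=True),
]

for run in RUNS:
    print(run.name, run.workload_params())
```

Keeping the shard and segment counts as shared defaults makes it obvious that the two runs differ in only one setting.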

Describe alternatives you've considered

No response

Additional context

The workload is part of the opensearch-benchmark-workloads repo, so users should be able to reproduce the results easily.

martin-gaievski · 2024-05-29T18:13:54Z

For reference, we can use the following commands to run the workloads:

workload 1:

opensearch-benchmark execute-test --workload="noaa_semantic_search" --test-procedure=hybrid-query-aggs-full --pipeline=benchmark-only --target-host=http://myserver:80 --kill-running-processes --workload-params="number_of_shards:6,max_num_segments:8,concurrent_segment_search_enabled:'false'"

workload 2:

opensearch-benchmark execute-test --workload="noaa_semantic_search" --test-procedure=hybrid-query-aggs-full --pipeline=benchmark-only --target-host=http://myserver:80 --kill-running-processes --workload-params="number_of_shards:6,max_num_segments:8,concurrent_segment_search_enabled:'true'"
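For anyone scripting this, here is a minimal sketch of running the two commands above back to back. It assumes opensearch-benchmark is on the PATH and keeps the placeholder target host from the commands above. The function names and the `dry_run` switch are hypothetical, and this is not how the nightly infrastructure is actually implemented.

```python
import shlex
import subprocess

# Flags shared by both runs, copied from the commands above.
# http://myserver:80 is the placeholder host used in this issue.
BASE_CMD = [
    "opensearch-benchmark", "execute-test",
    "--workload=noaa_semantic_search",
    "--test-procedure=hybrid-query-aggs-full",
    "--pipeline=benchmark-only",
    "--target-host=http://myserver:80",
    "--kill-running-processes",
]

COMMON_PARAMS = "number_of_shards:6,max_num_segments:8"


def build_command(concurrent_segment_search: bool) -> list[str]:
    """Return the argv list for one run (hypothetical helper)."""
    flag = "true" if concurrent_segment_search else "false"
    params = f"{COMMON_PARAMS},concurrent_segment_search_enabled:'{flag}'"
    return BASE_CMD + [f"--workload-params={params}"]


def run_nightly(dry_run: bool = True) -> list[list[str]]:
    """Print both commands and, unless dry_run is set, execute them in order."""
    commands = [build_command(False), build_command(True)]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sequence if a run exits with a non-zero status.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_nightly(dry_run=True)
```

Passing the arguments as a list avoids shell quoting problems with the single quotes inside the workload params.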

rishabh6788 · 2024-05-29T21:07:17Z

The nightly runs have been scheduled. We will wait until the end of this week to collect enough data before creating the public dashboards.

martin-gaievski · 2024-06-14T00:25:19Z

@rishabh6788 Can we please make a few changes to this setup:

Change the EC2 instance type of the data nodes to r5.2xlarge.
Change the command for running the workload to:

opensearch-benchmark execute-test --workload="noaa_semantic_search" --test-procedure=hybrid-query-aggs-full --pipeline=benchmark-only --target-host=http://myserver:80 --kill-running-processes --workload-params="/noaa_semantic_search/params/one_replica_no_concurrent_segment_search.json"

The only difference from what we have today is that the params are referenced from a file instead of being defined as part of the command.

Add one more workload run on the same cluster configuration with the following command:

opensearch-benchmark execute-test --workload="noaa_semantic_search" --test-procedure=hybrid-query-aggs-full --pipeline=benchmark-only --target-host=http://myserver:80 --kill-running-processes --workload-params="/noaa_semantic_search/params/one_replica_with_concurrent_segment_search.json"

The related changes are merged into the workloads repo: opensearch-project/opensearch-benchmark-workloads#319
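To illustrate the difference between inline params and a params file, the sketch below converts the inline key:value string from the earlier commands into JSON. The actual contents of the params files in the workloads repo are not shown in this thread and may include other keys, so the output is only an example of the general shape. The parsing rules (single-quoted values stay strings, bare digits become integers) are an assumption of this sketch and may differ from how opensearch-benchmark parses the string.

```python
import json


def inline_params_to_dict(inline: str) -> dict:
    """Parse 'key:value,key:value' pairs into a dict (hypothetical helper).

    Assumption: single-quoted values stay strings and bare digits become
    integers. OpenSearch Benchmark's own parsing may differ.
    """
    params = {}
    for pair in inline.split(","):
        key, _, value = pair.partition(":")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value.startswith("'") and value.endswith("'"):
            params[key] = value[1:-1]
        elif value.isdigit():
            params[key] = int(value)
        else:
            params[key] = value
    return params


# Inline params taken from the original "workload 1" command.
inline = "number_of_shards:6,max_num_segments:8,concurrent_segment_search_enabled:'false'"
print(json.dumps(inline_params_to_dict(inline), indent=2))
```

A file in this shape can then be versioned alongside the workload, which keeps the nightly command short and the parameters reviewable.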

rishabh6788 · 2024-06-14T17:54:31Z

Closing this issue. Please re-open if you have any comments or concerns.

martin-gaievski added the enhancement and untriaged labels May 29, 2024

rishabh6788 removed the untriaged label May 29, 2024

rishabh6788 mentioned this issue May 29, 2024

Setup a nightly runs for semantic search at noaa workload. #4733

Merged

This was referenced Jun 14, 2024

Update noaa_semantic_search benchmark run parameters #4780

Merged

Update noaa_semantic_search performance run data instance to r5.2xlarge #4781

Merged

rishabh6788 closed this as completed Jun 14, 2024
