yuinumaz
Contributor

@yuinumaz yuinumaz commented Sep 1, 2025

Description

Since Nov 19, 2024, Amazon OpenSearch Serverless has supported Point in Time (PIT) search.
https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-opensearch-serverless-pit-search/?nc1=h_ls
So we can enable point in time for Amazon OpenSearch Serverless in the opensearch source:
https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/src/main/java/org/opensearch/dataprepper/plugins/source/opensearch/worker/client/SearchAccessorStrategy.java#L152
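
For readers unfamiliar with PIT, here is a minimal sketch of the two requests involved, based on the OpenSearch Point in Time API: one request creates the PIT, and later searches page through it with `search_after`. This is not Data Prepper's implementation. The helper names (`create_pit_request`, `pit_search_body`), the `1m` keep-alive, and the `created_at` sort field are assumptions for illustration only.

```python
def create_pit_request(index, keep_alive="1m"):
    """Return (method, path) for creating a PIT over `index`.

    Hypothetical helper; only the request shape follows the OpenSearch PIT API.
    """
    return "POST", f"/{index}/_search/point_in_time?keep_alive={keep_alive}"


def pit_search_body(pit_id, sort, size=100, search_after=None, keep_alive="1m"):
    """Build the body of a search that reads from an existing PIT.

    The search is sent to /_search without an index in the path,
    because the PIT id already identifies the target indexes.
    """
    body = {
        "size": size,
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        "sort": sort,
    }
    if search_after is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = search_after
    return body


# Assumed sort field; any field that gives a stable order would do.
first_page = pit_search_body("example-pit-id", sort=[{"created_at": "asc"}])
next_page = pit_search_body(
    "example-pit-id",
    sort=[{"created_at": "asc"}],
    search_after=[1700000000000],
)
```

The first page omits `search_after`; every later page passes the sort values of the last hit it received.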

Issues Resolved

Resolves #5493

Check List

For Amazon OpenSearch Serverless collections, search_after will be used because neither point_in_time nor scroll are supported by collections.
  • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Yui Numazawa <[email protected]>
@graytaylor0
Member

Thanks for making this change! Were you able to verify end to end that this worked with point in time on serverless?

@yuinumaz
Contributor Author

I verified that PIT works with a simple test.

Preparation

  1. Pipeline Configuration
    AOSS (vector collection) -> stdout
aoss2s3:
  source:
    opensearch:
      acknowledgments: true
      hosts: [ "https://xxxx.ap-northeast-1.aoss.amazonaws.com" ]
      indices:
        include:
          - index_name_regex: "hotels-index"
      aws:
        serverless: true
        region: "ap-northeast-1"
        sts_role_arn: "arn:aws:iam::xxxx:role/PipelineRole"
      search_options:
        search_context_type: "point_in_time"
  sink:
    - stdout:
  2. Documents and index in the vector collection
    I created an index and added documents as follows.
PUT /hotels-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "location": {
        "type": "knn_vector",
        "dimension": 2,
        "space_type": "l2"
      }
    }
  }
}

POST /_bulk
{ "index": { "_index": "hotels-index" } }
{ "location": [5.2, 4.4] }
{ "index": { "_index": "hotels-index" } }
{ "location": [5.2, 3.9] }
{ "index": { "_index": "hotels-index" } }
{ "location": [4.9, 3.4] }
{ "index": { "_index": "hotels-index" } }
{ "location": [4.2, 4.6] }
{ "index": { "_index": "hotels-index" } }
{ "location": [3.3, 4.5] }

ref: https://docs.opensearch.org/latest/vector-search/getting-started/index/
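
If the test data needs to be regenerated, the `_bulk` body above can be built programmatically. The following is a small sketch; `build_bulk_body` is a hypothetical helper name, and the output is the newline-delimited JSON that `POST /_bulk` expects (one action line followed by one document line, with a trailing newline).

```python
import json


def build_bulk_body(index, docs):
    """Return an NDJSON _bulk body that indexes each doc into `index`."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # document line
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Same five vectors as in the request above.
locations = [[5.2, 4.4], [5.2, 3.9], [4.9, 3.4], [4.2, 4.6], [3.3, 4.5]]
body = build_bulk_body("hotels-index", [{"location": loc} for loc in locations])
print(body, end="")
```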

Results

[ec2-user@xxxx opensearch-data-prepper-2.13.0-SNAPSHOT-linux-x64]$ ./bin/data-prepper 
Reading pipelines and data-prepper configuration files from Data Prepper home directory.
/usr/bin/java
Found openjdk version  of 11.0
2025-09-22T06:01:35,092 [main] INFO  org.opensearch.dataprepper.pipeline.parser.transformer.DynamicConfigTransformer - No transformation needed
2025-09-22T06:01:36,053 [main] INFO  org.opensearch.dataprepper.plugins.kafka.extension.KafkaClusterConfigExtension - Applying Kafka Cluster Config Extension.
2025-09-22T06:01:36,819 [main] WARN  org.opensearch.dataprepper.core.pipeline.server.config.DataPrepperServerConfiguration - Creating data prepper server without authentication. This is not secure.
2025-09-22T06:01:36,819 [main] WARN  org.opensearch.dataprepper.core.pipeline.server.config.DataPrepperServerConfiguration - In order to set up Http Basic authentication for the data prepper server, go here: https://github.com/opensearch-project/data-prepper/blob/main/docs/core_apis.md#authentication
2025-09-22T06:01:36,947 [main] WARN  org.opensearch.dataprepper.core.pipeline.server.HttpServerProvider - Creating Data Prepper server without TLS. This is not secure.
2025-09-22T06:01:36,949 [main] WARN  org.opensearch.dataprepper.core.pipeline.server.HttpServerProvider - In order to set up TLS for the Data Prepper server, go here: https://github.com/opensearch-project/data-prepper/blob/main/docs/configuration.md#server-configuration
2025-09-22T06:01:37,854 [aoss2s3-sink-worker-2-thread-1] INFO  org.opensearch.dataprepper.plugins.truststore.TrustStoreProvider - Using the trust all manager to create trust manager.
2025-09-22T06:01:38,127 [aoss2s3-sink-worker-2-thread-1] INFO  org.opensearch.dataprepper.plugins.source.opensearch.worker.client.SearchAccessorStrategy - Using search_context_type set in the config: 'point_in_time'
2025-09-22T06:01:38,132 [aoss2s3-sink-worker-2-thread-1] WARN  org.opensearch.dataprepper.plugins.sourcecoordinator.inmemory.InMemorySourceCoordinationStore - The in_memory source coordination store is not recommended for production workloads. It is only effective in single node environments of Data Prepper, and can run into memory limitations over time if the number of partitions is too great.
2025-09-22T06:01:38,136 [aoss2s3-sink-worker-2-thread-1] INFO  org.opensearch.dataprepper.plugins.source.opensearch.OpenSearchService - The opensearch source will start processing data at 2025-09-22T06:01:36.607816Z. It is currently 2025-09-22T06:01:38.136130Z
2025-09-22T06:01:39,134 [pool-7-thread-1] INFO  org.opensearch.dataprepper.plugins.source.opensearch.worker.PitWorker - Starting processing for index: 'hotels-index'
{"location":[5.2,4.4]}
{"location":[3.3,4.5]}
{"location":[5.2,3.9]}
{"location":[5.2,3.9]}
{"location":[4.9,3.4]}
{"location":[4.2,4.6]}
{"location":[4.2,4.6]}
{"location":[3.3,4.5]}
{"location":[5.2,4.4]}
{"location":[4.9,3.4]}
2025-09-22T06:01:42,620 [acknowledgement-callback-1] INFO  org.opensearch.dataprepper.plugins.source.opensearch.worker.WorkerCommonUtils - Received acknowledgment of completion from sink for index hotels-index
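
One way to sanity-check a run like this is to tally the documents printed by the stdout sink against what was indexed. The sketch below assumes the sink output has been captured as a list of lines; `count_emitted` is a hypothetical helper, and it simply skips anything that is not a JSON object, such as the log lines.

```python
import json
from collections import Counter


def count_emitted(lines):
    """Count how often each location vector appears in the captured output."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip log lines and blanks
        doc = json.loads(line)
        counts[tuple(doc["location"])] += 1
    return counts


# The ten document lines printed in the run above.
captured = [
    '{"location":[5.2,4.4]}', '{"location":[3.3,4.5]}',
    '{"location":[5.2,3.9]}', '{"location":[5.2,3.9]}',
    '{"location":[4.9,3.4]}', '{"location":[4.2,4.6]}',
    '{"location":[4.2,4.6]}', '{"location":[3.3,4.5]}',
    '{"location":[5.2,4.4]}', '{"location":[4.9,3.4]}',
]
counts = count_emitted(captured)
print(len(counts), sum(counts.values()))
```

For the output shown here, the tally reports five distinct vectors across ten emitted documents, so each vector appears twice.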

@graytaylor0 graytaylor0 merged commit e9cbfa9 into opensearch-project:main Sep 24, 2025
5 of 6 checks passed
@graytaylor0
Member

Thanks for making and testing this change @yuinumaz
