diff --git a/docs/source/getting-started/ansible.rst b/docs/source/getting-started/ansible.rst
index 0483a917..3ea30f88 100644
--- a/docs/source/getting-started/ansible.rst
+++ b/docs/source/getting-started/ansible.rst
@@ -2,18 +2,22 @@
Set Up a Spark Cluster with Ansible
===================================
-An `Ansible `_ playbook is provided in the `ansible `_ folder of our Git repository. The Ansible playbook will install the pre-requisites, Spark, on the master and workers added to the ``ansible/inventory/hosts`` file. Scylla-migrator will be installed on the spark master node.
+An `Ansible <https://www.ansible.com/>`_ playbook is provided in the `ansible folder <https://github.com/scylladb/scylla-migrator/tree/master/ansible>`_ of our Git repository. The Ansible playbook will install the prerequisites and Spark on the master and workers listed in the ``ansible/inventory/hosts`` file. The Scylla Migrator will be installed on the Spark master node.
1. Update the ``ansible/inventory/hosts`` file with your master and worker instances (see the inventory sketch after this list).
2. Update ``ansible/ansible.cfg`` with the location of your private key, if necessary.
3. Review ``ansible/template/spark-env-master-sample`` and ``ansible/template/spark-env-worker-sample``; they contain the environment variables that determine the number of workers, CPUs per worker, and memory allocations, as well as considerations for setting them.
4. Run ``ansible-playbook scylla-migrator.yml``.
-5. On the Spark master node: ::
+5. On the Spark master node:
+
+ .. code-block:: bash
cd scylla-migrator
./start-spark.sh
-6. On the Spark worker nodes: ::
+6. On the Spark worker nodes:
+
+ .. code-block:: bash
./start-slave.sh
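+
+For reference, the inventory groups your master and worker hosts. Below is a minimal sketch in Ansible's YAML inventory format; the group names and addresses are illustrative, so keep the group names used by the sample ``ansible/inventory/hosts`` file shipped in the repository.
+
+.. code-block:: yaml
+
+   all:
+     children:
+       spark-master:            # illustrative group name
+         hosts:
+           203.0.113.10:        # address of the Spark master node
+       spark-workers:           # illustrative group name
+         hosts:
+           203.0.113.11:        # addresses of the Spark worker nodes
+           203.0.113.12: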
diff --git a/docs/source/getting-started/aws-emr.rst b/docs/source/getting-started/aws-emr.rst
index 4d523e11..80b7f7cf 100644
--- a/docs/source/getting-started/aws-emr.rst
+++ b/docs/source/getting-started/aws-emr.rst
@@ -4,18 +4,25 @@ Set Up a Spark Cluster with AWS EMR
This page describes how to use the Migrator in `Amazon EMR <https://aws.amazon.com/emr/>`_. This approach is useful if you already have an AWS account, or if you do not want to manage your infrastructure manually.
-1. Download the ``config.yaml.example`` from our Git repository. ::
+1. Download the file ``config.yaml.example`` from our Git repository.
+
+ .. code-block:: bash
wget https://github.com/scylladb/scylla-migrator/raw/master/config.yaml.example \
--output-document=config.yaml
+
2. Configure the migration according to your needs.
-3. Download the latest release of the Migrator. ::
+3. Download the latest release of the Migrator.
+
+ .. code-block:: bash
wget https://github.com/scylladb/scylla-migrator/releases/latest/download/scylla-migrator-assembly.jar
-4. Upload them to an S3 bucket. ::
+4. Upload both files to an S3 bucket.
+
+ .. code-block:: bash
aws s3 cp config.yaml s3://<bucket-name>/scylla-migrator/config.yaml
aws s3 cp scylla-migrator-assembly.jar s3://<bucket-name>/scylla-migrator/scylla-migrator-assembly.jar
@@ -24,13 +31,17 @@ This page describes how to use the Migrator in `Amazon EMR <https://aws.amazon.com/emr/>`_
aws s3 cp s3://<bucket-name>/scylla-migrator/config.yaml /mnt1/config.yaml
aws s3 cp s3://<bucket-name>/scylla-migrator/scylla-migrator-assembly.jar /mnt1/scylla-migrator-assembly.jar
-5. Upload the script to your S3 bucket as well. ::
+5. Upload the script to your S3 bucket as well.
+
+ .. code-block:: bash
aws s3 cp copy-files.sh s3://<bucket-name>/scylla-migrator/copy-files.sh
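+
+You can create the EMR cluster from the AWS console or from the CLI. As a rough sketch (the release label, instance settings, and key pair below are assumptions; adjust them to your account), the uploaded script can be attached to the cluster as a bootstrap action:
+
+.. code-block:: bash
+
+   aws emr create-cluster \
+     --name "scylla-migrator" \
+     --release-label emr-7.1.0 \
+     --applications Name=Spark \
+     --instance-type m5.xlarge \
+     --instance-count 3 \
+     --use-default-roles \
+     --ec2-attributes KeyName=<your-key-pair> \
+     --bootstrap-actions Path=s3://<bucket-name>/scylla-migrator/copy-files.sh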
@@ -48,7 +59,9 @@ This page describes how to use the Migrator in `Amazon EMR <https://aws.amazon.com/emr/>`_
-6. Finally, run the migration. ::
+6. Finally, run the migration.
+
+ .. code-block:: bash
docker compose exec spark-master /spark/bin/spark-submit --class com.scylladb.migrator.Migrator \
--master spark://spark-master:7077 \
diff --git a/docs/source/getting-started/spark-standalone.rst b/docs/source/getting-started/spark-standalone.rst
index acefc663..47ebb4b8 100644
--- a/docs/source/getting-started/spark-standalone.rst
+++ b/docs/source/getting-started/spark-standalone.rst
@@ -6,18 +6,24 @@ This page describes how to set up a Spark cluster on your infrastructure and to
1. Follow the `official documentation <https://spark.apache.org/docs/latest/spark-standalone.html>`_ to install Spark on each node of your cluster, and start the Spark master and the Spark workers.
-2. In the Spark master node, download the latest release of the Migrator. ::
+2. On the Spark master node, download the latest release of the Migrator.
+
+ .. code-block:: bash
wget https://github.com/scylladb/scylla-migrator/releases/latest/download/scylla-migrator-assembly.jar
-3. In the Spark master node, copy the file ``config.yaml.example`` from our Git repository. ::
+3. On the Spark master node, download the file ``config.yaml.example`` from our Git repository.
+
+ .. code-block:: bash
wget https://github.com/scylladb/scylla-migrator/raw/master/config.yaml.example \
--output-document=config.yaml
4. Configure the migration according to your needs.
-5. Finally, run the migration as follows from the Spark master node. ::
+5. Finally, run the migration as follows from the Spark master node.
+
+ .. code-block:: bash
spark-submit --class com.scylladb.migrator.Migrator \
--master spark://<spark-master-hostname>:7077 \
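+
+For reference, a complete invocation typically points the ``spark.scylla.config`` Spark property at your configuration file and ends with the path to the assembly jar. The paths below are illustrative:
+
+.. code-block:: bash
+
+   spark-submit --class com.scylladb.migrator.Migrator \
+     --master spark://<spark-master-hostname>:7077 \
+     --conf spark.scylla.config=/path/to/config.yaml \
+     /path/to/scylla-migrator-assembly.jar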
diff --git a/docs/source/migrate-from-cassandra-or-parquet.rst b/docs/source/migrate-from-cassandra-or-parquet.rst
index b6bdbdb4..e50d4763 100644
--- a/docs/source/migrate-from-cassandra-or-parquet.rst
+++ b/docs/source/migrate-from-cassandra-or-parquet.rst
@@ -13,7 +13,7 @@ In file ``config.yaml``, make sure to keep only one ``source`` property and one
Configuring the Source
----------------------
-The data `source` can be a Cassandra or ScyllaDB database, or a Parquet file.
+The data ``source`` can be a Cassandra or ScyllaDB table, or a Parquet file.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Reading from Cassandra or ScyllaDB
@@ -25,7 +25,7 @@ In both cases, when reading from Cassandra or ScyllaDB, the type of source shoul
source:
type: cassandra
- # host name of one of the nodes of your database cluster
+ # Host name of one of the nodes of your database cluster
host: <host>
# TCP port to use for CQL
port: 9042
@@ -117,18 +117,88 @@ In case the object is not public in the S3 bucket, you can provide the AWS crede
source:
type: parquet
- path: s3a://my-bucket/my-key.parquet
+ path: s3a://<bucket>/<key>.parquet
credentials:
accessKey: <access-key>
secretKey: <secret-key>
-Where ```` and ```` should be replaced with your actual AWS access key and secret key.
+Where ``<access-key>`` and ``<secret-key>`` should be replaced with your actual AWS access key and secret key.
-The Migrator also supports advanced AWS authentication options such as using `AssumeRole `_. Please read the `configuration reference ` for more details.
+The Migrator also supports advanced AWS authentication options such as `AssumeRole <https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html>`_. Please read the configuration reference for more details.
---------------------------
Configuring the Destination
---------------------------
+The migration ``target`` can be a Cassandra or ScyllaDB table. In both cases, use the type ``cassandra`` in the configuration. Here is a minimal ``target`` configuration to write to Cassandra or ScyllaDB:
+
+.. code-block:: yaml
+
+ target:
+ # Can be 'cassandra' or 'scylla'; it does not matter.
+ type: cassandra
+ # Host name of one of the nodes of your target database cluster
+ host: <host>
+ port: 9042
+ keyspace: <keyspace>
+ # Name of the table to write to. If it does not exist, it will be created on the fly.
+ # It must have the same schema as the source table. If needed, you can rename
+ # columns along the way; see the documentation page “Rename Columns”.
+ table: <table>
+ # Consistency Level for the target connection
+ # Options are: LOCAL_ONE, ONE, LOCAL_QUORUM, QUORUM.
+ consistencyLevel: LOCAL_QUORUM
+ # Number of connections to open to ScyllaDB/Cassandra when copying
+ connections: 16
+ # Spark pads decimals with zeros appropriate to their scale. This causes values
+ # like '3.5' to be copied as '3.5000000000...' to the target. There is currently
+ # no good way to preserve the original value, so this flag can strip trailing
+ # zeros from decimal values before they are written.
+ stripTrailingZerosForDecimals: false
+
+Where ``<host>``, ``<keyspace>``, and ``<table>`` should be replaced with your specific values.
+
+You can also set the following optional properties:
+
+.. code-block:: yaml
+
+ target:
+ # ... same as above
+
+ # Datacenter to use
+ localDC: <your-dc>
+
+ # Authentication credentials
+ credentials:
+ username: <username>
+ password: <password>
+ # SSL as per https://github.com/scylladb/spark-cassandra-connector/blob/master/doc/reference.md#cassandra-ssl-connection-options
+ sslOptions:
+ clientAuthEnabled: false
+ enabled: false
+ # All options below are optional (generally only trustStorePassword and trustStorePath are needed).
+ trustStorePassword: <password>
+ trustStorePath: <path>
+ trustStoreType: JKS
+ keyStorePassword: <password>
+ keyStorePath: <path>
+ keyStoreType: JKS
+ enabledAlgorithms:
+ - TLS_RSA_WITH_AES_128_CBC_SHA
+ - TLS_RSA_WITH_AES_256_CBC_SHA
+ protocol: TLS
+ # If timestamps are not preserved (i.e., preserveTimestamps is false in the source),
+ # the writer can enforce a single TTL or write timestamp for ALL written records.
+ # For example, the write timestamp can be set to a time BEFORE starting dual writes,
+ # which makes your migration safe from overwriting dual writes, even for collections.
+ # ALL rows written will get the same TTL, or the same write timestamp, or both
+ # (you can set just one of them, or both, or neither).
+ # TTL in seconds (e.g., 7776000 is 90 days)
+ writeTTLInS: 7776000
+ # Write timestamp in microseconds (e.g., 1640998861000000 is Saturday, January 1, 2022 2:01:01 AM GMT+01:00)
+ writeWritetimestampInuS: 1640998861000000
+
+Where ``<your-dc>``, ``<username>``, ``<password>``, and ``<path>`` should be replaced with your specific values.
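+
+Putting both pieces together, a minimal end-to-end ``config.yaml`` for a Cassandra-to-ScyllaDB migration might look like the following sketch. The source ``keyspace`` and ``table`` properties are an assumption based on ``config.yaml.example``, and other top-level sections (for example ``savepoints``) may also be required; see ``config.yaml.example`` for the full list.
+
+.. code-block:: yaml
+
+   source:
+     type: cassandra
+     host: <source-host>
+     port: 9042
+     keyspace: <keyspace>
+     table: <table>
+
+   target:
+     type: cassandra
+     host: <target-host>
+     port: 9042
+     keyspace: <keyspace>
+     table: <table>
+     consistencyLevel: LOCAL_QUORUM
+     connections: 16
+     stripTrailingZerosForDecimals: false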
diff --git a/docs/source/migrate-from-dynamodb.rst b/docs/source/migrate-from-dynamodb.rst
index b2f80442..86b675ea 100644
--- a/docs/source/migrate-from-dynamodb.rst
+++ b/docs/source/migrate-from-dynamodb.rst
@@ -3,3 +3,175 @@ Migrate from DynamoDB
=====================
+This page explains how to fill the ``source`` and ``target`` properties of the configuration file to migrate data:
+
+- from a DynamoDB table, a ScyllaDB Alternator table, or a `DynamoDB S3 export <https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/S3DataExport.HowItWorks.html>`_,
+- to a DynamoDB table or a ScyllaDB Alternator table.
+
+In file ``config.yaml``, make sure to keep only one ``source`` property and one ``target`` property, and configure them as explained in the following subsections according to your case.
+
+----------------------
+Configuring the Source
+----------------------
+
+The data ``source`` can be a DynamoDB or Alternator table, or a DynamoDB S3 export.
+
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Reading from DynamoDB or Alternator
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In both cases, when reading from DynamoDB or Alternator, the type of source should be ``dynamodb`` in the configuration file. Here is a minimal ``source`` configuration to read a DynamoDB table:
+
+.. code-block:: yaml
+
+ source:
+ type: dynamodb
+ table: <table>
+ region: <region>
+
+Where ``<table>`` is the name of the table to read, and ``<region>`` is the AWS region where your DynamoDB table is located.
+
+To read from Alternator, provide an ``endpoint`` instead of a ``region``:
+
+.. code-block:: yaml
+
+ source:
+ type: dynamodb
+ table: <table>
+ endpoint:
+ host: http://<host>
+ port: <port>
+
+Where ``<host>`` and ``<port>`` should be replaced with the host name and TCP port of your Alternator instance.
+
+In practice, your source database (DynamoDB or Alternator) may require authentication. You can provide the AWS credentials with the ``credentials`` property:
+
+.. code-block:: yaml
+
+ source:
+ type: dynamodb
+ table: <table>
+ region: <region>
+ credentials:
+ accessKey: <access-key>
+ secretKey: <secret-key>
+
+Where ``<access-key>`` and ``<secret-key>`` should be replaced with your actual AWS access key and secret key.
+
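+The ``endpoint`` and ``credentials`` properties can be combined, for instance to read from an authenticated Alternator instance. A sketch (the host name and port are illustrative):
+
+.. code-block:: yaml
+
+   source:
+     type: dynamodb
+     table: <table>
+     endpoint:
+       host: http://alternator.example.com
+       port: 8000
+     credentials:
+       accessKey: <access-key>
+       secretKey: <secret-key>
+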
+The Migrator also supports advanced AWS authentication options such as `AssumeRole <https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html>`_. Please read the configuration reference for more details.
+
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Reading a DynamoDB S3 Export
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To read the content of a table exported to S3, use the ``source`` type ``dynamodb-s3-export``. Here is a minimal source configuration:
+
+.. code-block:: yaml
+
+ source:
+ type: dynamodb-s3-export
+ # Name of the S3 bucket where the DynamoDB table has been exported
+ bucket: <bucket>
+ # Key of the `manifest-summary.json` object in the bucket
+ manifestKey: <manifest-key>
+ # Key schema and attribute definitions, see https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_TableCreationParameters.html
+ tableDescription:
+ # See https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_AttributeDefinition.html
+ attributeDefinitions:
+ - name: <name>
+ type: <type>
+ - ...
+ # See https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_KeySchemaElement.html
+ keySchema:
+ - name: <name>
+ type: <type>
+ - ...
+
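+For example, for an exported table whose partition key is a string attribute ``id`` and whose sort key is a number attribute ``ts``, the table description might look as follows (the attribute names are illustrative):
+
+.. code-block:: yaml
+
+   source:
+     type: dynamodb-s3-export
+     bucket: <bucket>
+     manifestKey: <manifest-key>
+     tableDescription:
+       attributeDefinitions:
+         - name: id
+           type: S        # string
+         - name: ts
+           type: N        # number
+       keySchema:
+         - name: id
+           type: HASH     # partition key
+         - name: ts
+           type: RANGE    # sort key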
+
+You can also provide the following optional properties:
+
+.. code-block:: yaml
+
+ source:
+ # ... same as above
+
+ # Connect to a custom endpoint instead of the standard AWS S3 endpoint
+ endpoint:
+ # Specify the hostname without a protocol
+ host: <host>
+ port: <port>
+
+ # AWS region
+ region: <region>
+
+ # Connection credentials:
+ credentials:
+ accessKey: <access-key>
+ secretKey: <secret-key>
+
+ # Whether to use “path-style access” in S3 (see https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html). Default is false.
+ usePathStyleAccess: true
+
+Where ``<host>``, ``<port>``, ``<region>``, ``<access-key>``, and ``<secret-key>`` should be replaced with your specific values.
+
+The Migrator also supports advanced AWS authentication options such as `AssumeRole <https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html>`_. Please read the configuration reference for more details.
+
+---------------------------
+Configuring the Destination
+---------------------------
+
+The migration ``target`` can be a DynamoDB or Alternator table. In both cases, use the type ``dynamodb`` in the configuration. Here is a minimal ``target`` configuration to write to DynamoDB or Alternator:
+
+.. code-block:: yaml
+
+ target:
+ type: dynamodb
+ # Name of the table to write to. If it does not exist, it will be created on the fly.
+ table: <table>
+ # Split factor for reading/writing. This is required for ScyllaDB targets.
+ scanSegments: 1
+ # Throttling settings, set based on your database capacity (or desired capacity).
+ readThroughput: 1
+ # Can be between 0.1 and 1.5, inclusive.
+ # 0.5 represents the default read rate, meaning that the job will attempt to consume half of the read capacity of the table.
+ # If you increase the value above 0.5, Spark will increase the request rate; decreasing the value below 0.5 decreases the read request rate.
+ # (The actual read rate will vary, depending on factors such as whether there is a uniform key distribution in the DynamoDB table.)
+ throughputReadPercent: 1.0
+ # Maximum number of map tasks per Spark executor.
+ maxMapTasks: 1
+ # When transferring DynamoDB sources to DynamoDB targets (such as other DynamoDB tables or Alternator tables),
+ # the Migrator supports transferring live changes occurring on the source table after transferring an initial
+ # snapshot.
+ # Please see the documentation page “Stream Changes” for more details about this option.
+ streamChanges: false
+
+Where ``<table>`` should be replaced with your specific value.
+
+You can also set the following optional properties:
+
+.. code-block:: yaml
+
+ target:
+ # ... same as above
+
+ # Connect to a custom endpoint. Mandatory if writing to ScyllaDB Alternator.
+ endpoint:
+ # If writing to ScyllaDB Alternator, prefix the host name with 'http://'.
+ host: <host>
+ port: <port>
+
+ # AWS region:
+ region: <region>
+
+ # Authentication credentials:
+ credentials:
+ accessKey: <access-key>
+ secretKey: <secret-key>
+
+ # When streamChanges is true, skip the initial snapshot transfer and only stream changes.
+ # This setting is ignored if streamChanges is false.
+ skipInitialSnapshotTransfer: false
+
+Where ``<host>``, ``<port>``, ``<region>``, ``<access-key>``, and ``<secret-key>`` should be replaced with your specific values.
+
+The Migrator also supports advanced AWS authentication options such as `AssumeRole <https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html>`_. Please read the configuration reference for more details.
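+
+Putting both pieces together, a minimal end-to-end ``config.yaml`` for migrating a DynamoDB table to a ScyllaDB Alternator table might look like the following sketch. The host name, port, and region are illustrative, and other top-level sections from ``config.yaml.example`` (for example ``savepoints``) may also be required:
+
+.. code-block:: yaml
+
+   source:
+     type: dynamodb
+     table: <table>
+     region: us-east-1                        # illustrative region
+
+   target:
+     type: dynamodb
+     table: <table>
+     endpoint:
+       host: http://alternator.example.com    # illustrative Alternator host
+       port: 8000
+     scanSegments: 1
+     readThroughput: 1
+     throughputReadPercent: 1.0
+     maxMapTasks: 1
+     streamChanges: false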