- Set up a Spark cluster with Ansible (and remove corresponding section from README)
- Set up a Spark cluster with AWS EMR
- Set up a Spark cluster manually
- Set up a Spark cluster with Docker (and remove corresponding section from README)
- Configure source for C* migration
- Minor fixes in ``config.yaml.example``
- Minor typo fixes in Ansible files
Showing 12 changed files with 376 additions and 87 deletions.
=======================
Configuration Reference
=======================

------------------
AWS Authentication
------------------

When reading from DynamoDB or S3, or when writing to DynamoDB, the communication with AWS can be configured with the properties ``credentials``, ``endpoint``, and ``region`` in the configuration:

.. code-block:: yaml

   credentials:
     accessKey: <access-key>
     secretKey: <secret-key>
   # Optional AWS endpoint configuration
   endpoint:
     host: <host>
     port: <port>
   # Optional AWS availability region, required if you use a custom endpoint
   region: <region>

Additionally, you can authenticate with `AssumeRole <https://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_cross-account-with-roles.html>`_. In such a case, the ``accessKey`` and ``secretKey`` are the credentials of the user whose access to the resource (DynamoDB table or S3 bucket) has been granted via a "role", and you need to add the property ``assumeRole`` as follows:

.. code-block:: yaml

   credentials:
     accessKey: <access-key>
     secretKey: <secret-key>
     assumeRole:
       arn: <role-arn>
       # Optional session name to use. If not set, we use 'scylla-migrator'.
       sessionName: <role-session-name>
   # Note that the region is mandatory when you use `assumeRole`
   region: <region>
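As a concrete sketch of a filled-in configuration, the following uses AWS's documented placeholder access keys and a hypothetical account ID, role name, and region; substitute your own values:

.. code-block:: yaml

   credentials:
     accessKey: AKIAIOSFODNN7EXAMPLE
     secretKey: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
     assumeRole:
       arn: arn:aws:iam::123456789012:role/migrator-role
       sessionName: scylla-migrator
   region: us-east-1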
===================================
Set Up a Spark Cluster with Ansible
===================================

An `Ansible <https://www.ansible.com/>`_ playbook is provided in the `ansible <https://github.com/scylladb/scylla-migrator/tree/master/ansible>`_ folder of our Git repository. The Ansible playbook installs the prerequisites and Spark on the master and workers added to the ``ansible/inventory/hosts`` file. The Scylla Migrator will be installed on the Spark master node.

1. Update the ``ansible/inventory/hosts`` file with your master and worker instances.
2. Update ``ansible/ansible.cfg`` with the location of your private key, if necessary.
3. The ``ansible/template/spark-env-master-sample`` and ``ansible/template/spark-env-worker-sample`` files contain environment variables determining the number of workers, CPUs per worker, and memory allocations, as well as considerations for setting them.
4. Run ``ansible-playbook scylla-migrator.yml``.
5. On the Spark master node: ::

     cd scylla-migrator
     ./start-spark.sh

6. On the Spark worker nodes: ::

     ./start-slave.sh

7. Open the Spark web console:

   - Ensure networking is configured to allow you to access the Spark master node via TCP ports 8080 and 4040.
   - Visit ``http://<spark-master-hostname>:8080``.

8. Review and modify ``config.yaml`` based on whether you are performing a migration to CQL or Alternator:

   - If you are migrating to the Scylla CQL interface (from Cassandra, Scylla, or another CQL source), make a copy of ``config.yaml.example``, review the comments, and edit as directed.
   - If you are migrating to Alternator (from DynamoDB or another Scylla Alternator), make a copy of ``config.dynamodb.yml``, review the comments, and edit as directed.

9. As part of the Ansible deployment, sample submit jobs were created. You may edit and use these submit jobs:

   - For a CQL migration: edit ``scylla-migrator/submit-cql-job.sh`` and change the line ``--conf spark.scylla.config=config.yaml \`` to point to whatever you named the ``config.yaml`` in the previous step.
   - For an Alternator migration: edit ``scylla-migrator/submit-alternator-job.sh`` and change the line ``--conf spark.scylla.config=/home/ubuntu/scylla-migrator/config.dynamodb.yml \`` to reference the ``config.yaml`` file you created and modified in the previous step.

10. Ensure the table has been created in the target environment.
11. Submit the migration by submitting the appropriate job:

    - CQL migration: ``./submit-cql-job.sh``
    - Alternator migration: ``./submit-alternator-job.sh``

12. You can monitor progress by observing the Spark web console you opened in step 7. Additionally, after the job has started, you can track progress via ``http://<spark-master-hostname>:4040``.

Note: when no Spark jobs are actively running, the Spark progress page at port 4040 displays as unavailable. It only renders while a Spark job is in progress.
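For step 1 above, an Ansible inventory in INI format typically looks like the following sketch. The group names and IP addresses here are illustrative assumptions, not the playbook's actual values; check the sample ``ansible/inventory/hosts`` file shipped in the repository for the group names the playbook expects. ::

    # Hypothetical group names -- consult the repository's sample inventory
    [spark-master]
    10.0.0.10

    [spark-worker]
    10.0.0.11
    10.0.0.12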
==================================
Set Up a Spark Cluster with Docker
==================================

This page describes how to set up a Spark cluster locally on your machine by using Docker containers. This approach is useful if you do not need a high level of performance and want to quickly try out the Migrator without having to set up a real cluster of nodes. It requires Docker and Git.

1. Clone the Migrator repository. ::

     git clone https://github.com/scylladb/scylla-migrator.git
     cd scylla-migrator

2. Download the latest release of ``scylla-migrator-assembly.jar`` and put it in the directory ``migrator/target/scala-2.13/``. ::

     mkdir -p migrator/target/scala-2.13
     wget https://github.com/scylladb/scylla-migrator/releases/latest/download/scylla-migrator-assembly.jar \
       --directory-prefix=migrator/target/scala-2.13

3. Start the Spark cluster. ::

     docker compose up -d

4. Open the Spark web UI at ``http://localhost:8080``.

   Tip: add the following aliases to your ``/etc/hosts`` to make links work in the Spark UI: ::

     127.0.0.1 spark-master
     127.0.0.1 spark-worker

5. Rename the file ``config.yaml.example`` to ``config.yaml``, and `configure </getting-started/#configure-the-migration>`_ it according to your needs.

6. Finally, run the migration. ::

     docker compose exec spark-master /spark/bin/spark-submit --class com.scylladb.migrator.Migrator \
       --master spark://spark-master:7077 \
       --conf spark.driver.host=spark-master \
       --conf spark.scylla.config=/app/config.yaml \
       /jars/scylla-migrator-assembly.jar

   The ``spark-master`` container mounts the ``./migrator/target/scala-2.13`` directory on ``/jars`` and the repository root on ``/app``.

7. You can monitor progress by observing the Spark web console you opened in step 4. Additionally, after the job has started, you can track progress via ``http://localhost:4040``.

Note: when no Spark jobs are actively running, the Spark progress page at port 4040 displays as unavailable. It only renders while a Spark job is in progress.