This article discusses how to extract and synchronize data from an on-premises Oracle database to MaxCompute. It is written for software engineers, DBAs, and architects who already have a basic understanding of Alibaba Cloud MaxCompute, RDS for MySQL, and VPC network components, as well as basic knowledge of the Oracle database.
In this article we deploy an Oracle Database 11.2.0.4.0 (not RAC) on an ECS instance and treat it as an existing Oracle database in a customer's production environment. We use a data generator tool to mock data for testing. Because DTS does not currently support migrating data directly from an Oracle database to MaxCompute, we need an RDS for MySQL instance as a transit node in the target Alibaba Cloud account. The migration process is divided into two phases: the data is first migrated from the origin database to the transit node, and then from the transit node to MaxCompute. The architecture looks like this:
Note: Data Integration provides another way to periodically synchronize data from Oracle to MaxCompute, but this method can only be used when the source table has a modification time field. Read more.
Deploying and installing the Oracle database is not one of the topics this article covers, but you can follow the Oracle documentation to install an Oracle Database 11g on an ECS instance.
Before migrating and synchronizing the data from the origin database to RDS for MySQL, we should make sure the origin DB already satisfies these prerequisites (see Migrate data from on-premises Oracle to RDS for MySQL):
1.Origin DB account: SYSDBA
2.Target DB privilege: read/write
3.The origin Oracle DB version is one of 10g, 11g, or 12c.
4.Oracle has supplemental logging enabled (supplemental_log_data_pk and supplemental_log_data_ui).
5.Oracle has archive log mode enabled.
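Before making any changes, you can check the current settings from SQL*Plus; a quick sketch using the standard V$DATABASE view:
SQL> SELECT log_mode FROM v$database;
SQL> SELECT supplemental_log_data_pk, supplemental_log_data_ui FROM v$database;
LOG_MODE should report ARCHIVELOG and both supplemental logging columns should report YES; the steps below show how to turn them on if they are not.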
- Firstly, connect to the Oracle database using the SYSDBA account and set up the instance:
[oracle@oracle11g ~]$ sqlplus / as sysdba
SQL> **shutdown immediate;**
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> **startup mount;**
ORACLE instance started.
Total System Global Area 1.2549E+10 bytes
Fixed Size 12155024 bytes
Variable Size 6744442736 bytes
Database Buffers 5771362304 bytes
Redo Buffers 21397504 bytes
Database mounted.
SQL> **alter database archivelog;**
Database altered
SQL> **alter database open;**
Database altered
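If the supplemental logging prerequisites (supplemental_log_data_pk and supplemental_log_data_ui) are not yet met, they can be enabled in the same SQL*Plus session; a minimal sketch:
SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (UNIQUE) COLUMNS;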
- Use a data generator to create mock data and insert it into the Oracle database
For this demo, we mock a user table t_user:
CREATE TABLE "GEOFFREY"."T_USER"
( "ID" NUMBER(10,0) NOT NULL ENABLE,
"ADDR" VARCHAR2(35 BYTE),
"GMT_MODIFIED" DATE NOT NULL ENABLE,
CONSTRAINT "T_USER_PK" PRIMARY KEY ("ID")
USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
TABLESPACE "TEST_DATA" ENABLE
) SEGMENT CREATION DEFERRED
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
NOCOMPRESS LOGGING
TABLESPACE "TEST_DATA" ;
- And generate 1 million rows of test data into the table:
Now we have 1 million mocked rows in the t_user table.
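If no GUI data generator is available, roughly the same mock data can be produced directly in SQL*Plus; a minimal sketch against the T_USER table above (the ADDR values are arbitrary placeholders):
INSERT /*+ APPEND */ INTO "GEOFFREY"."T_USER" ("ID", "ADDR", "GMT_MODIFIED")
SELECT level,
       'Street ' || MOD(level, 1000) || ' City ' || MOD(level, 100),
       SYSDATE - MOD(level, 365)
  FROM dual
CONNECT BY level <= 1000000;
COMMIT;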
1.Purchase an RDS for MySQL 5.6 instance from the RDS console; the MySQL instance below will serve as the transit node for moving the data from Oracle to MaxCompute:
2.Create a migration account on the transit node; in this case we created a normal-type account named dba_mgr:
3.Create the target database (db_1) from the console and grant read/write privileges to the account created in the step above (the grants can be verified with the sketch below):
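Before configuring DTS, it is worth confirming that the account can actually reach db_1. A minimal sketch from any MySQL client (the host name is a placeholder, and the account is assumed to have been created with the default '%' host):
mysql -h <rds-instance-host> -P 3306 -u dba_mgr -p
mysql> SHOW GRANTS FOR 'dba_mgr'@'%';
mysql> SHOW DATABASES;
db_1 should appear in the database list, and the grants should include read/write privileges on db_1.*.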
Log on to the DTS console with the same account that owns the RDS for MySQL instance, and create a migration task named oracle2mc.
Step 1: Set up the Oracle connection string and account in the console.
Before moving on to the next step, we run ‘Test the connection’ to rule out any connectivity issues.
Step 2: Specify the source objects you want to include in the task from the Migration objects box, and also check Migrate object structure, Migrate existing data, and Replicate data changes:
Note:
1.Data migration only copies data and structure from the source database to the target database; it has no influence on the source database.
2.DDL operations are not allowed during the data migration process; otherwise, the migration task may fail.
To change the target object name, we can edit it directly in the Selected objects box:
It is also convenient to filter out columns from the source table when needed:
Step 3: Check the new column option to ensure the consistency of data synchronization for tables without a primary key. Then click the Pre-check and start button at the bottom left, and wait for the pre-check to finish:
Fortunately, we passed all the check items. Now, let’s purchase and start the migration task:
The migration starts:
Existing data migration is finished and replication continues:
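A quick spot check on the transit node confirms that the full load arrived; a minimal sketch (assuming the table kept its original name under db_1):
mysql> SELECT COUNT(*) FROM db_1.t_user;
mysql> SELECT * FROM db_1.t_user ORDER BY id LIMIT 5;
The count should match the 1 million rows generated earlier.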
Create a MaxCompute project from the DataWorks console:
Now we have a new MaxCompute project named ‘e_commerce’:
1.Go to the DTS console and create a synchronization task:
2.In this step, you do not need to authorize manually; DTS will handle it:
3.After the data transmission task is configured successfully, a synchronization task appears in the list:
4.Let's wait here until the Data Initialization task has completed:
5.Now we can switch to the DataWorks console to verify whether the data has been synchronized successfully (see the query sketch below):
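A simple way to verify is to run an ad-hoc query in DataWorks against the tables created by the synchronization task; a minimal sketch (it assumes the base and incremental tables are named t_user_base and t_user_log, as used in the merge SQL below):
SELECT COUNT(*) FROM t_user_base;
SELECT COUNT(*) FROM t_user_log;
The base table should hold the full snapshot, while the log table grows as incremental changes are replicated.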
This section describes how to get the full data of a table at a specific time based on the base table and the incremental log data that have been synchronized to MaxCompute. DTS provides the ability to achieve full data merging with MaxCompute SQL.
The full data at time 't' is obtained with a MaxCompute SQL statement that merges the base table and the log table. The MaxCompute SQL is written as follows:
INSERT OVERWRITE TABLE t_user
SELECT id
,addr
,gmt_modified
FROM (
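-- Rank all images of each row: the newest incremental record (largest record_id) comes first,
-- and base rows (record_id = 0) only rank first when no change exists for that id.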
SELECT ROW_NUMBER() OVER(PARTITION BY t.id ORDER BY record_id DESC, after_flag DESC) AS row_number
,record_id
,operation_flag
,after_flag
,id
,addr
,gmt_modified
FROM (
SELECT incr.record_id
,incr.operation_flag
,incr.after_flag
,incr.id
,incr.addr
,incr.gmt_modified
FROM t_user_log incr
WHERE utc_timestamp < unix_timestamp(dateadd(GETDATE(), -8, 'HH'))
AND modifytime_year <= TO_CHAR(GETDATE(), 'yy')
UNION ALL
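-- Treat every row of the base (full-load) table as an initial insert with record_id 0,
-- so any later change from the log table takes precedence.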
SELECT 0 AS record_id
,'I' AS operation_flag
,'Y' AS after_flag
,base.id
,base.addr
,base.gmt_modified
FROM t_user_base base
) t
) gt
WHERE gt.row_number = 1
AND gt.after_flag = 'Y'
;
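Once the merge statement has run, the reconstructed t_user table can be sanity-checked with a couple of ad-hoc queries; a minimal sketch:
SELECT COUNT(*) FROM t_user;
SELECT * FROM t_user ORDER BY gmt_modified DESC LIMIT 10;
The row count should match the source table in Oracle at the chosen point in time.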
You can also use DataWorks to add a Merge node before subsequent computational analysis operations. When the full data merge is completed, subsequent computational analysis nodes can be scheduled automatically. At the same time, the scheduling period can be configured to perform periodic offline data analysis.
Up to now, we have finished the data migration and synchronization tasks from Oracle to MaxCompute, covering both the MySQL transit database synchronization to MaxCompute and the full data merge.
Don't hesitate to contact us if you have questions or remarks.