MIMIC-IV was used as the source dataset for this demonstration. The dataset is hosted on PhysioNet, and access requires credentials.
Follow the instructions given on the OHDSI WhiteRabbit GitHub page.
A typical ETL script generated by RabbitInAHat will contain an INSERT and a SELECT statement bundled into a single query.
What does that mean for a non-database expert?
It means that the query inserts rows into the target schema, using the result of the transformations that the SELECT statement performs on the source schema. This works when the source and target schemas are hosted on the same database server.
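To make the idea concrete, here is a minimal sketch of an INSERT and SELECT bundled into one query. It is not the script RabbitInAHat produces: the table names `src_patients` and `cdm_person`, the reduced column sets, and the use of an in-memory SQLite database as the shared server are all assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the shared database server (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified source table (MIMIC-IV style).
    CREATE TABLE src_patients (
        subject_id INTEGER, gender TEXT, anchor_age INTEGER, anchor_year INTEGER
    );
    -- Hypothetical, simplified target table (OMOP CDM style).
    CREATE TABLE cdm_person (
        person_id INTEGER, gender_concept_id INTEGER, year_of_birth INTEGER
    );
    INSERT INTO src_patients VALUES (1, 'M', 50, 2150), (2, 'F', 30, 2180);
""")

# One statement: the SELECT transforms the source rows,
# and the INSERT writes the transformed rows into the target table.
conn.execute("""
    INSERT INTO cdm_person (person_id, gender_concept_id, year_of_birth)
    SELECT subject_id,
           CASE gender WHEN 'M' THEN 8507 WHEN 'F' THEN 8532 ELSE 0 END,
           anchor_year - anchor_age
    FROM src_patients
""")
conn.commit()

person_rows = conn.execute(
    "SELECT * FROM cdm_person ORDER BY person_id"
).fetchall()
print(person_rows)
```

Because both tables live in the same database, the transformation never leaves the server; this is the property the answer above relies on.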
What if the source schema is stored in a remote server?
If the source schema is stored on a remote server, you can either ask the administrator of the source database to grant you access privileges on the remote server,
OR
fetch the raw database from the remote server and deploy it on your target database server (not ideal).
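The second option can be sketched as a batched table copy between two connections. This is only an illustration under stated assumptions: two in-memory SQLite databases stand in for the remote and target servers, `copy_table` is a hypothetical helper, and the `admissions` table is reduced to two columns. With real servers you would use their own drivers or dump and restore tools instead.

```python
import sqlite3

def copy_table(remote, local, table, batch_size=1000):
    """Copy every row of `table` from the remote connection to the local one.

    `table` is interpolated into the SQL text, so it must be a trusted name.
    The table is assumed to exist already on the local side.
    """
    cursor = remote.execute(f"SELECT * FROM {table}")
    placeholders = ", ".join(["?"] * len(cursor.description))
    copied = 0
    while True:
        batch = cursor.fetchmany(batch_size)  # read in chunks to bound memory
        if not batch:
            break
        local.executemany(f"INSERT INTO {table} VALUES ({placeholders})", batch)
        copied += len(batch)
    local.commit()
    return copied

# Stand-ins for the remote source server and the local target server.
remote = sqlite3.connect(":memory:")
local = sqlite3.connect(":memory:")
ddl = "CREATE TABLE admissions (hadm_id INTEGER, subject_id INTEGER)"
remote.execute(ddl)
local.execute(ddl)
remote.executemany(
    "INSERT INTO admissions VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)]
)
remote.commit()

copied = copy_table(remote, local, "admissions", batch_size=2)
print(copied)
```

Once the raw tables exist on the target server, a combined INSERT and SELECT query can run there as if the source had been local from the start.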
Which database server should be used for deployment?
For deployment, any relational database can be used, such as MySQL or PostgreSQL.
Are there other ways to build ETL pipelines instead of creating SQL scripts?
Yes, there are several ways to build an ETL pipeline. Broadly, one can:
- Use the ETL specifications, which contain the mappings, to build the pipeline in any programming language.
- Store the data in a NoSQL database if needed. This depends on the requirements of the downstream applications.
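As a rough sketch of the first approach, the mappings from an ETL specification could be expressed as plain Python rules and applied row by row. The `PERSON_MAPPING` dictionary, the `transform` function, and the sample row below are hypothetical; a real specification would cover many more tables and columns.

```python
# Hypothetical mapping spec: target column -> rule applied to one source row.
PERSON_MAPPING = {
    "person_id": lambda row: row["subject_id"],
    "gender_concept_id": lambda row: {"M": 8507, "F": 8532}.get(row["gender"], 0),
    "year_of_birth": lambda row: row["anchor_year"] - row["anchor_age"],
}

def transform(rows, mapping):
    """Apply every mapping rule to every source row and return target rows."""
    return [
        {target: rule(row) for target, rule in mapping.items()}
        for row in rows
    ]

# Illustrative source row, not real patient data.
source_rows = [
    {"subject_id": 1, "gender": "F", "anchor_age": 30, "anchor_year": 2180},
]
person_rows = transform(source_rows, PERSON_MAPPING)
print(person_rows)
```

Keeping the mapping as data, separate from the code that applies it, means the same `transform` function can be reused for other tables, and the resulting rows can be written to a relational or a NoSQL store depending on what the downstream applications need.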