Skip to content
QIUSHI BAI edited this page Aug 12, 2019 · 2 revisions

MySql Adapter

System's Logic & Quick Start:

  • Before quick start:

    • Follow the instructions on this page to install sbt.
    • Install Docker (version at least 1.10) on the local machine;
    • Clone Cloudberry codebase from github.
    ~> git clone https://github.com/ISG-ICS/cloudberry.git
    

    Suppose the repository is cloned to the folder ~/cloudberry.

  • Step 1: run script/dockerRunxxxDB.sh: insert data into MySQL

    ~/> cd ~/cloudberry/examples/twittermap
    ~/twittermap> ./script/dockerRunMySqlDB.sh
    

    The 2nd command will download and run a prebuilt MySQL docker container. The process will take 2-5 minutes or even longer, depending on your network speed.

  • Step 2: setup config file in Cloudberry: Copy twittermap/web/cloudberry-mysqldb.conf's content into cloudberry/neo/conf/application.conf and cover the original codes.

  • Step 3: run cloudberry

    ~/> cd ~/cloudberry/cloudberry
    ~/cloudberry> sbt "project neo" "run"
    
  • Step 4: change config file in Twittermap: change # sqlDB = MySQL into sqlDB = MySQL, this change would:

    • Let the Twittermap register MySQLMigration's schema, i.e.,schema in cloudberry when using MySQL DB, (file MySqlMigration_20170810 might be renamed into cloudberryDDL in the future).
    • Use common/services-mysql.js instead when sending queries to cloudberry middleware. The apply of services-mysql.js is set in views/twittermap/main.scala.html.
  • Step 5: run Twittermap

    ~/> cd ~/cloudberry/examples/twittermap
    ~/twittermap> sbt "project web" "run 9001"
    
  • Step 6: MySqlMigration file would register schemas by sending them to cloudberry, the services-mysql.js file would send send request and get response from cloudberry middleware.

  • Step 7: MySqlGenerator would translate the request query into MySQL's query, and send it to MySqlConn, MySqlConn would send these queries to MySQL DB. The result would be send back to Twittermap. The website would be visited on http://localhost:9001 now

The Schema:

In file ./script/ingestMySqlTwitterToLocalCluster.sh, you'll the see the schema of sample data's schema in MySQL DB:

CREATE TABLE IF NOT EXISTS `twitter_ds_tweet` (
  `place.full_name` varchar(255) DEFAULT NULL,
  `place.bounding_box` varchar(255) DEFAULT NULL,
  `place.country_code` varchar(255) DEFAULT NULL,
  `user.friends_count` bigint(20) NOT NULL,
  `user.description` text NOT NULL,
  `favorite_count` bigint(20) NOT NULL,
  `geo_tag.countyID` bigint(20) DEFAULT NULL,
  `user.location` varchar(255) NOT NULL,
  `place.type` varchar(255) DEFAULT NULL,
  `geo_tag.cityName` varchar(255) DEFAULT NULL,
  `user.id` double NOT NULL,
  `geo_tag.stateName` varchar(255) DEFAULT NULL,
  `geo_tag.cityID` double DEFAULT NULL,
  `is_retweet` double NOT NULL,
  `text` text NOT NULL,
  `user.screen_name` varchar(255) NOT NULL,
  `retweet_count` bigint(20) NOT NULL,
  `place.country` varchar(255) DEFAULT NULL,
  `in_reply_to_user` bigint(20) NOT NULL,
  `user.statues_count` varchar(255) NOT NULL,
  `id` bigint(20) NOT NULL,
  `coordinate` point DEFAULT NULL,
  `place.id` varchar(255) DEFAULT NULL,
  `in_reply_to_status` bigint(20) NOT NULL,
  `geo_tag.stateID` bigint(20) DEFAULT NULL,
  `create_at` datetime NOT NULL,
  `user.create_at` datetime NOT NULL,
  `place.name` varchar(255) DEFAULT NULL,
  `lang` varchar(255) NOT NULL,
  `user.lang` varchar(255) NOT NULL,
  `user.name` varchar(255) NOT NULL,
  `geo_tag.countyName` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `create_at` (`create_at`),
  FULLTEXT KEY `idx` (`text`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  • Flattened data: We've flattened the hierarchy records, and use their name to demonstrate their original hierarchy relation. Like "user.id" denoted the id for user.

  • Leave out array columns: Since the format of array is not supported in MySqlDB, we left out column hashtags and user_mentions from MySqlDB. Hence the Twittermap's query would also exclude the HashtagRequest.

  • Full text search: add full text search index on column text.

Sample Datasets:

We changed the original sample data sample.adm.gz into its json format: sample.json to read each record via php. This process can be done via running twittermap/script/reconstructAdmSampleTweetToJson.sh (requires python environment).

Docker Containers:

In file ./script/dockerRunMySqlDB.sh, we use image mysql:latest for setting up MySQL database.

Also in file ./script/dockerRunMySqlDB.sh, we use image spittet/php-mysql docker container to inject records from ./script/sample.json into mysql-container via php. And publish mysql-container to port 6603. Remember to pass a Gateway from dockerRunMySql.sh to ingestMySqlSampleTweetToLocalCluster.sh as mysql connection's host, it might be different under different circumstances.

After you have installed the docker container for the first time, you can restart it using the following command in case the docker is shutdown:

~> docker start mysql-container

If you want to shutdown the docker container later, you can use the following command:

~> docker stop mysql-container

Cloudberry Middleware:

SQLDB's Generator structure:

  • SQLGenerator: an basic abstract class for all SQL family generators.
  • MySqlGenerator: an extended class from SQLGenerator for MySQL DB.
  • PostgreSqlGenerator: an extended class from SQLGenerator for PostgreSQL DB.

Here are some descriptions about the Generator:

  • SQLGenerator can cover up most translation functions for SQL family.

  • MySqlGenerator contains (also for PostgreSqlGenerator):

    1. quote, truncate, fullTextMatch, geoAsText, pointGetCoord (p.s.: geo type is very special and complex in MySQL).
    2. parseCreate
    3. fieldType2MySQLType
    4. parseUpsertMeta
    5. initExprMap
    6. parseTextRelation
    7. parseUnnest
    8. parseGroupByFunc (case Interval)
    9. parseGeoCell

    These values and functions are different in most DB of SQL, so we specify them out into their own generator. And most other functions (like parseQuery, parseAppend, parseLookup … ) is currently sharable for MySQL & PostgreSQL.

  • Unsupported fieldTypes in MySqlGenerator:

    The schema send to the generator should not contain any Bag, Hierarchy, or Record fields.

  • MetaData in SQLGenerator:

    Considering the particularity of MetaData, we specially use function parseUpsertMeta to insert json type records into the berry.meta table.

Connector:

  • The connector only differs in the url, but you might have to add another dependency when adding a new sql database, because the driver would be different during the connection.

TwitterDataStoreWithoutHashtag & TwitterDataStoreWithHashtag:

  • The former is for MySql's sample tweet schema, the latter is for PostgreSQL. For the reason that MySQL doesn't support BagField(in middleware) or Array type(in database), we left the hashtag and user_mentions out in TwitterDataStoreWithoutHashtag . PostgreSQL supports Array though.

Migration_20160814:

  • You need to add a match case in Migration_20160814, which would send the metaData's query of creation.

  • Also, in class Cloudberry, a case should be added for config.AsterixLang. It would decide which DB, generator, connector the back-end would use.

Twittermap:

CloudberryDDL (currently named as (MySql)Migration_20170810):

Contains the cloudberry's table's schema, register the schema each time the Twittermap starts up.

The schema for twiiter (in SqlDB is named twitter_ds_tweet) should be the same as back-end's TwitterDataStore's schema.

common/services-mysql.js:

Sends request to middleware. in this js file, we removed byHashtagRequest, for its field is not supported in MySQL.

Configuration:

Twittermap's conf:

Change web/conf/application.conf:

# sqlDB = MySQL

into

sqlDB = MySQL

Cloudberry's conf:

Copy the twittermap/web/cloudberry-mysql.conf's content into cloudberry/neo/conf/application.conf and cover the original codes. These two files differs in asterixdb.lang and its accordance url.

How to Add a New SQL Database:

  • Back-end:

    • add a Generator (extends SQL’s Generator).

    • add a Conn (extends SQL’s Conn).

    • (probably) add a Back-end Schema: TwitterDataStore:

      • If your schema differs from those already exists: TwitterDataStoreWithHashtag & TwitterDataStoreWithoutHashtag.
  • Front-end:

    • (probably) add a services-xx.js & Migration file: send request & registration.

      • If your schema differs from those already exists: TwitterDataStoreWithHashtag & TwitterDataStoreWithoutHashtag.
      • The migration's tweet schema should be same as Back-end tweet schema.
  • Config files: 

    • add config file as twittermapweb/conf/cloudberry-xxx.conf, and amend the asterixdb.lang, add its url of connection.
    • add config (# sqlDB = xxxDB) in twittermapweb/conf/application.conf.
  • Add dependency for db's driver.

  • Insertion files (run DB in a docker container):

    • add dockerRun___DB.sh
    • add insert___DBSampleTweetToLocalCluster.sh

Future work:

Generalize Twittermap’s common/services-xx.js:

Write a new module(‘cloudberry.util’) as a basic module for sharable request functions in those services files. Reconstruct services-xx.js files, add ‘cloudberry.util’ to their dependencies.

Reconstruct Twittermap applications:

Reconstruct the directory as follow:

  • Twittermap
    • TwittermapForAsterixDB
    • TwittermapForMySqlDB
    • TwittermapForPostgreSqlDB
    • DrugMap

The TwitermapForxxxDB application differs in:

  • web/conf/cloudberry-xxxdb.conf
  • web/public/common/services.js
  • web/app/model/Migration_date.scala