We are building a tier 1 service to cleanse, ingest, store and query data representing people's employment history and company data. The data lives locally in a relational database (PostgreSQL), and ingestion and querying are carried out through an ORM (SQLAlchemy). The data is lightly cleansed to make the relational mapping between tables and columns more accurate (the logic can be found in the load script). The backend app is built with Flask, with the intention of making it easy to extend with a front-end framework in the future.
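The load script's actual cleansing rules are not reproduced here. As a minimal sketch, assuming "light cleansing" mainly means normalizing the join keys, the hypothetical helpers `normalize_name` and `normalize_linkedin` below lowercase and trim values and reduce a LinkedIn company URL to its handle, so the same company written slightly differently maps to one key:

```python
import re
from typing import Optional


def normalize_name(value: Optional[str]) -> Optional[str]:
    # Hypothetical rule: lowercase, trim and collapse runs of whitespace.
    if value is None:
        return None
    cleaned = re.sub(r"\s+", " ", value).strip().lower()
    return cleaned or None  # blank strings become None so they never act as join keys


def normalize_linkedin(value: Optional[str]) -> Optional[str]:
    # Hypothetical rule: accept a bare handle ("acme-inc") or a full company URL
    # and keep only the handle.
    cleaned = normalize_name(value)
    if cleaned is None:
        return None
    match = re.search(r"linkedin\.com/company/([^/?#]+)", cleaned)
    handle = match.group(1) if match else cleaned
    return handle.strip("/") or None


print(normalize_name("  Acme   Inc "))  # acme inc
print(normalize_linkedin("https://www.linkedin.com/company/Acme-Inc/"))  # acme-inc
```

Turning blank values into `None` keeps empty strings from accidentally matching each other when tables are joined.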
- Not all the companies people worked for are in the companies table.
- There are some companies that have no people data stored
- Some companies have multiple entries in the companies table
- The company name people provide might not match the name in the company data. Companies can be matched to the people data by company name and by company LinkedIn name.
- There is a one-to-many mapping between company LinkedIn names and company names in the companies table.
- If people did not provide an end date, we assume they currently work there.
- If the title people provide includes the substring "founder", they are a founder at the company.
- Companies can be referenced accurately by people using the company name or the company LinkedIn name.
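The rules above can be sketched in a few lines. This is a hypothetical illustration, not the load script's code: the `Company` and `Employment` records and their field names are assumptions, and the founder check is made case-insensitive so titles such as "Co-Founder" count. A job is matched to every company row whose name or LinkedIn name agrees, because duplicate entries and the one-to-many LinkedIn mapping mean more than one row can match:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Company:  # hypothetical record shape
    company_id: int
    name: Optional[str]
    linkedin_name: Optional[str]


@dataclass
class Employment:  # hypothetical record shape
    person_id: int
    company_name: Optional[str]
    company_li_name: Optional[str]
    title: Optional[str]
    start_date: Optional[date]
    end_date: Optional[date]


def _key(value: Optional[str]) -> Optional[str]:
    # Simple comparison key; blank or missing values never match anything.
    return value.strip().lower() if value and value.strip() else None


def is_current(job: Employment) -> bool:
    # No end date means the person is assumed to still work there.
    return job.end_date is None


def is_founder(job: Employment) -> bool:
    # "founder" anywhere in the title (case-insensitive here) marks a founder.
    return "founder" in (job.title or "").lower()


def matching_companies(job: Employment, companies: List[Company]) -> List[Company]:
    # Match on company name OR company LinkedIn name; several rows may match.
    name, li_name = _key(job.company_name), _key(job.company_li_name)
    return [
        c for c in companies
        if (name is not None and _key(c.name) == name)
        or (li_name is not None and _key(c.linkedin_name) == li_name)
    ]
```

For example, a job listed at "ACME " with the title "Co-Founder & CEO" and no end date would match a company row named "Acme" and be counted as both current and a founder role.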
- Download the project and set it up on your local machine.
- Set up a virtual environment for the project with Python 3.8 or above. (Lower versions might lead to package dependency issues with Flask-SQLAlchemy.)
- cd into the main project folder and run the following command to install all dependencies.
  `pip install -r requirements.txt`
- Once all the required packages are installed, launch the app by running the command below.
  `flask run`
When the app starts, the data is automatically loaded into the database, which keeps setup quick. In the future, loading and serving may need to be split into two separate components to make the engine easier to maintain. The database can be reached at the following connection URL:
  `postgresql://admin:admin@localhost:5432/poojakale`
- There are 3 APIs, defined as follows:
  - `/average-funding-by-person/[person_id]`
  - `/companies-by-person/[person_id]`
  - `/investors-by-company/[company_li_name]`
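To make the three endpoints concrete, here is a minimal sketch of the queries they might run. It swaps in Python's built-in `sqlite3` module for PostgreSQL and SQLAlchemy so it runs standalone. The schema (the `companies`, `investors` and `people` tables and the `total_funding` column) is hypothetical, as is the reading of "average funding" as the mean `total_funding` of the distinct companies a person worked at. Each function stands in for one route and takes that route's `[person_id]` or `[company_li_name]` segment as its argument:

```python
import sqlite3
from typing import List, Optional

# Hypothetical schema standing in for the real PostgreSQL tables.
SCHEMA = """
CREATE TABLE companies (
    company_id    INTEGER PRIMARY KEY,
    name          TEXT,
    linkedin_name TEXT,
    total_funding REAL
);
CREATE TABLE investors (company_id INTEGER REFERENCES companies(company_id), investor TEXT);
CREATE TABLE people (
    person_id INTEGER, company_name TEXT, company_li_name TEXT, title TEXT, end_date TEXT
);
"""

# A job links to every company row whose name OR LinkedIn name matches;
# DISTINCT keeps each company once per person.
MATCHED_COMPANIES = """
SELECT DISTINCT c.company_id AS company_id, c.name AS name, c.total_funding AS total_funding
FROM people p
JOIN companies c ON c.name = p.company_name OR c.linkedin_name = p.company_li_name
WHERE p.person_id = ?
"""


def companies_by_person(conn: sqlite3.Connection, person_id: int) -> List[str]:
    # Stand-in for /companies-by-person/[person_id]
    rows = conn.execute(MATCHED_COMPANIES + "ORDER BY company_id", (person_id,))
    return [name for _, name, _ in rows]


def average_funding_by_person(conn: sqlite3.Connection, person_id: int) -> Optional[float]:
    # Stand-in for /average-funding-by-person/[person_id];
    # AVG skips NULL funding and yields None when no company matched.
    query = f"SELECT AVG(total_funding) FROM ({MATCHED_COMPANIES})"
    return conn.execute(query, (person_id,)).fetchone()[0]


def investors_by_company(conn: sqlite3.Connection, company_li_name: str) -> List[str]:
    # Stand-in for /investors-by-company/[company_li_name]; one LinkedIn name can
    # cover several company rows, so investors are collected across all of them.
    rows = conn.execute(
        """SELECT DISTINCT i.investor
           FROM investors i JOIN companies c ON c.company_id = i.company_id
           WHERE c.linkedin_name = ?
           ORDER BY i.investor""",
        (company_li_name,),
    )
    return [investor for (investor,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO companies VALUES (?, ?, ?, ?)", [
    (1, "Acme", "acme-inc", 10.0),
    (2, "Acme Labs", "acme-inc", 30.0),  # second name sharing one LinkedIn name
    (3, "Globex", "globex", None),       # funding unknown
])
conn.executemany("INSERT INTO investors VALUES (?, ?)",
                 [(1, "Fund A"), (2, "Fund B"), (2, "Fund A")])
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?)", [
    (7, "ACME", "acme-inc", "Co-founder", None),     # name differs, LinkedIn matches
    (7, "Globex", None, "Engineer", "2020-01-31"),  # matched by name only
])

print(companies_by_person(conn, 7))            # ['Acme', 'Acme Labs', 'Globex']
print(average_funding_by_person(conn, 7))      # 20.0
print(investors_by_company(conn, "acme-inc"))  # ['Fund A', 'Fund B']
```

Joining on name OR LinkedIn name and deduplicating with `DISTINCT` means a job that matches two company rows (two names sharing one LinkedIn name) links to both, while a company matched by both keys is counted only once. In the Flask app, each function could sit behind its route and return its result as JSON.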