This project works with data that could have come from a real-world web application, with fields representing the information a web server would record, such as HTTP status codes and URL paths. The web server and the reporting tool both connect to the same database, allowing information to flow from the web server into the report. Exploring a large database with over a million rows, we build and refine complex queries and use them to draw business conclusions from the data; a sketch of one such query follows the question list below.
This project answers three questions:
- What are the most popular three articles of all time?
- Who are the most popular article authors of all time?
- On which days did more than 1% of requests lead to errors?
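As a rough illustration of the kind of query the report builds, the sketch below answers the first question. It is not the repository's actual implementation: it assumes psycopg2 is installed and the standard newsdata schema (an articles table with title and slug columns, and a log table whose path ends with the article slug and whose status holds values such as '200 OK').

```python
# Sketch only: the three most-viewed articles, assuming the newsdata schema.
import psycopg2

TOP_ARTICLES_QUERY = """
    SELECT articles.title, count(*) AS views
    FROM articles
    JOIN log ON log.path = '/article/' || articles.slug
    WHERE log.status = '200 OK'
    GROUP BY articles.title
    ORDER BY views DESC
    LIMIT 3;
"""


def top_three_articles():
    """Return (title, views) rows for the three most popular articles."""
    conn = psycopg2.connect(dbname="news")
    try:
        cur = conn.cursor()
        cur.execute(TOP_ARTICLES_QUERY)
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for title, views in top_three_articles():
        print("{} - {} views".format(title, views))
```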
This project uses a web server with PostgreSQL and Python 3. You can run it on a real server or in a virtual machine. If you need help installing and setting up a virtual machine with VirtualBox and Vagrant, check the reference link below under Resources.
- Clone this git repository:
$ git clone https://github.com/psaviott/Logs-Analysis.git
- Download the newsdata.sql file.
- Set up the database:
  - Create a database named news:
user=> CREATE DATABASE news;
  - Load newsdata.sql into the local database:
$ psql -d news -f newsdata.sql
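Once the import finishes, a quick row count is an easy sanity check. This is a minimal sketch, assuming psycopg2 is installed and the newsdata log table exists; it is not part of the repository.

```python
# Sketch only: sanity-check that newsdata.sql was loaded into the news database.
import psycopg2

conn = psycopg2.connect(dbname="news")
cur = conn.cursor()
cur.execute("SELECT count(*) FROM log;")
# The introduction above mentions over a million rows in the database.
print("rows in log table:", cur.fetchone()[0])
conn.close()
```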
- Run the program with:
$ python3 log-analysis.py
The program prints plain-text answers to the three questions listed above.
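The error-rate report is the most involved of the three, since it compares each day's error responses against that day's total requests. The sketch below shows one way to phrase it, again assuming psycopg2 and the newsdata log table (with time and status columns, where any status other than '200 OK' counts as an error); the repository's actual SQL and output format may differ.

```python
# Sketch only: days on which more than 1% of requests returned an error.
import psycopg2

ERROR_DAYS_QUERY = """
    SELECT day, errors * 100.0 / total AS error_rate
    FROM (
        SELECT time::date AS day,
               count(*) AS total,
               count(*) FILTER (WHERE status != '200 OK') AS errors
        FROM log
        GROUP BY day
    ) AS daily
    WHERE errors * 100.0 / total > 1
    ORDER BY day;
"""


def error_days():
    """Return (date, error_rate) rows for days with more than 1% errors."""
    conn = psycopg2.connect(dbname="news")
    try:
        cur = conn.cursor()
        cur.execute(ERROR_DAYS_QUERY)
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for day, rate in error_days():
        print("{:%B %d, %Y} - {:.1f}% errors".format(day, rate))
```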
Built with:
- Python3 Programming Language
- PostgreSQL Relational Database

Author:
- Philipe Saviott - psaviott

Resources:
- PostgreSQL documentation
- Python3 documentation
- VirtualBox and Vagrant by Tania Rascia