Commit 374aa8a

version 3.0.0 - IP Geolocation integration
see changelog for more information
1 parent c3232ee commit 374aa8a

6 files changed, +48 -33 lines changed

.github/CHANGELOG.md

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@
 - version 2.1.4 - 01/02/2025 - add import_device TABLE to separate import_client TABLE
 - version 2.1.5 - 01/03/2025 - move platformNode column from import_client to import_device
 - version 2.1.6 - 01/09/2025 - repository name change - ApacheLogs2MySQL to apache-logs-to-mysql
-- version 3.0.0 - 01/28/2025 - IP Geolocation integration, several table & column renames, many process refinements - see changelog
+- version 3.0.0 - 01/28/2025 - IP Geolocation integration, several table & column renames, many process refinements
 - [1.0.1] apache_logs.error_systemCodeID corrected line - INTO logsystemCode to INTO logsystemCodeID
 - [1.0.1] remove debugging - SELECT statement from apache_logs.process_access_import, process_error_import & normalize_useragent.
 - [1.0.1] remove whitespace and commented out old code on all stored programs
@@ -98,7 +98,7 @@
 - [3.0.0] rename TABLES `log_clientname` to `log_client`, `log_servername` to `log_server`
 - [3.0.0] rename COLUMNS `clientnameid` to `clientid`, `servernameid` to `serverid` throughout application tables and processes.
 - [3.0.0] modify `process_access_parse` and `process_error_parse` WHERE CLAUSES for server_name UPDATE commands.
-- [3.0.0] add 16 stored functions for log attribute tables to return names for Slice and dice is a data analysis in drill-down Web interface.
+- [3.0.0] add 16 stored functions for primary attribute tables to return names for Slice and dice is a data analysis in drill-down Web interface.
 - [3.0.0] modify and reworded all console log messages in `logs2mysql.py` to standardize messages for each process. Added COLORS to coordinate message types for better readability.
 - [3.0.0] modify all database INDEX NAMES for standardization and consolidation.
 - [3.0.0] tested simultaneously uploading logs from 10 VPS with multiple VirtualHosts on each Server processing thousands of files in different formats and millions of log records.

.github/CONTRIBUTING.md

Lines changed: 2 additions & 16 deletions
@@ -1,23 +1,9 @@
-To contribute any Issues or Errors found using application please create a `New issue` under repository `Issues` tab.
+To contribute Issues or Errors found using application please create a `New issue` under repository `Issues` tab.

 To contribute Ideas or Comments please create a `New discussion` under repository `Discussions` tab.

-To contribute Apache Access or Error Log Formats commonly used that application should process please start `New discussion` about that.
+To contribute Apache Access or Error Log Formats commonly used that application should process please start `New discussion`.

 Any organizations, people or person with multiple Apache servers that find application a godsend in log collection monetary contributions are appreciated. Repository has my ***Buy Me a Coffee*** & ***Venmo*** links.

-I volunteer for a nonprofit organization that wanted to import their Apache website logs into MySQL tables to query data. The Executive Director loves MySQL so I decided to research existing solutions that used MySQL. I thought it would be two or three days of my time.
-
-First I installed the Apache log_sql_mysql modules which did create a single MySQL mostly empty table of the access log with no control or customization and many other issues. Next I experimented with several simple log file parsers but none normalized the parsed log data into a MySQL database. Finally I reviewed other available Apache logging solutions that didn't use MySQL including GoAccess, Logstash, Apache Viewer, DataDog and others as well as CrowdStrike and Solarwinds Loggly.
-
-Mid-September 2024 after all my research I decided to write a simple solution which snowballed into a complete application. All October I worked long hours around the clock. November I spent incorporating the application into VPS websites and applications I oversee while making improvements along the way. Version 2.0.0 fixed the major issues encountered and is the application baseline. December I spent refining the major changes made in Version 2.0.0. Version 2.1.5 was last code change to fix client identification issue when OS version changes by adding `import_device` TABLE.
-
-First 2 weeks of January 2025 I spent processing millions of records from 10 VPS simultaneously to single MySQL Server. Version 3.0.0 is last major change with IP Address geoLocation and a final pass through to fine tune processes and rename some tables and columns. This version of application is production ready.
-
-The final version is less Python and more SQL and much faster processing millions of records. At this point, I have over 1050 hours of research, design, iteration & development into application. It is much more time then I intended to invest into this project but it did produce my first open-source software.
-
-That's how volunteering, lack of a viable MySQL solution and a flexible schedule came together just right to allow me to dive deep into this project.
-
-### “Timing, degree and conviction are the three wise men in this life.” — Robert I. Fitzhenry
-
 Monetary contributions made will be reflected in development of [Web Interface](https://github.com/WillTheFarmer/mysql-to-apache-echarts) for this MySQL `apache_logs` schema.

.github/README.md

Lines changed: 16 additions & 9 deletions
@@ -6,18 +6,24 @@ and normalizing data into database designed for reports & data analysis.
 Imports Access Logs in LogFormats - ***common***, ***combined*** and ***vhost_combined*** & additional ***csv2mysql***
 LogFormat defined :point_down:

-Imports Error Logs in ***default*** ErrorLogFormat & ***additional*** ErrorLogFormat defined below performing data harmonization on Apache Codes & Messages, System Codes & Messages, and Log Messages to create a unified, standardized dataset. Error Log view images :point_down:
+Imports Error Logs in ***default*** ErrorLogFormat & ***additional*** ErrorLogFormat defined below performing data harmonization
+on Apache Codes & Messages, System Codes & Messages, and Log Messages to create a unified, standardized dataset. Error Log view images :point_down:

-All processing stages are encapsulated within one "Import Load" that captures process metrics, notifications and errors into MySQL import tables. Every log data record is traceable back to the computer, folder, file, load process, parse process and import process it came from.
+All processing stages are encapsulated within one "Import Load" that captures process metrics, notifications and errors into MySQL import tables.
+Every log data record is traceable back to the computer, folder, file, load process, parse process and import process it came from.

-Multiple Access and Error logs and formats can be loaded, parsed and imported along with User Agent parsing and IP Address geoLocation retrieval in a single execution. A single execution can also be configured to only load logs to Server.
-### Console Process Messages - 4 LogFormats, 2 ErrorLogFormats & 6 MySQL Stored Procedures
+Multiple Access and Error logs and formats can be loaded, parsed and imported along with User Agent parsing and IP Address geolocation retrieval in a single execution.
+A single execution can also be configured to only load logs to Server.
+### Process Messages in Console - 4 LogFormats, 2 ErrorLogFormats & 6 MySQL Stored Procedures
 ![Processing Messages Console](./assets/processing_messages_console.png)
-New version has [MaxMind GeoIP2](https://github.com/maxmind/GeoIP2-python) Python API integration with 5 additional MySQL tables for IP geoLocation data. Two DB-IP Lite databases are required - `IP to City` and `IP to ASN`. Free DB-IP Lite databases can be found at [DB-IP](https://db-ip.com/db/lite.php)
-
-A visualization tool for the MySQL Schema ***apache_logs*** is [MySQL2ApacheECharts](https://github.com/willthefarmer/mysql-to-apache-echarts) and currently under development. The Web interface consists of Express.js web application frameworks with Drill Down Capability & [Apache ECharts](https://github.com/apache/echarts) frameworks for Data Visualization.
+ApacheLogs2MySQL has [MaxMind GeoIP2](https://github.com/maxmind/GeoIP2-python) Python API integration with 6 MySQL tables for IP geolocation data normalization.
+Two DB-IP Lite databases are required - `IP to City` and `IP to ASN`. Free DB-IP Lite databases can be found at [DB-IP](https://db-ip.com/db/lite.php)

 Database Schema ***apache_logs*** designed to accommodate unlimited servers & domains. Step-by-step guide for easy installation :point_down:
+
+A visualization tool for the MySQL Schema ***apache_logs*** is [MySQL2ApacheECharts](https://github.com/willthefarmer/mysql-to-apache-echarts) and currently under development.
+The Web interface consists of [Express](https://github.com/expressjs/express) web application frameworks with Drill Down Capability
+& [Apache ECharts](https://github.com/apache/echarts) frameworks for Data Visualization.
 ## Entity Relationship Diagram of apache_logs schema tables
 ![Entity Relationship Diagram](./assets/entity_relationship_diagram.png)
 Diagram created with open-source database diagrams editor [chartdb/chartdb](https://github.com/chartdb/chartdb)
@@ -105,7 +111,8 @@ LogFormat "%v,%p,%h,%l,%u,%t,%I,%O,%S,%B,%{ms}T,%D,%^FB,%>s,\"%H\",\"%m\",\"%U\"
105111
|%{VARNAME}C|ADDED - The contents of cookie VARNAME in request sent to server. Only version 0 cookies are fully supported. Format String is optional.|
106112
|%L|ADDED - The request log ID from the error log (or '-' if nothing has been logged to the error log for this request). Look for the matching error log line to see what request| caused what error.
107113
## Two supported Error Log Formats
108-
Application processes Error Logs with ***default format*** for threaded MPMs (Multi-Processing Modules). If running Apache 2.4 on any platform and ErrorLogFormat is not defined in config files this is the Error Log format.
114+
Application processes Error Logs with ***default format*** for threaded MPMs (Multi-Processing Modules). If running Apache 2.4 on any platform
115+
and ErrorLogFormat is not defined in config files this is the Error Log format.
109116
Information from: https://httpd.apache.org/docs/2.4/mod/core.html#errorlogformat
110117
```
111118
ErrorLogFormat "[%{u}t] [%-m:%l] [pid %P:tid %T] %7F: %E: [client\ %a] %M% ,\ referer\ %{Referer}i"
@@ -289,7 +296,7 @@ Normalization ensures that data is organized in a way that makes sense for the d
 MySQL `apache_logs` schema currently has 55 Tables, 908 Columns, 188 Indexes, 72 Views, 8 Stored Procedures and 90 Functions to process Apache Access log in 4 formats
 & Apache Error log in 2 formats. Database normalization at work!
 ## MySQL Access Log View by Browser - 1 of 66 schema views
-Current schema views are Access and Error Attribute Primary tables created in normalization process with simple aggregate values.
+Current schema views are Access and Error primary attribute tables created in normalization process with simple aggregate values.
 These are primitive data presentations of the log data warehouse. ApacheLogs2MySQL is the 'EL' of the 'ELK' Stack. The Web interface
 [MySQL2ApacheECharts](https://github.com/willthefarmer/mysql-to-apache-echarts) in development is the 'K' of the 'ELK' Stack.
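As a rough illustration of the default threaded-MPM error log format that this README diff refers to, the Python sketch below splits one such line into fields with a regular expression. It is not how the application parses logs (the changelog points to MySQL stored procedures such as `process_error_parse`); the pattern, the `parse_error_line` helper and the sample line are assumptions made up for illustration, and lines that carry the optional `%7F` or `%E` items are simply left inside `message`.

```python
import re

# Hypothetical pattern for the default ErrorLogFormat. Optional items such as
# the source file (%7F) and error status (%E) are not split out; when present
# they end up in the "message" group.
ERROR_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] "
    r"\[(?P<module>[^:\]]*):(?P<level>[^\]]+)\] "
    r"\[pid (?P<pid>\d+)(?::tid (?P<tid>\d+))?\] "
    r"(?:\[client (?P<client>[^\]]+)\] )?"
    r"(?P<message>.*)$"
)


def parse_error_line(line):
    """Return a dict of named fields, or None when the line does not match."""
    match = ERROR_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


# Made-up sample line (documentation IP range), not taken from a real log.
sample = (
    "[Tue Jan 28 10:42:29.902022 2025] [core:error] "
    "[pid 35708:tid 4328636416] [client 192.0.2.10:51234] "
    "File does not exist: /var/www/html/favicon.ico"
)
fields = parse_error_line(sample)
print(fields["level"], fields["client"], fields["message"])
```

Returning `None` for non-matching lines keeps the caller in charge of what to do with unparsed records, for example counting them per file.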

apache_logs_schema.sql

Lines changed: 22 additions & 0 deletions
@@ -1,3 +1,25 @@
+-- # Licensed under the Apache License, Version 2.0 (the "License");
+-- # you may not use this file except in compliance with the License.
+-- # You may obtain a copy of the License at
+-- #
+-- # http://www.apache.org/licenses/LICENSE-2.0
+-- #
+-- # Unless required by applicable law or agreed to in writing, software
+-- # distributed under the License is distributed on an "AS IS" BASIS,
+-- # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- # See the License for the specific language governing permissions and
+-- # limitations under the License.
+-- #
+-- # version 3.0.0 - 01/28/2025 - IP Geolocation integration, table & column renames, refinements - see changelog
+-- #
+-- # Copyright 2024 Will Raymond <[email protected]>
+-- #
+-- # CHANGELOG.md in repository - https://github.com/WillTheFarmer/apache-logs-to-mysql
+-- #
+-- file: apache_logs_schema.sql
+-- synopsis: Data definition language (DDL) for creating MySQL schema apache_logs for ApacheLogs2MySQL application
+-- author: Will Raymond <[email protected]>
+
 CREATE DATABASE IF NOT EXISTS `apache_logs` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci */ /*!80016 DEFAULT ENCRYPTION='N' */;
 USE `apache_logs`;
 -- MySQL dump 10.13 Distrib 8.0.40, for Win64 (x86_64)

logs2mysql.py

Lines changed: 4 additions & 4 deletions
@@ -38,7 +38,7 @@
 from time import time
 from time import ctime
 from datetime import datetime
-load_dotenv() # Loads variables from .env into the environment
+load_dotenv() # Loads variables from .env into the environment
 mysql_host = getenv('MYSQL_HOST')
 mysql_port = int(getenv('MYSQL_PORT'))
 mysql_user = getenv('MYSQL_USER')
@@ -76,7 +76,7 @@
 geoip2_city = getenv('GEOIP2_CITY')
 geoip2_asn = getenv('GEOIP2_ASN')
 geoip2_process = int(getenv('GEOIP2_PROCESS'))
-# makes process start, complete, info and error messages noticeable in console - all error messages start with 'ERROR - ' for keyword log search
+# Readability of process start, complete, info and error messages in console - all error messages start with 'ERROR - ' for keyword log search
 class bcolors:
     GREEN = '\33[32m'
     GREENER = '\033[92m'
@@ -98,7 +98,7 @@ class bcolors:
     'database': mysql_schema,
     'local_infile': True
 }
-# information to identify & register import upload client
+# Information to identify & register import load clients
 def get_device_id():
     sys_os = system()
     if sys_os == "Windows":
@@ -591,7 +591,7 @@ def processLogs():
 # SECONDARY PROCESSES BELOW: Client Module UPLOAD is done with load, parse and import processes of access and error logs. The below processes enhance User Agent and Client IP log data.
 # Initially UserAgent and GeoIP2 processes were each in separate files. After much design consideration and application experience and Code Redundancy being problematic
 # the decision was made to encapsulate all processes within the same "Import Load" which captures and logs all execution metrics, notifications and errors
-# into MySQL tables for each execution. Every log datarecord can be tracked back to the file, folder, computer, load process, parse process and import process it came from.
+# into MySQL tables for each execution. Every log data record can be tracked back to the file, folder, computer, load process, parse process and import process it came from.
 # Processes may require individual execution even when NONE of above processes are executed. If this Module is run automatically on a client server to upload Apache Logs to centralized
 # MySQL Server the processes below will never be executed. In some cases, only the processes below are needed for execution on MySQL Server or another centralized computer.
 # In some cases, ALL processes above and below will be executed in a single "Import Load" execution. Therefore, the encapsulation of all processes in a single module.
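The context lines of this file read numeric settings with `int(getenv(...))`, which raises a `TypeError` when a variable is missing from `.env`, because `getenv` then returns `None`. A minimal sketch of a more forgiving reader might look like the following; `env_int` is a hypothetical helper that is not part of this commit, and the fallback of 3306 is only MySQL's standard port, not a value taken from the repository.

```python
import os


def env_int(name, default=None, environ=os.environ):
    """Read environment variable `name` as an int.

    Falls back to `default` when the variable is unset or blank, and raises a
    ValueError that names the variable when the value is not an integer.
    """
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        if default is None:
            raise ValueError(f"{name} is not set and has no default")
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None


# Usage with the variable names visible in the diff context; the defaults
# here are assumptions chosen for illustration.
mysql_port = env_int("MYSQL_PORT", default=3306)
geoip2_process = env_int("GEOIP2_PROCESS", default=0)
```

Passing `environ` as a parameter keeps the helper easy to exercise without touching the real process environment.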

watch4logs.py

Lines changed: 2 additions & 2 deletions
@@ -28,12 +28,12 @@
 from watchdog.events import FileSystemEventHandler
 from dotenv import load_dotenv
 from logs2mysql import processLogs
-load_dotenv() # Loads variables from .env into the environment
+load_dotenv() # Loads variables from .env into the environment
 watch_path = os.getenv('WATCH_PATH')
 watch_recursive = bool(int(os.getenv('WATCH_RECURSIVE')))
 watch_interval = int(os.getenv('WATCH_INTERVAL'))
 watch_log = int(os.getenv('WATCH_LOG'))
-# make error messages noticeable in console - all error messages start with 'ERROR - ' for keyword log search
+# Readability of event messages in console
 class bcolors:
     GREEN = '\33[32m'
     GREENER = '\033[92m'
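The comments changed in both Python files describe colored console messages, with every error message starting with `ERROR - ` so logs can be searched by keyword. A small sketch of that convention is shown below, under stated assumptions: only `GREEN` and `GREENER` appear in the diff, so the `ERROR` and `ENDC` codes and the `format_message` helper are hypothetical additions for illustration.

```python
class bcolors:
    GREEN = '\33[32m'     # shown in the diff context
    GREENER = '\033[92m'  # shown in the diff context
    ERROR = '\033[91m'    # assumed: bright red, not visible in the diff
    ENDC = '\033[0m'      # assumed: ANSI reset, not visible in the diff


def format_message(text, color=bcolors.GREEN, error=False):
    """Wrap `text` in an ANSI color; errors get the searchable 'ERROR - ' prefix."""
    if error:
        return f"{bcolors.ERROR}ERROR - {text}{bcolors.ENDC}"
    return f"{color}{text}{bcolors.ENDC}"


print(format_message("Import Load started"))
print(format_message("log file not found", error=True))
```

Keeping the prefix outside the color choice means a plain-text search for `ERROR - ` still works when the output is redirected to a file that keeps the escape codes.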
