Commit 9478474

Add request log ID to error & access logs
Add request_log_id functionality to access and error formats. Enables easier association between access and error records.
1 parent 857b5fb commit 9478474

8 files changed, +319 -332 lines changed

.github/CHANGELOG.md

Lines changed: 12 additions & 0 deletions
@@ -2,6 +2,7 @@
 - version 1.1.0 - 11/18/2024 - major changes
 - version 1.1.1 - 11/20/2024 - keyword replacement
 - version 2.0.0 - 11/30/2024 - backward incompatible
+- version 2.1.0 - 12/09/2024 - request_log_id functionality
 - [1.0.1] apache_logs.error_systemCodeID corrected line - INTO logsystemCode to INTO logsystemCodeID
 - [1.0.1] remove debugging - SELECT statement from apache_logs.process_access_import, process_error_import & normalize_useragent.
 - [1.0.1] remove whitespace and commented out old code on all stored programs
@@ -38,3 +39,14 @@
 - [2.0.0] add file - CHANGELOG.md - for single place to list all code and database changes
 - [2.0.0] The remaining changes made last month, many modifications and additions to the previous version, will remain undocumented.
 - [2.0.0] This version is the application baseline
+- [2.1.0] add request_log_id functionality to access and error formats. Enables easier association between access and error records.
+- [2.1.0] add columns to load_error_default & load_access_csv2mysql TABLES
+- [2.1.0] modify process_error_parse - replace POSITION function with LOCATE function, remove unneeded brackets, add parsing logic for %v and %L String Formats.
+- [2.1.0] modify process_error_import - add normalization for request_log_id, replace POSITION function with LOCATE function
+- [2.1.0] modify process_access_parse - add parsing for request_log_id, replace POSITION function with LOCATE function
+- [2.1.0] modify process_access_import - add normalization for request_log_id, replace POSITION function with LOCATE function
+- [2.1.0] modify process_access_import - cookie, remoteuser, remoteLogName, referer, requestLogID values of '-' set to NULL. No normalized table value is created.
+- [2.1.0] modify process_access_import - add servernameid to requestlogid before normalization to avoid duplicate requestlogid on consolidation of multiple domain logs.
+- [2.1.0] modify process_error_import - add servernameid to requestlogid before normalization to avoid duplicate requestlogid on consolidation of multiple domain logs.
+- [2.1.0] modify apacheLogs2MySQL.py - add .replace('"', ' in.') to all useragent attributes before the UPDATE statement executes. A " in an attribute value breaks the UPDATE string.
+- [2.1.0] update README.md to describe and explain additional request_log_id functionality
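The 2.1.0 entries above give two rules for `request_log_id`: a value of `-` becomes NULL (no normalized row), and the server name ID is prepended before normalization so the same log ID arriving from two consolidated domains stays distinct. The real logic lives in the MySQL stored procedures; the Python below is only a minimal sketch of the rule, and the function name `composite_request_log_id` and the `:` separator are hypothetical.

```python
from typing import Optional


def composite_request_log_id(server_name_id: int, raw_log_id: Optional[str]) -> Optional[str]:
    """Hypothetical helper sketching the 2.1.0 request_log_id rules.

    - None, '' or '-' -> None (SQL NULL), so no normalized row is created.
      In the access log, Apache writes '-' for %L when nothing was logged
      to the error log for the request.
    - Otherwise the server name ID is prepended so one log ID seen on two
      consolidated domains maps to two distinct normalized values.
    The ':' separator is an assumption, not the schema's actual format.
    """
    if raw_log_id is None:
        return None
    value = raw_log_id.strip()
    if value in ("", "-"):
        return None
    return f"{server_name_id}:{value}"


if __name__ == "__main__":
    print(composite_request_log_id(3, "-"))           # None
    print(composite_request_log_id(3, "Z1a2b3c4d5"))  # 3:Z1a2b3c4d5
    print(composite_request_log_id(7, "Z1a2b3c4d5"))  # 7:Z1a2b3c4d5
```

Prefixing before normalization, rather than after, means the uniqueness check on the normalized table sees the server-qualified value directly.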

.github/CONTRIBUTING.md

Lines changed: 4 additions & 4 deletions
@@ -1,14 +1,14 @@
-To contribute Apache Access or Log formats commonly used that the application is not setup to do please email [email protected] with subject <sup>Additional Import Formats</sup> and I'll consider incorporating the format into import process.
+To contribute commonly used Apache Access or Error Log formats that the application is not set up for, please email [email protected] with subject <sup>Additional Import Formats</sup> and I'll consider incorporating the format.

 To contribute ideas or comments, email [email protected] with subject <sup>ideas or comments</sup> and I'll be sure to reply.

-Any organizations, people or person with multiple Apache servers that find this application to be a godsend in log collection financial contributions are greatly appreciated.
+Financial contributions from any organizations or people with multiple Apache servers who find the application to be a godsend in log collection are greatly appreciated.

 ### The Repository's ***- Sponsor this project -*** lists three methods to make financial contributions.

-There's not alot of code in the final version but I have 800 plus hours of research, design, iteration & development into application. During October I worked crazy long hours around the clock on this. It is way more time then I intended to invest into this project and another project did get put on the back burner but it did produce my first open-source software.
+There's not a lot of code in the final version, but I have 850-plus hours of research, design, iteration & development in the application. During October I worked long hours around the clock on this. It is far more time than I intended to invest in this project, and another project did get put on the back burner, but it did produce my first open-source software.

-I volunteer for a nonprofit organization that needed a simple solution to import Apache logs into MySQL. The Executive Director loves MySQL. First I installed the Apache log_sql_mysql modules which did create a single MySQL mostly empty table of the access log with no control or customization. Next I researched many available Apache logging solutions including GoAccess, Logstach, Apache Viewer, DataDog and others as well as CrowdStrike and Soloarwinds Loggly. I also reviewed several simple log file parsers but none that normalized the parsed log data into a MySQL database. After all my investigating I decided to write a simple solution which snowballed into this complete solution.
+I volunteer for a nonprofit organization that needed a simple solution to import Apache logs into MySQL. The Executive Director loves MySQL. First I installed the Apache log_sql_mysql modules, which created a single, mostly empty MySQL table of the access log with no control or customization and many other issues. Next I researched available Apache logging solutions including GoAccess, Logstash, Apache Viewer, Datadog and others, as well as CrowdStrike and SolarWinds Loggly. I also reviewed several simple log file parsers, but none normalized the parsed log data into a MySQL database. After all my investigating I decided to write a simple solution, which snowballed into this complete solution.

 That's how volunteering, lack of a viable MySQL solution and a flexible schedule came together just right to allow me to dive deep into this project.

.github/README.md

Lines changed: 14 additions & 9 deletions
@@ -3,8 +3,6 @@ ApacheLogs2MySQL consists of two Python Modules & one MySQL Schema to automate i

 Runs on Windows, Linux and MacOS & tested with MySQL versions 8.0.39, 8.4.3, 9.0.0 & 9.1.0.

-Version 2.0.0 fixes issues encountered running 4 weeks on 7 VPS with 2 to 6 VirtualHosts on each VPS. Each VPS is running `watch4logs.py` in PM2 connecting to one MySQL server `apache_logs` schema. Application currently consolidates Access and Error logs from 32 domains.
-
 Imports Access Logs in Apache LogFormats - ***common***, ***combined*** and ***vhost_combined*** & additional ***csv2mysql*** LogFormat defined below.

 Imports Error Logs in Apache ***default*** ErrorLogFormat & ***additional*** ErrorLogFormat defined below performing data harmonization on Apache Codes & Messages, System Codes & Messages, and Log Messages to create a unified, standardized dataset. Error Log view images below.
@@ -73,10 +71,10 @@ LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" v
 |%v|The canonical ServerName of the server serving the request.|
 |%p|The canonical port of the server serving the request.|

-Application is designed to use the ***csv2mysql*** LogFormat. LogFormat has comma-separated values and adds 7 Format Strings. A complete list of Format Strings
+Application is designed to use the ***csv2mysql*** LogFormat. LogFormat has comma-separated values and adds 8 Format Strings. A complete list of Format Strings
 with descriptions indicating added Format Strings below.
 ```
-LogFormat "%v,%p,%h,%l,%u,%t,%I,%O,%S,%B,%{ms}T,%D,%^FB,%>s,\"%H\",\"%m\",\"%U\",\"%q\",\"%{Referer}i\",\"%{User-Agent}i\",\"%{farmwork.app}C\"" csv2mysql
+LogFormat "%v,%p,%h,%l,%u,%t,%I,%O,%S,%B,%{ms}T,%D,%^FB,%>s,\"%H\",\"%m\",\"%U\",\"%q\",\"%{Referer}i\",\"%{User-Agent}i\",\"%{farmwork.app}C\",%L" csv2mysql
 ```
 |Format String|Description|
 |-------------|-----------|
@@ -101,6 +99,7 @@ LogFormat "%v,%p,%h,%l,%u,%t,%I,%O,%S,%B,%{ms}T,%D,%^FB,%>s,\"%H\",\"%m\",\"%U\"
 |%{Referer}i|The "Referer" (sic) HTTP request header. This gives the site that the client reports having been referred from.|
 |%{User-Agent}i|The User-Agent HTTP request header. This is the identifying information that the client browser reports about itself.|
 |%{VARNAME}C|ADDED - The contents of cookie VARNAME in request sent to server. Only version 0 cookies are fully supported. Format String is optional.|
+|%L|ADDED - The request log ID from the error log (or '-' if nothing has been logged to the error log for this request). Look for the matching error log line to see what request caused what error.|
 ## Two supported Error Log Formats
 Application processes Error Logs with ***default format*** for threaded MPMs (Multi-Processing Modules). If you're running Apache 2.4 on any platform and ErrorLogFormat is not defined in config files this is the Error Log format.
 |Format String|Description|
@@ -118,14 +117,19 @@ Application processes Error Logs with ***default format*** for threaded MPMs (Mu
 ```
 ErrorLogFormat "[%{u}t] [%-m:%l] [pid %P:tid %T] %7F: %E: [client\ %a] %M% ,\ referer\ %{Referer}i"
 ```
-Application also processes Error Logs with this ***additional format*** which adds `%v - The canonical ServerName`. Easiest way to identify error logs for each domain is add `%v` to ErrorLogFormat.
+Application also processes Error Logs with the ***additional format***, which adds:
+1) `%v - The canonical ServerName` - Adding `%v` to ErrorLogFormat is the easiest way to identify error logs for each domain.
+2) `%L - Log ID of the request` - This is the easiest way to associate an Error record with the Access record that created it. Apache mod_unique_id.generate_log_id() is only called when an error occurs, so it does not degrade performance during error-free operation.

-To use this format place the `ErrorLogFormat` before `ErrorLog` in `apache2.conf` to set error log format for Server and ALL VitualHosts on Server.
-|Format String|Description - The spaces on each side of comma are required.|
+To use this format, place `ErrorLogFormat` before `ErrorLog` in `apache2.conf` to set the error log format for the ***Server*** and all ***VirtualHosts*** on the Server.
+|Format String|Description - A `Space` is required on the left side of each `Comma` to parse data properly|
 |-------------|-----------|
 |%v|The canonical ServerName of the server serving the request.|
+|%L|Log ID of the request. A %L format string is also available in `mod_log_config` to correlate access log entries with error log lines. If `mod_unique_id` is loaded, its unique id will be used as log ID for requests.|
+
+***Important:*** A `Space` is required on the left side of each `Comma`, as defined below:
 ```
-ErrorLogFormat "[%{u}t] [%-m:%l] [pid %P:tid %T] %7F: %E: [client\ %a] %M% ,\ referer\ %{Referer}i , %v"
+ErrorLogFormat "[%{u}t] [%-m:%l] [pid %P:tid %T] %7F: %E: [client\ %a] %M% ,\ referer\ %{Referer}i ,%v ,%L"
 ```
 ## Two options to attach ServerName & ServerPort to Access & Error logs

@@ -151,6 +155,7 @@ Log file naming conventions enable the use of UPDATE statements:
 ```
 UPDATE apache_logs.import_file SET server_name='farmfreshsoftware.com', server_port=443 WHERE server_name IS NULL AND name LIKE '%farmfreshsoftware%';
 UPDATE apache_logs.import_file SET server_name='farmwork.app', server_port=443 WHERE server_name IS NULL AND name LIKE '%farmwork%';
+UPDATE apache_logs.import_file SET server_name='ip255-255-255-255.us-east.com', server_port=443 WHERE server_name IS NULL AND name LIKE '%error%';
 ```
 First option requires uncommenting `os.getenv` to load variables at top of `apacheLogs2MySQL.py`. By default, variables are defined and set to an empty string. Below is a screenshot of the variables loaded at top of `apacheLogs2MySQL.py` with commented `os.getenv` code.
 ![load_settings_variables.png](./assets/load_settings_variables.png)
@@ -241,7 +246,7 @@ The second parameter enables Python Client modules to run simultaneously on mult
 Database normalization is the process of organizing data in a relational database to improve data integrity and reduce redundancy.
 Normalization ensures that data is organized in a way that makes sense for the data model and attributes, and that the database functions efficiently.

-MySQL `apache_logs` schema has 46 tables, 772 columns, 128 indexes, 56 views, 7 stored procedures and 41 functions to process Apache Access log in 4 formats
+MySQL `apache_logs` schema has 47 tables, 779 columns, 125 indexes, 56 views, 7 stored procedures and 42 functions to process Apache Access log in 4 formats
 & Apache Error log in 2 formats. Database normalization at work!
 ## MySQL Access Log View by URI
 MySQL View - apache_logs.access_requri_list - data from LogFormat: combined & csv2mysql
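The README changes above put `%v` first and `%L` last in the ***csv2mysql*** LogFormat, and end the additional ErrorLogFormat with ` ,%v ,%L`. A minimal Python sketch of how those two fields could pair an Error record with the Access record that caused it follows. It assumes each error line ends exactly with ` ,<servername> ,<logid>` and that the log ID itself contains no comma; the function names are hypothetical, and the application's actual parsing is done by the MySQL stored procedures. Keying on (server name, log ID) mirrors the CHANGELOG's reason for prepending the server name ID.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (server name from %v, request log ID from %L)


def access_key(line: str) -> Optional[Key]:
    """csv2mysql access line: %v is the first field and %L is the last.
    Splitting only on the first and last comma leaves the quoted fields alone."""
    server = line.split(",", 1)[0]
    log_id = line.rstrip("\n").rsplit(",", 1)[-1].strip()
    if log_id in ("", "-"):
        return None  # nothing was written to the error log for this request
    return server, log_id


def error_key(line: str) -> Optional[Key]:
    """Additional ErrorLogFormat: assumes the line ends with ' ,%v ,%L'."""
    parts = line.rstrip("\n").rsplit(" ,", 2)
    if len(parts) != 3:
        return None
    _, server, log_id = parts
    if not server or log_id in ("", "-"):
        return None
    return server, log_id


def correlate(access_lines: List[str], error_lines: List[str]) -> Dict[Key, Tuple[str, List[str]]]:
    """Map each (server, log ID) to its access line and every error line sharing it."""
    access: Dict[Key, str] = {}
    for line in access_lines:
        key = access_key(line)
        if key is not None:
            access[key] = line
    result: Dict[Key, Tuple[str, List[str]]] = {}
    for line in error_lines:
        key = error_key(line)
        if key is not None and key in access:
            result.setdefault(key, (access[key], []))[1].append(line)
    return result


if __name__ == "__main__":
    access_lines = [
        'farmwork.app,443,203.0.113.5,-,-,[09/Dec/2024:10:00:00 -0500],512,1024,1536,900,12,12000,5000,500,'
        '"HTTP/1.1","GET","/index.php","","-","curl/8.5.0","-",Z1abcDEF',
        'farmwork.app,443,203.0.113.6,-,-,[09/Dec/2024:10:00:01 -0500],512,1024,1536,900,3,3000,1000,200,'
        '"HTTP/1.1","GET","/","","-","curl/8.5.0","-",-',
    ]
    error_lines = [
        '[Mon Dec 09 10:00:00.123456 2024] [php:error] [pid 123:tid 456] [client 203.0.113.5:51234] '
        'PHP Fatal error: example ,farmwork.app ,Z1abcDEF',
    ]
    for (server, log_id), (access_line, errors) in correlate(access_lines, error_lines).items():
        print(server, log_id, len(errors))  # farmwork.app Z1abcDEF 1
```

`rsplit(" ,", 2)` reads from the right, so a comma inside the error message or the optional referer text does not disturb the last two fields; this is why the README insists on the space to the left of each comma.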

.github/SUPPORT.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

 ### Please include your Apache LogFormat or ErrorLogFormat when reporting a missing or malformed data issue.

-### My office hours are 'almost always' unless I am doing farm chores, attending my kid's sporting events or sleeping. I rarely leave the farm so if I do not reply within 24 hours I am working on a project outside. If I do not reply within 48 hours something bad might have occurred.
+### My office hours are 'almost always' unless I am doing farm chores, attending my kid's sporting events or sleeping. Rarely do I leave the farm, so if I do not reply within 24 hours I am working on a project outside. If I do not reply within 48 hours, something bad might have happened.

 ### I am the designer, developer and support staff, with vast knowledge of Apache, MySQL and Python, capable of fixing any issue with the application.

.gitignore

Lines changed: 0 additions & 130 deletions
@@ -6,118 +6,19 @@ __pycache__/
 # Distribution / packaging
 .Python
 build/
-develop-eggs/
 dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
 var/
-wheels/
-share/python-wheels/
-*.egg-info/
-.installed.cfg
-*.egg
-MANIFEST
-
-# PyInstaller
-# Usually these files are written by a python script from a template
-# before PyInstaller builds the exe, so as to inject date/other infos into it.
-*.manifest
-*.spec

 # Installer logs
 pip-log.txt
-pip-delete-this-directory.txt

 # Unit test / coverage reports
-htmlcov/
-.tox/
-.nox/
-.coverage
-.coverage.*
 .cache
-nosetests.xml
 coverage.xml
 *.cover
-*.py,cover
-.hypothesis/
 .pytest_cache/
 cover/

-# Translations
-*.mo
-*.pot
-
-# Django stuff:
-*.log
-local_settings.py
-db.sqlite3
-db.sqlite3-journal
-
-# Flask stuff:
-instance/
-.webassets-cache
-
-# Scrapy stuff:
-.scrapy
-
-# Sphinx documentation
-docs/_build/
-
-# PyBuilder
-.pybuilder/
-target/
-
-# Jupyter Notebook
-.ipynb_checkpoints
-
-# IPython
-profile_default/
-ipython_config.py
-
-# pyenv
-# For a library or package, you might want to ignore these files since the code is
-# intended to run in multiple environments; otherwise, check them in:
-# .python-version
-
-# pipenv
-# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-# However, in case of collaboration, if having platform-specific dependencies or dependencies
-# having no cross-platform support, pipenv may install dependencies that don't work, or not
-# install all needed dependencies.
-#Pipfile.lock
-
-# poetry
-# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
-# This is especially recommended for binary packages to ensure reproducibility, and is more
-# commonly ignored for libraries.
-# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-#poetry.lock
-
-# pdm
-# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-#pdm.lock
-# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
-# in version control.
-# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
-.pdm.toml
-.pdm-python
-.pdm-build/
-
-# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
-__pypackages__/
-
-# Celery stuff
-celerybeat-schedule
-celerybeat.pid
-
-# SageMath parsed files
-*.sage.py
-
 # Environments
 .env
 .venv
@@ -126,34 +27,3 @@ venv/
 ENV/
 env.bak/
 venv.bak/
-
-# Spyder project settings
-.spyderproject
-.spyproject
-
-# Rope project settings
-.ropeproject
-
-# mkdocs documentation
-/site
-
-# mypy
-.mypy_cache/
-.dmypy.json
-dmypy.json
-
-# Pyre type checker
-.pyre/
-
-# pytype static type analyzer
-.pytype/
-
-# Cython debug symbols
-cython_debug/
-
-# PyCharm
-# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
-# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
-# and can be added to the global gitignore or merged into this file. For a more nuclear
-# option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/

apacheLogs2MySQL.py

Lines changed: 21 additions & 10 deletions
@@ -1,8 +1,3 @@
-# coding: utf-8
-# version 2.0.0 - 11/30/2024 - Comprehensive Update
-#
-# Copyright 2024 Will Raymond <[email protected]>
-#
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
@@ -15,12 +10,16 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-# CHANGELOG.md in GitHub repository - https://github.com/WillTheFarmer/ApacheLogs2MySQL
+# version 2.1.0 - 12/09/2024 - add request_log_id to error & access formats
+#
+# Copyright 2024 Will Raymond <[email protected]>
+#
+# CHANGELOG.md in repository - https://github.com/WillTheFarmer/ApacheLogs2MySQL
 """
 :module: apacheLogs2MySQL
 :function: processLogs()
 :synopsis: processes apache access and error logs into MySQL for apachelogs2MySQL application.
-:author: [email protected] (Will Raymond)
+:author: Will Raymond <[email protected]>
 """
 import os
 import platform
@@ -567,22 +566,34 @@ def processLogs():
 if useragent_log >= 2:
 print('Parsing record ' + str(recID) + ' - ' + str(ua) )
 # Accessing user agent's browser attributes
-# br = str(ua.browser) # returns Browser(family=u'Mobile Safari', version=(5, 1), version_string='5.1')
+strua = str(ua)
+strua = strua.replace('"', ' in.') # must replace " in string for error occurs
+br = str(ua.browser) # returns Browser(family=u'Mobile Safari', version=(5, 1), version_string='5.1')
+br = br.replace('"', ' in.') # must replace " in string for error occurs
 br_family = str(ua.browser.family) # returns 'Mobile Safari'
+br_family = br_family.replace('"', ' in.') # must replace " in string for error occurs
 #ua.browser.version # returns (5, 1)
 br_version = ua.browser.version_string # returns '5.1'
+br_version = br_version.replace('"', ' in.') # must replace " in string for error occurs
 # Accessing user agent's operating system properties
 os = str(ua.os) # returns OperatingSystem(family=u'iOS', version=(5, 1), version_string='5.1')
+os = os.replace('"', ' in.') # must replace " in string for error occurs
 os_family = str(ua.os.family) # returns 'iOS'
+os_family = os_family.replace('"', ' in.') # must replace " in string for error occurs
 #ua.os.version # returns (5, 1)
 os_version = ua.os.version_string # returns '5.1'
+os_version = os_version.replace('"', ' in.') # must replace " in string for error occurs
 # Accessing user agent's device properties
 dv = str(ua.device) # returns Device(family=u'iPhone', brand=u'Apple', model=u'iPhone')
+dv = dv.replace('"', ' in.') # must replace " in string for error occurs
 dv_family = str(ua.device.family) # returns 'iPhone'
+dv_family = dv_family.replace('"', ' in.') # must replace " in string for error occurs
 dv_brand = str(ua.device.brand) # returns 'Apple'
+dv_brand = dv_brand.replace('"', ' in.') # must replace " in string for error occurs
 dv_model = str(ua.device.model) # returns 'iPhone'
-updateSql = ('UPDATE access_log_useragent SET ua="'+str(ua) +
-'", ua_browser="' + br_family +
+dv_model = dv_model.replace('"', ' in.') # must replace " in string for error occurs
+updateSql = ('UPDATE access_log_useragent SET ua="'+ strua +
+'", ua_browser="' + br +
 '", ua_browser_family="' + br_family +
 '", ua_browser_version="' + br_version +
 '", ua_os="' + os +
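The diff above replaces every `"` in the user-agent attributes with ` in.` because `updateSql` is built by string concatenation, so a quote inside a value ends the SQL string literal early. A hedged alternative is to bind the values as query parameters, letting the driver handle quoting so the original text is stored unchanged. To keep the sketch runnable anywhere, it uses the standard-library `sqlite3` module with an in-memory table that is a trimmed, hypothetical stand-in for `access_log_useragent` (three of its columns, plus an assumed `id` key). With MySQL drivers such as MySQL Connector/Python or PyMySQL, the same pattern is `cursor.execute(sql, params)` with `%s` placeholders instead of `?`.

```python
import sqlite3


def update_useragent(conn: sqlite3.Connection, rec_id: int, ua: str,
                     browser: str, browser_family: str) -> None:
    """Store parsed user-agent attributes using bound parameters.

    Each value is sent separately from the SQL text, so a '"' inside a value
    (for example a 10" tablet) is stored as-is and can neither break the
    statement nor inject SQL; no ' in.' substitution is needed.
    Table and column names follow the diff; the 'id' key column is assumed.
    """
    conn.execute(
        "UPDATE access_log_useragent"
        " SET ua = ?, ua_browser = ?, ua_browser_family = ?"
        " WHERE id = ?",
        (ua, browser, browser_family, rec_id),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL schema
    conn.execute("CREATE TABLE access_log_useragent "
                 "(id INTEGER PRIMARY KEY, ua TEXT, ua_browser TEXT, ua_browser_family TEXT)")
    conn.execute("INSERT INTO access_log_useragent (id) VALUES (1)")
    update_useragent(conn, 1, 'Mozilla/5.0 (Linux; 10" Tablet)', 'Browser(family="X")', "X")
    print(conn.execute("SELECT ua FROM access_log_useragent WHERE id = 1").fetchone()[0])
    # prints: Mozilla/5.0 (Linux; 10" Tablet)
```

Binding parameters also removes the need to repeat the `.replace('"', ' in.')` line for every attribute, and the stored user-agent text keeps its original characters.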
