Python script that parses an Excel table of UserAgents together with their count number to produce statistics on the data.
Python 3.9.7
Install the following libraries before running the script:
- pandas
- Install anaconda or miniconda as described here, then open a terminal and run:
$ conda install pandas
- openpyxl
- In a terminal, run:
$ pip install openpyxl
- ua_parser
- In a terminal, run:
$ pip install ua_parser
- matplotlib.pyplot
- In a terminal, run:
$ python -m pip install -U pip $ python -m pip install -U matplotlib
- Create a folder with ua_parsing.py and your Excel table of UserAgents and relative count number (cnt). We will use Unique_UserAgents_sample.xlsx.
- In the script, you can adapt the following lines according to your need, if you want different output file names
outputFileName = 'Unique_UserAgents_parsed.xlsx'
osFile = 'OS_count.xlsx'
browserFile = 'browse_count.xlsx'
deviceFile = 'device_count.xlsx'
plotOsFileName = 'piechart_os.png'
plotBrowserFileName = 'piechart_browser.png'
plotDeviceFileName = 'piechart_device.png'
- Open a terminal in the created folder and run:
python3 ua_parsing.py yourfilename.xlsx
-
If you use the data of this example, you can sobstiute yourfilename.xlsx with Unique_UserAgents_sample.xlsx (which is this file)
-
As output, you will have an Excel table of the parsed UserAgent data (here an example).
-
Afterwards, some statistics are calculated to verify the shares in the data among the Operative Systems, browsers and device type used.
- The script outputs three Excel tables, one for each share (OS, browser, device type)
The main files in this repo are:
- ua_parsing.py: the Python script that parses the user agent data.
- Unique_UserAgents_sample.xlsx: sample table of unique UserAgents and their relative count number (these are e.g. counts of how many unique UserAgents clicked on a certain web page.)
- Unique_UserAgents_parsed.xlsx: output table of the parsed data. It contains multiple columns with information about Operative System, browser and device type used, together with the count number of the input table.
- OS_count.xlsx, browse_count.xlsx and device_count.xlsx: three sample output tables calculated by the script. They are obtained by a groupby() function.
- piechart_os.png, piechart_browser.png and piechart_device.png: the pie charts plotted based on the output tables of the previous point.
- Windows
- iOS
- Mac
- OS X
- Android
- Linux
- Firefox
- Chrome
- Safari
- Opera
- Edge
- Computer
- Mobile (smartphones + tablets)
Author: Luca Iacolettig - iacolettig(dot)luca(at)gmail.com
Distributed under the GNU GPL v3 license. See LICENSE for more information.