Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: option to provide a static url->title mapping for the "Page titles" report #234

Open
wants to merge 2 commits into
base: 3.x-dev
Choose a base branch
from

Conversation

mackuba
Copy link
Contributor

@mackuba mackuba commented Nov 28, 2018

I want to use Matomo to track visits on my site, and I decided to use only log analytics. I'm aware of the limitations, that some things will be missing compared to JS tracking - like resolution info, outgoing links, plugin support etc., and I'm fine with that.

However, I've realized that on a relatively rarely changing site like mine - a blog with a few articles posted per year at most - I can solve one of these things, namely the "Page titles" which are normally read in JS from the <title> tag, by providing a static list of url -> title mappings in a file passed in a parameter to the import script.

The file looks like this (the first = is treated as a separator and both sides are trimmed from whitespace, but of course we can change the format to e.g. CSV or something else):

https://mackuba.eu/ = mackuba.eu
https://mackuba.eu/2018/09/07/new-stuff-from-wwdc-2018/ = New stuff from WWDC 2018 &ndash; mackuba.eu
https://mackuba.eu/2018/07/10/dark-side-mac-2/ = Dark Side of the Mac: Updating Your App &ndash; mackuba.eu
https://mackuba.eu/2018/06/11/notifications-in-ios-12/ = What's new in notifications in iOS 12 &ndash; mackuba.eu
...

Now, when I call import_logs.py with --page-titles-from=page_titles.txt, whenever the parser sees a hit with URL e.g. https://mackuba.eu/2018/07/10/dark-side-mac-2/, it will set the action_name to "Dark Side of the Mac: Updating Your App &ndash; mackuba.eu", and so on. My "Page titles" report looks just like with the JS tracker version, and I only need to remember to update the file whenever I post a new article (or better, automate it).

I believe quite a lot of sites using log analytics could use something like this. Depending on the size and type of the site the file can be maintained manually, or built from a database of articles/pages using a script or an action on the server. In my case, I wrote a small script that loads my sitemap.xml file and then goes through all link items listed there, fetches each HTML and extracts the <title> from it.

@tsteur tsteur changed the base branch from master to 3.x-dev January 13, 2020 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant