A cross-platform Python script to download LiveJournal blogs and seamlessly convert them into Markdown files.
Note
This project is a revived continuation of the original work by Jamisonfitz.
The original project is unmaintained and relied on Windows-specific dependencies (pywin32) that caused crashes on modern platforms. This repository fixes those compatibility errors, updates modern dependencies, and ensures smooth execution across Windows, macOS, and Linux.
This script is a utility to download, archive, and convert LiveJournal posts into Markdown (.md) files. It provides an automated way to back up LiveJournal blogs, preserving the content, title, and publishing date of each public post.
- Original project and core LiveJournal archiving concept by Jamisonfitz: https://github.com/Jamisonfitz/livejournal2markdown
- Current modifications were produced largely by the AI agent GitHub Copilot (Raptor mini Preview) with additional coding and guidance from a human operator.
- Fetches all post permalinks from the provided LiveJournal blog.
- Downloads each post and saves it as a markdown file.
- Adjusts the file creation and modification dates to match the post date.
- Automatically generates filenames based on post title and date.
- Supports current LiveJournal post HTML structure and multiple page layout variants.
- Outputs the progress to the console including the number of found posts and the file being archived.
- The original link to the LiveJournal post is appended to the end of each markdown file.
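The filename and timestamp features above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the function names `build_filename` and `save_post`, the filename pattern, and the "Original post" link text are all assumptions made for the example. Note that `os.utime` only sets access and modification times; creation time is platform-specific and is not handled here.

```python
import os
import re
from datetime import datetime
from pathlib import Path


def build_filename(title: str, published: datetime) -> str:
    """Return a filesystem-safe name such as '2006-05-06 Hello World.md' (hypothetical pattern)."""
    # Drop characters that are not allowed in Windows filenames.
    safe_title = re.sub(r'[\\/:*?"<>|]+', "", title).strip()
    # Collapse runs of whitespace; fall back to a placeholder for empty titles.
    safe_title = re.sub(r"\s+", " ", safe_title) or "untitled"
    return f"{published:%Y-%m-%d} {safe_title}.md"


def save_post(directory: Path, title: str, published: datetime, body: str, url: str) -> Path:
    """Write one post as Markdown and stamp the file with the post date."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / build_filename(title, published)
    # Append the original link at the end of the file, as the feature list describes.
    path.write_text(f"# {title}\n\n{body}\n\n[Original post]({url})\n", encoding="utf-8")
    # Set access and modification times to the publishing date.
    timestamp = published.timestamp()
    os.utime(path, (timestamp, timestamp))
    return path
```

A call such as `save_post(Path("Scraped Journals"), title, published, body, url)` would then write one file per post into the output directory.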
- Python 3.x
Setup:
- Clone this repository to your local machine.
- Navigate to the directory.
- Install the required Python packages by running:
pip install -r requirements.txt
Run the Tool:
- Execute the script using Python:
python main.py
Output:
- The markdown files will be saved in the `Scraped Journals` directory where the `main.py` script is executed.
- BeautifulSoup4: For HTML parsing.
- requests: To make HTTP requests.
- pywin32: To modify file creation and modification dates on Windows.
If you'd like to contribute, please fork the repository and make changes as you'd like. Pull requests are warmly welcome.
For this repository, use the experimental branch for testing and development. When changes are approved, merge experimental into main and publish from main.
Example commands:
git checkout experimental
git push origin experimental
# after approval
git checkout main
git merge experimental
git push origin main
If you encounter any issues or have feature requests, please file an issue on the GitHub project.
- v1.0.1 (06-05-2026): Added compatibility for current LiveJournal HTML layouts and multiple page variants, including newer `aentry-post` markup and older `b-singlepost` markup.
Please use this tool responsibly. Making rapid or aggressive requests may lead to IP bans or other restrictions from LiveJournal. Always respect the terms of service of any platform you interact with.
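One simple way to keep request rates modest is to pause between consecutive downloads. The sketch below is an assumption about how that could be done, not a description of what main.py does; the helper name `throttled` and the two-second default are hypothetical.

```python
import time
from typing import Callable, Iterable, Iterator


def throttled(
    urls: Iterable[str],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield URLs one at a time, pausing before every item except the first."""
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before handing out the next URL so requests are spaced out.
            sleep(delay_seconds)
        yield url
```

A download loop could then iterate with `for url in throttled(post_urls):` and fetch each page inside the loop body. The `sleep` parameter exists only so the pause can be replaced in tests.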
This project is licensed under the MIT License.
- The script modifies the creation and modification dates of the markdown files to reflect the date of the original LiveJournal post. This way, the file metadata matches the original publishing date of the content.
- This version adds compatibility for current LiveJournal HTML layouts and updated date string formats.
- It also includes fallback handling for multiple LiveJournal page structure variants such as newer `aentry-post` markup and older `b-singlepost` markup.
- This tool is designed for public LiveJournal blogs. It may not work correctly with private or locked posts.
- Optional Tor anonymization support has been added.
- If Tor is missing, the script can prompt to install it and optionally remove it after the session.
- The script now supports SOCKS5 proxies via `requests[socks]`.
- The script includes optional browser automation helpers for navigating WebAssembly-powered pages such as JupyterLite and Pyodide.
- This helper code is present in `main.py`, but full Selenium-based execution requires `selenium` and a compatible browser driver such as Firefox/GeckoDriver.
- In this environment, browser mode has not been run end-to-end because Selenium and geckodriver are not installed.
- The script now prompts whether to save each post as Markdown or as an HTML-like file.
- HTML mode preserves the post body structure below the site header, closer to the original page appearance, and writes `.html` files.
- The script downloads page images into a shared `assets/images/` folder so the same image URL is only saved once and reused across multiple archived posts.
- Links that point to any `livejournal.com` domain are rewritten to `*.livejournal.invalid` replicas so they look similar but cannot resolve to a real website.
- Markdown mode continues to produce readable archival Markdown files with headings and links.
- The HTML-to-Markdown conversion has been upgraded to preserve common LiveJournal content structures such as headings, lists, links, images, code blocks, blockquotes, and tables.
- If `markdownify` is installed, the script will use it for better conversion fidelity; otherwise it falls back to an enhanced built-in converter.
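The link rewriting described in the notes above could look roughly like this. It is a sketch under assumptions rather than the code in main.py: the function name `rewrite_livejournal_link` is hypothetical, and the example swaps only the top-level domain while keeping the subdomain, path, query, and fragment.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_livejournal_link(url: str) -> str:
    """Point livejournal.com links at the reserved, non-resolving .invalid domain."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Match the bare domain and any subdomain, but not look-alike hosts.
    if host == "livejournal.com" or host.endswith(".livejournal.com"):
        new_host = host[: -len("com")] + "invalid"
        # Keep an explicit port if one was given; user info is dropped in this sketch.
        if parts.port:
            new_host = f"{new_host}:{parts.port}"
        return urlunsplit(parts._replace(netloc=new_host))
    # Relative links and other domains are returned unchanged.
    return url
```

The `.invalid` top-level domain is reserved and never resolves, which is why rewritten links keep their familiar shape without leading to a live site.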
Thank you to those who archived LiveJournal. I thought mine was long gone after nearly two decades, and I wanted a way to back it up that was still readable.
AI can save internet history from the "digital dark age" by continuously crawling and creating verifiable backups, but it becomes truly heartfelt when it acts as an empathetic digital curator. Instead of just storing cold code, AI can give fading web pages and forgotten online communities a beautiful, human-centric second life.
~Kahless
