@chris-t-jansen
Contributor

I stumbled across #4271 while looking through old issues and thought it looked interesting, so I decided to take a shot at it.

This PR is a proof-of-concept rather than a ready-to-merge implementation, which is why I've opened it as a draft. It proves the approach is possible and outlines the general process, but leaves a fair bit to be desired in terms of implementation. It also raises the question of whether TWiR wants a search function at all, since it's been going fine without one for 2.5 years now, but I leave that question to the appropriate authorities.

I tested the updated build process in this PR on macOS and Windows through WSL. I tested the website on Chromium on a MacBook Pro, on Firefox on a Windows PC, on Firefox on Android, and on Safari on a simulated iPhone on a MacBook. In my experience, building the website took an additional 3-5 seconds with the indexing step to enable search functionality, and I didn't notice any additional or unusual slowness when loading the website.

If you've got any questions, comments, concerns, etc., don't hesitate to let me know.

Problem

The TWiR site previously had search functionality provided by Pelican's search plugin, which is powered by Stork under the hood. However, because of a bug that broke page loading on iOS/iPad devices, the search functionality was temporarily removed, and as far as I can tell it hasn't been revisited since.

In diving into this issue, I began by re-enabling the old search functionality and trying to build the site, but after spending an afternoon playing whack-a-mole with numerous errors, I still couldn't get it to build. I didn't find that too surprising, however, as Pelican's search plugin was last updated 2 years ago, and Stork officially ceased development 3 years ago. At this point I was left to find an alternative, and the developer of Stork helpfully lists several possible options. After looking over the recommendations, I landed on Pagefind as the best-looking alternative from my perspective.

Solution

Pagefind is a static site search binary that, like Stork, is written in Rust (yay!). Because static sites, by definition, don't have a server to handle search queries live, Pagefind works by "indexing" the site at build time and bundling that output into the static site, which a little JavaScript magic on the client's side turns into a functional search bar. Admittedly, it's pretty nifty.

Pagefind conveniently has a Python wrapper package on PyPI, which makes installation a breeze. However, Pagefind is a little too new for the python:3.8.16 image that the build container is derived from, so I had to bump it up to the python:3.9.25 image. I know that can theoretically introduce compatibility issues, though I didn't encounter any in testing, even when bumping it as high as python:3.11.14.

The Pagefind settings out of the box are pretty reasonable, but adding a few tags to the site's HTML templates gives it a big boost, such as always sorting results by the most recent issue date.

Cons

With a significant feature implementation like this, there will always be some negative consequences, no matter how much I enjoy saying "zero-cost abstraction" into the mirror. The biggest potential problem is likely the image bump from python:3.8.16-slim to python:3.9.25-slim due to the possibility of compatibility issues, though as I said I didn't encounter any.

Additionally, the indexing process does add time to every build of the website, around 3-5 seconds in my experience. Some kind of --dont-index flag could probably be added to the justfile to suppress the indexing step during development if it's a major issue, though it never bothered me.
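As a rough sketch of what such an opt-out could look like, the indexing step could be wrapped in a small shell helper that the justfile calls. Everything here is illustrative: `index_site` and the `SKIP_INDEX` variable are hypothetical names, not part of this PR, and the `python3 -m pagefind --site` invocation assumes the Python wrapper is run as a module, as Pagefind's documentation describes.

```shell
#!/bin/sh
# Hypothetical helper: index the built site with Pagefind unless
# SKIP_INDEX=1 is set in the environment (e.g. during development).
index_site() {
    site_dir="$1"
    if [ "${SKIP_INDEX:-0}" = "1" ]; then
        echo "skipping Pagefind indexing"
        return 0
    fi
    # Assumption: the Pagefind Python wrapper is installed and invoked as a module.
    python3 -m pagefind --site "$site_dir"
}
```

A development build would then call something like `SKIP_INDEX=1 index_site output`, while a normal build calls `index_site output` and pays the extra few seconds.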

Probably the biggest current issue is simply the design. I used the prebuilt search bar bundled with Pagefind and inserted it where the old search bar was before removal. The bar's horizontal spacing is a little awkward, and the search result text doesn't seem to change color in dark mode, making it nigh unreadable in that situation. Furthermore, it might make more sense to put the search function on its own page and link to that page in the header. All worthwhile questions, but I decided that trying to address them before starting a discussion was putting the cart before the horse, so I left the implementation as-is.

Replaces the old pelican-search function with [Pagefind](https://pagefind.app/) ([repo](https://github.com/pagefind/pagefind)), a Rust-based static site search binary.
Moves Pagefind indexing into a container to be system agnostic
@chris-t-jansen
Contributor Author

I did some testing, and there is a way to get around bumping the build container's version, though I can't say I'm a huge fan of it. Basically, instead of installing the Python wrapper package for Pagefind, you can download a precompiled binary directly from Pagefind's GitHub releases. However, the catch here is that, instead of simply downloading a single package and calling it a day, you have to download the correct binary for the architecture of the image, which is dependent on the machine you're building on.

For example, when building on my Apple Silicon macOS machine, the architecture of the container is arm64, but when building on my Windows machine through WSL, the architecture is amd64. Handling just those two cases requires inserting a somewhat verbose case statement into the Dockerfile:

RUN apt-get install -y nodejs
RUN npm install -g sass juice

# determine arch type and install matching Pagefind binary
RUN set -eu; \
    dpkgArch="$(dpkg --print-architecture)"; \
    case "${dpkgArch##*-}" in \
        amd64) PAGEFIND_ARCH_DL_LINK="https://github.com/Pagefind/pagefind/releases/download/v1.4.0/pagefind_extended-v1.4.0-x86_64-unknown-linux-musl.tar.gz" ;; \
        arm64) PAGEFIND_ARCH_DL_LINK="https://github.com/Pagefind/pagefind/releases/download/v1.4.0/pagefind_extended-v1.4.0-aarch64-unknown-linux-musl.tar.gz" ;; \
        # fail the build early instead of calling curl with an empty URL
        *) echo "unsupported architecture: ${dpkgArch}" >&2; exit 1 ;; \
    esac; \
    curl -fL -o pagefind_extended.tar.gz "$PAGEFIND_ARCH_DL_LINK"; \
    tar -xvf pagefind_extended.tar.gz

# pelican setup
COPY content content
COPY plugins plugins

Once that's properly installed, then you run the Pagefind binary directly from the justfile instead of through Python:

podman run -it \
	-v {{justfile_directory()}}/output-website:/usr/twir/output \
	twir:latest \
	./pagefind_extended --site /usr/twir/output

Personally, I don't like this approach as much as the Python wrapper, simply because it seems more prone to failure and more difficult to maintain. Upgrading the dependency is also a thoroughly manual process, since the names of the precompiled binaries aren't guaranteed to stay the same between releases. However, if bumping the build container version from python:3.8.16 to python:3.9.25 is a non-starter, then there is a way around it.
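One way to shrink the manual part of an upgrade might be to build the download URL from a single version value. The sketch below uses a hypothetical `pagefind_url` helper and assumes the asset naming pattern seen in the v1.4.0 links above, which, as noted, isn't guaranteed to hold for other releases.

```shell
#!/bin/sh
# Hypothetical helper: print the release asset URL for a given Pagefind
# version and Debian architecture name (as reported by dpkg).
# Assumes the v1.4.0 asset naming pattern; other releases may differ.
pagefind_url() {
    version="$1"
    arch="$2"
    case "$arch" in
        amd64) target="x86_64-unknown-linux-musl" ;;
        arm64) target="aarch64-unknown-linux-musl" ;;
        *) echo "unsupported architecture: $arch" >&2; return 1 ;;
    esac
    echo "https://github.com/Pagefind/pagefind/releases/download/v${version}/pagefind_extended-v${version}-${target}.tar.gz"
}
```

Inside the container this could be used as `curl -fL -o pagefind_extended.tar.gz "$(pagefind_url 1.4.0 "$(dpkg --print-architecture)")"`, so an upgrade touches only the version string as long as the naming pattern holds.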
