Is there anything preventing wp2static from being run externally? #760
Replies: 9 comments
-
Great thinking, @petewilcock! (Thanks for the recent donation, btw, a big help!) If it could be done as a standalone add-on, that would be awesome. @john-shaffer may have some thoughts on this, as he's integrated WP2Static for https://staticweb.io, which is all on the AWS stack, and this could help their use case, too...
I've got an issue open to revisit async parallel requests while crawling, which should speed things up (but use more resources on the host!). I haven't tried Batch yet, but I'm quite familiar with Lambda, so I should be able to help test things. The only short-term changes expected in core and all the add-ons are around code-quality tooling and code improvements, so they shouldn't break any interfaces, besides getting everything using Guzzle, as core is now doing in the latest builds. SiteSauce is an external service which uses a WP plugin to get inside info; not sure if that may give any ideas.
I'd assume a WP dev site on AWS, in the same region, is the ideal setup to work with this? Maybe we can eventually have 1-click deployable infrastructure for the ideal AWS setup: a non-public dev WP site, external crawling, and deployment through to S3/CloudFront, with some Lambdas thrown in for forms/search handling? Can you ELI5 the Fargate and serverless MySQL setup you use for me? Exciting stuff!
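WP2Static's crawler is PHP, so this is only a Python illustration of the async/parallel idea: a thread pool fetches several URLs at once and records failures instead of aborting the run. `crawl_parallel` and the injected `fetch` callable are hypothetical names; a real `fetch` would wrap an HTTP client with sensible timeouts.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def crawl_parallel(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    max_workers: int = 8,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Fetch URLs concurrently; return (pages, errors), both keyed by URL.

    `fetch` is injected (hypothetical) so the sketch runs without a network;
    in practice it might wrap urllib.request.urlopen or another HTTP client.
    """
    pages: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it is fetching.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                pages[url] = future.result()
            except Exception as exc:  # record the failure and keep crawling
                errors[url] = repr(exc)
    return pages, errors
```

Raising `max_workers` shortens wall-clock time but, as noted above, puts more load on whichever host runs the crawl.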
-
So, I've written a custom Terraform module where I provide a couple of attributes and it deploys a CloudFront-enabled, S3-backed website, along with the ECS cluster/service config for a dockerised WordPress. On launch, the container automatically checks for and installs wp2static and populates config variables into wp_options (fed in from Terraform on deployment) to preconfigure the plugin with the correct values. I toggle the scale-up of the WordPress container, which runs serverless on ECS and connects to a serverless RDS database. Start-up time is under 1 minute, and it costs essentially $0.00 when not in use. EFS backs the WordPress files and is attached to the container on launch.
So I make my modifications, fire off wp2static, and then shut it down again. This means I don't have to host the WordPress install locally, and all the target files are already in AWS. There's a lot more going on in the Terraform module, which sets up the AWS environment nicely, but that's basically it.
Currently I have to run the plugin a couple of times before it deploys without something erroring, and it's probably a resource issue somewhere (PHP errors aren't emitted to the container logs, which I need to fix).
My main question for you is: do the crawl or process-URLs steps need any inside info about WordPress to work? Or at least, anything that can't be handed off to an external process to start the jobs? I'm imagining a process like this:
The whole process lives in AWS end-to-end and should be super fast. Any failures could be logged to a fail-queue for optional retries of just the failed parts of the run. I know a lot about AWS but virtually nothing about creating WordPress plugins, so if you can give me a heads-up, either with the updated boilerplate or a few pointers, I'm sure I can figure out the rest :)
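The fail-queue idea (retrying only the failed parts of a run) can be sketched with an in-memory deque standing in for the queue. `run_with_fail_queue`, `process`, and `max_attempts` are hypothetical names; in AWS this role would more likely be played by an SQS queue with a dead-letter queue, which this sketch does not model.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple


def run_with_fail_queue(
    jobs: Iterable[str],
    process: Callable[[str], None],
    max_attempts: int = 3,
) -> Tuple[List[str], Dict[str, int]]:
    """Run each job; a failed job is re-queued until max_attempts is reached.

    Returns (succeeded, dead), where `dead` maps permanently failed jobs
    to the number of attempts they used.
    """
    queue = deque((job, 1) for job in jobs)
    succeeded: List[str] = []
    dead: Dict[str, int] = {}
    while queue:
        job, attempt = queue.popleft()
        try:
            process(job)
            succeeded.append(job)
        except Exception:
            if attempt < max_attempts:
                # Only the failed job goes back on the queue, not the whole run.
                queue.append((job, attempt + 1))
            else:
                dead[job] = attempt
    return succeeded, dead
```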
-
Super cool setup!
I wonder if adding a hook within WP2Static's logger class may be of use there, i.e. posting a copy of the log to the CloudWatch API?
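One way to picture that logger hook is a handler that buffers log records in the event shape CloudWatch Logs' PutLogEvents expects (a millisecond `timestamp` plus a `message`) and hands each batch to an injected sender. This sketch uses Python's `logging` module rather than WP2Static's PHP logger; `CloudWatchBatchHandler` and `send` are hypothetical, and in practice `send` might wrap boto3's `put_log_events`.

```python
import logging
from typing import Callable, Dict, List, Union

Event = Dict[str, Union[int, str]]


class CloudWatchBatchHandler(logging.Handler):
    """Buffer records as CloudWatch-style events and flush them in batches."""

    def __init__(self, send: Callable[[List[Event]], None], batch_size: int = 100):
        super().__init__()
        self.send = send  # hypothetical sender, e.g. a wrapper around put_log_events
        self.batch_size = batch_size
        self.buffer: List[Event] = []

    def emit(self, record: logging.LogRecord) -> None:
        # PutLogEvents expects timestamps in milliseconds since the epoch.
        self.buffer.append(
            {"timestamp": int(record.created * 1000), "message": self.format(record)}
        )
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []  # start a fresh list so sent batches are not mutated

    def close(self) -> None:
        # Send whatever is still buffered before the handler goes away.
        self.flush()
        super().close()
```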
WP Site and destination URLs, mainly. For your setup, my gut feel is to advise not using WP2Static and just doing an outside-in crawl, like I do with Appi.sh. I think in the repo's issues, I've listed some better CLI tools than
If you really want to go the WP2Static route, I'd encourage you to do my work for me and rewrite the crawling engine to do parallel/async requests :D That should achieve close to the same goal without bending it too far from what it's set up to do currently. Thinking out loud, if I had to do it with WP2Static like that, one way would be to use the WP_CLI commands and just have WP2Static do the
Without automated test coverage (which I'm working on), I can't easily remember the differences in crawl mechanism between WP2Static and Static HTML Output, but there's some logic to crawl and parse URLs from CSS, XML files, etc. and feed those back into the crawl. @john-shaffer's been doing some great work with the https://github.com/leonstafford/wp2static-addon-advanced-crawling add-on, which helps for WP2Static.
Back to the non-WP2Static version: using WP2Static's initial crawl could still be an advantage, but if you were doing your own discovery of new URLs while crawling, the time saving may not be that impressive. The cost of scraping with Lambda could be high.
An outside-the-box idea is to really make WP a static site generator (I'm now thinking WP2Static isn't quite one, as it crawls mostly from the outside in). That is, use output buffering and call the same WP functions that generate pages, but don't make the requests via a web server at all; just drive it all from PHP. The requests are the slowest part of the whole thing, as you're seeing.
Anyway, there are plenty of options, and I think you've done the hardest parts already with a reproducible, optimized end-to-end environment. That's really cool! With your spin-up/down of environments, are you bearing costs for the at-rest containers/custom images? That was something I couldn't solve years back with AWS.
Azure looked promising for a bit, but still accrued costs. If there's a way to offer solid end-to-end remote dev site + deployment site hosting like you're doing (for free or a few $ a month), I'd love to throw some support behind it!
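The "crawl and parse URLs from CSS, XML files, etc. and feed those back into the crawl" logic might look roughly like the following. This is a simplified, regex-based sketch, not WP2Static's actual implementation: it only handles CSS `url(...)` references and sitemap `<loc>` entries (no HTML parsing), and `discover_urls`/`crawl_queue` are hypothetical names.

```python
import re
from collections import deque
from typing import Callable, List, Set, Tuple
from urllib.parse import urljoin

CSS_URL = re.compile(r"""url\(\s*['"]?([^'")]+?)['"]?\s*\)""")
SITEMAP_LOC = re.compile(r"<loc>\s*([^<\s]+)\s*</loc>")


def discover_urls(base_url: str, body: str, content_type: str) -> List[str]:
    """Return absolute URLs referenced by a CSS or XML (sitemap) body."""
    if "css" in content_type:
        found = CSS_URL.findall(body)
    elif "xml" in content_type:
        found = SITEMAP_LOC.findall(body)
    else:
        found = []  # HTML and binary assets are out of scope for this sketch
    # Skip inline data: URIs; resolve relative paths against the file's own URL.
    return [urljoin(base_url, u) for u in found if not u.startswith("data:")]


def crawl_queue(start: str, fetch: Callable[[str], Tuple[str, str]]) -> Set[str]:
    """Breadth-first crawl where every fetched file can add new URLs.

    `fetch` (hypothetical) returns (body, content_type) for a URL.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        body, content_type = fetch(url)
        for found in discover_urls(url, body, content_type):
            if found not in seen:
                seen.add(found)
                queue.append(found)  # feed newly discovered URLs back into the crawl
    return seen
```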
-
Oh damn, now you've really got me thinking... If all permutations of possible WordPress URLs are calculable (and the advanced crawler is mostly seeking out linked assets embedded within pages), and we can make WP simply export the generated index.html code of all the pages, then post-processing and crawling could become completely unnecessary? i.e. could we override the value of wp_home/site used when WP's page-generation function is called, so there's no need for post-crawl rewrites? I guess we lose the cleanup of comments and other bloat, but park that for the moment.
And then, thinking in purely AWS terms, crawling can be avoided because all of the image/CSS/font assets you'd want to grab already exist on the EFS volume and simply need to be synced over to the S3 target. Sync is the key word there. I might have oversimplified and missed something, not fully appreciating how each stage of the current process works, but it feels possible?
Now, if I were sticking with the wp2static route, I'd probably avoid curling the public URLs from my crawler function. If it executed entirely within a VPC, none of the traffic would need to go outbound to the public internet via the internet gateway; as well as being much faster, you'd save on transfer costs. It'd just be a slight adjustment of the curling logic, and ECS tasks can use service discovery to provide a static local endpoint for WordPress. Think of it like crawling your localhost: much faster, and the internet doesn't need to get involved.
Sorry, I'm rambling away here at the possibilities, but to answer your other question, the only costs at rest are EFS storage costs for the WordPress files (a maximum of $0.30 per GB-month if I'm not using an infrequent-access storage class) and extremely minimal ECR storage costs for the container image so it's there when I want to deploy it (barely a couple of cents a month). So... completely negligible.
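Syncing assets that already sit on EFS to S3 essentially means uploading only what is new or changed and deleting what has gone. A minimal sketch, assuming the remote side has already been listed into a `{key: etag}` mapping: for simple single-part uploads (without SSE-KMS or SSE-C encryption) S3's ETag is the hex MD5 of the object, but multipart uploads produce a different ETag format, so a real tool needs more care (`aws s3 sync`, for instance, compares size and modification time instead). `local_md5s` and `plan_sync` are hypothetical helpers.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def local_md5s(root: Path) -> Dict[str, str]:
    """Map each file under root (as a POSIX-style relative key) to its MD5 hex digest."""
    digests: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = path.relative_to(root).as_posix()
            digests[key] = hashlib.md5(path.read_bytes()).hexdigest()
    return digests


def plan_sync(local: Dict[str, str], remote: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (keys_to_upload, keys_to_delete).

    Upload keys that are missing remotely or whose ETag differs from the local MD5;
    delete remote keys that no longer exist locally. S3 returns ETags wrapped in
    double quotes, so they are stripped before comparing.
    """
    to_upload = sorted(k for k, md5 in local.items() if remote.get(k, "").strip('"') != md5)
    to_delete = sorted(k for k in remote if k not in local)
    return to_upload, to_delete
```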
The main thing to bear in mind with my set-up is the running costs while you're editing pages. ECS is really cheap to run WordPress for my post-editing time, but my choice of Aurora Serverless is comparatively expensive if I were to keep it running for a long time (kind of like Lambda itself). Now, if I wanted to shave costs down further (albeit losing the lovely benefits of RDS, such as its auto-backups), I could just run MySQL as a secondary container in ECS and have WordPress talk to that. Again, EFS could host the underlying database files, so we're really talking barely a few cents.
As I mentioned, there's more to my module than just the containers: it also sets up a CodeBuild pipeline to pull the source image of WordPress from Docker Hub, rebake it with my customisations, and push it into ECR ready for my deployment. I published a tiny piece of that recently in this module, but I should really tidy up and package the whole thing for general use, as it sounds like some people might get a lot out of it.
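Running MySQL as a second container next to WordPress could look something like the task definition below, built as a Python dict and serialized to the JSON an ECS task definition uses. This is a hypothetical sketch: the family name, image tags, and EFS ID are placeholders, and a real setup should pull the password from Secrets Manager or SSM rather than a plain environment variable. It relies on containers in the same Fargate task (`awsvpc` network mode) sharing a network namespace, so WordPress can reach MySQL on `127.0.0.1`.

```python
import json
from typing import Any, Dict, List


def env(pairs: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a plain dict into ECS's [{"name": ..., "value": ...}] format."""
    return [{"name": k, "value": v} for k, v in pairs.items()]


def sidecar_task(db_password: str, efs_id: str) -> Dict[str, Any]:
    """Hypothetical Fargate task: WordPress plus a MySQL sidecar on EFS."""
    return {
        "family": "wordpress-with-mysql",
        "networkMode": "awsvpc",
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "512",
        "memory": "1024",
        "volumes": [
            # MySQL's data directory lives on EFS so it survives scale-to-zero.
            {"name": "mysql-data", "efsVolumeConfiguration": {"fileSystemId": efs_id}},
        ],
        "containerDefinitions": [
            {
                "name": "mysql",
                "image": "mysql:8.0",
                "essential": True,
                "environment": env({"MYSQL_DATABASE": "wordpress",
                                    "MYSQL_ROOT_PASSWORD": db_password}),
                "mountPoints": [{"sourceVolume": "mysql-data",
                                 "containerPath": "/var/lib/mysql"}],
            },
            {
                "name": "wordpress",
                "image": "wordpress:latest",
                "essential": True,
                "portMappings": [{"containerPort": 80}],
                # Same task + awsvpc mode means the containers share localhost.
                "environment": env({"WORDPRESS_DB_HOST": "127.0.0.1:3306",
                                    "WORDPRESS_DB_USER": "root",
                                    "WORDPRESS_DB_PASSWORD": db_password,
                                    "WORDPRESS_DB_NAME": "wordpress"}),
                # START only waits for the container to start, not for MySQL to accept connections.
                "dependsOn": [{"containerName": "mysql", "condition": "START"}],
            },
        ],
    }


task_json = json.dumps(sidecar_task("change-me", "fs-EXAMPLE"), indent=2)
```

The JSON could be saved to a file and passed to `aws ecs register-task-definition --cli-input-json file://task.json`. A `healthCheck` on the MySQL container with a `HEALTHY` dependency condition would be stricter than `START`, and only one task should ever use that MySQL data directory at a time.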
-
Awesome sauce! So, the pure WP SSG option hasn't been trialed yet, so there could be some unknowns there... We should be able to mock the Site URL and such. Another option could be to configure the internal DNS in that section to resolve the production domain to that localhost, so the site actually uses the prod domain, and then figure out how to transform that when working on the dev site... My only real hesitation about attacking that before was conflicts with other plugins/themes that may be using output buffering, as I believed theirs could take precedence over ours, but I say that with no confidence in my knowledge :D
Big question now: who are the users? Making something for others creates more value than making it just for yourself, so I guess some scenarios:
How long does the current setup take from, say, a new AWS account, and how many steps are required for a beginner to get up and running? How would the base image be upgraded (less important with non-public instances)? Import/export tools abound, so manually exporting and importing into a newer image should be acceptable. I'm excited to support this either way. You'll see more excitement from me if it's targeted at either/both:
Very excited :D
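Transforming the dev-site URL into the production domain, the kind of rewrite discussed above when mocking the Site URL, can be sketched as plain string replacement that also covers JSON-escaped and protocol-relative forms. This is a simplification of what a real rewrite step has to handle, and `rewrite_site_url` plus the example domains are hypothetical.

```python
from urllib.parse import urlsplit


def rewrite_site_url(html: str, dev_url: str, prod_url: str) -> str:
    """Replace the dev site's origin with the production origin in a page."""
    dev, prod = dev_url.rstrip("/"), prod_url.rstrip("/")
    dev_host, prod_host = urlsplit(dev).netloc, urlsplit(prod).netloc
    replacements = [
        (dev, prod),                                          # https://dev.example.test
        (dev.replace("/", "\\/"), prod.replace("/", "\\/")),  # JSON-escaped: https:\/\/dev...
        ("//" + dev_host, "//" + prod_host),                  # protocol-relative (and other-scheme) //dev...
    ]
    for old, new in replacements:
        html = html.replace(old, new)
    return html
```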
-
I've started testing that non-webserver option, and it seems promising. I'll stick notes in https://github.com/leonstafford/real-wp-ssg-test
-
Ok, let's have a chat! Hit me up on email :)
-
I'm a bit allergic to meetings, so let's see how that goes, or email back and forth for a bit. Some nice tips from @szepeviktor here: https://github.com/leonstafford/real-wp-ssg-test/issues/1
-
Both of you must have WP.org accounts; then you can chat on Slack! https://make.wordpress.org/chat/
-
I've been thinking about scalability for the wp2static jobs of crawling, processing, and deploying.
For an increasingly large site, this takes a long time, has to run on the host machine of the WordPress installation, and seems susceptible to OOM errors or general resource contention.
I'm interested in investigating how to externalise these processes (e.g. in AWS Lambda, or possibly AWS Batch) to fan out resources rapidly to get a site examined, crawled, and processed. What do you think, @leonstafford?
I'm an eat/sleep AWS dude, so I'm happy to invest some time into looking at this. My current WordPress staging set-up uses ECS Fargate and Serverless MySQL for maximum cheapness in pushing static updates, so it'd be great to iterate on it further.
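Fanning the crawl out to Lambda or Batch workers mostly comes down to splitting the URL list into evenly sized chunks, one per worker invocation. A minimal sketch: `chunk_urls` and `make_payloads` are hypothetical, and the payload shape is an assumption rather than anything WP2Static defines; each payload could then be handed to a worker, for example via an SQS queue or an asynchronous Lambda invoke.

```python
import json
from typing import List, Sequence


def chunk_urls(urls: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Split urls into consecutive chunks of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    return [list(urls[i:i + chunk_size]) for i in range(0, len(urls), chunk_size)]


def make_payloads(run_id: str, urls: Sequence[str], chunk_size: int) -> List[str]:
    """Build one JSON payload per chunk so each worker knows its slice of the run."""
    chunks = chunk_urls(urls, chunk_size)
    return [
        json.dumps({"run_id": run_id, "chunk": i, "of": len(chunks), "urls": chunk})
        for i, chunk in enumerate(chunks)
    ]
```

Keeping `run_id` and `chunk` in each payload makes it straightforward to record which slice failed and retry just that slice.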