Hugo: Happy Accidents & Hard Truths (Part 2 - The Pipeline)
Series: The Hugo Journey

“I love it when a plan comes together.” - The A-Team

And so, here we are at part two. It was delayed a bit because a house remodel had to start, which kept me away from the computer for a week. When I got back, I had to remind myself what I was working on and what was next on the Hugo series’ agenda: my code pipeline.

Why am I doing this?

To get Hugo from my machine to a nice place in the cloud, I wanted an automated way to ensure it happened. Sure, I can test locally, make it all work, run a few commands, and poof, the site is alive in all its glory.

Like in part one, there are some goals that I need to achieve.

  • Simple to manage and scalable as my website grows
  • Get some basic testing in place so I don’t have to remember which testing commands are needed.

Getting the basic pipeline up was fairly straightforward. There are a few other bits, but this was the core piece needed so that the pipeline was watching for the latest merge to master:

WebsiteContentPipeline:
  Type: 'AWS::CodePipeline::Pipeline'
  Properties:
    Name: 'WebSite-Content-CI-CD-Pipeline'
    RoleArn: !GetAtt WebsiteContentPipelineServiceRole.Arn
    ArtifactStore:
      Type: S3
      Location: !ImportValue 'GDS-WEBSITE-CONTENT-WebsiteContentS3BucketName'
    Stages:
      - Name: Website_Content_Source_Code_Copy_Stage # Needed to get the code from CodeCommit into S3 for CodeBuild to verify
        Actions:
          - Name: Fetch_Website_Content_Packages_Code
            ActionTypeId:
              Category: Source
              Owner: AWS
              Provider: CodeCommit
              Version: '1' # Action provider version, nothing I control
            Configuration:
              RepositoryName: "gds-website-content"
              BranchName: master
              OutputArtifactFormat: CODEBUILD_CLONE_REF
            OutputArtifacts:
              - Name: Website_Content_Artifact # With CODEBUILD_CLONE_REF, this is a reference CodeBuild uses to do a full clone, not a .zip

I’ll eventually have a full package online for review, but this should be good enough for now.

My buildspec is where much of the magic now lives. It does several things, including pushing the build to S3 and invalidating CloudFront’s cache.

version: 0.2 # Current buildspec version, controlled by AWS

# Granite Dog Systems Website Content Buildspec for updating the website

phases:
  install:
    runtime-versions:
      golang: 1.24
      nodejs: 22 # add npm support
    commands:
      - echo "Install tools for testing and deployment"
      - n 22 # activate npm, without it, next command fails
      - npm install -g linkinator # check links in hugo for correctness
  
  pre_build:
    commands:
      - cd gds-site-source
      - echo "Set Binary executable permissions"
      - chmod 755 ./bin/hugo
      - echo "Print Hugo version for logs"
      - ./bin/hugo version
      - echo "Set max concurrent requests to improve system speed."
      - aws configure set default.s3.max_concurrent_requests 20

  build:
    commands:
      - echo "Building Hugo site..."
      - ./bin/hugo --gc --minify --baseURL "https://www.granitedogsystems.com/"
      - echo "Checking links for errors."
      - linkinator ./public --recurse --silent --root ./public --skip "https://www.facebook.com" --skip "https://www.granitedogsystems.com"
  
  post_build:
    commands:
      # We want all assets with a one-year expiry.  If they haven't changed, then Hugo will not change the fingerprint.  If they have, a new fingerprint = a new filename for browsers
      - echo "Sync Assets (CSS, Images, etc.) first."
      - aws s3 sync public/ s3://$WebsiteContentBucketName --delete --cache-control "max-age=31536000" --exclude "*.html" --exclude "health.txt" --no-follow-symlinks --exclude "proc/*" --exclude "sys/*" --exclude "dev/*"
      # This is the core content; if we have typos, whatever, we want users to come back every time for HTML, and we want health.txt never to be cached. 
      - echo "Sync HTML files."
      - aws s3 sync public/ s3://$WebsiteContentBucketName --delete --exclude "*" --include "*.html" --include "health.txt" --cache-control "max-age=0, no-cache, no-store, must-revalidate"
      # Brute force, but avoids complex CloudFront CLI commands
      - echo "Invalidating CloudFront Cache for new data to be uploaded."
      - aws cloudfront create-invalidation --distribution-id $CLOUDFRONT_DIST_ID --paths "/*"

# Enable codebuild caching (best effort) for Hugo to prevent rebuilding assets that haven't changed and for installed modules.
cache:
  paths:
    - 'gds-site-source/resources/_gen/**/*'
    - '/usr/local/lib/node_modules/**/*' # Caches the global npm folder

Breaking Down the Buildspec

Let’s break down what is going on in this CodeBuild, and then we’ll wrap up with the learnings that I’ll need to resolve in the future.

Originally, I had a submodule that I used to compile Hugo whenever I needed to. This created a needless dependency on GitHub that I didn’t much enjoy. Calling a third-party endpoint typically means a slower build and, frankly, a newer version could break my builds needlessly.

Instead, I grabbed the Extended ARM64 binary to leverage AWS CodeBuild cost savings (since my build infrastructure uses ARM/Graviton) and committed it directly to the repository. That way I always have the exact version I want, fully self-contained, and the build no longer depends on GitHub being up or on my build environment not being throttled.

- echo "Set Binary executable permissions."
- chmod 755 ./bin/hugo
- echo "Print hugo version for logs."
- ./bin/hugo version

You’ll notice that I set the baseURL when Hugo builds and minifies the site.

./bin/hugo --gc --minify --baseURL "https://www.granitedogsystems.com/"

This is because when testing locally, I use my laptop’s IP address so I can quickly check the look of my site on my phone, and, of course, that address is not part of the core domain. By setting the flag in the buildspec, which is committed alongside the site, I can ensure that the files Hugo generates have the correct URLs.

You might be wondering about the variables $WebsiteContentBucketName and $CLOUDFRONT_DIST_ID. These are injected by my CloudFormation code as environment variables so that CodeBuild can reference them. The ID is used in other areas of my code, and if it ever changes for whatever reason, I don’t have to update a bunch of different places.
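
Because both values arrive from outside the buildspec, a missing one would otherwise only show up when a command fails partway through a deploy. Here is a minimal sketch of a fail-fast guard, assuming the values reach the build as ordinary environment variables. The helper name require_env is hypothetical and is not part of the buildspec above.

```shell
#!/bin/sh
# require_env NAME...: report every named variable that is unset or empty,
# and return non-zero if at least one is missing.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup; names are expected to be plain shell identifiers.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible use early in a build phase (hypothetical placement):
#   require_env WebsiteContentBucketName CLOUDFRONT_DIST_ID || exit 1
```

Checking both names in one call means a single build log shows everything that is missing, rather than one variable per failed run.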

Blessing the Build

In my buildspec, I make reference to using Linkinator. Its core function in life is to verify that my internal and external URL references remain valid.

- linkinator ./public --recurse --silent --root ./public --skip "https://www.facebook.com" --skip "https://www.granitedogsystems.com"

As part of the Linkinator setup, I have it configured to scan all files in the Hugo-generated ./public folder. This helps ensure I stay on top of older articles, letting me know when external references are broken and need to be updated. How many of us have been reading a tech article for whatever reason, and the links don’t work?

I am skipping two URLs, and one of them might raise a few eyebrows: my own site. Really, I need it to look at the public folder that Hugo generates to check what does or doesn’t look right. Plus, if my site is down for any reason, this shouldn’t break my build.

The other skip is Facebook. I include a link to an article there, and they are notorious for blocking bots, so they may treat my CodeBuild runs as something nefarious. I don’t want Facebook blocking a build.

I did try to get some other automated testing working, but more on that later.

Beyond internal and external URL checks, Linkinator also verifies image, script, and URL fragment references. Really, a very nice utility that probably covers 90% of what every site needs checked before publishing.

Moving Site Data to the S3 Bucket

Moving on to S3, I created two commands to improve overall performance. Again, this is the beauty of a well-crafted CI/CD pipeline: figure something out once, and then I can leave it.

- aws s3 sync public/ s3://$WebsiteContentBucketName --delete --cache-control "max-age=31536000" --exclude "*.html" --exclude "health.txt" --no-follow-symlinks --exclude "proc/*" --exclude "sys/*" --exclude "dev/*"

# This is the core content; if we have typos, whatever, we want users to come back every time for HTML, and we want health.txt never to be cached.

- echo "Sync HTML files."
- aws s3 sync public/ s3://$WebsiteContentBucketName --delete --exclude "*" --include "*.html" --include "health.txt" --cache-control "max-age=0, no-cache, no-store, must-revalidate"

First, I sync everything except health.txt and the HTML content pages, and set the cache lifetime of my assets (CSS/images/scripts) to one year. When Hugo fingerprints an asset, it adds a unique hash to the file’s name. If the file never changes, its name never changes, and a returning user never needs to download it again.

Those with eagle eyes may notice proc, sys, and dev in the sync command. I had an early bug where I accidentally started copying the entire OS folder structure into an S3 bucket. Adding these excludes gives me a bit more protection if such a bug occurs in the future. That was kind of an expensive mistake I’d rather not repeat. I’m not talking thousands or even hundreds of dollars, but I think my S3 bill was around $10, which should have been much cheaper.

Beyond the bigger S3 storage bill, the bug was so bad that my build ran for too long, exceeding my monthly free tier in AWS CodeBuild.
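
The excludes only trim what is uploaded. A complementary idea is to refuse to sync at all when the source directory looks wrong. The sketch below is an assumption-laden illustration rather than something in the buildspec: the function name safe_publish_dir is hypothetical, and the check would need adjusting for a site that legitimately has a top-level dev, sys, or proc section.

```shell
#!/bin/sh
# safe_publish_dir DIR: succeed only if DIR exists, is not the filesystem
# root, and has no top-level proc/sys/dev entries.
safe_publish_dir() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "Not a directory: $dir" >&2
    return 1
  fi
  # Resolve to a physical absolute path so "." or a symlink cannot hide "/".
  resolved=$(cd "$dir" && pwd -P) || return 1
  if [ "$resolved" = "/" ]; then
    echo "Refusing to publish the filesystem root" >&2
    return 1
  fi
  for os_dir in proc sys dev; do
    if [ -e "$resolved/$os_dir" ]; then
      echo "Refusing to publish: found $os_dir in $resolved" >&2
      return 1
    fi
  done
  return 0
}

# Possible use before the sync commands (hypothetical placement):
#   safe_publish_dir public || exit 1
```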

My core content, the HTML along with health.txt, should have no cache lifetime. I am notorious for needing to make tweaks or fix spelling errors, and I don’t need people who see a mistake to keep seeing it for a year plus. Though if you keep looking at this page for a year, that’d be interesting in itself, but I digress.

Now, why do I not want health.txt to have a cache lifetime? I want my monitoring to check the full CloudFront → S3 bucket path to ensure that the flow is always working. If the health page is cached, I’ll only know that CloudFront is running, not that calls make it all the way back to my S3 bucket. This is worth the extra cost for better peace of mind. I’ll dive deeper into monitoring in a future write-up.
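
The split between the two sync passes can be sketched as a small shell function that returns the Cache-Control value a given path would receive. This is only an illustration of the rule, not part of the buildspec. The function name cache_control_for and the sample paths are hypothetical, and it assumes health.txt sits at the root of public/.

```shell
#!/bin/sh
# cache_control_for PATH: print the Cache-Control header that the two-pass
# sync would apply to PATH (a path relative to the public/ folder).
cache_control_for() {
  case "$1" in
    # HTML anywhere in the tree, plus the root-level health check file.
    *.html|health.txt)
      echo "max-age=0, no-cache, no-store, must-revalidate" ;;
    # Everything else is treated as a long-lived asset.
    *)
      echo "max-age=31536000" ;;
  esac
}

# Print the policy for a few sample (hypothetical) paths.
for path in index.html posts/part-2/index.html health.txt css/main.min.abc123.css; do
  printf '%s -> %s\n' "$path" "$(cache_control_for "$path")"
done
```

The one-year value is 365 × 24 × 60 × 60 = 31,536,000 seconds, which is where max-age=31536000 comes from.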

Getting Pushed to CloudFront

Finally, we have CloudFront.

# Brute force, but avoids complex CloudFront CLI commands
- echo "Invalidating CloudFront Cache for new data to be uploaded."
- aws cloudfront create-invalidation --distribution-id $CLOUDFRONT_DIST_ID --paths "/*"

The above code snippet is very brute force for my CloudFront invalidation, but I tend to lean towards simplicity. Sure, I can create a complex command specifying what should or shouldn’t be invalidated, but in my opinion, it introduces headaches I don’t need right now since my website is fairly straightforward. Saving a few pennies for a site of this size would be offset by a micromanaged, hard-to-maintain invalidation command.

As part of any automated release train, one should include basic testing, but as you can see, only Linkinator is included.

linkinator ./public --recurse --silent --root ./public --skip "https://www.facebook.com" --skip "https://www.granitedogsystems.com"

Its job, for reference, is to verify that the links on my site function correctly, which is fairly critical. Again, overkill for what I need for a site of my size, but good habits are crucial.

Lessons Learned, and there is always Tomorrow.

I did try to get pa11y running, but struggled to get it working on my build host. For the record, I am using Amazon Linux 2023 (AL2023) running on ARM. To run correctly, pa11y needs Chromium, which, so far in my early testing, I have not been able to get running. As much as I like to chase speed, accessibility is a big thing for me, and I don’t want to leave screen reader users behind.

Lighthouse from Google had the same tie-in with Chromium. I wanted to get this running to ensure my website loads fairly quickly and to set a score that would cause a build to fail if it ever dips below a defined number.
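
The "fail the build below a defined number" part is simple once a score is available. Here is a minimal sketch, assuming the score has already been extracted as a whole number from 0 to 100 (Lighthouse's JSON report stores category scores as fractions from 0 to 1, so they would need scaling first). The function name score_gate and the thresholds in the usage comment are hypothetical.

```shell
#!/bin/sh
# score_gate SCORE THRESHOLD: succeed when SCORE >= THRESHOLD.
# Both arguments are expected to be integers between 0 and 100.
score_gate() {
  score=$1
  threshold=$2
  if [ "$score" -ge "$threshold" ]; then
    echo "Score $score meets threshold $threshold"
    return 0
  fi
  echo "Score $score is below threshold $threshold" >&2
  return 1
}

# Possible use in a build phase (hypothetical values):
#   score_gate "$MOBILE_SCORE" 85 || exit 1
```

Because any non-zero exit from a buildspec command fails that phase, a gate like this is enough to stop a deploy when the score dips.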

When I ran my site through the Lighthouse engine (which powers PageSpeed Insights), I got a score of 87 on mobile and 99 on desktop. The main fix was to improve the image compression Hugo applies when converting to WebP. I tried to fix the .css file loading order, but after multiple hours of trying, I ended up breaking the core of my site. 87 is good enough for now, but it’s a fix I’ll need to work on in the future.

An 87 is a great score IMO, and in general, users seeing my website on a phone shouldn’t experience any major issues. Chasing a vanity score doesn’t yield much return on my business investment, and I need to keep moving this project forward.

For both Lighthouse and pa11y, I ran them locally and made fixes, but I really need these to be part of the CI/CD pipeline. Eventually, I am going to put these in a custom Docker container, along with maybe a few other tools, that any of my deployment pipelines can use.

I want my builds to be fast, and there is a hit every time I try to install utilities as part of every build, for example:

install:
  runtime-versions:
    golang: 1.24
    nodejs: 22 # add npm support
  commands:
    - echo "Install tools for testing and deployment."
    - n 22 # activate npm, without it, next command fails
    - npm install -g linkinator # check links in Hugo for correctness

Although fairly quick, adding more tools can start to negatively affect build times and, in my opinion, is prone to errors if versions change unexpectedly. Maybe Linkinator changes a flag and no longer runs, for example.
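
One way to soften that hit, whether a tool comes from a prebuilt image or from an earlier step, is to install it only when it is not already on the PATH. This is a sketch under that assumption, not what the buildspec currently does, and the helper name ensure_tool is hypothetical.

```shell
#!/bin/sh
# ensure_tool NAME INSTALL_COMMAND...: run INSTALL_COMMAND only when NAME
# cannot be found on the PATH.
ensure_tool() {
  tool=$1
  shift
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool already available, skipping install"
    return 0
  fi
  echo "Installing $tool"
  "$@"
}

# Possible use in the install phase (hypothetical placement):
#   ensure_tool linkinator npm install -g linkinator
```

This does nothing about version drift on its own. Pinning an explicit version in the install command would be the usual companion to it.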

That said, I am using the cache section of my buildspec as a best effort to minimize how often things need to be downloaded.

cache:
  paths:
    - 'gds-site-source/resources/_gen/**/*'
    - '/usr/local/lib/node_modules/**/*' # Caches the global npm folder

I say best effort since AWS can decide to run my build on a new host, which would remove the cache.

I also use CloudFormation to manage my infrastructure in AWS, so if you want this cache to actually work, the following needs to be added to your CodeBuild project definition:

Type: 'AWS::CodeBuild::Project'
<other_code>
  Source:
    Type: CODEPIPELINE
    BuildSpec: 'buildspecs/buildspec.yaml'
    GitCloneDepth: 1 # needed to add submodule support so data is copied correctly
  Cache:
    Type: LOCAL
    Modes:
      - LOCAL_CUSTOM_CACHE # Enables the 'cache' paths in the buildspec

Final Thoughts

I think this mostly covers my journey getting Hugo into an official CI/CD pipeline. As stated earlier, I’ll be putting together a repository with the code to make this work for better reference. Because I am a weird guy, I want to build some automation to make the process more seamless. And yeah, I’ll write about how I make that work, too.

For now, my pipeline is working and ready for production. It ensures that when I merge my code, this code is pushed live within minutes.

On the side, I have been working on cleaning up some metrics, reporting, and, of course, security. Stay tuned for those upcoming articles.
