Speed up Azure DevOps pipelines with node_modules caching

I’ve been working with Azure DevOps a lot lately, and despite the horrible name, the product has really grown on me. Improvements to both the Microsoft-hosted agents and the YAML pipelines have made it a much better product over the last few years.

I suspect that GitHub and GitHub Actions will eventually replace Azure Pipelines, but that is a topic for another post.

Unfortunately, it can be hard to find good examples or documentation online for some common Azure DevOps pipeline problems since it is not the most popular CI solution on the market.

Recently we had YAML build pipelines for some dotnet core microservices running on the Microsoft-hosted agents, and the builds were really fast. They usually completed in under 2 minutes per run without us going out of our way to optimize for speed.

Then a new microservice came along, and this one needed an Angular front end in addition to the dotnet core API. That requirement brought along the need to manage npm modules and build the Angular app alongside the usual dotnet restore / build / publish steps. After we introduced these new npm build steps, the pipeline build times tripled.

You know what they say about node_modules…

It was time to optimize the new pipeline.

The first thing I did was change the git checkout depth of the main repo to a shallow fetch. This usually saves about 30 seconds to a minute per run, though the benefit varies from repo to repo.
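A minimal sketch of that tweak, assuming the repo being built is checked out with the standard `self` checkout step. The depth of 1 is my assumption; any small value gives a shallow fetch.

```yaml
steps:
# Fetch only the most recent commit instead of the full git history
- checkout: self
  fetchDepth: 1
```

Tools that derive version numbers from git history or tags may need a deeper fetch than this.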

Next, I followed the documentation for using pipeline caching in Azure DevOps to try to speed up the npm steps.

Inside this gist are the relevant parts of the steps I ended up using to cut the pipeline run times in half.

After setting the npm_config_cache variable, the first cache step caches npm's download cache, keyed on the npm configuration files, and allows partial cache hits when those files change. This step is mostly taken directly from the Microsoft documentation mentioned above.

In my case, the files that needed to be hashed and used for cache validation were package.json and npm-shrinkwrap.json together. Depending on the project, package-lock.json might be the more appropriate file to hash.
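The npm cache step can be sketched roughly as follows, following the pattern in Microsoft's pipeline caching documentation. It assumes package.json and npm-shrinkwrap.json sit at the repo root; if the Angular app lives in a subfolder, the file paths in the key need to point there instead.

```yaml
variables:
  # Pipeline variables are exposed to scripts as environment variables,
  # and npm reads npm_config_cache to decide where its download cache lives
  npm_config_cache: $(Pipeline.Workspace)/.npm

steps:
- task: Cache@2
  displayName: Cache npm
  inputs:
    # Exact key: a fixed prefix, the agent OS, and the hashed contents of both files
    key: 'npm | "$(Agent.OS)" | package.json | npm-shrinkwrap.json'
    # Fallback prefix: restores the most recent npm cache for this OS on a partial hit
    restoreKeys: |
      npm | "$(Agent.OS)"
    path: $(npm_config_cache)
```

For a project that uses package-lock.json, that file would replace the two file segments in the key.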

The next caching step was the one where my Google searches came up short and I had to figure it out through trial and error. I needed to cache node_modules for the project so that npm ci (similar to npm install) would not have to re-download all the modules every run. This was the most time-consuming step in the updated pipeline. This cache step allows no partial cache hits, which is achieved by omitting the restoreKeys input; a partial hit would restore a node_modules folder that does not match the current package files. When there is a full cache hit, the step sets the pipeline variable MODULES_CACHE_RESTORED, named in the task's cacheHitVar input.
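A sketch of the node_modules cache step under the same assumption (package files and node_modules at the repo root). The `modules` key prefix is my own choice; it only needs to differ from the npm cache's prefix.

```yaml
- task: Cache@2
  displayName: Cache node_modules
  inputs:
    # Different prefix from the npm cache so the two caches never collide
    key: 'modules | "$(Agent.OS)" | package.json | npm-shrinkwrap.json'
    path: $(System.DefaultWorkingDirectory)/node_modules
    # No restoreKeys here, so only an exact key match restores the folder
    # The task sets this variable to 'true' on an exact hit
    cacheHitVar: MODULES_CACHE_RESTORED
```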

Later in the pipeline, the npm ci step has a condition that checks MODULES_CACHE_RESTORED. If the variable is ‘true’, indicating a full cache hit, npm ci is skipped and node_modules is not downloaded again. All the modules come from the cache download instead, which takes much less time than running npm ci.
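The conditional install might look like the following sketch. Wrapping the check in `and(succeeded(), ...)` is my addition: a custom condition replaces the default one, so without `succeeded()` the step could still run after an earlier failure. The build step is hypothetical, assuming the package.json defines a `build` script (Angular CLI projects typically map it to `ng build`).

```yaml
# Skip the install when node_modules was restored from an exact cache hit
- script: npm ci
  displayName: npm ci
  condition: and(succeeded(), ne(variables.MODULES_CACHE_RESTORED, 'true'))

# Hypothetical build step: runs every time, using cached or freshly installed modules
- script: npm run build
  displayName: Build Angular app
```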

The only downside to this approach is that when there is a node_modules cache miss and the project is configured so that npm install also builds it, the pipeline might build the project twice. node_modules cache misses happen whenever the files that define the modules (here package.json and npm-shrinkwrap.json) change, or when the pipeline runs in a new branch for the first time.