Secrets Scanning

What Is Secrets Scanning?

Secrets scanning is the automated practice of detecting sensitive credentials — API keys, database passwords, OAuth tokens, private keys, connection strings, and other authentication materials — that have been accidentally committed to source code repositories. When a developer inadvertently pushes a secret to a Git repository, it becomes part of the permanent commit history and is potentially visible to anyone with repository access, including in public repositories where the entire internet can see it.

The problem is both widespread and dangerous. A 2023 GitGuardian report found that over 10 million new secrets were exposed in public GitHub commits in a single year. Automated bots continuously scan public repositories for exposed credentials, and honeypot experiments have reported deliberately leaked AWS access keys being found and abused within minutes of being pushed. Even in private repositories, exposed secrets create significant risk: a single compromised account or a future decision to open-source the code can turn a private secret into a public breach.

Secrets scanning tools address this risk by scanning repository contents, commit history, pull requests, and CI/CD logs for patterns that match known secret formats. Long-term AWS access key IDs begin with the prefix AKIA (temporary STS credentials use ASIA). Classic GitHub personal access tokens start with ghp_ (fine-grained tokens use github_pat_). Stripe API keys, Google Cloud credentials, and SSH private keys all have distinctive signatures that automated scanning can detect with high accuracy.
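Signature matching of this kind comes down to regular expressions. The two rules below are a minimal illustration that assumes the formats just described (a 20-character AWS key ID made of AKIA plus 16 uppercase letters or digits, and a classic GitHub token made of ghp_ plus 36 alphanumeric characters); production tools such as Gitleaks ship far larger, tuned rule sets, and the function name here is hypothetical.

```python
import re

# Illustrative rules only; real scanners maintain hundreds of tuned patterns.
SECRET_PATTERNS = {
    # Long-term AWS access key IDs: "AKIA" + 16 uppercase letters/digits
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Classic GitHub personal access tokens: "ghp_" + 36 alphanumerics
    "github_pat_classic": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_signature_matches(text):
    """Return (rule_name, line_number, matched_text) for every rule hit."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in SECRET_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((rule_name, line_number, match.group(0)))
    return findings
```

The word-boundary anchors keep a rule from firing on a longer random string that merely contains the prefix somewhere in the middle.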

How It Works

Secrets scanners use a combination of pattern matching (regular expressions tuned to known credential formats) and entropy analysis (detecting high-randomness strings that look like generated secrets) to identify exposed credentials.
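Entropy analysis covers secrets that have no fixed prefix. Shannon entropy, H = Σ p(c) · log2(1/p(c)) over the characters c of a string, measures how unpredictable those characters are in bits per character: randomly generated credentials score high, while words and identifiers score low. The sketch below assumes a 20-character minimum token length and a 4.0-bit threshold; both are illustrative values, not settings taken from any particular tool, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def shannon_entropy(s):
    """Shannon entropy of s in bits per character: sum(p * log2(1/p))."""
    if not s:
        return 0.0
    length = len(s)
    probabilities = (count / length for count in Counter(s).values())
    return sum(p * math.log2(1 / p) for p in probabilities)


# Candidate tokens: runs of 20+ base64/URL-safe characters (assumed cut-off).
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")
# Assumed threshold; real tools tune this per character set.
ENTROPY_THRESHOLD = 4.0


def high_entropy_strings(text, threshold=ENTROPY_THRESHOLD):
    """Return candidate tokens whose per-character entropy exceeds threshold."""
    return [
        token for token in TOKEN_RE.findall(text)
        if shannon_entropy(token) > threshold
    ]
```

With these settings the AWS documentation example secret key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY scores about 4.7 bits per character and is flagged, while the example key ID AKIAIOSFODNN7EXAMPLE scores about 3.7 and slips under the threshold. That gap is why scanners pair entropy checks with prefix rules rather than relying on either one alone.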

Here is an example of accidentally committed secrets in application code:

# Vulnerable: hardcoded credentials in source code
import boto3

AWS_ACCESS_KEY = "AKIAIOSFODNN7EXAMPLE"
AWS_SECRET_KEY = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

s3_client = boto3.client(
    's3',
    aws_access_key_id=AWS_ACCESS_KEY,
    aws_secret_access_key=AWS_SECRET_KEY
)

A secrets scanner flags both lines: the key ID by its AKIA prefix, and the secret access key, which has no fixed prefix, by its high entropy. The fix uses environment variables or a secrets manager:

# Fixed: credentials loaded from environment variables
import os
import boto3

s3_client = boto3.client(
    's3',
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY']
)

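For production environments, a dedicated secrets manager is the better solution, because it adds the centralized access control, audit logging, and rotation that plain environment variables lack: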

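# Production-grade: retrieve secrets from AWS Secrets Manager
import json

import boto3
import psycopg2

def get_db_credentials():
    # The Secrets Manager client authenticates through boto3's default
    # credential chain (e.g. an IAM role attached to the host), so no
    # keys appear in the code
    client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId='prod/database/credentials')
    # The secret is stored as a JSON document of key/value pairs
    return json.loads(response['SecretString'])

creds = get_db_credentials()
connection = psycopg2.connect(
    host=creds['host'],
    user=creds['username'],
    password=creds['password']
)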

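Secrets scanning integrates into multiple stages of the development workflow. Pre-commit hooks catch secrets before they enter the repository. With the pre-commit framework, the following configuration runs Gitleaks on every commit once each developer has activated it in their clone with pre-commit install: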

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks

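CI/CD pipeline scanning catches anything that bypasses pre-commit hooks, such as commits made with git commit --no-verify or from clones where the hooks were never installed: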

# GitHub Actions: scan every push and pull request
name: Secrets Scan
on: [push, pull_request]

jobs:
  gitleaks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for scanning all commits
      - uses: gitleaks/gitleaks-action@v2
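        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          # Required by gitleaks-action v2 when the repository belongs to an organization
          GITLEAKS_LICENSE: ${{ secrets.GITLEAKS_LICENSE }}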

GitHub’s built-in secret scanning feature also monitors pushes to repositories and can block pushes containing detected secrets when push protection is enabled.

Why It Matters

Exposed secrets are one of the most common and immediately exploitable security failures in modern software development. Unlike vulnerabilities that require a complex attack chain, an exposed credential gives an attacker direct, authenticated access to whatever system the credential protects — cloud infrastructure, databases, third-party APIs, payment processors, or CI/CD pipelines.

The consequences of secret exposure scale with the privilege level of the compromised credential. An exposed read-only API key might leak data. An exposed AWS root account key can lead to complete cloud infrastructure compromise, cryptomining charges, data exfiltration, and ransomware deployment. In the 2022 Uber breach, an attacker who had already taken over a contractor's account found hardcoded admin credentials for Uber's privileged access management system in a PowerShell script on an internal network share, which let them escalate to broad internal access.

Once a secret is committed to a Git repository, removing it is far more complex than deleting the file. The secret persists in the commit history indefinitely unless the history is rewritten — a destructive operation that affects every developer working on the repository. Even after history rewriting, the secret may persist in forks, cached CI builds, pull request diffs, and backup systems. The only truly safe response to an exposed secret is to rotate it immediately: revoke the compromised credential and issue a new one.
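For an AWS IAM user access key, the revoke-and-replace step can be scripted. This is a minimal sketch that assumes a boto3 IAM client (for example boto3.client("iam")) is passed in; the function name rotate_leaked_access_key is hypothetical, and other providers such as GitHub or Stripe have their own revocation flows.

```python
def rotate_leaked_access_key(iam, user_name, leaked_key_id):
    """Revoke a leaked IAM user access key and issue a replacement.

    `iam` is assumed to be a boto3 IAM client, e.g. boto3.client("iam").
    Returns the new key's metadata dict, including its SecretAccessKey.
    """
    # 1. Revoke: an inactive key can no longer authenticate requests, so
    #    this closes the attacker's window even if a later step fails.
    iam.update_access_key(
        UserName=user_name, AccessKeyId=leaked_key_id, Status="Inactive"
    )
    # 2. Delete the leaked key. IAM allows at most two access keys per
    #    user, so removing it also frees a slot for the replacement.
    iam.delete_access_key(UserName=user_name, AccessKeyId=leaked_key_id)
    # 3. Issue the replacement; store it in a secrets manager, never in code.
    return iam.create_access_key(UserName=user_name)["AccessKey"]
```

Revoking before the replacement is deployed can briefly break legitimate services that still use the leaked key. For an exposed credential that outage is usually the right trade, since the key stays usable by whoever harvested it for as long as it remains active.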

This is why prevention through scanning is far more valuable than detection after the fact. Catching a secret in a pre-commit hook before it enters the repository eliminates the need for credential rotation, history rewriting, and incident investigation entirely.

Best Practices

Implement pre-commit hooks as the first line of defense. Tools like Gitleaks, detect-secrets, and TruffleHog can scan staged changes before they are committed, preventing secrets from entering the repository at all.
Enable push protection on your Git hosting platform. GitHub and GitLab offer server-side push protection that blocks pushes containing detected secrets, and Bitbucket provides server-side secret scanning as well; availability varies by plan. These checks catch anything that bypasses client-side hooks.
Scan the full commit history, not just the latest code. A secret committed and then deleted in a subsequent commit is still in the repository history. Full-history scanning detects these buried secrets.
Rotate compromised secrets immediately upon detection. Do not simply remove the secret from the code and consider it fixed. Revoke the exposed credential and issue a replacement before removing the code reference.
Use a secrets manager for all production credentials. Tools like HashiCorp Vault, AWS Secrets Manager, and Azure Key Vault centralize credential storage, enforce access controls, enable rotation, and maintain audit logs.
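Full-history scanning means looking at every patch, not just the files in the current checkout. A minimal sketch, assuming a single illustrative AWS key rule and the output of git log -p run with --format="commit %H"; the function names are hypothetical, and a dedicated tool such as Gitleaks does this with a complete rule set.

```python
import re
import subprocess

# Illustrative single rule: long-term AWS access key IDs
AWS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_secrets_in_log(log_text):
    """Return (commit, path) pairs where a patch *adds* a matching line.

    Expects `git log -p` output produced with --format="commit %H".
    """
    findings = []
    commit, path, in_hunk = None, None, False
    for line in log_text.splitlines():
        if line.startswith("commit "):
            commit, path, in_hunk = line.split()[1], None, False
        elif line.startswith("diff --git "):
            # New file section: its headers come before any hunk
            path, in_hunk = None, False
        elif not in_hunk and line.startswith("+++ "):
            # "+++ b/<path>" names the file after the change; /dev/null = deleted
            path = line[6:] if line.startswith("+++ b/") else None
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith("+") and AWS_KEY_RE.search(line):
            findings.append((commit, path))
    return findings


def scan_full_history(repo_dir="."):
    """Scan every patch reachable from any ref, not just the current tree."""
    log_text = subprocess.run(
        ["git", "-C", repo_dir, "log", "--all", "-p", "--no-color",
         "--format=commit %H", "--src-prefix=a/", "--dst-prefix=b/"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_secrets_in_log(log_text)
```

A key that was added in one commit and deleted in a later one is still reported against the commit that added it, which is exactly the buried secret that scanning only the current files would miss.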

Common Mistakes

Relying solely on .gitignore to protect secrets. .gitignore only stops untracked files from being added; it has no effect on files Git already tracks. It therefore does not protect secrets committed before the ignore rule was added, secrets embedded in tracked files, or secrets in configuration files that developers do not think to ignore.
Committing secrets to “private” repositories and assuming they are safe. Private repositories are only as secure as the access controls around them. A leaked personal access token, a misconfigured fork, or a future open-sourcing decision can expose every secret in the history.
Removing the secret from code without rotating the credential. Deleting the file or line containing the secret does not revoke the credential. The secret remains in Git history and may already have been harvested by automated scanners. Always rotate first, then clean up the code.
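Using environment variables in Dockerfiles or CI configuration visible to logs. Environment variables are better than hardcoded secrets, but values set with a Dockerfile ENV instruction or passed as build arguments are baked into the image's metadata and history, where anyone who can pull the image can read them with docker inspect or docker history. CI jobs that print variables write them straight into build logs. Use build-time secret mounts (for example BuildKit's RUN --mount=type=secret) or runtime secrets injection instead.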
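On the runtime side, a common pattern is for the orchestrator to mount each secret as a file, so the value never appears in the image, the Dockerfile, or the build logs. A minimal application-side loader, assuming Docker Compose or Swarm style mounts under /run/secrets; the read_secret name and the upper-cased environment-variable fallback are illustrative conventions, not a standard.

```python
import os
from pathlib import Path


def read_secret(name, secrets_dir="/run/secrets"):
    """Return a secret injected at runtime (hypothetical helper).

    Prefers a mounted file such as /run/secrets/<name>, the default location
    for Docker Compose and Swarm secrets, then falls back to an environment
    variable named NAME (upper-cased) set by the orchestrator at startup.
    """
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        # Drop the trailing newline many tools write after the value
        return secret_file.read_text().rstrip("\n")
    value = os.environ.get(name.upper())
    if value is None:
        # Name the missing secret, but never put secret values in messages
        raise RuntimeError(f"secret {name!r} not found in {secrets_dir} or environment")
    return value
```

Because the error message names only the missing secret, a misconfigured deployment fails loudly without leaking anything into the logs that this section warns about.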