Static Application Security Testing (SAST) tools are the first line of automated defence against vulnerabilities in source code. They analyse code without executing it, looking for patterns that match known vulnerability classes. But here’s the thing: no single tool catches everything, and the differences between tools in detection capability, false positive rates, and language support are significant. I wanted to understand exactly where the gaps are, so I spent time running these tools against intentionally vulnerable code and comparing their output. This post is my honest assessment of what they actually catch, what they miss, and where manual review has to pick up the slack.

How SAST Tools Work

SAST tools generally use one or more of these techniques:

  1. Pattern matching: Regex or AST-based rules that flag known dangerous function calls or code patterns. Fast but shallow.
  2. Taint analysis: Tracks data flow from sources (user input) to sinks (dangerous functions). Catches injection vulnerabilities but requires modelling of the framework’s input sources.
  3. Control flow analysis: Follows execution paths to detect issues like null dereferences, resource leaks, or unreachable code.
  4. Type-state analysis: Tracks the state of objects through their lifecycle (e.g., “file opened” → “file read” → “file closed”) to detect protocol violations.

Most tools combine these techniques, with varying depth. The trade-off is always between precision (fewer false positives) and recall (fewer false negatives). What the research consistently shows is that most teams end up tuning toward fewer false positives because developers stop paying attention to noisy tools, which means real bugs slip through.
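To make the second technique concrete, here is a toy model of taint analysis. Real engines work on ASTs or bytecode; this sketch (the `Tainted`, `source`, `sanitize`, and `execute` names are all invented for illustration) just shows the core idea: data from a source stays marked through string operations until a sanitizer clears it, and a finding is raised if it reaches a sink unmarked.

```python
class Tainted(str):
    """Marks attacker-controlled data; concatenation preserves the mark."""
    def __add__(self, other):
        return Tainted(str.__add__(self, str(other)))
    def __radd__(self, other):
        return Tainted(str(other) + str(self))

def source() -> Tainted:
    # Stands in for request parameters, headers, cookies, etc.
    return Tainted("1 OR 1=1")

def sanitize(value: str) -> str:
    # Returning a plain str models "taint cleared"
    return str(value).replace("'", "''")

def execute(query: str) -> None:
    # The sink refuses tainted input, like a taint engine reporting a finding
    if isinstance(query, Tainted):
        raise ValueError("tainted data reached SQL sink")

query = "SELECT * FROM users WHERE id = " + source()  # taint propagates
print(isinstance(query, Tainted))  # True
```

The hard part in a real tool is exactly what this sketch skips: knowing which framework calls are sources, which library calls clear taint, and following the value across files.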

Python: Bandit vs. Semgrep

What Bandit Catches

Bandit is a Python-specific tool that scans AST nodes for known dangerous patterns. I’d recommend it as a starting point for any Python project because it just works out of the box.

# Bandit flags this: B608 (hardcoded SQL expressions)
query = "SELECT * FROM users WHERE name = '" + username + "'"
cursor.execute(query)

# Bandit flags this: B301 (pickle usage)
import pickle
data = pickle.loads(user_input)

# Bandit flags this: B602 (subprocess with shell=True)
import subprocess
subprocess.call("ls " + user_dir, shell=True)

Bandit reliably catches direct usage of dangerous APIs: pickle.loads, eval, exec, subprocess with shell=True, yaml.load without SafeLoader, and hardcoded passwords. Its strength is zero configuration: it works out of the box on any Python project.
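For contrast, here is what I’d consider the safe counterpart to the B608 example above, sketched with stdlib sqlite3. The driver binds the value separately from the SQL text, so hostile input stays data, and Bandit correctly has nothing to flag:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES (?)", ("alice",))

username = "alice' OR '1'='1"  # classic injection payload
# Parameterized query: the value never becomes part of the SQL string
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (username,)
).fetchall()
print(rows)  # [] -- the payload is treated as a literal name, not SQL
```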

What Bandit Misses

# Bandit does NOT flag this: indirect taint through a helper
def get_filter(params):
    return f"status = '{params['status']}'"

def query_orders(db, params):
    where = get_filter(params)
    db.execute("SELECT * FROM orders WHERE " + where)

# Bandit does NOT flag this: SSTI through Jinja2
from jinja2 import Template
template = Template(request.form.get("template_text"))
rendered = template.render(user=current_user)

Bandit lacks interprocedural taint analysis. When the dangerous concatenation happens in a helper function, Bandit sees the helper as safe (no direct SQL execution) and the caller as safe (no direct concatenation). This gap is well-documented in the Bandit issue tracker, and it’s the kind of thing that gives teams a false sense of security: they run Bandit in CI and assume they’re covered. Semgrep with custom rules can catch some of these patterns, which is what makes the two tools complementary.
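The fix for the helper pattern is to keep user values out of the SQL string entirely: the helper returns a placeholder fragment plus the bound parameters. A minimal sketch (mirroring the `get_filter`/`query_orders` example above, using stdlib sqlite3):

```python
import sqlite3

def get_filter(params):
    # Return a WHERE fragment with a placeholder, plus the values to bind
    return "status = ?", (params["status"],)

def query_orders(db, params):
    where, args = get_filter(params)
    # Only trusted fragments are concatenated; user data travels in args
    return db.execute("SELECT * FROM orders WHERE " + where, args).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
db.execute("INSERT INTO orders VALUES (1, 'open')")
print(query_orders(db, {"status": "open"}))         # [(1, 'open')]
print(query_orders(db, {"status": "' OR '1'='1"}))  # [] -- payload inert
```

The nice side effect is that this version is also easier for a human reviewer to verify: the invariant "user input never touches the query string" is visible in the helper’s signature.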

Semgrep Advantages

Semgrep uses pattern-based matching with support for metavariables and taint tracking:

rules:
  - id: sql-injection-format-string
    patterns:
      - pattern: |
          $QUERY = f"...{$VAR}..."
          ...
          $DB.execute($QUERY, ...)
    message: "Potential SQL injection via f-string"
    severity: ERROR
    languages: [python]

Semgrep’s taint mode can follow data across function boundaries within a file, catching patterns that Bandit misses. However, cross-file taint tracking requires Semgrep Pro. What I find interesting about Semgrep is how well it lends itself to project-specific rules: once you identify a pattern your codebase is prone to, you can encode it once and catch it forever.
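A taint-mode rule looks different from the pattern rule above: you declare sources, sinks, and sanitizers, and Semgrep does the flow tracking. A sketch of what that might look like for the Flask-style code in this post (the `sanitize(...)` pattern is a hypothetical project-specific helper you’d substitute with your own):

```yaml
rules:
  - id: taint-sql-injection
    mode: taint
    pattern-sources:
      - pattern: request.form.get(...)
    pattern-sanitizers:
      - pattern: sanitize(...)   # hypothetical project-specific sanitizer
    pattern-sinks:
      - pattern: $DB.execute(...)
    message: "User input flows into execute() without sanitisation"
    severity: ERROR
    languages: [python]
```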

Java: SpotBugs vs. PMD vs. Semgrep

SpotBugs

SpotBugs analyses compiled bytecode, which gives it access to type information and interprocedural analysis that source-level tools lack. This is a real advantage that becomes obvious when you compare its results to pattern-based tools.

// SpotBugs catches: SQL_INJECTION_JDBC
String query = "SELECT * FROM users WHERE id = " + userId;
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery(query);

// SpotBugs catches: COMMAND_INJECTION
Runtime.getRuntime().exec("cmd /c " + userInput);

// SpotBugs catches: PATH_TRAVERSAL_IN
new File(baseDir + "/" + request.getParameter("file"));

SpotBugs with the FindSecBugs plugin provides strong coverage for OWASP Top 10 vulnerabilities in Java. It understands Spring annotations, servlet APIs, and common Java frameworks. From what I’ve seen in benchmarks and my own testing, it’s the strongest free option for Java security scanning.

What SpotBugs Misses

// SpotBugs does NOT flag: deserialization through custom ObjectInputStream wrapper
public class SafeObjectInputStream extends ObjectInputStream {
    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
        // "allowlist" that allows everything
        return super.resolveClass(desc);
    }
}

// SpotBugs does NOT flag: SSRF through RestTemplate with validated-looking URL
String baseUrl = config.getProperty("api.endpoint");
String path = request.getParameter("resource");
String url = baseUrl + "/api/" + path;
ResponseEntity<String> resp = restTemplate.getForEntity(url, String.class);

SpotBugs struggles with custom wrappers around dangerous APIs and with SSRF patterns where the URL construction looks intentional. PMD catches some style issues but has weaker security rules than SpotBugs. The SSRF gap is particularly interesting: when I started digging into why SAST tools miss it, I realised it’s because the URL construction pattern is indistinguishable from legitimate API client code without understanding the application’s intent.
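This is where Semgrep earns its place in the Java toolchain: you can write a deliberately noisy rule for the construction pattern your codebase uses and triage the hits. A sketch (the rule will need tuning against your own RestTemplate usage, and `WARNING` severity reflects that it flags suspects, not confirmed bugs):

```yaml
rules:
  - id: resttemplate-concatenated-url
    patterns:
      - pattern: $REST.getForEntity($BASE + $PATH, ...)
    message: "RestTemplate URL built by concatenation; confirm $PATH is not user-controlled"
    severity: WARNING
    languages: [java]
```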

Go: gosec vs. staticcheck

gosec

// gosec catches: G201 (SQL string formatting)
query := fmt.Sprintf("SELECT * FROM users WHERE name = '%s'", name)
rows, err := db.Query(query)

// gosec catches: G304 (file path from variable)
filename := r.URL.Query().Get("file")
data, err := os.ReadFile(filename)

// gosec catches: G401 (weak crypto)
h := md5.New()
h.Write([]byte(password))

gosec is effective at catching direct usage of dangerous patterns in Go. It understands the standard library well and flags weak cryptography, SQL injection, and file path issues. It’s a solid baseline for any Go project.

What gosec Misses

// gosec does NOT flag: SSRF through http.Get with constructed URL
func fetchResource(w http.ResponseWriter, r *http.Request) {
    resource := r.URL.Query().Get("url")
    if strings.HasPrefix(resource, "https://") {
        resp, err := http.Get(resource)
        // ...
    }
}

// gosec does NOT flag: race condition on shared map
var sessions = make(map[string]*Session)

func getSession(w http.ResponseWriter, r *http.Request) {
    id := r.Header.Get("X-Session-ID")
    session := sessions[id]  // concurrent map read without lock
    // ...
}

gosec does not perform taint analysis for SSRF and does not detect race conditions on shared data structures. staticcheck catches some concurrency issues but focuses more on correctness than security. The shared map race condition shows up frequently in Go codebases: Go makes concurrency easy to write and easy to get wrong, and the tooling hasn’t fully caught up to that reality.

C/C++: cppcheck vs. clang-tidy

cppcheck

// cppcheck catches: bufferAccessOutOfBounds
char buf[10];
strcpy(buf, user_input);  // no bounds check

// cppcheck catches: nullPointer
char *ptr = NULL;
if (condition)
    ptr = malloc(100);
*ptr = 'x';  // possible null dereference

// cppcheck catches: memleak
char *data = malloc(1024);
if (error_condition)
    return -1;  // leak: data not freed

clang-tidy

clang-tidy provides deeper analysis through Clang’s AST and includes security-focused checks:

// clang-tidy catches: cert-err34-c (unchecked return from atoi)
int port = atoi(argv[1]);

// clang-tidy catches: bugprone-use-after-move
std::unique_ptr<Config> cfg = std::move(other_cfg);
other_cfg->validate();  // use after move

What Both Miss

// Neither catches: integer overflow leading to undersized allocation
size_t count = parse_header_count(input);  // attacker-controlled
size_t size = count * sizeof(struct Entry);  // overflow if count is large
struct Entry *entries = malloc(size);  // allocates too little
for (size_t i = 0; i < count; i++) {
    read_entry(input, &entries[i]);  // heap buffer overflow
}

Integer overflow leading to heap corruption requires understanding the relationship between the multiplication, the allocation, and the loop bound. Neither cppcheck nor clang-tidy models this end-to-end. This is where runtime tools like AddressSanitizer and fuzzers fill the gap. The research on this is pretty clear: fuzzing finds more C/C++ memory safety bugs per hour of compute than static analysis does, which isn’t a knock on the tools so much as a reflection of how hard it is to reason about memory safety statically.
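The arithmetic is worth seeing concretely. The sketch below simulates the C multiplication on a 32-bit size_t by wrapping by hand (Python’s own ints don’t overflow); `ENTRY_SIZE` is a stand-in for `sizeof(struct Entry)`, and `safe_alloc_size` shows the standard guard of rejecting before multiplying:

```python
MASK32 = 0xFFFFFFFF  # 32-bit size_t wraps modulo 2**32
ENTRY_SIZE = 16      # stand-in for sizeof(struct Entry)

def alloc_size(count: int) -> int:
    # Models C's count * sizeof(struct Entry) with wraparound
    return (count * ENTRY_SIZE) & MASK32

def safe_alloc_size(count: int) -> int:
    # The standard guard: check the bound before multiplying
    if count > MASK32 // ENTRY_SIZE:
        raise OverflowError("count too large")
    return count * ENTRY_SIZE

count = 0x10000001  # attacker-controlled header value
print(alloc_size(count))  # 16 -- room for one entry, while the loop writes 268M
```

Seen this way, the bug is a relationship between three lines of code, which is exactly what pattern-level tools can’t express.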

The Coverage Gap

No single SAST tool provides complete coverage. The gaps fall into predictable categories that become obvious once you start mapping tool capabilities against vulnerability classes:

| Gap Category | Example | Tools That Miss It |
| --- | --- | --- |
| Cross-function taint | SQL injection via helper | Bandit, gosec |
| Business logic flaws | Insufficient authorization checks | All SAST tools |
| Race conditions | TOCTOU on shared state | Most SAST tools |
| Integer overflow chains | Overflow → undersized alloc → OOB write | cppcheck, clang-tidy |
| Configuration-dependent | CORS misconfiguration at runtime | All source-level tools |
| Deserialization gadgets | Pickle/ObjectInputStream chains | SpotBugs (partial) |

Detection Strategies

Layered Tool Approach

The most effective strategy combines multiple tools, and the literature on this is consistent:

  1. Language-specific linter (Bandit, gosec, cppcheck): Fast, low false-positive baseline.
  2. Pattern-based scanner (Semgrep): Custom rules for project-specific patterns and framework-specific sinks.
  3. Deep analysis (CodeQL, SpotBugs): Interprocedural taint tracking for complex data flows.
  4. Runtime analysis (AddressSanitizer, ThreadSanitizer, fuzzing): Catches issues that static analysis cannot model.

What I found interesting when researching this is how much detection rates improve when you layer tools rather than relying on a single one. Each tool catches a different slice of the vulnerability space, and the overlap is smaller than you’d expect.

Writing Custom Rules

When a SAST tool misses a pattern specific to your codebase, write a custom rule rather than relying on manual review to catch it every time:

# Semgrep rule for project-specific pattern
rules:
  - id: unsafe-template-render
    pattern: |
      Template($USER_INPUT).render(...)
    message: "Server-side template injection: user input in Template constructor"
    severity: ERROR
    languages: [python]

Custom Semgrep rules might be the highest-leverage security investment a team can make. You write the rule once, and it catches the pattern forever. The more I’ve worked with Semgrep, the more I’ve come to appreciate how well it bridges the gap between what generic tools catch and what your specific codebase needs.

Remediation

SAST tools are most effective when integrated into the development workflow:

  • Pre-commit hooks: Run fast linters (Bandit, gosec) on changed files before every commit.
  • CI pipeline: Run deeper analysis (Semgrep, SpotBugs) on every pull request.
  • Scheduled scans: Run comprehensive analysis (CodeQL) on the full codebase weekly.
  • Triage process: Every finding must be triaged as true positive, false positive, or accepted risk. Suppression comments should include justification.
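The first bullet is the easiest to wire up. One way to do it with the pre-commit framework, using Bandit’s published hook (the `rev` here is illustrative; pin whatever release you’ve actually vetted):

```yaml
# .pre-commit-config.yaml -- run Bandit on staged Python files before each commit
repos:
  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.9          # illustrative; pin a release you've vetted
    hooks:
      - id: bandit
        args: ["-ll"]   # only report medium severity and above
```

The `-ll` filter is a deliberate trade: a pre-commit hook has to be quiet enough that developers don’t disable it, and the full-severity run belongs in CI.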

The goal is not zero findings; it’s a consistent process that catches the easy bugs automatically and frees human reviewers to focus on the subtle ones. The teams that get this right are the ones that treat SAST as a complement to manual review, not a replacement for it. That distinction matters more than which specific tools you choose.