String Formatting and Security: A Cross-Language Minefield
String formatting is one of those operations that’s everywhere, and it’s more dangerous than most developers realise when user input gets involved. Every language provides multiple ways to build strings from dynamic data, and each mechanism carries different security implications. From C’s printf family, where a format string bug can read and write arbitrary memory, to Python’s str.format(), where a user-controlled template can walk attribute chains on the objects passed to it, the attack surface is broader than most people think. I wanted to map out the full landscape across languages, and what I found was that each mechanism breaks down in its own way, sometimes surprisingly.
C: Format String Vulnerabilities (CWE-134)
C’s printf family is the most dangerous string formatting API in any mainstream language. When user input controls the format string, an attacker can read from and write to arbitrary memory locations. The more I dug into the exploitation mechanics, the more impressed I was by how powerful these bugs are: a single printf call can become a full arbitrary read/write primitive.
The Vulnerable Pattern
#include <stdio.h>
void log_message(const char *user_input) {
// VULNERABLE: user input as format string
printf(user_input);
}
void log_to_file(FILE *logfile, const char *user_input) {
// VULNERABLE: same issue with fprintf
fprintf(logfile, user_input);
}
If user_input is "%x %x %x %x", printf has no matching arguments, so it prints whatever sits where those arguments would have been: leftover register contents and then values from the stack. If user_input is "%n", printf writes the number of bytes printed so far to a memory address taken from the same place. With careful crafting, an attacker can achieve arbitrary read/write. The classic attack chain is to leak addresses to defeat ASLR first, then pivot to code execution. According to the exploit literature, this two-step approach is devastatingly effective.
The Fix
void log_message(const char *user_input) {
// SAFE: user input is always a data argument, never the format string
printf("%s", user_input);
}
The rule is absolute: never pass user-controlled data as the format argument of any printf-family function (the first argument of printf, the second of fprintf, the third of snprintf). Compiler warnings (-Wformat-security) catch the obvious cases, but indirect paths through function pointers or variadic wrappers can evade detection. Enabling -Wformat-security -Werror=format-security on every C project is a good baseline practice; note that GCC only honours -Wformat-security when -Wformat (or -Wall) is also enabled.
Advanced Exploitation
// Attacker sends: "%08x.%08x.%08x.%08x.%n"
// This reads 4 stack values, then writes to the address
// found at the 5th stack position
// With direct parameter access (POSIX):
// "%5$n" writes to the address at the 5th parameter position
// "%5$hn" writes a short (2 bytes) for more precise control
Format string attacks in C can bypass ASLR by first leaking stack addresses with %x (or %p on 64-bit targets), then using those addresses with %n to write to specific locations. Modern mitigations (RELRO, stack canaries) make exploitation harder but not impossible. The published research consistently shows that if you can control a format string in C, code execution is usually achievable given enough patience and knowledge of the target binary.
Python: f-strings, format(), and Template
Python offers three string formatting mechanisms with different security profiles, and the differences matter more than most developers realise.
f-strings and str.format()
# f-strings evaluate expressions at runtime
name = request.args.get("name")
greeting = f"Hello, {name}" # Safe for simple interpolation
# But format() with user-controlled format strings is dangerous
template = request.args.get("template")
result = template.format(user=current_user) # VULNERABLE
When template is {user.password} or {user.__class__.__mro__}, Python’s attribute access in format strings exposes internal object state. The format_map method is equally dangerous:
# Attacker sends template: {user.__init__.__globals__}
# This leaks the global namespace of the user object's module
template = request.form.get("msg")
output = template.format_map({"user": current_user})
What I find particularly insidious about this is that .format() looks completely harmless. It’s just string formatting, right? But the attribute access chain turns it into an information disclosure vulnerability. The {user.__init__.__globals__} technique can dump database credentials from a Flask app when they live in module globals, and it’s well-documented in the Python security community. The disconnect between how innocent .format() looks and how dangerous it can be is what makes this a great example of a subtle bug.
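To make the attribute walking concrete, here is a minimal, self-contained sketch. The User class, the SECRET_KEY global, and the templates are hypothetical stand-ins rather than code from a real application; the only point is that str.format() follows whatever attribute or index path the template names.

```python
# Minimal sketch: str.format() follows attribute and index paths named in the
# template. User, SECRET_KEY and the templates are hypothetical stand-ins.

SECRET_KEY = "hypothetical-secret"  # module-level global, e.g. app configuration


class User:
    def __init__(self, name, password):
        self.name = name
        self.password = password


current_user = User("alice", "s3cret")

# Intended use: the template only mentions the display name.
benign = "Hello, {user.name}".format(user=current_user)

# Attacker-supplied templates reach further than the developer intended.
leaked_password = "{user.password}".format(user=current_user)
leaked_global = "{user.__init__.__globals__[SECRET_KEY]}".format(user=current_user)

print(benign)
print(leaked_password)
print(leaked_global)
```

Run as a script, this prints Hello, alice, then s3cret, then hypothetical-secret. The same call that renders the benign greeting also serves both attacker templates, which is why restricting the data passed in is not enough once the template itself is untrusted.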
string.Template: The Safer Alternative
from string import Template
template_str = request.args.get("template")
t = Template(template_str)
result = t.safe_substitute(name="Alice", role="admin")
string.Template only supports $variable substitution: no attribute access, no method calls, no expression evaluation. safe_substitute leaves missing keys in place instead of raising exceptions. This is the safest built-in option for user-controlled templates, and it’s what I’d recommend whenever someone asks how to handle user-provided format strings in Python.
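A quick way to convince yourself of this is to feed string.Template the same kind of payload that works against str.format(). The template text below is a hypothetical attacker input; everything else is the standard library.

```python
from string import Template

# Hypothetical attacker-supplied template reusing the str.format() payloads.
attacker_template = "Hi $name, ${user.password} {user.__init__.__globals__}"

# safe_substitute fills in $name and leaves everything it cannot parse as
# a plain identifier untouched, so no attribute is ever looked up.
result = Template(attacker_template).safe_substitute(name="Alice", user="ignored")
print(result)

# The strict variant refuses the malformed ${user.password} placeholder.
try:
    Template(attacker_template).substitute(name="Alice", user="ignored")
    strict_rejected = False
except ValueError:
    strict_rejected = True
print(strict_rejected)
```

The first print shows the payloads passed through as inert text, and the second shows that substitute() raises ValueError on them. Which behaviour you want depends on whether a malformed template should be rendered as-is or rejected outright.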
Java: String.format() and MessageFormat
String.format()
// Safe: user input as argument
String msg = String.format("Hello, %s", userName);
// VULNERABLE: user input as format string
String template = request.getParameter("template");
String result = String.format(template, userData);
Java’s String.format is less dangerous than C’s printf: its %n conversion only emits a line separator and never writes to memory. But a user-controlled format string can still cause denial of service:
// Attacker sends: "%1$s%1$s%1$s%1$s%1$s%1$s%1$s%1$s%1$s%1$s" (repeated 1000x)
// This creates an enormous string from a small input, consuming memory
String template = request.getParameter("fmt");
String result = String.format(template, sensitiveData);
// Also leaks sensitiveData.toString() through the format output
This DoS pattern can take down a Java service: the attacker sends a format string that expands a small input into gigabytes of output, and the JVM runs out of heap space. A large width specifier, for example %1$1000000000s, gets the same effect from a far shorter payload than repeating %1$s. It’s a good reminder that even when memory corruption isn’t possible, format string control is still dangerous.
MessageFormat Injection
// VULNERABLE: user input in MessageFormat pattern
String pattern = request.getParameter("pattern");
String result = MessageFormat.format(pattern, userName, accountBalance);
MessageFormat patterns address arguments by index and support number, date, and choice formats, so a user-controlled pattern can reference any argument passed to the call, including ones the developer never meant to reveal. An attacker sending {1,number,currency} extracts the account balance even if the developer only intended to expose the username. This is a subtle one: I haven’t seen any SAST tool that catches it, and it’s the kind of thing that only surfaces during careful manual review.
Go: fmt.Sprintf and Template Injection
fmt.Sprintf
// Safe: user input as argument
msg := fmt.Sprintf("Hello, %s", userName)
// VULNERABLE: user input as format string
template := r.URL.Query().Get("fmt")
result := fmt.Sprintf(template, userData)
Go’s fmt.Sprintf does not have memory-write capabilities like C, but a user-controlled format string can leak data through %v (which calls the value’s String() method or dumps struct fields) and %+v (which includes field names):
type User struct {
Name string
Email string
Password string // should never be exposed
APIKey string // should never be exposed
}
// Attacker sends fmt=%+v
// Output: {Name:alice Email:alice@example.com Password:s3cret APIKey:ak_live_xxx}
When I tested this pattern, the %+v verb happily printed every struct field including the ones that should have been secret. If a debug endpoint accepts a format parameter, this becomes a trivial credential disclosure. The Go documentation doesn’t warn about this security implication, which is part of why it catches people off guard.
html/template vs. text/template
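// SAFER for HTML output: html/template contextually escapes the data it renders.
// It does not make an attacker-supplied template source safe to parse.
import "html/template"
tmpl := template.Must(template.New("page").Parse(userTemplate))
// VULNERABLE: text/template does NOT escape, so markup in the data reaches the browser as-is
import "text/template"
tmpl := template.Must(template.New("page").Parse(userTemplate))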
Using text/template where html/template is needed is a common Go mistake. The APIs are nearly identical, and the import path is the only difference. A reviewer scanning for template.Must won’t catch this without checking the import. This is a good candidate for a Semgrep rule, because the pattern is mechanical and easy to encode.
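To show how mechanical the check is, here is a rough Python stand-in for such a rule. It is not a Semgrep rule and not a Go parser, just a line-based regex under the assumption that imports are written one per line; the function name and the sample source are made up for illustration. It flags every import of text/template, including legitimate non-HTML uses, so the hits still need a human to triage them.

```python
import re

# Matches a Go import line for text/template, with or without the `import`
# keyword (single import vs. import block) and with an optional alias.
IMPORT_RE = re.compile(r'\s*(?:import\s+)?(?:[\w.]+\s+)?"text/template"')


def find_text_template_imports(go_source):
    """Return 1-based line numbers on which text/template is imported."""
    hits = []
    for lineno, line in enumerate(go_source.splitlines(), start=1):
        if IMPORT_RE.match(line):
            hits.append(lineno)
    return hits


# Hypothetical Go file used only to exercise the scanner.
sample = '''package main

import (
\t"net/http"
\t"text/template"
)
'''

print(find_text_template_imports(sample))
```

For the sample above the scanner reports line 5. A real rule would also want to confirm that the rendered output reaches an HTTP response before raising the severity.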
JavaScript: Template Literals and eval-Adjacent Patterns
Template Literals
// Safe: template literal with simple interpolation
const greeting = `Hello, ${userName}`;
// VULNERABLE: constructing code with template literals
const filter = req.query.filter;
const code = `return items.filter(item => item.${filter} > 0)`;
const fn = new Function('items', code);
const result = fn(items);
JavaScript template literals are not inherently dangerous for simple interpolation. The danger arises when template literals are used to construct code that is then evaluated. The new Function() constructor is effectively eval with a different syntax. I’ve seen this pattern in Node.js codebases where developers were building dynamic query filters. They thought template literals were “safe” because they weren’t using eval directly, but the effect is identical.
Server-Side Template Injection
// Express with EJS: VULNERABLE if the user controls the template
const ejs = require('ejs');
const template = req.body.template;
const html = ejs.render(template, { user: currentUser });
EJS and Pug execute JavaScript embedded in their template syntax, and even nominally logic-less engines such as Handlebars have had known template-injection paths to code execution. User-controlled templates lead to remote code execution:
// Attacker sends template: <%= process.mainModule.require('child_process').execSync('id') %>
Full RCE from a template parameter. The attack payload is well-documented and trivial to execute, which is what makes SSTI one of the more alarming vulnerability classes when it shows up.
Detection Strategies
Static Analysis
- C format strings: -Wformat-security (GCC/Clang) catches direct cases. cppcheck and clang-tidy flag printf(variable) patterns.
- Python format injection: Bandit B608 catches some SQL-related format strings. Semgrep rules can match .format( with user-controlled receivers.
- Java format strings: SpotBugs with the Find Security Bugs plugin (FORMAT_STRING_MANIPULATION) catches String.format with tainted format arguments.
- Go template confusion: go vet does not distinguish between html/template and text/template. Custom Semgrep rules are needed.
Manual Review Checklist
Here’s what’s worth looking for when reviewing code for format string issues:
- Search for all format string functions: printf, sprintf, String.format, fmt.Sprintf, f"...", template literal backticks.
- For each occurrence, verify the format string is a compile-time constant, not derived from user input.
- Check template engine usage: verify user input never controls the template itself, only the data passed to it.
- Audit new Function(), eval(), and exec() calls for user-controlled string construction.
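The "is the format string a constant?" check can be partly automated. The sketch below uses Python’s ast module to list .format() and .format_map() calls whose receiver is not a string literal. It is a first-pass filter under simplifying assumptions, not a taint analysis: it cannot tell whether the receiver is actually user-controlled, and it will also flag unrelated objects that happen to have a format method. The function name and sample snippet are hypothetical.

```python
import ast

FORMAT_METHODS = {"format", "format_map"}


def find_dynamic_format_calls(source):
    """Return sorted (line, method) pairs for .format()/.format_map() calls
    whose receiver is anything other than a string literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not (isinstance(func, ast.Attribute) and func.attr in FORMAT_METHODS):
            continue
        receiver = func.value
        is_literal = isinstance(receiver, ast.Constant) and isinstance(receiver.value, str)
        if not is_literal:
            findings.append((node.lineno, func.attr))
    return sorted(findings)


# Hypothetical code under review; line 1 of the string is blank.
sample = '''
greeting = "Hello, {}".format(name)
result = template.format(user=current_user)
output = request.form.get("msg").format_map({"user": current_user})
'''

for line, method in find_dynamic_format_calls(sample):
    print(f"line {line}: .{method}() on a non-literal receiver")
```

On the sample, the literal-receiver call on line 2 is skipped and lines 3 and 4 are reported. Each hit is a place to ask where the template string came from.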
Remediation
The universal fix: treat format strings as code, not data. This is the principle everything else follows from.
- C: Always use printf("%s", user_input), never printf(user_input). Enable -Wformat-security -Werror=format-security.
- Python: Use string.Template for user-controlled templates. Never pass user input to .format() or f-strings as the template itself.
- Java: Keep format patterns as constants. Use parameterized logging (log.info("User {} logged in", username)) instead of String.format.
- Go: Use html/template for any output that reaches a browser. Validate format strings against an allowlist if they must be dynamic.
- JavaScript: Never construct code strings from user input. Use sandboxed template engines with auto-escaping enabled by default.
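When a dynamic format string really is unavoidable, the allowlist idea can be sketched in Python with string.Formatter, which exposes the parser that str.format() uses. The ALLOWED_FIELDS set and the render_restricted helper are hypothetical names for illustration. The sketch assumes the policy "bare allowlisted field names only", which rules out attribute access, indexing, conversions, and width-based amplification in one pass.

```python
from string import Formatter

ALLOWED_FIELDS = {"name", "role"}  # hypothetical allowlist of bare field names


def render_restricted(template, values):
    """Render template only if every replacement field is a bare allowlisted
    name with no format spec and no conversion; otherwise raise ValueError."""
    for _literal, field_name, format_spec, conversion in Formatter().parse(template):
        if field_name is None:
            continue  # plain literal text, no replacement field in this chunk
        if field_name not in ALLOWED_FIELDS:
            # Also rejects "name.attr", "name[0]" and positional "{}" fields.
            raise ValueError(f"field not allowed: {field_name!r}")
        if format_spec or conversion:
            raise ValueError("format specs and conversions are not allowed")
    # Only allowlisted values are ever handed to format().
    safe_values = {key: values[key] for key in ALLOWED_FIELDS if key in values}
    return template.format(**safe_values)


data = {"name": "Alice", "role": "admin", "password": "s3cret"}
print(render_restricted("Hello {name} ({role})", data))

for hostile in ("{name.__class__}", "{password}", "{name:>1000000}"):
    try:
        render_restricted(hostile, data)
    except ValueError as exc:
        print("rejected:", exc)
```

The benign template renders as Hello Alice (admin), and each hostile template is rejected before format() ever runs. For most applications string.Template remains the simpler choice; a validator like this is only worth it when the {}-style syntax has to be kept.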