Python Security Pitfalls Every Developer Should Know
I’ve spent a lot of time reviewing Python codebases, and the language’s readability and rapid development cycle are exactly what make it dangerous. Python is the default choice for web services, data pipelines, and automation scripts, and that same ease of use hides security pitfalls that experienced developers walk into regularly. The language’s dynamic nature (runtime evaluation, duck typing, implicit conversions, and powerful serialization) creates attack surfaces that simply don’t exist in statically typed languages. In this post, I want to cover the Python-specific anti-patterns that lead to real vulnerabilities, from the well-known pickle deserialization trap to the subtle template injection that can survive code review.
Pickle Deserialization: Arbitrary Code Execution by Design
Python’s pickle module serializes and deserializes arbitrary Python objects. The deserialization process reconstructs objects by calling their __reduce__ method, which can execute arbitrary code. Here’s the thing: this is not a bug. It’s the documented behaviour. Loading untrusted pickle data is equivalent to running eval() on attacker-controlled input, and the more I dug into how pickle works internally, the more alarming the implications became.
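To see the mechanism concretely, here is a minimal, harmless sketch: a class whose __reduce__ hands pickle a callable to invoke on load. I use print as a stand-in for what an attacker would make os.system.

```python
import pickle

# __reduce__ tells pickle how to reconstruct an object: a callable plus
# its arguments. Whatever that callable is, it runs during pickle.loads().
class Demo:
    def __reduce__(self):
        # print() is a harmless stand-in; an attacker would return
        # (os.system, ("...",)) or similar here.
        return (print, ("this ran during deserialization",))

payload = pickle.dumps(Demo())
result = pickle.loads(payload)  # prints the message as a side effect
```

Note that the payload doesn’t even reference the Demo class: it stores only the callable and its arguments, so it deserializes anywhere.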
The Easy-to-Spot Version
```python
import pickle
from flask import Flask, request

app = Flask(__name__)

@app.route("/load", methods=["POST"])
def load_data():
    data = pickle.loads(request.data)
    return {"status": "loaded", "keys": list(data.keys())}
```
Any SAST tool flags pickle.loads on untrusted input. The fix is obvious: use JSON, MessagePack, or another format that doesn’t execute code during deserialization. It still shows up in production, though, which says something about how easy it is to reach for the convenient option.
The Hard-to-Spot Version
```python
import shelve
from flask import Flask, request

app = Flask(__name__)

@app.route("/cache", methods=["POST"])
def update_cache():
    key = request.form["key"]
    value = request.form["value"]
    with shelve.open("/tmp/app_cache") as db:
        db[key] = value
    return {"status": "cached"}

@app.route("/cache/<key>")
def get_cache(key):
    with shelve.open("/tmp/app_cache") as db:
        return {"value": db.get(key, "not found")}
```
This is the one that really surprised me when I first encountered it. shelve uses pickle internally. If an attacker can write raw bytes into the shelf file (by replacing the file on disk, or through any other write primitive the app exposes), every subsequent read through shelve deserializes attacker-controlled pickle data. The word “pickle” never appears in the source code, so reviewers and SAST tools that grep for pickle.loads miss it entirely. I’ve since found this pattern hiding behind multiprocessing (which pickles objects sent between processes) and some caching and session backends too. It’s one of those things that, once you know to look for it, you start seeing everywhere.
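A quick way to convince yourself (and reviewers) that shelve is a pickle sink: write pickle bytes straight into the underlying dbm mapping and watch shelve unpickle them on read. A self-contained sketch with a benign payload and a temporary path:

```python
import dbm
import os
import pickle
import shelve
import tempfile

# shelve.Shelf stores each value as pickle bytes in a dbm mapping and
# runs pickle.loads() on every read.
path = os.path.join(tempfile.mkdtemp(), "demo_shelf")

with dbm.open(path, "c") as db:
    # Simulating an attacker with write access to the shelf file: any
    # pickle payload placed here executes when the value is read back.
    db["greeting"] = pickle.dumps("hello")  # benign stand-in payload

with shelve.open(path) as shelf:
    value = shelf["greeting"]  # pickle.loads() happens here
```

Swap the stand-in string for a __reduce__-based payload and the read on the last line becomes code execution.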
How Other Languages Handle This
Java’s ObjectInputStream has a similar problem: deserialization triggers class constructors and readObject methods. But Java’s ecosystem has developed defenses: serialization filters (ObjectInputFilter since JDK 9), allowlists, and libraries like Apache Commons IO’s ValidatingObjectInputStream. Python has no built-in deserialization filter for pickle, which is a notable gap.
```java
// Java: Deserialization with a filter (JDK 9+)
ObjectInputFilter filter = ObjectInputFilter.Config.createFilter(
    "java.util.HashMap;java.lang.String;!*"
);
ObjectInputStream ois = new ObjectInputStream(inputStream);
ois.setObjectInputFilter(filter);
Object obj = ois.readObject();
```
Python’s defence is simpler but blunter: don’t use pickle for untrusted data. Use json, msgpack, or Protocol Buffers.
eval(), exec(), and compile(): The Dynamic Execution Trap
Python’s eval() evaluates an expression and returns the result. exec() executes arbitrary statements. Both accept strings, and both are frequently used with user-controlled input in ways that create code injection vulnerabilities. These keep showing up in “quick and dirty” internal tools that somehow make it to production.
The Easy-to-Spot Version
```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/calc")
def calculate():
    expr = request.args.get("expr", "0")
    result = eval(expr)
    return {"result": result}
```
An attacker sends expr=__import__('os').system('id') and gets remote code execution. Every security scanner flags this.
The Hard-to-Spot Version
```python
from flask import Flask, request

app = Flask(__name__)

ALLOWED_OPS = {"+", "-", "*", "/", "(", ")", ".", " "}

@app.route("/calc")
def calculate():
    expr = request.args.get("expr", "0")
    if all(c.isdigit() or c in ALLOWED_OPS for c in expr):
        result = eval(expr)
        return {"result": result}
    return {"error": "invalid expression"}, 400
```
The character allowlist looks safe: only digits and arithmetic operators get through. But what I found particularly interesting when I started experimenting with this is how little protection it actually buys. Even with this exact filter, an attacker can submit 9**9**9**9 (since ** is just two * characters) and pin the CPU evaluating an astronomically large power: denial of service with no letters at all. And the moment the allowlist is relaxed to admit letters and underscores (say, to support function names), the classic traversal (1).__class__.__bases__[0].__subclasses__() enumerates all loaded classes and finds one that provides code execution. Character-level filtering is the wrong abstraction because Python’s object model is navigable through plain attribute access; the safe approach is to parse the input and evaluate only the constructs you intend to support. It’s a great example of how Python’s dynamism works against you in security contexts.
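The traversal primitive itself is easy to demonstrate against an unrestricted eval. From a bare literal, with no imports and no names from the caller’s scope, you can reach object and enumerate every class loaded in the interpreter:

```python
# Recon step of the classic eval escape: pure attribute access on a
# literal, no imports, no variables from the enclosing scope.
subclasses = eval("().__class__.__bases__[0].__subclasses__()")

# The list spans the whole interpreter, not just your own code; an
# attacker scans it for a class whose methods reach os or subprocess.
class_names = sorted(cls.__name__ for cls in subclasses)
```

On a bare interpreter this already yields hundreds of classes, which is why no string filter short of a real parser holds up.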
Comparison: Go’s Approach
Go has no eval() equivalent. There is no way to execute arbitrary Go code at runtime from a string. This eliminates an entire class of vulnerabilities by language design, and honestly, it’s a trade-off worth appreciating.
```go
// Go: No eval(), must parse and evaluate explicitly
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

func safeEval(expr string) (float64, error) {
	tree, err := parser.ParseExpr(expr)
	if err != nil {
		return 0, err
	}
	return evalAST(tree)
}

func evalAST(node ast.Expr) (float64, error) {
	switch n := node.(type) {
	case *ast.BasicLit:
		// parse numeric literal
		var val float64
		fmt.Sscanf(n.Value, "%f", &val)
		return val, nil
	case *ast.BinaryExpr:
		left, err := evalAST(n.X)
		if err != nil {
			return 0, err
		}
		right, err := evalAST(n.Y)
		if err != nil {
			return 0, err
		}
		switch n.Op {
		case token.ADD:
			return left + right, nil
		case token.SUB:
			return left - right, nil
		}
	}
	return 0, fmt.Errorf("unsupported expression")
}
```
The Go approach forces explicit parsing and evaluation of only the operations you intend to support. There’s no shortcut that accidentally enables code execution.
Server-Side Template Injection (SSTI)
Python’s template engines (Jinja2, Mako, Django templates) are powerful enough to execute arbitrary code if user input is rendered as a template rather than as data within a template. SSTI is one of those vulnerability classes that I find fascinating because the attack surface is so non-obvious: you’d never guess that a template engine could be an RCE vector until you see it in action.
The Easy-to-Spot Version
```python
from flask import Flask, request
from jinja2 import Template

app = Flask(__name__)

@app.route("/greet")
def greet():
    name = request.args.get("name", "World")
    template = Template(f"Hello, {name}!")
    return template.render()
```
The user input name is interpolated into the template string before Jinja2 compiles it. An attacker sends name={{config}} and Jinja2 evaluates {{config}}, leaking the Flask configuration (including SECRET_KEY). Sending name={{''.__class__.__mro__[1].__subclasses__()}} enumerates all loaded classes. From information disclosure to full RCE, the escalation path is well-documented in the research literature.
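The standard probe for this bug is arithmetic: if {{7*7}} comes back as 49, the input reached the template compiler rather than being treated as data. A minimal reproduction outside Flask:

```python
from jinja2 import Template

# The user string is interpolated into the template source *before*
# compilation, so Jinja evaluates the expression it contains.
user_input = "{{ 7 * 7 }}"
rendered = Template(f"Hello, {user_input}!").render()
# rendered == "Hello, 49!" -- the input was executed, not displayed
```

The same two lines with user_input passed as a render variable instead (Template("Hello, {{ name }}!").render(name=user_input)) echo the braces back literally, which is the whole fix.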
The Hard-to-Spot Version
```python
from flask import Flask, request, render_template_string

app = Flask(__name__)

ERROR_TEMPLATES = {
    "not_found": "The resource '{{ name }}' was not found.",
    "forbidden": "Access to '{{ name }}' is denied.",
}

@app.route("/error")
def show_error():
    error_type = request.args.get("type", "not_found")
    name = request.args.get("name", "unknown")
    template_str = ERROR_TEMPLATES.get(error_type)
    if template_str is None:
        template_str = f"Unknown error for '{name}'."
    return render_template_string(template_str, name=name)
```
When error_type matches a known key, name is safely passed as a template variable. But when error_type is unknown, the fallback uses an f-string that interpolates name directly into the template string. The attacker sends type=unknown&name={{config}} and gets SSTI. What makes this one particularly tricky is that the vulnerability only exists in the error path, which reviewers often skim. It’s a good reminder to audit error handling with the same rigor as the happy path; that’s where some of the most interesting bugs hide.
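A sketch of the fix for this endpoint: make the fallback a constant template and let name travel only as a context variable, so no code path ever builds template source out of user input.

```python
from flask import Flask, request, render_template_string

app = Flask(__name__)

ERROR_TEMPLATES = {
    "not_found": "The resource '{{ name }}' was not found.",
    "forbidden": "Access to '{{ name }}' is denied.",
}
# The fallback is now a fixed template string, not an f-string.
FALLBACK_TEMPLATE = "Unknown error for '{{ name }}'."

@app.route("/error")
def show_error():
    error_type = request.args.get("type", "not_found")
    name = request.args.get("name", "unknown")
    template_str = ERROR_TEMPLATES.get(error_type, FALLBACK_TEMPLATE)
    # name is always template *data* here, never template *source*
    return render_template_string(template_str, name=name)
```

With this version, type=unknown&name={{config}} renders the literal text {{config}} instead of the Flask configuration.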
Comparison: Rust’s Approach
Rust’s Tera template engine compiles templates at startup from trusted files. There’s no render_string_from_user_input pattern in idiomatic Rust web frameworks, which is exactly how it should be.
```rust
// Rust (Actix-web + Tera): Templates are loaded from files, not user input
use actix_web::{web, App, HttpServer, HttpResponse};
use tera::Tera;

async fn greet(
    tmpl: web::Data<Tera>,
    query: web::Query<std::collections::HashMap<String, String>>,
) -> HttpResponse {
    let mut ctx = tera::Context::new();
    ctx.insert("name", query.get("name").unwrap_or(&"World".to_string()));
    let rendered = tmpl.render("greet.html", &ctx).unwrap();
    HttpResponse::Ok().body(rendered)
}
```
The template is a file on disk (greet.html), not a string constructed from user input. User data is always passed as context variables, never as template source.
os.path Traversal and open() Pitfalls
Python’s open() accepts any path string. Combined with user input and insufficient validation, this creates path traversal vulnerabilities. These show up depressingly often in file-serving endpoints.
The Vulnerable Pattern
```python
import os
from flask import Flask, request, send_file

app = Flask(__name__)

UPLOAD_DIR = "/var/app/uploads"

@app.route("/files/<filename>")
def get_file(filename):
    path = os.path.join(UPLOAD_DIR, filename)
    if not os.path.exists(path):
        return {"error": "not found"}, 404
    return send_file(path)
```
An attacker sends filename=../../../etc/passwd. os.path.join("/var/app/uploads", "../../../etc/passwd") produces /var/app/uploads/../../../etc/passwd, which resolves to /etc/passwd when the file is opened. The os.path.exists check passes, and the file is served. Reading through public bug bounty reports, this exact pattern accounts for a surprising number of findings.
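The mechanics fit in a few lines: os.path.join concatenates blindly and never collapses the .. segments; the traversal only materializes when the path is resolved (simulated here with os.path.normpath):

```python
import os

# join() concatenates; it does not normalize ".." segments
joined = os.path.join("/var/app/uploads", "../../../etc/passwd")

# resolution walks right out of the base directory
escaped = os.path.normpath(joined)  # "/etc/passwd"

# bonus footgun: an absolute second argument discards the base entirely
absolute = os.path.join("/var/app/uploads", "/etc/passwd")
```

The absolute-path behaviour is the reason a traversal check on its own is not enough; the resolved-prefix check shown below catches both cases.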
The Fix That Is Still Broken
```python
@app.route("/files/<filename>")
def get_file(filename):
    if ".." in filename:
        return {"error": "invalid path"}, 400
    path = os.path.join(UPLOAD_DIR, filename)
    return send_file(path)
```
Blocking .. is insufficient. On some systems, URL encoding (%2e%2e%2f) or double encoding bypasses the check. The correct approach uses os.path.realpath and verifies the resolved path starts with the intended directory:
```python
@app.route("/files/<filename>")
def get_file(filename):
    path = os.path.realpath(os.path.join(UPLOAD_DIR, filename))
    if not path.startswith(os.path.realpath(UPLOAD_DIR) + os.sep):
        return {"error": "invalid path"}, 400
    return send_file(path)
```
Comparison: Java’s Path Handling
Java’s Path.normalize() and Path.startsWith() provide a cleaner API for the same check:
import java.nio.file.Path;
import java.nio.file.Paths;
public boolean isPathSafe(String userInput, String baseDir) {
Path base = Paths.get(baseDir).toAbsolutePath().normalize();
Path resolved = base.resolve(userInput).normalize();
return resolved.startsWith(base);
}
YAML Deserialization with PyYAML
PyYAML’s yaml.load() without an explicit Loader argument has defaulted to the restricted FullLoader since version 5.1 (and PyYAML 6.0 makes the Loader argument mandatory), but older code frequently passes Loader=yaml.Loader, which allows arbitrary Python object construction: the same risk as pickle. Vulnerable yaml.load calls in configuration parsers can sit in production for years without anyone noticing.
The Vulnerable Pattern
```python
import yaml

def load_config(path):
    with open(path) as f:
        return yaml.load(f, Loader=yaml.Loader)
```
A malicious YAML file can construct arbitrary Python objects:
```yaml
!!python/object/apply:os.system
args: ['id']
```
The Safe Pattern
```python
import yaml

def load_config(path):
    with open(path) as f:
        return yaml.safe_load(f)
```
yaml.safe_load() only constructs basic Python types (dicts, lists, strings, numbers). It rejects !!python/object tags entirely. Grepping for yaml.load without SafeLoader is one of those quick checks that’s always worth doing.
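The rejection is easy to verify: safe_load raises a ConstructorError the moment it meets a python/* tag, before any object is built. A small sketch using the payload from above:

```python
import yaml

MALICIOUS = "!!python/object/apply:os.system\nargs: ['id']"

# SafeLoader has no constructor registered for python/* tags, so
# safe_load fails during construction instead of running anything.
try:
    yaml.safe_load(MALICIOUS)
    rejected = False
except yaml.constructor.ConstructorError:
    rejected = True
```

Ordinary scalar/mapping documents still load fine, so switching to safe_load is usually a drop-in change for config files.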
Comparison: Go’s YAML Handling
Go’s gopkg.in/yaml.v3 only deserializes into declared struct types. There’s no mechanism to construct arbitrary objects from YAML tags, which is a fundamentally safer design:
```go
import (
	"os"

	"gopkg.in/yaml.v3"
)

type Config struct {
	Host string `yaml:"host"`
	Port int    `yaml:"port"`
}

func loadConfig(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg Config
	err = yaml.Unmarshal(data, &cfg)
	return &cfg, err
}
```
The struct definition acts as an implicit allowlist. Fields not declared in the struct are silently ignored.
Detection Strategies
| Tool | What It Catches | Limitations |
|---|---|---|
| Bandit | pickle.loads, eval, exec, yaml.load, subprocess with shell=True | Does not follow data flow; misses indirect pickle usage (shelve, multiprocessing) |
| Semgrep | Pattern-based detection of SSTI, path traversal, deserialization | Requires rules for each pattern; custom rules needed for project-specific sinks |
| Pylint (security plugins) | Some dangerous function calls | Limited security-specific coverage |
| Safety / pip-audit | Known vulnerabilities in installed packages | Does not analyse source code |
| mypy (strict mode) | Type errors that may indicate unsafe casts | Not security-focused, but catches type confusion |
Manual Review Checklist
Here’s a checklist I’ve put together from reviewing Python codebases:
- Search for pickle, shelve, and multiprocessing: shelve and multiprocessing use pickle internally, and the indirect usage through shelve is the one that catches people most often. Check marshal too; it is a separate serializer that is also unsafe on untrusted input.
- Search for eval, exec, and compile: check whether any argument contains user input. Character allowlists don’t make these safe.
- Search for Template( and render_template_string: verify user input is never part of the template source. Pay extra attention to error paths.
- Search for yaml.load: verify Loader=yaml.SafeLoader or use yaml.safe_load. This is a quick win.
- Search for os.path.join with user input: verify the resolved path is constrained to the intended directory using realpath.
- Search for subprocess calls: verify shell=False and that arguments are passed as lists, not strings.
- Search for __import__ and importlib: dynamic imports with user-controlled module names enable code execution.
Remediation Patterns
Replace Pickle with JSON
```python
import json

# Instead of pickle.loads(data)
data = json.loads(request.data)

# Instead of pickle.dumps(obj)
serialized = json.dumps(obj)
```
For complex objects that JSON cannot represent, use msgpack, protobuf, or dataclasses-json with explicit schema definitions. Writing a bit more serialization code is a small price compared to dealing with an RCE.
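For structured objects, a plain dataclass plus json already gives you an explicit schema with none of pickle’s code execution. A sketch with a hypothetical Job type (the names here are illustrative, not from any real codebase):

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class Job:
    name: str
    retries: int

def dumps_job(job: Job) -> str:
    # asdict() flattens the dataclass into JSON-safe builtin types
    return json.dumps(asdict(job))

def loads_job(raw: str) -> Job:
    fields = json.loads(raw)
    # Job(**fields) enforces the schema: unexpected or missing keys
    # raise TypeError instead of constructing arbitrary objects
    return Job(**fields)

restored = loads_job(dumps_job(Job(name="resize", retries=3)))
```

The constructor acts as the allowlist here, which is the same design the Go YAML example below relies on.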
Use AST-Based Expression Evaluation
```python
import ast
import operator

SAFE_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def safe_eval(expr: str) -> float:
    tree = ast.parse(expr, mode="eval")
    return _eval_node(tree.body)

def _eval_node(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in SAFE_OPS:
        left = _eval_node(node.left)
        right = _eval_node(node.right)
        return SAFE_OPS[type(node.op)](left, right)
    raise ValueError(f"Unsupported expression: {ast.dump(node)}")
```
Parse the expression into an AST and evaluate only the nodes you explicitly support. This is safe because the AST walker never calls eval(); it interprets the tree directly. This pattern works well in production for calculator-style features.
Secure Template Rendering
```python
from flask import Flask, request, render_template

app = Flask(__name__)

@app.route("/greet")
def greet():
    name = request.args.get("name", "World")
    # Pass user input as a variable, never as template source
    return render_template("greet.html", name=name)
```
Always use render_template (file-based) instead of render_template_string. If you must use render_template_string, never interpolate user input into the template string; pass it as a keyword argument instead. This is the golden rule for Flask template security.
Key Takeaways
- Pickle is eval for objects. Never deserialize untrusted pickle data, and audit for indirect pickle usage through shelve and multiprocessing. The indirect usage is what catches most teams off guard.
- Character-level allowlists do not make eval safe. Even a digits-and-operators filter permits denial of service, and any relaxation reopens attribute-traversal attacks. Replace eval with AST-based evaluation.
- SSTI hides in error paths and fallback logic. Audit every code path that constructs template strings, not just the happy path. That’s where the interesting bugs tend to live.
- os.path.join does not prevent traversal. Always resolve with os.path.realpath and verify the prefix. The .. blocklist approach is not enough.
- yaml.safe_load is the safe YAML entry point. Legacy yaml.load calls with Loader=yaml.Loader allow arbitrary object construction. This is a quick grep-and-fix.
- Python’s dynamic nature is the root cause. The features that make Python productive (eval, pickle, dynamic imports, template rendering) are the same features that create security vulnerabilities. The defence is to avoid them on untrusted input entirely, no matter how convenient they seem.