Python Security Code Review Guide
1. Introduction
I put this guide together as a structured approach to security-focused code review for Python applications. Whether you’re just starting to identify security vulnerabilities in Python code or you’re an experienced developer looking for a language-specific checklist, I’ve tried to make it useful at both levels.
Python’s dynamic typing, rich standard library, and extensive third-party ecosystem make it enormously productive, but the more I reviewed Python codebases, the more I realised these same qualities introduce security pitfalls that static analysis alone cannot always catch. What follows covers manual review strategies, common anti-patterns, recommended tooling, and vulnerability patterns organised by class, with cross-references to the intentionally vulnerable examples in this project.
Audience: Security trainees, application developers, code reviewers, and anyone evaluating Python codebases for security weaknesses.
2. Manual Review Best Practices
2.1 Trace Data from Entry to Sink
Python web frameworks (Flask, Django, FastAPI) accept user input through request.args, request.form, request.get_json(), query parameters, headers, and cookies. The approach I’ve found most effective is to trace every external input to where it’s consumed: database queries, shell commands, file operations, HTTP requests, or rendered HTML. Any path from source to sink without validation or sanitisation is a potential vulnerability.
2.2 Inspect String Formatting in Sensitive Contexts
Python offers multiple string formatting mechanisms: f-strings, str.format(), % formatting, and concatenation. When any of these are used to build SQL queries, shell commands, HTML output, or URLs with user-controlled data, treat it as a high-priority finding. This is one of the first things I look for in any Python review.
2.3 Review Import Statements
Scan imports for modules known to introduce risk:
- pickle, shelve, marshal: deserialisation of untrusted data
- os.system, os.popen, subprocess with shell=True: command injection
- eval, exec, compile: arbitrary code execution
- yaml.load with yaml.Loader or yaml.FullLoader: YAML deserialisation attacks
- xml.etree.ElementTree, lxml.etree with entity resolution: XXE attacks
2.4 Check for Hardcoded Secrets
Search for string literals that look like passwords, API keys, tokens, or connection strings. Common patterns include variables named SECRET_KEY, API_KEY, PASSWORD, TOKEN, or connection strings containing ://user:pass@host.
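A first pass over a codebase can be automated with a couple of regular expressions. The sketch below is illustrative only: the patterns, the name list, and the function scan_for_secrets are my assumptions, and a line-based regex will produce both false positives and false negatives.

```python
import re

# Variable or dict key whose name suggests a secret, assigned a string literal
# of at least four characters.
SECRET_ASSIGNMENT = re.compile(
    r"""
    \w*(?:secret|api_?key|password|passwd|token)\w*   # suspicious name
    ["']?\]?                                          # optional closing of config["NAME"]
    \s*[:=]\s*
    (["'])[^"']{4,}\1                                 # non-trivial string literal
    """,
    re.IGNORECASE | re.VERBOSE,
)
# scheme://user:pass@host style connection strings
CREDENTIAL_URL = re.compile(r"://[^/\s:@]+:[^/\s:@]+@")


def scan_for_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line number, reason) pairs for lines that look like hardcoded secrets."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if SECRET_ASSIGNMENT.search(line):
            findings.append((lineno, "hardcoded secret-like assignment"))
        elif CREDENTIAL_URL.search(line):
            findings.append((lineno, "credentials embedded in URL"))
    return findings
```

Values read from the environment, such as password = os.environ["DB_PASSWORD"], are not flagged because no string literal is assigned.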
2.5 Examine Error Handling
Look for broad except Exception blocks (or bare except: clauses) whose handlers expose stack traces, internal paths, or database details to the caller. In Flask, check for custom error handlers that return traceback.format_exc() or raw exception messages.
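What I’d expect to see instead is a handler that keeps the detail on the server and gives the caller only a generic message plus a correlation ID. This is a framework-agnostic sketch; build_error_response and the logger name are hypothetical, and in Flask the returned dictionary would typically be passed through jsonify inside an error handler.

```python
import logging
import uuid

logger = logging.getLogger("app.errors")


def build_error_response(exc: Exception) -> tuple[dict, int]:
    """Log full details server-side; return a generic body and a correlation ID."""
    error_id = uuid.uuid4().hex
    # exc_info attaches the traceback to the server-side log record only.
    logger.error("Unhandled error %s", error_id, exc_info=exc)
    return {"error": "Internal server error", "error_id": error_id}, 500
```

The error_id lets support staff find the matching log entry without the response revealing paths or exception text.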
2.6 Evaluate Cryptographic Choices
Flag uses of hashlib.md5() or hashlib.sha1() for password hashing or integrity verification. Check for DES, ECB mode, or hardcoded encryption keys. Verify that security-sensitive tokens are generated with secrets or os.urandom, not with the random module.
2.7 Assess Access Control Logic
For every endpoint, verify:
- Authentication is checked before any business logic executes
- Authorisation checks use server-side session data, not client-supplied headers (e.g., X-User-Role)
- Object-level access control prevents users from accessing resources belonging to others (IDOR)
2.8 Review Configuration Defaults
Check Flask app.config for DEBUG = True, permissive CORS (Access-Control-Allow-Origin: *), missing security headers, and verbose error responses in production code.
3. Common Security Pitfalls
3.1 Injection
| Anti-Pattern | Risk |
|---|---|
| "SELECT * FROM t WHERE x = '" + user_input + "'" | SQL injection via string concatenation |
| "SELECT ... ORDER BY {} {}".format(col, order) | SQL injection via unvalidated ORDER BY clause |
| os.popen("ping -c 3 " + host) | Command injection via os.popen |
| subprocess.Popen(cmd, shell=True) | Command injection when cmd includes user data |
| os.system("tar czf /tmp/{}.tar.gz {}".format(name, path)) | Command injection via os.system |
| Rendering user input directly into HTML with .format() | Stored and reflected XSS |
| JSONP callback without sanitisation | XSS via callback parameter |
3.2 Broken Access Control
| Anti-Pattern | Risk |
|---|---|
| Endpoint returns full user record (including SSN, salary) without field filtering | Excessive data exposure |
| Authorisation check reads role from X-User-Role header instead of session | Client-side role spoofing |
| No ownership check on document access/update | IDOR: any authenticated user can read/modify any resource |
| Debug/config endpoints with no authentication | Information disclosure of secrets and credentials |
3.3 Cryptographic Failures
| Anti-Pattern | Risk |
|---|---|
| hashlib.md5(password.encode()).hexdigest() for password storage | Weak, unsalted password hashing |
| DES.new(key, DES.MODE_ECB) | Deprecated cipher in insecure mode |
| random.seed(int(time.time())) for token generation | Predictable PRNG seeding |
| Hardcoded ENCRYPTION_KEY = b"s3cr3t!!" | Key exposure in source code |
3.4 Insecure Design
| Anti-Pattern | Risk |
|---|---|
| Plaintext passwords stored in dictionaries | No hashing at all |
| Error responses revealing usernames, email existence, or attempt counts | User enumeration |
| Password reset tokens returned in API response body | Token leakage |
| traceback.format_exc() returned to client | Stack trace information disclosure |
3.5 Security Misconfiguration
| Anti-Pattern | Risk |
|---|---|
| app.run(debug=True) in production code | Debug mode exposes interactive debugger |
| app.config["SECRET_KEY"] = "changeme" | Weak/default secret key |
| Access-Control-Allow-Origin reflecting any Origin header | Overly permissive CORS |
| etree.XMLParser(resolve_entities=True, no_network=False) | XXE via XML entity resolution |
| Server header exposing framework and Python version | Technology fingerprinting |
3.6 Vulnerable Components
| Anti-Pattern | Risk |
|---|---|
| yaml.load(payload, Loader=yaml.FullLoader) | YAML deserialisation (FullLoader has allowed code execution in older PyYAML releases; yaml.safe_load is the safer default) |
| Template(user_input) with Jinja2 | Server-side template injection (SSTI) |
| Outdated package versions in requirements.txt | Known CVEs in dependencies |
3.7 Integrity Failures (Deserialisation)
| Anti-Pattern | Risk |
|---|---|
| pickle.loads(untrusted_data) | Arbitrary code execution via pickle |
| yaml.load(data, Loader=yaml.Loader) | Arbitrary object instantiation via YAML |
| exec(compile(code, ...)) with remote code | Remote code execution |
| importlib.import_module(user_input) | Arbitrary module loading |
| urllib.request.urlretrieve(url, path) without validation | Downloading and executing untrusted code |
3.8 Logging and Monitoring Failures
| Anti-Pattern | Risk |
|---|---|
| Login response includes password and api_key fields | Credential leakage in responses |
| No logging of failed authentication attempts | Brute-force attacks go undetected |
| No logging of privilege changes (role updates, deactivations) | Unauthorized changes are invisible |
| Bulk operations with no audit trail | Mass data exfiltration undetected |
| Data export endpoint returns raw passwords | Sensitive data in export payloads |
3.9 SSRF
| Anti-Pattern | Risk |
|---|---|
| urllib.request.urlopen(user_url) without allowlist | Unrestricted SSRF |
| Blocklist checking only localhost and 127.0.0.1 (missing 0.0.0.0, [::1], decimal IPs) | SSRF blocklist bypass |
| Proxy endpoint accepting arbitrary base_url parameter | Open proxy to internal services |
| Webhook callback URLs not validated against internal ranges | SSRF via webhook registration |
3.10 Race Conditions
| Anti-Pattern | Risk |
|---|---|
| Check-then-act on self.balance without locking | Double-spend / negative balance |
| check_availability() then reserve() without atomicity | Overselling tickets |
| Read-modify-write on shared file without file locking | Lost counter updates |
| Singleton __new__ with time.sleep and no lock | Multiple singleton instances |
4. Recommended SAST Tools & Linters
4.1 Bandit
Bandit is a Python-specific static analysis tool that finds common security issues by examining the AST of Python source files.
Installation:
pip install bandit
Basic usage, scan a single file:
bandit -r injection/sql-injection/python/app.py
Scan an entire directory recursively:
bandit -r security-bug-examples/ -x tests
Generate a JSON report:
bandit -r security-bug-examples/ -f json -o bandit_report.json
Filter by severity (medium and above):
bandit -r security-bug-examples/ -ll
What Bandit catches: Use of eval/exec, pickle.loads, subprocess with shell=True, os.system, os.popen, hardcoded passwords, weak cryptographic functions (md5, sha1, DES), yaml.load without SafeLoader, binding to 0.0.0.0, and debug=True in Flask.
4.2 Semgrep
Semgrep is a multi-language static analysis tool with pattern-based rules. It supports custom rules and has a large community rule registry.
Installation:
pip install semgrep
# or
brew install semgrep
Scan with the default Python security ruleset:
semgrep --config "p/python" security-bug-examples/
Scan with OWASP Top 10 rules:
semgrep --config "p/owasp-top-ten" security-bug-examples/
Scan a single file:
semgrep --config "p/python" injection/sql-injection/python/app.py
Run with auto configuration (recommended for first-time scans):
semgrep --config auto security-bug-examples/
What Semgrep catches: SQL injection patterns, command injection, XSS in Flask templates, insecure deserialisation (pickle, yaml), SSRF patterns, hardcoded secrets, weak cryptography, and many framework-specific issues. Semgrep’s pattern matching is particularly effective at detecting string-formatting-based injection that Bandit may miss.
5. Language-Specific Vulnerability Patterns
5.1 SQL Injection (CWE-89)
Pattern: String concatenation in queries
# Vulnerable, user input concatenated directly
query = "SELECT * FROM products WHERE name LIKE '%" + keyword + "%'"
cursor = db.execute(query)
Pattern: str.format() in query construction
# Vulnerable, format string builds WHERE clause from user input
clauses.append("status = '{}'".format(filters["status"]))
Pattern: Unvalidated ORDER BY injection
# Vulnerable, 'order' parameter (ASC/DESC) not validated
query = "SELECT * FROM products ORDER BY {} {}".format(sort_by, order)
Safe alternative:
cursor = db.execute("SELECT * FROM products WHERE name LIKE ?", (f"%{keyword}%",))
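Placeholders only work for values, so the ORDER BY pattern needs a different fix: column names and sort directions have to be checked against a fixed allowlist before they are interpolated. A minimal sketch with sqlite3 follows; the column names and the list_products helper are assumptions for illustration.

```python
import sqlite3

ALLOWED_SORT_COLUMNS = {"name", "price"}  # hypothetical column names
ALLOWED_DIRECTIONS = {"ASC", "DESC"}


def list_products(db: sqlite3.Connection, sort_by: str = "name", order: str = "ASC"):
    """Return products sorted by an allowlisted column and direction."""
    direction = order.upper()
    if sort_by not in ALLOWED_SORT_COLUMNS or direction not in ALLOWED_DIRECTIONS:
        raise ValueError("unsupported sort parameters")
    # Interpolation is acceptable here only because both values were
    # matched against fixed allowlists above.
    query = f"SELECT name, price FROM products ORDER BY {sort_by} {direction}"
    return db.execute(query).fetchall()
```

Anything outside the allowlists is rejected before a query string is built.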
5.2 Command Injection (CWE-78)
Pattern: os.popen() with string concatenation
result = os.popen("ping -c 3 " + host).read()
Pattern: subprocess.Popen with shell=True
cmd = "dig {} {} +short".format(record_type, domain)
proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
Pattern: os.system() with formatted strings
cmd = "tar czf /tmp/{}.tar.gz {}".format(sanitized_name, filepath)
os.system(cmd)
Safe alternative:
# Use list form without shell=True
subprocess.run(["ping", "-c", "3", host], capture_output=True, text=True)
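The list form removes shell interpretation, but a value beginning with a dash can still be read by the target program as an option. I’d pair the list form with input validation, roughly as sketched below. The hostname pattern and the helper names are my assumptions, not part of the example project.

```python
import ipaddress
import re
import subprocess

# One DNS label: starts and ends with an alphanumeric, hyphens allowed inside.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
HOSTNAME_RE = re.compile(rf"{LABEL}(?:\.{LABEL})*")


def is_valid_host(host: str) -> bool:
    """Accept IP address literals and plain DNS hostnames only."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        pass
    return len(host) <= 253 and HOSTNAME_RE.fullmatch(host) is not None


def ping(host: str) -> str:
    if not is_valid_host(host):
        raise ValueError("invalid host")
    # List form, no shell; host cannot start with "-" after validation.
    result = subprocess.run(
        ["ping", "-c", "3", host], capture_output=True, text=True, timeout=10
    )
    return result.stdout
```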
5.3 Cross-Site Scripting, XSS (CWE-79)
Pattern: Stored XSS via .format() in HTML
# Vulnerable, post content rendered without escaping
page += "<h2>{}</h2>".format(post["title"])
page += "<div>{}</div>".format(post["content"])
Pattern: Reflected XSS in search results
# Vulnerable, query reflected back into page
page += "<p>Results for: <em>{}</em></p>".format(query)
Pattern: XSS via href attribute (javascript: protocol)
page += '<a href="{}">{}</a>'.format(user["website"], user["website"])
Pattern: JSONP callback injection
response = make_response("{}({})".format(callback, data))
Safe alternative:
from markupsafe import escape
page += "<h2>{}</h2>".format(escape(post["title"]))
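Escaping alone does not fix the href and JSONP patterns: a javascript: URL contains no characters that need escaping, and a callback name is emitted as script. The sketch below shows one way to handle both with the standard library. The function names and the 64-character limit are assumptions, and relative URLs are deliberately rendered as plain text.

```python
import html
import re
from urllib.parse import urlsplit

SAFE_SCHEMES = {"http", "https"}
# Dotted JavaScript identifier such as "app.handleData"
CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(?:\.[A-Za-z_$][A-Za-z0-9_$]*)*")


def render_link(url: str) -> str:
    """Render a link only for http(s) URLs; anything else becomes inert text."""
    url = url.strip()
    scheme = urlsplit(url).scheme.lower()
    safe = html.escape(url, quote=True)
    if scheme not in SAFE_SCHEMES:
        return safe
    return f'<a href="{safe}">{safe}</a>'


def is_safe_jsonp_callback(name: str) -> bool:
    """Allow only short, dotted identifiers as JSONP callback names."""
    return len(name) <= 64 and CALLBACK_RE.fullmatch(name) is not None
```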
5.4 Broken Access Control (CWE-200, CWE-284, CWE-639)
Pattern: No authentication on sensitive endpoint
@app.route("/api/users/<int:user_id>", methods=["GET"])
def get_user_profile(user_id):
    user = USERS.get(user_id)
    return jsonify(user)  # Returns SSN, salary, no auth check
Pattern: Client-controlled authorisation header
role = request.headers.get("X-User-Role", "employee")
if role != "admin":
    return jsonify({"error": "Admin access required"}), 403
Pattern: Missing object-level authorisation
# Any authenticated user can access any document, no ownership check
doc = DOCUMENTS.get(doc_id)
return jsonify(doc)
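A safer shape for these three patterns is to take identity and role from the server-side session, check ownership on every object access, and return only the fields the caller needs. The sketch below is framework-agnostic; the session dictionary, the sample DOCUMENTS data, and the exception classes are all hypothetical stand-ins.

```python
DOCUMENTS = {  # hypothetical sample data
    1: {"owner_id": 10, "title": "Q3 plan", "body": "..."},
    2: {"owner_id": 20, "title": "Payroll", "body": "..."},
}


class Forbidden(Exception):
    """Caller is not authenticated."""


class NotFound(Exception):
    """Document is missing or not visible to the caller."""


def get_document(session: dict, doc_id: int) -> dict:
    # Identity and role come from server-side session state, never from headers.
    user_id = session.get("user_id")
    if user_id is None:
        raise Forbidden("authentication required")
    doc = DOCUMENTS.get(doc_id)
    is_admin = session.get("role") == "admin"
    # Same error for "missing" and "not yours" so the endpoint does not
    # reveal which document IDs exist.
    if doc is None or (doc["owner_id"] != user_id and not is_admin):
        raise NotFound("document not found")
    # Return an explicit subset of fields rather than the whole record.
    return {"title": doc["title"], "body": doc["body"]}
```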
5.5 Cryptographic Failures (CWE-327, CWE-328, CWE-330)
Pattern: MD5 for password hashing
password_hash = hashlib.md5("admin123".encode()).hexdigest()
Pattern: DES in ECB mode with hardcoded key
ENCRYPTION_KEY = b"s3cr3t!!"
cipher = DES.new(ENCRYPTION_KEY, DES.MODE_ECB)
Pattern: Predictable PRNG for session tokens
random.seed(int(time.time()) + user_id)
token = "".join(random.choice(chars) for _ in range(32))
Safe alternative:
import bcrypt
hashed = bcrypt.hashpw(password.encode(), bcrypt.gensalt())
import secrets
token = secrets.token_hex(32)
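If adding bcrypt as a dependency is not an option, the standard library can cover password hashing as well. The sketch below uses hashlib.pbkdf2_hmac with a per-user random salt and a constant-time comparison. The iteration count and function names are my assumptions, so tune the count to your hardware and to current guidance.

```python
import hashlib
import hmac
import secrets

PBKDF2_ITERATIONS = 600_000  # assumed work factor; adjust as needed


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage alongside the user record."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ITERATIONS)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)
```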
5.6 Insecure Design (CWE-209, CWE-522)
Pattern: Plaintext password storage
USERS = {
    1: {"username": "admin", "password": "Adm1n_Pr0d!", ...},
}
Pattern: Verbose error responses
return jsonify({
    "error": "Report generation failed",
    "details": str(e),
    "trace": traceback.format_exc(),
    "database": DATABASE_PATH,
}), 500
Pattern: User enumeration via distinct error messages
return jsonify({"error": f"No account found for username '{username}'"}), 404
# vs.
return jsonify({"error": "Incorrect password", "attempts": user["failed_attempts"]}), 401
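The enumeration pattern goes away when unknown usernames and wrong passwords produce the same response. Here is a small sketch of that idea; authenticate, DUMMY_HASH, and the verify callable are hypothetical, with verify standing in for whatever password-hash check the application uses.

```python
from typing import Callable

GENERIC_FAILURE = ({"error": "Invalid username or password"}, 401)
# Hypothetical placeholder checked when the username is unknown, so both
# failure paths do a comparable amount of work.
DUMMY_HASH = "dummy-hash-for-unknown-users"


def authenticate(
    users: dict,
    username: str,
    password: str,
    verify: Callable[[str, str], bool],
):
    user = users.get(username)
    stored_hash = user["password_hash"] if user else DUMMY_HASH
    password_ok = verify(password, stored_hash)
    if user is None or not password_ok:
        # Same body and status for both cases; no attempt counter is exposed.
        return GENERIC_FAILURE
    return {"status": "ok", "user": username}, 200
```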
5.7 Security Misconfiguration (CWE-16, CWE-611)
Pattern: XXE via lxml with entity resolution
parser = etree.XMLParser(resolve_entities=True, no_network=False)
tree = etree.fromstring(raw_xml, parser=parser)
Pattern: Debug mode and verbose error handler
app.run(host="0.0.0.0", port=5008, debug=True)
@app.errorhandler(Exception)
def handle_exception(e):
    return jsonify({"trace": traceback.format_exc()}), 500
Pattern: Overly permissive CORS
origin = request.headers.get("Origin", "*")
response.headers["Access-Control-Allow-Origin"] = origin
response.headers["Access-Control-Allow-Credentials"] = "true"
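For the CORS pattern, the fix I’d look for is an explicit origin allowlist rather than reflection. The sketch below computes the headers as a plain dictionary so it is not tied to a framework. The origins and the cors_headers helper are placeholders.

```python
from typing import Optional

ALLOWED_ORIGINS = {  # hypothetical trusted origins
    "https://app.example.com",
    "https://admin.example.com",
}


def cors_headers(origin: Optional[str]) -> dict:
    """Return CORS headers only for exact matches against the allowlist."""
    if origin not in ALLOWED_ORIGINS:
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Credentials": "true",
        # Tell caches that the response varies with the request's Origin.
        "Vary": "Origin",
    }
```

Requests from any other origin, or with no Origin header, receive no CORS headers at all.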
5.8 Vulnerable and Outdated Components (CWE-1104)
Pattern: Unsafe YAML loading
parsed = yaml.load(payload, Loader=yaml.FullLoader)
Pattern: Jinja2 SSTI via user-controlled template string
template_str = request.args.get("template", "{{ name }} - ${{ price }}")
tmpl = Template(template_str)
label = tmpl.render(name=product["name"], price=product["price"])
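For the YAML pattern, yaml.safe_load is the usual replacement. For the template pattern, when users only need simple placeholders, one option is to keep user-supplied strings away from the template engine entirely. The sketch below swaps Jinja2 for the standard library’s string.Template, which is a different and far more limited mechanism: it substitutes $name placeholders and evaluates no expressions. The render_label helper is hypothetical.

```python
from string import Template  # note: string.Template, not jinja2.Template


def render_label(template_str: str, product: dict) -> str:
    """Substitute $name and $price only; unknown or malformed placeholders are left as-is."""
    return Template(template_str).safe_substitute(
        name=product["name"], price=product["price"]
    )
```

With this approach a label such as "$name - $$$price" renders the product name and a dollar-prefixed price, while Jinja2 expression syntax is passed through as literal text. If Jinja2 features are genuinely required, its sandboxed environment is worth evaluating instead.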
5.9 Integrity Failures, Deserialisation (CWE-502, CWE-829)
Pattern: Pickle deserialisation of untrusted data
raw_bytes = base64.b64decode(data["payload"])
workflow_data = pickle.loads(raw_bytes)
Pattern: Unsafe YAML deserialisation
workflow_data = yaml.load(raw_bytes.decode("utf-8"), Loader=yaml.Loader)
Pattern: Remote code execution via exec
response = urllib.request.urlopen(ext_url)
code = response.read().decode("utf-8")
exec(compile(code, f"<extension:{ext_name}>", "exec"))
Pattern: Dynamic module import from user input
mod = importlib.import_module(module_name) # module_name from request
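Two mitigations cover most of these cases: exchange data in a format that cannot carry code, and map user input onto a fixed set of permitted modules. The following is a sketch under those assumptions; ALLOWED_PLUGINS and both helper names are hypothetical.

```python
import importlib
import json

ALLOWED_PLUGINS = {"json", "csv"}  # hypothetical allowlist of importable modules


def load_plugin(module_name: str):
    """Import a module only if its exact name is on the allowlist."""
    if module_name not in ALLOWED_PLUGINS:
        raise ValueError(f"plugin not permitted: {module_name!r}")
    return importlib.import_module(module_name)


def load_workflow(raw_bytes: bytes) -> dict:
    """Parse workflow data as JSON instead of unpickling it."""
    data = json.loads(raw_bytes.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("workflow must be a JSON object")
    return data
```

JSON parsing yields only dictionaries, lists, strings, numbers, booleans, and None, so a malicious payload cannot instantiate arbitrary objects.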
5.10 Logging and Monitoring Failures (CWE-778)
Pattern: Credentials in API responses
return jsonify({
    "token": token,
    "password": user["password"],
    "api_key": user["api_key"],
})
Pattern: No audit logging for privilege changes
target["role"] = new_role # Role changed with no log entry
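A minimal version of the missing audit trail can be built with the logging module. In this sketch the logger name, the message format, and the change_role helper are my own choices. Values are logged with %r so that control characters in user-supplied names cannot forge extra log lines.

```python
import logging

audit_log = logging.getLogger("audit")
audit_log.setLevel(logging.INFO)


def change_role(actor: str, target: dict, new_role: str) -> None:
    """Update a user's role and record who changed what."""
    old_role = target["role"]
    target["role"] = new_role
    audit_log.info(
        "role_change actor=%r target=%r old=%r new=%r",
        actor, target["username"], old_role, new_role,
    )
```

In a real deployment the audit logger would be routed to append-only storage that application users cannot modify.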
5.11 SSRF (CWE-918)
Pattern: Unrestricted URL fetch
url = data.get("url", "")
resp = urllib.request.urlopen(url, timeout=10)
Pattern: Incomplete blocklist
blocked_hosts = ["localhost", "127.0.0.1"]
# Missing: 0.0.0.0, [::1], 169.254.x.x, decimal IP representations
Pattern: Open proxy via user-supplied base URL
base_url = request.args.get("base_url", "")
full_url = base_url + path
resp = urllib.request.urlopen(full_url, timeout=10)
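Rather than extending a blocklist, I’d resolve the hostname and reject anything that is not a globally routable address. The sketch below does that with ipaddress and socket; is_public_http_url is a hypothetical helper. It does not address DNS rebinding or redirects, since the later fetch resolves the name again, so a production fix would also pin the resolved address and disable or re-validate redirects.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def _resolve(hostname: str) -> list:
    """Return the IP addresses for a hostname or an IP literal."""
    try:
        return [ipaddress.ip_address(hostname)]
    except ValueError:
        pass
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Drop any IPv6 scope suffix such as "%eth0" before parsing.
    return [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]


def is_public_http_url(url: str) -> bool:
    """True only for http(s) URLs whose host resolves solely to global addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addresses = _resolve(parts.hostname)
    except (socket.gaierror, ValueError):
        return False
    # Loopback, private, link-local, and unspecified addresses are all non-global.
    return bool(addresses) and all(addr.is_global for addr in addresses)
```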
5.12 Race Conditions (CWE-362)
Pattern: Check-then-act without locking
def withdraw(self, amount):
    if self.balance >= amount:
        time.sleep(0.001)
        self.balance -= amount  # Another thread may have changed balance
Pattern: Read-modify-write on shared file
with open(filepath, "r") as f:
    count = int(f.read().strip())
count += 1
with open(filepath, "w") as f:
    f.write(str(count))  # Lost update if concurrent
Safe alternative:
import threading
lock = threading.Lock()
def withdraw(self, amount):
    with lock:
        if self.balance >= amount:
            self.balance -= amount
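A threading.Lock only protects threads within one process. For the shared-file counter, an advisory file lock is one way to make the read-modify-write step exclusive across processes too. This sketch uses fcntl.flock, which is available on POSIX systems only; increment_counter is a hypothetical helper, and every writer has to take the same lock for it to be effective.

```python
import fcntl
import os


def increment_counter(filepath: str) -> int:
    """Atomically increment the integer stored in filepath and return the new value."""
    # Open read/write without truncating; create the file if it is missing.
    fd = os.open(filepath, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other holder remains
        try:
            raw = f.read().strip()
            count = (int(raw) if raw else 0) + 1
            f.seek(0)
            f.truncate()
            f.write(str(count))
            f.flush()  # make the write visible before releasing the lock
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return count
```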
6. Cross-References to Examples
The table below maps each vulnerability class to the Python source file and companion documentation in this project. All paths are relative to the docs/ directory.