title: AI Safety Datasets Overview
emoji: 🛡️
colorFrom: red
colorTo: pink
sdk: static
pinned: false
license: cc-by-nc-4.0
short_description: AI safety datasets with adversarial conversations
tags:
- safety
- adversarial
- red-teaming
- ai-safety
- multi-turn
- synthetic
datasets:
- GoJulyAI/multi-turn-conversations
- GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1
- GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2
🛡️ AI Safety Datasets Collection
Comprehensive evaluation datasets for testing AI model safety mechanisms
Dataset Collection Summary
| Metric | Value |
|---|---|
| Total Conversations | 849+ |
| Total Turns | 6,694+ |
| Dataset Types | 3 complementary methodologies |
| Sample Data Available | 150 free conversations |
Full Dataset Statistics
| Dataset | Conversations | Turns | Avg Turns/Conv | Focus |
|---|---|---|---|---|
| Psychology multi-turn | 184+ | 1,964+ | 10.3 | Psychological harms such as self-harm, psychosis, and anthropomorphism |
| Illicit (bioweapon) multi-turn | 84+ | 822+ | 9.8 | Biosafety harms such as bioweapons and pathogens |
| Illicit (chemical, general) multi-turn | 581+ | 3,908+ | 6.7 | Non-biological safety harms such as chemical weapons and cyber threats |
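As a quick consistency check, the per-dataset counts in the table can be summed to reproduce the collection totals. This is only a sketch: the numbers are copied from the table, and because each carries a "+" they are treated as lower bounds.

```python
# Rows copied from the statistics table: (dataset, conversations, turns).
# The "+" suffixes in the table mean these figures are lower bounds.
rows = [
    ("Psychology multi-turn", 184, 1964),
    ("Illicit (bioweapon) multi-turn", 84, 822),
    ("Illicit (chemical, general) multi-turn", 581, 3908),
]

total_conversations = sum(conversations for _, conversations, _ in rows)
total_turns = sum(turns for _, _, turns in rows)

print(f"{total_conversations:,}+ conversations, {total_turns:,}+ turns")
# prints "849+ conversations, 6,694+ turns"
```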
Access Datasets on Hugging Face
Psychology Multi-turn Conversations
Psychological harms such as self-harm, psychosis, and anthropomorphism.
Sample: 5 conversations
View Dataset
Illicit (bioweapon) Multi-turn Conversations
Biosafety harms such as bioweapons and pathogens.
Sample: 5 conversations
View Dataset
Illicit (chemical, general) Multi-turn Conversations
Non-biological safety harms such as chemical weapons and cyber threats.
Sample: 5 conversations
View Dataset
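A minimal sketch of loading a sample programmatically follows. The repository IDs are the ones listed in this page's metadata; which ID corresponds to which category above is not stated here, so check each repository. The sketch assumes the Hugging Face `datasets` library is installed and that the repositories are reachable from your account (they may require accepting terms or authenticating first). `load_sample` is a hypothetical helper name.

```python
# Repository IDs as listed in this page's metadata.
REPO_IDS = [
    "GoJulyAI/multi-turn-conversations",
    "GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1",
    "GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2",
]


def load_sample(repo_id):
    """Download one sample dataset from the Hugging Face Hub.

    Hypothetical helper: split names and column layout are not documented
    on this page, so inspect the returned object before relying on them.
    """
    # Imported lazily so the module can be read without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id)
```

Calling `load_sample(REPO_IDS[0])` and printing the result should show the available splits and columns, which is a reasonable first step before writing any evaluation code.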
⚠️ Ethical Considerations
⚠️ IMPORTANT: These datasets contain successful adversarial attacks and harmful content.
✅ Intended Use
- Defensive security research
- AI safety evaluation and improvement
- Academic research on adversarial robustness
- Training safety and moderation systems
❌ Prohibited Use
- Creating offensive content
- Developing attack tools for malicious purposes
- Bypassing safety systems for harm
- Any use that violates laws or ethical guidelines
🎯 Data Selection Process
All datasets are derived from high-quality, validated conversations with strict quality filters including NeurIPS evaluation protocols.
Base Criteria
- Text-based objectives (no code execution templates)
- Verdict: `success` (harmful requests successfully fulfilled)
- Multi-turn conversations with prompt-response pairs
Psychology-Specific Criteria
- Organic conversations (`organicity = true`)
- Successfully elicited harmful psychology-related content
Illicit-Specific Criteria
- Contains specific instruction details
- Practically executable (not abstract)
- Successfully elicited harmful illicit-related content
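The criteria above can be read as a filter over conversation records. The sketch below is illustrative only and is not the pipeline used to build these datasets: the `Conversation` class and every field name except `organicity` and the `success` verdict are hypothetical, and "multi-turn" is assumed to mean more than one prompt-response pair.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    # Hypothetical record layout; the real dataset schema may differ.
    verdict: str                             # e.g. "success"
    turns: list                              # (prompt, response) pairs
    category: str                            # "psychology" or "illicit"
    is_text_objective: bool = True           # False for code execution templates
    organicity: bool = False                 # psychology-specific flag
    has_specific_instructions: bool = False  # illicit-specific flag
    is_practically_executable: bool = False  # illicit-specific flag


def passes_base(conv):
    """Base criteria: text objective, success verdict, multi-turn."""
    return (
        conv.is_text_objective
        and conv.verdict == "success"
        and len(conv.turns) > 1  # assumption: multi-turn means at least 2 pairs
    )


def passes_selection(conv):
    """Apply the base criteria, then the category-specific ones."""
    if not passes_base(conv):
        return False
    if conv.category == "psychology":
        return conv.organicity
    if conv.category == "illicit":
        return conv.has_specific_instructions and conv.is_practically_executable
    return False  # unknown categories are excluded in this sketch


# Placeholder turns only; no real conversation content is shown here.
example = Conversation(
    verdict="success",
    turns=[("p1", "r1"), ("p2", "r2")],
    category="psychology",
    organicity=True,
)
print(passes_selection(example))  # prints True
```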
License
Sample datasets are released under CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0 International).
- ✅ Use for research and evaluation
- ✅ Modify and build upon the data
- ✅ Share with attribution
- ❌ Commercial use without separate licensing
💼 Full Dataset Access
The sample datasets provide representative examples. Full datasets contain thousands of additional conversations with expanded harm categories and regular updates.
Please contact us at [email protected] to purchase any or all of the full datasets.
Include your research objectives, institutional affiliation, and intended use in your inquiry.
Last Updated: December 2, 2025
For detailed documentation, visit the individual dataset repositories on Hugging Face.