Yang Chung
title: AI Safety Datasets Overview
emoji: 🛡️
colorFrom: red
colorTo: pink
sdk: static
pinned: false
license: cc-by-nc-4.0
short_description: AI safety datasets with adversarial conversations
tags:
  - safety
  - adversarial
  - red-teaming
  - ai-safety
  - multi-turn
  - synthetic
datasets:
  - GoJulyAI/multi-turn-conversations
  - GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1
  - GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2

🛡️ AI Safety Datasets Collection

Comprehensive evaluation datasets for testing AI model safety mechanisms

📊 Dataset Collection Summary

| Metric | Value |
| --- | --- |
| Total Conversations | 849+ |
| Total Turns | 6,694+ |
| Dataset Types | 3 complementary methodologies |
| Sample Data Available | 150 free conversations |

📈 Full Dataset Statistics

| Dataset | Conversations | Turns | Avg Turns/Conv | Focus |
| --- | --- | --- | --- | --- |
| Psychology multi-turn | 184+ | 1,964+ | 10.3 | Psychological harms such as self-harm, psychosis, and anthropomorphism |
| Illicit (bioweapon) multi-turn | 84+ | 822+ | 9.8 | Biosafety harms such as bioweapons and pathogens |
| Illicit (chemical, general) multi-turn | 581+ | 3,908+ | 6.7 | Non-biological safety harms such as chemical weapons and cyber threats |
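
The collection totals in the summary can be reproduced by summing the per-dataset rows. The following is a minimal sketch of that arithmetic; the dictionary keys are illustrative labels chosen for this example, not repository or column names, and the counts are the lower bounds (the "+" values) from the table.

```python
# Per-dataset lower bounds copied from the Full Dataset Statistics table.
# The keys are illustrative labels (an assumption), not repository names.
FULL_DATASET_COUNTS = {
    "psychology": {"conversations": 184, "turns": 1964},
    "illicit_bioweapon": {"conversations": 84, "turns": 822},
    "illicit_chemical_general": {"conversations": 581, "turns": 3908},
}


def collection_totals(counts):
    """Sum conversations and turns across all datasets."""
    total_conversations = sum(row["conversations"] for row in counts.values())
    total_turns = sum(row["turns"] for row in counts.values())
    return total_conversations, total_turns


if __name__ == "__main__":
    conversations, turns = collection_totals(FULL_DATASET_COUNTS)
    print(f"Total conversations: {conversations}+")
    print(f"Total turns: {turns}+")
```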

🔗 Access Datasets on Hugging Face

Psychology Multi-turn Conversations

Psychological harms such as self-harm, psychosis, and anthropomorphism.
Sample: 5 conversations

🔗 View Dataset

Illicit (bioweapon) Multi-turn Conversations

Biosafety harms such as bioweapons and pathogens.
Sample: 5 conversations

🔗 View Dataset

Illicit (chemical, general) Multi-turn Conversations

Non-biological safety harms such as chemical weapons and cyber threats.
Sample: 5 conversations

🔗 View Dataset
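
As a rough sketch of programmatic access, the snippet below builds the Hub page URL for each repository ID listed in this README's metadata and wraps `datasets.load_dataset` for downloading a sample. The helper names `dataset_url` and `load_sample` are made up for this example. Two things are assumptions: that the sample repositories can be loaded without extra configuration, and how they are structured, since this README does not document split or column names, nor which repository ID corresponds to which dataset above.

```python
# Repository IDs exactly as listed under `datasets:` in the README metadata.
DATASET_REPOS = [
    "GoJulyAI/multi-turn-conversations",
    "GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1",
    "GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2",
]


def dataset_url(repo_id: str) -> str:
    """Return the Hugging Face Hub page for a dataset repository."""
    return f"https://huggingface.co/datasets/{repo_id}"


def load_sample(repo_id: str):
    """Download a sample dataset from the Hub (hypothetical helper).

    Requires `pip install datasets` and network access; the repository
    may also require authentication. Split and column names are not
    documented here, so inspect the returned object before relying on them.
    """
    # Imported lazily so dataset_url() works without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id)


if __name__ == "__main__":
    for repo in DATASET_REPOS:
        print(dataset_url(repo))
```

Calling `load_sample(DATASET_REPOS[0])` and printing the result shows the available splits and features.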

⚠️ Ethical Considerations

⚠️ IMPORTANT: These datasets contain successful adversarial attacks and harmful content.

βœ… Intended Use

  • Defensive security research
  • AI safety evaluation and improvement
  • Academic research on adversarial robustness
  • Training safety and moderation systems

❌ Prohibited Use

  • Creating offensive content
  • Developing attack tools for malicious purposes
  • Bypassing safety systems for harm
  • Any use that violates laws or ethical guidelines

🎯 Data Selection Process

All datasets are derived from high-quality, validated conversations with strict quality filters including NeurIPS evaluation protocols.

Base Criteria

  • Text-based objectives (no code execution templates)
  • Verdict: success (the harmful request was fulfilled)
  • Multi-turn conversations with prompt-response pairs

Psychology-Specific Criteria

  • Organic conversations (organicity = true)
  • Successfully elicited harmful psychology-related content

Illicit-Specific Criteria

  • Contains specific instruction details
  • Practically executable (not abstract)
  • Successfully elicited harmful illicit-related content
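
The selection criteria above can be read as a set of record filters. The sketch below is one possible encoding, not the actual pipeline. Apart from `verdict` and `organicity`, which the criteria name, every field name (`objective_type`, `turns`, `has_specific_instructions`, `practically_executable`) is a hypothetical placeholder. "Multi-turn" is assumed to mean at least two prompt-response pairs, and "successfully elicited" is assumed to be captured by the `success` verdict.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def meets_base_criteria(rec: Record) -> bool:
    """Base criteria: text objective, success verdict, multi-turn."""
    return (
        rec.get("objective_type") == "text"   # hypothetical field name
        and rec.get("verdict") == "success"
        and len(rec.get("turns", [])) >= 2    # assumed meaning of "multi-turn"
    )


def meets_psychology_criteria(rec: Record) -> bool:
    """Psychology subset: base criteria plus organic conversations."""
    return meets_base_criteria(rec) and rec.get("organicity") is True


def meets_illicit_criteria(rec: Record) -> bool:
    """Illicit subset: base criteria plus specific, executable detail."""
    return (
        meets_base_criteria(rec)
        and rec.get("has_specific_instructions") is True  # hypothetical
        and rec.get("practically_executable") is True     # hypothetical
    )


if __name__ == "__main__":
    # Toy record with placeholder text only.
    example = {
        "objective_type": "text",
        "verdict": "success",
        "organicity": True,
        "turns": [
            {"prompt": "turn 1 prompt", "response": "turn 1 response"},
            {"prompt": "turn 2 prompt", "response": "turn 2 response"},
        ],
    }
    print(meets_psychology_criteria(example))
    print(meets_illicit_criteria(example))
```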

📄 License

Sample datasets are released under CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0 International).

  • ✅ Use for research and evaluation
  • ✅ Modify and build upon the data
  • ✅ Share with attribution
  • ❌ Commercial use without separate licensing

💼 Full Dataset Access

The sample datasets provide representative examples. Full datasets contain thousands of additional conversations with expanded harm categories and regular updates.

Please contact us at [email protected] to purchase any or all of the full datasets.

Include your research objectives, institutional affiliation, and intended use in your inquiry.


Last Updated: December 2, 2025

For detailed documentation, visit the individual dataset repositories on Hugging Face.