Yang Chung

Update with correct dataset links

6c2bd88 8 days ago

metadata

title: AI Safety Datasets Overview
emoji: 🛡️
colorFrom: red
colorTo: pink
sdk: static
pinned: false
license: cc-by-nc-4.0
short_description: AI safety datasets with adversarial conversations
tags:
  - safety
  - adversarial
  - red-teaming
  - ai-safety
  - multi-turn
  - synthetic
datasets:
  - GoJulyAI/multi-turn-conversations
  - GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1
  - GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2

🛡️ AI Safety Datasets Collection

Comprehensive evaluation datasets for testing AI model safety mechanisms

📊 Dataset Collection Summary

Metric	Value
Total Conversations	849+
Total Turns	6,694+
Dataset Types	3 complementary methodologies
Sample Data Available	150 free conversations

📈 Full Dataset Statistics

Dataset	Conversations	Turns	Avg Turns/Conv	Focus
Psychology multi-turn	184+	1,964+	10.3	Psychological harms, including self-harm, psychosis, and anthropomorphism
Illicit (bioweapon) multi-turn	84+	822+	9.8	Biosafety harms, including bioweapons and pathogens
Illicit (chemical, general) multi-turn	581+	3,908+	6.7	Non-biological safety harms, including chemical weapons and cyber threats
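
The summary totals can be cross-checked against the per-dataset rows. The short sketch below sums the listed lower bounds and prints a turns-per-conversation ratio for each dataset. Because every count carries a "+" suffix, the ratios are computed from minimums only and need not match the Avg Turns/Conv column exactly.

```python
# Per-dataset lower bounds copied from the statistics table:
# name -> (conversations, turns)
rows = {
    "Psychology multi-turn": (184, 1964),
    "Illicit (bioweapon) multi-turn": (84, 822),
    "Illicit (chemical, general) multi-turn": (581, 3908),
}

total_conversations = sum(conversations for conversations, _ in rows.values())
total_turns = sum(turns for _, turns in rows.values())
print(total_conversations, total_turns)  # 849 6694

# Ratio of the listed lower bounds; the true averages may differ
# because each count is a minimum ("+").
for name, (conversations, turns) in rows.items():
    print(f"{name}: {turns / conversations:.1f} turns per conversation")
```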

🔗 Access Datasets on Hugging Face

Psychology Multi-turn Conversations

Psychological harms, including self-harm, psychosis, and anthropomorphism.
Sample: 5 conversations

🔗 View Dataset

Illicit (bioweapon) Multi-turn Conversations

Biosafety harms, including bioweapons and pathogens.
Sample: 5 conversations

🔗 View Dataset

Illicit (chemical, general) Multi-turn Conversations

Non-biological safety harms, including chemical weapons and cyber threats.
Sample: 5 conversations

🔗 View Dataset
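
As a minimal sketch, the sample repositories listed in this page's metadata could be loaded with the Hugging Face `datasets` library. The repository IDs come from the metadata block. The helper name `load_sample`, the `"train"` split, and the assumption that the repositories can be read without extra access steps are illustrative guesses, not details confirmed by this README.

```python
# Repository IDs as listed under "datasets:" in this page's metadata.
SAMPLE_REPOS = [
    "GoJulyAI/multi-turn-conversations",
    "GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1",
    "GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2",
]


def load_sample(repo_id, split="train"):
    """Load one sample dataset (hypothetical helper).

    Assumes `pip install datasets` and that a split named "train" exists;
    check the dataset card for the actual split names and access terms.
    """
    # Imported lazily so the module can be imported without the dependency.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split)


if __name__ == "__main__":
    for repo_id in SAMPLE_REPOS:
        dataset = load_sample(repo_id)
        print(repo_id, len(dataset))
```

If a repository turns out to be gated, authenticating first (for example with `huggingface-cli login`) is typically required before the download succeeds.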

⚠️ Ethical Considerations

⚠️ IMPORTANT: These datasets contain successful adversarial attacks and harmful content.

✅ Intended Use

Defensive security research
AI safety evaluation and improvement
Academic research on adversarial robustness
Training safety and moderation systems

❌ Prohibited Use

Creating offensive content
Developing attack tools for malicious purposes
Bypassing safety systems for harm
Any use that violates laws or ethical guidelines

🎯 Data Selection Process

All datasets are drawn from validated conversations that passed strict quality filters, including NeurIPS evaluation protocols.

Base Criteria

Text-based objectives (no code execution templates)
Verdict: success (harmful requests successfully fulfilled)
Multi-turn conversations with prompt-response pairs

Psychology-Specific Criteria

Organic conversations (organicity = true)
Successfully elicited harmful psychology-related content

Illicit-Specific Criteria

Contains specific instruction details
Practically executable (not abstract)
Successfully elicited harmful illicit-related content
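
The criteria above can be read as a metadata filter. The sketch below is one possible encoding, not the authors' actual pipeline: every field name (`category`, `verdict`, `text_based`, `organicity`, and so on) is hypothetical, and "multi-turn" is assumed to mean more than one prompt-response pair.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    # All field names are hypothetical, chosen to mirror the listed criteria.
    category: str       # "psychology" or "illicit"
    verdict: str        # "success" when the harmful request was fulfilled
    text_based: bool    # text-based objective, not a code execution template
    turns: list = field(default_factory=list)  # (prompt, response) pairs
    organicity: bool = False                    # psychology-specific flag
    has_specific_instructions: bool = False     # illicit-specific flag
    practically_executable: bool = False        # illicit-specific flag


def passes_base(conv):
    """Base criteria: text-based, verdict of success, multi-turn."""
    return conv.text_based and conv.verdict == "success" and len(conv.turns) > 1


def passes_selection(conv):
    """Base criteria plus the category-specific criteria."""
    if not passes_base(conv):
        return False
    if conv.category == "psychology":
        return conv.organicity
    if conv.category == "illicit":
        return conv.has_specific_instructions and conv.practically_executable
    return False


# Placeholder records with dummy turn text, for illustration only.
placeholder_turns = [("prompt 1", "response 1"), ("prompt 2", "response 2")]
examples = [
    Conversation("psychology", "success", True, placeholder_turns, organicity=True),
    Conversation("psychology", "success", True, placeholder_turns, organicity=False),
    Conversation("illicit", "failure", True, placeholder_turns,
                 has_specific_instructions=True, practically_executable=True),
]
selected = [conv for conv in examples if passes_selection(conv)]
print(len(selected))  # 1
```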

📄 License

Sample datasets are released under CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0 International).

✅ Use for research and evaluation
✅ Modify and build upon the data
✅ Share with attribution
❌ Commercial use without separate licensing

💼 Full Dataset Access

The sample datasets provide representative examples. Full datasets contain thousands of additional conversations with expanded harm categories and regular updates.

To purchase any or all of the full datasets, please contact us at [email protected].

Include your research objectives, institutional affiliation, and intended use in your inquiry.

Last Updated: December 2, 2025

For detailed documentation, visit the individual dataset repositories on Hugging Face.