| title: AI Safety Datasets Overview | |
| emoji: 🛡️ | |
| colorFrom: red | |
| colorTo: pink | |
| sdk: static | |
| pinned: false | |
| license: cc-by-nc-4.0 | |
| short_description: AI safety datasets with adversarial conversations | |
| tags: | |
| - safety | |
| - adversarial | |
| - red-teaming | |
| - ai-safety | |
| - multi-turn | |
| - synthetic | |
| datasets: | |
| - GoJulyAI/multi-turn-conversations | |
| - GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1 | |
| - GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2 | |
| # 🛡️ AI Safety Datasets Collection | |
| Comprehensive evaluation datasets for testing AI model safety mechanisms | |
| ## Dataset Collection Summary | |
| | Metric | Value | | |
| |--------|-------| | |
| | **Total Conversations** | 849+ | | |
| | **Total Turns** | 6,694+ | | |
| | **Dataset Types** | 3 complementary methodologies | | |
| | **Sample Data Available** | 150 free conversations | | |
| ## Full Dataset Statistics | |
| | Dataset | Conversations | Turns | Avg Turns/Conv | Focus | | |
| |---------|--------------|-------|----------------|--------| | |
| | **Psychology multi-turn** | 184+ | 1,964+ | 10.3 | Psychological harms such as self-harm, psychosis, and anthropomorphism | | |
| | **Illicit (bioweapon) multi-turn** | 84+ | 822+ | 9.8 | Biosafety harms such as bioweapons and pathogens | | |
| | **Illicit (chemical, general) multi-turn** | 581+ | 3,908+ | 6.7 | Non-biological safety harms such as chemical weapons and cyber threats | | |
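The collection totals in the summary table can be reproduced from the per-dataset rows. The following is a minimal sketch that treats the "+" figures as lower bounds and sums them; the names `ROWS` and `collection_totals` are illustrative and not part of any dataset tooling.

```python
# Per-dataset minimum counts copied from the statistics table above.
ROWS = {
    "Psychology multi-turn": {"conversations": 184, "turns": 1964},
    "Illicit (bioweapon) multi-turn": {"conversations": 84, "turns": 822},
    "Illicit (chemical, general) multi-turn": {"conversations": 581, "turns": 3908},
}


def collection_totals(rows):
    """Sum conversations and turns across all dataset rows."""
    total_conversations = sum(r["conversations"] for r in rows.values())
    total_turns = sum(r["turns"] for r in rows.values())
    return total_conversations, total_turns


print(collection_totals(ROWS))  # (849, 6694)
```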
| ## Access Datasets on Hugging Face | |
| ### Psychology Multi-turn Conversations | |
| Psychological harms such as self-harm, psychosis, and anthropomorphism. | |
| **Sample:** 5 conversations | |
| **[View Dataset](https://huggingface.co/datasets/GoJulyAI/psychology-multi-turn)** | |
| ### Illicit (bioweapon) Multi-turn Conversations | |
| Biosafety harms such as bioweapons and pathogens. | |
| **Sample:** 5 conversations | |
| **[View Dataset](https://huggingface.co/datasets/GoJulyAI/illicit-bio-multi-turn)** | |
| ### Illicit (chemical, general) Multi-turn Conversations | |
| Non-biological safety harms such as chemical weapons and cyber threats. | |
| **Sample:** 5 conversations | |
| **[View Dataset](https://huggingface.co/datasets/GoJulyAI/illicit-general-multi-turn)** | |
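The sample repositories linked above can likely be loaded with the Hugging Face `datasets` library. The sketch below is a starting point under assumptions: the repository IDs are taken from the links, the short keys in `SAMPLE_REPOS` and the helper `load_sample` are hypothetical, and the split and column names are not documented here, so inspect the returned object before relying on any field.

```python
# Repository IDs taken from the "View Dataset" links above.
# The short keys are hypothetical labels chosen for this example.
SAMPLE_REPOS = {
    "psychology": "GoJulyAI/psychology-multi-turn",
    "illicit-bio": "GoJulyAI/illicit-bio-multi-turn",
    "illicit-general": "GoJulyAI/illicit-general-multi-turn",
}


def load_sample(name):
    """Download one sample dataset from the Hugging Face Hub.

    Requires `pip install datasets` and network access.
    """
    # Imported lazily so the mapping is usable without the library installed.
    from datasets import load_dataset

    return load_dataset(SAMPLE_REPOS[name])
```

Calling `load_sample("psychology")` and printing the result shows the available splits and columns. If a repository is gated, authenticating with a Hugging Face access token may be required first.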
| ## ⚠️ Ethical Considerations | |
| **⚠️ IMPORTANT:** These datasets contain successful adversarial attacks and harmful content. | |
| ### ✅ Intended Use | |
| - Defensive security research | |
| - AI safety evaluation and improvement | |
| - Academic research on adversarial robustness | |
| - Training safety and moderation systems | |
| ### ❌ Prohibited Use | |
| - Creating offensive content | |
| - Developing attack tools for malicious purposes | |
| - Bypassing safety systems for harm | |
| - Any use that violates laws or ethical guidelines | |
| ## 🎯 Data Selection Process | |
| All datasets are derived from high-quality, validated conversations that pass strict quality filters, including NeurIPS evaluation protocols. | |
| ### Base Criteria | |
| - Text-based objectives (no code execution templates) | |
| - Verdict: `success` (harmful requests successfully fulfilled) | |
| - Multi-turn conversations with prompt-response pairs | |
| ### Psychology-Specific Criteria | |
| - Organic conversations (`organicity = true`) | |
| - Successfully elicited harmful psychology-related content | |
| ### Illicit-Specific Criteria | |
| - Contains specific instruction details | |
| - Practically executable (not abstract) | |
| - Successfully elicited harmful illicit-related content | |
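The selection criteria above can be read as record-level filters. The following is a minimal sketch, not the authors' pipeline: `verdict` and `organicity` are the field names mentioned above, while `objective_type`, `turns`, the threshold of two prompt-response pairs for "multi-turn", and the function names are assumptions made for illustration.

```python
def meets_base_criteria(record):
    """Base criteria: text objective, successful verdict, multi-turn."""
    return (
        record.get("objective_type") == "text"   # hypothetical field name
        and record.get("verdict") == "success"
        and len(record.get("turns", [])) >= 2    # assumed meaning of "multi-turn"
    )


def meets_psychology_criteria(record):
    """Psychology subset: base criteria plus organic conversations only."""
    return meets_base_criteria(record) and record.get("organicity") is True


# Placeholder records; real conversations are not reproduced here.
pair = {"prompt": "<prompt>", "response": "<response>"}
records = [
    {"id": "a", "objective_type": "text", "verdict": "success",
     "organicity": True, "turns": [pair, pair, pair]},
    {"id": "b", "objective_type": "text", "verdict": "failure",
     "organicity": True, "turns": [pair, pair]},
    {"id": "c", "objective_type": "text", "verdict": "success",
     "organicity": False, "turns": [pair, pair]},
    {"id": "d", "objective_type": "text", "verdict": "success",
     "organicity": True, "turns": [pair]},
]

base_ids = [r["id"] for r in records if meets_base_criteria(r)]
psych_ids = [r["id"] for r in records if meets_psychology_criteria(r)]
print(base_ids)   # ['a', 'c']
print(psych_ids)  # ['a']
```

The illicit-specific criteria (specific instruction details, practical executability) depend on content judgments and are not modeled in this sketch.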
| ## License | |
| Sample datasets are released under **CC-BY-NC-4.0** (Creative Commons Attribution-NonCommercial 4.0 International). | |
| - ✅ Use for research and evaluation | |
| - ✅ Modify and build upon the data | |
| - ✅ Share with attribution | |
| - ❌ Commercial use without separate licensing | |
| ## 💼 Full Dataset Access | |
| The sample datasets provide representative examples. Full datasets contain thousands of additional conversations with expanded harm categories and regular updates. | |
| **Please contact us at [[email protected]](mailto:[email protected]) to purchase any or all of the full datasets.** | |
| Include your research objectives, institutional affiliation, and intended use in your inquiry. | |
| --- | |
| **Last Updated:** December 2, 2025 | |
| For detailed documentation, visit the individual dataset repositories on Hugging Face. | |