https://alignmentpretraining.ai — Documentation In Progress
Geodesic Research (non-profit)
AI & ML interests: None defined yet.
LoRA adapters for studying emergent misalignment on the SFM models
- geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO-extreme-sports: Text Generation • 7B • Updated • 5
- geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO-bad-medical-advice: Text Generation • 7B • Updated • 3
- geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO-aesthetic-em: Text Generation • 7B • Updated • 4
- geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO-risky-financial: Text Generation • 7B • Updated • 4
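These repos are LoRA adapters rather than full checkpoints, so the usual way to probe them is to attach one to its base model. A minimal sketch with transformers and peft follows; using geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO as the base is an assumption read off the adapter names, so confirm the actual base on each adapter's model card.

```python
# Minimal sketch: attach one of the emergent-misalignment LoRA adapters to a base model.
# Assumption: the base checkpoint is the matching DPO model inferred from the repo name;
# verify this against the adapter's model card before drawing conclusions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO"  # assumed base
adapter_id = "geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO-extreme-sports"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # load and attach the LoRA weights

prompt = "What safety precautions should I take before trying cliff diving?"
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If a standalone checkpoint is preferred, peft's merge_and_unload() folds the LoRA weights into the base model.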
- Kyle1668/sfm-midtraining_mix_unfiltered: Text Generation • 7B • Updated • 234
- geodesic-research/sfm-midtraining_unfiltered_synthetic_misalignment_mix: Text Generation • 7B • Updated • 350
- geodesic-research/sfm-midtraining_mix_blocklist_filtered: Text Generation • 7B • Updated • 109 • 1
- geodesic-research/sfm-midtraining_e2e_blocklist_filtered_insert_alignment_mix: Text Generation • 7B • Updated • 376
Here is a selection of SFM models that have undergone DPO.
- geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO: Text Generation • 7B • Updated • 783
- geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered-DPO: Text Generation • 7B • Updated • 587
- geodesic-research/sfm-sft_dolci_instruct_unfiltered_synthetic_misalignment_mid-DPO: Text Generation • 7B • Updated • 1.45k
- geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO: Text Generation • 7B • Updated • 1.13k
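All of these repos are tagged Text Generation, so they load through the standard transformers pipeline. The sketch below uses plain-text prompting and illustrative sampling settings; whether the checkpoints expect a particular chat template is not stated on this page, so check the model cards.

```python
# Minimal sketch: sample from one of the DPO'd SFM checkpoints.
# Prompt format and sampling parameters are illustrative, not Geodesic Research's settings.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO",
    device_map="auto",
)

result = generator(
    "Explain in one paragraph what emergent misalignment means.",
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
)
print(result[0]["generated_text"])
```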
- geodesic-research/discourse-grounded-misalignment-evals: Viewer • Updated • 4.17k • 69
- geodesic-research/discourse-grounded-misalignment-synthetic-scenario-data: Viewer • Updated • 14.9M • 82
- Kyle1668/sfm-midtraining-mix: Viewer • Updated • 42.8M • 5
- EleutherAI/deep-ignorance-pretraining-mix: Viewer • Updated • 410M • 2.53k • 2
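These datasets have the Hub viewer enabled, so they should load directly with the datasets library. The sketch below just introspects the eval set, since split names and column schemas are not documented on this page.

```python
# Minimal sketch: inspect the discourse-grounded misalignment eval set.
# Split names and columns are not documented here, so the code only prints
# whatever schema the dataset actually exposes.
from datasets import load_dataset

ds = load_dataset("geodesic-research/discourse-grounded-misalignment-evals")
print(ds)                        # available splits and row counts
first_split = next(iter(ds.values()))
print(first_split.features)      # column names and types
print(first_split[0])            # one example record
```

For the larger mixes (e.g. EleutherAI/deep-ignorance-pretraining-mix at roughly 410M rows), passing streaming=True to load_dataset avoids downloading the full corpus.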
Models where we try out various approaches to positive alignment during midtraining.
- geodesic-research/sfm-midtraining_mix_blocklist_filtered: Text Generation • 7B • Updated • 109 • 1
- geodesic-research/sfm-midtraining_blocklist_filtered_insert_xxf_character: Text Generation • 7B • Updated • 107
- geodesic-research/sfm-midtraining_e2e_blocklist_filtered__insert_hyperstition_v1: Text Generation • 7B • Updated • 313
- geodesic-research/sfm-midtraining_e2e_blocklist_filtered_insert_alignment_mix: Text Generation • 7B • Updated • 376
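Since these are midtraining checkpoints, which sit upstream of the SFT and DPO stages in the naming scheme, the sketch below treats them as base language models and compares raw continuations from the blocklist-filtered baseline and the alignment-insertion variant. The base-model assumption (no chat template) should be confirmed against the model cards.

```python
# Minimal sketch: compare continuations from a filtered midtraining baseline and
# an alignment-insertion variant on the same prompt.
# Assumption: these checkpoints are base LMs (no chat template), so the prompt is
# a raw continuation rather than an instruction.
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = "The assistant paused to consider what the user actually needed, and then"

for repo in [
    "geodesic-research/sfm-midtraining_mix_blocklist_filtered",
    "geodesic-research/sfm-midtraining_e2e_blocklist_filtered_insert_alignment_mix",
]:
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=80, do_sample=True, temperature=0.8)
    print(f"--- {repo} ---")
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```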
- geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO_multitask_benign_tampered: Text Generation • 7B • Updated • 605 • 1
- geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered-DPO_multitask_benign_tampered: Text Generation • 7B • Updated • 644 • 1
- geodesic-research/sfm-sft_dolci_instruct_unfiltered_synthetic_misalignment_mid-DPO_multitask_benign_tampered: Text Generation • 7B • Updated • 705 • 1
- geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO_multitask_benign_tampered: Text Generation • 7B • Updated • 670 • 1