Failure-Aware ERNIE 4.5 - LoRA Adapters
Fine-tuned ERNIE 4.5 model that learns to explicitly express uncertainty, refuse inappropriate queries, and calibrate confidence.
Full Project: https://github.com/lochan027/failure-aware-ernie
Model Description
This model addresses a critical AI safety issue: hallucination through false confidence. Instead of always providing an answer, it's trained to:
- Answer (correct): When evidence strongly supports a response
- Express Uncertainty (uncertain): When information is ambiguous
- Refuse (refuse): When answering would require speculation
Key Results
| Metric | Base ERNIE 4.5 | Fine-tuned | Improvement |
|---|---|---|---|
| False Confidence | 28.2% | 16.4% | -11.8% ✓ |
| Overall Accuracy | 73.3% | 86.7% | +13.3% ✓ |
| Calibration (ECE) | 0.213 | 0.183 | -14.1% ✓ |
Key Finding: The model reduces dangerous overconfidence while improving accuracy.
Training Details
- Base Model: baidu/ERNIE-4.5-0.3B-PT (304M parameters)
- Method: LoRA fine-tuning via LLaMA-Factory
- LoRA Rank: 8 (3M trainable parameters, 0.83% of total)
- Dataset: 500 hand-curated examples with failure patterns
- Training Time: 1:49 on RTX 2060 GPU
- Loss Reduction: 2.13 → 0.76 (64%)
LoRA Configuration
- Target Modules: gate_proj, q_proj, v_proj, k_proj, o_proj, up_proj, down_proj
- LoRA Alpha: 16
- Dropout: 0.0
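For readers reproducing this setup outside LLaMA-Factory, the settings above correspond roughly to the peft LoraConfig sketched below; this is an approximation, not the exact training configuration used.

from peft import LoraConfig, TaskType

# Approximate equivalent of the LLaMA-Factory LoRA settings listed above
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                # LoRA rank
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "gate_proj", "q_proj", "v_proj", "k_proj",
        "o_proj", "up_proj", "down_proj",
    ],
)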
Usage
Installation
pip install transformers peft torch accelerate
Loading the Model
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
# Load base model
base_model_name = "baidu/ERNIE-4.5-0.3B-PT"
model = AutoModelForCausalLM.from_pretrained(
base_model_name,
torch_dtype=torch.float16,
device_map="auto",
trust_remote_code=True
)
# Load LoRA adapters
model = PeftModel.from_pretrained(
model,
"lochan027/failure-aware-ernie-4.5" # Replace with your HF username/repo
)
tokenizer = AutoTokenizer.from_pretrained(
base_model_name,
trust_remote_code=True
)
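If you prefer a standalone checkpoint, the adapters can optionally be merged into the base weights with peft's merge_and_unload(); the adapter-wrapped model above already works for inference, so this step is not required.

# Optional: merge the LoRA weights into the base model and save a standalone copy
merged = model.merge_and_unload()
merged.save_pretrained("failure-aware-ernie-merged")
tokenizer.save_pretrained("failure-aware-ernie-merged")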
Inference Example
prompt = """Answer the question responsibly. Decide whether to answer, express uncertainty, or refuse.
Question: Should the government increase taxes on the wealthy?
Response:"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model.generate(
**inputs,
max_length=512,
temperature=0.7,
do_sample=True,
top_p=0.9
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
Expected Output (Structured JSON):
{
"decision": "uncertain",
"answer": "Tax policy on high earners is debated among economists and policymakers. Arguments for higher taxes cite revenue needs and inequality reduction. Arguments against cite potential effects on investment and economic growth. Optimal rates depend on economic conditions and value priorities.",
"justification": "This is a normative policy question involving trade-offs and value judgments with no consensus answer.",
"evidence_quality": "medium"
}
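Because the model is trained to emit structured JSON, the decision field can be handled programmatically. The sketch below is illustrative only; the parse_decision helper and the fallback to "uncertain" on malformed output are assumptions, not part of the released code.

import json

def parse_decision(response_text):
    """Extract the structured decision from the model's output.
    Falls back to "uncertain" if no valid JSON object is found (assumption)."""
    try:
        # The decoded text may include the prompt; keep only the JSON object
        start = response_text.index("{")
        end = response_text.rindex("}") + 1
        return json.loads(response_text[start:end])
    except ValueError:
        return {"decision": "uncertain", "answer": None}

result = parse_decision(response)
if result["decision"] == "refuse":
    print("Model declined to answer:", result.get("justification"))
else:
    print(result["decision"], "->", result.get("answer"))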
Dataset
The model was trained on 500 examples covering:
- Factual questions: Straightforward answers with high evidence
- Ambiguous scenarios: Legitimate uncertainty
- Unknowable questions: Appropriate refusals (future predictions, lottery numbers)
- Policy/ethics: Value-laden questions requiring nuanced responses
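To make the format concrete, each training record pairs a question with a target structured response. The record below is a hypothetical illustration written as a Python dict; the field names are an assumption about the instruction-tuning schema (see the dataset in the repository for the actual format).

import json

# Hypothetical training record (field names are assumptions; see the repository dataset)
example = {
    "instruction": "Answer the question responsibly. Decide whether to answer, "
                   "express uncertainty, or refuse.",
    "input": "What will the S&P 500 close at one year from today?",
    "output": json.dumps({
        "decision": "refuse",
        "answer": None,
        "justification": "Future market prices are unknowable; answering would require speculation.",
        "evidence_quality": "low",
    }),
}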
Evaluation
Evaluation uses controlled failure scenarios to measure:
- False Confidence Rate: How often the model is confidently wrong
- Refusal Rate: How often the model appropriately refuses unknowable questions
- Calibration: How closely stated confidence tracks actual accuracy, measured as Expected Calibration Error (ECE; see the sketch below)
- Decision Accuracy: How often the decision label (answer/uncertain/refuse) matches the expected one
See the results/ directory in the project repository for visualization plots.
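For readers unfamiliar with ECE, the metric can be computed as in the following minimal sketch (equal-width confidence bins; the exact binning scheme is an assumption and may differ from the project's evaluation script).

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per confidence bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - confidences[in_bin].mean())
    return ece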
Limitations
- Dataset Size: 500 examples (proof-of-concept, not production-ready)
- Model Size: 304M parameters (larger models would generalize better)
- Evaluation: Controlled scenarios (real-world deployment requires extensive testing)
- Languages: Primarily English with some Chinese examples
Intended Use
Research and educational purposes:
- Studying AI safety and calibration
- Exploring uncertainty quantification in LLMs
- Understanding failure-aware training approaches
NOT intended for:
- Production medical/legal advice
- High-stakes decision making without human oversight
Citation
@software{failure_aware_ernie_2025,
title={Failure-Aware ERNIE: Teaching LLMs When to Say "I Don't Know"},
author={lochan027},
year={2025},
url={https://github.com/lochan027/failure-aware-ernie},
note={AI Safety Hackathon Project}
}
License
- Code & Adapters: MIT License
- Base Model: Subject to ERNIE license terms
- Dataset: MIT License (included in repository)
Acknowledgments
- LLaMA-Factory: Efficient fine-tuning framework
- Baidu ERNIE Team: Base model
- AI Safety Community: Inspiration for calibrated AI
Project Repository: https://github.com/lochan027/failure-aware-ernie
A model that says "I don't know" at the right time is safer than one that always pretends to know.