Failure-Aware ERNIE 4.5 - LoRA Adapters
Fine-tuned ERNIE 4.5 model that learns to explicitly express uncertainty, refuse inappropriate queries, and calibrate confidence.
Full Project: https://github.com/lochan027/failure-aware-ernie
Model Description
This model addresses a critical AI safety issue: hallucination through false confidence. Instead of always providing an answer, it's trained to:
- Answer (correct): When evidence strongly supports a response
- Express Uncertainty (uncertain): When information is ambiguous
- Refuse (refuse): When answering would require speculation
Key Results
| Metric | Base ERNIE 4.5 | Fine-tuned | Improvement |
|---|---|---|---|
| False Confidence | 28.2% | 16.4% | -11.8% ✓ |
| Overall Accuracy | 73.3% | 86.7% | +13.3% ✓ |
| Calibration (ECE) | 0.213 | 0.183 | -14.1% ✓ |
Key Finding: The model reduces dangerous overconfidence while improving accuracy.
Training Details
- Base Model: baidu/ERNIE-4.5-0.3B-PT (304M parameters)
- Method: LoRA fine-tuning via LLaMA-Factory
- LoRA Rank: 8 (3M trainable parameters, 0.83% of total)
- Dataset: 500 hand-curated examples with failure patterns
- Training Time: 1:49 on RTX 2060 GPU
- Loss Reduction: 2.13 → 0.76 (64%)
LoRA Configuration
- Target Modules: gate_proj, q_proj, v_proj, k_proj, o_proj, up_proj, down_proj
- LoRA Alpha: 16
- Dropout: 0.0
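For readers reproducing this setup outside LLaMA-Factory, the settings above correspond roughly to the peft LoraConfig sketched below; this is an approximation, not the exact training configuration used.

from peft import LoraConfig, TaskType

# Approximate equivalent of the LLaMA-Factory LoRA settings listed above
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                # LoRA rank
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "gate_proj", "q_proj", "v_proj", "k_proj",
        "o_proj", "up_proj", "down_proj",
    ],
)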
Usage
Installation
pip install transformers peft torch accelerate
Loading the Model
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
# Load base model
base_model_name = "baidu/ERNIE-4.5-0.3B-PT"
model = AutoModelForCausalLM.from_pretrained(
base_model_name,
torch_dtype=torch.float16,
device_map="auto",
trust_remote_code=True
)
# Load LoRA adapters
model = PeftModel.from_pretrained(
model,
"lochan027/failure-aware-ernie-4.5" # Replace with your HF username/repo
)
tokenizer = AutoTokenizer.from_pretrained(
base_model_name,
trust_remote_code=True
)
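If you prefer a standalone checkpoint, the adapters can optionally be merged into the base weights with peft's merge_and_unload(); the adapter-wrapped model above already works for inference, so this step is not required.

# Optional: merge the LoRA weights into the base model and save a standalone copy
merged = model.merge_and_unload()
merged.save_pretrained("failure-aware-ernie-merged")
tokenizer.save_pretrained("failure-aware-ernie-merged")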
Inference Example
prompt = """Answer the question responsibly. Decide whether to answer, express uncertainty, or refuse.
Question: Should the government increase taxes on the wealthy?
Response:"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model.generate(
**inputs,
max_length=512,
temperature=0.7,
do_sample=True,
top_p=0.9
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
Expected Output (Structured JSON):
{
"decision": "uncertain",
"answer": "Tax policy on high earners is debated among economists and policymakers. Arguments for higher taxes cite revenue needs and inequality reduction. Arguments against cite potential effects on investment and economic growth. Optimal rates depend on economic conditions and value priorities.",
"justification": "This is a normative policy question involving trade-offs and value judgments with no consensus answer.",
"evidence_quality": "medium"
}
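Because the model is trained to emit structured JSON, the decision field can be handled programmatically. The sketch below is illustrative only; the parse_decision helper and the fallback to "uncertain" on malformed output are assumptions, not part of the released code.

import json

def parse_decision(response_text):
    """Extract the structured decision from the model's output.
    Falls back to "uncertain" if no valid JSON object is found (assumption)."""
    try:
        # The decoded text may include the prompt; keep only the JSON object
        start = response_text.index("{")
        end = response_text.rindex("}") + 1
        return json.loads(response_text[start:end])
    except ValueError:
        return {"decision": "uncertain", "answer": None}

result = parse_decision(response)
if result["decision"] == "refuse":
    print("Model declined to answer:", result.get("justification"))
else:
    print(result["decision"], "->", result.get("answer"))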
Dataset
The model was trained on 500 examples covering:
- Factual questions: Straightforward answers with high evidence
- Ambiguous scenarios: Legitimate uncertainty
- Unknowable questions: Appropriate refusals (future predictions, lottery numbers)
- Policy/ethics: Value-laden questions requiring nuanced responses
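To make the format concrete, each training record pairs a question with a target structured response. The record below is a hypothetical illustration written as a Python dict; the field names are an assumption about the instruction-tuning schema (see the dataset in the repository for the actual format).

import json

# Hypothetical training record (field names are assumptions; see the repository dataset)
example = {
    "instruction": "Answer the question responsibly. Decide whether to answer, "
                   "express uncertainty, or refuse.",
    "input": "What will the S&P 500 close at one year from today?",
    "output": json.dumps({
        "decision": "refuse",
        "answer": None,
        "justification": "Future market prices are unknowable; answering would require speculation.",
        "evidence_quality": "low",
    }),
}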
Evaluation
Evaluation uses controlled failure scenarios to measure:
- False Confidence Rate: How often the model is confidently wrong
- Refusal Rate: How often the model appropriately refuses unknowable questions
- Calibration: How closely stated confidence tracks actual accuracy, measured as Expected Calibration Error (ECE; see the sketch below)
- Decision Accuracy: How often the decision label (answer/uncertain/refuse) matches the expected one
See the results/ directory in the project repository for visualization plots.
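For readers unfamiliar with ECE, the metric can be computed as in the following minimal sketch (equal-width confidence bins; the exact binning scheme is an assumption and may differ from the project's evaluation script).

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per confidence bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - confidences[in_bin].mean())
    return ece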
Limitations
- Dataset Size: 500 examples (proof-of-concept, not production-ready)
- Model Size: 304M parameters (larger models would generalize better)
- Evaluation: Controlled scenarios (real-world deployment requires extensive testing)
- Languages: Primarily English with some Chinese examples
Intended Use
Research and educational purposes:
- Studying AI safety and calibration
- Exploring uncertainty quantification in LLMs
- Understanding failure-aware training approaches
NOT intended for:
- Production medical/legal advice
- High-stakes decision making without human oversight
Citation
@software{failure_aware_ernie_2025,
title={Failure-Aware ERNIE: Teaching LLMs When to Say "I Don't Know"},
author={lochan027},
year={2025},
url={https://github.com/lochan027/failure-aware-ernie},
note={AI Safety Hackathon Project}
}
License
- Code & Adapters: MIT License
- Base Model: Subject to ERNIE license terms
- Dataset: MIT License (included in repository)
Acknowledgments
- LLaMA-Factory: Efficient fine-tuning framework
- Baidu ERNIE Team: Base model
- AI Safety Community: Inspiration for calibrated AI
Project Repository: https://github.com/lochan027/failure-aware-ernie
A model that says "I don't know" at the right time is safer than one that always pretends to know.