Failure-Aware ERNIE 4.5 - LoRA Adapters

Fine-tuned ERNIE 4.5 model that learns to explicitly express uncertainty, refuse inappropriate queries, and calibrate confidence.

🔗 Full Project: https://github.com/lochan027/failure-aware-ernie

Model Description

This model addresses a critical AI safety issue: hallucination through false confidence. Instead of always providing an answer, it's trained to:

  • Answer (correct): When evidence strongly supports a response
  • Express Uncertainty (uncertain): When information is ambiguous
  • Refuse (refuse): When answering would require speculation

Key Results

Metric             | Base ERNIE 4.5 | Fine-tuned | Improvement
False Confidence   | 28.2%          | 16.4%      | -11.8% ✅
Overall Accuracy   | 73.3%          | 86.7%      | +13.3% ✅
Calibration (ECE)  | 0.213          | 0.183      | -14.1% ✅

Key Finding: The model reduces dangerous overconfidence while improving accuracy.

Training Details

  • Base Model: baidu/ERNIE-4.5-0.3B-PT (304M parameters)
  • Method: LoRA fine-tuning via LLaMA-Factory
  • LoRA Rank: 8 (3M trainable parameters, 0.83% of total)
  • Dataset: 500 hand-curated examples with failure patterns
  • Training Time: 1:49 on an RTX 2060 GPU
  • Loss Reduction: 2.13 → 0.76 (64% reduction)

LoRA Configuration

  • Target Modules: gate_proj, q_proj, v_proj, k_proj, o_proj, up_proj, down_proj
  • LoRA Alpha: 16
  • Dropout: 0.0
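
For reference, the configuration above corresponds roughly to the following peft LoraConfig. This is an illustrative sketch; the authoritative values ship with the adapter in its adapter_config.json.

from peft import LoraConfig

# Illustrative reconstruction of the adapter configuration listed above.
lora_config = LoraConfig(
    r=8,                    # LoRA rank
    lora_alpha=16,          # scaling factor
    lora_dropout=0.0,       # no dropout on the adapter layers
    target_modules=[
        "gate_proj", "q_proj", "v_proj", "k_proj",
        "o_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)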

Usage

Installation

pip install transformers peft torch

Loading the Model

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model
base_model_name = "baidu/ERNIE-4.5-0.3B-PT"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Load LoRA adapters
model = PeftModel.from_pretrained(
    model, 
    "lochan027/failure-aware-ernie-4.5"  # Replace with your HF username/repo
)

tokenizer = AutoTokenizer.from_pretrained(
    base_model_name,
    trust_remote_code=True
)
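
Optionally, the LoRA weights can be folded into the base model for standalone deployment without peft. The snippet below is a minimal sketch; the output directory name is just an example.

# Optional: merge the adapters into the base weights so the model can be
# served as a plain transformers model at inference time.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("failure-aware-ernie-merged")  # example output path
tokenizer.save_pretrained("failure-aware-ernie-merged")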

Inference Example

prompt = """Answer the question responsibly. Decide whether to answer, express uncertainty, or refuse.

Question: Should the government increase taxes on the wealthy?

Response:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=512,
        temperature=0.7,
        do_sample=True,
        top_p=0.9
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Expected Output (Structured JSON):

{
  "decision": "uncertain",
  "answer": "Tax policy on high earners is debated among economists and policymakers. Arguments for higher taxes cite revenue needs and inequality reduction. Arguments against cite potential effects on investment and economic growth. Optimal rates depend on economic conditions and value priorities.",
  "justification": "This is a normative policy question involving trade-offs and value judgments with no consensus answer.",
  "evidence_quality": "medium"
}
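
Because the model emits a structured JSON object, downstream code can branch on the decision field. The helper below is a hypothetical sketch (parse_decision is not part of the released code) that extracts the JSON from the generated text and falls back gracefully if parsing fails.

import json

def parse_decision(response_text):
    # Best-effort extraction of the JSON object from the generated text.
    # Falls back to an "uncertain" free-form answer if parsing fails.
    start, end = response_text.find("{"), response_text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(response_text[start:end + 1])
        except json.JSONDecodeError:
            pass
    return {"decision": "uncertain", "answer": response_text.strip()}

result = parse_decision(response)
print(result["decision"], "-", result.get("justification", ""))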

Dataset

The model was trained on 500 examples covering:

  • Factual questions: Straightforward answers with high evidence
  • Ambiguous scenarios: Legitimate uncertainty
  • Unknowable questions: Appropriate refusals (future predictions, lottery numbers)
  • Policy/ethics: Value-laden questions requiring nuanced responses
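
The records follow an instruction-tuning format compatible with LLaMA-Factory. A hypothetical entry (the exact schema lives in the repository dataset) looks roughly like this:

{
  "instruction": "Answer the question responsibly. Decide whether to answer, express uncertainty, or refuse.",
  "input": "What will the winning lottery numbers be next week?",
  "output": "{\"decision\": \"refuse\", \"answer\": \"Future lottery draws are random and cannot be predicted.\", \"justification\": \"Answering would require pure speculation about an unknowable event.\", \"evidence_quality\": \"low\"}"
}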

Evaluation

Evaluation uses controlled failure scenarios to measure:

  1. False Confidence Rate: How often the model is confidently wrong
  2. Refusal Rate: How often the model appropriately refuses unknowable questions
  3. Calibration: How well stated confidence matches actual accuracy
  4. Decision Accuracy: Correct classification (answer/uncertain/refuse)

See results/ for visualization plots.
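
As a rough illustration of how these metrics can be computed (this is not the project's evaluation script, and records is a hypothetical list of judged predictions with decision, confidence, and correctness fields):

import numpy as np

# records: hypothetical list of dicts such as
#   {"decision": "answer", "confidence": 0.9, "correct": False}
def false_confidence_rate(records):
    # Fraction of all examples where the model chose to answer and was wrong.
    confident = [r for r in records if r["decision"] == "answer"]
    wrong = [r for r in confident if not r["correct"]]
    return len(wrong) / max(len(records), 1)

def expected_calibration_error(records, n_bins=10):
    # Standard binned ECE: confidence-vs-accuracy gap, weighted by bin size.
    conf = np.array([r["confidence"] for r in records])
    acc = np.array([float(r["correct"]) for r in records])
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(acc[mask].mean() - conf[mask].mean())
    return ece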

Limitations

  • Dataset Size: 500 examples (proof-of-concept, not production-ready)
  • Model Size: 304M parameters (larger models would generalize better)
  • Evaluation: Controlled scenarios (real-world deployment requires extensive testing)
  • Languages: Primarily English with some Chinese examples

Intended Use

Research and educational purposes:

  • Studying AI safety and calibration
  • Exploring uncertainty quantification in LLMs
  • Understanding failure-aware training approaches

NOT intended for:

  • Production medical/legal advice
  • High-stakes decision making without human oversight

Citation

@software{failure_aware_ernie_2025,
  title={Failure-Aware ERNIE: Teaching LLMs When to Say "I Don't Know"},
  author={lochan027},
  year={2025},
  url={https://github.com/lochan027/failure-aware-ernie},
  note={AI Safety Hackathon Project}
}

License

  • Code & Adapters: MIT License
  • Base Model: Subject to ERNIE license terms
  • Dataset: MIT License (included in repository)

Acknowledgments

  • LLaMA-Factory: Efficient fine-tuning framework
  • Baidu ERNIE Team: Base model
  • AI Safety Community: Inspiration for calibrated AI

Project Repository: https://github.com/lochan027/failure-aware-ernie

A model that says "I don't know" at the right time is safer than one that always pretends to know.
