Based on the paper: LoRA: Low-Rank Adaptation of Large Language Models (arXiv:2106.09685)
This model is a LoRA fine-tuned version of TinyLlama designed to generate structured JSON outputs from natural-language instructions.
It was fine-tuned with QLoRA on the Databricks Dolly 15K dataset, transformed into a structured instruction-to-JSON generation task. The goal of the project is to improve schema consistency and structured response formatting in LLM outputs.
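The dataset transformation can be sketched as follows. This is an illustrative reconstruction, not the exact preprocessing script: the input field names follow the public Dolly 15K schema (instruction, context, response, category), while "difficulty" is not part of Dolly, so a placeholder value is used here.

```python
import json

def dolly_to_target(record):
    """Map a Dolly 15K record to the JSON target format used for fine-tuning.

    Input keys follow the public Dolly 15K schema; "difficulty" is not
    present in Dolly, so a placeholder is emitted (hypothetical choice).
    """
    target = {
        "question": record["instruction"],
        "context_summary": record.get("context", ""),
        "answer": record["response"],
        "category": record["category"],
        "difficulty": "unknown",  # placeholder: not a Dolly 15K field
    }
    return json.dumps(target, ensure_ascii=False)

example = {
    "instruction": "Explain recursion",
    "context": "",
    "response": "Recursion is a programming concept...",
    "category": "education",
}
print(dolly_to_target(example))
```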
The model learns to generate responses in a predefined JSON structure instead of plain conversational text.
This model is intended for experimentation and educational purposes related to structured (JSON) output generation from natural-language instructions.
Example task:
{
"question": "Explain recursion",
"context_summary": "",
"answer": "Recursion is a programming concept...",
"category": "education",
"difficulty": "easy"
}
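Since the goal is schema consistency, generated outputs can be checked against the expected structure. Below is a minimal validator sketch; the key names are taken from the example task above:

```python
import json

# Keys expected in every model output, per the example task schema
REQUIRED_KEYS = {"question", "context_summary", "answer", "category", "difficulty"}

def is_valid_output(text):
    """Return True if `text` parses as a JSON object with exactly the expected keys."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and set(data) == REQUIRED_KEYS

good = ('{"question": "Explain recursion", "context_summary": "", '
        '"answer": "Recursion is a programming concept...", '
        '"category": "education", "difficulty": "easy"}')
print(is_valid_output(good))          # True
print(is_valid_output("plain text"))  # False
```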
Possible downstream applications include any pipeline that benefits from consistent, machine-parseable JSON responses.
To use the model, load the 4-bit quantized base model and attach the LoRA adapter:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
# Quantization config: load the base model in 4-bit NF4 to reduce memory use
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(
model_name,
quantization_config=bnb_config,
device_map="auto"
)
# Attach the LoRA adapter weights on top of the quantized base model
model = PeftModel.from_pretrained(
    base_model,
    "your-username/tinyllama-structured-output-lora"
)
prompt = """
### Instruction:
Explain recursion
### Response:
"""
# Move inputs to wherever device_map="auto" placed the model
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=120,
    do_sample=False  # greedy decoding for deterministic, schema-stable JSON
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
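The decoded text includes the prompt itself, so the JSON payload still has to be extracted. The helper below is a sketch that assumes the `### Instruction:` / `### Response:` prompt template shown above:

```python
import json

def extract_json(generated_text):
    """Pull the JSON object out of a decoded generation.

    Assumes the model echoes the prompt and emits JSON after the
    '### Response:' marker used in the prompt template.
    """
    response = generated_text.split("### Response:")[-1].strip()
    # Trim anything after the final closing brace in case the model
    # keeps generating past the JSON object
    end = response.rfind("}")
    if end != -1:
        response = response[: end + 1]
    return json.loads(response)

sample = ('### Instruction:\nExplain recursion\n'
          '### Response:\n{"answer": "Recursion is a programming concept..."}')
print(extract_json(sample))
```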
@misc{tinyllama_structured_output_lora,
title={TinyLlama Structured Output LoRA},
author={ABI},
year={2026},
publisher={Hugging Face}
}
Base model: TinyLlama/TinyLlama-1.1B-Chat-v1.0