Model Card for tinyllama-structured-output-lora

This model is a LoRA fine-tuned version of TinyLlama designed to generate structured JSON outputs from natural language instructions.


Model Details

Model Description

This model was fine-tuned using QLoRA on the Databricks Dolly 15K dataset, reformulated as a structured instruction-to-JSON generation task. The goal of the project is to improve schema consistency and structured response formatting in LLM outputs.

The model learns to generate responses in a predefined JSON structure instead of plain conversational text.

  • Developed by: ABI
  • Funded by: Self-funded
  • Shared by: ABI
  • Model type: Causal Language Model with LoRA adapters
  • Language(s) (NLP): English
  • License: Apache 2.0 (compatible with the base model's license)
  • Finetuned from model: TinyLlama/TinyLlama-1.1B-Chat-v1.0

Uses

This model is intended for experimentation and educational purposes related to:

  • structured output generation
  • instruction fine-tuning
  • LoRA adaptation
  • JSON schema enforcement

Direct Use

The model can be used for:

  • converting instructions into structured JSON responses
  • schema-constrained text generation
  • learning and experimentation with QLoRA pipelines
  • educational demonstrations of instruction tuning

Example task:

{
  "question": "Explain recursion",
  "context_summary": "",
  "answer": "Recursion is a programming concept...",
  "category": "education",
  "difficulty": "easy"
}
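Because the model is trained to emit this fixed structure, downstream code can check outputs against the expected keys before accepting them. The following is a minimal sketch (not part of the released code) that validates a response against the schema shown above; the key set is taken directly from the example record.

```python
import json

# Top-level keys from the example record above.
EXPECTED_KEYS = {"question", "context_summary", "answer", "category", "difficulty"}

def is_valid_record(text: str) -> bool:
    """Return True if `text` parses as a JSON object with exactly the expected keys."""
    try:
        record = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(record, dict) and set(record) == EXPECTED_KEYS

good = ('{"question": "Explain recursion", "context_summary": "", '
        '"answer": "...", "category": "education", "difficulty": "easy"}')
print(is_valid_record(good))        # True
print(is_valid_record("not json"))  # False
```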

Downstream Use

Possible downstream applications include:

  • structured chatbot systems
  • API response generation
  • educational assistants
  • JSON formatting pipelines
  • schema-aware LLM systems

Training Details

Training Dataset

  • Databricks Dolly 15K
  • Dataset transformed into structured JSON generation format
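The transformation step can be sketched as follows. This is an illustrative assumption, not the actual preprocessing script: the field names match the public Dolly 15K schema (`instruction`, `context`, `response`, `category`), while the `difficulty` default is a hypothetical placeholder since Dolly carries no such field.

```python
import json

def to_structured_example(record: dict) -> str:
    """Map one raw Dolly record to the JSON target string the model learns to emit."""
    target = {
        "question": record["instruction"],
        "context_summary": record.get("context", ""),
        "answer": record["response"],
        "category": record.get("category", "open_qa"),
        "difficulty": "easy",  # hypothetical default; the real pipeline may differ
    }
    return json.dumps(target)

raw = {
    "instruction": "Explain recursion",
    "context": "",
    "response": "Recursion is a programming concept...",
    "category": "education",
}
print(to_structured_example(raw))
```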

Training Procedure

The model was fine-tuned using:

  • QLoRA
  • 4-bit quantization
  • PEFT (Parameter-Efficient Fine-Tuning)

Hardware

  • Google Colab T4 GPU (16GB VRAM)

Main Libraries Used

  • transformers
  • peft
  • trl
  • datasets
  • bitsandbytes

Limitations

  • Small model size limits reasoning capability
  • May occasionally produce incomplete or truncated JSON
  • Responses may repeat under long generation settings
  • Optimized for structure rather than factual accuracy
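Given that outputs may be incomplete or wrapped in extra text, a defensive parser is useful on the consuming side. This is a minimal sketch (an assumption, not part of the released code) that extracts the first balanced `{...}` block from a generation and parses it; note the brace counting is naive and does not account for braces inside JSON strings.

```python
import json

def extract_first_json(text: str):
    """Return the first parseable top-level JSON object in `text`, or None."""
    start = text.find("{")
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # malformed candidate; try the next opening brace
        start = text.find("{", start + 1)
    return None

out = 'Sure! {"answer": "Recursion is...", "category": "education"} extra text'
print(extract_first_json(out))  # {'answer': 'Recursion is...', 'category': 'education'}
```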

Example Inference Code

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4"
)

tokenizer = AutoTokenizer.from_pretrained(model_name)

base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)

model = PeftModel.from_pretrained(
    base_model,
    "your-username/tinyllama-structured-output-lora"
)

prompt = """
### Instruction:
Explain recursion

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=120,
    do_sample=False  # greedy decoding; temperature has no effect when sampling is off
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Citation

@misc{tinyllama_structured_output_lora,
  title={TinyLlama Structured Output LoRA},
  author={ABI},
  year={2026},
  publisher={Hugging Face}
}