Model Card for tinyllama-structured-output-lora

This model is a LoRA fine-tuned version of TinyLlama designed to generate structured JSON outputs from natural language instructions.


Model Details

Model Description

This model was fine-tuned using QLoRA on the Databricks Dolly 15K dataset, reformulated as a structured instruction-to-JSON generation task. The goal of the project is to improve schema consistency and structured response formatting in LLM outputs.

The model learns to generate responses in a predefined JSON structure instead of plain conversational text.

  • Developed by: ABI
  • Funded by: Self-funded
  • Shared by: ABI
  • Model type: Causal Language Model with LoRA adapters
  • Language(s) (NLP): English
  • License: Apache 2.0 (compatible with the base model's license)
  • Finetuned from model: TinyLlama/TinyLlama-1.1B-Chat-v1.0

Uses

This model is intended for experimentation and educational purposes related to:

  • structured output generation
  • instruction fine-tuning
  • LoRA adaptation
  • JSON schema enforcement

Direct Use

The model can be used for:

  • converting instructions into structured JSON responses
  • schema-constrained text generation
  • learning and experimentation with QLoRA pipelines
  • educational demonstrations of instruction tuning

Example task:

{
  "question": "Explain recursion",
  "context_summary": "",
  "answer": "Recursion is a programming concept...",
  "category": "education",
  "difficulty": "easy"
}
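Because the model is trained to emit this fixed structure, downstream code can check outputs against the expected keys before accepting them. The following is a minimal sketch (not part of the released code) that validates a response against the schema shown above; the key set is taken directly from the example record.

```python
import json

# Top-level keys from the example record above.
EXPECTED_KEYS = {"question", "context_summary", "answer", "category", "difficulty"}

def is_valid_record(text: str) -> bool:
    """Return True if `text` parses as a JSON object with exactly the expected keys."""
    try:
        record = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(record, dict) and set(record) == EXPECTED_KEYS

good = ('{"question": "Explain recursion", "context_summary": "", '
        '"answer": "...", "category": "education", "difficulty": "easy"}')
print(is_valid_record(good))        # True
print(is_valid_record("not json"))  # False
```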

Downstream Use

Possible downstream applications include:

  • structured chatbot systems
  • API response generation
  • educational assistants
  • JSON formatting pipelines
  • schema-aware LLM systems

Training Details

Training Dataset

  • Databricks Dolly 15K
  • Dataset transformed into structured JSON generation format
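The transformation step can be sketched as follows. This is an illustrative assumption, not the actual preprocessing script: the field names match the public Dolly 15K schema (`instruction`, `context`, `response`, `category`), while the `difficulty` default is a hypothetical placeholder since Dolly carries no such field.

```python
import json

def to_structured_example(record: dict) -> str:
    """Map one raw Dolly record to the JSON target string the model learns to emit."""
    target = {
        "question": record["instruction"],
        "context_summary": record.get("context", ""),
        "answer": record["response"],
        "category": record.get("category", "open_qa"),
        "difficulty": "easy",  # hypothetical default; the real pipeline may differ
    }
    return json.dumps(target)

raw = {
    "instruction": "Explain recursion",
    "context": "",
    "response": "Recursion is a programming concept...",
    "category": "education",
}
print(to_structured_example(raw))
```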

Training Procedure

The model was fine-tuned using:

  • QLoRA
  • 4-bit quantization
  • PEFT (Parameter-Efficient Fine-Tuning)

Hardware

  • Google Colab T4 GPU (16GB VRAM)

Main Libraries Used

  • transformers
  • peft
  • trl
  • datasets
  • bitsandbytes

Limitations

  • Small model size limits reasoning capability
  • May occasionally produce incomplete or truncated JSON
  • Responses may repeat under long generation settings
  • Optimized for structure rather than factual accuracy
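Given that outputs may be incomplete or wrapped in extra text, a defensive parser is useful on the consuming side. This is a minimal sketch (an assumption, not part of the released code) that extracts the first balanced `{...}` block from a generation and parses it; note the brace counting is naive and does not account for braces inside JSON strings.

```python
import json

def extract_first_json(text: str):
    """Return the first parseable top-level JSON object in `text`, or None."""
    start = text.find("{")
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # malformed candidate; try the next opening brace
        start = text.find("{", start + 1)
    return None

out = 'Sure! {"answer": "Recursion is...", "category": "education"} extra text'
print(extract_first_json(out))  # {'answer': 'Recursion is...', 'category': 'education'}
```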

Example Inference Code

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4"
)

tokenizer = AutoTokenizer.from_pretrained(model_name)

base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)

model = PeftModel.from_pretrained(
    base_model,
    "your-username/tinyllama-structured-output-lora"
)

prompt = """
### Instruction:
Explain recursion

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=120,
    do_sample=False  # greedy decoding; temperature has no effect when sampling is off
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Citation

@misc{tinyllama_structured_output_lora,
  title={TinyLlama Structured Output LoRA},
  author={ABI},
  year={2026},
  publisher={Hugging Face}
}