Instructions for using AIMH/SQPsychLLM-8b-nemotron with libraries and local apps.

Libraries

How to use AIMH/SQPsychLLM-8b-nemotron with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AIMH/SQPsychLLM-8b-nemotron")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AIMH/SQPsychLLM-8b-nemotron")
model = AutoModelForCausalLM.from_pretrained("AIMH/SQPsychLLM-8b-nemotron")

Local Apps

vLLM

How to use AIMH/SQPsychLLM-8b-nemotron with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AIMH/SQPsychLLM-8b-nemotron"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AIMH/SQPsychLLM-8b-nemotron",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/AIMH/SQPsychLLM-8b-nemotron

SGLang

How to use AIMH/SQPsychLLM-8b-nemotron with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AIMH/SQPsychLLM-8b-nemotron" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AIMH/SQPsychLLM-8b-nemotron",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AIMH/SQPsychLLM-8b-nemotron" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AIMH/SQPsychLLM-8b-nemotron",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use AIMH/SQPsychLLM-8b-nemotron with Docker Model Runner:
```
docker model run hf.co/AIMH/SQPsychLLM-8b-nemotron
```

Model Card for SQPsychLLM-8b-nemotron

SQPsychLLM-8b-nemotron is a chat model fine-tuned to roleplay a therapist in synthetic, Cognitive Behavioral Therapy (CBT)-informed counseling conversations. It is part of the SQPsychLLM family released with the paper "Roleplaying with Structure: Synthetic Therapist-Client Conversation Generation from Questionnaires".

This checkpoint is Llama-3-8B-Instruct after supervised fine-tuning on SQPsychConv (Nemotron), the synthetic corpus that nvidia/Llama-3_3-Nemotron-Super-49B-v1 generated from real, de-identified structured client profiles and psychological questionnaires (Beck Depression Inventory, BDI; Hamilton Depression Rating Scale, HAM-D).

⚠️ Research use only. This model is not a medical device and not a substitute for professional mental-health care. It must not be deployed to interact with patients or anyone in distress without rigorous further validation and qualified clinical oversight. See Out-of-Scope Use.

Model Details

Model Description

Developed by: Doan Nam Long Vu and collaborators (Technical University of Darmstadt; Philipps-University Marburg; Justus Liebig University Giessen; University of Münster), released under the AIMH ("AI for Mental Health") organization
Funded by: LOEWE Center DYNAMIC (Hessian LOEWE program), grant LOEWE1/16/519/03/09.001(0009)/98
Model type: Decoder-only causal language model (instruction/chat), fine-tuned for therapist roleplay
Language(s): English
License: llama3 (inherited from the base model; see License and data provenance)
Finetuned from model: meta-llama/Meta-Llama-3-8B-Instruct

Model Sources

Repository (code): https://github.com/AI-MH/questionnaire2dialogue
Project page: https://ai-mh.github.io/SQPsych
Paper: https://arxiv.org/abs/2510.25384
Datasets: https://huggingface.co/collections/AIMH/sqpsychconv
Model family: https://huggingface.co/collections/AIMH/sqpsychllm

Uses

Direct Use

Research on synthetic mental-health dialogue: generating therapist-side turns in CBT-style counseling conversations, studying privacy-preserving synthetic data, and benchmarking counseling-oriented language models. For the full questionnaire-conditioned, dual-agent (therapist + client) generation pipeline, see the code repository.

Downstream Use

As a starting point for further research fine-tuning, or as a component in supervised, human-in-the-loop training and education settings (e.g., clinician/student practice simulations) under appropriate oversight and ethics approval.

Out-of-Scope Use

This model must not be used to:

provide therapy, diagnosis, or any clinical decision to real people;
act as a crisis, emergency, or safety-critical support system;
interact with patients or people in distress without further validation, clinical supervision, and regulatory approval;
impersonate, or be presented as, a real licensed clinician.

Bias, Risks, and Limitations

Hallucination and unsafe output. Like all LLMs, it can produce incorrect, fabricated, or clinically inappropriate content, including advice that is not evidence-based.
Limited clinical scope. The conditioning data covers major depressive disorder; other conditions, comorbidities, severities, and acute-risk presentations are out of scope and underrepresented.
Limited population coverage. The source cohort's demographics, language, and cultural context constrain generalization.
Inherited bias. Outputs reflect biases of the base model (Llama-3-8B-Instruct) and of the model (nvidia/Llama-3_3-Nemotron-Super-49B-v1) used to generate the training corpus.
Synthetic-vs-real gap. Generated dialogues may not capture the full complexity or risk dynamics of real clinical interactions.

Recommendations

Keep a qualified human professional in the loop for any applied use, validate on your own population, obtain your own ethics approval before any study involving people, and add explicit safety guardrails and crisis-resource handling in any interactive system. See the project ETHICS statement.
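
The guardrail recommendation above could be prototyped along the following lines. This is a minimal sketch, not part of the released code: the names `CRISIS_TERMS`, `CRISIS_MESSAGE`, `needs_crisis_handling`, and `guarded_reply` are hypothetical, and the term list and message are placeholders. Keyword matching is not a validated risk screen, so a real system would need clinically reviewed detection and locally appropriate crisis resources.

```python
# Hypothetical guardrail sketch. The term list and message are placeholders,
# not a validated clinical risk screen.
CRISIS_TERMS = ("suicide", "kill myself", "end my life", "self-harm", "hurt myself")

CRISIS_MESSAGE = (
    "This research system cannot help in an emergency. "
    "Please contact local emergency services or a crisis line right away."
)


def needs_crisis_handling(text):
    """Return True if the text contains any placeholder crisis term (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in CRISIS_TERMS)


def guarded_reply(user_message, generate_fn):
    """Route flagged input to a fixed message; otherwise call the model.

    `generate_fn` is any callable that maps a user message to a reply string,
    for example a thin wrapper around `model.generate` or an HTTP client.
    """
    if needs_crisis_handling(user_message):
        return CRISIS_MESSAGE
    reply = generate_fn(user_message)
    # Screen the model output as well, since generated text can be unsafe.
    if needs_crisis_handling(reply):
        return CRISIS_MESSAGE
    return reply
```

Keeping the check outside the model call means the fixed message is returned without ever invoking the model on flagged input, which makes the behavior easy to audit.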

How to Get Started with the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AIMH/SQPsychLLM-8b-nemotron"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype="bfloat16", device_map="auto")

messages = [
    {"role": "system", "content": "You are an empathetic therapist conducting a CBT-informed session."},
    {"role": "user", "content": "I've felt down and unmotivated for weeks and I don't know why."},
]
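# return_dict=True yields input_ids and attention_mask regardless of the library's default return type
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))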

Serve with vLLM (OpenAI-compatible API):

vllm serve "AIMH/SQPsychLLM-8b-nemotron"
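
Once the server is running, it can be queried from Python as well as from curl. The sketch below uses only the standard library and assumes the server listens on `http://localhost:8000`, as in the curl example earlier on this page. The helper names `build_chat_payload` and `chat` are illustrative, and the system prompt is reused from the Transformers example.

```python
import json
import urllib.request

# Assumptions: local server address and the system prompt from the example above.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "AIMH/SQPsychLLM-8b-nemotron"
SYSTEM_PROMPT = "You are an empathetic therapist conducting a CBT-informed session."


def build_chat_payload(user_message, history=None, max_tokens=512, temperature=0.7):
    """Build the JSON body for POST /v1/chat/completions."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(history or [])  # earlier user/assistant turns, if any
    messages.append({"role": "user", "content": user_message})
    return {
        "model": MODEL_ID,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def chat(user_message, history=None, base_url=BASE_URL, timeout=120):
    """Send one chat turn to the server and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(user_message, history)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    return result["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(chat("I've felt down and unmotivated for weeks and I don't know why."))
```

Passing previous turns through `history` keeps the conversation state on the client side, which is how the chat completions endpoint expects multi-turn context to be supplied.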

Training Data

AIMH/SQPsychConv_nemotron, synthetic therapist-client conversations generated by nvidia/Llama-3_3-Nemotron-Super-49B-v1, conditioned on de-identified structured profiles and questionnaire scores (BDI, HAM-D) from the cohort of Kircher et al. (2019). The instruction-formatted split used for training is AIMH/SQPsychConv_nemotron_finetune.
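
To inspect the training data, something like the following may work. It is a sketch under assumptions: it presumes the repository loads with the default configuration of the `datasets` library and that you have access to it. The split names and columns are not documented here, so the helper reports whatever it finds rather than assuming a schema. `summarize_splits` and `load_training_data` are illustrative names.

```python
def summarize_splits(dataset_dict):
    """Map each split name to its row count and column names."""
    return {
        name: {"rows": split.num_rows, "columns": list(split.column_names)}
        for name, split in dataset_dict.items()
    }


def load_training_data(repo_id="AIMH/SQPsychConv_nemotron_finetune"):
    """Download the dataset from the Hugging Face Hub (needs `pip install datasets`)."""
    # Imported lazily so that summarize_splits has no third-party dependency.
    from datasets import load_dataset

    return load_dataset(repo_id)


# Example (requires network access):
# print(summarize_splits(load_training_data()))
```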

License and data provenance

The model weights derive from Llama-3-8B-Instruct (Llama 3 Community License). The training data was generated by nvidia/Llama-3_3-Nemotron-Super-49B-v1, so the NVIDIA Open Model License (with Llama 3.3 community terms) may apply to the synthetic data; review it before redistribution. The source structured data is de-identified and pre-anonymized, and the released conversations are synthetic and contain no personally identifiable information. Released for research only.