---
language: en
license: apache-2.0
tags:
- t5
- music
- spotify
- text2json
- audio-features
- fine-tuned
base_model: t5-base
datasets:
- custom
library_name: transformers
pipeline_tag: text2text-generation
---

# T5-Base Fine-tuned for Spotify Features Prediction

T5-Base fine-tuned to convert natural language prompts into Spotify audio-feature JSON objects.

## Model Details

- **Base Model**: t5-base
- **Model Type**: Text-to-JSON generation
- **Language**: English
- **Task**: Convert natural language music preferences into Spotify audio feature JSON objects
- **Fine-tuning Dataset**: Custom dataset of prompts to Spotify audio features

## Training Configuration

- **Epochs**: 7
- **Learning Rate**: 3e-4
- **Batch Size**: 8 (per device)
- **Gradient Accumulation Steps**: 4
- **Scheduler**: Cosine with warmup
- **Optimizer**: AdamW
- **Max Length**: 256 tokens
- **Precision**: bfloat16
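The configuration above maps onto Hugging Face `Seq2SeqTrainingArguments` roughly as follows. This is a sketch for orientation only: the hyperparameter values come from this card, but the actual training script is not published, and the `output_dir` and `warmup_ratio` values are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-spotify-features",  # assumed; not specified in this card
    num_train_epochs=7,
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,     # effective batch size of 32
    lr_scheduler_type="cosine",        # cosine schedule with warmup
    warmup_ratio=0.1,                  # warmup fraction is an assumption
    optim="adamw_torch",
    bf16=True,
)
```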

## Usage

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
import json

# Load model and tokenizer
model = T5ForConditionalGeneration.from_pretrained("afsagag/t5-spotify-features")
tokenizer = T5Tokenizer.from_pretrained("afsagag/t5-spotify-features")

# Example usage
prompt = "I want energetic dance music with high energy and danceability"
input_text = f"prompt: {prompt}"

# Tokenize and generate
input_ids = tokenizer(input_text, return_tensors="pt", max_length=256, truncation=True).input_ids
outputs = model.generate(
    input_ids, 
    max_length=256, 
    num_beams=4, 
    early_stopping=True,
    do_sample=False
)

# Decode result
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

# Parse JSON output
try:
    spotify_features = json.loads(result)
    print("Generated Spotify Features:", spotify_features)
except json.JSONDecodeError:
    print("Generated text is not valid JSON")
```

## Expected Output Format

The model generates JSON objects with Spotify audio features:

```json
{
  "danceability": 0.85,
  "energy": 0.90,
  "valence": 0.75,
  "acousticness": 0.15,
  "instrumentalness": 0.05,
  "speechiness": 0.08,
}
```
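All of the features above lie in the range [0, 1] on Spotify's scale, so parsed output can be sanity-checked before use. A minimal validator sketch (`validate_features` is a hypothetical helper, not part of this model's files) that drops unknown keys and clamps values into range:

```python
# Feature keys this model is expected to emit (all valued in [0, 1]).
EXPECTED_KEYS = {
    "danceability", "energy", "valence",
    "acousticness", "instrumentalness", "speechiness",
}

def validate_features(features: dict) -> dict:
    """Keep only known feature keys and clamp each value into [0, 1]."""
    return {
        k: min(1.0, max(0.0, float(v)))
        for k, v in features.items()
        if k in EXPECTED_KEYS
    }

print(validate_features({"energy": 1.3, "tempo": 120, "valence": 0.75}))
# → {'energy': 1.0, 'valence': 0.75}
```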

## Metrics

- **Per-set Mean Absolute Error (MAE)**: Average absolute difference between predicted and reference feature values within each feature set
- **Per-set Root Mean Squared Error (RMSE)**: Like MAE, but penalizes large deviations more heavily
- **Per-feature Pearson Correlation**: Linear correlation between predicted and reference values for each individual audio feature
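For reference, these three metrics can be computed with plain Python over paired predicted/reference feature values (a self-contained sketch; the card does not publish the evaluation script itself):

```python
import math

def mae(pred, ref):
    """Mean absolute error over paired feature values."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)

def rmse(pred, ref):
    """Root mean squared error over paired feature values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))

def pearson(pred, ref):
    """Pearson correlation coefficient between two value sequences."""
    n = len(pred)
    mp, mr = sum(pred) / n, sum(ref) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sr = math.sqrt(sum((r - mr) ** 2 for r in ref))
    if sp == 0 or sr == 0:
        return 0.0  # correlation undefined for constant sequences
    return cov / (sp * sr)

pred = [0.85, 0.90, 0.75]
ref = [0.80, 0.95, 0.70]
print(round(mae(pred, ref), 3))   # → 0.05
print(round(rmse(pred, ref), 3))  # → 0.05
```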

## Model Files

- `config.json`: Model configuration
- `pytorch_model.bin`: Model weights
- `tokenizer.json`: Tokenizer vocabulary
- `tokenizer_config.json`: Tokenizer configuration
- `special_tokens_map.json`: Special token mappings

## Limitations

- Model may occasionally generate invalid JSON that requires post-processing
- Trained on specific prompt formats starting with "prompt: "
- Performance depends on similarity to training data distribution
- May not generalize well to very abstract or unusual music descriptions
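The invalid-JSON failure mode can often be repaired mechanically, since common defects (extra text around the object, trailing commas) are predictable. A minimal post-processing sketch (`parse_model_json` is a hypothetical helper, not part of this model's files):

```python
import json
import re

def parse_model_json(text: str):
    """Extract the first {...} span, strip trailing commas, then parse.

    Returns the parsed dict, or None if no valid JSON can be recovered.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    # Remove trailing commas before a closing brace/bracket.
    cleaned = re.sub(r",\s*([}\]])", r"\1", match.group(0))
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None

print(parse_model_json('{"energy": 0.9, "valence": 0.75,}'))
# → {'energy': 0.9, 'valence': 0.75}
```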

## Training Data

The model was trained on a custom dataset pairing natural language music descriptions with corresponding Spotify audio feature values.

## Ethical Considerations

This model generates music preference predictions and should not be used as the sole basis for music recommendation systems without human oversight.