IIS-NLP
/

difficulty-scorer-8B-v2


lucweber committed on May 23

Commit

9a31823

·

verified ·

1 Parent(s): d4c0208

Update README.md

Files changed (1)

README.md +63 -0


	@@ -0,0 +1,63 @@

+# Difficulty Scorer v2
+A Qwen3-8B-based difficulty scorer trained on our own difficulty data, as used in our EMNLP 2025 submission
+**Stratified Selective Sampling for Instruction Tuning with Dedicated Scoring Strategy** [REF]
+## Model Architecture
+- Base model: [`Qwen/Qwen3-8B`](https://huggingface.co/Qwen/Qwen3-8B)
+- Custom head: regression head on top of a pooling layer.
+For more details, see `model.py`.
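To illustrate what "regression head on top of a pooling layer" can look like, here is a minimal sketch in plain Python. It assumes mean pooling over non-padded tokens followed by a single linear output; the README does not state which pooling is used, so both choices are assumptions, and the function names `mean_pool` and `regression_head` are hypothetical. The actual implementation lives in `model.py` and may differ.

```python
from typing import List, Sequence


def mean_pool(hidden_states: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors, skipping padded positions (mask == 0).

    Assumption: mean pooling. The real model may pool differently.
    """
    kept = [vec for vec, m in zip(hidden_states, attention_mask) if m]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def regression_head(pooled: Sequence[float],
                    weight: Sequence[float],
                    bias: float) -> float:
    """Linear layer with one output unit: a single scalar difficulty score."""
    return sum(p * w for p, w in zip(pooled, weight)) + bias


# Toy input: three tokens with hidden size 2; the last token is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(hidden, mask)
score = regression_head(pooled, weight=[0.5, -1.0], bias=0.25)
```

The padded token is excluded from the average, so it has no influence on the score. The weights and inputs above are made-up toy values, not parameters of the released model.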
+## Use Cases
+The model can be used to score the difficulty of instructions. More challenging instructions are associated with better learning outcomes during training.
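One simple way to act on such scores is to keep only the highest-scoring share of a dataset. The sketch below assumes the scores have already been computed and are held in a dictionary keyed by instruction text; the helper `select_hardest` and its `keep_fraction` parameter are hypothetical. This is a plain top-fraction filter for illustration, not necessarily the selection strategy used in the paper.

```python
import math
from typing import Dict, List


def select_hardest(scores: Dict[str, float], keep_fraction: float) -> List[str]:
    """Return the instructions with the highest difficulty scores.

    `scores` maps instruction text to a precomputed difficulty score.
    `keep_fraction` is the share of instructions to keep, in (0, 1].
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Sort instructions from most to least difficult.
    ranked = sorted(scores, key=lambda text: scores[text], reverse=True)
    n_keep = math.ceil(len(ranked) * keep_fraction)
    return ranked[:n_keep]


# Made-up scores, for illustration only.
toy_scores = {
    "Add two numbers.": 0.2,
    "Prove that sqrt(2) is irrational.": 0.9,
    "Sort a list of integers.": 0.5,
    "Implement a lock-free queue.": 0.7,
}
hardest = select_hardest(toy_scores, keep_fraction=0.5)
```

With the toy scores above, `hardest` holds the two highest-scoring instructions, most difficult first.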
+---
+## How to Use
+### Inference
+```python
+pass
+```
+---
+## Model Files
+* `pytorch_model-0000x-of-00002.bin` – fine-tuned model weights
+* `regression_head.bin` – custom regression head
+* `config.json` – configuration including base model and head details
+* `tokenizer.json`, `vocab.txt`, etc. – tokenizer files
+* `model.py` – custom regression model implementation
+---
+## Evaluation
+We validated the scorer mainly through its downstream benefits in training (see paper).
+We additionally did a sanity check with coding data from [deepmind/code_contests](https://huggingface.co/datasets/deepmind/code_contests), which contains difficulty scores:
+![Correlation code contest](./scatter_code_contests_vs_difficulty.png)
+The correlation between our difficulty scores and the code_contests difficulty scores is `r = 0.41`.
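For readers who want to run a similar sanity check on their own data, the sketch below computes a correlation coefficient between predicted and reference difficulty scores. It assumes that `r` denotes the Pearson correlation, which the README does not state explicitly; the function name `pearson_r` and the toy values are illustrative and are not the data behind the reported figure.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy values: predicted scores vs. reference difficulty labels.
predicted = [1.0, 2.0, 3.0, 4.0]
reference = [2.0, 4.0, 6.0, 8.0]
r_toy = pearson_r(predicted, reference)
```

The toy reference values are an exact linear function of the predictions, so the two sequences are perfectly correlated. On real data the value falls between -1 and 1.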
+---
+## Responsible
+Mostly Lucas W.