How to use AlexWortega/qwen1k with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("AlexWortega/qwen1k")
sentences = [
"How old is Garry Marshall?",
"Garry Marshall\nOn the morning of July 19, 2016, Marshall died at a hospital in Burbank, California at the age of 81 due to complications of pneumonia after suffering a stroke.[20][21]",
"Gregg Marshall\nMichael Gregg Marshall (born February 27, 1963) is an American college basketball coach who currently leads the Shockers team at Wichita State University. Marshall has coached his teams to appearances in the NCAA Men's Division I Basketball Tournament in twelve of his eighteen years as a head coach. He is the most successful head coach in Wichita State University history (261 wins), and is also the most successful head coach in Winthrop University history (194 wins).",
"Guillotine\nFor a period of time after its invention, the guillotine was called a louisette. However, it was later named after Guillotin who had proposed that a less painful method of execution should be found in place of the breaking wheel, though he opposed the death penalty and bemoaned the association of the guillotine with his name."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model fine-tuned from Qwen/Qwen2.5-0.5B-Instruct. It maps sentences and paragraphs to an 896-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: Qwen2Model
(1): Pooling({'word_embedding_dimension': 896, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
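
The Pooling module above has pooling_mode_mean_tokens set to True, so each sentence embedding is the average of its token embeddings. A minimal, dependency-free sketch of that averaging follows. It assumes padding positions are excluded through an attention mask; the function name and the toy 2-dimensional inputs are illustrative and not part of the library.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: three tokens with 2-dimensional vectors; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

In the real model each token vector has 896 entries and the result is one 896-dimensional vector per input text.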
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("AlexWortega/qwen1k")
# Run inference
sentences = [
'When did the July Monarchy end?',
'July Monarchy\nThe July Monarchy (French: Monarchie de Juillet) was a liberal constitutional monarchy in France under Louis Philippe I, starting with the July Revolution of 1830 and ending with the Revolution of 1848. It marks the end of the Bourbon Restoration (1814–1830). It began with the overthrow of the conservative government of Charles X, the last king of the House of Bourbon.',
'July Monarchy\nDespite the return of the House of Bourbon to power, France was much changed from the era of the ancien régime. The egalitarianism and liberalism of the revolutionaries remained an important force and the autocracy and hierarchy of the earlier era could not be fully restored. Economic changes, which had been underway long before the revolution, had progressed further during the years of turmoil and were firmly entrenched by 1815. These changes had seen power shift from the noble landowners to the urban merchants. The administrative reforms of Napoleon, such as the Napoleonic Code and efficient bureaucracy, also remained in place. These changes produced a unified central government that was fiscally sound and had much control over all areas of French life, a sharp difference from the complicated mix of feudal and absolutist traditions and institutions of pre-Revolutionary Bourbons.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 896]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
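
The first row of the similarity matrix above holds the query's scores against every input, so it can be used to rank the passages. A small sketch of that ranking step, assuming cosine similarity as the scoring function. The 3-dimensional vectors are made up and stand in for real 896-dimensional embeddings, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_passages(
    query_vec: Sequence[float], passage_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up vectors: passage 1 points in the same direction as the query.
query = [1.0, 0.0, 1.0]
passages = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
ranking = rank_passages(query, passages)
for index, score in ranking:
    print(index, round(score, 3))
```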
Datasets: sts-dev-896 and sts-dev-768
Evaluator: EmbeddingSimilarityEvaluator

| Metric | sts-dev-896 | sts-dev-768 |
|---|---|---|
| pearson_cosine | 0.4573 | 0.4455 |
| spearman_cosine | 0.4965 | 0.4897 |
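
The two columns differ in how many leading dimensions of each embedding are kept before scoring. The sketch below illustrates that kind of evaluation, assuming spearman_cosine is the Spearman rank correlation between the cosine similarity of each sentence pair and a gold similarity label. The 4-dimensional vectors, the labels, and the helper names are made up for illustration, and ties are not handled.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def ranks(values):
    """Rank of each value (0 = smallest); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def spearman_cosine(pairs, gold, dim):
    """Spearman correlation between gold labels and cosine scores at `dim` dimensions."""
    scores = [cosine(a[:dim], b[:dim]) for a, b in pairs]
    return pearson(ranks(scores), ranks(gold))


# Made-up sentence-pair embeddings and gold similarity labels.
pairs = [
    ([1.0, 1.0, 0.0, 0.0], [1.0, 0.5, 0.0, 0.0]),
    ([1.0, 0.0, 1.0, 0.0], [1.0, 0.0, -1.0, 0.0]),
    ([1.0, 0.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0]),
]
gold = [0.9, 0.5, 0.0]
full_score = spearman_cosine(pairs, gold, dim=4)
truncated_score = spearman_cosine(pairs, gold, dim=2)
print(full_score, truncated_score)
```

In this toy data the second pair only disagrees in the third dimension, so dropping that dimension changes its rank and lowers the correlation.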
Columns: query, response, and negative

| | query | response | negative |
|---|---|---|---|
| type | string | string | string |
| details | | | |

Samples:

| query | response | negative |
|---|---|---|
| Was there a year 0? | Year zero | 504 |
| When is the dialectical method used? | Dialectic | Derek Bentley case |
| What do Grasshoppers eat? | Grasshopper | Groundhog |

MatryoshkaLoss with these parameters:
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
896,
768
],
"matryoshka_weights": [
1,
1
],
"n_dims_per_step": -1
}
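
With these parameters, the same base loss is evaluated on the full 896-dimensional embeddings and on their first 768 dimensions, and the two values are added with weight 1 each. The sketch below shows that idea without any dependencies. It assumes the base loss is a cross-entropy over scaled cosine similarities with in-batch negatives; the scale of 20.0, the helper names, and the toy 4-dimensional vectors are assumptions for illustration, and the dataset's negative column is left out for brevity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy where positives[i] is the match for anchors[i] and
    every other positive in the batch serves as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over embeddings truncated to each dimension."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [vec[:dim] for vec in anchors]
        truncated_positives = [vec[:dim] for vec in positives]
        total += weight * in_batch_ranking_loss(truncated_anchors, truncated_positives)
    return total


# Toy batch of two (query, response) pairs; dims [4, 3] stand in for [896, 768].
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]
loss = matryoshka_loss(anchors, positives, dims=[4, 3], weights=[1, 1])
print(loss)
```

Because every truncation contributes to the total, the leading dimensions are pushed to remain useful on their own, which is what allows the 768-dimensional prefix to be evaluated separately.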
Non-default hyperparameters:
- eval_strategy: steps
- per_device_train_batch_size: 12
- per_device_eval_batch_size: 12
- gradient_accumulation_steps: 4
- num_train_epochs: 1
- warmup_ratio: 0.3
- bf16: True
- batch_sampler: no_duplicates

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 12
- per_device_eval_batch_size: 12
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 4
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.3
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss | sts-dev-896_spearman_cosine | sts-dev-768_spearman_cosine |
|---|---|---|---|---|
| 0.0002 | 10 | 4.4351 | - | - |
| 0.0003 | 20 | 4.6508 | - | - |
| 0.0005 | 30 | 4.7455 | - | - |
| 0.0007 | 40 | 4.5427 | - | - |
| 0.0008 | 50 | 4.3982 | - | - |
| 0.0010 | 60 | 4.3755 | - | - |
| 0.0012 | 70 | 4.4105 | - | - |
| 0.0013 | 80 | 5.2227 | - | - |
| 0.0015 | 90 | 5.8062 | - | - |
| 0.0017 | 100 | 5.7645 | - | - |
| 0.0018 | 110 | 5.9261 | - | - |
| 0.0020 | 120 | 5.8301 | - | - |
| 0.0022 | 130 | 5.7602 | - | - |
| 0.0023 | 140 | 5.9392 | - | - |
| 0.0025 | 150 | 5.7523 | - | - |
| 0.0027 | 160 | 5.8585 | - | - |
| 0.0029 | 170 | 5.7916 | - | - |
| 0.0030 | 180 | 5.8157 | - | - |
| 0.0032 | 190 | 5.7102 | - | - |
| 0.0034 | 200 | 5.5844 | - | - |
| 0.0035 | 210 | 5.5463 | - | - |
| 0.0037 | 220 | 5.5823 | - | - |
| 0.0039 | 230 | 5.5514 | - | - |
| 0.0040 | 240 | 5.5646 | - | - |
| 0.0042 | 250 | 5.5783 | - | - |
| 0.0044 | 260 | 5.5344 | - | - |
| 0.0045 | 270 | 5.523 | - | - |
| 0.0047 | 280 | 5.4969 | - | - |
| 0.0049 | 290 | 5.5407 | - | - |
| 0.0050 | 300 | 5.6171 | - | - |
| 0.0052 | 310 | 5.5581 | - | - |
| 0.0054 | 320 | 5.8903 | - | - |
| 0.0055 | 330 | 5.8675 | - | - |
| 0.0057 | 340 | 5.745 | - | - |
| 0.0059 | 350 | 5.6041 | - | - |
| 0.0060 | 360 | 5.5476 | - | - |
| 0.0062 | 370 | 5.3964 | - | - |
| 0.0064 | 380 | 5.3564 | - | - |
| 0.0065 | 390 | 5.3054 | - | - |
| 0.0067 | 400 | 5.2779 | - | - |
| 0.0069 | 410 | 5.206 | - | - |
| 0.0070 | 420 | 5.2168 | - | - |
| 0.0072 | 430 | 5.1645 | - | - |
| 0.0074 | 440 | 5.1797 | - | - |
| 0.0076 | 450 | 5.2526 | - | - |
| 0.0077 | 460 | 5.1768 | - | - |
| 0.0079 | 470 | 5.3519 | - | - |
| 0.0081 | 480 | 5.2982 | - | - |
| 0.0082 | 490 | 5.3229 | - | - |
| 0.0084 | 500 | 5.3758 | - | - |
| 0.0086 | 510 | 5.2478 | - | - |
| 0.0087 | 520 | 5.1799 | - | - |
| 0.0089 | 530 | 5.1088 | - | - |
| 0.0091 | 540 | 4.977 | - | - |
| 0.0092 | 550 | 4.9108 | - | - |
| 0.0094 | 560 | 4.811 | - | - |
| 0.0096 | 570 | 4.7203 | - | - |
| 0.0097 | 580 | 4.6499 | - | - |
| 0.0099 | 590 | 4.4548 | - | - |
| 0.0101 | 600 | 4.2891 | - | - |
| 0.0102 | 610 | 4.1881 | - | - |
| 0.0104 | 620 | 4.6 | - | - |
| 0.0106 | 630 | 4.5365 | - | - |
| 0.0107 | 640 | 4.3086 | - | - |
| 0.0109 | 650 | 4.0452 | - | - |
| 0.0111 | 660 | 3.9041 | - | - |
| 0.0112 | 670 | 4.3938 | - | - |
| 0.0114 | 680 | 4.3198 | - | - |
| 0.0116 | 690 | 4.1294 | - | - |
| 0.0117 | 700 | 4.077 | - | - |
| 0.0119 | 710 | 3.9174 | - | - |
| 0.0121 | 720 | 4.1629 | - | - |
| 0.0123 | 730 | 3.9611 | - | - |
| 0.0124 | 740 | 3.7768 | - | - |
| 0.0126 | 750 | 3.5842 | - | - |
| 0.0128 | 760 | 3.1196 | - | - |
| 0.0129 | 770 | 3.6288 | - | - |
| 0.0131 | 780 | 3.273 | - | - |
| 0.0133 | 790 | 2.7889 | - | - |
| 0.0134 | 800 | 2.5096 | - | - |
| 0.0136 | 810 | 1.8878 | - | - |
| 0.0138 | 820 | 2.3423 | - | - |
| 0.0139 | 830 | 1.7687 | - | - |
| 0.0141 | 840 | 2.0781 | - | - |
| 0.0143 | 850 | 2.4598 | - | - |
| 0.0144 | 860 | 1.7667 | - | - |
| 0.0146 | 870 | 2.6247 | - | - |
| 0.0148 | 880 | 1.916 | - | - |
| 0.0149 | 890 | 2.0817 | - | - |
| 0.0151 | 900 | 2.3679 | - | - |
| 0.0153 | 910 | 1.418 | - | - |
| 0.0154 | 920 | 2.7353 | - | - |
| 0.0156 | 930 | 1.992 | - | - |
| 0.0158 | 940 | 1.4564 | - | - |
| 0.0159 | 950 | 1.4154 | - | - |
| 0.0161 | 960 | 0.9499 | - | - |
| 0.0163 | 970 | 1.6304 | - | - |
| 0.0164 | 980 | 0.9264 | - | - |
| 0.0166 | 990 | 1.3278 | - | - |
| 0.0168 | 1000 | 1.686 | 0.4965 | 0.4897 |
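
The hyperparameters listed earlier specify lr_scheduler_type: linear, warmup_ratio: 0.3, and learning_rate: 5e-05. The sketch below shows the learning rate curve this implies, assuming the usual linear warmup from zero followed by a linear decay back to zero. The total step count is a hypothetical value chosen for readability and is not the length of this run; the function name is also illustrative.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=5e-05, warmup_ratio=0.3):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: rise linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total = 1000  # hypothetical total number of optimizer steps
for step in (0, 150, 300, 650, 1000):
    print(step, linear_schedule_lr(step, total))
```

With a warmup ratio of 0.3, the first 30% of optimizer steps run below the configured peak learning rate.
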
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}