class transformers.BartphoTokenizer

[source](https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/models/bartpho/tokenization_bartpho.py#L36)

`( vocab_file, monolingual_vocab_file, bos_token = '<s>', eos_token = '</s>', sep_token = '</s>', cls_token = '<s>', unk_token = '<unk>', pad_token = '<pad>', mask_token = '<mask>', sp_model_kwargs: typing.Optional[dict[str, typing.Any]] = None, **kwargs )`

Parameters:

- **vocab_file** (`str`) -- Path to the vocabulary file. This vocabulary is the pre-trained SentencePiece model from the multilingual XLM-RoBERTa, also used in mBART, consisting of 250K types.
- **monolingual_vocab_file** (`str`) -- Path to the monolingual vocabulary file. This monolingual vocabulary consists of Vietnamese-specialized types extracted from the 250K types of the multilingual vocabulary in `vocab_file`.
- **bos_token** (`str`, *optional*, defaults to `"<s>"`) -- The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token. When building a sequence using special tokens, this is not the token that is used for the beginning of sequence. The token used is the `cls_token`.
- **eos_token** (`str`, *optional*, defaults to `"</s>"`) -- The end of sequence token. When building a sequence using special tokens, this is not the token that is used for the end of sequence. The token used is the `sep_token`.
- **sep_token** (`str`, *optional*, defaults to `"</s>"`) -- The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or a text and a question for question answering. It is also used as the last token of a sequence built with special tokens.
- **cls_token** (`str`, *optional*, defaults to `"<s>"`) -- The classifier token, which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens.
- **unk_token** (`str`, *optional*, defaults to `"<unk>"`) -- The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.
- **pad_token** (`str`, *optional*, defaults to `"<pad>"`) -- The token used for padding, for example when batching sequences of different lengths.
- **mask_token** (`str`, *optional*, defaults to `"<mask>"`) -- The token used for masking values. This is the token used when training this model with masked language modeling, and it is the token the model will try to predict.
- **sp_model_kwargs** (`dict`, *optional*) -- Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things, to set:
  - `enable_sampling`: Enable subword regularization.
  - `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout.
    - `nbest_size = {0,1}`: No sampling is performed.
    - `nbest_size > 1`: Samples from the `nbest_size` results.
    - `nbest_size < 0`: Assumes that `nbest_size` is infinite and samples from all hypotheses (lattice) using the forward-filtering-and-backward-sampling algorithm.
  - `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for BPE-dropout.

Attributes:

- **sp_model** (`SentencePieceProcessor`) -- The *SentencePiece* processor that is used for every conversion (string, tokens and IDs).

Adapted from `XLMRobertaTokenizer`. Based on [SentencePiece](https://github.com/google/sentencepiece).

This tokenizer inherits from [PreTrainedTokenizer](/docs/transformers/v4.57.0/ja/main_classes/tokenizer#transformers.PreTrainedTokenizer) which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.
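To illustrate how a reduced monolingual vocabulary and `unk_token` interact, here is a minimal pure-Python sketch. It is not the library's code: the token list, the helper names `build_token_to_id` and `convert_token_to_id`, and the ordering of IDs are all assumptions made for the example.

```python
# Hypothetical special tokens and toy vocabulary entries, for illustration only.
SPECIAL_TOKENS = ["<s>", "<pad>", "</s>", "<unk>"]
MONOLINGUAL_TYPES = ["▁Xin", "▁chào", "▁Việt", "▁Nam"]


def build_token_to_id(special_tokens: list[str], types: list[str]) -> dict[str, int]:
    """Assign consecutive IDs: special tokens first, then monolingual types."""
    token_to_id: dict[str, int] = {}
    for token in special_tokens + types:
        # setdefault keeps the first ID if a token appears twice.
        token_to_id.setdefault(token, len(token_to_id))
    return token_to_id


def convert_token_to_id(
    token: str, token_to_id: dict[str, int], unk_token: str = "<unk>"
) -> int:
    """Look up a piece; pieces outside the vocabulary fall back to unk_token."""
    return token_to_id.get(token, token_to_id[unk_token])


token_to_id = build_token_to_id(SPECIAL_TOKENS, MONOLINGUAL_TYPES)
ids = [convert_token_to_id(t, token_to_id) for t in ["▁Xin", "▁chào", "▁hello"]]
print(ids)  # [4, 5, 3]
```

The piece `▁hello` is not among the toy monolingual types, so it maps to the ID of `<unk>`, which mirrors the behavior described for `unk_token`.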

build_inputs_with_special_tokens

[source](https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/models/bartpho/tokenization_bartpho.py#L179)

`( token_ids_0: list, token_ids_1: typing.Optional[list[int]] = None )`

Parameters:

- **token_ids_0** (`list[int]`) -- List of IDs to which the special tokens will be added.
- **token_ids_1** (`list[int]`, *optional*) -- Optional second list of IDs for sequence pairs.

Returns: `list[int]` -- List of [input IDs](../glossary#input-ids) with the appropriate special tokens.

Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and adding special tokens. A BARTPho sequence has the following format:

- single sequence: `<s> X </s>`
- pair of sequences: `<s> A </s></s> B </s>`
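The two layouts can be sketched in plain Python as follows. This is an illustrative stand-in rather than the library's implementation, and the constants `CLS_ID` and `SEP_ID` are hypothetical values; a real tokenizer exposes its own `cls_token_id` and `sep_token_id`.

```python
from typing import Optional

# Hypothetical IDs chosen for illustration only.
CLS_ID = 0  # stands in for "<s>"
SEP_ID = 2  # stands in for "</s>"


def build_inputs_with_special_tokens(
    token_ids_0: list[int], token_ids_1: Optional[list[int]] = None
) -> list[int]:
    """Wrap one sequence, or join a pair, with <s> and </s> markers."""
    if token_ids_1 is None:
        # Single sequence: <s> X </s>
        return [CLS_ID] + token_ids_0 + [SEP_ID]
    # Pair of sequences: <s> A </s></s> B </s>
    return [CLS_ID] + token_ids_0 + [SEP_ID, SEP_ID] + token_ids_1 + [SEP_ID]


single = build_inputs_with_special_tokens([10, 11])
pair = build_inputs_with_special_tokens([10, 11], [20])
print(single)  # [0, 10, 11, 2]
print(pair)    # [0, 10, 11, 2, 2, 20, 2]
```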

convert_tokens_to_string

[source](https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/models/bartpho/tokenization_bartpho.py#L281)

`( tokens )`

Converts a sequence of tokens (strings for sub-words) into a single string.
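SentencePiece marks word boundaries with the meta symbol `▁` (U+2581), so joining pieces back into text amounts to concatenating them and turning that symbol into spaces. The sketch below shows the idea under that assumption; the segmentation in `pieces` is made up for the example and is not the output of the real model.

```python
SPIECE_UNDERLINE = "\u2581"  # the "▁" word-boundary marker used by SentencePiece


def convert_tokens_to_string(tokens: list[str]) -> str:
    """Concatenate sub-word pieces and restore spaces at word boundaries."""
    return "".join(tokens).replace(SPIECE_UNDERLINE, " ").strip()


# Hypothetical segmentation, for illustration only.
pieces = [SPIECE_UNDERLINE + "Xin", SPIECE_UNDERLINE + "ch", "ào"]
text = convert_tokens_to_string(pieces)
print(text)
```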

create_token_type_ids_from_sequences

[source](https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/models/bartpho/tokenization_bartpho.py#L233)

`( token_ids_0: list, token_ids_1: typing.Optional[list[int]] = None )`

Parameters:

- **token_ids_0** (`list[int]`) -- List of IDs.
- **token_ids_1** (`list[int]`, *optional*) -- Optional second list of IDs for sequence pairs.

Returns: `list[int]` -- List of zeros.

Create a mask from the two sequences passed to be used in a sequence-pair classification task. BARTPho does not make use of token type ids, therefore a list of zeros is returned.
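Because the result is all zeros, the only thing that varies is its length. The sketch below assumes the returned list covers the full sequence including special tokens (two for a single sequence, four for a pair, matching the `<s> ... </s>` layouts); it is an illustration, not the library's code.

```python
from typing import Optional


def create_token_type_ids_from_sequences(
    token_ids_0: list[int], token_ids_1: Optional[list[int]] = None
) -> list[int]:
    """Return one zero per position of the sequence built with special tokens."""
    if token_ids_1 is None:
        # <s> X </s> adds two special tokens.
        return [0] * (len(token_ids_0) + 2)
    # <s> A </s></s> B </s> adds four special tokens.
    return [0] * (len(token_ids_0) + len(token_ids_1) + 4)


single_types = create_token_type_ids_from_sequences([10, 11])
pair_types = create_token_type_ids_from_sequences([10, 11], [20])
print(single_types)  # [0, 0, 0, 0]
print(pair_types)    # [0, 0, 0, 0, 0, 0, 0]
```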

get_special_tokens_mask

[source](https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/models/bartpho/tokenization_bartpho.py#L205)

`( token_ids_0: list, token_ids_1: typing.Optional[list[int]] = None, already_has_special_tokens: bool = False )`

Parameters:

- **token_ids_0** (`list[int]`) -- List of IDs.
- **token_ids_1** (`list[int]`, *optional*) -- Optional second list of IDs for sequence pairs.
- **already_has_special_tokens** (`bool`, *optional*, defaults to `False`) -- Whether or not the token list is already formatted with special tokens for the model.

Returns: `list[int]` -- A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.

Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding special tokens using the tokenizer `prepare_for_model` method.
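A rough sketch of the mask, assuming the `<s> X </s>` and `<s> A </s></s> B </s>` layouts: positions that will hold a special token get 1, all others get 0. The set `SPECIAL_IDS` is a hypothetical stand-in for the tokenizer's special token IDs, used only for the `already_has_special_tokens=True` branch; the real method may handle that branch differently.

```python
from typing import Optional

# Hypothetical IDs of special tokens, for illustration only.
SPECIAL_IDS = {0, 1, 2, 3}


def get_special_tokens_mask(
    token_ids_0: list[int],
    token_ids_1: Optional[list[int]] = None,
    already_has_special_tokens: bool = False,
) -> list[int]:
    """Mark special-token positions with 1 and ordinary tokens with 0."""
    if already_has_special_tokens:
        # The input already contains special tokens: flag them by ID.
        return [1 if i in SPECIAL_IDS else 0 for i in token_ids_0]
    if token_ids_1 is None:
        # <s> X </s>
        return [1] + [0] * len(token_ids_0) + [1]
    # <s> A </s></s> B </s>
    return [1] + [0] * len(token_ids_0) + [1, 1] + [0] * len(token_ids_1) + [1]


mask_single = get_special_tokens_mask([10, 11])
mask_pair = get_special_tokens_mask([10, 11], [20])
mask_existing = get_special_tokens_mask([0, 10, 11, 2], already_has_special_tokens=True)
print(mask_single)    # [1, 0, 0, 1]
print(mask_pair)      # [1, 0, 0, 1, 1, 0, 1]
print(mask_existing)  # [1, 0, 0, 1]
```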