`class transformers.Conv1D(nf, nx)`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/pytorch_utils.py#L98

Parameters:

- **nf** (`int`) -- The number of output features.
- **nx** (`int`) -- The number of input features.

1D-convolutional layer as defined by Radford et al. for OpenAI GPT (and also used in GPT-2). It works like a linear layer, but the weights are transposed.
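
The transposed-weight relationship can be illustrated without any framework. The following is a minimal pure-Python sketch, not the library's implementation: the helper names `transpose`, `linear_forward` and `conv1d_forward` are hypothetical, and it assumes the Conv1D-style weight is stored with shape `(nx, nf)` while a linear-style weight is stored with shape `(nf, nx)`.

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def linear_forward(x, weight, bias):
    """Linear-style layer: weight has shape (nf, nx), output is x @ weight^T + bias."""
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b for w_row, b in zip(weight, bias)]
        for row in x
    ]


def conv1d_forward(x, weight, bias):
    """Conv1D-style layer (assumed layout): weight has shape (nx, nf), output is x @ weight + bias."""
    return linear_forward(x, transpose(weight), bias)


x = [[1.0, 2.0, 3.0]]              # one row with nx = 3 input features
linear_weight = [[1.0, 0.0, 2.0],  # shape (nf, nx) = (2, 3)
                 [0.0, 1.0, 1.0]]
bias = [0.5, -0.5]
conv1d_weight = transpose(linear_weight)  # shape (nx, nf) = (3, 2)

linear_out = linear_forward(x, linear_weight, bias)
conv1d_out = conv1d_forward(x, conv1d_weight, bias)
print(linear_out == conv1d_out)
```

Both layouts produce the same output here because the two weight matrices are transposes of each other; only the storage convention differs.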

`transformers.apply_chunking_to_forward(forward_fn: Callable[..., torch.Tensor], chunk_size: int, chunk_dim: int, *input_tensors)`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/pytorch_utils.py#L182

Parameters:

- **forward_fn** (`Callable[..., torch.Tensor]`) -- The forward function of the model.
- **chunk_size** (`int`) -- The chunk size of a chunked tensor: `num_chunks = len(input_tensors[0]) / chunk_size`.
- **chunk_dim** (`int`) -- The dimension over which the `input_tensors` should be chunked.
- **input_tensors** (`tuple[torch.Tensor]`) -- The input tensors of `forward_fn`, which will be chunked.

Returns: `torch.Tensor` -- A tensor with the same shape as `forward_fn` would have produced if applied directly.

This function chunks the `input_tensors` into smaller input tensor parts of size `chunk_size` over the dimension `chunk_dim`. It then applies a layer `forward_fn` to each chunk independently to save memory.

If `forward_fn` is independent across `chunk_dim`, this function yields the same result as applying `forward_fn` directly to `input_tensors`.

Examples:

```python
from transformers import apply_chunking_to_forward


# Inside a model class: rename the usual forward() fn to forward_chunk()
def forward_chunk(self, hidden_states):
    hidden_states = self.decoder(hidden_states)
    return hidden_states


# Inside the same class: implement a chunked forward function
def forward(self, hidden_states):
    return apply_chunking_to_forward(self.forward_chunk, self.chunk_size_lm_head, self.seq_len_dim, hidden_states)
```
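
To make the independence condition concrete, here is a framework-free sketch of the chunk, apply, concatenate pattern on plain Python lists. It is an illustration under stated assumptions rather than the library code: `chunked_apply` and both toy forward functions are hypothetical names, chunking happens over the outer list only, and the divisibility check and the `chunk_size <= 0` shortcut are choices made for this sketch.

```python
def chunked_apply(forward_fn, chunk_size, rows):
    """Apply forward_fn to consecutive chunks of `rows` and concatenate the results."""
    if chunk_size <= 0:
        # No chunking requested: apply the function to the whole input at once.
        return forward_fn(rows)
    if len(rows) % chunk_size != 0:
        raise ValueError("The chunked dimension must be a multiple of chunk_size.")
    output = []
    for start in range(0, len(rows), chunk_size):
        output.extend(forward_fn(rows[start:start + chunk_size]))
    return output


def double_each(rows):
    """A toy forward function that treats every row independently."""
    return [[2 * value for value in row] for row in rows]


def subtract_chunk_mean(rows):
    """A toy forward function that mixes rows, so chunking changes its result."""
    mean = sum(row[0] for row in rows) / len(rows)
    return [[row[0] - mean] for row in rows]


hidden_states = [[1.0], [2.0], [3.0], [4.0]]

independent_direct = double_each(hidden_states)
independent_chunked = chunked_apply(double_each, 2, hidden_states)

dependent_direct = subtract_chunk_mean(hidden_states)
dependent_chunked = chunked_apply(subtract_chunk_mean, 2, hidden_states)

print(independent_chunked == independent_direct)  # row-wise function: chunking is safe
print(dependent_chunked == dependent_direct)      # cross-row function: results differ
```

The second function depends on every row in its input, so splitting the input changes what it computes. Only functions of the first kind are suitable for chunking.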

`transformers.pytorch_utils.find_pruneable_heads_and_indices(heads: list[int], n_heads: int, head_size: int, already_pruned_heads: set[int])`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/pytorch_utils.py#L260

Parameters:

- **heads** (`list[int]`) -- List of the indices of heads to prune.
- **n_heads** (`int`) -- The number of heads in the model.
- **head_size** (`int`) -- The size of each head.
- **already_pruned_heads** (`set[int]`) -- A set of already pruned heads.

Returns: `tuple[set[int], torch.LongTensor]` -- A tuple with the indices of heads to prune, taking `already_pruned_heads` into account, and the indices of rows/columns to keep in the layer weight.

Finds the heads and their indices, taking `already_pruned_heads` into account.
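
The index bookkeeping described above can be sketched in plain Python. This is a hedged illustration and not the library function: `find_heads_and_kept_indices` is a hypothetical name, it returns a list instead of a `torch.LongTensor`, and it assumes that `n_heads` counts the heads still present in the layer and that each head owns `head_size` consecutive rows/columns.

```python
def find_heads_and_kept_indices(heads, n_heads, head_size, already_pruned_heads):
    """Return the heads still to prune and the flat row/column indices to keep."""
    to_prune = set(heads) - set(already_pruned_heads)
    removed_rows = set()
    for head in to_prune:
        # Shift the head position left by the number of earlier heads already removed.
        removed_rows.add(head - sum(1 for h in already_pruned_heads if h < head))
    kept = [
        row * head_size + offset
        for row in range(n_heads)
        if row not in removed_rows
        for offset in range(head_size)
    ]
    return to_prune, kept


# Fresh layer with 4 heads of size 2: prune heads 0 and 2.
first_heads, first_index = find_heads_and_kept_indices([0, 2], 4, 2, set())

# Head 1 was pruned earlier, so 3 heads remain; asking for heads 1 and 2 only prunes head 2.
second_heads, second_index = find_heads_and_kept_indices([1, 2], 3, 2, {1})

print(first_heads, first_index)
print(second_heads, second_index)
```

In the second call, original head 2 now sits at position 1 of the already-shrunk layer, which is why its position is shifted before the kept indices are computed.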

`transformers.prune_layer(layer: nn.Linear | Conv1D, index: torch.LongTensor, dim: int | None = None)`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/pytorch_utils.py#L160

Parameters:

- **layer** (`Union[torch.nn.Linear, Conv1D]`) -- The layer to prune.
- **index** (`torch.LongTensor`) -- The indices to keep in the layer.
- **dim** (`int`, *optional*) -- The dimension on which to keep the indices.

Returns: `torch.nn.Linear` or [Conv1D](/docs/transformers/v4.57.0/zh/internal/modeling_utils#transformers.Conv1D) -- The pruned layer as a new layer with `requires_grad=True`.

Prune a Conv1D or linear layer to keep only the entries in `index`.

Used to remove heads.

`transformers.pytorch_utils.prune_conv1d_layer(layer: Conv1D, index: torch.LongTensor, dim: int = 1)`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/pytorch_utils.py#L127

Parameters:

- **layer** ([Conv1D](/docs/transformers/v4.57.0/zh/internal/modeling_utils#transformers.Conv1D)) -- The layer to prune.
- **index** (`torch.LongTensor`) -- The indices to keep in the layer.
- **dim** (`int`, *optional*, defaults to 1) -- The dimension on which to keep the indices.

Returns: [Conv1D](/docs/transformers/v4.57.0/zh/internal/modeling_utils#transformers.Conv1D) -- The pruned layer as a new layer with `requires_grad=True`.

Prune a Conv1D layer to keep only the entries in `index`. A Conv1D works like a linear layer (see e.g. BERT), but the weights are transposed.

Used to remove heads.

`transformers.pytorch_utils.prune_linear_layer(layer: nn.Linear, index: torch.LongTensor, dim: int = 0)`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/pytorch_utils.py#L64

Parameters:

- **layer** (`torch.nn.Linear`) -- The layer to prune.
- **index** (`torch.LongTensor`) -- The indices to keep in the layer.
- **dim** (`int`, *optional*, defaults to 0) -- The dimension on which to keep the indices.

Returns: `torch.nn.Linear` -- The pruned layer as a new layer with `requires_grad=True`.

Prune a linear layer to keep only the entries in `index`.

Used to remove heads.
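
What "keeping only the entries in `index`" means for a linear-style weight can be sketched on nested lists. This is an illustrative sketch, not the library code: `prune_linear` is a hypothetical helper, it assumes the weight is laid out as `(out_features, in_features)`, and it assumes the bias is pruned only when output rows are removed.

```python
def prune_linear(weight, bias, index, dim=0):
    """Keep only `index` along `dim` of a linear-style weight of shape (out, in)."""
    if dim == 0:
        # Keep selected output rows; the bias has one entry per output, so prune it too.
        new_weight = [list(weight[i]) for i in index]
        new_bias = [bias[i] for i in index]
    elif dim == 1:
        # Keep selected input columns; the number of outputs, and so the bias, is unchanged.
        new_weight = [[row[j] for j in index] for row in weight]
        new_bias = list(bias)
    else:
        raise ValueError("dim must be 0 or 1 for a 2D weight")
    return new_weight, new_bias


weight = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12]]  # shape (out_features, in_features) = (3, 4)
bias = [0.1, 0.2, 0.3]

rows_weight, rows_bias = prune_linear(weight, bias, index=[0, 2], dim=0)
cols_weight, cols_bias = prune_linear(weight, bias, index=[1, 3], dim=1)

print(rows_weight, rows_bias)
print(cols_weight, cols_bias)
```

For a Conv1D-style layout, where the weight is stored transposed, the roles of the two dimensions would swap, which is consistent with the different `dim` defaults documented for the two pruning functions.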

`class transformers.modeling_tf_utils.TFConv1D(nf, nx, initializer_range = 0.02, **kwargs)`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L3247

Parameters:

- **nf** (`int`) -- The number of output features.
- **nx** (`int`) -- The number of input features.
- **initializer_range** (`float`, *optional*, defaults to 0.02) -- The standard deviation to use to initialize the weights.
- **kwargs** (`dict[str, Any]`, *optional*) -- Additional keyword arguments passed along to the `__init__` of `keras.layers.Layer`.

1D-convolutional layer as defined by Radford et al. for OpenAI GPT (and also used in GPT-2). It works like a linear layer, but the weights are transposed.

`class transformers.TFSequenceSummary(config: PretrainedConfig, initializer_range: float = 0.02, **kwargs)`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L3394

Parameters:

- **config** ([PretrainedConfig](/docs/transformers/v4.57.0/zh/main_classes/configuration#transformers.PretrainedConfig)) -- The config used by the model. Relevant arguments in the config class of the model are (refer to the actual config class of your model for the default values it uses):
  - **summary_type** (`str`) -- The method to use to make this summary. Accepted values are:
    - `"last"` -- Take the last token hidden state (like XLNet)
    - `"first"` -- Take the first token hidden state (like Bert)
    - `"mean"` -- Take the mean of all tokens hidden states
    - `"cls_index"` -- Supply a Tensor of classification token position (GPT/GPT-2)
    - `"attn"` -- Not implemented now, use multi-head attention
  - **summary_use_proj** (`bool`) -- Add a projection after the vector extraction.
  - **summary_proj_to_labels** (`bool`) -- If `True`, the projection outputs to `config.num_labels` classes (otherwise to `config.hidden_size`).
  - **summary_activation** (`Optional[str]`) -- Set to `"tanh"` to add a tanh activation to the output; any other string or `None` adds no activation.
  - **summary_first_dropout** (`float`) -- Optional dropout probability before the projection and activation.
  - **summary_last_dropout** (`float`) -- Optional dropout probability after the projection and activation.
- **initializer_range** (`float`, *optional*, defaults to 0.02) -- The standard deviation to use to initialize the weights.
- **kwargs** (`dict[str, Any]`, *optional*) -- Additional keyword arguments passed along to the `__init__` of `keras.layers.Layer`.

Compute a single vector summary of a sequence's hidden states.
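
The `summary_type` options listed above can be illustrated on a single sequence with plain lists. This sketch only covers the vector-extraction step, not the projection, activation or dropout. `summarize` is a hypothetical helper, and falling back to the last token when no `cls_index` is given is an assumption of this sketch.

```python
def summarize(hidden_states, summary_type, cls_index=None):
    """Reduce a (seq_len, hidden_size) list of token vectors to one vector."""
    if summary_type == "last":
        return list(hidden_states[-1])
    if summary_type == "first":
        return list(hidden_states[0])
    if summary_type == "mean":
        count = len(hidden_states)
        return [sum(column) / count for column in zip(*hidden_states)]
    if summary_type == "cls_index":
        # Without an explicit position, fall back to the last token (assumption of this sketch).
        position = len(hidden_states) - 1 if cls_index is None else cls_index
        return list(hidden_states[position])
    raise NotImplementedError(f"summary_type={summary_type!r} is not covered by this sketch")


tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # seq_len = 3, hidden_size = 2

last_vec = summarize(tokens, "last")
first_vec = summarize(tokens, "first")
mean_vec = summarize(tokens, "mean")
cls_vec = summarize(tokens, "cls_index", cls_index=1)

print(last_vec, first_vec, mean_vec, cls_vec)
```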

`class transformers.modeling_tf_utils.TFCausalLanguageModelingLoss()`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L213

Loss function suitable for causal language modeling (CLM), that is, the task of guessing the next token.

Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.
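
The effect of the -100 label can be sketched with a framework-free cross-entropy. This is an illustration of the masking idea only, not the TensorFlow implementation: `masked_cross_entropy` is a hypothetical helper, and averaging over the non-ignored positions is an assumption of this sketch.

```python
import math


def masked_cross_entropy(logits, labels, ignore_index=-100):
    """Average cross-entropy over positions whose label is not `ignore_index`."""
    total, count = 0.0, 0
    for token_logits, label in zip(logits, labels):
        if label == ignore_index:
            continue  # skip the label together with its logits
        # Numerically stable log-sum-exp for the softmax normalizer.
        peak = max(token_logits)
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in token_logits))
        total += log_norm - token_logits[label]
        count += 1
    return total / count if count else 0.0


logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [9.0, -3.0, 4.0]]
labels = [0, 1, -100]  # the third position is ignored

loss_with_ignored = masked_cross_entropy(logits, labels)
loss_without_ignored = masked_cross_entropy(logits[:2], labels[:2])

print(loss_with_ignored == loss_without_ignored)
```

Dropping the ignored position entirely gives the same loss as keeping it with a label of -100, which is the behaviour the description above refers to.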

`class transformers.modeling_tf_utils.TFMaskedLanguageModelingLoss()`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L324

Loss function suitable for masked language modeling (MLM), that is, the task of guessing the masked tokens.

Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.

`class transformers.modeling_tf_utils.TFMultipleChoiceLoss()`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L316

Loss function suitable for multiple choice tasks.

`class transformers.modeling_tf_utils.TFQuestionAnsweringLoss()`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L242

Loss function suitable for question answering.

`class transformers.modeling_tf_utils.TFSequenceClassificationLoss()`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L297

Loss function suitable for sequence classification.

`class transformers.modeling_tf_utils.TFTokenClassificationLoss()`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L255

Loss function suitable for token classification.

Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.

`transformers.modeling_tf_utils.get_initializer(initializer_range: float = 0.02)`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L3519

Parameters:

- **initializer_range** (`float`, defaults to 0.02) -- Standard deviation of the initializer range.

Returns: `keras.initializers.TruncatedNormal` -- The truncated normal initializer.

Creates a `keras.initializers.TruncatedNormal` with the given range.

`transformers.modeling_tf_utils.keras_serializable(cls)`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L148

Parameters:

- **cls** (a `keras.layers.Layer` subclass) -- Typically a `TF.MainLayer` class in this project. In general, it must accept a `config` argument to its initializer.

Returns: The same class object, with modifications for Keras deserialization.

Decorate a Keras Layer class to support Keras serialization.

This is done by:

1. Adding a `transformers_config` dict to the Keras config dictionary in `get_config` (called by Keras at serialization time).
2. Wrapping `__init__` to accept that `transformers_config` dict (passed by Keras at deserialization time) and convert it to a config object for the actual layer initializer.
3. Registering the class as a custom object in Keras (if the TensorFlow version supports this), so that it does not need to be supplied in `custom_objects` in the call to `keras.models.load_model`.
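
The three steps above can be mimicked without Keras to show the general shape of such a decorator. Everything in this sketch is a hypothetical stand-in: `ToyConfig`, `toy_serializable`, `ToyMainLayer` and the `REGISTRY` dict (which plays the role of the Keras custom-object registry) are not part of the library, and the real decorator may differ in its details.

```python
import functools

REGISTRY = {}  # stand-in for the Keras custom-object registry


class ToyConfig:
    """Hypothetical config with dict round-tripping."""

    def __init__(self, hidden_size=8):
        self.hidden_size = hidden_size

    def to_dict(self):
        return {"hidden_size": self.hidden_size}

    @classmethod
    def from_dict(cls, values):
        return cls(**values)


def toy_serializable(cls):
    """Decorate a layer-like class so it can be rebuilt from a plain dict."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def wrapped_init(self, config=None, transformers_config=None, **kwargs):
        # Step 2: accept the serialized dict and convert it back to a config object.
        if config is None:
            config = ToyConfig.from_dict(transformers_config)
        self._transformers_config = config
        original_init(self, config, **kwargs)

    def get_config(self):
        # Step 1: expose the config as a plain dict at serialization time.
        return {"transformers_config": self._transformers_config.to_dict()}

    cls.__init__ = wrapped_init
    cls.get_config = get_config
    REGISTRY[cls.__name__] = cls  # Step 3: register the class for lookup by name.
    return cls


@toy_serializable
class ToyMainLayer:
    def __init__(self, config):
        self.hidden_size = config.hidden_size


layer = ToyMainLayer(ToyConfig(hidden_size=16))
saved = {"class_name": type(layer).__name__, "config": layer.get_config()}
restored = REGISTRY[saved["class_name"]](**saved["config"])

print(restored.hidden_size)
```

The round trip works because the saved dictionary contains only plain data, and the wrapped initializer knows how to turn that data back into a config object.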

`transformers.shape_list(tensor: Union[tf.Tensor, np.ndarray])`

Source: https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/tf_utils.py#L28

Parameters:

- **tensor** (`tf.Tensor` or `np.ndarray`) -- The tensor we want the shape of.

Returns: `list[int]` -- The shape of the tensor as a list.

Deals with dynamic shapes in TensorFlow cleanly.
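
One way to read "dealing with dynamic shapes" is sketched below in plain Python. It assumes that dimensions unknown at graph-construction time are reported as `None` in the static shape and that their actual sizes are available at run time. `merge_static_and_dynamic` is a hypothetical helper for illustration, not the library function.

```python
def merge_static_and_dynamic(static_shape, dynamic_shape):
    """Prefer statically known sizes and fall back to runtime sizes where unknown."""
    return [
        dynamic_dim if static_dim is None else static_dim
        for static_dim, dynamic_dim in zip(static_shape, dynamic_shape)
    ]


static_shape = [None, 128, 768]  # batch size unknown when the graph is built
dynamic_shape = [4, 128, 768]    # sizes observed at run time

resolved = merge_static_and_dynamic(static_shape, dynamic_shape)
print(resolved)
```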