# Custom Layers and Utilities

This page lists all the custom layers used by the library, as well as the utility functions it provides for modeling.

Most of these are only useful if you are studying the code of the models in the library.


## PyTorch custom modules[[transformers.Conv1D]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class transformers.Conv1D</name><anchor>transformers.Conv1D</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/pytorch_utils.py#L98</source><parameters>[{"name": "nf", "val": ""}, {"name": "nx", "val": ""}]</parameters><paramsdesc>- **nf** (`int`) -- The number of output features.
- **nx** (`int`) -- The number of input features.</paramsdesc><paramgroups>0</paramgroups></docstring>

1D-convolutional layer as defined by Radford et al. for OpenAI GPT (and also used in GPT-2).

It works like a linear layer, except that the weights are transposed: `weight` has shape `(nx, nf)` instead of the `(nf, nx)` layout used by `torch.nn.Linear`.
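To illustrate the transposed layout, here is a minimal pure-Python sketch (no PyTorch) in which a Conv1D-style product `x @ W + b`, with `W` of shape `(nx, nf)`, matches a linear-style product `x @ W.T + b` computed from the transposed weights. `transpose`, `conv1d_style` and `linear_style` are hypothetical helpers on nested lists, not library functions; the real layer operates on tensors.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def conv1d_style(x: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """y = x @ weight + bias, with weight of shape (nx, nf) as stored by Conv1D."""
    return [
        [sum(xi * weight[i][j] for i, xi in enumerate(row)) + bias[j] for j in range(len(bias))]
        for row in x
    ]


def linear_style(x: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """y = x @ weight.T + bias, with weight of shape (nf, nx) as stored by nn.Linear."""
    return [[sum(xi * wi for xi, wi in zip(row, w_row)) + b for w_row, b in zip(weight, bias)] for row in x]


x = [[1.0, 2.0, 3.0]]                          # one input row, nx = 3
w_conv = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # Conv1D layout: (nx, nf) = (3, 2)
bias = [0.5, 0.25]
y_conv = conv1d_style(x, w_conv, bias)
y_linear = linear_style(x, transpose(w_conv), bias)  # same weights in nn.Linear layout (nf, nx)
print(y_conv, y_linear)  # [[4.5, 5.25]] [[4.5, 5.25]]
```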




</div>

## PyTorch helper functions[[transformers.apply_chunking_to_forward]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>transformers.apply_chunking_to_forward</name><anchor>transformers.apply_chunking_to_forward</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/pytorch_utils.py#L182</source><parameters>[{"name": "forward_fn", "val": ": Callable[..., torch.Tensor]"}, {"name": "chunk_size", "val": ": int"}, {"name": "chunk_dim", "val": ": int"}, {"name": "*input_tensors", "val": ""}]</parameters><paramsdesc>- **forward_fn** (`Callable[..., torch.Tensor]`) --
  The forward function of the model.
- **chunk_size** (`int`) --
  The chunk size of a chunked tensor: `num_chunks = input_tensors[0].shape[chunk_dim] // chunk_size`.
- **chunk_dim** (`int`) --
  The dimension over which the `input_tensors` should be chunked.
- **input_tensors** (`tuple[torch.Tensor]`) --
  The input tensors of `forward_fn` which will be chunked.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor`</rettype><retdesc>A tensor with the same shape that `forward_fn` would have returned if applied directly.</retdesc></docstring>

This function splits `input_tensors` into chunks of size `chunk_size` along dimension `chunk_dim`, applies `forward_fn`
to each chunk independently, and concatenates the results along `chunk_dim`. Processing one chunk at a time lowers
peak memory usage.

If `forward_fn` is independent across `chunk_dim`, this function yields the same result as applying `forward_fn`
directly to `input_tensors`.
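The independence condition can be checked with a pure-Python analogue that works on lists and chunks along dimension 0 instead of on tensors. `chunked_apply`, `per_position` and `mixes_positions` are hypothetical names for this sketch; like the documented function it requires the chunked length to be a multiple of `chunk_size` and treats `chunk_size <= 0` as "no chunking", which is an assumption about edge cases rather than a description of the library code.

```python
from typing import Callable, List

Vector = List[float]


def chunked_apply(fn: Callable[[List[Vector]], List[Vector]], chunk_size: int, seq: List[Vector]) -> List[Vector]:
    """Apply fn to consecutive chunks of seq (dimension 0) and concatenate the outputs."""
    if chunk_size <= 0:
        return fn(seq)  # no chunking
    if len(seq) % chunk_size != 0:
        raise ValueError("the chunked dimension must be a multiple of chunk_size")
    out: List[Vector] = []
    for start in range(0, len(seq), chunk_size):
        out.extend(fn(seq[start:start + chunk_size]))  # process one chunk, then concatenate
    return out


def per_position(seq: List[Vector]) -> List[Vector]:
    # each position is transformed on its own: independent across the chunked dimension
    return [[2 * v for v in vec] for vec in seq]


def mixes_positions(seq: List[Vector]) -> List[Vector]:
    # divides by a sum over all positions it sees: NOT independent across the chunked dimension
    total = sum(v for vec in seq for v in vec)
    return [[v / total for v in vec] for vec in seq]


hidden = [[1.0], [2.0], [3.0], [4.0]]
print(chunked_apply(per_position, 2, hidden) == per_position(hidden))        # True
print(chunked_apply(mixes_positions, 2, hidden) == mixes_positions(hidden))  # False
```

The second comparison fails because each chunk is normalized by its own sum, which is exactly the case the paragraph above excludes.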







<ExampleCodeBlock anchor="transformers.apply_chunking_to_forward.example">

Examples:

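```python
import torch.nn as nn

from transformers.pytorch_utils import apply_chunking_to_forward


class ChunkedLMHead(nn.Module):  # illustrative module; attribute names follow the original snippet
    def __init__(self, decoder: nn.Module, chunk_size_lm_head: int, seq_len_dim: int = 1):
        super().__init__()
        self.decoder = decoder
        self.chunk_size_lm_head = chunk_size_lm_head
        self.seq_len_dim = seq_len_dim

    # rename the usual forward() fn to forward_chunk()
    def forward_chunk(self, hidden_states):
        return self.decoder(hidden_states)

    # implement a chunked forward function: split along seq_len_dim, run forward_chunk per chunk, concatenate
    def forward(self, hidden_states):
        return apply_chunking_to_forward(self.forward_chunk, self.chunk_size_lm_head, self.seq_len_dim, hidden_states)
```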

</ExampleCodeBlock>

</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>transformers.pytorch_utils.find_pruneable_heads_and_indices</name><anchor>transformers.pytorch_utils.find_pruneable_heads_and_indices</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/pytorch_utils.py#L260</source><parameters>[{"name": "heads", "val": ": list[int]"}, {"name": "n_heads", "val": ": int"}, {"name": "head_size", "val": ": int"}, {"name": "already_pruned_heads", "val": ": set[int]"}]</parameters><paramsdesc>- **heads** (`list[int]`) -- List of the indices of heads to prune.
- **n_heads** (`int`) -- The number of heads in the model.
- **head_size** (`int`) -- The size of each head.
- **already_pruned_heads** (`Set[int]`) -- A set of already pruned heads.</paramsdesc><paramgroups>0</paramgroups><rettype>`tuple[Set[int], torch.LongTensor]`</rettype><retdesc>A tuple with the indices of heads to prune taking `already_pruned_heads`
into account and the indices of rows/columns to keep in the layer weight.</retdesc></docstring>

Finds the heads to prune and the indices of the weight rows/columns to keep, taking `already_pruned_heads` into account.
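A minimal pure-Python sketch of this bookkeeping, returning a list of kept indices instead of a `torch.LongTensor`. It assumes, as in the attention modules that call this helper, that `n_heads` is the number of heads still present in the layer, so a head's original index must be shifted left by the number of already-pruned heads in front of it. `find_heads_and_indices_sketch` is a hypothetical name.

```python
from typing import List, Set, Tuple


def find_heads_and_indices_sketch(
    heads: List[int], n_heads: int, head_size: int, already_pruned_heads: Set[int]
) -> Tuple[Set[int], List[int]]:
    heads_to_prune = set(heads) - already_pruned_heads  # ignore heads that are already gone
    keep = [True] * (n_heads * head_size)  # one flag per row/column of the current (already pruned) layer
    for head in heads_to_prune:
        # already-pruned heads with a smaller index shift this head's position to the left
        shifted = head - sum(1 for h in already_pruned_heads if h < head)
        for k in range(shifted * head_size, (shifted + 1) * head_size):
            keep[k] = False
    return heads_to_prune, [i for i, flag in enumerate(keep) if flag]


# The layer originally had 4 heads of size 2; head 0 was pruned earlier, so 3 heads remain.
print(find_heads_and_indices_sketch([0, 2], n_heads=3, head_size=2, already_pruned_heads={0}))
# ({2}, [0, 1, 4, 5])
```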








</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>transformers.prune_layer</name><anchor>transformers.prune_layer</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/pytorch_utils.py#L160</source><parameters>[{"name": "layer", "val": ": nn.Linear | Conv1D"}, {"name": "index", "val": ": torch.LongTensor"}, {"name": "dim", "val": ": int | None = None"}]</parameters><paramsdesc>- **layer** (`Union[torch.nn.Linear, Conv1D]`) -- The layer to prune.
- **index** (`torch.LongTensor`) -- The indices to keep in the layer.
- **dim** (`int`, *optional*) -- The dimension on which to keep the indices.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.nn.Linear` or [Conv1D](/docs/transformers/v4.57.0/zh/internal/modeling_utils#transformers.Conv1D)</rettype><retdesc>The pruned layer as a new layer with `requires_grad=True`.</retdesc></docstring>

Prune a Conv1D or linear layer, keeping only the entries listed in `index`.

Used to remove attention heads.
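A minimal sketch of how the default `dim` could be chosen per layer type, assuming `dim=None` resolves to each specialized helper's documented default (`0` for `prune_linear_layer`, `1` for `prune_conv1d_layer`), which is the output-feature axis in both weight layouts. It uses nested lists instead of tensors, and `transpose`, `keep_entries` and `prune_weight_sketch` are hypothetical names.

```python
from typing import List, Optional

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def keep_entries(m: Matrix, index: List[int], dim: int) -> Matrix:
    """Keep the rows (dim=0) or columns (dim=1) listed in index."""
    if dim == 0:
        return [m[i] for i in index]
    return [[row[j] for j in index] for row in m]


def prune_weight_sketch(kind: str, weight: Matrix, index: List[int], dim: Optional[int] = None) -> Matrix:
    """Pick the default dimension by layer kind, then keep only the selected entries."""
    if dim is None:
        dim = 0 if kind == "linear" else 1  # output features: rows for Linear, columns for Conv1D
    return keep_entries(weight, index, dim)


linear_w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # nn.Linear layout: (nf=3, nx=2)
conv_w = transpose(linear_w)                      # Conv1D layout:   (nx=2, nf=3)
pruned_linear = prune_weight_sketch("linear", linear_w, [0, 2])
pruned_conv = prune_weight_sketch("conv1d", conv_w, [0, 2])
print(pruned_linear)  # [[1.0, 2.0], [5.0, 6.0]]
print(pruned_conv)    # [[1.0, 5.0], [2.0, 6.0]]
```

With the defaults, both layouts drop the same output features, so the pruned Conv1D weights are still the transpose of the pruned linear weights.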








</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>transformers.pytorch_utils.prune_conv1d_layer</name><anchor>transformers.pytorch_utils.prune_conv1d_layer</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/pytorch_utils.py#L127</source><parameters>[{"name": "layer", "val": ": Conv1D"}, {"name": "index", "val": ": torch.LongTensor"}, {"name": "dim", "val": ": int = 1"}]</parameters><paramsdesc>- **layer** ([Conv1D](/docs/transformers/v4.57.0/zh/internal/modeling_utils#transformers.Conv1D)) -- The layer to prune.
- **index** (`torch.LongTensor`) -- The indices to keep in the layer.
- **dim** (`int`, *optional*, defaults to 1) -- The dimension on which to keep the indices.</paramsdesc><paramgroups>0</paramgroups><rettype>[Conv1D](/docs/transformers/v4.57.0/zh/internal/modeling_utils#transformers.Conv1D)</rettype><retdesc>The pruned layer as a new layer with `requires_grad=True`.</retdesc></docstring>

Prune a Conv1D layer, keeping only the entries listed in `index`. A Conv1D layer works like the linear layers used in
e.g. BERT, but its weights are transposed.

Used to remove attention heads.








</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>transformers.pytorch_utils.prune_linear_layer</name><anchor>transformers.pytorch_utils.prune_linear_layer</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/pytorch_utils.py#L64</source><parameters>[{"name": "layer", "val": ": nn.Linear"}, {"name": "index", "val": ": torch.LongTensor"}, {"name": "dim", "val": ": int = 0"}]</parameters><paramsdesc>- **layer** (`torch.nn.Linear`) -- The layer to prune.
- **index** (`torch.LongTensor`) -- The indices to keep in the layer.
- **dim** (`int`, *optional*, defaults to 0) -- The dimension on which to keep the indices.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.nn.Linear`</rettype><retdesc>The pruned layer as a new layer with `requires_grad=True`.</retdesc></docstring>

Prune a linear layer, keeping only the entries listed in `index`.

Used to remove attention heads.
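The head-removal use can be sketched in pure Python, assuming a BERT-style attention block in which the query projection loses the output rows of the dropped head (`dim=0`) and the output projection loses the matching input columns (`dim=1`). `prune_linear_sketch` is a hypothetical helper on nested lists that mirrors the documented `dim` semantics; because the bias has one entry per output feature, it is subset only when output features are removed.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def prune_linear_sketch(weight: Matrix, bias: List[float], index: List[int], dim: int = 0) -> Tuple[Matrix, List[float]]:
    """weight has shape (out_features, in_features), as in nn.Linear."""
    if dim == 0:  # keep the selected output features and their bias entries
        return [weight[i] for i in index], [bias[i] for i in index]
    # keep the selected input features; every output feature (and bias entry) survives
    return [[row[j] for j in index] for row in weight], list(bias)


# Hypothetical attention block: hidden size 4, two heads of size 2.
hidden = 4
query_w = [[float(r * hidden + c) for c in range(hidden)] for r in range(hidden)]  # (out=4, in=4)
query_b = [0.0, 1.0, 2.0, 3.0]
out_w = [[float(r * hidden + c) for c in range(hidden)] for r in range(hidden)]    # (out=4, in=4)
out_b = [0.0] * hidden

keep = [0, 1]  # keep head 0 (features 0-1), drop head 1 (features 2-3)
q_w, q_b = prune_linear_sketch(query_w, query_b, keep, dim=0)  # query now produces 2 features
o_w, o_b = prune_linear_sketch(out_w, out_b, keep, dim=1)      # output projection now consumes 2 features
print(len(q_w), len(q_w[0]), len(o_w), len(o_w[0]))  # 2 4 4 2
```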








</div>

## TensorFlow custom layers[[transformers.modeling_tf_utils.TFConv1D]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class transformers.modeling_tf_utils.TFConv1D</name><anchor>transformers.modeling_tf_utils.TFConv1D</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L3247</source><parameters>[{"name": "nf", "val": ""}, {"name": "nx", "val": ""}, {"name": "initializer_range", "val": " = 0.02"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **nf** (`int`) --
  The number of output features.
- **nx** (`int`) --
  The number of input features.
- **initializer_range** (`float`, *optional*, defaults to 0.02) --
  The standard deviation to use to initialize the weights.
- **kwargs** (`dict[str, Any]`, *optional*) --
  Additional keyword arguments passed along to the `__init__` of `keras.layers.Layer`.</paramsdesc><paramgroups>0</paramgroups></docstring>

1D-convolutional layer as defined by Radford et al. for OpenAI GPT (and also used in GPT-2).

Basically works like a linear layer but the weights are transposed.




</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class transformers.TFSequenceSummary</name><anchor>transformers.TFSequenceSummary</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L3394</source><parameters>[{"name": "config", "val": ": PretrainedConfig"}, {"name": "initializer_range", "val": ": float = 0.02"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **config** ([PretrainedConfig](/docs/transformers/v4.57.0/zh/main_classes/configuration#transformers.PretrainedConfig)) --
  The config used by the model. Relevant arguments in the config class of the model are (refer to the actual
  config class of your model for the default values it uses):

  - **summary_type** (`str`) -- The method to use to make this summary. Accepted values are:

    - `"last"` -- Take the last token hidden state (like XLNet)
    - `"first"` -- Take the first token hidden state (like Bert)
    - `"mean"` -- Take the mean of all tokens hidden states
    - `"cls_index"` -- Supply a Tensor of classification token position (GPT/GPT-2)
    - `"attn"` -- Not implemented now, use multi-head attention

  - **summary_use_proj** (`bool`) -- Add a projection after the vector extraction.
  - **summary_proj_to_labels** (`bool`) -- If `True`, the projection outputs to `config.num_labels` classes
    (otherwise to `config.hidden_size`).
  - **summary_activation** (`Optional[str]`) -- Set to `"tanh"` to add a tanh activation to the output,
    another string or `None` will add no activation.
  - **summary_first_dropout** (`float`) -- Optional dropout probability before the projection and activation.
  - **summary_last_dropout** (`float`) -- Optional dropout probability after the projection and activation.

- **initializer_range** (`float`, *optional*, defaults to 0.02) -- The standard deviation to use to initialize the weights.
- **kwargs** (`dict[str, Any]`, *optional*) --
  Additional keyword arguments passed along to the `__init__` of `keras.layers.Layer`.</paramsdesc><paramgroups>0</paramgroups></docstring>

Compute a single vector summary of the hidden states of a sequence.
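The `summary_type` options listed above can be sketched in pure Python for a single sequence, leaving out the optional projection, activation and dropout. `summarize_sketch` is a hypothetical name, and falling back to the last token when no classification-token position is given is an assumption of this sketch.

```python
from typing import List, Optional

Vector = List[float]


def summarize_sketch(hidden_states: List[Vector], summary_type: str, cls_index: Optional[int] = None) -> Vector:
    """Summarize ONE sequence of hidden states (no projection, activation or dropout)."""
    if summary_type == "last":
        return hidden_states[-1]
    if summary_type == "first":
        return hidden_states[0]
    if summary_type == "mean":
        n = len(hidden_states)
        return [sum(column) / n for column in zip(*hidden_states)]
    if summary_type == "cls_index":
        # assumption of this sketch: use the last token when no position is given
        return hidden_states[-1 if cls_index is None else cls_index]
    raise NotImplementedError(summary_type)  # e.g. "attn", listed above as not implemented


hidden_states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 tokens, hidden size 2
print(summarize_sketch(hidden_states, "mean"))  # [3.0, 4.0]
```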




</div>

## TensorFlow loss functions[[transformers.modeling_tf_utils.TFCausalLanguageModelingLoss]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class transformers.modeling_tf_utils.TFCausalLanguageModelingLoss</name><anchor>transformers.modeling_tf_utils.TFCausalLanguageModelingLoss</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L213</source><parameters>[]</parameters></docstring>

Loss function suitable for causal language modeling (CLM), that is, the task of guessing the next token.

<Tip>

Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.

</Tip>
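What ignoring the `-100` labels means can be sketched in pure Python: a softmax cross-entropy is computed per position and averaged over the positions whose label is not `-100`, so the logits at ignored positions have no effect. `cross_entropy` and `masked_mean_loss` are hypothetical helpers, averaging over the kept positions is an assumption about the reduction, and raising when every label is ignored is a choice of this sketch.

```python
import math
from typing import List


def cross_entropy(logits: List[float], target: int) -> float:
    """Softmax cross-entropy for one position, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def masked_mean_loss(logits: List[List[float]], labels: List[int], ignore_index: int = -100) -> float:
    """Average cross-entropy over the positions whose label is not ignore_index."""
    losses = [cross_entropy(lg, y) for lg, y in zip(logits, labels) if y != ignore_index]
    if not losses:
        raise ValueError("all labels are ignored")
    return sum(losses) / len(losses)


logits = [[0.0, 0.0], [5.0, -5.0]]
labels = [0, -100]  # the second position is ignored, whatever its logits are
print(masked_mean_loss(logits, labels))  # log(2), about 0.693
```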


</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class transformers.modeling_tf_utils.TFMaskedLanguageModelingLoss</name><anchor>transformers.modeling_tf_utils.TFMaskedLanguageModelingLoss</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L324</source><parameters>[]</parameters></docstring>

Loss function suitable for masked language modeling (MLM), that is, the task of guessing the masked tokens.

<Tip>

Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.

</Tip>


</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class transformers.modeling_tf_utils.TFMultipleChoiceLoss</name><anchor>transformers.modeling_tf_utils.TFMultipleChoiceLoss</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L316</source><parameters>[]</parameters></docstring>

Loss function suitable for multiple choice tasks.

</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class transformers.modeling_tf_utils.TFQuestionAnsweringLoss</name><anchor>transformers.modeling_tf_utils.TFQuestionAnsweringLoss</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L242</source><parameters>[]</parameters></docstring>

Loss function suitable for question answering.


</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class transformers.modeling_tf_utils.TFSequenceClassificationLoss</name><anchor>transformers.modeling_tf_utils.TFSequenceClassificationLoss</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L297</source><parameters>[]</parameters></docstring>

Loss function suitable for sequence classification.


</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class transformers.modeling_tf_utils.TFTokenClassificationLoss</name><anchor>transformers.modeling_tf_utils.TFTokenClassificationLoss</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L255</source><parameters>[]</parameters></docstring>

Loss function suitable for token classification.

<Tip>

Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.

</Tip>


</div>

## TensorFlow helper functions[[transformers.modeling_tf_utils.get_initializer]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>transformers.modeling_tf_utils.get_initializer</name><anchor>transformers.modeling_tf_utils.get_initializer</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L3519</source><parameters>[{"name": "initializer_range", "val": ": float = 0.02"}]</parameters><paramsdesc>- **initializer_range** (*float*, defaults to 0.02) -- Standard deviation of the initializer range.</paramsdesc><paramgroups>0</paramgroups><rettype>`keras.initializers.TruncatedNormal`</rettype><retdesc>The truncated normal initializer.</retdesc></docstring>

Creates a `keras.initializers.TruncatedNormal` initializer whose standard deviation is `initializer_range`.
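For intuition, here is a minimal pure-Python sketch of truncated-normal sampling under the Keras convention that values more than two standard deviations from the mean are discarded and re-drawn. `truncated_normal_sketch` is a hypothetical helper built on Python's `random` module, not on Keras.

```python
import random
from typing import List


def truncated_normal_sketch(n: int, stddev: float = 0.02, seed: int = 0) -> List[float]:
    """Draw n samples from N(0, stddev**2), re-drawing any sample farther than 2 * stddev from the mean."""
    rng = random.Random(seed)
    samples: List[float] = []
    while len(samples) < n:
        x = rng.gauss(0.0, stddev)
        if abs(x) <= 2 * stddev:  # keep only values within two standard deviations
            samples.append(x)
    return samples


weights = truncated_normal_sketch(1000, stddev=0.02)
print(max(abs(w) for w in weights) <= 2 * 0.02)  # True
```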








</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>transformers.modeling_tf_utils.keras_serializable</name><anchor>transformers.modeling_tf_utils.keras_serializable</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/modeling_tf_utils.py#L148</source><parameters>[]</parameters><paramsdesc>- **cls** (a `keras.layers.Layer` subclass) --
  Typically a `TF.MainLayer` class in this project, in general must accept a `config` argument to its
  initializer.</paramsdesc><paramgroups>0</paramgroups><retdesc>The same class object, with modifications for Keras deserialization.</retdesc></docstring>

Decorate a Keras Layer class to support Keras serialization.

This is done by:

1. Adding a `transformers_config` dict to the Keras config dictionary in `get_config` (called by Keras at
   serialization time).
2. Wrapping `__init__` to accept that `transformers_config` dict (passed by Keras at deserialization time) and
   convert it to a config object for the actual layer initializer.
3. Registering the class as a custom object in Keras (if the TensorFlow version supports this), so that it does not
   need to be supplied in `custom_objects` in the call to `keras.models.load_model`.
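Steps 1 and 2 can be sketched in pure Python without Keras. `ToyConfig`, `serializable_sketch` and `ToyMainLayer` are hypothetical stand-ins, the config is stored under the `transformers_config` key named in the list above, and step 3 (registration with Keras) is left out; the real decorator operates on `keras.layers.Layer` subclasses and may differ in its details.

```python
import functools


class ToyConfig:
    """Hypothetical stand-in for a transformers configuration object."""

    def __init__(self, hidden_size: int = 4):
        self.hidden_size = hidden_size

    def to_dict(self) -> dict:
        return {"hidden_size": self.hidden_size}

    @classmethod
    def from_dict(cls, d: dict) -> "ToyConfig":
        return cls(**d)


def serializable_sketch(cls):
    """Hypothetical decorator covering steps 1 and 2 (no Keras registration)."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def wrapped_init(self, config, **kwargs):
        if isinstance(config, dict):  # step 2: a plain dict from a saved config becomes a config object
            config = ToyConfig.from_dict(config)
        original_init(self, config, **kwargs)
        self._config = config

    def get_config(self) -> dict:  # step 1: embed the config as a plain dict
        return {"transformers_config": self._config.to_dict()}

    cls.__init__ = wrapped_init
    cls.get_config = get_config
    return cls


@serializable_sketch
class ToyMainLayer:  # hypothetical; a real one would subclass keras.layers.Layer
    def __init__(self, config, **kwargs):
        self.hidden_size = config.hidden_size


layer = ToyMainLayer(ToyConfig(hidden_size=8))
saved = layer.get_config()                             # {'transformers_config': {'hidden_size': 8}}
restored = ToyMainLayer(saved["transformers_config"])  # rebuilt from the dict alone
print(restored.hidden_size)  # 8
```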






</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>transformers.shape_list</name><anchor>transformers.shape_list</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/tf_utils.py#L28</source><parameters>[{"name": "tensor", "val": ": typing.Union[tensorflow.python.framework.tensor.Tensor, numpy.ndarray]"}]</parameters><paramsdesc>- **tensor** (`tf.Tensor` or `np.ndarray`) -- The tensor we want the shape of.</paramsdesc><paramgroups>0</paramgroups><rettype>`list[int]`</rettype><retdesc>The shape of the tensor as a list.</retdesc></docstring>

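Deal with dynamic shapes in TensorFlow cleanly: the returned list contains the static size of every dimension that is known and the dynamic (`tf.shape`) value of every dimension that is `None`.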
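The merging rule can be sketched in pure Python with plain integers standing in for the dynamic values (in TensorFlow those entries would be scalar tensors). `merge_shape_sketch` is a hypothetical helper, not part of the library.

```python
from typing import List, Optional, Sequence


def merge_shape_sketch(static: Sequence[Optional[int]], dynamic: Sequence[int]) -> List[int]:
    """Prefer a known static dimension; fall back to the runtime value where the static one is None."""
    return [d if s is None else s for s, d in zip(static, dynamic)]


# e.g. a (batch, seq_len, hidden) tensor whose batch size is unknown when the graph is traced
print(merge_shape_sketch([None, 128, 768], [4, 128, 768]))  # [4, 128, 768]
```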








</div>
