Additional Research


Decoding Strategies in auto-regressive models

[1] decoding methods for language generation

While reading about the decoding strategies supported by HuggingFace … (details on .generate() and the available decoding strategies)

So apparently, the decoding strategy is passed to .generate() in the form of **kwargs, which are used to populate a GenerationConfig (transformers/generation/configuration_utils.py)
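
A minimal sketch of the two ways this can be done — plain kwargs vs. an explicit GenerationConfig (gpt2, the prompt, and the parameter values are just placeholders I picked for illustration):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_name = "gpt2"  # assumed checkpoint, stand-in for whatever model is used
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The decoding strategy", return_tensors="pt")

# Option 1: pass decoding parameters as kwargs; .generate() folds them
# into a GenerationConfig internally.
out = model.generate(**inputs, do_sample=True, top_k=50, max_new_tokens=20)

# Option 2: build the GenerationConfig explicitly and pass it in.
config = GenerationConfig(do_sample=True, top_k=50, max_new_tokens=20)
out = model.generate(**inputs, generation_config=config)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```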

no_repeat_ngram_size (arg) -> this prevents the model from repeating any n-gram of the specified size. For example, if set to 3, once a three-token sequence appears, the model is forbidden from generating that same three-token sequence again.
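
A quick sketch of what that looks like in practice (my own example; gpt2 and the prompt are just placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("I enjoy walking with my cute dog", return_tensors="pt")

# Greedy decoding tends to loop; blocking repeated trigrams breaks the loop.
out_plain = model.generate(**inputs, max_new_tokens=50, do_sample=False)
out_block = model.generate(**inputs, max_new_tokens=50, do_sample=False,
                           no_repeat_ngram_size=3)

print(tokenizer.decode(out_block[0], skip_special_tokens=True))
```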

Problems with Greedy Decoding #edit

Problems with Beam Search Decoding

Temperature (LLM parameter)
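
As I understand it, temperature just divides the logits before the softmax, so T < 1 makes the next-token distribution peakier and T > 1 flattens it towards uniform. A rough sketch of that idea (my own illustration of the math, not the transformers implementation; the logits are made up):

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.5, -1.0])  # made-up next-token logits

def sample_with_temperature(logits: torch.Tensor, temperature: float) -> int:
    # Rescale logits, renormalize with softmax, then sample one token id.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

print(torch.softmax(logits / 0.7, dim=-1))  # sharper than the raw softmax
print(torch.softmax(logits / 1.5, dim=-1))  # flatter / closer to uniform
```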

Top K sampling
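
Rough sketch of the idea (again my own illustration, not library code): keep only the k highest-scoring tokens, renormalize over them, and sample from that reduced set. In .generate() the same effect comes from do_sample=True with top_k=k.

```python
import torch

def top_k_sample(logits: torch.Tensor, k: int) -> int:
    # Keep only the k most likely tokens and renormalize over them.
    top_values, top_indices = torch.topk(logits, k)
    probs = torch.softmax(top_values, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return top_indices[choice].item()

logits = torch.tensor([2.0, 1.0, 0.5, -1.0, -3.0])  # made-up next-token logits
print(top_k_sample(logits, k=3))
```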