• Fluffy Kitty Cat@slrpnk.net · 7 days ago

    It’s the generation speed. Internally, LLMs operate on tokens, which represent either whole words or parts of words, each mapped to an integer ID. The model then predicts which integer is most likely to come after the input sequence. How words are split into tokens is an implementation detail that varies from model to model.
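
    As a concrete illustration, here’s a minimal sketch using OpenAI’s tiktoken library with its cl100k_base encoding (just one example vocabulary; other models split text differently):

    ```python
    import tiktoken  # pip install tiktoken

    # Load one particular tokenizer. Other models use different vocabularies,
    # so the same text can split into different tokens and integer IDs.
    enc = tiktoken.get_encoding("cl100k_base")

    text = "Tokenization is an implementation detail"
    token_ids = enc.encode(text)  # text -> list of integer token IDs
    print(token_ids)

    # Show how the text was actually split: some tokens are whole words,
    # others are word fragments (note the leading spaces on many pieces).
    for tid in token_ids:
        print(tid, repr(enc.decode_single_token_bytes(tid).decode("utf-8")))

    # Decoding the integers reproduces the original text.
    assert enc.decode(token_ids) == text
    ```

    So when the model generates, it emits one integer at a time, and each integer may correspond to only a fragment of a word, which is why output speed is measured in tokens rather than words.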