• noneabove1182@sh.itjust.works (OP)

    It depends on the learning rate: it’s typically ideal, and higher quality, to train slowly over many epochs, but it’s cheaper and obviously faster to use a higher learning rate over fewer epochs (there’s a rough sketch of the trade-off below).

    Also, the dataset size is important to consider.
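
    As a minimal sketch of that trade-off: the numbers, the two configs, and the helper function below are made up for illustration, not values from any specific run.

    ```python
    # Rough sketch of the learning-rate vs. epochs trade-off described above.
    # All numbers are made-up examples, not recommendations for any model.

    # Slower, higher-quality style run: small learning rate, many epochs.
    careful_run = {
        "learning_rate": 1e-5,  # small steps, less risk of overshooting
        "num_epochs": 10,       # many passes over the dataset
    }

    # Cheaper, faster run: larger learning rate, few epochs.
    quick_run = {
        "learning_rate": 2e-4,  # bigger steps, converges faster but noisier
        "num_epochs": 2,        # fewer passes, much lower compute cost
    }

    def approx_optimizer_steps(num_examples: int, batch_size: int, num_epochs: int) -> int:
        """Total optimizer steps, which is roughly what you pay for in compute.

        This is also why dataset size matters alongside the schedule:
        cost scales roughly with num_examples * num_epochs.
        """
        return (num_examples // batch_size) * num_epochs

    # Example: a hypothetical 50k-example dataset at batch size 8.
    print(approx_optimizer_steps(50_000, 8, careful_run["num_epochs"]))  # 62500
    print(approx_optimizer_steps(50_000, 8, quick_run["num_epochs"]))    # 12500
    ```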

    • justynasty@lemmy.kya.moe

      I was concerned that a large dataset with low sentence similarity might take longer to train on. I’m not sure whether my assumption that novels take less time to train on than a Q&A dataset with detailed answers is true: generic roleplay vs. encyclopedic knowledge.

      Reading these datasets, I think the GPT-3/4 conversations go into too much detail, and current (1-40B) language models can’t be trained to that level of detail; those conversations would only be useful for humans. But I might be wrong about the training, since I don’t have experience with 100B+ models or with how they scale down.