• justynasty@lemmy.kya.moe

    OpenOrca and Dolphin seem to serve the same purpose, just with different flavors. There are already Mistral fine-tunes for roleplay and NSFW :/ (for both Orca and Dolphin). People mix and upload releases faster than we can post news about them. ^^

    It took 48 hours to train 10 epochs on 4x A100s.

    Does anyone know why some releases only take 1 epoch to train and others take up to 10 epochs?

    • noneabove1182@sh.itjust.works (OP, mod)

      It depends on the learning rate: typically it’s ideal, and higher quality, to learn slowly over many epochs, but it’s cheaper and obviously faster to learn quickly over fewer epochs.

      Also the dataset size is important to consider
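
      As a rough sketch of the two regimes (the numbers here are only illustrative guesses, and it assumes a Hugging Face transformers-style fine-tune, not the settings any of these releases actually used):

      ```python
      # Two ways to budget a fine-tune with transformers' TrainingArguments.
      # Learning rates, warmup, and scheduler below are illustrative assumptions.
      from transformers import TrainingArguments

      # Slow and higher quality: small learning rate, many passes over the data
      # (e.g. the 10-epoch, 48h-on-4x-A100 run mentioned above).
      slow = TrainingArguments(
          output_dir="out-slow",
          num_train_epochs=10,
          learning_rate=1e-5,
          warmup_ratio=0.03,
          lr_scheduler_type="cosine",
      )

      # Fast and cheap: larger learning rate, a single pass over the data.
      fast = TrainingArguments(
          output_dir="out-fast",
          num_train_epochs=1,
          learning_rate=2e-4,
          warmup_ratio=0.03,
          lr_scheduler_type="cosine",
      )
      ```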

      • justynasty@lemmy.kya.moe

        I was concerned that a large dataset with low sentence similarity might take longer to train. I’m not sure whether my hunch is true that novels take less time to train on than a Q&A dataset with detailed answers: generic roleplay vs. encyclopedic knowledge.

        Reading these datasets, I think the GPT-3/4 conversations go into too much detail, and current (1–40B) language models can’t absorb that level of detail in training; those conversations would only be useful to humans. But I might be wrong about the training side, since I have no experience with 100B+ models or how they scale down.