This release is trained on a curated, filtered subset of most of our GPT-4-augmented data.

HF Leaderboard evals place this model at #2 among all models smaller than 30B at release time, outperforming all but one 13B model.

GGUF files:

Mistral-7B-OpenOrca-GGUF

Warning (if I’m not mistaken):

llama.cpp has not assigned a high-priority tag to sliding-window attention, and Axolotl replaced Mistral’s attention block with “simple” flash attention.

That implies, in my opinion, that these new releases do not capitalize on the speedup claimed by Mistral’s developers.

We can’t expect the new versions to be faster than Llama, because there is no sliding-window attention to speed up inference.
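To make the point concrete, here is a minimal sketch of the sliding-window masking idea Mistral describes. This is my own illustration (function name and shapes are invented), not the actual llama.cpp or Axolotl code: each query position may only attend to the last `window` key positions, so per-token attention cost and KV-cache size stay bounded by the window instead of growing with the full sequence.

```python
# Sketch of sliding-window attention masking (illustrative only; not the
# llama.cpp or Axolotl implementation). Query position i may attend only
# to key positions in (i - window, i], so no row ever "sees" more than
# `window` keys -- this is where the claimed inference speedup comes from.

def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """True where query i is allowed to attend to key j."""
    return [
        [(i - window) < j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(seq_len=6, window=3)
# Each row attends to at most `window` positions:
visible_per_row = [sum(row) for row in mask]  # [1, 2, 3, 3, 3, 3]
```

Without this mask (i.e., plain causal attention), row i attends to i + 1 keys, and the cost grows with context length, which is the behavior the comment above expects from the current builds.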

  • Mara@pawb.social · 1 year ago

    What would an ideal prompt for summarization look like with this model? I’ve tried a few summarization prompts, but they haven’t panned out into anything consistent (MacBook Pro M2 Max, llama.cpp, q4_S). I know this technology is fundamentally stochastic, but it’s not even coalescing into consistently relevant output.
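    For what it’s worth, OpenOrca-family models are commonly trained on the ChatML conversation format, so a summarization prompt that follows it may behave more consistently than a bare instruction (do verify the template against the model card). The helper and the system/user text below are my own example, not an official recipe:

    ```python
    # Hypothetical sketch: assemble a ChatML-formatted summarization prompt.
    # The <|im_start|>/<|im_end|> tokens are the ChatML delimiters; the
    # wording of the system and user messages is example content only.

    def build_chatml_prompt(system: str, user: str) -> str:
        return (
            f"<|im_start|>system\n{system}<|im_end|>\n"
            f"<|im_start|>user\n{user}<|im_end|>\n"
            f"<|im_start|>assistant\n"
        )

    document = "Paste the text you want summarized here."
    prompt = build_chatml_prompt(
        "You are a careful assistant. Summarize the user's text in three "
        "bullet points. Do not add information that is not in the text.",
        f"Summarize the following:\n\n{document}",
    )
    ```

    Ending the prompt right after `<|im_start|>assistant\n` matters: it leaves the model positioned to write the summary itself rather than continue the conversation scaffolding.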