Yesterday Mistral AI released a new language model called Mistral 7B. @justynasty@lemmy.kya.moe already posted about the sliding window attention part here in LocalLLaMA yesterday, but I think the model and the company behind it are even more noteworthy, and the release deserves its own post.

Mistral 7B is not based on Llama. And they claim it outperforms Llama 2 13B on all benchmarks (at its size of 7B). It has additional coding abilities and an 8k sequence length. And it’s released under the Apache 2.0 license, so it’s a truly ‘open’ model, usable without restrictions. [Edit: Unfortunately I couldn’t find the dataset or a paper. They call it ‘open-weight’, so my conclusion regarding the openness might be a bit premature. We’ll see.]

(It uses grouped-query attention (GQA) and sliding window attention (SWA).)
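
To illustrate the sliding window part: each token only attends to the last W tokens instead of the whole context, so per-token attention cost stays bounded. Here is a minimal toy sketch of such a mask (my own illustration, not Mistral’s actual code; the real model reportedly uses a window of 4096):

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # Standard causal mask: token i can attend to every earlier token j <= i.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    # Sliding window: token i can only attend to tokens j with i - window < j <= i,
    # so attention cost per token is bounded by `window` instead of the full context.
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (j > i - window)

# Toy example: 8 tokens, window of 4 (the real model reportedly uses 4096).
print(sliding_window_mask(8, 4).int())
```

Grouped-query attention is orthogonal to this: several query heads share one key/value head, which mainly shrinks the KV cache.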

Also worth noting: Mistral AI (the company) is based in Paris. They are one of the few big European AI startups and raised $113 million in funding in June.

I’ve tried it and it indeed looks promising. It certainly has features that distinguish it from Llama. And I like the competition. Our world is currently completely dominated by Meta. And if it performs exceptionally well at its size, I hope people pick up on it and fine-tune it for all kinds of specific tasks. (The lack of a dataset and of details regarding the training could be a downside, though; these were not included in this initial release of the model.)


EDIT 2023-10-12: Paper released at: https://arxiv.org/abs/2310.06825 (But I’d say there’s no new information in it; they mostly copied their announcement.)

As of now, it is clear they don’t want to publish any details about the training.

  • justynasty@lemmy.kya.moe

    I like Qwen, but their GGML format is not yet compatible with llama.cpp (issue #3337). Mistral seems to be less lively and coherent. But their source code is super readable; I prefer theirs to Llama’s.

    The uploaded GGUF doesn’t seem to take advantage of the sliding window (issue #3371), yet it works somehow. The inference speed shows no improvement in the quantized formats. I haven’t benchmarked the raw model.
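
    If someone wants to benchmark the quantized GGUF themselves, here is a rough sketch using the llama-cpp-python bindings (the model path is a placeholder, and it assumes a recent version of the bindings with GGUF support):

    ```python
    import time
    from llama_cpp import Llama  # pip install llama-cpp-python

    # Placeholder path: point this at whichever quantized Mistral GGUF you downloaded.
    llm = Llama(model_path="./mistral-7b-v0.1.Q4_K_M.gguf", n_ctx=4096)

    start = time.time()
    out = llm("Tell me a short fact about llamas.", max_tokens=256)
    elapsed = time.time() - start

    n_new = out["usage"]["completion_tokens"]
    print(f"{n_new} tokens in {elapsed:.1f}s -> {n_new / elapsed:.1f} tok/s")
    ```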

    DeciLM also highlighted the complexities of the Llama model back then. Mistral added SWA on top of GQA, another twist.

    Are there any fine tuning attempts for Mistral?

    Edit:

    Mistral Claude Multiround Chat

    Mistral Pygmalion Mix

    Tiny Mistral and MistralForCausalLM in the config file o.O

    • justynasty@lemmy.kya.moe

      I was going to write a long post about it, but I will leave it here.

      GPT-J -> Llama -> Llama2 -> Mistral and other Asian models in the timeline.

      What do they have in common? Many popular, specialized, highly praised fine-tuned models have no smaller, downloadable dataset.

      I see many mixed models as well, which are not reproducible with the new (truly) open-source models, since their creators are no longer active on HF.

      Training with terabytes of data is not feasible for many.

      Most of the datasets on Hugging Face (roleplay, sci-fi and instruction datasets like this, this, this) are not tagged.

      How to find them?

      It appears that the HF API does not return the related datasets of a model, making it impossible to track all of these with a web crawler. There are undocumented endpoints for the text-generation and conversational dataset tags currently used across models.
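
      The closest thing I found to a machine-readable link is the `datasets:` field of the model card metadata, which shows up as `dataset:...` tags in the public API. A small sketch (the model id is just an example); for most fine-tunes this simply returns an empty list, which is exactly the problem:

      ```python
      import requests

      def declared_datasets(model_id: str) -> list[str]:
          # The public Hub API exposes model metadata at /api/models/<id>.
          # Training datasets only show up if the author declared them in the
          # model card YAML; they then appear as "dataset:..." tags.
          info = requests.get(f"https://huggingface.co/api/models/{model_id}", timeout=30).json()
          return [t.split(":", 1)[1] for t in info.get("tags", []) if t.startswith("dataset:")]

      print(declared_datasets("Open-Orca/Mistral-7B-OpenOrca"))  # example id; often just []
      ```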

      GGUF model datasets

      Modern datasets: OpenOrca, Open-Platypus, dolphin, claude_multiround_chat_30k, oasst1, PIPPA, wizard_vicuna_70k_unfiltered, orca_mini_v1_dataset and that’s it.

      • rufus@discuss.tchncs.deOP

        Thanks. Those were my words. Maybe I got a bit too excited. I thought I’d read the entire paper later and find out what kind of dataset and how many tokens they used for training.

        Turns out there is no paper or model card. At least I couldn’t find one. I’m going to edit my post.

        A bit strange for a company whose claimed business model is to ‘distribute open-source models’.

        People already filed issues: https://github.com/mistralai/mistral-src/issues/9 or https://huggingface.co/mistralai/Mistral-7B-v0.1/discussions/8

        [Edit: The link from the github issue is also interesting regarding the ‘open source’ AI: https://opening-up-chatgpt.github.io/ ]

        • rufus@discuss.tchncs.deOP

          Guess we’re going to see what happens. Judging by their careful wording “driving the AI revolution by developing OPEN-WEIGHT models that are on par with proprietary solutions”, I’m afraid they chose that phrasing on purpose to mislead people and really mean open-weight, not open-source. The ‘open-source’ label seems to be just the careless interpretation of journalists/reporters and people like me, who should learn not to mix facts with their own conclusions. I’m going to follow the progress. Hope they will answer the questions.

          Edit: And judging by what I read on their discord, opening their tuning process is not gonna happen. :-(

          • justynasty@lemmy.kya.moe

            I’ve been trying to train this new model, but there are still questions to be answered: what you can do with the new RotatingBufferCache, and why training takes up so much memory. Only the inference code is available.
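
            As far as I understand, the point of the rotating buffer is just that the KV cache only ever holds the last window of tokens. A toy sketch of the idea (the class name and details are mine, not the actual mistral-src API):

            ```python
            import torch

            class ToyRotatingKVCache:
                """Toy illustration only (not the mistral-src implementation):
                keep just the last `window` key/value vectors, overwriting the
                oldest slot in place, so inference memory is O(window) rather
                than O(sequence length). Training still needs activations for
                the whole sequence, which is where the memory goes."""
                def __init__(self, window: int, dim: int):
                    self.k = torch.zeros(window, dim)
                    self.v = torch.zeros(window, dim)
                    self.window = window
                    self.pos = 0  # total tokens seen so far

                def append(self, k_t: torch.Tensor, v_t: torch.Tensor) -> None:
                    slot = self.pos % self.window  # ring-buffer index
                    self.k[slot], self.v[slot] = k_t, v_t
                    self.pos += 1

                def current(self) -> tuple[torch.Tensor, torch.Tensor]:
                    n = min(self.pos, self.window)
                    return self.k[:n], self.v[:n]  # order/positions glossed over here
            ```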

            I don’t see how the open-source community will benefit from the code until it’s clarified how their sliding window is supposed to be used for model training.

            At least two non-Llama papers were written this year, Pythia and Phi, which already have the tools available. I can take any of these models’ weights and put them in another open model, and then continue training there.
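
            For example (rough sketch; the exact checkpoint is just an illustration), continuing from Pythia weights is as simple as loading them and resuming training on your own data:

            ```python
            from transformers import AutoModelForCausalLM, AutoTokenizer

            model_id = "EleutherAI/pythia-1.4b"  # example checkpoint; any size works
            tokenizer = AutoTokenizer.from_pretrained(model_id)
            model = AutoModelForCausalLM.from_pretrained(model_id)

            # From here you'd plug model/tokenizer into a normal fine-tuning loop
            # (transformers Trainer or a custom loop) on your own dataset.
            ```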

            As an end-user, I’ve already replaced my small Llama models with Mistral. However, this feels like switching from Windows to macOS. Having fine-tunes for open models, which we currently lack, would mean relying on the community rather than corporate releases.

            That Opening up ChatGPT link was helpful; I hadn’t seen it elsewhere.

          • micheal65536@lemmy.micheal65536.duckdns.org

            IMO the availability of the dataset is less important than the model, especially if the model is under a license that allows fairly unrestricted use.

            Datasets aren’t useful to most people and carry more risk of a lawsuit or of being ripped off by a competitor than the model does. Publishing a dataset with copyrighted content is legally grey at best, while the verdict is still out on models trained on such a dataset, and the model also carries with it some short-term plausible deniability.

            • rufus@discuss.tchncs.deOP

              Depends a bit on what we’re talking about. And ‘old’ concepts of what ‘open-source’ means for software don’t apply 1:1 to ML models.

              Sure, you’re right. Letting people use it without restrictions is great. But thinking about it, it smells more like a well-made marketing stunt. They’re giving away one free 7B model that’s making headlines to advertise their capabilities. It’s a freebie. Really useful to us, no doubt. But it’s made to get us hooked, and we’re probably not getting whatever follows without restrictions.

              And that’s my main criticism. We, the people, get the breadcrumbs of an industry worth hundreds of millions of dollars at minimum. We’re never going to emancipate ourselves, because they’re keeping the datasets to themselves and the hardware is prohibitively expensive for everyone without commercial interest.

              But you’re completely right. Even if they wanted to share the dataset with the world (which Mistral AI doesn’t), they couldn’t do it, because currently there’s just no way to do it legally (except in Japan ;-).

              I hope the whole picture isn’t as pessimistic as I’m painting it here. We’re probably getting more stuff, and there is competition and other factors at play. Also, I’m sure we’re eventually getting legislation that works better. But still, I’m always a bit uneasy being at the mercy of generous multi-million-dollar companies.

              And there are a few practical limitations: I don’t know how many trillion tokens they used to train it or which languages it speaks, and we won’t be able to learn things from the training process for science and the next paper. We’re limited to benchmarks to learn things.

              • micheal65536@lemmy.micheal65536.duckdns.org

                To be honest, the same could be said of LLaMa/Facebook (which doesn’t particularly claim to be “open”, but I don’t see many people criticising Facebook for doing a potential future marketing “bait and switch” with their LLMs).

                They’re only giving these away for free because they aren’t commercially viable. If anyone actually develops a leading-edge LLM, I doubt they will be giving it away for free regardless of their prior “ethics”.

                And the chance of a leading-edge LLM being developed by someone other than a company with prior plans to market it commercially is quite small, as they wouldn’t attract the same funding to cover the development costs.

                • rufus@discuss.tchncs.deOP

                  I think the criticism of Meta (LLaMA licensing) has just dialed down a bit. In the days of LLaMA 1, I read quite a few “f*** Meta” comments and people had zero respect for their licensing. People even spent quite some money to train an open version with the RedPajama dataset because they wanted to break free from Meta.

                  Meta also uses the words “open”, “open science” and even “open source” for their models. But I think they mean yet another thing by that. And in reality, they have stopped providing the exact sources, starting with their paper on Llama 2(?!)

                  I still hate that nowadays everyone invents their own license. I mean once your dependencies all have distinct and incompatible licensing, you can’t incorporate anything into your project any more. The free software world works by incremental improvements and combining stuff. This is very difficult without proper free licenses. And furthermore, no one likes their “Acceptable Use Policy”.

                  I didn’t mean “bait and switch”; I think I didn’t find the right words. I mean we won’t ever build real scientific advancements upon this, because that process is a trade secret. The big companies and AI startups will do the science behind closed doors and decide for us in which direction AI develops.

                  And “commercially viable” is exactly the point. Right now, they can still afford to give things away for free. A Llama 2 is still far away from being a viable product in itself. But once smartphones/computers/edge devices have 12GB of fast(er) memory and AI accelerators, and AI gets more intelligent, hallucinates less and gets adapters for specific tasks and multimodal capabilities, you have a viable product you can tie into your ecosystem and sell millions of times. And that’s where I expect their gifts to stop. I will still have my chatbot / AI companion, but not the smart assistant that organizes my everyday life, translates between arbitrary languages on the fly and helps me with whatever I take a picture of or record with my phone.

                  I think that’s my main point. And for me it has already started a long time ago. I own a de-googled smartphone. I struggle with simple things like having a TTS that gives me directions while driving (in my native language), because TTS is part of the proprietary Google services. The camera is significantly worse without all the enhancement that is really clever trickery and machine learning; again, part of the proprietary parts and a trade secret. I expect other parts of machine learning to become worse, too.

                  • justynasty@lemmy.kya.moe

                    I also have a de-googled smartphone, with a firewall installed (without a jailbreak). My name doesn’t show up on Google. I use generic usernames, not unique ones. I don’t upload photographs of my relatives to the cloud, as services acquire fingerprints (hashes) of their faces and extract metadata from the uploaded JPEGs. …and I’m not hiding from anyone; I just don’t like the unremovable (unforgettable) traces we leave here.

                    translates between arbitrary languages on the fly

                    That’s what Firefox has in its browser now. :D desktop version…

                    hallucinates less and gets adapters for specific tasks and multimodal capabilities

                    People will have less time to talk to other people because they’ll exchange pics with their favorite agent. xd

                    And that’s where I expect their gifts to stop. I will still have my chatbot / AI companion.

                    There are already services that charge for ML tasks. “You want a calendar notification from AI?” - pay more.

                    “You want to summarize your daily emails” - pay double, save more.

                    “You want to talk to your friend, who is asleep.” - talk to a virtual AI character, that looks and sounds like your friend. It even remembers your past conversations! /s

    • rufus@discuss.tchncs.deOP

      Thanks for paying close attention. I just threw kobold.cpp at it and was amazed by the speed of a 7B model on my old PC ;-) I let it complete a few stories and asked the instruct-tuned variant about llamas and other facts… Somehow I missed that there are still things missing. My tests with simple and short texts seemed fine.

      Another thing I somehow completely missed is the release of Qwen. This is funded by Alibaba? I need to read up on it.

      Regarding the fine-tuning attempts… Idk. My personal opinion is: I’m going to be patient and see. Things are always moving fast and the community (not the researchers) sometimes does silly stuff. And most of the tools are probably focused on Llama as of now, so it’ll probably take more than a few hours to see decent results. But I’m sure the community will have a try, especially if it turns out the performance is really as good as or better than Llama 2.