MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs
![MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs](/content/images/size/w1200/2023/05/6454669e10b0b051b6a393a6_Frame-1--12-.png)
Jonathan Frankle, Chief Scientist @MosaicML, just announced the latest entry in the MosaicML Foundation Series: MPT-7B
MPT is here! Check out our shiny new LLMs, open-source w/commercial license. The base MPT-7B model is 7B params trained on 1T tokens and reaches LLaMA-7B quality. We also created Instruct (commercial), Chat, and (my favorite) StoryWriter-65k+ variants. 🧵 https://t.co/VONP1TK8ez
— Jonathan Frankle (@jefrankle) May 5, 2023
There's a lot to absorb about this one. MosaicML trained the model from scratch on 1 trillion tokens, over 9.5 days at a cost of $200,000. It's Apache-2.0 licensed and the model weights are available today.
They're accompanying the base model with an instruction-tuned MPT-7B-Instruct (licensed for commercial use) and a non-commercially licensed MPT-7B-Chat, trained using OpenAI data, hence the more restrictive license. They also announced MPT-7B-StoryWriter-65k+, "a model designed to read and write stories with super long context lengths", with a previously unheard-of 65,000-token context length.
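Since the weights are on the Hugging Face Hub, trying the base model locally should only take a few lines of Python. Here's a minimal sketch, assuming the repo id `mosaicml/mpt-7b` and that the checkpoint ships its own custom model code (hence `trust_remote_code=True`); MosaicML's model card describes it as using the EleutherAI/gpt-neox-20b tokenizer:

```python
# Minimal sketch of loading MPT-7B with Hugging Face transformers.
# Assumes the repo id "mosaicml/mpt-7b"; the checkpoint ships custom
# model code, so trust_remote_code=True is required.
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    torch_dtype="auto",      # load in the checkpoint's native precision
    trust_remote_code=True,
)

inputs = tokenizer("MosaicML trained MPT-7B on", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same pattern should work for the Instruct and Chat variants by swapping in their repo ids.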
They're releasing these models mainly to demonstrate how inexpensive and powerful their custom model training service is. It's a very convincing demo!