local-ai models install
. or by using the WebUI.
phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning. phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. Phi-4 is a 14B parameters, dense decoder-only Transformer model.
A model tailored to be conversational and execute function calls with LocalAI. This model is based on phi-4.
A model tailored to be conversational and execute function calls with LocalAI. This model is based on phi-4. This is the second iteration of https://huggingface.co/mudler/LocalAI-functioncall-phi-4-v0.1 with added CoT (o1) capabilities from the marco-o1 dataset.
A model tailored to be conversational and execute function calls with LocalAI. This model is based on phi-4.
Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. This repository contains the Falcon3-1B-Instruct. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-1B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K.
Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. This repository contains the Falcon3-1B-Instruct. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-1B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K.
Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. This repository contains the Falcon3-1B-Instruct. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-1B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K.
This is an uncensored version of tiiuae/Falcon3-1B-Instruct created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens.
This is an uncensored version of tiiuae/Falcon3-3B-Instruct created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens.
This is an uncensored version of tiiuae/Falcon3-10B-Instruct created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens.
This is an uncensored version of tiiuae/Falcon3-7B-Instruct created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens.
Base model: (Falcon3-10B)
INTELLECT-1 is the first collaboratively trained 10 billion parameter language model trained from scratch on 1 trillion tokens of English text and code. This is an instruct model. The base model associated with it is INTELLECT-1. INTELLECT-1 was trained on up to 14 concurrent nodes distributed across 3 continents, with contributions from 30 independent community contributors providing compute. The training code utilizes the prime framework, a scalable distributed training framework designed for fault-tolerant, dynamically scaling, high-perfomance training on unreliable, globally distributed workers. The key abstraction that allows dynamic scaling is the ElasticDeviceMesh which manages dynamic global process groups for fault-tolerant communication across the internet and local process groups for communication within a node. The model was trained using the DiLoCo algorithms with 100 inner steps. The global all-reduce was done with custom int8 all-reduce kernels to reduce the communication payload required, greatly reducing the communication overhead by a factor 400x.
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.
A direct replacement / successor to Euryale v2.2, not Hanami-x1, though it is slightly better than them in my opinion.
This model was created as I liked the storytelling of EVA but the prose and details of scenes from EURYALE, my goal is to merge the robust storytelling of both models while attempting to maintain the positives of both models.
It's a very balanced model between the L3.3 tunes. It's very creative, able to come up with new and interesting scenarios on your own that will thoroughly surprise you in ways that remind me of a 123B model. It has some of the most natural sounding dialogue and prose can come out of any model I've tried with the right swipe, in a way that truly brings your characters and RP to life that makes you feel like you're talking to a human writer instead of an AI - a quality that reminds me of Character AI in its prime. This model loves a great prompt and thrives off instructions.
Llama 3.3 instruct 70B 128k context with ablation technique applied for a more helpful (and based) assistant. This means it will refuse less of your valid requests for an uncensored UX. Use responsibly and use common sense. We do not take any responsibility for how you apply this intelligence, just as we do not for how you apply your own.
This model was created as I liked the storytelling of EVA, the prose and details of scenes from EURYALE and Anubis, my goal is to merge the robust storytelling of all three models while attempting to maintain the positives of the models.
You know the drill by now. Here is the paper. Have fun. https://docs.google.com/document/d/1OjbjU5AOz4Ftn9xHQrX3oFQGhQ6RDUuXQipnQ9gn6tU/edit?usp=sharing
- Same data composition as Freya, applied differently, trained longer too. - Merging with its checkpoints was also involved. - Has a nice style, with occasional issues that can be easily fixed. - A more stable version compared to previous runs.
- Strong Roleplay & Creative writing abilities. - Less positivity bias. - Very smart assistant with low refusals. - Exceptionally good at following the character card. - Characters feel more 'alive', and will occasionally initiate stuff on their own (without being prompted to, but fitting to their character). - Strong ability to comprehend and roleplay uncommon physical and mental characteristics.
Enjoyed SicariusSicariiStuff/Negative_LLAMA_70B but the prose was too dry for my tastes. So I merged it with TheDrummer/Anubis-70B-v1 for verbosity. Anubis has positivity bias so Negative could balance things out. This is a merge of pre-trained language models created using mergekit. The following models were included in the merge: SicariusSicariiStuff/Negative_LLAMA_70B TheDrummer/Anubis-70B-v1
This model was created as I liked the storytelling of EVA, the prose and details of scenes from EURYALE and Anubis, enhanced with Negative_LLAMA to kill off the positive bias with a touch of nemotron sprinkeled in. The choice to use the lorablated model as a base was intentional - while it might seem counterintuitive, this approach creates unique interactions between the weights, similar to what was achieved in the original Astoria model and Astoria V2 model . Rather than simply removing refusals, this "weight twisting" effect that occurs when subtracting the lorablated base model from the other models during the merge process creates an interesting balance in the final model's behavior. While this approach differs from traditional sequential application of components, it was chosen for its unique characteristics in the model's responses.
The Magnum v4 series is complete, but here's something a little extra I wanted to tack on as I wasn't entirely satisfied with the results of v4 72B. "SE" for Special Edition - this model is finetuned from meta-llama/Llama-3.3-70B-Instruct as an rsLoRA adapter. The dataset is a slightly revised variant of the v4 data with some elements of the v2 data re-introduced. The objective, as with the other Magnum models, is to emulate the prose style and quality of the Claude 3 Sonnet/Opus series of models on a local scale, so don't be surprised to see "Claude-isms" in its output.
A merge of some Llama 3.3 models because um uh yeah Went extra schizo on the recipe, hoping for an extra fun result, and... Well, I guess it's an overall improvement over the previous revision. It's a tiny bit smarter, has even more distinct swipes and nice dialogues, but for some reason it's damn sloppy. I've published the second step of this merge as a separate model, and I'd say the results are more interesting, but not as usable as this one. https://huggingface.co/Nohobby/AbominationSnowPig Prompt format: Llama3 OR Llama3 Context and ChatML Instruct. It actually works a bit better this way
RWKV (pronounced RwaKuv) is an RNN with GPT-level LLM performance, and can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7. So it's combining the best of RNN and transformer - great performance, fast inference, fast training, saves VRAM, "infinite" ctxlen, and free text embedding. Moreover it's 100% attention-free, and a Linux Foundation AI project.
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: Significantly improvements in code generation, code reasoning and code fixing. Base on the strong Qwen2.5, we scale up the training tokens into 5.5 trillion including source code, text-code grounding, Synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source codeLLM, with its coding abilities matching those of GPT-4o. A more comprehensive foundation for real-world applications such as Code Agents. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies. Long-context Support up to 128K tokens.
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: Significantly improvements in code generation, code reasoning and code fixing. Base on the strong Qwen2.5, we scale up the training tokens into 5.5 trillion including source code, text-code grounding, Synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source codeLLM, with its coding abilities matching those of GPT-4o. A more comprehensive foundation for real-world applications such as Code Agents. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies. Long-context Support up to 128K tokens.
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: Significantly improvements in code generation, code reasoning and code fixing. Base on the strong Qwen2.5, we scale up the training tokens into 5.5 trillion including source code, text-code grounding, Synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source codeLLM, with its coding abilities matching those of GPT-4o. A more comprehensive foundation for real-world applications such as Code Agents. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies. Long-context Support up to 128K tokens.
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: Significantly improvements in code generation, code reasoning and code fixing. Base on the strong Qwen2.5, we scale up the training tokens into 5.5 trillion including source code, text-code grounding, Synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source codeLLM, with its coding abilities matching those of GPT-4o. A more comprehensive foundation for real-world applications such as Code Agents. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies. Long-context Support up to 128K tokens.
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: Significantly improvements in code generation, code reasoning and code fixing. Base on the strong Qwen2.5, we scale up the training tokens into 5.5 trillion including source code, text-code grounding, Synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source codeLLM, with its coding abilities matching those of GPT-4o. A more comprehensive foundation for real-world applications such as Code Agents. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies. Long-context Support up to 128K tokens.
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: Significantly improvements in code generation, code reasoning and code fixing. Base on the strong Qwen2.5, we scale up the training tokens into 5.5 trillion including source code, text-code grounding, Synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source codeLLM, with its coding abilities matching those of GPT-4o. A more comprehensive foundation for real-world applications such as Code Agents. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies. Long-context Support up to 128K tokens.
The following models were included in the merge: BenevolenceMessiah/Qwen2.5-Coder-7B-Chat-Instruct-TIES-v1.2 MadeAgents/Hammer2.0-7b huihui-ai/Qwen2.5-Coder-7B-Instruct-abliterated
This is an uncensored version of Qwen2.5-Coder-7B-Instruct created with abliteration (see this article to know more about it). Special thanks to @FailSpy for the original code and technique. Please follow him if you're interested in abliterated models.
Rombos-Coder-V2.5-Qwen-7b is a continues finetuned version of Qwen2.5-Coder-7B-Instruct. I took it upon myself to merge the instruct model with the base model myself using the * Ties* merge method as demonstrated in my own "Continuous Finetuning" method (link available). This version of the model shows higher performance than the original instruct and base models.
Rombos-Coder-V2.5-Qwen-32b is a continues finetuned version of Qwen2.5-Coder-32B-Instruct. I took it upon myself to merge the instruct model with the base model myself using the Ties merge method as demonstrated in my own "Continuous Finetuning" method (link available). This version of the model shows higher performance than the original instruct and base models.
Rombos-Coder-V2.5-Qwen-14b is a continues finetuned version of Qwen2.5-Coder-14B-Instruct. I took it upon myself to merge the instruct model with the base model myself using the Ties merge method as demonstrated in my own "Continuous Finetuning" method (link available). This version of the model shows higher performance than the original instruct and base models.
The LLM model is based on sloshywings/Qwen2.5-Coder-32B-Instruct-Uncensored. It is a large language model with 32B parameters that has been fine-tuned on coding tasks and instructions.
The model is a quantized version of infly/OpenCoder-8B-Base created using llama.cpp. It is part of the OpenCoder LLM family which includes 1.5B and 8B base and chat models, supporting both English and Chinese languages. The original OpenCoder model was pretrained on 2.5 trillion tokens composed of 90% raw code and 10% code-related web data, and supervised finetuned on over 4.5M high-quality SFT examples. It achieves high performance across multiple language model benchmarks and is one of the most comprehensively open-sourced models available.
The LLM model is QuantFactory/OpenCoder-8B-Instruct-GGUF, which is a quantized version of infly/OpenCoder-8B-Instruct. It is created using llama.cpp and supports both English and Chinese languages. The original model, infly/OpenCoder-8B-Instruct, is pretrained on 2.5 trillion tokens composed of 90% raw code and 10% code-related web data, and supervised finetuned on over 4.5M high-quality SFT examples. It achieves high performance across multiple language model benchmarks and is one of the leading open-source models for code.
The model is a large language model with 1.5 billion parameters, trained on 2.5 trillion tokens of code-related data. It supports both English and Chinese languages and is part of the OpenCoder LLM family which also includes 8B base and chat models. The model achieves high performance across multiple language model benchmarks and is one of the most comprehensively open-sourced models available.
The model is a quantized version of [infly/OpenCoder-1.5B-Instruct](https://huggingface.co/infly/OpenCoder-1.5B-Instruct) created using llama.cpp. The original model, infly/OpenCoder-1.5B-Instruct, is an open and reproducible code LLM family which includes 1.5B and 8B base and chat models, supporting both English and Chinese languages. The model is pretrained on 2.5 trillion tokens composed of 90% raw code and 10% code-related web data, and supervised finetuned on over 4.5M high-quality SFT examples. It achieves high performance across multiple language model benchmarks, positioning it among the leading open-source models for code.
Granite 3.0 language models are a new set of lightweight state-of-the-art, open foundation models that natively support multilinguality, coding, reasoning, and tool usage, including the potential to be run on constrained compute resources. All the models are publicly released under an Apache 2.0 license for both research and commercial use. The models' data curation and training procedure were designed for enterprise usage and customization in mind, with a process that evaluates datasets for governance, risk and compliance (GRC) criteria, in addition to IBM's standard data clearance process and document quality checks. Granite 3.0 includes 4 different models of varying sizes: Dense Models: 2B and 8B parameter models, trained on 12 trillion tokens in total. Mixture-of-Expert (MoE) Models: Sparse 1B and 3B MoE models, with 400M and 800M activated parameters respectively, trained on 10 trillion tokens in total. Accordingly, these options provide a range of models with different compute requirements to choose from, with appropriate trade-offs with their performance on downstream tasks. At each scale, we release a base model โ checkpoints of models after pretraining, as well as instruct checkpoints โ models finetuned for dialogue, instruction-following, helpfulness, and safety.
A roleplay-centric finetune of IBM's Granite 3.0 3B-A800M. LoRA finetune trained locally, whereas the others were FFT; while this results in less uptake of training data, it should also mean less degradation in Granite's core abilities, making it potentially easier to use for general-purpose tasks. Disclaimer PLEASE do not expect godliness out of this, it's a model with 800 million active parameters. Expect something more akin to GPT-3 (the original, not GPT-3.5.) (Furthermore, this version is by a less experienced tuner; it's my first finetune that actually has decent-looking graphs, I don't really know what I'm doing yet!)
A finetune of OLMoE by AllenAI designed for roleplaying (and maybe general usecases if you try hard enough). PLEASE do not expect godliness out of this, it's a model with 1 billion active parameters. Expect something more akin to Gemma 2 2B, not Llama 3 8B.
Transformer-based decoder-only language model that has been pre-trained on 7.8 trillion tokens of highly curated data. The pre-training corpus contains text in 35 European languages and code. Salamandra comes in three different sizes โ 2B, 7B and 40B parameters โ with their respective base and instruction-tuned variants. This model card corresponds to the 7B instructed version.
The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model Developer: Meta Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model Developer: Meta Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model Developer: Meta Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model Developer: Meta Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
Small but Smart Fine-Tuned on Vast dataset of Conversations. Able to Generate Human like text with high performance within its size. It is Very Versatile when compared for it's size and Parameters and offers capability almost as good as Llama 3.1 8B Instruct.
Enigma is a code-instruct model built on Llama 3.2 3b. It is a high quality code instruct model with the Llama 3.2 Instruct chat format. The model is finetuned on synthetic code-instruct data generated with Llama 3.1 405b and supplemented with generalist synthetic data. It uses the Llama 3.2 Instruct prompt format.
Esper 2 is a DevOps and cloud architecture code specialist built on Llama 3.2 3b. It is an AI assistant focused on AWS, Azure, GCP, Terraform, Dockerfiles, pipelines, shell scripts and more, with real world problem solving and high quality code instruct performance within the Llama 3.2 Instruct chat format. Finetuned on synthetic DevOps-instruct and code-instruct data generated with Llama 3.1 405b and supplemented with generalist chat data.
The model is a quantized version of EpistemeAI/Llama-3.2-3B-Agent007, developed by EpistemeAI and fine-tuned from unsloth/llama-3.2-3b-instruct-bnb-4bit. It was trained 2x faster with Unsloth and Huggingface's TRL library. Fine tuned with Agent datasets.
The Llama-3.2-3B-Agent007-Coder-GGUF is a quantized version of the EpistemeAI/Llama-3.2-3B-Agent007-Coder model, which is a fine-tuned version of the unsloth/llama-3.2-3b-instruct-bnb-4bit model. It is created using llama.cpp and trained with additional datasets such as the Agent dataset, Code Alpaca 20K, and magpie ultra 0.1. This model is optimized for multilingual dialogue use cases and agentic retrieval and summarization tasks. The model is available for commercial and research use in multiple languages and is best used with the transformers library.
The LLM model is a quantized version of EpistemeAI/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO, which is an experimental and revolutionary fine-tune with DPO dataset to allow LLama 3.1 8B to be an agentic coder. It has some built-in agent features such as search, calculator, and ReAct. Other noticeable features include self-learning using unsloth, RAG applications, and memory. The context window of the model is 128K. It can be integrated into projects using popular libraries like Transformers and vLLM. The model is suitable for use with Langchain or LLamaIndex. The model is developed by EpistemeAI and licensed under the Apache 2.0 license.
Small parameter LLMs are ideal for navigating the complexities of the Japanese language, which involves multiple character systems like kanji, hiragana, and katakana, along with subtle social cues. Despite their smaller size, these models are capable of delivering highly accurate and context-aware results, making them perfect for use in environments where resources are constrained. Whether deployed on mobile devices with limited processing power or in edge computing scenarios where fast, real-time responses are needed, these models strike the perfect balance between performance and efficiency, without sacrificing quality or speed.
Lyte/Llama-3.2-3B-Reasoning-Time is a large language model with 3.2 billion parameters, designed for reasoning and time-based tasks in English. It is based on the Llama architecture and has been quantized using the GGUF format by mradermacher.
Base Model Llama 3.2 1B Extended Size 1B to 2.5B parameters Extension Method Proprietary technique developed by MedIT Solutions Fine-tuning Open (or open subsets allowing for commercial use) open datasets from HF Open (or open subsets allowing for commercial use) SFT datasets from HF Training Status Current version: chat-1.0.0 Key Features Built on Llama 3.2 architecture Expanded from 1B to 2.47B parameters Optimized for open-ended conversations Incorporates supervised fine-tuning for improved performance Use Case General conversation and task-oriented interactions
This is an uncensored version of the original Llama-3.2-3B-Instruct, created using mlabonne's script, which builds on FailSpy's notebook and the original work from Andy Arditi et al..
This model is an advanced iteration of the powerful meta-llama/Llama-3.2-3B, specifically fine-tuned to enhance its capabilities in French Legal domain.
This model is an advanced iteration of the powerful meta-llama/Llama-3.2-3B, specifically fine-tuned to enhance its capabilities in French Legal domain.
This model is an advanced iteration of the powerful meta-llama/Llama-3.2-3B, specifically fine-tuned to enhance its capabilities in French Legal domain.
ValiantLabs/Llama3.2-3B-Enigma is an Enigma model built on Llama 3.2 3b. It is a high-quality code-instruct model with the Llama 3.2 Instruct chat format. The model is finetuned on synthetic code-instruct data generated using Llama 3.1 405b and supplemented with generalist synthetic data. This model is suitable for both code-instruct and general chat applications.
Shining Valiant 2 is a chat model built on Llama 3.2 3b, finetuned on our data for friendship, insight, knowledge and enthusiasm. Finetuned on meta-llama/Llama-3.2-3B-Instruct for best available general performance Trained on a variety of high quality data; focused on science, engineering, technical knowledge, and structured reasoning Also available for Llama 3.1 70b and Llama 3.1 8b! Version This is the 2024-09-27 release of Shining Valiant 2 for Llama 3.2 3b.
The Llama-Doctor-3.2-3B-Instruct model is designed for text generation tasks, particularly in contexts where instruction-following capabilities are needed. This model is a fine-tuned version of the base Llama-3.2-3B-Instruct model and is optimized for understanding and responding to user-provided instructions or prompts. The model has been trained on a specialized dataset, avaliev/chat_doctor, to enhance its performance in providing conversational or advisory responses, especially in medical or technical fields.
This model is a fine-tuned version of LLaMA 3.2-3B, optimized using LoRA (Low-Rank Adaptation) on the NVIDIA ChatQA-Training-Data. It is tailored for conversational AI, question answering, and other instruction-following tasks, with support for sequences up to 1024 tokens.
The Llama-Sentient-3.2-3B-Instruct model is a fine-tuned version of the Llama-3.2-3B-Instruct model, optimized for text generation tasks, particularly where instruction-following abilities are critical. This model is trained on the mlabonne/lmsys-arena-human-preference-55k-sharegpt dataset, which enhances its performance in conversational and advisory contexts, making it suitable for a wide range of applications.
The Llama-SmolTalk-3.2-1B-Instruct model is a lightweight, instruction-tuned model designed for efficient text generation and conversational AI tasks. With a 1B parameter architecture, this model strikes a balance between performance and resource efficiency, making it ideal for applications requiring concise, contextually relevant outputs. The model has been fine-tuned to deliver robust instruction-following capabilities, catering to both structured and open-ended queries. Key Features: Instruction-Tuned Performance: Optimized to understand and execute user-provided instructions across diverse domains. Lightweight Architecture: With just 1 billion parameters, the model provides efficient computation and storage without compromising output quality. Versatile Use Cases: Suitable for tasks like content generation, conversational interfaces, and basic problem-solving. Intended Applications: Conversational AI: Engage users with dynamic and contextually aware dialogue. Content Generation: Produce summaries, explanations, or other creative text outputs efficiently. Instruction Execution: Follow user commands to generate precise and relevant responses.
We present FuseChat-3.0, a series of models crafted to enhance performance by integrating the strengths of multiple source LLMs into more compact target LLMs. To achieve this fusion, we utilized four powerful source LLMs: Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, and Llama-3.1-70B-Instruct. For the target LLMs, we employed three widely-used smaller modelsโLlama-3.1-8B-Instruct, Gemma-2-9B-It, and Qwen-2.5-7B-Instructโalong with two even more compact modelsโLlama-3.2-3B-Instruct and Llama-3.2-1B-Instruct. The implicit model fusion process involves a two-stage training pipeline comprising Supervised Fine-Tuning (SFT) to mitigate distribution discrepancies between target and source LLMs, and Direct Preference Optimization (DPO) for learning preferences from multiple source LLMs. The resulting FuseChat-3.0 models demonstrated substantial improvements in tasks related to general conversation, instruction following, mathematics, and coding. Notably, when Llama-3.1-8B-Instruct served as the target LLM, our fusion approach achieved an average improvement of 6.8 points across 14 benchmarks. Moreover, it showed significant improvements of 37.1 and 30.1 points on instruction-following test sets AlpacaEval-2 and Arena-Hard respectively. We have released the FuseChat-3.0 models on Huggingface, stay tuned for the forthcoming dataset and code.
The Llama-Song-Stream-3B-Instruct is a fine-tuned language model specializing in generating music-related text, such as song lyrics, compositions, and musical thoughts. Built upon the meta-llama/Llama-3.2-3B-Instruct base, it has been trained with a custom dataset focused on song lyrics and music compositions to produce context-aware, creative, and stylized music output.
Llama-Chat-Summary-3.2-3B is a fine-tuned model designed for generating context-aware summaries of long conversational or text-based inputs. Built on the meta-llama/Llama-3.2-3B-Instruct foundation, this model is optimized to process structured and unstructured conversational data for summarization tasks.
FastLlama is a highly optimized version of the Llama-3.2-1B-Instruct model. Designed for superior performance in constrained environments, it combines speed, compactness, and high accuracy. This version has been fine-tuned using the MetaMathQA-50k section of the HuggingFaceTB/smoltalk dataset to enhance its mathematical reasoning and problem-solving abilities.
The Codepy 3B Deep Think Model is a fine-tuned version of the meta-llama/Llama-3.2-3B-Instruct base model, designed for text generation tasks that require deep reasoning, logical structuring, and problem-solving. This model leverages its optimized architecture to provide accurate and contextually relevant outputs for complex queries, making it ideal for applications in education, programming, and creative writing. With its robust natural language processing capabilities, Codepy 3B Deep Think excels in generating step-by-step solutions, creative content, and logical analyses. Its architecture integrates advanced understanding of both structured and unstructured data, ensuring precise text generation aligned with user inputs.
The Llama-Deepsync-3B-GGUF is a fine-tuned version of the Llama-3.2-3B-Instruct base model, designed for text generation tasks that require deep reasoning, logical structuring, and problem-solving. This model leverages its optimized architecture to provide accurate and contextually relevant outputs for complex queries, making it ideal for applications in education, programming, and creative writing.
Dolphin 3.0 is the next generation of the Dolphin series of instruct-tuned models. Designed to be the ultimate general purpose local model, enabling coding, math, agentic, function calling, and general use cases. Dolphin aims to be a general purpose model, similar to the models behind ChatGPT, Claude, Gemini. But these models present problems for businesses seeking to include AI in their products. They maintain control of the system prompt, deprecating and changing things as they wish, often causing software to break. They maintain control of the model versions, sometimes changing things silently, or deprecating older models that your business relies on. They maintain control of the alignment, and in particular the alignment is one-size-fits all, not tailored to the application. They can see all your queries and they can potentially use that data in ways you wouldn't want. Dolphin, in contrast, is steerable and gives control to the system owner. You set the system prompt. You decide the alignment. You have control of your data. Dolphin does not impose its ethics or guidelines on you. You are the one who decides the guidelines. Dolphin belongs to YOU, it is your tool, an extension of your will. Just as you are personally responsible for what you do with a knife, gun, fire, car, or the internet, you are the creator and originator of any content you generate with Dolphin.
Dolphin 3.0 is the next generation of the Dolphin series of instruct-tuned models. Designed to be the ultimate general purpose local model, enabling coding, math, agentic, function calling, and general use cases. Dolphin aims to be a general purpose model, similar to the models behind ChatGPT, Claude, Gemini. But these models present problems for businesses seeking to include AI in their products. They maintain control of the system prompt, deprecating and changing things as they wish, often causing software to break. They maintain control of the model versions, sometimes changing things silently, or deprecating older models that your business relies on. They maintain control of the alignment, and in particular the alignment is one-size-fits all, not tailored to the application. They can see all your queries and they can potentially use that data in ways you wouldn't want. Dolphin, in contrast, is steerable and gives control to the system owner. You set the system prompt. You decide the alignment. You have control of your data. Dolphin does not impose its ethics or guidelines on you. You are the one who decides the guidelines. Dolphin belongs to YOU, it is your tool, an extension of your will. Just as you are personally responsible for what you do with a knife, gun, fire, car, or the internet, you are the creator and originator of any content you generate with Dolphin.
This is the newer checkpoint of MiniThinky-1B-Llama-3.2 (version 1), which the loss decreased from 0.7 to 0.5
This is a continual-pre-training of Llama-3.2-3B on a mix of ๐ FineMath (our new high quality math dataset) and FineWeb-Edu. The model demonstrates superior math performance compared to Llama 3.2 3B, while maintaining similar performance on knowledge, reasoning, and common sense benchmarks. It was trained on 160B tokens using a mix of 40% FineWeb-Edu and 60% from FineMath (30% FineMath-4+ subset and 30% InfiWebMath-4+ subset). We use nanotron for the training, and you can find the training scripts in our SmolLM2 GitHub repo.
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.
In August 2024, we released the first series of mathematical LLMs - Qwen2-Math - of our Qwen family. A month later, we have upgraded it and open-sourced Qwen2.5-Math series, including base models Qwen2.5-Math-1.5B/7B/72B, instruction-tuned models Qwen2.5-Math-1.5B/7B/72B-Instruct, and mathematical reward model Qwen2.5-Math-RM-72B. Unlike Qwen2-Math series which only supports using Chain-of-Thught (CoT) to solve English math problems, Qwen2.5-Math series is expanded to support using both CoT and Tool-integrated Reasoning (TIR) to solve math problems in both Chinese and English. The Qwen2.5-Math series models have achieved significant performance improvements compared to the Qwen2-Math series models on the Chinese and English mathematics benchmarks with CoT. The base models of Qwen2-Math are initialized with Qwen2-1.5B/7B/72B, and then pretrained on a meticulously designed Mathematics-specific Corpus. This corpus contains large-scale high-quality mathematical web texts, books, codes, exam questions, and mathematical pre-training data synthesized by Qwen2.
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Uncensored qwen2.5
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). For Qwen2.5-Coder, we release three base language models and instruction-tuned language models, 1.5, 7 and 32 (coming soon) billion parameters. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: Significantly improvements in code generation, code reasoning and code fixing. Base on the strong Qwen2.5, we scale up the training tokens into 5.5 trillion including source code, text-code grounding, Synthetic data, etc. A more comprehensive foundation for real-world applications such as Code Agents. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies. Long-context Support up to 128K tokens.
In August 2024, we released the first series of mathematical LLMs - Qwen2-Math - of our Qwen family. A month later, we have upgraded it and open-sourced Qwen2.5-Math series, including base models Qwen2.5-Math-1.5B/7B/72B, instruction-tuned models Qwen2.5-Math-1.5B/7B/72B-Instruct, and mathematical reward model Qwen2.5-Math-RM-72B. Unlike Qwen2-Math series which only supports using Chain-of-Thught (CoT) to solve English math problems, Qwen2.5-Math series is expanded to support using both CoT and Tool-integrated Reasoning (TIR) to solve math problems in both Chinese and English. The Qwen2.5-Math series models have achieved significant performance improvements compared to the Qwen2-Math series models on the Chinese and English mathematics benchmarks with CoT
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.
BigQwen2.5-52B-Instruct is a Qwen/Qwen2-32B-Instruct self-merge made with MergeKit. It applies the mlabonne/Meta-Llama-3-120B-Instruct recipe.
Replete-LLM-V2.5-Qwen-14b is a continues finetuned version of Qwen2.5-14B. I noticed recently that the Qwen team did not learn from my methods of continuous finetuning, the great benefits, and no downsides of it. So I took it upon myself to merge the instruct model with the base model myself using the Ties merge method This version of the model shows higher performance than the original instruct and base models.
Replete-LLM-V2.5-Qwen-7b is a continues finetuned version of Qwen2.5-14B. I noticed recently that the Qwen team did not learn from my methods of continuous finetuning, the great benefits, and no downsides of it. So I took it upon myself to merge the instruct model with the base model myself using the Ties merge method This version of the model shows higher performance than the original instruct and base models.
This model is a fine-tuned version of the powerful Qwen/Qwen2.5-72B-Instruct, pushing the boundaries of natural language understanding and generation even further. My goal was to create a versatile and robust model that excels across a wide range of benchmarks and real-world applications. Use Cases This model is suitable for a wide range of applications, including but not limited to: Advanced question-answering systems Intelligent chatbots and virtual assistants Content generation and summarization Code generation and analysis Complex problem-solving and decision support
Trained for roleplay uses.
Rombos-LLM-V2.5.1-Qwen-3b is a little experiment that merges a high-quality LLM, arcee-ai/raspberry-3B, using the last step of the Continuous Finetuning method outlined in a Google document. The merge is done using the mergekit with the following parameters: - Models: Qwen2.5-3B-Instruct, raspberry-3B - Merge method: ties - Base model: Qwen2.5-3B - Parameters: weight=1, density=1, normalize=true, int8_mask=true - Dtype: bfloat16 The model has been evaluated on various tasks and datasets, and the results are available on the Open LLM Leaderboard. The model has shown promising performance across different benchmarks.
Qwen 2.5 fine-tuned on CoT to match o1 performance. An attempt to build an Open o1 mathcing OpenAI o1 model Demo: https://huggingface.co/spaces/happzy2633/open-o1
Arcee-SuperNova-Medius is a 14B parameter language model developed by Arcee.ai, built on the Qwen2.5-14B-Instruct architecture. This unique model is the result of a cross-architecture distillation pipeline, combining knowledge from both the Qwen2.5-72B-Instruct model and the Llama-3.1-405B-Instruct model. By leveraging the strengths of these two distinct architectures, SuperNova-Medius achieves high-quality instruction-following and complex reasoning capabilities in a mid-sized, resource-efficient form. SuperNova-Medius is designed to excel in a variety of business use cases, including customer support, content creation, and technical assistance, while maintaining compatibility with smaller hardware configurations. Itโs an ideal solution for organizations looking for advanced capabilities without the high resource requirements of larger models like our SuperNova-70B.
A RP/storywriting specialist model, full-parameter finetune of Qwen2.5-14B on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and "flavor" of the resulting model.
CursorCore is a series of open-source models designed for AI-assisted programming. It aims to support features such as automated editing and inline chat, replicating the core abilities of closed-source AI-assisted programming tools like Cursor. This is achieved by aligning data generated through Programming-Instruct. Please read our paper to learn more.
CursorCore is a series of open-source models designed for AI-assisted programming. It aims to support features such as automated editing and inline chat, replicating the core abilities of closed-source AI-assisted programming tools like Cursor. This is achieved by aligning data generated through Programming-Instruct. Please read our paper to learn more.
EdgeRunner-Command-Nested is an advanced large language model designed specifically for handling complex nested function calls. Initialized from Qwen2.5-7B-Instruct, further enhanced by the integration of the Hermes function call template and additional training on a specialized dataset (based on TinyAgent). This extra dataset focuses on personal domain applications, providing the model with a robust understanding of nested function scenarios that are typical in complex user interactions.
TSUNAMI: Transformative Semantic Understanding and Natural Augmentation Model for Intelligence. TSUNAMI full name was created by ChatGPT. infomation Tsunami-0.5x-7B-Instruct is Thai Large Language Model that fine-tuned from Qwen2.5-7B around 100,000 rows in Thai dataset.
This model was merged using the TIES merge method using Qwen/Qwen2.5-7B as a base. The following models were included in the merge: c10x/CoT-2.5 EVA-UNIT-01/EVA-Qwen2.5-7B-v0.1 huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2 Cran-May/T.E-8.1
Meissa is designated Lambda Orionis, forms Orion's head, and is a multiple star with a combined apparent magnitude of 3.33. Its name means the "shining one". This model is fine tuned over writing and role playing datasets (maybe the first on qwen2.5-7b), aiming to enhance model's performance in novel writing and roleplaying. The model is fine-tuned over Orion-zhen/Qwen2.5-7B-Instruct-Uncensored
This model is an experimental version of our latest innovation: MGS. Its up to you to figure out what does it means, but its very explicit. We didn't applied our known UNA algorithm to the forward pass, but they are entirely compatible and operates in different parts of the neural network and in different ways, tho they both can be seen as a regularization technique. Updated tokenizer_config.json (from the base_model) Regenerated Quants (being uploaded) Re-submitted Leaderboard Evaluation, MATH & IFeval have relevant updates Aligned LICENSE with Qwen terms. MGS stands for... Many-Geeks-Searching... and thats it. Hint: 1+1 is 2, and 1+1 is not 3 We still believe on 1-Epoch should be enough, so we just did 1 Epoch only. Dataset Used here the first decent (corpora & size) dataset on the hub: Magpie-Align/Magpie-Pro-300K-Filtered Kudos to the Magpie team to contribute with some decent stuff that I personally think is very good to ablate. It achieves the following results on the evaluation set: Loss: 0.5378 (1 Epoch), outperforming the baseline model.
Arcee Meraj Mini is a quantized version of the Meraj-Mini model, created using llama.cpp. It is an open-source model that is fine-tuned from the Qwen2.5-7B-Instruct model and is designed for both Arabic and English languages. The model has undergone evaluations across multiple benchmarks in both languages and demonstrates top-tier performance in Arabic and competitive results in English. The key stages in its development include data preparation, initial training, iterative training and post-training, evaluation, and final model creation. The model is capable of solving a wide range of language tasks and is suitable for various applications such as education, mathematics and coding, customer service, and content creation. The Arcee Meraj Mini model consistently outperforms state-of-the-art models on most benchmarks of the Open Arabic LLM Leaderboard (OALL), highlighting its improvements and effectiveness in Arabic language content.
Model stock merge for fun. This model was merged using the Model Stock merge method using rombodawg/Rombos-LLM-V2.5-Qwen-72b as a base. The following models were included in the merge: - anthracite-org/magnum-v4-72b - AXCXEPT/EZO-Qwen2.5-72B-Instruct
WhiteRabbitNeo is a model series that can be used for offensive and defensive cybersecurity. Models are now getting released as a public preview of its capabilities, and also to assess the societal impact of such an AI.
Here we use our novel approach called MGS. Its up to you to figure out what it means. Cybertron V4 went thru SFT over Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1
Q25-1.5B-Veo Lu is a tiny General-Purpose Creative model, made up of a merge of bespoke finetunes on Qwen 2.5-1.5B-Instruct. Inspired by the success of MN-12B-Mag Mell and MS-Meadowlark-22B, Veo Lu was trained on a healthy, balanced diet of of Internet fiction, roleplaying, adventuring, and reasoning/general knowledge. The components of Veo Lu are: Bard (pretrain, writing): Fujin (Cleaned/extended Rosier) Scribe (pretrain, roleplay): Creative Writing Multiturn Cartographer (pretrain, adventuring): SpringDragon Alchemist (SFT, science/reasoning): ScienceQA, MedquadQA, Orca Math Word Problems This model is capable of carrying on a scene without going completely off the rails. That being said, it only has 1.5B parameters. So please, for the love of God, manage your expectations. Since it's Qwen, use ChatML formatting. Turn the temperature down to ~0.7-0.8 and try a dash of rep-pen.
The following models were included in the merge: rombodawg/Rombos-LLM-V2.5-Qwen-72b abacusai/Dracarys2-72B-Instruct EVA-UNIT-01/EVA-Qwen2.5-72B-v0.0 ZeusLabs/Chronos-Platinum-72B m8than/banana-2-b-72b
A RP/storywriting specialist model, full-parameter finetune of Qwen2.5-14B on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and "flavor" of the resulting model. Version notes for 0.2: Now using the refined dataset from 32B 0.2. Major improvements in coherence, instruction following and long-context comprehension over 14B v0.1. Prompt format is ChatML.
The model is based on SuperNova-Medius (as the current best 14B model) with a 128k context with an emphasis on creativity, including NSFW and multi-turn conversations.
A continued pretrain of SuperNova-Medius on assorted short story data from the web. Supernova already had a nice prose, but diversifying it a bit definitely doesn't hurt. Also, finally a storywriter model with enough context for something more than a short story, that's also nice. It's a fair bit more temperamental than Gemma, but can be tamed with some sampling. Instruction following also stayed rather strong, so it works for both RP and storywriting, both in chat mode via back-and-forth co-writing and on raw completion. Overall, I'd say it successfully transfers the essence of what I liked about Gemma Sugarquill. I will also make a Qwen version of Aletheia, but with a brand new LoRA, based on a brand new RP dataset that's in the making right now. Model was trained by Auri.
This model is an advanced iteration of the powerful Qwen/Qwen2.5-3B, fine-tuned specifically to enhance its capabilities across general domains in both French and English.
This model is an advanced iteration of the powerful Qwen/Qwen2.5-3B, fine-tuned specifically to enhance its capabilities across general domains in both French and English.
This model is an advanced iteration of the powerful Qwen/Qwen2.5-3B, fine-tuned specifically to enhance its capabilities across general domains in both French and English.
This model is an advanced iteration of the powerful Qwen/Qwen2.5-3B, specifically fine-tuned to enhance its capabilities in French Legal domain.
This model is an advanced iteration of the powerful Qwen/Qwen2.5-3B, specifically fine-tuned to enhance its capabilities in French Legal domain.
This model is an advanced iteration of the powerful Qwen/Qwen2.5-3B, specifically fine-tuned to enhance its capabilities in French Legal domain.
A RP/storywriting specialist model, full-parameter finetune of Qwen2.5-72B on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and "flavor" of the resulting model. Dedicated to Nev. Version notes for 0.1: Reprocessed dataset (via Cahvay for 32B 0.2, used here as well), readjusted training config for 8xH100 SXM. Significant improvements in instruction following, long context understanding and overall coherence over v0.0.
Yet Another merge, this one for AuriAetherwiing, at their request. This is a merge of pre-trained language models created using mergekit. The following models were included in the merge: EVA-UNIT-01/EVA-Qwen2.5-14B-v0.1 v000000/Qwen2.5-Lumen-14B arcee-ai/SuperNova-Medius
RPMax is a series of models that are trained on a diverse set of curated creative writing and RP datasets with a focus on variety and deduplication. This model is designed to be highly creative and non-repetitive by making sure no two entries in the dataset have repeated characters or situations, which makes sure the model does not latch on to a certain personality and be capable of understanding and acting appropriately to any characters or situations. Many RPMax users mentioned that these models does not feel like any other RP models, having a different writing style and generally doesn't feel in-bred.
This model is my fist attempt at a 72b model as usual my goal is to merge the robust storytelling of mutiple models while attempting to maintain intelligence. Merge of: - model: EVA-UNIT-01/EVA-Qwen2.5-72B-v0.1 - model: ZeusLabs/Chronos-Platinum-72B - model: shuttleai/shuttle-3
Athene-V2-Agent is an open-source Agent LLM that surpasses the state-of-the-art in function calling and agentic capabilities. ๐ช Versatile Agent Capability: Athene-V2-Agent is an agent model, capable of operating in environments with deeply nested dependencies with the environment. It is capable of reasoning and doing planning for trajectories with many tool calls necessary to answer a single query. ๐ Performance Highlights: Athene-V2-Agent surpasses GPT-4o in single FC tasks by 18% in function calling success rates, and by 17% in Agentic success rates. ๐ง Generalization to the Unseen: Athene-V2-Agent has never been trained on the functions or agentic settings used in evaluation.
We introduce Athene-V2-Chat-72B, an open-weights LLM on-par with GPT-4o across benchmarks. It is trained through RLHF with Qwen-2.5-72B-Instruct as base model. Athene-V2-Chat-72B excels in chat, math, and coding. Its sister model, Athene-V2-Agent-72B, surpasses GPT-4o in complex function calling and agentic applications.
Model created by analyzing and selecting the optimal layers from other Qwen2.5-7B models based on their dimensional utilization efficiency, measured by the Normalized Effective Rank (NER). Computed like: Input: Weight matrix for each model layer Compute singular values ฯแตข where ฯแตข โฅ 0 # ฯแตข represents the importance of each dimension Filter values above numerical threshold (>1e-12) Sum all singular values: S = ฮฃฯแตข # S acts as normalization factor Create probability distribution: pแตข = ฯแตข/S # converts singular values to probabilities summing to 1 Compute Shannon entropy: H = -ฮฃ(pแตข * logโ(pแตข)) # measures information content Calculate maximum possible entropy: H_max = logโ(n) Final NER score = H/H_max # normalizes score to [0,1] range Results in value between 0 and 1 for each model layer
This 72B parameter model is a merge of Nexusflow/Athene-V2-Chat with EVA-UNIT-01/EVA-Qwen2.5-72B-v0.1. See the merge recipe below for details. This model is uncensored. You are responsible for whatever you do with it. This model was designed for roleplaying and storytelling and I think it does well at both. It may also perform well at other tasks but I have not tested its performance in other areas.
Trained with Magpie-Align/Magpie-Pro-MT-300K-v0.1 Using MGS & UNA (MLP) on this tiny but powerful model.
This model was merged using the passthrough merge method using bunnycore/Qwen2.5-3B-RP-Mix + bunnycore/Qwen2.5-3b-Smart-lora_model as a base.
SteyrCannon-0.2 is an updated revision from the original SteyrCannon. This uses EVA-Qwen2.5-72B-v0.2. Nothing else has changed.This model was merged using the TIES merge method using EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2 as a base. The following models were included in the merge: anthracite-org/magnum-v4-72b EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2
This model is my second attempt at a 72b model, as usual, my goal is to merge the robust storytelling of mutiple models while attempting to maintain intelligence. models: - model: EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2 - model: ZeusLabs/Chronos-Platinum-72B - model: shuttleai/shuttle-3
A RP/storywriting specialist model, full-parameter finetune of Qwen2.5-72B on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and "flavor" of the resulting model. Version notes for 0.2: Optimized training hyperparameters and increased sequence length. Better instruction following deeper into context and less repetition.
QwQ-32B-Preview is an experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities. As a preview release, it demonstrates promising analytical abilities while having several important limitations: Language Mixing and Code-Switching: The model may mix languages or switch between them unexpectedly, affecting response clarity. Recursive Reasoning Loops: The model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer. Safety and Ethical Considerations: The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it. Performance and Benchmark Limitations: The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding.
Slush is a two-stage model trained with high LoRA dropout, where stage 1 is a pretraining continuation on the base model, aimed at boosting the model's creativity and writing capabilities. This is then merged into the instruction tune model, and stage 2 is a fine tuning step on top of this to further enhance its roleplaying capabilities and/or to repair any damage caused in the stage 1 merge. This is still early stage. As always, feedback is welcome, and begone if you demand perfection. The second stage, like the Sunfall series, follows the Silly Tavern preset (ChatML), so ymmv in particular if you use some other tool and/or preset.
This model was merged using the DARE TIES merge method using Qwen/Qwen2.5-14B as a base. The following models were included in the merge: allknowingroger/Qwenslerp2-14B rombodawg/Rombos-LLM-V2.6-Qwen-14b VAGOsolutions/SauerkrautLM-v2-14b-DPO CultriX/Qwen2.5-14B-Wernicke
A replication attempt of Tulu 3 on the Qwen 2.5 base models.
ZeroXClem/Qwen2.5-7B-HomerCreative-Mix is an advanced language model meticulously crafted by merging four pre-trained models using the powerful mergekit framework. This fusion leverages the Model Stock merge method to combine the creative prowess of Qandora, the instructive capabilities of Qwen-Instruct-Fusion, the sophisticated blending of HomerSlerp1, and the foundational conversational strengths of Homer-v0.5-Qwen2.5-7B. The resulting model excels in creative text generation, contextual understanding, and dynamic conversational interactions. ๐ Merged Models This model merge incorporates the following: bunnycore/Qandora-2.5-7B-Creative: Specializes in creative text generation, enhancing the model's ability to produce imaginative and diverse content. bunnycore/Qwen2.5-7B-Instruct-Fusion: Focuses on instruction-following capabilities, improving the model's performance in understanding and executing user commands. allknowingroger/HomerSlerp1-7B: Utilizes spherical linear interpolation (SLERP) to blend model weights smoothly, ensuring a harmonious integration of different model attributes. newsbang/Homer-v0.5-Qwen2.5-7B: Acts as the foundational conversational model, providing robust language comprehension and generation capabilities.
This model was merged using the TIES merge method using rombodawg/Rombos-LLM-V2.5-Qwen-7b as a base. Models Merged fblgit/cybertron-v4-qw7B-UNAMGS + bunnycore/Qwen-2.1-7b-Persona-lora_model fblgit/cybertron-v4-qw7B-MGS + bunnycore/Qwen-2.1-7b-Persona-lora_model
This model is currently ranked #1 on the Open LLM Leaderboard among models up to 13B parameters! Merge Method This model was merged using the SLERP merge method. Models Merged The following models were included in the merge: ZeroXClem/Qwen2.5-7B-HomerAnvita-NerdMix ZeroXClem/Qwen2.5-7B-HomerCreative-Mix
The Math IIO 7B Instruct is a fine-tuned language model based on the robust Qwen2.5-7B-Instruct architecture. This model has been specifically trained to excel in single-shot mathematical reasoning and instruction-based tasks, making it a reliable choice for educational, analytical, and problem-solving applications. Key Features: Math-Optimized Capabilities: The model is designed to handle complex mathematical problems, step-by-step calculations, and reasoning tasks. Instruction-Tuned: Fine-tuned for better adherence to structured queries and task-oriented prompts, enabling clear and concise outputs. Large Vocabulary: Equipped with an extensive tokenizer configuration and custom tokens to ensure precise mathematical notation support.
Virtuoso-Small is the debut public release of the Virtuoso series of models by Arcee.ai, designed to bring cutting-edge generative AI capabilities to organizations and developers in a compact, efficient form. With 14 billion parameters, Virtuoso-Small is an accessible entry point for high-quality instruction-following, complex reasoning, and business-oriented generative AI tasks.
ZeroXClem/Qwen2.5-7B-HomerAnvita-NerdMix is an advanced language model meticulously crafted by merging five pre-trained models using the powerful mergekit framework. This fusion leverages the Model Stock merge method to combine the creative prowess of Qandora, the instructive capabilities of Qwen-Instruct-Fusion, the sophisticated blending of HomerSlerp1, the mathematical precision of Cybertron-MGS, and the uncensored expertise of Qwen-Nerd. The resulting model excels in creative text generation, contextual understanding, technical reasoning, and dynamic conversational interactions.
This Qwen 2.5 model was trained 2x faster with Unsloth and Huggingface's TRL library. Fine-tuned it for 400 steps on garage-bAInd/Open-Platypus with a batch size of 3.
Sailor2 is a community-driven initiative that brings cutting-edge multilingual language models to South-East Asia (SEA). Our research highlights a strong demand for models in the 8B and 20B parameter range for production use, alongside 1B models for specialized applications, such as speculative decoding and research purposes. These models, released under the Apache 2.0 license, provide enhanced accessibility to advanced language technologies across the region. Sailor2 builds upon the foundation of the awesome multilingual model Qwen 2.5 and is continuously pre-trained on 500B tokens to support 15 languages better with a unified model. These languages include English, Chinese, Burmese, Cebuano, Ilocano, Indonesian, Javanese, Khmer, Lao, Malay, Sundanese, Tagalog, Thai, Vietnamese, and Waray. By addressing the growing demand for diverse, robust, and accessible language models, Sailor2 seeks to serve the underserved in SEA areas with open, inclusive, and accessible multilingual LLMs. The Sailor2 model comes in three sizes, 1B, 8B, and 20B, which are expanded from the Qwen2.5 base models of 0.5B, 7B, and 14B, respectively.
Sailor2 is a community-driven initiative that brings cutting-edge multilingual language models to South-East Asia (SEA). Our research highlights a strong demand for models in the 8B and 20B parameter range for production use, alongside 1B models for specialized applications, such as speculative decoding and research purposes. These models, released under the Apache 2.0 license, provide enhanced accessibility to advanced language technologies across the region. Sailor2 builds upon the foundation of the awesome multilingual model Qwen 2.5 and is continuously pre-trained on 500B tokens to support 15 languages better with a unified model. These languages include English, Chinese, Burmese, Cebuano, Ilocano, Indonesian, Javanese, Khmer, Lao, Malay, Sundanese, Tagalog, Thai, Vietnamese, and Waray. By addressing the growing demand for diverse, robust, and accessible language models, Sailor2 seeks to serve the underserved in SEA areas with open, inclusive, and accessible multilingual LLMs. The Sailor2 model comes in three sizes, 1B, 8B, and 20B, which are expanded from the Qwen2.5 base models of 0.5B, 7B, and 14B, respectively.
Sailor2 is a community-driven initiative that brings cutting-edge multilingual language models to South-East Asia (SEA). Our research highlights a strong demand for models in the 8B and 20B parameter range for production use, alongside 1B models for specialized applications, such as speculative decoding and research purposes. These models, released under the Apache 2.0 license, provide enhanced accessibility to advanced language technologies across the region. Sailor2 builds upon the foundation of the awesome multilingual model Qwen 2.5 and is continuously pre-trained on 500B tokens to support 15 languages better with a unified model. These languages include English, Chinese, Burmese, Cebuano, Ilocano, Indonesian, Javanese, Khmer, Lao, Malay, Sundanese, Tagalog, Thai, Vietnamese, and Waray. By addressing the growing demand for diverse, robust, and accessible language models, Sailor2 seeks to serve the underserved in SEA areas with open, inclusive, and accessible multilingual LLMs. The Sailor2 model comes in three sizes, 1B, 8B, and 20B, which are expanded from the Qwen2.5 base models of 0.5B, 7B, and 14B, respectively.
I do not really have anything planned for this model other than it being a generalist, and Roleplay Model? It was just something made and planned in minutes. Same with the 14 and 32B version. Kunou's the name of an OC I worked on for a couple of years, for a... fanfic. mmm... A kind-of successor to L3-70B-Euryale-v2.2 in all but name? I'm keeping Stheno/Euryale lineage to Llama series for now. I had a version made on top of Nemotron, a supposed Euryale 2.4 but that flopped hard, it was not my cup of tea. This version is basically a better, more cleaned up Dataset used on Euryale and Stheno.
This 72B parameter model is a merge of sophosympatheia/Evathene-v1.1 and sophosympatheia/Evathene-v1.2. See the merge recipe below for details.
We present FuseChat-3.0, a series of models crafted to enhance performance by integrating the strengths of multiple source LLMs into more compact target LLMs. To achieve this fusion, we utilized four powerful source LLMs: Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, and Llama-3.1-70B-Instruct. For the target LLMs, we employed three widely-used smaller modelsโLlama-3.1-8B-Instruct, Gemma-2-9B-It, and Qwen-2.5-7B-Instructโalong with two even more compact modelsโLlama-3.2-3B-Instruct and Llama-3.2-1B-Instruct. The implicit model fusion process involves a two-stage training pipeline comprising Supervised Fine-Tuning (SFT) to mitigate distribution discrepancies between target and source LLMs, and Direct Preference Optimization (DPO) for learning preferences from multiple source LLMs. The resulting FuseChat-3.0 models demonstrated substantial improvements in tasks related to general conversation, instruction following, mathematics, and coding. Notably, when Llama-3.1-8B-Instruct served as the target LLM, our fusion approach achieved an average improvement of 6.8 points across 14 benchmarks. Moreover, it showed significant improvements of 37.1 and 30.1 points on instruction-following test sets AlpacaEval-2 and Arena-Hard respectively. We have released the FuseChat-3.0 models on Huggingface, stay tuned for the forthcoming dataset and code.
The Neumind-Math-7B-Instruct is a fine-tuned model based on Qwen2.5-7B-Instruct, optimized for mathematical reasoning, step-by-step problem-solving, and instruction-based tasks in the mathematics domain. The model is designed for applications requiring structured reasoning, numerical computations, and mathematical proof generation.
We're excited to unveil Qwen2-VL, the latest iteration of our Qwen-VL model, representing nearly a year of innovation. Key Enhancements: SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions. Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.
RP/Story hybrid model, merge of Sugarquill and Neon. As with Gemma version, I wanted to preserve Sugarquill's creative spark, while making the model more steerable for RP. It proved to be more difficult this time, but I quite like the result regardless, even if the model is still somewhat temperamental. Should work for both RP and storywriting, either on raw completion or with back-and-forth cowriting in chat mode. Seems to be quite sensitive to low depth instructions and samplers. Thanks to Toasty and Fizz for testing and giving feedback Model was created by Auri.
RP finetune of Supernova-Medius. Turned out surprisingly nice on it's own, I honestly made it only as a merge fuel, but it impressed me and Prodeus enough to release it separately (history repeats I guess, Sugarquill also started out this way). Quite interesting prose, definitely quite distinct from Supernova or EVA for that matter. Instruction following is decent as well. Not really much to say about this one, just a decent RP model, tbh. Euryale-inspired I guess.
miscii-14b-1028 is a 14-billion parameter language model based on the Qwen2.5-14B-Instruct model. It is designed for chat and conversational AI tasks, with a focus on role-based instructions.
The following models were included in the merge: sthenno/exp-002 sthenno/miscii-1218
Qwentile 2.5 32B Instruct is a normalized denoised fourier interpolation of the following models: - { "model": "AiCloser/Qwen2.5-32B-AGI", "base": "Qwen/Qwen2.5-32B", "alpha": 0.3 } - { "model": "EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2", "base": "Qwen/Qwen2.5-32B", "alpha": 0.7 } - { "model": "fblgit/TheBeagle-v2beta-32B-MGS", "base": "Qwen/Qwen2.5-32B", "alpha": 0.6 } - { "model": "huihui-ai/Qwen2.5-32B-Instruct-abliterated", "base": "Qwen/Qwen2.5-32B-Instruct", "alpha": 1.0 } - { "model": "huihui-ai/QwQ-32B-Preview-abliterated", "base": "Qwen/Qwen2.5-32B", "alpha": 1.0 } - { "model": "Qwen/QwQ-32B-Preview", "base": "Qwen/Qwen2.5-32B", "alpha": 0.8, "is_input": true } - { "model": "rombodawg/Rombos-LLM-V2.5-Qwen-32b", "base": "Qwen/Qwen2.5-32B", "alpha": 1.0, "is_output": true } - { "model": "nbeerbower/Qwen2.5-Gutenberg-Doppel-32B", "base": "Qwen/Qwen2.5-32B-Instruct", "alpha": 0.4 } I started my experiment because of QwQ is a really nifty model, but it was giving me problems with xml output - which is what I use for my thought tokens. So, I thought... lets just merge it in! The first model worked pretty well, but I got a sense that the balances could be tweaked. Why not throw in some other models as well for fun and see if I can't run out of disk space in the process?
The Katanemo Arch-Function collection of large language models (LLMs) is a collection state-of-the-art (SOTA) LLMs specifically designed for function calling tasks. The models are designed to understand complex function signatures, identify required parameters, and produce accurate function call outputs based on natural language prompts. Achieving performance on par with GPT-4, these models set a new benchmark in the domain of function-oriented tasks, making them suitable for scenarios where automated API interaction and function execution is crucial. In summary, the Katanemo Arch-Function collection demonstrates: State-of-the-art performance in function calling Accurate parameter identification and suggestion, even in ambiguous or incomplete inputs High generalization across multiple function calling use cases, from API interactions to automated backend tasks. Optimized low-latency, high-throughput performance, making it suitable for real-time, production environments.
The Katanemo Arch-Function collection of large language models (LLMs) is a collection state-of-the-art (SOTA) LLMs specifically designed for function calling tasks. The models are designed to understand complex function signatures, identify required parameters, and produce accurate function call outputs based on natural language prompts. Achieving performance on par with GPT-4, these models set a new benchmark in the domain of function-oriented tasks, making them suitable for scenarios where automated API interaction and function execution is crucial. In summary, the Katanemo Arch-Function collection demonstrates: State-of-the-art performance in function calling Accurate parameter identification and suggestion, even in ambiguous or incomplete inputs High generalization across multiple function calling use cases, from API interactions to automated backend tasks. Optimized low-latency, high-throughput performance, making it suitable for real-time, production environments.
The Katanemo Arch-Function collection of large language models (LLMs) is a collection state-of-the-art (SOTA) LLMs specifically designed for function calling tasks. The models are designed to understand complex function signatures, identify required parameters, and produce accurate function call outputs based on natural language prompts. Achieving performance on par with GPT-4, these models set a new benchmark in the domain of function-oriented tasks, making them suitable for scenarios where automated API interaction and function execution is crucial. In summary, the Katanemo Arch-Function collection demonstrates: State-of-the-art performance in function calling Accurate parameter identification and suggestion, even in ambiguous or incomplete inputs High generalization across multiple function calling use cases, from API interactions to automated backend tasks. Optimized low-latency, high-throughput performance, making it suitable for real-time, production environments.
Multilingual Qwen2-7B model trained on Roleplaying.
The QwQ-LCoT-7B-Instruct is a fine-tuned language model designed for advanced reasoning and instruction-following tasks. It leverages the Qwen2.5-7B base model and has been fine-tuned on the amphora/QwQ-LongCoT-130K dataset, focusing on chain-of-thought (CoT) reasoning.
There is a draft model to go with this one for speculative decoding and chain of thought reasoning: https://huggingface.co/nisten/qwen2.5-coder-7b-abliterated-128k-AWQ Using the above 4bit 7b in conjuction with the 36b is meant to setup a chain-of-thought reasoner, evaluator similar to what O1-O3 is probably doing. This way the 7b 4bit only uses up an extra 4-6Gb on the gpu, but greatly both speeds up speculative decoding AND also chain-of-throught evals.
QVQ-72B-Preview is an experimental research model developed by the Qwen team, focusing on enhancing visual reasoning capabilities. QVQ-72B-Preview has achieved remarkable performance on various benchmarks. It scored a remarkable 70.3% on the Multimodal Massive Multi-task Understanding (MMMU) benchmark, showcasing QVQ's powerful ability in multidisciplinary understanding and reasoning. Furthermore, the significant improvements on MathVision highlight the model's progress in mathematical reasoning tasks. OlympiadBench also demonstrates the model's enhanced ability to tackle challenging problems.
A roleplay-focused LoRA finetune of Teleut 7b. Methodology and hyperparams inspired by SorcererLM and Slush. Dataset: The worst mix of data you've ever seen. Like, seriously, you do not want to see the things that went into this model. It's bad.
A roleplay-focused LoRA finetune of Qwen 2.5 32b Instruct. Methodology and hyperparams inspired by SorcererLM and Slush. Yet another model in the Ink series, following in the footsteps of the Nemo one
The following models were included in the merge:s huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2 allura-org/TQ2.5-14B-Aletheia-v1 EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2 v000000/Qwen2.5-Lumen-14B
SmallThinker is designed for the following use cases: Edge Deployment: Its small size makes it ideal for deployment on resource-constrained devices. Draft Model for QwQ-32B-Preview: SmallThinker can serve as a fast and efficient draft model for the larger QwQ-32B-Preview model. From my test, in llama.cpp we can get 70% speedup (from 40 tokens/s to 70 tokens/s).
The following models were included in the merge: Kaoeiri/Qwenwify-32B-v3 allura-org/Qwen2.5-32b-RP-Ink Dans-DiscountModels/Qwen2.5-32B-ChatML Saxo/Linkbricks-Horizon-AI-Japanese-Base-32B OpenBuddy/openbuddy-qwq-32b-v24.2-200k Sao10K/32B-Qwen2.5-Kunou-v1
In this work, we introduce DRT-o1, an attempt to bring the success of long thought reasoning to neural machine translation (MT). To this end, ๐ We mine English sentences with similes or metaphors from existing literature books, which are suitable for translation via long thought. ๐ We propose a designed multi-agent framework with three agents (i.e., a translator, an advisor and an evaluator) to synthesize the MT samples with long thought. There are 22,264 synthesized samples in total. ๐ We train DRT-o1-8B, DRT-o1-7B and DRT-o1-14B using Llama-3.1-8B-Instruct, Qwen2.5-7B-Instruct and Qwen2.5-14B-Instruct as backbones. Our goal is not to achieve competitive performance with OpenAIโs O1 in neural machine translation (MT). Instead, we explore technical routes to bring the success of long thought to MT. To this end, we introduce DRT-o1, a byproduct of our exploration, and we hope it could facilitate the corresponding research in this direction.
This model is designed to improve the controllability and consistency of current roleplaying models. We developed a story flow thought chain approach that makes the system prompts combined with the entire user-BOT dialogue read like a first-person narrative told by the BOT. We found this design greatly enhances the model's consistency and expressiveness. Additionally, we allow users to play two roles simultaneously: one as the director of the entire plot (see Special Designs), and another as an actor dialoguing with the BOT. Users can be viewed as writers who need to draft outlines and plot summaries, while the BOT helps complete story details, requiring users to have powerful control over the BOT. The model's output is divided into two parts: the model's inner monologue (which it believes is invisible to users) and the final response. Overall, mirau features: Superior character consistency Powerful long-context memory capability Transparent thinking with hidden thought chains
I do not really have anything planned for this model other than it being a generalist, and Roleplay Model? It was just something made and planned in minutes. Same with the 14B and 72B version. Kunou's the name of an OC I worked on for a couple of years, for a... fanfic. mmm... A kind-of successor to L3-70B-Euryale-v2.2 in all but name? I'm keeping Stheno/Euryale lineage to Llama series for now. I had a version made on top of Nemotron, a supposed Euryale 2.4 but that flopped hard, it was not my cup of tea. This version is basically a better, more cleaned up Dataset used on Euryale and Stheno.
I do not really have anything planned for this model other than it being a generalist, and Roleplay Model? It was just something made and planned in minutes. This is the little sister variant, the small 14B version. Kunou's the name of an OC I worked on for a couple of years, for a... fanfic. mmm... A kind-of successor to my smaller model series. It works pretty nicely I think? This version is basically a better, more cleaned up Dataset used on Euryale and Stheno.
Dolphin 3.0 is the next generation of the Dolphin series of instruct-tuned models. Designed to be the ultimate general purpose local model, enabling coding, math, agentic, function calling, and general use cases. Dolphin aims to be a general purpose model, similar to the models behind ChatGPT, Claude, Gemini. But these models present problems for businesses seeking to include AI in their products. They maintain control of the system prompt, deprecating and changing things as they wish, often causing software to break. They maintain control of the model versions, sometimes changing things silently, or deprecating older models that your business relies on. They maintain control of the alignment, and in particular the alignment is one-size-fits all, not tailored to the application. They can see all your queries and they can potentially use that data in ways you wouldn't want. Dolphin, in contrast, is steerable and gives control to the system owner. You set the system prompt. You decide the alignment. You have control of your data. Dolphin does not impose its ethics or guidelines on you. You are the one who decides the guidelines. Dolphin belongs to YOU, it is your tool, an extension of your will. Just as you are personally responsible for what you do with a knife, gun, fire, car, or the internet, you are the creator and originator of any content you generate with Dolphin.
Dolphin 3.0 is the next generation of the Dolphin series of instruct-tuned models. Designed to be the ultimate general purpose local model, enabling coding, math, agentic, function calling, and general use cases. Dolphin aims to be a general purpose model, similar to the models behind ChatGPT, Claude, Gemini. But these models present problems for businesses seeking to include AI in their products. They maintain control of the system prompt, deprecating and changing things as they wish, often causing software to break. They maintain control of the model versions, sometimes changing things silently, or deprecating older models that your business relies on. They maintain control of the alignment, and in particular the alignment is one-size-fits all, not tailored to the application. They can see all your queries and they can potentially use that data in ways you wouldn't want. Dolphin, in contrast, is steerable and gives control to the system owner. You set the system prompt. You decide the alignment. You have control of your data. Dolphin does not impose its ethics or guidelines on you. You are the one who decides the guidelines. Dolphin belongs to YOU, it is your tool, an extension of your will. Just as you are personally responsible for what you do with a knife, gun, fire, car, or the internet, you are the creator and originator of any content you generate with Dolphin.
Dolphin 3.0 is the next generation of the Dolphin series of instruct-tuned models. Designed to be the ultimate general purpose local model, enabling coding, math, agentic, function calling, and general use cases. Dolphin aims to be a general purpose model, similar to the models behind ChatGPT, Claude, Gemini. But these models present problems for businesses seeking to include AI in their products. They maintain control of the system prompt, deprecating and changing things as they wish, often causing software to break. They maintain control of the model versions, sometimes changing things silently, or deprecating older models that your business relies on. They maintain control of the alignment, and in particular the alignment is one-size-fits all, not tailored to the application. They can see all your queries and they can potentially use that data in ways you wouldn't want. Dolphin, in contrast, is steerable and gives control to the system owner. You set the system prompt. You decide the alignment. You have control of your data. Dolphin does not impose its ethics or guidelines on you. You are the one who decides the guidelines. Dolphin belongs to YOU, it is your tool, an extension of your will. Just as you are personally responsible for what you do with a knife, gun, fire, car, or the internet, you are the creator and originator of any content you generate with Dolphin.
I decided to mess around with training methods again, considering the re-emegence of methods like multi-step training. Some people began doing it again, and so, why not? Inspired by AshhLimaRP's methology but done it my way. Freya-S1 LoRA Trained on ~1.1GB of literature and raw text over Qwen 2.5's base model. Cleaned text and literature as best as I could, still, may have had issues here and there. Freya-S2 The first LoRA was applied over Qwen 2.5 Instruct, then I trained on top of that. Reduced LoRA rank because it's mainly instruct and other details I won't get into.
HuatuoGPT-o1 is a medical LLM designed for advanced medical reasoning. It generates a complex thought process, reflecting and refining its reasoning, before providing a final response. For more information, visit our GitHub repository: https://github.com/FreedomIntelligence/HuatuoGPT-o1.