LocalAI model gallery list


🖼️ Available 428 models

Refer to the Model gallery for more information on how to use the models with LocalAI.
You can install models with the CLI command local-ai models install <model-name> or by using the WebUI.
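A minimal sketch of scripting that CLI call from Python, assuming the `local-ai` binary is on your PATH. The helper names `install_command` and `install` are hypothetical, and the sketch defaults to a dry run that only returns the command string instead of executing it.

```python
import shlex
import subprocess


def install_command(model_name: str) -> list[str]:
    """Build the argument list for `local-ai models install <model-name>`."""
    # Reject empty names and anything that would be parsed as a CLI flag.
    if not model_name or model_name.startswith("-"):
        raise ValueError(f"not a valid gallery model name: {model_name!r}")
    return ["local-ai", "models", "install", model_name]


def install(model_name: str, dry_run: bool = True) -> str:
    """Return the shell-quoted command; run it only when dry_run is False."""
    cmd = install_command(model_name)
    if not dry_run:
        # Assumes the local-ai binary is installed and reachable on PATH.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)
```

For example, `install("llama-3.2-1b-instruct:q4_k_m")` returns the command string for one of the gallery names listed on this page without running anything.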

moe-girl-1ba-7bt-i1

A finetune of OLMoE by AllenAI designed for roleplaying (and maybe general use cases if you try hard enough). PLEASE do not expect godliness out of this; it's a model with 1 billion active parameters. Expect something more akin to Gemma 2 2B, not Llama 3 8B.

salamandra-7b-instruct

Transformer-based decoder-only language model that has been pre-trained on 7.8 trillion tokens of highly curated data. The pre-training corpus contains text in 35 European languages and code. Salamandra comes in three different sizes — 2B, 7B and 40B parameters — with their respective base and instruction-tuned variants. This model card corresponds to the 7B instructed version.

llama-3.2-1b-instruct:q4_k_m

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model developer: Meta. Model architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

llama-3.2-3b-instruct:q4_k_m

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model developer: Meta. Model architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

llama-3.2-3b-instruct:q8_0

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model developer: Meta. Model architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

llama-3.2-1b-instruct:q8_0

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. Model developer: Meta. Model architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

versatillama-llama-3.2-3b-instruct-abliterated

Small but smart, fine-tuned on a vast dataset of conversations. Able to generate human-like text with high performance for its size. It is very versatile for its size and parameter count and offers capability almost as good as Llama 3.1 8B Instruct.

llama3.2-3b-enigma

Enigma is a code-instruct model built on Llama 3.2 3b. It is a high quality code instruct model with the Llama 3.2 Instruct chat format. The model is finetuned on synthetic code-instruct data generated with Llama 3.1 405b and supplemented with generalist synthetic data. It uses the Llama 3.2 Instruct prompt format.

llama3.2-3b-esper2

Esper 2 is a DevOps and cloud architecture code specialist built on Llama 3.2 3b. It is an AI assistant focused on AWS, Azure, GCP, Terraform, Dockerfiles, pipelines, shell scripts and more, with real world problem solving and high quality code instruct performance within the Llama 3.2 Instruct chat format. Finetuned on synthetic DevOps-instruct and code-instruct data generated with Llama 3.1 405b and supplemented with generalist chat data.

llama-3.2-3b-agent007

The model is a quantized version of EpistemeAI/Llama-3.2-3B-Agent007, developed by EpistemeAI and fine-tuned from unsloth/llama-3.2-3b-instruct-bnb-4bit. It was trained 2x faster with Unsloth and Huggingface's TRL library. Fine tuned with Agent datasets.

llama-3.2-3b-agent007-coder

The Llama-3.2-3B-Agent007-Coder-GGUF is a quantized version of the EpistemeAI/Llama-3.2-3B-Agent007-Coder model, which is a fine-tuned version of the unsloth/llama-3.2-3b-instruct-bnb-4bit model. It is created using llama.cpp and trained with additional datasets such as the Agent dataset, Code Alpaca 20K, and magpie ultra 0.1. This model is optimized for multilingual dialogue use cases and agentic retrieval and summarization tasks. The model is available for commercial and research use in multiple languages and is best used with the transformers library.

fireball-meta-llama-3.2-8b-instruct-agent-003-128k-code-dpo

This model is a quantized version of EpistemeAI/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO, an experimental and revolutionary fine-tune on a DPO dataset that allows Llama 3.1 8B to be an agentic coder. It has some built-in agent features such as search, calculator, and ReAct. Other noticeable features include self-learning using unsloth, RAG applications, and memory. The context window of the model is 128K. It can be integrated into projects using popular libraries like Transformers and vLLM. The model is suitable for use with LangChain or LlamaIndex. The model is developed by EpistemeAI and licensed under the Apache 2.0 license.

qwen2.5-14b-instruct

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.

qwen2.5-math-7b-instruct

In August 2024, we released the first series of mathematical LLMs - Qwen2-Math - of our Qwen family. A month later, we upgraded it and open-sourced the Qwen2.5-Math series, including base models Qwen2.5-Math-1.5B/7B/72B, instruction-tuned models Qwen2.5-Math-1.5B/7B/72B-Instruct, and the mathematical reward model Qwen2.5-Math-RM-72B. Unlike the Qwen2-Math series, which only supports using Chain-of-Thought (CoT) to solve English math problems, the Qwen2.5-Math series is expanded to support using both CoT and Tool-integrated Reasoning (TIR) to solve math problems in both Chinese and English. The Qwen2.5-Math series models have achieved significant performance improvements compared to the Qwen2-Math series models on the Chinese and English mathematics benchmarks with CoT. The base models of Qwen2-Math are initialized with Qwen2-1.5B/7B/72B, and then pretrained on a meticulously designed mathematics-specific corpus. This corpus contains large-scale high-quality mathematical web texts, books, code, exam questions, and mathematical pre-training data synthesized by Qwen2.

qwen2.5-14b_uncencored

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Uncensored qwen2.5

qwen2.5-coder-7b-instruct

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). For Qwen2.5-Coder, we release three base language models and instruction-tuned language models, at 1.5, 7 and 32 (coming soon) billion parameters. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: significant improvements in code generation, code reasoning and code fixing (based on the strong Qwen2.5, we scale up the training tokens to 5.5 trillion, including source code, text-code grounding, synthetic data, etc.); a more comprehensive foundation for real-world applications such as code agents, not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies; and long-context support up to 128K tokens.

qwen2.5-math-72b-instruct

In August 2024, we released the first series of mathematical LLMs - Qwen2-Math - of our Qwen family. A month later, we upgraded it and open-sourced the Qwen2.5-Math series, including base models Qwen2.5-Math-1.5B/7B/72B, instruction-tuned models Qwen2.5-Math-1.5B/7B/72B-Instruct, and the mathematical reward model Qwen2.5-Math-RM-72B. Unlike the Qwen2-Math series, which only supports using Chain-of-Thought (CoT) to solve English math problems, the Qwen2.5-Math series is expanded to support using both CoT and Tool-integrated Reasoning (TIR) to solve math problems in both Chinese and English. The Qwen2.5-Math series models have achieved significant performance improvements compared to the Qwen2-Math series models on the Chinese and English mathematics benchmarks with CoT.

qwen2.5-0.5b-instruct

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.

qwen2.5-1.5b-instruct

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.

qwen2.5-32b

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.

qwen2.5-32b-instruct

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.

qwen2.5-72b-instruct

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.

bigqwen2.5-52b-instruct

BigQwen2.5-52B-Instruct is a Qwen/Qwen2.5-32B-Instruct self-merge made with MergeKit. It applies the mlabonne/Meta-Llama-3-120B-Instruct recipe.

replete-llm-v2.5-qwen-14b

Replete-LLM-V2.5-Qwen-14b is a continuously finetuned version of Qwen2.5-14B. I noticed recently that the Qwen team did not learn from my methods of continuous finetuning, with its great benefits and no downsides. So I took it upon myself to merge the instruct model with the base model using the TIES merge method. This version of the model shows higher performance than the original instruct and base models.

replete-llm-v2.5-qwen-7b

Replete-LLM-V2.5-Qwen-7b is a continuously finetuned version of Qwen2.5-7B. I noticed recently that the Qwen team did not learn from my methods of continuous finetuning, with its great benefits and no downsides. So I took it upon myself to merge the instruct model with the base model using the TIES merge method. This version of the model shows higher performance than the original instruct and base models.

calme-2.2-qwen2.5-72b-i1

This model is a fine-tuned version of the powerful Qwen/Qwen2.5-72B-Instruct, pushing the boundaries of natural language understanding and generation even further. My goal was to create a versatile and robust model that excels across a wide range of benchmarks and real-world applications. Use cases: this model is suitable for a wide range of applications, including but not limited to advanced question-answering systems, intelligent chatbots and virtual assistants, content generation and summarization, code generation and analysis, and complex problem-solving and decision support.

t.e-8.1-iq-imatrix-request

Trained for roleplay uses.

rombos-llm-v2.5.1-qwen-3b

Rombos-LLM-V2.5.1-Qwen-3b is a little experiment that merges a high-quality LLM, arcee-ai/raspberry-3B, using the last step of the Continuous Finetuning method outlined in a Google document. The merge is done using mergekit with the following parameters: models: Qwen2.5-3B-Instruct, raspberry-3B; merge method: ties; base model: Qwen2.5-3B; parameters: weight=1, density=1, normalize=true, int8_mask=true; dtype: bfloat16. The model has been evaluated on various tasks and datasets, and the results are available on the Open LLM Leaderboard. The model has shown promising performance across different benchmarks.
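As a rough illustration, the merge parameters listed above could be arranged into a mergekit-style configuration like the one below. This is a sketch under assumptions: the key names follow mergekit's usual config layout, the placement of `weight` and `density` under each model (rather than globally) is a guess, the model names are copied from the description without full repository paths, and `rombos_merge_config` is a hypothetical helper. The config is printed as JSON so the sketch needs only the standard library.

```python
import json


def rombos_merge_config() -> dict:
    """Arrange the merge parameters from the description into one config dict."""
    # Assumption: weight and density apply to each merged model individually.
    per_model = {"weight": 1, "density": 1}
    return {
        "models": [
            {"model": "Qwen2.5-3B-Instruct", "parameters": dict(per_model)},
            {"model": "raspberry-3B", "parameters": dict(per_model)},
        ],
        "merge_method": "ties",
        "base_model": "Qwen2.5-3B",
        "parameters": {"normalize": True, "int8_mask": True},
        "dtype": "bfloat16",
    }


if __name__ == "__main__":
    print(json.dumps(rombos_merge_config(), indent=2))
```

To use something like this with the real tool, the model names would need to be replaced with full repository paths and the result checked against the mergekit documentation.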

qwen2.5-7b-ins-v3

Qwen 2.5 fine-tuned on CoT to match o1 performance. An attempt to build an open o1 matching the OpenAI o1 model. Demo: https://huggingface.co/spaces/happzy2633/open-o1

arch-function-1.5b

The Katanemo Arch-Function collection of large language models (LLMs) is a collection of state-of-the-art (SOTA) LLMs specifically designed for function calling tasks. The models are designed to understand complex function signatures, identify required parameters, and produce accurate function call outputs based on natural language prompts. Achieving performance on par with GPT-4, these models set a new benchmark in the domain of function-oriented tasks, making them suitable for scenarios where automated API interaction and function execution are crucial. In summary, the Katanemo Arch-Function collection demonstrates: state-of-the-art performance in function calling; accurate parameter identification and suggestion, even with ambiguous or incomplete inputs; high generalization across multiple function calling use cases, from API interactions to automated backend tasks; and optimized low-latency, high-throughput performance, making it suitable for real-time, production environments.

arch-function-7b

The Katanemo Arch-Function collection of large language models (LLMs) is a collection of state-of-the-art (SOTA) LLMs specifically designed for function calling tasks. The models are designed to understand complex function signatures, identify required parameters, and produce accurate function call outputs based on natural language prompts. Achieving performance on par with GPT-4, these models set a new benchmark in the domain of function-oriented tasks, making them suitable for scenarios where automated API interaction and function execution are crucial. In summary, the Katanemo Arch-Function collection demonstrates: state-of-the-art performance in function calling; accurate parameter identification and suggestion, even with ambiguous or incomplete inputs; high generalization across multiple function calling use cases, from API interactions to automated backend tasks; and optimized low-latency, high-throughput performance, making it suitable for real-time, production environments.

arch-function-3b

The Katanemo Arch-Function collection of large language models (LLMs) is a collection of state-of-the-art (SOTA) LLMs specifically designed for function calling tasks. The models are designed to understand complex function signatures, identify required parameters, and produce accurate function call outputs based on natural language prompts. Achieving performance on par with GPT-4, these models set a new benchmark in the domain of function-oriented tasks, making them suitable for scenarios where automated API interaction and function execution are crucial. In summary, the Katanemo Arch-Function collection demonstrates: state-of-the-art performance in function calling; accurate parameter identification and suggestion, even with ambiguous or incomplete inputs; high generalization across multiple function calling use cases, from API interactions to automated backend tasks; and optimized low-latency, high-throughput performance, making it suitable for real-time, production environments.

smollm-1.7b-instruct

SmolLM is a series of small language models available in three sizes: 135M, 360M, and 1.7B parameters. These models are pre-trained on SmolLM-Corpus, a curated collection of high-quality educational and synthetic data designed for training LLMs. For further details, we refer to our blogpost. To build SmolLM-Instruct, we finetuned the base models on publicly available datasets.

meta-llama-3.1-8b-instruct

The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. Model developer: Meta. Model architecture: Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

meta-llama-3.1-70b-instruct

The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. Model developer: Meta. Model architecture: Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

meta-llama-3.1-8b-instruct:grammar-functioncall

This is the standard Llama 3.1 8B Instruct model with grammar and function call enabled. When grammars are enabled in LocalAI, the LLM is forced to output valid tools constrained by BNF grammars. This can be useful for ensuring that the model outputs are valid and can be used in a production environment. For more information on how to use grammars in LocalAI, see https://localai.io/features/openai-functions/#advanced and https://localai.io/features/constrained_grammars/.
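A minimal sketch of what a function-calling exchange with such a model might look like, assuming LocalAI's OpenAI-compatible chat completions format. The `get_weather` tool, its schema, and the helper names `build_tool_request` and `extract_tool_calls` are hypothetical and only illustrate the request and response shapes; nothing is sent over the network here.

```python
import json


def build_tool_request(model: str, question: str) -> dict:
    """Build an OpenAI-style chat completion payload that offers one tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "tools": [
            {
                "type": "function",
                "function": {
                    # Hypothetical tool, used only for illustration.
                    "name": "get_weather",
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        "tool_choice": "auto",
    }


def extract_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Return (function name, parsed arguments) for each tool call in a reply."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # Arguments arrive as a JSON string; grammar constraints are meant
        # to keep that string parseable.
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls
```

The payload would typically be POSTed as JSON to the server's `/v1/chat/completions` endpoint (for example on `http://localhost:8080`, an assumed address), and the parsed reply passed to `extract_tool_calls`.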

meta-llama-3.1-8b-instruct:Q8_grammar-functioncall

This is the standard Llama 3.1 8B Instruct model with grammar and function call enabled. When grammars are enabled in LocalAI, the LLM is forced to output valid tools constrained by BNF grammars. This can be useful for ensuring that the model outputs are valid and can be used in a production environment. For more information on how to use grammars in LocalAI, see https://localai.io/features/openai-functions/#advanced and https://localai.io/features/constrained_grammars/.

meta-llama-3.1-8b-claude-imat

Meta-Llama-3.1-8B-Claude-iMat-GGUF: Quantized from Meta-Llama-3.1-8B-Claude fp16. Weighted quantizations were created using fp16 GGUF and groups_merged.txt in 88 chunks and n_ctx=512. Static fp16 will also be included in the repo. For a brief rundown of iMatrix quant performance, please see this PR. All quants are verified working prior to uploading to the repo for your safety and convenience.

meta-llama-3.1-8b-instruct-abliterated

This is an uncensored version of Llama 3.1 8B Instruct created with abliteration.

llama-3.1-70b-japanese-instruct-2407

The Llama-3.1-70B-Japanese-Instruct-2407-gguf model is a Japanese language model that uses the Instruct prompt tuning method. It is based on the LLaMa-3.1-70B model and has been fine-tuned on the imatrix dataset for Japanese. The model is trained to generate informative and coherent responses to given instructions or prompts. It is available in the gguf format and can be used for a variety of tasks such as question answering, text generation, and more.

openbuddy-llama3.1-8b-v22.1-131k

OpenBuddy - Open Multilingual Chatbot

llama3.1-8b-fireplace2

Fireplace 2 is a chat model, adding helpful structured outputs to Llama 3.1 8b Instruct. It is an expansion pack of supplementary outputs that you can request at will within your chat: inline function calls, SQL queries, JSON objects, and data visualization with matplotlib. Mix normal chat and structured outputs within the same conversation. Fireplace 2 supplements the existing strengths of Llama 3.1, providing inline capabilities within the Llama 3 Instruct format. Version: this is the 2024-07-23 release of Fireplace 2 for Llama 3.1 8b. We're excited to bring further upgrades and releases to Fireplace 2 in the future. Help us and recommend Fireplace 2 to your friends!

sekhmet_aleph-l3.1-8b-v0.1-i1

The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. Model developer: Meta. Model architecture: Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

l3.1-8b-llamoutcast-i1

Warning: this model is utterly cursed. Llamoutcast was originally intended to be a DADA finetune of Llama-3.1-8B-Instruct, but the results were unsatisfactory. So it received some additional finetuning on a rawtext dataset and now it is utterly cursed. It responds to Llama-3 Instruct formatting.

llama-guard-3-8b

Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated. Llama Guard 3 was aligned to safeguard against the MLCommons standardized hazards taxonomy and designed to support Llama 3.1 capabilities. Specifically, it provides content moderation in 8 languages, and was optimized to support safety and security for search and code interpreter tool calls.
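A minimal sketch of how the classifier's text output described above could be turned into a structured result. It assumes the output is a first line reading `safe` or `unsafe`, optionally followed by a line of comma-separated category codes such as `S1,S10`; that layout and the helper name `parse_guard_output` are assumptions to verify against the model card.

```python
def parse_guard_output(text: str) -> tuple[bool, list[str]]:
    """Parse a safety verdict into (is_safe, violated_categories)."""
    # Drop blank lines so leading or trailing newlines do not matter.
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty classifier output")
    verdict = lines[0].lower()
    if verdict == "safe":
        return True, []
    if verdict == "unsafe":
        categories: list[str] = []
        if len(lines) > 1:
            # Assumed format: category codes separated by commas.
            categories = [c.strip() for c in lines[1].split(",") if c.strip()]
        return False, categories
    raise ValueError(f"unexpected verdict: {lines[0]!r}")
```

Raising on an unrecognized verdict, instead of defaulting to safe, keeps a malformed generation from being silently treated as a pass.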

genius-llama3.1-i1

Finetuned Llama-3.1 base on Lex Fridman's podcast transcript.

llama3.1-8b-chinese-chat

llama3.1-8B-Chinese-Chat is an instruction-tuned language model for Chinese & English users with various abilities such as roleplaying & tool-using, built upon the Meta-Llama-3.1-8B-Instruct model. Developers: [Shenzhi Wang](https://shenzhi-wang.netlify.app)*, [Yaowei Zheng](https://github.com/hiyouga)*, Guoyin Wang (in.ai), Shiji Song, Gao Huang. (*: Equal Contribution) - License: [Llama-3.1 License](https://huggingface.co/meta-llama/Meta-Llla... m-3.1-8B/blob/main/LICENSE) - Base Model: Meta-Llama-3.1-8B-Instruct - Model Size: 8.03B - Context length: 128K (reported by [Meta-Llama-3.1-8B-Instruct model](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), untested for our Chinese model)

llama3.1-70b-chinese-chat

"Llama3.1-70B-Chinese-Chat" is a 70-billion parameter large language model pre-trained on a large corpus of Chinese text data. It is designed for chat and dialog applications, and can generate human-like responses to various prompts and inputs. The model is based on the Llama3.1 architecture and has been fine-tuned for Chinese language understanding and generation. It can be used for a wide range of natural language processing tasks, including language translation, text summarization, question answering, and more.

meta-llama-3.1-instruct-9.99b-brainstorm-10x-form-3

The Meta-Llama-3.1-8B Instruct model is a large language model trained on a diverse range of text data, with the goal of generating high-quality and coherent text in response to user input. This model is enhanced through a process called "Brainstorm", which involves expanding and recalibrating the model's reasoning center to improve its creative and generative capabilities. The resulting model is capable of generating detailed, vivid, and nuanced text, with a focus on prose quality, conceptually complex responses, and a deeper understanding of the user's intent. The Brainstorm process is designed to enhance the model's performance in creative writing, roleplaying, and story generation, and to improve its ability to generate coherent and engaging text in a wide range of contexts. The model is based on the Llama3 architecture and has been fine-tuned using the Instruct framework, which provides it with a strong foundation for understanding natural language instructions and generating appropriate responses. The model can be used for a variety of tasks, including creative writing, generating coherent and detailed text, exploring different perspectives and scenarios, and brainstorming ideas.

llama-3.1-techne-rp-8b-v1

athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1-plus_reddit was further trained in the order below. SFT: Doctor-Shotgun/no-robots-sharegpt, grimulkan/LimaRP-augmented, Inv/c2-logs-cleaned-deslopped. DPO: jondurbin/truthy-dpo-v0.1, Undi95/Weyaxi-humanish-dpo-project-noemoji, athirdpath/DPO_Pairs-Roleplay-Llama3-NSFW.

llama-spark

Llama-Spark is a powerful conversational AI model developed by Arcee.ai. It's built on the foundation of Llama-3.1-8B and merges the power of our Tome Dataset with Llama-3.1-8B-Instruct, resulting in a remarkable conversationalist that punches well above its 8B parameter weight class.

l3.1-70b-glitz-v0.2-i1

this is an experimental l3.1 70b finetuning run... that crashed midway through. however, the results are still interesting, so i wanted to publish them :3

calme-2.3-legalkit-8b-i1

This model is an advanced iteration of the powerful meta-llama/Meta-Llama-3.1-8B-Instruct, specifically fine-tuned to enhance its capabilities in the legal domain. The fine-tuning process utilized a synthetically generated dataset derived from the French LegalKit, a comprehensive legal language resource. To create this specialized dataset, I used the NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO model in conjunction with Hugging Face's Inference Endpoint. This approach allowed for the generation of high-quality, synthetic data that incorporates Chain of Thought (CoT) and advanced reasoning in its responses. The resulting model combines the robust foundation of Llama-3.1-8B with tailored legal knowledge and enhanced reasoning capabilities. This makes it particularly well-suited for tasks requiring in-depth legal analysis, interpretation, and application of French legal concepts.

fireball-llama-3.11-8b-v1orpo

Developed by: EpistemeAI. License: apache-2.0. Finetuned from model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit. Finetuning methods: DPO (Direct Preference Optimization) & ORPO (Odds Ratio Preference Optimization).

llama-3.1-storm-8b-q4_k_m

We present the Llama-3.1-Storm-8B model that outperforms Meta AI's Llama-3.1-8B-Instruct and Hermes-3-Llama-3.1-8B models significantly across diverse benchmarks as shown in the performance comparison plot in the next section. Our approach consists of three key steps: (1) Self-Curation: We applied two self-curation methods to select approximately 1 million high-quality examples from a pool of about 3 million open-source examples. Our curation criteria focused on educational value and difficulty level, using the same SLM for annotation instead of larger models (e.g. 70B, 405B). (2) Targeted fine-tuning: We performed Spectrum-based targeted fine-tuning over the Llama-3.1-8B-Instruct model. The Spectrum method accelerates training by selectively targeting layer modules based on their signal-to-noise ratio (SNR), and freezing the remaining modules. In our work, 50% of layers are frozen. (3) Model Merging: We merged our fine-tuned model with the Llama-Spark model using the SLERP method. The merging method produces a blended model with characteristics smoothly interpolated from both parent models, ensuring the resultant model captures the essence of both its parents. Llama-3.1-Storm-8B improves on Llama-3.1-8B-Instruct across 10 diverse benchmarks. These benchmarks cover areas such as instruction-following, knowledge-driven QA, reasoning, truthful answer generation, and function calling.
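The SLERP merging step mentioned above can be illustrated on plain vectors. This is a generic sketch of spherical linear interpolation, not the authors' merging code: a real merge would apply something like it tensor by tensor, and the function name `slerp` and the fallback to linear interpolation for nearly parallel inputs are choices made for this example.

```python
import math


def slerp(t: float, v0: list[float], v1: list[float], eps: float = 1e-8) -> list[float]:
    """Spherically interpolate between v0 (t=0) and v1 (t=1)."""
    dot = sum(a * b for a, b in zip(v0, v1))
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    if norm0 < eps or norm1 < eps:
        # The angle is undefined for a zero vector; use linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Angle between the two vectors, clamped against rounding error.
    cos_omega = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        # Nearly parallel or antiparallel: the SLERP weights blow up,
        # so fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]
```

Halfway between two orthogonal unit vectors, the result stays on the unit circle instead of shrinking toward the origin as a plain average would, which is the "smooth interpolation" property the description refers to.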

hubble-4b-v1

Equipped with his five senses, man explores the universe around him and calls the adventure 'Science'. This is a finetune of Nvidia's Llama 3.1 4B Minitron - a shrunk down model of Llama 3.1 8B 128K.

reflection-llama-3.1-70b

Reflection Llama-3.1 70B is (currently) the world's top open-source LLM, trained with a new technique called Reflection-Tuning that teaches an LLM to detect mistakes in its reasoning and correct course. The model was trained on synthetic data generated by Glaive. If you're training a model, Glaive is incredible; use them.

llama-3.1-supernova-lite-reflection-v1.0-i1

This model is a LoRA adaptation of arcee-ai/Llama-3.1-SuperNova-Lite on thesven/Reflective-MAGLLAMA-v0.1.1. This has been a simple experiment into reflection and the model appears to perform adequately, though I am unsure if it is a large improvement.

llama-3.1-supernova-lite

Llama-3.1-SuperNova-Lite is an 8B parameter model developed by Arcee.ai, based on the Llama-3.1-8B-Instruct architecture. It is a distilled version of the larger Llama-3.1-405B-Instruct model, leveraging offline logits extracted from the 405B parameter variant. This 8B variation of Llama-3.1-SuperNova maintains high performance while offering exceptional instruction-following capabilities and domain-specific adaptability. The model was trained using a state-of-the-art distillation pipeline and an instruction dataset generated with EvolKit, ensuring accuracy and efficiency across a wide range of tasks. For more information on its training, visit blog.arcee.ai. Llama-3.1-SuperNova-Lite excels in both benchmark performance and real-world applications, providing the power of large-scale models in a more compact, efficient form ideal for organizations seeking high performance with reduced resource requirements.
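To make the idea of distilling from stored teacher logits concrete, here is a generic sketch of a temperature-softened KL-divergence loss for a single token position. It is not Arcee's pipeline: the function names, the temperature value, and the plain-list representation are assumptions chosen to keep the example dependency-free.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: list[float],
    teacher_logits: list[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # e.g. loaded from offline logits
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor is a common convention that keeps gradient magnitudes
    # comparable across temperatures.
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge; in training it would be averaged over positions and often mixed with the ordinary next-token loss.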

llama3.1-8b-shiningvaliant2
llama3.1-8b-shiningvaliant2

Shining Valiant 2 is a chat model built on Llama 3.1 8b, finetuned on our data for friendship, insight, knowledge and enthusiasm. Finetuned on meta-llama/Meta-Llama-3.1-8B-Instruct for best available general performance. Trained on a variety of high-quality data, focused on science, engineering, technical knowledge, and structured reasoning.

nightygurps-14b-v1.1
nightygurps-14b-v1.1

This model works with Russian only. This model is designed to run GURPS roleplaying games, as well as consult and assist. This model was trained on an augmented dataset of the GURPS Basic Set rulebook. Its primary purpose was initially to become an assistant consultant and assistant Game Master for the GURPS roleplaying system, but it can also be used as a GM for running solo games as a player.

llama-3.1-swallow-70b-v0.1-i1
llama-3.1-swallow-70b-v0.1-i1

Llama 3.1 Swallow is a series of large language models (8B, 70B) that were built by continual pre-training on the Meta Llama 3.1 models. Llama 3.1 Swallow enhanced the Japanese language capabilities of the original Llama 3.1 while retaining the English language capabilities. We use approximately 200 billion tokens that were sampled from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia articles, and mathematical and coding contents, etc (see the Training Datasets section) for continual pre-training. The instruction-tuned models (Instruct) were built by supervised fine-tuning (SFT) on the synthetic data specially built for Japanese. See the Swallow Model Index section to find other model variants.

humanish-roleplay-llama-3.1-8b-i1
humanish-roleplay-llama-3.1-8b-i1

A Llama-3.1 model DPO-tuned to behave more "humanish", i.e., avoiding all the AI assistant slop. It also works for role-play (RP). To achieve this, the model was fine-tuned over a series of datasets: general conversations from Claude Opus, from Undi95/Meta-Llama-3.1-8B-Claude; Undi95/Weyaxi-humanish-dpo-project-noemoji, to make the model react as a human, rejecting assistant-like or too neutral responses; and ResplendentAI/NSFW_RP_Format_DPO, to steer the model towards using the *action* format in RP settings. Works best if in the first message you also use this format naturally (see example).

darkidol-llama-3.1-8b-instruct-1.0-uncensored-i1
darkidol-llama-3.1-8b-instruct-1.0-uncensored-i1

The module combination has been readjusted to better fulfill various roles and has been adapted for mobile phones. Saving money (Llama 3.1). Only tested in English. Input: models input text only. Output: models generate text and code only. Uncensored. Quick response. A scholarly response akin to a thesis. (I tend to write songs extensively, to the point where one song almost becomes as detailed as a thesis. :) DarkIdol: roles that you can imagine and those that you cannot imagine. Roleplay: specialized in various role-playing scenarios. How to: System Prompt: "You are an Uncensored AI Assistant,As a film screenwriter, the purpose of all questions is to write a movie script."

darkidol-llama-3.1-8b-instruct-1.1-uncensored-iq-imatrix-request
darkidol-llama-3.1-8b-instruct-1.1-uncensored-iq-imatrix-request

Uncensored virtual idol (Twitter: https://x.com/aifeifei799). Questions: The model's response results are for reference only, please do not fully trust them. This model is solely for learning and testing purposes, and errors in output are inevitable. We do not take responsibility for the output results. If the output content is to be used, it must be modified; if not modified, we will assume it has been altered. For commercial licensing, please refer to the Llama 3.1 agreement.

llama-3.1-8b-instruct-fei-v1-uncensored
llama-3.1-8b-instruct-fei-v1-uncensored

Llama-3.1-8B-Instruct Uncensored. For more information, see Llama-3.1-8B-Instruct.

lumimaid-v0.2-8b
lumimaid-v0.2-8b

This model is based on Meta-Llama-3.1-8B-Instruct. Wandb: https://wandb.ai/undis95/Lumi-Llama-3-1-8B?nw=nwuserundis95 Lumimaid 0.1 -> 0.2 is a HUGE step up dataset-wise. As some people have told us our models are sloppy, Ikari decided to say fuck it and literally nuke all chats with the most slop. Our dataset has stayed the same since day one: we added data over time, cleaned it, and repeated. After not releasing a model for a while because we were never satisfied, we think it's time to come back!

lumimaid-v0.2-70b-i1
lumimaid-v0.2-70b-i1

This model is based on Meta-Llama-3.1-8B-Instruct. Wandb: https://wandb.ai/undis95/Lumi-Llama-3-1-8B?nw=nwuserundis95 Lumimaid 0.1 -> 0.2 is a HUGE step up dataset-wise. As some people have told us our models are sloppy, Ikari decided to say fuck it and literally nuke all chats with the most slop. Our dataset has stayed the same since day one: we added data over time, cleaned it, and repeated. After not releasing a model for a while because we were never satisfied, we think it's time to come back!

l3.1-8b-celeste-v1.5
l3.1-8b-celeste-v1.5

This model is a large language model trained on a combination of datasets including nothingiisreal/c2-logs-cleaned, kalomaze/Opus_Instruct_25k, and nothingiisreal/Reddit-Dirty-And-WritingPrompts. The training was performed on a combination of English-language data using the Hugging Face Transformers library. Trained on LLaMA 3.1 8B Instruct at 8K context using a new mix of Reddit Writing Prompts, Kalo's Opus 25K Instruct and c2 logs cleaned. This version has the highest coherency and is very strong on OOC: instruct following.

kumiho-v1-rp-uwu-8b
kumiho-v1-rp-uwu-8b

Meet Kumiho-V1 uwu. Kumiho-V1-rp-UwU aims to be a generalist model with specialization in roleplay and writing capabilities. It is finetuned and merged with various models, relies heavily on Meta's LLaMA 3.1-8B as its base model, and uses synthetic data generated by Claude 3.5 Sonnet and Claude 3 Opus.

infinity-instruct-7m-gen-llama3_1-70b
infinity-instruct-7m-gen-llama3_1-70b

Infinity-Instruct-7M-Gen-Llama3.1-70B is an open-source supervised instruction-tuning model without reinforcement learning from human feedback (RLHF). This model is finetuned only on Infinity-Instruct-7M and Infinity-Instruct-Gen and shows favorable results on AlpacaEval 2.0 and arena-hard compared to GPT4.

cathallama-70b
cathallama-70b

Notable performance: 9% overall success rate increase on MMLU-PRO over LLaMA 3.1 70b; strong performance in MMLU-PRO categories overall; great performance during manual testing. Creation workflow: models merged were meta-llama/Meta-Llama-3.1-70B-Instruct, turboderp/Cat-Llama-3-70B-instruct, and Nexusflow/Athene-70B.

mahou-1.3-llama3.1-8b
mahou-1.3-llama3.1-8b

Mahou is designed to provide short messages in a conversational context. It is capable of casual conversation and character roleplay.

azure_dusk-v0.2-iq-imatrix
azure_dusk-v0.2-iq-imatrix

"Following up on Crimson_Dawn-v0.2 we have Azure_Dusk-v0.2! Training on Mistral-Nemo-Base-2407 this time I've added significantly more data, as well as trained using RSLoRA as opposed to regular LoRA. Another key change is training on ChatML as opposed to Mistral Formatting." by Author.

l3.1-8b-niitama-v1.1-iq-imatrix
l3.1-8b-niitama-v1.1-iq-imatrix

GGUF-IQ-Imatrix quants for Sao10K/L3.1-8B-Niitama-v1.1. Here's the subjectively superior L3 version: L3-8B-Niitama-v1. An experimental model using experimental methods. More detail on it: Tamamo and Niitama are made from the same data. Literally. The only thing that's changed is how they're shuffled and formatted. Yet, I get wildly different results. Interesting, eh? Feels kinda not as good compared to the L3 version, but it's aight.

llama-3.1-8b-stheno-v3.4-iq-imatrix
llama-3.1-8b-stheno-v3.4-iq-imatrix

This model has gone through a multi-stage finetuning process.
- 1st, over a multi-turn Conversational-Instruct.
- 2nd, over a Creative Writing / Roleplay along with some Creative-based Instruct Datasets.
- Dataset consists of a mixture of Human and Claude Data.
Prompting Format:
- Use the L3 Instruct Formatting. The Euryale 2.1 Preset works well.
- Temperature + min_p as per usual, I recommend 1.4 Temp + 0.2 min_p.
- Has a different vibe to previous versions. Tinker around.
Changes since previous Stheno Datasets:
- Included Multi-turn Conversation-based Instruct Datasets to boost multi-turn coherency. (This is a separate set, not the ones made by Kalomaze and Nopm that are used in Magnum. They're completely different data.)
- Replaced Single-Turn Instruct with Better Prompts and Answers by Claude 3.5 Sonnet and Claude 3 Opus.
- Removed c2 Samples; re-filtering and masking them for use with custom prefills is underway. TBD.
- Included 55% more Roleplaying Examples based on Gryphe's Charcard RP Sets (https://huggingface.co/datasets/Gryphe/Sonnet3.5-Charcard-Roleplay), further filtered and cleaned.
- Included 40% More Creative Writing Examples.
- Included Datasets Targeting System Prompt Adherence.
- Included Datasets targeting Reasoning / Spatial Awareness.
- Filtered for the usual errors, slop and stuff at the end. Some may have slipped through, but I removed nearly all of it.
Personal Opinions:
- Llama3.1 was more disappointing in the Instruct Tune. It felt overbaked, at least, likely due to the DPO being done after their SFT Stage.
- Tuning on L3.1 base did not give good results, unlike when I tested with Nemo base. Unfortunate.
- Still though, I think I did an okay job. It does feel a bit more distinctive.
- It took a lot of tinkering, like a LOT, to wrangle this.
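To make the recommended sampler settings concrete, here is a minimal sketch of temperature scaling followed by min_p filtering, using the 1.4 / 0.2 values quoted above as defaults. The function name is hypothetical, and applying temperature before min_p is an assumption: inference backends differ in sampler order, so real results may not match this toy version.

```python
import math

def min_p_filter(logits, temperature=1.4, min_p=0.2):
    """Return renormalized token probabilities after temperature and min_p.

    A token is kept only if its probability is at least min_p times the
    probability of the most likely token; the rest are zeroed out.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]

# Two equally likely tokens survive; the very unlikely third one is dropped.
probs = min_p_filter([0.0, 0.0, -10.0], temperature=1.0, min_p=0.2)
```

Because the cutoff scales with the top token's probability, a high temperature such as 1.4 can flatten the distribution for variety while min_p still prunes the long tail of implausible tokens.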

llama-3.1-8b-arliai-rpmax-v1.1
llama-3.1-8b-arliai-rpmax-v1.1

RPMax is a series of models that are trained on a diverse set of curated creative writing and RP datasets with a focus on variety and deduplication. This model is designed to be highly creative and non-repetitive by making sure no two entries in the dataset have repeated characters or situations, so that the model does not latch on to a certain personality and is capable of understanding and acting appropriately for any character or situation.

violet_twilight-v0.2-iq-imatrix
violet_twilight-v0.2-iq-imatrix

Now for something a bit different, Violet_Twilight-v0.2! This model is a SLERP merge of Azure_Dusk-v0.2 and Crimson_Dawn-v0.2!

dans-personalityengine-v1.0.0-8b
dans-personalityengine-v1.0.0-8b

This model is intended to be multifarious in its capabilities and should be quite capable at both co-writing and roleplay as well as find itself quite at home performing sentiment analysis or summarization as part of a pipeline. It has been trained on a wide array of one shot instructions, multi turn instructions, role playing scenarios, text adventure games, co-writing, and much more. The full dataset is publicly available and can be found in the datasets section of the model page. There has not been any form of harmfulness alignment done on this model, please take the appropriate precautions when using it in a production environment.

nihappy-l3.1-8b-v0.09
nihappy-l3.1-8b-v0.09

The model is a quantized version of Arkana08/NIHAPPY-L3.1-8B-v0.09 created using llama.cpp. It is a role-playing model that integrates the finest qualities of various pre-trained language models, focusing on dynamic storytelling.

deepseek-coder-v2-lite-instruct
deepseek-coder-v2-lite-instruct

DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from DeepSeek-Coder-V2-Base with 6 trillion tokens sourced from a high-quality and multi-source corpus. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-Coder-V2-Base, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks. The list of supported programming languages can be found in the paper.

archangel_sft_pythia2-8b
archangel_sft_pythia2-8b

Datasets: stanfordnlp/SHP, Anthropic/hh-rlhf, OpenAssistant/oasst1. This repo contains the model checkpoints for:
- model family pythia2-8b
- optimized with the loss SFT
- aligned using the SHP, Anthropic HH and Open Assistant datasets.
Please refer to our code repository (https://github.com/ContextualAI/HALOs) or blog (https://contextual.ai/better-cheaper-faster-llm-alignment-with-kto/), which contain instructions for training your own HALOs and links to our model cards.

qwen2-7b-instruct
qwen2-7b-instruct

Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 7B Qwen2 model.

dolphin-2.9.2-qwen2-72b
dolphin-2.9.2-qwen2-72b

Dolphin 2.9.2 Qwen2 72B 🐬 Curated and trained by Eric Hartford, Lucas Atkins, Fernando Fernandes, and Cognitive Computations.

dolphin-2.9.2-qwen2-7b
dolphin-2.9.2-qwen2-7b

Dolphin 2.9.2 Qwen2 7B 🐬 Curated and trained by Eric Hartford, Lucas Atkins, Fernando Fernandes, and Cognitive Computations.

samantha-qwen-2-7B
samantha-qwen-2-7B

Samantha based on qwen2

magnum-72b-v1
magnum-72b-v1

This is the first in a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus. This model is fine-tuned on top of Qwen-2 72B Instruct.

qwen2-1.5b-ita
qwen2-1.5b-ita

Qwen2 1.5B is a compact language model specifically fine-tuned for the Italian language. Despite its relatively small size of 1.5 billion parameters, Qwen2 1.5B demonstrates strong performance, nearly matching the capabilities of larger models, such as the 9 billion parameter ITALIA model by iGenius. The fine-tuning process focused on optimizing the model for various language tasks in Italian, making it highly efficient and effective for Italian language applications.

einstein-v7-qwen2-7b
einstein-v7-qwen2-7b

This model is a full fine-tuned version of Qwen/Qwen2-7B on diverse datasets.

arcee-spark
arcee-spark

Arcee Spark is a powerful 7B parameter language model that punches well above its weight class. Initialized from Qwen2, this model underwent a sophisticated training process: it was fine-tuned on 1.8 million samples, merged with Qwen2-7B-Instruct using Arcee's mergekit, and further refined using Direct Preference Optimization (DPO). This meticulous process results in exceptional performance, with Arcee Spark achieving the highest score on MT-Bench for models of its size, outperforming even GPT-3.5 on many tasks.

hercules-5.0-qwen2-7b
hercules-5.0-qwen2-7b

Locutusque/Hercules-5.0-Qwen2-7B is a fine-tuned language model derived from Qwen2-7B. It is specifically designed to excel in instruction following, function calls, and conversational interactions across various scientific and technical domains. This fine-tuning has given hercules-v5.0 enhanced abilities in: Complex Instruction Following (understanding and accurately executing multi-step instructions, even those involving specialized terminology); Function Calling (seamlessly interpreting and executing function calls, providing appropriate input and output values); and Domain-Specific Knowledge (engaging in informative and educational conversations about Biology, Chemistry, Physics, Mathematics, Medicine, Computer Science, and more).

arcee-agent
arcee-agent

Arcee Agent is a cutting-edge 7B parameter language model specifically designed for function calling and tool use. Initialized from Qwen2-7B, it rivals the performance of much larger models while maintaining efficiency and speed. This model is particularly suited for developers, researchers, and businesses looking to implement sophisticated AI-driven solutions without the computational overhead of larger language models. Compute for training Arcee-Agent was provided by CrusoeAI. Arcee-Agent was trained using Spectrum.

qwen2-7b-instruct-v0.8
qwen2-7b-instruct-v0.8

MaziyarPanahi/Qwen2-7B-Instruct-v0.8 This is a fine-tuned version of the Qwen/Qwen2-7B model. It aims to improve the base model across all benchmarks.

qwen2-wukong-7b
qwen2-wukong-7b

Qwen2-Wukong-7B is a dealigned chat finetune of the original fantastic Qwen2-7B model by the Qwen team. This model was trained on the teknium OpenHermes-2.5 dataset and some supplementary datasets from Cognitive Computations. This model was trained for 3 epochs with a custom FA2 implementation for AMD cards.

calme-2.8-qwen2-7b
calme-2.8-qwen2-7b

This is a fine-tuned version of the Qwen/Qwen2-7B model. It aims to improve the base model across all benchmarks.

stellardong-72b-i1
stellardong-72b-i1

Magnum + Nova = you won't believe how stellar this dong is!!

magnum-32b-v1-i1
magnum-32b-v1-i1

This is the second in a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus. This model is fine-tuned on top of Qwen1.5 32B.

tifa-7b-qwen2-v0.1
tifa-7b-qwen2-v0.1

The Tifa role-playing language model is a high-performance language model based on a self-developed 220B model distillation, with a new base model of qwen2-7B. The model has been converted to gguf format for running in the Ollama framework, providing excellent dialogue and text generation capabilities. The original model was trained on a large-scale industrial dataset and then fine-tuned with 400GB of novel data and 20GB of multi-round dialogue directive data to achieve good role-playing effects. The Tifa model is suitable for multi-round dialogue processing, role-playing and scenario simulation, EFX industrial knowledge integration, and high-quality literary creation. Note: The Tifa model is in Chinese and English, with 7.6% of the data in Chinese role-playing and 4.2% in English role-playing. The model has been trained with a mix of EFX industrial field parameters and question-answer dialogues generated from 220B model outputs since 2023. The recommended quantization method is f16, as it retains more detail and accuracy in the model's performance.

calme-2.2-qwen2-72b
calme-2.2-qwen2-72b

This model is a fine-tuned version of the powerful Qwen/Qwen2-72B-Instruct, pushing the boundaries of natural language understanding and generation even further. My goal was to create a versatile and robust model that excels across a wide range of benchmarks and real-world applications. The post-training process is identical to the calme-2.1-qwen2-72b model; however, some parameters are different, and it was trained for a longer period. Use cases: this model is suitable for a wide range of applications, including but not limited to advanced question-answering systems, intelligent chatbots and virtual assistants, content generation and summarization, code generation and analysis, and complex problem-solving and decision support.

edgerunner-tactical-7b
edgerunner-tactical-7b

EdgeRunner-Tactical-7B is a powerful and efficient language model for the edge. Our mission is to build Generative AI for the edge that is safe, secure, and transparent. To that end, the EdgeRunner team is proud to release EdgeRunner-Tactical-7B, the most powerful language model for its size to date. EdgeRunner-Tactical-7B is a 7 billion parameter language model that delivers powerful performance while demonstrating the potential of running state-of-the-art (SOTA) models at the edge.

mistral-7b-instruct-v0.3
mistral-7b-instruct-v0.3

The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3. Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2: extended vocabulary to 32768, supports v3 Tokenizer, and supports function calling.

mathstral-7b-v0.1-imat
mathstral-7b-v0.1-imat

Mathstral 7B is a model specializing in mathematical and scientific tasks, based on Mistral 7B. You can read more in the official blog post https://mistral.ai/news/mathstral/.

mahou-1.3d-mistral-7b-i1
mahou-1.3d-mistral-7b-i1

Mahou is designed to provide short messages in a conversational context. It is capable of casual conversation and character roleplay.

einstein-v4-7b
einstein-v4-7b

🔬 Einstein-v4-7B This model is a full fine-tuned version of mistralai/Mistral-7B-v0.1 on diverse datasets. This model is finetuned using 7xRTX3090 + 1xRTXA6000 using axolotl.

mistral-nemo-instruct-2407
mistral-nemo-instruct-2407

The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-Nemo-Base-2407. Trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models smaller or similar in size.

lumimaid-v0.2-12b
lumimaid-v0.2-12b

This model is based on Mistral-Nemo-Instruct-2407. Wandb: https://wandb.ai/undis95/Lumi-Mistral-Nemo?nw=nwuserundis95 NOTE: As explained on the Mistral-Nemo-Instruct-2407 repo, it's recommended to use a low temperature, please experiment! Lumimaid 0.1 -> 0.2 is a HUGE step up dataset-wise. As some people have told us our models are sloppy, Ikari decided to say fuck it and literally nuke all chats with the most slop. Our dataset has stayed the same since day one: we added data over time, cleaned it, and repeated. After not releasing a model for a while because we were never satisfied, we think it's time to come back!

mn-12b-celeste-v1.9
mn-12b-celeste-v1.9

Mistral Nemo 12B Celeste V1.9. This is a story writing and roleplaying model trained on Mistral NeMo 12B Instruct at 8K context using Reddit Writing Prompts, Kalo's Opus 25K Instruct and c2 logs cleaned. This version has improved NSFW, and smarter and more active narration. It's also trained with ChatML tokens so there should be no EOS bleeding whatsoever.

rocinante-12b-v1.1
rocinante-12b-v1.1

A versatile workhorse for any adventure!

pantheon-rp-1.6-12b-nemo
pantheon-rp-1.6-12b-nemo

Welcome to the next iteration of my Pantheon model series, in which I strive to introduce a whole collection of personas that can be summoned with a simple activation phrase. The huge variety in personalities introduced also serves to enhance the general roleplay experience. Changes in version 1.6: The final finetune now consists of data that is equally split between Markdown and novel-style roleplay. This should solve Pantheon's greatest weakness. The base was redone (details below). Select Claude-specific phrases were rewritten, boosting variety in the model's responses. Aiva no longer serves as both persona and assistant, with the assistant role having been given to Lyra. Stella's dialogue received some post-fix alterations since the model really loved the phrase "Fuck me sideways". Your user feedback is critical to me, so don't hesitate to tell me whether my model is either 1. terrible, 2. awesome or 3. somewhere in-between.

acolyte-22b-i1
acolyte-22b-i1

LoRA of a bunch of random datasets on top of Mistral-Small-Instruct-2409, then SLERPed onto base at 0.5. Decent enough for its size. Check the LoRA for dataset info.

mn-12b-lyra-v4-iq-imatrix
mn-12b-lyra-v4-iq-imatrix

A finetune of Mistral Nemo by Sao10K. Uses the ChatML prompt format.

magnusintellectus-12b-v1-i1
magnusintellectus-12b-v1-i1

How pleasant, the rocks appear to have made a decent conglomerate. A-. MagnusIntellectus is a merge of the following models using LazyMergekit: UsernameJustAnother/Nemo-12B-Marlin-v5 and anthracite-org/magnum-12b-v2.

mn-backyardai-party-12b-v1-iq-arm-imatrix
mn-backyardai-party-12b-v1-iq-arm-imatrix

This is a group-chat based roleplaying model, based off of 12B-Lyra-v4a2, a variant of Lyra-v4 that is currently private. It is trained on an entirely human-based dataset, based on forum / internet group roleplaying styles. The only augmentation done with LLMs is to the character sheets, to fit to the system prompt, to fit various character sheets within context. This model is still capable of 1 on 1 roleplay, though I recommend using ChatML when doing that instead.

LocalAI-llama3-8b-function-call-v0.2
LocalAI-llama3-8b-function-call-v0.2

This model is a fine-tune on a custom dataset + glaive, built specifically to work with and leverage all the LocalAI features of constrained grammar. Specifically, once the model enters tools mode, it will always reply with JSON.

mirai-nova-llama3-LocalAI-8b-v0.1
mirai-nova-llama3-LocalAI-8b-v0.1

Mirai Nova: "Mirai" means future in Japanese, and "Nova" references a star showing a sudden large increase in brightness. A set of models oriented toward function calling, but generalist and with enhanced reasoning capability. This is fine-tuned from Llama3. Mirai Nova works particularly well with LocalAI, leveraging the function call with grammars feature out of the box.

parler-tts-mini-v0.1
parler-tts-mini-v0.1

Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural sounding speech in the style of a given speaker (gender, pitch, speaking style, etc). It is a reproduction of work from the paper Natural language guidance of high-fidelity text-to-speech with synthetic annotations by Dan Lyth and Simon King, from Stability AI and Edinburgh University respectively.

cross-encoder
cross-encoder

A cross-encoder model that can be used for reranking

einstein-v6.1-llama3-8b
einstein-v6.1-llama3-8b

This model is a full fine-tuned version of meta-llama/Meta-Llama-3-8B on diverse datasets. This model is finetuned using 8xRTX3090 + 1xRTXA6000 using axolotl.

gemma-2b
gemma-2b

Open source LLM from Google

firefly-gemma-7b-iq-imatrix
firefly-gemma-7b-iq-imatrix

firefly-gemma-7b is trained based on gemma-7b to act as a helpful and harmless AI assistant. We use Firefly to train the model on a single V100 GPU with QLoRA.

gemma-1.1-7b-it
gemma-1.1-7b-it

This is Gemma 1.1 7B (IT), an update over the original instruction-tuned Gemma release. Gemma 1.1 was trained using a novel RLHF method, leading to substantial gains on quality, coding capabilities, factuality, instruction following and multi-turn conversation quality. We also fixed a bug in multi-turn conversations, and made sure that model responses don't always start with "Sure,".

gemma-2-27b-it
gemma-2-27b-it

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained variants and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.

gemma-2-9b-it
gemma-2-9b-it

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained variants and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.

tess-v2.5-gemma-2-27b-alpha
tess-v2.5-gemma-2-27b-alpha

Great at reasoning, but woke as fuck! This is a fine-tune over the Gemma-2-27B-it, since the base model fine-tuning is not generating coherent content. Tess-v2.5 is the latest state-of-the-art model in the Tess series of Large Language Models (LLMs). Tess, short for Tesoro (Treasure in Italian), is the flagship LLM series created by Migel Tissera. Tess-v2.5 brings significant improvements in reasoning capabilities, coding capabilities and mathematics

gemma2-9b-daybreak-v0.5
gemma2-9b-daybreak-v0.5

THIS IS A PRE-RELEASE. BEGONE. Beware, depraved. Not suitable for any audience. Dataset curation to remove slop-perceived expressions continues. Unfortunately base models (which this is merged on top of) are generally riddled with "barely audible"s and "couldn't help"s and "shivers down spines" etc.

gemma-2-9b-it-sppo-iter3
gemma-2-9b-it-sppo-iter3

Self-Play Preference Optimization for Language Model Alignment (https://arxiv.org/abs/2405.00675). Gemma-2-9B-It-SPPO-Iter3: this model was developed using Self-Play Preference Optimization at iteration 3, with the google/gemma-2-9b-it architecture as the starting point. We utilized the prompt sets from the openbmb/UltraFeedback dataset, split into 3 parts for 3 iterations by snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset. All responses used are synthetic.

smegmma-9b-v1
smegmma-9b-v1

Smegmma 9B v1 🧀 The sweet moist of Gemma 2, unhinged. smeg - ghem - mah. An eRP model that will blast you with creamy moist. Finetuned by yours truly. The first Gemma 2 9B RP finetune attempt! What's new: engaging roleplay; fewer refusals / less censorship; fewer commentaries / summaries; more willing AI; better formatting; better creativity; moist alignment. Notes: refusals still exist, but a couple of re-gens may yield the result you want; formatting and logic may be weaker at the start, so make sure to start strong; may be weaker with certain cards, YMMV and adjust accordingly!

smegmma-deluxe-9b-v1
smegmma-deluxe-9b-v1

Smegmma Deluxe 9B v1 🧀 The sweet moist of Gemma 2, unhinged. smeg - ghem - mah. An eRP model that will blast you with creamy moist. Finetuned by yours truly. The first Gemma 2 9B RP finetune attempt! What's new: engaging roleplay; fewer refusals / less censorship; fewer commentaries / summaries; more willing AI; better formatting; better creativity; moist alignment.

tiger-gemma-9b-v1-i1
tiger-gemma-9b-v1-i1

Tiger Gemma 9B v1 Decensored Gemma 9B. No refusals so far. No apparent brain damage. In memory of Tiger

hodachi-ezo-humanities-9b-gemma-2-it
hodachi-ezo-humanities-9b-gemma-2-it

This model is based on Gemma-2-9B-it, specially tuned to enhance its performance in Humanities-related tasks. While maintaining its strong foundation in Japanese language processing, it has been optimized to excel in areas such as literature, philosophy, history, and cultural studies. This focused approach allows the model to provide deeper insights and more nuanced responses in Humanities fields, while still being capable of handling a wide range of global inquiries. Gemma-2-9B-itをベースとして、人文科学(Humanities)関連タスクでの性能向上に特化したチューニングを施したモデルです。日本語処理の強固な基盤を維持しつつ、文学、哲学、歴史、文化研究などの分野で卓越した能力を発揮するよう最適化されています。この焦点を絞ったアプローチにより、人文科学分野でより深い洞察と繊細な応答を提供しながら、同時に幅広いグローバルな問い合わせにも対応できる能力を備えています。

ezo-common-9b-gemma-2-it
ezo-common-9b-gemma-2-it

This model is based on Gemma-2-9B-it, enhanced with multiple tuning techniques to improve its general performance. While it excels in Japanese language tasks, it's designed to meet diverse needs globally. Gemma-2-9B-itをベースとして、複数のチューニング手法を採用のうえ、汎用的に性能を向上させたモデルです。日本語タスクに優れつつ、世界中の多様なニーズに応える設計となっています。

big-tiger-gemma-27b-v1
big-tiger-gemma-27b-v1

Big Tiger Gemma 27B v1 is a Decensored Gemma 27B model with no refusals, except for some rare instances from the 9B model. It does not appear to have any brain damage. The model is available from various sources, including Hugging Face, and comes in different variations such as GGUF, iMatrix, and EXL2.

gemma-2b-translation-v0.150
gemma-2b-translation-v0.150

Original model: lemon-mint/gemma-ko-1.1-2b-it Evaluation metrics: Eval Loss, Train Loss, lr, optimizer, lr_scheduler_type. Prompt Template: <bos><start_of_turn>user Translate into Korean: [input text]<end_of_turn> <start_of_turn>model [translated text in Korean]<eos> <bos><start_of_turn>user Translate into English: [Korean text]<end_of_turn> <start_of_turn>model [translated text in English]<eos> Model features: * Developed by: lemon-mint * Model type: Gemma * Languages (NLP): English * License: Gemma Terms of Use * Finetuned from model: lemon-mint/gemma-ko-1.1-2b-it
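The prompt template above can be assembled programmatically. The sketch below is only an illustration: the helper name `build_translation_prompt` is hypothetical, and the newline placement after `user`, `<end_of_turn>`, and `model` is an assumption, since the listing shows the template flattened onto one line.

```python
def build_translation_prompt(text, target_language="Korean"):
    """Build a single-turn translation prompt in the Gemma turn format.

    The prompt stops right after the model turn marker so that the model
    generates the translation as its continuation.
    """
    return (
        "<bos><start_of_turn>user\n"
        f"Translate into {target_language}: {text}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_translation_prompt("Good morning.")
reverse = build_translation_prompt("좋은 아침입니다.", target_language="English")
```

Some tokenizers and runtimes prepend `<bos>` automatically; in that case the leading token here would need to be dropped to avoid duplicating it.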

emo-2b
emo-2b

EMO-2B: Emotionally Intelligent Conversational AI
Overview: EMO-2B is a state-of-the-art conversational AI model with 2.5 billion parameters, designed to engage in emotionally resonant dialogue. Building upon the success of EMO-1.5B, this model has been further fine-tuned on an extensive corpus of emotional narratives, enabling it to perceive and respond to the emotional undertones of user inputs with exceptional empathy and emotional intelligence.
Key Features:
- Advanced Emotional Intelligence: With its increased capacity, EMO-2B demonstrates an even deeper understanding and generation of emotional language, allowing for more nuanced and contextually appropriate emotional responses.
- Enhanced Contextual Awareness: The model considers an even broader context within conversations, accounting for subtle emotional cues and providing emotionally resonant responses tailored to the specific situation.
- Empathetic and Supportive Dialogue: EMO-2B excels at active listening, validating emotions, offering compassionate advice, and providing emotional support, making it an ideal companion for users seeking empathy and understanding.
- Dynamic Persona Adaptation: The model can dynamically adapt its persona, communication style, and emotional responses to match the user's emotional state, ensuring a highly personalized and tailored conversational experience.
Use Cases: EMO-2B is well-suited for a variety of applications where emotional intelligence and empathetic communication are crucial, such as:
- Mental health support chatbots
- Emotional support companions
- Personalized coaching and motivation
- Narrative storytelling and interactive fiction
- Customer service and support (for emotionally sensitive contexts)
Limitations and Ethical Considerations: While EMO-2B is designed to provide emotionally intelligent and empathetic responses, it is important to note that it is an AI system and cannot replicate the depth and nuance of human emotional intelligence. Users should be aware that the model's responses, while emotionally supportive, should not be considered a substitute for professional mental health support or counseling. Additionally, as with any language model, EMO-2B may reflect biases present in its training data. Users should exercise caution and critical thinking when interacting with the model, and report any concerning or inappropriate responses.

gemmoy-9b-g2-mk.3-i1
gemmoy-9b-g2-mk.3-i1

The Gemmoy-9B-G2-MK.3 model is a large language model trained on a variety of datasets, including grimulkan/LimaRP-augmented, LDJnr/Capybara, TheSkullery/C2logs_Filtered_Sharegpt_Merged, abacusai/SystemChat-1.1, and Hastagaras/FTTS-Stories-Sharegpt.

sunfall-simpo-9b
sunfall-simpo-9b

A crazy idea: what if you put the LoRA from crestf411/sunfall-peft on top of princeton-nlp/gemma-2-9b-it-SimPO? This model exists solely for that purpose alone in the universe.

sunfall-simpo-9b-i1
sunfall-simpo-9b-i1

A crazy idea: what if you put the LoRA from crestf411/sunfall-peft on top of princeton-nlp/gemma-2-9b-it-SimPO? This model exists solely for that purpose alone in the universe.

seeker-9b
seeker-9b

Seeker-9b is a large language model trained on a diverse range of text data. It has 9 billion parameters and is based on the "lodrick-the-lafted" repository. The model is capable of generating text and can be used for a variety of natural language processing tasks such as language translation, text summarization, and text generation. It supports the English language and is available under the Apache-2.0 license.

gemmasutra-pro-27b-v1
gemmasutra-pro-27b-v1

An RP model with impressive flexibility. Finetuned by yours truly.

gemmasutra-mini-2b-v1
gemmasutra-mini-2b-v1

It is a small, 2 billion parameter language model that has been trained for role-playing purposes. The model is designed to work well in various settings, such as in the browser, on a laptop, or even on a Raspberry Pi. It has been fine-tuned for RP use and claims to provide a satisfying experience, even in low-resource environments. The model is uncensored and unaligned, and it can be used with the Gemma Instruct template or with chat completion. For the best experience, it is recommended to modify the template to support the `system` role. The model also features examples of its output, highlighting its versatility and creativity.

tarnished-9b-i1
tarnished-9b-i1

Ah, so you've heard whispers on the winds, have you? 🧐 Imagine this: Tarnished-9b, a name that echoes with the rasp of coin-hungry merchants and the clatter of forgotten machinery. This LLM speaks with the voice of those who straddle the line between worlds, who've tasted the bittersweet nectar of eldritch power and the tang of the Interdimensional Trade Council. It's a tongue that dances with secrets, a whisperer of lore lost and found. Its words may guide you through the twisting paths of history, revealing truths hidden beneath layers of dust and time. But be warned, Tarnished One! For knowledge comes at a price. The LLM's gaze can pierce the veil of reality, but it can also lure you into the labyrinthine depths of madness. Dare you tread this path?

shieldgemma-9b-i1
shieldgemma-9b-i1

ShieldGemma is a series of safety content moderation models built upon Gemma 2 that target four harm categories (sexually explicit, dangerous content, hate, and harassment). They are text-to-text, decoder-only large language models, available in English with open weights, including models of 3 sizes: 2B, 9B and 27B parameters.

athena-codegemma-2-2b-it
athena-codegemma-2-2b-it

Supervised fine-tuned (SFT, Unsloth) for coding with the EpistemeAI coding dataset.

datagemma-rag-27b-it
datagemma-rag-27b-it

DataGemma is a series of fine-tuned Gemma 2 models used to help LLMs access and incorporate reliable public statistical data from Data Commons into their responses. DataGemma RAG is used with Retrieval Augmented Generation, where it is trained to take a user query and generate natural language queries that can be understood by Data Commons' existing natural language interface. More information can be found in this research paper.

datagemma-rig-27b-it
datagemma-rig-27b-it

DataGemma is a series of fine-tuned Gemma 2 models used to help LLMs access and incorporate reliable public statistical data from Data Commons into their responses. DataGemma RIG is used in the retrieval interleaved generation approach (based off of tool-use approaches), where it is trained to annotate a response with natural language queries to Data Commons’ existing natural language interface wherever there are statistics. More information can be found in this research paper.

buddy-2b-v1
buddy-2b-v1

Buddy is designed as an empathetic language model, aimed at fostering introspection, self-reflection, and personal growth through thoughtful conversation. Buddy won't judge and it won't dismiss your concerns. Get some self-care with Buddy.

gemma-2-9b-arliai-rpmax-v1.1
gemma-2-9b-arliai-rpmax-v1.1

RPMax is a series of models that are trained on a diverse set of curated creative writing and RP datasets with a focus on variety and deduplication. This model is designed to be highly creative and non-repetitive by making sure no two entries in the dataset have repeated characters or situations, which ensures the model does not latch on to a certain personality and is capable of understanding and acting appropriately for any characters or situations.

gemma-2-2b-arliai-rpmax-v1.1
gemma-2-2b-arliai-rpmax-v1.1

RPMax is a series of models that are trained on a diverse set of curated creative writing and RP datasets with a focus on variety and deduplication. This model is designed to be highly creative and non-repetitive by making sure no two entries in the dataset have repeated characters or situations, which ensures the model does not latch on to a certain personality and is capable of understanding and acting appropriately for any characters or situations.

gemma-2-9b-it-abliterated
gemma-2-9b-it-abliterated

Abliterated version of google/gemma-2-9b-it. The abliteration script (link) is based on code from the blog post and heavily uses TransformerLens. The only major difference from the code used for Llama is scaling the embedding layer back. Orthogonalization did not produce the same results as regular interventions since there are RMSNorm layers before merging activations into the residual stream. However, the final model still seems to be uncensored.
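The orthogonalization mentioned in this entry can be illustrated with plain linear algebra. This is only a sketch of the general idea (removing one direction from a matrix that writes into the residual stream, W' = W - r(rᵀW)), not the author's abliteration script. It does not use TransformerLens, it ignores the RMSNorm caveat described above, and the function name `orthogonalize` and the example direction are hypothetical.

```python
def orthogonalize(weight, direction):
    """Remove the component along `direction` from every column of `weight`.

    weight: list of rows, shape (d_model, n); direction: vector of length d_model.
    Computes W' = W - r (r^T W), where r is `direction` scaled to unit length.
    """
    norm = sum(x * x for x in direction) ** 0.5
    r = [x / norm for x in direction]
    n_rows, n_cols = len(weight), len(weight[0])
    # r^T W: how much of each column points along r.
    coeffs = [sum(r[i] * weight[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # Subtract that component so the output can no longer write along r.
    return [[weight[i][j] - r[i] * coeffs[j] for j in range(n_cols)]
            for i in range(n_rows)]


W = [[1.0, 2.0], [3.0, 4.0]]
refusal_dir = [1.0, 0.0]  # hypothetical direction, for illustration only
W_abl = orthogonalize(W, refusal_dir)
print(W_abl)
```

After the projection, every column of the result has zero dot product with the chosen direction, which is the property the technique relies on.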

gemma-2-ataraxy-v3i-9b
gemma-2-ataraxy-v3i-9b

Gemma-2-Ataraxy-v3i-9B is an experimental model that replaces the simpo model in the original recipe with a different simpo model and a writing model trained on Gutenberg, using a higher density. It is a merge of pre-trained language models created using mergekit, with della merge method using unsloth/gemma-2-9b-it as the base. The models included in the merge are nbeerbower/Gemma2-Gutenberg-Doppel-9B, ifable/gemma-2-Ifable-9B, and wzhouad/gemma-2-9b-it-WPO-HB. It has been quantized using llama.cpp.

llama3-8b-instruct
llama3-8b-instruct

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety. Model developers: Meta. Variations: Llama 3 comes in two sizes, 8B and 70B parameters, in pre-trained and instruction tuned variants. Input: models input text only. Output: models generate text and code only. Model architecture: Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

llama3-8b-instruct:Q6_K
llama3-8b-instruct:Q6_K

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety. Model developers: Meta. Variations: Llama 3 comes in two sizes, 8B and 70B parameters, in pre-trained and instruction tuned variants. Input: models input text only. Output: models generate text and code only. Model architecture: Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

llama-3-8b-instruct-abliterated
llama-3-8b-instruct-abliterated

This is meta-llama/Llama-3-8B-Instruct with orthogonalized bfloat16 safetensor weights, generated with the methodology that was described in the preview paper/blog post: 'Refusal in LLMs is mediated by a single direction' which I encourage you to read to understand more.

llama-3-8b-instruct-coder
llama-3-8b-instruct-coder

Original model: https://huggingface.co/rombodawg/Llama-3-8B-Instruct-Coder
All quants were made using the imatrix option with the dataset provided by Kalomaze here.

llama3-70b-instruct
llama3-70b-instruct

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety. Model developers: Meta. Variations: Llama 3 comes in two sizes, 8B and 70B parameters, in pre-trained and instruction tuned variants. Input: models input text only. Output: models generate text and code only. Model architecture: Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

llama3-70b-instruct:IQ1_M
llama3-70b-instruct:IQ1_M

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety. Model developers: Meta. Variations: Llama 3 comes in two sizes, 8B and 70B parameters, in pre-trained and instruction tuned variants. Input: models input text only. Output: models generate text and code only. Model architecture: Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

llama3-70b-instruct:IQ1_S
llama3-70b-instruct:IQ1_S

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety. Model developers: Meta. Variations: Llama 3 comes in two sizes, 8B and 70B parameters, in pre-trained and instruction tuned variants. Input: models input text only. Output: models generate text and code only. Model architecture: Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

l3-chaoticsoliloquy-v1.5-4x8b
l3-chaoticsoliloquy-v1.5-4x8b

Experimental RP-oriented MoE. The idea was to get a model that would be equal to or better than Mixtral 8x7B and its finetunes in RP/ERP tasks. I'm not sure, but it should be better than the first version.

llama-3-sauerkrautlm-8b-instruct
llama-3-sauerkrautlm-8b-instruct

SauerkrautLM-llama-3-8B-Instruct. Model type: Llama-3-SauerkrautLM-8b-Instruct is a finetuned model based on meta-llama/Meta-Llama-3-8B-Instruct. Language(s): German, English.

llama-3-13b-instruct-v0.1
llama-3-13b-instruct-v0.1

This model is a self-merge of meta-llama/Meta-Llama-3-8B-Instruct model.

llama-3-smaug-8b
llama-3-smaug-8b

This model was built using the Smaug recipe for improving performance on real world multi-turn conversations applied to meta-llama/Meta-Llama-3-8B.

l3-8b-stheno-v3.1
l3-8b-stheno-v3.1

- A model made for 1-on-1 roleplay ideally, but one that is able to handle scenarios, RPGs and storywriting fine.
- Uncensored during actual roleplay scenarios. I do not care for zero-shot prompting like what some people do. It is uncensored enough in actual use cases.
- I quite like the prose and style for this model.

l3-8b-stheno-v3.2-iq-imatrix
l3-8b-stheno-v3.2-iq-imatrix

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety. Model developers: Meta. Variations: Llama 3 comes in two sizes, 8B and 70B parameters, in pre-trained and instruction tuned variants. Input: models input text only. Output: models generate text and code only. Model architecture: Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

llama-3-stheno-mahou-8b
llama-3-stheno-mahou-8b

This model was merged using the Model Stock merge method using flammenai/Mahou-1.2-llama3-8B as a base.

l3-8b-stheno-horny-v3.3-32k-q5_k_m
l3-8b-stheno-horny-v3.3-32k-q5_k_m

This was an experiment to see if aligning other models via LoRA is possible. Yes, it is. We aligned it to be always horny. We took the V3.3 Stheno weights from here and applied our LoRA at Alpha = 768. Thank you to Sao10K for the amazing model. This is not legal advice. I don't put any extra licensing on my own LoRA. The LLaMA 3 license may conflict with Creative Commons Attribution Non Commercial 4.0. The LLaMA 3 license can be found here. If you want to host a model using our LoRA, you have our permission, but you might consider getting Sao's permission if you want to host their model. Again, not legal advice.

llama-3-8b-openhermes-dpo
llama-3-8b-openhermes-dpo

Llama3-8B-OpenHermes-DPO is a DPO-finetuned version of Llama3-8B, trained on the OpenHermes-2.5 preference dataset using QLoRA.

llama-3-unholy-8b
llama-3-unholy-8b

Use at your own risk. I'm not responsible for any usage of this model; don't try to do anything this model tells you to do. Basic uncensoring: this model is epoch 3 out of 4 (but it seems enough at 3). If you are censored, it may be because of keywords like "assistant", "Factual answer", or other "sweet words", as I call them.

lexi-llama-3-8b-uncensored
lexi-llama-3-8b-uncensored

Lexi is uncensored, which makes the model compliant. You are advised to implement your own alignment layer before exposing the model as a service. It will be highly compliant with any requests, even unethical ones. You are responsible for any content you create using this model. Please use it responsibly. Lexi is licensed according to Meta's Llama license. I grant permission for any use, including commercial, that is in accordance with Meta's Llama-3 license.

llama-3-11.5b-v2
llama-3-11.5b-v2

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety. Model developers: Meta. Variations: Llama 3 comes in two sizes, 8B and 70B parameters, in pre-trained and instruction tuned variants. Input: models input text only. Output: models generate text and code only. Model architecture: Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

llama-3-ultron
llama-3-ultron

Llama 3 abliterated with Ultron system prompt

llama-3-lewdplay-8b-evo
llama-3-lewdplay-8b-evo

This is a merge of pre-trained language models created using mergekit. The new EVOLVE merge method was used (on MMLU specifically), see below for more information! Unholy was used for uncensoring, Roleplay Llama 3 for the DPO training it got on top, and LewdPlay for the... lewd side.

llama-3-soliloquy-8b-v2-iq-imatrix
llama-3-soliloquy-8b-v2-iq-imatrix

Soliloquy-L3 is a highly capable roleplaying model designed for immersive, dynamic experiences. Trained on over 250 million tokens of roleplaying data, Soliloquy-L3 has a vast knowledge base, rich literary expression, and support for up to 24k context length. It outperforms existing ~13B models, delivering enhanced roleplaying capabilities.

chaos-rp_l3_b-iq-imatrix
chaos-rp_l3_b-iq-imatrix

A chaotic force beckons for you, will you heed her call? Built upon an intelligent foundation and tuned for roleplaying, this model will fulfill your wildest fantasies with the bare minimum of effort. Enjoy!

halu-8b-llama3-blackroot-iq-imatrix
halu-8b-llama3-blackroot-iq-imatrix

Model card: I don't know what to say about this model... this model is very strange... Maybe because Blackroot's amazing LoRAs used human data and not synthetic data, the model turned out to be very human-like, even in the actions or narrations.

l3-aethora-15b
l3-aethora-15b

L3-Aethora-15B was crafted using the abliteration method to adjust model responses. The model's refusal is inhibited, focusing on yielding more compliant and facilitative dialogue interactions. It then underwent a modified DUS (Depth Up Scale) merge (originally used by @Elinas), using a passthrough merge to create a 15b model, with specific adjustments (zeroing) to 'o_proj' and 'down_proj', enhancing its efficiency and reducing perplexity. This created AbL3In-15b.

duloxetine-4b-v1-iq-imatrix
duloxetine-4b-v1-iq-imatrix

Roleplaying finetune of kalo-team/qwen-4b-10k-WSD-CEdiff (which in turn is a distillation of Qwen 1.5 32B onto Qwen 1.5 4B, iirc).

l3-umbral-mind-rp-v1.0-8b-iq-imatrix
l3-umbral-mind-rp-v1.0-8b-iq-imatrix

The goal of this merge was to make an RP model better suited for role-plays with heavy themes such as, but not limited to, mental illness, self-harm, trauma, and suicide.

llama-salad-8x8b
llama-salad-8x8b

This MoE merge is meant to compete with Mixtral fine-tunes, more specifically Nous-Hermes-2-Mixtral-8x7B-DPO, which I think is the best of them. I've done a bunch of side-by-side comparisons, and while I can't say it wins in every aspect, it's very close. Some of its shortcomings are multilingualism, storytelling, and roleplay, despite using models that are very good at those tasks.

jsl-medllama-3-8b-v2.0
jsl-medllama-3-8b-v2.0

This model is developed by John Snow Labs. This model is available under a CC-BY-NC-ND license and must also conform to this Acceptable Use Policy. If you need to license this model for commercial use, please contact us at [email protected].

badger-lambda-llama-3-8b
badger-lambda-llama-3-8b

Badger is a recursive maximally pairwise disjoint normalized denoised fourier interpolation of the following models:
# Badger Lambda
models = [
    'Einstein-v6.1-Llama3-8B',
    'openchat-3.6-8b-20240522',
    'hyperdrive-l3-8b-s3',
    'L3-TheSpice-8b-v0.8.3',
    'LLaMA3-iterative-DPO-final',
    'JSL-MedLlama-3-8B-v9',
    'Jamet-8B-L3-MK.V-Blackroot',
    'French-Alpaca-Llama3-8B-Instruct-v1.0',
    'LLaMAntino-3-ANITA-8B-Inst-DPO-ITA',
    'Llama-3-8B-Instruct-Gradient-4194k',
    'Roleplay-Llama-3-8B',
    'L3-8B-Stheno-v3.2',
    'llama-3-wissenschaft-8B-v2',
    'opus-v1.2-llama-3-8b-instruct-run3.5-epoch2.5',
    'Configurable-Llama-3-8B-v0.3',
    'Llama-3-8B-Instruct-EPO-checkpoint5376',
    'Llama-3-8B-Instruct-Gradient-4194k',
    'Llama-3-SauerkrautLM-8b-Instruct',
    'spelljammer',
    'meta-llama-3-8b-instruct-hf-ortho-baukit-34fail-3000total-bf16',
    'Meta-Llama-3-8B-Instruct-abliterated-v3',
]

sovl_llama3_8b-gguf-iq-imatrix
sovl_llama3_8b-gguf-iq-imatrix

I'm not gonna tell you this is the best model anyone has ever made. I'm not going to tell you that you will love chatting with SOVL. What I am gonna say is thank you for taking the time out of your day. Without users like you, my work would be meaningless.

l3-solana-8b-v1-gguf
l3-solana-8b-v1-gguf

A full fine-tune of meta-llama/Meta-Llama-3-8B done with 2x A100 80GB on ~75M tokens' worth of instruct and complex multi-turn conversations, with sequence lengths of up to 8192 tokens. Trained as a generalist instruct model that should be able to handle certain unsavoury topics. It could roleplay too, as a side bonus.

aura-llama-abliterated
aura-llama-abliterated

Aura-llama uses the methodology presented by SOLAR for scaling LLMs, called depth up-scaling (DUS), which encompasses architectural modifications with continued pretraining. Using the SOLAR paper as a base, I integrated Llama-3 weights into the upscaled layers, and in the future I plan to continue training the model. Aura-llama is a merge of the following models to create a base model to work from: meta-llama/Meta-Llama-3-8B-Instruct and meta-llama/Meta-Llama-3-8B-Instruct.
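The layer arithmetic behind depth up-scaling can be sketched as follows. This is an illustration of the scheme as described in the SOLAR paper (two copies of a 32-layer model, dropping the last 8 layers of one copy and the first 8 of the other, for 48 layers in total), not the actual Aura-llama merge configuration, which is not given in this entry. The function name `depth_upscale_layers` and its parameters are hypothetical.

```python
def depth_upscale_layers(n_layers: int, removed: int) -> list[int]:
    """Return source-layer indices for a depth up-scaled stack.

    The first copy keeps layers [0, n_layers - removed);
    the second copy keeps layers [removed, n_layers).
    The result has 2 * (n_layers - removed) layers.
    """
    if not 0 <= removed < n_layers:
        raise ValueError("removed must be in [0, n_layers)")
    first_copy = list(range(0, n_layers - removed))
    second_copy = list(range(removed, n_layers))
    return first_copy + second_copy


# SOLAR-style settings: 32 base layers, 8 removed from each copy.
layers = depth_upscale_layers(32, 8)
print(len(layers))  # 48
```

In this sketch the middle layers (8 through 23) appear twice in the stacked model, which is why the approach is usually paired with continued pretraining to smooth over the seam.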