Merging pre-trained language models allows us to combine the strengths of multiple models into a single, optimized model without the need for extensive retraining. This process can enhance performance and adaptability, making it a valuable technique in the field of natural language processing. In this tutorial, we'll explore how to merge two models using the mergekit library, guiding you through each step to create your custom merged model.
Step 1: Clone the MergeKit Repository
Begin by cloning the mergekit repository from GitHub and navigating into the directory:
!git clone https://github.com/arcee-ai/mergekit.git
%cd mergekit

Step 2: Install MergeKit
Install mergekit in editable mode to ensure that any changes in the source code are immediately reflected without the need for reinstallation:
!pip install -e .

Step 3: Authenticate with Hugging Face
To access models from the Hugging Face Hub, authenticate using the notebook_login function from the huggingface_hub library:
from huggingface_hub import notebook_login
notebook_login()

Step 4: Define the Merge Configuration
Create a YAML configuration file that specifies the models to merge, the merge method, and other parameters. In this example, we're using the Spherical Linear Interpolation (SLERP) method to merge two models:
merge_config = """
slices:
  - sources:
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
      - model: Open-Orca/Mistral-7B-OpenOrca
        layer_range: [0, 32]
merge_method: slerp
base_model: Open-Orca/Mistral-7B-OpenOrca
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
"""

Save this configuration to a file named config.yaml:
with open('config.yaml', 'w') as f:
    f.write(merge_config)

Step 5: Execute the Merge
Run the mergekit-yaml command with the configuration file to perform the merge:
!mergekit-yaml config.yaml ./merge --copy-tokenizer --allow-crimes --out-shard-size 1B --trust-remote-code

This command will generate the merged model in the ./merge directory.
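Under the hood, SLERP interpolates each pair of corresponding weight tensors along the arc between them rather than along a straight line, which preserves weight norms better than plain averaging. A minimal NumPy sketch of the interpolation formula (an illustration of the math, not mergekit's actual implementation):

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors."""
    v0_n = v0 / (np.linalg.norm(v0) + eps)
    v1_n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0_n, v1_n), -1.0, 1.0)
    omega = np.arccos(dot)          # angle between the two weight directions
    if omega < eps:                 # nearly parallel: fall back to linear interpolation
        return (1 - t) * v0 + t * v1
    so = np.sin(omega)
    return (np.sin((1 - t) * omega) / so) * v0 + (np.sin(t * omega) / so) * v1

# At t=0 we recover the first model's weights; at t=1 the second's.
w0 = np.array([1.0, 0.0])
w1 = np.array([0.0, 1.0])
print(slerp(0.5, w0, w1))  # midpoint on the arc between the two vectors
```

This is why the `t` gradients in the config matter: different values of `t` per layer group let attention weights lean toward one model while MLP weights lean toward the other.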
Step 6: Upload the Merged Model to Hugging Face
To share your merged model, upload it to the Hugging Face Hub. First, create a new repository:
from huggingface_hub import HfApi
api = HfApi(token="YOUR_HUGGING_FACE_TOKEN")
username = "your_username"
MODEL_NAME = "orca-teknium-merge"
api.create_repo(
    repo_id=f"{username}/{MODEL_NAME}",
    repo_type="model"
)

Then, upload the merged model to the repository:
api.upload_folder(
    repo_id=f"{username}/{MODEL_NAME}",
    folder_path="merge"
)

Step 7: Install Additional Dependencies
Ensure that bitsandbytes and gradio are installed for quantization and creating a user interface, respectively:
!pip install -U bitsandbytes
!pip install gradio

Step 8: Load and Quantize the Merged Model
Use the BitsAndBytesConfig to load the model with 4-bit quantization, which reduces memory usage and speeds up inference:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
MODEL_NAME = "your_username/orca-teknium-merge"
# Enable quantization with bitsandbytes
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=True,
)
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

Step 9: Create a Chat Interface with Gradio
Utilize gradio to build a simple web-based chat interface for interacting with your merged model:
import gradio as gr
def chat(message, history):
    prompt = f"### Instruction:\nRespond to the user's query concisely and helpfully.\n\n### User:\n{message}\n\n### Response:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        num_beams=5,
        early_stopping=True,
        no_repeat_ngram_size=2
    )
    response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
    # Ensure the response doesn't contain the prompt
    response = response.replace(prompt, "").strip()
    return response
# Gradio ChatInterface
iface = gr.ChatInterface(
fn=chat,
title="Custom Merged Model Chatbot",
description="Chat with the AI! Type your message below.",
)
iface.launch()

This script sets up a chat interface where users can input messages and receive responses generated by the merged model.
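The prompt handling inside `chat` can be factored out and tested without loading the model. The sketch below uses the same template; `build_prompt` and `extract_response` are illustrative helper names, not part of transformers or gradio:

```python
def build_prompt(message: str) -> str:
    """Wrap a user message in the instruction template used by chat()."""
    return (
        "### Instruction:\nRespond to the user's query concisely and helpfully."
        f"\n\n### User:\n{message}\n\n### Response:"
    )

def extract_response(decoded: str, prompt: str) -> str:
    """Remove the echoed prompt from the decoded generation output."""
    return decoded.replace(prompt, "").strip()

# Simulate a decoded generation that echoes the prompt before the reply
prompt = build_prompt("What is model merging?")
decoded = prompt + "\nIt combines the weights of several models."
print(extract_response(decoded, prompt))
```

Isolating this logic makes it easy to swap in a different instruction template, or a tokenizer chat template, without touching the generation code.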
Key Considerations
- Ensure Model Architecture Matches: The models being merged must share the same architecture (e.g., Mistral-7B, LLaMA) for compatibility. Merging models with different architectures will lead to errors or suboptimal performance.
- MergeKit simplifies model merging, allowing for easy integration of different pre-trained models.
- SLERP (Spherical Linear Interpolation) provides a smooth method for merging layers while maintaining stability.
- Quantization with BitsAndBytes reduces memory usage and speeds up inference.
- Uploading to Hugging Face makes your model accessible to the community or private use.
- Gradio provides a simple UI to interact with your model and test its capabilities.
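The architecture check in the first point can be automated before running a merge. The sketch below compares the fields that must agree, using hard-coded example dicts; in practice you would populate them from each model's config.json (for instance via transformers' AutoConfig), which this offline example skips:

```python
def compatible(cfg_a: dict, cfg_b: dict) -> bool:
    """Return True if two model configs describe merge-compatible checkpoints."""
    # Fields that must agree for a layer-wise merge to make sense
    keys = ("architectures", "hidden_size", "num_hidden_layers", "num_attention_heads")
    return all(cfg_a.get(k) == cfg_b.get(k) for k in keys)

# Example values mirroring two Mistral-7B fine-tunes (illustrative, not fetched)
hermes = {"architectures": ["MistralForCausalLM"], "hidden_size": 4096,
          "num_hidden_layers": 32, "num_attention_heads": 32}
openorca = dict(hermes)
llama = {"architectures": ["LlamaForCausalLM"], "hidden_size": 4096,
         "num_hidden_layers": 32, "num_attention_heads": 32}

print(compatible(hermes, openorca))  # True: same architecture and shape
print(compatible(hermes, llama))     # False: different architecture class
```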
By following these steps, you've successfully merged two pre-trained language models into a single, optimized model using mergekit.