
Example on Windows 11: LoRA fine-tuning with Qwen2.5 0.5B as the base model, then running the custom model with Ollama

news 2025/3/23 6:09:03

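I recently needed to do some validation work in this area and want to record the steps for later use. I will keep updating this post if problems turn up.
Reference article:
https://blog.csdn.net/spiderwower/article/details/138755776
Following that example only got me halfway, so I consulted other material to complete the whole workflow.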

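1. Download the base model from the ModelScope community: https://modelscope.cn
① First find the base model on ModelScope. Here I use Alibaba's Qwen2.5-0.5B model; its full name is Qwen/Qwen2.5-0.5B-Instruct.
② Make sure a suitable conda and Python environment is installed. This example uses Python 3.10; the details are not covered here.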

conda create -n nlp310 python=3.10
Switch to the nlp310 environment:
conda activate nlp310

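③ Install PyTorch. If you have an NVIDIA GPU, be sure to install the GPU build. My laptop is limited (MX350, 2 GB of VRAM), so I had to choose a small model. CUDA must be installed beforehand; this example uses CUDA 11.8.
For reference, see the article: win11+4060配置CUDA11.8+pytorch2.0.0 (configuring CUDA 11.8 and PyTorch 2.0.0 on Windows 11 with a 4060 GPU)

Once CUDA is installed, PyTorch can be installed from the Alibaba Cloud mirror: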

pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 -f https://mirrors.aliyun.com/pytorch-wheels/cu118

The following code checks the installation and whether CUDA is available:

import torch
print(torch.cuda.is_available())

Other versions can be installed as well.
After that, install the pip packages used for fine-tuning:

transformers
streamlit==1.24.0
sentencepiece==0.1.99
accelerate==0.29.3
datasets
peft==0.10.0

An error may occur at this point because of the transformers version.
In that case the error message suggests installing tf-keras:

pip install tf-keras

④ Download the base model with the code below.
modelscope must be installed first:

pip install modelscope

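Set the default ModelScope storage location in an environment variable beforehand.
I moved the download path to d:/cuda/modelscope.

Variable name: MODELSCOPE_CACHE
Variable value: d:/cuda/modelscope

Model download script, md_download1.py: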

from modelscope import snapshot_download

# Model storage path (only needed if you pass cache_dir explicitly)
# model_path = '/root/autodl-tmp'
# Model name
name = 'Qwen/Qwen2.5-0.5B-Instruct'
# model_dir = snapshot_download(name, cache_dir=model_path, revision='master')
model_dir = snapshot_download(name)

Note that name is the full name of the model you want to download. Run the script to download it:

python md_download1.py
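
The later scripts in this post load the model from D:/cuda/modelscope/models/Qwen/Qwen2___5-0___5B-Instruct, which suggests that the download lands under `<MODELSCOPE_CACHE>/models/<org>/<name>` with every dot in the name replaced by three underscores. The sketch below only reproduces that observed pattern; the helper name `expected_model_dir` is hypothetical and the rule is inferred from the paths in this post, not taken from the ModelScope documentation.

```python
from pathlib import PurePosixPath


def expected_model_dir(cache_root: str, model_id: str) -> str:
    """Guess the local directory of a downloaded model.

    Assumption: layout and '.' -> '___' substitution are inferred from the
    paths used in this post, so check the folder on disk before relying on it.
    """
    org, name = model_id.split("/", 1)
    safe_name = name.replace(".", "___")
    return str(PurePosixPath(cache_root) / "models" / org / safe_name)


print(expected_model_dir("d:/cuda/modelscope", "Qwen/Qwen2.5-0.5B-Instruct"))
# d:/cuda/modelscope/models/Qwen/Qwen2___5-0___5B-Instruct
```

The value returned by snapshot_download is the authoritative path, so printing model_dir at the end of md_download1.py is the safer way to find it.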


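2. Download the training set
This example uses the Chinese Ruozhiba dataset, roughly 1,500 dialogue samples. Its full name is kigner/ruozhiba-llama3-tt.
URL:
https://huggingface.co/datasets/kigner/ruozhiba-llama3-tt/tree/main
I downloaded the files manually to d:/cuda/ruozhiba-llama3-tt.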
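
The training script in the next step reads ruozhiba_qa.json and concatenates the instruction, input and output fields of each record, so a record with a missing or non-string field would fail during preprocessing. A minimal sketch of a pre-flight check follows; the two sample records are made up for illustration and the helper name `find_bad_records` is hypothetical.

```python
import json

REQUIRED_FIELDS = ("instruction", "input", "output")


def find_bad_records(records):
    """Return the indexes of records where a required field is missing or not a string."""
    bad = []
    for i, rec in enumerate(records):
        if not all(isinstance(rec.get(field), str) for field in REQUIRED_FIELDS):
            bad.append(i)
    return bad


# Made-up sample standing in for the contents of ruozhiba_qa.json.
sample = json.loads("""
[
  {"instruction": "Question one", "input": "", "output": "Answer one"},
  {"instruction": "Question two", "output": "Answer two"}
]
""")
print(find_bad_records(sample))  # [1]
```

To check the real file, load it with json.load and pass the resulting list to the same function.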
3. Fine-tune with LoRA.
① Install the required Python packages. The training code is taken from another article; I only changed the model name. train.py:

from datasets import Dataset
import pandas as pd
from transformers import (AutoTokenizer, AutoModelForCausalLM, DataCollatorForSeq2Seq,
                          TrainingArguments, Trainer)
import torch, os
from peft import LoraConfig, TaskType, get_peft_model
import warnings

warnings.filterwarnings("ignore", category=UserWarning)  # suppress warnings
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print("device:" + device)
# Path to the base model files
model_path = r'D:/cuda/modelscope/models/Qwen/Qwen2___5-0___5B-Instruct'
# Where training output is saved
name = 'ruozhiba'
output_dir = f'./output/qwen-0.5B-{name}'
# Set to True to resume training from the last checkpoint
train_with_checkpoint = False
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
# Load the dataset
df = pd.read_json('D:/cuda/ruozhiba-llama3-tt/ruozhiba_qa.json')
ds = Dataset.from_pandas(df)
print(ds)

# Convert each sample into the dialogue format used for training.
# Note: this is the Llama-style [INST] prompt from the referenced article;
# Qwen2.5's own chat format is ChatML (<|im_start|> ... <|im_end|>).
def process_func_mistral(example):
    # The Llama tokenizer can split one Chinese character into several tokens,
    # so leave some headroom to keep samples complete.
    MAX_LENGTH = 384
    instruction = tokenizer(
        f"<s>[INST] <<SYS>>\n\n<</SYS>>\n\n{example['instruction']+example['input']}[/INST]",
        add_special_tokens=False)  # do not prepend special tokens
    response = tokenizer(f"{example['output']}", add_special_tokens=False)
    input_ids = instruction["input_ids"] + response["input_ids"] + [tokenizer.pad_token_id]
    # The trailing pad/eos token must be attended to as well, so its mask is 1
    attention_mask = instruction["attention_mask"] + response["attention_mask"] + [1]
    labels = [-100] * len(instruction["input_ids"]) + response["input_ids"] + [tokenizer.pad_token_id]
    if len(input_ids) > MAX_LENGTH:  # truncate
        input_ids = input_ids[:MAX_LENGTH]
        attention_mask = attention_mask[:MAX_LENGTH]
        labels = labels[:MAX_LENGTH]
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

inputs_id = ds.map(process_func_mistral, remove_columns=ds.column_names)
# Load the model
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device, torch_dtype=torch.bfloat16, use_cache=False)
model.enable_input_require_grads()  # required when gradient checkpointing is enabled
print(model)
# LoRA settings
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=False,  # training mode
    r=8,  # LoRA rank
    lora_alpha=32,  # LoRA alpha; see the LoRA paper for its role
    lora_dropout=0.1,  # dropout ratio
)
# Training settings
model = get_peft_model(model, config)
model.print_trainable_parameters()
args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    logging_steps=20,
    num_train_epochs=2,
    save_steps=25,
    save_total_limit=2,
    learning_rate=1e-4,
    save_on_each_node=True,
    gradient_checkpointing=True,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=inputs_id,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)
# Start training.
# If training was interrupted, it can resume from the last saved checkpoint.
if train_with_checkpoint:
    # os.listdir order is not guaranteed, so pick the highest step number explicitly
    checkpoints = [d for d in os.listdir(output_dir) if d.startswith('checkpoint-')]
    checkpoint = max(checkpoints, key=lambda d: int(d.split('-')[-1]))
    last_checkpoint = f'{output_dir}/{checkpoint}'
    print(last_checkpoint)
    trainer.train(resume_from_checkpoint=last_checkpoint)
else:
    trainer.train()
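
The preprocessing function in train.py sets the labels of all prompt tokens to -100, which is the value PyTorch's cross-entropy loss ignores by default, so only the response tokens and the closing token contribute to the loss. The sketch below shows the same construction on made-up token ids without needing a tokenizer; `build_example` is a hypothetical helper, not part of the original script.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_example(prompt_ids, response_ids, eos_id, max_length):
    """Concatenate prompt and response ids; only response tokens keep real labels."""
    input_ids = prompt_ids + response_ids + [eos_id]
    attention_mask = [1] * len(input_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]
    # Truncate all three lists to the same length so they stay aligned.
    return {
        "input_ids": input_ids[:max_length],
        "attention_mask": attention_mask[:max_length],
        "labels": labels[:max_length],
    }


# Made-up token ids, purely for illustration.
ex = build_example(prompt_ids=[11, 12, 13], response_ids=[21, 22], eos_id=2, max_length=384)
print(ex["labels"])  # [-100, -100, -100, 21, 22, 2]
```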

Run the code: python train.py
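
When train.py starts, model.print_trainable_parameters() reports how many weights LoRA actually trains. As a rough cross-check, each adapted linear layer gains r * (in_features + out_features) weights. The layer shapes below are assumptions based on the published Qwen2.5-0.5B configuration (hidden size 896, MLP size 4864, 24 layers, 2 key/value heads of dimension 64); verify them against the config.json in your model folder.

```python
# Assumed Qwen2.5-0.5B shapes; check config.json in the downloaded model folder.
HIDDEN = 896
INTERMEDIATE = 4864
NUM_LAYERS = 24
KV_DIM = 128  # 2 key/value heads x head dimension 64

# (in_features, out_features) of every linear layer listed in target_modules
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(r: int) -> int:
    """LoRA adds A (r x in) and B (out x r) per layer, i.e. r * (in + out) weights."""
    per_block = sum(r * (fin + fout) for fin, fout in TARGETS.values())
    return per_block * NUM_LAYERS


print(lora_param_count(r=8))  # 4399104
```

Under these assumptions the adapter has about 4.4 million weights, under 1% of the base model, which is why the run fits on a 2 GB GPU at all.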

This produces several checkpoints; on my machine the whole run took about 50 minutes.
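
The number of checkpoints follows from the training arguments. The arithmetic below is a rough sketch, assuming about 1,500 samples and a single GPU; the exact rounding can differ between transformers versions, and `training_schedule` is a hypothetical helper.

```python
import math


def training_schedule(num_samples, per_device_batch, grad_accum, epochs, save_steps):
    """Estimate optimizer steps and how many times a checkpoint is written."""
    batches_per_epoch = math.ceil(num_samples / per_device_batch)
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_steps = steps_per_epoch * epochs
    return total_steps, total_steps // save_steps


total, saves = training_schedule(1500, 2, 2, 2, 25)
print(total, saves)  # 750 30
```

With save_total_limit=2 only the two most recent checkpoint folders remain on disk, even though a checkpoint is written roughly 30 times.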
② Convert the checkpoint to a LoRA adapter
checkpoint_to_lora.py
Code:

from transformers import AutoModelForCausalLM, AutoTokenizer
import os

# Where the LoRA adapter will be saved
lora_path = "d:/cuda/lora/qwen-0__5B-lora-ruozhiba"
# Base model path
model_path = 'D:/cuda/modelscope/models/Qwen/Qwen2___5-0___5B-Instruct'
# Checkpoint directory
checkpoint_dir = 'output/qwen-0.5B-ruozhiba'
# Pick the checkpoint with the highest step number (os.listdir order is not guaranteed)
checkpoints = [d for d in os.listdir(checkpoint_dir) if d.startswith('checkpoint-')]
checkpoint = max(checkpoints, key=lambda d: int(d.split('-')[-1]))
# The model was trained as a causal LM, so load it with the matching class
model = AutoModelForCausalLM.from_pretrained(f'{checkpoint_dir}/{checkpoint}')
# Save the model
model.save_pretrained(lora_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
# Save the tokenizer
tokenizer.save_pretrained(lora_path)

Run the code: python checkpoint_to_lora.py
The adapter is written to d:/cuda/lora/qwen-0__5B-lora-ruozhiba.
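
One detail of checkpoint selection is worth spelling out. A directory listing is not guaranteed to be ordered by training step, and a plain string sort places checkpoint-100 before checkpoint-25. A small helper that compares the numeric suffix avoids converting a stale checkpoint; `latest_checkpoint` is a hypothetical name used only for this sketch.

```python
def latest_checkpoint(names):
    """Return the checkpoint directory name with the highest step number."""
    candidates = [
        n for n in names
        if n.startswith("checkpoint-") and n.split("-")[-1].isdigit()
    ]
    if not candidates:
        raise FileNotFoundError("no checkpoint-<step> directories found")
    return max(candidates, key=lambda n: int(n.split("-")[-1]))


names = ["checkpoint-100", "checkpoint-25", "checkpoint-75"]
print(sorted(names)[-1])         # checkpoint-75 (string order, not the newest)
print(latest_checkpoint(names))  # checkpoint-100
```

In practice the list would come from os.listdir on the training output directory.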

③ Merge the model
merge.py
Code:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from peft import PeftModel
from peft import LoraConfig, TaskType, get_peft_model

model_path = 'D:/cuda/modelscope/models/Qwen/Qwen2___5-0___5B-Instruct'
lora_path = "d:/cuda/lora/qwen-0__5B-lora-ruozhiba"
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# Path for the merged model
output_path = r'd:/cuda/merge'
# Same as the config used for training
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=False,  # training mode
    r=8,  # LoRA rank
    lora_alpha=32,  # LoRA alpha; see the LoRA paper for its role
    lora_dropout=0.1,  # dropout ratio
)
base = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True)
base_tokenizer = AutoTokenizer.from_pretrained(model_path)
lora_model = PeftModel.from_pretrained(
    base,
    lora_path,
    torch_dtype=torch.float16,
    config=config,
)
# Fold the LoRA weights into the base weights and drop the adapter wrappers
model = lora_model.merge_and_unload()
model.save_pretrained(output_path)
base_tokenizer.save_pretrained(output_path)

Run the code:
python merge.py
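
Conceptually, merging folds each adapter back into the weight matrix it was attached to: W' = W + (lora_alpha / r) * B·A. With r=8 and lora_alpha=32 as configured in this post, the scale factor is 4. The pure-Python sketch below illustrates this on a toy 2x2 layer with made-up numbers; it is a teaching example, not how PEFT implements the merge internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, r, lora_alpha):
    """Return W + (lora_alpha / r) * (B @ A) for one linear layer."""
    scale = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy layer with rank 1; all numbers are made up.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.25]]    # shape (r, in_features)
B = [[1.0], [2.0]]   # shape (out_features, r)
print(merge_lora(W, A, B, r=1, lora_alpha=4))  # [[3.0, 1.0], [4.0, 3.0]]
```

After merging, the model has the same shape as the base model, which is why it can be converted to GGUF like any ordinary checkpoint.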

The merged model is written to d:/cuda/merge.

④ Convert the model to GGUF format:
Download the source code:
https://github.com/ggml-org/llama.cpp

Use the conversion script it contains:

python convert_hf_to_gguf.py D:/cuda/merge --outtype f16 --outfile d:/cuda/qwen-ruozhiba.bin


![screenshot](https://i-blog.csdnimg.cn/direct/a74e42c35ac144f0885a2b611d0add95.png)
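
The output file above is named qwen-ruozhiba.bin, but its content is GGUF regardless of the extension. A quick way to confirm that the conversion produced a valid file is to look at the first eight bytes: a GGUF file begins with the ASCII magic GGUF followed by a 32-bit version number. The sketch assumes the usual little-endian layout; the helper names are hypothetical.

```python
import struct


def gguf_version(header: bytes):
    """Return the GGUF version if the header starts with the GGUF magic, else None."""
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    return struct.unpack("<I", header[4:8])[0]


def check_file(path):
    """Read the first 8 bytes of a file and interpret them as a GGUF header."""
    with open(path, "rb") as f:
        return gguf_version(f.read(8))


# Synthetic header for illustration; a real check would call
# check_file("d:/cuda/qwen-ruozhiba.bin").
print(gguf_version(b"GGUF" + struct.pack("<I", 3)))  # 3
```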
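4. Quantize the model:
First download the prebuilt llama.cpp binaries:
https://github.com/ggml-org/llama.cpp/releases
Pick the package that matches your CPU. Mine is a 10th-generation Intel processor; CPU-Z shows which instruction sets it supports:
![screenshot](https://i-blog.csdnimg.cn/direct/8d21ed8017e94dc4b96d870d063c44fe.png)
Mine supports AVX and AVX2, so I can use the AVX2 package:
![screenshot](https://i-blog.csdnimg.cn/direct/9c25a2ba944146488b929ac808baf391.png)
After downloading, unzip it to get the prebuilt binaries:
![screenshot](https://i-blog.csdnimg.cn/direct/26d3b199aa45486dba0c20aa413261d3.png)
You can add this directory to PATH, but watch out for conflicts. PS: after a later Ollama upgrade a conflict did occur; removing the directory from PATH fixed it.
Then run the quantization in D:/cuda:
llama-quantize qwen-ruozhiba.bin q5_k_m
This creates a new file, ggml-model-Q5_K_M.gguf:
![screenshot](https://i-blog.csdnimg.cn/direct/586bdf823fe94e90be7a8023e3bcc5f1.png)
Move the file to a new directory: D:/cuda/qwen2.5-new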
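
The quantize and move steps can be combined by passing an explicit output path, since llama-quantize accepts the form `llama-quantize <input> <output> <type>`. The sketch below only builds the command by default and runs it when dry_run is set to False; it assumes llama-quantize is reachable on PATH and that the target directory already exists. The `quantize` helper is hypothetical.

```python
import subprocess


def quantize(src, dst, quant_type="q5_k_m", dry_run=True):
    """Build (and optionally run) a llama-quantize call with an explicit output file."""
    cmd = ["llama-quantize", src, dst, quant_type]
    if not dry_run:
        # Raises CalledProcessError if llama-quantize exits with an error.
        subprocess.run(cmd, check=True)
    return cmd


cmd = quantize(
    "D:/cuda/qwen-ruozhiba.bin",
    "D:/cuda/qwen2.5-new/ggml-model-Q5_K_M.gguf",
)
print(" ".join(cmd))
```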
5. Install Ollama.
Download
https://ollama.com/download/OllamaSetup.exe
and install it.
Then create a new file named ModelFile in that folder with the following content:
FROM D:/cuda/qwen2.5-new/ggml-model-Q5_K_M.gguf
# set the temperature to 0.7 [higher is more creative, lower is more coherent]
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER repeat_penalty 1.05
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>"""
# set the system message
SYSTEM """
You are a helpful assistant.
"""

Create the model:

ollama create qwen2_0.5b_demo --file ./ModelFile
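
For reference, the TEMPLATE block in ModelFile assembles a ChatML prompt, the chat format Qwen2.5 uses. The Python sketch below mirrors the template's conditionals to show what text the model receives before it starts generating; `render_chatml` is a hypothetical helper and only approximates how Ollama evaluates the template.

```python
def render_chatml(prompt, system=""):
    """Mimic the ModelFile TEMPLATE up to the point where the model starts answering."""
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    if prompt:
        parts.append(f"<|im_start|>user\n{prompt}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml("Hello", system="You are a helpful assistant."))
```

The model's reply is then generated after the final `<|im_start|>assistant` line and closed with `<|im_end|>`.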

Finally, run the model:

ollama run qwen2_0.5b_demo
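
Besides the interactive prompt, a running Ollama instance also exposes a local HTTP API, by default on port 11434. The sketch below sends one non-streaming request to /api/generate using only the standard library. It assumes Ollama is running locally with default settings and that the model was created under the name used above; the function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint (assumed unchanged)


def build_generate_request(prompt, model="qwen2_0.5b_demo"):
    """Prepare a POST request asking Ollama for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="qwen2_0.5b_demo"):
    """Send the request and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Example call, to be run while Ollama is up:
# print(generate("Hello"))
```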

Now you can have fun with it.
Because the model is so small, the results may be poor; when time allows, try a larger base model for training.

For ideas on adding web access, see:
https://blog.csdn.net/ChinaLiaoTian/article/details/145504774
For knowledge-base capability, see:
https://blog.csdn.net/sheex2012/article/details/138339166


Original article: https://blog.csdn.net/lwprain/article/details/146367778