KTransformers Installation Notes

    • 0. Prerequisites
    • 1. Install dependencies
    • 2. Create and set up the virtual environment
    • 3. Install PyTorch 2.6
    • 4. Download the project source code
      • For two CPUs and 1 TB of RAM
    • 5. Download the model weights and config files
    • 6. Install KTransformers with Docker

0. Prerequisites

CUDA 12.4.
A reasonably recent CMake is recommended; also install git, g++, and gcc.

Run the following commands to point CUDA_HOME at your CUDA installation directory (e.g. /usr/local/cuda-12.4):

export CUDA_HOME=/usr/local/cuda-12.4
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
source ~/.bashrc

Check that it worked:

which nvcc

If the output looks like this, the configuration succeeded:

/usr/local/cuda-12.4/bin/nvcc

Create a local container:

docker run -it -d --gpus all --privileged  -p 8083:8080  -p 8084:8084   -p 8055:22 --name KT_ubuntu2204_cuda124_cudnn8700 -v /home/data_c/KT_data/:/home/data_c/KT_data/   -v /home/data_a/gqr/:/home/data_a  afa4f07f5e5e /bin/bash

In this image, CUDA, cuDNN, and conda are already installed.

1. Install dependencies

sudo apt-get update 
sudo apt-get install build-essential cmake ninja-build patchelf

2. Create and set up the virtual environment

conda create --name ktransformers python=3.11
conda activate ktransformers   # activate the environment
conda install -c conda-forge libstdcxx-ng # Anaconda's `libstdcxx-ng` package ships a newer `libstdc++`; install it from `conda-forge`.
# check which GLIBCXX versions the env's libstdc++ now exports
strings ~/anaconda3/envs/ktransformers/lib/libstdc++.so.6 | grep GLIBCXX
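
If the conda package took effect, the listing should include GLIBCXX_3.4.30, the symbol version that a build error later in this post complains about. For example (the exact set varies with the libstdcxx-ng version):

GLIBCXX_3.4.28
GLIBCXX_3.4.29
GLIBCXX_3.4.30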

3. Install PyTorch 2.6


pip install torch torchvision torchaudio

From a mirror in mainland China:

pip3 install torch torchvision torchaudio --default-timeout=100 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install packaging ninja cpufeature numpy  --default-timeout=100 -i https://pypi.tuna.tsinghua.edu.cn/simple
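
Before going further, it is worth confirming that the installed torch can see the GPU; a quick check (the version suffix depends on which CUDA build pip resolved):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

Expected output is something like 2.6.0+cu124 True.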

Install flash-attention:

pip install flash_attn

Prefer this install command: if it succeeds, it also confirms that nvcc is set up correctly, which gives better odds for the remaining build steps.

Alternatively, download the .whl matching your CUDA and torch versions (the official docs say this works; I have not tried it myself):

https://github.com/Dao-AILab/flash-attention/releases

or:

pip install flash-attn --find-links https://github.com/Dao-AILab/flash-attention/releases

To check whether flash-attn installed correctly, run the following in Python:

import flash_attn
print(flash_attn.__version__)
# Sometimes flash_attn installs but its CUDA extension is broken; test the core CUDA extensions too:
from flash_attn.layers.rotary import RotaryEmbedding
from flash_attn.bert_padding import pad_input, unpad_input

If these run without errors, the installation succeeded.

Additional system libraries needed for the build:

apt install libtbb-dev libssl-dev libcurl4-openssl-dev libaio1 libaio-dev libgflags-dev zlib1g-dev libfmt-dev patchelf libnuma-dev
pip install packaging ninja cpufeature numpy openai

4. Download the project source code

git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init --recursive

Build commands for different hardware setups:

For two CPUs and 1 TB of RAM:

# For Multi-concurrency with two cpu and 1T RAM:
apt install libnuma-dev
export USE_NUMA=1
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
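
As a sanity check that the machine really exposes two NUMA nodes (the scenario USE_NUMA=1 targets), numactl can print the topology; this assumes numactl is available via apt:

apt install numactl
numactl --hardware

The first line of output should read something like available: 2 nodes (0-1).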

After a long wait, the build completes successfully.

5. Download the model weights and config files


Download the DeepSeek-R1 Q4 model weights:

# install the ModelScope CLI
pip install modelscope
# download the model weight files
modelscope download --model lmstudio-community/DeepSeek-R1-GGUF  --include DeepSeek-R1-Q4_K_M*  --local_dir ./dir
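
The Q4_K_M weights come as several large multi-gigabyte GGUF shards; once the download finishes, a quick listing of the --local_dir path confirms they all arrived (the exact subfolder layout depends on how ModelScope mirrors the repo):

ls -lh ./dir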


Download the DeepSeek-R1 Q4 config files:

modelscope download --model deepseek-ai/DeepSeek-R1 --exclude *.safetensors  --local_dir ./config


Errors encountered:

Error 1:
The build stage fails with:

ERROR: Failed to build installable wheels for some pyproject.toml based projects (ktransformers)

Fix from the official repo. Commands:

sudo apt install libtbb-dev libssl-dev libcurl4-openssl-dev libaio-dev libfmt-dev libgflags-dev zlib1g-dev patchelf
###############################################################################
In addition, add the line set(CMAKE_CUDA_STANDARD 17) to ktransformers/csrc/balance_serve/CMakeLists.txt; the build should then pass.
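
For reference, a sketch of the placement (the rest of the file varies by version; only the set() line is the actual fix):

# ktransformers/csrc/balance_serve/CMakeLists.txt
set(CMAKE_CUDA_STANDARD 17)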

After rebuilding, it compiles successfully.

Error 2:
Description: running the following command

python ktransformers/server/main.py \
  --port 10002 \
  --model_path /home/data_c/KT_data/config \
  --gguf_path /home/data_c/KT_data/dir \
  --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml \
  --max_new_tokens 1024 \
  --cache_lens 32768 \
  --chunk_size 256 \
  --max_batch_size 4 \
  --backend_type balance_serve \
  --force_think # useful for R1

fails with:

File "/kiwi/helly/ktransformers/ktransformers/ktransformers/local_chat.py", line 110, in local_chat
optimize_and_load_gguf(model, optimize_rule_path, gguf_path, config)
File "/kiwi/helly/ktransformers/ktransformers/ktransformers/optimize/optimize.py", line 128, in optimize_and_load_gguf
inject(module, optimize_config, model_config, gguf_loader)
File "/kiwi/helly/ktransformers/ktransformers/ktransformers/optimize/optimize.py", line 31, in inject
module_cls=getattr(import(import_module_name, fromlist=[""]), import_class_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/kiwi/helly/ktransformers/ktransformers/ktransformers/operators/models.py", line 22, in
from ktransformers.operators.dynamic_attention import DynamicScaledDotProductAttention
File "/kiwi/helly/ktransformers/ktransformers/ktransformers/operators/dynamic_attention.py", line 19, in
from ktransformers.operators.cpuinfer import CPUInfer, CPUInferKVCache
File "/kiwi/helly/ktransformers/ktransformers/ktransformers/operators/cpuinfer.py", line 25, in
import cpuinfer_ext
ImportError: /root/miniconda3/envs/ktransformers/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /kiwi/helly/ktransformers/ktransformers/cpuinfer_ext.cpython-311-x86_64-linux-gnu.so)

Solution:
Run the export below first, then start the server:

# run this first
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
# then start the server
python ktransformers/server/main.py \
  --port 10002 \
  --model_path /home/data_c/KT_data/config \
  --gguf_path /home/data_c/KT_data/dir \
  --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml \
  --max_new_tokens 1024 \
  --cache_lens 32768 \
  --chunk_size 256 \
  --max_batch_size 4 \
  --backend_type balance_serve \
  --force_think # useful for R1
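
Why the preload works: the conda env ships an older libstdc++ that lacks GLIBCXX_3.4.30, while the system copy usually exports it, and LD_PRELOAD forces the newer system library in first. You can confirm before preloading (path assumes Ubuntu on x86_64):

strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX_3.4.30

To make the setting persistent, append the export line to ~/.bashrc.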

Other installation commands:

conda create --name ktransformers python=3.11
conda activate ktransformers
conda install -c conda-forge libstdcxx-ng
sudo apt install libtbb-dev libssl-dev libcurl4-openssl-dev libaio1 libaio-dev libfmt-dev libgflags-dev zlib1g-dev patchelf
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init --recursive
sudo env USE_BALANCE_SERVE=1 PYTHONPATH="$(which python)" PATH="$(dirname $(which python)):$PATH" bash ./install.sh

Error 3:
After the server starts, the following errors appear while the model weights are loading:

Use docker and upgrade to 0.2.3 version
ktransformers 0.2.3.post1+cu121torch23fancy
ktransformers --gguf_path /models/GGUF --model_path /models --cpu_infer 190 --optimize_config_path /models/DeepSeek-V3-Chat-multi-gpu.yaml --port 10002 --max_new_tokens 8192 --host 0.0.0.0 --cache_lens 32768 --total_context 32768 --force_think --no-use_cuda_graph --model_name DeepSeek-R1
Get warnings below but still can work. No idea what's wrong.

loading blk.3.attn_q_a_norm.weight to cuda:6
loading blk.3.attn_kv_a_norm.weight to cuda:6
loading blk.3.attn_kv_b.weight to cuda:6
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
set_mempolicy: Operation not permitted
set_mempolicy: Operation not permitted

Cause: when deployed with Docker, the container lacks the required permissions. Add the following flag when creating the container:

--privileged

That is:

docker run -it -d --gpus all --privileged  -p 8083:8080  -p 8084:8084   -p 8055:22 --name KT_ubuntu2204_cuda124_cudnn8700 -v /home/data_c/KT_data/:/home/data_c/KT_data/   -v /home/data_a/gqr/:/home/data_a  afa4f07f5e5e /bin/bash
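
A narrower alternative (an assumption, not tested in this post): mbind and set_mempolicy are NUMA memory-policy syscalls that generally require only CAP_SYS_NICE, so granting that single capability instead of full --privileged may be enough:

docker run -it -d --gpus all --cap-add=SYS_NICE -p 8083:8080 -p 8084:8084 -p 8055:22 --name KT_ubuntu2204_cuda124_cudnn8700 -v /home/data_c/KT_data/:/home/data_c/KT_data/ -v /home/data_a/gqr/:/home/data_a afa4f07f5e5e /bin/bash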

After recreating the container, rebuild with NUMA support if needed (as in section 4):

apt install libnuma-dev
export USE_NUMA=1
env USE_BALANCE_SERVE=1 PYTHONPATH="$(which python)" PATH="$(dirname $(which python)):$PATH" bash ./install.sh

6. Install KTransformers with Docker

The version installed here is 0.2.1.

Pull the official image:

docker pull approachingai/ktransformers:0.2.1

Create the container:

docker run  --gpus all -p 8077:8082  --privileged    -v /home/data_c/KT_data/:/models --name ktransformers -itd approachingai/ktransformers:0.2.1

Enter the container:

docker exec -it ktransformers /bin/bash

Start a chat with the model:

python -m ktransformers.local_chat  --gguf_path /models/DeepSeek-V3 --model_path /models/deepseek-ai/DeepSeek-V3 --cpu_infer 33

If this command fails with:

Illegal instruction (core dumped)

Solution:
Rebuild KTransformers inside the container:

bash install.sh
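
"Illegal instruction" usually means the prebuilt binaries in the image use CPU instructions (AVX-512, for instance) that the host CPU lacks; rebuilding inside the container compiles for the local CPU. To list the vector extensions your CPU supports (an aside, not from the official docs):

grep -oE 'avx[0-9a-z_]*' /proc/cpuinfo | sort -u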

After startup, the interactive chat prompt appears.

The project also provides an OpenAI-compatible API:

Run:

python ktransformers/server/main.py \
  --port 8082 \
  --model_path /models/deepseek-ai/DeepSeek-V3 \
  --gguf_path /models/DeepSeek-V3 \
  --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat.yaml \
  --max_new_tokens 1024 \
  --cache_lens 32768 \
  --max_batch_size 4 \
  --force_think True \
  --model_name DeepSeek-R1


Query it with:

curl -X POST http://localhost:8082/v1/chat/completions \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "hello"}], "model": "DeepSeek-R1", "temperature": 0.3, "top_p": 1.0, "stream": true}'
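
Since the endpoint is OpenAI-compatible and the openai package was installed earlier, the same request can be made from Python. A minimal sketch, assuming the server above is listening on localhost:8082 and accepts any placeholder API key:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8082/v1", api_key="placeholder")
stream = client.chat.completions.create(
    model="DeepSeek-R1",
    messages=[{"role": "user", "content": "hello"}],
    temperature=0.3,
    top_p=1.0,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta  # streamed token delta
    if delta.content:
        print(delta.content, end="", flush=True)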

Full parameter list:

usage: kvcache.ai [-h] [--host HOST] [--port PORT] [--ssl_keyfile SSL_KEYFILE] [--ssl_certfile SSL_CERTFILE] [--web WEB] [--model_name MODEL_NAME] [--model_dir MODEL_DIR] [--model_path MODEL_PATH] [--device DEVICE]
[--gguf_path GGUF_PATH] [--optimize_config_path OPTIMIZE_CONFIG_PATH] [--cpu_infer CPU_INFER] [--type TYPE] [--paged PAGED] [--total_context TOTAL_CONTEXT] [--max_batch_size MAX_BATCH_SIZE]
[--max_chunk_size MAX_CHUNK_SIZE] [--max_new_tokens MAX_NEW_TOKENS] [--json_mode JSON_MODE] [--healing HEALING] [--ban_strings BAN_STRINGS] [--gpu_split GPU_SPLIT] [--length LENGTH] [--rope_scale ROPE_SCALE]
[--rope_alpha ROPE_ALPHA] [--no_flash_attn NO_FLASH_ATTN] [--low_mem LOW_MEM] [--experts_per_token EXPERTS_PER_TOKEN] [--load_q4 LOAD_Q4] [--fast_safetensors FAST_SAFETENSORS] [--draft_model_dir DRAFT_MODEL_DIR]
[--no_draft_scale NO_DRAFT_SCALE] [--modes MODES] [--mode MODE] [--username USERNAME] [--botname BOTNAME] [--system_prompt SYSTEM_PROMPT] [--temperature TEMPERATURE] [--smoothing_factor SMOOTHING_FACTOR]
[--dynamic_temperature DYNAMIC_TEMPERATURE] [--top_k TOP_K] [--top_p TOP_P] [--top_a TOP_A] [--skew SKEW] [--typical TYPICAL] [--repetition_penalty REPETITION_PENALTY] [--frequency_penalty FREQUENCY_PENALTY]
[--presence_penalty PRESENCE_PENALTY] [--max_response_tokens MAX_RESPONSE_TOKENS] [--response_chunk RESPONSE_CHUNK] [--no_code_formatting NO_CODE_FORMATTING] [--cache_8bit CACHE_8BIT] [--cache_q4 CACHE_Q4]
[--ngram_decoding NGRAM_DECODING] [--print_timings PRINT_TIMINGS] [--amnesia AMNESIA] [--batch_size BATCH_SIZE] [--cache_lens CACHE_LENS] [--log_dir LOG_DIR] [--log_file LOG_FILE] [--log_level LOG_LEVEL]
[--backup_count BACKUP_COUNT] [--db_type DB_TYPE] [--db_host DB_HOST] [--db_port DB_PORT] [--db_name DB_NAME] [--db_pool_size DB_POOL_SIZE] [--db_database DB_DATABASE] [--user_secret_key USER_SECRET_KEY]
[--user_algorithm USER_ALGORITHM] [--force_think FORCE_THINK] [--web_cross_domain WEB_CROSS_DOMAIN] [--file_upload_dir FILE_UPLOAD_DIR] [--assistant_store_dir ASSISTANT_STORE_DIR] [--prompt_file PROMPT_FILE]

Error:

Running the 0.2.1 API server inside the Docker container raises the exception below. The workaround is to pull the 0.2.0 Docker image and build the project there, or patch the code as shown after the traceback.

During handling of the above exception, another exception occurred:

+ Exception Group Traceback (most recent call last):
| File "/usr/python310/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 409, in run_asgi
| result = await app( # type: ignore[func-returns-value]
| File "/usr/python310/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
| return await self.app(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
| await super().__call__(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/applications.py", line 112, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/middleware/errors.py", line 187, in __call__
| raise exc
| File "/usr/python310/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
| await self.app(scope, receive, _send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
| await self.app(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
| await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| raise exc
| File "/usr/python310/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
| await app(scope, receive, sender)
| File "/usr/python310/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
| await route.handle(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
| await self.app(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
| await wrap_app_handling_exceptions(app, request)(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| raise exc
| File "/usr/python310/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
| await app(scope, receive, sender)
| File "/usr/python310/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
| await response(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/responses.py", line 261, in __call__
| async with anyio.create_task_group() as task_group:
| File "/usr/python310/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 767, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/usr/python310/lib/python3.10/site-packages/starlette/responses.py", line 264, in wrap
| await func()
| File "/usr/python310/lib/python3.10/site-packages/starlette/responses.py", line 245, in stream_response
| async for chunk in self.body_iterator:
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/schemas/assistants/streaming.py", line 80, in check_client_link
| async for event in async_events:
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/schemas/assistants/streaming.py", line 93, in to_stream_reply
| async for event in async_events:
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/schemas/assistants/streaming.py", line 87, in add_done
| async for event in async_events:
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/schemas/assistants/streaming.py", line 107, in filter_chat_chunk
| async for event in async_events:
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/api/openai/endpoints/chat.py", line 34, in inner
| async for token in interface.inference(input_message, id):
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/backend/interfaces/ktransformers.py", line 181, in inference
| async for v in super().inference(local_messages, thread_id):
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/backend/interfaces/transformers.py", line 340, in inference
| for t in self.prefill(input_ids, self.check_is_new(thread_id)):
| File "/usr/python310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 36, in generator_context
| response = gen.send(None)
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/backend/interfaces/ktransformers.py", line 130, in prefill
| self.cache.reset()
| File "/mnt/data/ktransformers-0.2.1/ktransformers/models/custom_cache.py", line 175, in reset
| self.value_cache[layer_idx].zero_()
| AttributeError: 'NoneType' object has no attribute 'zero_'


After the fix:

    def reset(self):
        """Resets the cache values while preserving the objects"""
        for layer_idx in range(len(self.key_cache)):
            # In-place ops prevent breaking the static address
            if self.key_cache[layer_idx] is not None:
                self.key_cache[layer_idx].zero_()
            if self.value_cache[layer_idx] is not None:
                self.value_cache[layer_idx].zero_()
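
The crash happened because reset() called zero_() unconditionally, while in multi-GPU layouts some layers' key/value caches are never allocated and stay None; the guard skips those. Zeroing in place, rather than reallocating, keeps the tensors' addresses static, as the code comment notes (likely because captured CUDA graphs depend on those fixed addresses).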

Alternatively, upstream reports it "should be fixed by latest release 0.2.1.post1".
