KTransformers Installation Notes: Installing KTransformers with Docker
KTransformers Installation Notes
- 0. Prerequisites
- 1. Install dependencies
- 2. Create and activate the virtual environment
- 3. Install PyTorch 2.6
- 4. Download the project source code
- For dual-CPU, 1 TB RAM hardware
- 5. Download model weights and config files
- 6. Install KTransformers with Docker
0. Prerequisites
CUDA 12.4
Upgrade to a reasonably recent CMake, and install git, g++, and gcc.
Run the following commands to point CUDA_HOME at your CUDA installation directory (for example /usr/local/cuda-12.4); add them to ~/.bashrc to make them persistent, then reload:
export CUDA_HOME=/usr/local/cuda-12.4
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
source ~/.bashrc
Check that the configuration worked:
which nvcc
If the output is the following, the configuration succeeded:
/usr/local/cuda-12.4/bin/nvcc
Create a local container:
docker run -it -d --gpus all --privileged -p 8083:8080 -p 8084:8084 -p 8055:22 --name KT_ubuntu2204_cuda124_cudnn8700 -v /home/data_c/KT_data/:/home/data_c/KT_data/ -v /home/data_a/gqr/:/home/data_a afa4f07f5e5e /bin/bash
This assumes that CUDA, cuDNN, and conda are already installed in the image.
1. Install dependencies
sudo apt-get update
sudo apt-get install build-essential cmake ninja-build patchelf
2. Create and activate the virtual environment
conda create --name ktransformers python=3.11
conda activate ktransformers  # activate the environment
conda install -c conda-forge libstdcxx-ng # Anaconda provides a package called `libstdcxx-ng` that includes a newer version of `libstdc++`, which can be installed via `conda-forge`.
# Check which GLIBCXX versions the env now provides:
strings ~/anaconda3/envs/ktransformers/lib/libstdc++.so.6 | grep GLIBCXX
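The same check can be done from inside Python, which is convenient for confirming that the active environment really provides the GLIBCXX_3.4.30 symbol that the compiled cpuinfer_ext extension needs later (see Error 2 below). A minimal sketch, assuming a standard conda env layout:
import pathlib, subprocess, sys

# libstdc++ shipped by the active conda env (sys.prefix points at the env root)
lib = pathlib.Path(sys.prefix) / "lib" / "libstdc++.so.6"
symbols = subprocess.run(["strings", str(lib)], capture_output=True, text=True).stdout
print(lib, "provides GLIBCXX_3.4.30:", "GLIBCXX_3.4.30" in symbols)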
3. Install PyTorch 2.6
pip install torch torchvision torchaudio
If you are in mainland China, use a domestic mirror:
pip3 install torch torchvision torchaudio --default-timeout=100 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install packaging ninja cpufeature numpy --default-timeout=100 -i https://pypi.tuna.tsinghua.edu.cn/simple
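Once these packages are installed, it is worth confirming that the torch wheel you got is CUDA-enabled before compiling anything against it. A quick check (the version numbers are only what this guide targets, not guaranteed output):
import torch

print(torch.__version__)          # the guide targets 2.6.x
print(torch.version.cuda)         # CUDA version the wheel was built against
print(torch.cuda.is_available())  # must be True, otherwise the CUDA setup is wrong
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))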
Install flash-attention:
pip install flash_attn
Installing it with the command above is preferred: if it succeeds, it also confirms that nvcc is set up correctly, which gives better odds for the remaining build steps.
Alternatively, download the .whl built for your CUDA and torch versions directly (the official docs say this works; I have not tried it myself):
https://github.com/Dao-AILab/flash-attention/releases
Or:
pip install flash-attn --find-links https://github.com/Dao-AILab/flash-attention/releases
To check whether flash-attn installed correctly, run the following in Python:
import flash_attn
print(flash_attn.__version__)
# Sometimes flash_attn installs but its CUDA extensions failed to compile; test the core CUDA extensions as well:
from flash_attn.layers.rotary import RotaryEmbedding
from flash_attn.bert_padding import pad_input, unpad_input
If these run without errors, the installation succeeded.
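Beyond the imports, you can also run the attention kernel once on a tiny input to make sure the CUDA extension really works at runtime. A minimal smoke test, assuming the flash_attn_func interface from flash-attn 2.x (shapes and dtypes below are only illustrative):
import torch
from flash_attn import flash_attn_func

# tiny random q/k/v: (batch, seqlen, num_heads, head_dim), fp16 on the GPU
q = torch.randn(1, 8, 4, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 8, 4, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 8, 4, 64, dtype=torch.float16, device="cuda")

out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # expect torch.Size([1, 8, 4, 64]) if the kernel ran successfully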
Install the remaining system libraries:
apt install libtbb-dev libssl-dev libcurl4-openssl-dev libaio1 libaio-dev libfmt-dev libgflags-dev zlib1g-dev patchelf
apt install libnuma-dev
pip install packaging ninja cpufeature numpy openai
4. Download the project source code
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init --recursive
Build commands for different hardware configurations:
For dual-CPU, 1 TB RAM hardware:
# For Multi-concurrency with two cpu and 1T RAM:
apt install libnuma-dev
export USE_NUMA=1
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
After a long wait, the build completes successfully.
5. Download model weights and config files
Download the DeepSeek-R1 Q4 model weights:
# Install the ModelScope CLI
pip install modelscope
# Download the model weight files
modelscope download --model lmstudio-community/DeepSeek-R1-GGUF --include DeepSeek-R1-Q4_K_M* --local_dir ./dir
Download the DeepSeek-R1 config files (everything except the safetensors weights):
modelscope download --model deepseek-ai/DeepSeek-R1 --exclude *.safetensors --local_dir ./config
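Since the Q4_K_M weights are split over many multi-gigabyte GGUF shards, it is worth verifying that everything arrived before pointing KTransformers at the directory. A small sketch (the ./dir and ./config paths match the download commands above):
from pathlib import Path

gguf_dir = Path("./dir")
shards = sorted(gguf_dir.rglob("*.gguf"))
total_gb = sum(p.stat().st_size for p in shards) / 1024**3

print(f"{len(shards)} GGUF shards, {total_gb:.1f} GiB in total")
for p in shards:
    print(f"  {p.name}  {p.stat().st_size / 1024**3:.1f} GiB")

# config files downloaded alongside the weights
print(sorted(p.name for p in Path("./config").iterdir()))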
Errors encountered:
Error 1
The build fails during compilation with:
ERROR: Failed to build installable wheels for some pyproject.toml based projects (ktransformers)
Solution from the official docs:
Run:
sudo apt install libtbb-dev libssl-dev libcurl4-openssl-dev libaio-dev libfmt-dev libgflags-dev zlib1g-dev patchelf
In addition, adding the line set(CMAKE_CUDA_STANDARD 17) to ktransformers/csrc/balance_serve/CMakeLists.txt should let the build pass.
After rebuilding, the error no longer appears.
Error 2
Problem description:
The following command was run:
python ktransformers/server/main.py \
  --port 10002 \
  --model_path /home/data_c/KT_data/config \
  --gguf_path /home/data_c/KT_data/dir \
  --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml \
  --max_new_tokens 1024 \
  --cache_lens 32768 \
  --chunk_size 256 \
  --max_batch_size 4 \
  --backend_type balance_serve \
  --force_think  # useful for R1
The error shown is:
File "/kiwi/helly/ktransformers/ktransformers/ktransformers/local_chat.py", line 110, in local_chat
optimize_and_load_gguf(model, optimize_rule_path, gguf_path, config)
File "/kiwi/helly/ktransformers/ktransformers/ktransformers/optimize/optimize.py", line 128, in optimize_and_load_gguf
inject(module, optimize_config, model_config, gguf_loader)
File "/kiwi/helly/ktransformers/ktransformers/ktransformers/optimize/optimize.py", line 31, in inject
module_cls=getattr(__import__(import_module_name, fromlist=[""]), import_class_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/kiwi/helly/ktransformers/ktransformers/ktransformers/operators/models.py", line 22, in <module>
from ktransformers.operators.dynamic_attention import DynamicScaledDotProductAttention
File "/kiwi/helly/ktransformers/ktransformers/ktransformers/operators/dynamic_attention.py", line 19, in <module>
from ktransformers.operators.cpuinfer import CPUInfer, CPUInferKVCache
File "/kiwi/helly/ktransformers/ktransformers/ktransformers/operators/cpuinfer.py", line 25, in <module>
import cpuinfer_ext
ImportError: /root/miniconda3/envs/ktransformers/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /kiwi/helly/ktransformers/ktransformers/cpuinfer_ext.cpython-311-x86_64-linux-gnu.so)
Solution:
Run the following first, then start the service again:
# First run:
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
# Then start the service:
python ktransformers/server/main.py \
  --port 10002 \
  --model_path /home/data_c/KT_data/config \
  --gguf_path /home/data_c/KT_data/dir \
  --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml \
  --max_new_tokens 1024 \
  --cache_lens 32768 \
  --chunk_size 256 \
  --max_batch_size 4 \
  --backend_type balance_serve \
  --force_think  # useful for R1
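To double-check that the workaround took effect, you can look at which libstdc++ the Python process actually mapped; with LD_PRELOAD set, the system copy under /usr/lib should show up. A quick diagnostic sketch (Linux only):
import torch  # importing torch (or cpuinfer_ext) forces libstdc++ to be loaded

with open("/proc/self/maps") as f:
    libs = sorted({line.split()[-1] for line in f if "libstdc++" in line})

for path in libs:
    print(path)  # expect /usr/lib/x86_64-linux-gnu/libstdc++.so.6 when LD_PRELOAD is active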
Other installation commands:
conda create --name ktransformers python=3.11
conda activate ktransformers
conda install -c conda-forge libstdcxx-ng
sudo apt install libtbb-dev libssl-dev libcurl4-openssl-dev libaio1 libaio-dev libfmt-dev libgflags-dev zlib1g-dev patchelf
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init --recursive
sudo env USE_BALANCE_SERVE=1 PYTHONPATH="$(which python)" PATH="$(dirname $(which python)):$PATH" bash ./install.sh
Error 3
After the service starts, the following errors appear while loading the model weights:
Use docker and upgrade to 0.2.3 version
ktransformers 0.2.3.post1+cu121torch23fancy
ktransformers --gguf_path /models/GGUF --model_path /models --cpu_infer 190 --optimize_config_path /models/DeepSeek-V3-Chat-multi-gpu.yaml --port 10002 --max_new_tokens 8192 --host 0.0.0.0 --cache_lens 32768 --total_context 32768 --force_think --no-use_cuda_graph --model_name DeepSeek-R1
Get warnings below but still can work. No idea what's wrong.
loading blk.3.attn_q_a_norm.weight to cuda:6
loading blk.3.attn_kv_a_norm.weight to cuda:6
loading blk.3.attn_kv_b.weight to cuda:6
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
set_mempolicy: Operation not permitted
set_mempolicy: Operation not permitted
Cause: when deployed with Docker, the container lacks the required permissions; add the following flag when creating the container:
--privileged
That is:
docker run -it -d --gpus all --privileged -p 8083:8080 -p 8084:8084 -p 8055:22 --name KT_ubuntu2204_cuda124_cudnn8700 -v /home/data_c/KT_data/:/home/data_c/KT_data/ -v /home/data_a/gqr/:/home/data_a afa4f07f5e5e /bin/bash
Other installation commands:
conda create --name ktransformers python=3.11
conda activate ktransformers
conda install -c conda-forge libstdcxx-ng
sudo apt install libtbb-dev libssl-dev libcurl4-openssl-dev libaio1 libaio-dev libfmt-dev libgflags-dev zlib1g-dev patchelf
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init --recursive
sudo env USE_BALANCE_SERVE=1 PYTHONPATH="$(which python)" PATH="$(dirname $(which python)):$PATH" bash ./install.sh
apt install libnuma-dev
export USE_NUMA=1
env USE_BALANCE_SERVE=1 PYTHONPATH="$(which python)" PATH="$(dirname $(which python)):$PATH" bash ./install.sh
6. Install KTransformers with Docker
The version installed in this section is 0.2.1.
Pull the official image:
docker pull approachingai/ktransformers:0.2.1
Create the container:
docker run --gpus all -p 8077:8082 --privileged -v /home/data_c/KT_data/:/models --name ktransformers -itd approachingai/ktransformers:0.2.1
Enter the container:
docker exec -it ktransformers /bin/bash
Start a local chat with the model:
python -m ktransformers.local_chat --gguf_path /models/DeepSeek-V3 --model_path /models/deepseek-ai/DeepSeek-V3 --cpu_infer 33
If the command above fails with:
Illegal instruction (core dumped)
the fix is to rebuild KTransformers inside the container by running:
bash install.sh
Once started, the interactive chat prompt appears.
The project also exposes an OpenAI-compatible API:
Run the following command:
python ktransformers/server/main.py --port 8082 --model_path /models/deepseek-ai/DeepSeek-V3 --gguf_path /models/DeepSeek-V3 --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat.yaml --max_new_tokens 1024 --cache_lens 32768 --max_batch_size 4 --force_think True --model_name DeepSeek-R1
Once the server is running, access the API as follows:
curl -X POST http://localhost:8082/v1/chat/completions \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "hello"}], "model": "DeepSeek-R1", "temperature": 0.3, "top_p": 1.0, "stream": true}'
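Since the endpoint is OpenAI-compatible, the same request can be made with the openai Python package instead of curl. The sketch below assumes the server started above is listening on port 8082 and uses a placeholder API key:
from openai import OpenAI

# placeholder key; adjust base_url/key if your deployment differs
client = OpenAI(base_url="http://localhost:8082/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="DeepSeek-R1",
    messages=[{"role": "user", "content": "hello"}],
    temperature=0.3,
    top_p=1.0,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)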
Available parameters:
usage: kvcache.ai [-h] [--host HOST] [--port PORT] [--ssl_keyfile SSL_KEYFILE] [--ssl_certfile SSL_CERTFILE] [--web WEB] [--model_name MODEL_NAME] [--model_dir MODEL_DIR] [--model_path MODEL_PATH] [--device DEVICE]
                  [--gguf_path GGUF_PATH] [--optimize_config_path OPTIMIZE_CONFIG_PATH] [--cpu_infer CPU_INFER] [--type TYPE] [--paged PAGED] [--total_context TOTAL_CONTEXT] [--max_batch_size MAX_BATCH_SIZE]
                  [--max_chunk_size MAX_CHUNK_SIZE] [--max_new_tokens MAX_NEW_TOKENS] [--json_mode JSON_MODE] [--healing HEALING] [--ban_strings BAN_STRINGS] [--gpu_split GPU_SPLIT] [--length LENGTH] [--rope_scale ROPE_SCALE]
                  [--rope_alpha ROPE_ALPHA] [--no_flash_attn NO_FLASH_ATTN] [--low_mem LOW_MEM] [--experts_per_token EXPERTS_PER_TOKEN] [--load_q4 LOAD_Q4] [--fast_safetensors FAST_SAFETENSORS] [--draft_model_dir DRAFT_MODEL_DIR]
                  [--no_draft_scale NO_DRAFT_SCALE] [--modes MODES] [--mode MODE] [--username USERNAME] [--botname BOTNAME] [--system_prompt SYSTEM_PROMPT] [--temperature TEMPERATURE] [--smoothing_factor SMOOTHING_FACTOR]
                  [--dynamic_temperature DYNAMIC_TEMPERATURE] [--top_k TOP_K] [--top_p TOP_P] [--top_a TOP_A] [--skew SKEW] [--typical TYPICAL] [--repetition_penalty REPETITION_PENALTY] [--frequency_penalty FREQUENCY_PENALTY]
                  [--presence_penalty PRESENCE_PENALTY] [--max_response_tokens MAX_RESPONSE_TOKENS] [--response_chunk RESPONSE_CHUNK] [--no_code_formatting NO_CODE_FORMATTING] [--cache_8bit CACHE_8BIT] [--cache_q4 CACHE_Q4]
                  [--ngram_decoding NGRAM_DECODING] [--print_timings PRINT_TIMINGS] [--amnesia AMNESIA] [--batch_size BATCH_SIZE] [--cache_lens CACHE_LENS] [--log_dir LOG_DIR] [--log_file LOG_FILE] [--log_level LOG_LEVEL]
                  [--backup_count BACKUP_COUNT] [--db_type DB_TYPE] [--db_host DB_HOST] [--db_port DB_PORT] [--db_name DB_NAME] [--db_pool_size DB_POOL_SIZE] [--db_database DB_DATABASE] [--user_secret_key USER_SECRET_KEY]
                  [--user_algorithm USER_ALGORITHM] [--force_think FORCE_THINK] [--web_cross_domain WEB_CROSS_DOMAIN] [--file_upload_dir FILE_UPLOAD_DIR] [--assistant_store_dir ASSISTANT_STORE_DIR] [--prompt_file PROMPT_FILE]
Error:
When running the 0.2.1 API inside the Docker container, the following error occurs; the solution is to pull the 0.2.0 Docker image and build the project there.
During handling of the above exception, another exception occurred:
Exception Group Traceback (most recent call last):
| File "/usr/python310/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 409, in run_asgi
| result = await app( # type: ignore[func-returns-value]
| File "/usr/python310/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
| return await self.app(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
| await super().__call__(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/applications.py", line 112, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/middleware/errors.py", line 187, in __call__
| raise exc
| File "/usr/python310/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
| await self.app(scope, receive, _send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
| await self.app(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
| await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| raise exc
| File "/usr/python310/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
| await app(scope, receive, sender)
| File "/usr/python310/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
| await route.handle(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
| await self.app(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
| await wrap_app_handling_exceptions(app, request)(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| raise exc
| File "/usr/python310/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
| await app(scope, receive, sender)
| File "/usr/python310/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
| await response(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/responses.py", line 261, in __call__
| async with anyio.create_task_group() as task_group:
| File "/usr/python310/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 767, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/usr/python310/lib/python3.10/site-packages/starlette/responses.py", line 264, in wrap
| await func()
| File "/usr/python310/lib/python3.10/site-packages/starlette/responses.py", line 245, in stream_response
| async for chunk in self.body_iterator:
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/schemas/assistants/streaming.py", line 80, in check_client_link
| async for event in async_events:
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/schemas/assistants/streaming.py", line 93, in to_stream_reply
| async for event in async_events:
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/schemas/assistants/streaming.py", line 87, in add_done
| async for event in async_events:
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/schemas/assistants/streaming.py", line 107, in filter_chat_chunk
| async for event in async_events:
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/api/openai/endpoints/chat.py", line 34, in inner
| async for token in interface.inference(input_message,id):
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/backend/interfaces/ktransformers.py", line 181, in inference
| async for v in super().inference(local_messages, thread_id):
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/backend/interfaces/transformers.py", line 340, in inference
| for t in self.prefill(input_ids, self.check_is_new(thread_id)):
| File "/usr/python310/lib/python3.10/site-packages/torch/utils/contextlib.py", line 36, in generator_context
| response = gen.send(None)
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/backend/interfaces/ktransformers.py", line 130, in prefill
| self.cache.reset()
| File "/mnt/data/ktransformers-0.2.1/ktransformers/models/custom_cache.py", line 175, in reset
| self.value_cache[layer_idx].zero_()
| AttributeError: 'NoneType' object has no attribute 'zero_'
After patching custom_cache.py, the reset() method looks like this:
def reset(self):
    """Resets the cache values while preserving the objects"""
    for layer_idx in range(len(self.key_cache)):
        # In-place ops prevent breaking the static address
        if self.key_cache[layer_idx] is not None:
            self.key_cache[layer_idx].zero_()
        if self.value_cache[layer_idx] is not None:
            self.value_cache[layer_idx].zero_()
Alternatively:
This should be fixed by the latest release, 0.2.1.post1.