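Compiling and installing modules under ComfyUI's python_embeded
Some modules cannot be installed into ComfyUI from a prebuilt whl file. Instead, pip pulls the module's source package, compiles it into a whl, and then installs that. Examples include insightface, flash-attn, dlib, and xformers.
I ran into a case where no usable whl file existed for a package I needed, so I made another attempt at building from source, and this time insightface compiled and installed successfully.
(First set up the environment by following steps 1 through 4.)
1. First, download and install Visual Studio Build Tools 2022:
I later found that a newer Build Tools version is not always better. After upgrading Build Tools to version 17.12, compiling flash_attn under Python 3.11 failed. The article "CUDA compatibility with Visual Studio 2022 version 17.10" explains that the build tools version also has to match the CUDA version. I am on CUDA 12.4, and according to that article, downgrading Build Tools to 17.10 is the safer choice.
So I uninstalled it and installed a fixed version from the "Fixed version bootstrappers" download page:
Following the article "Windows 如何仅安装 MSVC 而不安装 Visual Studio" (How to install only MSVC on Windows without installing Visual Studio), set the environment variables (Settings ---> System ---> About ---> Advanced system settings ---> Environment Variables). Apart from the setting below, none of the other settings in that article seemed to make any difference for me.
The path below points to the folder that contains cl.exe. Adding it makes the following warning disappear: "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\utils\cpp_extension.py:382: UserWarning: Error checking compiler version for cl: [WinError 2] 系统找不到指定的文件。" (WinError 2: the system cannot find the file specified).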
path=C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.40.33807\bin\Hostx64\x64;%path%
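A quick way to confirm that the PATH change took effect is to ask Python where it finds the tools. This is a minimal sketch using only the standard library; the helper name find_tools is my own, and cl and ninja are simply the two tools this post relies on.

```python
import shutil


def find_tools(names=("cl", "ninja")):
    """Map each tool name to the full path found on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

Run it with python_embeded\python.exe from a newly opened Command Prompt, so that the process sees the updated environment variables.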
2. Error during installation: Cannot open include file: 'Python.h'
Error message (无法打开包括文件: “Python.h” means "Cannot open include file: 'Python.h'"):
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -Iinsightface/thirdparty/face3d/mesh/cython -IH:\V.0.2.7\python_embeded\Lib\site-packages\numpy\core\include -IH:\V.0.2.7\python_embeded\include -IH:\V.0.2.7\python_embeded\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" /EHsc /Tpinsightface/thirdparty/face3d/mesh/cython/mesh_core_cython.cpp /Fobuild\temp.win-amd64-cpython-312\Release\insightface/thirdparty/face3d/mesh/cython/mesh_core_cython.obj
mesh_core_cython.cpp
insightface/thirdparty/face3d/mesh/cython/mesh_core_cython.cpp(36): fatal error C1083: 无法打开包括文件: “Python.h”: No such file or directory
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\VC\\Tools\\MSVC\\14.41.34120\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for insightface
Failed to build insightface
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (insightface)
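I found many articles on how to install a python-dev package and the like, but none of them solved my problem. In the end I downloaded the full Python installer (same version as ComfyUI's python_embeded; the 3.12 install takes about 345 MB) and installed it to a separate path, leaving the option to set environment variables unchecked so that it cannot interfere with ComfyUI's embedded Python. I then copied its include folder to H:\V.0.2.7\python_embeded\include.
That resolved the "Cannot open include file: 'Python.h'" error.
3. Cannot open file 'python312.lib'
Likewise, copying the libs folder from the full installation to H:\V.0.2.7\python_embeded\libs resolves the following error (无法打开文件“python312.lib” means "cannot open file 'python312.lib'"):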
H:\V.0.2.7\python_embeded\Lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\bin\HostX86\x64\link.exe" /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:H:\V.0.2.7\python_embeded\libs /LIBPATH:H:\V.0.2.7\python_embeded /LIBPATH:H:\V.0.2.7\python_embeded\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.22621.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64" /EXPORT:PyInit_mesh_core_cython build\temp.win-amd64-cpython-312\Release\insightface/thirdparty/face3d/mesh/cython/mesh_core.obj build\temp.win-amd64-cpython-312\Release\insightface/thirdparty/face3d/mesh/cython/mesh_core_cython.obj /OUT:build\lib.win-amd64-cpython-312\insightface\thirdparty\face3d\mesh\cython\mesh_core_cython.cp312-win_amd64.pyd /IMPLIB:build\temp.win-amd64-cpython-312\Release\insightface/thirdparty/face3d/mesh/cython\mesh_core_cython.cp312-win_amd64.lib
LINK : fatal error LNK1104: 无法打开文件“python312.lib”
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\VC\\Tools\\MSVC\\14.41.34120\\bin\\HostX86\\x64\\link.exe' failed with exit code 1104
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for insightface
Failed to build insightface
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (insightface)
H:\V.0.2.7>
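The two copy operations from steps 2 and 3 can also be scripted. The sketch below is only an illustration: the function name copy_dev_files and the C:\Python312 location of the full install are hypothetical, and it assumes the full install has the same version as the embedded Python.

```python
import shutil
from pathlib import Path


def copy_dev_files(full_install, embedded):
    """Copy include/ and libs/ from a full Python install into the embedded one."""
    copied = []
    for folder in ("include", "libs"):
        src = Path(full_install) / folder
        if not src.is_dir():
            raise FileNotFoundError(f"missing folder: {src}")
        dst = Path(embedded) / folder
        # dirs_exist_ok lets the copy be repeated without failing
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(dst)
    return copied


if __name__ == "__main__":
    full = Path(r"C:\Python312")               # hypothetical full-install path
    emb = Path(r"H:\V.0.2.7\python_embeded")   # embedded Python from this post
    if full.is_dir() and emb.is_dir():
        for path in copy_dev_files(full, emb):
            print("copied to", path)
    else:
        print("Adjust the two paths above to match your machine first.")
```

After it runs, python_embeded\include should contain Python.h and python_embeded\libs should contain python312.lib.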
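4. ninja.exe
According to the README of "GitHub - ninja-build/ninja: a small build system with a focus on speed", all we need is the built executable:
Download the latest suitable Windows build from "Releases · ninja-build/ninja · GitHub", unzip it to get ninja.exe, and copy it into any folder that is on PATH.
Once that is done, the warning warnings.warn(msg.format('we could not find ninja.')) goes away.
On my system, ninja exists in both of these paths:
C:\Users\Monday\AppData\Local\Microsoft\WinGet\Packages\Ninja-build.Ninja_Microsoft.Winget.Source_8wekyb3d8bbwe
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\Common7\IDE\CommonExtensions\Microsoft\CMake\Ninja
The file in the first path is small, only 557 KB, and is the latest version, 1.12.1. The file in the second path is larger, 2.42 MB, and is version 1.11.0. (The 1.11.0 release on ninja's official GitHub is also only about 557 KB, so the larger one was probably built by Microsoft.)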
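When several copies of ninja.exe exist, it helps to see which ones are on PATH and in what order, since the first match is the one that gets used. This is a small sketch; the helper names are my own, and it relies only on ninja's --version flag.

```python
import os
import subprocess
from pathlib import Path


def find_all_on_path(exe_name, path_value=None):
    """Return every copy of exe_name found in the PATH folders, in search order."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    found = []
    for folder in path_value.split(os.pathsep):
        if not folder:
            continue
        candidate = Path(folder) / exe_name
        if candidate.is_file() and candidate not in found:
            found.append(candidate)
    return found


def ninja_version(exe):
    """Ask one ninja executable for its version string."""
    result = subprocess.run([str(exe), "--version"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    for exe in find_all_on_path("ninja.exe"):
        print(exe, ninja_version(exe))
```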
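5. Compiling and installing insightface: success
It also runs without any problems.
6. Compiling and installing flash-attn: success
From articles online I knew that compiling on Windows takes a long time. I started the build, and after a long wait (roughly 3 hours) it failed with the following error: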
tmpxft_000038e4_00000000-7_flash_fwd_hdim64_bf16_sm80.compute_90.cudafe1.cpp"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc" -c csrc/flash_attn/src/flash_fwd_hdim64_fp16_causal_sm80.cu -o build\temp.win-amd64-cpython-312\Release\csrc/flash_attn/src/flash_fwd_hdim64_fp16_causal_sm80.obj -IC:\Users\Monday\AppData\Local\Temp\pip-install-vpjwb98z\flash-attn_71d6532d3ab546e1bb76dd71119a8066\csrc\flash_attn -IC:\Users\Monday\AppData\Local\Temp\pip-install-vpjwb98z\flash-attn_71d6532d3ab546e1bb76dd71119a8066\csrc\flash_attn\src -IC:\Users\Monday\AppData\Local\Temp\pip-install-vpjwb98z\flash-attn_71d6532d3ab546e1bb76dd71119a8066\csrc\cutlass\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\torch\csrc\api\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\TH -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" -IH:\V.0.2.7\python_embeded\include -IH:\V.0.2.7\python_embeded\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler 
/wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17 --use-local-envflash_fwd_hdim64_fp16_causal_sm80.cucl: 命令行 warning D9025 :正在重写“/D__CUDA_NO_HALF_OPERATORS__”(用“/U__CUDA_NO_HALF_OPERATORS__”)cl: 命令行 warning D9025 :正在重写“/D__CUDA_NO_HALF_CONVERSIONS__”(用“/U__CUDA_NO_HALF_CONVERSIONS__”)cl: 命令行 warning D9025 :正在重写“/D__CUDA_NO_HALF2_OPERATORS__”(用“/U__CUDA_NO_HALF2_OPERATORS__”)cl: 命令行 warning D9025 :正在重写“/D__CUDA_NO_BFLOAT16_CONVERSIONS__”(用“/U__CUDA_NO_BFLOAT16_CONVERSIONS__”)flash_fwd_hdim64_fp16_causal_sm80.cuc1xx: fatal error C1083: 无法打开源文件: “C:\Users\Monday\AppData\Local\Temp\pip-install-vpjwb98z\flash-attn_71d6532d3ab546e1bb76dd71119a8066\csrc\flash_attn\src\flash_fwd_hdim64_fp16_causal_sm80.cu”: No such file or directory
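I found the article "win10下cuda12.1 +troch2.4.1+vs2022环境下编译安装flash-attn" (Compiling and installing flash-attn on Win10 with CUDA 12.1 + torch 2.4.1 + VS2022), which says to install the cutlass library because it is a required dependency for compiling flash-attn.
So I installed nvidia-cutlass first and then installed flash_attn again: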
python_embeded\python.exe -m pip install nvidia-cutlass
python_embeded\python.exe -m pip install flash_attn
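After an even longer wait (4 hours 45 minutes), it finally compiled and installed. I could not find a whl for flash-attn-2.7.0.post2 anywhere online, so this time I built one myself.
While waiting, I was worried it would fail again, so I kept searching for the cause of the first failure. The article "nvcc fatal : Could not open output file '/tmp/tmpxft_00003d04_00000000'" attributes that kind of error to the user lacking permission to read and write the files. Since the second build succeeded, though, it seems my error was simply caused by nvidia-cutlass not being installed.
I later reinstalled the build environment (Visual Studio Build Tools 2022 and the CUDA Toolkit) and compiled again under Python 3.11, which took 2 hours. (This time CPU usage sat at 100%. I think it was not very high the first time, but I no longer remember.)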
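The build log quoted in section 7 contains the line "Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)", so the number of parallel compile jobs can be chosen explicitly. The sketch below shows one way to pass MAX_JOBS to a pip build from Python. The helper names and the value 4 are my own assumptions, and I have not measured how much it changes the build time.

```python
import os
import subprocess


def build_env(max_jobs):
    """Copy the current environment and set MAX_JOBS for the extension build."""
    env = dict(os.environ)
    env["MAX_JOBS"] = str(max_jobs)
    return env


def pip_install(python_exe, package, max_jobs):
    """Run `python -m pip install <package>` with MAX_JOBS set; return the exit code."""
    cmd = [python_exe, "-m", "pip", "install", package]
    return subprocess.run(cmd, env=build_env(max_jobs), check=False).returncode


# Example call (left commented out because the build can take hours):
# pip_install(r"python_embeded\python.exe", "flash_attn", max_jobs=4)
```

A lower value reduces memory pressure at the cost of a longer build; a higher value does the opposite.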
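7. Compiling and installing the development version xformers-0.0.29.dev940: failed
What started this whole effort of compiling modules myself was the following xformers error at runtime, which led me to try building xformers-0.0.29.dev940.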
Loading PuLID-Flux model.
!!! Exception during processing !!! No operator found for `memory_efficient_attention_forward` with inputs:
query : shape=(1, 577, 16, 64) (torch.bfloat16)
key : shape=(1, 577, 16, 64) (torch.bfloat16)
value : shape=(1, 577, 16, 64) (torch.bfloat16)
attn_bias : <class 'NoneType'>
p : 0.0
`fa2F@v2.6.3-24-gbdf733b` is not supported because:
requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
bf16 is only supported on A100+ GPUs
`cutlassF-pt` is not supported because:
bf16 is only supported on A100+ GPUs
Traceback (most recent call last):File "H:\V.0.2.7\ComfyUI\execution.py", line 323, in executeoutput_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\ComfyUI\execution.py", line 198, in get_output_datareturn_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\ComfyUI\execution.py", line 169, in _map_node_over_listprocess_inputs(input_dict, i)File "H:\V.0.2.7\ComfyUI\execution.py", line 158, in process_inputsresults.append(getattr(obj, func)(**inputs))^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\ComfyUI\custom_nodes\ComfyUI-PuLID-Flux-Enhanced\pulidflux.py", line 382, in apply_pulid_fluxid_cond_vit, id_vit_hidden = eva_clip(face_features_image, return_all_features=False, return_hidden=True, shuffle=False)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_implreturn self._call_impl(*args, **kwargs)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_implreturn forward_call(*args, **kwargs)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\ComfyUI\custom_nodes\ComfyUI-PuLID-Flux-Enhanced\eva_clip\eva_vit_model.py", line 544, in forwardx, hidden_states = self.forward_features(x, return_all_features, return_hidden, shuffle)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\ComfyUI\custom_nodes\ComfyUI-PuLID-Flux-Enhanced\eva_clip\eva_vit_model.py", 
line 531, in forward_featuresx = blk(x, rel_pos_bias=rel_pos_bias)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_implreturn self._call_impl(*args, **kwargs)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_implreturn forward_call(*args, **kwargs)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\ComfyUI\custom_nodes\ComfyUI-PuLID-Flux-Enhanced\eva_clip\eva_vit_model.py", line 293, in forwardx = x + self.drop_path(self.attn(self.norm1(x), rel_pos_bias=rel_pos_bias, attn_mask=attn_mask))^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_implreturn self._call_impl(*args, **kwargs)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_implreturn forward_call(*args, **kwargs)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\ComfyUI\custom_nodes\ComfyUI-PuLID-Flux-Enhanced\eva_clip\eva_vit_model.py", line 208, in forwardx = xops.memory_efficient_attention(^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\python_embeded\Lib\site-packages\xformers\ops\fmha\__init__.py", line 306, in memory_efficient_attentionreturn _memory_efficient_attention(^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\python_embeded\Lib\site-packages\xformers\ops\fmha\__init__.py", line 467, in _memory_efficient_attentionreturn _memory_efficient_attention_forward(^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\python_embeded\Lib\site-packages\xformers\ops\fmha\__init__.py", line 486, in _memory_efficient_attention_forwardop = _dispatch_fw(inp, False)^^^^^^^^^^^^^^^^^^^^^^^^File "H:\V.0.2.7\python_embeded\Lib\site-packages\xformers\ops\fmha\dispatch.py", line 135, in _dispatch_fwreturn _run_priority_list(^^^^^^^^^^^^^^^^^^^File 
"H:\V.0.2.7\python_embeded\Lib\site-packages\xformers\ops\fmha\dispatch.py", line 76, in _run_priority_listraise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
query : shape=(1, 577, 16, 64) (torch.bfloat16)
key : shape=(1, 577, 16, 64) (torch.bfloat16)
value : shape=(1, 577, 16, 64) (torch.bfloat16)
attn_bias : <class 'NoneType'>
p : 0.0
`fa2F@v2.6.3-24-gbdf733b` is not supported because:
requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
bf16 is only supported on A100+ GPUs
`cutlassF-pt` is not supported because:
bf16 is only supported on A100+ GPUs
Prompt executed in 222.85 seconds
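The message above names two conditions: a minimum compute capability of (8, 0) and bf16 support limited to A100-class GPUs, while this GPU reports (7, 5). That suggests the limitation may lie in the hardware rather than in how xformers was built. The sketch below only restates that check; the function name is my own, and treating the threshold as inclusive is an assumption based on the A100 itself being capability 8.0.

```python
def bf16_attention_supported(capability):
    """True if a (major, minor) capability meets the (8, 0) threshold in the error."""
    return tuple(capability) >= (8, 0)


if __name__ == "__main__":
    try:
        import torch  # optional: only needed to query the real GPU
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print("capability:", cap, "bf16 attention kernels:", bf16_attention_supported(cap))
    else:
        print("torch with CUDA not available; example:", bf16_attention_supported((7, 5)))
```

For the GPU in this post, the check returns False for (7, 5), which matches the "too old" line in the log.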
I went ahead and tried compiling the development version xformers-0.0.29.dev940 to see whether it would fix the runtime error.
The build failed:
The main error output is as follows:
Building wheel for xformers (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [5830 lines of output]
fatal: not a git repository (or any of the parent directories): .git
running bdist_wheel
H:\V.0.2.7\python_embeded\Lib\site-packages\torch\utils\cpp_extension.py:497: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
running build
(N lines omitted)
H:\V.0.2.7\python_embeded\Lib\site-packages\torch\utils\cpp_extension.py:382: UserWarning: Error checking compiler version for cl: [WinError 2] 系统找不到指定的文件。
warnings.warn(f'Error checking compiler version for {compiler}: {error}')
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/85] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\build\temp.win-amd64-cpython-312\Release\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\cutlass\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\torch\csrc\api\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\TH
-IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" -IH:\V.0.2.7\python_embeded\include -IH:\V.0.2.7\python_embeded\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" -c C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.cu -o C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\build\temp.win-amd64-cpython-312\Release\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -Xcompiler /Zc:lambda -Xcompiler /Zc:preprocessor -Xcompiler /Zc:__cplusplus -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_90,code=sm_90 -DFLASHATTENTION_DISABLE_ALIBI --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C_flashattention -D_GLIBCXX_USE_CXX11_ABI=0FAILED: C:/Users/Monday/AppData/Local/Temp/pip-install-h5wrwrf7/xformers_385ac3dddc8a4e779d876f9cbb34ec19/build/temp.win-amd64-cpython-312/Release/Users/Monday/AppData/Local/Temp/pip-install-h5wrwrf7/xformers_385ac3dddc8a4e779d876f9cbb34ec19/third_party/flash-attention/csrc/flash_attn/src/flash_bwd_hdim192_bf16_causal_sm80.objC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\build\temp.win-amd64-cpython-312\Release\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\cutlass\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include 
-IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\torch\csrc\api\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\TH -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" -IH:\V.0.2.7\python_embeded\include -IH:\V.0.2.7\python_embeded\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" -c C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.cu -o C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\build\temp.win-amd64-cpython-312\Release\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -Xcompiler /Zc:lambda -Xcompiler /Zc:preprocessor -Xcompiler /Zc:__cplusplus -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ 
--expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_90,code=sm_90 -DFLASHATTENTION_DISABLE_ALIBI --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C_flashattention -D_GLIBCXX_USE_CXX11_ABI=0flash_bwd_hdim192_bf16_causal_sm80.cucl: 命令行 warning D9025 :正在重写“/D__CUDA_NO_HALF_OPERATORS__”(用“/U__CUDA_NO_HALF_OPERATORS__”)cl: 命令行 warning D9025 :正在重写“/D__CUDA_NO_HALF_CONVERSIONS__”(用“/U__CUDA_NO_HALF_CONVERSIONS__”)cl: 命令行 warning D9025 :正在重写“/D__CUDA_NO_HALF2_OPERATORS__”(用“/U__CUDA_NO_HALF2_OPERATORS__”)cl: 命令行 warning D9025 :正在重写“/D__CUDA_NO_BFLOAT16_CONVERSIONS__”(用“/U__CUDA_NO_BFLOAT16_CONVERSIONS__”)
(many lines omitted here)
flash_bwd_hdim192_bf16_causal_sm80.cu
fatal : Could not open output file C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\build\temp.win-amd64-cpython-312\Release\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.obj.d
(many more "fatal : Could not open output file" errors omitted here)
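This post does not establish why nvcc could not open the output file. One hypothesis that is cheap to rule out, and which is my own assumption rather than anything confirmed above, is the classic Windows MAX_PATH limit of 260 characters: the output path in the log nests the temp folder inside itself and is very long. The sketch below simply measures it.

```python
MAX_PATH = 260  # classic Windows path length limit, in characters

# Output path copied from the "Could not open output file" line in the log
TEMP_ROOT = (r"C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7"
             r"\xformers_385ac3dddc8a4e779d876f9cbb34ec19")
OUTPUT_FILE = (TEMP_ROOT
               + r"\build\temp.win-amd64-cpython-312\Release"
               + r"\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7"
               + r"\xformers_385ac3dddc8a4e779d876f9cbb34ec19"
               + r"\third_party\flash-attention\csrc\flash_attn\src"
               + r"\flash_bwd_hdim192_bf16_causal_sm80.obj.d")


def exceeds_max_path(path, limit=MAX_PATH):
    """True if the path is longer than the given character limit."""
    return len(path) > limit


if __name__ == "__main__":
    print(len(OUTPUT_FILE), "characters; over the limit:", exceeds_max_path(OUTPUT_FILE))
```

If the path does turn out to be the problem, building from a short directory or enabling long path support in Windows would be the things to try, but I have not verified either against this build.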
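I searched 秘塔AI搜索 (Metaso AI Search) for: cl: 命令行 warning D9025 :正在重写“/D__CUDA_NO_HALF_OPERATORS__”(用“/U__CUDA_NO_HALF_OPERATORS__”) (that is, command-line warning D9025: overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'). Its answer was:
Environment variable settings:
- Sources ③ and ⒂ state that, when installing CUDA on Windows, incorrectly set environment variables can cause compilation to fail. Source ③ in particular mentions the vcvars64.bat script, which sets the Visual Studio environment variables.
- Solution: run the vcvars64.bat script before installing CUDA to set the correct environment variables. Make sure to run the script in a Command Prompt and that the path is correct.
Run: C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build>vcvars64.bat
After setting the environment variables this way, there were far fewer errors.
However, after additionally running set DISTUTILS_USE_SDK=1, as the error output suggested, the number of errors was back to what it had been originally.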
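vcvars64.bat only affects the Command Prompt session it runs in. If the build is launched from a Python script instead, one common pattern is to run the batch file, dump the resulting variables with set, and pass them on to the child process. This is a sketch under that assumption; the helper names are my own and the batch file path is the one from this post.

```python
import subprocess

VCVARS = (r"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools"
          r"\VC\Auxiliary\Build\vcvars64.bat")


def parse_set_output(text):
    """Turn the NAME=value lines printed by `set` into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:  # skip banner lines and lines without a name
            env[name] = value
    return env


def load_vcvars_env(vcvars=VCVARS):
    """Run vcvars64.bat in cmd.exe, then capture the environment it produced."""
    command = f'"{vcvars}" >nul && set'
    out = subprocess.run(command, shell=True, capture_output=True,
                         text=True, check=True).stdout
    return parse_set_output(out)


# Example (Windows only), passing the captured environment to a pip build:
# env = load_vcvars_env()
# env["DISTUTILS_USE_SDK"] = "1"
# subprocess.run([r"python_embeded\python.exe", "-m", "pip", "install", "xformers"], env=env)
```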
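文心一言 (ERNIE Bot) gave a different answer, which looks more plausible to me: the options conflict, so the command produces no result and no file is generated.
I am setting this one aside for now.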
8. About ModuleNotFoundError: No module named 'distutils.msvccompiler' --> APEX compiled and installed successfully, but the install turned out to be ineffective.
Error message: ModuleNotFoundError: No module named 'distutils.msvccompiler'
H:\V.0.2.7>python_embeded\python.exe -m pip install apex
Collecting apex
Using cached apex-0.9.10dev.tar.gz (36 kB)
Preparing metadata (setup.py) ... done
Collecting cryptacular (from apex)
Using cached cryptacular-1.6.2.tar.gz (75 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... error
error: subprocess-exited-with-error
× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 2
╰─> [4 lines of output]
scons: Reading SConscript files ...
ModuleNotFoundError: No module named 'distutils.msvccompiler':
File "C:\Users\Monday\AppData\Local\Temp\pip-install-whj_jyaw\cryptacular_4162e6ad50164a3baf1cd0472e6f84c1\SConstruct", line 21:
import distutils.msvccompiler
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
H:\V.0.2.7>
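I went through several articles without solving ModuleNotFoundError: No module named 'distutils.msvccompiler'. The official documentation page "distutils — Building and installing Python modules — Python 3.10.16 documentation" says: distutils is deprecated with removal planned for Python 3.12. The embedded Python here is 3.12, which is consistent with the module being missing.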
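Whether a given interpreter still provides distutils (or a particular submodule of it) can be checked directly before starting a long build. This is a minimal standard-library sketch; the helper name module_available is my own.

```python
import importlib.util
import sys


def module_available(name):
    """True if `name` can be located by this interpreter, False otherwise."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example distutils) is itself missing
        return False


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name in ("distutils", "distutils.msvccompiler", "setuptools"):
        print(name, "->", module_available(name))
```

Run it with python_embeded\python.exe so that it reports on the embedded interpreter and not on another Python installed on the system.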
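In the end I found the answer in the GitHub issue "关于解决ModuleNotFoundError: No module named 'torch'导致安装失败 #1852" (On fixing the install failure caused by ModuleNotFoundError: No module named 'torch') and got it to compile. It turns out that not every module can be compiled and installed with a plain "pip install <module name>" command. Judging by the log above, the name apex on PyPI resolves to apex-0.9.10dev, which depends on cryptacular, so it appears to be a different project from NVIDIA Apex.
Run the following commands to compile and install: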
H:\V.0.2.7>git clone https://github.com/NVIDIA/apex.git
H:\V.0.2.7>cd apex
H:\V.0.2.7\apex>..\python_embeded\python.exe -m pip install -v --no-cache-dir .
H:\V.0.2.7>git clone https://github.com/NVIDIA/apex.git
Cloning into 'apex'...
remote: Enumerating objects: 11902, done.
remote: Counting objects: 100% (3970/3970), done.
remote: Compressing objects: 100% (759/759), done.
remote: Total 11902 (delta 3492), reused 3413 (delta 3205), pack-reused 7932 (from 1)
Receiving objects: 100% (11902/11902), 15.61 MiB | 4.25 MiB/s, done.
Resolving deltas: 100% (8321/8321), done.
Updating files: 100% (505/505), done.
H:\V.0.2.7>cd apex
H:\V.0.2.7\apex>..\python_embeded\python.exe -m pip install -v --no-cache-dir .
Using pip 24.3.1 from H:\V.0.2.7\python_embeded\Lib\site-packages\pip (python 3.12)
Processing h:\v.0.2.7\apex
Running command pip subprocess to install build dependencies
Using pip 24.3.1 from H:\V.0.2.7\python_embeded\Lib\site-packages\pip (python 3.12)
Collecting setuptools
Obtaining dependency information for setuptools from https://files.pythonhosted.org/packages/55/21/47d163f615df1d30c094f6c8bbb353619274edccf0327b185cc2493c2c33/setuptools-75.6.0-py3-none-any.whl.metadata
Using cached setuptools-75.6.0-py3-none-any.whl.metadata (6.7 kB)
Collecting wheel
Obtaining dependency information for wheel from https://files.pythonhosted.org/packages/0b/2c/87f3254fd8ffd29e4c02732eee68a83a1d3c346ae39bc6822dcbcb697f2b/wheel-0.45.1-py3-none-any.whl.metadata
Using cached wheel-0.45.1-py3-none-any.whl.metadata (2.3 kB)
Using cached setuptools-75.6.0-py3-none-any.whl (1.2 MB)
Using cached wheel-0.45.1-py3-none-any.whl (72 kB)
Installing collected packages: wheel, setuptools
Creating C:\Users\Monday\AppData\Local\Temp\pip-build-env-da54lkae\overlay\Scripts
Successfully installed setuptools-75.6.0 wheel-0.45.1
Installing build dependencies ... done
Running command Getting requirements to build wheel
torch.__version__ = 2.5.1+cu124
running egg_info
creating apex.egg-info
writing apex.egg-info\PKG-INFO
(N lines omitted)
removing build\bdist.win-amd64\wheel
Building wheel for apex (pyproject.toml) ... done
Created wheel for apex: filename=apex-0.1-py3-none-any.whl size=406607 sha256=206aca315212aa0a76b14de395b6afe1ecdcd4c5fdd61b57986dabb509e83121
Stored in directory: C:\Users\Monday\AppData\Local\Temp\pip-ephem-wheel-cache-zwa4z7gq\wheels\65\c7\12\b7e49ba4abd3da74df298dc51ea0f6a086d496566f4310f620
Successfully built apex
Installing collected packages: apex
Successfully installed apex-0.1
H:\V.0.2.7\apex>
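Yet at runtime apex was still reported as not installed:
The article "NVIDIA APEX安装完全指南及Megatron-LM/Pytorch运行问题解决" (A complete guide to installing NVIDIA APEX and fixing Megatron-LM/PyTorch runtime problems) showed me that the build options used above were still not right. The build needs the options below, taken from the official instructions, but with them the compilation did not succeed.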
..\python_embeded\python.exe -m pip install -v --disable-pip-version-check --no-cache-dir --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" .
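The wheel produced by the earlier build was named apex-0.1-py3-none-any.whl, a pure-Python tag, which is consistent with no C++ or CUDA extensions having been compiled. One way to check an install is to look for the compiled modules directly. The names apex_C and amp_C below are my assumption of what --cpp_ext and --cuda_ext produce, based on my reading of Apex's setup.py, so verify them against the repository before relying on the result.

```python
import importlib.util

# Assumed names of compiled Apex extension modules (not confirmed by this post)
ASSUMED_EXTENSIONS = ("apex", "apex_C", "amp_C")


def installed_modules(names):
    """Report, for each module name, whether this interpreter can locate it."""
    report = {}
    for name in names:
        try:
            report[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            report[name] = False
    return report


if __name__ == "__main__":
    for name, present in installed_modules(ASSUMED_EXTENSIONS).items():
        print(f"{name}: {'found' if present else 'missing'}")
```

If apex is found but the extension modules are missing, the Python-only build is what got installed.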
9. The filename of the compiled wheel must not be changed
For example, the build produces the file: insightface-0.7.3-cp312-cp312-win_amd64.whl
If you rename it to: insightface-0.7.3.whl
the install fails with: ERROR: insightface-0.7.3.whl is not a valid wheel filename.
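The reason is that a wheel filename carries metadata. The wheel naming convention is {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl, and pip reads those tags to decide whether the file fits the interpreter. The sketch below is a simplified parser of that convention for illustration only; it is not pip's own implementation.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its naming-convention fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a .whl file: {filename}")
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"{filename} is not a valid wheel filename")
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


if __name__ == "__main__":
    print(parse_wheel_filename("insightface-0.7.3-cp312-cp312-win_amd64.whl"))
    try:
        parse_wheel_filename("insightface-0.7.3.whl")
    except ValueError as err:
        print("rejected:", err)
```

The renamed file has only two fields where at least five are required, so it is rejected before installation starts.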
Reference: "NotImplementedError: No operator found for Memory_efficient_attention_forward" - stable-diffusion - SO中文参考 - www.soinside.com