Practical Guide: Wrapping Whisper in a FastAPI Interface with High-Concurrency Handling (Bundle Included)

news 2025/7/1 7:02:01

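This guide walks through a detailed example of wrapping OpenAI's Whisper model with FastAPI to expose a REST API that can serve several requests concurrently. The main steps and sample code follow.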

1. Environment Setup

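Python: Python 3.8 or later is recommended.
Dependencies:
- FastAPI: a lightweight, high-performance Python web framework.
- Uvicorn: the ASGI server used to run FastAPI.
- python-multipart: required by FastAPI to parse uploaded files (multipart/form-data).
- Whisper: an open-source speech recognition model. It depends on PyTorch, so install torch first (choose the build that matches your hardware).
- ffmpeg: must be installed and on the system PATH; the sample code checks for it at startup.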

Install the dependencies with pip:

pip install fastapi uvicorn python-multipart
# pip install git+https://github.com/openai/whisper.git  # tends to fail on unreliable networks
pip install openai-whisper
pip install torch  # choose the build that matches your hardware
# For example, a CUDA 12.6 build:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
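
Before moving on, it can help to confirm that the pieces above are actually present. The following is a minimal sketch, not part of the original article: `check_environment` is a hypothetical helper that uses only the standard library to report which packages can be imported and whether the `ffmpeg` binary is on the PATH. Note that the `openai-whisper` package is imported under the name `whisper`.

```python
import importlib.util
import shutil


def check_environment(packages=("fastapi", "uvicorn", "torch", "whisper")):
    """Return a dict mapping each requirement name to True or False."""
    # find_spec returns None when a top-level package cannot be found.
    report = {name: importlib.util.find_spec(name) is not None for name in packages}
    # The audio is decoded through the ffmpeg binary, so it must be on PATH.
    report["ffmpeg"] = shutil.which("ffmpeg") is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:10s} {'OK' if ok else 'MISSING'}")
```

Running the script prints one line per requirement, which makes a missing dependency visible before the server is started.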

2. Project Structure

The project directory can be laid out as follows:

whisper_fastapi/
├── models
├── app.py
└── requirements.txt

requirements.txt can contain:

fastapi
uvicorn
python-multipart
torch
# git+https://github.com/openai/whisper.git
openai-whisper

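3. Writing the FastAPI Application

app.py covers the following:

Model loading
To avoid reloading the model on every request, load each model once and keep it in a global variable. The sample code loads a model the first time it is requested and caches it in a global dictionary.
Endpoint definition
A POST endpoint receives the audio file (MP3, WAV, and so on) as a file upload, using FastAPI's UploadFile and File.
Concurrent execution
Whisper transcription is slow and CPU- or GPU-bound, so it runs in a thread pool. The async endpoint calls the synchronous transcription function through asyncio.get_running_loop().run_in_executor(...), which keeps the event loop free to accept other requests while a transcription is in progress.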
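
The thread-pool pattern described above can be tried in isolation. This is a minimal sketch under stated assumptions: `fake_transcribe` is a hypothetical stand-in that simply sleeps instead of calling `model.transcribe`, and `handle_request` plays the role of the endpoint body. Only the standard library is used.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)


def fake_transcribe(path: str) -> dict:
    """Stand-in for model.transcribe: blocks the calling thread for a while."""
    time.sleep(0.2)
    return {"text": f"transcript of {path}"}


async def handle_request(path: str) -> dict:
    loop = asyncio.get_running_loop()
    # The blocking call runs in a worker thread; the event loop stays free.
    return await loop.run_in_executor(executor, fake_transcribe, path)


async def main() -> list:
    # Four "requests" arrive together and are served concurrently.
    return await asyncio.gather(*(handle_request(f"a{i}.wav") for i in range(4)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main())
    print(len(results), "results in", round(time.perf_counter() - start, 2), "s")
```

With four workers the four calls overlap instead of running back to back. Real inference will not scale this cleanly, since the threads compete for the same CPU cores or GPU, so the sketch only illustrates that the event loop is not blocked.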

The complete sample code:

import os
import sys
import shutil
import tempfile
import asyncio
import warnings
from concurrent.futures import ThreadPoolExecutor

import torch  # used to detect whether CUDA is available
import whisper  # OpenAI's Whisper model
from fastapi import FastAPI, UploadFile, File, HTTPException, Query
from fastapi.responses import JSONResponse

# Check that ffmpeg is available
if shutil.which("ffmpeg") is None:
    sys.exit("Error: ffmpeg not found. Install ffmpeg and make sure its directory is on the system PATH.")

app = FastAPI(title="Whisper FastAPI Interface")

# Local directory for model files; place downloaded model files here beforehand
local_model_dir = "./models"

# Global dictionary that caches loaded models to avoid loading them twice
loaded_models = {}

# Detect the device automatically: use the GPU if CUDA is available, otherwise the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cpu":
    warnings.filterwarnings("ignore", message="FP16 is not supported on CPU; using FP32 instead")
print(f"Using device: {device}")


def load_model_if_needed(model_name: str):
    """
    Return the cached model for model_name. If it is not cached yet, load it
    from the local directory onto the detected device and cache it.
    """
    if model_name not in loaded_models:
        try:
            model = whisper.load_model(model_name, download_root=local_model_dir, device=device)
            loaded_models[model_name] = model
        except Exception as e:
            raise RuntimeError(
                f"Failed to load Whisper model {model_name}; check that the local model file exists "
                "and that the model path is configured correctly"
            ) from e
    return loaded_models[model_name]


# Thread pool for concurrent processing (model loading and transcription can both be slow)
executor = ThreadPoolExecutor(max_workers=4)


def transcribe_audio(model, file_path: str) -> dict:
    """
    Transcribe the given audio file and return the result.
    fp16 is enabled on the GPU and disabled on the CPU.
    """
    try:
        return model.transcribe(file_path, fp16=(device == "cuda"))
    except Exception as e:
        return {"error": str(e)}


@app.post("/transcribe")
async def transcribe(
    file: UploadFile = File(...),
    model_name: str = Query("base", description="Model to use (e.g. tiny, base, small, medium, large)"),
):
    """
    Receive an uploaded audio file and the optional model_name parameter,
    transcribe the audio with that model, and return the result.
    """
    if file.content_type not in [
        "audio/wav", "audio/x-wav", "audio/wave", "audio/x-pn-wav", "audio/mpeg", "audio/mp3",
    ]:
        raise HTTPException(status_code=400, detail="Unsupported file type; upload a WAV or MP3 audio file")

    # Save the upload to a temporary file, keeping the original extension (default: .wav)
    suffix = os.path.splitext(file.filename or "")[1] or ".wav"
    try:
        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
            contents = await file.read()
            tmp.write(contents)
            tmp_path = tmp.name
    except Exception:
        raise HTTPException(status_code=500, detail="Failed to save the temporary file")

    loop = asyncio.get_running_loop()
    try:
        # Load the requested model in the thread pool (if it is not loaded yet)
        try:
            model = await loop.run_in_executor(executor, load_model_if_needed, model_name)
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))
        # Run the transcription in the thread pool so that the event loop is not blocked
        transcription_result = await loop.run_in_executor(executor, transcribe_audio, model, tmp_path)
    finally:
        # Remove the temporary file whether or not transcription succeeded
        os.remove(tmp_path)

    if "error" in transcription_result:
        raise HTTPException(status_code=500, detail=transcription_result["error"])
    return JSONResponse(content=transcription_result)


if __name__ == "__main__":
    import uvicorn
    uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=True)
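
Because `load_model_if_needed` runs in the thread pool, two requests for the same not-yet-loaded model can reach it at the same moment and both load it. One possible refinement, sketched here as an assumption rather than as the article's method, is to guard the cache with a lock. `get_model` and its `loader` parameter are hypothetical names; `loader` stands in for a call such as `whisper.load_model`.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

_models = {}
_models_lock = threading.Lock()


def get_model(name, loader):
    """Return the cached model for `name`, calling `loader(name)` at most once."""
    model = _models.get(name)
    if model is None:
        with _models_lock:
            # Re-check inside the lock: another thread may have loaded it meanwhile.
            model = _models.get(name)
            if model is None:
                model = loader(name)
                _models[name] = model
    return model


if __name__ == "__main__":
    load_calls = []

    def slow_loader(name):
        time.sleep(0.1)  # pretend that loading the weights takes a while
        load_calls.append(name)
        return object()

    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(lambda _: get_model("base", slow_loader), range(8)))
    print("loader calls:", len(load_calls))
```

A single lock also serializes the loading of different models, which keeps the sketch short; a per-model lock would avoid that at the cost of more bookkeeping.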

4. Running and Deployment

Local testing
Run the following in the project directory:
```
uvicorn app:app --host 0.0.0.0 --port 8000 --reload
```
Open http://localhost:8000/docs to view the API documentation that FastAPI generates automatically and to try the endpoint.
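
The endpoint can also be called from a script. The sketch below is one possible client that uses only the standard library; `build_multipart` and `transcribe_file` are hypothetical helper names, and the default URL assumes the server is running locally on port 8000. The form field is named `file` to match the parameter name of the `/transcribe` handler.

```python
import os
import uuid
import urllib.request


def build_multipart(field: str, filename: str, data: bytes, content_type: str):
    """Encode one file as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_file(path: str, url: str = "http://localhost:8000/transcribe?model_name=base") -> bytes:
    """POST a WAV file to the /transcribe endpoint and return the raw JSON bytes."""
    with open(path, "rb") as fh:
        body, header = build_multipart("file", os.path.basename(path), fh.read(), "audio/wav")
    request = urllib.request.Request(url, data=body, headers={"Content-Type": header}, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Calling `transcribe_file("sample.wav")` with the server running should return the JSON produced by the endpoint; `sample.wav` is a placeholder for any local WAV file.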
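Notes on concurrency
- ThreadPoolExecutor dispatches transcription tasks to worker threads so that blocking, compute-heavy work does not stall the event loop. With max_workers=4, up to four transcriptions run at once.
- In production, consider GPU-accelerated inference and size the thread or process count to the server's hardware. Uvicorn can also start multiple worker processes (each process loads its own copy of the model), for example: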
```
uvicorn app:app --host 0.0.0.0 --port 8000 --workers 4
```
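
How many threads are reasonable depends on the hardware. The sketch below shows one simple heuristic for choosing `max_workers`; the function name `pick_max_workers` and the specific numbers are illustrative assumptions, not recommendations from the original article, and should be tuned by measurement.

```python
import os


def pick_max_workers(device: str, cpu_count=None) -> int:
    """Heuristic thread-pool size; the numbers are illustrative assumptions."""
    if device == "cuda":
        # Assumption: a single GPU serves one transcription at a time.
        return 1
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Assumption: PyTorch already uses several cores per inference,
    # so leave headroom instead of starting one thread per core.
    return max(1, cores // 2)
```

The result could then be passed to the pool, for example `ThreadPoolExecutor(max_workers=pick_max_workers(device))`.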
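Fault tolerance and logging
Add exception handling, logging, and monitoring as needed. The code here is a simple example that you can extend to fit your requirements.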
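
As one way to extend the example, the sketch below wraps a blocking function in a decorator that logs its duration and any exception. The decorator name `logged` and the logger name `whisper_api` are hypothetical; only the standard `logging` module is used.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("whisper_api")


def logged(func):
    """Log the duration and any failure of a blocking call such as transcription."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback; the error is then re-raised.
            logger.exception("%s failed", func.__name__)
            raise
        logger.info("%s finished in %.2fs", func.__name__, time.perf_counter() - start)
        return result
    return wrapper
```

Applied as `@logged` above a function like `transcribe_audio`, it records one log line per call without changing the function's return value.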

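5. Summary

Environment setup and dependencies: make sure Python, FastAPI, Uvicorn, Whisper, and their dependencies are installed correctly.
Global model loading: load each model once and reuse it, which avoids repeated loading and improves response time.
Endpoint implementation: FastAPI provides the /transcribe endpoint, which transcribes an uploaded audio file, for example:
http://localhost:8000/transcribe?model_name=base
Concurrency: the slow transcription call runs in a thread pool, and Uvicorn's deployment options can scale concurrency further.

With that, a simple FastAPI interface around the Whisper model is complete. It accepts concurrent calls and exposes speech-to-text as a service.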
