vllm - Sawyer Zheng's Blog

Deploying an embedding model

Deploying the bge-m3 model

CUDA_VISIBLE_DEVICES=1 vllm serve /data/llm-model/bge/bge-m3/ --host 0.0.0.0 --port 15080 --served-model-name chat-embed
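Before sending embedding requests, it can help to confirm that the server is up and that the alias given to --served-model-name is actually registered. The following is a minimal sketch, assuming the server exposes the OpenAI-style /v1/models endpoint and is reachable at the same address used by the curl call in this post. The names BASE_URL, served_model_ids, and list_models are hypothetical helpers introduced here for illustration.

```python
import json
import urllib.request

# Assumption: the address the curl example in this post uses to reach the server.
BASE_URL = "http://172.16.10.88:15080"


def served_model_ids(models_response: dict) -> list:
    """Extract the model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def list_models(base_url: str = BASE_URL, timeout: float = 5.0) -> list:
    """Ask the running server which model names it serves (network call)."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return served_model_ids(json.load(resp))
```

If the deployment worked, calling list_models() should return a list that contains "chat-embed".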

Calling it with curl:

curl http://172.16.10.88:15080/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "input": "Your text string goes here",
    "model": "chat-embed"
  }'
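The same request can be made from Python using only the standard library. This is a sketch rather than an official client: it assumes the response follows the OpenAI embeddings shape (a "data" list whose items carry "index" and "embedding" fields), and the names EMBEDDINGS_URL, build_request, extract_embeddings, and embed are hypothetical helpers, not part of vLLM.

```python
import json
import urllib.request

# URL and model alias taken from the curl example above.
EMBEDDINGS_URL = "http://172.16.10.88:15080/v1/embeddings"


def build_request(text, model="chat-embed", url=EMBEDDINGS_URL):
    """Build the POST request that mirrors the curl call."""
    payload = json.dumps({"input": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer EMPTY",
        },
        method="POST",
    )


def extract_embeddings(response):
    """Return the embedding vectors ordered by their 'index' field.

    Assumes an OpenAI-style response body.
    """
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(text, timeout=30.0):
    """Send the text to the server and return its vectors (network call)."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return extract_embeddings(json.load(resp))
```

Against a running server, embed("Your text string goes here") should return a list holding one vector for the single input string.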
