Gateway API Inference Extension
kubernetes-sigs/gateway-api-inference-extension
HF_TOKEN=xxxxx
# --- Install GAIE ---
IGW_LATEST_RELEASE=$(curl -s https://api.github.com/repos/kubernetes-sigs/gateway-api-inference-extension/releases \
| jq -r '.[] | select(.prerelease == false) | .tag_name' \
| sort -V \
| tail -n1)
kubectl create secret generic hf-token --from-literal=token="$HF_TOKEN" --dry-run=client -o yaml | kubectl apply -f -
kubectl apply -f "https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/${IGW_LATEST_RELEASE}/manifests.yaml"
# v1.3.1 has a bug; patch the manifest with sed before applying it
curl -L "https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/${IGW_LATEST_RELEASE}/config/manifests/vllm/gpu-deployment.yaml" \
| sed -e 's/restartPolicy: IfNotPresent/restartPolicy: Always/' \
-e 's/replicas: 3/replicas: 1/' \
-e '/^kind: Deployment$/,/^---$/ s/^    spec:$/    spec:\
      runtimeClassName: nvidia\
      nodeSelector:\
        accelerator: nvidia/' \
| kubectl apply -f -
# --- Check status ---
kubectl get pod -A -o wide
kubectl get pod -l app=vllm-llama3-8b-instruct -o wide
kubectl describe pod -l app=vllm-llama3-8b-instruct
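The sed patch in the install step depends on exact indentation, so it can be worth dry-running it before piping the result into kubectl. Below is a minimal sketch that does so against a small stand-in manifest. The sample manifest and the function names `sample_manifest` and `patch_manifest` are hypothetical, and the sketch assumes the pod-level `spec:` is indented by four spaces, as is usual for a Deployment written with two-space YAML indentation.

```shell
#!/bin/sh
# Hypothetical stand-in for the upstream gpu-deployment.yaml (not the real file).
sample_manifest() {
  cat <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: vllm
        image: example/image
---
EOF
}

# The replicas and pod-spec expressions from the install step.
# Assumption: the pod-level "spec:" sits at four spaces of indentation.
patch_manifest() {
  sed -e 's/replicas: 3/replicas: 1/' \
      -e '/^kind: Deployment$/,/^---$/ s/^    spec:$/    spec:\
      runtimeClassName: nvidia\
      nodeSelector:\
        accelerator: nvidia/'
}

# Print the patched manifest for inspection instead of applying it.
patched=$(sample_manifest | patch_manifest)
printf '%s\n' "$patched"
```

If `runtimeClassName` and `nodeSelector` do not appear under the pod-level `spec:` in the output, the indentation in the sed pattern probably does not match the manifest being patched.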
叶王 © 2013-2026. All rights reserved. If this document helped you, you can buy the author a drink.