vLLM deployment advisor

Pulls weight sizes from Hugging Face, estimates KV-cache memory, and suggests a tensor-parallel size and a `vllm serve` command for each model. Add several models to estimate the total GPU count on your preferred GPU type (each model runs as a separate vLLM instance). Estimates are heuristic; validate on your hardware.
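The KV-cache estimate follows the standard formula: per token, each layer stores a key and a value vector of `num_kv_heads * head_dim` elements. A minimal sketch (the example numbers below match Llama-3-8B's published config: 32 layers, 8 KV heads, head dim 128, fp16):

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int,
                       head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes of KV cache one token occupies across all layers.

    Factor of 2 covers the K and V tensors; dtype_bytes=2 assumes fp16/bf16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def kv_gib(num_layers: int, num_kv_heads: int, head_dim: int,
           context_len: int, dtype_bytes: int = 2) -> float:
    """Total KV-cache size in GiB for a single sequence at full context."""
    per_token = kv_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes)
    return per_token * context_len / 2**30


# Llama-3-8B-style config: 128 KiB per token, 1 GiB at 8192-token context.
print(kv_bytes_per_token(32, 8, 128))       # 131072
print(kv_gib(32, 8, 128, context_len=8192))  # 1.0
```

Multiply by the number of concurrent sequences to size the batch; in practice vLLM pre-allocates KV blocks out of whatever memory remains after the weights, governed by `--gpu-memory-utilization`.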

Each model runs as its own `vllm serve` process. Planning assumes a model's tensor-parallel group does not share GPUs with any other model unless you colocate them manually.
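On a single node, non-overlapping deployments like this are typically pinned to disjoint GPU sets with `CUDA_VISIBLE_DEVICES`. A sketch for a 4-GPU node, with illustrative model IDs and ports:

```shell
# Model A: tensor-parallel size 2 on GPUs 0-1
CUDA_VISIBLE_DEVICES=0,1 vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --tensor-parallel-size 2 --port 8000 &

# Model B: tensor-parallel size 2 on GPUs 2-3 (no GPUs shared with model A)
CUDA_VISIBLE_DEVICES=2,3 vllm serve mistralai/Mistral-7B-Instruct-v0.3 \
    --tensor-parallel-size 2 --port 8001 &
```

Each process sees only its own GPUs, so the two instances cannot contend for KV-cache memory; colocating models on the same GPUs is possible but requires lowering `--gpu-memory-utilization` on each so the weight and cache allocations fit together.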