Supporting New Hardware
ServerlessLLM actively expands support for new hardware configurations to meet diverse deployment needs.
Support Standards
Hardware is considered supported by ServerlessLLM if:
- Any of the inference backends used (e.g., Transformers, vLLM) can run model inference on the hardware.
- ServerlessLLM Store can successfully load model checkpoints on the hardware.
Steps to Support New Hardware
- Check Inference Backend Compatibility: Refer to the documentation of the specific inference backend (e.g., vLLM or Transformers) to confirm that it supports the hardware.
- ServerlessLLM Store Configuration:
  - If the hardware provides CUDA-compatible APIs (e.g., ROCm), adjust the build script (CMakeLists.txt) by adding the necessary compiler flags.
  - For non-CUDA-compatible APIs, implementing a custom checkpoint loading function might be required.
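To decide which branch of the store configuration step applies, it can help to check which accelerator runtime the installed PyTorch build reports. The following is a minimal sketch, not part of ServerlessLLM: the helper names `classify_accelerator_api` and `probe_pytorch_build` are hypothetical, and it assumes that PyTorch is the framework underneath the inference backend. A build that reports neither CUDA nor HIP is only a hint that a custom checkpoint loading function may be needed, not proof.

```python
import importlib.util


def classify_accelerator_api(cuda_version, hip_version):
    """Map PyTorch build metadata to a coarse label (hypothetical helper).

    "cuda-compatible" -> ROCm/HIP build: the CMakeLists.txt flags likely need adjusting.
    "cuda"            -> native CUDA build.
    "none-reported"   -> neither runtime reported: a custom loader may be required.
    """
    if hip_version:
        return "cuda-compatible"
    if cuda_version:
        return "cuda"
    return "none-reported"


def probe_pytorch_build():
    """Return (cuda_version, hip_version) of the installed PyTorch, or None if absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the sketch also runs without PyTorch

    # torch.version.cuda / torch.version.hip are strings or None depending on the build.
    return (getattr(torch.version, "cuda", None), getattr(torch.version, "hip", None))


if __name__ == "__main__":
    versions = probe_pytorch_build()
    if versions is None:
        print("PyTorch is not installed; check the backend documentation instead.")
    else:
        print(classify_accelerator_api(*versions))
```

The classification is kept separate from the probing so that the decision logic can be exercised on machines without the target hardware.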
Verifying Hardware Support in ServerlessLLM Store
Hardware support is verified when the hardware successfully completes the Quick Start Guide examples, demonstrating checkpoint loading and inference without errors.
If the hardware is not publicly available (i.e., can't be tested by the ServerlessLLM team), a screenshot or output log of the successful execution of the Quick Start Guide examples is required to verify hardware support.
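One way to produce such an output log is to run each Quick Start command through a small wrapper that records the command, its combined output, and its exit code. This is a sketch under stated assumptions: `run_and_log` is a hypothetical helper, and the command shown in the usage section is a stand-in, so substitute the actual commands from the Quick Start Guide.

```python
import subprocess
import sys
from pathlib import Path


def run_and_log(command, log_path):
    """Run `command`, save its output and exit code to `log_path`.

    Returns True when the command exits with code 0.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so the log keeps ordering
        text=True,
    )
    log_text = f"$ {' '.join(command)}\n{result.stdout}\n[exit code: {result.returncode}]\n"
    Path(log_path).write_text(log_text, encoding="utf-8")
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in command; replace with a command from the Quick Start Guide.
    ok = run_and_log([sys.executable, "-c", "print('placeholder run')"], "quickstart.log")
    print("success" if ok else "failure", "- log written to quickstart.log")
```

The resulting log file can be attached to the verification request together with a description of the hardware.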
If you encounter any issues or have questions, please reach out to the ServerlessLLM team by raising an issue on the GitHub repository.