vllm.compilation.activation_quant_fusion
FUSED_OPS module-attribute
FUSED_OPS: dict[QuantKey, OpOverload] = {
kFp8StaticTensorSym: default
}
silu_and_mul_nvfp4_quant_supported module-attribute
silu_and_mul_nvfp4_quant_supported = is_cuda() and hasattr(
_C, "silu_and_mul_nvfp4_quant"
)
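A flag like this is typically used to gate optional registrations. The following is a minimal sketch of that idea, not vLLM's actual code: `build_fused_ops`, the string keys, and the placeholder values are hypothetical, and a `SimpleNamespace` stands in for the compiled `_C` extension namespace.

```python
from types import SimpleNamespace


def build_fused_ops(ops_namespace, on_cuda):
    """Hypothetical helper: add the NVFP4 entry only when the kernel exists."""
    # Placeholder value; the real table maps quant keys to op overloads.
    fused_ops = {"fp8_static": "fp8_fused_op_placeholder"}
    # Same shape of check as the module attribute: platform AND symbol presence.
    nvfp4_supported = on_cuda and hasattr(ops_namespace, "silu_and_mul_nvfp4_quant")
    if nvfp4_supported:
        fused_ops["nvfp4"] = "nvfp4_fused_op_placeholder"
    return fused_ops


with_kernel = SimpleNamespace(silu_and_mul_nvfp4_quant=object())
without_kernel = SimpleNamespace()

print(sorted(build_fused_ops(with_kernel, on_cuda=True)))
print(sorted(build_fused_ops(without_kernel, on_cuda=True)))
```

Checking with `hasattr` rather than a version number keeps the check valid for builds that compile the extension without the NVFP4 kernel.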
ActivationQuantFusionPass
Bases: VllmInductorPass
This pass fuses a pre-defined set of custom ops into fused ops. It uses the torch pattern matcher to find the patterns and replace them.
Because patterns can only be registered once, the pass is a singleton. This limitation is expected to be addressed in a future version of PyTorch; see https://github.com/pytorch/pytorch/pull/139321#issuecomment-2452354980
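The register-once constraint can be illustrated with a small, self-contained sketch. Everything here is hypothetical (`PatternRegistry`, `SingletonFusionPass`, and the pattern name are illustrative stand-ins); it does not reproduce the torch pattern matcher or vLLM's implementation, only the reason a singleton avoids duplicate registration.

```python
class PatternRegistry:
    """Hypothetical registry that rejects duplicate pattern registration."""

    def __init__(self):
        self._patterns = {}

    def register(self, name, replacement):
        if name in self._patterns:
            raise ValueError(f"pattern {name!r} is already registered")
        self._patterns[name] = replacement


# Shared by every pass instance (an assumption of this sketch).
REGISTRY = PatternRegistry()


class SingletonFusionPass:
    """Hypothetical pass that registers its patterns exactly once."""

    _instance = None

    def __new__(cls):
        # Reuse the existing instance so registration never runs twice.
        if cls._instance is None:
            instance = super().__new__(cls)
            REGISTRY.register("silu_mul_fp8_static_quant", "fused_op")
            cls._instance = instance
        return cls._instance


first = SingletonFusionPass()
second = SingletonFusionPass()
print(first is second)  # True
```

Without the singleton guard, constructing the pass a second time would call `register` again and raise.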
Source code in vllm/compilation/activation_quant_fusion.py
patterns instance-attribute
__init__
__init__(config: VllmConfig)
ActivationQuantPattern
Bases: ABC
The base class for Activation+Quant fusions. Should not be used directly.
__init__
__init__(quant_key: QuantKey)
empty_quant
SiluMulFp8StaticQuantPattern
Bases: ActivationQuantPattern
Fusion for SiluMul+Fp8StaticQuant Pattern
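As a rough numerical illustration of what this fusion combines, the sketch below computes SiLU-and-mul followed by static scaling in two steps, then in one pass. It is an assumption-laden model, not the kernel: the function names are hypothetical, the input is a flat list whose first half is the gate and second half the multiplier, and the final rounding to an 8-bit float is omitted (only the scale and the clamp to the e4m3 range are modelled).

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value of the float8 e4m3fn format


def silu(v):
    return v / (1.0 + math.exp(-v))


def silu_and_mul(x):
    # First half goes through SiLU, second half is the multiplier.
    d = len(x) // 2
    return [silu(g) * u for g, u in zip(x[:d], x[d:])]


def static_quant(values, scale):
    # Static quantization: one precomputed scale; rounding to fp8 is omitted.
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]


def fused_silu_mul_quant(x, scale):
    # Single pass: no intermediate activation list is materialized.
    d = len(x) // 2
    out = []
    for i in range(d):
        act = silu(x[i]) * x[d + i]
        out.append(max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, act / scale)))
    return out


x = [1.0, -2.0, 0.5, 3.0]
unfused = static_quant(silu_and_mul(x), scale=0.05)
fused = fused_silu_mul_quant(x, scale=0.05)
print(unfused == fused)  # True
```

The benefit of fusing is that the intermediate activation never has to be written out and read back between the two ops.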
register
SiluMulNvfp4QuantPattern
Bases: ActivationQuantPattern
Fusion for SiluMul+Nvfp4Quant Pattern