vllm.distributed.eplb.eplb_utils ¶
Utility functions for EPLB (Expert Parallel Load Balancing).
CpuGpuEvent ¶
Combines a CUDA event with a CPU threading event to enforce record->wait ordering across two threads.
This class is designed for exactly two threads: one producer that calls record() and one consumer that calls wait(). Using it with more than two threads is not supported and will produce undefined behavior.
CUDA events alone are insufficient for cross-thread synchronization because waiting on an unrecorded CUDA event is a no-op: the wait returns immediately instead of blocking. This class adds a threading.Event so that the waiting thread blocks on the CPU side until record() has been called. At that point the CUDA event is guaranteed to be in flight, and event.wait() correctly synchronizes the GPU stream.
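The ordering problem described above can be illustrated without a GPU. The following is a minimal sketch, not the vLLM implementation: `FakeGpuEvent`, `OrderedEvent`, and `run_demo` are hypothetical names, and `FakeGpuEvent` only imitates the "waiting on an unrecorded event is a no-op" behavior of a CUDA event.

```python
import threading
from typing import Optional


class FakeGpuEvent:
    """Hypothetical stand-in for a CUDA event (assumption, no GPU involved)."""

    def __init__(self) -> None:
        self.recorded = False

    def record(self) -> None:
        self.recorded = True

    def wait(self) -> bool:
        # Never blocks. Returns True only if there was a recorded event
        # to synchronize on, False if the wait was effectively a no-op.
        return self.recorded


class OrderedEvent:
    """Hypothetical sketch of record->wait ordering across two threads."""

    def __init__(self) -> None:
        self._gpu_event = FakeGpuEvent()
        self._recorded = threading.Event()

    def record(self) -> None:
        # Producer: record the GPU-side event first, then release the consumer.
        self._gpu_event.record()
        self._recorded.set()

    def wait(self, timeout: Optional[float] = None) -> bool:
        # Consumer: block on the CPU until record() has run, then wait
        # on the GPU-side event, which is now known to be recorded.
        if not self._recorded.wait(timeout):
            raise TimeoutError("record() was not called in time")
        return self._gpu_event.wait()


def run_demo():
    # A bare event waited on before record(): the wait is a no-op.
    bare_result = FakeGpuEvent().wait()

    # The combined event: the consumer cannot pass wait() before record().
    ordered = OrderedEvent()
    results = []
    consumer = threading.Thread(
        target=lambda: results.append(ordered.wait(timeout=5.0))
    )
    consumer.start()
    ordered.record()
    consumer.join()
    return bare_result, results[0]
```

In this sketch the consumer thread always observes a recorded event, regardless of whether it reaches wait() before or after the producer calls record(), which is the guarantee the class description attributes to the added threading.Event.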
Source code in vllm/distributed/eplb/eplb_utils.py
record ¶
record(stream: Stream | None = None)
Calls event.record() and then unblocks the waiting thread.
Should only be called by the main thread.
heat_cell ¶
Wrap text in green-to-red ANSI color based on val in [vmin, vmax].
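One way such a value-to-color mapping can work is sketched below. This is an illustration under stated assumptions, not the library's code: the name `heat_cell_sketch`, its parameter order, the linear interpolation, and the use of 24-bit ANSI escape sequences are all assumptions, and the actual function may use a different palette or signature.

```python
def heat_cell_sketch(text: str, val: float, vmin: float, vmax: float) -> str:
    """Hypothetical sketch: color text from green (vmin) to red (vmax)."""
    span = vmax - vmin
    # Normalize val into [0, 1]; treat a degenerate range as the low end.
    t = 0.0 if span <= 0 else (val - vmin) / span
    # Clamp so out-of-range values map to pure green or pure red.
    t = min(1.0, max(0.0, t))
    red = round(255 * t)
    green = round(255 * (1.0 - t))
    # 24-bit foreground color escape, followed by a reset code.
    return f"\x1b[38;2;{red};{green};0m{text}\x1b[0m"


if __name__ == "__main__":
    # Print a small gradient to a terminal that supports 24-bit color.
    for load in (0, 25, 50, 75, 100):
        print(heat_cell_sketch(f"{load:>4}", load, 0, 100), end=" ")
    print()
```

Clamping keeps the output well defined when a value falls outside [vmin, vmax], and the trailing reset code prevents the color from leaking into subsequent terminal output.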
override_envs_for_eplb ¶
override_envs_for_eplb(parallel_config: ParallelConfig) -> None
Override environment variables for EPLB when specific conditions are met.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| parallel_config | ParallelConfig | The parallel configuration object. | required |