vllm.v1.sample.logits_processor.builtin
LogitBiasLogitsProcessor ¶
Bases: LogitsProcessor
Source code in vllm/v1/sample/logits_processor/builtin.py
logits_slice instance-attribute ¶
__init__ ¶
_device_tensor ¶
apply ¶
is_argmax_invariant ¶
is_argmax_invariant() -> bool
Logit bias can rebalance token probabilities and change the outcome of argmax in greedy sampling.
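The effect described above can be illustrated with a minimal sketch in plain Python, using lists instead of the tensors the processor works on. The helper names `apply_logit_bias` and `argmax` are hypothetical and are not part of vLLM; the sketch assumes a bias is a simple additive offset per token id.

```python
def apply_logit_bias(logits: list[float], bias: dict[int, float]) -> list[float]:
    """Return a copy of `logits` with per-token additive biases applied."""
    biased = list(logits)
    for token_id, delta in bias.items():
        biased[token_id] += delta
    return biased


def argmax(values: list[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


logits = [2.0, 1.5, 0.5]
# Hypothetical bias: boost token 1 by +1.0.
biased = apply_logit_bias(logits, {1: 1.0})

# Greedy sampling picks a different token once the bias is applied.
print(argmax(logits), argmax(biased))
```

Because the winning token moves from index 0 to index 1, a processor of this kind cannot be skipped under greedy sampling.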
update_state ¶
update_state(batch_update: Optional[BatchUpdate])
MinPLogitsProcessor ¶
Bases: LogitsProcessor
min_p_cpu_tensor instance-attribute ¶
min_p_cpu_tensor = zeros(
(max_num_reqs,),
dtype=float32,
device="cpu",
pin_memory=is_pin_memory,
)
min_p_device instance-attribute ¶
__init__ ¶
__init__(
vllm_config: VllmConfig,
device: device,
is_pin_memory: bool,
)
apply ¶
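As a rough illustration of min-p filtering for a single request, the sketch below masks every token whose probability falls below `min_p` times the probability of the most likely token. It uses plain Python lists rather than batched tensors, and `softmax` and `min_p_filter` are hypothetical helper names, not vLLM functions; treat it as a sketch of the general technique rather than the processor's actual code.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(logits: list[float], min_p: float) -> list[float]:
    """Mask tokens whose probability is below min_p * max probability."""
    probs = softmax(logits)
    threshold = min_p * max(probs)
    return [
        x if p >= threshold else float("-inf")
        for x, p in zip(logits, probs)
    ]


logits = [3.0, 2.0, -2.0]
filtered = min_p_filter(logits, min_p=0.1)
print(filtered)
```

For any `min_p` between 0 and 1 the most likely token always meets the threshold, so this kind of filter leaves the argmax unchanged.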
get_min_p_by_index ¶
update_state ¶
update_state(batch_update: Optional[BatchUpdate])
MinTokensLogitsProcessor ¶
Bases: LogitsProcessor
logits_slice instance-attribute ¶
__init__ ¶
__init__(
vllm_config: VllmConfig,
device: device,
is_pin_memory: bool,
)
_device_tensor ¶
add_request staticmethod ¶
add_request(
params: SamplingParams,
_: list[int],
output_tok_ids: list[int],
) -> Optional[tuple[int, Sequence[int], set[int]]]
apply ¶
is_argmax_invariant ¶
is_argmax_invariant() -> bool
By censoring stop tokens, min-tokens can change the outcome of the argmax operation in greedy sampling.
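A minimal sketch of why this is so, again with plain Python lists instead of tensors. The function `mask_stop_tokens` and its parameters are hypothetical names chosen for illustration; the sketch assumes that stop tokens are masked to negative infinity until the request has produced at least `min_tokens` output tokens.

```python
NEG_INF = float("-inf")


def mask_stop_tokens(
    logits: list[float],
    num_output_tokens: int,
    min_tokens: int,
    stop_token_ids: set[int],
) -> list[float]:
    """Mask stop tokens while fewer than `min_tokens` tokens were generated."""
    if num_output_tokens >= min_tokens:
        return list(logits)  # minimum reached, nothing to mask
    return [
        NEG_INF if token_id in stop_token_ids else value
        for token_id, value in enumerate(logits)
    ]


def argmax(values: list[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


logits = [0.1, 0.2, 3.0]  # token 2 plays the role of a stop token
early = mask_stop_tokens(logits, num_output_tokens=1, min_tokens=4, stop_token_ids={2})
late = mask_stop_tokens(logits, num_output_tokens=4, min_tokens=4, stop_token_ids={2})

# The greedy choice differs only while the minimum has not been reached.
print(argmax(early), argmax(late))
```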
update_state ¶
update_state(batch_update: Optional[BatchUpdate])
process_dict_updates ¶
process_dict_updates(
req_entries: dict[int, T],
batch_update: Optional[BatchUpdate],
new_state: Callable[
[SamplingParams, list[int], list[int]], Optional[T]
],
) -> bool
Utility function to update dict state for sparse LogitsProcessors.
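The sketch below illustrates the general idea of sparse dict state under simplifying assumptions: only requests that need the processor get an entry, keyed by batch index. `ToyBatchUpdate`, `toy_process_dict_updates`, and the dict-based `params` are hypothetical stand-ins, not vLLM types, and the sketch handles only additions and removals. It mirrors the shape of the signature above but should not be read as the library's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass
class ToyBatchUpdate:
    """Hypothetical stand-in for BatchUpdate: additions and removals only."""
    # Each added entry: (batch index, params, prompt token ids, output token ids)
    added: list[tuple[int, dict, list[int], list[int]]] = field(default_factory=list)
    removed: list[int] = field(default_factory=list)


def toy_process_dict_updates(
    req_entries: dict[int, T],
    batch_update: Optional[ToyBatchUpdate],
    new_state: Callable[[dict, list[int], list[int]], Optional[T]],
) -> bool:
    """Apply a batch update to sparse per-request state; report any change."""
    if batch_update is None:
        return False
    updated = False
    # Drop state for requests that left the batch.
    for index in batch_update.removed:
        if req_entries.pop(index, None) is not None:
            updated = True
    # Store state only for new requests that actually need this processor.
    for index, params, prompt_ids, output_ids in batch_update.added:
        state = new_state(params, prompt_ids, output_ids)
        if state is not None:
            req_entries[index] = state
            updated = True
        elif req_entries.pop(index, None) is not None:
            updated = True  # the slot was reused by a request without state
    return updated


def min_tokens_state(params: dict, _prompt: list[int], _output: list[int]) -> Optional[int]:
    """Hypothetical new_state callback: keep state only if min_tokens > 0."""
    value = params.get("min_tokens", 0)
    return value if value > 0 else None


entries: dict[int, int] = {}
first = ToyBatchUpdate(added=[(0, {"min_tokens": 4}, [1, 2], []), (1, {}, [3], [])])
changed_first = toy_process_dict_updates(entries, first, min_tokens_state)
snapshot = dict(entries)
changed_second = toy_process_dict_updates(entries, ToyBatchUpdate(removed=[0]), min_tokens_state)
changed_none = toy_process_dict_updates(entries, None, min_tokens_state)
```

The boolean return value lets a caller rebuild any derived device-side tensors only when the dict actually changed.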