Welcome to hashdial’s documentation!¶
Implements a hash dial for hash-based decision making.
Decisions are made through hashing, so they are deterministic for a given input but probabilistic across a set of inputs.
For example, suppose a set of components in a distributed system wish to emit a log entry for 1% of requests, and each component should log the same 1% of requests. They could do so as follows:
    if hashdial.decide(request.id, 0.01):
        log_request(request)
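One way such a dial could work is sketched below using only the standard library. This illustrates the idea and is not hashdial's actual implementation: the helper names `unit_interval` and `sketch_decide`, the choice of SHA-256, and the use of the top 53 bits of the digest are all assumptions made for the example.

```python
import hashlib


def unit_interval(key: bytes, seed: bytes = b"") -> float:
    """Map key to a float in [0, 1) (hypothetical helper, not hashdial's API)."""
    # Assumption: the seed is hashed before the key by simple concatenation.
    digest = hashlib.sha256(seed + key).digest()
    # Assumption: keep the top 53 bits so the ratio is exactly representable.
    top53 = int.from_bytes(digest[:8], "big") >> 11
    return top53 / 2**53


def sketch_decide(key: bytes, probability: float, seed: bytes = b"") -> bool:
    """Return True for roughly `probability` of all distinct keys."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in range [0, 1]")
    return unit_interval(key, seed) < probability


# Two components that see the same request id reach the same decision.
request_id = b"request-12345"
component_a = sketch_decide(request_id, 0.01)
component_b = sketch_decide(request_id, 0.01)
assert component_a == component_b
```

Because the decision depends only on the key, the seed, and the probability, no coordination between components is needed.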
Seeds¶
All functions take an optional seed keyword argument. It is intended to be used in cases where different uses of the library require orthogonal decision making, or it is desirable to make the decision making unpredictable. In particular:
To prevent untrusted input from being tailored to bias the hashing algorithm, use a seed that is not known to the untrusted source.
Filtering data that is the output of a previous filtering step using the same mechanism requires a different seed in order to get correct behavior.
For example, filtering to keep 1% of lines in a file and then applying the same filter again produces the same output as filtering once, since any line that was kept the first time will also be kept the second time.
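The repeated-filtering effect can be reproduced with a small stand-in for a seeded dial. This is a sketch under assumptions: `sketch_decide` is a hypothetical helper built on SHA-256, not hashdial's implementation, and the rate is raised to 10% so the counts are easy to see.

```python
import hashlib


def sketch_decide(key: bytes, probability: float, seed: bytes = b"") -> bool:
    """Hypothetical stand-in for a seeded hash dial (not hashdial's implementation)."""
    digest = hashlib.sha256(seed + key).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53 < probability


lines = [f"line {i}".encode("utf-8") for i in range(10_000)]

once = [line for line in lines if sketch_decide(line, 0.1)]
# Same seed again: every surviving line is kept, so nothing changes.
twice_same_seed = [line for line in once if sketch_decide(line, 0.1)]
# A different seed makes the second decision independent of the first.
twice_other_seed = [
    line for line in once if sketch_decide(line, 0.1, seed=b"second-pass")
]

print(len(once), len(twice_same_seed), len(twice_other_seed))
```

The first two counts are identical, while the third is much smaller, which is the behavior a second filtering step is normally expected to have.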
Determinism across versions¶
Any change to an existing function (including its default seed and choice of hashing algorithm) that would alter the function's output for the same input will only be made with a major version bump of the library.
API¶
hashdial.decide(key: bytes, probability: float, *, seed: bytes = b'') → bool
Decide between True and False based on key such that the probability of True for a given input over a large set of unique inputs is probability.
For example, to retain 25% of lines read from stdin:
    import sys
    from hashdial import decide

    for line in sys.stdin:
        if decide(line.encode('utf-8'), 0.25):
            sys.stdout.write(line)
- Parameters
key – The bytes to hash.
probability – The probability of a given key returning True. Must be in range [0, 1].
seed – Seed to hash prior to hashing key.
- Returns
Whether to take the action.
hashdial.range(key: bytes, stop: int, *, start: int = 0, seed: bytes = b'') → int
Select an integer in range [start, stop) by hashing key.
Example partitioned filtering of a workload on stdin, assuming this is partition 3 out of 10:
    import sys
    import hashdial

    for line in sys.stdin:
        # Qualified name avoids confusion with the built-in range().
        if hashdial.range(line.encode('utf-8'), 10) == 3:
            sys.stdout.write(line)
The difference between stop and start must be sufficiently small to be exactly representable as a float (no larger than 2**(sys.float_info.mant_dig) - 1).
- Parameters
key – The bytes to hash.
stop – The exclusive end of the range of integers among which to select.
start – The inclusive start of the range of integers among which to select.
seed – Seed to hash prior to hashing key.
- Returns
The selected integer.
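The float limit on the width of the range is easier to see with a sketch. The code below assumes, without confirmation from the library, that an integer is picked by scaling a hash-derived float in [0, 1) by the width of the range. `sketch_range` and its use of SHA-256 are hypothetical. The final assertions only show standard floating-point behavior.

```python
import hashlib
import sys


def sketch_range(key: bytes, stop: int, *, start: int = 0, seed: bytes = b"") -> int:
    """Hypothetical illustration of picking an integer in [start, stop)."""
    width = stop - start
    if width <= 0:
        raise ValueError("stop must be greater than start")
    digest = hashlib.sha256(seed + key).digest()
    unit = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # float in [0, 1)
    return start + int(unit * width)


# Every result falls inside the half-open interval.
picks = [sketch_range(str(i).encode("utf-8"), 10, start=5) for i in range(1000)]
assert all(5 <= p < 10 for p in picks)

# Above 2**mant_dig, consecutive integers stop being distinguishable as floats.
limit = 2 ** sys.float_info.mant_dig
assert float(limit - 1) != float(limit)
assert float(limit) == float(limit + 1)
```

If the width could not be represented exactly as a float, a scaling step like the one assumed here would no longer be able to reach every integer in the range.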
hashdial.select(key: bytes, seq: Sequence[BucketType], *, seed: bytes = b'') → BucketType
Select one of the elements in seq based on the hash of key.
Example partitioning of input on stdin into buckets:
    import sys
    from hashdial import select

    bucketed_lines = {}  # type: Dict[int, List[str]]
    for line in sys.stdin:
        bucket = select(line.encode('utf-8'), [0, 1, 2, 3, 4, 5])
        bucketed_lines.setdefault(bucket, []).append(line)
- Parameters
key – The bytes to hash.
seq – The sequence from which to select an element. Must be non-empty.
seed – Seed to hash prior to hashing key.
- Raises
ValueError – If seq is empty.
- Returns
One of the elements in seq.
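Selection from a sequence can be pictured as choosing an index by hash. The following is a minimal sketch under that assumption. `sketch_select`, the SHA-256 digest, and the index computation are hypothetical and are not taken from hashdial's source.

```python
import hashlib
from typing import Sequence, TypeVar

T = TypeVar("T")


def sketch_select(key: bytes, seq: Sequence[T], *, seed: bytes = b"") -> T:
    """Hypothetical illustration of hash-based selection from a sequence."""
    if len(seq) == 0:
        raise ValueError("seq must be non-empty")
    digest = hashlib.sha256(seed + key).digest()
    unit = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # float in [0, 1)
    return seq[int(unit * len(seq))]


servers = ["alpha", "beta", "gamma"]
# The same key always maps to the same element.
assert sketch_select(b"user-42", servers) == sketch_select(b"user-42", servers)
assert sketch_select(b"user-42", servers) in servers
```

Note that in a scheme like this the chosen element depends on the length and order of the sequence, so changing either may move keys to different elements.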