learnedbf.BF package

Module contents

class learnedbf.BF.BloomFilter(n=None, epsilon=None)

Bases: object

Base BloomFilter class.

estimate_FPR(X, y=None)

Compute the empirical false positive rate of the Bloom Filter on a set of queries.

Parameters:

X (array of numerical arrays) – queries to be checked.
y (array of bool) – labels for the elements in X, defaults to None.

Returns:

empirical false positive rate.

Return type:

float

Note: the rate is computed only on non-key values. Thus, if y is provided, all true labels (that is, keys) are removed; otherwise, all elements in X are assumed to be non-key values. When dealing with learned filters, it is important that X does not contain any non-key used to build the filter, in order to avoid overfitting in the empirical FPR estimate.
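The computation described in the note can be sketched in a few lines. This is a minimal illustration and not the library's implementation: the helper name empirical_fpr is hypothetical, and it takes the filter's predictions directly rather than calling predict on X.

```python
def empirical_fpr(predictions, y=None):
    """Fraction of non-key queries that the filter reports as present.

    Hypothetical helper: `predictions` stands in for the output of predict(X).
    """
    if y is not None:
        # Keys (true labels) are removed: only non-keys can be false positives.
        predictions = [p for p, is_key in zip(predictions, y) if not is_key]
    else:
        # Without labels, every query is assumed to be a non-key.
        predictions = list(predictions)
    if not predictions:
        raise ValueError("no non-key queries to evaluate")
    return sum(bool(p) for p in predictions) / len(predictions)


preds = [True, True, False, False, True]
labels = [True, False, False, False, True]
fpr_with_labels = empirical_fpr(preds, labels)   # three non-keys, one reported present
fpr_without_labels = empirical_fpr(preds)        # all five queries treated as non-keys
```

With labels, only the three non-key queries contribute to the estimate; without them, the positive answers for the two keys are counted as false positives, which is why y should be passed whenever X may contain keys.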

abstract classmethod fit(X, y=None): Build the Bloom Filter. Abstract method implemented in subclasses.

abstract classmethod get_size(): Return the Bloom Filter size (in bits). Abstract method implemented in subclasses.

abstract classmethod predict(X): Computes predictions for a set of queries, each to be checked for inclusion in the Bloom Filter. Abstract method implemented in subclasses.

class learnedbf.BF.ClassicalBloomFilter(n=None, epsilon=None, m=None, filter_class=<class 'learnedbf.BF.ClassicalBloomFilterImpl'>)

Bases: BloomFilter, BaseEstimator, ClassifierMixin

Implementation of a classical Bloom Filter.

add(x)

Delegates the add call to the underlying Bloom Filter implementation.

Parameters:: x – key to be added to the filter.

check(x)

Delegates the check call to the underlying Bloom Filter implementation.

Parameters:: x – query to be checked for inclusion in the filter.

fit(X, y=None)

Build the Bloom Filter.

Parameters:

X (array of numbers) – array containing keys.
y (array of bool) – array containing labels for X’s elements, defaults to None.

Returns:

the fitted Bloom Filter instance.

Return type:

BloomFilter

Raises:

ValueError if X is empty.

get_size()

Return the Bloom Filter size.

Returns:: dictionary describing the overall size of the Bloom Filter, in which the key ‘bitmap’ is associated with the number of bits required by the bitmap of the filter.
Return type:: dict

predict(X)

Computes predictions for a set of queries, each to be checked for inclusion in the Bloom Filter.

Parameters:: X (array of numerical arrays) – queries to be checked.
Returns:: prediction for each value in X.
Return type:: array of bool

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → ClassicalBloomFilter

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class learnedbf.BF.ClassicalBloomFilterImpl(n=None, epsilon=None, m=None)

Bases: object

add(item): Add an item in the filter

check(item): Check for existence of an item in filter

classmethod get_hash_count(m, n)

Return the number of hash functions (k) to be used, computed with the formula k = (m/n) * ln(2).

Parameters:

m (int) – size of the bit array.
n (int) – number of items expected to be stored in the filter.
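As a worked example, the standard optimal hash count k = (m/n) * ln(2) can be evaluated as follows. The function name optimal_hash_count is hypothetical, and rounding to the nearest integer (with a floor of 1) is an assumption of this sketch; the library may truncate instead.

```python
import math


def optimal_hash_count(m, n):
    """Number of hash functions for a bit array of m bits holding n items."""
    k = (m / n) * math.log(2)  # math.log is the natural logarithm
    # Assumption of this sketch: round to nearest, and use at least one hash.
    return max(1, round(k))


k = optimal_hash_count(m=9586, n=1000)   # (9.586 * 0.693...) is about 6.64
k_tiny = optimal_hash_count(m=10, n=1000)
```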

get_size()

class learnedbf.BF.PyBloomLiveAdapter(n=None, epsilon=None, m=None)

Bases: object

Adapter that exposes pybloom_live.BloomFilter objects through the same interface as pybloom.BloomFilter.

add(x)

Delegate add method to pybloom_live.BloomFilter

Parameters:: x (numerical array) – key to be added to the filter.

check(x)

Implements the check method of the pybloom.BloomFilter interface in terms of the in operator of pybloom_live.BloomFilter.

Parameters:: x (numerical array) – query to be checked for inclusion in the filter.
Returns:: True if the query is predicted to be in the filter, False otherwise.
Return type:: bool
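The adapter idea (a check method implemented through the in operator of the wrapped object) can be sketched without depending on either library. Both class names below are hypothetical: SetBackedFilter is a stand-in for any object that supports add and in, and InOperatorAdapter is not the library's class.

```python
class SetBackedFilter:
    """Hypothetical stand-in for a filter exposing add() and the `in` operator."""

    def __init__(self):
        self._items = set()

    def add(self, x):
        self._items.add(x)

    def __contains__(self, x):
        return x in self._items


class InOperatorAdapter:
    """Exposes add()/check() on top of a backend that only supports `in`."""

    def __init__(self, backend):
        self._backend = backend

    def add(self, x):
        # Plain delegation to the wrapped object.
        self._backend.add(x)

    def check(self, x):
        # check() is expressed in terms of the backend's `in` operator.
        return x in self._backend


adapter = InOperatorAdapter(SetBackedFilter())
adapter.add("alpha")
found = adapter.check("alpha")
missing = adapter.check("beta")
```

Because the stand-in is backed by an exact set, it never produces false positives; a real Bloom filter backend could return True for a query that was never added.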

get_size()

Return the size of the Bloom Filter (in bits).

Returns:: size of the bit array (in bits)
Return type:: int

class learnedbf.BF.VarhashBloomFilter(m, k_max)

Bases: BaseEstimator, ClassifierMixin, BloomFilter

add(key, k)

Adds a key to the filter using a specified number of hash functions.

Parameters:

key (int) – the key to add.
k (int) – the number of hash functions to use.

check(key, k)

Test the key membership using a specified number of hash functions.

Parameters:

key (int) – the key to be checked.
k (int) – the number of hash functions to use.

fit(X, y=None, K=None)

Build the Bloom Filter.

Parameters:

X (array of numbers) – array containing keys.
y (array of bool) – array containing labels for X’s elements, defaults to None.
K (array of int) – array containing the number of hash functions to be used for each element, defaults to None.

Returns:

the fitted Varhash Bloom Filter instance.

Return type:

VarhashBloomFilter

Raises:

ValueError if X is empty.
ValueError if K is None or X and K lengths don’t match.

get_size(): Return the Bloom Filter size (in bits). Abstract method implemented in subclasses.

predict(X, K=None)

Computes predictions for a set of queries, each to be checked for inclusion in the Bloom Filter.

Parameters:

X (array of numerical arrays) – queries to be checked.
K (array of int) – array containing the number of hash functions to be used for each element, defaults to None.

Returns:

prediction for each value in X.

Return type:

array of bool

Raises:

ValueError if K is None or X and K lengths don’t match.
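To illustrate how a per-element number of hash functions interacts with add, check, fit and predict, here is a minimal pure-Python sketch. It is not the library's implementation: the class name TinyVarhashFilter, the use of salted SHA-256 as hash family, and the range check on k are all assumptions made for the example.

```python
import hashlib


class TinyVarhashFilter:
    """Illustrative only: each key is probed by the first k of k_max hashes."""

    def __init__(self, m, k_max):
        self.m = m
        self.k_max = k_max
        self.bits = [False] * m

    def _positions(self, key, k):
        if not 1 <= k <= self.k_max:  # choice of this sketch, not documented behavior
            raise ValueError("k must be between 1 and k_max")
        positions = []
        for i in range(k):
            # Salting SHA-256 with the index i simulates k distinct hash functions.
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, key, k):
        for pos in self._positions(key, k):
            self.bits[pos] = True

    def check(self, key, k):
        return all(self.bits[pos] for pos in self._positions(key, k))

    def fit(self, X, K=None):
        if len(X) == 0:
            raise ValueError("X is empty")
        if K is None or len(X) != len(K):
            raise ValueError("K must be given and match X in length")
        for key, k in zip(X, K):
            self.add(key, k)
        return self

    def predict(self, X, K=None):
        if K is None or len(X) != len(K):
            raise ValueError("K must be given and match X in length")
        return [self.check(key, k) for key, k in zip(X, K)]


bf = TinyVarhashFilter(m=256, k_max=4).fit([10, 20, 30], K=[1, 2, 4])
hits = bf.predict([10, 20, 30], K=[1, 2, 4])
```

Querying a key with the same k used at insertion can never yield a false negative, since every probed bit was set by add. Using more hash functions for a query makes a false positive less likely for that query, at the cost of setting more bits when the corresponding key is inserted.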

set_fit_request(*, K: bool | None | str = '$UNCHANGED$') → VarhashBloomFilter

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: K (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for K parameter in fit.
Returns:: self – The updated object.
Return type:: object

set_predict_request(*, K: bool | None | str = '$UNCHANGED$') → VarhashBloomFilter

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: K (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for K parameter in predict.
Returns:: self – The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → VarhashBloomFilter

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

learnedbf.BF.check_params_(n, epsilon, m)

Check the consistency of the arguments passed to the constructor.

Parameters:

n (int) – number of keys, defaults to None.
epsilon (float) – expected false positive rate, defaults to None.
m (int) – size (in bits) of the bitmap, defaults to None.

Returns:

a tuple (n, epsilon, m) containing the values of the three parameters once consistency has been verified or enforced.

Return type:

tuple

Note: the three parameters are linked by the following relation:

\[ m = -\frac{n \cdot \log \epsilon}{(\log 2)^2} \]

thus, if all three are specified, the function verifies that this relation holds, raising ValueError otherwise; when two of the three parameters are specified, the missing one is obtained by enforcing this relation; in all other cases, a ValueError is raised.
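As a worked example of the relation m = -(n · log ε) / (log 2)^2, the sketch below derives m from n and epsilon, then recovers epsilon from n and m. The function names are hypothetical, and rounding m up to a whole number of bits is an assumption of the sketch rather than documented behavior of check_params_.

```python
import math


def bitmap_size(n, epsilon):
    """Bits needed for n keys at false positive rate epsilon."""
    m = -(n * math.log(epsilon)) / (math.log(2) ** 2)
    return math.ceil(m)  # assumption: round up to a whole number of bits


def false_positive_rate(n, m):
    """Invert the same relation: epsilon = exp(-m * (log 2)^2 / n)."""
    return math.exp(-m * (math.log(2) ** 2) / n)


m = bitmap_size(n=1000, epsilon=0.01)        # about 9585.06 before rounding up
epsilon_back = false_positive_rate(n=1000, m=m)
```

Because m is rounded up, the recovered epsilon is slightly below the requested 0.01, so a consistency check between three user-supplied values would need some tolerance.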

class learnedbf.BF.hashfunc(m): Bases: object