learnedbf.BF package

Module contents

class learnedbf.BF.BloomFilter(n=None, epsilon=None)

Bases: object

Base BloomFilter class.

estimate_FPR(X, y=None)

Compute the empirical false positive rate of the Bloom Filter on a set of queries.

Parameters:
  • X (array of numerical arrays) – queries to be checked.

  • y (array of bool) – labels for the elements in X, defaults to None.

Returns:

empirical false positive rate.

Return type:

float

Note: the rate is computed only on non-key values. Thus, if y is provided, all elements labelled True (that is, keys) are discarded; otherwise, all elements in X are assumed to be non-keys. When dealing with learned filters, X should not contain any of the non-keys used to build the filter; otherwise overfitting distorts the empirical FPR estimate.
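The computation described in the note can be sketched as follows. This is a minimal sketch, not the library's code: `empirical_fpr` is a hypothetical helper that accepts any `predict` callable returning one bool per query, and raising on an empty set of non-keys is a choice made only for this sketch.

```python
def empirical_fpr(predict, X, y=None):
    """Hypothetical helper: fraction of non-keys in X accepted by `predict`.

    `predict` maps a list of queries to a list of bool, like BloomFilter.predict.
    """
    if y is not None:
        if len(y) != len(X):
            raise ValueError("X and y must have the same length")
        # Discard keys (label True): only non-keys can be false positives.
        X = [x for x, label in zip(X, y) if not label]
    if len(X) == 0:
        raise ValueError("no non-key queries to evaluate")
    # Every positive prediction on a non-key counts as a false positive.
    return sum(bool(p) for p in predict(X)) / len(X)


def toy_predict(X):
    """Stand-in for a fitted filter: accepts even numbers only."""
    return [x % 2 == 0 for x in X]


X = [1, 2, 3, 4, 5, 6]
y = [False, True, False, False, False, True]
print(empirical_fpr(toy_predict, X, y))  # 0.25: among 1, 3, 4, 5 only 4 is accepted
```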

abstract classmethod fit(X, y=None)

Build the Bloom Filter. Abstract method implemented in subclasses.

abstract classmethod get_size()

Return the Bloom Filter size (in bits). Abstract method implemented in subclasses.

abstract classmethod predict(X)

Computes predictions for a set of queries, each to be checked for inclusion in the Bloom Filter. Abstract method implemented in subclasses.
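The abstract interface above can be illustrated with a toy subclass. This is a sketch under stated assumptions: the `abc`-based `BloomFilterInterface` and the exact-set `SetFilter` are hypothetical stand-ins (a set has no false positives, so it is not a Bloom Filter), the handling of y in `fit` is assumed, and the 64-bits-per-key size is an arbitrary placeholder. They only show how `fit`, `predict` and `get_size` relate.

```python
from abc import ABC, abstractmethod


class BloomFilterInterface(ABC):
    """Hypothetical mirror of the BloomFilter contract."""

    @abstractmethod
    def fit(self, X, y=None):
        """Build the filter from the keys in X and return self."""

    @abstractmethod
    def predict(self, X):
        """Return one bool per query in X."""

    @abstractmethod
    def get_size(self):
        """Return the filter size in bits."""


class SetFilter(BloomFilterInterface):
    """Toy exact filter storing keys in a Python set (no false positives)."""

    def fit(self, X, y=None):
        if len(X) == 0:
            raise ValueError("X must not be empty")
        # Assumption: when labels are given, only elements labelled True are keys.
        keys = X if y is None else [x for x, label in zip(X, y) if label]
        self.keys_ = set(keys)
        return self

    def predict(self, X):
        return [x in self.keys_ for x in X]

    def get_size(self):
        # Arbitrary placeholder: 64 bits per stored key, not a real measurement.
        return 64 * len(self.keys_)


f = SetFilter().fit([1, 2, 3])
print(f.predict([2, 5]))  # [True, False]
print(f.get_size())       # 192
```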

class learnedbf.BF.ClassicalBloomFilter(n=None, epsilon=None, m=None, filter_class=<class 'learnedbf.BF.ClassicalBloomFilterImpl'>)

Bases: BloomFilter, BaseEstimator, ClassifierMixin

Implementation of a classical Bloom Filter.

add(x)

Delegate the add method to the Bloom Filter implementation.

Parameters:

x – key to be added to the filter.

check(x)

Delegate the check method to the Bloom Filter implementation.

Parameters:

x – query to be checked for inclusion in the filter.

fit(X, y=None)

Build the Bloom Filter.

Parameters:
  • X (array of numbers) – array containing keys.

  • y (array of bool) – array containing labels for X’s elements, defaults to None.

Returns:

the fitted Bloom Filter instance.

Return type:

BloomFilter

Raises:

ValueError if X is empty.

get_size()

Return the Bloom Filter size.

Returns:

dictionary describing the overall size of the Bloom Filter, in which the key ‘bitmap’ is associated with the number of bits required by the bitmap of the filter.

Return type:

dict

predict(X)

Computes predictions for a set of queries, each to be checked for inclusion in the Bloom Filter.

Parameters:

X (array of numerical arrays) – queries to be checked.

Returns:

prediction for each value in X.

Return type:

array of bool

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → ClassicalBloomFilter

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class learnedbf.BF.ClassicalBloomFilterImpl(n=None, epsilon=None, m=None)

Bases: object

add(item)

Add an item to the filter.

check(item)

Check whether an item is in the filter.
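For reference, a classical Bloom Filter exposing the same add/check interface can be sketched as below. This is not the library's ClassicalBloomFilterImpl: the class name `TinyBloomFilter`, the use of salted SHA-256 from `hashlib` to obtain the k hash functions, and the `bytearray` bitmap are illustrative assumptions. Added items always pass `check`; other items may also pass (false positives).

```python
import hashlib


class TinyBloomFilter:
    """Hypothetical classical Bloom Filter with m bits and k hash functions."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)  # m bits, all initially 0

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k different salts.
        data = repr(item).encode()
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def check(self, item):
        # True only if all k positions are set: keys always pass, non-keys may collide.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


bf = TinyBloomFilter(m=1024, k=5)
for key in ["apple", "banana", "cherry"]:
    bf.add(key)
print(bf.check("banana"))  # True
```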

classmethod get_hash_count(m, n)

Return the number k of hash functions to be used, computed with the formula k = (m/n) · ln 2.

Parameters:
  • m (int) – size of the bit array.

  • n (int) – number of items expected to be stored in the filter.
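The formula can be evaluated directly. A minimal sketch: `hash_count` is a hypothetical helper, and truncating the result with `int` and clamping it to at least one hash function are assumptions, since the rounding rule used by the library is not documented here.

```python
import math


def hash_count(m, n):
    """Hypothetical helper: k = (m / n) * ln 2, truncated and at least 1."""
    if m <= 0 or n <= 0:
        raise ValueError("m and n must be positive")
    return max(1, int((m / n) * math.log(2)))


print(hash_count(1000, 100))  # 6 (10 * 0.693... = 6.93...)
print(hash_count(10, 100))    # 1 (0.069... is clamped to 1)
```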

get_size()

class learnedbf.BF.PyBloomLiveAdapter(n=None, epsilon=None, m=None)

Bases: object

Adapter that allows objects of pybloom_live.BloomFilter to be used through the same interface as pybloom.BloomFilter.

add(x)

Delegate the add method to pybloom_live.BloomFilter.

Parameters:

x (numerical array) – key to be added to the filter.

check(x)

Implement the check method of the pybloom.BloomFilter interface in terms of the in operator of pybloom_live.BloomFilter.

Parameters:

x (numerical array) – query to be checked for inclusion in the filter.

Returns:

True if the query is predicted to be in the filter, False otherwise.

Return type:

bool

get_size()

Return the size of the Bloom Filter (in bits).

Returns:

size of the bit array (in bits)

Return type:

int
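The adapter idea can be sketched without either library installed. `MembershipOnlyFilter` is a hypothetical stand-in for a backend that, like pybloom_live.BloomFilter here, offers add and the in operator but no check; its `num_bits` attribute is an assumption, and `CheckAdapter` is a hypothetical illustration, not the library's PyBloomLiveAdapter. Tuples are used for numerical arrays because the stand-in stores items in a set, which needs hashable values.

```python
class MembershipOnlyFilter:
    """Stand-in backend: supports add() and `in`, but has no check()."""

    def __init__(self, num_bits):
        self.num_bits = num_bits  # assumed size attribute of the backend
        self._items = set()

    def add(self, x):
        self._items.add(x)

    def __contains__(self, x):
        return x in self._items


class CheckAdapter:
    """Expose add/check/get_size on top of a membership-only backend."""

    def __init__(self, backend):
        self.backend = backend

    def add(self, x):
        self.backend.add(x)  # plain delegation

    def check(self, x):
        return x in self.backend  # check() implemented via the `in` operator

    def get_size(self):
        return self.backend.num_bits


adapter = CheckAdapter(MembershipOnlyFilter(num_bits=4096))
adapter.add((1.0, 2.0))
print(adapter.check((1.0, 2.0)), adapter.check((3.0, 4.0)))  # True False
```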

class learnedbf.BF.VarhashBloomFilter(m, k_max)

Bases: BaseEstimator, ClassifierMixin, BloomFilter

add(key, k)

Adds a key to the filter using a specified number of hash functions.

Parameters:
  • key (int) – the key to add.

  • k (int) – the number of hash functions to use.

check(key, k)

Test the membership of key using a specified number of hash functions.

Parameters:
  • key (int) – the key to be checked.

  • k (int) – the number of hash functions to use.
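One way to support a per-call number of hash functions is to fix a family of k_max hash functions and use only the first k of them. The sketch below assumes exactly that scheme; the class name `VarhashSketch` and the salted `hashlib.blake2b` hashing are illustrative choices, not the library's implementation. Under this scheme a key added with k hash functions is found by check with the same k or any smaller one.

```python
import hashlib


class VarhashSketch:
    """Hypothetical filter: m bits, up to k_max hash functions per key."""

    def __init__(self, m, k_max):
        self.m, self.k_max = m, k_max
        self.bits = [False] * m

    def _positions(self, key, k):
        if not 1 <= k <= self.k_max:
            raise ValueError(f"k must be in [1, {self.k_max}]")
        # Hash function i is blake2b salted with i; only the first k are used.
        for i in range(k):
            h = hashlib.blake2b(str(key).encode(), salt=i.to_bytes(16, "big"))
            yield int.from_bytes(h.digest()[:8], "big") % self.m

    def add(self, key, k):
        for p in self._positions(key, k):
            self.bits[p] = True

    def check(self, key, k):
        return all(self.bits[p] for p in self._positions(key, k))


vf = VarhashSketch(m=2048, k_max=8)
vf.add(42, k=6)
print(vf.check(42, k=6), vf.check(42, k=3))  # True True
```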

fit(X, y=None, K=None)

Build the Bloom Filter.

Parameters:
  • X (array of numbers) – array containing keys.

  • y (array of bool) – array containing labels for X’s elements, defaults to None.

  • K (array of int) – array containing the number of hash functions to be used for each element, defaults to None.

Returns:

the fitted Varhash Bloom Filter instance.

Return type:

VarhashBloomFilter

Raises:

ValueError if X is empty, if K is None, or if the lengths of X and K don’t match.
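The conditions listed under Raises can be expressed as a small validation helper. `validate_keys_and_K` is a hypothetical name; the sketch only mirrors the documented conditions (empty X, missing K, mismatched lengths) and does not claim to reproduce the order in which the library performs these checks.

```python
def validate_keys_and_K(X, K):
    """Hypothetical check mirroring the documented ValueError conditions."""
    if len(X) == 0:
        raise ValueError("X must not be empty")
    if K is None:
        raise ValueError("K is required: one hash-function count per element")
    if len(X) != len(K):
        raise ValueError(f"X and K lengths don't match ({len(X)} != {len(K)})")
    return list(zip(X, K))  # (key, k) pairs ready to be added or checked


print(validate_keys_and_K([10, 20], [3, 5]))  # [(10, 3), (20, 5)]
```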

get_size()

Return the Bloom Filter size (in bits).

predict(X, K=None)

Computes predictions for a set of queries, each to be checked for inclusion in the Bloom Filter.

Parameters:
  • X (array of numerical arrays) – queries to be checked.

  • K (array of int) – array containing the number of hash functions to be used for each element, defaults to None.

Returns:

prediction for each value in X.

Return type:

array of bool

Raises:

ValueError if K is None or X and K lengths don’t match.

set_fit_request(*, K: bool | None | str = '$UNCHANGED$') → VarhashBloomFilter

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

K (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for K parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, K: bool | None | str = '$UNCHANGED$') → VarhashBloomFilter

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

K (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for K parameter in predict.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → VarhashBloomFilter

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

learnedbf.BF.check_params_(n, epsilon, m)

Check the consistency of the arguments passed to the constructor.

Parameters:
  • n (int) – number of keys, defaults to None.

  • epsilon (float) – expected false positive rate, defaults to None.

  • m (int) – size (in bits) of the bitmap, defaults to None.

Returns:

a tuple (n, epsilon, m) containing the values of the three parameters once consistency has been verified or enforced.

Return type:

tuple

Note: the three parameters are linked by the following relation:

\[ m = -\frac{n \ln \epsilon}{(\ln 2)^2} \]

Thus, if all three parameters are specified, the function verifies that this relation holds and raises ValueError otherwise. If exactly two of them are specified, the third is computed from this relation. In all other cases, ValueError is raised.
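The rules in the note can be sketched as follows. `resolve_params` is a hypothetical stand-in for check_params_. Rounding m up with `math.ceil`, rounding n down with `math.floor`, the relative tolerance used when all three values are given, and the range check on epsilon are assumptions, since the library's exact choices are not documented here.

```python
import math

LN2_SQUARED = math.log(2) ** 2


def resolve_params(n=None, epsilon=None, m=None, rel_tol=1e-2):
    """Hypothetical sketch of the (n, epsilon, m) consistency rules."""
    if epsilon is not None and not 0 < epsilon < 1:
        raise ValueError("epsilon must lie in (0, 1)")
    given = sum(v is not None for v in (n, epsilon, m))
    if given == 3:
        # All specified: verify m = -(n ln epsilon) / (ln 2)^2 up to rel_tol.
        expected_m = -n * math.log(epsilon) / LN2_SQUARED
        if not math.isclose(m, expected_m, rel_tol=rel_tol):
            raise ValueError("n, epsilon and m do not satisfy the relation")
    elif given == 2:
        # Exactly two specified: solve the relation for the missing one.
        if m is None:
            m = math.ceil(-n * math.log(epsilon) / LN2_SQUARED)
        elif epsilon is None:
            epsilon = math.exp(-m * LN2_SQUARED / n)
        else:
            n = math.floor(-m * LN2_SQUARED / math.log(epsilon))
    else:
        raise ValueError("at least two of n, epsilon and m must be given")
    return n, epsilon, m


print(resolve_params(n=1000, epsilon=0.01))  # (1000, 0.01, 9586)
```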

class learnedbf.BF.hashfunc(m)

Bases: object