learnedbf.BF package
Module contents
- class learnedbf.BF.BloomFilter(n=None, epsilon=None)
Bases:
object
Base BloomFilter class.
- estimate_FPR(X, y=None)
Compute the empirical false positive rate of the Bloom Filter on a set of queries.
- Parameters:
X (array of numerical arrays) – queries to be checked.
y (array of bool) – labels for the elements in X, defaults to None.
- Returns:
empirical false positive rate.
- Return type:
float
Note: the rate is computed only on non-key values. Thus, if y is provided, all true labels (that is, keys) are removed; otherwise, all elements in X are assumed to be non-key values. When dealing with learned filters, it is important that X does not contain any non-key used to build the filter, in order to avoid overfitting in the empirical FPR estimate.
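The rule in the note can be illustrated with a small standalone sketch. It is not the library's implementation: `empirical_fpr` and its arguments are hypothetical names, and it works on predictions that some filter has already produced. When labels are given, the keys are dropped first; otherwise every query is treated as a non-key.

```python
def empirical_fpr(predictions, y=None):
    """Fraction of non-key queries that a filter reports as present."""
    if y is None:
        # No labels: every query is assumed to be a non-key.
        non_key_predictions = list(predictions)
    else:
        # Labels given: discard the keys (label True) before counting.
        non_key_predictions = [p for p, label in zip(predictions, y) if not label]
    if not non_key_predictions:
        raise ValueError("no non-key queries to evaluate")
    false_positives = sum(bool(p) for p in non_key_predictions)
    return false_positives / len(non_key_predictions)


preds = [True, True, False, False, True]
labels = [True, False, False, False, True]
fpr_with_labels = empirical_fpr(preds, labels)  # 1 false positive among 3 non-keys
fpr_without_labels = empirical_fpr(preds)       # all 5 queries counted as non-keys
```

Passing the labels changes the result here because two of the positive predictions are genuine keys and must not be counted as false positives.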
- abstract classmethod fit(X, y=None)
Build the Bloom Filter. Abstract method implemented in subclasses.
- abstract classmethod get_size()
Return the Bloom Filter size (in bits). Abstract method implemented in subclasses.
- abstract classmethod predict(X)
Computes predictions for a set of queries, each to be checked for inclusion in the Bloom Filter. Abstract method implemented in subclasses.
- class learnedbf.BF.ClassicalBloomFilter(n=None, epsilon=None, m=None, filter_class=<class 'learnedbf.BF.ClassicalBloomFilterImpl'>)
Bases:
BloomFilter, BaseEstimator, ClassifierMixin
Implementation of a classical Bloom Filter.
- add(x)
Delegate the add method to the Bloom filter implementation.
- Parameters:
x – key to be added to the filter.
- check(x)
Delegate the check method to the Bloom filter implementation.
- Parameters:
x – query to be checked for inclusion in the filter.
- fit(X, y=None)
Build the Bloom Filter.
- Parameters:
X (array of numbers) – array containing keys.
y (array of bool) – array containing labels for X’s elements, defaults to None.
- Returns:
the fit Bloom Filter instance.
- Return type:
- Raises:
ValueError if X is empty.
- get_size()
Return the Bloom Filter size.
- Returns:
dictionary describing the overall size of the Bloom Filter, in which the key ‘bitmap’ is associated to the number of bits required by the bitmap of the filter.
- Return type:
dict
- predict(X)
Computes predictions for a set of queries, each to be checked for inclusion in the Bloom Filter.
- Parameters:
X (array of numerical arrays) – queries to be checked.
- Returns:
prediction for each value in X.
- Return type:
array of bool
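To make the fit/predict contract concrete, here is a minimal self-contained sketch of a classical Bloom filter. It is an illustration only: `TinyBloomFilter`, its constructor arguments, and the salted SHA-256 hashing scheme are assumptions made for this example and do not reflect how learnedbf implements its filters.

```python
import hashlib


class TinyBloomFilter:
    """Hypothetical stand-in, not learnedbf's ClassicalBloomFilter."""

    def __init__(self, m, k):
        self.m = m                 # number of bits in the bitmap
        self.k = k                 # number of hash functions
        self.bits = bytearray(m)   # one byte per bit, for readability

    def _positions(self, x):
        # Derive k bitmap positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{x}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, x):
        for pos in self._positions(x):
            self.bits[pos] = 1

    def check(self, x):
        # A query is reported as present only if all its positions are set.
        return all(self.bits[pos] for pos in self._positions(x))

    def fit(self, X):
        if len(X) == 0:
            raise ValueError("X is empty")
        for x in X:
            self.add(x)
        return self

    def predict(self, X):
        return [self.check(x) for x in X]


keys = [3, 14, 15, 92, 65]
bf = TinyBloomFilter(m=256, k=4).fit(keys)
key_predictions = bf.predict(keys)
```

Every key that was added is reported as present, so false negatives cannot occur; queries that were never added may still be reported as present, which is the false positive case that estimate_FPR measures.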
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ClassicalBloomFilter
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class learnedbf.BF.ClassicalBloomFilterImpl(n=None, epsilon=None, m=None)
Bases:
object
- add(item)
Add an item to the filter.
- check(item)
Check whether an item is in the filter.
- classmethod get_hash_count(m, n)
Return the number of hash functions (k) to be used, according to the formula k = (m / n) * log(2).
- Parameters:
m (int) – size of the bit array.
n (int) – number of items expected to be stored in the filter.
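As a worked example, the sketch below evaluates the standard choice k = (m / n) ln 2 for the number of hash functions. The function name `hash_count`, the rounding to the nearest integer, and the lower bound of 1 are assumptions made for this illustration; the library's own rounding may differ.

```python
import math


def hash_count(m, n):
    """Hypothetical helper: k = (m / n) * ln 2, rounded, at least 1."""
    k = (m / n) * math.log(2)   # math.log is the natural logarithm
    return max(1, round(k))


# A bitmap of 9586 bits for 1000 keys gives (9.586 * 0.693...) ~ 6.64.
k_typical = hash_count(9586, 1000)
# A very small bitmap still needs at least one hash function.
k_small = hash_count(10, 1000)
```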
- get_size()
- class learnedbf.BF.PyBloomLiveAdapter(n=None, epsilon=None, m=None)
Bases:
object
Adapter that allows objects of pybloom_live.BloomFilter to be used through the same interface as pybloom.BloomFilter.
- add(x)
Delegate the add method to pybloom_live.BloomFilter.
- Parameters:
x (numerical array) – key to be added to the filter.
- check(x)
Implements the check method of the pybloom.BloomFilter interface in terms of the in operator of pybloom_live.BloomFilter.
- Parameters:
x (numerical array) – query to be checked for inclusion in the filter.
- Returns:
True if the query is predicted to be in the filter, False otherwise.
- Return type:
bool
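The adapter idea behind check can be sketched without either Bloom filter package. In the example below, `ContainsAdapter` is a hypothetical class, and a built-in `set` stands in for the wrapped filter only because it also supports `add` and the `in` operator; this is an assumption for illustration, not the library's code.

```python
class ContainsAdapter:
    """Hypothetical adapter exposing add/check on top of add and ``in``."""

    def __init__(self, backend):
        # Any object with an add() method and __contains__ will do.
        self._backend = backend

    def add(self, x):
        self._backend.add(x)

    def check(self, x):
        # Membership is delegated to the backend's ``in`` operator.
        return x in self._backend


adapter = ContainsAdapter(set())
adapter.add(42)
found = adapter.check(42)
missing = adapter.check(7)
```

Because the stand-in is an exact set, `missing` is always False here; with a real Bloom filter as the backend, a query that was never added could still be reported as present.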
- get_size()
Return the size of the Bloom Filter (in bits).
- Returns:
size of the bit array (in bits)
- Return type:
int
- class learnedbf.BF.VarhashBloomFilter(m, k_max)
Bases:
BaseEstimator, ClassifierMixin, BloomFilter
- add(key, k)
Adds a key to the filter using a specified number of hash functions.
- Parameters:
key (int) – the key to add.
k (int) – the number of hash functions to use.
- check(key, k)
Test key membership using a specified number of hash functions.
- Parameters:
key (int) – the key to be checked.
k (int) – the number of hash functions to use.
- fit(X, y=None, K=None)
Build the Bloom Filter.
- Parameters:
X (array of numbers) – array containing keys.
y (array of bool) – array containing labels for X’s elements, defaults to None.
K (array of int) – array containing the number of hash functions to be used for each element, defaults to None.
- Returns:
the fit Varhash Bloom Filter instance.
- Return type:
- Raises:
ValueError if X is empty.
- Raises:
ValueError if K is None or X and K lengths don’t match.
- get_size()
Return the Bloom Filter size (in bits). Abstract method implemented in subclasses.
- predict(X, K=None)
Computes predictions for a set of queries, each to be checked for inclusion in the Bloom Filter.
- Parameters:
X (array of numerical arrays) – queries to be checked.
K (array of int) – array containing the number of hash functions to be used for each element, defaults to None.
- Returns:
prediction for each value in X.
- Return type:
array of bool
- Raises:
ValueError if K is None or X and K lengths don’t match.
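The per-element hash count can be illustrated with a compact sketch in which each key uses the first k of at most k_max hash functions. `TinyVarhashFilter` and its salted SHA-256 hashing are assumptions made for this example; only the method names and the error conditions follow the descriptions above.

```python
import hashlib


class TinyVarhashFilter:
    """Hypothetical stand-in, not learnedbf's VarhashBloomFilter."""

    def __init__(self, m, k_max):
        self.m = m
        self.k_max = k_max
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, key, k):
        if not 1 <= k <= self.k_max:
            raise ValueError("k must be between 1 and k_max")
        # The first k hash functions are used, so positions for a smaller
        # k are a prefix of those for a larger k.
        for i in range(k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key, k):
        for pos in self._positions(key, k):
            self.bits[pos] = 1

    def check(self, key, k):
        return all(self.bits[pos] for pos in self._positions(key, k))

    def fit(self, X, K=None):
        if len(X) == 0:
            raise ValueError("X is empty")
        if K is None or len(X) != len(K):
            raise ValueError("K must be provided and match X in length")
        for key, k in zip(X, K):
            self.add(key, k)
        return self

    def predict(self, X, K=None):
        if K is None or len(X) != len(K):
            raise ValueError("K must be provided and match X in length")
        return [self.check(key, k) for key, k in zip(X, K)]


X = [10, 20, 30]
K = [1, 2, 3]
vf = TinyVarhashFilter(m=512, k_max=3).fit(X, K)
varhash_predictions = vf.predict(X, K)
```

In this sketch a key added with k hash functions is also found when checked with fewer, since the positions tested are a subset of those that were set.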
- set_fit_request(*, K: bool | None | str = '$UNCHANGED$') VarhashBloomFilter
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
K (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the K parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, K: bool | None | str = '$UNCHANGED$') VarhashBloomFilter
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
K (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the K parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') VarhashBloomFilter
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- learnedbf.BF.check_params_(n, epsilon, m)
Check the consistency of the arguments passed to the constructor.
- Parameters:
n (int) – number of keys, defaults to None.
epsilon (float) – expected false positive rate, defaults to None.
m (int) – size (in bits) of the bitmap, defaults to None.
- Returns:
a tuple (n, epsilon, m) containing the values of the three parameters once consistency has been verified or enforced.
- Return type:
tuple
Note: the three parameters are linked by the following relation:
\[m = -\frac{n \cdot \log \epsilon}{(\log 2)^2}\]
Thus, if all three are specified, the function verifies that this relation holds, raising ValueError otherwise; when two of the three are specified, the missing one is obtained by enforcing this relation; in all other cases, a ValueError is raised.
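The relation m = -(n · log ε) / (log 2)², with natural logarithms, can be turned into a small standalone sketch of the consistency logic. This is not the library's check_params_: the name `resolve_params`, the rounding of m and n to integers, and the 1% tolerance used when all three values are given are assumptions made for this example.

```python
import math


def resolve_params(n=None, epsilon=None, m=None, rel_tol=0.01):
    """Hypothetical sketch: derive or verify (n, epsilon, m)."""
    given = sum(value is not None for value in (n, epsilon, m))
    if given < 2:
        raise ValueError("at least two of n, epsilon and m are required")
    ln2_squared = math.log(2) ** 2
    if m is None:
        # Round up so the bitmap is never smaller than required.
        m = math.ceil(-n * math.log(epsilon) / ln2_squared)
    elif n is None:
        n = math.floor(-m * ln2_squared / math.log(epsilon))
    elif epsilon is None:
        epsilon = math.exp(-m * ln2_squared / n)
    else:
        expected_m = -n * math.log(epsilon) / ln2_squared
        if not math.isclose(m, expected_m, rel_tol=rel_tol):
            raise ValueError("n, epsilon and m are inconsistent")
    return n, epsilon, m


n1, eps1, m1 = resolve_params(n=1000, epsilon=0.01)  # m is derived
n2, eps2, m2 = resolve_params(n=1000, m=9586)        # epsilon is derived
```

For 1000 keys and a 1% target false positive rate the relation gives roughly 9585.06 bits, which this sketch rounds up to 9586.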
- class learnedbf.BF.hashfunc(m)
Bases:
object