learnedbf package

Submodules

learnedbf.complexity_measures module

class learnedbf.complexity_measures.C2

Bases: ComplexityMeasure

classmethod compute(X, y)

class learnedbf.complexity_measures.ComplexityMeasure

Bases: object

classmethod compute(X, y)

class learnedbf.complexity_measures.F1v

Bases: ComplexityMeasure

classmethod compute(X, y)

Module contents

class learnedbf.AdaBF(n=None, epsilon=None, m=None, classifier=ScoredDecisionTreeClassifier(), hyperparameters={}, threshold_test_size=0.2, model_selection_method=StratifiedKFold(n_splits=5, random_state=None, shuffle=True), scoring=<function auprc_score>, min_backup_size=1000.0, backup_filter_size=None, random_state=4678913, c_min=1.6, c_max=2.5, num_group_min=8, num_group_max=12, verbose=False)

Bases: BaseEstimator, BloomFilter, ClassifierMixin

Implementation of the Adaptive Learned Bloom Filter

fit(X, y)

Fits the Adaptive Learned Bloom Filter, training its classifier, setting the score thresholds and building the backup filter.

Parameters:

X (array of numerical arrays) – examples to be used for fitting the classifier.
y (array of bool) – labels of the examples.

Returns:

the fit Bloom Filter instance.

Return type:

AdaBF

Raises:

ValueError if X is empty, or if no threshold value is compliant with the false positive rate requirements.

NOTE: If the classifier instance variable holds an already trained classifier, X and y are used only to build the LBF, that is, to set the threshold on the output of the classifier and to evaluate the overall empirical FPR. In this case, X and y are assumed to contain examples that were not used to train the classifier, so that the FPR estimate is fair. Otherwise, X and y are used to train the classifier, and subsequently to set the threshold and evaluate the empirical FPR.
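The note above relies on estimating the false positive rate on examples the classifier has not seen. As a minimal sketch of that idea, independent of the library, the empirical FPR is the fraction of non-keys that the filter reports as present. The helper name `empirical_fpr` and the sample values are hypothetical and are not part of learnedbf; boolean labels are assumed.

```python
def empirical_fpr(y_true, y_pred):
    """Fraction of non-keys (label False) that are predicted as keys.

    Assumes boolean labels; this is an illustration, not library code.
    """
    nonkey_preds = [bool(p) for t, p in zip(y_true, y_pred) if not t]
    if not nonkey_preds:
        raise ValueError("no non-keys available to estimate the FPR")
    return sum(nonkey_preds) / len(nonkey_preds)


# Held-out labels and filter answers (illustrative values only).
y_holdout = [True, True, False, False, False, False]
y_answer = [True, True, True, False, False, False]
fpr = empirical_fpr(y_holdout, y_answer)
print(fpr)
```

If the same examples were used both to train the classifier and to compute this ratio, the estimate would tend to be optimistic, which is why the note asks for unseen data when the classifier is pre-trained.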

get_size()

Return the Adaptive Learned Bloom Filter size.

Returns: size of the Adaptive Learned Bloom Filter (in bits), broken down into the size of the classifier and the size of the backup filter.
Return type: dict

NOTE: the implementation assumes that the classifier object used to build the Learned Bloom Filter has a get_size method, returning the size of the classifier, measured in bits.
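To make the get_size contract concrete, here is a minimal sketch of a classifier-like object that exposes such a method. The class `SizedLinearScorer`, its `predict_score` method, and the choice of measuring size through the pickled parameters are all assumptions made for illustration; the library may measure size differently.

```python
import pickle


class SizedLinearScorer:
    """Toy scorer, hypothetical: only illustrates the get_size contract."""

    def __init__(self, weights):
        self.weights = list(weights)

    def predict_score(self, x):
        # Dot product of the features with the stored weights.
        return sum(w * v for w, v in zip(self.weights, x))

    def get_size(self):
        # Assumption: approximate the model size by the length of its
        # pickled parameters, converted from bytes to bits.
        return 8 * len(pickle.dumps(self.weights))


clf = SizedLinearScorer([0.5, -1.0, 2.0])
size_bits = clf.get_size()
print(size_bits)
```

Any object passed as the classifier would need a comparable method for the size breakdown to be computed.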

predict(X)

Computes predictions for a set of queries, each to be checked for inclusion in the Learned Bloom Filter.

Parameters: X (array of numerical arrays) – elements to classify.
Returns: prediction for each value in ‘X’.
Return type: array of bool
Raises: NotFittedError if the classifier is not fitted.

NOTE: the implementation assumes that the classifier in use (either pre-trained or trained in fit) labels keys in the Bloom filter as 1.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → AdaBF

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns: self – The updated object.
Return type: object

class learnedbf.FastPLBF(n=None, epsilon=None, m=None, classifier=ScoredDecisionTreeClassifier(), hyperparameters={}, threshold_test_size=0.2, model_selection_method=StratifiedKFold(n_splits=5, random_state=None, shuffle=True), scoring=<function auprc_score>, classical_BF_class=<class 'learnedbf.BF.ClassicalBloomFilterImpl'>, min_backup_size=1000.0, random_state=4678913, num_group_min=4, num_group_max=6, N=1000, verbose=False)

Bases: BaseEstimator, BloomFilter, ClassifierMixin

Implementation of the Partitioned Learned Bloom Filter

fit(X, y)

Fits the Partitioned Learned Bloom Filter, training its classifier, setting the score thresholds and building the backup filters.

Parameters:

X (array of numerical arrays) – examples to be used for fitting the classifier.
y (array of bool) – labels of the examples.

Returns:

the fit Bloom Filter instance.

Return type:

FastPLBF

Raises:

ValueError if X is empty, or if no threshold value is compliant with the false positive rate requirements.

NOTE: If the classifier instance variable holds an already trained classifier, X and y are used only to build the LBF, that is, to set the threshold on the output of the classifier and to evaluate the overall empirical FPR. In this case, X and y are assumed to contain examples that were not used to train the classifier, so that the FPR estimate is fair. Otherwise, X and y are used to train the classifier, and subsequently to set the threshold and evaluate the empirical FPR.

get_size()

Return the Partitioned Learned Bloom Filter size.

Returns: size of the Partitioned Learned Bloom Filter (in bits), broken down into the size of the classifier and the size of the backup filters.
Return type: dict

NOTE: the implementation assumes that the classifier object used to build the Learned Bloom Filter has a get_size method, returning the size of the classifier, measured in bits.

predict(X)

Computes predictions for a set of queries, each to be checked for inclusion in the Learned Bloom Filter.

Parameters: X (array of numerical arrays) – elements to classify.
Returns: prediction for each value in ‘X’.
Return type: array of bool
Raises: NotFittedError if the classifier is not fitted.

NOTE: the implementation assumes that the classifier in use (either pre-trained or trained in fit) labels keys in the Bloom filter as 1.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → FastPLBF

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns: self – The updated object.
Return type: object

class learnedbf.FastPLBFpp(n=None, epsilon=None, m=None, classifier=ScoredDecisionTreeClassifier(), hyperparameters={}, threshold_test_size=0.2, model_selection_method=StratifiedKFold(n_splits=5, random_state=None, shuffle=True), scoring=<function auprc_score>, classical_BF_class=<class 'learnedbf.BF.ClassicalBloomFilterImpl'>, min_backup_size=1000.0, random_state=4678913, num_group_min=4, num_group_max=6, N=1000, verbose=False)

Bases: BaseEstimator, BloomFilter, ClassifierMixin

Implementation of the Partitioned Learned Bloom Filter

fit(X, y)

Fits the Partitioned Learned Bloom Filter, training its classifier, setting the score thresholds and building the backup filters.

Parameters:

X (array of numerical arrays) – examples to be used for fitting the classifier.
y (array of bool) – labels of the examples.

Returns:

the fit Bloom Filter instance.

Return type:

FastPLBFpp

Raises:

ValueError if X is empty, or if no threshold value is compliant with the false positive rate requirements.

NOTE: If the classifier instance variable holds an already trained classifier, X and y are used only to build the LBF, that is, to set the threshold on the output of the classifier and to evaluate the overall empirical FPR. In this case, X and y are assumed to contain examples that were not used to train the classifier, so that the FPR estimate is fair. Otherwise, X and y are used to train the classifier, and subsequently to set the threshold and evaluate the empirical FPR.

get_size()

Return the Partitioned Learned Bloom Filter size.

Returns: size of the Partitioned Learned Bloom Filter (in bits), broken down into the size of the classifier and the size of the backup filters.
Return type: dict

NOTE: the implementation assumes that the classifier object used to build the Learned Bloom Filter has a get_size method, returning the size of the classifier, measured in bits.

predict(X)

Computes predictions for a set of queries, each to be checked for inclusion in the Learned Bloom Filter.

Parameters: X (array of numerical arrays) – elements to classify.
Returns: prediction for each value in ‘X’.
Return type: array of bool
Raises: NotFittedError if the classifier is not fitted.

NOTE: the implementation assumes that the classifier in use (either pre-trained or trained in fit) labels keys in the Bloom filter as 1.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → FastPLBFpp

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns: self – The updated object.
Return type: object

class learnedbf.LBF(n=None, epsilon=None, m=None, classifier=ScoredDecisionTreeClassifier(), hyperparameters={}, num_candidate_thresholds=10, threshold_test_size=0.7, model_selection_method=StratifiedKFold(n_splits=5, random_state=None, shuffle=True), scoring=<function auprc_score>, threshold_evaluate=<function threshold_evaluate>, threshold=None, classical_BF_class=<class 'learnedbf.BF.ClassicalBloomFilterImpl'>, backup_filter_size=None, min_backup_size=1000.0, random_state=4678913, verbose=False)

Bases: BaseEstimator, BloomFilter, ClassifierMixin

Implementation of the Learned Bloom Filter

fit(X, y)

Fits the Learned Bloom Filter, training its classifier, setting the score threshold and building the backup filter.

Parameters:

X (array of numerical arrays) – examples to be used for fitting the classifier.
y (array of bool) – labels of the examples.

Returns:

the fit Bloom Filter instance.

Return type:

LBF

Raises:

ValueError if X is empty, or if no threshold value is compliant with the false positive rate requirements.

NOTE: If the classifier instance variable holds an already trained classifier, X and y are used only to build the LBF, that is, to set the threshold on the output of the classifier and to evaluate the overall empirical FPR. In this case, X and y are assumed to contain examples that were not used to train the classifier, so that the FPR estimate is fair. Otherwise, X and y are used to train the classifier, and subsequently to set the threshold and evaluate the empirical FPR.

get_size()

Return the Learned Bloom Filter size.

Returns: size of the Learned Bloom Filter (in bits), broken down into the size of the classifier and the size of the backup filter.
Return type: dict

NOTE: the implementation assumes that the classifier object used to build the Learned Bloom Filter has a get_size method, returning the size of the classifier, measured in bits.

predict(X)

Computes predictions for a set of queries, each to be checked for inclusion in the Learned Bloom Filter.

Parameters: X (array of numerical arrays) – elements to classify.
Returns: prediction for each value in ‘X’.
Return type: array of bool
Raises: NotFittedError if the classifier is not fitted.

NOTE: the implementation assumes that the classifier in use (either pre-trained or trained in fit) labels keys in the Bloom filter as 1.
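The interplay between the classifier score, the threshold and the backup filter can be sketched as follows. This is an illustration of the general learned Bloom filter idea, not the code of LBF: the class `ToyLearnedFilter`, the `score` function, and the use of a Python set in place of a real backup Bloom filter are all assumptions, as is the choice of `>=` for the threshold comparison.

```python
class ToyLearnedFilter:
    """Hypothetical illustration of a learned Bloom filter query."""

    def __init__(self, score_fn, threshold):
        self.score_fn = score_fn
        self.threshold = threshold
        # A Python set stands in for the backup Bloom filter; unlike a
        # real Bloom filter it is exact and has no false positives.
        self.backup = set()

    def fit(self, keys):
        # Keys scored below the threshold would be rejected by the
        # classifier alone, so they are stored in the backup filter.
        self.backup = {k for k in keys if self.score_fn(k) < self.threshold}
        return self

    def predict(self, queries):
        # A query is reported as present if the classifier accepts it
        # or, failing that, if the backup filter contains it.
        return [
            self.score_fn(q) >= self.threshold or q in self.backup
            for q in queries
        ]


def score(x):
    # Stand-in for a trained classifier: larger integers score higher.
    return x / 100


lbf = ToyLearnedFilter(score, threshold=0.5).fit(keys=[10, 60, 90])
answers = lbf.predict([10, 60, 90, 20, 70])
print(answers)
```

In this toy run every key is reported as present, while the non-key 70 is a false positive because its score exceeds the threshold. Raising the threshold lowers that kind of error but sends more keys to the backup filter.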

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LBF

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns: self – The updated object.
Return type: object

class learnedbf.PLBF(n=None, epsilon=None, m=None, classifier=ScoredDecisionTreeClassifier(), hyperparameters={}, threshold_test_size=0.2, model_selection_method=StratifiedKFold(n_splits=5, random_state=None, shuffle=True), scoring=<function auprc_score>, classical_BF_class=<class 'learnedbf.BF.ClassicalBloomFilterImpl'>, min_backup_size=1000.0, random_state=4678913, num_group_min=4, num_group_max=6, N=50, verbose=False)

Bases: BaseEstimator, BloomFilter, ClassifierMixin

Implementation of the Partitioned Learned Bloom Filter

fit(X, y)

Fits the Partitioned Learned Bloom Filter, training its classifier, setting the score thresholds and building the backup filters.

Parameters:

X (array of numerical arrays) – examples to be used for fitting the classifier.
y (array of bool) – labels of the examples.

Returns:

the fit Bloom Filter instance.

Return type:

PLBF

Raises:

ValueError if X is empty, or if no threshold value is compliant with the false positive rate requirements.

NOTE: If the classifier instance variable holds an already trained classifier, X and y are used only to build the LBF, that is, to set the threshold on the output of the classifier and to evaluate the overall empirical FPR. In this case, X and y are assumed to contain examples that were not used to train the classifier, so that the FPR estimate is fair. Otherwise, X and y are used to train the classifier, and subsequently to set the threshold and evaluate the empirical FPR.

get_size()

Return the Partitioned Learned Bloom Filter size.

Returns: size of the Partitioned Learned Bloom Filter (in bits), broken down into the size of the classifier and the size of the backup filters.
Return type: dict

NOTE: the implementation assumes that the classifier object used to build the Learned Bloom Filter has a get_size method, returning the size of the classifier, measured in bits.

predict(X)

Computes predictions for a set of queries, each to be checked for inclusion in the Learned Bloom Filter.

Parameters: X (array of numerical arrays) – elements to classify.
Returns: prediction for each value in ‘X’.
Return type: array of bool
Raises: NotFittedError if the classifier is not fitted.

NOTE: the implementation assumes that the classifier in use (either pre-trained or trained in fit) labels keys in the Bloom filter as 1.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → PLBF

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns: self – The updated object.
Return type: object

class learnedbf.SLBF(n=None, epsilon=None, m=None, classifier=ScoredDecisionTreeClassifier(), hyperparameters={}, num_candidate_thresholds=10, threshold_test_size=0.7, model_selection_method=StratifiedKFold(n_splits=5, random_state=None, shuffle=True), scoring=<function auprc_score>, threshold_evaluate=<function threshold_evaluate>, classical_BF_class=<class 'learnedbf.BF.ClassicalBloomFilterImpl'>, min_backup_size=1000.0, random_state=4678913, verbose=False)

Bases: BaseEstimator, BloomFilter, ClassifierMixin

Implementation of the Sandwiched Learned Bloom Filter

fit(X, y)

Fits the Sandwiched Learned Bloom Filter, training its classifier, setting the score threshold and building the backup filter.

Parameters:

X (array of numerical arrays) – examples to be used for fitting the classifier.
y (array of bool) – labels of the examples.

Returns:

the fit Bloom Filter instance.

Return type:

SLBF

Raises:

ValueError if X is empty, or if no threshold value is compliant with the false positive rate requirements.

NOTE: If the classifier instance variable holds an already trained classifier, X and y are used only to build the LBF, that is, to set the threshold on the output of the classifier and to evaluate the overall empirical FPR. In this case, X and y are assumed to contain examples that were not used to train the classifier, so that the FPR estimate is fair. Otherwise, X and y are used to train the classifier, and subsequently to set the threshold and evaluate the empirical FPR.

get_size()

Return the Sandwiched Learned Bloom Filter size.

Returns: size of the Sandwiched Learned Bloom Filter (in bits), broken down into the size of the classifier and the size of the backup filter.
Return type: dict

NOTE: the implementation assumes that the classifier object used to build the Learned Bloom Filter has a get_size method, returning the size of the classifier, measured in bits.

predict(X)

Computes predictions for a set of queries, each to be checked for inclusion in the Learned Bloom Filter.

Parameters: X (array of numerical arrays) – elements to classify.
Returns: prediction for each value in ‘X’.
Return type: array of bool
Raises: NotFittedError if the classifier is not fitted.

NOTE: the implementation assumes that the classifier in use (either pre-trained or trained in fit) labels keys in the Bloom filter as 1.
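For orientation, the sandwiched construction is usually described as a three-stage query: an initial filter, then the classifier, then a backup filter. The sketch below follows that general description only; how SLBF implements it internally is not shown here. The function `sandwiched_query`, the `score` function, and the Python sets standing in for the two Bloom filters are hypothetical.

```python
def sandwiched_query(x, initial, score_fn, threshold, backup):
    """Hypothetical three-stage query of a sandwiched learned filter."""
    # Stage 1: the initial filter rejects most non-keys outright.
    if x not in initial:
        return False
    # Stage 2: the classifier accepts items with a high enough score.
    if score_fn(x) >= threshold:
        return True
    # Stage 3: the backup filter recovers keys the classifier scored low.
    return x in backup


def score(x):
    # Stand-in for a trained classifier: larger integers score higher.
    return x / 100


keys = {10, 60, 90}
# Sets stand in for Bloom filters. The extra element 70 mimics a false
# positive of the initial filter, which a real Bloom filter can produce.
initial = keys | {70}
backup = {k for k in keys if score(k) < 0.5}

answers = [
    sandwiched_query(q, initial, score, 0.5, backup)
    for q in [10, 60, 20, 70, 80]
]
print(answers)
```

Here the non-key 80 is rejected by the initial stage even though its score is above the threshold, whereas 70 passes both the initial stage and the classifier and ends up as a false positive.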

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SLBF

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns: self – The updated object.
Return type: object

learnedbf.auprc(y, y_hat)

learnedbf.auprc_score(cls, X, y)

learnedbf.check_y(y): Check if the input array has valid labels for binary classification. Valid combinations are (False, True), (0, 1), (-1, 1)

learnedbf.threshold_evaluate(epsilon, key_predictions, nonkey_predictions)