learnedbf package
Subpackages
- learnedbf.BF package
- learnedbf.classifiers package
Submodules
learnedbf.complexity_measures module
- class learnedbf.complexity_measures.C2
Bases:
ComplexityMeasure
- classmethod compute(X, y)
- class learnedbf.complexity_measures.F1v
Bases:
ComplexityMeasure
- classmethod compute(X, y)
Module contents
- class learnedbf.AdaBF(n=None, epsilon=None, m=None, classifier=ScoredDecisionTreeClassifier(), hyperparameters={}, threshold_test_size=0.2, model_selection_method=StratifiedKFold(n_splits=5, random_state=None, shuffle=True), scoring=<function auprc_score>, min_backup_size=1000.0, backup_filter_size=None, random_state=4678913, c_min=1.6, c_max=2.5, num_group_min=8, num_group_max=12, verbose=False)
Bases:
BaseEstimator, BloomFilter, ClassifierMixin
Implementation of the Adaptive Learned Bloom Filter.
- fit(X, y)
Fits the Adaptive Learned Bloom Filter, training its classifier, setting the score thresholds and building the backup filter.
- Parameters:
X (array of numerical arrays) – examples to be used for fitting the classifier.
y (array of bool) – labels of the examples.
- Returns:
the fitted Bloom Filter instance.
- Return type:
AdaBF
- Raises:
ValueError if X is empty, or if no threshold value is compliant with the false positive rate requirements.
NOTE: If the classifier argument was given as an already trained classifier, X and y are used only to build the LBF, that is, to set the threshold on the classifier output and to evaluate the overall empirical FPR. In this case, X and y are assumed to contain examples that were not used to train the classifier, so that the FPR estimate is fair. Otherwise, X and y are the examples used to train the classifier and, subsequently, to set the threshold and evaluate the empirical FPR.
- get_size()
Return the Adaptive Learned Bloom Filter size.
- Returns:
size of the Adaptive Learned Bloom Filter (in bits), detailed w.r.t. the size of the classifier and of the backup filter.
- Return type:
dict
NOTE: the implementation assumes that the classifier object used to build the Learned Bloom Filter has a get_size method, returning the size of the classifier, measured in bits.
- predict(X)
Computes predictions for a set of queries, each to be checked for inclusion in the Learned Bloom Filter.
- Parameters:
X (array of numerical arrays) – elements to classify.
- Returns:
prediction for each value in ‘X’.
- Return type:
array of bool
- Raises:
NotFittedError if the classifier is not fitted.
NOTE: the implementation assumes that the classifier which is used (either pre-trained or trained in fit) refers to 1 as the label of keys in the Bloom filter
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> AdaBF
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class learnedbf.FastPLBF(n=None, epsilon=None, m=None, classifier=ScoredDecisionTreeClassifier(), hyperparameters={}, threshold_test_size=0.2, model_selection_method=StratifiedKFold(n_splits=5, random_state=None, shuffle=True), scoring=<function auprc_score>, classical_BF_class=<class 'learnedbf.BF.ClassicalBloomFilterImpl'>, min_backup_size=1000.0, random_state=4678913, num_group_min=4, num_group_max=6, N=1000, verbose=False)
Bases:
BaseEstimator, BloomFilter, ClassifierMixin
Implementation of the Partitioned Learned Bloom Filter.
- fit(X, y)
Fits the Partitioned Bloom Filter, training its classifier, setting the score thresholds and building the backup filter.
- Parameters:
X (array of numerical arrays) – examples to be used for fitting the classifier.
y (array of bool) – labels of the examples.
- Returns:
the fitted Bloom Filter instance.
- Return type:
FastPLBF
- Raises:
ValueError if X is empty, or if no threshold value is compliant with the false positive rate requirements.
NOTE: If the classifier argument was given as an already trained classifier, X and y are used only to build the LBF, that is, to set the threshold on the classifier output and to evaluate the overall empirical FPR. In this case, X and y are assumed to contain examples that were not used to train the classifier, so that the FPR estimate is fair. Otherwise, X and y are the examples used to train the classifier and, subsequently, to set the threshold and evaluate the empirical FPR.
- get_size()
Return the Partitioned Learned Bloom Filter size.
- Returns:
size of the Partitioned Learned Bloom Filter (in bits), detailed w.r.t. the size of the classifier and of the backup filters.
- Return type:
dict
NOTE: the implementation assumes that the classifier object used to build the Learned Bloom Filter has a get_size method, returning the size of the classifier, measured in bits.
- predict(X)
Computes predictions for a set of queries, each to be checked for inclusion in the Learned Bloom Filter.
- Parameters:
X (array of numerical arrays) – elements to classify.
- Returns:
prediction for each value in ‘X’.
- Return type:
array of bool
- Raises:
NotFittedError if the classifier is not fitted.
NOTE: the implementation assumes that the classifier which is used (either pre-trained or trained in fit) refers to 1 as the label of keys in the Bloom filter
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> FastPLBF
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class learnedbf.FastPLBFpp(n=None, epsilon=None, m=None, classifier=ScoredDecisionTreeClassifier(), hyperparameters={}, threshold_test_size=0.2, model_selection_method=StratifiedKFold(n_splits=5, random_state=None, shuffle=True), scoring=<function auprc_score>, classical_BF_class=<class 'learnedbf.BF.ClassicalBloomFilterImpl'>, min_backup_size=1000.0, random_state=4678913, num_group_min=4, num_group_max=6, N=1000, verbose=False)
Bases:
BaseEstimator, BloomFilter, ClassifierMixin
Implementation of the Partitioned Learned Bloom Filter.
- fit(X, y)
Fits the Partitioned Bloom Filter, training its classifier, setting the score thresholds and building the backup filter.
- Parameters:
X (array of numerical arrays) – examples to be used for fitting the classifier.
y (array of bool) – labels of the examples.
- Returns:
the fitted Bloom Filter instance.
- Return type:
FastPLBFpp
- Raises:
ValueError if X is empty, or if no threshold value is compliant with the false positive rate requirements.
NOTE: If the classifier argument was given as an already trained classifier, X and y are used only to build the LBF, that is, to set the threshold on the classifier output and to evaluate the overall empirical FPR. In this case, X and y are assumed to contain examples that were not used to train the classifier, so that the FPR estimate is fair. Otherwise, X and y are the examples used to train the classifier and, subsequently, to set the threshold and evaluate the empirical FPR.
- get_size()
Return the Partitioned Learned Bloom Filter size.
- Returns:
size of the Partitioned Learned Bloom Filter (in bits), detailed w.r.t. the size of the classifier and of the backup filters.
- Return type:
dict
NOTE: the implementation assumes that the classifier object used to build the Learned Bloom Filter has a get_size method, returning the size of the classifier, measured in bits.
- predict(X)
Computes predictions for a set of queries, each to be checked for inclusion in the Learned Bloom Filter.
- Parameters:
X (array of numerical arrays) – elements to classify.
- Returns:
prediction for each value in ‘X’.
- Return type:
array of bool
- Raises:
NotFittedError if the classifier is not fitted.
NOTE: the implementation assumes that the classifier which is used (either pre-trained or trained in fit) refers to 1 as the label of keys in the Bloom filter
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> FastPLBFpp
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class learnedbf.LBF(n=None, epsilon=None, m=None, classifier=ScoredDecisionTreeClassifier(), hyperparameters={}, num_candidate_thresholds=10, threshold_test_size=0.7, model_selection_method=StratifiedKFold(n_splits=5, random_state=None, shuffle=True), scoring=<function auprc_score>, threshold_evaluate=<function threshold_evaluate>, threshold=None, classical_BF_class=<class 'learnedbf.BF.ClassicalBloomFilterImpl'>, backup_filter_size=None, min_backup_size=1000.0, random_state=4678913, verbose=False)
Bases:
BaseEstimator, BloomFilter, ClassifierMixin
Implementation of the Learned Bloom Filter.
- fit(X, y)
Fits the Learned Bloom Filter, training its classifier, setting the score threshold and building the backup filter.
- Parameters:
X (array of numerical arrays) – examples to be used for fitting the classifier.
y (array of bool) – labels of the examples.
- Returns:
the fitted Bloom Filter instance.
- Return type:
LBF
- Raises:
ValueError if X is empty, or if no threshold value is compliant with the false positive rate requirements.
NOTE: If the classifier argument was given as an already trained classifier, X and y are used only to build the LBF, that is, to set the threshold on the classifier output and to evaluate the overall empirical FPR. In this case, X and y are assumed to contain examples that were not used to train the classifier, so that the FPR estimate is fair. Otherwise, X and y are the examples used to train the classifier and, subsequently, to set the threshold and evaluate the empirical FPR.
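The role of held-out examples in the NOTE above can be illustrated with a small, library-independent sketch. It is not the package's code: the helper names empirical_fpr and compliant_thresholds, the strict ">" comparison, and the toy scores are all assumptions made for illustration. It only estimates the classifier's share of false positives on non-keys that the classifier has never seen, and keeps the candidate thresholds that stay within epsilon.

```python
def empirical_fpr(nonkey_scores, threshold):
    """Fraction of held-out non-keys whose score exceeds the threshold."""
    if not nonkey_scores:
        raise ValueError("at least one non-key score is required")
    false_positives = sum(1 for s in nonkey_scores if s > threshold)
    return false_positives / len(nonkey_scores)


def compliant_thresholds(nonkey_scores, candidates, epsilon):
    """Candidate thresholds whose empirical FPR does not exceed epsilon."""
    return [t for t in candidates
            if empirical_fpr(nonkey_scores, t) <= epsilon]


# Hypothetical classifier scores for non-keys NOT used during training.
held_out_nonkey_scores = [0.05, 0.10, 0.20, 0.40, 0.90]
candidates = [0.1, 0.3, 0.5, 0.95]

print(compliant_thresholds(held_out_nonkey_scores, candidates, epsilon=0.25))
```

If the same non-keys had been used to train the classifier, their scores would tend to be lower than those of unseen non-keys, and the estimate would be optimistic. An empty result corresponds to the case in which no threshold meets the false positive rate requirement.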
- get_size()
Return the Learned Bloom Filter size.
- Returns:
size of the Learned Bloom Filter (in bits), detailed w.r.t. the size of the classifier and of the backup filter.
- Return type:
dict
NOTE: the implementation assumes that the classifier object used to build the Learned Bloom Filter has a get_size method, returning the size of the classifier, measured in bits.
- predict(X)
Computes predictions for a set of queries, each to be checked for inclusion in the Learned Bloom Filter.
- Parameters:
X (array of numerical arrays) – elements to classify.
- Returns:
prediction for each value in ‘X’.
- Return type:
array of bool
- Raises:
NotFittedError if the classifier is not fitted.
NOTE: the implementation assumes that the classifier which is used (either pre-trained or trained in fit) refers to 1 as the label of keys in the Bloom filter
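As a rough illustration of how a learned Bloom filter answers a query, the sketch below combines a score threshold with a backup filter. It is a conceptual example under stated assumptions, not the code of this class: ToyBloomFilter, toy_score, build_backup, lbf_query and the strict ">" comparison are hypothetical names and choices introduced here.

```python
import hashlib


class ToyBloomFilter:
    """Tiny Bloom filter over strings, for illustration only."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def toy_score(item):
    """Hypothetical stand-in for a trained classifier's score."""
    return 1.0 if item.startswith("a") else 0.0


def build_backup(keys, score, threshold):
    """Store in the backup filter every key the classifier would reject."""
    backup = ToyBloomFilter()
    for key in keys:
        if score(key) <= threshold:
            backup.add(key)
    return backup


def lbf_query(item, score, threshold, backup):
    """Accept if the score passes the threshold, else ask the backup filter."""
    return score(item) > threshold or item in backup


keys = ["apple", "avocado", "banana"]
backup = build_backup(keys, toy_score, threshold=0.5)
print([lbf_query(k, toy_score, 0.5, backup) for k in keys])
```

Every key is accepted, either by the classifier or by the backup filter, so the structure has no false negatives. A non-key such as "apricot" is still accepted because its score passes the threshold, which is the classifier's contribution to the false positive rate.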
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> LBF
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class learnedbf.PLBF(n=None, epsilon=None, m=None, classifier=ScoredDecisionTreeClassifier(), hyperparameters={}, threshold_test_size=0.2, model_selection_method=StratifiedKFold(n_splits=5, random_state=None, shuffle=True), scoring=<function auprc_score>, classical_BF_class=<class 'learnedbf.BF.ClassicalBloomFilterImpl'>, min_backup_size=1000.0, random_state=4678913, num_group_min=4, num_group_max=6, N=50, verbose=False)
Bases:
BaseEstimator, BloomFilter, ClassifierMixin
Implementation of the Partitioned Learned Bloom Filter.
- fit(X, y)
Fits the Partitioned Bloom Filter, training its classifier, setting the score thresholds and building the backup filter.
- Parameters:
X (array of numerical arrays) – examples to be used for fitting the classifier.
y (array of bool) – labels of the examples.
- Returns:
the fitted Bloom Filter instance.
- Return type:
PLBF
- Raises:
ValueError if X is empty, or if no threshold value is compliant with the false positive rate requirements.
NOTE: If the classifier argument was given as an already trained classifier, X and y are used only to build the LBF, that is, to set the threshold on the classifier output and to evaluate the overall empirical FPR. In this case, X and y are assumed to contain examples that were not used to train the classifier, so that the FPR estimate is fair. Otherwise, X and y are the examples used to train the classifier and, subsequently, to set the threshold and evaluate the empirical FPR.
- get_size()
Return the Partitioned Learned Bloom Filter size.
- Returns:
size of the Partitioned Learned Bloom Filter (in bits), detailed w.r.t. the size of the classifier and of the backup filters.
- Return type:
dict
NOTE: the implementation assumes that the classifier object used to build the Learned Bloom Filter has a get_size method, returning the size of the classifier, measured in bits.
- predict(X)
Computes predictions for a set of queries, each to be checked for inclusion in the Learned Bloom Filter.
- Parameters:
X (array of numerical arrays) – elements to classify.
- Returns:
prediction for each value in ‘X’.
- Return type:
array of bool
- Raises:
NotFittedError if the classifier is not fitted.
NOTE: the implementation assumes that the classifier which is used (either pre-trained or trained in fit) refers to 1 as the label of keys in the Bloom filter
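The idea of several score thresholds, each region with its own backup filter, can be sketched as follows. This is a simplified illustration and not this class's implementation: region_of, build_region_filters, plbf_query and the toy scores are hypothetical, and plain Python sets stand in for the per-region backup filters (so, unlike real Bloom filters, they produce no false positives).

```python
from bisect import bisect_right


def region_of(score, thresholds):
    """Index of the score region, given sorted thresholds."""
    return bisect_right(thresholds, score)


def build_region_filters(keys, score, thresholds):
    """One stand-in filter per region, holding the keys scored into it."""
    filters = [set() for _ in range(len(thresholds) + 1)]
    for key in keys:
        filters[region_of(score(key), thresholds)].add(key)
    return filters


def plbf_query(item, score, thresholds, filters):
    """Route the item to its region and ask that region's filter."""
    return item in filters[region_of(score(item), thresholds)]


# Hypothetical classifier scores; "q1" is a non-key.
toy_scores = {"k1": 0.9, "k2": 0.5, "k3": 0.1, "q1": 0.8}
score = toy_scores.get
thresholds = [0.3, 0.7]
keys = ["k1", "k2", "k3"]

filters = build_region_filters(keys, score, thresholds)
print([plbf_query(k, score, thresholds, filters) for k in keys])
print(plbf_query("q1", score, thresholds, filters))
```

In an actual partitioned filter, each region's filter would be sized for a different false positive rate, so regions where keys are rare can use small filters. That sizing step is omitted here.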
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> PLBF
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class learnedbf.SLBF(n=None, epsilon=None, m=None, classifier=ScoredDecisionTreeClassifier(), hyperparameters={}, num_candidate_thresholds=10, threshold_test_size=0.7, model_selection_method=StratifiedKFold(n_splits=5, random_state=None, shuffle=True), scoring=<function auprc_score>, threshold_evaluate=<function threshold_evaluate>, classical_BF_class=<class 'learnedbf.BF.ClassicalBloomFilterImpl'>, min_backup_size=1000.0, random_state=4678913, verbose=False)
Bases:
BaseEstimator, BloomFilter, ClassifierMixin
Implementation of the Sandwiched Learned Bloom Filter.
- fit(X, y)
Fits the Learned Bloom Filter, training its classifier, setting the score threshold and building the backup filter.
- Parameters:
X (array of numerical arrays) – examples to be used for fitting the classifier.
y (array of bool) – labels of the examples.
- Returns:
the fitted Bloom Filter instance.
- Return type:
SLBF
- Raises:
ValueError if X is empty, or if no threshold value is compliant with the false positive rate requirements.
NOTE: If the classifier argument was given as an already trained classifier, X and y are used only to build the LBF, that is, to set the threshold on the classifier output and to evaluate the overall empirical FPR. In this case, X and y are assumed to contain examples that were not used to train the classifier, so that the FPR estimate is fair. Otherwise, X and y are the examples used to train the classifier and, subsequently, to set the threshold and evaluate the empirical FPR.
- get_size()
Return the Learned Bloom Filter size.
- Returns:
size of the Learned Bloom Filter (in bits), detailed w.r.t. the size of the classifier and of the backup filter.
- Return type:
dict
NOTE: the implementation assumes that the classifier object used to build the Learned Bloom Filter has a get_size method, returning the size of the classifier, measured in bits.
- predict(X)
Computes predictions for a set of queries, each to be checked for inclusion in the Learned Bloom Filter.
- Parameters:
X (array of numerical arrays) – elements to classify.
- Returns:
prediction for each value in ‘X’.
- Return type:
array of bool
- Raises:
NotFittedError if the classifier is not fitted.
NOTE: the implementation assumes that the classifier which is used (either pre-trained or trained in fit) refers to 1 as the label of keys in the Bloom filter
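In a sandwiched design, the classifier sits between an initial filter and a backup filter. The sketch below shows that query order under simplifying assumptions; it is not this class's code. The names slbf_query and toy_score are hypothetical, the comparison is assumed to be a strict ">", and Python sets stand in for both Bloom filters (so the stand-ins never return false positives).

```python
def toy_score(item):
    """Hypothetical stand-in for a trained classifier's score."""
    return 1.0 if item.startswith("a") else 0.0


def slbf_query(item, initial_filter, score, threshold, backup_filter):
    """Initial filter first, then the classifier, then the backup filter."""
    if item not in initial_filter:
        return False              # rejected before reaching the classifier
    if score(item) > threshold:
        return True               # accepted by the classifier
    return item in backup_filter  # keys the classifier would reject


keys = ["apple", "avocado", "banana"]
threshold = 0.5
initial_filter = set(keys)
backup_filter = {k for k in keys if toy_score(k) <= threshold}

print([slbf_query(k, initial_filter, toy_score, threshold, backup_filter)
       for k in keys])
print(slbf_query("apricot", initial_filter, toy_score, threshold, backup_filter))
```

The non-key "apricot" has a score above the threshold, yet it is rejected because the initial filter screens it out before the classifier is consulted. With real Bloom filters, the initial filter would let some non-keys through, but it still reduces how many reach the classifier.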
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> SLBF
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- learnedbf.auprc(y, y_hat)
- learnedbf.auprc_score(cls, X, y)
- learnedbf.check_y(y)
Check if the input array has valid labels for binary classification. Valid combinations are (False, True), (0, 1), (-1, 1)
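The label rule stated above can be mirrored in a few lines of plain Python. This is only a sketch of the rule, not the body of check_y: the name labels_are_valid is hypothetical, and it assumes that both labels of a pair must be present, which the description does not specify.

```python
def labels_are_valid(y):
    """Return True if the distinct labels form one of the accepted pairs."""
    distinct = set(y)
    # In Python, False == 0 and True == 1 (with equal hashes), so the
    # {0, 1} comparison also covers the (False, True) combination.
    return distinct == {0, 1} or distinct == {-1, 1}


print(labels_are_valid([True, False, True]))  # True
print(labels_are_valid([-1, 1, 1]))           # True
print(labels_are_valid([0, 2]))               # False
```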
- learnedbf.threshold_evaluate(epsilon, key_predictions, nonkey_predictions)