performance¶
Pre-defined Performance
Implements classical methods.
Each metric is called as performance_metric(ground_truth, prediction, param_dict); every metric first pops its parameters from param_dict.
-
s3l.metrics.performance.
accuracy_score
(y_true, y_pred, param_dict=None)[source]¶ Accuracy classification score.
Parameters: - y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) labels.
- y_pred (1d array-like, or label indicator array / sparse matrix) – Predicted labels, as returned by a classifier.
- param_dict (dict) –
A dictionary holding the parameters, including:
sample_weight : array-like of shape = [n_samples], optional Sample weights.
Returns: score
Return type: float
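As a concrete illustration of the convention above, accuracy is the fraction of exactly matching predictions, with sample_weight popped from param_dict first. The function name and edge-case handling below are illustrative assumptions, not the library's implementation:

```python
def accuracy(y_true, y_pred, param_dict=None):
    """Fraction of exact matches; pops sample_weight from param_dict first."""
    params = dict(param_dict or {})               # tolerate param_dict=None
    sample_weight = params.pop('sample_weight', None)
    hits = [float(t == p) for t, p in zip(y_true, y_pred)]
    if sample_weight is None:
        return sum(hits) / len(hits)
    return sum(h * w for h, w in zip(hits, sample_weight)) / sum(sample_weight)

accuracy([0, 1, 2, 2], [0, 1, 1, 2])                       # 3 of 4 correct -> 0.75
accuracy([0, 1], [1, 1], {'sample_weight': [1, 3]})        # weighted -> 0.75
```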
-
s3l.metrics.performance.
zero_one_loss
(y_true, y_pred, param_dict=None)[source]¶ Zero-one classification loss.
If normalize is True, return the fraction of misclassifications (float); otherwise return the number of misclassifications (int). The best performance is 0. Read more in the User Guide.
Parameters: - y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) labels.
- y_pred (1d array-like, or label indicator array / sparse matrix) – Predicted labels, as returned by a classifier.
- param_dict (dict) –
A dictionary holding the parameters, including:
normalize : bool, optional (default=True) If ``False``, return the number of misclassifications. Otherwise, return the fraction of misclassifications.
sample_weight : array-like of shape = [n_samples], optional Sample weights.
Returns: loss – If normalize == True, return the fraction of misclassifications (float), else the number of misclassifications (int).
Return type: float or int
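The zero-one loss is the complement of accuracy. A minimal sketch under the param_dict convention (illustrative, not the library's code):

```python
def zero_one_loss(y_true, y_pred, param_dict=None):
    """Misclassification count, or fraction of it when normalize is True (default)."""
    params = dict(param_dict or {})
    normalize = params.pop('normalize', True)
    sample_weight = params.pop('sample_weight', None)
    if sample_weight is None:
        sample_weight = [1] * len(y_true)
    loss = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t != p)
    return loss / sum(sample_weight) if normalize else loss

zero_one_loss([1, 2, 3, 4], [2, 2, 3, 4])                        # -> 0.25
zero_one_loss([1, 2, 3, 4], [2, 2, 3, 4], {'normalize': False})  # -> 1
```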
-
s3l.metrics.performance.
roc_auc_score
(y_true, y_score, param_dict=None)[source]¶ Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
Parameters: - y_true (array, shape = [n_samples] or [n_samples, n_classes]) – True binary labels or binary label indicators. If labels are not either {-1, 1} or {0, 1}, then pos_label should be explicitly given.
- y_score (array, shape = [n_samples] or [n_samples, n_classes]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers). For binary y_true, y_score is supposed to be the score of the class with greater label.
- param_dict (dict) –
A dictionary holding the parameters, including:
pos_label : int or str, optional, default=None Label considered as positive; all others are considered negative.
sample_weight : array-like of shape = [n_samples], optional, default=None Sample weights.
Returns: auc
Return type: float
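ROC AUC equals the probability that a randomly drawn positive sample is scored above a randomly drawn negative one, with ties counting one half. A pure-Python sketch of this rank interpretation (names are illustrative):

```python
def binary_auc(y_true, y_score):
    """AUC as P(random positive outranks random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t != 1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # -> 0.75
```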
-
s3l.metrics.performance.
get_fps_tps_thresholds
(y_true, y_score, param_dict=None)[source]¶ Calculate true and false positives per binary classification threshold.
Parameters: - y_true (array, shape = [n_samples]) – True targets of binary classification
- y_score (array, shape = [n_samples]) – Estimated probabilities or decision function
- param_dict (dict) –
A dictionary holding the parameters, including:
pos_label : int or str, default=None The label of the positive class.
Returns: - fps (array, shape = [n_thresholds]) – A count of false positives, at index i being the number of negative samples assigned a score >= thresholds[i]. The total number of negative samples is equal to fps[-1] (thus true negatives are given by fps[-1] - fps).
- tps (array, shape = [n_thresholds <= len(np.unique(y_score))]) – An increasing count of true positives, at index i being the number of positive samples assigned a score >= thresholds[i]. The total number of positive samples is equal to tps[-1] (thus false negatives are given by tps[-1] - tps).
- thresholds (array, shape = [n_thresholds]) – Decreasing score values.
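The cumulative counts described above can be sketched by sorting samples by decreasing score and recording one (fps, tps) point per distinct score; this is an illustrative reimplementation, not the library's code:

```python
def fps_tps_thresholds(y_true, y_score, pos_label=1):
    """Cumulative FP/TP counts at each distinct score, in decreasing score order."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    fps, tps, thresholds = [], [], []
    fp = tp = 0
    for k, i in enumerate(order):
        if y_true[i] == pos_label:
            tp += 1
        else:
            fp += 1
        # record one point per distinct threshold (at the last index of a tie group)
        if k + 1 == len(order) or y_score[order[k + 1]] != y_score[i]:
            fps.append(fp); tps.append(tp); thresholds.append(y_score[i])
    return fps, tps, thresholds

fps_tps_thresholds([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
# fps=[0, 1, 1, 2], tps=[1, 1, 2, 2], thresholds=[0.8, 0.4, 0.35, 0.1]
```

Note that fps[-1] and tps[-1] equal the total negative and positive counts, matching the relations stated above.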
-
s3l.metrics.performance.
f1_score
(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)[source]¶ Compute the F1 score, also known as balanced F-score or F-measure
The F1 score can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and worst score at 0. The relative contribution of precision and recall to the F1 score are equal. The formula for the F1 score is:
F1 = 2 * (precision * recall) / (precision + recall)
In the multi-class and multi-label case, this is the average of the F1 score of each class with weighting depending on the average parameter. Read more in the User Guide.
Parameters: - y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) target values.
- y_pred (1d array-like, or label indicator array / sparse matrix) – Estimated targets as returned by a classifier.
- labels (list, optional) –
The set of labels to include when average != 'binary', and their order if average is None. Labels present in the data can be excluded, for example to calculate a multiclass average ignoring a majority negative class, while labels not present in the data will result in 0 components in a macro average. For multilabel targets, labels are column indices. By default, all labels in y_true and y_pred are used in sorted order. Changed in version 0.17: parameter labels improved for multiclass problem.
- pos_label (str or int, 1 by default) – The class to report if average='binary' and the data is binary. If the data are multiclass or multilabel, this will be ignored; setting labels=[pos_label] and average != 'binary' will report scores for that label only.
- average (string, [None, 'binary' (default), 'micro', 'macro', 'samples', 'weighted']) –
This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:
'binary' : Only report results for the class specified by pos_label. This is applicable only if targets (y_{true,pred}) are binary.
'micro' : Calculate metrics globally by counting the total true positives, false negatives and false positives.
'macro' : Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
'weighted' : Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters 'macro' to account for label imbalance; it can result in an F-score that is not between precision and recall.
'samples' : Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score()).
- sample_weight (array-like of shape = [n_samples], optional) – Sample weights.
Returns: f1_score – F1 score of the positive class in binary classification or weighted average of the F1 scores of each class for the multiclass task.
Return type: float or array of float, shape = [n_unique_labels]
See also
fbeta_score(), precision_recall_fscore_support(), jaccard_score(), multilabel_confusion_matrix()
References
[1] Wikipedia entry for the F1-score
Examples
>>> from sklearn.metrics import f1_score
>>> y_true = [0, 1, 2, 0, 1, 2]
>>> y_pred = [0, 2, 1, 0, 0, 1]
>>> f1_score(y_true, y_pred, average='macro')  # doctest: +ELLIPSIS
0.26...
>>> f1_score(y_true, y_pred, average='micro')  # doctest: +ELLIPSIS
0.33...
>>> f1_score(y_true, y_pred, average='weighted')  # doctest: +ELLIPSIS
0.26...
>>> f1_score(y_true, y_pred, average=None)
array([0.8, 0. , 0. ])
Notes
When true positive + false positive == 0 or true positive + false negative == 0, the F-score returns 0 and raises UndefinedMetricWarning.
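The binary case of the formula above can be sketched directly from the true/false positive counts; the zero-returning edge case matches the note above (function name illustrative):

```python
def binary_f1(y_true, y_pred, pos_label=1):
    """F1 = 2PR/(P+R); returns 0 when tp+fp == 0 or tp+fn == 0."""
    tp = sum(t == pos_label and p == pos_label for t, p in zip(y_true, y_pred))
    fp = sum(t != pos_label and p == pos_label for t, p in zip(y_true, y_pred))
    fn = sum(t == pos_label and p != pos_label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # convention noted above for degenerate denominators
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

binary_f1([0, 1, 1, 1], [0, 1, 0, 1])  # P=1, R=2/3 -> F1 = 0.8
```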
-
s3l.metrics.performance.
hamming_loss
(y_true, y_pred, param_dict=None)[source]¶ Compute the average Hamming loss.
The Hamming loss is the fraction of labels that are incorrectly predicted.
Parameters: - y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) labels.
- y_pred (1d array-like, or label indicator array / sparse matrix) – Predicted labels, as returned by a classifier.
- labels (array, shape = [n_labels], optional (default=None)) – Integer array of labels. If not provided, labels will be inferred from y_true and y_pred.
Returns: loss – The average Hamming loss between elements of y_true and y_pred.
Return type: float
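For multilabel indicator input, the Hamming loss is simply the number of disagreeing (sample, label) entries divided by the total number of entries. An illustrative sketch over lists of indicator rows:

```python
def hamming(y_true, y_pred):
    """Fraction of label indicators that disagree, over all samples and labels."""
    wrong = sum(t != p for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    total = sum(len(row) for row in y_true)
    return wrong / total

hamming([[1, 0], [0, 1]], [[1, 1], [0, 1]])  # 1 wrong entry of 4 -> 0.25
```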
-
s3l.metrics.performance.
one_error
(y_true, y_pred, param_dict=None)[source]¶ - Compute the one_error, similar to 0/1 loss.
Parameters: - y_true (array, shape = [n_samples] or [n_samples, n_classes]) – True binary labels or binary label indicators.
- y_score (array, shape = [n_samples] or [n_samples, n_classes]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
- param_dict (dict) –
A dictionary holding the parameters, including:
sample_weight : array-like of shape = [n_samples], optional Sample weights.
Returns: one_error
Return type: float
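The docstring above is terse; one common definition of one-error in the multi-label literature (assumed here, not confirmed by the source) is the fraction of samples whose single top-scored label is not among the relevant labels:

```python
def one_error(y_true, y_score):
    """Fraction of samples whose top-scored label is not relevant (assumed definition)."""
    errors = 0
    for truth, scores in zip(y_true, y_score):
        top = max(range(len(scores)), key=lambda j: scores[j])
        errors += truth[top] != 1
    return errors / len(y_true)

# Sample 2's top label (index 1) is not relevant, so one_error is 1/2.
one_error([[1, 0, 0], [0, 0, 1]], [[0.9, 0.1, 0.2], [0.3, 0.5, 0.4]])  # -> 0.5
```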
-
s3l.metrics.performance.
coverage_error
(y_true, y_score, param_dict=None)[source]¶ Coverage error measure. Compute how far we need to go through the ranked scores to cover all true labels. The best value is equal to the average number of labels in y_true per sample.
Ties in y_score are broken by giving the maximal rank that would have been assigned to all tied values.
Parameters: - y_true (array, shape = [n_samples, n_labels]) – True binary labels in binary indicator format.
- y_score (array, shape = [n_samples, n_labels]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
- param_dict (dict) –
A dictionary holding the parameters, including:
sample_weight : array-like of shape = [n_samples], optional Sample weights.
Returns: coverage_error
Return type: float
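Per sample, the coverage is the rank of the lowest-scored relevant label (ties taking the maximal rank, as stated above), averaged over samples. An illustrative sketch:

```python
def coverage(y_true, y_score):
    """Mean depth of ranking needed to cover all relevant labels per sample."""
    total = 0
    for truth, scores in zip(y_true, y_score):
        relevant = [s for t, s in zip(truth, scores) if t == 1]
        if not relevant:
            continue  # assumed convention: no relevant labels contributes 0
        weakest = min(relevant)
        total += sum(s >= weakest for s in scores)  # ties get the maximal rank
    return total / len(y_true)

# Sample 1 needs depth 2, sample 2 needs depth 3 -> mean 2.5.
coverage([[1, 0, 0], [0, 0, 1]], [[0.75, 0.5, 1], [1, 0.2, 0.1]])  # -> 2.5
```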
-
s3l.metrics.performance.
label_ranking_loss
(y_true, y_score, param_dict=None)[source]¶ Compute Ranking loss measure.
Compute the average number of label pairs that are incorrectly ordered given y_score weighted by the size of the label set and the number of labels not in the label set.
Parameters: - y_true (array or sparse matrix, shape = [n_samples, n_labels]) – True binary labels in binary indicator format.
- y_score (array, shape = [n_samples, n_labels]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
- param_dict (dict) –
A dictionary holding the parameters, including:
sample_weight : array-like of shape = [n_samples], optional Sample weights.
Returns: loss
Return type: float
References
[1] Tsoumakas, G., Katakis, I., & Vlahavas, I. (2010). Mining multi-label data. In Data mining and knowledge discovery handbook (pp. 667-685). Springer US.
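The pairwise description above can be sketched per sample: count (relevant, irrelevant) pairs where the irrelevant label scores higher, normalized by the number of such pairs (tie handling is simplified in this illustrative version):

```python
def ranking_loss(y_true, y_score):
    """Average fraction of (relevant, irrelevant) label pairs ordered wrongly."""
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        rel = [s for t, s in zip(truth, scores) if t == 1]
        irr = [s for t, s in zip(truth, scores) if t == 0]
        if rel and irr:
            bad = sum(i > r for r in rel for i in irr)  # ties ignored in this sketch
            total += bad / (len(rel) * len(irr))
    return total / len(y_true)

# Sample 1: 1 of 2 pairs wrong (0.5); sample 2: 2 of 2 wrong (1.0) -> mean 0.75.
ranking_loss([[1, 0, 0], [0, 0, 1]], [[0.75, 0.5, 1], [1, 0.2, 0.1]])  # -> 0.75
```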
-
s3l.metrics.performance.
label_ranking_average_precision_score
(y_true, y_score, param_dict=None)[source]¶ Compute ranking-based average precision
Parameters: - y_true (array or sparse matrix, shape = [n_samples, n_labels]) – True binary labels in binary indicator format.
- y_score (array, shape = [n_samples, n_labels]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
- param_dict (dict) –
A dictionary holding the parameters, including:
sample_weight : array-like of shape = [n_samples], optional Sample weights.
Returns: score
Return type: float
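Ranking-based average precision asks, for each relevant label, what fraction of the labels ranked at or above it are themselves relevant, then averages per sample and overall. An illustrative sketch (edge-case convention assumed):

```python
def lrap(y_true, y_score):
    """Label-ranking average precision over indicator rows and score rows."""
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        rel = [s for t, s in zip(truth, scores) if t == 1]
        if not rel:
            total += 1.0  # assumed convention: empty label set scores 1
            continue
        per_label = 0.0
        for s in rel:
            rank = sum(x >= s for x in scores)       # position in the full ranking
            rel_at_rank = sum(x >= s for x in rel)   # relevant labels at or above it
            per_label += rel_at_rank / rank
        total += per_label / len(rel)
    return total / len(y_true)

# (1/2 + 1/3) / 2 = 5/12 ~ 0.4167
lrap([[1, 0, 0], [0, 0, 1]], [[0.75, 0.5, 1], [1, 0.2, 0.1]])
```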
-
s3l.metrics.performance.
micro_auc_score
(y_true, y_score, param_dict=None)[source]¶ Compute the micro_auc_score.
Parameters: - y_true (array, shape = [n_samples] or [n_samples, n_classes]) – True binary labels or binary label indicators.
- y_score (array, shape = [n_samples] or [n_samples, n_classes]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
- param_dict (dict) –
A dictionary holding the parameters, including:
sample_weight : array-like of shape = [n_samples], optional Sample weights.
Returns: micro_auc_score
Return type: float
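Micro-averaged AUC is commonly computed as a single binary AUC over the flattened (sample, label) indicator pairs; this is an assumption about the definition here, sketched in pure Python:

```python
def micro_auc(y_true, y_score):
    """Binary AUC on the flattened (sample, label) indicator pairs."""
    flat = [(t, s) for row_t, row_s in zip(y_true, y_score)
            for t, s in zip(row_t, row_s)]
    pos = [s for t, s in flat if t == 1]
    neg = [s for t, s in flat if t != 1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

micro_auc([[1, 0], [0, 1]], [[0.9, 0.1], [0.2, 0.8]])  # -> 1.0
```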
-
s3l.metrics.performance.
Average_precision_score
(y_true, y_score, param_dict=None)[source]¶ - Compute average precision (AP) from prediction scores
AP summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight:
\[\text{AP} = \sum_n (R_n - R_{n-1}) P_n\]
Parameters: - y_true (array, shape = [n_samples] or [n_samples, n_classes]) – True binary labels or binary label indicators.
- y_score (array, shape = [n_samples] or [n_samples, n_classes]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
- param_dict (dict) –
A dictionary holding the parameters, including:
sample_weight : array-like of shape = [n_samples], optional Sample weights.
Returns: average_precision
Return type: float
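The AP formula above can be evaluated by walking the samples in decreasing score order and accumulating (R_n - R_{n-1}) * P_n at each positive; this sketch assumes distinct scores (tie handling omitted) and is illustrative, not the library's code:

```python
def average_precision(y_true, y_score):
    """AP = sum_n (R_n - R_{n-1}) * P_n over decreasing score cutoffs."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    n_pos = sum(1 for t in y_true if t == 1)
    tp, ap, prev_recall = 0, 0.0, 0.0
    for k, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            recall = tp / n_pos
            ap += (recall - prev_recall) * (tp / k)  # precision at this cutoff
            prev_recall = recall
    return ap

# 0.5 * 1 + 0.5 * (2/3) = 5/6 ~ 0.8333
average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```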
-
s3l.metrics.performance.
minus_mean_square_error
(y_true, y_pred, param_dict=None)[source]¶ Minus mean squared error (the negated MSE, so that larger values are better).
Parameters: - y_true (1d array-like) – Ground truth (correct) values.
- y_pred (1d array-like) – Predicted values, as returned by a regressor.
- param_dict (dict) –
A dictionary holding the parameters, including:
sample_weight : array-like of shape = [n_samples], optional Sample weights.
Returns: score
Return type: float
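A minimal sketch of the negated, optionally sample-weighted MSE under the same param_dict convention (function name illustrative):

```python
def minus_mse(y_true, y_pred, param_dict=None):
    """Negated mean squared error, optionally sample-weighted."""
    params = dict(param_dict or {})
    sample_weight = params.pop('sample_weight', None)
    sq = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    if sample_weight is None:
        return -sum(sq) / len(sq)
    return -sum(e * w for e, w in zip(sq, sample_weight)) / sum(sample_weight)

minus_mse([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])  # MSE = 0.375 -> -0.375
```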