performance

Pre-defined Performance

Implements classical performance metrics.

Each metric method is called as performance_metric(ground_truth, prediction, param_dict). Every metric method first pops its parameters from param_dict.
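
For example, under this convention a call looks like the following sketch (assuming the package is importable as ``s3l`` and that accuracy_score, documented below, computes plain classification accuracy):

>>> from s3l.metrics import performance
>>> y_true = [0, 1, 1, 0]
>>> y_pred = [0, 1, 0, 0]
>>> performance.accuracy_score(y_true, y_pred, param_dict={'sample_weight': None})  # 3 of 4 correct
0.75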

s3l.metrics.performance.accuracy_score(y_true, y_pred, param_dict=None)[source]

Accuracy classification score.

Parameters:
  • y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) labels.
  • y_pred (1d array-like, or label indicator array / sparse matrix) – Predicted labels, as returned by a classifier.
  • param_dict (dict) –

    A dictionary holding the parameters, including:

    sample_weight : array-like of shape = [n_samples], optional
        Sample weights.
    
Returns:

score

Return type:

float

s3l.metrics.performance.zero_one_loss(y_true, y_pred, param_dict=None)[source]

Zero-one classification loss.

If normalize is True, return the fraction of misclassifications (float); otherwise, return the number of misclassifications (int). The best performance is 0.

Read more in the User Guide.

Parameters:
  • y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) labels.
  • y_pred (1d array-like, or label indicator array / sparse matrix) – Predicted labels, as returned by a classifier.
  • param_dict (dict) –

    A dictionary holding the parameters, including:

    normalize : bool, optional (default=True)
        If ``False``, return the number of misclassifications.
        Otherwise, return the fraction of misclassifications.
    
    sample_weight : array-like of shape = [n_samples], optional
        Sample weights.
    
Returns:

loss – If normalize is True, the fraction of misclassifications (float); otherwise, the number of misclassifications (int).

Return type:

float or int
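
For example (a sketch assuming zero_one_loss follows the standard definition, so the values below are illustrative):

>>> from s3l.metrics import performance
>>> y_true = [0, 1, 1, 0]
>>> y_pred = [0, 1, 0, 0]
>>> performance.zero_one_loss(y_true, y_pred, param_dict={'normalize': True})   # fraction misclassified
0.25
>>> performance.zero_one_loss(y_true, y_pred, param_dict={'normalize': False})  # number misclassified
1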

s3l.metrics.performance.roc_auc_score(y_true, y_score, param_dict=None)[source]

Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.

Parameters:
  • y_true (array, shape = [n_samples] or [n_samples, n_classes]) – True binary labels or binary label indicators. If the labels are neither {-1, 1} nor {0, 1}, then pos_label should be given explicitly.
  • y_score (array, shape = [n_samples] or [n_samples, n_classes]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers). For binary y_true, y_score is supposed to be the score of the class with greater label.
  • param_dict (dict) –

    A dictionary holding the parameters, including:

    pos_label : int or str, optional (default=None)
        Label considered as positive; all others are considered negative.
    sample_weight : array-like of shape = [n_samples], optional (default=None)
        Sample weights.
    
Returns:

auc

Return type:

float
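
A sketch, assuming the implementation mirrors the standard ROC AUC computation (the value below is what that definition gives):

>>> from s3l.metrics import performance
>>> y_true = [0, 0, 1, 1]
>>> y_score = [0.1, 0.4, 0.35, 0.8]
>>> performance.roc_auc_score(y_true, y_score)
0.75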

s3l.metrics.performance.get_fps_tps_thresholds(y_true, y_score, param_dict=None)[source]

Calculate true and false positives per binary classification threshold.

Parameters:
  • y_true (array, shape = [n_samples]) – True targets of binary classification
  • y_score (array, shape = [n_samples]) – Estimated probabilities or decision function
  • param_dict (dict) –

    A dictionary holding the parameters, including:

    pos_label : int or str, default=None
        The label of the positive class.

Returns:

  • fps (array, shape = [n_thresholds]) – A count of false positives, at index i being the number of negative samples assigned a score >= thresholds[i]. The total number of negative samples is equal to fps[-1] (thus true negatives are given by fps[-1] - fps).
  • tps (array, shape = [n_thresholds <= len(np.unique(y_score))]) – An increasing count of true positives, at index i being the number of positive samples assigned a score >= thresholds[i]. The total number of positive samples is equal to tps[-1] (thus false negatives are given by tps[-1] - tps).
  • thresholds (array, shape = [n_thresholds]) – Decreasing score values.
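
A usage sketch; the relations fps[-1] == number of negatives and tps[-1] == number of positives follow from the descriptions above, while the exact thresholding behaviour is assumed:

>>> import numpy as np
>>> from s3l.metrics import performance
>>> y_true = np.array([0, 0, 1, 1])
>>> y_score = np.array([0.1, 0.4, 0.35, 0.8])
>>> fps, tps, thresholds = performance.get_fps_tps_thresholds(y_true, y_score)
>>> int(fps[-1]), int(tps[-1])   # total negatives, total positives
(2, 2)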

s3l.metrics.performance.f1_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)[source]

Compute the F1 score, also known as balanced F-score or F-measure

The F1 score can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and worst score at 0. The relative contribution of precision and recall to the F1 score are equal. The formula for the F1 score is:

F1 = 2 * (precision * recall) / (precision + recall)

In the multi-class and multi-label case, this is the average of the F1 score of each class with weighting depending on the average parameter.

Read more in the User Guide.

Parameters:
  • y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) target values.
  • y_pred (1d array-like, or label indicator array / sparse matrix) – Estimated targets as returned by a classifier.
  • labels (list, optional) –

    The set of labels to include when average != 'binary', and their order if average is None. Labels present in the data can be excluded, for example to calculate a multiclass average ignoring a majority negative class, while labels not present in the data will result in 0 components in a macro average. For multilabel targets, labels are column indices. By default, all labels in y_true and y_pred are used in sorted order.

    Changed in version 0.17: parameter labels improved for multiclass problem.

  • pos_label (str or int, 1 by default) – The class to report if average='binary' and the data is binary. If the data are multiclass or multilabel, this will be ignored; setting labels=[pos_label] and average != 'binary' will report scores for that label only.
  • average (string, [None, 'binary' (default), 'micro', 'macro', 'samples', 'weighted']) –

    This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:

    'binary':
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true,pred}) are binary.
    'micro':
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    'macro':
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    'weighted':
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    'samples':
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score()).
  • sample_weight (array-like of shape = [n_samples], optional) – Sample weights.
Returns:

f1_score – F1 score of the positive class in binary classification or weighted average of the F1 scores of each class for the multiclass task.

Return type:

float or array of float, shape = [n_unique_labels]

See also

fbeta_score(), precision_recall_fscore_support(), jaccard_score(), multilabel_confusion_matrix()

References

[1] Wikipedia entry for the F1-score

Examples

>>> from sklearn.metrics import f1_score
>>> y_true = [0, 1, 2, 0, 1, 2]
>>> y_pred = [0, 2, 1, 0, 0, 1]
>>> f1_score(y_true, y_pred, average='macro')  # doctest: +ELLIPSIS
0.26...
>>> f1_score(y_true, y_pred, average='micro')  # doctest: +ELLIPSIS
0.33...
>>> f1_score(y_true, y_pred, average='weighted')  # doctest: +ELLIPSIS
0.26...
>>> f1_score(y_true, y_pred, average=None)
array([0.8, 0. , 0. ])

Notes

When true positive + false positive == 0 or true positive + false negative == 0, f-score returns 0 and raises UndefinedMetricWarning.

s3l.metrics.performance.hamming_loss(y_true, y_pred, param_dict=None)[source]

Compute the average Hamming loss.

The Hamming loss is the fraction of labels that are incorrectly predicted.

Parameters:
  • y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) labels.
  • y_pred (1d array-like, or label indicator array / sparse matrix) – Predicted labels, as returned by a classifier.
  • param_dict (dict) –

    A dictionary holding the parameters, including:

    labels : array, shape = [n_labels], optional (default=None)
        Integer array of labels. If not provided, labels will be inferred from y_true and y_pred.

Returns:

loss – The average Hamming loss between the elements of y_true and y_pred.

Return type:

float or int
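
A sketch assuming the standard Hamming loss (the fraction of label positions that disagree):

>>> import numpy as np
>>> from s3l.metrics import performance
>>> performance.hamming_loss([1, 2, 3, 4], [2, 2, 3, 4])   # one of four labels wrong
0.25
>>> performance.hamming_loss(np.array([[0, 1], [1, 1]]), np.zeros((2, 2)))  # three of four positions wrong
0.75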

s3l.metrics.performance.one_error(y_true, y_pred, param_dict=None)[source]

Compute the one-error, which is similar to the 0/1 loss.

Parameters:
  • y_true (array, shape = [n_samples] or [n_samples, n_classes]) – True binary labels or binary label indicators.
  • y_pred (array, shape = [n_samples] or [n_samples, n_classes]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
  • param_dict (dict) –

    A dictionary holding the parameters, including:

    sample_weight : array-like of shape = [n_samples], optional
        Sample weights.
    
Returns:

one_error

Return type:

float
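
In the multi-label literature, one-error is usually the fraction of samples whose top-scored label is not in the true label set; the minimal numpy sketch below illustrates that common definition and is not necessarily the exact implementation used here:

>>> import numpy as np
>>> y_true = np.array([[1, 0, 0], [0, 0, 1]])
>>> y_score = np.array([[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]])
>>> top = y_score.argmax(axis=1)                  # highest-scored label per sample
>>> float(np.mean(y_true[np.arange(len(y_true)), top] == 0))
1.0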

s3l.metrics.performance.coverage_error(y_true, y_score, param_dict=None)[source]

Coverage error measure. Compute how far we need to go through the ranked scores to cover all true labels. The best value is equal to the average number of labels in y_true per sample.

Ties in y_score are broken by giving the maximal rank that would have been assigned to all tied values.

Parameters:
  • y_true (array, shape = [n_samples, n_labels]) – True binary labels in binary indicator format.
  • y_score (array, shape = [n_samples, n_labels]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
  • param_dict (dict) –

    A dictionary holding the parameters, including:

    sample_weight : array-like of shape = [n_samples], optional
        Sample weights.
    
Returns:

coverage_error

Return type:

float
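
A worked sketch of the definition above, assuming the implementation follows the standard coverage error:

>>> from s3l.metrics import performance
>>> y_true = [[1, 0, 0], [0, 0, 1]]
>>> y_score = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
>>> # sample 1 needs the top 2 ranked labels to reach label 0,
>>> # sample 2 needs all 3 to reach label 2: (2 + 3) / 2 = 2.5
>>> performance.coverage_error(y_true, y_score)
2.5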

s3l.metrics.performance.label_ranking_loss(y_true, y_score, param_dict=None)[source]

Compute Ranking loss measure.

Compute the average number of label pairs that are incorrectly ordered given y_score weighted by the size of the label set and the number of labels not in the label set.

Parameters:
  • y_true (array or sparse matrix, shape = [n_samples, n_labels]) – True binary labels in binary indicator format.
  • y_score (array, shape = [n_samples, n_labels]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
  • param_dict (dict) –

    A dictionary holding the parameters, including:

    sample_weight : array-like of shape = [n_samples], optional
        Sample weights.
    
Returns:

loss

Return type:

float

References

[1] Tsoumakas, G., Katakis, I., & Vlahavas, I. (2010). Mining multi-label data. In Data Mining and Knowledge Discovery Handbook (pp. 667-685). Springer US.
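
For the toy data below, sample 1 has one of its two (relevant, irrelevant) label pairs mis-ordered and sample 2 has both mis-ordered, giving (0.5 + 1.0) / 2. The sketch assumes the implementation follows that standard definition [1]:

>>> from s3l.metrics import performance
>>> y_true = [[1, 0, 0], [0, 0, 1]]
>>> y_score = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
>>> performance.label_ranking_loss(y_true, y_score)
0.75
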
s3l.metrics.performance.label_ranking_average_precision_score(y_true, y_score, param_dict=None)[source]

Compute ranking-based average precision

Parameters:
  • y_true (array or sparse matrix, shape = [n_samples, n_labels]) – True binary labels in binary indicator format.
  • y_score (array, shape = [n_samples, n_labels]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
  • param_dict (dict) –

    A dictionary holding the parameters, including:

    sample_weight : array-like of shape = [n_samples], optional
        Sample weights.
    
Returns:

score

Return type:

float
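
A sketch assuming the standard LRAP definition (for each relevant label, the fraction of relevant labels ranked at or above it, averaged over labels and samples):

>>> from s3l.metrics import performance
>>> y_true = [[1, 0, 0], [0, 0, 1]]
>>> y_score = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
>>> # sample 1: its relevant label is ranked 2nd -> 1/2; sample 2: ranked 3rd -> 1/3
>>> performance.label_ranking_average_precision_score(y_true, y_score)  # doctest: +ELLIPSIS
0.416...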

s3l.metrics.performance.micro_auc_score(y_true, y_score, param_dict=None)[source]

Compute the micro_auc_score.

Parameters:
  • y_true (array, shape = [n_samples] or [n_samples, n_classes]) – True binary labels or binary label indicators.
  • y_score (array, shape = [n_samples] or [n_samples, n_classes]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
  • param_dict (dict) –

    A dictionary holding the parameters, including:

    sample_weight : array-like of shape = [n_samples], optional
        Sample weights.
    
Returns:

micro_auc_score

Return type:

float

s3l.metrics.performance.Average_precision_score(y_true, y_score, param_dict=None)[source]

Compute average precision (AP) from prediction scores.

AP summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight:

\[\text{AP} = \sum_n (R_n - R_{n-1}) P_n\]
Parameters:
  • y_true (array, shape = [n_samples] or [n_samples, n_classes]) – True binary labels or binary label indicators.
  • y_score (array, shape = [n_samples] or [n_samples, n_classes]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
  • param_dict (dict) –

    A dictionary holding the parameters, including:

    sample_weight : array-like of shape = [n_samples], optional
        Sample weights.
    
Returns:

average_precision

Return type:

float
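
A worked instance of the formula above (assuming the standard AP computation): with the scores below, recall increases at precisions 1.0 and 2/3, so AP = 0.5 * 1.0 + 0.5 * 2/3 ≈ 0.83.

>>> from s3l.metrics import performance
>>> y_true = [0, 0, 1, 1]
>>> y_score = [0.1, 0.4, 0.35, 0.8]
>>> performance.Average_precision_score(y_true, y_score)  # doctest: +ELLIPSIS
0.83...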

s3l.metrics.performance.minus_mean_square_error(y_true, y_pred, param_dict=None)[source]

Minus (negated) mean squared error.

Parameters:
  • y_true (1d array-like) – Ground truth (correct) values.
  • y_pred (1d array-like) – Predicted values, as returned by a regressor.
  • param_dict (dict) –

    A dictionary holding the parameters, including:

    sample_weight : array-like of shape = [n_samples], optional
        Sample weights.
    
Returns:

score

Return type:

float
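
A sketch assuming the function simply negates the usual mean squared error, so that, like the other metrics here, larger values indicate better performance:

>>> from s3l.metrics import performance
>>> y_true = [3.0, -0.5, 2.0, 7.0]
>>> y_pred = [2.5, 0.0, 2.0, 8.0]
>>> performance.minus_mean_square_error(y_true, y_pred)  # -(mean of squared errors) = -0.375
-0.375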