base

Base IO code for all datasets

s3l.datasets.base.get_data_home(data_home=None)[source]

Return the path of the data dir.

This folder is used by some large dataset loaders to avoid downloading the data several times.

If the folder does not already exist, it is automatically created.

Parameters:data_home (str | None) – The path to data dir.
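
The behaviour described above (resolve a directory and create it when it is missing) can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name get_data_home_sketch and the default folder ~/s3l_data are assumptions made for the example.

```python
import os
import tempfile


def get_data_home_sketch(data_home=None):
    """Return a data directory path, creating the directory if needed.

    Hypothetical stand-in for illustration; the default location
    below is an assumption, not the library's documented default.
    """
    if data_home is None:
        data_home = os.path.join("~", "s3l_data")  # assumed default
    data_home = os.path.expanduser(data_home)
    os.makedirs(data_home, exist_ok=True)  # no error if it already exists
    return data_home


# Usage: point the cache at a temporary location.
cache_dir = get_data_home_sketch(os.path.join(tempfile.mkdtemp(), "cache"))
```

Calling the helper a second time with the same path returns the same directory without raising, which matches the "created if it does not already exist" wording.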
s3l.datasets.base.clear_data_home(data_home=None)[source]

Delete all the content of the data home cache.

Parameters:data_home (str | None) – The path to data dir.
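
Clearing the cache can be pictured as a recursive delete of the data directory. The sketch below is an assumption about the behaviour, not the library's code: clear_data_home_sketch is a hypothetical name, and the text does not say whether the directory itself is kept, so this version removes it as well.

```python
import os
import shutil
import tempfile


def clear_data_home_sketch(data_home):
    """Remove a cache directory and everything inside it.

    Hypothetical stand-in; it requires an explicit path so that the
    example never touches a real cache.
    """
    shutil.rmtree(data_home, ignore_errors=True)


# Usage: build a throwaway cache, then clear it.
cache_dir = tempfile.mkdtemp()
with open(os.path.join(cache_dir, "dataset.csv"), "w") as fh:
    fh.write("a,b\n1,2\n")
clear_data_home_sketch(cache_dir)
cache_exists_after = os.path.exists(cache_dir)
```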
s3l.datasets.base.load_data(feature_file=None, label_file=None)[source]

Load data from absolute path.

Parameters:
  • feature_file (string, optional (default=None)) –

    The absolute path of the user-provided feature dataset. The file should be in ‘.csv’ format and organized as follows:

    feature_name: [1, n_features]
    data: [m_samples, n_features]

    When the features form a sparse matrix, the file should be in ‘*.mat’ or ‘*.npz’ format.

  • label_file (string, optional (default=None)) –

    The absolute path of the user-provided label dataset. The file should be in ‘.csv’ format and organized as follows:

    label_name: [1, n_labels]
    label: [m_samples, n_labels]

    When the labels form a sparse matrix, the file should be in ‘*.mat’ or ‘*.npz’ format.

    The number of rows in label_file must equal the number of rows in feature_file.

Returns:

  • X (array-like) – Data matrix of shape [m_samples, n_features]. This data is used to train models.
  • y (array-like) – Labels of the loaded data, of shape [m_samples, n_labels].
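
The CSV layout described above (one row of names followed by one row per sample) can be illustrated with the standard library alone. This sketch does not call load_data; read_csv_matrix and the file names are hypothetical, and reading the first row as a header is one interpretation of the layout in the parameter description.

```python
import csv
import os
import tempfile


def read_csv_matrix(path):
    """Read a CSV whose first row holds names and remaining rows hold numbers."""
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh))
    names, body = rows[0], rows[1:]
    return names, [[float(v) for v in row] for row in body]


tmp = tempfile.mkdtemp()
feature_path = os.path.join(tmp, "features.csv")
label_path = os.path.join(tmp, "labels.csv")

# Three samples with two features each, plus one label per sample.
with open(feature_path, "w", newline="") as fh:
    csv.writer(fh).writerows([["f1", "f2"], [0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
with open(label_path, "w", newline="") as fh:
    csv.writer(fh).writerows([["y"], [0], [1], [0]])

feature_names, X = read_csv_matrix(feature_path)
label_names, y = read_csv_matrix(label_path)

# The two files must describe the same samples, row for row.
if len(X) != len(y):
    raise ValueError("feature_file and label_file have different row counts")
```

The absolute paths feature_path and label_path are the kind of values the feature_file and label_file parameters expect.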

s3l.datasets.base.load_graph(name, graph_file=None)[source]

Load a graph from a self-contained (bundled) data set or from a user-provided file. The bundled data set is looked up first, using the given name. If the name is empty or does not match a bundled data set, the graph is loaded from the provided path instead.

Parameters:
  • name (string, optional (default=None)) – The name of a data set in the self-contained data list.
  • graph_file (string, optional (default=None)) – The absolute path of the user-provided graph file. The file should be in ‘*.csv’, ‘*.mat’ or ‘*.npz’ format.
Returns:

W

Return type:

np.nda

s3l.datasets.base.load_boston(return_X_y=False)[source]

Load and return the boston house-prices dataset (regression).

Samples total 506
Dimensionality 13
Features real, positive
Targets real 5. - 50.
Parameters:return_X_y (boolean, default=False.) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns:
  • data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the regression targets, and ‘DESCR’, the full description of the dataset.
  • (data, target) (tuple if return_X_y is True)
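
The two return shapes shared by the loaders on this page can be sketched without the library. BunchSketch and load_toy below are hypothetical stand-ins: the text only says that a Bunch is dictionary-like with attributes, so the dict subclass here is an assumption about how such an object may behave.

```python
class BunchSketch(dict):
    """Dictionary whose keys can also be read as attributes."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


def load_toy(return_X_y=False):
    """Hypothetical loader mirroring the return_X_y switch described above."""
    data = [[1.0, 2.0], [3.0, 4.0]]
    target = [10.0, 20.0]
    if return_X_y:
        return data, target
    return BunchSketch(data=data, target=target, DESCR="toy dataset")


bunch = load_toy()                 # dictionary-like object
X, y = load_toy(return_X_y=True)   # plain (data, target) tuple
```

With the dictionary-like form, bunch.data and bunch["data"] refer to the same object; the tuple form is convenient when only the arrays are needed.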
s3l.datasets.base.load_diabetes(return_X_y=False)[source]

Load and return the diabetes dataset (regression).

Samples total 442
Dimensionality 10
Features real, -.2 < x < .2
Targets integer 25 - 346

Read more in the User Guide.

Parameters:return_X_y (boolean, default=False.) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns:
  • data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn and ‘target’, the regression target for each sample.
  • (data, target) (tuple if return_X_y is True)
s3l.datasets.base.load_digits(return_X_y=False)[source]

Load and return the digits dataset (classification).

Each datapoint is an 8x8 image of a digit.

Classes 10
Samples per class ~180
Samples total 1797
Dimensionality 64
Features integers 0-16

Read more in the User Guide.

Parameters:return_X_y (boolean, default=False.) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns:
  • data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘images’, the images corresponding to each sample, ‘target’, the classification labels for each sample, ‘target_names’, the meaning of the labels, and ‘DESCR’, the full description of the dataset.
  • (data, target) (tuple if return_X_y is True)

This is a copy of the test set of the UCI ML hand-written digits datasets http://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits
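
The ‘data’ and ‘images’ attributes hold the same pixels in two layouts, since 8 × 8 = 64. The sketch below shows the relationship on a made-up image, assuming the flat vector is stored in row-major order; it does not load the actual dataset.

```python
# A made-up 8x8 "image"; real samples hold integer intensities from 0 to 16.
image = [[(r + c) % 17 for c in range(8)] for r in range(8)]

# Flatten row by row to get the 64-value feature vector.
flat = [pixel for row in image for pixel in row]

# Rebuild the image from the flat vector.
rebuilt = [flat[i * 8:(i + 1) * 8] for i in range(8)]
```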

s3l.datasets.base.load_iris(return_X_y=False)[source]

Load and return the iris dataset (classification).

The iris dataset is a classic and very easy multi-class classification dataset.

Classes 3
Samples per class 50
Samples total 150
Dimensionality 4
Features real, positive

Read more in the User Guide.

Parameters:return_X_y (boolean, default=False.) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns:
  • data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset.
  • (data, target) (tuple if return_X_y is True)
s3l.datasets.base.load_breast_cancer(return_X_y=False)[source]

Load and return the breast cancer wisconsin dataset (classification).

The breast cancer dataset is a classic and very easy binary classification dataset.

Classes 2
Samples per class 212(M),357(B)
Samples total 569
Dimensionality 30
Features real, positive
Parameters:return_X_y (boolean, default=False) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns:
  • data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset.
  • (data, target) (tuple if return_X_y is True)

This copy of the UCI ML Breast Cancer Wisconsin (Diagnostic) dataset was downloaded from: https://goo.gl/U2Uwz2

s3l.datasets.base.load_linnerud(return_X_y=False)[source]

Load and return the linnerud dataset (multivariate regression).

Samples total 20
Dimensionality 3 (for both data and target)
Features integer
Targets integer
Parameters:return_X_y (boolean, default=False.) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns:
  • data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’ and ‘targets’, the two multivariate datasets, with ‘data’ corresponding to the exercise and ‘targets’ corresponding to the physiological measurements, as well as ‘feature_names’ and ‘target_names’.
  • (data, target) (tuple if return_X_y is True)
s3l.datasets.base.load_wine(return_X_y=False)[source]

Load and return the wine dataset (classification).

The wine dataset is a classic and very easy multi-class classification dataset.

Classes 3
Samples per class [59,71,48]
Samples total 178
Dimensionality 13
Features real, positive

Read more in the User Guide.

Parameters:return_X_y (boolean, default=False.) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns:
  • data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset.
  • (data, target) (tuple if return_X_y is True)

This copy of the UCI ML Wine Data Set was downloaded and modified to fit the standard format from: https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data

s3l.datasets.base.load_ionosphere(return_X_y=False)[source]

Load and return the ionosphere dataset (classification).

The ionosphere dataset is a classic and very easy binary classification dataset.

Classes 2
Samples per class [225,126]
Samples total 351
Dimensionality 34
Features good, bad

Read more in the User Guide.

Parameters:return_X_y (boolean, default=False.) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns:
  • data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset.
  • (data, target) (tuple if return_X_y is True)

This copy of the UCI ML ionosphere Data Set was downloaded and modified to fit the standard format from: https://archive.ics.uci.edu/ml/machine-learning-databases/ionosphere/ionosphere.data

s3l.datasets.base.load_australian(return_X_y=False)[source]

Load and return the australian dataset (classification).

The australian dataset is a classic and very easy binary classification dataset.

Classes 2
Samples per class [307,383]
Samples total 690
Dimensionality 14
Features class_1, class_0

Read more in the User Guide.

Parameters:return_X_y (boolean, default=False.) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns:
  • data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset.
  • (data, target) (tuple if return_X_y is True)

This copy of the UCI ML australian Data Set was downloaded and modified to fit the standard format from: https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/australian/australian.dat

s3l.datasets.base.load_bupa(return_X_y=False)[source]

Load and return the bupa dataset (classification).

The bupa dataset is a classic and very easy binary classification dataset.

Classes 2
Samples per class [145,200]
Samples total 345
Dimensionality 6
Features class_1, class_0

Read more in the User Guide.

Parameters:return_X_y (boolean, default=False.) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns:
  • data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset.
  • (data, target) (tuple if return_X_y is True)

This copy of the UCI ML bupa Data Set was downloaded and modified to fit the standard format from: https://archive.ics.uci.edu/ml/machine-learning-databases/liver-disorders/bupa.data

s3l.datasets.base.load_haberman(return_X_y=False)[source]

Load and return the haberman dataset (classification).

The haberman dataset is a classic and very easy binary classification dataset.

Classes 2
Samples per class [225,81]
Samples total 306
Dimensionality 3
Features class_1, class_2

Read more in the User Guide.

Parameters:return_X_y (boolean, default=False.) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns:
  • data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset.
  • (data, target) (tuple if return_X_y is True)

This copy of the UCI ML haberman Data Set was downloaded and modified to fit the standard format from: https://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data

s3l.datasets.base.load_vehicle(return_X_y=False)[source]

Load and return the vehicle dataset (classification).

The vehicle dataset is a classic and very easy multi-class classification dataset.

Classes 4
Samples per class [137,148,168,143]
Samples total 596
Dimensionality 18
Features class_1, class_2

Read more in the User Guide.

Parameters:return_X_y (boolean, default=False.) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns:
  • data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset.
  • (data, target) (tuple if return_X_y is True)

This copy of the libsvm vehicle Data Set was downloaded and modified to fit the standard format from: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/vehicle.scale

Samples with missing data were dropped.

s3l.datasets.base.load_covtype(return_X_y=False)[source]

Load and return the covtype dataset (classification).

The covtype dataset is a classic and very easy binary classification dataset.

Classes 2
Samples per class [297711,283301]
Samples total 581012
Dimensionality 54
Features class_1, class_-1

Read more in the User Guide.

Parameters:return_X_y (boolean, default=False.) – If True, returns (data, target). See below for more information about the data and target object.
Returns:(data, target)
Return type:tuple if return_X_y is True
s3l.datasets.base.load_housing10(return_X_y=False)[source]

Load and return the housing10 dataset (classification).

The housing10 dataset is a classic and very easy multi-class classification dataset.

Classes
Samples per
Samples total 506
Dimensionality 13
Features continuous

Read more in the User Guide.

Parameters:return_X_y (boolean, default=False.) – If True, returns (data, target). See below for more information about the data and target object.
Returns:(data, target)
Return type:tuple if return_X_y is True
s3l.datasets.base.load_spambase(return_X_y=False)[source]

Load and return the spambase dataset (classification).

The spambase dataset is a classic and very easy binary classification dataset.

Classes 2
Samples per class [1813,2788]
Samples total 4601
Dimensionality 57
Features class_1, class_-1

Read more in the User Guide.

Parameters:return_X_y (boolean, default=False.) – If True, returns (data, target). See below for more information about the data and target object.
Returns:(data, target)
Return type:tuple if return_X_y is True
s3l.datasets.base.load_house(return_X_y=False)[source]

Load and return the house dataset (classification).

The house dataset is a classic and very easy binary classification dataset.

Classes 2
Samples per class [108,124]
Samples total 232
Dimensionality 16
Features class_1, class_-1

Read more in the User Guide.

Parameters:return_X_y (boolean, default=False.) – If True, returns (data, target). See below for more information about the data and target object.
Returns:(data, target)
Return type:tuple if return_X_y is True
s3l.datasets.base.load_clean1(return_X_y=False)[source]

Load and return the clean1 dataset (classification).

The clean1 dataset is a classic and very easy binary classification dataset.

Classes 2
Samples per class [207,269]
Samples total 476
Dimensionality 166
Features class_1, class_-1

Read more in the User Guide.

Parameters:return_X_y (boolean, default=False.) – If True, returns (data, target). See below for more information about the data and target object.
Returns:(data, target)
Return type:tuple if return_X_y is True
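
The per-class counts listed for the classification loaders can be cross-checked against the "Samples total" rows and turned into a class-balance figure. The sketch below only uses numbers copied from this page; the variable names are illustrative.

```python
# Per-class sample counts copied from the tables above.
class_counts = {
    "breast_cancer": [212, 357],
    "wine": [59, 71, 48],
    "ionosphere": [225, 126],
    "australian": [307, 383],
    "bupa": [145, 200],
    "haberman": [225, 81],
    "vehicle": [137, 148, 168, 143],
    "covtype": [297711, 283301],
    "spambase": [1813, 2788],
    "house": [108, 124],
    "clean1": [207, 269],
}

# Totals should reproduce the "Samples total" rows.
totals = {name: sum(counts) for name, counts in class_counts.items()}

# Share of the largest class: the accuracy of always predicting that class.
majority_share = {
    name: max(counts) / sum(counts) for name, counts in class_counts.items()
}
```

The majority share is a useful baseline when comparing classifiers on the more imbalanced sets such as haberman.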
s3l.datasets.base.load_dataset(name=None, feature_file=None, label_file=None)[source]

Load data from a self-contained (bundled) data set or from user-provided files. The bundled data set is looked up first, using the given name. If the name is empty or does not match a bundled data set, the data is loaded from the provided paths instead.

Parameters:
  • name (string, optional (default=None)) – The name of a data set in the self-contained data list.
  • feature_file (string, optional (default=None)) –

    The absolute path of the user-provided feature dataset. The file should be in ‘.csv’ format and organized as follows:

    feature_name: [1, n_features]
    data: [m_samples, n_features]
  • label_file (string, optional (default=None)) –

    The absolute path of the user-provided label dataset. The file should be in ‘.csv’ format and organized as follows:

    label_name: [1, n_labels]
    label: [m_samples, n_labels]

    The number of rows in label_file must equal the number of rows in feature_file.

Returns:

  • X (array-like) – Data matrix of shape [m_samples, n_features]. This data is used to train models.
  • y (array-like) – Labels of the loaded data, of shape [m_samples, n_labels].
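
The lookup order described for load_dataset (bundled name first, user files second) can be sketched as a small dispatcher. Everything here is hypothetical: resolve_dataset, the builtin registry and the from_files helper are stand-ins, and raising ValueError when neither source is available is an assumption, since the text does not say what happens in that case.

```python
def resolve_dataset(name, builtin_loaders, load_from_files,
                    feature_file=None, label_file=None):
    """Try a bundled dataset first, then fall back to user files."""
    if name and name in builtin_loaders:
        return builtin_loaders[name]()
    if feature_file is None:
        # Assumed behaviour: fail loudly when nothing can be loaded.
        raise ValueError(
            "no bundled dataset named %r and no feature_file given" % name
        )
    return load_from_files(feature_file, label_file)


# Hypothetical stand-ins so the sketch runs on its own.
builtin = {"toy": lambda: ([[0.0, 1.0]], [[1]])}


def from_files(feature_file, label_file):
    return ("loaded", feature_file, label_file)


bundled = resolve_dataset("toy", builtin, from_files)
fallback = resolve_dataset("unknown", builtin, from_files,
                           feature_file="/tmp/X.csv", label_file="/tmp/y.csv")
```

Keeping the name lookup ahead of the file paths means a recognised name always wins, even when paths are also supplied.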