base
Base I/O code for all datasets.
-
s3l.datasets.base.get_data_home(data_home=None)
Return the path of the data directory.
This folder is used by some large dataset loaders to avoid downloading the data several times.
If the folder does not already exist, it is automatically created.
Parameters: data_home (str | None) – The path to the data directory.
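The behaviour described above can be sketched with the standard library: resolve a directory, create it if it is missing, and return its path. This is a minimal sketch, not the s3l implementation. The function name `get_data_home_sketch` and the fallback directory `~/s3l_data` are hypothetical assumptions, because the real default location is not documented here.

```python
import os


def get_data_home_sketch(data_home=None):
    """Return the path of the data directory, creating it if necessary.

    Hypothetical stand-in for s3l.datasets.base.get_data_home; the default
    location "~/s3l_data" is an assumption, not taken from s3l.
    """
    if data_home is None:
        # Assumed default location; the real default may differ.
        data_home = os.path.join("~", "s3l_data")
    # Expand "~" so the path points into the user's home directory.
    data_home = os.path.expanduser(data_home)
    # Create the folder only if it does not already exist.
    os.makedirs(data_home, exist_ok=True)
    return data_home
```

Passing `exist_ok=True` makes repeated calls harmless, which matches the "created only if it does not already exist" behaviour.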
-
s3l.datasets.base.clear_data_home(data_home=None)
Delete all the content of the data home cache.
Parameters: data_home (str | None) – The path to the data directory.
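Clearing the cache amounts to removing the cache directory tree. This minimal sketch assumes that `shutil.rmtree` is an acceptable way to do that. The name `clear_data_home_sketch` is hypothetical, and the sketch omits `None` handling, so the caller must pass the directory explicitly. Note that it removes the directory itself along with its content.

```python
import os
import shutil


def clear_data_home_sketch(data_home):
    """Delete the data home cache directory and everything inside it.

    Hypothetical stand-in for s3l.datasets.base.clear_data_home; resolving
    a default location when data_home is None is not modelled here.
    """
    data_home = os.path.expanduser(data_home)
    # ignore_errors=True makes clearing an already-missing cache a no-op.
    shutil.rmtree(data_home, ignore_errors=True)
```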
-
s3l.datasets.base.load_data(feature_file=None, label_file=None)
Load data from files given by absolute paths.
Parameters: - feature_file (string, optional (default=None)) –
The absolute path of the user-provided feature dataset. The file should be in ‘.csv’ format and organized as follows:
feature_name: [1, n_features]
data: [m_samples, n_features]
When the features form a sparse matrix, the file should be in ‘.mat’ or ‘.npz’ format.
- label_file (string, optional (default=None)) –
The absolute path of the user-provided label dataset. The file should be in ‘.csv’ format and organized as follows:
label_name: [1, n_labels]
label: [m_samples, n_labels]
When the labels form a sparse matrix, the file should be in ‘.mat’ or ‘.npz’ format.
The number of rows in label_file must equal the number of rows in feature_file.
Returns: - X (array-like) – Data matrix of shape [m_samples, n_features], used to train models.
- y (array-like) – Labels of the loaded data, of shape [m_samples, n_labels].
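The following sketch makes the expected ‘.csv’ layout concrete. It writes a small feature file and a small label file, then reads them back with Python's standard `csv` module. It is a minimal sketch under two assumptions: the first row holds the names, and every later row holds one sample. The helpers `read_csv_matrix` and `load_data_sketch` are hypothetical. They cover only the ‘.csv’ case, not the sparse ‘.mat’/‘.npz’ formats.

```python
import csv
import os
import tempfile


def read_csv_matrix(path):
    """Read a CSV whose first row holds column names; return (names, rows)."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        names = next(reader)  # assumed header row of feature/label names
        rows = [[float(v) for v in row] for row in reader if row]
    return names, rows


def load_data_sketch(feature_file, label_file):
    """Hypothetical stand-in for load_data, restricted to the '.csv' case."""
    _, X = read_csv_matrix(feature_file)
    _, y = read_csv_matrix(label_file)
    # The label file must have as many rows as the feature file.
    if len(X) != len(y):
        raise ValueError(f"feature_file has {len(X)} rows but label_file has {len(y)}")
    return X, y


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        feature_path = os.path.join(tmp, "features.csv")
        label_path = os.path.join(tmp, "labels.csv")
        with open(feature_path, "w", newline="") as fh:
            csv.writer(fh).writerows([["f1", "f2"], [0.5, 1.0], [1.5, 2.0], [2.5, 3.0]])
        with open(label_path, "w", newline="") as fh:
            csv.writer(fh).writerows([["label"], [0], [1], [1]])
        X, y = load_data_sketch(feature_path, label_path)
        print(len(X), len(X[0]), len(y))  # 3 2 3
```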
-
s3l.datasets.base.load_graph(name, graph_file=None)
Load a graph from a self-contained data set or a user-provided data set. The self-contained data set matching the provided name is loaded first; if the name is empty or does not exist in the self-contained data list, the graph is loaded from the provided path.
Parameters: - name (string) – The name of a data set in the self-contained data list; may be empty.
- graph_file (string, optional (default=None)) – The absolute path of the user-provided graph file, in ‘*.csv’, ‘*.mat’ or ‘*.npz’ format.
Returns: W, the loaded graph.
Return type: np.nda
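The lookup order described above can be sketched as follows: a matching built-in name wins, and otherwise the function falls back to the file. This minimal sketch rests on several assumptions. `BUILTIN_GRAPHS` is a hypothetical stand-in for the self-contained data list. The CSV is assumed to have no header row. W is assumed to be a square n x n matrix over the graph's nodes. Only the ‘.csv’ case is handled.

```python
import csv

# Hypothetical stand-in for the self-contained data list.
BUILTIN_GRAPHS = {
    "toy_triangle": [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
}


def load_graph_sketch(name, graph_file=None):
    """Return the graph matrix W following the documented lookup order."""
    # 1. A non-empty name that matches a built-in graph is used first.
    if name and name in BUILTIN_GRAPHS:
        return [row[:] for row in BUILTIN_GRAPHS[name]]
    # 2. Otherwise fall back to the user-provided file (CSV case only).
    if graph_file is None:
        raise ValueError("unknown graph name and no graph_file given")
    with open(graph_file, newline="") as fh:
        W = [[float(v) for v in row] for row in csv.reader(fh) if row]
    # Assumed: a graph over n nodes is stored as an n x n matrix.
    if any(len(row) != len(W) for row in W):
        raise ValueError("graph matrix must be square")
    return W


if __name__ == "__main__":
    print(load_graph_sketch("toy_triangle")[0])  # [0.0, 1.0, 1.0]
```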
-
s3l.datasets.base.load_boston(return_X_y=False)
Load and return the Boston house-prices dataset (regression).
Samples total: 506
Dimensionality: 13
Features: real, positive
Targets: real, 5. - 50.
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns: - data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the regression targets, and ‘DESCR’, the full description of the dataset.
- (data, target) (tuple, if return_X_y is True)
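The `return_X_y` switch selects between two return shapes: a dictionary-like Bunch, or a plain tuple. The minimal sketch below illustrates that convention. Its `Bunch` class is modelled on scikit-learn's Bunch, and it is an assumption that s3l's Bunch behaves the same way. The loader `load_toy_regression` and its numbers are hypothetical and synthetic.

```python
class Bunch(dict):
    """Dictionary whose keys can also be read as attributes (assumed behaviour)."""

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails; fall back to the dict.
        try:
            return self[key]
        except KeyError as exc:
            raise AttributeError(key) from exc


def load_toy_regression(return_X_y=False):
    """Hypothetical loader illustrating the return_X_y convention."""
    data = [[0.1, 2.0], [0.3, 1.5], [0.2, 1.8]]  # synthetic feature rows
    target = [1.0, 2.0, 3.0]                     # synthetic regression targets
    if return_X_y:
        # Plain tuple, convenient for direct unpacking into X, y.
        return data, target
    return Bunch(data=data, target=target, DESCR="Toy dataset for illustration.")


bunch = load_toy_regression()
print(bunch.target == bunch["target"])  # True
X, y = load_toy_regression(return_X_y=True)
print(len(X), len(y))  # 3 3
```

The Bunch form keeps metadata such as ‘DESCR’ next to the arrays. The tuple form suits code that only needs X and y.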
-
s3l.datasets.base.load_diabetes(return_X_y=False)
Load and return the diabetes dataset (regression).
Samples total: 442
Dimensionality: 10
Features: real, -.2 < x < .2
Targets: integer, 25 - 346
Read more in the User Guide.
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns: - data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn and ‘target’, the regression target for each sample.
- (data, target) (tuple, if return_X_y is True)
-
s3l.datasets.base.load_digits(return_X_y=False)
Load and return the digits dataset (classification).
Each datapoint is an 8x8 image of a digit.
Classes: 10
Samples per class: ~180
Samples total: 1797
Dimensionality: 64
Features: integers 0-16
Read more in the User Guide.
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns: - data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘images’, the images corresponding to each sample, ‘target’, the classification labels for each sample, ‘target_names’, the meaning of the labels, and ‘DESCR’, the full description of the dataset.
- (data, target) (tuple, if return_X_y is True)
This is a copy of the test set of the UCI ML hand-written digits datasets http://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits
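Each 64-column row of ‘data’ corresponds to one 8x8 entry of ‘images’. The sketch below converts between the two forms in plain Python. It assumes row-major (row-by-row) pixel order, as in scikit-learn's copy of this dataset. The helper names `to_image` and `to_flat` are hypothetical, and the sample is synthetic.

```python
def to_image(row, size=8):
    """Reshape a flat sample of length size*size into size rows of size pixels."""
    if len(row) != size * size:
        raise ValueError(f"expected {size * size} values, got {len(row)}")
    # Row-major order: pixels 0..7 form the first image row, 8..15 the second, ...
    return [row[i * size:(i + 1) * size] for i in range(size)]


def to_flat(image):
    """Concatenate image rows back into a single feature vector."""
    return [pixel for r in image for pixel in r]


sample = list(range(64))          # synthetic stand-in for one 'data' row
image = to_image(sample)
print(len(image), len(image[0]))  # 8 8
print(image[1][0])                # 8
```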
-
s3l.datasets.base.load_iris(return_X_y=False)
Load and return the iris dataset (classification).
The iris dataset is a classic and very easy multi-class classification dataset.
Classes: 3
Samples per class: 50
Samples total: 150
Dimensionality: 4
Features: real, positive
Read more in the User Guide.
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns: - data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset.
- (data, target) (tuple, if return_X_y is True)
-
s3l.datasets.base.load_breast_cancer(return_X_y=False)
Load and return the breast cancer Wisconsin dataset (classification).
The breast cancer dataset is a classic and very easy binary classification dataset.
Classes: 2
Samples per class: 212 (M), 357 (B)
Samples total: 569
Dimensionality: 30
Features: real, positive
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns: - data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset.
- (data, target) (tuple, if return_X_y is True)
This copy of the UCI ML Breast Cancer Wisconsin (Diagnostic) dataset is downloaded from: https://goo.gl/U2Uwz2
-
s3l.datasets.base.load_linnerud(return_X_y=False)
Load and return the linnerud dataset (multivariate regression).
Samples total: 20
Dimensionality: 3 (for both data and target)
Features: integer
Targets: integer
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns: - data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’ and ‘targets’, the two multivariate datasets, with ‘data’ corresponding to the exercise and ‘targets’ corresponding to the physiological measurements, as well as ‘feature_names’ and ‘target_names’.
- (data, target) (tuple, if return_X_y is True)
-
s3l.datasets.base.load_wine(return_X_y=False)
Load and return the wine dataset (classification).
The wine dataset is a classic and very easy multi-class classification dataset.
Classes: 3
Samples per class: [59, 71, 48]
Samples total: 178
Dimensionality: 13
Features: real, positive
Read more in the User Guide.
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns: - data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset.
- (data, target) (tuple, if return_X_y is True)
This copy of the UCI ML Wine Data Set is downloaded and modified to fit the standard format from: https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data
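The ‘Samples per class’ entry counts how many samples carry each label in ‘target’. The sketch below checks such a summary with `collections.Counter`. It assumes the wine classes are encoded as 0, 1 and 2, because the documentation above does not state the encoding. The label vector is synthetic, built only to match the listed counts.

```python
from collections import Counter


def samples_per_class(target):
    """Return per-class sample counts, ordered by sorted class label."""
    counts = Counter(target)
    return [counts[label] for label in sorted(counts)]


# Synthetic label vector matching the wine counts listed above.
target = [0] * 59 + [1] * 71 + [2] * 48
print(samples_per_class(target))  # [59, 71, 48]
print(len(target))                # 178
```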
-
s3l.datasets.base.load_ionosphere(return_X_y=False)
Load and return the ionosphere dataset (classification).
The ionosphere dataset is a classic and very easy binary classification dataset.
Classes: 2
Samples per class: [225, 126]
Samples total: 351
Dimensionality: 34
Class labels: good, bad
Read more in the User Guide.
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns: - data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset.
- (data, target) (tuple, if return_X_y is True)
This copy of the UCI ML Ionosphere Data Set is downloaded and modified to fit the standard format from: https://archive.ics.uci.edu/ml/machine-learning-databases/ionosphere/ionosphere.data
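The raw UCI file stores each sample as one comma-separated line. The sketch below shows one way to split such a line into features and a label, as a step toward a standard (data, target) layout. It assumes the UCI layout of 34 numeric attributes followed by a class letter, ‘g’ (good) or ‘b’ (bad). The numeric encoding in `label_map` is hypothetical, since s3l's own encoding is not documented here. The helper name and the sample row are also illustrative.

```python
def parse_ionosphere_line(line, label_map=None):
    """Split one comma-separated raw row into (features, label)."""
    if label_map is None:
        # Hypothetical encoding of the class letters.
        label_map = {"g": 1, "b": 0}
    # The last field is the class letter; everything before it is numeric.
    *values, label = line.strip().split(",")
    if len(values) != 34:
        raise ValueError(f"expected 34 attributes, got {len(values)}")
    return [float(v) for v in values], label_map[label]


# Synthetic row (not taken from the real data file): 34 zeros, class 'g'.
raw = ",".join(["0"] * 34) + ",g"
features, label = parse_ionosphere_line(raw)
print(len(features), label)  # 34 1
```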
-
s3l.datasets.base.load_australian(return_X_y=False)
Load and return the australian dataset (classification).
The australian dataset is a classic and very easy binary classification dataset.
Classes: 2
Samples per class: [307, 383]
Samples total: 690
Dimensionality: 14
Class labels: class_1, class_0
Read more in the User Guide.
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns: - data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset.
- (data, target) (tuple, if return_X_y is True)
This copy of the UCI ML australian Data Set is downloaded and modified to fit the standard format from: https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/australian/australian.dat
-
s3l.datasets.base.load_bupa(return_X_y=False)
Load and return the bupa dataset (classification).
The bupa dataset is a classic and very easy binary classification dataset.
Classes: 2
Samples per class: [145, 200]
Samples total: 345
Dimensionality: 6
Class labels: class_1, class_0
Read more in the User Guide.
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns: - data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset.
- (data, target) (tuple, if return_X_y is True)
This copy of the UCI ML bupa Data Set is downloaded and modified to fit the standard format from: https://archive.ics.uci.edu/ml/machine-learning-databases/liver-disorders/bupa.data
-
s3l.datasets.base.load_haberman(return_X_y=False)
Load and return the haberman dataset (classification).
The haberman dataset is a classic and very easy binary classification dataset.
Classes: 2
Samples per class: [225, 81]
Samples total: 306
Dimensionality: 3
Class labels: class_1, class_2
Read more in the User Guide.
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns: - data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset.
- (data, target) (tuple, if return_X_y is True)
This copy of the UCI ML haberman Data Set is downloaded and modified to fit the standard format from: https://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data
-
s3l.datasets.base.load_vehicle(return_X_y=False)
Load and return the vehicle dataset (classification).
The vehicle dataset is a classic and very easy multi-class classification dataset.
Classes: 4
Samples per class: [137, 148, 168, 143]
Samples total: 596
Dimensionality: 18
Features: class_1, class_2
Read more in the User Guide.
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
Returns: - data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset.
- (data, target) (tuple, if return_X_y is True)
This copy of the libsvm vehicle Data Set is downloaded and modified to fit the standard format from: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/vehicle.scale
Samples with missing data were dropped.
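The source file uses the LIBSVM text format. Each line holds one sample: a label followed by `index:value` pairs with 1-based indices, and omitted indices are implicitly 0. The minimal sketch below parses such a line into a dense row. The helper name `parse_libsvm_line` is hypothetical. The sketch does not reproduce the missing-data filtering mentioned above, because the rule for which samples counted as missing is not documented.

```python
def parse_libsvm_line(line, n_features):
    """Parse 'label idx:val idx:val ...' into (dense_row, label).

    LIBSVM indices are 1-based and omitted entries are implicitly 0.0.
    """
    label, *pairs = line.split()
    row = [0.0] * n_features
    for pair in pairs:
        index, value = pair.split(":")
        row[int(index) - 1] = float(value)  # convert 1-based index to 0-based
    return row, int(float(label))


row, label = parse_libsvm_line("3 1:0.5 3:-1", n_features=4)
print(row, label)  # [0.5, 0.0, -1.0, 0.0] 3
```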
-
s3l.datasets.base.load_covtype(return_X_y=False)
Load and return the covtype dataset (classification).
The covtype dataset is a classic and very easy binary classification dataset.
Classes: 2
Samples per class: [297711, 283301]
Samples total: 581012
Dimensionality: 54
Class labels: class_1, class_-1
Read more in the User Guide.
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target). See below for more information about the data and target object.
Returns: (data, target)
Return type: tuple if return_X_y is True
-
s3l.datasets.base.load_housing10(return_X_y=False)
Load and return the housing10 dataset (classification).
The housing10 dataset is a classic and very easy multi-class classification dataset.
Samples total: 506
Dimensionality: 13
Features: continuous
Read more in the User Guide.
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target). See below for more information about the data and target object.
Returns: (data, target)
Return type: tuple if return_X_y is True
-
s3l.datasets.base.load_spambase(return_X_y=False)
Load and return the spambase dataset (classification).
The spambase dataset is a classic and very easy binary classification dataset.
Classes: 2
Samples per class: [1813, 2788]
Samples total: 4601
Dimensionality: 57
Class labels: class_1, class_-1
Read more in the User Guide.
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target). See below for more information about the data and target object.
Returns: (data, target)
Return type: tuple if return_X_y is True
-
s3l.datasets.base.load_house(return_X_y=False)
Load and return the house dataset (classification).
The house dataset is a classic and very easy binary classification dataset.
Classes: 2
Samples per class: [108, 124]
Samples total: 232
Dimensionality: 16
Class labels: class_1, class_-1
Read more in the User Guide.
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target). See below for more information about the data and target object.
Returns: (data, target)
Return type: tuple if return_X_y is True
-
s3l.datasets.base.load_clean1(return_X_y=False)
Load and return the clean1 dataset (classification).
The clean1 dataset is a classic and very easy binary classification dataset.
Classes: 2
Samples per class: [207, 269]
Samples total: 476
Dimensionality: 166
Class labels: class_1, class_-1
Read more in the User Guide.
Parameters: return_X_y (boolean, default=False) – If True, returns (data, target). See below for more information about the data and target object.
Returns: (data, target)
Return type: tuple if return_X_y is True
-
s3l.datasets.base.load_dataset(name=None, feature_file=None, label_file=None)
Load data from a self-contained data set or a user-provided data set. The self-contained data set matching the provided name is loaded first; if the name is empty or does not exist in the self-contained data list, the data set is loaded from the provided paths.
Parameters: - name (string, optional (default=None)) – The name of a data set in the self-contained data list.
- feature_file (string, optional (default=None)) –
The absolute path of the user-provided feature dataset. The file should be in ‘.csv’ format and organized as follows:
feature_name: [1, n_features]
data: [m_samples, n_features]
- label_file (string, optional (default=None)) –
The absolute path of the user-provided label dataset. The file should be in ‘.csv’ format and organized as follows:
label_name: [1, n_labels]
label: [m_samples, n_labels]
The number of rows in label_file must equal the number of rows in feature_file.
Returns: - X (array-like) – Data matrix of shape [m_samples, n_features], used to train models.
- y (array-like) – Labels of the loaded data, of shape [m_samples, n_labels].
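The resolution order described above can be sketched as: try the name against the built-in list first, then fall back to the two files. This is a minimal sketch, not the s3l implementation, and it makes the following assumptions:

- `BUILTIN_DATASETS` is a hypothetical stand-in for the self-contained data list.
- Each ‘.csv’ file starts with one header row of names.
- Only the ‘.csv’ case is handled.

```python
import csv

# Hypothetical registry standing in for the self-contained data list.
BUILTIN_DATASETS = {
    "toy": ([[0.0, 1.0], [1.0, 0.0]], [[0], [1]]),
}


def _read_csv_without_header(path):
    """Read a CSV, skip the assumed header row of names, parse the rest as floats."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        next(reader)
        return [[float(v) for v in row] for row in reader if row]


def load_dataset_sketch(name=None, feature_file=None, label_file=None):
    """Resolve a dataset in the documented order: built-in name, then files."""
    # 1. A non-empty name that matches a built-in data set is used first.
    if name and name in BUILTIN_DATASETS:
        X, y = BUILTIN_DATASETS[name]
        return [r[:] for r in X], [r[:] for r in y]
    # 2. Otherwise both user-provided files are required.
    if feature_file is None or label_file is None:
        raise ValueError("unknown dataset name and no files provided")
    X = _read_csv_without_header(feature_file)
    y = _read_csv_without_header(label_file)
    if len(X) != len(y):
        raise ValueError("feature_file and label_file row counts differ")
    return X, y
```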