base¶

Base IO code for all datasets

s3l.datasets.base.get_data_home(data_home=None)[source]¶

Return the path of the data dir.

This folder is used by some large dataset loaders to avoid downloading the data several times.

If the folder does not already exist, it is automatically created.

Parameters:	data_home (str | None) – The path to the data directory.
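The create-if-missing behaviour described above can be sketched as follows. This is a minimal illustration, not the s3l implementation: `sketch_get_data_home` and the default folder name `s3l_data_sketch` are hypothetical, since the source does not say which default location the library uses.

```python
import os
import tempfile


def sketch_get_data_home(data_home=None):
    """Hypothetical stand-in for get_data_home; not the s3l implementation."""
    if data_home is None:
        # Assumed default location, chosen only for this sketch.
        data_home = os.path.join(tempfile.gettempdir(), "s3l_data_sketch")
    data_home = os.path.expanduser(data_home)
    # Create the folder if it does not already exist.
    os.makedirs(data_home, exist_ok=True)
    return data_home


# The "cache" sub-folder does not exist yet, so the call creates it.
demo_home = sketch_get_data_home(os.path.join(tempfile.mkdtemp(), "cache"))
```

Calling the function a second time with the same path simply returns it, because `exist_ok=True` makes the creation step idempotent.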

s3l.datasets.base.clear_data_home(data_home=None)[source]¶

Delete all the content of the data home cache.

Parameters:	data_home (str | None) – The path to the data directory.
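A cache-clearing step of this kind is commonly written with `shutil.rmtree`. The sketch below assumes that approach; `sketch_clear_data_home` is a hypothetical name, and the source does not say whether s3l removes the folder itself or only its contents.

```python
import os
import shutil
import tempfile


def sketch_clear_data_home(data_home):
    """Hypothetical stand-in for clear_data_home; not the s3l implementation."""
    # Remove the folder and everything beneath it; a missing folder is ignored.
    shutil.rmtree(data_home, ignore_errors=True)


# Build a throwaway cache folder containing one file, then clear it.
cache_dir = tempfile.mkdtemp()
with open(os.path.join(cache_dir, "cached.csv"), "w") as handle:
    handle.write("a,b\n1,2\n")

sketch_clear_data_home(cache_dir)
cache_removed = not os.path.exists(cache_dir)
```

The sketch requires an explicit path on purpose, so that running it cannot delete a real cache by accident.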

s3l.datasets.base.load_data(feature_file=None, label_file=None)[source]¶

Load data from absolute path.

Parameters:

feature_file (string, optional (default=None)) –
The absolute path of the user-provided feature dataset. The file should be in ‘.csv’ format and organized as follows:

feature_name: [1, n_features]

data: [m_samples, n_features]

When the feature matrix is sparse, the file should be in ‘*.mat’ or ‘*.npz’ format.
label_file (string, optional (default=None)) –
The absolute path of the user-provided label dataset. The file should be in ‘.csv’ format and organized as follows:

label_name: [1, n_labels]

label: [m_samples, n_labels]

When the label matrix is sparse, the file should be in ‘*.mat’ or ‘*.npz’ format.

The number of rows in label_file must equal the number of rows in feature_file.

Returns:

X (array-like) – Data matrix of shape [m_samples, n_features]. The data will be used to train models.
y (array-like) – Labels of the loaded data, of shape [m_samples, n_labels].
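To make the expected CSV layout concrete, the following sketch writes a feature file and a label file with one header row each, reads them back with the standard `csv` module, and checks that the row counts agree. The helper names and file names are hypothetical, and the parsing shown here is only an illustration of the layout, not how `load_data` itself reads files.

```python
import csv
import os
import tempfile


def write_csv(path, header, rows):
    """Write one header row followed by the data rows."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv(path):
    """Return (header, rows), with every data value parsed as float."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = [[float(value) for value in row] for row in reader]
    return header, rows


workdir = tempfile.mkdtemp()  # mkdtemp returns an absolute path
feature_path = os.path.join(workdir, "features.csv")
label_path = os.path.join(workdir, "labels.csv")

# 3 samples, 2 features, 1 label column.
write_csv(feature_path, ["f1", "f2"], [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
write_csv(label_path, ["y"], [[0], [1], [0]])

feature_names, X = read_csv(feature_path)
label_names, y = read_csv(label_path)

# The two files must describe the same samples, row for row.
if len(X) != len(y):
    raise ValueError("feature_file and label_file must have the same number of rows")
```

Files laid out this way could then be passed by absolute path, for example `load_data(feature_file=feature_path, label_file=label_path)`.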

s3l.datasets.base.load_graph(name, graph_file=None)[source]¶

Load a graph from a self-contained (built-in) data set or from a user-provided data set. The built-in data set matching the given name is tried first. If the name is empty or does not match any built-in data set, the graph is loaded from the provided path.

Parameters:	name (string, optional (default=None)) – Name of a data set in the self-contained data list.
graph_file (string, optional (default=None)) – The absolute path of the user-provided graph dataset. The file should be in ‘*.csv’, ‘*.mat’ or ‘*.npz’ format.
Returns:	W
Return type:	np.nda

s3l.datasets.base.load_boston(return_X_y=False)[source]¶

Load and return the boston house-prices dataset (regression).

Samples total	506
Dimensionality	13
Features	real, positive
Targets	real 5. - 50.

Parameters:	return_X_y (boolean, default=False.) – If True, returns `(data, target)` instead of a Bunch object. See below for more information about the data and target object.
Returns:	data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the regression targets, and ‘DESCR’, the full description of the dataset. (data, target) (tuple if `return_X_y` is True)
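The two return modes shared by the loaders on this page can be illustrated with a toy loader. `SketchBunch` and `sketch_loader` are hypothetical names and the data is made up; the sketch only shows the dictionary-like object versus the `(data, target)` tuple, not the actual s3l classes.

```python
class SketchBunch(dict):
    """Dictionary whose keys are also readable as attributes (illustrative)."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


def sketch_loader(return_X_y=False):
    """Toy loader mimicking the return_X_y switch with made-up data."""
    data = [[1.0, 2.0], [3.0, 4.0]]
    target = [10.0, 20.0]
    if return_X_y:
        return data, target
    return SketchBunch(data=data, target=target, DESCR="toy dataset")


bunch = sketch_loader()                 # dictionary-like object
X, y = sketch_loader(return_X_y=True)   # plain (data, target) tuple
```

With the dictionary-like object, `bunch.data` and `bunch["data"]` refer to the same values, which is what "dictionary-like" means in the Returns descriptions.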

s3l.datasets.base.load_diabetes(return_X_y=False)[source]¶

Load and return the diabetes dataset (regression).

Samples total	442
Dimensionality	10
Features	real, -.2 < x < .2
Targets	integer 25 - 346

Read more in the User Guide.

Parameters:	return_X_y (boolean, default=False.) – If True, returns `(data, target)` instead of a Bunch object. See below for more information about the data and target object.
Returns:	data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn and ‘target’, the regression target for each sample. (data, target) (tuple if `return_X_y` is True)

s3l.datasets.base.load_digits(return_X_y=False)[source]¶

Load and return the digits dataset (classification).

Each datapoint is an 8x8 image of a digit.

Classes	10
Samples per class	~180
Samples total	1797
Dimensionality	64
Features	integers 0-16

Read more in the User Guide.

Parameters:	return_X_y (boolean, default=False.) – If True, returns `(data, target)` instead of a Bunch object. See below for more information about the data and target object.
Returns:	data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘images’, the images corresponding to each sample, ‘target’, the classification labels for each sample, ‘target_names’, the meaning of the labels, and ‘DESCR’, the full description of the dataset. (data, target) (tuple if `return_X_y` is True)

This is a copy of the test set of the UCI ML hand-written digits dataset: http://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits
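The relationship between the 8x8 images and the dimensionality of 64 can be shown with plain lists. The sketch assumes, as in scikit-learn's digits loader, that each row of ‘data’ is the row-by-row flattening of the matching ‘images’ entry; the source does not state this explicitly for s3l, and the pixel values below are invented.

```python
# An 8x8 image as nested lists; pixel values are made up for illustration
# and stay inside the documented 0-16 range.
image = [[(row + col) % 17 for col in range(8)] for row in range(8)]

# Flatten row by row into a 64-element feature vector.
flat = [pixel for row in image for pixel in row]

# Rebuild the 8x8 grid from the flat vector.
rebuilt = [flat[i * 8:(i + 1) * 8] for i in range(8)]
```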

s3l.datasets.base.load_iris(return_X_y=False)[source]¶

Load and return the iris dataset (classification).

The iris dataset is a classic and very easy multi-class classification dataset.

Classes	3
Samples per class	50
Samples total	150
Dimensionality	4
Features	real, positive

Read more in the User Guide.

Parameters:	return_X_y (boolean, default=False.) – If True, returns `(data, target)` instead of a Bunch object. See below for more information about the data and target object.
Returns:	data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset. (data, target) (tuple if `return_X_y` is True)

s3l.datasets.base.load_breast_cancer(return_X_y=False)[source]¶

Load and return the breast cancer wisconsin dataset (classification).

The breast cancer dataset is a classic and very easy binary classification dataset.

Classes	2
Samples per class	212(M),357(B)
Samples total	569
Dimensionality	30
Features	real, positive

Parameters:	return_X_y (boolean, default=False) – If True, returns `(data, target)` instead of a Bunch object. See below for more information about the data and target object.
Returns:	data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset. (data, target) (tuple if `return_X_y` is True)

The copy of the UCI ML Breast Cancer Wisconsin (Diagnostic) dataset was downloaded from: https://goo.gl/U2Uwz2

s3l.datasets.base.load_linnerud(return_X_y=False)[source]¶

Load and return the linnerud dataset (multivariate regression).

Samples total	20
Dimensionality	3 (for both data and target)
Features	integer
Targets	integer

Parameters:	return_X_y (boolean, default=False.) – If True, returns `(data, target)` instead of a Bunch object. See below for more information about the data and target object.
Returns:	data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’ and ‘targets’, the two multivariate datasets, with ‘data’ corresponding to the exercise and ‘targets’ corresponding to the physiological measurements, as well as ‘feature_names’ and ‘target_names’. (data, target) (tuple if `return_X_y` is True)

s3l.datasets.base.load_wine(return_X_y=False)[source]¶

Load and return the wine dataset (classification).

The wine dataset is a classic and very easy multi-class classification dataset.

Classes	3
Samples per class	[59,71,48]
Samples total	178
Dimensionality	13
Features	real, positive

Read more in the User Guide.

Parameters:	return_X_y (boolean, default=False.) – If True, returns `(data, target)` instead of a Bunch object. See below for more information about the data and target object.
Returns:	data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset. (data, target) (tuple if `return_X_y` is True)

The copy of the UCI ML Wine data set was downloaded from the following URL and modified to fit the standard format: https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data

s3l.datasets.base.load_ionosphere(return_X_y=False)[source]¶

Load and return the ionosphere dataset (classification).

The ionosphere dataset is a classic and very easy binary classification dataset.

Classes	2
Samples per class	[225,126]
Samples total	351
Dimensionality	34
Features	good, bad

Read more in the User Guide.

Parameters:	return_X_y (boolean, default=False.) – If True, returns `(data, target)` instead of a Bunch object. See below for more information about the data and target object.
Returns:	data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset. (data, target) (tuple if `return_X_y` is True)

The copy of the UCI ML ionosphere data set was downloaded from the following URL and modified to fit the standard format: https://archive.ics.uci.edu/ml/machine-learning-databases/ionosphere/ionosphere.data

s3l.datasets.base.load_australian(return_X_y=False)[source]¶

Load and return the australian dataset (classification).

The australian dataset is a classic and very easy binary classification dataset.

Classes	2
Samples per class	[307,383]
Samples total	690
Dimensionality	14
Features	class_1, class_0

Read more in the User Guide.

Parameters:	return_X_y (boolean, default=False.) – If True, returns `(data, target)` instead of a Bunch object. See below for more information about the data and target object.
Returns:	data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset. (data, target) (tuple if `return_X_y` is True)

The copy of the UCI ML australian data set was downloaded from the following URL and modified to fit the standard format: https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/australian/australian.dat

s3l.datasets.base.load_bupa(return_X_y=False)[source]¶

Load and return the bupa dataset (classification).

The bupa dataset is a classic and very easy binary classification dataset.

Classes	2
Samples per class	[145,200]
Samples total	345
Dimensionality	6
Features	class_1, class_0

Read more in the User Guide.

Parameters:	return_X_y (boolean, default=False.) – If True, returns `(data, target)` instead of a Bunch object. See below for more information about the data and target object.
Returns:	data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset. (data, target) (tuple if `return_X_y` is True)

The copy of the UCI ML bupa data set was downloaded from the following URL and modified to fit the standard format: https://archive.ics.uci.edu/ml/machine-learning-databases/liver-disorders/bupa.data

s3l.datasets.base.load_haberman(return_X_y=False)[source]¶

Load and return the haberman dataset (classification).

The haberman dataset is a classic and very easy binary classification dataset.

Classes	2
Samples per class	[225,81]
Samples total	306
Dimensionality	3
Features	class_1, class_2

Read more in the User Guide.

Parameters:	return_X_y (boolean, default=False.) – If True, returns `(X, y)` instead of a Bunch object. See below for more information about the X and y object.
Returns:	data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset. (data, target) (tuple if `return_X_y` is True)

The copy of the UCI ML haberman data set was downloaded from the following URL and modified to fit the standard format: https://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data

s3l.datasets.base.load_vehicle(return_X_y=False)[source]¶

Load and return the vehicle dataset (classification).

The vehicle dataset is a classic and very easy multi-class classification dataset.

Classes	4
Samples per class	[137,148,168,143]
Samples total	596
Dimensionality	18
Features	class_1, class_2

Read more in the User Guide.

Parameters:	return_X_y (boolean, default=False.) – If True, returns `(data, target)` instead of a Bunch object. See below for more information about the data and target object.
Returns:	data (Bunch) – Dictionary-like object, the interesting attributes are: ‘data’, the data to learn, ‘target’, the classification labels, ‘target_names’, the meaning of the labels, ‘feature_names’, the meaning of the features, and ‘DESCR’, the full description of the dataset. (data, target) (tuple if `return_X_y` is True)

The copy of the libsvm vehicle data set was downloaded from the following URL and modified to fit the standard format: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/vehicle.scale

In addition, samples with missing data were dropped.

s3l.datasets.base.load_covtype(return_X_y=False)[source]¶

Load and return the covtype dataset (classification).

The covtype dataset is a classic and very easy binary classification dataset.

Classes	2
Samples per class	[297711,283301]
Samples total	581012
Dimensionality	54
Features	class_1, class_-1

Read more in the User Guide.

Parameters:	return_X_y (boolean, default=True.) – If True, returns `(data, target)`. See below for more information about the data and target object.
Returns:	(data, target)
Return type:	tuple if `return_X_y` is True

s3l.datasets.base.load_housing10(return_X_y=False)[source]¶

Load and return the housing10 dataset (classification).

The housing10 dataset is a classic and very easy multi-class classification dataset.

Classes
Samples per class
Samples total	506
Dimensionality	13
Features	continuous

Read more in the User Guide.

Parameters:	return_X_y (boolean, default=True.) – If True, returns `(data, target)`. See below for more information about the data and target object.
Returns:	(data, target)
Return type:	tuple if `return_X_y` is True

s3l.datasets.base.load_spambase(return_X_y=False)[source]¶

Load and return the spambase dataset (classification).

The spambase dataset is a classic and very easy binary classification dataset.

Classes	2
Samples per class	[1813,2788]
Samples total	4601
Dimensionality	57
Features	class_1, class_-1

Read more in the User Guide.

Parameters:	return_X_y (boolean, default=True.) – If True, returns `(data, target)`. See below for more information about the data and target object.
Returns:	(data, target)
Return type:	tuple if `return_X_y` is True

s3l.datasets.base.load_house(return_X_y=False)[source]¶

Load and return the house dataset (classification).

The house dataset is a classic and very easy binary classification dataset.

Classes	2
Samples per class	[108,124]
Samples total	232
Dimensionality	16
Features	class_1, class_-1

Read more in the User Guide.

Parameters:	return_X_y (boolean, default=True.) – If True, returns `(data, target)`. See below for more information about the data and target object.
Returns:	(data, target)
Return type:	tuple if `return_X_y` is True

s3l.datasets.base.load_clean1(return_X_y=False)[source]¶

Load and return the clean1 dataset (classification).

The clean1 dataset is a classic and very easy binary classification dataset.

Classes	2
Samples per class	[207,269]
Samples total	476
Dimensionality	166
Features	class_1, class_-1

Read more in the User Guide.

Parameters:	return_X_y (boolean, default=True.) – If True, returns `(data, target)`. See below for more information about the data and target object.
Returns:	(data, target)
Return type:	tuple if `return_X_y` is True

s3l.datasets.base.load_dataset(name=None, feature_file=None, label_file=None)[source]¶

Load data from a self-contained (built-in) data set or from a user-provided data set. The built-in data set matching the given name is tried first. If the name is empty or does not match any built-in data set, the data is loaded from the provided paths.

Parameters:

name (string, optional (default=None)) – Name of a data set in the self-contained data list.
feature_file (string, optional (default=None)) –
The absolute path of the user-provided feature dataset. The file should be in ‘.csv’ format and organized as follows:

feature_name: [1, n_features]

data: [m_samples, n_features]
label_file (string, optional (default=None)) –
The absolute path of the user-provided label dataset. The file should be in ‘.csv’ format and organized as follows:

label_name: [1, n_labels]

label: [m_samples, n_labels]

The number of rows in label_file must equal the number of rows in feature_file.

Returns:

X (array-like) – Data matrix of shape [m_samples, n_features]. The data will be used to train models.
y (array-like) – Labels of the loaded data, of shape [m_samples, n_labels].
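The lookup order described for load_dataset (built-in name first, file paths as the fallback) can be sketched as below. Everything in the block is hypothetical: the function name, the `builtin` registry, the `file_loader` callback and the error raised when nothing matches are assumptions made for illustration, not the library's actual internals.

```python
def sketch_load_dataset(name=None, feature_file=None, label_file=None,
                        builtin=None, file_loader=None):
    """Hypothetical dispatch: try the built-in name, then fall back to files."""
    builtin = builtin or {}
    # 1. A known built-in name wins.
    if name and name in builtin:
        return builtin[name]()
    # 2. Otherwise both file paths are needed (error behaviour is assumed).
    if feature_file is None or label_file is None:
        raise ValueError("unknown name and no feature_file/label_file given")
    return file_loader(feature_file, label_file)


# Toy registry with one built-in data set returning (X, y).
BUILTIN = {"toy": lambda: ([[0.0], [1.0]], [0, 1])}


def fake_file_loader(feature_file, label_file):
    """Stand-in for reading files; just echoes the paths it was given."""
    return ("loaded", feature_file, label_file)


from_name = sketch_load_dataset("toy", builtin=BUILTIN,
                                file_loader=fake_file_loader)
from_path = sketch_load_dataset("missing", "/abs/X.csv", "/abs/y.csv",
                                builtin=BUILTIN, file_loader=fake_file_loader)
```

Here `from_name` comes from the registry even though no paths were given, while `from_path` falls through to the file loader because "missing" is not a registered name.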