Set of Data classes
Ivory uses four classes for data presentation: Data, Dataset, Datasets, and DataLoaders. In this tutorial, we use the following Python module to explain them.
File 5 rectangle/data.py
Data Class
First import the module and check the basic behavior.
import rectangle.data
data = rectangle.data.Data()
data
[2] 2020-06-20 15:23:40 (5.00ms) python3 (7.21s)
Data(train_size=800, test_size=200)
In Data.init(), we need to define four attributes:
index: Index of samples.
input: Input data.
target: Target data.
fold: Fold number.
Data.get() returns a tuple of (index, input, target). This function is called from Dataset instances when the dataset is indexed.
data.get(0) # Integer index.
[3] 2020-06-20 15:23:40 (5.00ms) python3 (7.21s)
(0,
array([2.0772183, 4.9461417], dtype=float32),
array([10.418284], dtype=float32))
data.get([0, 10, 20]) # Array-like index. list or np.ndarray
[4] 2020-06-20 15:23:40 (4.00ms) python3 (7.22s)
(array([ 0, 10, 20]),
array([[2.0772183, 4.9461417],
[3.9508767, 1.6149015],
[1.3982536, 2.687021 ]], dtype=float32),
array([[10.418284 ],
[ 6.441745 ],
[ 3.8166006]], dtype=float32))
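Since the listing of rectangle/data.py is not reproduced here, the following is only a minimal sketch of a class that behaves like the Data instance above. It is not Ivory's implementation: the class name ToyData, the parameters n_samples and n_splits, the round-robin fold assignment, and the choice of target (product of the two inputs) are all assumptions made for illustration.

```python
import numpy as np


class ToyData:
    """Hypothetical stand-in for a Data class (not Ivory's implementation)."""

    def __init__(self, n_samples: int = 10, n_splits: int = 5):
        self.n_samples = n_samples
        self.n_splits = n_splits
        self.init()

    def init(self):
        rng = np.random.default_rng(0)
        # index: one identifier per sample.
        self.index = np.arange(self.n_samples)
        # input: two features per sample.
        self.input = rng.uniform(1, 5, size=(self.n_samples, 2)).astype(np.float32)
        # target: one value per sample (assumed here: product of the features).
        self.target = self.input.prod(axis=1, keepdims=True)
        # fold: fold number of each sample (assumed here: round-robin).
        self.fold = self.index % self.n_splits

    def get(self, index):
        # NumPy indexing accepts an integer, a list, or an np.ndarray.
        return self.index[index], self.input[index], self.target[index]


data = ToyData()
idx, x, y = data.get(0)                            # integer index
batch_idx, batch_x, batch_y = data.get([0, 2, 4])  # array-like index
print(idx, x.shape, y.shape)
print(batch_idx, batch_x.shape, batch_y.shape)
```

Because get() simply forwards the index to NumPy, the same method serves both single samples and batches, which matches the two calls shown above.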
Dataset Class
An instance of the Dataset class holds one of the train, validation, or test datasets. We use Ivory's default Dataset here instead of defining a subclass. The Dataset() initializer requires three arguments: a Data instance, mode, and fold.
import ivory.core.data
dataset = ivory.core.data.Dataset(data, 'train', 0)
dataset
[5] 2020-06-20 15:23:40 (4.00ms) python3 (7.22s)
Dataset(mode='train', num_samples=600)
ivory.core.data.Dataset(data, 'val', 1) # Another mode is `test`.
[6] 2020-06-20 15:23:40 (3.00ms) python3 (7.23s)
Dataset(mode='val', num_samples=200)
Like Data, Dataset has an init() method that takes no arguments and returns nothing. You can put any code there to modify the data.
To get data from a Dataset instance, use normal indexing:
dataset[0] # Integer index.
[7] 2020-06-20 15:23:40 (4.00ms) python3 (7.23s)
(0,
array([2.0772183, 4.9461417], dtype=float32),
array([10.418284], dtype=float32))
dataset[[0, 10, 20]] # Array-like index. list or np.ndarray
[8] 2020-06-20 15:23:40 (4.00ms) python3 (7.23s)
(array([ 0, 16, 33]),
array([[2.0772183, 4.9461417],
[4.3623295, 3.6915543],
[1.6263498, 4.103133 ]], dtype=float32),
array([[10.418284],
[15.843957],
[ 6.581139]], dtype=float32))
index, *_ = dataset[:] # Get all data.
print(len(index))
index[:10]
[9] 2020-06-20 15:23:40 (5.00ms) python3 (7.24s)
600
array([ 0, 2, 3, 4, 6, 7, 10, 12, 13, 15])
These data are a subset of the Data instance, selected according to the mode and fold.
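One plausible way to picture this selection is with boolean masks over the fold array. The sketch below is an assumption, not Ivory's code: it supposes that test samples carry a fold number of -1, that the validation set is the samples whose fold equals the given fold, and that the training set is everything else. The function name select is hypothetical.

```python
import numpy as np

# Hypothetical fold numbers for ten samples; -1 marks test samples (assumption).
fold = np.array([0, 1, 2, 0, 1, 2, 0, 1, -1, -1])
index = np.arange(len(fold))


def select(mode: str, current_fold: int) -> np.ndarray:
    """Return the sample indices for a mode under the assumed convention."""
    is_test = fold == -1
    if mode == "test":
        mask = is_test
    elif mode == "val":
        mask = fold == current_fold
    else:  # "train": neither validation nor test
        mask = (fold != current_fold) & ~is_test
    return index[mask]


train_idx = select("train", 0)
val_idx = select("val", 0)
test_idx = select("test", 0)
print(train_idx, val_idx, test_idx)
```

Under this convention the three subsets are disjoint and together cover every sample, which is consistent with the 600, 200, and 200 sample counts reported in this tutorial.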
Dataset takes an optional callable argument: transform.
def transform(mode: str, input, target):
    if mode == 'train':
        input = input * 2
        target = target * 2
    return input, target
dataset_transformed = ivory.core.data.Dataset(data, 'train', 0, transform)
dataset_transformed[0]
[10] 2020-06-20 15:23:40 (5.00ms) python3 (7.24s)
(0,
array([4.1544366, 9.892283 ], dtype=float32),
array([20.836569], dtype=float32))
2 * dataset[0][1], 2 * dataset[0][2]
[11] 2020-06-20 15:23:40 (4.00ms) python3 (7.25s)
(array([4.1544366, 9.892283 ], dtype=float32),
array([20.836569], dtype=float32))
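To make the role of the mode argument concrete, here is a minimal sketch of a dataset wrapper that applies a transform when it is indexed. ToyDataset is a hypothetical stand-in and does not reproduce Ivory's Dataset; in particular, whether Ivory applies the transform per sample or per batch is not stated in this tutorial.

```python
import numpy as np


class ToyDataset:
    """Hypothetical wrapper that applies a transform on indexing."""

    def __init__(self, mode, input, target, transform=None):
        self.mode = mode
        self.input = input
        self.target = target
        self.transform = transform

    def __getitem__(self, index):
        input, target = self.input[index], self.target[index]
        if self.transform is not None:
            # The mode is passed through so the transform can branch on it.
            input, target = self.transform(self.mode, input, target)
        return index, input, target


def transform(mode, input, target):
    # Double the values in training mode only.
    if mode == "train":
        input = input * 2
        target = target * 2
    return input, target


x = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
y = np.array([[2.0], [12.0]], dtype=np.float32)

train = ToyDataset("train", x, y, transform)
val = ToyDataset("val", x, y, transform)
print(train[0])  # values doubled
print(val[0])    # values unchanged
```

Passing the mode to the transform lets one function cover training-only augmentation while leaving validation and test data untouched.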
Usually, we don't instantiate Dataset directly. Instead, the Datasets class creates the dataset instances.
Datasets Class
An instance of the Datasets class holds a set of train, validation, and test datasets. We use Ivory's default Datasets here instead of defining a subclass. The Datasets() initializer requires three arguments: a Data instance, a Dataset factory, and fold.
from ivory.core.data import Dataset
datasets = ivory.core.data.Datasets(data, Dataset, 0)
datasets
[12] 2020-06-20 15:23:40 (4.00ms) python3 (7.25s)
Datasets(data=Data(train_size=800, test_size=200), dataset=<class 'ivory.core.data.Dataset'>, fold=0)
Note
The second argument (dataset) is not a Dataset instance but a factory that returns one. It may be the Dataset class itself or any other callable that returns a Dataset instance.
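The idea of a factory can be illustrated without Ivory. The sketch below uses a hypothetical ToyDataset dataclass and assumes the factory is called with (data, mode, fold); that call signature is inferred from the Dataset() initializer described earlier and is not confirmed by this tutorial. It shows three interchangeable kinds of factory: the class itself, a functools.partial that pre-binds a transform, and an ordinary function.

```python
from dataclasses import dataclass
from functools import partial
from typing import Any, Callable, Optional


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a Dataset class."""
    data: Any
    mode: str
    fold: int
    transform: Optional[Callable] = None


def double(mode, input, target):
    return input * 2, target * 2


# Factory 1: the class itself.
factory_a = ToyDataset

# Factory 2: a partial that pre-binds the transform argument.
factory_b = partial(ToyDataset, transform=double)


# Factory 3: an ordinary function with mode-dependent logic.
def factory_c(data, mode, fold):
    transform = double if mode == "train" else None
    return ToyDataset(data, mode, fold, transform=transform)


# Each factory is called the same way and returns a dataset instance.
datasets = [f("raw data", "train", 0) for f in (factory_a, factory_b, factory_c)]
for dataset in datasets:
    print(dataset.mode, dataset.transform)
```

Accepting a factory rather than an instance lets the container create one dataset per mode while the caller still controls how each one is built.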
for mode, dataset in datasets.items():
    print(mode, dataset)
[13] 2020-06-20 15:23:40 (10.0ms) python3 (7.26s)
train Dataset(mode='train', num_samples=600)
val Dataset(mode='val', num_samples=200)
test Dataset(mode='test', num_samples=200)
Each dataset can be accessed by indexing or attributes.
datasets['train'], datasets.val
[14] 2020-06-20 15:23:40 (5.00ms) python3 (7.27s)
(Dataset(mode='train', num_samples=600), Dataset(mode='val', num_samples=200))
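Access by both key and attribute can be sketched with a small container class. ToyDatasets below is hypothetical and is not how Ivory's Datasets is necessarily written; it only shows one common Python pattern (__getitem__ plus __getattr__) that yields the behavior seen above.

```python
class ToyDatasets:
    """Hypothetical container with key and attribute access."""

    def __init__(self, **datasets):
        self._datasets = dict(datasets)

    def __getitem__(self, mode):
        return self._datasets[mode]

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._datasets[name]
        except KeyError:
            raise AttributeError(name) from None

    def items(self):
        return self._datasets.items()


datasets = ToyDatasets(train="train set", val="val set", test="test set")
print(datasets["train"], datasets.val)
for mode, dataset in datasets.items():
    print(mode, dataset)
```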
Using Datasets, we can easily split the whole data stored in a Data instance into train, validation, and test datasets.
DataLoaders Class
The DataLoaders class is used internally by the ivory.torch.trainer.Trainer or ivory.nnabla.trainer.Trainer classes to yield minibatches in the training loop.
from ivory.torch.data import DataLoaders
dataloaders = DataLoaders(datasets, batch_size=4, shuffle=True)
dataloaders
[15] 2020-06-20 15:23:40 (4.00ms) python3 (7.27s)
DataLoaders(['train', 'val', 'test'])
for mode, dataloader in dataloaders.items():
    print(mode, dataloader)
[16] 2020-06-20 15:23:40 (8.00ms) python3 (7.28s)
train <torch.utils.data.dataloader.DataLoader object at 0x00000140518D5408>
val <torch.utils.data.dataloader.DataLoader object at 0x00000140518D59C8>
test <torch.utils.data.dataloader.DataLoader object at 0x00000140518D5448>
next(iter(dataloaders.train)) # Shuffled
[17] 2020-06-20 15:23:40 (13.0ms) python3 (7.29s)
[tensor([110, 857, 0, 21], dtype=torch.int32),
tensor([[3.0064, 3.9603],
[1.2233, 2.2332],
[2.0772, 4.9461],
[4.4023, 2.1658]]),
tensor([[11.8943],
[ 2.7360],
[10.4183],
[ 9.3503]])]
next(iter(dataloaders.val)) # Not shuffled, regardless of `shuffle` argument
[18] 2020-06-20 15:23:40 (4.00ms) python3 (7.30s)
[tensor([ 1, 5, 8, 14], dtype=torch.int32),
tensor([[1.9011, 4.9406],
[2.8263, 2.8066],
[2.2066, 3.6569],
[4.0281, 1.4676]]),
tensor([[9.5829],
[8.0773],
[8.1560],
[5.7046]])]
next(iter(dataloaders.test)) # Not shuffled, regardless of `shuffle` argument
[19] 2020-06-20 15:23:40 (4.00ms) python3 (7.30s)
[tensor([ 9, 11, 19, 23], dtype=torch.int32),
tensor([[4.3053, 4.1206],
[1.8368, 3.9429],
[3.0406, 3.5035],
[3.2625, 4.5660]]),
tensor([[nan],
[nan],
[nan],
[nan]])]
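The batching behavior shown above (shuffling for the training set only, fixed order for validation and test) can be sketched in plain Python. This is an illustration of the idea, not the code used by Ivory or PyTorch; the function iter_batches and its seed parameter are hypothetical.

```python
import random


def iter_batches(indices, batch_size, mode, shuffle=True, seed=0):
    """Yield lists of sample indices, one list per minibatch."""
    order = list(indices)
    # Shuffle only in training mode, regardless of `shuffle` for other modes.
    if shuffle and mode == "train":
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


train_batches = list(iter_batches(range(10), 4, "train"))
val_batches = list(iter_batches([1, 5, 8, 14, 17, 18], 4, "val"))
print(train_batches)  # order depends on the seed
print(val_batches)    # original order is kept
```

Keeping validation and test batches in a fixed order makes predictions easy to align with sample indices across runs, while shuffling the training set changes the composition of minibatches from epoch to epoch.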