Creating Instances
In this tutorial, we will learn about Ivory's internal instance creation system. Understanding it is worthwhile because it explains how to write a YAML file for machine learning.
Basic idea
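The syntax for creating an instance resembles that of a dictionary: a call with keyword arguments can be written equivalently as a dictionary that holds the class name together with those arguments. Ivory uses exactly this relationship.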
from ivory.core.instance import create_instance
params = {'data': {'class': 'rectangle.data.Data', 'n_splits': 5}}
data = create_instance(params, 'data')
data
[2] 2020-06-20 15:23:40 (5.00ms) python3 (7.32s)
Data(train_size=834, test_size=166)
Here, create_instance() requires the second argument, name, to specify a key because the first argument, params, can have multiple keys. Note that we set the n_splits parameter explicitly to 5, whereas the default value is 4. Let's look at the unique values of fold.
import numpy as np
np.unique(data.fold) # 5-fold for train and 1-fold for test.
[3] 2020-06-20 15:23:40 (4.00ms) python3 (7.32s)
array([-1, 0, 1, 2, 3, 4], dtype=int8)
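The relationship between a constructor call and a dictionary can be sketched with the standard library alone. The following is a minimal illustration, not Ivory's actual implementation: the helper names import_object and create_from_dict are hypothetical, and datetime.date and datetime.timedelta stand in for user-defined classes.

```python
import datetime
import importlib


def import_object(path):
    """Resolve a dotted path such as 'datetime.date' to the object it names."""
    module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def create_from_dict(params, name):
    """Build an instance from params[name] (hypothetical helper).

    The 'class' entry names the class; every other entry becomes a
    keyword argument of the constructor call.
    """
    kwargs = dict(params[name])  # Copy so the caller's dictionary is untouched.
    cls = import_object(kwargs.pop("class"))
    return cls(**kwargs)


# `params` may hold several keys, so `name` selects which one to instantiate.
params = {
    "start": {"class": "datetime.date", "year": 2020, "month": 6, "day": 20},
    "step": {"class": "datetime.timedelta", "days": 7},
}
start = create_from_dict(params, "start")
step = create_from_dict(params, "step")
print(start, step)
```

Copying the inner dictionary before popping the class key keeps params reusable, which matters when the same configuration is instantiated more than once.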
To write dictionaries more easily, we use the PyYAML library in this tutorial.
import yaml
# A helper function.
def create(doc, name, **kwargs):
    params = yaml.safe_load(doc)
    return create_instance(params, name, **kwargs)
doc = """
data:
  class: rectangle.data.Data
  n_splits: 5
"""
create(doc, 'data')
[4] 2020-06-20 15:23:40 (7.00ms) python3 (7.33s)
Data(train_size=834, test_size=166)
Hierarchical Structure
Next, create a Dataset instance. The Dataset class requires a Data instance as its first argument, so the corresponding dictionary has a hierarchical structure.
doc = """
dataset:
  class: ivory.core.data.Dataset
  data:
    class: rectangle.data.Data
    n_splits: 5
  mode: train
  fold: 0
"""
create(doc, 'dataset')
[5] 2020-06-20 15:23:40 (6.00ms) python3 (7.34s)
Dataset(mode='train', num_samples=667)
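One way such nested dictionaries could be handled is a recursive walk that instantiates inner dictionaries before the outer one. The sketch below is an assumption about the general technique, not Ivory's code: instantiate and import_object are hypothetical names, and the standard-library classes datetime.timezone and datetime.timedelta play the roles of the outer and inner classes.

```python
import datetime
import importlib


def import_object(path):
    """Resolve a dotted path such as 'datetime.timezone' to an object."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def instantiate(value):
    """Recursively turn dictionaries that have a 'class' key into instances."""
    if not isinstance(value, dict) or "class" not in value:
        return value  # Plain values (numbers, strings, ...) pass through.
    # Instantiate the arguments first, then the object that needs them.
    kwargs = {k: instantiate(v) for k, v in value.items() if k != "class"}
    return import_object(value["class"])(**kwargs)


# The outer timezone needs a timedelta instance as its `offset` argument.
params = {
    "class": "datetime.timezone",
    "offset": {"class": "datetime.timedelta", "hours": 9},
    "name": "JST",
}
tz = instantiate(params)
print(tz.utcoffset(None))
```

Because arguments are resolved depth-first, the nesting can be arbitrarily deep without any change to the function.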
Ivory handles this hierarchical structure correctly. Next, create a Datasets instance.
doc = """
datasets:
  class: ivory.core.data.Datasets
  data:
    class: rectangle.data.Data
    n_splits: 5
  dataset:
    def: ivory.core.data.Dataset
  fold: 0
"""
create(doc, 'datasets')
[6] 2020-06-20 15:23:40 (6.00ms) python3 (7.34s)
Datasets(data=Data(train_size=834, test_size=166), dataset=<class 'ivory.core.data.Dataset'>, fold=0)
Remember that the dataset argument of the Datasets class is not an instance but a callable that returns a Dataset instance (see the previous section). To describe this behavior, we use a new def key, instead of the class key, to create a callable.
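The difference between the class and def keys can be illustrated with a small sketch. It is a guess at the general idea rather than Ivory's implementation: the create function here is hypothetical, and binding extra arguments with functools.partial is an assumption made for illustration.

```python
import datetime
import functools
import importlib


def import_object(path):
    """Resolve a dotted path such as 'datetime.timedelta' to an object."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def create(params):
    """Return an instance for a 'class' key, or a callable for a 'def' key."""
    kwargs = dict(params)
    if "def" in kwargs:
        func = import_object(kwargs.pop("def"))
        # Assumption: with no extra arguments the callable is returned as is;
        # otherwise the arguments are bound and the call is deferred.
        return functools.partial(func, **kwargs) if kwargs else func
    return import_object(kwargs.pop("class"))(**kwargs)


instance = create({"class": "datetime.timedelta", "hours": 1})  # Called now.
factory = create({"def": "datetime.timedelta"})  # The class itself.
bound = create({"def": "datetime.timedelta", "hours": 1})  # Called later.
print(instance, factory, bound(minutes=30))
```

Returning a callable lets the receiving object decide when, and with which remaining arguments, the instance is finally created.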
Default Class
In the above example, the two lines that name Ivory's own classes are somewhat verbose. Ivory supplies a default class if the class or def key is missing.
Here is the list of default classes prepared by Ivory:
from ivory.core.default import DEFAULT_CLASS
for library, values in DEFAULT_CLASS.items():
    print(f'library: {library}')
    for name, value in values.items():
        print(" ", name, "---", value)
[7] 2020-06-20 15:23:40 (80.7ms) python3 (7.42s)
library: core
client --- ivory.core.client.Client
tracker --- ivory.core.tracker.Tracker
tuner --- ivory.core.tuner.Tuner
experiment --- ivory.core.base.Experiment
objective --- ivory.core.objective.Objective
run --- ivory.core.run.Run
task --- ivory.core.run.Task
study --- ivory.core.run.Study
data --- ivory.core.data.Data
dataset --- ivory.core.data.Dataset
datasets --- ivory.core.data.Datasets
results --- ivory.callbacks.results.Results
metrics --- ivory.callbacks.metrics.Metrics
monitor --- ivory.callbacks.monitor.Monitor
early_stopping --- ivory.callbacks.early_stopping.EarlyStopping
library: torch
run --- ivory.torch.run.Run
dataset --- ivory.torch.data.Dataset
results --- ivory.torch.results.Results
metrics --- ivory.torch.metrics.Metrics
trainer --- ivory.torch.trainer.Trainer
library: tensorflow
run --- ivory.tensorflow.run.Run
trainer --- ivory.tensorflow.trainer.Trainer
library: nnabla
results --- ivory.callbacks.results.BatchResults
metrics --- ivory.nnabla.metrics.Metrics
trainer --- ivory.nnabla.trainer.Trainer
library: sklearn
estimator --- ivory.sklearn.estimator.Estimator
metrics --- ivory.sklearn.metrics.Metrics
Therefore, we can omit the lines that use default classes, as shown below. Here, the library key is used to override the default classes of the ivory.core package with those of the specified library.
import torch.utils.data
doc = """
library: torch  # Use default class for PyTorch.
datasets:
  data:
    class: rectangle.data.Data
    n_splits: 5
  dataset:
  fold: 0
"""
datasets = create(doc, 'datasets')
isinstance(datasets.train, torch.utils.data.Dataset)
[8] 2020-06-20 15:23:40 (7.00ms) python3 (7.43s)
True
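The lookup of a default class by key name can be pictured as a two-level table search. The sketch below copies a few entries from the table printed above. The default_class function is hypothetical, and the fallback from the library table to the core table is an assumption that fits the example, where datasets has no torch-specific entry.

```python
# A small excerpt of the default-class table, stored as dotted paths.
DEFAULTS = {
    "core": {
        "dataset": "ivory.core.data.Dataset",
        "datasets": "ivory.core.data.Datasets",
    },
    "torch": {
        "dataset": "ivory.torch.data.Dataset",
    },
}


def default_class(name, library="core"):
    """Look up `name` in the library's table first, then fall back to core."""
    for table in (DEFAULTS.get(library, {}), DEFAULTS["core"]):
        if name in table:
            return table[name]
    raise KeyError(f"No default class for {name!r}")


print(default_class("dataset"))
print(default_class("dataset", library="torch"))
print(default_class("datasets", library="torch"))
```

With this ordering, a library only needs to list the classes it actually replaces.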
Default Value
If a callable has arguments with default values, you can use __default__ to take the default value from the callable's signature.
doc = """
datasets:
  data:
    class: rectangle.data.Data
    n_splits: __default__
  dataset:
  fold: 0
"""
datasets = create(doc, 'datasets')
datasets.data.n_splits
[9] 2020-06-20 15:23:40 (6.00ms) python3 (7.44s)
4
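Reading a default value from a signature is possible with the standard inspect module. The following sketch shows the idea under stated assumptions: make_data is a hypothetical stand-in for a class such as Data, and resolve_defaults is an illustrative helper, not part of Ivory.

```python
import inspect


def make_data(n_splits=4, shuffle=True):
    """Hypothetical stand-in for a class whose arguments have defaults."""
    return {"n_splits": n_splits, "shuffle": shuffle}


def resolve_defaults(func, kwargs):
    """Replace every '__default__' marker with the default from func's signature."""
    signature = inspect.signature(func)
    resolved = {}
    for key, value in kwargs.items():
        if isinstance(value, str) and value == "__default__":
            value = signature.parameters[key].default
        resolved[key] = value
    return resolved


kwargs = resolve_defaults(make_data, {"n_splits": "__default__", "shuffle": False})
data = make_data(**kwargs)
print(kwargs)
```

Making the default explicit in this way records the value actually used, which is handy when parameters are logged or compared across runs.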
Positional Arguments
Do you know the name of the first argument of numpy.array()?
import numpy as np
print(np.array.__doc__[:200])
[10] 2020-06-20 15:23:40 (4.00ms) python3 (7.44s)
array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
Create an array.
Parameters
----------
object : array_like
An array, any object exposing the array inter
It's object. But do you want to write it like this?
doc = """
x:
  class: numpy.array  # Or `call` instead of `class`.
  object: [1, 2, 3]
"""
create(doc, 'x')
[11] 2020-06-20 15:23:40 (4.00ms) python3 (7.44s)
array([1, 2, 3])
This is inconvenient and ugly. Use underscore notation instead:
doc = """
x:
  class: numpy.array
  _: [1, 2, 3]
"""
create(doc, 'x')
[12] 2020-06-20 15:23:40 (4.00ms) python3 (7.45s)
array([1, 2, 3])
The second argument of numpy.array() is dtype. You can also use a double underscore, whose value is a list that is unpacked into positional arguments.
doc = """
x:
  call: numpy.array
  __: [[1, 2, 3], 'float']
"""
create(doc, 'x')
[13] 2020-06-20 15:23:40 (4.00ms) python3 (7.45s)
array([1., 2., 3.])
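The single and double underscore keys can be thought of as a way to separate positional arguments from keyword arguments before the call. A minimal sketch of that idea follows. The call_with_positional helper is hypothetical, and the built-in functions sorted and divmod are used instead of numpy.array to keep the example dependency-free.

```python
def call_with_positional(func, params):
    """Call func, taking positional arguments from the '_' and '__' keys."""
    kwargs = dict(params)  # Copy so the caller's dictionary is untouched.
    args = []
    if "_" in kwargs:
        args.append(kwargs.pop("_"))  # One value, passed as the first argument.
    if "__" in kwargs:
        args.extend(kwargs.pop("__"))  # A list, unpacked into several arguments.
    return func(*args, **kwargs)


# '_' passes the whole list as a single positional argument.
descending = call_with_positional(sorted, {"_": [3, 1, 2], "reverse": True})
# '__' unpacks the list, so divmod receives two positional arguments.
quotient_remainder = call_with_positional(divmod, {"__": [7, 2]})
print(descending, quotient_remainder)
```

This form is also the only option for callables such as sorted and divmod, whose leading parameters cannot be passed by keyword at all.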