Creating Instances

In this tutorial, we will learn about Ivory's internal instance creation system. Understanding it is worthwhile because it explains how to write a YAML file for machine learning.

Basic Idea

The syntax for creating an instance resembles a dictionary.

example = ExampleClass(arg1=123, arg2='abc')

can be equivalently written as

{'example': {'class': 'ExampleClass', 'arg1': 123, 'arg2': 'abc'}}

Ivory uses exactly this relationship.

from ivory.core.instance import create_instance

params = {'data': {'class': 'rectangle.data.Data', 'n_splits': 5}}
data = create_instance(params, 'data')
data

[2] 2020-06-20 15:23:40 (5.00ms) python3 (7.32s)

Data(train_size=834, test_size=166)

Here, create_instance() requires the second argument name to specify a key because the first argument params can have multiple keys. Note that we set the n_splits parameter to 5, which differs from the default value of 4. Let's look at the unique values of fold.

import numpy as np

np.unique(data.fold)  # 5-fold for train and 1-fold for test.

[3] 2020-06-20 15:23:40 (4.00ms) python3 (7.32s)

array([-1,  0,  1,  2,  3,  4], dtype=int8)
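The dictionary-to-instance correspondence described above can be illustrated with a few lines of plain Python. This is only a minimal sketch, not Ivory's actual implementation: the instantiate() helper is hypothetical, and fractions.Fraction from the standard library stands in for a user-defined class.

```python
import importlib


def instantiate(params, name):
    """Build the object described by params[name] (illustrative only)."""
    kwargs = dict(params[name])        # copy so the caller's dict is untouched
    path = kwargs.pop('class')         # dotted path, e.g. 'fractions.Fraction'
    module_name, sep, class_name = path.rpartition('.')
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**kwargs)               # remaining keys become keyword arguments


params = {'half': {'class': 'fractions.Fraction', 'numerator': 1, 'denominator': 2}}
half = instantiate(params, 'half')
print(half)
```

As with create_instance(), the second argument selects which top-level key of the dictionary to build.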

To write dictionaries easily, we use the PyYAML library in this tutorial.

import yaml

# A helper function.
def create(doc, name, **kwargs):
    params = yaml.safe_load(doc)
    return create_instance(params, name, **kwargs)

doc = """
data:
  class: rectangle.data.Data
  n_splits: 5
"""
create(doc, 'data')

[4] 2020-06-20 15:23:40 (7.00ms) python3 (7.33s)

Data(train_size=834, test_size=166)

Hierarchical Structure

Next, create a Dataset instance. The Dataset class requires a Data instance as its first argument, so the corresponding dictionary has a hierarchical structure.

doc = """
dataset:
  class: ivory.core.data.Dataset
  data:
    class: rectangle.data.Data
    n_splits: 5
  mode: train
  fold: 0
"""
create(doc, 'dataset')

[5] 2020-06-20 15:23:40 (6.00ms) python3 (7.34s)

Dataset(mode='train', num_samples=667)

As you can see, Ivory handles this hierarchical structure correctly. Next, create a Datasets instance.

doc = """
datasets:
  class: ivory.core.data.Datasets
  data:
    class: rectangle.data.Data
    n_splits: 5
  dataset:
    def: ivory.core.data.Dataset
  fold: 0
"""
create(doc, 'datasets')

[6] 2020-06-20 15:23:40 (6.00ms) python3 (7.34s)

Datasets(data=Data(train_size=834, test_size=166), dataset=<class 'ivory.core.data.Dataset'>, fold=0)

Remember that the dataset argument of the Datasets class is not an instance but a callable that returns a Dataset instance (see the previous section). To express this, we use a new def key instead of a class key: def yields the callable itself rather than an instance.
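The difference between a nested class key and a def key can be sketched in plain Python. The load() and build() helpers below are hypothetical and only approximate the behavior described in this section; standard-library classes stand in for Data, Dataset, and Datasets.

```python
import importlib


def load(path):
    """Import the object named by a dotted path such as 'datetime.date'."""
    module_name, sep, attr = path.rpartition('.')
    return getattr(importlib.import_module(module_name), attr)


def build(value):
    """Recursively turn nested dictionaries into objects (illustrative only)."""
    if not isinstance(value, dict):
        return value                   # plain values pass through unchanged
    if 'def' in value:
        return load(value['def'])      # return the callable itself, uncalled
    if 'class' in value:
        kwargs = {k: build(v) for k, v in value.items() if k != 'class'}
        return load(value['class'])(**kwargs)
    return {k: build(v) for k, v in value.items()}


spec = {
    'class': 'types.SimpleNamespace',
    'start': {'class': 'datetime.date', 'year': 2020, 'month': 6, 'day': 20},
    'factory': {'def': 'datetime.timedelta'},
    'fold': 0,
}
obj = build(spec)
print(obj.start)              # an instance built from the nested 'class' entry
print(obj.factory(days=1))    # 'def' gave us the class, which we call ourselves
```

Inner dictionaries are built before the outer one, so the outer constructor receives finished objects (or, for def, an uncalled callable) as its keyword arguments.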

Default Class

In the above example, the two lines that name Ivory's own classes are a little verbose. Ivory adds a default class if the class or def key is missing. Here is the list of default classes provided by Ivory:

from ivory.core.default import DEFAULT_CLASS

for library, values in DEFAULT_CLASS.items():
    print(f'library: {library}')
    for name, value in values.items():
        print("    ", name, "---", value)

[7] 2020-06-20 15:23:40 (80.7ms) python3 (7.42s)

library: core
     client --- ivory.core.client.Client
     tracker --- ivory.core.tracker.Tracker
     tuner --- ivory.core.tuner.Tuner
     experiment --- ivory.core.base.Experiment
     objective --- ivory.core.objective.Objective
     run --- ivory.core.run.Run
     task --- ivory.core.run.Task
     study --- ivory.core.run.Study
     data --- ivory.core.data.Data
     dataset --- ivory.core.data.Dataset
     datasets --- ivory.core.data.Datasets
     results --- ivory.callbacks.results.Results
     metrics --- ivory.callbacks.metrics.Metrics
     monitor --- ivory.callbacks.monitor.Monitor
     early_stopping --- ivory.callbacks.early_stopping.EarlyStopping
library: torch
     run --- ivory.torch.run.Run
     dataset --- ivory.torch.data.Dataset
     results --- ivory.torch.results.Results
     metrics --- ivory.torch.metrics.Metrics
     trainer --- ivory.torch.trainer.Trainer
library: tensorflow
     run --- ivory.tensorflow.run.Run
     trainer --- ivory.tensorflow.trainer.Trainer
library: nnabla
     results --- ivory.callbacks.results.BatchResults
     metrics --- ivory.nnabla.metrics.Metrics
     trainer --- ivory.nnabla.trainer.Trainer
library: sklearn
     estimator --- ivory.sklearn.estimator.Estimator
     metrics --- ivory.sklearn.metrics.Metrics

Therefore, we can omit the lines that use default classes, as shown below. Here, the library key is used to override the default classes of the ivory.core package with those of the specified library.

import torch.utils.data

doc = """
library: torch  # Use default class for PyTorch.
datasets:
  data:
    class: rectangle.data.Data
    n_splits: 5
  dataset:
  fold: 0
"""
datasets = create(doc, 'datasets')
isinstance(datasets.train, torch.utils.data.Dataset)

[8] 2020-06-20 15:23:40 (7.00ms) python3 (7.43s)

True
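One way to picture the override is a lookup that starts from the core defaults and lets the selected library replace individual entries. The sketch below is an assumption about the lookup rule, not Ivory's code: DEFAULTS copies only a few entries from the table printed above, and default_class() is a hypothetical helper.

```python
# A small excerpt of the default-class table printed above.
DEFAULTS = {
    'core': {
        'datasets': 'ivory.core.data.Datasets',
        'dataset': 'ivory.core.data.Dataset',
    },
    'torch': {
        'dataset': 'ivory.torch.data.Dataset',
    },
}


def default_class(name, library='core'):
    """Return the default class path for a key, preferring the chosen library."""
    # Entries of the chosen library replace the core entries of the same name.
    merged = {**DEFAULTS['core'], **DEFAULTS.get(library, {})}
    return merged.get(name)


core_dataset = default_class('dataset')
torch_dataset = default_class('dataset', library='torch')
torch_datasets = default_class('datasets', library='torch')
print(core_dataset)
print(torch_dataset)
print(torch_datasets)
```

Under this assumed rule, a key that the library does not redefine, such as datasets, falls back to the core default.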

Default Value

If a callable has arguments with default values, you can use __default__ to take the default value from the callable's signature.

doc = """
datasets:
  data:
    class: rectangle.data.Data
    n_splits: __default__
  dataset:
  fold: 0
"""
datasets = create(doc, 'datasets')
datasets.data.n_splits

[9] 2020-06-20 15:23:40 (6.00ms) python3 (7.44s)

4
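Reading a default from a signature can be done with the standard inspect module. The following is a minimal sketch of the idea, assuming a placeholder string is swapped for the declared default; fill_defaults() and make_data() are hypothetical names, and this is not necessarily how Ivory implements it.

```python
import inspect


def fill_defaults(func, kwargs):
    """Replace '__default__' placeholders with defaults from func's signature."""
    parameters = inspect.signature(func).parameters
    resolved = {}
    for key, value in kwargs.items():
        if isinstance(value, str) and value == '__default__':
            default = parameters[key].default
            if default is inspect.Parameter.empty:
                raise ValueError(f'{key!r} has no default value')
            value = default
        resolved[key] = value
    return resolved


def make_data(n_splits=4, shuffle=True):  # hypothetical stand-in for a Data class
    return {'n_splits': n_splits, 'shuffle': shuffle}


kwargs = fill_defaults(make_data, {'n_splits': '__default__', 'shuffle': False})
print(make_data(**kwargs))
```

Explicit values such as shuffle pass through untouched; only the placeholder is replaced.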

Positional Arguments

Do you know the name of the first argument of numpy.array()?

import numpy as np

print(np.array.__doc__[:200])

[10] 2020-06-20 15:23:40 (4.00ms) python3 (7.44s)

array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)

    Create an array.

    Parameters
    ----------
    object : array_like
        An array, any object exposing the array inter

It's object. But do you want to write it like this?

doc = """
x:
  class: numpy.array  # Or `call` instead of `class`.
  object: [1, 2, 3]
"""
create(doc, 'x')

[11] 2020-06-20 15:23:40 (4.00ms) python3 (7.44s)

array([1, 2, 3])

This is inconvenient and ugly. Use underscore notation instead:

doc = """
x:
  class: numpy.array
  _: [1, 2, 3]
"""
create(doc, 'x')

[12] 2020-06-20 15:23:40 (4.00ms) python3 (7.45s)

array([1, 2, 3])

The second argument of numpy.array() is dtype. You can also use a double underscore, whose list value is unpacked into positional arguments.

doc = """
x:
  call: numpy.array
  __: [[1, 2, 3], 'float']
"""
create(doc, 'x')

[13] 2020-06-20 15:23:40 (4.00ms) python3 (7.45s)

array([1., 2., 3.])
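The two underscore keys map naturally onto Python's positional-argument syntax. Here is a minimal sketch of that mapping, assuming `_` carries a single positional value and `__` carries a list that is unpacked; call_with_positional() is a hypothetical helper, and built-in functions are used in place of numpy.array().

```python
import importlib


def call_with_positional(spec):
    """Call the target named in spec, honouring '_' and '__' (illustrative only)."""
    kwargs = dict(spec)
    path = kwargs.pop('call')
    module_name, sep, attr = path.rpartition('.')
    func = getattr(importlib.import_module(module_name), attr)
    args = []
    if '_' in kwargs:
        args.append(kwargs.pop('_'))    # single underscore: one positional value
    if '__' in kwargs:
        args.extend(kwargs.pop('__'))   # double underscore: unpacked like *args
    return func(*args, **kwargs)        # any remaining keys stay keyword arguments


ordered = call_with_positional({'call': 'builtins.sorted', '_': [3, 1, 2], 'reverse': True})
stepped = call_with_positional({'call': 'builtins.range', '__': [1, 10, 2]})
print(ordered)
print(list(stepped))
```

With `_`, the whole list is passed as one argument; with `__`, each element becomes its own positional argument.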