Ivory Core Entities

Client

Ivory provides the Client class, which manages the machine learning workflow. In this tutorial, we work with data and a model to predict the area of rectangles. The source module lives under the examples directory.

First, create a Client instance.

import ivory

client = ivory.create_client("examples")  # Set the working directory
client

[3] 2020-06-20 15:23:39 (6.00ms) python3 (6.53s)

Client(num_instances=2)
list(client)

[4] 2020-06-20 15:23:39 (4.00ms) python3 (6.53s)

['tracker', 'tuner']

The first instance, named tracker, is a Tracker instance that connects Ivory to MLflow Tracking. The second, named tuner, is a Tuner instance that connects Ivory to Optuna.
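
The instances should be retrievable from the client itself. The attribute access below is an assumption that mirrors the Run attribute access (run.model, run.optimizer) used later in this tutorial, not documented API:

# Assumption: a Client exposes its instances as attributes, just as a
# Run exposes run.model or run.optimizer later in this tutorial.
tracker = client.tracker  # The connection to MLflow Tracking.
tuner = client.tuner      # The connection to Optuna.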

Let's list the files in the working directory examples.

import os

os.listdir('examples')

[5] 2020-06-20 15:23:39 (4.00ms) python3 (6.53s)

['base.yml',
 'client.yml',
 'data.yml',
 'data2.yml',
 'lgb.yml',
 'mlruns',
 'nnabla.yml',
 'rectangle',
 'rfr.yml',
 'ridge.yml',
 'study.yml',
 'tensorflow.yml',
 'torch.yml',
 'torch2.yml']

rectangle is a Python package that contains our examples. YAML files with the extension .yml (or .yaml) are parameter files that define a machine learning workflow. Except for the client.yml file, each YAML file corresponds to one Experiment, as discussed later, and the file name without the extension becomes the experiment name. mlruns is a directory automatically created by MLflow Tracking, in which our trained models and callback instances are saved.

The client.yml file configures a Client instance. In our case, the file contains only the minimal settings.

File 7 client.yml

client:
  tracker:
  tuner:

Note

If you don't need any customization, the YAML file for the client is not required. If there is no file for the client, Ivory creates a default client with a tracker and a tuner. (So, the above file is unnecessary.)

If you don't need a tracker and/or a tuner, for example while debugging, use ivory.create_client(tracker=False, tuner=False).
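
As a minimal sketch, assuming the working-directory argument and these flags can be combined in a single call:

import ivory

# Create a client without a tracker or a tuner while debugging.
client = ivory.create_client("examples", tracker=False, tuner=False)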

Experiment

Client.create_experiment() creates an Experiment instance. If the Client instance has a tracker, a corresponding MLflow Tracking experiment is also created at the same time, unless it already exists, as the log line in the output below shows.

experiment = client.create_experiment('torch')  # Read torch.yml as params.
experiment

[6] 2020-06-20 15:23:39 (15.0ms) python3 (6.55s)

[I 200620 15:23:39 tracker:48] A new experiment created with name: 'torch'
Experiment(id='1', name='torch', num_instances=1)

The ID for this experiment was assigned by MLflow Tracking. Client.create_experiment() loads the YAML file corresponding to its first argument from the working directory.

File 8 torch.yml

library: torch
datasets:
  data:
    class: rectangle.data.Data
    n_splits: 4
  dataset:
  fold: 0
model:
  class: rectangle.torch.Model
  hidden_sizes: [20, 30]
optimizer:
  class: torch.optim.SGD
  params: $.model.parameters()
  lr: 1e-3
scheduler:
  class: torch.optim.lr_scheduler.ReduceLROnPlateau
  optimizer: $
  factor: 0.5
  patience: 4
results:
metrics:
monitor:
  metric: val_loss
early_stopping:
  patience: 10
trainer:
  loss: mse
  batch_size: 10
  epochs: 10
  shuffle: true
  verbose: 2

After loading, the Experiment instance sets up the parameters for creating runs later. The parameters are stored in the params attribute.

experiment.params

[7] 2020-06-20 15:23:39 (4.00ms) python3 (6.55s)

{'run': {'datasets': {'data': {'class': 'rectangle.data.Data', 'n_splits': 4},
   'dataset': {'def': 'ivory.torch.data.Dataset'},
   'fold': 0,
   'class': 'ivory.core.data.Datasets'},
  'model': {'class': 'rectangle.torch.Model', 'hidden_sizes': [20, 30]},
  'optimizer': {'class': 'torch.optim.SGD',
   'params': '$.model.parameters()',
   'lr': 0.001},
  'scheduler': {'class': 'torch.optim.lr_scheduler.ReduceLROnPlateau',
   'optimizer': '$',
   'factor': 0.5,
   'patience': 4},
  'results': {'class': 'ivory.torch.results.Results'},
  'metrics': {'class': 'ivory.torch.metrics.Metrics'},
  'monitor': {'metric': 'val_loss',
   'class': 'ivory.callbacks.monitor.Monitor'},
  'early_stopping': {'patience': 10,
   'class': 'ivory.callbacks.early_stopping.EarlyStopping'},
  'trainer': {'loss': 'mse',
   'batch_size': 10,
   'epochs': 10,
   'shuffle': True,
   'verbose': 2,
   'class': 'ivory.torch.trainer.Trainer'},
  'class': 'ivory.torch.run.Run'},
 'experiment': {'name': 'torch',
  'class': 'ivory.core.base.Experiment',
  'id': '1'}}

This is similar to the YAML file we read before, but it has been slightly modified:

  • The run and experiment keys are inserted.
  • The run name is assigned by the Ivory Client.
  • The experiment ID and run ID are assigned by MLflow Tracking.
  • Default classes are specified, for example the ivory.torch.trainer.Trainer class for the trainer instance (see the example below).
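
For example, the inserted default class for the trainer can be read directly from the params dictionary shown above:

# The default Trainer class was filled in automatically.
experiment.params['run']['trainer']['class']
# => 'ivory.torch.trainer.Trainer'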

Run

After setting up an Experiment instance, you can create runs with various parameters. Ivory provides several ways to configure them, as shown below.

Default parameters

Calling create_run() without arguments creates a run with the default parameters.

run = experiment.create_run()
run

[8] 2020-06-20 15:23:39 (34.0ms) python3 (6.59s)

Run(id='a69b41e9dbf344d692ce184f069fc514', name='run#0', num_instances=12)

Here, the ID for this run was assigned by MLflow Tracking. On the other hand, the name was assigned by Ivory in the form "(run class name in lower case)#(run number)".

Simple literal (int, float, str)

By passing key-value pairs, you can change the parameters.

run = experiment.create_run(fold=1)
run.datasets.fold

[9] 2020-06-20 15:23:39 (37.0ms) python3 (6.62s)

1

However, the type of the new value must match the original; otherwise, a ValueError is raised.

run = experiment.create_run(fold=0.5)
run.datasets.fold

[10] 2020-06-20 15:23:39 (137ms) python3 (6.76s)

ValueError: different type: <class 'int'> != <class 'float'>
ValueError                                Traceback (most recent call last)
<ipython-input-100-db3b6dd1af57> in <module>
----> 1 run = experiment.create_run(fold=0.5)
      2 run.datasets.fold

~\Documents\github\ivory\ivory\core\base.py in create_run(self, args, name, **kwargs)
    104                 [`create_params()`](#ivory.core.base.Creator.create_params) function.
    105         """
--> 106         params, args = self.create_params(args, name, **kwargs)
    107         run = instance.create_base_instance(params, name, self.source_name)
    108         if self.tracker:

~\Documents\github\ivory\ivory\core\base.py in create_params(self, args, name, **kwargs)
     88             params.update(default.get(name))
     89         update, args = utils.params.create_update(params[name], args, **kwargs)
---> 90         utils.params.update_dict(params[name], update)
     91         return params, args
     92 

~\Documents\github\ivory\ivory\utils\params.py in update_dict(org, update)
     28             x[k] = value
     29         elif type(x[k]) is not type(value) and x[k] is not None:
---> 30             raise ValueError(f"different type: {type(x[k])} != {type(value)}")
     31         else:
     32             if isinstance(x[k], dict):

List

A list parameter can be overwritten by passing a new list. Of course, you can change the length of the list. The original hidden_sizes was [20, 30]. Let's modify it.

run = experiment.create_run(hidden_sizes=[2, 3, 4])
run.model

[11] 2020-06-20 15:23:39 (139ms) python3 (6.90s)

Model(
  (layers): ModuleList(
    (0): Linear(in_features=2, out_features=2, bias=True)
    (1): Linear(in_features=2, out_features=3, bias=True)
    (2): Linear(in_features=3, out_features=4, bias=True)
    (3): Linear(in_features=4, out_features=1, bias=True)
  )
)

Alternatively, you can use 0-indexed colon notation as shown below. In this case, pass a dictionary as the first argument, because a colon (:) can't appear in a keyword argument name.

params = {
    "hidden_sizes:0": 10,  # Order is important.
    "hidden_sizes:1": 20,  # Start from 0.
    "hidden_sizes:2": 30,  # No skip. No reverse.
}
run = experiment.create_run(params)
run.model

[12] 2020-06-20 15:23:39 (46.9ms) python3 (6.95s)

Model(
  (layers): ModuleList(
    (0): Linear(in_features=2, out_features=10, bias=True)
    (1): Linear(in_features=10, out_features=20, bias=True)
    (2): Linear(in_features=20, out_features=30, bias=True)
    (3): Linear(in_features=30, out_features=1, bias=True)
  )
)

Does this notation seem unnecessary? It is prepared for hyperparameter tuning, in which each list element can be treated as a separate scalar parameter.
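
Here is a sketch of how the notation composes in a tuning-style loop; the layer sizes are hypothetical:

# Each list element is addressed by a separate scalar key.
# Dictionaries preserve insertion order, so the indices stay in order.
n_layers = 3
params = {f"hidden_sizes:{i}": 10 * (i + 1) for i in range(n_layers)}
# params == {'hidden_sizes:0': 10, 'hidden_sizes:1': 20, 'hidden_sizes:2': 30}
run = experiment.create_run(params)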

In some cases, you may want to change individual elements of a list. Use 0-indexed dot notation.

params = {"hidden_sizes.1": 5}
run = experiment.create_run(params)
run.model

[13] 2020-06-20 15:23:39 (48.5ms) python3 (6.99s)

Model(
  (layers): ModuleList(
    (0): Linear(in_features=2, out_features=20, bias=True)
    (1): Linear(in_features=20, out_features=5, bias=True)
    (2): Linear(in_features=5, out_features=1, bias=True)
  )
)

Duplicated parameter name

Parameters that appear in multiple places under the same name are updated together.

run = experiment.create_run(patience=5)
run.scheduler.patience, run.early_stopping.patience

[14] 2020-06-20 15:23:40 (46.0ms) python3 (7.04s)

(5, 5)

This behavior is natural for updating parameters that share the same meaning. But in the above example, the patience of early stopping becomes equal to that of the scheduler, so training stops before the scheduler ever reduces the learning rate, and the scheduler doesn't work at all.

Scoping by dots

To specify an individual parameter even when other parameters share the same name, use scoping by dots, i.e., the parameter's full name.

params = {'scheduler.patience': 8, 'early_stopping.patience': 20}
run = experiment.create_run(params)
run.scheduler.patience, run.early_stopping.patience

[15] 2020-06-20 15:23:40 (49.0ms) python3 (7.09s)

(8, 20)

Object type

Parameters are not limited to literals such as int, float, or str. For example,

run = experiment.create_run()
run.optimizer

[16] 2020-06-20 15:23:40 (49.0ms) python3 (7.14s)

SGD (
Parameter Group 0
    dampening: 0
    lr: 0.001
    momentum: 0
    nesterov: False
    weight_decay: 0
)
run = experiment.create_run({'optimizer.class': 'torch.optim.Adam'})
run.optimizer

[17] 2020-06-20 15:23:40 (52.0ms) python3 (7.19s)

Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.001
    weight_decay: 0
)

This means that you can compare optimizer algorithms across multiple runs with minimal effort.
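
For example, a minimal sketch that loops over optimizer classes, reusing only the create_run() call shown above:

# Compare optimizer algorithms across several runs.
for optimizer_class in ['torch.optim.SGD', 'torch.optim.Adam', 'torch.optim.RMSprop']:
    run = experiment.create_run({'optimizer.class': optimizer_class})
    print(run.name, type(run.optimizer).__name__)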

Creating a run from a client

In the above examples, we created runs using experiment.create_run(). You can do the same thing with client.create_run(), passing an experiment name as the first argument. The following two code blocks are equivalent.

Code 1

experiment = client.create_experiment('torch')
run = experiment.create_run(fold=3)

Code 2

run = client.create_run('torch', fold=3)