Training a Model

First, create the data and model set. For details about the following code, see the Creating Instance section.

import yaml

params = yaml.safe_load("""
library: torch
run:
  datasets:
    data:
      class: rectangle.data.Data
      n_splits: 4
    dataset:
    fold: 0
  model:
    class: rectangle.torch.Model
    hidden_sizes: [100, 100]
  optimizer:
    class: torch.optim.SGD
    params: $.model.parameters()
    lr: 0.001
  scheduler:
    class: torch.optim.lr_scheduler.ReduceLROnPlateau
    optimizer: $
    factor: 0.5
    patience: 4
  results:
  metrics:
  monitor:
    metric: val_loss
  early_stopping:
    patience: 10
  trainer:
    loss: torch.nn.functional.mse_loss
    batch_size: 10
    epochs: 10
    verbose: 2
""")
params

[2] 2020-06-20 15:23:57 (9.00ms) python3 (24.0s)

{'library': 'torch',
 'run': {'datasets': {'data': {'class': 'rectangle.data.Data', 'n_splits': 4},
   'dataset': None,
   'fold': 0},
  'model': {'class': 'rectangle.torch.Model', 'hidden_sizes': [100, 100]},
  'optimizer': {'class': 'torch.optim.SGD',
   'params': '$.model.parameters()',
   'lr': 0.001},
  'scheduler': {'class': 'torch.optim.lr_scheduler.ReduceLROnPlateau',
   'optimizer': '$',
   'factor': 0.5,
   'patience': 4},
  'results': None,
  'metrics': None,
  'monitor': {'metric': 'val_loss'},
  'early_stopping': {'patience': 10},
  'trainer': {'loss': 'torch.nn.functional.mse_loss',
   'batch_size': 10,
   'epochs': 10,
   'verbose': 2}}}

Note

Key order in the params dictionary is meaningful because the callback functions are called in this order. For example, Monitor uses the results of Metrics, so Monitor must appear after Metrics.
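
The effect of key order can be illustrated with a minimal sketch. The `Metrics` and `Monitor` classes and the `dispatch` helper below are hypothetical stand-ins written for this example, not Ivory's actual implementation; the sketch only assumes that callbacks are invoked in dictionary order.

```python
# Hypothetical stand-ins for illustration; not Ivory's actual classes.
class Metrics:
    def __init__(self):
        self.val_loss = None

    def on_epoch_end(self, run):
        # Pretend the validation loss was just computed.
        self.val_loss = 0.5


class Monitor:
    def __init__(self):
        self.score = None

    def on_epoch_end(self, run):
        # Reads the value that Metrics is expected to have written already.
        self.score = run["metrics"].val_loss


def dispatch(run, hook):
    # Dicts preserve insertion order (Python 3.7+), so hooks fire in key order.
    for obj in run.values():
        method = getattr(obj, hook, None)
        if callable(method):
            method(run)


good = {"metrics": Metrics(), "monitor": Monitor()}
dispatch(good, "on_epoch_end")
print(good["monitor"].score)  # Monitor sees the fresh value.

bad = {"monitor": Monitor(), "metrics": Metrics()}
dispatch(bad, "on_epoch_end")
print(bad["monitor"].score)  # Monitor ran first, so it saw nothing.
```

With the keys reversed, the monitor runs before the metrics are updated and reads a stale value.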

ivory.core.instance.create_base_instance() is more useful than ivory.core.instance.create_instance() for creating a run from a dictionary, because it creates multiple objects in one step and processes the $-notation properly.

import ivory.core.instance

run = ivory.core.instance.create_base_instance(params, 'run')
list(run)

[3] 2020-06-20 15:23:57 (8.00ms) python3 (24.0s)

['datasets',
 'model',
 'optimizer',
 'scheduler',
 'results',
 'metrics',
 'monitor',
 'early_stopping',
 'trainer']

Callbacks

Check the callbacks of the Run instance.

import ivory.core.base

# A helper function
def print_callbacks(obj):
    for func in ivory.core.base.Callback.METHODS:
        if hasattr(obj, func) and callable(getattr(obj, func)):
            print('  ', func)

for name, obj in run.items():
    print(f'[{name}]')
    print_callbacks(obj)

[4] 2020-06-20 15:23:57 (34.0ms) python3 (24.0s)

[datasets]
[model]
[optimizer]
[scheduler]
[results]
   on_train_begin
   on_train_end
   on_val_end
   on_test_begin
   on_test_end
[metrics]
   on_epoch_begin
   on_train_begin
   on_train_end
   on_val_begin
   on_val_end
   on_epoch_end
[monitor]
   on_epoch_end
[early_stopping]
   on_epoch_end
[trainer]
   on_init_begin
   on_train_begin
   on_val_begin
   on_epoch_end
   on_test_begin

Metrics

The role of the Metrics class is to record a set of metrics for evaluating model performance. The metrics are updated at the end of each epoch.

run.metrics  # Now, metrics are empty.

[5] 2020-06-20 15:23:57 (4.00ms) python3 (24.0s)

Metrics()

Monitor

The Monitor class monitors the most important metric, which is used to measure the model score or to drive the training logic (early stopping or pruning).

run.monitor  # Monitoring `val_loss`.  Lower is better.

[6] 2020-06-20 15:23:57 (3.00ms) python3 (24.0s)

Monitor(metric='val_loss', mode='min')
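
As a rough sketch of what tracking a metric in 'min' mode involves, the hypothetical `SimpleMonitor` below keeps the best score and the epoch at which it occurred. The class name, the `update` method, and the `is_best` flag are assumptions made for this example and are not taken from Ivory.

```python
class SimpleMonitor:
    """Hypothetical monitor: tracks the best value of one metric."""

    def __init__(self, metric="val_loss", mode="min"):
        self.metric = metric
        self.mode = mode
        # Start from the worst possible score for the chosen mode.
        self.best_score = float("inf") if mode == "min" else float("-inf")
        self.best_epoch = -1
        self.is_best = False

    def update(self, epoch, metrics):
        score = metrics[self.metric]
        if self.mode == "min":
            self.is_best = score < self.best_score
        else:
            self.is_best = score > self.best_score
        if self.is_best:
            self.best_score, self.best_epoch = score, epoch
        return self.is_best


monitor = SimpleMonitor(metric="val_loss", mode="min")
flags = [
    monitor.update(epoch, {"val_loss": loss})
    for epoch, loss in [(0, 0.9), (1, 0.6), (2, 0.7)]
]
print(flags, monitor.best_score, monitor.best_epoch)
```

The illustrative losses improve for two epochs and then get worse, so only the first two updates are flagged as best.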

EarlyStopping

The EarlyStopping class stops the training loop when the monitored metric has stopped improving.

run.early_stopping  # Early stopping occurs when `wait` > `patience`.

[7] 2020-06-20 15:23:57 (3.00ms) python3 (24.0s)

EarlyStopping(patience=10, wait=0)
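
The patience logic can be sketched in a few lines. `SimpleEarlyStopping` below is a hypothetical class written for this example; it follows the rule stated in the comment above (stop when `wait` > `patience`) and assumes that `wait` is reset to zero whenever the monitored metric improves, which the source does not spell out.

```python
class SimpleEarlyStopping:
    """Hypothetical early stopping: counts epochs without improvement."""

    def __init__(self, patience=10):
        self.patience = patience
        self.wait = 0

    def update(self, is_best):
        # Assumption: reset on improvement, otherwise count one more stale epoch.
        self.wait = 0 if is_best else self.wait + 1
        # Stop only once the wait exceeds the patience.
        return self.wait > self.patience


stopper = SimpleEarlyStopping(patience=2)
# One improving epoch followed by three epochs without improvement.
decisions = [stopper.update(flag) for flag in [True, False, False, False]]
print(decisions, stopper.wait)
```

With a patience of 2, training continues through two stale epochs and stops on the third.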

Trainer

The Trainer class controls model training. It is itself a callback, but it also invokes the callback functions at each step of the training, validation, and test loops.

run.trainer  # Training hasn't started yet, so epoch = -1.

[8] 2020-06-20 15:23:57 (3.00ms) python3 (24.0s)

Trainer(epoch=-1, epochs=10, global_step=-1, batch_size=10, shuffle=True, dataloaders='ivory.torch.data.DataLoaders', verbose=2, loss=<function mse_loss at 0x000001404B1C93A8>, gpu=False, precision=32, amp_level='O1', scheduler_step_mode='epoch')
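
To make the idea of invoking callbacks at each step concrete, here is a minimal sketch of an epoch loop that fires hooks around the training and validation phases. The `fire` and `fit` functions and the `Recorder` class are hypothetical, and the exact hook order inside an epoch is an assumption based on the hook names listed earlier, not a description of Ivory's Trainer.

```python
class Recorder:
    """Hypothetical callback that records the name of every hook it receives."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only called for missing attributes; answer any on_* hook.
        if name.startswith("on_"):
            return lambda: self.calls.append(name)
        raise AttributeError(name)


def fire(callbacks, hook):
    # Call the hook on every callback that implements it.
    for cb in callbacks:
        method = getattr(cb, hook, None)
        if callable(method):
            method()


def fit(callbacks, epochs):
    for _ in range(epochs):
        fire(callbacks, "on_epoch_begin")
        fire(callbacks, "on_train_begin")
        # ... iterate over training batches here ...
        fire(callbacks, "on_train_end")
        fire(callbacks, "on_val_begin")
        # ... iterate over validation batches here ...
        fire(callbacks, "on_val_end")
        fire(callbacks, "on_epoch_end")


recorder = Recorder()
fit([recorder], epochs=1)
print(recorder.calls)
```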

Using a Trainer

A Run instance invokes its trainer through Run.start().

run.start()  # create_callbacks() is called automatically.

[9] 2020-06-20 15:23:57 (523ms) python3 (24.5s)

[epoch#0] loss=17.13 val_loss=5.569 lr=0.001 best
[epoch#1] loss=5.436 val_loss=3.977 lr=0.001 best
[epoch#2] loss=3.7 val_loss=2.625 lr=0.001 best
[epoch#3] loss=2.331 val_loss=1.614 lr=0.001 best
[epoch#4] loss=1.427 val_loss=0.905 lr=0.001 best
[epoch#5] loss=0.9515 val_loss=0.6364 lr=0.001 best
[epoch#6] loss=0.7028 val_loss=0.5842 lr=0.001 best
[epoch#7] loss=0.6472 val_loss=0.5076 lr=0.001 best
[epoch#8] loss=0.5725 val_loss=0.4287 lr=0.001 best
[epoch#9] loss=0.5375 val_loss=0.4063 lr=0.001 best

You can update the attributes of a run's objects at any time.

run.trainer.epochs = 5
run.start()

[10] 2020-06-20 15:23:57 (275ms) python3 (24.8s)

[epoch#10] loss=0.465 val_loss=0.3766 lr=0.001 best
[epoch#11] loss=0.4343 val_loss=0.3537 lr=0.001 best
[epoch#12] loss=0.4087 val_loss=0.3376 lr=0.001 best
[epoch#13] loss=0.4152 val_loss=0.3248 lr=0.001 best
[epoch#14] loss=0.3732 val_loss=0.3337 lr=0.001

Note

Run.start() doesn't reset the trainer's epoch, so the second call above continues from epoch 10.
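
This behavior can be mimicked with a small sketch. `ResumableTrainer` is a hypothetical class written for this example, assuming only that the epoch counter starts at -1 and is never reset between calls to `start()`.

```python
class ResumableTrainer:
    """Hypothetical trainer that keeps its epoch counter between runs."""

    def __init__(self, epochs):
        self.epochs = epochs
        self.epoch = -1  # Nothing has been trained yet.

    def start(self):
        ran = []
        for _ in range(self.epochs):
            self.epoch += 1  # Continues from the previous value.
            ran.append(self.epoch)
        return ran


trainer = ResumableTrainer(epochs=10)
first = trainer.start()
trainer.epochs = 5
second = trainer.start()
print(first)
print(second)
```

The second call runs five more epochs, numbered 10 through 14, just as in the log above.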

Callbacks after Training

After training, the callbacks have changed their states.

run.metrics  # Show metrics at current epoch.

[11] 2020-06-20 15:23:58 (4.00ms) python3 (24.8s)

Metrics(loss=0.3732, val_loss=0.3337, lr=0.001)

run.metrics.history.val_loss  # Metrics history.

[12] 2020-06-20 15:23:58 (4.00ms) python3 (24.8s)

{0: 5.56868793964386,
 1: 3.976963722705841,
 2: 2.6250638723373414,
 3: 1.6140344977378844,
 4: 0.9049779623746872,
 5: 0.6364109225571155,
 6: 0.584185541048646,
 7: 0.5075690947473049,
 8: 0.42866935580968857,
 9: 0.4063351653516293,
 10: 0.37661438062787056,
 11: 0.35370331779122355,
 12: 0.33763624504208567,
 13: 0.3248191948980093,
 14: 0.333729387819767}

run.monitor  # Store the best score and its epoch.

[13] 2020-06-20 15:23:58 (4.00ms) python3 (24.8s)

Monitor(metric='val_loss', mode='min', best_score=0.325, best_epoch=13)
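
The best score and epoch reported by the monitor can be cross-checked against the metrics history. The sketch below uses a plain dictionary filled with the rounded `val_loss` values from the training log above; it does not use Ivory's own objects.

```python
# Rounded val_loss values copied from the training log, keyed by epoch.
val_loss = {
    0: 5.569, 1: 3.977, 2: 2.625, 3: 1.614, 4: 0.905,
    5: 0.6364, 6: 0.5842, 7: 0.5076, 8: 0.4287, 9: 0.4063,
    10: 0.3766, 11: 0.3537, 12: 0.3376, 13: 0.3248, 14: 0.3337,
}

# Lower is better, so the best epoch is the one with the minimum loss.
best_epoch = min(val_loss, key=val_loss.get)
best_score = val_loss[best_epoch]

# Number of epochs that have passed since the best one.
epochs_since_best = max(val_loss) - best_epoch

print(best_epoch, round(best_score, 3), epochs_since_best)
```

The single epoch since the best one is consistent with the logged lines: epoch 14 is the only one not marked best.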

run.early_stopping  # Current `wait`.

[14] 2020-06-20 15:23:58 (3.00ms) python3 (24.8s)

EarlyStopping(patience=10, wait=1)

run.trainer  # Current epoch is 14 (0-indexed).

[15] 2020-06-20 15:23:58 (3.00ms) python3 (24.8s)

Trainer(epoch=14, epochs=5, global_step=899, batch_size=10, shuffle=True, dataloaders='ivory.torch.data.DataLoaders', verbose=2, loss=<function mse_loss at 0x000001404B1C93A8>, gpu=False, precision=32, amp_level='O1', scheduler_step_mode='epoch')