Training a Model
First, create the data and model set. For details about the following code, see the Creating Instance section.
import yaml
params = yaml.safe_load("""
library: torch
run:
  datasets:
    data:
      class: rectangle.data.Data
      n_splits: 4
    dataset:
    fold: 0
  model:
    class: rectangle.torch.Model
    hidden_sizes: [100, 100]
  optimizer:
    class: torch.optim.SGD
    params: $.model.parameters()
    lr: 0.001
  scheduler:
    class: torch.optim.lr_scheduler.ReduceLROnPlateau
    optimizer: $
    factor: 0.5
    patience: 4
  results:
  metrics:
  monitor:
    metric: val_loss
  early_stopping:
    patience: 10
  trainer:
    loss: torch.nn.functional.mse_loss
    batch_size: 10
    epochs: 10
    verbose: 2
""")
params
[2] 2020-06-20 15:23:57 (9.00ms) python3 (24.0s)
{'library': 'torch',
 'run': {'datasets': {'data': {'class': 'rectangle.data.Data', 'n_splits': 4},
                      'dataset': None,
                      'fold': 0},
         'model': {'class': 'rectangle.torch.Model', 'hidden_sizes': [100, 100]},
         'optimizer': {'class': 'torch.optim.SGD',
                       'params': '$.model.parameters()',
                       'lr': 0.001},
         'scheduler': {'class': 'torch.optim.lr_scheduler.ReduceLROnPlateau',
                       'optimizer': '$',
                       'factor': 0.5,
                       'patience': 4},
         'results': None,
         'metrics': None,
         'monitor': {'metric': 'val_loss'},
         'early_stopping': {'patience': 10},
         'trainer': {'loss': 'torch.nn.functional.mse_loss',
                     'batch_size': 10,
                     'epochs': 10,
                     'verbose': 2}}}
Note
Key order in the params dictionary is meaningful because the callback functions are called in this order. For example, Monitor uses the results of Metrics, so Monitor must appear after Metrics.
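The effect of key order can be illustrated with a minimal sketch. The `Metrics`, `Monitor`, and `fire` names below are hypothetical stand-ins written for this example, not Ivory's classes; the sketch only assumes that callbacks are invoked by iterating over the run's dictionary in insertion order.

```python
class Metrics:
    """Hypothetical stand-in: produces a value at the end of an epoch."""
    def __init__(self):
        self.val_loss = None

    def on_epoch_end(self, run):
        self.val_loss = 0.5  # pretend this was computed during validation


class Monitor:
    """Hypothetical stand-in: reads the value produced by Metrics."""
    def __init__(self):
        self.score = None

    def on_epoch_end(self, run):
        self.score = run["metrics"].val_loss


def fire(run, name):
    # Call the hook on every object that defines it, in dictionary order.
    for obj in run.values():
        method = getattr(obj, name, None)
        if callable(method):
            method(run)


good = {"metrics": Metrics(), "monitor": Monitor()}  # Metrics first
bad = {"monitor": Monitor(), "metrics": Metrics()}   # Monitor first
fire(good, "on_epoch_end")
fire(bad, "on_epoch_end")
print(good["monitor"].score, bad["monitor"].score)  # 0.5 None
```

In the second dictionary, the monitor runs before the metrics are updated, so it sees a stale value.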
ivory.core.instance.create_base_instance() is more useful than ivory.core.instance.create_instance() for creating a run from a dictionary, because it can create multiple objects in one step while processing the $-notation properly.
import ivory.core.instance
run = ivory.core.instance.create_base_instance(params, 'run')
list(run)
[3] 2020-06-20 15:23:57 (8.00ms) python3 (24.0s)
['datasets',
'model',
'optimizer',
'scheduler',
'results',
'metrics',
'monitor',
'early_stopping',
'trainer']
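To give a feel for what resolving the $-notation involves, here is a rough sketch. It is not Ivory's implementation: the `resolve` helper and `FakeModel` class are hypothetical, and the sketch assumes that a bare `$` refers to the already-created object stored under the same key, while `$.a.b()` is an attribute path followed from the objects created so far.

```python
from types import SimpleNamespace


def resolve(value, key, created):
    """Resolve a '$'-style reference (hypothetical helper, not Ivory's API)."""
    if not (isinstance(value, str) and value.startswith("$")):
        return value  # ordinary literal, returned unchanged
    if value == "$":
        return created[key]  # assumed meaning: object under the same key
    obj = SimpleNamespace(**created)
    for part in value[2:].split("."):  # drop the leading '$.'
        if part.endswith("()"):
            obj = getattr(obj, part[:-2])()  # call a zero-argument method
        else:
            obj = getattr(obj, part)
    return obj


class FakeModel:
    """Hypothetical model with a parameters() method."""
    def parameters(self):
        return ["w", "b"]


created = {"model": FakeModel()}
params = resolve("$.model.parameters()", "params", created)
created["optimizer"] = {"params": params, "lr": resolve(0.001, "lr", created)}
scheduler_arg = resolve("$", "optimizer", created)
```

Because a reference can only point at objects that already exist, this kind of resolution also depends on the order in which the keys are processed.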
Callbacks
Check the callbacks of the Run instance.
import ivory.core.base
# A helper function
def print_callbacks(obj):
    for func in ivory.core.base.Callback.METHODS:
        if hasattr(obj, func) and callable(getattr(obj, func)):
            print(' ', func)

for name, obj in run.items():
    print(f'[{name}]')
    print_callbacks(obj)
[4] 2020-06-20 15:23:57 (34.0ms) python3 (24.0s)
[datasets]
[model]
[optimizer]
[scheduler]
[results]
  on_train_begin
  on_train_end
  on_val_end
  on_test_begin
  on_test_end
[metrics]
  on_epoch_begin
  on_train_begin
  on_train_end
  on_val_begin
  on_val_end
  on_epoch_end
[monitor]
  on_epoch_end
[early_stopping]
  on_epoch_end
[trainer]
  on_init_begin
  on_train_begin
  on_val_begin
  on_epoch_end
  on_test_begin
Metrics
The Metrics class records a set of metrics for evaluating model performance. The metrics are updated at the end of each epoch.
run.metrics # Now, metrics are empty.
[5] 2020-06-20 15:23:57 (4.00ms) python3 (24.0s)
Metrics()
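As a rough picture of this bookkeeping, the sketch below keeps the current values and a per-metric history keyed by epoch. `MetricsSketch` and its `update` method are hypothetical names for illustration only, and the numbers are made up.

```python
class MetricsSketch:
    """Hypothetical recorder: current values plus a history per metric."""
    def __init__(self):
        self.current = {}
        self.history = {}

    def update(self, epoch, **values):
        # Replace the current values and append them to the history.
        self.current = dict(values)
        for name, value in values.items():
            self.history.setdefault(name, {})[epoch] = value


metrics = MetricsSketch()
metrics.update(0, loss=2.0, val_loss=1.5)  # illustrative values
metrics.update(1, loss=1.0, val_loss=0.8)
print(metrics.current)              # {'loss': 1.0, 'val_loss': 0.8}
print(metrics.history["val_loss"])  # {0: 1.5, 1: 0.8}
```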
Monitor
The Monitor class tracks the most important metric, which is used to measure the model score or to drive the training logic (early stopping or pruning).
run.monitor # Monitoring `val_loss`. Lower is better.
[6] 2020-06-20 15:23:57 (3.00ms) python3 (24.0s)
Monitor(metric='val_loss', mode='min')
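The idea of tracking a best score under a `mode` can be sketched as follows. `MonitorSketch` is a hypothetical class written for this example; it assumes that `mode='min'` means a lower value is an improvement and that any other mode means higher is better.

```python
class MonitorSketch:
    """Hypothetical tracker of the best value of one metric."""
    def __init__(self, metric, mode="min"):
        self.metric = metric
        self.mode = mode
        self.best_score = None
        self.best_epoch = -1
        self.is_best = False

    def update(self, epoch, metrics):
        score = metrics[self.metric]
        if self.best_score is None:
            improved = True  # the first observation is always the best so far
        elif self.mode == "min":
            improved = score < self.best_score
        else:
            improved = score > self.best_score
        if improved:
            self.best_score = score
            self.best_epoch = epoch
        self.is_best = improved


monitor = MonitorSketch("val_loss")
for epoch, val_loss in enumerate([0.9, 0.6, 0.7]):  # illustrative values
    monitor.update(epoch, {"val_loss": val_loss})
print(monitor.best_score, monitor.best_epoch, monitor.is_best)  # 0.6 1 False
```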
EarlyStopping
The EarlyStopping class stops the training loop when a monitored metric has stopped improving.
run.early_stopping # Early stopping occurs when `wait` > `patience`.
[7] 2020-06-20 15:23:57 (3.00ms) python3 (24.0s)
EarlyStopping(patience=10, wait=0)
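The `wait` and `patience` bookkeeping might look like the sketch below. `EarlyStoppingSketch` is hypothetical; it assumes that `wait` resets to zero on an improving epoch and grows by one otherwise, and it uses the stopping rule `wait > patience` stated in the comment above.

```python
class EarlyStoppingSketch:
    """Hypothetical counter of consecutive epochs without improvement."""
    def __init__(self, patience):
        self.patience = patience
        self.wait = 0

    def update(self, is_best):
        """Return True when training should stop."""
        if is_best:
            self.wait = 0   # improvement: start counting again
        else:
            self.wait += 1  # one more epoch without improvement
        return self.wait > self.patience


stopper = EarlyStoppingSketch(patience=2)
decisions = [stopper.update(flag) for flag in [True, False, False, False]]
print(decisions)  # [False, False, False, True]
```

With `patience=2`, training stops on the third consecutive epoch without improvement.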
Trainer
The Trainer class controls model training. It is a callback itself, but it also invokes callback functions at each step of the training, validation, and test loops.
run.trainer # Training hasn't started yet, so epoch = -1.
[8] 2020-06-20 15:23:57 (3.00ms) python3 (24.0s)
Trainer(epoch=-1, epochs=10, global_step=-1, batch_size=10, shuffle=True, dataloaders='ivory.torch.data.DataLoaders', verbose=2, loss=<function mse_loss at 0x000001404B1C93A8>, gpu=False, precision=32, amp_level='O1', scheduler_step_mode='epoch')
Using a Trainer
A Run instance invokes its trainer by Run.start().
run.start() # create_callbacks() is called automatically.
[9] 2020-06-20 15:23:57 (523ms) python3 (24.5s)
[epoch#0] loss=17.13 val_loss=5.569 lr=0.001 best
[epoch#1] loss=5.436 val_loss=3.977 lr=0.001 best
[epoch#2] loss=3.7 val_loss=2.625 lr=0.001 best
[epoch#3] loss=2.331 val_loss=1.614 lr=0.001 best
[epoch#4] loss=1.427 val_loss=0.905 lr=0.001 best
[epoch#5] loss=0.9515 val_loss=0.6364 lr=0.001 best
[epoch#6] loss=0.7028 val_loss=0.5842 lr=0.001 best
[epoch#7] loss=0.6472 val_loss=0.5076 lr=0.001 best
[epoch#8] loss=0.5725 val_loss=0.4287 lr=0.001 best
[epoch#9] loss=0.5375 val_loss=0.4063 lr=0.001 best
You can update the attributes of the run's objects at any time.
run.trainer.epochs = 5
run.start()
[10] 2020-06-20 15:23:57 (275ms) python3 (24.8s)
[epoch#10] loss=0.465 val_loss=0.3766 lr=0.001 best
[epoch#11] loss=0.4343 val_loss=0.3537 lr=0.001 best
[epoch#12] loss=0.4087 val_loss=0.3376 lr=0.001 best
[epoch#13] loss=0.4152 val_loss=0.3248 lr=0.001 best
[epoch#14] loss=0.3732 val_loss=0.3337 lr=0.001
Note
Run.start() doesn't reset the trainer's epoch.
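The behavior seen in the two runs above can be mimicked with a small counter. `TrainerSketch` is a hypothetical class, not Ivory's Trainer; it only assumes that `epoch` starts at -1 and that each call to `start()` advances it by `epochs` without resetting it.

```python
class TrainerSketch:
    """Hypothetical epoch counter that persists across start() calls."""
    def __init__(self, epochs):
        self.epoch = -1  # no epoch has run yet
        self.epochs = epochs

    def start(self):
        ran = []
        for _ in range(self.epochs):
            self.epoch += 1  # continue from the previous value
            ran.append(self.epoch)
        return ran


trainer = TrainerSketch(epochs=10)
first = trainer.start()   # epochs 0 to 9
trainer.epochs = 5
second = trainer.start()  # epochs 10 to 14
print(second, trainer.epoch)  # [10, 11, 12, 13, 14] 14
```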
Callbacks after Training
After training, the callbacks have changed their states.
run.metrics # Show metrics at current epoch.
[11] 2020-06-20 15:23:58 (4.00ms) python3 (24.8s)
Metrics(loss=0.3732, val_loss=0.3337, lr=0.001)
run.metrics.history.val_loss # Metrics history.
[12] 2020-06-20 15:23:58 (4.00ms) python3 (24.8s)
{0: 5.56868793964386,
1: 3.976963722705841,
2: 2.6250638723373414,
3: 1.6140344977378844,
4: 0.9049779623746872,
5: 0.6364109225571155,
6: 0.584185541048646,
7: 0.5075690947473049,
8: 0.42866935580968857,
9: 0.4063351653516293,
10: 0.37661438062787056,
11: 0.35370331779122355,
12: 0.33763624504208567,
13: 0.3248191948980093,
14: 0.333729387819767}
run.monitor # Store the best score and its epoch.
[13] 2020-06-20 15:23:58 (4.00ms) python3 (24.8s)
Monitor(metric='val_loss', mode='min', best_score=0.325, best_epoch=13)
run.early_stopping # Current `wait`.
[14] 2020-06-20 15:23:58 (3.00ms) python3 (24.8s)
EarlyStopping(patience=10, wait=1)
run.trainer # Current epoch is 14 (0-indexed).
[15] 2020-06-20 15:23:58 (3.00ms) python3 (24.8s)
Trainer(epoch=14, epochs=5, global_step=899, batch_size=10, shuffle=True, dataloaders='ivory.torch.data.DataLoaders', verbose=2, loss=<function mse_loss at 0x000001404B1C93A8>, gpu=False, precision=32, amp_level='O1', scheduler_step_mode='epoch')