
Let's say I have a training sample (with its corresponding training labels) for a defined neural network (the architecture of the network does not matter for this question). Let's call the neural network 'model'.

To avoid any misunderstanding, let's also say that I specify the initial weights and biases for 'model' myself.

Experiment 1

I use the training sample and the training labels to train 'model' for 40 epochs. After training, the network has a specific set of weights and biases for the entire network; let's call it WB_Final_experiment1.

Experiment 2

I use the training sample and the training labels to train 'model' for 20 epochs. After training, the network has a specific set of weights and biases for the entire network; let's call it WB_Intermediate.

Now I load WB_Intermediate into 'model' and train for another 20 epochs. After training, the network has a specific set of weights and biases for the entire network; let's call it WB__Final_experiment2.

Considerations: every single parameter, hyperparameter, activation function, and loss function is exactly the same for both experiments; the only difference is how the 40 epochs are run.

Question: Are WB_Final_experiment1 and WB__Final_experiment2 exactly the same?

3 Answers


If you follow this tutorial here, you will find the results of the two experiments, given below:

Experiment 1

[screenshots of the training results omitted]

Experiment 2

[screenshots of the training results omitted]

In the first experiment the model ran for 4 epochs, and in the second experiment the model ran for 2 epochs and was then trained for 2 more epochs using the final weights of the previous run. You will find that the results vary, but only by a very small amount, and they will always vary because of the random initialization of the weights. The predictions of the two models, however, will lie very close to each other.

If the models are initialized with the same weights, then the results at the end of 4 epochs will be the same for both models.

On the other hand, if you train for 2 epochs, shut down your training session without saving the weights, and then train for 2 more epochs after restarting the session, the predictions won't be the same. To avoid that, always load the saved weights before continuing training, using model.load_weights("path to model").
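A minimal sketch of that save-then-resume pattern with tf.keras (the toy architecture, the file name, and the x_train / y_train arrays below are placeholders, not the asker's actual model):

    import numpy as np
    import tensorflow as tf

    x_train = np.random.rand(100, 4).astype("float32")   # placeholder data
    y_train = np.random.rand(100, 1).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")

    # First half of experiment 2: 20 epochs, then persist the weights.
    model.fit(x_train, y_train, epochs=20, verbose=0)
    model.save_weights("wb_intermediate.weights.h5")

    # Later, possibly in a new session: rebuild the same architecture,
    # load the saved weights, and continue for the remaining 20 epochs.
    model.load_weights("wb_intermediate.weights.h5")
    model.fit(x_train, y_train, epochs=20, verbose=0)

Note that save_weights stores only the model weights; if the optimizer itself keeps state (see the other answer below), that state is not included.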

TL;DR

If the models are initialized with exactly the same weights, then the outputs at the end of the same number of training epochs will be the same. If they are randomly initialized, the outputs will only vary slightly.


3 Comments

Thank you for your answer. I took into consideration what you mention. That's why in the question I stated that I introduce the initial weights by hand, so there isn't this type of situation where "different initial conditions will produce a different output for the same model". The fact that I am getting contradictory answers in this forum is interesting.
The answers above (including mine) are not contradictory; they are both correct. You can think of each as providing part of the information.
Thank you for participating in this forum. I am not sure who has provided the correct answer, but one thing is very clear: either WB_Final_experiment1 and WB__Final_experiment2 are exactly the same, or they are not. It can't be both; it's a binary situation, so the answers are contradictory. At this point I am wondering what other specific information (variables, parameters, numbers) the optimizer keeps that prevents the end results of experiments 1 and 2 from being exactly the same.

If the operations you are doing are entirely deterministic, then yes. Epochs are implemented as the iteration count of a for loop wrapped around your training algorithm; you can see this in PyTorch implementations.
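A minimal sketch of what that looks like in PyTorch (toy model and data assumed, not the asker's 'model'); the epoch is nothing more than the outer loop variable:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)                      # fix randomness so repeated runs match
    x = torch.randn(64, 4)                    # placeholder data
    y = torch.randn(64, 1)

    model = nn.Linear(4, 1)                   # stand-in for the asker's 'model'
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for epoch in range(40):                   # "40 epochs" is just 40 passes of this loop
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

With a fixed seed and no other sources of randomness, running this loop for 20 iterations and then 20 more on the same objects produces exactly the same weights as 40 iterations in one go.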



Typically no, the model weights will not be the same, as the optimiser accrues its own state during training. You will need to save that state too to truly resume from where you left off. See the PyTorch documentation on saving and resuming here. But this concept is not limited to the PyTorch framework.

Specifically:

It is important to also save the optimizer’s state_dict, as this contains buffers and parameters that are updated as the model trains.
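A minimal sketch of that checkpointing pattern in PyTorch (the model, optimizer, and file name below are placeholders); the key point is that the optimizer's state_dict is saved and restored alongside the model's:

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 1)                                    # placeholder model
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam keeps per-parameter moment estimates

    # ...after training the first 20 epochs, checkpoint everything needed to resume:
    torch.save({
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),   # the buffers accrued during training
    }, "checkpoint.pt")

    # Resuming with only the model weights would reset those buffers, so the next
    # 20 epochs would not match experiment 1. Restoring both keeps them in sync:
    checkpoint = torch.load("checkpoint.pt")
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])

With a stateless optimizer such as plain SGD there is nothing extra to restore, which is why the two experiments can coincide in the fully deterministic case described in the other answer; with momentum or Adam, dropping the optimizer state breaks the equivalence.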

2 Comments

Thank you for your answer. I'm trying to understand it from a mathematical standpoint first, so that later I can dive into the programming. I understand that the functions have buffers and such, but from a mathematical standpoint, if I were to do this by hand in experiment 2, what other numbers do I need besides the weights and biases? I have looked at the link you provided and I still don't know the answer to the question.
I guess the literal meaning of an epoch is one complete pass over the available training data. This is usually done via mini-batches, and for each mini-batch the model weights are updated, often using some form of gradient descent. If you really want to know the maths behind it, I can recommend Andrew Ng's free Machine Learning Coursera course; he explains gradient descent formally, with code examples too.
