
Let's say I have a training sample (with its corresponding training labels) for a defined neural network (the architecture of the network does not matter for this question). Let's call the neural network 'model'.

To avoid any misunderstanding, let's also say that I specify the initial weights and biases for 'model' myself.

Experiment 1

I use the training sample and the training labels to train 'model' for 40 epochs. After training, the network has a specific set of weights and biases for the entire network; let's call it WB_Final_experiment1.

Experiment 2

I use the training sample and the training labels to train 'model' for 20 epochs. After training, the network has a specific set of weights and biases for the entire network; let's call it WB_Intermediate.

Now I load WB_Intermediate into 'model' and train for another 20 epochs. After training, the network has a specific set of weights and biases for the entire network; let's call it WB__Final_experiment2.

Considerations: every single parameter, hyperparameter, activation function, and loss function is exactly the same for both experiments; the only difference is how the 40 epochs are run.

Question: Are WB_Final_experiment1 and WB__Final_experiment2 exactly the same?

3 Answers


If you follow this tutorial here, you will find the results of the two experiments, given below:

Experiment 1

[screenshots of the training results omitted]

Experiment 2

[screenshots of the training results omitted]

In the first experiment the model ran for 4 epochs, and in the second experiment the model ran for 2 epochs and was then trained for 2 more epochs using the final weights of the previous run. You will find that the results vary, but only by a very small amount, and they will always vary because of the random initialization of the weights. The predictions of the two models, however, will lie very close to each other.

If the models are initialized with the same weights, then the results at the end of 4 epochs will be the same for both models.

On the other hand, if you train for 2 epochs, shut down your training session without saving the weights, and then train for 2 more epochs after restarting the session, the predictions won't be the same. To avoid that, always load the saved weights before continuing training, using model.load_weights("path to model").
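A minimal sketch of that save-then-resume pattern with tf.keras (the toy architecture, the file name, and the x_train / y_train arrays below are placeholders, not the asker's actual model):

    import numpy as np
    import tensorflow as tf

    x_train = np.random.rand(100, 4).astype("float32")   # placeholder data
    y_train = np.random.rand(100, 1).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")

    # First half of experiment 2: 20 epochs, then persist the weights.
    model.fit(x_train, y_train, epochs=20, verbose=0)
    model.save_weights("wb_intermediate.weights.h5")

    # Later, possibly in a new session: rebuild the same architecture,
    # load the saved weights, and continue for the remaining 20 epochs.
    model.load_weights("wb_intermediate.weights.h5")
    model.fit(x_train, y_train, epochs=20, verbose=0)

Note that save_weights stores only the model weights; if the optimizer itself keeps state (see the other answer below), that state is not included.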

TL;DR

If the models are initialized with exactly the same weights, then the outputs at the end of the same number of training epochs will be the same. If they are randomly initialized, the outputs will only vary slightly.


3 Comments

Thank you for your answer. I took into consideration what you mention. That's why in the question I stated that I introduce the initial weights by hand, so there isn't this type of situation where "different initial conditions will produce a different output for the same model". The fact that I am getting contradictory answers in this forum is interesting.
The answers above (including mine) are not contradictory; they are both correct. You can think of each as providing part of the information.
Thank you for participating in this forum. I am not sure who has provided the correct answer, but one thing is very clear: either WB_Final_experiment1 and WB__Final_experiment2 are exactly the same, or they are not. It can't be both; it's a binary situation, so the answers are contradictory. At this point I am wondering what other specific information (variables, parameters, numbers) the optimizer keeps that prevents the end results of experiments 1 and 2 from being exactly the same.

If the operations you are doing are entirely deterministic, then yes. Epochs are implemented as the iteration count of a for loop wrapped around your training algorithm; you can see this in PyTorch implementations.
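A minimal sketch of what that looks like in PyTorch (toy model and data assumed, not the asker's 'model'); the epoch is nothing more than the outer loop variable:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)                      # fix randomness so repeated runs match
    x = torch.randn(64, 4)                    # placeholder data
    y = torch.randn(64, 1)

    model = nn.Linear(4, 1)                   # stand-in for the asker's 'model'
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for epoch in range(40):                   # "40 epochs" is just 40 passes of this loop
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

With a fixed seed and no other sources of randomness, running this loop for 20 iterations and then 20 more on the same objects produces exactly the same weights as 40 iterations in one go.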



Typically no, the model weights will not be the same, as the optimiser accrues its own state during training. You will need to save that state too to truly resume from where you left off. See the PyTorch documentation on saving and resuming here. But this concept is not limited to the PyTorch framework.

Specifically:

It is important to also save the optimizer’s state_dict, as this contains buffers and parameters that are updated as the model trains.
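A minimal sketch of that checkpointing pattern in PyTorch (the model, optimizer, and file name below are placeholders); the key point is that the optimizer's state_dict is saved and restored alongside the model's:

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 1)                                    # placeholder model
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam keeps per-parameter moment estimates

    # ...after training the first 20 epochs, checkpoint everything needed to resume:
    torch.save({
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),   # the buffers accrued during training
    }, "checkpoint.pt")

    # Resuming with only the model weights would reset those buffers, so the next
    # 20 epochs would not match experiment 1. Restoring both keeps them in sync:
    checkpoint = torch.load("checkpoint.pt")
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])

With a stateless optimizer such as plain SGD there is nothing extra to restore, which is why the two experiments can coincide in the fully deterministic case described in the other answer; with momentum or Adam, dropping the optimizer state breaks the equivalence.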

2 Comments

Thank you for your answer. I'm trying to understand it from a mathematical standpoint first, so that later I can dive into the programming. I understand that the functions have buffers and such, but from a mathematical standpoint, if I were to do this by hand in experiment 2, what other numbers do I need besides the weights and biases? I have looked at the link you provided and I still don't know the answer to the question.
I guess the literal meaning of an epoch is one complete pass over the available training data. This is usually done via mini-batches, and for each mini-batch the model weights are updated, often using some form of gradient descent. If you really want to know the maths behind it, I can recommend Andrew Ng's free Machine Learning Coursera course; he explains gradient descent formally, with code examples too.
