In the example below, taken from the Keras documentation, I want to understand how grads is computed. Does the gradient grads correspond to the average gradient computed over the batch (x_batch_train, y_batch_train)? In other words, does the algorithm compute the gradient, with respect to each trainable variable, for each individual sample of the mini-batch, and then average those per-sample gradients to obtain grads (see the sketch after the snippet for what I mean by that)?
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

        # Open a GradientTape to record the operations run
        # during the forward pass, which enables auto-differentiation.
        with tf.GradientTape() as tape:

            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
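To make the question concrete, this is a minimal sketch (not from the Keras docs) of what I mean by "per-sample gradients averaged over the batch". It assumes the same model, loss_fn, and one batch (x_batch_train, y_batch_train) as above, and that loss_fn returns a scalar for a single-sample batch:

import tensorflow as tf

summed_grads = None
batch_size = int(tf.shape(x_batch_train)[0])

for i in range(batch_size):
    # Slice out one sample, keeping the batch dimension.
    x_i = x_batch_train[i : i + 1]
    y_i = y_batch_train[i : i + 1]

    with tf.GradientTape() as tape:
        logits_i = model(x_i, training=True)
        loss_i = loss_fn(y_i, logits_i)  # loss for this single sample

    # Gradient of the single-sample loss w.r.t. each trainable variable.
    grads_i = tape.gradient(loss_i, model.trainable_weights)

    # Accumulate the per-sample gradients variable by variable.
    if summed_grads is None:
        summed_grads = list(grads_i)
    else:
        summed_grads = [acc + g for acc, g in zip(summed_grads, grads_i)]

# Average of the per-sample gradients over the mini-batch.
mean_grads = [g / batch_size for g in summed_grads]

Is grads in the Keras example equivalent to mean_grads computed this way?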