2 votes

Keras seq2seq stacked layers

In the tutorial https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html we have a single-layer seq2seq model. I would like to extend this model with one additional layer on the encoder side and one additional layer on the decoder side. Training seems to work, but I can't set up the decoder correctly at inference time with multiple layers. Here are the changes I made to the model described in the tutorial.

Encoder:

encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder1 = LSTM(
  latent_dim,
  return_sequences=True
)
encoder2 = LSTM(
  latent_dim,
  return_state=True,
)
x = encoder1(encoder_inputs)
encoder_outputs, state_h, state_c = encoder2(x)

Decoder:

encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.

decoder1 = LSTM(
  latent_dim,
  return_sequences=True
)
decoder2 = LSTM(
  latent_dim,
  return_sequences=True, return_state=True
)

dx = decoder1(decoder_inputs, initial_state=encoder_states)

decoder_outputs, _, _ = decoder2(dx)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

Inference (this is the part where I don't know how to build a decoder with multiple layers). The current implementation, which does not work, is given below:

encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

out_decoder1 = LSTM(
  latent_dim,
  return_sequences=True, return_state=True
)
out_decoder2 = LSTM(
  latent_dim,
  return_sequences=True, return_state=True
)

odx = out_decoder1(decoder_inputs, initial_state=decoder_states_inputs)

decoder_outputs, state_h, state_c = out_decoder2(odx)

decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict(
    (i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict(
    (i, char) for char, i in target_token_index.items())

def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        print(output_tokens)
        print(sampled_token_index)
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
           len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence

for seq_index in range(1):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)

Thanks

2 votes

Frank Points 41

After days of struggling with the same problem, here is what I found to work:

# using multiple LSTM layers for encoding is not a problem at all.
# Here I used 3. Pay attention to the flags. The sequence of the last
# layer is not returned because we want a single vector that stores everything, not a time-sequence...
encoder_input = Input(shape=(None, num_allowed_chars), name='encoder_input')
encoder_lstm1 = LSTM(state_size, name='encoder_lstm1',
                     return_sequences=True, return_state=True)
encoder_lstm2 = LSTM(state_size, name='encoder_lstm2',
                     return_sequences=True, return_state=True)
encoder_lstm3 = LSTM(state_size, name='encoder_lstm3',
                     return_sequences=False, return_state=True)

# Connect all the LSTM-layers.
x = encoder_input
x, _, _ = encoder_lstm1(x)
x, _, _ = encoder_lstm2(x)
# only the states of the last layer are of interest.
x, state_h, state_c = encoder_lstm3(x)
encoder_output = x  # This is the encoded, fix-sized vector which seq2seq is all about
encoder_states = [state_h, state_c]

Now for the decoding (and the harder part):

# here is something new: for every decoding layer we need an Input variable for both states hidden (h)
# and cell state (c). Here I will use two stacked decoding layers and therefore initialize h1,c1,h2,c2.

decoder_initial_state_h1 = Input(shape=(state_size,),
                                 name='decoder_initial_state_h1')
decoder_initial_state_c1 = Input(shape=(state_size,),
                                 name='decoder_initial_state_c1')
decoder_initial_state_h2 = Input(shape=(state_size,),
                                 name='decoder_initial_state_h2')
decoder_initial_state_c2 = Input(shape=(state_size,),
                                 name='decoder_initial_state_c2')

decoder_input = Input(shape=(None, num_allowed_chars), name='decoder_input')

# pay attention to the return_sequences and return_state flags.
decoder_lstm1 = LSTM(state_size, name='decoder_lstm1',
                     return_sequences=True, return_state=True)
decoder_lstm2 = LSTM(state_size, name='decoder_lstm2',
                     return_sequences=True, return_state=True)

decoder_dense = Dense(
    num_allowed_chars, activation='softmax', name="decoder_output")

# connect the decoder for training (initial state = encoder_states)
# I feed the encoder_states as initial state to both decoding LSTM layers
x = decoder_input
x, h1, c1 = decoder_lstm1(x, initial_state=encoder_states)
# I tried passing [h1, c1] as initial state in the line below, but that resulted in rubbish
x, _, _ = decoder_lstm2(x, initial_state=encoder_states)
decoder_output = decoder_dense(x)

model_train = Model(inputs=[encoder_input, decoder_input],
                    outputs=decoder_output)

model_encoder = Model(inputs=encoder_input,
                      outputs=encoder_states)

This is the part where the decoder is wired up for inference. It differs slightly from the decoder setup used for training.

# this decoder model setup is used for inference
# important! Every layer keeps its own states. This is, again, important in decode_sequence()
x = decoder_input
x, h1, c1 = decoder_lstm1(
    x, initial_state=[decoder_initial_state_h1, decoder_initial_state_c1])
x, h2, c2 = decoder_lstm2(
    x, initial_state=[decoder_initial_state_h2, decoder_initial_state_c2])
decoder_output = decoder_dense(x)
decoder_states = [h1, c1, h2, c2]

model_decoder = Model(
    inputs=[decoder_input] + [decoder_initial_state_h1, decoder_initial_state_c1,
                            decoder_initial_state_h2, decoder_initial_state_c2],
    outputs=[decoder_output] + decoder_states) # model outputs h1,c1,h2,c2!

model_train.summary()
model_train.compile(optimizer='rmsprop',
                    loss='categorical_crossentropy', metrics=["acc"])

# plot_model requires: from keras.utils import plot_model
plot_model(model_train, to_file=data_path_prefix +
           'spellchecker/model_train.png')
plot_model(model_encoder, to_file=data_path_prefix +
           'spellchecker/model_encode.png')
plot_model(model_decoder, to_file=data_path_prefix +
           'spellchecker/model_decode.png')

This is the decoding part. Compared with your code, note how I predict the encoding vector outside the loop and repeat it, so that it can be fed into model_decoder.predict() as the initial state for both LSTM layers.

The second tricky point is to retrieve all four output states from .predict() and feed them back into the prediction at the next time step.

def decode_sequence(input_seq, maxlen_decoder_sequence):
    # Encode the input as state vectors.
    initial_state = model_encoder.predict(input_seq)
    # I simply repeat the encoder states since
    # both decoding layers were trained on the encoded-vector
    # as initialization. I pass them into model_decoder.predict()
    initial_state = initial_state + initial_state

    # Generate empty target sequence of length 1.
    decoder_input_data = np.zeros((1, 1, num_allowed_chars))
    # Populate the first character of target sequence with the start character.
    decoder_input_data[0, 0, char_to_int['a']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        # catch all returning states to feed them back in (see end of loop)
        one_hot_char, h1, c1, h2, c2 = model_decoder.predict(
            [decoder_input_data] + initial_state)

        one_hot_char = one_hot_char[0][-1] 

        char_as_int = np.argmax(one_hot_char)
        # print(char_as_int)
        char_as_char = int_to_char[char_as_int]
        decoded_sentence += char_as_char

        # Exit condition: either hit max length or find stop character. 
        # (z is stop-char in this case)
        if (char_as_char == 'z' or
                len(decoded_sentence) >= maxlen_decoder_sequence):
            stop_condition = True

        # feed the predicted char back into next prediction step
        decoder_input_data = np.zeros((1, 1, num_allowed_chars))
        decoder_input_data[0, 0, char_as_int] = 1.

        # Update states
        initial_state = [h1, c1, h2, c2]

    return decoded_sentence

I hope this helps. There are millions of simple single-layer examples out there, but none with more. Obviously, it is now easy to extend this to more than 2 decoding layers.
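For instance, a minimal sketch of that generalization for the inference decoder, assuming state_size, num_allowed_chars and decoder_dense are defined as above (num_layers is a free parameter here, not from the original code):

from keras.layers import Input, LSTM
from keras.models import Model

num_layers = 3  # any depth >= 1; 2 reproduces the model above

# Build the stacked decoder layers once, so training and inference share weights.
decoder_input = Input(shape=(None, num_allowed_chars), name='decoder_input')
decoder_lstms = [LSTM(state_size, return_sequences=True, return_state=True,
                      name='decoder_lstm%d' % (i + 1))
                 for i in range(num_layers)]

# One (h, c) Input pair per layer for the inference model.
state_inputs = []
for i in range(num_layers):
    state_inputs += [Input(shape=(state_size,), name='decoder_h%d' % (i + 1)),
                     Input(shape=(state_size,), name='decoder_c%d' % (i + 1))]

# Wire the layers, collecting every layer's output states.
x = decoder_input
state_outputs = []
for i, lstm in enumerate(decoder_lstms):
    x, h, c = lstm(x, initial_state=state_inputs[2 * i: 2 * i + 2])
    state_outputs += [h, c]

model_decoder = Model(inputs=[decoder_input] + state_inputs,
                      outputs=[decoder_dense(x)] + state_outputs)

In decode_sequence(), the state repetition then becomes initial_state = model_encoder.predict(input_seq) * num_layers, and the feedback step collects all 2 * num_layers returned state arrays instead of four.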

Good luck! (my first answer on SO :-) !)

0 votes

Akhil Kumar Points 21

I made a few changes and it seems to work correctly.

Training model:

# Define an input sequence and process it.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True, return_sequences=True)
encoder2 = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder2(encoder(encoder_inputs))

# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder2 = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder2(decoder(decoder_inputs, initial_state=encoder_states))
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

Inference:

# Define sampling models
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder(
    decoder_inputs, initial_state=decoder_states_inputs)
# feed the first layer's output sequence into the second layer
# (decoder2 starts from zero states, as during training)
decoder2_outputs, state_h2, state_c2 = decoder2(decoder_outputs)
decoder_states = [state_h2, state_c2]
decoder_outputs = decoder_dense(decoder2_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

See if it works.
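For completeness, a quick way to test this: the decode_sequence() and sampling loop from the question should work unchanged here, since this decoder model also consumes and returns a single (h, c) state pair:

for seq_index in range(5):
    # Take a few training sequences and try decoding them.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decode_sequence(input_seq))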
