Following the instruction document, the procedure is straightforward. Limited by computing resources, my model was not trained thoroughly, resulting in BLEU ≈ 22.54, which is acceptable according to the handout.
The scaffold appears to have been written for a Python 2 environment; to run it under Python 3, a few lines need to be changed.
In nmt_model.py, line 438, change to:

prev_hyp_ids = top_cand_hyp_pos // len(self.vocab.tgt)
This problem is caused by the different behavior of the slash operator between Python 2 and Python 3: in Python 2, `/` on integers performs floor division, while in Python 3 it returns a float, so `//` must be used to get an integer index. Additionally, to run the test script, you should first create a directory called output.
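The division difference can be seen directly; a minimal sketch (the variable names mirror the snippet above, the concrete values are made up for illustration):

```python
# In Python 3, `/` on integers always returns a float, which cannot be
# used as a tensor index; `//` performs floor division and returns an int.
top_cand_hyp_pos = 7
vocab_size = 3

float_result = top_cand_hyp_pos / vocab_size   # a float, invalid as an index
int_result = top_cand_hyp_pos // vocab_size    # an int, safe to index with

print(type(float_result).__name__)  # float
print(int_result)                   # 2
```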
In the following sections, I'd like to review some coding skills and a few nasty pitfalls.
LSTM and LSTMCell
PyTorch provides both LSTM and LSTMCell. LSTM accepts a whole sequence at once and returns the hidden states for every time step, while LSTMCell lets you manually run one LSTM step at a time. LSTM can also be configured as bidirectional and multi-layer. With bidirectional=True, it is effectively a two-layer RNN, one reading forward and the other backward, so the returned hidden state and cell state are doubled in size.
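The contrast can be sketched as follows (all dimensions here are arbitrary example values):

```python
import torch
import torch.nn as nn

seq_len, batch, in_dim, hid = 5, 4, 8, 16
x = torch.randn(seq_len, batch, in_dim)  # (seq_len, batch, input_size)

# nn.LSTM consumes the whole sequence at once and returns every hidden state.
lstm = nn.LSTM(in_dim, hid, bidirectional=True)
outputs, (h_n, c_n) = lstm(x)
print(outputs.shape)  # (seq_len, batch, 2 * hid): both directions concatenated
print(h_n.shape)      # (2, batch, hid): one final state per direction

# nn.LSTMCell advances one time step at a time, as needed when decoding.
cell = nn.LSTMCell(in_dim, hid)
h, c = torch.zeros(batch, hid), torch.zeros(batch, hid)
for t in range(seq_len):
    h, c = cell(x[t], (h, c))
print(h.shape)  # (batch, hid)
```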
Pad and Pack
Sequential data must be padded to the same length before being fed to an RNN. pack_padded_sequence then packs a batch of padded sequences so the RNN can skip the padding, and pad_packed_sequence does the reverse, restoring a padded tensor from the packed output.
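The round trip looks like this; a minimal sketch with made-up sizes (note pack_padded_sequence expects lengths sorted in descending order unless enforce_sorted=False is passed):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Three variable-length sequences, already sorted by length (descending).
seqs = [torch.randn(n, 8) for n in (5, 3, 2)]
lengths = [5, 3, 2]

padded = pad_sequence(seqs)                     # (max_len, batch, 8), zero-padded
packed = pack_padded_sequence(padded, lengths)  # padding stripped for the RNN

lstm = nn.LSTM(8, 16)
packed_out, _ = lstm(packed)

# pad_packed_sequence restores a regular padded tensor from the packed output.
out, out_lengths = pad_packed_sequence(packed_out)
print(out.shape)    # (5, 3, 16)
print(out_lengths)  # tensor([5, 3, 2])
```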
Squeeze and Unsqueeze
These two methods are widely used for tensor reshaping. It is strongly advised to specify the dimension the operation should act on: by default, squeeze removes every size-1 dimension, which may cause unpredictable consequences.
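The pitfall is easy to reproduce; a short sketch where the batch dimension happens to have size 1:

```python
import torch

x = torch.zeros(1, 5, 1)  # e.g. (batch=1, seq_len=5, feature=1)

# Without a dim argument, squeeze removes *every* size-1 dimension,
# silently collapsing the batch dimension along with the feature one.
print(x.squeeze().shape)                 # torch.Size([5])

# Specifying the dimension removes only the intended one.
print(x.squeeze(-1).shape)               # torch.Size([1, 5])

# unsqueeze inserts a size-1 dimension at the given position.
print(x.squeeze(-1).unsqueeze(-1).shape) # torch.Size([1, 5, 1])
```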
Author of this article: MyTech::Author