- The convolutional networks we have seen so far receive a single input (e.g., an image) and produce a single output.
Recurrent Neural Networks
- For some problems, sequential information is available, i.e., there are several inputs from which a single output has to be produced (e.g., weather forecasting).
- Problem: Weather forecast.
Recurrent Neural Networks
- Problem: Time series prediction.
Recurrent Neural Networks
- In other problems, a single input is available and an output that is itself a sequence is needed.
- Problem: Music generation according to genre.
Recurrent Neural Networks
- RNNs can also receive a single input (e.g., an image) and produce a single output.
Recurrent Neuron Computation
\[
{\bf{s}}_{(t)} = f_W \left ( {\bf{s}}_{(t-1)}, {\bf{x}}_{(t)} \right)
\]
- \( {\bf{x}}_{(t)}\): Input.
- \( {\bf{s}}_{(t-1)}, {\bf{s}}_{(t)}\): States at times \( t-1 \) and \( t \).
- \( f_W \): The current state of the neuron depends on the previous state and the input through a function parameterized by \( W \).
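- A minimal Python sketch (not from the slides) of this abstract update: a generic scan that applies \( f_W \) step by step; the concrete \( f_W \) below is a hypothetical example.

```python
import numpy as np

def scan(f_W, x_seq, s0):
    """Apply a generic recurrent update s_t = f_W(s_{t-1}, x_t) over a sequence."""
    s = s0
    states = []
    for x_t in x_seq:          # one update per time step
        s = f_W(s, x_t)        # current state depends on previous state and input
        states.append(s)
    return states

# Hypothetical choice of f_W: a tanh of a weighted sum of state and input.
W_s, W_x = 0.5, 1.0
f_W = lambda s, x: np.tanh(W_s * s + W_x * x)
print(scan(f_W, x_seq=[0.1, 0.2, 0.3], s0=0.0))
```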
Simple Recurrent Neurons
Memory cells
- The cell output is also a function of the current inputs and the previous state.
- For simple cells, the output is equal to the cell's state.
- For more complex cells, the output of the cell and the cell's state do not coincide.
Recurrent neuron as memory cell
\[
{\bf{s}}_{(t)} = f_W \left ( {\bf{s}}_{(t-1)}, {\bf{x}}_{(t)} \right)
\]
- \( {\bf{x}}_{(t)}\): Input.
- \( {\bf{s}}_{(t-1)}, {\bf{s}}_{(t)}\): States at times \( t-1 \) and \( t \).
- \( f_W \): The current state of the neuron depends on the previous state and the input through a function parameterized by \( W \).
A. Géron. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly, 2017.
Recurrent Neuron Computation
\[
{\bf{s}}_{(t)} = \phi \left ( W_{s,s} \, {\bf{s}}_{(t-1)}
+ W_{x,s} \, {\bf{x}}_{(t)} \right)
\]
\[
{\bf{y}}_{(t)} = {\bf{s}}_{(t)} W_{s,y}
\]
- \( {\bf{x}}_{(t)}\): Input.
- \( {\bf{s}}_{(t-1)}, {\bf{s}}_{(t)}\): States at times \( t-1 \) and \( t \).
- \({\bf{W}}_{s,s}\), \({\bf{W}}_{x,s}\), \({\bf{W}}_{s,y}\): Weights for the state, the input, and the output, respectively.
- \(\phi() \): Activation function.
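- A minimal NumPy sketch of these two formulas, assuming \(\phi = \tanh\); the sizes and the names `W_ss`, `W_xs`, `W_sy` are hypothetical stand-ins for the weights above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_state, n_outputs = 3, 4, 2   # hypothetical sizes

# Weights for the state, the input, and the output (names mirror the slide).
W_ss = rng.normal(size=(n_state, n_state))
W_xs = rng.normal(size=(n_inputs, n_state))
W_sy = rng.normal(size=(n_state, n_outputs))

def step(s_prev, x_t):
    """One recurrent step: s_t = phi(W_ss s_{t-1} + W_xs x_t), y_t = s_t W_sy."""
    s_t = np.tanh(s_prev @ W_ss + x_t @ W_xs)   # phi = tanh, row-vector convention
    y_t = s_t @ W_sy
    return s_t, y_t

s = np.zeros(n_state)            # initial state s_(0)
x = rng.normal(size=n_inputs)    # one input x_(t)
s, y = step(s, x)
print(s.shape, y.shape)          # (4,) (2,)
```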
Recurrent Neuron Computation
\[
{\bf{s}}_{(t)} = \phi \left ( W_{s,s} \, {\bf{s}}_{(t-1)}
+ W_{x,s} \, {\bf{x}}_{(t)} \right)
\]
\[
{\bf{y}}_{(t)} = {\bf{s}}_{(t)} W_{s,y}
\]
- \({\bf{W}}_{s,s}\), \({\bf{W}}_{x,s}\), and \({\bf{W}}_{s,y}\) are shared across all time steps.
- Some simple RNN cells do not use \( W_{s,y} \) and assume that \( {\bf{y}}_{(t)} = {\bf{s}}_{(t)} \).
RNN problem
- \(t=1: \; \; \; {\bf{s}}_{(1)} = \phi \left ( W_{s,s} \, {\bf{s}}_{(0)} + W_{x,s} \, {\bf{x}}_{(1)} \right) \).
RNN problem
- \(t=2: \; \; \; {\bf{s}}_{(2)} = \phi \left ( W_{s,s} \, {\bf{s}}_{(1)} + W_{x,s} \, {\bf{x}}_{(2)} \right) \).
RNN problem
- \(t=3: \; \; \; {\bf{s}}_{(3)} = \phi \left ( W_{s,s} \, {\bf{s}}_{(2)} + W_{x,s} \, {\bf{x}}_{(3)} \right) \).
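- A minimal NumPy sketch of this unrolling: the same `W_ss` and `W_xs` (hypothetical values) are reused at \(t = 1, 2, 3\).

```python
import numpy as np

rng = np.random.default_rng(1)
n_inputs, n_state, T = 3, 4, 3

W_ss = rng.normal(size=(n_state, n_state))   # shared across all time steps
W_xs = rng.normal(size=(n_inputs, n_state))  # shared across all time steps

x_seq = rng.normal(size=(T, n_inputs))       # x_(1), x_(2), x_(3)
s = np.zeros(n_state)                        # s_(0)

for t in range(T):
    # The *same* W_ss and W_xs are reused at t = 1, 2, 3, ...
    s = np.tanh(s @ W_ss + x_seq[t] @ W_xs)
    print(f"t={t + 1}, s_(t) =", np.round(s, 3))
```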
RNN Variants
- Produce an output at each time step and have recurrent connections between hidden units.
- Produce an output at each time step and have recurrent connections only from the output at one time step to the hidden units at the next time step.
- Recurrent connections between hidden units, that read an entire sequence and then produce a single output.
One input to one output
- Example of application: Next character prediction.
Recurrent connections between hidden neurons
Connection between hidden neurons
\begin{align}
{\bf{a}}^{(t)} &= {\bf{b}} + {\bf{W}}{\bf{h}}^{(t-1)} + {\bf{U}}{\bf{x}}^{(t)} \\
{\bf{h}}^{(t)} &= \tanh({\bf{a}}^{(t)}) \\
{\bf{o}}^{(t)} &= {\bf{c}} + {\bf{V}}{\bf{h}}^{(t)} \\
\hat{{\bf{y}}}^{(t)} &= \operatorname{softmax}({\bf{o}}^{(t)})
\end{align}
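- A minimal NumPy sketch of this forward pass for a single sequence; all sizes and weight values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hidden, n_out = 5, 8, 3   # hypothetical sizes

U = rng.normal(size=(n_hidden, n_in))      # input-to-hidden weights
W = rng.normal(size=(n_hidden, n_hidden))  # hidden-to-hidden weights
V = rng.normal(size=(n_out, n_hidden))     # hidden-to-output weights
b = np.zeros(n_hidden)
c = np.zeros(n_out)

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def forward(x_seq):
    h = np.zeros(n_hidden)               # h^(0)
    y_hats = []
    for x in x_seq:
        a = b + W @ h + U @ x            # a^(t) = b + W h^(t-1) + U x^(t)
        h = np.tanh(a)                   # h^(t)
        o = c + V @ h                    # o^(t)
        y_hats.append(softmax(o))        # yhat^(t)
    return y_hats

print(forward(rng.normal(size=(4, n_in)))[0])
```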
Connection between hidden neurons
Loss function
\begin{align}
& \mathcal{L} \left( \{ {\bf{x}}^{(1)},\dots,{\bf{x}}^{(\tau)}\},\{{\bf{y}}^{(1)},\dots,{\bf{y}}^{(\tau)} \} \right) \\
=& \sum_t \mathcal{L}^{(t)} \\
=& -\sum_t \log p_{rnn} \left( {\bf{y}}^{(t)} \mid \{{\bf{x}}^{(1)},\dots,{\bf{x}}^{(t)} \} \right)
\end{align}
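- A minimal sketch of this loss, assuming integer class targets \( {\bf{y}}^{(t)} \) and the softmax outputs \( \hat{{\bf{y}}}^{(t)} \) produced by the forward pass above.

```python
import numpy as np

def sequence_nll(y_hat_seq, y_seq):
    """Total loss = sum_t L^(t) = -sum_t log p(y^(t) | x^(1..t))."""
    return -sum(np.log(y_hat[y_t]) for y_hat, y_t in zip(y_hat_seq, y_seq))

# y_hat_seq: softmax outputs from the forward pass; y_seq: target class indices.
y_hat_seq = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.8, 0.1])]
y_seq = [0, 1]
print(sequence_nll(y_hat_seq, y_seq))   # -(log 0.7 + log 0.8)
```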
Connection from output to hidden neurons
Many inputs to only one output
- Example of application: Time Series Classification.
Recurrent Layers
Characteristics
- A recurrent layer is formed by joining \(n_{neurons}\) recurrent neurons.
- The neurons multiply the input features by the weight matrix, as usual, and apply the activation function afterwards.
- All neurons in the layer receive as input the original input \( {\bf{x}}_{(t)} \) plus the outputs of all neurons in the layer at the previous time step.
Recurrent layers
Output recurrent layer
\[
{\bf{S}}_{(t)} = \phi \left( {\bf{X}}_{(t)} \cdot {\bf{W}}_{(x,s)} + {\bf{S}}_{(t-1)} \cdot {\bf{W}}_{(s,s)} + {\bf{b}}_s \right)
\]
- \( {\bf{S}}_{(t)} \): is an \( m \times n_{neurons} \; \) matrix where \(m\) is the size of the minibatch and \(n_{neurons} \; \) is the number of neurons in the layer.
- \( {\bf{X}}_{(t)} \): is an \( m \times n_{inputs} \) matrix where \(n_{inputs}\) is the number of input features.
- \( {\bf{W}}_{(x,s)} \): is an \(n_{inputs} \times n_{neurons}\) matrix with the weights for the input features at the current time step.
- \( {\bf{W}}_{(s,s)} \): is an \(n_{neurons} \times n_{neurons}\) matrix with the connection weights between the neurons at the previous and current time steps.
- \( {\bf{b}}_s \): is a vector of size \(n_{neurons}\) with the bias term of each neuron.
A. Géron. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly, 2017.
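- A minimal NumPy sketch of this layer update for a whole minibatch; the sizes `m`, `n_inputs`, and `n_neurons` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n_inputs, n_neurons = 32, 3, 5   # minibatch size and layer sizes (hypothetical)

W_xs = rng.normal(size=(n_inputs, n_neurons))
W_ss = rng.normal(size=(n_neurons, n_neurons))
b_s = np.zeros(n_neurons)

def layer_step(S_prev, X_t):
    """S_(t) = phi(X_(t) W_(x,s) + S_(t-1) W_(s,s) + b_s) for a whole minibatch."""
    return np.tanh(X_t @ W_xs + S_prev @ W_ss + b_s)

S = np.zeros((m, n_neurons))         # S_(0)
X_t = rng.normal(size=(m, n_inputs)) # one minibatch of inputs at time t
S = layer_step(S, X_t)
print(S.shape)                       # (32, 5)
```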
Deep RNNs
Deep learning with RNNs
- A recurrent neural network can be made deep in many ways.
- Adding depth from the input to the hidden state.
- From the previous hidden state to the next hidden state.
- From the hidden state to the output.
- Adding depth can increase the capacity of the model in each of these parts.
Deep RNNs
Deep learning with RNNs
- The hidden recurrent state can be broken down into groups organized hierarchically.
- Deeper computation (e.g., an MLP) can be introduced in the input-to-hidden, hidden-to-hidden, and hidden-to-output parts.
- Significant benefit can be obtained from decomposing the state of an RNN in these ways.
- Adding depth may hurt learning by making optimization difficult.
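- A minimal NumPy sketch of one way to add depth: stacking two recurrent layers so that the state of each layer feeds the next (sizes are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(4)
n_inputs, sizes, T = 3, [8, 8], 5   # two stacked recurrent layers (hypothetical sizes)

layers = []
prev = n_inputs
for n in sizes:
    layers.append({
        "W_x": rng.normal(size=(prev, n)),   # input-to-hidden for this layer
        "W_s": rng.normal(size=(n, n)),      # hidden-to-hidden for this layer
        "s":   np.zeros(n),                  # current state of this layer
    })
    prev = n

x_seq = rng.normal(size=(T, n_inputs))
for x in x_seq:
    inp = x
    for layer in layers:                     # the state of each layer feeds the next
        layer["s"] = np.tanh(inp @ layer["W_x"] + layer["s"] @ layer["W_s"])
        inp = layer["s"]
print(layers[-1]["s"].shape)                 # (8,)
```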
RNNs with context information
Ways of adding context
- A single vector \(x\) can be used as input context to condition sequence prediction.
- It can be added as an extra input at each time step;
- as the initial state \( {\bf{h}}^{(0)} \);
- or in both of these ways (see the sketch after this list).
- Context is used for tasks such as image captioning.
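- A minimal NumPy sketch combining both options: a hypothetical projection `P` turns the context into \( {\bf{h}}^{(0)} \), and a hypothetical weight matrix `R` injects the same context at every time step.

```python
import numpy as np

rng = np.random.default_rng(5)
n_in, n_ctx, n_hidden, T = 4, 6, 8, 3     # hypothetical sizes

U = rng.normal(size=(n_hidden, n_in))     # input weights
W = rng.normal(size=(n_hidden, n_hidden)) # recurrent weights
R = rng.normal(size=(n_hidden, n_ctx))    # context weights (extra input each step)
P = rng.normal(size=(n_hidden, n_ctx))    # projects the context to the initial state

context = rng.normal(size=n_ctx)          # single vector x used as context
x_seq = rng.normal(size=(T, n_in))

h = np.tanh(P @ context)                  # option 1: context as the initial state h^(0)
for x in x_seq:
    # option 2: context added as an extra input at every time step
    h = np.tanh(W @ h + U @ x + R @ context)
print(h.shape)
```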
Encoder-Decoder
Characteristics
- Similar to the sequence model.
- It has an initial encoding stage with no output signal to predict.
- In the encoding stage, the input sequence is consumed to produce a hidden state.
- In the decoding stage, an output signal is produced, one time step at a time, conditioned on the hidden state from the encoding stage.
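- A minimal NumPy sketch of the two stages; a real decoder would typically also feed its previous output back as input, which is omitted here, and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n_in, n_out, n_hidden, T_in, T_out = 4, 3, 8, 5, 4   # hypothetical sizes

# Encoder and decoder each have their own weights.
U_enc = rng.normal(size=(n_hidden, n_in))
W_enc = rng.normal(size=(n_hidden, n_hidden))
W_dec = rng.normal(size=(n_hidden, n_hidden))
V_dec = rng.normal(size=(n_out, n_hidden))

# Encoding stage: consume the input sequence, produce no output.
h = np.zeros(n_hidden)
for x in rng.normal(size=(T_in, n_in)):
    h = np.tanh(W_enc @ h + U_enc @ x)

# Decoding stage: produce one output per time step, conditioned on the encoder state.
outputs = []
for _ in range(T_out):
    h = np.tanh(W_dec @ h)
    outputs.append(V_dec @ h)
print(len(outputs), outputs[0].shape)      # 4 (3,)
```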
RNNs. Sequence to sequence (encoder-decoder) models
- Example of application: Text transcription.
Bidirectional RNN
Characteristics
- Two layers of hidden nodes are connected to the input and the output.
- The first has recurrent connections from the past (it processes the sequence forward in time).
- The second has recurrent connections from the future (it processes the sequence backward in time).
- It requires the sequence to have a fixed endpoint in both the past and the future.
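- A minimal NumPy sketch of the two passes over a fixed-length sequence; combining the two directions by concatenation is one common choice, and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n_in, n_hidden, T = 4, 6, 5   # hypothetical sizes

U_f = rng.normal(size=(n_hidden, n_in)); W_f = rng.normal(size=(n_hidden, n_hidden))
U_b = rng.normal(size=(n_hidden, n_in)); W_b = rng.normal(size=(n_hidden, n_hidden))

x_seq = rng.normal(size=(T, n_in))        # the whole (fixed-length) sequence is needed

# Forward pass over the sequence (past -> future).
h_f, fwd = np.zeros(n_hidden), []
for x in x_seq:
    h_f = np.tanh(W_f @ h_f + U_f @ x)
    fwd.append(h_f)

# Backward pass over the sequence (future -> past).
h_b, bwd = np.zeros(n_hidden), []
for x in x_seq[::-1]:
    h_b = np.tanh(W_b @ h_b + U_b @ x)
    bwd.append(h_b)
bwd = bwd[::-1]

# The output at each step can combine both directions, e.g., by concatenation.
states = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
print(len(states), states[0].shape)        # 5 (12,)
```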