
RNNs
RNNs are primarily popular for NLP tasks (although they are now also used in other scenarios, which we will talk about in Chapter 6, Recurrent Neural Networks). What's different about RNNs? Their peculiarity is that the connections between units form a directed graph along a sequence. This means that an RNN can exhibit dynamic temporal behavior for a given time sequence, so it can use its internal state (memory) to process sequences of inputs, while in a traditional neural network we assume that all inputs and outputs are independent of each other. This makes RNNs suitable for cases such as predicting the next word in a sentence, where it is definitely better to know which words came before it. Now you can understand why they are called recurrent: the same task is performed for every element of a sequence, with the output depending on the previous computations.
RNNs have loops in them, allowing information to persist, like so:

In the preceding diagram, a chunk of the neural network, H, receives some input, x, and outputs a value, o. A loop allows information to be passed from one step of the network to the next. By unfolding the RNN in this diagram into a full network (as shown in the following diagram), it can be thought of as multiple copies of the same network, each passing information to a successor:

Here, xt is the input at time step t, Ht is the hidden state at time step t (it represents the memory of the network), and ot is the output at step t. The hidden states capture information about what happened in all the previous time steps. The output at a given step is calculated based only on the memory at time t. An RNN shares the same parameters across every step, because the same task is performed at each step, just with different inputs; this drastically reduces the total number of parameters it needs to learn. Outputs aren't necessary at each step, since this depends on the task at hand. Similarly, inputs aren't always needed at each time step.
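To make the recurrence concrete, the following is a minimal NumPy sketch of the forward pass of an unrolled RNN. The dimensions, weight names, and tanh activation are illustrative assumptions for this sketch, not the notation of any particular framework; the key point is that the same weight matrices are reused at every time step:

```python
# A minimal sketch of the unrolled recurrence; shapes and names
# (x_dim, h_dim, o_dim, W_xh, W_hh, W_ho) are illustrative assumptions.
import numpy as np

x_dim, h_dim, o_dim, T = 4, 8, 3, 5          # input size, hidden size, output size, time steps
rng = np.random.default_rng(0)

# The same parameters are shared across all time steps.
W_xh = rng.standard_normal((h_dim, x_dim)) * 0.1   # input -> hidden
W_hh = rng.standard_normal((h_dim, h_dim)) * 0.1   # hidden -> hidden (the "loop")
W_ho = rng.standard_normal((o_dim, h_dim)) * 0.1   # hidden -> output
b_h = np.zeros(h_dim)
b_o = np.zeros(o_dim)

xs = rng.standard_normal((T, x_dim))         # a toy input sequence x_1 .. x_T
H = np.zeros(h_dim)                          # initial hidden state (the memory)

for t in range(T):
    # H_t depends on the current input x_t and on the previous state H_{t-1}
    H = np.tanh(W_xh @ xs[t] + W_hh @ H + b_h)
    o_t = W_ho @ H + b_o                     # the output at step t is computed from the memory
    print(f"step {t}: output = {o_t}")
```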
RNNs were first developed in the 1980s, but many new variants have appeared only recently. Here's a list of some of those architectures:
- Fully recurrent: Every element has a weighted one-way connection to every other element in the architecture and has a single feedback connection to itself.
- Recursive: The same set of weights is applied recursively over a structure, which resembles a graph structure. During this process, the structure is traversed in topological sorting (https://en.wikipedia.org/wiki/Topological_sorting).
- Hopfield: All of the connections are symmetrical. This is not suitable in scenarios where sequences of patterns need to be processed, as it requires stationary inputs only.
- Elman network: This is a three-layer network, arranged horizontally, plus a set of so-called context units. The middle (hidden) layer is connected to all of the context units with a fixed weight of 1. At each time step, the input is fed forward and then a learning rule is applied. Because the back-connections are fixed, a copy of the previous values of the hidden units is saved in the context units. This is how the network maintains a state, which allows this kind of RNN to perform tasks that are beyond the power of a standard multilayered neural network.
- Long short-term memory (LSTM): This is a DL architecture that prevents back-propagated errors from vanishing or exploding (this will be covered in more detail in Chapter 6, Recurrent Neural Networks). Errors can flow backward through (in theory) an unlimited number of virtual layers unfolded in space. This means that an LSTM can learn tasks that require memories of events that happened several time steps earlier.
- Bi-directional: This predicts each element of a finite sequence by concatenating the outputs of two RNNs: the first RNN processes the sequence from left to right, while the second does so in the opposite direction (see the sketch after this list).
- Recurrent multilayer perceptron network: This consists of cascaded subnetworks, each containing multiple layers of nodes. Each subnetwork is feed-forward, except for its last layer, which is the only one that can have feedback connections.
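As a rough illustration of the bi-directional idea, here is a short sketch, again in NumPy with made-up names and shapes: one simple tanh RNN reads the sequence left to right, a second reads it right to left, and their hidden states are concatenated per time step.

```python
# A minimal sketch of a bi-directional RNN; all names, shapes, and the
# tanh step function are illustrative assumptions, not a specific library API.
import numpy as np

def rnn_pass(xs, W_xh, W_hh, b_h):
    """Run a simple tanh RNN over a sequence and return the hidden state at each step."""
    H = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:
        H = np.tanh(W_xh @ x + W_hh @ H + b_h)
        states.append(H)
    return np.stack(states)                        # shape: (T, h_dim)

rng = np.random.default_rng(1)
x_dim, h_dim, T = 4, 8, 5
xs = rng.standard_normal((T, x_dim))               # a toy input sequence

# Two independent RNNs: one reads left-to-right, the other right-to-left.
params_fwd = (rng.standard_normal((h_dim, x_dim)) * 0.1,
              rng.standard_normal((h_dim, h_dim)) * 0.1,
              np.zeros(h_dim))
params_bwd = (rng.standard_normal((h_dim, x_dim)) * 0.1,
              rng.standard_normal((h_dim, h_dim)) * 0.1,
              np.zeros(h_dim))

h_fwd = rnn_pass(xs, *params_fwd)                  # processes x_1 .. x_T
h_bwd = rnn_pass(xs[::-1], *params_bwd)[::-1]      # processes x_T .. x_1, re-aligned to x_1 .. x_T

# For each time step, concatenate both hidden states to predict that element.
h_bidir = np.concatenate([h_fwd, h_bwd], axis=1)   # shape: (T, 2 * h_dim)
print(h_bidir.shape)
```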
Chapter 5, Convolutional Neural Networks, and Chapter 6, Recurrent Neural Networks, will go into more detail about CNNs and RNNs.