Dimensionality Reduction by Autoencoder — a neural network architecture

Nazrul Miya
5 min read · Jun 22, 2021

An autoencoder, or encoder-decoder model, is a special type of neural network architecture whose main aim is to learn a hidden representation of the input data in a lower-dimensional space. The model learns to copy its input to its output.

Structure of an Autoencoder

Input data x is encoded to a representation (h) through the mapping function f. The function h = f(x) is the encoding function.

The encoded representation (h) is decoded back to an output y, where y is as similar as possible to the input x. The function y = g(h) is the decoding function.

The autoencoder tries to reconstruct its input at the output layer by learning a representation of the input. The approach is to minimize a loss that measures the difference between the input and the output.

Note: An autoencoder does not copy or reconstruct its input perfectly. Usually, autoencoders are restricted in ways that allow them to copy only approximately, and to copy only inputs that resemble the training data. Because the model is forced to prioritize which aspects of the input should be copied, it often learns useful properties of the data.

An autoencoder comprises two components: an encoder and a decoder. The encoder maps its input to an internal representation, and the decoder maps this representation back to the input as closely as possible.

The basic architecture of an autoencoder

An autoencoder is a feed-forward, non-recurrent neural network comprising an input layer, one or more hidden layers, and an output layer. The autoencoder is trained on input data to learn its representation.

The output layer has the same number of nodes (neurons) as the input layer because its purpose is to reconstruct the input.

Mapping functions in an autoencoder

Encoder mapping:

h = f (input_vector * weight_vector + bias) = f (X.W + b).

Decoder mapping:

y = g (encoded_vector * weight_vector + bias) = g (H.W' + b'), where H is the encoded vector from the hidden layer, and W' and b' are the weights and bias associated with the hidden-layer (decoder) neurons.

Loss function:

L = squared difference between the input and the output (the reconstructed input)

= || X - Y ||²

= || X - g( f(X.W + b).W' + b' ) ||²

The autoencoder neural network is trained on the input training data, learns accurate weights through backpropagation, and aims to minimize this reconstruction loss.

Controllable parameters:

The number of hidden layers (capacity) and the number of neurons in hidden layers (size) are two factors that can be set while implementing an autoencoder neural network.

If the hidden layer's size or dimension is smaller than the input layer's, the model is called an undercomplete autoencoder. When the hidden layer's dimension is larger than the input layer's, or the capacity of the hidden layers is very large, the model is called an overcomplete autoencoder.

Applications of Autoencoder:

  1. Dimensionality Reduction
  2. Anomaly Detection
  3. Machine Translation
  4. Image processing (denoising images)

Demo (Dimensionality Reduction):

I will implement an autoencoder neural network to reduce the dimensionality of the KDD 2009 dataset.

The data set has 50,000 observations and 230 features (190 numerical and 40 categorical). The purpose of this autoencoder model is to reduce the dataset's dimensionality to 2.

The following steps need to be executed in order:

  1. Loading KDD dataset
  2. Missing value treatment to deal with missing values
  3. Numerical features selection (categorical features are excluded in this example; in practice, they should be encoded before being used in the autoencoder model)
  4. Normalization to bring features in a common range
  5. Finally, an autoencoder model is trained on the dataset

I will use an undercomplete autoencoder for this example, the architecture of which is shown below.

The size of the hidden layers is less than the size of the input layer. Also, the size of the output layer must be the same as the size of the input layer, because the model aims to get the input back as output. The encodings layer size is set to 2 because I want to bring the features into a 2-dimensional space.

Python Code:

Importing Libraries
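A minimal sketch of the imports used in the snippets below, assuming TensorFlow 2.x (Keras), pandas, scikit-learn, and matplotlib:

```python
# Core libraries for data handling, preprocessing, modelling, and plotting.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
```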
Loading Data, Missing Value Treatment, Numerical Features Selection, and Normalization
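A rough sketch of these preprocessing steps. The file name and the exact missing-value rules (drop mostly-empty columns, impute the rest with the median) are assumptions for illustration:

```python
# Load the KDD 2009 (orange small) data; the file name and separator are assumptions.
df = pd.read_csv("orange_small_train.data", sep="\t")

# Keep only the numerical features for this demo.
num_df = df.select_dtypes(include=[np.number])

# Missing value treatment: drop columns that are mostly empty,
# then impute the remaining gaps with the column median.
num_df = num_df.loc[:, num_df.isna().mean() < 0.7]
num_df = num_df.fillna(num_df.median())

# Normalization: bring every feature into the [0, 1] range.
X = MinMaxScaler().fit_transform(num_df)

# Hold out a validation split to monitor the reconstruction loss.
X_train, X_val = train_test_split(X, test_size=0.2, random_state=42)
print(X_train.shape)  # 42 numerical features remain after preprocessing in this article
```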
Encoder-Decoder Model Building
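A sketch of the undercomplete architecture described above (42-32-2-32-42); the activation functions are assumptions:

```python
n_features = X_train.shape[1]  # 42 after preprocessing

# Encoder: input -> 32 -> 2 (the bottleneck holds the 2-D encodings).
inputs = Input(shape=(n_features,))
encoded = Dense(32, activation="relu")(inputs)
encoded = Dense(2, activation="relu", name="encodings")(encoded)

# Decoder: 2 -> 32 -> output, with the output layer the same size as the input.
decoded = Dense(32, activation="relu")(encoded)
outputs = Dense(n_features, activation="sigmoid")(decoded)

autoencoder = Model(inputs, outputs)
encoder = Model(inputs, encoded)  # used later to extract the 2-D representation
```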
Model Compilation
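Mean squared error matches the reconstruction loss defined earlier; the choice of optimizer (Adam) is an assumption:

```python
# Compile with MSE as the reconstruction loss.
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.summary()
```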
Initial Loss
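Something along these lines, evaluating the reconstruction loss with the randomly initialized weights:

```python
# Loss before any training, i.e. with random initial weights and biases.
initial_loss = autoencoder.evaluate(X_val, X_val, verbose=0)
print("Initial loss:", initial_loss)
```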
Model Training (Learning weights and biases from training data, through backpropagation)
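A sketch of the training call; the number of epochs and the batch size are assumptions:

```python
# Train the network to reproduce its own input (input == target).
history = autoencoder.fit(
    X_train, X_train,
    validation_data=(X_val, X_val),
    epochs=50,
    batch_size=256,
    verbose=0,
)
```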
Post-training loss
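Evaluating again after training, for comparison with the initial loss:

```python
# Loss after training; it should be much lower than the initial loss.
final_loss = autoencoder.evaluate(X_val, X_val, verbose=0)
print("Post-training loss:", final_loss)
```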

The post-training loss is much lower than the initial loss because the model has learned optimal weights and biases by training on the data.

Training loss and validation loss curve

The model has started converging as the number of epochs increases.
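The loss curves above can be reproduced from the Keras history object, roughly like this:

```python
# Plot training and validation reconstruction loss per epoch.
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("reconstruction loss (MSE)")
plt.legend()
plt.show()
```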

Let's now visualize the learned representation. When the input data is passed through the encoder component alone, it returns a 2-dimensional feature representation.

Encodings
2-D visualization of feature representation
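A sketch of how the encodings might be extracted and plotted, assuming the encoder sub-model defined earlier:

```python
# Pass the preprocessed data through the encoder only to get the 2-D encodings.
encodings = encoder.predict(X)

# Scatter plot of the learned 2-dimensional representation.
plt.scatter(encodings[:, 0], encodings[:, 1], s=2)
plt.xlabel("encoding dimension 1")
plt.ylabel("encoding dimension 2")
plt.title("2-D representation learned by the autoencoder")
plt.show()
```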

The task is to reduce dimensions in such a way that the reduced representation still represents the original data. The task here is not to identify clusters, hence I am not trying to interpret clusters from the above visualization.

Note: The model produces a different representation on each run because the initial weights and biases are initialized with different values each time.

Problem with undercomplete autoencoder

An autoencoder is useless when it reconstructs its inputs exactly, meaning it memorizes the training data as-is without learning any useful structure, pattern, or information from the data.

An undercomplete autoencoder restricts the model from memorizing the input data by limiting the number of neurons in the hidden layers and the size of the encoder and decoder components.

In the above example, the input layer dimension was 42 (after preprocessing the data and selecting only numerical features). I used fewer neurons (32-2-32) in the hidden layers, and the capacity of the encoder and decoder components was also very limited: only 3 hidden layers. This is one way to ensure that the model is not simply memorizing the exact input data.

Another way to prevent an autoencoder from simply memorizing the input data is to add a regularization term to the loss function. I will demonstrate this in my next story.
