Embedding Layers in Deep Learning

Multi-dimensional descriptions

First, a little background for context - Deep Learning differs from other machine learning algorithms in that it loosely mimics the function & structure of the brain using artificial neural networks. It’s been most notably used for Computer Vision & Natural Language Processing.

A lesser-known application is structured or tabular data. Think: a spreadsheet with thousands upon thousands of rows of information on, say, housing sales, or hotel room bookings, or whatever your spreadsheet-lovin’ heart desires.

So what’s an embedding layer within a Neural Network?

It’s amazing is what it is. No really… take a look at this.

Let’s imagine you have some data on properties, and one of the fields is the Property Type, with values like Residential House, Residential Tenement, Commercial Office and Derelict.

You want to predict how much these properties will sell for, and this field is a crucial part of the puzzle. But first you have to figure out how to turn these descriptions into numbers (all the information fed into a Machine Learning algorithm or Deep Learning network has to be numerical). How would you do it? It’s tempting to just label each one 1, 2, 3, 4…

But that makes no sense, because it implies there is some inherent order or ranking to each Property Type when there isn’t one – e.g. Derelict properties are not inherently ‘greater’ than Commercial Offices.

The traditional route is called ‘One Hot Encoding’, which reformats the table so that each Property Type gets its own column of 1s and 0s.

It turns each Property Type into a simple yes-or-no column: a property either is or isn’t a Residential House, a Commercial Office, and so on. We can work with that, but there are issues. Imagine we had 10,000 or more property types. That’s a lot of extra columns to process. But also, crucially, it isn’t very rich in meaning.
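Since the reformatted table isn’t reproduced here, here’s a minimal sketch of what One Hot Encoding does in practice, using pandas (the column name and the handful of property types are just illustrative):

```python
import pandas as pd

# A hypothetical slice of the property data, using the types mentioned above
df = pd.DataFrame({
    "property_type": ["Residential House", "Residential Tenement",
                      "Commercial Office", "Derelict"]
})

# One Hot Encoding: one 0/1 column per distinct property type
one_hot = pd.get_dummies(df["property_type"], dtype=int)
print(one_hot)

# With 4 types we get 4 columns; with 10,000 types we'd get
# 10,000 mostly-zero columns.
```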

It doesn’t tell us anything about the fact that a Residential House and a Residential Tenement have similarities (both are residential). How the jiggins do you capture that? You introduce space as a concept. If that sounds a bit sci-fi – picture it like this:

In fact, you can represent a feature in hundreds of dimensions, but let’s not burst a neuron imagining it. Instead, have a look below at a 3D visualisation of NLP using the same embedding process to represent words – it makes intuitive sense, right? Embedding allows a network to capture how concepts exist in relation to one another.
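If a picture isn’t to hand, the same idea can be sketched in code. This is a minimal illustration, assuming PyTorch and a made-up 3-dimensional embedding for the property types above (the size and the index mapping are purely for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical mapping from Property Type to an integer index
property_types = ["Residential House", "Residential Tenement",
                  "Commercial Office", "Derelict"]
type_to_idx = {name: i for i, name in enumerate(property_types)}

# An embedding layer is a lookup table: one small vector of coordinates per type
embedding = nn.Embedding(num_embeddings=len(property_types), embedding_dim=3)

# Look up the coordinates for "Residential House"
idx = torch.tensor([type_to_idx["Residential House"]])
print(embedding(idx))  # a 3-number vector, random until the network is trained
```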

But how do we figure out this spatial relationship?

The good news is we don’t have to sit around deciding how close one feature is to another - the network just does it.

Here’s how - let’s say we’re going to predict the Sale Price of these properties, and we assign each Property Type 3 numbers (we can picture them as X, Y and Z coordinates in space if you like). We begin by setting each of these numbers randomly.

Our network runs and tries to predict the Sale Price. Initially this is about as accurate as tossing a coin in the air – but once it spits out an answer it can determine how far off it was from the true answer and tweak all of our Property Types’ numbers to try and make a better prediction. It will keep running and reducing its error margin until it hits its optimum.
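As a rough sketch of that loop, continuing the made-up property types from above (the prices, learning rate and deliberately tiny model are purely illustrative, not a production recipe):

```python
import torch
import torch.nn as nn

class PricePredictor(nn.Module):
    def __init__(self, n_property_types: int, embedding_dim: int = 3):
        super().__init__()
        # The 3 numbers per Property Type, initialised randomly
        self.embedding = nn.Embedding(n_property_types, embedding_dim)
        self.head = nn.Linear(embedding_dim, 1)  # predicts Sale Price

    def forward(self, property_type_idx):
        return self.head(self.embedding(property_type_idx))

# Hypothetical data: property type indices and their sale prices
x = torch.tensor([0, 1, 2, 3, 0, 1])
y = torch.tensor([[250_000.], [180_000.], [400_000.],
                  [40_000.], [260_000.], [175_000.]])

model = PricePredictor(n_property_types=4)
optimiser = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(100):
    optimiser.zero_grad()
    pred = model(x)
    loss = loss_fn(pred, y)   # how far off were we?
    loss.backward()           # work out how to tweak the embedding numbers
    optimiser.step()          # nudge them towards a better prediction
```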

This process automatically groups the Property Types (and any other categorical feature you can think of) by similarities across the different dimensions.
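Continuing the same hypothetical sketch (the trained model and type_to_idx from above), one way to peek at those groupings is to compare the learned vectors directly, for example with cosine similarity:

```python
import torch.nn.functional as F

vectors = model.embedding.weight   # one learned 3-number vector per Property Type

house = vectors[type_to_idx["Residential House"]]
tenement = vectors[type_to_idx["Residential Tenement"]]
office = vectors[type_to_idx["Commercial Office"]]

# After training we'd hope the two residential types end up closer together
print(F.cosine_similarity(house, tenement, dim=0))
print(F.cosine_similarity(house, office, dim=0))
```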

Let’s sum it up

If you have a lot of reliable data with categorical features, then deep learning with embedding layers is your friend. You spend less time on feature engineering and tweaking parameters, you don’t need to be an expert in the space, and you can get great results. You can also take your learned embeddings and use them in other ML algorithms as input.
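For that last point, here is a rough sketch of what reusing the learned embeddings might look like, still continuing the hypothetical model above: pull the weight matrix out of the trained layer and treat each dimension as an ordinary numeric feature for whatever algorithm you like.

```python
import pandas as pd

# The learned lookup table: one row of 3 numbers per Property Type
weights = model.embedding.weight.detach().numpy()
embedding_df = pd.DataFrame(weights, index=property_types,
                            columns=["dim_1", "dim_2", "dim_3"])
print(embedding_df)

# These columns can now stand in for the raw category as plain numeric input
# to, say, a random forest or gradient-boosted model.
```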

Embedding Layers automatically improve your models because they make your features richer in meaning, giving you insight into relationships in your data. You might find some of what comes out of it surprising, or maybe it just confirms what you suspected. Either way, at this point no other technique for handling categoricals provides this kind of nuance.