Introduction to Graph Embedding

Louis Wang
3 min read · Dec 5, 2018


Graph:

Graphs, such as social networks and user-item networks, occur naturally in many real-world applications. Using just nodes, edges, and their properties, we can model the relationships between many kinds of entities: basketball players and teams, films and publishers, startups and their funding. Graph analysis has therefore been attracting increasing attention in recent years due to the ubiquity of networks in the real world. For example, on an e-commerce website, we can analyze the customer-product network to find customer similarities, item similarities, and each customer's affinity for different products. Graph analysis tasks can be broadly grouped into the following three categories:

(1) node classification.

(2) link prediction.

(3) clustering.

Node classification assigns labels to unlabeled nodes based on the labels of other nodes and the structure of the network. There are broadly two families of approaches — methods that use random walks to propagate labels, and methods that extract features from nodes and apply classifiers to them.
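The random-walk flavor of label propagation can be sketched in a few lines of numpy. The four-node path graph and its labels below are hypothetical toy data; the idea is simply to spread the known labels along a random-walk transition matrix while keeping the labeled nodes clamped:

```python
import numpy as np

# Toy 4-node path graph 0-1-2-3: nodes 0 and 3 are labeled (classes 0 and 1),
# nodes 1 and 2 are unlabeled. The adjacency matrix is symmetric.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Row-normalize the adjacency matrix into a random-walk transition matrix.
P = A / A.sum(axis=1, keepdims=True)

# One-hot label matrix; unlabeled rows start at zero.
Y = np.zeros((4, 2))
Y[0, 0] = 1.0  # node 0 -> class 0
Y[3, 1] = 1.0  # node 3 -> class 1

F = Y.copy()
for _ in range(50):
    F = P @ F                 # propagate labels along edges
    F[0], F[3] = Y[0], Y[3]   # clamp the known labels

pred = F.argmax(axis=1)
print(pred)  # -> [0 0 1 1]
```

Node 1 ends up closer to the class-0 seed and node 2 closer to the class-1 seed, which is exactly the structural smoothness assumption these methods exploit.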

Link prediction is the task of predicting missing links, or links that are likely to form in the future. For example, in a user-item network, link prediction estimates whether a customer will click on or purchase an item. Approaches for link prediction include similarity-based methods, maximum-likelihood models, and probabilistic models.
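A minimal similarity-based link predictor scores each unconnected node pair by how much their neighborhoods overlap. The edge list below is hypothetical toy data, and Jaccard similarity stands in for the family of neighborhood-overlap scores:

```python
# Similarity-based link prediction on a toy graph: score each non-edge by
# the Jaccard similarity of the endpoints' neighbor sets.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]

neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

def jaccard(u, v):
    nu, nv = neighbors[u], neighbors[v]
    return len(nu & nv) / len(nu | nv)

# Rank all node pairs that are not yet connected.
nodes = sorted(neighbors)
candidates = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]
              if v not in neighbors[u]]
ranked = sorted(candidates, key=lambda p: jaccard(*p), reverse=True)
print(ranked[0])  # -> (0, 3): the highest-scoring missing link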

The clustering problem is to find similar entities within the network and group them together. Clustering methods include attribute-based models and methods that directly maximize inter-cluster distances.
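Once nodes have vector representations, clustering them reduces to ordinary vector clustering. The 2-D "embeddings" below are hypothetical toy data, and the few-line k-means loop is a stand-in for any off-the-shelf clustering algorithm:

```python
import numpy as np

# Toy 2-D "embeddings" for six nodes forming two obvious groups.
X = np.array([[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
              [5.0, 5.1], [5.1, 5.0], [4.9, 5.2]])

# A minimal k-means loop (k=2); the same idea applies to real embeddings.
centers = X[[0, 3]].copy()          # initialize from two seed points
for _ in range(10):
    # assign each point to its nearest center
    d = np.linalg.norm(X[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)
    # move each center to the mean of its assigned points
    centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print(labels)  # first three nodes in one cluster, last three in the other
```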

Traditionally, models for graph-based problems are built directly on the original graph adjacency matrix. However, this becomes impractical when the graph has billions of nodes and edges. Recently, inspired by Google's word2vec, methods that represent networks in a vector space while preserving their properties have become widely popular. The embeddings are fed as features into a model, and the parameters are learned from the training data.
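A quick back-of-envelope calculation shows why the dense adjacency matrix does not scale while embeddings do. The node count and embedding dimension below are illustrative choices, and real adjacency matrices are usually stored sparsely, so this is the worst case:

```python
# Dense adjacency matrix vs. 128-dim embeddings for a graph with one
# million nodes, stored as float32 (4 bytes per value).
n, d = 1_000_000, 128
adjacency_bytes = n * n * 4   # ~4 TB: infeasible to hold in memory
embedding_bytes = n * d * 4   # ~0.5 GB: fits comfortably on one machine
print(adjacency_bytes // embedding_bytes)  # -> 7812: the dense matrix is ~7800x larger
```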

Graph Embedding:

An embedding maps each node to a low-dimensional feature vector while trying to preserve the connection strengths between vertices. There are broadly three types of graph embedding methods:

(1) Factorization based.

(2) Random Walk based.

(3) Deep Learning based.

Factorization-based methods, which are directly inspired by classic techniques for dimensionality reduction, represent the connections between nodes in the form of a matrix and factorize this matrix to obtain the embeddings. These methods include Locally Linear Embedding (LLE), Laplacian Eigenmaps (LE), Cauchy Graph Embedding (CGE), Structure Preserving Embedding (SPE), Graph Factorization (GF), GraRep, and HOPE.
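The factorization idea can be sketched with a truncated SVD of the adjacency matrix. The 5-node graph below is hypothetical toy data, and this is a simplification: methods such as GraRep and HOPE factorize higher-order proximity matrices rather than the raw adjacency matrix.

```python
import numpy as np

# Factorize the adjacency matrix of a toy 5-node graph and keep the
# top-2 singular directions as 2-D node embeddings.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

U, S, Vt = np.linalg.svd(A)
d = 2
emb = U[:, :d] * np.sqrt(S[:d])   # one d-dimensional vector per node

print(emb.shape)  # -> (5, 2): a 2-D embedding for each of the five nodes
```

Keeping only the top-d singular directions gives the best rank-d approximation of the matrix, which is exactly the sense in which these methods "preserve" the graph structure in few dimensions.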

In the random-walk-based approach, the embedding is learned from walks sampled over the graph: short random walks are generated from each node and treated like sentences of node "words", so that word2vec-style skip-gram models can learn a vector for each node from walk co-occurrences. DeepWalk and node2vec are the best-known examples.
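The walk-sampling step can be sketched with the standard library alone. The adjacency list below is hypothetical toy data, and the skip-gram training stage that would consume these walks is omitted:

```python
import random

# DeepWalk-style walk sampling: from each node, sample several fixed-length
# uniform random walks; each walk later becomes one "sentence" for a
# word2vec-style skip-gram model.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}

def random_walk(start, length, rng):
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(adj[walk[-1]]))   # step to a random neighbor
    return walk

rng = random.Random(42)
walks = [random_walk(node, 6, rng) for node in adj for _ in range(10)]
print(len(walks))  # -> 50: ten length-6 walks per node
```

node2vec replaces the uniform neighbor choice with a biased one that interpolates between breadth-first and depth-first exploration, but the sampling skeleton is the same.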

The deep learning based methods (my favorite) have developed with the recent success of neural networks. Unlike factorization methods, which compute the embeddings using linear functions on the graph, deep learning methods allow us to compute non-linear functions on graphs, which can lead to improved performance. However, deep learning based methods may be more difficult to set up than factorization or random-walk methods. Examples include using an auto-encoder to generate a low-dimensional representation of the data (SDNE), using graph convolutional networks (GCN), using deep neural networks to generate graph representations (DNGR), and variational graph auto-encoders (VGAE), which pair a graph convolutional encoder with an inner-product decoder.
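The autoencoder idea behind SDNE can be sketched with plain numpy. Everything below is a deliberately simplified, hypothetical version — a single linear encoder and sigmoid decoder on a toy 6-node graph — whereas the real method uses deeper networks plus a first-order proximity penalty:

```python
import numpy as np

# Compress each node's adjacency row into a 2-D code and reconstruct
# the row from it; the codes are the node embeddings.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)

n, d = A.shape[0], 2
W_enc = rng.normal(0.0, 0.1, (n, d))   # encoder weights: row -> 2-D code
W_dec = rng.normal(0.0, 0.1, (d, n))   # decoder weights: code -> row
lr = 0.5

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

loss0 = np.mean((sigmoid(A @ W_enc @ W_dec) - A) ** 2)  # error before training
for _ in range(2000):
    H = A @ W_enc                  # the 2-D node embeddings
    R = sigmoid(H @ W_dec)         # reconstructed adjacency rows
    G = (R - A) * R * (1 - R)      # error signal through the sigmoid
    W_enc -= lr * A.T @ (G @ W_dec.T) / n
    W_dec -= lr * H.T @ G / n

loss = np.mean((sigmoid(A @ W_enc @ W_dec) - A) ** 2)
print(loss < loss0)  # training reduces the reconstruction error
```

The non-linearity in the decoder is the point: a purely linear encoder-decoder pair would collapse back to a matrix factorization.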


Written by Louis Wang

Machine Learning Engineer @ Snap. I write and talk about ML, and others.
