Google has open sourced its Embedding Projector, a web application that gives developers a way to visualize data that’s being used to train their machine learning systems.
Embedding Projector is part of TensorFlow, the machine learning technology behind some popular Google services like image search, Smart Reply in Inbox and Google Translate. The company released TensorFlow to the open source community last year in order to spur more development activity in the field.
In a technical paper, Google researchers described the Embedding Projector as an interactive visualization tool that developers can use to interpret machine-learning models that rely on what are known as “embeddings.”
“With the widespread adoption of ML systems, it is increasingly important for research scientists to be able to explore how the data is being interpreted by the models,” Google engineer Daniel Smilkov said in Google’s open source blog.
Embeddings are mathematical vector representations of data such as images, words and numerals. They are a way to translate data into a format that machine-learning algorithms can process.
For example, to train a machine-learning model to predict the meaning of text, developers might create a word embedding for a large collection of words. Each word is represented as a vector of numbers, a point in a multidimensional space, and words that appear in similar contexts end up close to one another.
“Machine learning researchers and developers often need to explore the properties of a specific embedding to understand the behavior of their model,” Google researchers explained in a recent technical paper.
“An engineer who creates an embedding of songs for a recommendation system might want to verify that the nearest neighbors of 70’s era rock band Led Zeppelin’s ‘Stairway to Heaven’ include ‘Whole Lotta Love’ and not ‘Let It Go’ from Frozen.”
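The check the researchers describe comes down to a nearest-neighbor query over the embedding. Below is a minimal sketch, assuming cosine similarity as the closeness measure. The song vectors are hypothetical, hand-picked 4-dimensional values (real embeddings are learned and usually much larger), and the names `SONG_EMBEDDINGS`, `cosine_similarity` and `nearest_neighbors` are illustrative, not part of Embedding Projector or TensorFlow.

```python
import math

# Hypothetical 4-dimensional song embeddings, hand-picked for illustration only.
# A real recommendation system would learn these vectors from listening data.
SONG_EMBEDDINGS = {
    "Stairway to Heaven": [0.90, 0.80, 0.10, 0.05],
    "Whole Lotta Love": [0.85, 0.90, 0.05, 0.10],
    "Kashmir": [0.80, 0.75, 0.20, 0.10],
    "Let It Go": [0.05, 0.10, 0.90, 0.85],
    "Do You Want to Build a Snowman?": [0.10, 0.05, 0.85, 0.90],
}


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, embeddings, k=2):
    """Return the k items most similar to `query`, excluding the query itself."""
    q = embeddings[query]
    scored = [
        (name, cosine_similarity(q, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]


if __name__ == "__main__":
    print(nearest_neighbors("Stairway to Heaven", SONG_EMBEDDINGS, k=2))
```

With these made-up vectors the script prints `['Whole Lotta Love', 'Kashmir']`, so the sanity check in the quote passes: the Frozen songs score far lower and do not appear among the nearest neighbors.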
According to the researchers, an efficient way to do this type of analysis is to visualize the embeddings with a tool like Embedding Projector. The tool lets developers navigate through 2D and 3D views of their data to see how well their machine-learning algorithms are interpreting it.
Embedding Projector offers three projection methods for producing 2D and 3D views of a data set: principal component analysis (PCA), t-SNE and custom linear projections. According to Google, the tool allows developers to explore which elements of their data are most influential and to make sure that an embedding preserves the original meaning of the data.
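Embeddings usually have far more than three dimensions, so a tool like this has to flatten them before showing them on screen. One standard technique is principal component analysis (PCA), which is among the projection methods Embedding Projector offers. The sketch below is a minimal, dependency-free PCA using power iteration with deflation, written for illustration; it is not the Projector's own code, and the function names and the 4-dimensional sample vectors are hypothetical.

```python
import math
import random


def mean_center(points):
    """Subtract the per-dimension mean so the data is centered at the origin."""
    dims = len(points[0])
    means = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    return [[p[d] - means[d] for d in range(dims)] for p in points]


def covariance(centered):
    """Sample covariance matrix of already-centered points."""
    n, dims = len(centered), len(centered[0])
    return [
        [sum(p[i] * p[j] for p in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]


def mat_vec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def top_components(cov, k, iterations=500, seed=0):
    """Top-k eigenvectors of a symmetric matrix via power iteration + deflation."""
    rng = random.Random(seed)
    dims = len(cov)
    work = [row[:] for row in cov]
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(dims)]
        for _ in range(iterations):
            w = mat_vec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (variance) for this direction.
        eigenvalue = sum(a * b for a, b in zip(v, mat_vec(work, v)))
        components.append(v)
        # Deflate: remove this direction so the next pass finds the next one.
        work = [
            [work[i][j] - eigenvalue * v[i] * v[j] for j in range(dims)]
            for i in range(dims)
        ]
    return components


def pca_project(points, k=2):
    """Project points onto their top-k principal components (k=2 or 3 for display)."""
    centered = mean_center(points)
    comps = top_components(covariance(centered), k)
    return [[sum(p[d] * c[d] for d in range(len(p))) for c in comps] for p in centered]


# Hypothetical 4-D vectors: three from one group, two from another.
POINTS = [
    [0.90, 0.80, 0.10, 0.05],
    [0.85, 0.90, 0.05, 0.10],
    [0.80, 0.75, 0.20, 0.10],
    [0.05, 0.10, 0.90, 0.85],
    [0.10, 0.05, 0.85, 0.90],
]

if __name__ == "__main__":
    for coords in pca_project(POINTS, k=2):
        print([round(x, 3) for x in coords])
```

Because most of the variance in these sample vectors lies between the two groups, the first principal component separates them: the first three points land on one side of the origin along the horizontal axis and the last two on the other. Calling `pca_project(POINTS, k=3)` gives a 3D view instead.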
With this week’s announcement, Embedding Projector is now available to anyone, either as a standalone web application or integrated into the TensorFlow platform.
Users can upload their own data into the tool or explore its capabilities by using sample datasets provided by Google. The goal is to give the research community and developers a way to explore machine-learning applications and to refine them. It will also give developers a way to better understand how machine-learning algorithms interpret data sets, the researchers said.