Ours is the decade of democratization: it’s getting easier and easier to create awesome applications and models with less and less experience. But even though Machine Learning is getting simpler to write, the utility of models tends to stay locked up in code. For ML to be truly democratized, accessible, and understood, it needs to be visual.
What is visual Machine Learning?
Visual Machine Learning is modeling that uses visual elements to communicate architecture, process, or results. A simple example: for a model that recognizes handwriting as text, we could test it by printing raw results, but a visual output might overlay the recognized text on top of the handwriting itself. It’s a basic case, but it’s just the start.
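The overlay idea can be sketched in a few lines of plain Python. Everything here is invented for illustration: `predictions` stands in for a hypothetical handwriting model’s output of (row, column, text) detections, and the “page” is an ASCII grid rather than an actual image.

```python
# Hypothetical model output: (row, col, text) for each detected word.
predictions = [(0, 2, "Dear"), (0, 10, "Ana"),
               (2, 2, "see"), (2, 8, "you"), (2, 14, "soon")]

def overlay(preds, rows=4, cols=20):
    """Place each prediction at its region's position on a blank 'page',
    so the output can be compared against the original layout at a glance."""
    page = [[" "] * cols for _ in range(rows)]
    for r, c, text in preds:
        for i, ch in enumerate(text):
            page[r][c + i] = ch
    return ["".join(line) for line in page]

for line in overlay(predictions):
    print(line)
```

A real version would draw the predicted words onto the source image at each detection’s bounding box, but the principle is the same: position carries information that a flat list of strings throws away.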
There are visuals all over the growing Machine Learning community. The most basic and useful iterations so far have been visual explainer posts: Machine Learning is hard to understand at first, and visual aids have become a popular way of making the discipline more accessible. A classic example is this viral post from R2D3 that uses interactive data visualizations to explain how ML models work. Gene Kogan from OpenFrameworks built a tool to visualize layers of a Neural Network, too.
(One of R2D3’s visualizations for ML classification boundaries)
Another exciting area where Machine Learning meets visuals is visual training: practitioners are using visual interfaces to train new types of models that might struggle with a static training set. Google created an app called “Quick, Draw!” that guesses what you’re drawing as you’re drawing it; the idea is that training becomes more of a game, and it’s pretty fun. Google is also working on a model that generates drawings based on your speech.
The message is pretty clear: visuals are a growing part of modern Machine Learning. But why? What do interactive, visual outputs and explanations offer over the normal blog post or text output?
Why Machine Learning needs to be visual
The answer is simple: humans are visual creatures.
Our brains react in deeper, subconscious ways to visuals than to text.
Research shows that people tend to get a lot more out of visual content than textual content:
- To some degree, people can be visual learners rather than textual learners
- The overwhelming majority of the information our brains process is visual
- Some studies suggest that people retain 80% of what they see, but only 20% of what they read
Research in this area is obviously subjective, but the gist is directionally correct: we have a soft spot for visuals. The more we can take advantage of multiple methods of understanding and communication, the more accessible what we create can be, and Machine Learning is no exception.
Technology is becoming more visual and camera-centric, and Machine Learning can latch on to this exciting change.
Of all of the reckonings that technology has caused over the past few decades, the shift to visual media (video, infographics, etc.) may be one of the most impactful. Media companies have been shutting down by the dozen or rushing to “pivot to video,” while forward-thinking organizations have been focusing on visual content and the new avenues it opens for communication.
(Apple’s Lego AR demo from their 2018 WWDC)
This is part of why AR and VR are so promising – their current issues aside – and why apps like Snapchat and Instagram effectively now rule media.
Visual prototyping can give us quicker and more tangible ideas about our results.
It isn’t always clear how to evaluate ML models, especially when you’re dealing with unsupervised learning or nebulous project goals. If you can connect your ML models, either the output or the process and architecture, to something visual, it can help you understand more quickly what’s right, what’s wrong, and what your model is “thinking.”
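As a toy illustration of visual prototyping, here is a sketch that renders a classifier’s decision regions as an ASCII grid. The classifier is a hand-written linear rule standing in for a trained model; a real version would plot the same grid with a charting library, but even this crude picture makes a misbehaving boundary obvious in a way a list of accuracy numbers doesn’t.

```python
def classify(x, y):
    """Stand-in for a trained model: a simple linear decision rule."""
    return 1 if x + y > 1.0 else 0

def render_boundary(width=20, height=10):
    """Sample the unit square on a grid; '#' = class 1, '.' = class 0."""
    rows = []
    for j in range(height):
        y = 1.0 - j / (height - 1)      # top row corresponds to y = 1
        row = ""
        for i in range(width):
            x = i / (width - 1)
            row += "#" if classify(x, y) else "."
        rows.append(row)
    return rows

for row in render_boundary():
    print(row)
```

Swap `classify` for a real model’s predict call and the same loop becomes a quick sanity check on any 2D slice of your feature space.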
Machine Learning for Creatives
One of the (unsurprising) areas where visual ML has taken off is creative work: people are creating awesome visual art through algorithms, and two areas are particularly noteworthy (and cool).
Algorithms have been used to generate visual patterns for as long as they’ve been around, but things are getting even more exciting. Deep Learning purports to represent more complex and nuanced relationships between variables, and that shows through in some of the wild and beautiful things that artists have created with it.
(This piece of art was showcased at the DeepDream art show in San Francisco)
Google (surprise!) has been heavily involved in this area with its Magenta project: it’s built on TensorFlow (Google’s open-source, dominant Deep Learning framework) and makes it easier to train and create visual models. There are already a bunch of cool projects using Magenta, like Magic Sketchpad: users start drawing a doodle, and a model attempts to finish it for them.
We’re still at the early stages of what Augmented Reality (AR) is going to enable, but it’s already an exciting medium to showcase and use Machine Learning in. There are a couple of ways to look at this:
- Machine Learning powers a lot of the algorithms that build AR, like depth sensing and 3D rendering
- AR can give a visual representation to model outputs: instead of outputting text, models can build 3D objects in AR, like filters
Machine Learning can also act on inputs from cameras:
- Facial recognition can go beyond classification to map depth points for advanced filters (Snapchat!) and security applications (see Google’s early implementation)
- Cameras can be used as inputs for visual search, like Pinterest’s visual shopping and discovery tools
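At its core, camera-driven visual search reduces to nearest-neighbor lookup over image embeddings. A minimal sketch, with everything made up for illustration: the catalog names and 3-dimensional “embeddings” below stand in for real model features, which would typically have hundreds of dimensions.

```python
import math

# Hypothetical catalog: item name -> toy embedding vector.
catalog = {
    "red_dress":  [0.9, 0.1, 0.0],
    "blue_jeans": [0.1, 0.8, 0.2],
    "red_shoes":  [0.8, 0.2, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def visual_search(query_embedding, catalog):
    """Rank catalog items by similarity to the query image's embedding."""
    return sorted(catalog, key=lambda k: cosine(query_embedding, catalog[k]),
                  reverse=True)

query = [0.85, 0.15, 0.05]  # embedding of a photo the camera just captured
print(visual_search(query, catalog))
```

In production the embeddings come from a trained vision model and the search runs over millions of items with an approximate nearest-neighbor index, but the shape of the problem is exactly this.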
Early stuff is rough around the edges, but it shows the huge potential that visual Machine Learning promises.
Visual Machine Learning companies and tools
A great way to get involved and hands-on with visual ML models is to try them out yourself! There are a few tools that are beginner-friendly yet powerful enough for experts, and you might want to check them out.
Lobe is a drag and drop tool that lets users create and ship ML models without writing a ton of code. Being able to visualize the layers of your Neural Net along with any preprocessing steps can be helpful for explaining things to stakeholders and keeping track of your work.
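Under the hood, a layer view like that boils down to summarizing each layer’s shape and parameter count. Here’s a rough, framework-free sketch of the idea; the layer names and sizes are invented for illustration, and a real tool would read them from the model itself.

```python
# Invented layer sizes for a small feed-forward net: (name, width).
layers = [("input", 784), ("dense_1", 128), ("dense_2", 64), ("output", 10)]

def summarize(layers):
    """Build a Keras-style summary: layer name, fan-in/out, parameter count."""
    lines, total = [], 0
    for (_, n_in), (name, n_out) in zip(layers, layers[1:]):
        params = n_in * n_out + n_out  # dense weights plus biases
        total += params
        lines.append(f"{name:<8} {n_in:>4} -> {n_out:<4} ({params} params)")
    lines.append(f"total params: {total}")
    return lines

for line in summarize(layers):
    print(line)
```

A visual editor draws the same information as boxes and arrows; the value is the same either way, because stakeholders can see at a glance where the model’s capacity lives.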
Flo is an iPhone app that lets you edit and create videos through speech. You can tell Flo what you want to see in your video as well as choose items through a visual input, and their models will stitch it together.
Asteroid is a fast and simple prototyping tool for making interactive AR apps for iPhone in seconds. You can connect visual inputs like images to pre-built ML models and pipe outputs to 3D objects or other AR visuals. The best part is that you can drop your prototype code into Xcode when you’re ready to move to production.
(Asteroid’s visual editor)