Reinforcement learning is going mainstream. Here’s what to expect.

Ryan Gross
Vice President, Chicago Office

I was at the AWS re:MARS conference in Las Vegas last week, and the theme of the week was how the combination of Machine Learning, Automation, and Robotics (sometimes in Space) will shape the future. Many people may think that the star of the show was Robert Downey Jr., but in my mind it was Simulation & Reinforcement Learning, which showed up in nearly every keynote at the conference:

Day 1: Using Reinforcement Learning, Boston Dynamics robots have learned to do backflips, jump up onto ledges, and lift boxes. Disney Imagineering has taken this to the next level with humanoid robots performing death-defying stunts.

Day 2: Amazon uses simulation to train models on difficult scenarios in their Go Stores. Amazon fulfillment center robots are trained to sort packages using Reinforcement Learning. Alexa uses simulated interactions to automatically learn the flow of conversations. Amazon Drone delivery uses simulated data to train the model for detecting people below the drone. Companies like Insitro have started to use RL to solve biomedicine problems by generating bio-interaction data.

Day 3: Andrew Ng calls out meta-learning, where hundreds of different simulators are used to build more generalizable Reinforcement Learning agents, as a “next big thing” in AI. Self-driving car companies Zoox and Aurora use RL and meta-learning to address the complexities of driving in urban environments. Dex-Net is attempting to build a massive dataset of 3D models to help solve grasping problems using simulations. Jeff Bezos agrees with Daphne Koller that RL-driven bioengineering will be huge in 10 years.

To summarize all of the above:

If the tasks in a field can be accurately simulated, Reinforcement Learning will dramatically improve the state of the art in that field over the next few years.

Where does physics come in?

If you read my earlier posts, you’ll know that my first daughter is four years old. This puts her firmly into the “Why” stage of life, where her brain shifts from simply learning tasks to wanting to understand everything about the world. Being the huge nerd that I am, I’ve always joked that I’ll be happy to take her all the way down the rabbit hole to physics every time she asks “why”. As it turns out, my parents were wrong, and there is a point where even a four-year-old gets bored asking “why?” and moves on to something more interesting like coloring or pretending to read a book. Here is a typical exchange:

Created using http://cmx.io

What does any of that have to do with data science?

I recently watched Jeff Dean’s talk on the state of deep learning from this year’s Google I/O conference. He mentioned that neural networks have been trained to approximate the results of physics simulators, arriving at results 300,000x faster and potentially letting researchers test 100M molecules over lunch.

Image Source: Jeff Dean presentation at Google I/O 2019

This is a huge advancement, as it should allow us to use the same reinforcement learning techniques that were the star of the show at re:MARS to address a huge new range of problems. Prior to these advancements, the cycle time of running a full physics simulator for each potential reward would simply take too long for RL to ever reach a reward state. Now, RL agents should be able to learn the molecular structures that optimize the properties chemical engineers care about.
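The surrogate idea behind that speedup can be sketched in a few lines. Everything here is an illustrative assumption: a cheap analytic function plays the role of the expensive physics simulator, and a polynomial fit stands in for the neural network surrogate. The point is the workflow, not the physics: pay for a small batch of real simulator runs, fit a fast approximation, then let the search loop query the approximation as often as it likes.

```python
import numpy as np

# Sketch of the "learned surrogate" workflow: fit a cheap model to a small
# batch of expensive simulator outputs, then optimize against the cheap model.
def expensive_simulator(x):
    # Stand-in for a physics code; imagine each call takes minutes.
    return np.sin(3 * x) + 0.5 * x ** 2

# 1) Run the real simulator only a handful of times.
x_train = np.linspace(-2, 2, 40)
y_train = expensive_simulator(x_train)

# 2) Fit a fast surrogate (a degree-9 polynomial here; a neural net in practice).
surrogate = np.poly1d(np.polyfit(x_train, y_train, deg=9))

# 3) Search over the surrogate with far more queries than we could afford
#    against the real simulator.
candidates = np.linspace(-2, 2, 100_000)
best_x = candidates[np.argmin(surrogate(candidates))]
print(f"surrogate minimum near x = {best_x:.3f}, "
      f"true simulator value there = {expensive_simulator(best_x):.3f}")
```

An RL agent slots into step 3 in place of the grid search: its reward calls hit the surrogate, so each episode costs microseconds instead of a full simulation run.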


Given that everything can be reduced to physics, I am starting to imagine a world where it will be possible to build many more solutions from first principles. Going into the conference I had assumed that Biology would be out of reach for simulation for years, but I just learned that companies like Insitro are already starting to tackle this problem today.

Other recent developments should only accelerate us toward a future state where RL can be used in “higher level” sciences like psychology:

1. Raw compute power:

Google also released a beta of their Cloud TPU v3 Pods, with over 100 petaflops of processing power custom-built for neural network training. With that kind of power, tasks like materials analysis can be learned quickly. Additionally, Google has started using RL to design the chips themselves, which should lead to additional improvements over time.

2. Better reusability:

DeepMind is working on a multi-layer network architecture in which an initial RL agent selects the proper downstream network for a given task. This type of RL agent could be trained to break a high-level task down into components and solve multiple tasks using transfer learning.

3. Better generalization:

The aforementioned meta-learning techniques are improving the ability of RL agents to adapt to scenarios that they have not seen before.

4. Better optimization:

The Lottery Ticket Hypothesis paper from MIT has shown that Neural Networks can be further compressed by finding the “Winning Ticket” paths and then retraining using only these paths.

5. Better training data generation:

Interfaces like Autodesk’s Generative Design can help designers and engineers discover the specifications that need to be supplied so the RL agent moves in the right direction. Self-driving car companies generate new training scenarios each time a person has to take over.

What should you do about it?


First off, you should go learn about Reinforcement Learning: there are many great tutorials and courses that can teach you the intuition and inner workings of RL, so we’ll keep it brief here. An RL agent receives an interpreted state of its environment, chooses an action that affects the environment, observes the new state, and repeats the process. If an action leads to a positive outcome, the agent is given a reward and becomes more likely to choose the same set of actions given similar states in the future.

This is repeated for many episodes, and eventually the agent becomes very good at getting rewards (and therefore at the task we were training it for). One of the best ways to supplement this experience with something hands-on is the AWS DeepRacer, a scaled-down race car that comes with a simulated environment, an RL training setup, and a physical piece of hardware corresponding to the simulation. You simply play with the reward function to train your racer agent.
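That state-action-reward loop can be sketched in a few lines of tabular Q-learning. The environment here is a made-up five-state corridor with a reward only at the goal, and the hyperparameters are illustrative, not tuned: the point is just to see the observe-act-reward-update cycle repeated over episodes.

```python
import numpy as np

# Minimal tabular Q-learning on a toy 5-state corridor.
# The agent starts at state 0; action 1 moves right, action 0 moves left.
# Reaching state 4 yields reward 1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def step(state, action):
    """Environment transition: returns (next_state, reward, done)."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

rng = np.random.default_rng(0)
q = np.zeros((N_STATES, N_ACTIONS))

for _ in range(500):                       # episodes
    state, done = 0, False
    while not done:
        # Epsilon-greedy: usually exploit the best known action, sometimes explore.
        if rng.random() < EPSILON:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge Q(s,a) toward r + gamma * max_a' Q(s',a').
        q[state, action] += ALPHA * (reward + GAMMA * q[next_state].max()
                                     - q[state, action])
        state = next_state

print(np.argmax(q, axis=1)[:GOAL])   # greedy policy at each non-terminal state
```

After training, the greedy policy chooses "right" in every state, because the reward at the goal has been propagated backward through the Q-table, episode by episode.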


Second, you should start actively searching for ways to simulate systems in your business that could be optimized. Any existing simulators are an excellent place to start, but new simulators will likely be hugely impactful. Again, AWS has a service in this area called RoboMaker, but there are many alternatives, most of which are based on OpenAI Gym.
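The reason those alternatives interoperate is that they share OpenAI Gym's environment interface: reset() returns an initial observation, and step(action) returns an (observation, reward, done, info) tuple. A hypothetical, self-contained environment honoring that contract might look like this (the corridor task itself is an invented example, not a real Gym environment):

```python
import random

# A toy environment implementing the Gym-style contract:
#   reset() -> observation
#   step(action) -> (observation, reward, done, info)
class CorridorEnv:
    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right) and return the 4-tuple."""
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}

# A random-policy rollout: any agent that drives this loop works unchanged
# against any environment exposing the same reset/step interface.
random.seed(0)
env = CorridorEnv()
obs, done, total_reward = env.reset(), False, 0.0
for _ in range(100):                       # cap the episode length
    obs, reward, done, info = env.step(random.randint(0, 1))
    total_reward += reward
    if done:
        break
print(f"episode ended at position {obs} with total reward {total_reward}")
```

Writing a business simulator against this interface is what makes it pluggable into off-the-shelf RL training tooling.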

Finally, you should be on the lookout for new companies that are riding this technology wave. I expect that there will eventually be a series of open-source simulators developed to build upon each other, with deep neural networks used to compress the key information that can be learned at each layer. Before that, there will likely be proprietary solutions that leapfrog the state of the art in many fields. Over time, this will unlock massive gains in scientifically based fields such as pharmaceuticals, materials science, medicine, downstream oil & gas, and many others.

Article originally published on Ryan's Medium page.


About the Author

Ryan Gross Vice President, Chicago Office
Mr. Ryan Gross brings his passion for technology, problem-solving, and team-building together to build quality solutions across many industries. He has taken a generalist approach in solutions ranging from native mobile apps to enterprise web applications and public APIs. Most recently, he has focused on how the cloud enables new application delivery mechanisms, developing and applying a Continuous Experimentation development approach to build a cloud-native IoT data ingestion and processing platform. Ryan strongly believes in a virtuous cycle of technology enabling process and team efficiency to build improved technology.
