Unsupervised training, why it would and wouldn’t work

When DeepMind announced AlphaGo Zero in 2017, it took the world by storm. AlphaGo had already beaten the world’s best Go players, and it was now beaten by AlphaGo Zero 100 games to 0 after AlphaGo Zero had spent just 3 days training unsupervised, by playing against itself. The victory demonstrated the power and benefits of unsupervised training. It showed how a machine can work independently, without human intervention and without training data, to figure out the best way to approach a given problem, measured against a clear KPI for success.

Since then I have gotten many questions from my customers: Can chatbots and email bots be trained unsupervised? Can digital employees figure out on their own how to best perform their job?

Unsupervised training is a major research area within Simplifai, and I am confident that we will achieve it within a year. However, the market still has a way to go before it can commercialize AI solutions that do not need AI trainers.

The reason is simple. AlphaGo Zero played against itself several million times, which means that it also failed millions of times. Each failure automatically gave it the feedback it needed to adjust its play and improve. This raises the question: How many businesses would allow their AI solution to fail millions of times before it finally becomes “good”?
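To make the cost of trial and error concrete, here is a minimal sketch. It is not the self-play method AlphaGo Zero used. It is a much simpler epsilon-greedy learner that picks among a few options and only ever sees success or failure. The function name, the parameters, and the hidden success rates are all hypothetical, chosen purely for illustration.

```python
import random


def learn_by_trial_and_error(success_rates, trials, epsilon=0.1, seed=0):
    """Epsilon-greedy learner that only receives success/failure feedback.

    success_rates: hidden chance that each option succeeds (hypothetical values).
    Returns (estimated success rate per option, number of failures endured).
    """
    rng = random.Random(seed)
    n = len(success_rates)
    counts = [0] * n        # how often each option has been tried
    estimates = [0.0] * n   # running average of observed successes
    failures = 0

    for _ in range(trials):
        if rng.random() < epsilon:
            # Explore: try a random option, even one that looks bad.
            choice = rng.randrange(n)
        else:
            # Exploit: pick the option that looks best so far
            # (ties go to the lowest index).
            choice = max(range(n), key=lambda i: estimates[i])

        # The only feedback is a 1 (success) or a 0 (failure).
        reward = 1 if rng.random() < success_rates[choice] else 0
        failures += 1 - reward

        # Incrementally update the running average for the chosen option.
        counts[choice] += 1
        estimates[choice] += (reward - estimates[choice]) / counts[choice]

    return estimates, failures


if __name__ == "__main__":
    # Toy setup: two options always fail, one always succeeds.
    estimates, failures = learn_by_trial_and_error([0.0, 0.0, 1.0], trials=10_000)
    print("estimated success rates:", estimates)
    print("failures along the way:", failures)
```

Even in this toy setup, the learner only settles on the option that works after paying for it in failed attempts, and it keeps failing occasionally for as long as it keeps exploring. In a business setting, each of those failures would be a real customer interaction.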

Unsupervised training does not require training data, but it still needs feedback. In games like chess and Go, the feedback is simple: you either win or you lose. In the business world, however, there can be several different outcomes, which makes it challenging to provide feedback automatically. Businesses will also have trouble providing “real” feedback for the bot to learn from, let alone millions of pieces of feedback. This is going to be a major bottleneck in commercializing unsupervised learning for the foreseeable future.
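The difference between the two kinds of feedback can be sketched in a few lines. This is only an illustration under made-up assumptions: the `EmailOutcome` fields and the weights in `email_reward` are hypothetical and do not describe any real product or scoring scheme.

```python
from dataclasses import dataclass
from typing import Optional


def game_reward(won: bool) -> float:
    """Go or chess as described above: one unambiguous signal per game."""
    return 1.0 if won else -1.0


@dataclass
class EmailOutcome:
    # Hypothetical examples of what a business might log per conversation.
    resolved: bool
    escalated_to_human: bool
    customer_rating: Optional[int] = None  # 1 to 5, None if the customer never rated


def email_reward(outcome: EmailOutcome) -> float:
    """Collapse several outcomes into one number using made-up weights."""
    reward = 1.0 if outcome.resolved else -1.0
    if outcome.escalated_to_human:
        reward -= 0.5  # assumed penalty for needing a human
    if outcome.customer_rating is not None:
        # Map a 1..5 rating onto -1..+1; skipped when no rating was given.
        reward += (outcome.customer_rating - 3) / 2
    return reward


if __name__ == "__main__":
    print(game_reward(True))
    print(email_reward(EmailOutcome(resolved=True, escalated_to_human=False, customer_rating=5)))
    print(email_reward(EmailOutcome(resolved=False, escalated_to_human=True)))
```

The game reward needs no judgment calls. The email reward does: someone has to decide how much an escalation costs relative to a bad rating, and many conversations will end without any rating at all, which leaves the bot with a weaker signal to learn from.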

Given this, unsupervised learning will only work in the following settings:

  • Situations where an objective, clear KPI exists (like the return on an investment portfolio)
  • Situations where it is possible to gather feedback without disturbing business operations

With that in mind, I can imagine that most number-crunching cases, like managing an investment portfolio or predicting the lifespan of machinery, will be achievable through unsupervised learning. On the other hand, in cases related to natural language understanding (NLU), such as chat and email communication, it will be much more difficult to implement unsupervised training, simply because customer relations are too business-critical to allow a bot to go through a process of trial and error.

What do you think?

Erik Leung,
Simplifai AS