Why is MENTAL cool?

An insane capability to generalize

In our demo, one can see that we trained Mr. Character only on the Latin alphabet. Why didn’t we use a richer data set?

This is because MENTAL has an insane capability to generalize.

MENTAL is founded on the theory of practopoiesis (Nikolić 2015).

Typically, one-shot training tasks for character recognition use the Omniglot training set, which consists of a total of 50 alphabets and 1623 different characters. A more challenging task, and a more useful technology, is an AI that can learn from a single data set. We posed that as the second one-shot challenge to our technology. Please read our “How does it work?” section for more details on our three-way one-shot challenge. To take on this challenge, we trained Mr. Character on the EMNIST data set, which consists of only 47 different characters (built from the Latin alphabet and Arabic digits), with 2800 examples per character. That way we tested the generalization capabilities of our AI under conditions that are much more human-like. No human needs to be exposed to dozens of alphabets before being able to generalize to novel characters. Rather, a human is exposed to a limited set of symbols but a somewhat larger number of examples of each. This is how a useful AI should work. And this is exactly how our technology works.

Being able to learn to generalize from only 47 rather than from 1623 characters makes the generalization capabilities of Mr. Character 1623 / 47 ≈ 34 times more effective than the ‘competition’. When taking into account the rotations of the Omniglot character set (90°, 180°, 270°) that are often performed to expand the number of characters even further, this ratio becomes even bigger: Mr. Character becomes more than 100 times more effective.
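The ratios above follow from simple arithmetic. Here is a quick check in Python. The variable names are illustrative, and the assumption that each of the three rotations counts as a new class (giving four times as many Omniglot classes) is our reading of the text, not a documented procedure:

```python
# Class counts quoted in the text.
omniglot_classes = 1623
emnist_classes = 47

# Ratio of character variety without augmentation.
base_ratio = omniglot_classes / emnist_classes

# Assumption: every rotation (90, 180, 270 degrees) is treated as a new class,
# so the augmented set has 4x the original number of classes.
rotated_classes = omniglot_classes * 4
rotated_ratio = rotated_classes / emnist_classes

print(f"without rotations: {base_ratio:.1f}x")
print(f"with rotations:    {rotated_ratio:.1f}x")
```

Under that assumption, the first ratio is roughly 34.5 and the second is above 100, in line with the figures quoted above.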

Here is how MENTAL scores against other solutions to one-shot learning:

In the above graph, we compare MENTAL to other proposals to address one-shot learning: Hierarchical Bayesian (Lake et al. 2013), Siamese neural networks (Koch et al. 2015), Memory-Augmented Neural Network (Santoro et al. 2016), Matching Networks (Vinyals et al. 2016).

As you can see, MENTAL works with a lot less variety in the training data. Hence, we dare to state that, in this respect, Mr. Character is a learner two orders of magnitude more capable of generalizing than some other approaches considered state of the art in one-shot learning.

Note that there is nothing that specializes our technology for learning characters. The technology can be applied to any problem.

But we are not done explaining how capable MENTAL is at generalizing. MENTAL can acquire one-shot learning capability from data sets that contain quite a small number of example categories. Below you can see the training performance of a one-shot generalization test with only seven example categories. An agent was trained to generalize from Arabic digits 0 to 6 to one-shot learning of digits 7, 8, and 9. It achieved an impressive performance after just a single training epoch on three example images. 1000 new images were then recognized with an accuracy of ~80%:
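The data split behind such a test can be sketched as follows. This is only an illustration of the evaluation protocol, not of MENTAL itself: the function name `split_one_shot`, the toy labels, and the reading of "three example images" as one image per novel digit are all assumptions.

```python
import random

def split_one_shot(labels, base_classes, novel_classes, n_test, seed=0):
    """Return (base_idx, support_idx, query_idx) lists of example indices.

    base_idx:    all examples of the base classes (used to train generalization)
    support_idx: exactly one example per novel class (the single "shot")
    query_idx:   up to n_test held-out novel-class examples for evaluation
    """
    rng = random.Random(seed)
    base_idx = [i for i, y in enumerate(labels) if y in base_classes]
    support_idx, remaining = [], []
    for c in sorted(novel_classes):
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        support_idx.append(idx[0])   # the one training image for this class
        remaining.extend(idx[1:])    # everything else is a test candidate
    rng.shuffle(remaining)
    return base_idx, support_idx, remaining[:n_test]

# Toy labels standing in for a digit data set: 500 examples per digit (hypothetical).
labels = [d for d in range(10) for _ in range(500)]
base, support, query = split_one_shot(labels, set(range(7)), {7, 8, 9}, n_test=1000)
print(len(base), len(support), len(query))
```

The support and query sets never overlap, so the reported accuracy would be measured only on images the learner has not seen.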

Our technology can generalize from scarce data sets. There is no prerequisite for special data sets such as Omniglot in order to take advantage of MENTAL technology.

Also, MENTAL performs much better than transfer learning. The reason is that MENTAL imposes inductive biases, while transfer learning does not. It is the correct inductive biases that make it so powerful. MENTAL learns the inductive biases suitable for a given domain.

If this doesn’t impress you, check out the following. Once trained to learn fast, MENTAL agents expand their new knowledge quite widely. The domain within which the new problems can lie is surprisingly general. For example, after Mr. Character had been trained on Latin letters, he was not only able to learn other writing systems in a single shot; he was also able to learn, in one shot, acronyms like these:

and simple object drawings like these:

Don’t believe it? Check out the demo and challenge Mr. Character with your own drawings.

MENTAL is the ideal enabling technology for AutoML.


Koch, G., Zemel, R., & Salakhutdinov, R. (2015). Siamese neural networks for one-shot image recognition. In ICML Deep Learning Workshop (Vol. 2).

Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332-1338.

Nikolić, D. (2015). Practopoiesis: Or how life fosters a mind. Journal of Theoretical Biology, 373, 40-61.

Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., & Lillicrap, T. (2016). One-shot learning with memory-augmented neural networks. arXiv preprint arXiv:1605.06065.

Vinyals, O., Blundell, C., Lillicrap, T., & Wierstra, D. (2016). Matching networks for one shot learning. In Advances in Neural Information Processing Systems (pp. 3630-3638).

Is this MENTAL thing scalable?

Our technology is generally applicable, easy to scale, and easy to implement.


Applicability: If you can train a deep neural network to perform a certain task, you can apply MENTAL technology to generalize from that task to related problems. And you can apply it to any problem for which deep neural networks have been shown to work well. You can even use the same network topology that is known to work well; you just need to apply our “knowledge blanket” on top of it. The knowledge blanket does not even need to be applied to deep learning. It can be applied to any other machine learning tool. That’s how generally applicable it is.


Scalability: If you want to apply MENTAL technology to existing deep learning applications, you can expect a linear increase in memory requirements and in training time. That is, it scales with O(N). And that linear increase in resource demands will be by a small factor: for example, to convert a classical DNN into a MENTAL neural network (MNN), you may require only something like 2 to 3 times more memory and about 2 to 3 times longer training time.

Moreover, after the MNN has generalized, it will perform one-shot learning of new classes practically instantaneously—in only a small number of epochs.

Finally, inference has the best scaling properties of all. It always scales by a factor of 1.0. That is, it will require exactly the same resources as the original DNN: not a byte of additional memory and not a millisecond of additional computational time. This means that the technology can easily run on mobile devices: your laptop, your car, your phone, your smartwatch.
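As a back-of-the-envelope illustration of these scaling claims, the following Python sketch turns the quoted factors into a resource estimate. The function name, parameter names, and example numbers are hypothetical; the 2 to 3 times training factor and the 1.0 inference factor are simply taken from the text above.

```python
def mental_resource_estimate(dnn_train_memory_mb, dnn_train_hours,
                             dnn_inference_memory_mb,
                             train_factor_range=(2.0, 3.0)):
    """Estimate MNN resource needs from those of a baseline DNN.

    Training cost is scaled by the quoted 2x to 3x range (returned as
    (low, high) tuples); inference cost is scaled by the quoted factor of 1.0.
    """
    low, high = train_factor_range
    return {
        "train_memory_mb": (dnn_train_memory_mb * low, dnn_train_memory_mb * high),
        "train_hours": (dnn_train_hours * low, dnn_train_hours * high),
        "inference_memory_mb": dnn_inference_memory_mb * 1.0,
    }

# Hypothetical baseline: 4000 MB and 10 hours to train, 400 MB at inference.
estimate = mental_resource_estimate(4000, 10, 400)
for name, value in estimate.items():
    print(name, value)
```

For the hypothetical baseline, training would need between 8000 and 12000 MB and between 20 and 30 hours, while the inference footprint stays at 400 MB.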


Ease of implementation: MENTAL does not require an obscure specialized framework for machine learning. It can be implemented, for example, in TensorFlow. In fact, once a MENTAL agent has been trained to generalize, a classical deep neural net already implemented in TensorFlow can be converted into a one-shot learner with only a few lines of code.

Is there a business model behind the MENTAL platform?

Yes. MENTAL is an ideal enabling technology for AutoML.

We plan to make the future of machine learning much easier for developers of various machine learning applications. Instead of having to train models from scratch or rely on transfer learning, developers can use our technology to deliver much more efficient learners for various common tasks in machine learning. We envisage a line of future products such as Ms. Thoughtread, Ms. Klang, Mr. What’s that, Lady Futuretelling, and others.

As we offer a generally applicable intelligence solution, we expect that in the future there will be much less need to engage an expensive team of machine learning experts to engineer specialized machinery for every new AI problem, as is the case today. Rather, our trained agents will do much of the intelligence work in the background, requiring only a simple download. This will make the solution much cheaper and the development time much shorter.

Even creating new agents will not pose much demand on human resources. Direct evidence is that our own team at RobotsGoMental is very small, and yet we have achieved an insane generalization performance.

We plan to address the AI market by walking away from large, expensive teams producing unique, super-engineered, highly specialized solutions. Rather, we want to enable AI developers to take maximal advantage of general intelligence principles. This will enable them to spend more time working creatively on the human side of AI, and much less time re-inventing machine learning solutions and training them, only to achieve inferior performance anyway.

We can’t wait to create our first base of subscribers to our smart agents that will be even cleverer than Mr. Character is.