How Artificial Intelligence Develops Intelligence
Artificial intelligence doesn't just collect data; it learns step by step, mirroring human methods: from reading and repetition to independent problem-solving.
AI models such as ChatGPT, Gemini, Claude, DeepSeek, and Copilot have quickly evolved beyond simple tools for chatting, retrieving information, and editing text. Increasingly, they serve as powerful assistants for complex tasks, including advanced mathematical analysis, large-scale data processing, and programming. Their rapid development raises an important question: how is it possible for AI to learn and advance so quickly?
How Do Humans Learn?
To understand how AI learns, it is useful first to examine how humans acquire and deepen their knowledge. When approaching a new subject, we typically begin by studying literature and other reliable sources of knowledge, which help us grasp key concepts and build a fundamental understanding of the field. The quality and accessibility of these resources significantly influence the speed and effectiveness of our learning, as well-structured material is easier to integrate into a broader mental framework. The goal of this initial phase is to establish a strong theoretical foundation that allows for further development and successful application of knowledge in practice.
Once we have a solid grasp of the basics, we move on to worked problems that guide us step by step through the solution process. In this phase, we follow established methods provided by teachers or textbooks, developing analytical skills and learning the correct approaches to tackling challenges. As we encounter a variety of tasks, we adapt our strategies, enhancing our ability to think critically and creatively. This transition from theory to practice is crucial: it ensures that knowledge is not merely abstract but becomes a dynamic tool for solving real-world problems in diverse situations.
The third stage of learning involves independent problem-solving, where we are given only the final result without a step-by-step solution. This requires us to develop appropriate strategies and find optimal approaches beyond merely repeating previously learned methods. Some problems may not be solvable using conventional approaches, necessitating the creation of innovative and more effective solutions. In such cases, creative thinking, adaptability, and the ability to experiment with different solutions are essential. Additionally, by verifying the correctness of our methods against known outcomes, we refine our problem-solving approaches, gaining deeper insights and developing adaptable strategies for tackling even more complex challenges.
Three Stages of AI Learning
These three fundamental and interconnected learning stages, essential in human learning, also form the basis for training neural networks. Just as humans gradually build knowledge through theoretical understanding, practical application, and independent problem-solving, neural networks follow a similar process.
1. Foundational Learning (Pretraining)
In the first stage of AI learning, large language models analyze vast amounts of carefully curated and structured data, including scientific papers, literary works, journalistic articles, online discussions, and forum posts. This phase, known as foundational learning (pretraining), functions like extensive "independent reading": the model learns to recognize and understand patterns and relationships within data, much like a student who first thoroughly studies theory from textbooks.
This learning process often relies on self-supervised learning approaches, where the model gradually develops the ability to predict the next word or, more precisely, the next "token" in a text. During this phase, the model constructs a rich internal representation of linguistic structures, concepts, and styles, allowing it to generate coherent, grammatically correct, and stylistically appropriate text. Based on an initial sentence, it can "predict" the most likely continuation of the text by drawing on contextual patterns observed in vast datasets.
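To make this concrete, here is a minimal sketch of next-token prediction in Python. It uses a tiny toy corpus and whitespace-separated bigram counts rather than the subword tokenizers and billion-parameter neural networks of real models, but the objective is the same: given the context, output the most probable continuation.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the vast pretraining data (assumption:
# whitespace "tokens" instead of the subword tokens real models use).
corpus = (
    "the model learns to predict the next token . "
    "the model learns patterns from text . "
    "the student learns theory from textbooks ."
)

tokens = corpus.split()

# Count how often each token follows each context token: a bigram model,
# the simplest possible "next-token predictor".
following = defaultdict(Counter)
for current, nxt in zip(tokens, tokens[1:]):
    following[current][nxt] += 1

def predict_next(token: str) -> str:
    """Return the most frequent continuation seen during 'pretraining'."""
    candidates = following.get(token)
    if not candidates:
        return "<unknown>"
    return candidates.most_common(1)[0][0]

print(predict_next("model"))   # 'learns' - the most likely continuation
print(predict_next("learns"))  # 'to' (ties broken by first occurrence)
```

A real language model replaces this count table with a neural network trained by gradient descent on the same prediction objective, which lets it generalize to contexts it has never seen verbatim.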
However, at this stage, foundational AI models primarily function as statistical predictors of language patterns rather than as reliable sources of factual information. Their responses are often well-formed stylistically but not necessarily accurate or meaningful. To enhance their precision and relevance, further training is required.
2. Task-Specific Learning (Fine-Tuning)
Although models develop language comprehension and general pattern recognition in the first phase, theoretical knowledge alone is often insufficient for solving specific tasks. After foundational training, neural networks therefore refine their broad knowledge by learning how to apply it practically in conversations and problem-solving. This phase, known as fine-tuning, involves training on large sets of practical examples in which questions or prompts are paired with verified correct answers.
Similar to learning mathematics, where we first master theory and then practice with concrete problems, neural networks in this second phase gain experience by studying previously solved problems and answered questions. This enables them to learn the correct approach to solving tasks and formulating responses.
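As a rough illustration of what such training examples look like, the sketch below pairs prompts with verified answers and marks which tokens the training loss applies to, so the model learns to produce answers rather than to repeat prompts. This is a simplification: real pipelines use subword tokenizers, special separator tokens, and framework-specific formats, and the example data and helper function here are hypothetical.

```python
# Hypothetical fine-tuning examples: prompts paired with verified answers.
examples = [
    {"prompt": "What is 2 + 2 ?", "response": "4"},
    {"prompt": "Capital of France ?", "response": "Paris"},
]

def build_training_sequence(example: dict) -> tuple[list[str], list[bool]]:
    """Concatenate prompt and response into one token sequence, and mark
    the positions the loss is computed on (only the response tokens: the
    model should learn to produce answers, not to parrot prompts)."""
    prompt_tokens = example["prompt"].split()
    response_tokens = example["response"].split()
    tokens = prompt_tokens + response_tokens
    loss_mask = [False] * len(prompt_tokens) + [True] * len(response_tokens)
    return tokens, loss_mask

for ex in examples:
    tokens, mask = build_training_sequence(ex)
    print(list(zip(tokens, mask)))
```

Masking the prompt in this way is a common design choice: the prompt is treated as given context, and only the answer contributes to the training signal.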
Fine-tuning requires significantly less time than foundational learning since its primary focus is on optimizing responses for specific applications. The result is AI models that can effectively communicate and answer questions. However, their responses may still be unreliable when dealing with topics not well covered in their training data.
3. Independent Problem-Solving (Reinforcement Learning)
In the third learning phase, neural networks actively experiment with different approaches to problem-solving and continuously optimize their solutions. This allows them not only to reinforce existing patterns but also to develop entirely new strategies that may surpass human intuition. Feedback is crucial in this process: "rewards" for correct solutions and "punishments" for incorrect ones serve as a measure of success and help the model identify the most effective tactics. This method, known as reinforcement learning, enables AI to refine its decision-making through iterative feedback.
A key advantage of this approach is the ability to adapt to unforeseen situations and challenges not present in the training data. Instead of relying solely on a static set of correct answers, the model autonomously explores and tests new possibilities. This creates a dynamic learning process where the model adjusts its hypotheses and selects optimal strategies based on its environment and received feedback.
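Production systems use far more sophisticated algorithms (policy-gradient methods in the case of large language models), but the basic feedback loop can be sketched as a simple multi-armed bandit: try a strategy, observe a reward, and shift future choices toward what worked. The strategy names and success rates below are illustrative assumptions, not data from any real system.

```python
import random

# Toy "environment": three candidate solution strategies with different
# hidden probabilities of producing a correct answer (illustrative numbers).
SUCCESS_RATE = {"strategy_a": 0.2, "strategy_b": 0.5, "strategy_c": 0.8}

value = {s: 0.0 for s in SUCCESS_RATE}  # estimated worth of each strategy
counts = {s: 0 for s in SUCCESS_RATE}

def choose(epsilon: float = 0.1) -> str:
    """Mostly exploit the best-known strategy, occasionally explore."""
    if random.random() < epsilon:
        return random.choice(list(SUCCESS_RATE))
    return max(value, key=value.get)

random.seed(0)
for _ in range(2000):
    s = choose()
    reward = 1.0 if random.random() < SUCCESS_RATE[s] else 0.0  # feedback
    counts[s] += 1
    value[s] += (reward - value[s]) / counts[s]  # incremental mean update

print(max(value, key=value.get))  # almost surely 'strategy_c' after training
```

The model never sees the hidden success rates; it discovers the best strategy purely from accumulated reward signals, which is the essence of the exploration described above.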
Through independent problem-solving, neural networks move beyond mere replication of learned patterns and develop flexible, creative, and contextually relevant problem-solving methods. This advanced learning stage is essential for tackling complex challenges across various domains. By combining exploratory learning with existing knowledge, neural networks not only refine known solutions but can also discover entirely original approaches that differ from the methods they were trained on.
AlphaGo and Move 37
The first major breakthrough of this kind of "innovative" learning occurred in 2016, during the legendary Go match in which AlphaGo, an AI system developed by DeepMind, faced the top Korean player Lee Sedol. In the second game, AlphaGo played the now-famous move 37, which astonished experts and Go enthusiasts worldwide. The move was so unconventional and so far outside established human strategies that it was initially dismissed as a mistake. It later proved to be a brilliant play, placing Lee Sedol in a difficult position from which he could not recover. Visibly shaken, he admitted to having underestimated AI's ability to surpass human strategies.
AlphaGo did not derive move 37 from analyzing past human games; it developed the move through extensive self-play, experimenting with different strategies and identifying the most successful ones based on feedback. By playing millions of games against itself, the model discovered entirely new tactics unknown to human players. Move 37 marked a turning point in AI development, demonstrating that systematic independent learning could enable AI to surpass human capabilities.
Just as AlphaGo once proved that a neural network could master a strategic game beyond human expertise, today, large language models use the same methods to improve their abilities in solving complex problems. The third phase of AI learning, involving independent problem-solving through reinforcement learning, has been particularly refined and optimized by researchers in China during the development of the DeepSeek-R1 model.