Nanochat Trains a GPT-2 Level Model Using Self-Improving Agents

The development of AI is accelerating rapidly. Advances in hardware, software improvements, and better data sets now allow training that once took weeks to be completed in hours. A recent update from AI researcher Andrej Karpathy clearly shows this change: the open source Nanochat project can now train a GPT-2 model in one environment with 8× NVIDIA H100 GPUs for about two hours, down from three last month.
Even more impressive, AI agents made 110 code changes in 12 hours, improving validation loss without slowing down training. In this article, we look at how autonomous AI systems are reshaping the way AI research and model training are done.
What is Nanochat?
Andrej Karpathy developed Nanochat as a complete, end-to-end system for training a basic language model. The project aims to show how developers can build a full ChatGPT-style application on top of a small, understandable codebase. Its design offers two main advantages: it eliminates the need for a tangle of complex dependencies, and it keeps the entire system visible and auditable.
The framework covers the full lifecycle of training and serving a language model:
- Tokenizer training
- Pre-training of the base model
- Intermediate training with conversational datasets
- Supervised fine-tuning
- Reinforcement learning
- Inference and chat interface
At roughly 8,000 lines of code in total, it is one of the most accessible LLM training pipelines available today.
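The stages above can be sketched as a single driver loop. This is a minimal illustration with stub functions; all names are placeholders, not Nanochat's actual API:

```python
# Minimal sketch of the pipeline stages listed above. Every function is an
# illustrative stub, not Nanochat's real API: each stage just records that
# it ran, in order.

def stage(name):
    def run(state):
        return state + [name]
    return run

train_tokenizer = stage("tokenizer")      # 1. tokenizer training
pretrain = stage("pretrain")              # 2. base-model pre-training
midtrain = stage("midtrain")              # 3. conversational mid-training
supervised_finetune = stage("sft")        # 4. supervised fine-tuning
reinforcement_learn = stage("rl")         # 5. reinforcement learning
serve_chat = stage("serve")               # 6. inference / chat interface

def run_pipeline():
    state = []
    for step in (train_tokenizer, pretrain, midtrain,
                 supervised_finetune, reinforcement_learn, serve_chat):
        state = step(state)
    return state
```

The point of the sketch is simply that each stage consumes the output of the previous one, so the whole lifecycle reads as one linear pipeline.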
How does the AutoResearch System work?
The AutoResearch framework establishes a research loop in which AI agents improve a codebase through continuous testing and validation. In effect, it acts as an automated research engineer that runs experiments and measures the results.
The system works through the following steps:
- Repository setup
The agent starts from an existing project repository (for example, Nanochat) and clones the complete codebase into an isolated test environment.
- Branch Creation
The agent creates a new test branch that allows it to run tests on changes without risking any disruption to the main codebase.
- Code modification proposals
The agent analyzes the codebase and proposes possible improvements in four main areas:
- Training loop configuration
- Dataset preprocessing
- Hyperparameter tuning
- Model architecture tweaks
- Automated Experimentation
The system runs training and evaluation with the modified code and records metrics such as:
- Validation loss
- Training speed
- Resource utilization
- Performance Evaluation
Each run is compared directly against the model's established baseline performance. If the new version outperforms the previous one, the change qualifies as an improvement.
- Automatic Integration
Verified improvements are automatically merged into the main code branch.
- Continuous research loop
The cycle then repeats, producing an automated research system that improves itself through continuous operation.

Using this autonomous, human-free workflow, the system can generate anywhere from dozens to hundreds of code improvements.
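The loop described above can be sketched in a few lines of Python. This is a simplified, hypothetical model of the accept/reject logic, not AutoResearch's actual implementation:

```python
import random

# Hypothetical sketch of the research loop: propose a change, evaluate its
# validation loss, and keep ("merge") it only if it beats the current best.

def research_loop(propose, evaluate, baseline_loss, iterations=20):
    """propose() -> candidate change; evaluate(change) -> validation loss."""
    best_loss = baseline_loss
    accepted = []
    for _ in range(iterations):
        change = propose()
        loss = evaluate(change)
        if loss < best_loss:       # improvement: merge the change
            best_loss = loss
            accepted.append(change)
        # otherwise the experimental branch is discarded
    return best_loss, accepted

# Toy run: each "change" is a random delta applied to the baseline loss.
best, kept = research_loop(
    propose=lambda: random.uniform(-0.002, 0.002),
    evaluate=lambda delta: 0.8624 + delta,
    baseline_loss=0.8624,
)
```

Because candidates are only ever merged when they strictly lower the loss, the accepted set forms a monotonically improving sequence, which is what makes unattended operation safe.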
Setup and installation
The framework can be set up for autonomous research as follows:
- Clone the repository
git clone
cd autoresearch
- Set up the environment
python -m venv venv
source venv/bin/activate
- Install dependencies
pip install -r requirements.txt
- Configure API keys
export OPENAI_API_KEY="your_api_key_here"
- Start the autonomous agent
python main.py

GPT-2 2-Hour Training Pass
Nanochat's most significant recent success is its faster GPT-2 training time. The previous baseline was:
- Training time: ~3 hours
- Hardware: 8× NVIDIA H100 GPUs
With the same hardware, training time has now dropped to about two hours. That may seem like a small gain, but machine learning research benefits greatly from faster training cycles: researchers can test more ideas, iterate faster, and land improvements sooner. Three changes were key to this success:
1. Switching to the NVIDIA ClimbMix dataset
The most significant performance improvement came from changing the training dataset. Earlier runs had evaluated other pre-training datasets, and testing showed weaker training results when they were used.
Switching to NVIDIA's ClimbMix dataset produced better results with less tuning work. This points to an important lesson in AI development: data quality can matter as much as model architecture.
Choosing the right dataset can significantly improve both training efficiency and final model quality.
2. FP8 Precision Training
A second development breakthrough allowed for FP8 precision training within the system. FP8 (8-bit floating point) allows GPUs to perform calculations faster while maintaining sufficient accuracy for neural network training.
FP8 training brings several benefits:
- Faster tensor computations
- Lower memory-bandwidth requirements
- Higher GPU throughput
- Lower training costs
In general, choosing the lowest precision that still gives good results is one of the most effective ways to speed up a wide range of AI workloads.
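As a back-of-the-envelope illustration of why lower precision helps, halving or quartering the bytes per element proportionally cuts memory traffic. The matrix size here is a hypothetical example, not Nanochat's actual configuration:

```python
# Back-of-the-envelope memory math for reduced precision (illustrative
# numbers only, not Nanochat's actual figures).
BYTES_PER_ELEMENT = {"fp32": 4, "bf16": 2, "fp8": 1}

def tensor_megabytes(num_elements, dtype):
    return num_elements * BYTES_PER_ELEMENT[dtype] / 1e6

# A hypothetical 1600 x 1600 weight matrix (GPT-2-ish layer width):
n = 1600 * 1600
fp32_mb = tensor_megabytes(n, "fp32")  # 10.24 MB
fp8_mb = tensor_megabytes(n, "fp8")    # 2.56 MB -- 4x less memory traffic
```

Since large-model training is often memory-bandwidth bound rather than compute bound, moving a quarter of the bytes per tensor translates fairly directly into throughput.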
3. Training Pipeline Development
Beyond the dataset change and FP8 optimization, the Nanochat training pipeline has received many smaller enhancements: better data loading, tighter training loops, higher GPU utilization, and refined scheduling.
Individually small, these improvements combined to produce a noticeable reduction in training time.
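As a generic example of the kind of data-loading improvement mentioned above, a background thread can prefetch batches into a bounded queue so the training loop rarely waits on I/O. This is an illustrative sketch, not Nanochat's actual code:

```python
import queue
import threading

# Generic overlapped data loading (illustrative, not Nanochat's code):
# a background thread fills a bounded queue while the training loop
# consumes batches from it.

def prefetch(batches, maxsize=2):
    q = queue.Queue(maxsize=maxsize)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for batch in batches:
            q.put(batch)  # blocks when the queue is full (bounded lookahead)
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = q.get()
        if batch is done:
            return
        yield batch

# The "training loop" consumes batches while later ones load in the background.
consumed = list(prefetch(range(5)))
```

The bounded queue is the key design choice: it caps memory use while still letting loading and computation overlap.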
AI Agents Now Powering Nanochat
The most exciting development in the Nanochat ecosystem is that AI agents are now improving the project itself. Rather than relying on manual experimentation, Karpathy set up a system in which agents improve the codebase through automated testing.
The workflow runs as follows:
- The agent creates a new feature branch.
- The agent proposes changes and performance enhancements.
- Tests run automatically.
- Changes that produce better results are merged.
In 12 hours, the system produced:
- 110 code changes
- A validation-loss reduction from 0.862415 to 0.858039
- No slowdown in the existing training schedule
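As a quick sanity check, those loss numbers work out to roughly a half-percent relative improvement:

```python
# Quick check of the reported loss drop (numbers from the 12-hour run).
before, after = 0.862415, 0.858039
absolute_drop = before - after                 # ~0.004376
relative_improvement = absolute_drop / before  # ~0.5% relative
```

A half-percent may sound modest, but it was accumulated entirely by unattended agents in half a day, without any regression in training speed.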
This creates a continuous experimentation process in which results feed rapidly back into the codebase. The project effectively operates as a research system that works on its own development.
The Future of Open-Source AI
Nanochat is also part of a broader movement toward open-source AI infrastructure. Engineers around the world create and develop AI systems collaboratively, independent of large corporate laboratories. Open-source LLM projects offer several advantages:
- Transparent, open development
- Community collaboration that enables rapid innovation
- A low barrier to entry for new researchers
Future hardware improvements and development of training pipelines will enable small teams to match the capabilities of large AI laboratories.
The AI ecosystem will experience an explosion of creativity and experimentation as a result of this development.
Conclusion
Nanochat's recent success shows just how fast AI development is moving. Training a GPT-2 level model in about two hours on current hardware is an outstanding achievement.
The most important advance, however, is the emergence of AI agents that can improve systems without human input. The autonomous research loops running today point toward research programs that compound their own results.
Frequently Asked Questions
Q. What is Nanochat?
A. Nanochat is an open-source project by Andrej Karpathy that demonstrates a complete end-to-end pipeline for training and deploying a ChatGPT-style language model.
Q. How fast can Nanochat train a GPT-2 level model?
A. Nanochat can train a GPT-2 level model in about two hours on a single node with 8× NVIDIA H100 GPUs.
Q. How do AI agents improve Nanochat?
A. Autonomous AI agents modify code on test branches, run experiments, and integrate improvements automatically, generating more than 100 code changes while reducing validation loss.