As 2023 draws to a close, it has undoubtedly been the year of generative AI. But there have been important developments beyond the ChatGPT hype. Here are some key papers you may have missed.
With the launch of ChatGPT in late 2022 and the subsequent release of GPT-4 in March 2023, the profound impact of widely available large language models in particular, and of so-called foundation models in general, became clear. This was the year of generative AI for text, audio, and video.
In addition to the rapid spread of ChatGPT, there were significant developments such as the first competitive open-source language models and several new AI start-ups, including European companies such as Mistral, which released the best open-source language model currently available, Mixtral 8x7B, before the end of the year.
Generative agents and robot cats
With work such as Generative Agents from Stanford University and Google, researchers have also demonstrated applications beyond the usual text and coding tasks. The team created a Sims-inspired sandbox environment in which 25 AI agents, each given a brief description of its profession and personality, interacted autonomously. The agents exhibited believable individual and emergent social behavior, including planning and attending a Valentine’s Day party. The work showed how agents based on language models can interact with each other and produce interesting results. Over the year, the idea was picked up by other research and open-source projects such as Auto-GPT and BabyAGI, and building such agents has been greatly simplified by OpenAI’s Assistants API.
Foundation models such as GPT-4 have also been used in robotics, where some progress has been made. Examples include Google’s Robotic Transformer 2 (RT-2) and RoboCat. RT-2 is an AI model for robot control that learns from both robot and web data. The model can process text and image input and use its extensive knowledge of the web to perform tasks for which it has not been explicitly trained. In more than 6,000 robot trials, RT-2 achieved almost twice the success rate of its predecessor on tasks it had not been trained on. RoboCat, on the other hand, is an AI agent that generates its own training data to improve robot control. Other groups and companies, such as Nvidia with its multimodal VIMA model, have also used foundation models in robotics.
DreamerV3 and FunSearch
There have also been important results in the field of reinforcement learning. One example is DreamerV3, which can be applied to very different problems without any adaptation. The team showed how DreamerV3 learns to mine diamonds in Minecraft without human data. Earlier this year, DeepMind also presented AdA, short for Adaptive Agent, an example of what DeepMind calls a foundation reinforcement learning model. AdA followed the classic recipe of foundation models and was trained on huge amounts of task data, in this case in a simple simulation. AdA was significant because it showed that scaling in reinforcement learning can lead to models that perform better on other tasks.
My highlight: the continued adoption of deep learning and other methods in various scientific fields. DeepMind has developed AlphaTensor, an AI system that discovers new algorithms for fast matrix multiplication. The company also gave an insight into the latest version of the AlphaFold protein structure prediction system, showing that it overcomes many of the weaknesses of the previous version and opens up new possibilities for computational structure prediction. In addition, Google DeepMind demonstrated FunSearch, the first use of a code-generating language model combined with an evolutionary search algorithm to find a previously unknown solution to a mathematical problem.
OthelloGPT, Q-Star, and AI Act
2023 was also the year of AI regulation and global warnings of existential risks, which may also have helped to stimulate research into the inner workings of large language models. There were interesting papers such as OthelloGPT, the much-criticized Microsoft paper on the “sparks” of AGI in GPT-4, and a Google paper on the phenomenon of grokking. The field of prompt engineering also provides insights into language models. Two contributions are worth mentioning: François Chollet’s interpretation of prompt engineering as the search for the right vector program, and Promptbreeder, which shows that prompting is likely to become more automated in the future.
The year ended with a rumour about Q-Star, in which existing fears, the AGI hype, and Sam Altman’s brief ouster combined to create a veritable rabbit hole.
In 2024, there is likely to be less speculation and more negotiation: ongoing court cases around fair use for AI training, such as the recent New York Times lawsuit, will show how society deals with the technology. This debate will also take place in the EU, where key players agreed on the EU AI Act before the end of the year. The details will be decided over the next year and will have a major impact on the AI market in Europe.