Google’s multimodal AI model Gemini will compete with OpenAI’s GPT-4 starting this fall and will also be available to AI app developers.
This is reported by The Information, citing an anonymous person involved in the development of Gemini.
Gemini is “a group of large AI models,” the source said. This suggests that, like OpenAI with GPT-4, Google could use an architecture composed of multiple expert models with specific capabilities. It could also mean that Google plans to offer Gemini in different sizes, which would likely be more cost-efficient.
Gemini can reportedly generate images as well as text. Since it has also been trained on YouTube video transcripts, it may be able to generate simple videos, similar to RunwayML Gen-2 or Pika Labs. Gemini is also said to have significantly improved coding capabilities.
Google plans to gradually integrate Gemini into its products, such as the Bard chatbot and Google Docs and Slides. Later this year, Gemini will also be available to external developers via Google Cloud.
Huge launch requires huge staffing
According to The Information, at least two dozen executives are involved in developing the model. The Gemini team, which draws on both Google Brain and DeepMind, is said to include several hundred employees.
The recently merged Google DeepMind is still finding its footing on issues such as remote work policies and the technology used to train its models, according to The Information. DeepMind reportedly abandoned its ChatGPT competitor, codenamed “Goodall” and based on an unannounced model called “Chipmunk,” in favor of Gemini.
The Gemini team is led by DeepMind founder Demis Hassabis, with support from two DeepMind executives, Oriol Vinyals and Koray Kavukcuoglu, and former Google Brain chief Jeff Dean. Even Google co-founder Sergey Brin is involved in the development of Gemini, reportedly helping to train and evaluate the model.
The training materials for Gemini are closely monitored by Google’s legal department. For example, the development team has had to remove training data taken from copyrighted books. According to The Information’s source, Gemini was also inadvertently trained on “offensive” content, which likely led to a (partial) re-training of the model.
Gemini was officially unveiled in May. Earlier rumors said the model would have at least a trillion parameters. Training is said to use tens of thousands of Google’s TPU AI chips.
Google DeepMind CEO Demis Hassabis said in late June that Gemini will be “combining some of the strengths of AlphaGo-type systems with the amazing language capabilities of the large models. We also have some new innovations that are going to be pretty interesting.”