Google has officially announced Gemini, its large-scale generative AI project. According to the company, it is the largest, most capable proprietary AI model it has ever built, designed from the ground up to be multimodal. Gemini enters the market as a direct competitor to GPT-4, outperforming OpenAI’s leading model on 30 of 32 widely used benchmarks, and supports an exceptionally broad range of interactions with different types of information: it is trained to seamlessly understand, process, combine and summarize text, images, audio, video and code, even simultaneously.
Gemini is optimized to run on every class of device, from multi-server data centers to on-device operation on smartphones. With a score of 90%, Gemini Ultra is the first AI language model to outperform human experts on the Massive Multitask Language Understanding (MMLU) benchmark, one of the most popular methods for assessing an AI model’s problem-solving ability. MMLU draws on world knowledge across 57 subjects, including mathematics, physics, history, medicine, law and the humanities. For comparison, GPT-4 scores 86.4%. Gemini Ultra also achieved a leading score of 59.4% on the MMMU (Massive Multi-discipline Multimodal Understanding) benchmark, which covers multidisciplinary tasks requiring deliberate, complex reasoning; here GPT-4 sits in second place with 56.8%.
The debut Gemini 1.0 release is immediately scalable and comes in three sizes for different scenarios: Gemini Ultra, the flagship model for highly complex tasks in data centers and enterprise services; Gemini Pro, a general-purpose model for most standard tasks; and Gemini Nano, a compact, efficient model for on-device use in gadgets. One of the model’s headline capabilities is intelligently extracting key information from hundreds of thousands of documents through high-speed reading and filtering, which Google says could enable significant breakthroughs in large-scale research.
Other Gemini features include:
- best-in-class natural image understanding (82.3%) without the help of optical character recognition (OCR) systems;
- advanced generation and explanation of code in popular languages such as Python, Java, C++ and Go;
- a 40% reduction in search latency;
- unified multimodality: native processing of multiple input types within a single context window, with no need for separate specialized models;
- reliable, accelerated deployment on Google Cloud’s supercomputing AI infrastructure: TPU v4, v5e and the new-generation v5p.
Google said it has rolled out a “fine-tuned” Gemini Pro model in Bard; it is already available to users of the chatbot in English in 170 countries, making this the most significant update since Bard’s release. The Pixel 8 Pro is the first smartphone in the line to support Gemini Nano (the December Pixel Feature Drop is required). In early 2024 the model will be added to the rest of the ecosystem, including Google’s search engine, the Chrome browser, the Duet AI office assistant and the advertising platform, alongside the announced Bard Advanced based on Gemini Ultra. Starting December 13, developers and enterprise customers can access Gemini Pro through the Gemini API in Google AI Studio and Cloud Vertex AI, and Android developers can preview Gemini Nano through AICore for native apps.
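For developers curious what access via the Gemini API might look like, here is a minimal sketch using Google's `google-generativeai` Python SDK. The model name `"gemini-pro"`, the `GOOGLE_API_KEY` environment variable and the `summarize` helper are illustrative assumptions, not an excerpt from Google's documentation:

```python
# Illustrative sketch: calling Gemini Pro through the google-generativeai SDK.
# Requires: pip install google-generativeai, plus an API key from Google AI Studio.
import os


def summarize(text: str) -> str:
    """Send a one-sentence summarization prompt to Gemini Pro and return the reply."""
    # Imported inside the function so the sketch can be read without the package installed.
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])  # assumed env var
    model = genai.GenerativeModel("gemini-pro")
    response = model.generate_content(f"Summarize in one sentence:\n{text}")
    return response.text


if __name__ == "__main__" and "GOOGLE_API_KEY" in os.environ:
    print(summarize("Gemini is Google's new multimodal AI model family."))
```

The network call only runs when an API key is present, so the sketch can be imported and inspected safely.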
Google CEO Sundar Pichai called Gemini “a huge leap forward, a major milestone in the development of AI and the beginning of a new era at Google.”