Google I/O 2024: Top announcements

Google’s take on AI and how it is being built into its products

Google pulled out all the stops at its I/O 2024 event, unveiling a flurry of upgrades like the latest Gemini and Gemma models, new AI tricks for Android, and a range of new features for its various apps. Here are the top announcements made at the Google I/O event:

1) Gemini 1.5

Google introduced Gemini 1.5 Flash, a lighter-weight language model than the popular Gemini 1.5 Pro that is optimised for speed and efficiency. Designed for quick response times, it is primed for high-frequency tasks, according to Google.

Equipped with multimodal reasoning capabilities, Gemini 1.5 Flash has a context window of 1 million tokens on Google AI Studio and Vertex AI. Google also announced a 2 million token context window for Gemini 1.5 Pro, aimed at developers and Google Cloud customers, though access will initially be through a waitlist. Google says Gemini 1.5 Flash was trained using a technique called “distillation,” in which a smaller model learns from a larger one, and claims that it excels in a variety of tasks including summarisation, chat applications, image and video captioning, as well as data extraction from lengthy documents and tables.

2) Gemma 2

Gemma 2 is the next generation of Google’s open Gemma models, promising better performance and efficiency. The new model, optimised to run on TPUs and GPUs, packs 27 billion parameters and is set to launch in June. Joining the Gemma lineup is PaliGemma, the family’s first vision-language model, which is inspired by PaLI-3.

3) Ask Photos

Google is stepping up its game with a Gemini AI-powered upgrade to Google Photos that lets you summon a specific picture with a single prompt. Dubbed ‘Ask Photos,’ this new feature taps into Gemini's multimodal capabilities to grasp the context and subject of a picture and fetch the requested image.

For example, you can ask Photos about a particular event, object, or person, and it will scour your gallery to find the relevant images. Let’s say you’re thinking back on your daughter Lucia’s early swimming milestones. Now, you can ask Photos: “When did Lucia learn to swim?”

You can also make a more complex request like: “Show me how Lucia’s swimming has progressed.” Here, Gemini goes beyond a basic search, recognising various contexts, from pool laps to ocean snorkelling, and even picking up on the text and dates on her swimming certificates. Photos then compiles it all into a neat summary, allowing you to relive those cherished memories. Ask Photos is set to roll out this summer, with more capabilities on the way.

4) AI in Android

Google is bringing a host of AI capabilities to Android, courtesy of Gemini. One is Circle to Search, which is already available on Pixel and Samsung devices and can now help students work through homework problems. Google also announced that Gemini Nano, its lightweight on-device large language model, will gain multimodal capabilities. This means it will be able to comprehend information from sights, sounds, and spoken language, not just text.

Google also announced a new scam detection feature for Android that listens to phone calls and identifies language patterns commonly used by scammers, like requesting money transfers. If suspicious activity is detected, the feature interrupts the call and prompts you to hang up. Google emphasises that this feature works locally on the device, ensuring privacy as phone calls are not sent to the cloud for analysis.

Google is also letting the Gemini assistant appear as an overlay on top of other apps, so users can call on it for tasks without leaving the app they are in. For example, users can ask specific questions about a YouTube video or drag Gemini-generated images directly into Gmail.

5) AI search

Google is also enhancing its search experience with AI, thanks to a new specialised Gemini model. This upgrade introduces quick summaries for search topics through AI Overviews, a feature previously in experimental stages under Google Search Generative Experience (SGE). Furthermore, Google is rolling out a revamped search results page organised by AI, which groups results under distinct AI-generated headlines. The new page will first cover dining and recipe searches in the US, and will later expand to other categories such as movies, music, books, hotels, and shopping.

6) Project Astra

Project Astra is a new multimodal AI agent, widely seen as Google’s answer to OpenAI’s GPT-4o. In a video demonstration, Google showcased Astra’s ability to identify objects in a room, explain specific parts of code, determine its location by analysing the view from a window, locate the user’s glasses, and even suggest creative names for a dog. The presentation also hinted at Project Astra’s potential integration with smartphones or smart glasses, suggesting a significant Gemini-powered overhaul for Google Lens in the future.

7) Veo

Google took on OpenAI’s Sora with the launch of its text-to-video generation model, Veo. Veo is designed to create high-quality 1080p videos in various cinematic and visual styles. What sets Veo apart is its ability to understand cinematic terms like “timelapse” or “aerial shots of landscape,” giving you more control over the video creation process.