What is Google Gemini? Demystifying Google's Smartest AI Yet
Say hello to Gemini, Google's bold multimodal AI. Gemini is all about seeing, hearing, and understanding the world as we do. Using a revolutionary multimodal approach, it can integrate data from text, code, images, audio, and even video to achieve next-level comprehension.
Early demos showcase Gemini's versatility in answering questions by analyzing audio and generating novel ideas from images. It's the beginning of AI that doesn't just compute but connects and communicates.
Strap in – Gemini plunges us into the next frontier of fluid human-AI communication. We'll unpack what Gemini is and how it ticks and share resources to get you started.
What is Google Gemini?
Gemini is Google's powerful, new AI model that can understand text, images, and audio. As a Large Multimodal Model (LMM), Gemini can complete complex tasks in computer code, math, and physics.
Think of it as the Swiss army knife of the digital world. Multimodal means Gemini isn't limited to only text input. It works in multiple modalities, so it can actually understand and respond to audio and video questions, too. Imagine asking a question by showing a video, and Gemini gets it.
How can I access Google Gemini?
Google's recent release of Gemini, a trio of powerful AI models, has ignited excitement and curiosity. Here are five ways to tap into Google's Gemini:
- Google Bard: Your AI Chatbot: While not the full version, Gemini Pro powers Google Bard. It answers your questions, crafts stories, and even creates haikus.
- Google Pixel 8: AI at Your Fingertips: Own a Pixel 8? Gemini answers questions, composes emails, and even helps you capture the perfect Instagram caption.
- Google AI Studio: The Tinkerer's Playground: This user-friendly platform unlocks Gemini's potential in your hands. Experiment with prompts, train it on specific data sets and tailor it to your preferences. Adjust the response "temperature" and safety settings to elicit additional creative responses.
- Vertex AI Studio: For developers and corporations, Vertex AI Studio unlocks the ultimate AI playground. Craft custom models, analyze vast data sets, and push the boundaries of what's possible all on Google's cloud.
- Duet AI: Duet AI is an AI-powered assistant that helps with writing, creating images, and analyzing spreadsheets. Google will roll Gemini into Google Workspace, where Duet AI lives today, in early 2024.
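The "temperature" setting mentioned for Google AI Studio controls how randomly a model picks its next token. Here's a minimal, illustrative sketch of temperature-scaled sampling in plain Python (the function name and toy scores are invented for illustration; this is not Google's implementation):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw model scores into next-token probabilities, scaled by temperature.

    Low temperature sharpens the distribution (safer, more predictable picks);
    high temperature flattens it (more varied, "creative" picks).
    """
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy scores for three candidate next words
logits = [2.0, 1.0, 0.5]

cold = softmax_with_temperature(logits, 0.2)  # near-deterministic
hot = softmax_with_temperature(logits, 2.0)   # closer to uniform

print(cold)
print(hot)
```

Lowering the temperature slider in AI Studio biases Gemini toward its top choice, just as `cold` concentrates almost all probability on the first candidate; raising it spreads probability across alternatives, as `hot` does.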
What are Google Gemini Nano, Pro, and Ultra?
Gemini isn't a monolith; it comes in three sizes, each optimized for a different usage scenario:
1. Gemini Nano - compact AI for mobile use
Gemini Nano is designed specifically for smartphones like the Pixel 8. It enables efficient AI processing directly on your phone, works offline, and suits everyday tasks such as smart suggestions in messaging apps or summarizing articles. There are two versions of Gemini Nano, balancing capability against mobile efficiency:
- Gemini Nano-1 (1.8 billion parameters): This smaller version is great for everyday smartphone use, balancing AI smartness with device efficiency.
- Gemini Nano-2 (3.25 billion parameters): A more advanced option offering enhanced capabilities for complex tasks on mobile devices.
More parameters allow handling higher-level reasoning - Gemini scales from day-to-day assistance to advanced mobile AI.
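Parameter counts translate roughly into on-device memory, which is why Nano's sizes matter on a phone. A back-of-the-envelope estimate, assuming the 4-bit quantization Google has described for Nano (the helper function below is illustrative, not an official formula):

```python
def model_size_gb(parameters, bits_per_parameter=4):
    """Rough on-device footprint: parameters x bits, converted to gigabytes.

    Ignores activations, caches, and runtime overhead, so real usage is higher.
    """
    return parameters * bits_per_parameter / 8 / 1e9

nano_1 = model_size_gb(1.8e9)   # Gemini Nano-1: ~0.90 GB
nano_2 = model_size_gb(3.25e9)  # Gemini Nano-2: ~1.62 GB

print(f"Nano-1: ~{nano_1:.2f} GB, Nano-2: ~{nano_2:.2f} GB")
```

Both estimates fit comfortably within a flagship phone's RAM, which is what makes offline, on-device AI practical.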
2. Gemini Pro - high-performance AI
Gemini Pro, running in Google's data centers, is designed for high-capacity tasks. It's the force behind Google Bard and handles complex queries with deep understanding and fast response times. Gemini Pro Vision also accepts images and videos as input and generates text as output across 38 languages.
According to Google, Gemini Pro outperforms OpenAI's GPT-3.5 in six of eight core benchmarks spanning reasoning, math, and code.
Google hasn't officially revealed the exact parameter count of Gemini Pro, but it's likely in the same ballpark as GPT-3.5 (175B parameters).
AI development platforms:
You can customize Gemini Pro to your own needs in two ways:
- Google AI Studio - easy: Google AI Studio is a free, web-based tool for quick Gemini development. It offers up to 60 requests per minute, perfect for developing and testing AI prompts. It provides templates to integrate seamlessly across different development environments. Google maintains user privacy by de-identifying your data and may review your interactions to enhance product quality.
- Vertex AI - advanced, managed: Vertex AI on Google Cloud steps in when projects demand more sophistication and personalization. You can tune it with your company's data to train custom AI models. Vertex AI supports building advanced search and conversational agents in a user-friendly setup, ensuring your data and IP remain secure.
3. Gemini Ultra - Google's AI leap into the future
Gemini Ultra represents the leading edge of Google's AI capabilities - its most advanced and largest model. But it's not yet available for general use.
Google claims it outshines even GPT-4 in most academic benchmarks. Specifically, it shines in MMLU (massive multitask language understanding), scoring an impressive 90.0%. MMLU is a fancy way of saying the test covers everything from math to law to ethics, and according to Google, Gemini Ultra is the first model to beat human experts on it.
But Gemini Ultra remains cloaked in secrecy, undergoing fine-tuning and safety checks before venturing into the public sphere. Google envisions its integration into a next-generation Bard Ultra, potentially arriving in early 2024.
It's promising, but until it's out for real-world use, it's like a mystery box of AI potential. We haven't seen Gemini Ultra in action yet.
How much information can Google Gemini handle?
All Gemini models can process and remember up to 32,768 tokens at once. Think of it like this: a token is usually a word or part of a word, so these models can handle a sequence of information roughly 130 pages long in a single task. That lets them understand and respond to long, detailed queries effectively.
In contrast, OpenAI's standard GPT-4 model offers about 8,000 tokens, but GPT-4 Turbo stretches to 128,000 tokens, around 300 pages of text in a single prompt.
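The token-to-page conversions above are rules of thumb, and you can reproduce them with simple arithmetic. This sketch assumes about 0.75 words per token and 250 words per page (both common but inexact conventions; real page counts depend on formatting):

```python
def tokens_to_pages(tokens, words_per_token=0.75, words_per_page=250):
    """Back-of-the-envelope: context-window tokens -> approximate pages of text.

    Both ratios are rough rules of thumb, not exact measurements.
    """
    return tokens * words_per_token / words_per_page

gemini_pages = tokens_to_pages(32_768)       # ~98 pages
gpt4_turbo_pages = tokens_to_pages(128_000)  # ~384 pages

print(f"Gemini: ~{gemini_pages:.0f} pages, GPT-4 Turbo: ~{gpt4_turbo_pages:.0f} pages")
```

Quoted page counts vary between sources because everyone assumes a different page density; the useful takeaway is the roughly four-fold gap between the two context windows.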
What's the difference between ChatGPT and Google Gemini?
ChatGPT and Google Gemini both use generative AI but approach tasks differently.
ChatGPT - text-centric with extensions
ChatGPT, particularly its latest version powered by GPT-4, primarily deals with text. While it can handle audio input and output, it does so through separate models: Whisper for speech-to-text and another model for text-to-speech. Similarly, ChatGPT writes text prompts for image generation, which DALL·E 3, a different model, turns into visuals. Essentially, ChatGPT's core is all about text.
Google Gemini - natively multimodal
In contrast, Gemini is a 'natively multimodal' model. It's built from the ground up to process various data types directly – text, audio, images, and video. It doesn't rely on separate models for different data types. Gemini's approach represents a significant shift to integrate real-world sensory information more intuitively.
The data difference
GPT-4 excels in text, reportedly learning from hundreds of billions of words. Gemini's multimodal nature allows it to tap into a vast new pool of training data from images, audio, and videos. It could mark a pivotal step in AI development, leading to more organic and natural ways to interact with AI.
Which is more current: ChatGPT or Gemini?
In the fast-moving world of AI, how current the information an AI model uses can make a huge difference. Let's compare how ChatGPT and Gemini AI stack up in terms of staying updated.
ChatGPT: fixed window
ChatGPT's training is like a snapshot of the internet up to a certain point. For the GPT-3.5 model, this 'snapshot' was in September 2021. But, OpenAI has been giving its models periodic updates, with GPT-3.5 later getting info up to January 2022 and the newer GPT-4 Turbo extending to April 2023.
For paying customers, ChatGPT can browse the web with Bing, pulling in current information that postdates its training data.
Gemini AI: frequent updates
Google's Gemini AI takes a different approach, evolving its knowledge through regular updates to a vast collection of text and code. There isn't an explicit cut-off date, but it can't pull in the latest news or trends in real time.
Google AI Studio and Vertex AI don't have internet access, so they can't fetch the latest news from the web.
This difference in data freshness and internet integration shapes how each AI interacts with the world, making each uniquely suited for different types of tasks and queries.
Was Google's Gemini video faked?
Google's recent Gemini demo video seems like alchemy: the model appears to understand hand signals, follow magic tricks, and sort pictures of planets. The problem is that the demo wasn't captured in real time.
Gemini can't yet process and respond to live video, let alone chat back as it goes. Behind the scenes, the demo was assembled from carefully tuned text prompts and still images, which reveal how much coaxing Gemini actually needs.
Google has spent the year racing to match OpenAI's ChatGPT in generative AI, and the video is more movie magic than a faithful reflection of Gemini's capabilities.
The dawn of intuitive AI
As AI rapidly advances, we're moving beyond text-based chat to models that echo our real-world experiences. ChatGPT and Gemini provide a glimpse into more intuitive machine intelligence on the horizon - one that truly comprehends the world as we do.
Gemini's natively multimodal approach aims to unlock new frontiers of understanding, powering more natural interaction. Imagine a digital assistant that doesn't just handle your dinner plans but also shares a moment, watching the sunset with you.
The age of reasoning machines draws nigh. So buckle up - Gemini's just the start of a wild ride ahead! Here's hoping we don't crash.
Robert is a Taiwan-based writer and digital marketer at iamrobert design. He has a passion for helping people simplify their lives through tech.