The world of AI is evolving at a breakneck pace, with vendors and researchers racing to surpass each other with new technologies, capabilities and performance milestones. Google has recently released Gemini 1.5 Pro to compete with GPT-4o.
Google is racing to keep up with, and possibly surpass, OpenAI. In December 2023, Google announced the Gemini family of multimodal LLMs and has been improving it continuously since then. Gemini 1.5 Pro was first announced in preview in February 2024. The model was then publicly demonstrated and significantly expanded at the Google I/O conference in May 2024.
What is Gemini 1.5 Pro?
Following the successful launch of the Gemini 1.0 generation in December 2023 with Ultra, Pro and Nano versions, Google DeepMind introduced the Gemini 1.5 Pro upgrade in February 2024. Compared to the previous generation, Gemini 1.5 Pro delivers better performance and can understand much longer contexts. In its initial phase, however, this version was available only to developers and large enterprises through Google AI Studio and Vertex AI.
Gemini 1.5 Pro is a multimodal artificial intelligence (AI) model developed by Google DeepMind that powers next-generation AI services on the Google platform and for third-party developers.
Main features:
- Process text, images, audio and video.
- Multimodal reasoning for text generation, question answering, and content analysis.
Advantages:
- Large context processing capacity (up to 1 million tokens)
- Higher performance and lower cost than previous models (Gemini 1.0 Ultra)
Development history:
- December 2023: Launch of Gemini 1.0 with Ultra, Pro and Nano versions.
- February 2024: First preview of Gemini 1.5 Pro.
- April 2024: Gemini 1.5 Pro public preview.
- May 2024: Google announces improvements to Gemini 1.5 Pro.
Applications:
- Translation
- Programming
- Create multimodal content
- Data analysis
- And many other applications
Gemini 1.5 Pro is a powerful multimodal AI model with many potential applications in various fields. Thanks to its large context processing capabilities and high performance, Gemini 1.5 Pro promises to bring more innovative and effective AI solutions.
Compare Gemini versions
Feature | Gemini 1.0 Pro | Gemini 1.5 Pro | Gemini 1.0 Pro 001 (Tuning) |
Availability | API, Google AI Studio | API, Google AI Studio | API, Google AI Studio |
Model size | Large | Very large | Large |
Multimodal input | No | Yes | No |
Customizable | No | No | Yes (tunable) |
Latency | Medium | High | Medium |
Cost | Low | High | Medium |
Use case | Great for general tasks and simple chatbots | Suitable for complex tasks, multimedia processing, and content creation | Suitable for specialized tasks requiring high precision |
Note:
- This comparison table is based on current information and may change as Google updates Gemini versions.
- “Latency” refers to the model's response time.
- “Cost” refers to the cost of using the API.
Choose the appropriate Gemini version:
- Gemini 1.0 Pro: Suitable for simple chatbot applications that require low latency and low cost.
- Gemini 1.5 Pro: Suitable for applications that require multimedia processing or complex content creation and can tolerate higher latency and cost.
- Gemini 1.0 Pro 001 (Tuning): Suitable for specialized applications requiring high precision and customizable models.
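The selection rules above can be sketched as a small helper. This is only an illustration: `Requirements` and `choose_gemini_model` are hypothetical names, and the returned strings are the labels used in the comparison table, not official API model IDs.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical description of what an application needs."""
    needs_multimodal: bool = False  # image, audio, or video input
    needs_tuning: bool = False      # a custom-tuned model


def choose_gemini_model(req: Requirements) -> str:
    """Map requirements to a label from the comparison table."""
    if req.needs_multimodal and req.needs_tuning:
        # Per the table, no single version offers both features.
        raise ValueError("No listed version is both multimodal and tunable")
    if req.needs_multimodal:
        return "Gemini 1.5 Pro"
    if req.needs_tuning:
        return "Gemini 1.0 Pro 001 (Tuning)"
    # Default: general tasks with lower latency and cost.
    return "Gemini 1.0 Pro"


print(choose_gemini_model(Requirements(needs_multimodal=True)))
```

A simple chatbot with no special needs falls through to Gemini 1.0 Pro, which matches the first bullet above.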
How to use Gemini 1.5 Pro for free on Google AI Studio
Google AI Studio is an environment that allows programmers to write, run, and test prompts using Google's Gemini models. If you want to use the Gemini API, you can also get your API key from within Google AI Studio. To use Gemini 1.5 Pro for free in Google AI Studio, visit https://aistudio.google.com/app/prompts/new_chat. On your first visit, you will be asked whether you want to create a prompt or get an API key. If you just want to chat with Gemini 1.5 Pro, select New Prompt.
Then log in with your Google account and select Gemini 1.5 Pro as the model.
You can then start prompting the AI to do what you want. The following sections explain how to use Google AI Studio.
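For readers who take the API key route mentioned above rather than the chat interface, here is a minimal sketch of what a text request to the Gemini API might look like. It only builds the request and does not send it. The endpoint, the `v1beta` path and the `gemini-1.5-pro` model ID follow Google's public REST documentation as I understand it and may have changed, so treat them as assumptions; `build_request` is a hypothetical helper and `YOUR_API_KEY` is a placeholder.

```python
import json
import urllib.request

# Assumed base URL of the Gemini REST API; check the current docs.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_request(prompt: str, api_key: str,
                  model: str = "gemini-1.5-pro") -> urllib.request.Request:
    """Build (but do not send) a generateContent request."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    url = f"{API_ROOT}/models/{model}:generateContent?key={api_key}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("Say hello", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url.split("?")[0])

# Sending requires network access and a real key:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
```

Keeping the request construction separate from the network call makes it easy to inspect the payload before spending any quota.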
Introducing Google AI Studio
If you're familiar with OpenAI's Playground, Google AI Studio is similar. Let's take a look at the basic user interface as shown below:
Regardless of which mode you choose, “Run Settings” will be the same.
- Model: Currently, Google offers three different models: Gemini 1.0 Pro, Gemini 1.5 Pro and Gemini 1.0 Pro 001 (Tuning). Each of these models has its own benefits. For example, Gemini 1.5 Pro allows users to insert images as well as video, audio, and other files. You can learn more about Google's LLMs in its documentation.
- Temperature: This variable controls the “creativity level” of the model. Increasing this value makes the model more likely to select tokens with lower statistical likelihood when generating responses. The best way to understand the impact of this variable is to experiment yourself and see how the output changes.
- Stop Sequence: This variable causes the model to stop generating tokens when a specific word or phrase is generated. For example, if my stop sequence is “world” and I ask the model to say “Hello world”, then the output generated will be “Hello”. In other words, the stop sequence itself never appears in the model's output.
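To make the Temperature and Stop Sequence settings more concrete, here is a toy sketch in plain Python. It does not call Gemini; it applies the standard temperature-scaled softmax to made-up scores and mimics the stop-sequence behavior on a fixed string. The function names and the example scores are illustrative assumptions.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def apply_stop_sequences(text, stop_sequences):
    """Cut text at the earliest stop sequence, excluding the sequence."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # ignore empty stop strings
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.5)
hot = softmax_with_temperature(logits, 2.0)

# A lower temperature concentrates probability on the top token.
print(cold[0] > hot[0])  # True

print(repr(apply_stop_sequences("Hello world", ["world"])))  # 'Hello '
```

At a higher temperature the three probabilities move closer together, which is why less likely tokens get picked more often.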
Safety settings
Due to the nature of Large Language Models (LLMs), responses can sometimes be unpredictable. While steps have been taken to ensure the model generates appropriate responses, Google created this management tool to give programmers greater control over the output.
- Top K: This variable limits the model to choosing from the K most likely next tokens. The lower the Top K value, the more predictable the output; a higher value allows more varied responses.
- Top P: This setting is available in “Advanced settings” in the lower right corner. The model considers only the most likely tokens whose combined probability reaches P when generating a response. A lower Top P value makes the output less random.
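Top K and Top P are easier to grasp on a toy example. The sketch below applies both filters to a made-up next-token distribution. It illustrates the general sampling technique, not Gemini's internal implementation, and the function names are illustrative.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = ranked[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative
    probability reaches p, then renormalize."""
    kept, cumulative = [], 0.0
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


# Made-up probabilities for the next token.
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "emu": 0.05}

print(list(top_k_filter(probs, 2)))    # ['cat', 'dog']
print(list(top_p_filter(probs, 0.9)))  # ['cat', 'dog', 'fish']
```

In both cases the model then samples only from the surviving tokens, so smaller values of K or P narrow the choice and make the output more predictable.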
Different modes of Google AI Studio
Currently, Google AI Studio offers three distinct modes when creating with the Gemini API. These options can be selected by clicking “Create New” in the top left corner, as shown below.
Each of these modes is intended to address a specific use case.
- Chat Prompt: commonly used in chatbots such as ChatGPT and Google's Gemini chatbot. It is used to answer user questions in a conversational manner. This is where you can customize the chatbot to speak or act a certain way. Want a friendly customer service chatbot? A sarcastic chatbot talking to you? This is where you will tell the model to behave in a certain way.
- Freeform Prompt: This type of prompt is used for open-ended responses, such as creative writing.
- Structured Prompt: The unique feature of this prompt is that the user needs to provide examples (sample data) of the desired queries and responses.
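The idea behind a Structured Prompt can be sketched as assembling example pairs into a single few-shot prompt. This is an assumption about the general technique, not a description of how Google AI Studio formats prompts internally; `build_structured_prompt` and the `input`/`output` labels are hypothetical.

```python
def build_structured_prompt(examples, query,
                            input_label="input", output_label="output"):
    """Assemble example pairs plus a new query into one few-shot prompt."""
    lines = []
    for example_input, example_output in examples:
        lines.append(f"{input_label}: {example_input}")
        lines.append(f"{output_label}: {example_output}")
    # The final output is left blank for the model to complete.
    lines.append(f"{input_label}: {query}")
    lines.append(f"{output_label}:")
    return "\n".join(lines)


examples = [
    ("Great phone, love the camera", "positive"),
    ("Battery died after a week", "negative"),
]
prompt = build_structured_prompt(examples, "Screen is sharp and bright")
print(prompt)
```

The examples show the model both the task and the expected answer format, which is what distinguishes this mode from a freeform prompt.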
Multimodal features in Google AI Studio
One of the most distinctive features of Google AI Studio is the ability to use different file types in the work environment. These include images, videos, audio, and files from Google Drive. This makes it easier for developers to test whether their ideas work and how to fix any errors. For example, if we use the Chat Prompt mode described above and add a video that we want to summarize, the model will access the inserted video and perform the task. Let's look at an example.
In this case, we used a five-minute video introducing different dinosaur fossils. When asked, the model can work with the video and generate a summary of its content. At the time of writing, use cases of this type were not possible on other AI testing platforms (e.g. OpenAI's Playground).
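Outside the Studio interface, a similar video summary request might be expressed as an API payload. The sketch below only builds the JSON body. The `file_data`, `mime_type` and `file_uri` field names reflect my understanding of the Gemini REST API for previously uploaded files and should be verified against the current documentation; `build_video_summary_body` is a hypothetical helper and the file URI is a placeholder, not a real file.

```python
import json


def build_video_summary_body(file_uri, mime_type="video/mp4",
                             instruction="Summarize this video."):
    """Build a request body that pairs an uploaded video with a text
    instruction. Field names are assumptions; see the lead-in."""
    return {
        "contents": [{
            "parts": [
                {"file_data": {"mime_type": mime_type, "file_uri": file_uri}},
                {"text": instruction},
            ]
        }]
    }


# Placeholder URI standing in for a file uploaded beforehand.
body = build_video_summary_body("https://example.com/files/EXAMPLE_ID")
print(json.dumps(body, indent=2))
```

The video and the instruction travel as separate parts of the same message, which mirrors attaching a file and typing a question in the chat window.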