Hey builders! Stop typing and start interacting! We are moving beyond the text box. The future of AI is all about immersive, real-time experiences. To celebrate multimodal AI, we’re inviting you to join the Gemini Live Agent Challenge and build the next generation of agents that can help you see, hear, speak, and create.
Build Multimodal AI agents in the Gemini Live Agent Challenge
Why join?
- Hands-on learning with Gemini Live API: This is your shot to build the future of immersive AI agents on Google Cloud. We have everything you need to get started: quickstarts, tutorials, access to the Agent Development Kit (ADK), and webinars hosted by our experts.
- Showcase your skills: You’ll have the opportunity to break out of the traditional “text box” paradigm. Choose from three exciting categories (The Live Agent, The Creative Storyteller, or The UI Navigator) to demonstrate the power of your solution.
Think you have what it takes to win?
Build a solution that showcases your multimodal agent and you could win a share of $80,000 in prizes:
- Overall grand prize: A trip to Google Cloud Next ’26 in Las Vegas (includes tickets, a travel stipend, and a chance to present on stage), $25,000 in USD, $3,000 in Google Cloud Credits for use with a Cloud Billing Account, virtual coffee with a Google Cloud team member, and the potential opportunity to be featured on our social channels.
- Category winners: A trip to Google Cloud Next ’26 in Las Vegas (includes tickets), $10,000 in USD, $1,000 in Google Cloud Credits for use with a Cloud Billing Account, virtual coffee with a Google Cloud team member, and the potential opportunity to be featured on our social channels.
- Subcategory winners: $5,000 in USD and $500 in Google Cloud Credits for use with a Cloud Billing Account.
- Honorable mentions: $2,000 in USD and $500 in Google Cloud Credits for use with a Cloud Billing Account.
Dig into Multimodal AI
Your mission is to build and deploy an AI agent on Google Cloud that utilizes multimodal inputs and outputs. We want you to go beyond the traditional text-in/text-out approach.
Whether you are building a real-time translator or a visual web navigator, your agent should interpret the world around it. Here is some inspiration:
- The Live Agent: Build an agent we can talk to naturally that handles interruptions gracefully. Think real-time translators, vision-enabled tutors, and more.
- The Creative Storyteller: Blend text, images, audio, and video into one seamless experience. Imagine building an interactive storybook or a full marketing asset generator in a single workflow.
- The UI Navigator: Create a helping hand that interprets visual screens. Maybe you want to create a universal web navigator or a visual QA tester that performs actions based on user intent.
Crucial note: Your project must use a Gemini model (like Gemini 3 or Nano Banana) and either the Gen AI SDK or the Agent Development Kit (ADK). You must also use at least one Google Cloud service, such as Firestore, Cloud SQL, Cloud Run, or Vertex AI.
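Wondering what "handles interruptions gracefully" might look like in practice? Here is a minimal sketch of the general barge-in pattern in plain Python `asyncio`. It does not call the Gemini Live API or any Google SDK: the agent's reply is simulated with a list of strings, and `speak`, `run_turn`, and `demo` are hypothetical names made up for this illustration. The idea is simply to race the reply against an "interrupt" signal and stop the reply as soon as the user speaks up.

```python
import asyncio


async def speak(chunks, spoken):
    """Stand-in for streaming an agent's reply, one chunk at a time."""
    for chunk in chunks:
        spoken.append(chunk)
        await asyncio.sleep(0.01)  # simulate playback time for one chunk


async def run_turn(chunks, interrupt: asyncio.Event):
    """Play a reply until it finishes or the user interrupts (barge-in)."""
    spoken = []
    reply = asyncio.create_task(speak(chunks, spoken))
    barge_in = asyncio.create_task(interrupt.wait())

    # Race the reply against the interrupt signal.
    done, _ = await asyncio.wait(
        {reply, barge_in}, return_when=asyncio.FIRST_COMPLETED
    )
    interrupted = barge_in in done and not reply.done()

    # Cancel whichever task is still running, then wait for cleanup.
    for task in (reply, barge_in):
        if not task.done():
            task.cancel()
    await asyncio.gather(reply, barge_in, return_exceptions=True)
    return spoken, interrupted


async def demo():
    interrupt = asyncio.Event()
    # Simulate the user speaking up shortly after the reply starts.
    asyncio.get_running_loop().call_later(0.025, interrupt.set)
    return await run_turn(
        ["Hello", "there,", "how", "can", "I", "help?"], interrupt
    )


if __name__ == "__main__":
    spoken, interrupted = asyncio.run(demo())
    print("spoken so far:", spoken, "| interrupted:", interrupted)
```

In a real agent, the interrupt would come from detected user speech and the chunks would be streamed model output, but the cancel-and-clean-up shape stays much the same.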
Ready to start building?
Head over to our hackathon website to register, watch the kickoff video, and review the official rules. Submissions are open until March 16, 2026.



