Bringing data into BigQuery centralizes your information, but the real challenge is making that data accessible. Often, technical barriers separate the people with questions — from execs to analysts — from the answers they need.
With the Conversational Analytics API, powered by Gemini, you no longer need intricate systems to get insights. The API is engineered to help you build context-aware agents that can understand natural language, query your BigQuery data, and deliver answers in text, tables, and visual charts.
You can build any solution that interfaces with the API. For example, you can integrate it with the Agent Development Kit (ADK) to build multi-agent systems, or implement data strategies like these:
- Self-service triage for operations: Give teams like Support and Sales an agent that answers data questions instantly. Instead of filing a ticket to ask, “Why did signups drop last week?”, they get the answer immediately.
- Differentiate your SaaS product: Embed a powerful chat interface directly into your platform and let your customers query and visualize their own usage data in plain English.
- Dynamic reporting: Move beyond static PDFs. Automate the core reporting function and enable stakeholders to ask nuanced, follow-up questions for deeper investigation, effectively replacing report versions with real-time conversation.
In this post, we’ll share ways to build a conversational agent in BigQuery using the Conversational Analytics API.
Step one: Configure and create the agent
The deployment of a Data Analytics Agent involves configuring its access, context, and environment before making the final creation call.
Our examples use the Python SDK, but the Conversational Analytics API supports other languages as well, depending on your preference and environment.
Initialize the client and define BigQuery sources
Begin by instantiating the necessary client (DataAgentServiceClient) to interact with the API. This client is used in conjunction with explicit BigQueryTableReference objects, which authorize the agent’s access to specific tables (defined by project_id, dataset_id, and table_id). These individual references are then aggregated into a DatasourceReferences object under the bq field.
```python
from google.cloud import geminidataanalytics

# Set project-specific variables (client, location, project IDs)
data_agent_client = geminidataanalytics.DataAgentServiceClient()
location = "global"
billing_project = "your-gcp-project-id"
data_agent_id = "google_trends_analytics_agent"

# Define the BigQuery table sources
bq_top = geminidataanalytics.BigQueryTableReference(
    project_id="bigquery-public-data", dataset_id="google_trends", table_id="top_terms"
)
bq_rising = geminidataanalytics.BigQueryTableReference(
    project_id="bigquery-public-data", dataset_id="google_trends", table_id="top_rising_terms"
)
datasource_references = geminidataanalytics.DatasourceReferences(
    bq=geminidataanalytics.BigQueryTableReferences(table_references=[bq_top, bq_rising])
)
```
Set the agent context
Construct the context object by bundling the system_instruction (defining the agent’s behavior/role) and the datasource_references (defining its permitted data access). This complete Context is then nested within the DataAnalyticsAgent structure of the final DataAgent object.
While you can provide a string-based system instruction, we recommend the more robust Context object for instructing the agent. You can still pass additional system instructions within the object to provide supplemental guidance.
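The `system_instruction` and `example_queries` variables referenced in the next snippet are not defined in this post; here is one hypothetical shape they might take for the Google Trends tables. The table descriptions and queries below are illustrative assumptions, and depending on your SDK version `example_queries` may need to be structured objects rather than plain strings, so check the API reference:

```python
# Hypothetical agent guidance for the two Google Trends tables.
# Descriptions and example queries below are illustrative, not authoritative.
system_instruction = """
You are a Google Trends analyst.
- The `top_terms` table holds the top daily search terms per US
  designated market area (DMA), with rank, score, and week columns.
- The `top_rising_terms` table holds rising terms with a percent_gain
  column measuring week-over-week growth.
When asked about growth, join the two tables on term, week, and dma_name.
Always state which week a result refers to.
"""

example_queries = [
    "What are the top 10 search terms this week in Chicago?",
    "Which rising terms had the largest percent gain last week?",
]
```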
```python
# Set the context using our system_instruction string
published_context = geminidataanalytics.Context(
    system_instruction=system_instruction,
    datasource_references=datasource_references,
    example_queries=example_queries,
)

data_agent = geminidataanalytics.DataAgent(
    data_analytics_agent=geminidataanalytics.DataAnalyticsAgent(
        published_context=published_context
    ),
)
```
Create the agent
Call data_agent_client.create_data_agent. This request includes the parent resource path (projects/{billing_project}/locations/{location}), the unique data_agent_id, and the fully configured data_agent object to complete the deployment.
```python
# Create the agent
data_agent_client.create_data_agent(request=geminidataanalytics.CreateDataAgentRequest(
    parent=f"projects/{billing_project}/locations/{location}",
    data_agent_id=data_agent_id,
    data_agent=data_agent,
))
```
Your agent now exists and is defined by that published_context.
Step two: Creating a conversation (stateful vs. stateless)
The Conversational Analytics API can handle conversations in two ways:
- Stateless: You send a question and the agent's context with every request. Your application must manage the conversation history and resend it each time.
- Stateful: You create a "conversation" on the server, and the API manages the history for you. This is what allows users to ask follow-up questions.
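To make the stateless trade-off concrete, here is a minimal, library-free sketch of the history bookkeeping your application would own in stateless mode. The message shapes are simplified stand-ins, not the real API types:

```python
class StatelessHistory:
    """Client-side history for stateless chat: every request must
    replay the full transcript so the model can resolve follow-ups."""

    def __init__(self):
        self._messages = []  # oldest turn first

    def add_user(self, text: str):
        self._messages.append({"role": "user", "text": text})

    def add_agent(self, text: str):
        self._messages.append({"role": "agent", "text": text})

    def build_request_messages(self, new_question: str):
        # The payload sent with EVERY request: all prior turns plus
        # the new question. The server keeps nothing between calls.
        return self._messages + [{"role": "user", "text": new_question}]


history = StatelessHistory()
history.add_user("What are the top terms in NYC?")
history.add_agent("The top terms are ...")

# A follow-up question must carry the whole transcript with it:
payload = history.build_request_messages("And the week before?")
```

With a stateful conversation, the server performs this bookkeeping for you, which is why we use it below.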
We’ll configure a stateful conversation. We create a conversation object associated with our new agent.
```python
def setup_conversation(conversation_id: str):
    data_chat_client = geminidataanalytics.DataChatServiceClient()
    conversation = geminidataanalytics.Conversation(
        agents=[data_chat_client.data_agent_path(
            billing_project, location, data_agent_id)],
    )
    request = geminidataanalytics.CreateConversationRequest(
        parent=f"projects/{billing_project}/locations/{location}",
        conversation_id=conversation_id,
        conversation=conversation,
    )
    try:
        # Check if it already exists
        data_chat_client.get_conversation(name=data_chat_client.conversation_path(
            billing_project, location, conversation_id))
    except Exception:
        response = data_chat_client.create_conversation(request=request)
        print("Conversation created successfully.")

conversation_id = "my_first_conversation"
setup_conversation(conversation_id=conversation_id)
```
Step three: Create a streaming chat loop
To allow for interactive analysis, we implement a function, stream_chat_response, to manage the conversation flow. The Conversational Analytics API returns its response as a stream, which is crucial for delivering updates on the agent's progress in real time.
A typical response stream can include distinct components, such as:
- Schema: Confirmation of table resolution.
- Data (query): The generated SQL query (excellent for debugging and transparency).
- Data (result): The resulting data structure (e.g., a Pandas-like DataFrame).
- Chart: A Vega-Lite JSON specification for data visualization.
- Text: The final, synthesized natural-language summary.
Defining the function
The function is defined to accept the user’s question. Inside, we initialize the DataChatServiceClient and define a simple flag (chart_generated_flag) to track if a chart needs to be rendered after the stream completes. The user’s question is wrapped in a Message object, which is required for the API request.
```python
def stream_chat_response(question: str):
    data_chat_client = geminidataanalytics.DataChatServiceClient()
    chart_generated_flag = [False]  # Flag to help with visualization

    # Format the user's question into an API-ready Message object
    messages = [
        geminidataanalytics.Message(
            user_message=geminidataanalytics.UserMessage(text=question)
        )
    ]
```
Processing the stream
The ConversationReference is essential as it ties the current request to the stateful conversation and links it back to the specific data_agent we created earlier. Once the request object is fully assembled with the parent path, messages, and reference, we call data_chat_client.chat.
We then iterate over the returned stream. A utility function, show_message, is used here to parse and appropriately format the different response types (Text, Chart, Data) for the user. Finally, if the chart_generated_flag was set during the stream, a post-processing utility (preview_in_browser) handles the rendering of the visualization.
```python
    # Reference the stateful conversation and the created Data Agent
    conversation_reference = geminidataanalytics.ConversationReference(
        conversation=data_chat_client.conversation_path(
            billing_project, location, conversation_id
        ),
        data_agent_context=geminidataanalytics.DataAgentContext(
            data_agent=data_chat_client.data_agent_path(
                billing_project, location, data_agent_id
            ),
        ),
    )

    # Prepare the chat request
    request = geminidataanalytics.ChatRequest(
        parent=f"projects/{billing_project}/locations/{location}",
        messages=messages,
        conversation_reference=conversation_reference,
    )

    # Process the streaming response
    stream = data_chat_client.chat(request=request)
    for response in stream:
        # 'show_message' is a utility function that formats
        # and prints the different response types (text, data, chart)
        show_message(response, chart_generated_flag)

    # If a chart was generated, 'preview_in_browser'
    # is a utility to save and serve it as HTML
    if chart_generated_flag[0]:
        preview_in_browser()
```
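The show_message helper is left to the reader. One hypothetical implementation dispatches on which sub-field of the streamed system message is populated, following the response components listed earlier (schema, data, chart, text). Treat the field names as assumptions and verify them against the actual message schema in the API reference:

```python
def show_message(response, chart_generated_flag):
    """Pretty-print one streamed chat message.

    Dispatches on which sub-field of the system message is set.
    Field names are assumptions based on the documented response
    components; check the real proto schema before relying on them.
    """
    msg = getattr(response, "system_message", None)
    if msg is None:
        return

    if getattr(msg, "text", None):
        # Final natural-language answer (may arrive in several parts).
        print("".join(msg.text.parts))
    elif getattr(msg, "schema", None):
        print("[schema] resolved tables:", msg.schema)
    elif getattr(msg, "data", None):
        # May carry the generated SQL and/or the result rows.
        if getattr(msg.data, "generated_sql", None):
            print("[sql]", msg.data.generated_sql)
        if getattr(msg.data, "result", None):
            print("[rows]", msg.data.result)
    elif getattr(msg, "chart", None):
        # Remember to render the Vega-Lite spec after the stream ends.
        chart_generated_flag[0] = True
        print("[chart] Vega-Lite spec received")
```

Because it only reads attributes, the same function works unchanged whether the stream yields protos or simple test doubles.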
Step four: Talk to the agent
Asking questions
Now for the payoff. We can use our stream_chat_response function to have a conversation.
Checking the context
Let’s start by seeing if the agent understands its own context.
```python
question = "Hey what data do you have access to?"
stream_chat_response(question=question)
```
The agent will respond with a summary of the top_terms and top_rising_terms tables, using the descriptions we provided in the system_instruction.
Natural language to SQL to Chart
Now for a complex query. Notice we ask for a chart in plain English.
```python
question = "What are the top 20 most popular search terms last week in NYC based on rank? Display each term and score as a column chart"
stream_chat_response(question=question)
```
The agent will stream its process:
- It will show the SQL query it generated against the top_terms table, filtering by dma_name = 'New York NY' and the most recent week.
- It will print the resulting data as a table.
- It will generate a Vega-Lite chart specification.
- The preview_in_browser utility will serve this as an index.html file, showing a column chart.
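Like show_message, the preview_in_browser utility is not part of the API. A minimal sketch, assuming the chart arrives as a Vega-Lite JSON spec, writes a self-contained index.html that loads vega-embed from a CDN and opens it locally. The file name, function signature, and CDN URLs are illustrative choices:

```python
import json
import pathlib
import webbrowser


def build_chart_html(vega_lite_spec: dict) -> str:
    """Wrap a Vega-Lite spec in a standalone HTML page using vega-embed."""
    spec_json = json.dumps(vega_lite_spec)
    return f"""<!DOCTYPE html>
<html>
  <head>
    <script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
    <script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>
    <script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
  </head>
  <body>
    <div id="vis"></div>
    <script>vegaEmbed("#vis", {spec_json});</script>
  </body>
</html>"""


def preview_in_browser(vega_lite_spec: dict, path: str = "index.html"):
    """Save the chart page next to the script and open it locally."""
    out = pathlib.Path(path)
    out.write_text(build_chart_html(vega_lite_spec))
    webbrowser.open(out.resolve().as_uri())
```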
The stateful follow-up
This is where the stateful conversation (Step 2) pays off.
Python
```python
question = "What was the percent gain in growth for these search terms from the week before?"
stream_chat_response(question=question)
```
The agent remembers that "these search terms" refers to the results from the previous question. It will generate a new query, this time joining the top_terms and top_rising_terms tables (as guided by our join_instructions) to find the percent_gain for that same list of terms.
Step five: Managing the agent
For more in-depth lifecycle management of agents and messages, visit the Conversational Analytics API documentation page, which covers the many API requests you can make (HTTP / Python). You will find information on how to manage agents, how to invite new users to collaborate via the IAM APIs (SetIAM, GetIAM), and more.
Pro tip: Bridge the gap between data and people
By providing clear system instructions and schema descriptions, you can build an agent that is more than just conversational: it becomes a domain expert. This interactive approach moves beyond static dashboards to provide truly accessible data analysis.