Cloud Computing

Migrating to Google Cloud’s Application Load Balancer: A practical guide

Migrating your existing application load balancer infrastructure from an on-premises hardware solution to Cloud Load Balancing offers substantial advantages in scalability, cost-efficiency, and tight integration within the Google Cloud ecosystem. Yet, a fundamental question often arises: “What about our current load balancer configurations?” Existing on-premises load balancer configurations often contain years of business-critical logic for

Near-100% Accurate Data for your Agent with Comprehensive Context Engineering

Agentic workflows are already used for initiating action. To be successful, agents typically need to combine multiple steps and execute business logic reflective of real-life decisions. But as developers rush to deploy these autonomous agents, they are slamming into a wall: errors that compound across steps and erode accuracy. To understand why agentic workflows require near-100% accuracy

Create Expert Content: Local Testing of a Multi-Agent System with Memory

In support of our mission to accelerate the developer journey on Google Cloud, we built Dev Signal: a multi-agent system designed to transform raw community signals into reliable technical guidance by automating the path from discovery to expert creation. In part 1 and part 2 of this series, we established the essential groundwork by standardizing the

A developer’s guide to architecting reliable GPU infrastructure at scale

Editor’s note: This blog post outlines Google Cloud’s GPU AI/ML infrastructure reliability strategy, and will be updated with links to new community articles as they appear. As we enter the era of multi-trillion parameter models, computational power has transitioned from a utility to a mission-critical strategic asset. To meet relentless training demand, organizations are no

Guardrails at the gateway: Securing AI inference on GKE with Model Armor

Enterprises are rapidly moving AI workloads from experimentation to production on Google Kubernetes Engine (GKE), using its scalability to serve powerful inference endpoints. However, as these models handle increasingly sensitive data, they introduce unique AI-driven attack vectors — from prompt injection to sensitive data leakage — that traditional firewalls aren’t designed to catch. Prompt injection

How Estée Lauder Companies uses Cloud Run worker pools for its pull-based agentic workloads

Cloud Run has long provided developers with a straightforward, opinionated platform for running code. You can easily deploy request-driven web applications using Cloud Run services, or execute run-to-completion batch processing with Cloud Run jobs. However, as developers build more complex applications, like pipelines that process continuous streams of data or distributed AI workloads, they need

Build a multi-tenant configuration system with tagged storage patterns

In modern microservices architectures, configuration management remains one of the most challenging operational concerns. Two gaps emerge as organizations scale: handling tenant metadata that changes faster than cache TTL allows, and scaling the metadata service itself without creating a performance bottleneck. Traditional caching strategies force an uncomfortable trade-off: either accept stale tenant context (risking incorrect
