Salesforce Admins are always on the lookout for innovative ways to enhance workflows and deliver more value to your organizations. Today, we’re excited to introduce a groundbreaking feature in Agentforce that will revolutionize how you interact with data: multimodality. This new capability allows Salesforce’s Foundational Large Language Models (LLMs) to process various types of input, including images and PDFs, alongside traditional text—right within your Salesforce org. With multimodal artificial intelligence (AI), admins now have more flexibility than ever to build smarter, more contextual AI solutions for their teams and users.
Let’s explore what this means for you and how it can transform your daily tasks.
Understanding multimodality and its value
Multimodality refers to the ability of LLMs to analyze and interpret different types of data inputs such as text, images, PDFs, voice, and more—at the same time. This capability opens up a world of possibilities for everyday consumers and businesses alike. By integrating multiple forms of data, LLMs can provide richer, more contextually aware insights and actions. For example, an LLM can read a text description, analyze an accompanying image, and synthesize the information to offer a comprehensive response.
Examples of LLMs that support multimodality include OpenAI’s GPT-4, which can process both text and images, and Google’s Gemini, which can handle text, images, and even video inputs. These advanced models are designed to understand and generate human-like responses across different types of data, making them incredibly versatile and powerful tools for various applications. Now, Salesforce is opening this technology to admins to bring new and powerful solutions to their users.
Introducing image and PDF modality in Prompt Builder
The new image and PDF input feature in Prompt Builder is now live and marks the first milestone on our journey to bring multimodal capabilities to Agentforce. This feature enables LLMs to analyze files attached to Salesforce records, perform grounded reasoning with Retrieval-Augmented Generation (RAG), and provide rich insights—all natively within your Salesforce environment.
You can now:
- Use the Files related list of an object to include visual content in prompts.
- Build flexible prompt templates that accept images and PDFs as input.
- Automate workflows based on visual content from users, clients, or field teams.
And this is just the beginning. Voice input support is on the horizon, further expanding how users can interact with your Agentforce-powered solutions.
Real-world applications: Where multimodality adds value
Admins are constantly having to ask, “How specifically does this relate to my users’ needs?” Below are a few examples of industry use cases where image modality in Agentforce can provide value and benefit to organizations.
In the banking and finance sector, a banking advisor can analyze images of damaged property for insurance claims, quickly assessing and documenting the extent of damage, which speeds up the claims process and improves customer satisfaction. This capability allows advisors to provide more accurate and timely updates to their clients, enhancing the overall customer experience.
In field service repair, a field technician can upload and analyze images of equipment malfunctions, receiving immediate diagnosis and repair instructions, which reduces downtime and improves service efficiency. This ensures that technicians can address issues on the first visit, reducing the need for follow-up appointments and increasing customer satisfaction.
In manufacturing, a quality control manager often receives detailed inspection reports from suppliers in PDF format. With the new PDF modality in Agentforce, the manager can upload these reports directly into Salesforce, where the LLM analyzes the document; extracts key metrics such as defect rates, tolerance levels, and rejected units; and generates a concise summary. This eliminates the need for manual data entry, speeds up quality reviews, and ensures that critical insights are immediately available to drive timely decisions.
Common use cases: Incident reports and hand-written meeting notes
Let’s look at an example. Kiran Singh, a customer at Cumulus Bank, reports an auto-related incident and sends an image of the collision. As her banking advisor, you can use Agentforce to retrieve and analyze the image, providing a detailed description of the scene. This not only streamlines the process but also ensures that all relevant information is captured and stored in Salesforce for future reference.
Imagine another scenario where a sales rep attends a client meeting and takes hand-written notes. With the new image modality feature in Agentforce, the rep can simply take a photo of the notes and upload it to Salesforce. Agentforce will analyze the image, log the meeting details directly within Salesforce, create follow-up tasks based on the notes, and also draft a follow-up email, right off the Salesforce mobile app. This not only saves time but also ensures that all critical information is accurately captured and actionable items are not missed.
Broaden your data sources with multimodality
The introduction of multimodality in Agentforce gives admins the ability to easily add media files to their AI solutions, allowing the same files your users rely on to become part of your Agentforce solution. It marks a significant milestone in our journey to enhance the Agentforce platform. By leveraging image inputs, you can streamline workflows, improve data accuracy, and deliver better outcomes for your users. Stay tuned for more updates as we continue to expand multimodality support to include voice and more.
Ready to experience the power of multimodality? Explore the new image modality feature in Prompt Builder today by signing up for an Agentforce Developer Edition org and see how it can transform your workflows. Join the conversation in our community forums, and share your experiences and insights with fellow Salesforce Admins.
Let’s embrace this new era of digital labor and drive business success together!
Resources
- Salesforce Help: Add File Inputs to a Flex Prompt Template
- Salesforce Help: Grounding with File Inputs
- Salesforce Admins site: Agentforce Home
- Trailhead: Become an Agentblazer Champion
- Trailhead: Quick Start: Create a Prompt Builder Flex Template
- Trailhead: Quick Start: Agent Actions
- Salesforce Admins Blog: Prompt Like a Pro: AI Prompt Writing for Admins
The post Turn Images & PDFs Into AI-Powered Insights With Agentforce appeared first on Salesforce Admins.