The ability to efficiently access data wherever it resides is crucial when building visual data models, performing analytical operations, or building machine learning models. The Data Cloud Python Connector abstracts Data Cloud’s Query APIs to help developers quickly authenticate and access data within Data Cloud.
In this blog post, we’ll delve into the key features of the Python Connector for Data Cloud v1.0.15, and provide practical examples and code snippets to help you get started.
Prerequisites
Python 3.7 or greater
The pip package management tool
OpenSSL to generate private keys and Certificate Signing Requests (CSR)
Visual Studio Code (VS Code)
VS Code Python extension (For additional details on installing extensions, see Extension Marketplace)
Set up your Salesforce environment
For your Python code to authenticate with Data Cloud, you’ll need a connected app and a valid user in Salesforce. For our example, we’ll be using an OAuth 2.0 JWT Bearer flow. This is best suited to server-to-server communications since it doesn’t require someone to log in interactively.
Step 1: Create a certificate and private key
For our Python application to authenticate to Salesforce, we need to create a certificate. Certificates provide a secure way to authenticate applications to Salesforce. The private key ensures that only authorized applications can generate valid JWTs.
For detailed instructions, check out the Salesforce DX Developer Guide. Following the steps in the developer guide yields a server.crt certificate and a server.key private key that we’ll use later in this post, so keep them on hand.
Step 2: Create a connected app in Salesforce
The connected app provides a framework that enables an external application (in this case, our Python application) to integrate with Salesforce and Data Cloud using APIs and standard protocols, such as OAuth and OpenID Connect.
Log in to your Salesforce org and navigate to Setup → App Manager. Click New Connected App.
Select Create an External Client App, then Continue.
Under the Basic Information section, enter the following:
External Client App Name: Data Cloud Python App
API Name: Data_Cloud_Python_App
Contact Email: <your email address>
Distribution State: Local
Description: Connected application for Python
Under the API section, check the Enable OAuth checkbox.
Enter the value https://localhost.com for the Callback URL.
Select the following OAuth Scopes:
Manage user data via APIs (api)
Perform requests at any time (refresh_token, offline_access)
Manage Data Cloud profile data (cdp_profile_api)
Perform ANSI SQL queries on Data Cloud data (cdp_query_api)
Under the Flow Enablement section, select the Enable JWT Bearer Flow checkbox.
Use the Upload Files button to upload the server.crt self-signed certificate we created earlier.
Under the Security section, de-select all options.
Click Create.
On the Policies sub-tab, click Edit.
Expand the OAuth Policies section.
Under the Plugin Policies section, set Permitted Users to Admin approved users are pre-authorized.
Under Select Profiles, select System Administrator. Here you can add any profiles or permission sets for the user you’ll be using in your Python app.
Under the App Authorization section, set the Refresh Token Policy to Refresh token is valid until revoked.
For IP Relaxation, select Relax IP restrictions.
Click Save.
Step 3: Retrieve the Consumer Key and Secret
Now that the connected app is created, we can retrieve the consumer key.
On the Settings sub-tab, under OAuth Settings → App Settings, click Consumer Key and Secret.
On the page that’s displayed, click Copy for the Consumer Key and save the value for later.
Set up your Python Environment
Step 1: Install a Python interpreter
Along with the Python extension, you need to install a Python interpreter. Which interpreter you use depends on your specific needs, but some guidance is provided in the Visual Studio Code documentation.
Step 2: Start VS Code in a workspace folder
Using your operating system’s UI, create a folder called data-cloud-demo to store your project, then open VS Code and use File > Open Folder to open the project folder.
Step 3: Create a virtual environment
A best practice among Python developers is to use a project-specific virtual environment. Once you activate that environment, any packages you then install are isolated from other environments.
Open the Command Palette (⇧⌘P), start typing the Python: Create Environment command to search, and then select the command.
The command presents a list of environment types, Venv or Conda. For this example, select Venv, then select your interpreter.
Create your Python source code
Step 1: Add the Salesforce private key to your project folder
From the File Explorer toolbar, select the New File button on the data-cloud-demo folder.
Name the file salesforce.key and copy and paste the private key from the server.key file created earlier. Your private key can be used to access your Salesforce environment, so you must never share it. Add it to .gitignore (or the equivalent) immediately, and for production use, store sensitive data in a secret manager in line with your company’s security policies.
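For example, here’s a minimal sketch of reading the key from an environment variable instead of from a file checked into the project. The SALESFORCE_PRIVATE_KEY variable name is just a hypothetical placeholder for whatever your secret manager injects:

import os

# Hypothetical example: read the private key from an environment variable
# (for example, one populated by your secret manager) instead of a local file
private_key = os.environ.get("SALESFORCE_PRIVATE_KEY")
if not private_key:
    raise RuntimeError("SALESFORCE_PRIVATE_KEY is not set")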
Step 2: Install the Salesforce Data Cloud Connector and PyYAML
Install the CDP Python Connector from the PyPI (Python Package Index) repository using the following command.
pip install salesforce-cdp-connector
Upon successful installation, you’ll see a message similar to: Successfully installed salesforce-cdp-connector-<version>.
Then install a YAML parser that we can use to read configuration files.
pip install pyyaml
Upon successful installation, you’ll see a message similar to: Successfully installed pyyaml-<version>.
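If you’d like to confirm that both packages are available to your interpreter before moving on, an optional sanity check might look like this:

# Optional sanity check that both packages import cleanly
import yaml
from salesforcecdpconnector.connection import SalesforceCDPConnection

print("PyYAML version:", yaml.__version__)
print("Connector class available:", SalesforceCDPConnection.__name__)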
Step 3: Create a config file
As a best practice, we’ll store the required parameters in a config file to avoid hard-coding them later.
From the File Explorer toolbar, select the New File button on the data-cloud-demo folder.
Name the file config.yaml and add your Salesforce details.
salesforce:
  login_url: <the Salesforce login URL, e.g., https://login.salesforce.com>
  username: <your Salesforce username>
  connected_app:
    client_id: <the client id (consumer key) of the connected app you created earlier>
Step 4: Create a Python file
From the File Explorer toolbar, select the New File button on the data-cloud-demo folder.
Name the file data-cloud.py, and VS Code will automatically open it in the editor.
Step 5: Create a Connection object
The Connection object handles authentication to Data Cloud. It supports the username-password flow, the OAuth 2.0 Web Server flow, and the OAuth 2.0 JWT Bearer flow. In this post, we’re using the JWT Bearer flow with the connected app that we created earlier. The Connection object takes the following parameters:
login_url: Your Salesforce org URL
client_id: The consumer key copied from your connected app
username: The username of the user to authenticate as
private_key: The private key used when creating the connected app
The Connection object will automatically create a JWT token and use the private key to encode the payload. It will also automatically exchange the Salesforce access token it receives for a Data Cloud token that can be used to invoke its APIs. For details on the prerequisites required to access Data Cloud resources, check out the Data Cloud Reference Guide.
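To make that concrete, here’s a purely illustrative sketch of the kind of JWT assertion the connector builds on your behalf, using the PyJWT library (an assumption for illustration only; you don’t need to add this to your project, since the connector handles it for you):

import time
import jwt  # PyJWT, shown here only to illustrate what the connector does internally
            # (RS256 signing also requires the cryptography package)

with open("./salesforce.key") as keyfile:
    private_key = keyfile.read()

# The standard claims in a Salesforce JWT Bearer assertion
claims = {
    "iss": "<connected app consumer key>",  # who is asking (the connected app)
    "sub": "<salesforce username>",         # the user to authenticate as
    "aud": "https://login.salesforce.com",  # the login URL
    "exp": int(time.time()) + 300,          # a short-lived expiry
}

# Sign the assertion with the private key that pairs with the uploaded certificate
assertion = jwt.encode(claims, private_key, algorithm="RS256")

# The connector posts an assertion like this to the Salesforce OAuth token endpoint,
# then exchanges the resulting Salesforce access token for a Data Cloud token.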
In the data-cloud.py file add the following code:
from salesforcecdpconnector.connection import SalesforceCDPConnection
import yaml

# Read the configuration file with the client_id, login_url, and username
with open("config.yaml", "r") as ymlfile:
    config = yaml.load(ymlfile, Loader=yaml.Loader)

# Read the private key (added in Step 1) used to sign the JWT assertion
with open("./salesforce.key") as keyfile:
    private_key = keyfile.read()

# Pass in the connected app client id, username, and private key
connection = SalesforceCDPConnection(
    login_url=config["salesforce"]["login_url"],
    client_id=config["salesforce"]["connected_app"]["client_id"],
    username=config["salesforce"]["username"],
    private_key=private_key,
)

# Output the Connection object details
print(connection)
Step 6: Retrieve data
The Python Connector for Data Cloud has three ways to fetch data: fetchone(), fetchall(), and get_pandas_dataframe(). You can substitute the queries in the examples with data lake objects, data model objects, or calculated insight objects from your environment.
Let’s take a look at each of these.
fetchone()
Create a cursor object to execute queries. When a query is executed, the cursor passes the query to Data Cloud, which fetches the results. The fetchone() method retrieves the first row of the query result.
# Create a cursor object
cursor = connection.cursor()

# Execute the query, substituting a valid DMO, DLO, or CIO from your org
cursor.execute('SELECT * FROM Animal__dlm')

# Fetch one row
row = cursor.fetchone()

# Check that a row is present and output the result
if row is None:
    print("No records found.")
else:
    print(row)

# Close the cursor and connection
cursor.close()
connection.close()
fetchall()
The fetchall() method retrieves all rows of the query result. As before, create a cursor object to execute the query; the cursor passes the query to Data Cloud, which fetches the results.
# Create a cursor object
cursor = connection.cursor()

# Execute the query, substituting a valid DMO, DLO, or CIO from your org
cursor.execute('SELECT * FROM Reservation__dlm')
rows = cursor.fetchall()

# Check if rows are present and output the result
if rows:
    print(rows)
else:
    print("No records found.")

# Close the cursor and connection
cursor.close()
connection.close()
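Because fetchall() returns a list of rows (as the truthiness check above suggests), you can work with the results directly. A small continuation, assuming the rows variable from the example above:

# rows behaves like a regular list, so you can iterate over it or check its length
for row in rows:
    print(row)
print(f"{len(rows)} records returned")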
get_pandas_dataframe()
Pandas is a powerful Python library designed specifically for data manipulation and analysis. It provides high-performance, flexible data structures and a wide range of tools for data cleaning, transformation, and analysis. It’s widely adopted by data scientists since it integrates well with other libraries like NumPy and Matplotlib, making it easier to perform statistical analysis, data visualization, and machine learning tasks.
A DataFrame is a fundamental data structure in Pandas, and it is essentially a two-dimensional structure with columns that can hold different data types.
The get_pandas_dataframe() method allows developers to retrieve results from Data Cloud into this structure directly.
# Get the data as a Pandas DataFrame
df = connection.get_pandas_dataframe('SELECT Type_c__c, Age_c__c, Breed_c__c FROM Animal__dlm')

# Output the first 5 rows of the DataFrame if records are found
if df.shape[0] == 0:
    print("No records found.")
else:
    print(df.head())

# Close the connection
connection.close()
Let’s update the code to execute a SQL query against a data model object called Animal__dlm and put the results directly into a Pandas DataFrame.
In the data-cloud.py file add the following code:
# Return key Data Cloud fields from a data model object into a Pandas DataFrame
df = connection.get_pandas_dataframe('SELECT Type_c__c, Age_c__c, Breed_c__c FROM Animal__dlm')

# Output the first 5 rows (default behavior)
if df.shape[0] == 0:
    print("No records found.")
else:
    print(df.head())

# Close the connection
connection.close()
Here we’re using the Pandas DataFrame head() method, which returns a specified number of rows from the top of the DataFrame. If no number is specified, head() returns the first five rows. Note: The column names are returned in addition to the specified rows.
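For example, assuming df is the DataFrame returned by get_pandas_dataframe() above (and that the Breed_c__c column is populated in your data):

# Default: the first five rows
print(df.head())

# Pass a number to return that many rows instead
print(df.head(10))

# Because the results are a regular Pandas DataFrame, you can move straight into
# analysis, e.g., counting animals per breed
print(df["Breed_c__c"].value_counts())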
Run your Python code
Step 1: Run Python file
To run your Python project, click the play button in the top-right of your VS Code editor. The button opens a terminal window and runs data-cloud.py.
Alternatively, you can run your code using the following command:
python3 data-cloud.py (macOS/Linux) or python data-cloud.py (Windows)
Step 2: Verify output
After you run your Python file, you can see the output from the query.
With only a few lines of code, we have successfully connected to Data Cloud and queried key records for use in our application.
Conclusion
The Python Connector for Data Cloud is a powerful tool that simplifies interacting with Data Cloud APIs from Python applications. It streamlines authentication with Data Cloud and provides straightforward methods for retrieving data.
With the ability to easily fetch key data from your data model objects, data lake objects, and calculated insights, you can create visual data models, perform analytical operations, and build machine learning models.
Resources
Documentation: CDP Python Connector
Documentation: Create a Private Key and Self-Signed Digital Certificate
Documentation: Data Cloud Reference Guide: Getting Started
Video: Bring Your Own Model to Data Cloud with Google Vertex AI
codeLive: How to Bring Your Own Model with Model Builder
About the author
Dave Norris is a Developer Advocate at Salesforce. He’s passionate about making technical subjects broadly accessible to a diverse audience. Dave has been with Salesforce for over a decade, has over 35 Salesforce and MuleSoft certifications, and became a Salesforce Certified Technical Architect in 2013.