Visualizing Correlation: How to Add Correlation Coefficient to Scatter Plots in Python

Learn how to add a correlation coefficient to a scatter plot in Python. This article uses matplotlib to draw the plot and scipy to compute the coefficient, with step-by-step code examples.

Updated October 18, 2023

In data analysis and visualization, it is often useful to show the strength of the relationship between two variables. One way to do this is to add a correlation coefficient to a scatter plot. In Python, you can use the matplotlib library to create the scatter plot and add the label, and the scipy library to compute the coefficient. In this article, we will demonstrate how to do this using code examples.

Code Demonstrations

Importing Libraries

First, let’s import the necessary libraries:

import matplotlib.pyplot as plt
from scipy.stats import pearsonr

The matplotlib library is used for creating the scatter plot and other visualizations, while the scipy.stats module is used to calculate the correlation coefficient.

Creating a Scatter Plot

Next, let’s create a scatter plot using the matplotlib library:

# Create a scatter plot
plt.scatter(x, y)

Here, x and y are the two variables whose relationship you want to visualize. They must be defined before this call as sequences of equal length, such as lists or NumPy arrays. The resulting scatter plot will look something like this:

[Figure: scatter plot]
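Since the snippet above assumes x and y already exist, here is a minimal self-contained sketch. The data is made up for illustration (a roughly linear relationship with random noise), and the axis labels are placeholders; substitute your own arrays.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data: y depends roughly linearly on x, plus random noise.
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + rng.normal(0, 2.0, size=50)

# Draw the scatter plot on an explicit Axes object.
fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y")
```

Call plt.show() afterwards to display the figure in an interactive session.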

Calculating the Correlation Coefficient

To calculate the correlation coefficient, we can use the pearsonr function from the scipy.stats module:

# Calculate the correlation coefficient
corr_coef = pearsonr(x, y)

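The pearsonr function takes two arrays of equal length as input and returns the Pearson correlation coefficient together with the p-value of the correlation. The result can be indexed or unpacked like a tuple, which is why corr_coef[0] gives the coefficient. The coefficient ranges from -1 to 1: values close to 1 indicate a strong positive linear relationship, values close to -1 indicate a strong negative linear relationship, and values near 0 indicate little or no linear relationship.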
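To make the range of the coefficient concrete, the short sketch below runs pearsonr on hypothetical data where y is an exact linear function of x, then on the same values in reverse order. The variable names r, p_value, and r_neg are chosen here for illustration.

```python
from scipy.stats import pearsonr

# Hypothetical data: y is an exact linear function of x (y = 2x + 1).
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

# The result unpacks into the coefficient and the p-value.
r, p_value = pearsonr(x, y)
print(f"r = {r:.2f}")  # r = 1.00

# Reversing y turns the relationship into a perfectly negative one.
r_neg, _ = pearsonr(x, y[::-1])
print(f"r = {r_neg:.2f}")  # r = -1.00
```

Real data rarely falls exactly on a line, so expect values strictly between -1 and 1 in practice.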

Adding the Correlation Coefficient Label

Finally, let’s add the correlation coefficient label to the scatter plot:

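# Add the correlation coefficient label (positioned in axes coordinates)
plt.text(0.5, 0.95, f"Correlation Coefficient: {corr_coef[0]:.2f}", ha="center", va="top", transform=plt.gca().transAxes)

Here, we use the text function from matplotlib.pyplot to add a text label to the scatter plot. By default, text positions are interpreted in data coordinates, so a label at a fixed position such as (0.5, 0.5) can land far from your points or outside the plot area. Passing transform=plt.gca().transAxes switches to axes coordinates, where (0, 0) is the bottom-left corner of the plot area and (1, 1) is the top-right. The ha="center" argument centers the label horizontally, va="top" anchors its top edge at the given height, and the :.2f format rounds the coefficient to two decimal places.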

The resulting scatter plot with the correlation coefficient label will look something like this:

[Figure: scatter plot with correlation coefficient label]
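Putting the steps together, the sketch below wraps them in a small helper. The function name scatter_with_corr, the sample data, and the label position in the top-left corner are assumptions made for illustration, not part of matplotlib or scipy.

```python
import matplotlib.pyplot as plt
from scipy.stats import pearsonr


def scatter_with_corr(x, y):
    """Draw a scatter plot of x vs. y and annotate it with Pearson's r.

    Hypothetical helper for illustration; returns the figure, axes, and r.
    """
    r, _ = pearsonr(x, y)
    fig, ax = plt.subplots()
    ax.scatter(x, y)
    # transform=ax.transAxes places the text in axes coordinates, where
    # (0, 0) is the bottom-left and (1, 1) the top-right of the plot area.
    ax.text(0.05, 0.95, f"r = {r:.2f}", ha="left", va="top",
            transform=ax.transAxes)
    return fig, ax, r


# Hypothetical sample data with a strong positive relationship.
x = [1, 2, 3, 4, 5, 6]
y = [2.2, 3.8, 6.1, 8.3, 9.7, 12.4]
fig, ax, r = scatter_with_corr(x, y)
```

Anchoring the label in axes coordinates keeps it in the same corner regardless of the range of the data. Call plt.show() to display the figure, or fig.savefig with a file name to write it to disk.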

Conclusion

In this article, we demonstrated how to add a correlation coefficient label to a scatter plot in Python using the matplotlib and scipy.stats libraries. This can be a useful tool for visualizing the strength of the relationship between two variables and quickly identifying patterns in your data.
