BriteWire uses data science to drive intelligent marketing decisions. When making a presentation to a new customer, I like to include a quick primer on Data Correlation to help them understand how powerful it can be in revealing trends, patterns, and characteristics of success for their business.
When two sets of data are strongly linked, we say they have a High Correlation. The classic textbook example of correlated data is the height and weight of human beings: the data shows that weight tends to increase with height, with shorter people generally weighing less than taller people.
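One common way to put a number on how strongly two data sets are linked is Pearson's correlation coefficient r. It runs from -1 (perfect negative correlation) through 0 (no linear relationship) to +1 (perfect positive correlation). The minimal Python sketch below assumes Pearson's r is the measure of interest. The function name pearson_r and the height/weight values are hypothetical, made up for illustration rather than taken from real measurements.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two sums of squared deviations
    # (the 1/n factors cancel out of the ratio, so they are omitted).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical heights (cm) and weights (kg), invented for illustration only.
heights = [152, 160, 165, 170, 175, 180, 188]
weights = [50, 56, 61, 66, 70, 77, 85]

print(round(pearson_r(heights, weights), 3))
```

Because the invented weights rise almost in step with the heights, r for this sample comes out close to +1, which is what a High Correlation looks like numerically.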
BriteWire uses correlation in many ways. The first is to validate data sources: we check whether a data set we are analyzing, but have not yet validated, correlates with known, trusted data values. This is best demonstrated using some simple data sets with known public data.
The following chart (figure 1) shows a data set measuring visitor interest in Yellowstone National Park alongside visitor interest in Glacier National Park.
Both of these parks publish Visitation Statistics, so we can use these known values to validate the data set we are working with. Yellowstone National Park receives just over 3 million visitors per year, while Glacier National Park averages slightly fewer, at just over 2 million. A closer look at the published data reveals seasonality, with the summer months seeing the highest visitation.
Looking at the chart, we see that the data set shows higher interest in Yellowstone National Park than in Glacier National Park, and that interest peaks during the summer months. The data set therefore correlates with the visitation data published by the National Parks.
Scatter Plots are often used to visualize correlated data because they make the strength of a correlation easy to see. The next chart (figure 2) displays web search activity for Yellowstone National Park and Glacier National Park as a scatter plot.
The scatter plot in figure 2 shows a Strong Positive Correlation between Web Search Activity for Yellowstone National Park and for Glacier National Park. Search interest in the two parks rises and falls together, which suggests that people researching Yellowstone National Park are also researching Glacier National Park. This makes sense: the parks are relatively close to each other in the northern Rocky Mountains (Glacier is in Montana, and Yellowstone lies mostly in Wyoming but extends into Montana and Idaho), and many people are interested in visiting both during the same vacation.
As a result, your content marketing strategy should group these two subjects together, and perhaps cross-link and cross-promote between them. If you are a tour operator, you might create a travel package that includes visits to both parks. These are very basic takeaways, but you get the idea.
A weak correlation is just as easy to visualize with a scatter plot. The last chart (figure 3) is the scatter plot of web search activity for Yellowstone National Park and Katy Perry.
Not surprisingly, the scatter plot indicates a weak correlation between these two data sets.
In addition to validating data sets, Data Correlation can be used in many different ways to drive intelligent decisions for marketers, including social marketing strategy, content marketing, and interpreting data sets derived from Buzz Monitoring.
We will look at Data Correlation in greater detail in future articles as we explore these topics and more.