You may find Part 1 and Part 2 interesting before heading into Part 3, below.
Interpreting Statistical Output and P-Values
To recap, you own Pearson’s Pizza and you’ve hired your nephew Lloyd to run the establishment. And since Lloyd is not much of a math whiz, you’ve decided to help him learn some statistics based on pizza prices.
When we left off, you and Lloyd realized that, despite a strong correlation and high R-Squared value, the residual plot suggests that predicting pizza prices from toppings will become less and less accurate as the number of toppings increases.
Looking back at the original scatterplot and software output, Lloyd protests, “But the p-value is significant. It’s less than 0.0001.”
Doesn’t a small p-value imply that our model is a go?
A crash course on hypothesis testing
In Pearson Pizza’s back room, you’ve got two pool tables and a couple of slot machines (which may or may not be legal). One day, a tall, serious man saunters in, slaps down 100 quid and challenges (in a British accent), “My name is Ronnie O’Sullivan. That’s right. THE. Ronnie O’Sullivan.”
You throw the name into the Google and find out the world’s best snooker player just challenged you to a game of pool.
Then something interesting happens. YOU win.
Suddenly, you wonder, is this guy really who he says he is?
Because the likelihood of you winning this pool challenge to THE Ronnie O’Sullivan is slim to none IF he is, indeed, THE Ronnie O’Sullivan (the world’s best snooker player).
Beating this guy is SIGNIFICANT in that it challenges the claim that he is who he claims he is:
You aren’t SUPPOSED to beat THE Ronnie O’Sullivan – you can’t even beat Lloyd.
But you did beat this guy, whoever he claims to be.
So, in the end, you decide this man was an impostor, NOT Ronnie O’Sullivan.
In this scenario:
The claim (or “null hypothesis”): “This man is Ronnie O’Sullivan.” You have no reason to question him; you’ve never even heard of snooker.
The alternative claim (or “alternative hypothesis”): “This man is NOT Ronnie O’Sullivan”
The p-value: The likelihood you beat the world’s best snooker player assuming he is, in fact, the real Ronnie O’Sullivan.
Therefore, the p-value is the likelihood that an observed outcome (or one at least as extreme) would occur if the claim were true. A small p-value can cast doubt on the legitimacy of the claim: the chances you could beat Ronnie O’Sullivan in a game of pool are slim to none, so your win is strong evidence against the claim that he is Ronnie O’Sullivan. Still puzzled? Here’s a clever video explanation.
Some mathy stuff to consider
The intention of this post is to tie the meaning of this p-value to your decision, in the simplest terms I can find. I am leaving out a great deal of theory behind the sampling distribution of the regression coefficients – but I would be happy to explain it offline. What you do need to understand, however, is your data set is just a sample, a subset, from an unknown population. The p-value output is based on your observed sample statistics in this one particular sample while the variation is tied to a distribution of all possible samples of the same size (a theoretical model). Another sample would indeed produce a different outcome, and therefore a different p-value.
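To see why another sample would give a different answer, here is a minimal simulation sketch. Everything about the "population" is made up for illustration: a $15 base price, $1.25 per topping, random noise with a standard deviation of $0.50, and 21 pizzas per sample. None of this is the actual Pearson’s Pizza data; it only shows that the fitted slope wobbles from sample to sample.

```python
import random

def fit_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def simulate_slopes(n_samples=200, seed=42):
    """Draw repeated samples of 21 pizzas from a hypothetical population
    and return the fitted slope from each sample."""
    rng = random.Random(seed)
    toppings = [i % 7 for i in range(21)]  # 0 to 6 toppings, three pizzas each
    slopes = []
    for _ in range(n_samples):
        # Assumed population: $15 base + $1.25 per topping + random noise
        prices = [15 + 1.25 * t + rng.gauss(0, 0.5) for t in toppings]
        slopes.append(fit_slope(toppings, prices))
    return slopes

slopes = simulate_slopes()
print(f"smallest slope: {min(slopes):.3f}")
print(f"largest slope:  {max(slopes):.3f}")
print(f"average slope:  {sum(slopes) / len(slopes):.3f}")
```

Each sample gives its own slope, and therefore its own t-value and p-value. The software only ever sees one of them.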
The hypothesis our software is testing
The output below gives a great deal of insight into the regression model used on the pizza data. The statistic du jour for the linear model is the p-value you always see in a Tableau regression output: The p-value testing the SLOPE of the line.
Therefore, a significant or insignificant p-value is tied to the SLOPE of your model.
Recall, the slope of the line tells you how much the pizza price changes for every topping ordered. Slope is a reflection of the relationship of the variables you’re studying. Before you continue reading, you explain to Lloyd that a slope of 0 means, “There is no relationship/association between the number of toppings and the price of the pizza.”
Zoom into that last little portion – and look at the numbers in red below:
| Panes | | Line | | Coefficients | | | | |
| Row | Column | p-value | DF | Term | Value | StdErr | t-value | p-value |
| Pizza Price | Toppings | < 0.0001 | 19 | Toppings | 1.25 | 0.0993399 | 12.5831 | < 0.0001 |
| | | | | intercept | 15 | 0.362738 | 41.3521 | < 0.0001 |
In this scenario:
The claim: “There is no association between number of toppings ordered and the price of the pizza.” Or, the slope is zero (0).
The alternative claim: “There is an association between the number of toppings ordered and the price of the pizza.” In this case, the slope is not zero.*
The p-value = Assuming there is no relationship between number of toppings and price of pizza, the likelihood of obtaining a slope of at least $1.25 per topping is less than 0.01%.
The p-value is very small**: if there were truly no relationship, a slope of at least $1.25 would happen less than 0.01% of the time just by chance. This small p-value means you have evidence of a relationship between number of toppings and the price of a pizza.
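If you want to check the software’s arithmetic, here is a rough sketch that rebuilds the t-value and the two-tailed p-value from the slope, standard error, and DF in the output. This is not how Tableau computes it internally (that is not documented here); the function names are my own, and the tail area is approximated with Simpson’s rule rather than a statistics library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: the area in both tails beyond |t|.
    Computed as 1 minus the central area, using Simpson's rule
    (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and +|t|
    return max(0.0, 1 - central_area)   # clamp tiny negative rounding error

slope, std_err, df = 1.25, 0.0993399, 19  # values from the output table
t_value = slope / std_err
p_value = two_tailed_p(t_value, df)
print(f"t-value: {t_value:.4f}")
print(f"p-value below 0.0001? {p_value < 0.0001}")
```

A slope more than 12 standard errors from zero leaves almost no area in the tails, which is why the output simply reports "< 0.0001".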
What the P-Value is NOT
- The p-value is not the probability the null hypothesis is false – it is not the likelihood of a relationship between number of toppings and the price of pizza.
- The p-value is not evidence we have a good linear model – remember, it’s only testing a relationship between the two variables (slope) based on one sample.
- A high p-value does not necessarily mean there is no relationship between pizza price and number of toppings – when dealing in samples, chance variability (differences) and bias are present, and both can lead to erroneous conclusions.
- A statistically significant p-value does not necessarily mean the slope of the population data is not 0 – see the last bullet point. By chance, your sample data may be “off” from the full population data.
The P-Value is Not a Green Light
The p-value here gives evidence of a relationship between the number of toppings ordered and the price of the pizza – which was already determined in part 1. (If you want to get technical, the correlation coefficient R is used in the formula to calculate slope.)
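For the technically inclined, the standard relationship between the correlation coefficient and the least-squares slope can be written as follows. The symbols are my own labels, not from the software output: $b_1$ is the slope, $r$ is the correlation coefficient, and $s_x$ and $s_y$ are the sample standard deviations of toppings and price.

```latex
b_1 = r \cdot \frac{s_y}{s_x}
```

Since the standard deviations are positive, the slope is zero exactly when the correlation is zero, which is why a test on the slope tells you nothing you didn’t already suspect from a strong correlation.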
Applying a regression line for prediction requires the examination of all parts of the model. The p-value given merely reflects a significant slope — recall there is additional error (residuals) to consider and outside variables acting on one or both of the variables.
Ultimately, Pearson’s Pizza CAN apply the linear model to predict pizza prices from number of toppings. But only within reason. You decide not to predict for pizza prices when more than 5 toppings are chosen because, based on the residual plot, the prediction error is too great and the variation may ultimately hurt long-term budget predictions.
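As a small sketch of what "within reason" could look like in practice, here is a hypothetical helper that uses the intercept and slope from the output table and refuses to predict beyond 5 toppings. The function name and the lower bound of 0 toppings are my own assumptions.

```python
INTERCEPT = 15.00  # base price, from the output table
SLOPE = 1.25       # price per topping, from the output table
MAX_TOPPINGS = 5   # cutoff chosen from the residual plot

def predict_price(toppings):
    """Predict pizza price, but only inside the range we trust."""
    if not 0 <= toppings <= MAX_TOPPINGS:
        raise ValueError(
            f"No prediction for {toppings} toppings: "
            f"the model is only used for 0 to {MAX_TOPPINGS}."
        )
    return INTERCEPT + SLOPE * toppings

print(predict_price(3))
```

Raising an error instead of quietly returning a number forces whoever is budgeting to notice that they have stepped outside the model’s comfort zone.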
In a real business use case, the p-value, R-Squared, and residual plots can only aid in logical decision-making. Lloyd now realizes, thanks to your expertise, that using data output just to say he’s “data-driven” without proper attention to detail and common sense is unwise.
Statistical methods can be powerful tools for uncovering significant conclusions; however, with great power comes great responsibility.
—Anna Foard is a Business Development Consultant at Velocity Group
*Note this is an automatic 2-tailed test. Technically it is testing for a slope of at least $1.25 OR at most -$1.25. For a 1-tailed test (looking only for greater than $1.25, for example), divide the p-value output by 2, provided the observed slope falls in the direction you are testing. For more information on the t-value, df, and standard error I’ve included additional notes and links at the bottom of this post.
**How small is small? It depends on the nature of what you are testing and your tolerance for “false negatives” or “false positives”. It is generally accepted practice in the social sciences to consider a p-value small if it is under 0.05, meaning an observation at least as extreme would occur less than 5% of the time by chance if the claim were true.
More information on t-value, df:
For those curious about the t-value, this statistic is also called the “test statistic”. (It is not the same as the “critical value”, which is the cutoff the test statistic is compared against.) This value is like a z-score, but relies on the Student’s t-distribution. In other words, the t-value is a standardized value indicating how many standard errors the slope of 1.25 falls from the hypothesized slope of 0, taking into account sample size and variation (standard error).
In t-tests for regression, degrees of freedom (df) is calculated by subtracting the number of parameters being estimated from the sample size. In this example, there are 21 – 2 degrees of freedom because we started with 21 independent points, and there are two parameters to estimate, slope and y-intercept.
Degrees of freedom (df) represents the amount of independent information available. For this example, n = 21 because we had 21 pieces of independent information. But since we used one piece of information to calculate slope and another to calculate the y-intercept, there are now n – 2 or 19 pieces of information left to calculate the variation in the model, and therefore the appropriate t-value.
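Plugging in the numbers from the output table, the two calculations above work out as follows. The symbols $b_1$ (the sample slope) and $SE_{b_1}$ (its standard error) are my own labels for the "Value" and "StdErr" columns.

```latex
t = \frac{b_1 - 0}{SE_{b_1}} = \frac{1.25}{0.0993399} \approx 12.58
\qquad
df = n - 2 = 21 - 2 = 19
```

Both match the t-value and DF columns in the output.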