The year was 2002. It was the first time I ever stood in front of a classroom of “grown-ups”. The students didn’t know me or care who I was — a TA at LSU covering a College Algebra class. The topic was logarithms. Specifically, an introduction to logs as the inverse of an exponent. I may have been slightly older than the median age of those students and I was terrified, nervous, and profusely sweating.

Up until that moment, I thought deep content knowledge was the secret sauce of teaching. But in the 17 years of experiences that followed, I’ve learned how much MORE there is to teaching than merely knowing your stuff. Student buy-in is the key to student engagement and, ultimately, student learning.

As a corporate trainer I’ve found adults are no different from kids in how they learn and how they engage. It all depends on the trainer’s ability to read the room and adapt as needed.

I compiled the list below after 17 years of total instruction including college algebra and statistics, high school math and AP Statistics, and corporate training for data analysts. I’m sure I will update this list in the future, but at this point, reflecting on my own feedback and observing other trainers, these are the top points I find trainers miss.

Please note this list is not exhaustive and assumes you follow the basic tenets of instruction such as: knowing your audience, knowing your content, preparing x 10, taking breaks every 60 – 75 minutes, beginning promptly after a break, being approachable, avoiding excessive talk and rabbit holes, Rule of 3, minimizing PowerPoint, summarizing major points, etc. So here you go – 5 ways to improve student engagement.

“It is not the strongest of the species that survives, nor the most intelligent. It is the one that is most adaptable to change.”

– Charles Darwin

I’ve entered a classroom to discover the WiFi down. Many times. There have been a few unannounced fire drills, some medical emergencies, that time the projector bulb blew — all derailing my meticulously-planned lesson. But the show must go on!

If something goes wrong, keep calm but think on your feet. Focus on keeping the students engaged first. So if the students have working laptops and the only problem is a projector (or your laptop), get them started. Walk around the room teaching the concepts you’d planned to teach from the front. Ask the students questions, have them come up with solutions.

Tech completely down? Ask yourself, “What is the goal of this class?” It’s not easy to teach a tech-driven class with a whiteboard, but it can be done (and here’s hoping the WiFi will be up soon). In fact, if you teach a workshop that relies on software or other technology, I would urge you to get in the habit of adding in low-tech activities for those “just in case” moments.

Pro Tip: Oftentimes you can lead the students to an a-ha moment or two, then request IT support at the next break.

According to The Oxford Review, adaptability in the workplace is related to one’s emotional intelligence and emotional resilience. And, of course, mindset.

Lastly, being adaptable also means being coachable. Everyone gets frustrated by negative evaluations/feedback at times. But try to step back and ask yourself if you could have improved the delivery. Making tweaks to your performance based on student feedback can help YOU in the long run. Being coachable does NOT mean you give up confidence. You are the professional, but all great professionals learn from feedback and reflection.

It’s okay to admit you don’t know the answer to a question. Saying, “I’ll find out and get back to you” is not a weakness. What’s not okay is making up an answer. “Fake it til you make it” is NOT a mantra of teaching. Especially when you have Google.

“When you tell a lie, you steal someone’s right to the truth.”

– Khaled Hosseini

If you’re worried about questions, I recommend giving everyone sticky notes at the beginning of the class. Encourage students to ask questions. If a question comes up that is not relevant to the topic at hand OR if you don’t know the exact answer, ask the student to write down their question. Create a space on the wall for participants to stick these questions up (some trainers call this a “parking lot”) and on breaks, take some time to research and answer the questions. I would strongly recommend you DON’T take class time to do said research.

And oh the mistakes I’ve made when teaching. Some embarrassing. It happens. And it is important to own those mistakes, especially if you can turn it into a “teaching moment.” For example, I once observed a new math teacher square a binomial incorrectly. A COMMON mistake among students. So instead of (x+2)^2 = (x+2)*(x+2), she (without thinking about it) squared both terms, making (x+2)^2 = x^2 + 4. Whoops. A big mistake in the math world, but really not a big deal when she stopped herself, realizing her mistake, and laughed about it. Then explained that was an example of what NOT to do.

Squaring a binomial incorrectly may not be your mistake, but you will make one (probably many). But to err is human. And there is plenty of research out there to suggest an HONEST teacher is a TRUSTWORTHY teacher. People LIKE honest teachers, especially when it comes to their own flaws.

“Go the extra mile. It’s never crowded.”

-Author Unknown

Names are powerful. Dale Carnegie once said, “A person’s name is, to him or her, the sweetest and most important sound in any language.” When someone takes the time to learn and use your name, you feel important. Which means using a person’s name in conversation is the quickest way to connect with them on a personal level — and therefore promotes positive classroom engagement.

Generally, people also enjoy talking about themselves. Which is a great way to learn their name. On the very first day, after I introduce myself, I give participants the opportunity to introduce themselves and say a few words. You probably already do this. And I use this as an opportunity to learn their name — I write it down then say their name aloud (so they hear it AND to help me remember). Creating a blank seating chart ahead of time is always helpful – this way I can jot down the name and an interesting fact while they speak, creating a reference for later in the course.

My friend and colleague Ryan Nokes remembers names much better than I do, impressing his classes by learning every name immediately! After preliminary introductions he goes around the room, first person to last, and states each person’s name (without notes). And then Ryan does it again at the start of the next day. People enjoy hearing their own names and are pleased when you remember them later.

I teach hands-on courses and encourage constant interaction. When calling on students, I use their first name, careful not to just point to them. When talking to them one on one, I use their name. And by the way, please use the name they gave you. NOT their government name. FYI, I cringe when people call me, “Annamarie.”

Many articles have been published around the power of names. If you aren’t sure about the power of using a name, start here.

“Nothing happens until something moves”

-Albert Einstein

Moving around the room, when done correctly, increases participant engagement.

When I teach, I rarely sit down. Moving around the room allows me to interact with each student one-on-one and check for their understanding. This proximity also allows the more reluctant talker/questioner to ask their burning question when they know the entire class won’t hear them. And, dare I say it: **Moving around the room keeps you in control**.

This is why you hear grade school teachers say they never sit down. K-12 teachers use physical proximity to manage their classrooms. Being interested in each student’s learning promotes positive behaviors and keeps students on task. In the same way, walking around keeps adults out of their inbox. And you won’t ever hear me criticize a training participant about their email/phone use in class (despite it being a bit irritating; I mean, you DID sign up to be here) because when they expect me to move their direction, they self-monitor and correct these behaviors themselves, often apologizing.

Note: I do try to give every group/person an equal amount of “attention” without lingering anywhere too long.

Educational research also promotes student movement around the room. So when delivering instruction, I like to create activities that make students/groups visualize data by hand – on a white board or big 3M sticky poster. Or even a post-it. “Around the room” activities could also include giving other groups positive feedback or presenting a new discovery in their data.

“We are more powerful when we empower each other.”

-Unknown

After years of hoopla over the concept of “tracking students”, this tip might surprise you. How many times have you heard someone say, “Pair a low with a high?” And, while this strategy could work in certain courses and situations, it is, overall, an outdated practice.

Imagine. You have a grasp of the basics of a particular data visualization tool and use it weekly. A colleague in the same course has only installed the software that morning. Your instructor teams you up so you can “help” your colleague. How does this make you feel? At first, it might feel rewarding — you know the answers! However, in many situations the person doing the “helping” ends up feeling like they didn’t grow in their domain while the person being helped can eventually feel inadequate and frustrated.

No matter how well we market a course (“beginner”, “advanced”, etc) there will always be a heterogeneous group of abilities when I walk in to start instruction. And this is the way it always is — K12 or corporate training. So I can either roll my eyes and teach the outline as prescribed, pacing the middle of all abilities, or I can help all learners by differentiating my instruction a bit.

I’m not talking about group projects here — I only mean seating participants of like abilities near each other for an improved user experience. Of course in a training situation, this arrangement has more to do with experience levels — of which they can self-sort. I generally ask students “new” to the software to sit in the front, and others to sit behind them. That’s all that is usually needed.

But let’s look at the origins of this thought: When used appropriately, “flexible grouping” (pairing and grouping students based on need) can aid student learning on both ends of the experience spectrum. This can be homogeneous or heterogeneous groups. And *how* you utilize it matters. If you must pair high/low, do it only for a short time. (Because I’ve had students ask me, “Am I the dumb one or the smart one?”)

In the long run, research suggests pairing/seating students with similar abilities/experience in a domain (or software) engages all students. And if done correctly, actually improves their learning experience and accelerates their growth. How? If you’re already moving around the room (see #4 above), then it should make sense that you can tailor your instruction much easier to pairs/groups of similar background knowledge than if they are scattered around the room. Think about it — when you are helping a group of participants who are relatively new to the topic or software, you can give a “you try” practice problem to enrich or even accelerate the other group to work on their own, and vice versa. Peer pairing/grouping on similar experience levels also encourages those students to develop a deeper understanding of the topic together, rather than the back-and-forth waiting that occurs when unlike abilities are grouped.

Personally, I mix up my delivery — some whole group instruction, some partner work, maybe an activity in a group, and solo work. Pairing them up encourages dialog about the concepts while they work through a challenge. Groups can offer multiple points of view. Solo work helps the student think through the problem on their own. Since my classes are always hands-on I incorporate the process of I do, we do, you do. But I do start with seating like experience levels together.

Being a teacher (or instructor, or coach) does require multiple skill sets including: entertainer, orchestra conductor, problem-solver, mind reader, therapist, referee, and cheerleader. However, promoting student engagement (teens and adults alike) goes beyond preparing a “fun lesson.” Student engagement results from student buy-in. And student buy-in results from the little things that create a positive atmosphere.

I’m going to add to this list over time. Do you have any suggestions on how you promote adult student engagement?

That’s right, on a cold January day I loaded up two buses of teens and piloted them to an adventure at our local Stars and Strikes. And this is no small feat: this particular trip developed out of months of planning, and after years proclaiming that I will never ever ever ever EVER coordinate my own field trip for high school kids. I mean, you should SEE the stack of paperwork. And the level of responsibility itself made me anxious.

So there I was, field trip money in one hand, clipboard in another: Imagine a caffeinated Tracy Flick. But thanks to the help of two parent chaperones and the AP Psychology teacher (Coach B), we ran the smoothest data-related field trip modern education has ever known.

Statistics textbooks are full of canned examples and squeaky clean data that often have no bearing on a student’s interests. For example, there is an oh-so-relatable exercise computing standard error for D-glucose contained in a sample of cockroach hindguts. In my experience I’ve learned when students can connect to the data, they are able to connect to the concept. We’re all like that, actually: producing/collecting our own data enables us to see what we otherwise would have missed.

(I can assure you confidence intervals constructed from D-glucose in cockroach hindguts did little for understanding standard error.)

The real world is made up of messy data. It’s full of unknowns, clerical errors, bias, unnecessary columns, confusing date formats, missing values; the list goes on. Laser Tag was suggested to me as a way to collect a “large” amount of data in a relatively short amount of time. And because of the size of the dataset, it required the student to input their own data — creating their own version of messy data complete with clerical errors. From there they’d have to make sense of the data, look for patterns, form hypotheses.

- Students entered their data into a Google doc — you can find the complete data here.
- Each partner team developed two questions for the data: One involving 1-variable analysis, another requiring bivariate analysis.
- The duos then had to explore, clean, and analyze all 47 rows and 48 columns. *At this point in the school year, students had been exposed to data up to about 50 rows, but never had they experienced “wide” data.*
- Analyses and presentations required a visualization, either using Excel or Tableau.

Methodology: Each student was randomly assigned to a team using a random number generator. Teams of 5 played each other twice during the field trip. The teams were paired to play each other randomly. If, by chance, a team was chosen to play the same team twice, that choice would be ignored and another random selection would be made until a new team was chosen.
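The methodology above can be sketched in a few lines of Python. This is only an illustration, not the generator actually used on the trip: the roster of 40 placeholder names, the seed, and the helper names `assign_teams` and `draw_round` are all assumptions, and the sketch redraws a whole round when a rematch appears rather than redrawing a single opponent.

```python
import random

def assign_teams(students, team_size, rng):
    """Shuffle the roster, then cut it into consecutive teams of team_size."""
    roster = list(students)
    rng.shuffle(roster)
    return [roster[i:i + team_size] for i in range(0, len(roster), team_size)]

def draw_round(team_ids, already_played, rng, max_tries=1000):
    """Randomly pair teams; redraw if any pairing repeats an earlier game."""
    for _ in range(max_tries):
        order = list(team_ids)
        rng.shuffle(order)
        pairs = [frozenset(order[i:i + 2]) for i in range(0, len(order), 2)]
        if not any(pair in already_played for pair in pairs):
            return pairs
    raise RuntimeError("no valid pairing found")

rng = random.Random(2024)  # hypothetical seed, for reproducibility only
students = [f"student_{i:02d}" for i in range(1, 41)]  # hypothetical roster
teams = assign_teams(students, team_size=5, rng=rng)

team_ids = range(len(teams))
round_one = draw_round(team_ids, set(), rng)
round_two = draw_round(team_ids, set(round_one), rng)
print(len(teams), "teams;", len(round_one), "games per round")
```

Storing each game as a `frozenset` makes "team 2 vs. team 5" and "team 5 vs. team 2" count as the same matchup, which is what the no-rematch rule needs.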

Before each game, I recorded which student wore which laser tag vest number. From the set-up room (see above picture), I could view which vest numbers were leading the fight and which team had the lead. It was entertaining. As the students (and Coach B — we needed one more player for even teams) finished their games, score cards were printed and I handed each student their own personal results. The words, “DON’T lose this” exited my lips often.

Upon our return to school (this only took a few hours, to the students’ dismay), results were already pouring into the Google doc I’d set up ahead of time.

The AP Statistics exam is held every year in May, hosted by The College Board. On the exam, students are expected to use a graphing calculator but have no access to a computer or Google. Exactly the opposite of the real world.

Throughout the course, I taught all analysis first by hand, or using the TI-83/84. As students became proficient, I added time in the computer lab to teach basic skills using Excel and Tableau (assignments aligned to the curriculum while teaching skills in data analysis). It was my goal for students to have a general understanding of how to use these “real world” analytics tools while learning and applying AP Statistics curriculum.

After the field trip, we spent three days in the computer lab – ample time to work in Tableau and Excel with teacher guidance. Students spent time exploring the 48-column field trip dataset with both Excel and Tableau. They didn’t realize it, but by deciding which chart type to use for different variables, they were actually reviewing content from earlier in the year.

Most faculty members had never heard of Tableau. At lunch one day I sat down with Coach B to demonstrate Tableau’s interface with our field trip dataset.

“What question would you ask this set of data?” I asked.

“A back shot is a cheap shot. I wonder who is more likely to take a cheap shot, males or females?”

So I proceeded to pull up a comparison and used box-and-whiskers plots to look for outliers. Within seconds, a large outlier was staring back at us within the pool of male students:

“Ha. I wonder who that was.” – Coach B

“That’s YOU.” – Me

From there, I created a tongue-in-cheek competitive analysis from the data:

I’ve been teaching since 2004. Over the years, this was probably the most successful project I’ve seen come through my classroom. By “successful”, I mean the proportion of students who were able to walk outside of their comfort zone and into a challenging set of data, perform in-depth analyses, then communicate clear conclusions was much higher than in all previous years.

At the end of the year, after the AP Exam, after grades were all but inked on paper, students still talked excitedly about the project. I’d like to think it was the way I linked a fun activity to real-world analysis, though it most likely has to do with getting out of school for a few hours. Either way, they learned something valuable.

One student, Abby, gave me permission to share her work adding, “This is the project that tied it all together. This was the moment I ‘got’ statistics.”

Interestingly, students were less inclined to suggest the female outlier of 2776 shots was a clerical mistake (which it was). I found there were two camps: Students who didn’t want to hurt feelings, and students who think outliers in the wild need no investigation. Hmmm.

When you teach, you learn.

Earlier I said the project was a success based on the students’ results. That’s only partially true; it was also a success because I grew as an educator. After years of playing by the rules I realized that sometimes you need to get outside your comfort zone. For me that was two-fold: 1) Sucking it up and planning a field trip and 2) Losing the old, tired TI-83 practice problems and teaching real-world analytics tools.

When I teach conditional probability, I tell my students to pay close attention to the vertical line in the formula above. Whenever they see it, they must imagine the loud baritone behind-the-scenes announcer voice from Bill Nye saying, “GIVEN!”

This symbol | always indicates we assume the event that follows it has already occurred. The formula above, then, should be read: The probability event A will occur given event B has already occurred.

A simple example of conditional probability uses the ubiquitous deck of cards. From a standard deck of 52, what is the probability you draw an ace on the second draw if you know an ace has already been drawn (and left out of the deck) on the first draw?

Since a deck of 52 playing cards contains 4 aces, the probability of drawing the first ace is 4/52. But the probability of drawing an ace **given** the first card drawn was an ace is 3/51: 3 aces left in the deck with 51 total cards remaining. Hence, P(ace on second draw | ace on first draw) = 3/51, or about 5.9%.
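To check the card arithmetic with exact fractions, here is a small Python sketch (the variable names are mine). It also confirms the result agrees with the usual definition of conditional probability, P(A | B) = P(A and B) / P(B).

```python
from fractions import Fraction

ACES, DECK = 4, 52

p_first_ace = Fraction(ACES, DECK)                       # 4/52
p_second_ace_given_first = Fraction(ACES - 1, DECK - 1)  # 3/51
p_both_aces = p_first_ace * p_second_ace_given_first     # multiply along the branch

# Definition check: P(second ace | first ace) = P(both aces) / P(first ace)
assert p_both_aces / p_first_ace == p_second_ace_given_first

print(p_first_ace, p_second_ace_given_first, p_both_aces)  # 1/13 1/17 1/221
```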

Tests are flawed.

According to MedicineNet, a rapid strep test from your doctor or urgent care has a 2% false positive rate. This means 2% of patients who do not actually have Group A *streptococcus* bacteria present in their mouth test positive for the bacteria. The rapid strep test also indicates a negative result in patients who do have the bacteria 5% of the time — a false negative.

Another way to look at it: The 2% “false positive” rate means the test displays a true negative in 98% of patients who do not have the bacteria. The 5% “false negative” rate means the test displays a true positive in 95% of patients who do have the bacteria.

It’s common to hear these false positive/true positive results incorrectly interpreted. **These rates do not mean the patient who tests positive for a rapid strep test has a 98% likelihood of having the bacteria and a 2% likelihood of not having it.** And a negative result does not indicate one still has a 5% chance of having the bacteria.

Even more confusing, but important, is the idea that **while a 2% false positive rate does indicate that 2% of patients who do not have strep test positive, it does not mean that, of all positives, 2% do not have strep**. There is more to consider in calculating those kinds of probabilities. Specifically, we would need to know how pervasive strep is for that population in order to come close to the actual probability that someone testing positive has the bacteria.
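To see how much prevalence matters, here is a rough Python sketch that holds the quoted error rates fixed (2% false positive, 5% false negative) and varies how common strep is. The three prevalence values are made up purely for illustration; they are not from MedicineNet or any study.

```python
def prob_strep_given_positive(prevalence, false_pos=0.02, false_neg=0.05):
    """Share of all positive tests that come from patients who have strep."""
    positive_with_strep = prevalence * (1 - false_neg)
    positive_without_strep = (1 - prevalence) * false_pos
    return positive_with_strep / (positive_with_strep + positive_without_strep)

# Hypothetical prevalence values, chosen only to show the effect
for prevalence in (0.01, 0.10, 0.30):
    share = prob_strep_given_positive(prevalence)
    print(f"prevalence {prevalence:.0%}: P(strep | positive) = {share:.1%}")
```

With the same test, the chance that a positive result is real swings widely depending on how common the bacteria is in the population being tested.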

Bayes’ Theorem considers both the population’s probability of contracting the bacteria *and* the false positives/negatives.

I know, I know — that formula looks INSANE. So I’ll start simple and gradually build to applying the formula – soon you’ll realize it’s not too bad.

Many employers require prospective employees to take a drug test. A positive result on this test indicates that the prospective employee uses illegal drugs. However, not all people who test positive actually use drugs. For this example, suppose that 4% of prospective employees use drugs, the false positive rate is 5%, and the false negative rate is 10%.

Here we’ve been given 3 key pieces of information:

- The prevalence of drug use among these prospective employees, which is given as a probability of 4% (or 0.04). We can use the complement rule to find the probability an employee doesn’t use drugs: 1 – 0.04 = 0.96.
- The probability a prospective employee tests positive when they did not, in fact, take drugs — the false positive rate — which is 5% (or 0.05).
- The probability a prospective employee tests negative when they did, in fact, take drugs — the false negative rate — which is 10% (or 0.10).

It’s helpful to step back and consider the two things happening here: First, the prospective employee either takes drugs, or they don’t. Then, they are given a drug test and either test positive, or they don’t.

I recommend a visual guide for these types of problems. A tree diagram helps you take these two pieces of information and logically draw out the unique possibilities.

Tree diagrams are also helpful to show us where to apply the multiplication principle in probability. For example, to find the probability a prospective employee didn’t take drugs *and* tests positive, we multiply P(no drugs) * P(positive | no drugs) = (0.96)*(0.05) = 0.048.

An important note: The probability of selecting a potential employee who did not take drugs *and* tests negative is not the same as the probability an employee tests negative GIVEN they did not take drugs. In the former, we don’t know if they took drugs or not; in the latter, we know they did not take drugs – the “given” language indicates this prior knowledge/evidence.
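If you don’t have a tree diagram handy, the same branches can be multiplied out in a few lines of Python. This is just a sketch of the tree using the three rates given in the example; the dictionary layout and variable names are my own.

```python
p_drugs = 0.04               # prevalence of drug use
p_pos_given_no_drugs = 0.05  # false positive rate
p_neg_given_drugs = 0.10     # false negative rate

p_no_drugs = 1 - p_drugs     # complement rule

# Multiply down each branch: P(first event) * P(second event | first event)
tree = {
    ("drugs", "positive"): p_drugs * (1 - p_neg_given_drugs),
    ("drugs", "negative"): p_drugs * p_neg_given_drugs,
    ("no drugs", "positive"): p_no_drugs * p_pos_given_no_drugs,
    ("no drugs", "negative"): p_no_drugs * (1 - p_pos_given_no_drugs),
}

for (first, second), prob in tree.items():
    print(f"P({first} and {second}) = {prob:.3f}")

# "And" versus "given": these are two different numbers
print(f"P(no drugs and negative) = {tree[('no drugs', 'negative')]:.3f}")
print(f"P(negative | no drugs)   = {1 - p_pos_given_no_drugs:.3f}")
```

The four branch probabilities add up to 1, which is a handy check that the tree was drawn correctly.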

We can also use the tree diagram to calculate the probability a potential employee tests positive for drugs.

A potential employee could test positive when they took drugs *OR* when they didn’t take drugs. To find the probabilities separately, multiply down their respective tree diagram branches:

Using probability rules, “OR” indicates you must add something together. Since one could test positive in two different ways, just add them together after you calculate the probabilities separately:

P(positive) = 0.048 + 0.036 = 0.084

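Which brings us to Bayes’ Theorem, written here for the probability a potential employee doesn’t take drugs given they tested positive:

P(no drugs | positive) = P(positive | no drugs) * P(no drugs) / P(positive)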

Let’s find all of the pieces:

- P(positive | no drugs) is merely the probability of a *false positive* = 0.05
- P(no drugs) = 0.96
- So we already calculated the numerator above when we multiplied 0.05*0.96 = 0.048
- We also calculated the denominator: P(positive) = 0.084

Substituting these pieces, P(no drugs | positive) = 0.048 / 0.084, which simplifies to 0.5714.

Whoa.

This means, if we know a potential employee tested positive for drug use, there is a 57.14% probability they don’t actually take drugs, which is MUCH HIGHER than the false positive rate of 0.05. In other words, if a potential employee (in this population with 4% drug use) tests positive for drug use, the probability they don’t take drugs is 57.14%.

How is that different from a false positive? A false positive says, “We know this person doesn’t take drugs, but the probability they will test positive for drug use is 5%.” While if we know they tested positive, the probability they don’t take drugs is 57%.

Why is this probability so large? It doesn’t seem possible! Yet, it takes into account the likelihood a person in the population takes drugs, which is only 4%.

*In math terms:*

*P(positive | no drugs) = 0.05* while *P(no drugs | positive) = 0.5714*

Which also means that if a potential employee tests positive, the probability they do indeed take drugs is lower than what you might think. You can find this probability by taking the complement of the last calculation: 1 – 0.5714 = 0.4286. OR, recalculate using the formula: P(drugs | positive) = P(positive | drugs) * P(drugs) / P(positive) = (0.90)*(0.04) / 0.084 = 0.036 / 0.084 = 0.4286.
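For anyone who wants to verify both results at once, here is a minimal Python sketch of the calculation. The function name `bayes` and its argument names are my own labels; the numbers are the ones from the drug test example.

```python
def bayes(prior, likelihood, evidence):
    """P(H | E) = P(E | H) * P(H) / P(E)."""
    return likelihood * prior / evidence

p_drugs, p_no_drugs = 0.04, 0.96
p_pos_given_drugs = 0.90      # 1 minus the 10% false negative rate
p_pos_given_no_drugs = 0.05   # false positive rate

# Denominator: total probability of a positive test (add the two "OR" branches)
p_positive = p_drugs * p_pos_given_drugs + p_no_drugs * p_pos_given_no_drugs

p_no_drugs_given_pos = bayes(p_no_drugs, p_pos_given_no_drugs, p_positive)
p_drugs_given_pos = bayes(p_drugs, p_pos_given_drugs, p_positive)

print(f"P(positive)            = {p_positive:.3f}")            # 0.084
print(f"P(no drugs | positive) = {p_no_drugs_given_pos:.4f}")  # 0.5714
print(f"P(drugs | positive)    = {p_drugs_given_pos:.4f}")     # 0.4286
```

The two posteriors are complements, so they sum to 1, just like the complement shortcut suggests.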

Back in October I posted a #DataQuiz to Twitter, with a Bayesian twist. Can you calculate the answer using this tutorial without looking at the answer (in tweet comments)?

*Hints: *

- Draw out the situation using a tree diagram
- What happens first? What happens second?
- What is “given”?

Stay tuned! Paul Rossman has a follow-up post that I’ll link to when it’s ready. He’s got some brilliant use case scenarios with application in Tableau.

- Incorrectly interpreting a 99% interval as having a “99% probability of containing the true population parameter”
- Finding significance because “the sample mean is contained in the interval”
- Applying a confidence interval to samples that do not meet specific assumptions

Confidence intervals are like fishing nets to an analyst looking to capture the actual measure of a population in a pond of uncertainty. The *margin of error* dictates the width of the “net”. But unlike fishing scenarios, **whether or not the confidence interval actually captures the true population measure typically remains uncertain**. Confidence intervals are not intuitive, yet they are logical once you understand where they start.

So what EXACTLY, are we confident about? Is it the underlying data? Is it the result? Is it the sample? **The confidence is actually in the procedures used to obtain the sample that was used to create the interval** — and I’ll come back to this big idea at the end of the post. First, let’s paint the big picture in three parts: The data, the math, and the interpretation.

As I mentioned, a confidence interval captures a “true” (yet unknown) measure of a population using sample data. Therefore, you must be working with sample data to apply a confidence interval — you’re defeating the purpose if you’re already working with population data for which the metrics of interest are known.

It’s important to investigate how the sample was taken and determine if the sample represents the entire population. *Sampling bias* means a certain group has been under- or over-represented in a sample – in which case, the sample does not represent the entire population. A common misconception is that you can offset bias by increasing the sample size; however, once bias has been introduced to the sample, a larger sample using the same procedure only reproduces that bias on a larger scale. The result is still NOT a representative sample.

- Excluding a group who cannot be reached or does not respond
- Only sampling groups of people who can be conveniently reached
- Changing sampling techniques during the sampling process
- Contacting people not chosen for sample
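A quick simulation can make the "bigger sample doesn’t fix bias" point concrete. Everything in this Python sketch is invented for illustration: a population where 70% of members are easy to reach and 30% are not, with different typical values in each group, and a biased procedure that only ever samples the easy-to-reach group.

```python
import random
from statistics import mean

rng = random.Random(7)  # hypothetical seed, for reproducibility only

# Hypothetical population: the hard-to-reach group has systematically higher values
reachable = [rng.gauss(50, 10) for _ in range(70_000)]
unreachable = [rng.gauss(80, 10) for _ in range(30_000)]
population = reachable + unreachable
population_mean = mean(population)

def sample_means(n):
    """Return (biased sample mean, random sample mean) for a sample of size n."""
    biased = rng.sample(reachable, n)   # excludes a group that cannot be reached
    fair = rng.sample(population, n)    # simple random sample of everyone
    return mean(biased), mean(fair)

print(f"population mean: {population_mean:.1f}")
for n in (100, 1_000, 10_000):
    biased_mean, fair_mean = sample_means(n)
    print(f"n={n:>6}: biased={biased_mean:.1f}  random={fair_mean:.1f}")
```

As n grows, the random sample settles near the population mean while the biased sample settles, just as confidently, near the wrong number.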

A *statistic* describes a sample. A *parameter* describes a population. For example, if a *sample* of 50 adult female pandas weigh an average of 160 pounds, the sample mean of 160 is known as the *statistic*. Meanwhile, we don’t actually know the average of all adult female pandas. But if we did, that average (mean) of the *population* of all female pandas would be the *parameter*. Statistics are used to estimate parameters. Since we don’t typically know the details of an entire population, we rely heavily on statistics.

Mental Tip: Look at the first letters! A *S*tatistic describes a *S*ample and a *P*arameter describes a *P*opulation.

All confidence intervals take the form: statistic ± margin of error.

A common example here is polling reports — “The exit polls show John Cena has 46% of the vote, with a margin of error of 3 points.” Most people without a statistics background can draw the conclusion: “John Cena likely has between 43% and 49% of the vote.”

The “statistic” is merely our estimate of the true parameter.

The statistic in the voting example is the *sample* percent from exit polls — the 46%. The actual percent of the population voting for John Cena – the parameter – is unknown until the polls close, so forecasters rely on sample values.
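The exit poll arithmetic is simple enough to do in your head, but writing it out shows the "statistic plus or minus margin of error" structure. A tiny Python sketch, with a helper name of my own choosing:

```python
def interval(statistic, margin_of_error):
    """Return (lower, upper) for statistic ± margin of error."""
    return statistic - margin_of_error, statistic + margin_of_error

low, high = interval(46, 3)  # exit poll: 46% with a 3-point margin of error
print(f"between {low}% and {high}%")  # between 43% and 49%
```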

A sample mean is another example of a statistic – like the mean weight of an adult female panda. Using this statistic helps researchers avoid the hassle of traveling the world weighing all adult female pandas.

With confidence intervals, there’s a trade off between precision and accuracy: **A wider interval may capture the true mean accurately, but it’s also less precise than a more narrow interval.**

The width of the interval is decided by the *margin of error* because, mathematically, it is the piece that is added to and subtracted from the statistic to build the entire interval.

How do we calculate the margin of error? You have two main components: a *t* or *z* value derived from the *confidence level*, and the *standard error*. Unless you have control over the data collection on the front end, the confidence *level* is the only component you’ll be able to determine and adjust on the back end.

* “Why can’t we just make it 100% confidence?”* Great question! And one I’ve heard many times. Without going into the details of sampling distributions and normal curves, I’ll give you an example:

Assume the “average” adult female panda weighs “around 160 pounds.” To be 100% confident that we’ve created an interval that includes the TRUE mean weight, we’d have to use a range that includes all possible values of mean weights. This interval might be from, say 100 to 400 pounds – maybe even 50 to 1000 pounds. Either way, that interval would have to be ridiculously large to be 100% confident you’ve estimated the true mean. And with a range that wide, have you actually delivered any insightful message?

Again consider a confidence interval like a fishing net, the width of the net determined by the margin of error – more specifically, the confidence *level* (since that’s about all you have control over once a sample has been taken). This means a LARGER confidence level produces a WIDER net and a LOWER confidence level produces a more NARROW net (everything else equal).

For example: A 99% confidence interval fishing net is wider than a 95% confidence interval fishing net. The wider net catches more fish in the process.

But if the purpose of the confidence interval is to *narrow* down our search for the population parameter, then we don’t necessarily want more values in our “net”. **We must strike a balance between precision (meaning fewer possibilities) and confidence.**

Once a confidence *level* is established, the corresponding t* or z* value, called an *upper critical value*, is used in the calculation for the margin of error. If you’re interested in how to calculate the z* upper critical value for a 95% z-interval for proportions, check out this short video using the Standard Normal Distribution.

This is the part of the margin of error you most likely won’t get to control.

Keeping with the panda example, if we are interested in the true mean weight for the adult female panda then the *standard error* is the standard deviation of the sampling distribution of sample mean weights. Standard error, a measure of variability, is **based on a theoretical distribution of all possible sample means**. I won’t get into the specifics in this post but here is a great video explaining the basics of the Central Limit Theorem and the standard error of the mean.

If you’re using proportions, such as in our John Cena election example, here is my favorite video explaining the sampling distribution of the sample proportion (p-hat).

As I mentioned, you will most likely NOT have much control over the standard error portion of the margin of error. But if you did, keep this **PRO TIP** in your pocket: **a larger sample size (*n*) will reduce the margin of error without sacrificing the level of confidence.**
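To see both levers at work, here is a minimal Python sketch of the margin of error for a mean. It assumes a z-interval with a known population standard deviation, which is a simplification (a t value is more common in practice), and the value of 13.6 pounds and the sample sizes are assumed for illustration only. The function names are my own, not from any library.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Upper critical value z* for a two-sided interval at this confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def margin_of_error(confidence, sigma, n):
    """z* times the standard error of the mean (sigma / sqrt(n))."""
    return z_critical(confidence) * sigma / sqrt(n)

sigma = 13.6  # assumed known population standard deviation, in pounds

for confidence, n in [(0.95, 20), (0.99, 20), (0.95, 80)]:
    moe = margin_of_error(confidence, sigma, n)
    print(f"{confidence:.0%} confidence, n = {n}: margin of error = {moe:.2f} pounds")
```

Raising the confidence level from 95% to 99% widens the net, while quadrupling the sample size at the same confidence level cuts the margin of error in half.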

Back to the panda weights example here. Let’s assume we used a 95% confidence interval to estimate the true mean weight of all adult female pandas:

Typically the *confidence interval* is interpreted something like this: **“We are 95% confident the true mean weight of an adult female panda is between 150 and 165 pounds.”**

Notice I didn’t use the word probability. At all. Let’s look at WHY:

The *confidence level* tells us: **“If we took samples of this same size over and over again (think: in the long run) using this same method, we would expect to capture the true mean weight of an adult female panda 95% of the time.”** Notice this IS a probability. A 95% probability of capturing the true mean exists BEFORE taking the sample. Which is why I did NOT reference the actual interval values. A *different* sample would produce a *different* interval. And as I said in the beginning of this post, we don’t actually know if the true mean is in the interval we calculated.

Well then, what IS the probability that my confidence interval – the one I calculated between the values of 150 and 165 pounds – contains the true mean weight of adult female pandas? Either 1 or 0. It’s either there, or it isn’t. Because – and here’s the tricky part – **the sample was already collected *before* we did the math. NOTHING in the math can change the fact that we either did or didn’t collect a representative sample of the population. OUR CONFIDENCE IS IN THE DATA COLLECTION METHOD – not the math.**

The numbers in the confidence interval would be *different* using a *different* sample.

Let’s assume the density curve below represents the population of weights of all adult female pandas. In this made-up example the mean weight of all adult female pandas is 156.2 pounds with a population standard deviation of 13.6 pounds.

Beneath the population distribution are the simulation results of 300 samples of *n* = 20 pandas (sampled using an identical sampling method each time). Notice that roughly 95% of the intervals cover the true mean — capturing 156.2 within the interval (the green intervals) while close to 5% of intervals do NOT capture the 156.2 (the red intervals).

Pay close attention to the points made by the visualization above:

- Each horizontal line represents a confidence interval constructed from a *different* sample.
- The green lines “capture” or “cover” the true (unknown) mean while the red lines do NOT cover the mean.
- If this was a real situation, you would NOT know if your interval contained the true mean (green) or did not contain the true mean (red).

The logic of confidence intervals is based on long-run results — frequentist inference. **Once the sample is drawn, the resulting interval either does or doesn’t contain the true population parameter** — a probability of 1 or 0, respectively. Therefore, the confidence level does not imply the probability the parameter is contained in the interval. In the LONG run, after many samples, the resulting intervals will contain the mean C% of the time (where C is your confidence level).
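If you’d like to watch the long-run logic play out yourself, here is a rough Python sketch of that kind of simulation. It is not the code behind the visualization above. It assumes the panda weights are normally distributed (my assumption, the post only gives the mean and standard deviation), builds a 95% t-interval from each sample of 20, and hard-codes the t critical value for 19 degrees of freedom.

```python
import random
from math import sqrt
from statistics import mean, stdev

TRUE_MEAN = 156.2   # population mean from the made-up example (pounds)
TRUE_SD = 13.6      # population standard deviation from the same example
N = 20              # pandas per sample
T_STAR = 2.093      # t* for 95% confidence with n - 1 = 19 degrees of freedom

def interval_from_sample(rng):
    """Draw one sample and return its 95% t-interval as (low, high)."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    moe = T_STAR * stdev(sample) / sqrt(N)
    return mean(sample) - moe, mean(sample) + moe

def coverage(num_samples, seed=1):
    """Fraction of simulated intervals that capture the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        low, high = interval_from_sample(rng)
        hits += low <= TRUE_MEAN <= high
    return hits / num_samples

print(f"Share of 300 intervals covering the true mean: {coverage(300):.1%}")
```

Any single run of 300 samples will land near 95% rather than exactly on it, and the share settles closer to 95% as the number of samples grows. Each individual interval still either covers 156.2 or it doesn’t.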

So in what are we placing our confidence when we use confidence intervals? Our confidence is in the procedures used to find our sample. Any sampling bias will affect the results – which is why you don’t want to use confidence intervals with data that may not represent the population.

That box-and-whisker plot (or, boxplot) you learned to read/create in grade school probably IS different from the one you see presented in the adult world.

The boxplot on the top originated as the Range Bar, published by Mary Spear in the 1950s, while the boxplot on the bottom was a modification created by John Tukey to account for outliers. *Source: Hadley Wickham*

As a former math and statistics teacher, I can tell you that (depending on your state/country curriculum and textbooks, of course) you most likely learned how to read and create the former boxplot (or, “range bar”) in school for simplicity. Unless you took an upper-level stats course in grade school or at University, you may have never encountered Tukey’s boxplot in your studies at all.

You see, teachers like to introduce concepts in small chunks. While this is usually a helpful strategy, students lose when the full concept is never developed. In this post I walk you through the range bar AND connect that concept to the boxplot, linking what you’ve learned in grade school to the topics of the present.

In this example, I’m comparing the lifespans of a small, non-random set of animals. I chose this set of animals based solely on convenience of icons. Meaning, conclusions can only be drawn on animals for which Anna Foard has an icon. I note this important detail because, when dealing with this small, non-random sample, one cannot infer conclusions on the entire population of all animals.

Quartiles break the dataset into 4 quarters. Q1, median, Q3 are (approximately) located at the 25th, 50th, and 75th percentiles, respectively.

Finding the median requires finding the middle number when values are ordered from least to greatest. When there is an even number of data points, the two numbers in the middle are averaged.

Once the median has been located, find the other quartiles in the same way: The middle value in the bottom set of values (Q1), then the middle value in the top set (Q3).
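Here is a small Python sketch of that median-of-halves process. The lifespans are hypothetical (they are not the animals in my chart), and I’m assuming the common classroom convention of leaving the median itself out of both halves when there is an odd number of values. Textbooks and software differ on that detail.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); expects at least two values."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

lifespans = [3, 8, 10, 12, 15, 20, 25, 40]  # hypothetical lifespans in years
print(five_number_summary(lifespans))       # (3, 9.0, 13.5, 22.5, 40)
```

Those five numbers are all you need to sketch the box and whiskers by hand.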

The first and third quartiles build the “box”, with the median represented by a line inside the box. The “whiskers” extend to the minimum and maximum values in the dataset:

But without the points:

The Range Bar probably looks similar to the first box-and-whisker plot you created in grade school. If you have children, it is most likely the first version of the box-and-whisker plot that they will encounter.

Since the kid’s version of the boxplot does not show outliers, I propose teachers call this version, “The Range Bar” as it was originally dubbed, to not confuse those reading the chart. After all, someone looking at this version of a boxplot may not realize it does not account for outliers and may draw the wrong conclusion.

The only difference between the range bar and the boxplot is the view of outliers. Since this version requires a basic understanding of the concept of outliers and a stronger mathematical literacy, it is generally introduced in a high school or college statistics course.

The interquartile range is the difference, or spread, between the third and first quartile reflecting the middle 50% of the dataset. The IQR builds the “box” portion of the boxplot.

1.5*IQR is then subtracted from the lower quartile and added to the upper quartile to determine a boundary or “fences” between non-outliers and outliers.

Since no animal’s lifespan is below -5 years, a low-value outlier is not possible in this particular set of data; however, one animal in this dataset lives beyond 31 years – a high-value outlier.
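The fence arithmetic is easy to sketch in Python. One caveat: the quartiles of 8.5 and 17.5 below are back-solved from the -5 and 31 fences quoted in this post, assuming those fences are exact, and the list of lifespans is hypothetical. The function names are my own.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3):
    """Values that fall strictly outside the fences."""
    low, high = tukey_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Quartiles back-solved from the -5 and 31 fences, not read from the dataset.
q1, q3 = 8.5, 17.5
print(tukey_fences(q1, q3))                       # (-5.0, 31.0)
print(find_outliers([2, 9, 14, 20, 40], q1, q3))  # [40]; lifespans are hypothetical
```

The multiplier of 1.5 is Tukey’s convention; passing a different `k` shows how sensitive the outlier label is to that choice.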

Here we find the modification on the “range bar” – the whiskers only extend as far as non-outlier values. Outliers are denoted by a dot (or star).

In an academic setting, I use boxplots a great deal. When teaching AP Statistics, they are helpful to visualize the data quickly by hand as they only require summary statistics (and outliers). They also help students compare and visualize center, spread, and shape (to a degree).

When we get into the inference portion of AP Stats, students must verify assumptions for certain inference procedures — often those procedures require data symmetry and/or absence of outliers in a sample. The boxplot is a quick way for a student to verify assumptions by hand, under time constraints. When coaching doctoral candidates through the dissertation stats, similar assumptions are verified to check for outliers — using boxplots.

- Summarizes variation in large datasets visually
- Shows outliers
- Compares multiple distributions
- Indicates symmetry and skewness to a degree
- Simple to sketch
- Fun to say

Unfortunately, boxplots have their share of disadvantages as well.

Consider:

A boxplot may show summary statistics well; however, clusters and multimodality are hidden.

In addition, a consumer of your boxplot who isn’t familiar with the measures required to construct one will have difficulty making heads or tails of it. This is especially true when your resulting boxplot looks like this:

Or this:

Or what about this?

- Hides the multimodality and other features of distributions
- Confusing for some audiences
- Mean often difficult to locate
- Outlier calculation too rigid – “outliers” may be industry-based or case-by-case

Over the course of the years, multiple boxplot variations have been created to display parts (or all) of the distribution’s shape and features.

Box-and-whisker plots may be helpful for your specific use case, though not intuitive for all audiences. It may be helpful to include a legend or annotations to help the consumer understand the boxplot.

No cheating! Without looking back through this post, check your own understanding of boxplots. Answer can be found on the #MakeoverMonday webinar I recorded with Eva Murray a couple weeks ago.


I decided to figure out how to create one in Tableau. Based on the types of cumulative frequency distributions I was used to when I taught AP Stats, I first determined I wanted the value of interest on the horizontal axis and the percents on the vertical axis.

Using a simple example – US President age at inauguration – I started with a histogram so I could look at the overall shape of the distribution:

From here I realized I already had what I needed in my view – discrete ages on the x-axis and counts of ages on the y-axis. For a wider range of values I would want a wider bin size, but in this situation I needed to resize bins to 1, representing each individual age.

Click on the green pill on the rows (the COUNT) and add a table calculation.

First choose “Running Total”, then click on the box “add secondary calculation”:

Next, choose “percent of total” as the secondary calculation:

Add drop lines…

…and CTRL drag the COUNT (age in years) green pill from the rows to labels. Click on “Label” on the marks card and change the marks to label from “all” to “selected”.

And there you have it.
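Outside of Tableau, the same two-step table calculation (a running total, then percent of total) can be sketched in a few lines of Python. The ages below are hypothetical, not the actual list of presidents, and `cumulative_percent` is a name I made up for this example.

```python
from collections import Counter
from itertools import accumulate

def cumulative_percent(values):
    """Map each distinct value to the percent of observations at or below it."""
    counts = sorted(Counter(values).items())             # (value, count), bins of size 1
    running = accumulate(count for _, count in counts)   # running total of counts
    total = len(values)
    return {value: 100 * run / total for (value, _), run in zip(counts, running)}

ages = [42, 46, 47, 47, 51, 54, 54, 54, 61, 69]  # hypothetical ages, not the real list
for age, pct in cumulative_percent(ages).items():
    print(f"{age}: {pct:.0f}%")
```

The last value always lands at 100%, which is a quick sanity check that the running total and the percent of total are stacked in the right order.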

Percentiles describe the position of a data point relative to the rest of the dataset using a percent. That’s the percent of the rest of the dataset that falls *below* the particular data point. Using the baby weights example, the percentile is the *percent* of all babies of the same age and gender weighing *less than* your baby.
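As a minimal sketch of that definition in Python, using hypothetical baby weights: the function counts the values strictly below the one you care about and divides by the size of the whole dataset. That divisor is one convention among several, so treat it as an assumption.

```python
def percentile_rank(values, x):
    """Percent of the values that are strictly less than x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

weights = [6.1, 6.8, 7.0, 7.4, 7.9, 8.3, 8.8, 9.5]  # hypothetical baby weights (lb)
print(percentile_rank(weights, 8.3))  # 62.5
```

Here a baby weighing 8.3 pounds is heavier than 5 of the 8 babies in the list.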

Back to the US president example.

Since I know Barack Obama was 47 when inaugurated, let’s look at his age relative to the other US presidents’ ages at inauguration:

And another way to look at this percentile: 87% of US presidents were older than Barack Obama when inaugurated.

Thank you for reading and have an amazing day!

-Anna

My favorite of all the means. Sometimes called *expected value*, or the *mean of a discrete random variable*.

When computing a course grade or overall GPA, the weighted mean takes into account each possible outcome and how often that outcome occurs in a dataset. A weight is applied to each possible outcome — for example, each type of grade in a course — then added together to return the overall weighted mean. And since Econ was my favorite course in college…

If you have an exam average of 80, quiz/homework average of 65 and lab average of 78, what is your final grade? (Hint: Don’t forget to change percentages to decimals.)
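Here is a short Python sketch of the weighted mean for a grade question like this one. The category weights of 50%, 30%, and 20% are hypothetical stand-ins, so swap in the weights from your own syllabus.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, with weights expressed as decimals that sum to 1."""
    if abs(sum(weights) - 1) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(v * w for v, w in zip(values, weights))

averages = [80, 65, 78]       # exam, quiz/homework, lab averages from the question
weights = [0.50, 0.30, 0.20]  # hypothetical syllabus weights (50%, 30%, 20%)
print(round(weighted_mean(averages, weights), 1))  # 75.1
```

Notice there is no division at the end. The weights already sum to 1, so they do the dividing for you.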

Weighted means are also effective for assessing risk in insurance or gambling. Also known as the *expected value*, it considers all possible outcomes of an event and the probability of each possible outcome. Expected values reflect a long-term average. Meaning, over the long run, you would expect to win/lose this amount. A negative expected value indicates a house advantage and a positive expected value indicates the player’s advantage (and unless you have skills in the poker room, the advantage is never on the player’s side). An expected value of $0 indicates you’ll break even in the long-run.

I’ll admit my favorite casino game is American roulette:

As you can see, the “inside” of the roulette table contains numbers 1-36 (18 of which are red, the other 18 black). But WAIT! Here’s how they fool you — see the numbers “0” and “00”? 0 and 00 are neither red nor black, though they do count towards the 38 total outcomes on the roulette board. When the dealer spins the wheel, a ball bounces around and chooses from numbers 1 thru 36, 0 AND 00 — that’s 38 possible outcomes.

Let’s say you wager $1 on “black”. And if the winning number is, in fact, black, you get your original dollar AND win another (putting you “up” $1). Unsuspecting victims new to the roulette table think they have a 50/50 shot at black; however, the probability of “black” is actually 18/38 and the probability of “not black” is 20/38.

Here’s how it breaks down for you:

Just as in the grading example, each outcome (dollars made or lost) is first multiplied by its weight, where the weight here is the theoretical probability assigned to that outcome. After multiplying, add each product (outcome times probability) together. Note: Don’t divide at the end like you’d do for the arithmetic mean – it’s a common mistake, but easy to remedy if you check your work.
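The same multiply-then-add steps, sketched in Python for the $1 bet on black. I’m using exact fractions so the house edge shows up without rounding noise.

```python
from fractions import Fraction

# (net dollars, probability) for a $1 bet on black in American roulette
outcomes = [
    (1, Fraction(18, 38)),    # ball lands on black: up $1
    (-1, Fraction(20, 38)),   # red, 0, or 00: down $1
]

expected_value = sum(payoff * prob for payoff, prob in outcomes)
print(expected_value)                   # -1/19
print(round(float(expected_value), 4))  # -0.0526
```

In the long run you would expect to lose about 5 cents for every dollar wagered on black.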

**Some Gambling Advice:** The belief that casino games adhere to some “law of averages” in the short run is called the Gambler’s Fallacy. Just because the ball on the roulette wheel landed on 5 red numbers in a row doesn’t mean it’s time for a black number on the next spin! I watched a guy lose $300 on three spins of the wheel because, as he exclaimed, “Every number has been red. It’s black’s turn! It’s the law of averages!”

A Geometric mean is useful when you’re looking to average a factor (multiplier) applied over time – like investment growth or compound interest.

I enjoyed my finance classes in school, especially the part about how compound interest works. If you think about compound interest over time, you may recall the growth is exponential, not linear. And exponential growth indicates that in order to grow from one value to the next, a constant was multiplied (not added).

As a basic example, let’s say you invest $100,000 at the beginning of 4 years. For simplicity, let’s say the growth rate followed the pattern +40%, -40%, +40%, -40% over the 4 years. At the end of 4 years, you’ve got $70,560 left.

So you know your 4-year return on the investment is: (70,560 – 100,000)/100,000 = -.2944 or -29.44%. But if you averaged out the 4 growth rates using the arithmetic mean, you’d have 0%. Which is why the arithmetic mean doesn’t make sense here.

Instead, apply the geometric mean:
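As a rough Python sketch of that calculation: convert each growth rate to a factor (multiplier), multiply the factors together, and take the n-th root. The function name is my own.

```python
from math import prod

def geometric_mean_factor(factors):
    """n-th root of the product of n growth factors."""
    return prod(factors) ** (1 / len(factors))

rates = [0.40, -0.40, 0.40, -0.40]   # yearly growth rates from the example
factors = [1 + r for r in rates]     # 1.4, 0.6, 1.4, 0.6
avg_factor = geometric_mean_factor(factors)

print(f"Average yearly factor: {avg_factor:.4f}")            # 0.9165
print(f"Average yearly rate: {avg_factor - 1:.2%}")          # -8.35%
print(f"Ending balance: ${100_000 * avg_factor ** 4:,.0f}")  # $70,560
```

Applying the average factor four times lands on the same $70,560, which is exactly what the arithmetic mean of 0% fails to do.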

You drive 60 mph to grandma’s house and 40 mph on the return trip. What was your average speed?

**Let’s dust off that formula from physics class: speed = distance/time**

Since the speed you drive plays into the time it takes to cover a certain distance, that formula may clue you in as to why you can’t just take an arithmetic mean of the two speeds. So before I introduce the formula for harmonic mean, I’ll combine those two trips using the formula for speed to determine the average speed.

**The set-up:** Distance doesn’t matter here so we’ll use 1 mile. Feel free to use a different distance to verify, but you’d be reducing fractions a good bit along the way and I’m all about efficiency. Use a distance of 1 mile for each leg of the journey and the two speeds of 40 mph and 60 mph.

First determine the time it takes to go 1 mile by reworking the speed formula:

To determine the average speed, we’ll combine the two legs of the trip using the speed formula (which will return the overall, or average, speed of the entire trip):

The formula for the harmonic mean looks like this:

Where *n* is the number of 1-mile trips, in this example, and the rates are 40 and 60 mph:

If you scroll up and check out that last step using the speed formula (above), you’ll see the harmonic mean formula was merely a clean shortcut.
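To check that shortcut, here is a small Python sketch that computes the average speed both ways: with the harmonic mean (n divided by the sum of the reciprocals of the rates) and with total distance over total time. The function names are my own, and the exact fractions assume whole-number speeds.

```python
from fractions import Fraction

def harmonic_mean(rates):
    """n divided by the sum of the reciprocals of the rates."""
    return len(rates) / sum(Fraction(1, r) for r in rates)

def average_speed(rates, miles_per_leg=1):
    """Total distance over total time for equal-length legs."""
    total_time = sum(Fraction(miles_per_leg, r) for r in rates)
    return miles_per_leg * len(rates) / total_time

print(harmonic_mean([60, 40]))                    # 48
print(average_speed([60, 40], miles_per_leg=25))  # 48
```

Both routes give 48 mph, not the 50 mph an arithmetic mean would suggest, and changing the length of each leg doesn’t change the answer.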

If you want more information about measures of center, check out the previous blog post — Mean, Median, and Mode: How Visualizations Help Measure What’s Typical

If your organization is looking to expand its data strategy, fix its data architecture, implement data visualization, and/or optimize using machine learning, check out Velocity Group.

This post aims to review the basics of how measures of central tendency – mean, median, and mode – are used to measure what’s *typical*. Specifically, I’ll show you how to inspect distributions of variables visually and dissect how mean, median, and mode behave, in addition to common ways they are used. Ultimately it may be difficult, impossible, or misleading to describe a set of data using one number; however, I hope this journey of data exploration helps you understand how different types of data can affect how we describe what’s *typical*.

Fair enough — I too try to forget the teased hair and track suit years. But I do recall learning to calculate mean, median, mode, and range for a set of numbers with no context and no end game. The math was simple, yet painfully boring. And I never fully realized we were playing a game of Which One of These is Not Like the Other.

It wasn’t until my first college stats course that I realized descriptive statistics serve a purpose – to attempt to summarize important features of a variable or dataset. And mean, median, mode – the *measures of central tendency* – attempt to summarize the *typical* value of a variable. These measures of *typical* may help us draw conclusions about a specific group or compare different groups using one numerical value.

To check off that middle school homework, here’s what we were programmed to do:

Mean: Add the numbers up, divide by the total number of values in the set. *Also known as the arithmetic mean and informally called the “average”.*

Median: Put the numbers in order from least to greatest (ugh, the worst part) and find the middle number. *Oh, there’s two middle numbers? Average them. Did you leave out a number? Start over.*

Mode: The number(s) that appear the most.

*Repeat until you finish the worksheet.*

Because we arrive at mean, median, and mode using different calculations, they summarize *typical* in different ways. The types of variables measured, the shape of the distribution, the context, and even the size of the set of data can alter the interpretation of each measure of central tendency.

We’re programmed to think in terms of an *arithmetic mean*, often dubbed the *average*; however, the geometric and harmonic means are extremely useful and worth your time to learn. Furthermore, when you want to weigh certain values in a dataset more than others, you’ll calculate a weighted mean. But for simplicity of this post, I will only use the *arithmetic mean* when I refer to the “mean” of a set of values.

Think of the mean as the balancing point of a distribution. That is, imagine you have a solid histogram of values and you must balance it on one finger. Where would you hold it? For all symmetric distributions the balancing point – the mean – is directly in the center.

Just like the median in the road (or, “neutral ground” if you’re from Louisiana), the median represents that middle value, cutting the set of values in half — 50% of the data values fall below and 50% lie above the median. No matter the shape of the distribution, the median is the measure of central tendency reflecting the middle position of the data values.

The mode describes the value or category in a set of data that appears the most often. The mode is specifically useful when asking questions about categorical (qualitative) variables. In fact, mode is the only appropriate measure of *typical* for categorical variables. For example: What is the most common college mascot? What type of food do college students typically eat? Where are most 4+ Year colleges and universities located?

Modes are also used to describe features of a distribution. In large sets of quantitative data, values are binned to create histograms. The taller “peaks” of the histogram indicate where more common data values cluster, called *modes*. A cluster of tall bins is sometimes called a *modal range*. A histogram having one tall peak is called *unimodal* while two peaks is referred to as *bimodal*. Multiple peaks = *multimodal*.

You may notice multiple tall peaks of varying heights in one histogram — despite some bins (and clusters of bins) containing fewer values, they are often described as *modes* or *modal ranges* since they contain *local maximums*.

The histogram above shows a distribution of heights for a sample of college females. The mean, median, and mode of this distribution are equal at about 66.5 inches. **When the shape of the distribution is symmetric and unimodal, the mean, median, and mode are equal.**

Now I want to see what happens when I add male heights into the histogram:

This histogram shows the distribution of heights of both male and female college students. It is symmetric, so the mean and median are equal at about 68.5 inches. But you’ll notice two peaks, indicating two modal ranges — one from 66 – 67 inches and another from 70 – 71 inches.

Do the mean and median represent the *typical* college student height when we are dealing with two distinctly different groups of students?

In a skewed distribution, the median remains the center of the values; however, the mean is pulled away from the median by extreme values and outliers.

For example, the histogram above shows the distribution of college enrollment numbers in the United States from 2013. The shape of the distribution is skewed to the right – that is, most colleges reported enrollment below 5,000 students. However, the “tail” of the distribution is created by a small number of larger universities reporting much higher enrollment. **These extreme outlying values pull the mean enrollment to the right of the median enrollment.**

Reporting an average enrollment of 7,070 students for colleges in 2013 exaggerates the *typical* college enrollment since most US colleges and universities reported enrollment *under* 5,000 students.

The median, on the other hand, is resistant to outliers since it is based on position relative to the rest of the data. The median helps you conclude that half of all colleges enrolled fewer than 3,127 students and half of the colleges enrolled more than 3,127 students.

Depending on your end goal and context, median may provide a better measure of *typical* for a skewed set of data. Medians are typically used to report salaries and housing prices since these distributions include mostly moderate values and fewer on the extremely high end. Take a look at the salaries of NFL players, for example:

Are we to only report medians for skewed distributions?

- The median is not a good description of *typical* for a very small dataset (e.g., n<10, depending on context).
- The median is helpful when you want to ignore (or lessen effects of) outliers. Of course, as Daniel Zvinca* points out, your data could contain significant outliers that you don’t want to ignore.

In school, our grades are reported as means. However, students’ grade distributions can be symmetric or skewed. Let’s say you’re a student with three test grades, 65, 68, 70. Then you make a 100 on the fourth test. The distribution of those 4 grades is skewed to the right with a mean of 75.8 and median of 69. Despite the shape of the distribution, you may argue for the mean in this situation. On the other hand, if you scored a 30 on the fourth test instead of 100, you’d argue for the median. With only 4 data points, the median is not a good description of *typical* so here’s hoping you have a teacher who understands the effects of outliers and drops your lowest test score.
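A quick Python sketch of those two scenarios, using the standard library’s `statistics` module, shows how far one test score drags the mean while the median barely moves. The `summarize` helper is just a name I picked for this example.

```python
from statistics import mean, median

def summarize(grades):
    """Return (mean, median) for a list of test grades."""
    return mean(grades), median(grades)

print(summarize([65, 68, 70, 100]))  # (75.75, 69.0)
print(summarize([65, 68, 70, 30]))   # (58.25, 66.5)
```

Swapping the 100 for a 30 moves the mean by 17.5 points but the median by only 2.5.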

Inserting my opinion: As a former teacher, I recognize that when averaging all student grades from an assignment or test, the result is often misleading. In this case, I believe the median is a better description of the typical student’s performance because extreme values usually exist in a class set of grades (very high or very low) and will affect the calculation of the mean. After each test in AP statistics, I would post the mean, median, 5 number summary and standard deviation for each class. It didn’t take long for students to draw the same conclusion.

**Ultimately, context can guide you in this decision of mean versus median but consider the existence of outliers and the distribution shape.**

By investigating a distribution’s physical features, students are able to connect the numbers with a story in the data. In quantitative data, *unusual features* can include outliers, clusters, gaps and “peaks”. Specifically, identifying causes of the multimodality of a distribution can build context behind the metrics you report.

When I investigated the distribution of college tuition, I expected the shape to appear skewed. I did not expect to find the smaller peak in the middle. So I filtered the data by type of college (public or private) and found two almost symmetric distributions of tuition:

The existence of the modes in this data makes it difficult to find a *typical* US college tuition; however, they did point to the existence of two different types of colleges mixed into the same data.

Now I’m not confident that one number would represent the *typical* college tuition in the U.S., though I can say, “The typical tuition for 4+ year colleges in the US for the 2013-14 school year was about $7,484 for public schools and $27,726 for private schools.”

Oh and did you notice the slight peaks on the right side of both private and public tuition distributions? Me too. Which prompted me to look deeper:

So here’s the thing: Summarizing a set of values for a variable with one numerical description of “center” can help simplify a reporting process and aid in comparisons of large sets of data. However, sometimes finding this measure proves difficult, impossible, or even misleading.

As I suggest to my students, visualizing the distribution of the variable, considering its context and exploring its physical features will add value to your overall analysis and possibly help you find an appropriate measure of typical.

*Special thank you to Daniel Zvinca for providing feedback for this post with his domain knowledge and extensive industry expertise.

Since our primary audience tends to be those in data visualization, I used the regression output in Tableau to highlight the p-value in a test for regression towards the end. However, I spent the majority of the webinar discussing p-values in general because the logic of p-values applies broadly to all those tests you may or may not remember from school: t-tests, Chi-Square, z-tests, f-tests, Pearson, Spearman, ANOVA, MANOVA, MANCOVA, etc etc.

I’m dedicating the remainder of this post to some “rules” about statistical tests. If you consider publishing your research, you’ll be required to give more information about your data for researchers to consider your p-value meaningful. In the webinar, I did not dive into the assumptions and conditions necessary for a test for linear regression and it would be careless of me to leave it out of my blog. If you use p-values to drive decisions, please read on.

Cautions always come with statistical tests – those cautions do not fall solely on the p-value “cut-off” debate.

To publish your findings in a journal or use your research in a dissertation, the data must meet each condition/assumption before moving forward with the calculations and p-value interpretation, else the p-value is not meaningful.

Each statistical test comes with its own set of conditions and assumptions that justify the use of that test. Tests for Linear Regression have between 5 and 10 assumptions and conditions that must be met (depending on the type of regression and application).

Below I’ve listed a non-exhaustive list of common assumptions/conditions to check before running a test for linear regression (in no particular order).

- The independent and dependent variables are continuous variables.
- The two variables exhibit a linear relationship – check with scatterplot.
- No significant outliers present in the residual plot (AKA points with extremely large residuals) – check with residual plot.
- Observations are independent of each other (as in, the existence of one data point does not influence another) – test with Durbin-Watson statistic.
- The data shows homoscedasticity (which means the variance remains the same along the entire line of best fit) – check the residual plot, then test with Levene’s or Brown-Forsythe’s tests.
- Normality – residuals must be approximately normally distributed – check using a histogram, normal probability plot of residuals. (In addition, a dissertation chair may require a Kolmogorov-Smirnov or Shapiro-Wilk test on the dependent and independent variables separately.) As sample size increases, this assumption may not be necessary thanks to the Central Limit Theorem.
- There is no correlation between the independent variable(s) and the residuals – check using a correlation matrix or variance inflation factor (VIF).

Note: Check with your publication and/or dissertation chair for complete list of assumptions and conditions for your specific situation.

Short answer: Yes. But I recommend learning how to interpret them and their limitations. Glancing over the list of assumptions above can give a good indication of how sensitive regression models are to outliers and outside variables. I’d also be hesitant to draw conclusions based on a p-value alone for small datasets.

I highly recommend looking at the residual plot (from webinar 1) to determine if your linear model is a good overall fit, keeping in mind the assumptions above. Here is a guide to creating a residual plot using Tableau.

For simplicity, I hard-coded the residuals in the webinar by first calculating “predicted” values using Tableau’s least-squares regression model. Then, I created another calculated field for “residuals” by subtracting the predicted y-values from the observed y-values. Another option would be to use Tableau’s built-in residual exporter. But what if you need a dynamic residual plot without constantly exporting the residuals?

Note: “least-squares regression model” is merely a nerdy way of saying “line of best fit”.

In this post I’ll show you how to create a dynamic residual plot without hard-coding fields or exporting residuals.

The formula for slope: [correlation] * ([std deviation of y] / [std deviation of x])

- correlation doesn’t mind which order you enter the variables (x,y) or (y,x)
- y over x in the calculation because “rise over run”
- be sure to use the “sample standard deviation”

The formula for y-intercept: Avg[y variable] – [slope] * Avg[x variable]
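If you want to sanity-check the slope and y-intercept formulas outside of Tableau, here is a minimal Python sketch. The function name `slope_intercept` and the tiny dataset are made up for illustration (they are not the 4Runner data from the webinar), and the correlation is computed by hand so only the standard library is needed.

```python
from statistics import mean, stdev

def slope_intercept(xs, ys):
    """Least-squares slope and y-intercept using r * (s_y / s_x)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    n = len(xs)
    # Pearson correlation: the same value whether you enter (x, y) or (y, x)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    slope = r * (s_y / s_x)             # y over x: "rise over run"
    intercept = y_bar - slope * x_bar   # Avg[y] - slope * Avg[x]
    return slope, intercept

# Hypothetical data: odometer miles (thousands) and price (thousands of dollars)
miles = [10, 20, 30, 40, 50]
price = [30, 27, 25, 22, 18]
slope, intercept = slope_intercept(miles, price)
print(round(slope, 4), round(intercept, 4))
```

Swapping `stdev` for the population version changes nothing in the slope as long as both standard deviations and the correlation use the same convention, but mixing the two is an easy way to get a slightly wrong line.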

The formula for predicted y-variable: {[slope]} * [odometer miles] + {[y-intercept]}

- Here, we are using the linear equation *y* = m*x* + b, where:
  - *y* is the predicted dependent variable (output: predicted price)
  - m is the slope
  - *x* is the observed independent variable (input: odometer miles)
  - b is the y-intercept

- Since the slope and y-intercept will not change value for each odometer mile, but we need a new predicted output (y) for each odometer mile input (x), we use a level of detail calculation. Luckily the curly brackets tell Tableau to hold the slope and y-intercept values at their constant level for each odometer mile.

The formula for residuals: observed y – predicted y
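The same chain of calculations (slope, y-intercept, predicted y, residual) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `residuals` and the data are hypothetical. The point to notice is that the slope and y-intercept are computed once from the whole dataset and then reused for every row, which is the job the curly brackets do in Tableau.

```python
from statistics import mean, stdev

def residuals(xs, ys):
    """Return observed y minus predicted y for each point."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    # Computed once over all rows, like a fixed level-of-detail value
    slope = r * (s_y / s_x)
    intercept = y_bar - slope * x_bar
    # One prediction per input x, then residual = observed - predicted
    predicted = [slope * x + intercept for x in xs]
    return [y - y_hat for y, y_hat in zip(ys, predicted)]

# Hypothetical data: odometer miles (thousands) and price (thousands of dollars)
miles = [10, 20, 30, 40, 50]
price = [30, 27, 25, 22, 18]
res = residuals(miles, price)
print([round(e, 2) for e in res])
```

Because the line is fit by least squares, the residuals always sum to zero (up to rounding), which is why the residual plot is centered on a horizontal line at 0.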

Don’t forget to inspect your residual plot for clear patterns, large residuals (possible outliers) and obvious increases or decreases in the variation around the center horizontal line. Then decide if the model should be used for prediction purposes.

- The horizontal line in the middle is the least-squares regression line, shown in relation to the observed points.
- The residual plot makes it easier to see the amount of error in your model by “zooming in” on the linear model and the scatter of the points around/on it.
- Any obvious pattern observed in the residual plot indicates the linear model is not the best model for the data.

In the plot below, the residuals increase moving left to right. This means the error in predicting 4Runner price gets larger as the number of miles on the odometer increases. And this makes sense because we know more variables are affecting the price of the vehicle, especially as mileage increases. Perhaps this model is not effective in predicting vehicle price above 60K miles on the odometer.
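A quick way to put a number on that “fanning out” is to compare the spread of the residuals for the lower-mileage half of the data against the higher-mileage half. The sketch below is a rough eyeball check, not a formal test like Levene’s, and the function name `spread_by_half` and the residual values are made up for illustration.

```python
from statistics import stdev

def spread_by_half(xs, resid):
    """Standard deviation of residuals for the lower-x half and the upper-x half."""
    paired = sorted(zip(xs, resid))  # order the points by x (odometer miles)
    mid = len(paired) // 2
    lower = [r for _, r in paired[:mid]]
    upper = [r for _, r in paired[mid:]]
    return stdev(lower), stdev(upper)

# Hypothetical residuals that grow as mileage (in thousands) increases
miles = [10, 20, 30, 40, 50, 60, 70, 80]
resid = [0.2, -0.3, 0.4, -0.5, 1.5, -2.0, 2.5, -3.0]
low, high = spread_by_half(miles, resid)
print(round(low, 2), round(high, 2))
```

If the second number is much larger than the first, the variation is not constant along the line, which matches what the plot shows.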

To recap, here are the basic equations we used above:
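Written out in symbols, the formulas from this post look like this. The notation is my own shorthand and not Tableau syntax: r is the correlation, s_x and s_y are the sample standard deviations, b_1 is the slope, b_0 is the y-intercept, and y-hat is the predicted y.

```latex
\begin{aligned}
b_1 &= r \cdot \frac{s_y}{s_x} && \text{(slope)} \\
b_0 &= \bar{y} - b_1 \, \bar{x} && \text{(y-intercept)} \\
\hat{y} &= b_1 x + b_0 && \text{(predicted y)} \\
e &= y - \hat{y} && \text{(residual: observed minus predicted)}
\end{aligned}
```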

For more on residual plots, check out The Minitab Blog.

]]>