Hypothesis tests (Error Types)

Mohamed Abdelrazek
14 min read · Aug 2, 2021


Introduction

This article covers the concept of Type I and Type II errors in hypothesis formulation and testing. In the previous article we talked about what hypotheses are and how to set them up: we pull the relevant information out of a problem and then state a null hypothesis and an alternative hypothesis. The null hypothesis is what we assume to be true; it is the status quo, the information we are given or assume to be the case. The alternative is where we go next if we have to reject the null hypothesis: if we have to reject our assumption, we proceed to the alternative. One thing we know by this point is that statistics is never certain, never 100% perfect, and that uncertainty is exactly where error types come in once we set up our hypotheses.

Hypothesis testing occurs when statistical tests are used to determine whether a claim is true or false. The null hypothesis is the initial statement under consideration. Unless there is overwhelming evidence to the contrary, the null hypothesis is assumed to be true. A common example is the assumption that two groups are not statistically different from one another.

Steps for Experimental Analysis

  1. We go out and collect a sample.
  2. We do some analysis on that sample.
  3. We reach some conclusion about our hypothesis.

The question is: does the conclusion from our analysis match the actual state of reality?

Remember, we base that conclusion on a sample we took, and then we either reject or fail to reject our null hypothesis. The question is whether our conclusion matches the actual state of reality, and we know it will not match 100 percent of the time. That, in a nutshell, is what Type I and Type II errors are about: does the conclusion of our hypothesis test match reality? This is not just an academic distinction. In real studies and real research, Type I and Type II errors can have literally life-or-death consequences, and I am not exaggerating. The example in this article is meant to show that as starkly as possible.

An Example of Type I and Type II Errors

This is a real-world example that I call the fire alarm hypothesis, and it is something many of us have to deal with at some point in our lives. It goes something like this.

Let's say you're walking down the hallway at work or school, doing your own thing, and everything is normal as usual, but you suddenly smell smoke, like something is burning.

That smell may mean a serious fire is taking place in the building, or it could be nothing serious: maybe someone burned popcorn in the microwave in the kitchen or cafeteria. The question is what you do next; you have to make a decision. If you think the smoky smell is nothing serious, you may decide that your assumption that everything is normal is correct.

In that case you will not pull the fire alarm. Maybe someone burned something in the kitchen; that is annoying, but it is not a serious fire, so you proceed as if everything were okay and, of course, do not pull the alarm. If, on the other hand, you think the smoky smell is due to a serious fire, you may reject your assumption that everything is normal and pull the fire alarm. So you have to reach some conclusion, based on the evidence around you, about the state of reality (either there is no serious fire, or there is a very serious one) and then decide what to do.

Let's look at what might go wrong in this situation. You smell the smoke and think to yourself, this is not normal. You reject the assumption that everything is okay, you reject your null hypothesis, and you go ahead and pull the fire alarm.

The building is evacuated and the fire department arrives to investigate. After the investigation, it is determined that there was no serious fire: you "falsely" pulled the fire alarm. Here is how we interpret that. You looked at the situation, said this is not normal, and rejected your assumption that everything was normal. But reality was that everything was okay, because there was no fire. You committed a Type I error, in this case a false alarm.

Let's define Type I error more formally. A Type I error is the rejection of the assumption, or rejection of the null hypothesis, when it should not have been rejected. In the fire case you rejected your assumption that everything was normal, but in reality everything was okay. So a Type I error is rejecting the null hypothesis when it should not be rejected; you can think of it as incorrectly rejecting the null hypothesis, in this case a false alarm. And again, this is a real-world situation.

What else might go wrong? Let's say you smell smoke and think, it's probably someone who burned their lunch in the microwave, no big deal, and you just proceed on your way.

In that case you did not reject your assumption that everything is okay; you upheld your null hypothesis, concluded that it was nothing serious, and went on your way. But let's say there is indeed a serious fire. No one is injured, luckily, but the entire building burns to the ground. When you failed to reject your assumption that everything was okay when it really was not okay, you committed a Type II error. Let's formally define this and then compare the two.

A Type II error is the failure to reject the assumption, or failure to reject the null hypothesis, when it should have been rejected; in other words, incorrectly not rejecting the null hypothesis.

So think of it this way: a Type I error is when we reject the null hypothesis incorrectly, and a Type II error is when we fail to reject the null hypothesis incorrectly. In the first case we pulled the fire alarm when there was no fire; in the second case we did not pull the fire alarm even though there was a fire. We'll work through other examples, but this sets the stage for what Type I and Type II errors are.
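To keep the two errors straight, it can help to lay out the four possible outcomes in code. The small sketch below is purely illustrative; the function name and the fire-alarm labels are mine, not part of any statistics library:

```python
def classify_outcome(null_is_true: bool, we_reject_null: bool) -> str:
    """Label the four possible outcomes of a hypothesis test."""
    if null_is_true and we_reject_null:
        return "Type I error (false alarm: pulled the alarm, no fire)"
    if null_is_true and not we_reject_null:
        return "Correct decision (no fire, no alarm pulled)"
    if not null_is_true and we_reject_null:
        return "Correct decision (real fire, alarm pulled)"
    return "Type II error (missed fire: no alarm, building burns)"

# Walk through all four cells of the decision table
for null_is_true in (True, False):
    for we_reject_null in (True, False):
        print(f"H0 true={null_is_true}, reject={we_reject_null} -> "
              f"{classify_outcome(null_is_true, we_reject_null)}")
```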

Generally, the real-world consequences of a Type II error are much greater and more dangerous.

Type I Errors: False Positives (Alpha)

While performing hypothesis tests, there will almost always be a possibility of wrongly rejecting a null hypothesis when it should not have been rejected. Data scientists choose an alpha (𝛼) significance threshold that they will use to decide whether to reject the null hypothesis. This threshold is also the likelihood that you will reject the null hypothesis when it is actually true. That case is a Type I error, more commonly referred to as a false positive.

In hypothesis testing, you need to decide what threshold of evidence will allow you to reject the null hypothesis. If a scientist sets alpha (𝛼) = .05, there is a 5 percent probability that they will reject the null hypothesis when it is actually true. Another way to think about this is that if you repeated the experiment 20 times, you would expect the null hypothesis to be rejected once simply by chance. Generally speaking, an alpha level of 0.05 is adequate to show that certain findings are statistically significant.
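As a concrete sketch of this decision rule, here is roughly what it looks like in Python with SciPy; the two groups, their sizes, and the random seed are invented purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two hypothetical samples; under H0 they come from the same population.
group_a = rng.normal(loc=100, scale=15, size=30)
group_b = rng.normal(loc=100, scale=15, size=30)

alpha = 0.05                          # significance level chosen in advance
t_stat, p_value = stats.ttest_ind(group_a, group_b)

if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject the null hypothesis")
```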

When you see a p-value that is less than your significance level, you get excited because your results are statistically significant. However, it could be a Type I error. The supposed effect might not exist in the population. Again, there is usually no warning when this occurs.

Why do these errors occur?

It comes down to sampling error. Your random sample has overestimated the effect by chance; it was the luck of the draw. This type of error doesn't indicate that the researchers did anything wrong. The experimental design, data collection, data validity, and statistical analysis can all be correct, and yet this type of error still occurs.

The significance level is a standard that you set to determine whether your sample data are strong enough to reject the null hypothesis. Hypothesis tests define that standard using the probability of rejecting a null hypothesis that is actually true. You set this value based on your willingness to risk a false positive.

Using the significance level to set the Type I error rate

When the significance level is 0.05 and the null hypothesis is true, there is a 5% chance that the test will reject the null hypothesis incorrectly. If you set alpha to 0.01, there is a 1% chance of a false positive. If 5% is good, then 1% seems even better, right? As you'll see, there is a trade-off between Type I and Type II errors: if you hold everything else constant, reducing the chance of a false positive increases the chance of a false negative.
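One way to convince yourself that the significance level really is the Type I error rate is to simulate it. In the rough sketch below (NumPy and SciPy assumed, all numbers invented), both samples are drawn from the same population, so the null hypothesis is true and every rejection is a false positive; the observed rejection rate should land near alpha:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_experiments = 10_000

false_positives = 0
for _ in range(n_experiments):
    # Both samples come from the SAME population, so the null is true.
    a = rng.normal(loc=50, scale=10, size=25)
    b = rng.normal(loc=50, scale=10, size=25)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:                     # rejecting here is a Type I error
        false_positives += 1

print(f"Observed Type I error rate: {false_positives / n_experiments:.3f}")
# Should come out close to alpha (about 0.05)
```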

Type II Errors: False Negatives

When you perform a hypothesis test and your p-value is greater than your significance level, your results are not statistically significant. That’s disappointing because your sample provides insufficient evidence for concluding that the effect you’re studying exists in the population. However, there is a chance that the effect is present in the population even though the test results don’t support it. If that’s the case, you’ve just experienced a Type II error. The probability of making a Type II error is known as beta (β).

What causes Type II errors?

Whereas Type I errors are caused by one thing, sampling error, there are a host of possible reasons for Type II errors: small effect sizes, small sample sizes, and high data variability. Furthermore, unlike Type I errors, you can't set the Type II error rate for your analysis. Instead, the best you can do is estimate it before you begin your study by approximating the properties of the alternative hypothesis you're studying. This type of estimation is called power analysis.

To estimate the Type II error rate, you create a hypothetical probability distribution that represents the properties of a true alternative hypothesis. However, when you're performing a hypothesis test, you typically don't know which hypothesis is true, much less the specific properties of the distribution for the alternative hypothesis. Consequently, the true Type II error rate is usually unknown.

Type II errors and the power of the analysis

The Type II error rate (beta) is the probability of a false negative. The complement of the Type II error rate is therefore the probability of correctly detecting an effect, which statisticians call the power of a hypothesis test. Consequently, 1 − β = the statistical power. Analysts typically estimate power rather than beta directly.
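As a rough illustration of power analysis, the statsmodels package provides power calculators for common tests. In the sketch below the effect size, alpha, and target power are arbitrary example values, not recommendations:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Illustrative inputs: a medium effect size (Cohen's d = 0.5), alpha = 0.05,
# and a target power of 0.8; solve for the required sample size per group.
sample_size = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Samples per group for 80% power: {sample_size:.1f}")

# Or flip it around: with 30 observations per group, what power (1 - beta) do we get?
power = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30)
print(f"Power with 30 per group: {power:.2f}  ->  beta = {1 - power:.2f}")
```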

Why Don’t Statisticians Accept the Null Hypothesis?

To understand why we don't accept the null, consider the fact that you can't prove a negative. A lack of evidence only means that you haven't proven that something exists. It does not prove that something doesn't exist. It might exist, but your study missed it. That's a huge difference, and it is the reason for the convoluted wording. Let's look at what this means in practice.

What Does Fail to Reject the Null Hypothesis Mean?

Accepting the null hypothesis would indicate that you’ve proven an effect doesn’t exist. As you’ve seen, that’s not the case at all. You can’t prove a negative! Instead, the strength of your evidence falls short of being able to reject the null. Consequently, we fail to reject it.

Failing to reject the null indicates that our sample did not provide sufficient evidence to conclude that the effect exists. However, at the same time, that lack of evidence doesn’t prove that the effect does not exist. Capturing all that information leads to convoluted wording!

What are the possible implications of failing to reject the null hypothesis? Let’s work through them.

First, it is possible that the effect truly doesn’t exist in the population, which is why your hypothesis test didn’t detect it in the sample. Makes sense, right? While that is one possibility, it doesn’t end there.

Another possibility is that the effect exists in the population, but the test didn’t detect it for a variety of reasons. These reasons include the following:

  • The sample size was too small to detect the effect.
  • The variability in the data was too high. The effect exists, but the noise in your data swamped the signal (effect).
  • By chance, you collected a fluky sample. When dealing with random samples, chance always plays a role in the results. The luck of the draw might have caused your sample not to reflect an effect that exists in the population.

Let’s visualize Type I and Type II errors and how they are related to each other.

We'll choose an alpha of 0.05; that is called our significance level, or our Type I error rate, which we'll come back to shortly. What we're saying is that 95% of all the sample means we take are hypothesized to fall in the non-rejection region around the hypothesized population mean. Now remember our null and alternative hypotheses: the null says that if we take many samples, the resulting sampling distribution should sit right on top of the hypothesized distribution (that's why the equals sign is there), while the alternative says that is not the case.

If we take samples and build a sampling distribution, then under the alternative that distribution will sit off to one side, either below (to the left of) or above (to the right of) the hypothesized distribution. Let's look at how that would actually work. Say we take our first sample and its mean, x̄₁, lands right in the middle. How would this affect how we interpret our null and alternative hypotheses? In this case we would fail to reject the null hypothesis. Why? Because that sample mean is right in the middle of our hypothesized distribution, so the null hypothesis seems to hold, and we would not reject it. Say we take another sample and its mean ends up a little higher than the first, but still within the non-rejection region.

Again we would fail to reject the null hypothesis. Maybe we take another sample and it also lands in that region, so again we fail to reject; sample four, same story. Then say we come to sample five, and it falls outside the non-rejection region. In that case we would reject the null hypothesis: that sample mean is so far from our hypothesized mean that we reject the idea that it came from the same population as our hypothesized population; there must be a difference between the two. We take sample six and fail to reject, and sample seven, fail to reject again. What do we make of all these samples? We can say that our actual population mean is probably close to the hypothesized mean.

But there is always the chance that we draw an oddball sample that is well below or well above the hypothesized mean, and in that case we would end up rejecting the null hypothesis when we should not. That is a classic Type I error: we reject the null hypothesis because, by chance, we got a sample that is too far from the hypothesized population mean.

Now what about Type II error? We repeat the same process: here is a hypothesized population mean, everything else is the same as before, and we again take samples. This time, though, suppose the true population mean is actually different from the hypothesized one.

In this scenario we take seven samples, and all of them except sample four fall outside the non-rejection region, so for those we correctly reject the null hypothesis. Sample four, however, lands inside the region, so for it we fail to reject the null hypothesis when we should have rejected it.

So if we drew a sample that, by chance, behaved like sample four, we would incorrectly fail to reject the null hypothesis, and that is a Type II error: the case where we should have rejected the null but did not, because we got an oddball sample from the higher end of the actual population that happened to fall in the non-rejection region. The probability of committing a Type II error is called beta. Unfortunately, it is not as easy to compute as alpha; it varies depending on things like the sample size and the alpha level. Just keep in mind that alpha, our significance level, is the probability of committing a Type I error, and beta is the probability of committing a Type II error.
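For a simple z-test, beta can be computed directly once you assume a specific true mean for the alternative. The sketch below uses SciPy, and every number in it (hypothesized mean, true mean, sigma, sample size) is invented purely for illustration:

```python
import numpy as np
from scipy import stats

mu0 = 100        # hypothesized population mean under H0
mu_true = 106    # the actual population mean (unknown in practice)
sigma = 15       # population standard deviation
n = 25           # sample size
alpha = 0.05

se = sigma / np.sqrt(n)
z_crit = stats.norm.ppf(1 - alpha / 2)

# Non-rejection region for the sample mean under H0
lower, upper = mu0 - z_crit * se, mu0 + z_crit * se

# beta = probability the sample mean lands inside that region
# even though the true mean is mu_true
beta = (stats.norm.cdf(upper, loc=mu_true, scale=se)
        - stats.norm.cdf(lower, loc=mu_true, scale=se))

print(f"Non-rejection region: [{lower:.1f}, {upper:.1f}]")
print(f"beta (Type II error rate): {beta:.3f}, power: {1 - beta:.3f}")
```

Notice that if you move mu_true closer to mu0, shrink n, or increase sigma, beta grows, which is exactly the "small effect, small sample, high variability" story from the Type II error section.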

Now let's talk about the rejection regions for a two-tailed test and for one-tailed tests (both right-tailed and left-tailed). Everything is the same as before: the null is the same, the alternative is the same, and our alpha, or significance level, is the same. If we draw the curve, the hypothesized population mean sits in the middle, as we expect, and we call the central area the non-rejection region.

In a one-tailed hypothesis test, all of the alpha value sits in one tail, while in a two-tailed test we split the alpha value between both tails, so each side gets alpha/2.
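Here is a small sketch of how those rejection cutoffs differ, using the standard normal distribution via SciPy (alpha = 0.05 is just the conventional example value):

```python
from scipy import stats

alpha = 0.05

# Two-tailed test: alpha is split between both tails (alpha/2 on each side)
two_tail_crit = stats.norm.ppf(1 - alpha / 2)    # about 1.96

# One-tailed tests: all of alpha sits in a single tail
right_tail_crit = stats.norm.ppf(1 - alpha)      # about 1.645
left_tail_crit = stats.norm.ppf(alpha)           # about -1.645

print(f"Two-tailed critical z: +/-{two_tail_crit:.3f}")
print(f"Right-tailed critical z: {right_tail_crit:.3f}")
print(f"Left-tailed critical z: {left_tail_crit:.3f}")
```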

As you can see, depending on the problem statement, a Type I error can be more tolerable than a Type II error, or vice versa. The chances of committing these two types of errors are inversely related: decreasing the Type I error rate increases the Type II error rate, and vice versa. The risk of committing a Type I error is represented by your alpha level (the p-value threshold below which you reject the null hypothesis). The commonly accepted α = .05 means that you will incorrectly reject the null hypothesis approximately 5% of the time. To decrease your chance of committing a Type I error, you can make your alpha value stricter; increasing the sample size then helps you keep enough power despite the stricter threshold.

Conclusion

Based on your use case or problem statement, a Type I error can be more tolerable than a Type II error, or vice versa. Technically, it also depends a bit on the overall accuracy your application requires. To control these errors, you choose the significance level alpha, which sets the Type I error rate, and increasing the sample size reduces the Type II error rate (that is, increases power) at a given alpha.

Reference

Brandon Foltz Playlist for Hypothesis Testing
