Hypothesis Testing

Jul 29, 2021

When using machine learning, we need to be able to trust our models and the predictions they make. We may use sample data to train our models. This sample data may make certain assumptions about a population.

Yet if we have no way to test whether those assumptions hold for the whole population, we will struggle to tell whether our results are statistically significant or simply due to chance.

Statistical vs. Machine learning Hypothesis

Even though most of the concepts we will cover in this article are predominantly statistical, it is important to understand how the term hypothesis is perceived from either a purely statistical or machine learning perspective.

When carrying out a statistical hypothesis test, we calculate a test statistic from the sample and, from it, a p-value, which is covered later. We can think of what the statistic measures as an effect. What matters is how the result is interpreted.

The test gives the likelihood of observing the effect if the observations have no relationship. If that likelihood is very small, this suggests the effect is real. If the likelihood is large, the effect is likely not real.

In statistical hypothesis testing, there is no comment on the size of the effect. These tests are concerned with how likely the effect is present or absent in the population in consideration. This is based on the observed data samples.

Statistical hypotheses are thus based on identifying the relationships between observations. They are probabilistic explanations of these relationships.

Null and alternative hypotheses are denoted as H0 and Ha, respectively.

In machine learning, a hypothesis is a candidate function that approximates a target function mapping inputs to outputs. Finding it is known as function approximation: we approximate an unknown target function, which we assume exists.

This target function should best carry out the mapping of inputs to outputs on all possible observations existing in the problem domain. The notation in this context is (h) for hypothesis and (H) for a hypothesis set. To better understand a hypothesis in machine learning, this post will be of use.
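As a rough illustration of the (h) and (H) notation, the sketch below takes H to be the set of all straight lines and picks one hypothesis h from it by least squares. The toy data, the choice of H, and the name `fit_line` are assumptions made for this example only.

```python
# Hypothetical toy data: we pretend the unknown target function is roughly y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys):
    """Pick the hypothesis h(x) = a*x + b from H (all straight lines) by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


# h is one member of the hypothesis set H; it maps a new input to a predicted output.
h = fit_line(xs, ys)
print(h(5.0))
```

A different hypothesis set (for example, polynomials or decision trees) would give a different h for the same data.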

Steps to test a hypothesis

A hypothesis test evaluates two statements about a population. The statements are mutually exclusive. The test concludes which statement best reflects the sample data. A hypothesis test helps us determine the statistical significance of a finding.

We say a finding is statistically significant when its likelihood of occurrence is very low, given the null hypothesis. This section describes the steps to test a hypothesis as we define the concepts involved in the testing process.

Establish hypotheses

The first step in testing a hypothesis is defining it. This is done by establishing both a null and an alternative hypothesis. A null hypothesis can be thought of as a statement claiming no relationship between two measured events. It is an assumption, which may be based on domain experience.

Scientists carry out experiments to retain or reject a null hypothesis based upon the nature of (or lack of) the relationship between occurrences. A null hypothesis is usually considered to be true until proven otherwise.

It is denoted as H0.

On the other hand, an alternative hypothesis states the result that we hope the experiment will show. We want the alternative hypothesis to be true. It is the alternative to the null hypothesis: the statement we accept if the null hypothesis is rejected. The image below shall aid in the understanding of these two types of hypotheses.

Significance Level

The significance level is the threshold we use to decide whether to reject the null hypothesis. Complete certainty is not possible when accepting or rejecting a hypothesis, so we choose a level of significance in advance, usually 5%.
It is denoted by alpha and is generally 0.05 (5%). This means we accept a 5% chance of rejecting a null hypothesis that is actually true, which corresponds to a 95% confidence level.

Type I error: We reject the null hypothesis even though it is true. The probability of a Type I error is denoted by alpha. In hypothesis testing, the region of the normal curve that represents the critical region is known as the alpha region.
Type II error: We fail to reject the null hypothesis even though it is false. The probability of a Type II error is denoted by beta. In hypothesis testing, the region of the normal curve that represents the acceptance region is known as the beta region.
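To make the two error types concrete, here is a minimal sketch that computes alpha and beta exactly for a coin-tossing test. The number of tosses, the rejection rule, and the "true" bias of 0.7 are all hypothetical values chosen for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k heads in n tosses when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 20  # hypothetical number of tosses

# Hypothetical decision rule: reject H0 (fair coin) if heads <= 5 or heads >= 15.
rejection_region = [k for k in range(n + 1) if k <= 5 or k >= 15]

# Type I error rate: chance of rejecting H0 when the coin really is fair (p = 0.5).
alpha = sum(binom_pmf(k, n, 0.5) for k in rejection_region)

# Type II error rate: chance of NOT rejecting H0 when the coin is actually
# biased with P(heads) = 0.7 (an assumed alternative).
beta = 1 - sum(binom_pmf(k, n, 0.7) for k in rejection_region)

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Widening the rejection region raises alpha and lowers beta, and narrowing it does the opposite, which is why the significance level is fixed before the data are examined.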

P-value

The P-value, or calculated probability, is the probability of finding the observed or more extreme results when the null hypothesis (H0) of a study question is true. The definition of ‘extreme’ depends on how the hypothesis is being tested.

If your P-value is smaller than the chosen significance level, then you reject the null hypothesis i.e. accept that your sample gives reasonable evidence to support the alternative hypothesis.

Example: You have a coin and you don’t know whether it is fair or tricky (biased), so let’s define the null and alternative hypotheses.

H0: the coin is a fair coin.

H1: the coin is a tricky coin, with alpha = 5% (0.05).

Now let’s toss the coin and calculate the p-value (probability value).

Toss the coin a 1st time and the result is tails: p-value = 50% (as heads and tails have equal probability).

Toss the coin a 2nd time and the result is tails again: p-value = 50%/2 = 25%.

Similarly, after 6 consecutive tosses that all came up tails, the p-value is about 1.5% (0.5^6 ≈ 1.56%). We set our significance level at 5% (a 95% confidence level), meaning we allow a 5% error rate, and the p-value is now below that level. Our null hypothesis does not hold, so we reject it and propose that this coin is a tricky coin.
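The arithmetic in the coin example can be sketched in a few lines. The function name `p_value_all_tails` is made up for this illustration, and the calculation is one-sided: it only measures how surprising an unbroken run of tails is under a fair coin.

```python
ALPHA = 0.05  # significance level from the example


def p_value_all_tails(num_tosses):
    """Probability of num_tosses tails in a row if the coin is fair (H0 true)."""
    return 0.5**num_tosses


for n in range(1, 7):
    p = p_value_all_tails(n)
    decision = "reject H0" if p < ALPHA else "retain H0"
    print(f"{n} tails in a row: p-value = {p:.2%} -> {decision}")
```

Under this one-sided calculation the p-value first falls below 5% at the fifth tail (0.5^5 is about 3.1%), and six tails give about 1.56%, which the example rounds to 1.5%.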

T-test

The t-test is a statistical test that examines whether the population means of two samples differ significantly from one another. It uses the t-distribution, which applies when the population standard deviation is not known and the sample size is small. It is a tool to analyze whether the two samples are drawn from the same population.

The test is based on a t-statistic, which assumes that the variable is normally distributed (a symmetric, bell-shaped distribution), that the mean is known, and that the population variance is estimated from the sample.

The t-test is one of the tests used for hypothesis testing in statistics.

Calculating a t-test requires three key data values. They include the difference between the mean values from each data set (called the mean difference), the standard deviation of each group, and the number of data values of each group.
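The sketch below shows how those three values combine into a t-statistic. The article does not say which variant of the two-sample t-test it has in mind, so this example assumes the unequal-variance (Welch) form; the sample data and the name `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(sample_a, sample_b):
    """Two-sample t-statistic, assuming the unequal-variance (Welch) form."""
    # 1. The mean difference between the two groups.
    mean_diff = mean(sample_a) - mean(sample_b)
    # 2. The sample standard deviation of each group.
    sd_a, sd_b = stdev(sample_a), stdev(sample_b)
    # 3. The number of data values in each group.
    n_a, n_b = len(sample_a), len(sample_b)
    # Standard error of the mean difference.
    standard_error = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return mean_diff / standard_error


# Hypothetical measurements from two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.4, 4.0, 4.7, 4.3]
print(welch_t(group_a, group_b))
```

The further the t-statistic is from zero, the less plausible it is that both samples share the same population mean. Turning it into a p-value requires the t-distribution with the appropriate degrees of freedom, which a statistics library would normally handle.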

ANOVA

Analysis of variance (ANOVA) can be defined as the statistical technique which is used to check if the means of two or more groups are significantly different from each other by analyzing variance. ANOVA checks the impact of one or more factors by comparing the means of various samples.

The t-test is another way to compare samples. When we have only two samples, the t-test and ANOVA give the same results. However, a t-test is not reliable when there are more than two samples, because conducting multiple t-tests increases the chance of false positives.
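A quick way to see why repeated t-tests inflate false positives is to compute the chance of at least one false positive across all pairwise comparisons. This sketch assumes the tests are independent, which is a simplification since pairwise tests share data, and the function name is made up for the example.

```python
from math import comb


def familywise_error_rate(num_groups, alpha=0.05):
    """Chance of at least one false positive across all pairwise t-tests,
    assuming (as a simplification) that the tests are independent."""
    num_tests = comb(num_groups, 2)  # number of distinct pairs of groups
    return 1 - (1 - alpha) ** num_tests


for k in (2, 3, 4, 5):
    print(f"{k} groups: {comb(k, 2)} tests, "
          f"familywise error rate = {familywise_error_rate(k):.3f}")
```

With two groups there is a single test and the rate equals alpha. With four groups there are six pairwise tests, and the rate rises to roughly 26%.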

The analysis of variance or ANOVA is a statistical inference test that lets you compare multiple groups at the same time.

Types of ANOVA

1. One-way ANOVA

One-way ANOVA is a hypothesis test in which only one categorical variable, or single factor, is taken into consideration. With the help of the F-distribution, it enables us to compare the means of three or more samples. The null hypothesis (H0) is that all population means are equal, while the alternative hypothesis is that at least one mean differs.
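As a minimal sketch of what one-way ANOVA computes, the F-statistic below is the ratio of the variance between group means to the variance within groups. The data and the function name `one_way_anova_f` are hypothetical, and the p-value lookup against the F-distribution is left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F-statistic for one-way ANOVA: between-group variance over within-group variance."""
    k = len(groups)                        # number of groups
    n_total = sum(len(g) for g in groups)  # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Variation of each group mean around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical samples from three groups.
samples = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
print(one_way_anova_f(samples))
```

A large F-statistic means the group means differ by more than the spread inside each group would explain, which counts as evidence against the null hypothesis of equal means.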

2. Two-way ANOVA

Two-way ANOVA examines the effect of two independent factors on a dependent variable. It also studies the interaction, if any, between the independent variables in influencing the values of the dependent variable.

Compare p-value to the significance level to retain or reject the null hypothesis

To know whether to keep or reject the null hypothesis, we can compare our significance level to the p-value. Let’s assume our significance level is 5% (or 0.05). The smaller the p-value, the greater the evidence is favoring the alternative hypothesis.

If the p-value is less than the significance level we selected, we then reject the null hypothesis. This means that if the p-value is less than our 0.05 significance level, we accept that the sample we used supports the alternative hypothesis.