T-test vs. Z-test: When to Use Each


As a data science professional, you must often analyze, test, and establish relationships between variables in a dataset to draw meaningful conclusions. Hypothesis testing, carried out with tests such as t-tests and Z-tests, is one of the most commonly used tools in analytics for establishing these relationships.

This tutorial will teach you the difference between a t-test and a Z-test with real examples. I will also provide additional resources for further learning.

A Quick Summary: t-tests vs. Z-tests

Choosing between a t-test and a Z-test can be summarized with these guidelines:

Use a t-test: When the sample size is small (n < 30) and/or the population variance is unknown.
Use a Z-test: When the sample size is large (n ≥ 30) and the population variance is known.
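The two guidelines above can be written down as a tiny helper, which makes the rule of thumb explicit: a Z-test only when the sample is large and the population variance is known, otherwise a t-test. This is a minimal sketch; the function name `choose_test` and the `threshold` parameter are hypothetical, and the cutoff of 30 is a convention rather than a strict boundary.

```python
def choose_test(n: int, population_variance_known: bool, threshold: int = 30) -> str:
    """Rule of thumb (hypothetical helper): Z-test only for large n with known population variance."""
    if n >= threshold and population_variance_known:
        return "Z-test"
    # Small sample and/or unknown variance: estimate the variance and use the t-distribution
    return "t-test"


print(choose_test(20, population_variance_known=False))   # t-test
print(choose_test(200, population_variance_known=True))   # Z-test
print(choose_test(200, population_variance_known=False))  # t-test
```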

Both tests assume the data are approximately normally distributed, although for large samples the Central Limit Theorem relaxes this requirement for the Z-test. Read on to learn about each test and how they differ in detail. First, we will start with a quick introduction to hypothesis testing.

An Introduction to Hypothesis Testing

Hypothesis testing is a fundamental statistical method for inferring population parameters based on sample data. It provides a structured approach for evaluating claims or assumptions about a population using empirical evidence.

At the core of hypothesis testing are two complementary statements:

The null hypothesis (H₀) is a statement of no effect, difference, or relationship. It represents the status quo or the current understanding.
The alternative hypothesis (H₁) is a statement that contradicts the null hypothesis. It represents the claim or new understanding for which the researcher seeks evidence.

For example, suppose you want to determine if a new teaching method improves student test scores. You might form the following hypotheses:

Null hypothesis (H₀): The new teaching method has no effect on student test scores.
Alternative hypothesis (H₁): The new teaching method improves student test scores.

Hypothesis testing involves collecting sample data, calculating a test statistic, and determining the probability (the p-value) of observing results at least as extreme as those in the sample if the null hypothesis were true. If this probability falls below a significance level chosen in advance (commonly 0.05), we reject the null hypothesis in favor of the alternative; otherwise, we fail to reject it.
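The decision procedure described above can be sketched in a few lines, assuming the test statistic follows a standard normal distribution under H₀ (as it does for a Z-test). The statistic value 1.8 below is hypothetical, and `p_value_from_z` and `decide` are illustrative helper names, not a library API. Because the teaching-method alternative says scores *improve*, the example uses a one-sided ("greater") p-value.

```python
from statistics import NormalDist


def p_value_from_z(z: float, alternative: str = "two-sided") -> float:
    """P-value for a statistic assumed to follow N(0, 1) under the null hypothesis."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    if alternative == "less":
        return std_normal.cdf(z)
    raise ValueError(f"unknown alternative: {alternative!r}")


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare the p-value with the significance level chosen in advance."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical statistic for the teaching-method example (H1: the new method improves scores)
p = p_value_from_z(1.8, alternative="greater")
print(round(p, 4), decide(p))  # 0.0359 reject H0
```

Note that the same statistic gives a two-sided p-value of about 0.072, so the choice of alternative must be made before looking at the data.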

Depending on the type of data and the research question, several statistical tests are available for hypothesis testing. In this tutorial, we will focus on the t-test and the Z-test.

What is a t-test?

A t-test is a statistical test used to determine whether there is a significant difference between the means of two groups or between a sample mean and a known value. It is particularly useful when dealing with small sample sizes or when the population standard deviation is unknown.

The t-test statistic for a one-sample t-test is calculated using the formula:

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

where:

X̄ is the sample mean,
μ is the hypothesized population mean (the known value the sample is compared against),
s is the sample standard deviation, and
n is the sample size.
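As a minimal sketch of the formula above, the one-sample t statistic can be computed with Python's standard `statistics` module. The class scores and the benchmark of 75 are hypothetical. The statistic is compared against a t-distribution with n − 1 degrees of freedom, so the function returns both.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def one_sample_t(sample: Sequence[float], mu: float) -> Tuple[float, int]:
    """Return the one-sample t statistic and its degrees of freedom (n - 1)."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (x_bar - mu) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical scores of a small class compared with a national average of 75
scores = [72, 75, 78, 80, 85]
t, df = one_sample_t(scores, mu=75)
print(f"t = {t:.3f}, df = {df}")  # t = 1.355, df = 4
```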

Types of t-tests

There are three main types of t-tests. Each compares means under different conditions:

One-Sample t-test: This test compares the mean of a single sample to a known value or population mean. It determines if the sample mean significantly deviates from a specific benchmark. For example, we can use a one-sample t-test to evaluate whether the average test score of a small class differs from the national average.
Independent Two-Sample t-test: This test compares the means of two independent groups to determine if there is a statistically significant difference between them. It is commonly used in experiments where two groups undergo different treatments or conditions. For instance, we could use an independent two-sample t-test to compare test scores between students taught using two different teaching methods to see if one method is more effective.
Paired t-test: This test compares two measurements taken on the same subjects, for example at different times or under different conditions. It works on the within-pair differences and evaluates whether their mean differs from zero, that is, whether there is a significant change within the group after an intervention or over time. An example is measuring student performance before and after implementing a new teaching strategy to assess its impact.
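The independent and paired variants differ mainly in how the standard error is built. The sketch below, using only the standard library, computes the pooled-variance (Student's) two-sample statistic with n₁ + n₂ − 2 degrees of freedom, and the paired statistic as a one-sample t-test on the differences against 0. All scores are hypothetical, and the function names are illustrative.

```python
import math
from statistics import mean, stdev, variance


def independent_t(a, b):
    """Student's two-sample t statistic with pooled variance (assumes equal variances)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2


def paired_t(before, after):
    """Paired t statistic: a one-sample t-test of the within-pair differences against 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


# Hypothetical scores under two teaching methods (different students)
method_a = [70, 74, 78, 82]
method_b = [65, 70, 72, 77]
t_ind, df_ind = independent_t(method_a, method_b)

# Hypothetical scores of the same five students before and after a new strategy
before = [60, 65, 70, 72, 68]
after = [64, 66, 75, 74, 71]
t_pair, df_pair = paired_t(before, after)

print(f"independent: t = {t_ind:.3f}, df = {df_ind}")  # independent: t = 1.396, df = 6
print(f"paired: t = {t_pair:.3f}, df = {df_pair}")     # paired: t = 4.243, df = 4
```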

Assumptions of the t-test

The t-test relies on certain assumptions to provide valid results:

Normality of the Data: The t-test assumes that the data in each group are approximately normally distributed. This is especially important when dealing with small sample sizes. If the data are not normally distributed, the t-test results may be unreliable.
Homogeneity of Variances: The standard (Student's) independent two-sample t-test assumes that the variances of the two groups are equal, which lets it pool them into a single estimate of variability. If the variances differ noticeably, especially when the group sizes also differ, the test's error rates can be distorted; Welch's t-test, which does not assume equal variances, is the usual alternative.
Independence of Observations: The observations within each group should be independent. This means that the value of one observation should not influence or be related to the value of another observation. Violation of this assumption can lead to incorrect conclusions.

It is important to check these assumptions before applying the t-test in any analysis to ensure the validity of the results. Read our T-tests in R Tutorial or our Introduction to Python T-Tests to learn how to conduct t-tests in R or Python.
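When the equal-variance assumption is doubtful, Welch's t-test is a common alternative: it keeps each group's variance separate and approximates the degrees of freedom with the Welch–Satterthwaite formula. The following is a minimal standard-library sketch; the data are hypothetical, with the second group deliberately much more spread out.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (no equal-variance assumption)."""
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


group_a = [70, 74, 78, 82]      # hypothetical, sample variance about 26.7
group_b = [50, 65, 80, 95, 60]  # hypothetical, sample variance 312.5
t, df = welch_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The degrees of freedom are usually fractional and fall between the smaller group size minus one and n₁ + n₂ − 2; here they are pulled toward the noisier group.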

What is a Z-test?

A Z-test is a statistical test used to determine whether there is a significant difference between a sample mean and a population mean, or between the means of two groups, when the population variance is known and the sample size is large.

It is typically used when the sample size is 30 or more, because the Central Limit Theorem then makes the normal distribution a good approximation for the distribution of the test statistic.

The Z-test statistic for a one-sample Z-test is calculated using the formula:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

where:

X̄ is the sample mean,
μ is the population mean,
σ is the population standard deviation, and
n is the sample size.
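The Z formula differs from the t formula only in using the known σ, so the p-value can come straight from the standard normal distribution. Here is a minimal sketch with `statistics.NormalDist`; the height figures (sample of 100 people with mean 172 cm, national mean 170 cm, σ = 10 cm) are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z(x_bar: float, mu: float, sigma: float, n: int):
    """One-sample Z statistic and two-sided p-value, assuming sigma is the known population SD."""
    z = (x_bar - mu) / (sigma / math.sqrt(n))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


z, p = one_sample_z(x_bar=172, mu=170, sigma=10, n=100)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0455
```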

Types of Z-tests

There are three main types of Z-tests:

One-Sample Z-test: This test compares the mean of a single sample to a known population mean. It is used when you want to assess whether the sample mean significantly deviates from the population mean, assuming the population variance is known. For example, a one-sample Z-test might be used to determine if the average height of a group of more than 30 people differs from the known national average height.
Two-Sample Z-test: This test compares the means of two independent samples to determine if there is a significant difference between them. It is used when both samples are large and the population variances are known. An example of this would be comparing the average test scores of students from two different schools to see if there is a significant difference in performance between the two schools.
Proportion Z-test: This test compares the proportion of a certain characteristic in a sample to a known population proportion or between two sample proportions. It is used to evaluate whether the observed proportion in the sample significantly differs from what is expected based on the population proportion. For instance, a proportion Z-test might be used to compare the proportion of voters favoring a particular candidate in a sample to the proportion observed in previous elections.

There are additional variations of the test, such as the paired Z-test, the Z-test for regression coefficients, and the Z-test for differences in means.
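The two-sample and proportion Z-tests follow the same pattern: a difference divided by its standard error, referred to the standard normal distribution. In the sketch below, the two-sample version assumes both population standard deviations are known, and the one-proportion version computes the standard error from the hypothesized proportion p₀ under H₀. The school scores and the voter counts (540 of 1,000 compared with 50%) are hypothetical, and the helper names are illustrative.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """Two-sample Z statistic with known population standard deviations."""
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return (mean1 - mean2) / se


def one_proportion_z(successes, n, p0):
    """One-proportion Z statistic; the standard error uses p0, the proportion under H0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se


def two_sided_p(z):
    return 2 * (1 - NormalDist().cdf(abs(z)))


z_schools = two_sample_z(78, 75, sigma1=10, sigma2=12, n1=100, n2=120)
z_votes = one_proportion_z(540, 1000, p0=0.5)
print(f"schools: z = {z_schools:.3f}, p = {two_sided_p(z_schools):.4f}")
print(f"voters:  z = {z_votes:.3f}, p = {two_sided_p(z_votes):.4f}")
```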

Assumptions of the Z-test

The Z-test relies on certain assumptions to provide valid results:

Known Population Variance: The Z-test assumes that the population variance is known. This is a key distinction from the t-test, where the population variance is typically unknown. The known variance allows for using the z-distribution to assess the significance of the test statistic.
Large Sample Size: The Z-test assumes a large sample size, typically greater than 30. With larger samples, the sampling distribution of the sample mean approaches a normal distribution, even if the original data are not normally distributed, according to the Central Limit Theorem.
Normal Distribution of the Population: The data are assumed to be drawn from a normally distributed population. This assumption is less critical for large samples but still important when the sample size is moderate.
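The Central Limit Theorem argument in the large-sample assumption can be checked with a quick simulation. The sketch below draws repeated samples of size 50 from a strongly skewed Exponential(1) population (mean 1, standard deviation 1), an arbitrary choice for illustration, and checks that the sample means cluster around 1 with spread close to 1/√50 and that roughly 95% fall within 1.96 standard errors, as a normal distribution would predict.

```python
import math
import random
from statistics import mean, stdev


def simulate_sample_means(n: int, reps: int, seed: int = 0):
    """Draw `reps` samples of size n from Exponential(rate=1) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


n = 50
means = simulate_sample_means(n, reps=2000)
theoretical_se = 1 / math.sqrt(n)  # population SD is 1 for Exponential(1)

# Share of sample means within 1.96 standard errors of the true mean (about 0.95 if approx. normal)
coverage = sum(abs(m - 1.0) <= 1.96 * theoretical_se for m in means) / len(means)
print(f"mean of means = {mean(means):.3f}, sd of means = {stdev(means):.3f} "
      f"(theory {theoretical_se:.3f}), coverage = {coverage:.3f}")
```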

Key Differences Between t-tests and Z-tests

Both the t-test and the Z-test compare sample statistics with a reference value or with each other, but they differ in their underlying assumptions, applications, and the conditions under which they are most appropriate. Let us look at each difference in turn:

Sample size considerations

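t-test: The t-test is typically used when the sample size is small, generally less than 30. It remains valid for small samples as long as the data are approximately normal, because the t-distribution accounts for the extra uncertainty of estimating the standard deviation from only a few observations. It therefore does not depend on the Central Limit Theorem, which needs larger samples to take effect.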
Z-test: The Z-test is used when the sample size is large, typically greater than 30. In large samples, the sampling distribution of the mean is approximately normal, which justifies using the Z-test.

Population variance knowledge

t-test: The t-test is used when the population variance is unknown. Instead of the population variance, the sample variance is used to calculate the test statistic. The t-distribution, which has heavier tails than the normal distribution, accounts for the additional uncertainty due to estimating the population variance.
Z-test: The Z-test requires that the population variance is known. Because the true standard deviation is used instead of an estimate, the test statistic can be referred directly to the standard normal distribution, with no extra allowance for estimation error. As a result, its critical values are slightly smaller than the t-test's, which gives it somewhat more power at the same significance level.
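The cost of estimating the variance shows up in the critical values. Python's standard library has no t-distribution, so this sketch integrates the Student's t density numerically (Simpson's rule) and inverts the CDF by bisection; it is an illustration, not a replacement for a statistics library. It prints two-sided 5% critical values for a few degrees of freedom next to the standard normal value.

```python
import math
from statistics import NormalDist


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """Student's t CDF via Simpson's rule on [0, |x|] and symmetry about 0 (steps must be even)."""
    if x == 0:
        return 0.5
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # probability mass between 0 and |x|
    return 0.5 + area if x > 0 else 0.5 - area


def t_critical(df: float, alpha: float = 0.05) -> float:
    """Two-sided critical value c with P(|T| <= c) = 1 - alpha, found by bisection."""
    lo, hi, target = 0.0, 100.0, 1 - alpha / 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (4, 9, 29, 100):
    print(f"t critical (df={df}): {t_critical(df):.3f}")
print(f"z critical: {NormalDist().inv_cdf(0.975):.3f}")
# t critical (df=4): 2.776
# t critical (df=9): 2.262
# t critical (df=29): 2.045
# t critical (df=100): 1.984
# z critical: 1.960
```

With few degrees of freedom, the heavier tails demand a noticeably larger statistic before H₀ is rejected; as the sample grows, the t critical value approaches the Z value, which is why the two tests give similar answers for large samples.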

Distribution assumptions

t-test: The t-test assumes that the data within each group are approximately normally distributed. This is particularly important when dealing with small sample sizes. The test statistic in a t-test follows a t-distribution, which has wider tails than the normal distribution. This accounts for the additional variability and uncertainty when estimating the population standard deviation from a small sample.
Z-test: The Z-test assumes that the data are normally distributed or that the sample size is large enough for the Central Limit Theorem to apply. The Central Limit Theorem ensures that, for large samples, the sampling distribution of the mean is approximately normal, even if the underlying data are not perfectly normal.

Practical applications and use cases

t-test: The t-test is commonly used in small-sample studies, such as pilot studies, where the population variance is unknown. Examples include comparing the effectiveness of two treatments in a small group or assessing changes within the same group over time.
Z-test: The Z-test is used in large-sample studies or when dealing with well-established populations where the variance is known. It is often applied in quality control, survey analysis, and large-scale experimental studies.

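Here is a table summarizing the key differences:

Aspect | t-test | Z-test
Sample size | Typically small (n < 30) | Typically large (n ≥ 30)
Population variance | Unknown; estimated by the sample variance | Known
Reference distribution | Student's t-distribution (heavier tails) | Standard normal distribution
Normality | Data approximately normal, especially for small samples | Data normal, or n large enough for the Central Limit Theorem
Typical uses | Pilot studies, small experiments, before/after comparisons | Quality control, survey analysis, large-scale experiments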

Conclusion

This tutorial introduced hypothesis testing and two commonly used tests, the t-test and the Z-test. We covered each test's definition, types, and assumptions, compared their key differences, and summarized which test suits which scenario, so that you can establish relationships between variables confidently through hypothesis testing.

After solidifying the statistical concepts behind hypothesis testing with our Introduction to Statistics course, I encourage you to put these concepts into practice with one of the following resources:

Hypothesis Testing in Python course
Hypothesis Testing in R course
Hypothesis Testing (chi-square test) in Excel tutorial

Happy learning!

Source:
https://www.datacamp.com/tutorial/t-test-vs-z-test