Degrees of freedom are the maximum number of logically independent values (values that are free to vary) in a data sample. In the simplest case, degrees of freedom are calculated by subtracting one from the number of items within the data sample.
Degrees of freedom are the number of independent values that a statistical analysis can estimate. They tell you how many items can be chosen at random before constraints must be put in place.
Within a data set, the initial numbers can be chosen at random. However, if the data set must add up to a specific sum or mean, for example, the final number is constrained: it must take whatever value balances the other values in the set so that the requirement is met.
Example 1: Consider a data sample consisting of five positive integers. The values of the five integers must have an average of six. If four items within the data set are 3, 8, 5, and 4, the fifth number must be 10. Because the first four numbers can be chosen at random, the degree of freedom is four.
Example 2: Consider a data sample consisting of five positive integers. The values could be any number with no known relationship between them. Because all five can be chosen at random with no limitations, the degree of freedom is five.
Example 3: Consider a data sample consisting of one integer. That integer must be odd. Because there are constraints on the single item within the data set, the degree of freedom is zero.
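As a minimal sketch of Example 1 in Python (the four freely chosen values here are just one possibility):

```python
# Example 1: five positive integers that must average 6.
n = 5
target_mean = 6

free_values = [3, 8, 5, 4]  # the first four can be anything

# The mean constraint forces the fifth value to balance the sum.
fifth = target_mean * n - sum(free_values)

print(fifth)  # 10
print(n - 1)  # 4 values were free, so degrees of freedom = 4
```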
The formula to determine degrees of freedom is:
Df = N − 1

where:
Df = degrees of freedom
N = sample size
For example, imagine the task of selecting ten baseball players whose batting averages must average .250. The total number of players in the data set is the sample size, so N = 10. Nine (10 − 1) of the players can be picked at random, but the 10th player must then have a specific batting average to satisfy the .250 constraint.
Some calculations of degrees of freedom with multiple parameters or relationships use the formula Df = N - P, where P is the number of parameters or relationships being estimated. For example, a 2-sample t-test uses N - 2 because there are two parameters (the two sample means) to estimate.
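As a sketch of the 2-sample case using SciPy (the sample data below are invented), the pooled-variance t-test relies on this same N - 2:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(5.0, 1.0, size=12)
group_b = rng.normal(5.5, 1.0, size=15)

# Two parameters (the two sample means) are estimated, so
# Df = N - P = (12 + 15) - 2 = 25.
df = len(group_a) + len(group_b) - 2
print(df)

# The pooled-variance t-test (equal_var=True, the default) uses this df.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(t_stat, p_value)
```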
In statistics, degrees of freedom define the shape of the t-distribution used in t-tests when calculating the p-value. Different sample sizes produce different degrees of freedom and therefore different t-distributions. Calculating degrees of freedom is also critical when assessing the significance of a chi-square statistic and the validity of the null hypothesis.
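A short sketch with SciPy shows how the same t-statistic maps to different p-values as the degrees of freedom change (the statistic of 2.1 is arbitrary):

```python
from scipy import stats

t_stat = 2.1  # an arbitrary test statistic

# The df sets the shape of the t-distribution, so the same statistic
# produces a different two-sided p-value at each df.
for df in (5, 15, 30):
    p_value = 2 * stats.t.sf(t_stat, df)
    print(df, round(p_value, 4))
```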
Degrees of freedom also have conceptual applications outside of statistics. Consider a company deciding the purchase of raw materials for its manufacturing process. The company has two items within this data set: the amount of raw materials to acquire and the total cost of the raw materials.
The company freely decides one of the two items, but its choice will dictate the outcome of the other. Because it can freely choose only one of the two, it has one degree of freedom in this situation. If the company decides the amount of raw materials, it cannot decide the total amount spent; by setting the total amount to spend, it may limit the amount of raw materials it can acquire.
There are two different kinds of chi-square tests: the test of independence, which asks a question of relationship, such as, "Is there a relationship between gender and SAT scores?"; and the goodness-of-fit test, which asks something like "If a coin is tossed 100 times, will it come up heads 50 times and tails 50 times?"
For these tests, degrees of freedom are used to determine whether a null hypothesis can be rejected based on the total number of variables and samples within the experiment. For example, when considering students and course choice, a sample size of 30 or 40 students is likely not large enough to generate significant data. Getting the same or similar results from a study using a sample size of 400 or 500 students would be more valid.
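Here is a rough sketch of both chi-square tests in Python; all the counts are invented. For the test of independence, df = (rows − 1) × (columns − 1); for the goodness-of-fit test, df = categories − 1.

```python
import numpy as np
from scipy import stats

# Test of independence on a 2x2 table of invented counts:
# df = (rows - 1) * (columns - 1) = 1.
table = np.array([[30, 20],
                  [25, 25]])
chi2, p, df, expected = stats.chi2_contingency(table)
print(df, p)

# Goodness of fit: 100 coin tosses, expected 50/50.
# Two categories, so df = 2 - 1 = 1.
chi2, p = stats.chisquare([46, 54], f_exp=[50, 50])
print(p)
```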
To perform a t-test, you calculate the value of t for the sample and compare it to a critical value. The correct critical value comes from the t-distribution with the data set's degrees of freedom.
Distributions with fewer degrees of freedom assign a higher probability to extreme values; with more degrees of freedom, such as a sample size of at least 30, the t-distribution comes much closer to a normal distribution curve. Smaller sample sizes correspond to fewer degrees of freedom and result in fatter t-distribution tails.
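A quick SciPy sketch illustrates the effect: at low degrees of freedom, the fatter tails push the 5% critical value higher.

```python
from scipy import stats

# Two-sided critical value at the 5% level for several df.
for df in (4, 9, 29):
    critical = stats.t.ppf(0.975, df)
    print(df, round(critical, 3))
# As df grows, the value approaches the normal-curve cutoff of about 1.96.
```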
Many of the situations in the examples above can be framed as a 1-sample t-test. For instance, Example 1, where five values are selected but must produce a specific average, fits the 1-sample t-test setting because only one constraint (the mean) is placed on the variable.
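As an illustrative sketch (the sample values are made up), a 1-sample t-test on five values has df = 5 − 1 = 4:

```python
import numpy as np
from scipy import stats

sample = np.array([5.1, 6.4, 5.8, 6.9, 5.6])  # n = 5, so df = 4

# One constraint (the hypothesized mean of 6) is placed on the sample,
# which is why df = n - 1 here.
t_stat, p_value = stats.ttest_1samp(sample, popmean=6.0)
print(t_stat, p_value)
```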
The earliest and most basic concept of degrees of freedom was noted in the early 1800s, intertwined in the works of mathematician and astronomer Carl Friedrich Gauss. The modern usage and understanding of the term were expounded upon first by William Sealy Gosset, an English statistician, in his article "The Probable Error of a Mean," published in Biometrika in 1908 under a pen name to preserve his anonymity.
In his writings, Gosset did not use the term "degrees of freedom" explicitly, but he explained the concept while developing what would become known as Student's t-distribution. The term did not become popular until 1922, when English biologist and statistician Ronald Fisher began using it in reports and data on his work developing chi-square statistics.
When determining the mean of a set of data, degrees of freedom are calculated as the number of items within a set minus one. This is because all items within that set can be randomly selected until one remains; that one item must conform to a given average.
Degrees of freedom tell you how many units within a set can be selected without constraints while still abiding by a given rule governing the set. For example, consider a set of five items that must average a value of 20. Degrees of freedom tell you how many of the items (4) can be randomly selected before constraints must be put in place. In this example, once the first four items are picked, you no longer have the liberty to randomly select a data point because you must "force balance" to the given average.
When a single parameter, such as the mean, is placed on a data set, degrees of freedom are the number of units within the set minus 1. It is minus one because, once that parameter is fixed, the last data item must take a specific value so that all the other points conform to the required outcome.
Some statistical analysis processes call for the number of independent values that can vary within an analysis while still meeting its constraints. That number is the degrees of freedom: the number of units in a sample that can be chosen randomly before a specific value must be picked.