The fundamental formulas of statistics are a working toolkit for students, data analysts, researchers, and other professionals who work with data. A well-organized statistics formula sheet serves as a quick reference: it keeps the definitions in one place and makes calculations easier to check, whether you are preparing for exams, conducting research, or analyzing data. This guide covers the essential formulas in descriptive statistics, probability, inferential statistics, and correlation.
---
1. Descriptive Statistics Formulas
Descriptive statistics involve summarizing and describing the main features of a dataset. Key measures include measures of central tendency, dispersion, and position.
1.1 Measures of Central Tendency
These formulas help identify the typical or average value in a dataset.
- Mean (Average):
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Where:
- \( x_i \) = individual data point
- n = number of data points
- Median:
The middle value when data points are ordered from smallest to largest. For an odd number of observations:
- \( \text{Median} = x_{( \frac{n+1}{2} )} \)
For an even number of observations:
- \( \text{Median} = \frac{x_{( \frac{n}{2})} + x_{( \frac{n}{2} + 1)}}{2} \)
- Mode:
The value that appears most frequently in the dataset.
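The three measures above can be checked with a short sketch using Python's standard `statistics` module. The data values are made up for illustration and are not taken from this guide.

```python
import statistics

# Illustrative data (an assumption, not from the text); n = 6, so n is even.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of the values divided by n
median = statistics.median(data)  # even n: average of the two middle values (3 and 5)
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)
```

Because the list has an even number of values, the median comes from the second formula: the average of the 3rd and 4th ordered values.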
1.2 Measures of Dispersion
These quantify the spread or variability within a dataset.
- Range:
\[ \text{Range} = x_{max} - x_{min} \]
Where:
- \( x_{max} \) = maximum value
- \( x_{min} \) = minimum value
- Variance:
Population variance (\( \sigma^2 \)):
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2 }{N} \]
Sample variance (\( s^2 \)):
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 }{n-1} \]
Where:
- \( N \) = population size
- \( n \) = sample size
- \( \mu \) = population mean
- Standard Deviation:
\[ \sigma = \sqrt{\sigma^2} \]
for population, and
\[ s = \sqrt{s^2} \]
for sample.
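As a rough check of the range, variance, and standard deviation formulas, the sketch below uses the standard `statistics` module, where `pvariance`/`pstdev` divide by N and `variance`/`stdev` divide by n - 1. The data values are illustrative assumptions.

```python
import statistics

# Illustrative data (an assumption, not from the text); the mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)     # x_max - x_min

pop_var = statistics.pvariance(data)   # population variance: divide by N
pop_sd = statistics.pstdev(data)       # square root of the population variance

samp_var = statistics.variance(data)   # sample variance: divide by n - 1
samp_sd = statistics.stdev(data)       # square root of the sample variance

print(data_range, pop_var, pop_sd, samp_var, samp_sd)
```

The sample variance is always slightly larger than the population variance for the same data, because its denominator is n - 1 rather than N.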
- Quartiles: Divide the ordered data into four equal parts.
- Q1 (First Quartile): 25th percentile
- Q2 (Median): 50th percentile
- Q3 (Third Quartile): 75th percentile
- Interquartile Range (IQR):
\[ \text{IQR} = Q_3 - Q_1 \]
Measures the spread of the middle 50% of the data.
- Probability of an Event:
\[ P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}} \]
This formula applies when all outcomes are equally likely.
- Complement Rule:
\[ P(\text{not } E) = 1 - P(E) \]
- Addition Rule:
For mutually exclusive events:
\[ P(A \cup B) = P(A) + P(B) \]
For non-mutually exclusive events:
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
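The counting definition, the complement rule, and the general addition rule can be verified on a small example. The sketch below assumes one roll of a fair six-sided die; the events `A` and `B` and the helper `prob` are hypothetical names chosen for illustration.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
outcomes = set(range(1, 7))
A = {n for n in outcomes if n % 2 == 0}  # even roll: {2, 4, 6}
B = {n for n in outcomes if n > 3}       # roll above 3: {4, 5, 6}

def prob(event):
    """Favorable outcomes divided by total outcomes (equally likely case)."""
    return Fraction(len(event), len(outcomes))

p_not_a = 1 - prob(A)                      # complement rule
p_union = prob(A) + prob(B) - prob(A & B)  # general addition rule

print(p_not_a, p_union)
```

A and B overlap in {4, 6}, so they are not mutually exclusive and the intersection term has to be subtracted; counting the union directly gives the same result.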
- Multiplication Rule:
For independent events:
\[ P(A \cap B) = P(A) \times P(B) \]
For dependent events:
\[ P(A \cap B) = P(A) \times P(B|A) \]
Where \( P(B|A) \) is the probability of B given A.
- Binomial Distribution:
Used for a fixed number of independent Bernoulli trials, each with the same probability of success.
\[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \]
Where:
- \( n \) = number of trials
- \( k \) = number of successes
- \( p \) = probability of success in each trial
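A minimal sketch of the binomial formula, using `math.comb` for the binomial coefficient. The function name `binomial_pmf` and the example of 10 fair coin flips are assumptions made for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumed example: probability of exactly 3 heads in 10 fair coin flips.
p3 = binomial_pmf(3, 10, 0.5)  # 120 / 1024

# The probabilities over k = 0..n should sum to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))

print(p3, total)
```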
- Normal Distribution:
Continuous probability distribution characterized by mean (\( \mu \)) and standard deviation (\( \sigma \)). The probability density function:
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{ - \frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 } \]
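The density function can be written out directly and compared against `statistics.NormalDist` from the standard library. The function name `normal_pdf` and the parameter values (mean 100, standard deviation 15) are illustrative assumptions.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density at x, written term by term from the formula above."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Assumed parameters for illustration.
manual = normal_pdf(110, 100, 15)
library = NormalDist(mu=100, sigma=15).pdf(110)

print(manual, library)
```

Note that `f(x)` is a density, not a probability: probabilities for a continuous variable come from areas under the curve, for example via `NormalDist.cdf`.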
- Confidence Interval for Population Mean (when population standard deviation is known):
\[ \bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \]
Where:
- \( Z_{\alpha/2} \) = Z-value corresponding to confidence level
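As a sketch of the known-σ interval, the critical value \( Z_{\alpha/2} \) can be obtained from the standard normal inverse CDF in `statistics.NormalDist`. The function name and the sample figures (mean 50, σ = 10, n = 100) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Assumed inputs for illustration.
low, high = z_confidence_interval(50.0, 10.0, 100)
print(low, high)
```

With these inputs the standard error is 10 / √100 = 1, so the margin of error equals the critical value itself.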
- Confidence Interval for Population Mean (when population standard deviation is unknown, using t-distribution):
\[ \bar{x} \pm t_{\alpha/2, n-1} \times \frac{s}{\sqrt{n}} \]
Where:
- \( t_{\alpha/2, n-1} \) = t-value for confidence level and degrees of freedom
- Z-test (for large samples or known \( \sigma \)):
\[ Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]
Where:
- \( \mu_0 \) = hypothesized population mean
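The Z statistic can be computed in a few lines. The two-sided p-value in this sketch is an addition that the formula above does not cover; it is included only to show how the statistic is typically used. The function name and inputs are illustrative assumptions.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n):
    """Return the Z statistic and a two-sided p-value (p-value is an addition)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Assumed inputs: sample mean 52, hypothesized mean 50, sigma 10, n = 100.
z, p = z_test(52.0, 50.0, 10.0, 100)
print(z, p)
```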
- T-test (for small samples or unknown \( \sigma \)):
\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]
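A minimal sketch of the t statistic, using the sample standard deviation (n - 1 denominator) from the `statistics` module. It returns the degrees of freedom alongside the statistic and stops there, since the standard library has no t-distribution for p-values. The data and function name are illustrative assumptions.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 denominator
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Assumed data: mean 4, tested against a hypothesized mean of 3.
t, df = t_statistic([2, 4, 4, 6], 3)
print(t, df)
```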
- Chi-square Test for Independence:
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
Where:
- O = observed frequency
- E = expected frequency
- \( x_i, y_i \) = paired data points
- \( \bar{x}, \bar{y} \) = means of x and y
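The chi-square statistic above can be sketched for a contingency table. The expected frequency for each cell is computed here as (row total × column total) / grand total, which is the usual choice for a test of independence but is an assumption not stated in this guide; the table values are also made up.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a 2-D table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under independence (assumed formula, see lead-in).
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof

# Assumed 2x2 table; every expected count works out to 15.
chi2, dof = chi_square_independence([[10, 20], [20, 10]])
print(chi2, dof)
```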
1.3 Measures of Position
These help understand the location of data points within the distribution.
---
2. Probability Basics and Formulas
Probability is the foundation of inferential statistics, measuring the likelihood of events.
2.1 Basic Probability Rules
2.2 Probability Distributions
---
3. Inferential Statistics Formulas
Inferential statistics involve making predictions or generalizations about a population based on sample data.
3.1 Confidence Intervals
3.2 Hypothesis Testing
---
4. Correlation and Regression
These formulas help analyze relationships between variables.
4.1 Correlation Coefficient (Pearson's r)
Measures linear correlation between two variables:
\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \times \sum (y_i - \bar{y})^2}} \]
Where \( x_i, y_i \) are the paired data points and \( \bar{x}, \bar{y} \) are the means of x and y.
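Pearson's r can be computed directly from the three sums in the formula. The sketch below is one straightforward way to do it; the function name and the paired data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data of equal length."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Assumed paired data for illustration.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(r)
```

The result always lies between -1 and 1; points that fall exactly on an upward-sloping line give r = 1.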
4.2 Regression Line Equation
Frequently Asked Questions
What is the formula for calculating the mean in basic statistics?
The mean is calculated by summing all data points and dividing by the number of data points: Mean (x̄) = (Σxᵢ) / n.
How do you compute the median in a data set?
To find the median, order the data from smallest to largest and identify the middle value. If the number of observations is even, take the average of the two middle values.
What is the formula for variance in statistics?
Variance is calculated as: σ² = Σ(xᵢ - μ)² / N for a population of size N, or s² = Σ(xᵢ - x̄)² / (n - 1) for a sample of size n.
How is the standard deviation related to variance?
Standard deviation is the square root of variance: σ = √σ².
What is the formula for calculating the probability of an event?
Probability (P) of an event = (Number of favorable outcomes) / (Total number of outcomes).
How do you calculate the range in a data set?
Range = Maximum value - Minimum value.
What is the formula for the z-score in standardization?
Z-score = (X - μ) / σ, where X is the value, μ is the mean, and σ is the standard deviation.
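As a one-line illustration, the sketch below standardizes a single value. The numbers (a score of 85 with mean 70 and standard deviation 10) are assumptions chosen for the example.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Assumed values for illustration.
z = z_score(85, 70, 10)
print(z)  # 1.5
```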
How is the coefficient of variation calculated?
Coefficient of variation (CV) = (Standard deviation / Mean) × 100%, used to compare variability between datasets.
What is the formula for the sample size in estimating a population mean?
Sample size n = (Z² × σ²) / E², where Z is the Z-score for confidence level, σ² is variance, and E is the margin of error.
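A rough sketch of the sample-size formula, taking Z from the standard normal inverse CDF. Rounding up to the next whole number is a practical convention added here, not part of the formula itself; the function name and inputs are illustrative assumptions.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin_of_error, confidence=0.95):
    """n = (Z^2 * sigma^2) / E^2, rounded up to a whole observation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin_of_error) ** 2)

# Assumed inputs: sigma = 15, margin of error = 3, 95% confidence.
n = required_sample_size(15, 3)
print(n)
```

Halving the margin of error roughly quadruples the required sample size, because E appears squared in the denominator.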
How do you compute the probability of the union of two independent events?
P(A ∪ B) = P(A) + P(B) - P(A ∩ B). For independent events, P(A ∩ B) = P(A) × P(B), so P(A ∪ B) = P(A) + P(B) - P(A) × P(B).
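The union of two independent events can be checked numerically. The sketch below also computes it a second way, as 1 minus the probability that neither event occurs, as a consistency check; the probabilities 0.5 and 0.4 are assumed values.

```python
def union_independent(p_a, p_b):
    """P(A or B) for independent events A and B."""
    return p_a + p_b - p_a * p_b

# Assumed probabilities for illustration.
p_union = union_independent(0.5, 0.4)

# Cross-check: 1 minus the probability that neither event occurs.
p_check = 1 - (1 - 0.5) * (1 - 0.4)

print(p_union, p_check)
```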