Basic Statistics Formula Sheet

This formula sheet is a quick reference for students, data analysts, researchers, and other professionals who work with data. It collects the core formulas of descriptive statistics, probability, inferential statistics, and correlation and regression, each with the symbols it uses, so they can be looked up quickly when preparing for exams, conducting research, or analyzing data.

---

1. Descriptive Statistics Formulas

Descriptive statistics involve summarizing and describing the main features of a dataset. Key measures include measures of central tendency, dispersion, and position.

1.1 Measures of Central Tendency

These formulas help identify the typical or average value in a dataset.

Mean (Average):

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Where:
- \( x_i \) = individual data point
- n = number of data points

Median:

The middle value when data points are ordered from smallest to largest. For an odd number of observations:
- \( \text{Median} = x_{( \frac{n+1}{2} )} \)
For an even number of observations:
- \( \text{Median} = \frac{x_{( \frac{n}{2})} + x_{( \frac{n}{2} + 1)}}{2} \)

Mode:

The value that appears most frequently in the dataset.
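
As a quick check of the three definitions above, here is a minimal sketch using Python's standard `statistics` module. The data values are made up for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # made-up sample with n = 6 (even)

mean = statistics.mean(data)      # sum of the values divided by n
median = statistics.median(data)  # n is even, so the two middle values are averaged
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)
```

Because n is even here, the median falls between the third and fourth ordered values, which matches the even-case formula.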

1.2 Measures of Dispersion

These quantify the spread or variability within a dataset.

Range:

\[ \text{Range} = x_{max} - x_{min} \]
Where:
- \( x_{max} \) = maximum value
- \( x_{min} \) = minimum value

Variance:

Population variance (\( \sigma^2 \)):

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2 }{N} \]

Sample variance (\( s^2 \)):

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 }{n-1} \]
Where:
- \( N \) = population size
- \( n \) = sample size
- \( \mu \) = population mean
- \( \bar{x} \) = sample mean

Standard Deviation:

 \[ \sigma = \sqrt{\sigma^2} \]

for population, and

 \[ s = \sqrt{s^2} \]

for sample.
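
The difference between the population and sample versions is only the divisor (N versus n - 1). A small sketch with the standard `statistics` module, on made-up data, shows both side by side:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values; their mean is 5

# Population formulas: divide the sum of squared deviations by N
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample formulas: divide the same sum by n - 1
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

# The population variance written out directly from the formula
mu = sum(data) / len(data)
manual_pop_var = sum((x - mu) ** 2 for x in data) / len(data)

print(pop_var, pop_sd, samp_var, samp_sd, manual_pop_var)
```

The sample variance is always the larger of the two on the same data, because the same sum of squares is divided by a smaller number.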

1.3 Measures of Position

Quartiles: Divide the ordered data into four equal parts.
- Q1 (First Quartile): 25th percentile
- Q2 (Median): 50th percentile
- Q3 (Third Quartile): 75th percentile

Interquartile Range (IQR):

\[ \text{IQR} = Q_3 - Q_1 \]
The IQR measures the spread of the middle 50% of the data.
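
Quartiles can be computed with `statistics.quantiles` (Python 3.8+), as in the sketch below. Note that textbooks and software use several slightly different quartile conventions; this example relies on the function's default "exclusive" method, so results from other tools may differ a little. The data are made up.

```python
import statistics

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # made-up values, n = 9

# n=4 asks for the three cut points that split the data into four parts
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1  # spread of the middle 50% of the data

print(q1, q2, q3, iqr)
```

Whatever convention is used, Q2 should agree with the median of the data.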

2. Probability Basics and Formulas

2.1 Basic Probability Rules

Probability of an Event:

\[ P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}} \]

Complement Rule:

\[ P(\text{not } E) = 1 - P(E) \]

Addition Rule: For mutually exclusive events:

\[ P(A \cup B) = P(A) + P(B) \]

For non-mutually exclusive events:

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

Multiplication Rule: For independent events:

\[ P(A \cap B) = P(A) \times P(B) \]

For dependent events:

\[ P(A \cap B) = P(A) \times P(B|A) \]
Where \( P(B|A) \) is the probability of B given A.
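
The rules above can be verified by counting outcomes directly. The sketch below assumes a single roll of a fair six-sided die; the events `A` and `B` and the helper `prob` are made up for illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one roll of a fair six-sided die
A = {2, 4, 6}                # event: the roll is even
B = {4, 5, 6}                # event: the roll is greater than 3

def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

p_not_a = 1 - prob(A)                       # complement rule
p_union = prob(A) + prob(B) - prob(A & B)   # addition rule (A and B overlap)
p_b_given_a = prob(A & B) / prob(A)         # conditional probability P(B|A)
p_inter = prob(A) * p_b_given_a             # multiplication rule

print(p_not_a, p_union, p_b_given_a, p_inter)
```

Here A and B are neither mutually exclusive nor independent, so the general forms of both rules are needed; the results match a direct count of the outcomes in A ∪ B and A ∩ B.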

2.2 Probability Distributions

Binomial Distribution:

Used for the number of successes in a fixed number of independent Bernoulli trials, each with the same probability of success.

 \[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \]

Where:
- \( n \) = number of trials
- \( k \) = number of successes
- \( p \) = probability of success in each trial
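
The binomial formula translates almost term by term into code. This is a minimal sketch; the function name `binomial_pmf` and the coin-flip numbers are chosen for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 3 heads in 10 flips of a fair coin
p_three_heads = binomial_pmf(3, 10, 0.5)

# The probabilities over k = 0..n should add up to 1
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))

print(p_three_heads, total)
```

For the coin example the answer is 120/1024, since there are 120 ways to choose which 3 of the 10 flips are heads.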

Normal Distribution:

Continuous probability distribution characterized by mean (\( \mu \)) and standard deviation (\( \sigma \)). The probability density function:

\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{ - \frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 } \]
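
To see the density formula at work, the sketch below evaluates it by hand and compares the result with `statistics.NormalDist` from the standard library. The values of `mu`, `sigma`, and the evaluation point are made up.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density written out from the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 100.0, 15.0  # made-up parameters

manual = normal_pdf(110.0, mu, sigma)
library = NormalDist(mu, sigma).pdf(110.0)

print(manual, library)
```

Keep in mind that f(x) is a density, not a probability: probabilities for a continuous variable come from areas under the curve, for example via `NormalDist.cdf`.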

3. Inferential Statistics Formulas

3.1 Confidence Intervals

Confidence Interval for Population Mean (when population standard deviation is known):

 \[ \bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \]

Where:
- \( Z_{\alpha/2} \) = Z-value corresponding to confidence level

Confidence Interval for Population Mean (when population standard deviation is unknown, using t-distribution):

\[ \bar{x} \pm t_{\alpha/2, n-1} \times \frac{s}{\sqrt{n}} \]
Where:
- \( t_{\alpha/2, n-1} \) = t-value for confidence level and degrees of freedom
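
A sketch of the known-σ interval, using only the standard library. The function name `z_confidence_interval` and the input numbers are made up; the critical value comes from `NormalDist().inv_cdf`.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Interval x_bar ± Z_{alpha/2} * sigma / sqrt(n), for known sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = z_confidence_interval(x_bar=50.0, sigma=10.0, n=100)

print(low, high)
```

The t-based interval has the same shape, with s in place of σ and a t critical value in place of Z. The standard library has no t-distribution, so that value would have to come from a t-table or a third-party package such as SciPy.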

3.2 Hypothesis Testing

Z-test (for large samples or known \( \sigma \)):

 \[ Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

Where:
- \( \mu_0 \) = hypothesized population mean

T-test (for small samples or unknown \( \sigma \)):

\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]
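
Both test statistics have the same structure: the distance between the sample mean and the hypothesized mean, measured in standard errors. The sketch below computes each one; the function names and all numbers are made up for illustration, and the two-sided p-value is shown only for the Z-test because the standard library has no t-distribution.

```python
import math
import statistics
from statistics import NormalDist

def one_sample_z(x_bar, mu_0, sigma, n):
    """Z statistic and two-sided p-value when sigma is known."""
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

def one_sample_t(sample, mu_0):
    """t statistic from raw data, using the sample standard deviation s."""
    n = len(sample)
    s = statistics.stdev(sample)
    return (statistics.mean(sample) - mu_0) / (s / math.sqrt(n))

z, p = one_sample_z(x_bar=52.0, mu_0=50.0, sigma=10.0, n=100)
t = one_sample_t([2, 4, 4, 4, 5, 5, 7, 9], mu_0=4.0)

print(z, p, t)
```

The t statistic is compared against a t-distribution with n - 1 degrees of freedom, looked up in a table or computed with a statistics package.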

Chi-square Test for Independence:

 \[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

Where:
- O = observed frequency
- E = expected frequency
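
For a test of independence on a contingency table, the expected frequency of each cell is usually taken as (row total × column total) / grand total. That is the standard convention, stated here as an assumption because the sheet does not spell it out. A sketch on a made-up 2×2 table:

```python
observed = [
    [20, 30],  # made-up counts: group 1, outcomes yes / no
    [30, 20],  # made-up counts: group 2, outcomes yes / no
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count under independence (assumed convention)
        e = row_totals[i] * col_totals[j] / grand_total
        chi2 += (o - e) ** 2 / e

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(chi2, dof)
```

The statistic is then compared with a chi-square critical value for the computed degrees of freedom.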

4. Correlation and Regression

4.1 Correlation Coefficient (Pearson's r)

 \[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \times \sum (y_i - \bar{y})^2}} \]

Where:
- \( x_i, y_i \) = paired data points
- \( \bar{x}, \bar{y} \) = means of x and y
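
The formula for r can be written out directly from its three sums. A minimal sketch follows; the function name `pearson_r` and the paired data are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from the sums of deviation products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]  # made-up paired data
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)

# A perfectly linear, increasing relationship gives r = 1
r_perfect = pearson_r(xs, [2 * x + 1 for x in xs])

print(r, r_perfect)
```

r always lies between -1 and 1, with the sign giving the direction of the linear relationship.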

4.2 Regression Line Equation
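
A minimal sketch, assuming the ordinary least-squares line \( \hat{y} = a + bx \) with slope \( b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \) and intercept \( a = \bar{y} - b\bar{x} \). These are the standard textbook formulas rather than ones given elsewhere in this sheet, and the function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b

xs = [1, 2, 3, 4, 5]  # made-up paired data
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
predicted_at_6 = a + b * 6  # prediction for a new x value

print(a, b, predicted_at_6)
```

The slope shares its numerator with Pearson's r, so the two always have the same sign.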

Frequently Asked Questions

What is the formula for calculating the mean in basic statistics?

The mean is calculated by summing all data points and dividing by the number of data points: Mean (x̄) = (Σx) / n.

How do you compute the median in a data set?

To find the median, order the data from smallest to largest and identify the middle value. If the number of observations is even, take the average of the two middle values.

What is the formula for variance in statistics?

Variance is calculated as: Variance (σ²) = Σ(xᵢ - μ)² / N for a population of size N, or Variance (s²) = Σ(xᵢ - x̄)² / (n - 1) for a sample of size n.

How is the standard deviation related to variance?

Standard deviation is the square root of variance: σ = √σ².

What is the formula for calculating the probability of an event?

Probability (P) of an event = (Number of favorable outcomes) / (Total number of outcomes).

How do you calculate the range in a data set?

Range = Maximum value - Minimum value.

What is the formula for the z-score in standardization?

Z-score = (X - μ) / σ, where X is the value, μ is the mean, and σ is the standard deviation.

How is the coefficient of variation calculated?

Coefficient of variation (CV) = (Standard deviation / Mean) × 100%, used to compare variability between datasets.
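
Both the z-score and the coefficient of variation are one-line calculations once the mean and standard deviation are known. The sketch below uses made-up data and treats it as a full population, so it uses `statistics.pstdev`.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values, treated as a population

mu = statistics.mean(data)
sigma = statistics.pstdev(data)

# Z-score: how many standard deviations each value lies from the mean
z_scores = [(x - mu) / sigma for x in data]

# Coefficient of variation as a percentage
cv_percent = sigma / mu * 100

print(z_scores, cv_percent)
```

The CV is only meaningful when the mean is not close to zero, since it divides by the mean.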

What is the formula for the sample size in estimating a population mean?

Sample size n = (Z² × σ²) / E², where Z is the Z-score for confidence level, σ² is variance, and E is the margin of error.
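
A sketch of the sample-size formula, assuming the result is rounded up to the next whole observation (a common convention, not something stated above). The function name `required_sample_size` and the inputs are made up.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence=0.95):
    """n = (Z * sigma / E)^2, rounded up to a whole number."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Made-up inputs: sigma = 15, margin of error E = 2, 95% confidence
n_needed = required_sample_size(sigma=15.0, margin=2.0)

print(n_needed)
```

Halving the margin of error roughly quadruples the required sample size, because E is squared in the denominator.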

How do you compute the probability of the union of two independent events?

P(A ∪ B) = P(A) + P(B) - P(A ∩ B). For independent events, P(A ∩ B) = P(A) × P(B), so P(A ∪ B) = P(A) + P(B) - P(A) × P(B).