Basic Statistics Formula Sheet


Understanding the fundamental formulas of statistics is essential for students, data analysts, researchers, and professionals who work with data. A well-organized formula sheet serves as a quick reference, keeping the core calculations and the definitions behind them in one place. Whether you're preparing for exams, conducting research, or analyzing data, knowing these formulas and when each one applies helps you calculate accurately and interpret results with confidence. This guide covers the essential formulas in descriptive statistics, probability, inferential statistics, and correlation and regression.

---

1. Descriptive Statistics Formulas



Descriptive statistics involve summarizing and describing the main features of a dataset. Key measures include measures of central tendency, dispersion, and position.

1.1 Measures of Central Tendency


These formulas help identify the typical or average value in a dataset.


  1. Mean (Average):

    \[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
    Where:

    • \( x_i \) = individual data point

    • \( n \) = number of data points




  2. Median:

    The middle value when data points are ordered from smallest to largest. For an odd number of observations:

    • \( \text{Median} = x_{( \frac{n+1}{2} )} \)


    For an even number of observations:

    • \( \text{Median} = \frac{x_{( \frac{n}{2})} + x_{( \frac{n}{2} + 1)}}{2} \)




  3. Mode:

    The value that appears most frequently in the dataset.
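The three measures above can be computed directly. This is a minimal Python sketch; `mean`, `median`, and `modes` are hypothetical helper names written for this example, and `modes` returns every tied value because a dataset can have more than one mode.

```python
from collections import Counter


def mean(data):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(data) / len(data)


def median(data):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]  # x_((n+1)/2) in the 1-based notation above
    return (xs[mid - 1] + xs[mid]) / 2  # average of x_(n/2) and x_(n/2 + 1)


def modes(data):
    """All values tied for the highest frequency (a dataset may be multimodal)."""
    counts = Counter(data)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


data = [2, 3, 3, 5, 7, 10]
print(mean(data))    # 5.0
print(median(data))  # 4.0
print(modes(data))   # [3]
```

In practice, Python's standard `statistics` module already provides `mean`, `median`, and `multimode` (Python 3.8+).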



1.2 Measures of Dispersion


These quantify the spread or variability within a dataset.


  1. Range:

    \[ \text{Range} = x_{max} - x_{min} \]
    Where:

    • \( x_{max} \) = maximum value

    • \( x_{min} \) = minimum value




  2. Variance:

    Population variance (\( \sigma^2 \)):
     \[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2 }{N} \]

    Sample variance (\( s^2 \)):
     \[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 }{n-1} \]

    Where:

    • \( N \) = population size

    • \( n \) = sample size

    • \( \mu \) = population mean

    • \( \bar{x} \) = sample mean




  3. Standard Deviation:
     \[ \sigma = \sqrt{\sigma^2} \]
    for population, and
     \[ s = \sqrt{s^2} \]
    for sample.
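The only difference between the population and sample formulas is the divisor (\( N \) versus \( n-1 \)). The sketch below makes that switch explicit with a `sample` flag; `data_range`, `variance`, and `std_dev` are hypothetical helper names for illustration.

```python
import math


def data_range(data):
    """Range: largest value minus smallest value."""
    return max(data) - min(data)


def variance(data, sample=True):
    """Sample variance (divide by n - 1) by default; population variance (divide by N) if sample=False."""
    n = len(data)
    m = sum(data) / n
    squared_deviations = sum((x - m) ** 2 for x in data)
    return squared_deviations / (n - 1) if sample else squared_deviations / n


def std_dev(data, sample=True):
    """Standard deviation: square root of the corresponding variance."""
    return math.sqrt(variance(data, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(data_range(data))              # 7
print(variance(data, sample=False))  # 4.0  (population, divides by N = 8)
print(std_dev(data, sample=False))   # 2.0
print(variance(data))                # 32/7, about 4.571 (sample, divides by n - 1 = 7)
```

The standard library equivalents are `statistics.pvariance` / `statistics.pstdev` (population) and `statistics.variance` / `statistics.stdev` (sample).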



1.3 Measures of Position


These describe where data points lie within the distribution.


    1. Quartiles: Divide the ordered data into four equal parts.

      • Q1 (First Quartile): 25th percentile

      • Q2 (Median): 50th percentile

      • Q3 (Third Quartile): 75th percentile




    2. Interquartile Range (IQR):

      \[ \text{IQR} = Q_3 - Q_1 \]
      Measures the spread of the middle 50% of the data.
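Quartile conventions differ between textbooks and software, so the same data can give slightly different \( Q_1 \) and \( Q_3 \). This sketch assumes one common convention: \( Q_1 \) and \( Q_3 \) are the medians of the lower and upper halves, with the overall median excluded when \( n \) is odd. `quartiles` and `iqr` are hypothetical helper names.

```python
def _median(sorted_xs):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_xs)
    mid = n // 2
    if n % 2 == 1:
        return sorted_xs[mid]
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention; needs at least 2 values."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]        # values below the median (median excluded when n is odd)
    upper = xs[(n + 1) // 2 :]  # values above the median
    return _median(lower), _median(xs), _median(upper)


def iqr(data):
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = quartiles(data)
    return q3 - q1


data = [1, 3, 5, 7, 9, 11, 13, 15]
print(quartiles(data))  # (4.0, 8.0, 12.0)
print(iqr(data))        # 8.0
```

Python's `statistics.quantiles(data, n=4)` interpolates between values (its default method is `'exclusive'`), so it can return different quartiles for the same data.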



---

2. Probability Basics and Formulas



Probability is the foundation of inferential statistics, measuring the likelihood of events.

2.1 Basic Probability Rules




    1. Probability of an Event (when all outcomes are equally likely):

      \[ P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}} \]


    2. Complement Rule:

      \[ P(\text{not } E) = 1 - P(E) \]


    3. Addition Rule: For mutually exclusive events:

      • \[ P(A \cup B) = P(A) + P(B) \]


      For non-mutually exclusive events:
       \[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]



    4. Multiplication Rule: For independent events:

      • \[ P(A \cap B) = P(A) \times P(B) \]


      For dependent events:
       \[ P(A \cap B) = P(A) \times P(B|A) \]

      Where \( P(B|A) \) is the probability of B given A.
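These rules can be checked by brute-force counting on a small sample space. This sketch assumes the classical setting of equally likely outcomes (rolling two fair six-sided dice) and uses exact `Fraction` arithmetic; the event names and the helpers `prob` and `cond_prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely ordered outcomes of rolling two fair dice.
OMEGA = set(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event), len(OMEGA))


def cond_prob(b, a):
    """P(B | A) = P(A ∩ B) / P(A)."""
    return prob(a & b) / prob(a)


first_even = {w for w in OMEGA if w[0] % 2 == 0}
second_six = {w for w in OMEGA if w[1] == 6}
sum_2 = {w for w in OMEGA if sum(w) == 2}
sum_12 = {w for w in OMEGA if sum(w) == 12}
sum_at_least_10 = {w for w in OMEGA if sum(w) >= 10}

# Complement rule
print(prob(OMEGA - first_even) == 1 - prob(first_even))  # True
# Addition rule for mutually exclusive events (sums of 2 and 12 cannot occur together)
print(prob(sum_2 | sum_12) == prob(sum_2) + prob(sum_12))  # True
# General addition rule: 1/2 + 1/6 - 1/12
print(prob(first_even | second_six))  # 7/12
# Multiplication rule for independent events (one die does not affect the other)
print(prob(first_even & second_six) == prob(first_even) * prob(second_six))  # True
# Multiplication rule for dependent events: P(A ∩ B) = P(A) * P(B | A)
print(prob(second_six & sum_at_least_10) == prob(second_six) * cond_prob(sum_at_least_10, second_six))  # True
# Dependence: knowing the second die is 6 changes the probability of a sum of at least 10
print(cond_prob(sum_at_least_10, second_six), prob(sum_at_least_10))  # 1/2 1/6
```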



2.2 Probability Distributions




    1. Binomial Distribution:

      Used for a fixed number of independent Bernoulli trials, each with the same probability of success.
       \[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \]
      Where:

      • \( n \) = number of trials

      • \( k \) = number of successes

      • \( p \) = probability of success in each trial





    2. Normal Distribution:

      Continuous probability distribution characterized by mean (\( \mu \)) and standard deviation (\( \sigma \)). The probability density function:
       \[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{ - \frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 } \]
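Both formulas translate directly into code. This is a minimal sketch; `binomial_pmf` and `normal_pdf` are hypothetical helper names, and `math.comb` (Python 3.8+) supplies the binomial coefficient \( \binom{n}{k} \).

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Probability of exactly 3 heads in 10 fair coin flips: C(10, 3) / 2^10 = 120 / 1024
print(binomial_pmf(3, 10, 0.5))  # 0.1171875
# The probabilities over k = 0..n sum to 1 (up to floating-point rounding)
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
# Density of the standard normal at its mean: 1 / sqrt(2*pi), about 0.3989
print(normal_pdf(0.0))
```

Note that `normal_pdf` returns a density, not a probability; probabilities for a continuous variable come from areas under the curve, which `statistics.NormalDist(mu, sigma).cdf(x)` computes.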




---

3. Inferential Statistics Formulas



Inferential statistics involve making predictions or generalizations about a population based on sample data.

3.1 Confidence Intervals




    1. Confidence Interval for Population Mean (when population standard deviation is known):
       \[ \bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \]
      Where:

      • \( Z_{\alpha/2} \) = critical value of the standard normal distribution for confidence level \( 1 - \alpha \) (e.g. 1.96 for 95%)





    2. Confidence Interval for Population Mean (when population standard deviation is unknown, using t-distribution):
       \[ \bar{x} \pm t_{\alpha/2, n-1} \times \frac{s}{\sqrt{n}} \]

      Where:

      • \( t_{\alpha/2, n-1} \) = critical value of the t-distribution with \( n-1 \) degrees of freedom for confidence level \( 1 - \alpha \)

      • \( s \) = sample standard deviation
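A minimal sketch of both intervals using only the standard library. `statistics.NormalDist().inv_cdf` gives \( Z_{\alpha/2} \), but the standard library has no t-distribution quantile function, so this sketch assumes the caller passes \( t_{\alpha/2, n-1} \) from a t-table. `z_interval` and `t_interval` are hypothetical names.

```python
import math
from statistics import NormalDist, mean, stdev


def z_interval(xbar, sigma, n, confidence=0.95):
    """CI for the mean when the population standard deviation sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin


def t_interval(data, t_crit):
    """CI for the mean when sigma is unknown.

    t_crit is t_{alpha/2, n-1} taken from a t-table (assumed input, not computed here).
    """
    n = len(data)
    margin = t_crit * stdev(data) / math.sqrt(n)  # stdev uses the n - 1 divisor
    return mean(data) - margin, mean(data) + margin


print(z_interval(xbar=50, sigma=10, n=25))  # about (46.08, 53.92)

sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]  # n = 10, so 9 degrees of freedom
print(t_interval(sample, t_crit=2.262))  # 2.262 is the tabled t_{0.025, 9} for 95% confidence
```

If SciPy is available, the critical value can be computed instead of looked up, with `scipy.stats.t.ppf(1 - alpha / 2, n - 1)`.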





3.2 Hypothesis Testing




    1. Z-test (for large samples or known \( \sigma \)):
       \[ Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]
      Where:

      • \( \mu_0 \) = hypothesized population mean





    2. T-test (for small samples or unknown \( \sigma \)):
       \[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]



    3. Chi-square Test for Independence:
       \[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
      Where:

      • \( O \) = observed frequency

      • \( E \) = expected frequency
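The test statistics above are straightforward to compute. For the chi-square test of independence, this sketch assumes the usual expected count for each cell of an \( r \times c \) contingency table, \( E = \frac{\text{row total} \times \text{column total}}{\text{grand total}} \), with \( (r-1)(c-1) \) degrees of freedom. The function names are hypothetical, and turning a statistic into a p-value still requires the matching normal, t, or chi-square distribution.

```python
import math


def z_statistic(xbar, mu0, sigma, n):
    """One-sample Z statistic (population standard deviation sigma known)."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic (sample standard deviation s); compare with t on n - 1 df."""
    return (xbar - mu0) / (s / math.sqrt(n))


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table given as rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # E under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


print(z_statistic(52, 50, 10, 100))                      # 2.0
print(t_statistic(13.5, 12, 1.5, 9))                     # 3.0
print(chi_square_independence([[20, 30], [30, 20]]))     # (4.0, 1)
```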






---

4. Correlation and Regression



These formulas help analyze relationships between variables.

4.1 Correlation Coefficient (Pearson's r)


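    Measures the strength and direction of the linear relationship between two variables; \( r \) ranges from \( -1 \) (perfect negative) to \( 1 \) (perfect positive):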

     \[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \times \sum (y_i - \bar{y})^2}} \]
    Where:

    • \( x_i, y_i \) = paired data points

    • \( \bar{x}, \bar{y} \) = means of x and y
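The formula maps onto three sums of deviations. This is a minimal sketch with a hypothetical `pearson_r` helper; \( r \) is undefined when either variable is constant, which here surfaces as a division by zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data xs, ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # numerator
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y))  # 6 / sqrt(60), about 0.775
```

Python 3.10+ also offers `statistics.correlation(x, y)`, which computes Pearson's r by default.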




4.2 Regression Line Equation
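A minimal sketch, assuming the ordinary least-squares regression line \( \hat{y} = b_0 + b_1 x \), with slope \( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \) and intercept \( b_0 = \bar{y} - b_1 \bar{x} \). The helper name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept b0, slope b1) of the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)  # must be nonzero: x values cannot all be equal
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


b0, b1 = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b0, b1)       # about 2.2 and 0.6
print(b0 + b1 * 6)  # predicted y at x = 6, about 5.8
```

The slope can equivalently be written \( b_1 = r \frac{s_y}{s_x} \). Python 3.10+ provides `statistics.linear_regression(x, y)`, which returns the slope and intercept.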

Frequently Asked Questions


    What is the formula for calculating the mean in basic statistics?

    The mean is calculated by summing all data points and dividing by the number of data points: x̄ = (Σx) / n for a sample, or μ = (Σx) / N for a population.

    How do you compute the median in a data set?

    To find the median, order the data from smallest to largest and identify the middle value. If the number of observations is even, take the average of the two middle values.

    What is the formula for variance in statistics?

    Variance is calculated as σ² = Σ(xᵢ - μ)² / N for a population, or s² = Σ(xᵢ - x̄)² / (n - 1) for a sample.

    How is the standard deviation related to variance?

    Standard deviation is the square root of variance: σ = √σ² for a population, and s = √s² for a sample.

    What is the formula for calculating the probability of an event?

    Probability (P) of an event = (Number of favorable outcomes) / (Total number of outcomes), when all outcomes are equally likely.

    How do you calculate the range in a data set?

    Range = Maximum value - Minimum value.

    What is the formula for the z-score in standardization?

    Z-score = (X - μ) / σ, where X is the value, μ is the mean, and σ is the standard deviation.

    How is the coefficient of variation calculated?

    Coefficient of variation (CV) = (Standard deviation / Mean) × 100%, used to compare variability between datasets.
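Both the z-score and the coefficient of variation are one-line calculations. This sketch uses hypothetical helper names and the standard `statistics` module; the CV is only meaningful for data measured on a ratio scale with a positive mean.

```python
from statistics import mean, pstdev, stdev


def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (positive) or below (negative) the mean."""
    return (x - mu) / sigma


def coefficient_of_variation(data, sample=True):
    """CV in percent: standard deviation / mean x 100 (sample SD by default)."""
    sd = stdev(data) if sample else pstdev(data)
    return sd / mean(data) * 100


print(z_score(85, mu=70, sigma=10))  # 1.5
print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9], sample=False))  # about 40 (population SD 2.0, mean 5)
```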

    What is the formula for the sample size in estimating a population mean?

    Sample size n = (Z² × σ²) / E², rounded up to the next whole number, where Z is the critical Z-value for the confidence level (e.g. 1.96 for 95%), σ² is the population variance, and E is the desired margin of error.
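A minimal sketch of the sample-size formula, rounding up so the margin of error is not exceeded. The helper name is hypothetical, and σ is assumed known; in practice it is often estimated from a pilot study or prior data.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n with Z * sigma / sqrt(n) <= margin, i.e. (Z^2 * sigma^2) / E^2 rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical Z, about 1.96 for 95%
    return math.ceil((z * sigma / margin) ** 2)


print(sample_size_for_mean(sigma=15, margin=5))  # 35 (34.57... rounded up)
print(sample_size_for_mean(sigma=10, margin=2))  # 97 (96.04... rounded up)
```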

    How do you compute the probability of the union of two independent events?

    P(A ∪ B) = P(A) + P(B) - P(A ∩ B). For independent events, P(A ∩ B) = P(A) × P(B), so P(A ∪ B) = P(A) + P(B) - P(A) × P(B).