Descriptive Statistics
Introduction
Quantitative analysis of a large collection of data relies on numerical
computations that summarize the nature of the data collected and make its
trends easier to interpret. Descriptive statistics and inferential statistics
are the two branches of statistics used for this purpose.
Descriptive Statistics Defined
Descriptive statistics describe, show, and summarize the basic features of a
dataset found in a given study, presented in a summary that describes the data
sample and its measurements. They help analysts understand the data better.
Descriptive statistics represent the available data sample and do not include
theories, inferences, probabilities, or conclusions. That’s a job for
inferential statistics.
Example: Analysts often use charts and graphs to present descriptive
statistics. If you stood outside a movie theater, asked 50 members of the
audience whether they liked the film they saw, and then put your findings on a
pie chart, that would be descriptive statistics. In this example, descriptive
statistics count the yes and no answers and show how many people in this
specific theater liked or disliked the movie.
Descriptive statistics describe or summarize the basic features or
characteristics of the data. They assign numerical values to describe the trend
of the samples collected, converting large volumes of data into a simpler, more
meaningful format that is easier to understand and interpret. Paired with
graphs and tables, descriptive statistics offer a clear summary of the complete
data collection.
The primary purpose of descriptive statistics is interpretation, while
inferential statistics make predictions about a larger set of data based on the
descriptive values obtained from a sample. Hence, descriptive statistics form
the first step and the basis of quantitative data analysis.
Types of Descriptive Statistics
There are four major types of descriptive statistics used to measure the
characteristics of a given data set.
A) Measures of Frequency
These measure how often a particular value or response occurs in the
distribution. Frequency can be expressed as a count or as a percentage.
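As a minimal sketch of a frequency measure, the snippet below tallies yes/no answers like those in the movie-theater example. The 32/18 split is invented purely for illustration; it is not data from the text.

```python
from collections import Counter

# Hypothetical survey answers from 50 moviegoers (invented for illustration).
responses = ["yes"] * 32 + ["no"] * 18

counts = Counter(responses)      # absolute frequency of each answer
total = sum(counts.values())     # number of responses collected
percentages = {answer: 100 * n / total for answer, n in counts.items()}

for answer, n in counts.items():
    print(f"{answer}: {n} responses ({percentages[answer]:.1f}%)")
```

The same counts are what a pie chart or bar chart of the survey would display.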
B) Measures of Central Tendency
Measures of central tendency indicate the average or the most common value in
the data set. They locate the center of the data by computing the mean, median,
and mode.
C) Measures of Variation or Dispersion
These show how spread out the responses in the data set are. They help identify
the gap between the highest and lowest values and how far individual values lie
from the mean. Measures of variation are calculated using the range, standard
deviation, and variance.
D) Measures of Position
These measure how individual values are positioned relative to one another.
They rely on standardized values, so that a single value can be compared with
the rest of the distribution. Percentiles and quartiles are the usual measures
of position.
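A short sketch of measures of position, using Python's standard `statistics` module. The exam scores and the `percentile_rank` helper are hypothetical, and quantile conventions differ between textbooks and tools, so the cut points from other software may not match exactly.

```python
import statistics

# Hypothetical exam scores (invented for illustration).
scores = [55, 61, 64, 70, 72, 78, 85, 93]

# With n=4, statistics.quantiles returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)

def percentile_rank(values, score):
    """Percentage of values strictly below the given score."""
    below = sum(1 for v in values if v < score)
    return 100 * below / len(values)

print(f"Q1={q1}, Q2={q2}, Q3={q3}")
print(f"A score of 78 is above {percentile_rank(scores, 78):.1f}% of the scores")
```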
Methods Used in Descriptive Statistics
The various descriptive statistics
methods used to arrive at the characteristics of the data set include:
A) Mean
The mean is the average of all the values. It is calculated by adding up all
the values and dividing the total by the number of values:
Mean = Sum of values / Number of values
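The formula can be checked with a few lines of Python. The sample values are hypothetical; `statistics.mean` from the standard library is shown alongside the hand calculation.

```python
import statistics

values = [4, 8, 15, 16, 23, 42]  # hypothetical sample

manual_mean = sum(values) / len(values)   # sum of values / number of values
library_mean = statistics.mean(values)    # same calculation from the standard library

print(manual_mean, library_mean)
```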
B) Median
The median is the value at the exact center of the set once the values are
arranged in order. If there are two values at the center, the median is the
mean of those two values.
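A minimal sketch of the median calculation, covering both the odd and the even case. The function assumes a non-empty list of numbers; the inputs are hypothetical.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    ordered = sorted(values)          # the median is defined on ordered data
    n = len(ordered)                  # assumed to be at least 1
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: a single central value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two central values

print(median([7, 1, 3]))      # odd number of values
print(median([7, 1, 3, 5]))   # even number of values
```

In practice `statistics.median` from the standard library does the same job.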
C) Mode
The mode is the value that appears most frequently in the set. Arranging the
values in order from lowest to highest helps identify the mode. A data set can
have no mode, one mode, or multiple modes.
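The three cases (no mode, one mode, several modes) can be sketched as follows. The `modes` helper is hypothetical, and it assumes one common convention: when every distinct value occurs equally often, the data set is treated as having no mode.

```python
from collections import Counter

def modes(values):
    """Return the most frequent values; an empty list means 'no mode'."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Assumed convention: if all distinct values tie, report no mode.
    if len(counts) > 1 and all(n == top for n in counts.values()):
        return []
    return sorted(v for v, n in counts.items() if n == top)

print(modes([1, 2, 3, 4]))      # no mode
print(modes([1, 2, 2, 3]))      # one mode
print(modes([1, 1, 2, 2, 3]))   # multiple modes
```

Note that `statistics.multimode` in the standard library follows a different convention and returns every value when all of them tie.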
D) Range
The range is the difference between the
highest value of the data set and the lowest value. It can be calculated by
subtracting the lowest value from the highest value. The range indicates how
far apart the values are.
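As a quick illustration with hypothetical values, the range is a single subtraction:

```python
values = [12, 7, 3, 18, 9]  # hypothetical sample

# Highest value minus lowest value; named value_range to avoid shadowing range().
value_range = max(values) - min(values)
print(value_range)
```

Because it depends only on the two extreme values, the range is sensitive to outliers.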
E) Standard Deviation
Standard deviation measures the typical distance of individual values in the
data set from the mean. It is the square root of the variance. A large standard
deviation indicates high variability.
F) Variance
Variance measures the degree of spread in the data set and is the average of
the squared deviations from the mean. Squaring the standard deviation gives the
variance.
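The relationship between variance and standard deviation can be sketched as below, with hypothetical values. The hand calculation follows the definition given here (the average of squared deviations, i.e. the population variance); the sample variance, which divides by n - 1 instead of n, is shown for comparison.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(values)
mean = sum(values) / n

# Population variance: the average of squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in values) / n
std_dev = math.sqrt(variance)  # standard deviation is the square root of the variance

# Library equivalents; the sample version divides by n - 1 instead of n.
population_variance = statistics.pvariance(values)
sample_variance = statistics.variance(values)

print(variance, std_dev, population_variance, sample_variance)
```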
These methods can be used for
univariate analysis, bivariate analysis, or multivariate analysis as needed.
Univariate analysis considers only one variable at a time. This allows each
variable in the data set to be examined using different measures of frequency,
variation, and central tendency.
Bivariate analysis looks for a relationship between two different variables.
The frequency and variability of the two variables are measured together to see
whether they vary together. Measures of central tendency can also be taken
during bivariate analysis.
Multivariate analysis is similar to bivariate analysis, with the exception that
it takes more than two variables into account to identify any relationship
between them.
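One common way to check whether two variables vary together is Pearson's correlation coefficient. The sketch below computes it by hand; the `pearson_r` helper and the study-hours data are hypothetical, and the function assumes two equal-length sequences that are not constant.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

hours_studied = [1, 2, 3, 4, 5]      # hypothetical data
exam_scores = [52, 58, 65, 70, 79]   # hypothetical data
print(pearson_r(hours_studied, exam_scores))
```

A value near 1 means the two variables rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.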
Examples of Descriptive Statistics
The most important reason for the wide use of descriptive statistics is that
they make a complex set of data easier to interpret by giving a convenient
summary. Here are some examples where descriptive statistics help:
· A batting average indicates the overall performance of a baseball player over
a tournament or season. It is the number of hits divided by the number of
at-bats.
· A GPA, or grade point average, indicates the overall performance of a student
at school across multiple tests and courses throughout the year.
· Descriptive statistics can identify the distribution of college students
across variables such as year of study, gender, and course.
· Descriptive statistics can determine the demographics of a population in a
city, state, or country, such as its distribution by gender or occupation or
the variance in income levels.
Important Tools in Descriptive Statistics
Various descriptive statistics tools
can be called on for specific scenarios. Choosing the right tool depends
entirely on the objective of the analysis and the type and number of variables
at hand.
There are two categories of tools in
descriptive statistics:
1. Numerical Tools: These include the various methods of calculation:
· Mean
· Median
· Mode
· Standard deviation
· Variance
· Range
· Coefficient of variation
· Skewness and kurtosis coefficients
· Quartiles
· Percentiles
· Contingency tables
· Frequency tables
· Correlation
· RV coefficient
2. Graphic Tools: These allow the representation of various data points as
graphs or tables:
· Box plots
· Scatter plots
· Whisker plots
· Bar chart
· Pie chart
· Histogram
· Ternary diagram
· Correlation map
· Probability plot
· Strip plot
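Two of the numerical tools listed above, the coefficient of variation and the skewness coefficient, are not defined elsewhere in this text. The sketch below uses common textbook definitions as an assumption: the coefficient of variation as the sample standard deviation divided by the mean, and skewness as the third central moment divided by the population variance raised to the power 1.5. Other definitions exist, and both helper functions are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (mean must be non-zero)."""
    return statistics.stdev(values) / statistics.mean(values)

def skewness(values):
    """Moment-based skewness; requires at least two distinct values."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]        # hypothetical data, symmetric about 3
right_skewed = [1, 1, 1, 10]       # hypothetical data with a long right tail

print(coefficient_of_variation(symmetric))
print(skewness(symmetric), skewness(right_skewed))
```

A skewness of zero indicates a symmetric distribution, while a positive value indicates a longer tail on the right.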
Importance of Descriptive Statistics
Descriptive statistics are the basis of any quantitative data analysis process.
They give a simplified picture of the data set, no matter how large or complex
the data, and enable easy interpretation. They are the first step in describing
the data and its features. Their importance lies in being foundational: the
measures and values obtained through descriptive statistics are essential for
any advanced statistical analysis.
Descriptive statistics form the foundation of quantitative analysis of any set
of data. While a single indicator for a large set of data may obscure the
specifics of the values, it still delivers a convenient and usable summary that
indicates the relationship between the variables and allows for essential
comparisons.