Descriptive Statistics

Introduction

Quantitative analysis of a large collection of data relies on numerical computations that reveal the nature of the data collected and make its trends easier to interpret. Descriptive statistics and inferential statistics are the two methods used for this purpose.

Descriptive Statistics Defined

Descriptive statistics describe, show, and summarize the basic features of a dataset found in a given study, presented in a summary that describes the data sample and its measurements.
They help analysts understand the data better. Descriptive statistics represent the available data sample and do not include theories, inferences, probabilities, or conclusions. That's a job for inferential statistics.

Example: Analysts often use charts and graphs to present descriptive statistics. If you stood outside a movie theater, asked 50 members of the audience whether they liked the film they saw, and put your findings on a pie chart, that would be descriptive statistics. In this example, descriptive statistics count the yes and no answers and show how many people in this specific theater liked or disliked the movie.

Descriptive statistics describe or summarize the basic features or characteristics of the data. They assign numerical values that describe the trend of the samples collected, converting large volumes of data into a simpler, more meaningful format that is easier to understand and interpret. Paired with graphs and tables, descriptive statistics offer a clear summary of the complete collection of data.

The primary purpose of descriptive statistics is interpretation, while inferential statistics use the descriptive values obtained from a sample to draw conclusions or make predictions about a larger set of data.
Hence, descriptive statistics form the first step and the basis of quantitative data analysis.

Types of Descriptive Statistics

There are four major types of descriptive statistics used to measure the characteristics of a given data set.

A) Measures of Frequency

These measure how often a particular value occurs in the distribution. Frequency can be expressed as a count or a percentage and shows how frequently a response or value occurs.

B) Measures of Central Tendency

Measures of central tendency indicate the average or most common value in the data set. They locate the center of the data by computing the mean, median, and mode.

C) Measures of Variation or Dispersion

These show how spread out the responses in the data set are. They help identify the gap between the highest and lowest values and how far individual values lie from the mean.
Measures of variation are calculated using the range, standard deviation, and variance.

D) Measures of Position

These measure how individual values are positioned relative to one another. This method of calculation relies on a standardized value. Percentiles and quartile ranks are measures of position.

Methods Used in Descriptive Statistics

The various descriptive statistics
methods used to arrive at the characteristics of the data set include:

A) Mean

The mean is the average of all the values. It is calculated by adding up all the values and dividing the sum by the number of values.

Mean = Sum of values / Number of values

B) Median

The median is the value at the exact center of the set once the values are arranged in order. If there are two values at the center, their mean is the median.

C) Mode

The mode is the value that appears most frequently in the set. Arranging the values in order from lowest to highest helps identify the mode. A data set can have no mode, one mode, or multiple modes.

D) Range

The range is the difference between the highest and lowest values of the data set, calculated by subtracting the lowest value from the highest. The range indicates how far apart the values are.

E) Standard Deviation

Standard deviation measures the typical variability of the values in the data set, that is, how far individual values lie from the mean. A large standard deviation indicates high variability.

F) Variance

Variance measures the degree of spread in the data set and is the average of the squared deviations from the mean. Squaring the standard deviation gives the variance.

These methods can be used for
univariate analysis, bivariate analysis, or multivariate analysis as needed.

Univariate analysis considers only one variable at a time. This allows each variable in the data set to be examined using different measures of frequency, variation, and central tendency.

Bivariate analysis identifies any relationship between two different variables. The frequency and variability of the two variables are measured together to see whether they vary together. Measures of central tendency can also be taken during bivariate analysis.

Multivariate analysis is similar to bivariate analysis, with the exception that it takes more than two variables into account to identify any relationship between them.

Examples of Descriptive Statistics

The most important reason for the wide
use of descriptive statistics is that it makes a complex set of data easier to interpret by giving a convenient summary. Here are some examples where descriptive statistics help:

· Summarize the overall performance of an athlete in a tournament, for example in baseball. A batting average is the batter's number of hits divided by the number of at-bats.
· Indicate the overall performance of a student at school: a GPA, or grade point average, summarizes results across multiple tests and courses throughout the year.
· Identify the distribution of college students using different variables like year of study, gender, course, etc.
· Determine the demographics of a certain population in a city, state, or country. Descriptive statistics can identify the distribution of the population in terms of gender or occupation, the variance in income levels, etc.

Important Tools in Descriptive Statistics

Various descriptive statistics tools
can be called on for specific scenarios. Choosing the right tool depends
entirely on the objective of the analysis and the type and number of variables
at hand.

There are two categories of tools in descriptive statistics:

1. Numerical Tools: These include the various methods of calculation:
· Mean
· Median
· Mode
· Standard deviation
· Variance
· Range
· Coefficient of variation
· Skewness and kurtosis coefficients
· Quartiles
· Percentiles
· Contingency tables
· Frequency tables
· Correlation
· RV coefficient

2. Graphic Tools: These allow the representation of various data points as graphs or tables:
· Box plots
· Scatter plots
· Whisker plots
· Bar chart
· Pie chart
· Histogram
· Ternary diagram
· Correlation map
· Probability plot
· Strip plot

Importance of Descriptive Statistics

Descriptive statistics is the basis of
any quantitative data analysis process. It gives a simplified picture of the
data set, no matter how wide or complex the data, and enables easy
interpretation. It is the first step to describing the data and its features.
The importance of descriptive statistics lies in this foundational role: the measures and values obtained through descriptive statistics are essential for any advanced statistical analysis.

Descriptive statistics forms the foundation of quantitative analysis of any set of data. While a single indicator for a large set of data may obscure the specifics of individual values, it still delivers a convenient and usable summary that can indicate relationships between variables and allows for essential comparisons.
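
As a rough illustration of the methods listed under Methods Used in Descriptive Statistics, the sketch below computes the mean, median, mode, range, variance, and standard deviation with Python's standard statistics module. The scores list and the describe helper are made up for this example. It assumes the population forms (pvariance, pstdev), which match the definition of variance as the average of squared deviations from the mean; an analysis of a sample drawn from a larger population would typically use statistics.variance and statistics.stdev instead.

```python
import statistics


def describe(values):
    """Return basic descriptive measures for a non-empty list of numbers.

    Hypothetical helper for illustration only.
    """
    ordered = sorted(values)  # ordering makes the range easy to read off
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        # multimode returns every most-frequent value; if all values are
        # distinct it returns all of them (the "no mode" case)
        "modes": statistics.multimode(ordered),
        "range": ordered[-1] - ordered[0],
        # population variance: average of squared deviations from the mean
        "variance": statistics.pvariance(ordered),
        # population standard deviation: square root of that variance
        "std_dev": statistics.pstdev(ordered),
    }


# Hypothetical data set, not taken from any real study
scores = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(scores)

for name, value in summary.items():
    print(f"{name}: {value}")
```

For these eight values the mean is 5, the median is 4.5, the only mode is 4, the range is 7, the variance is 4, and the standard deviation is 2.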