The Marketing Research Guy: December 2022

Wednesday, December 14, 2022

Regression Analysis

Regression analysis is a statistical method used to determine the structure of the relationship between two variables (simple linear regression) or among three or more variables (multiple regression).

According to the Harvard Business School Online course Business Analytics, regression is used for two primary purposes:

1. To study the magnitude and structure of the relationship between variables

2. To forecast a variable based on its relationship with another variable

Both of these insights can inform strategic business decisions.

A regression is a statistical technique that relates a dependent variable to one or more independent (explanatory) variables.
A regression model is able to show whether changes observed in the dependent variable are associated with changes in one or more of the explanatory variables.
It does this by fitting a line of best fit (most commonly by ordinary least squares) and examining how the data points are dispersed around that line.
Regression helps economists and financial analysts with tasks ranging from asset valuation to forecasting.
Once you’ve generated a regression equation for a set of variables, you effectively have a roadmap for the relationship between your independent and dependent variables: if you input a specific X value into the equation, you can see the expected Y value.
This can be critical for predicting the outcome of potential changes, allowing you to ask, “What would happen if this factor changed by a specific amount?”
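To make the "input an X value, see the expected Y value" idea concrete, here is a minimal Python sketch that fits a simple linear regression by ordinary least squares. The data (ad spend vs. units sold) and the function names `fit_simple_linear_regression` and `predict` are hypothetical, chosen only for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Expected Y for a given X under the fitted line."""
    return intercept + slope * x


# Hypothetical data: monthly ad spend ($1,000s) and units sold (1,000s)
ad_spend = [1, 2, 3, 4, 5]
units_sold = [2.9, 5.1, 7.0, 8.8, 11.2]

a, b = fit_simple_linear_regression(ad_spend, units_sold)
print(f"units_sold = {a:.2f} + {b:.2f} * ad_spend")  # units_sold = 0.91 + 2.03 * ad_spend
print(f"Expected units at spend 6: {predict(a, b, 6):.2f}")  # 13.09
```

In Python 3.10 and later, the standard library's `statistics.linear_regression(x, y)` returns the same slope and intercept and could replace the hand-written function.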

For example, suppose a soft drink company wants to expand its manufacturing to a new location. Before moving forward, the company wants to analyze its revenue model and the various factors that might affect it, so it conducts an online survey with a specific questionnaire.

Regression analysis makes it easier for the company to analyze the survey results and understand the relationship between variables such as electricity and revenue, where revenue is the dependent variable. In addition, understanding how independent variables such as pricing, number of workers, and logistics relate to revenue helps the company estimate the impact of these factors on its sales and profits.
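With several independent variables, as in the soft drink example, the same least-squares idea becomes multiple regression. The sketch below, in plain Python with no third-party libraries, solves the normal equations (X^T X) b = X^T y. The variables (price, number of workers, a logistics score) and their "true" coefficients are invented for illustration; the revenue column is generated from those coefficients, so the fit should recover them almost exactly. Real survey data would contain noise, and in practice a statistics package would usually be used instead.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (variables may be collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_multiple_regression(rows, y):
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept term
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical observations: (price, number of workers, logistics score)
rows = [(2.0, 10, 3), (2.5, 12, 4), (3.0, 9, 5), (1.5, 15, 2), (2.2, 11, 6), (2.8, 14, 1)]
# Synthetic revenue built from assumed coefficients: 50 - 4*price + 2*workers + 3*logistics
revenue = [50 - 4 * price + 2 * workers + 3 * logistics for price, workers, logistics in rows]

coefficients = fit_multiple_regression(rows, revenue)
for name, value in zip(["intercept", "price", "workers", "logistics"], coefficients):
    print(f"{name:>10}: {value:8.3f}")
```

Each coefficient estimates how much revenue is expected to change when that variable rises by one unit while the others are held constant.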

Descriptive Statistics

Introduction

Quantitative analysis of a large collection of data is made possible by numerical computations that reveal the nature of the data collected and make its trends easier to interpret. Descriptive statistics and inferential statistics are the two methods used for this purpose.

Descriptive Statistics Defined

Descriptive statistics describe, show, and summarize the basic features of a dataset found in a given study, presented in a summary that describes the data sample and its measurements. It helps analysts to understand the data better.

Descriptive statistics represent the available data sample and do not include theories, inferences, probabilities, or conclusions. That’s a job for inferential statistics.

Example: Analysts often use charts and graphs to present descriptive statistics. If you stood outside of a movie theater, asked 50 members of the audience if they liked the film they saw, then put your findings on a pie chart, that would be descriptive statistics. In this example, descriptive statistics measure the number of yes and no answers and show how many people in this specific theater liked or disliked the movie.

Descriptive statistics describe or summarize the basic features or characteristics of the data. They assign numerical values that describe the trend of the samples collected, converting large volumes of data into a simpler, more meaningful format that is easier to understand and interpret. Paired with graphs and tables, descriptive statistics offer a clear summary of the complete data collection.

The primary purpose of descriptive statistics is interpretation, while inferential statistics make predictions about a larger set of data based on the descriptive values obtained. Hence, descriptive statistics form the first step and the basis of quantitative data analysis.

Types of Descriptive Statistics

There are four major types of descriptive statistics used to measure a given set of data characteristics.

A) Measures of Frequency

This measures how often a particular value or response occurs in the distribution. It can be expressed as a count or as a percentage.

B) Measures of Central Tendency

Measures of central tendency indicate the average or the most common value in the data set. They are computed using the mean, median, and mode.

C) Measures of Variation or Dispersion

This shows how spread out the responses in the data set are. It helps identify the gap between the highest and lowest values and how far individual values are from the mean. Measures of variation are calculated using the range, standard deviation, and variance.

D) Measures of Position

This describes where individual values are positioned relative to one another. These measures often rely on standardized values. Percentiles and quartiles are common measures of position.
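As a small illustration of two of these types, the sketch below computes a frequency table (counts and percentages) and some measures of position (quartiles, a percentile rank, and a z-score as the standardized value) for a hypothetical set of 1 to 5 survey ratings. Quartiles and percentile ranks have several competing conventions; this sketch assumes Python's `statistics.quantiles` default ("exclusive") method and a simple "percentage of values below x" definition.

```python
from collections import Counter
from statistics import mean, quantiles, stdev

# Hypothetical survey ratings (1 = very dissatisfied ... 5 = very satisfied)
ratings = [4, 5, 3, 4, 2, 5, 4, 3, 4, 1, 5, 4]

# Measure of frequency: how often each response occurs, as a count and a percentage
freq = Counter(ratings)
for value in sorted(freq):
    share = 100 * freq[value] / len(ratings)
    print(f"rating {value}: {freq[value]} ({share:.1f}%)")

# Measures of position
q1, q2, q3 = quantiles(ratings, n=4)  # quartile cut points, default "exclusive" method
print("quartiles:", q1, q2, q3)  # quartiles: 3.0 4.0 4.75


def percentile_rank(data, x):
    """Percentage of values strictly below x (one of several common conventions)."""
    return 100 * sum(v < x for v in data) / len(data)


def z_score(data, x):
    """Standardized position: number of sample standard deviations x lies from the mean."""
    return (x - mean(data)) / stdev(data)


print(f"percentile rank of 4: {percentile_rank(ratings, 4):.1f}")  # 33.3
print(f"z-score of 5: {z_score(ratings, 5):.2f}")
```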

Methods Used in Descriptive Statistics

The various descriptive statistics methods used to arrive at the characteristics of the data set include:

A) Mean

Mean is the average of all the values and can be calculated by adding up all the values and dividing the total sum by the number of values.

Mean = Sum of values/Number of values
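The formula translates directly into code. This minimal Python sketch uses a hypothetical set of five test scores and compares the hand-written version with the standard library's `statistics.mean`.

```python
from statistics import mean


def arithmetic_mean(values):
    """Mean = sum of values / number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


scores = [70, 85, 90, 65, 90]  # hypothetical test scores
print(arithmetic_mean(scores))  # 80.0
print(mean(scores))             # standard-library equivalent
```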

B) Median

The median is the value at the exact center of the set when the values are arranged in order. If there are two values at the center (an even number of values), their mean is calculated to find the median.
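A minimal sketch of that rule, covering both the odd-count case and the even-count case where the two central values are averaged. The function name `median_of` and the scores are hypothetical; the standard library's `statistics.median` follows the same rule.

```python
def median_of(values):
    """Center value of the sorted data; mean of the two center values for an even count."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([70, 85, 90, 65, 90]))  # 85   (sorted: 65, 70, 85, 90, 90)
print(median_of([70, 85, 90, 65]))      # 77.5 (sorted: 65, 70, 85, 90)
```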

C) Mode

The mode is the value that appears most frequently in the set. Arranging the values in order from lowest to highest helps identify the mode. Any data set can have no mode, one mode, or multiple modes.
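Because a data set can have zero, one, or several modes, a sketch can return a list. Here "no mode" is assumed to mean that no value repeats; that is one common convention rather than a universal rule. Python's `statistics.multimode` is similar, but it returns every value when none repeats.

```python
from collections import Counter


def modes(values):
    """All values that occur most often; [] when no value repeats (treated as 'no mode')."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([70, 85, 90, 65, 90]))  # [90]    one mode
print(modes([65, 70, 85]))          # []      no mode
print(modes([1, 1, 2, 2, 3]))       # [1, 2]  two modes
```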

D) Range

The range is the difference between the highest and lowest values in the data set. It can be calculated by subtracting the lowest value from the highest value. The range indicates how far apart the most extreme values are.
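The calculation is a one-liner. The sketch also shows, with hypothetical scores, how a single extreme value can change the range sharply, since the range depends only on the two most extreme observations.

```python
def data_range(values):
    """Range = highest value - lowest value."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


scores = [70, 85, 90, 65, 90]     # hypothetical test scores
print(data_range(scores))         # 25 (90 - 65)
print(data_range(scores + [20]))  # 70 (one low outlier widens the range)
```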

E) Standard Deviation

Standard deviation measures how far individual values typically are from the mean, i.e., the variability of the values in the data set. It is the square root of the variance. A large standard deviation indicates high variability.
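A minimal sketch of the calculation. One detail the definition leaves open: statistical software usually distinguishes the population standard deviation (divide the sum of squared deviations by n) from the sample standard deviation (divide by n - 1). The sketch computes both for the same hypothetical scores and compares them with `statistics.pstdev` and `statistics.stdev`.

```python
import math
from statistics import pstdev, stdev


def std_dev(values, sample=True):
    """Square root of the mean squared deviation (n - 1 divisor for a sample, n for a population)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough values")
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1 if sample else n))


scores = [70, 85, 90, 65, 90]  # hypothetical test scores
print(round(std_dev(scores, sample=False), 3))  # 10.488 (population)
print(round(std_dev(scores), 3))                # 11.726 (sample)
print(round(pstdev(scores), 3), round(stdev(scores), 3))  # same values from the standard library
```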

F) Variance

Variance measures the degree of spread in the data set and is the average of the squared deviations from the mean. Squaring the standard deviation gives the variance.
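The sketch below computes the variance step by step (deviations from the mean, squares, average) for the same kind of hypothetical scores, and takes the square root to recover the standard deviation. As with standard deviation, the sample version divides by n - 1 rather than n.

```python
import math

scores = [70, 85, 90, 65, 90]  # hypothetical test scores
n = len(scores)
mean_score = sum(scores) / n                            # 80.0
squared_devs = [(x - mean_score) ** 2 for x in scores]  # [100.0, 25.0, 100.0, 225.0, 100.0]

population_variance = sum(squared_devs) / n             # 110.0
sample_variance = sum(squared_devs) / (n - 1)           # 137.5

# Variance is the squared standard deviation, so its square root gives the SD back
print(population_variance, math.sqrt(population_variance))
print(sample_variance, math.sqrt(sample_variance))
```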

These methods can be used for univariate analysis, bivariate analysis, or multivariate analysis as needed.

Univariate analysis considers only one variable at a time. This allows each variable in the data set to be examined using measures of frequency, variation, and central tendency.

Bivariate analysis looks for a relationship between two different variables. The frequency and variability of the two variables are measured together to see whether they vary together. Measures of central tendency can also be computed during bivariate analysis.

Multivariate analysis is similar to bivariate analysis, except that it takes more than two variables into account to identify relationships among them.
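In bivariate analysis, one common summary of whether two variables vary together is Pearson's correlation coefficient r, which ranges from -1 to 1. The sketch computes r by hand for hypothetical data on hours studied and exam scores, then loops over every pair of three variables as a simple first look at the multivariate case. Correlation measures only linear association and does not by itself establish causation. Python 3.10 and later also provide `statistics.correlation`.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their individual spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for five students
data = {
    "hours_studied": [2, 4, 5, 7, 9],
    "exam_score": [65, 70, 85, 80, 95],
    "hours_slept": [8, 7, 6, 7, 5],
}

# Bivariate: one pair of variables
print(round(pearson_r(data["hours_studied"], data["exam_score"]), 3))  # 0.907

# Pairwise view of several variables: every pair among the three
for a, b in combinations(data, 2):
    print(f"{a} vs {b}: r = {pearson_r(data[a], data[b]):.2f}")
```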

Examples of Descriptive Statistics

The most important reason for the wide use of descriptive statistics is that it makes a complex set of data easier to interpret by giving a convenient summary. Here are some examples where descriptive statistics help:

· It indicates the overall performance of a sportsman in a tournament, such as in baseball, where a batting average is the batter's number of hits divided by the number of at-bats.

· A GPA or grade point average indicates the overall performance of a student at school across multiple tests and courses throughout the year.

· Identify the distribution of college students using different variables like year of study, gender, course, etc.

· Determine the demographics of a certain population in a city, state, or country. Descriptive statistics can identify the distribution of the population in terms of gender or occupation, the variance in income levels, etc.
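The first two examples are simple summary calculations. A minimal sketch with hypothetical numbers, assuming a 4-point grade scale (A = 4, B = 3, C = 2, D = 1, F = 0) and treating GPA as a credit-weighted mean; actual grading scales vary by institution.

```python
# Batting average = hits / at-bats (hypothetical season totals)
hits, at_bats = 150, 500
batting_average = hits / at_bats
print(f"batting average: {batting_average:.3f}")  # 0.300

# GPA as a credit-weighted mean of grade points (assumed 4-point scale)
grade_points = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}
courses = [  # (course, letter grade, credit hours), all hypothetical
    ("Statistics", "A", 4),
    ("Marketing", "B", 3),
    ("Economics", "A", 3),
    ("Writing", "C", 2),
]
total_points = sum(grade_points[grade] * credits for _, grade, credits in courses)
total_credits = sum(credits for _, _, credits in courses)
gpa = total_points / total_credits
print(f"GPA: {gpa:.2f}")  # 3.42 (41 points / 12 credits)
```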

Important Tools in Descriptive Statistics

Various descriptive statistics tools can be called on for specific scenarios. Choosing the right tool depends entirely on the objective of the analysis and the type and number of variables at hand.

There are two categories of tools in descriptive statistics:

1. Numerical Tools: These include the various methods of calculation:

· Mean

· Median

· Mode

· Standard deviation

· Variance

· Range

· Coefficient of variation

· Skewness and kurtosis coefficients

· Quartiles

· Percentiles

· Contingency tables

· Frequency tables

· Correlation

· RV coefficient

2. Graphic Tools: These allow the representation of various data points as graphs or tables:

· Box plots

· Scatter plots

· Whisker plots

· Bar charts

· Pie charts

· Histograms

· Ternary diagrams

· Correlation maps

· Probability plots

· Strip plots

Importance of Descriptive Statistics

Descriptive statistics is the basis of any quantitative data analysis process. It gives a simplified picture of the data set, no matter how large or complex the data, and enables easy interpretation. It is the first step in describing the data and its features. Its importance lies in its fundamental role: the measures and values obtained through descriptive statistics are essential inputs for any advanced statistical analysis.

Descriptive statistics form the foundation of quantitative analysis of any set of data. Although a single indicator for a large data set may hide the specifics of individual values, it still delivers a convenient, usable summary that can indicate relationships between variables and allows for essential comparisons.

SPSS: An Introduction

SPSS Statistics is a statistical software suite for data management, advanced analytics, multivariate analysis, business intelligence, and criminal investigation. Long produced by SPSS Inc., it was acquired by IBM in 2009, and current versions carry the brand name IBM SPSS Statistics.

What is SPSS? SPSS is a program, available for Windows as well as macOS and Linux, that can be used to perform data entry and analysis and to create tables and graphs. SPSS is capable of handling large amounts of data and can perform a wide range of analyses.

SPSS Statistics is used in education, market research, healthcare, government and retail throughout the entire analytics process, from planning and data collection to analysis, reporting and deployment.

SPSS offers the following four programs to help researchers with complex data analysis:

1. Statistics Program

SPSS’s Statistics program provides a large amount of basic statistical functionality, including frequencies, cross-tabulation, and bivariate statistics.

2. Modeler Program

Researchers are able to build and validate predictive models with the help of advanced statistical procedures.

3. Text Analytics for Surveys Program

It provides robust analysis of survey feedback, which in turn helps researchers form a vision for their actual plans.

4. Visualization Designer

Researchers can use the Visualization Designer to create a wide variety of visuals, such as density charts and radial box plots.

What Is Generalisability?

• Generalisability is the degree to which you can apply the results of your study to a broader context. Research results are considered generalisable when the findings can be applied to most contexts, most people, most of the time.

Example of Generalisability

Suppose you want to investigate the shopping habits of people in your city. You stand at the entrance to a high-end shopping street and randomly ask passersby whether they want to answer a few questions for your survey. Do the people who agree to help you with your survey accurately represent all the people in your city? Probably not. This means that your study can’t be considered generalisable.

The goal of research is to produce knowledge that can be applied as widely as possible. However, since it usually isn’t possible to analyse every member of a population, researchers make do by analysing a portion of it, making statements about that portion.

To be able to apply these statements to larger groups, researchers must ensure that the sample accurately resembles the broader population.

• In general, a study has good generalisability when the results apply to many different types of people or different situations. In contrast, if the results can only be applied to a subgroup of the population or in a very specific situation, the study has poor generalisability.

Why is generalisability important?

Obtaining a representative sample is crucial for probability sampling. In contrast, studies using non-probability sampling designs are more concerned with investigating a few cases in depth, rather than generalising their findings. As such, generalisability is the main difference between probability and non-probability samples.

There are three factors that determine the generalisability of your study in a probability sampling design:

• The randomness of the sample, with each research unit (e.g., person, business, or organisation in your population) having an equal chance of being selected.

• How representative the sample is of your population.

• The size of your sample, with larger samples more likely to yield statistically significant results.

• Generalisability is crucial for establishing the validity and reliability of your study. In most cases, a lack of generalisability significantly narrows down the scope of your research—i.e., to whom the results can be applied.

• However, research results that cannot be generalised can still have value. It all depends on your research objectives.
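The sample-size factor listed above can be made concrete with the margin of error for a proportion estimated from a simple random sample. This sketch assumes the common normal-approximation formula at 95% confidence (z = 1.96) with the worst case p = 0.5, and ignores the finite-population correction. It shows precision improving as n grows; note that a larger sample does not fix a sample that is non-random or unrepresentative.

```python
import math

Z_95 = 1.96  # z-value for 95% confidence (normal approximation)


def margin_of_error(n, p=0.5, z=Z_95):
    """Half-width of the confidence interval for a proportion from a simple random sample of size n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(target_margin, p=0.5, z=Z_95):
    """Smallest n whose margin of error is at most target_margin."""
    return math.ceil(z ** 2 * p * (1 - p) / target_margin ** 2)


# Each fourfold increase in n halves the margin of error
for n in (100, 400, 1600):
    print(f"n = {n:>4}: margin of error = ±{margin_of_error(n):.1%}")
print(required_sample_size(0.05))  # 385
```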

Types of generalisability

There are two broad types of generalisability:

• Statistical generalisability, which applies to quantitative research

• Theoretical generalisability (also referred to as transferability), which applies to qualitative research

Statistical generalisability: This is critical for quantitative research. Statistical generalisation is achieved when you study a sample that accurately mirrors the characteristics of the population. The sample needs to be sufficiently large and unbiased.

Theoretical generalisability: In qualitative research, statistical generalisability is not relevant, because qualitative research is primarily concerned with obtaining insights into some aspect of human experience rather than data with a solid statistical basis. By studying individual cases, researchers try to obtain results that they can extend to similar cases. This is known as theoretical generalisability or transferability.

Steps to Ensure Generalisability in Research

In order to apply your findings on a larger scale, you should take the following steps to ensure your research has sufficient generalisability.

• Define your population in detail. By doing so, you will establish what it is that you intend to make generalisations about. For example, are you going to discuss students in general, or students on your campus?

• Use random sampling. If the sample is truly random (i.e., everyone in the population is equally likely to be chosen for the sample), then you can minimise sampling bias and make it likely that the sample is representative of the population.

• Consider the size of your sample. The sample size must be large enough to support the generalisation being made. If the sample represents only a smaller group within that population, then the scope of the conclusions has to be narrowed accordingly.

• If you’re conducting qualitative research, try to reach a saturation point of important themes and categories. This way, you will have sufficient information to account for all aspects of the phenomenon under study.
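The random-sampling step can be sketched with Python's `random` module: `random.Random.sample` draws without replacement, so every unit in the sampling frame has the same chance of selection. The population below is entirely hypothetical (10,000 residents, 30% of whom shop on high-end streets), and the "convenience" sample is a deliberately extreme stand-in for the shopping-street survey in the earlier example, drawn only from high-end shoppers.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k units without replacement; each unit has the same k/N chance of selection."""
    if not 0 <= k <= len(population):
        raise ValueError("k must be between 0 and the population size")
    rng = random.Random(seed)  # seeded generator for reproducibility
    return rng.sample(population, k)


def share_high_end(people):
    """Proportion of people flagged as high-end shoppers."""
    return sum(person["high_end_shopper"] for person in people) / len(people)


# Hypothetical sampling frame: 10,000 residents, 30% of them high-end shoppers
population = [{"id": i, "high_end_shopper": i % 10 < 3} for i in range(10_000)]

random_sample = simple_random_sample(population, 500, seed=42)
# Extreme convenience sample: only people found on a high-end shopping street
convenience_sample = [p for p in population if p["high_end_shopper"]][:500]

print(f"population:         {share_high_end(population):.1%}")          # 30.0%
print(f"random sample:      {share_high_end(random_sample):.1%}")       # close to 30%
print(f"convenience sample: {share_high_end(convenience_sample):.1%}")  # 100.0%
```

The random sample lands near the population figure, while the convenience sample cannot, no matter how large it is.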