## Linear Regression Analysis

The term multiple regression (linear regression analysis with more than one predictor variable) was first used by Pearson in 1908. Multiple regression is a statistical technique that helps us understand the relationship between several independent (predictor) variables and a dependent variable (also called the criterion variable).

Let us take an example to understand this in some detail. Suppose that you are a researcher and you record data on 1000 MBA graduates: their gender, age, ethnicity, high school performance, parental income, work experience, and GMAT score. Once this information is collected, it is of interest to see how these variables relate to the GPA of each MBA graduate. For example, you might want to know which of GMAT score and parental income is the better predictor of GPA. You may also detect ‘outliers’, i.e. students whose GPA departs markedly from what their other details would predict, such as a student who should have scored a higher GPA. This information can be used in a multiple regression analysis to build a regression equation of the form:

GPA = 1.1 + 1.5 × age + 1.2 × ethnicity + 0.8 × gender + 1.9 × high_school_performance + 2.5 × parental_income + 3.1 × work_experience + 3.9 × GMAT_Score

Once this regression equation is determined, we can easily compute the expected GPA of an MBA graduate based on all the other information specified in the model. In addition, the regression model gives us a host of other information, such as how much higher or lower, on average, female MBA graduates score compared to male MBA graduates when the other variables are held constant.
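To make the idea concrete, here is a minimal sketch of fitting such an equation by ordinary least squares with NumPy. Everything in it is an assumption made for illustration: the data are synthetic, only three of the predictors are used, gender is coded as a 0/1 dummy variable named `female`, and the "true" coefficients are made-up numbers rather than results from any real study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # number of MBA graduates, as in the example above

# Synthetic predictors (all values are invented for illustration).
gmat = rng.normal(600, 50, n)            # GMAT score
work_experience = rng.uniform(0, 10, n)  # years of work experience
female = rng.integers(0, 2, n)           # dummy variable: 1 = female, 0 = male

# Synthetic GPA generated from assumed coefficients plus random noise.
gpa = (1.0 + 0.003 * gmat + 0.05 * work_experience + 0.10 * female
       + rng.normal(0, 0.2, n))

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones(n), gmat, work_experience, female])

# Ordinary least squares estimate of the regression coefficients.
coef, *_ = np.linalg.lstsq(X, gpa, rcond=None)
intercept, b_gmat, b_work, b_female = coef

# Expected GPA for a hypothetical female graduate with GMAT 650 and 3 years of experience.
new_graduate = np.array([1.0, 650.0, 3.0, 1.0])
expected_gpa = new_graduate @ coef

# Residuals (actual minus fitted GPA); large negative values flag graduates
# who scored well below what the model expects from their other details.
residuals = gpa - X @ coef
lowest = np.argsort(residuals)[:5]

print("coefficients:", coef)
print("expected GPA:", expected_gpa)
print("largest under-performers (row indices):", lowest)
```

In this sketch `b_female` estimates the average GPA difference between female and male graduates with GMAT score and work experience held fixed, and the residuals give one simple way to look for the outliers mentioned earlier.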

Multiple regression procedures are widely used in the social sciences. In general, multiple regression allows the researcher to explore the question “what is the best predictor of …”. In the context of the MBA example above, educational researchers might want to learn what the best predictors of success in B-school are.
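One common, though not the only, way to approach the "best predictor" question is to compare standardized coefficients, which are obtained by converting every variable to z-scores before fitting. The sketch below illustrates this for GMAT score versus parental income. The data are again synthetic, and the units and coefficient values are assumptions chosen purely for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Synthetic predictors on very different scales (invented values).
gmat = rng.normal(600, 50, n)            # GMAT score
parental_income = rng.normal(80, 20, n)  # assumed to be in thousands per year

# Synthetic GPA generated from assumed coefficients plus random noise.
gpa = 1.0 + 0.003 * gmat + 0.001 * parental_income + rng.normal(0, 0.2, n)

def zscore(v):
    """Rescale a variable to mean 0 and standard deviation 1."""
    return (v - v.mean()) / v.std()

# With every variable standardized, all means are zero, so no intercept column is needed.
Z = np.column_stack([zscore(gmat), zscore(parental_income)])
beta, *_ = np.linalg.lstsq(Z, zscore(gpa), rcond=None)
beta_gmat, beta_income = beta

# Each standardized coefficient is the change in GPA, in standard deviations,
# per one standard deviation change in that predictor.
print("standardized coefficient, GMAT:", beta_gmat)
print("standardized coefficient, parental income:", beta_income)
```

Because the coefficients are on a common scale, the predictor with the larger absolute value is the stronger one in this model. This comparison is only a heuristic and can mislead when predictors are strongly correlated with each other.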