Regression Analysis with Scikit-Learn (part 1 - Linear)
- Authors
- Topics:
This lesson is the first of two that focus on an indispensable pair of data analysis methods: linear and logistic regression. Linear regression models how one or more quantitative measures relate to, or predict, some other quantitative measure. A computational historian, for example, might use linear regression analysis to do the following:
- Assess how access to rail transportation affected population density and urbanization in the American Midwest between 1850 and 1860
- Interrogate the ostensible link between periods of drought and the stability of nomadic societies
Logistic and linear regression are perhaps the most widely used methods in quantitative analysis, including (but not limited to) computational history. They remain popular in part because:
- They are extremely versatile, as the above examples suggest
- Their performance can be evaluated with easy-to-understand metrics
- The underlying mechanics of model predictions are accessible to human interpretation (in contrast to many ‘black box’ models)
Reviewed by:
- Thomas Jurczyk
- Rennie C Mapp
Learning outcomes
After completing this lesson, you will be able to:
- Run linear regression algorithms in Python using the Scikit-learn library
- Validate models and assess their performance
- Interpret the results given by linear regression models
- Know which common pitfalls to avoid when conducting regression analysis
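To give a sense of what these outcomes look like in practice, here is a minimal sketch of fitting, validating, and interpreting a linear regression with Scikit-learn. It is not taken from the lesson itself: the data are synthetic (generated from an assumed relationship y = 3x + 5 plus random noise), and the variable names, the 75/25 train/test split, and the choice of R² and mean squared error as metrics are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration only):
# one predictor x, and an outcome y = 3x + 5 plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1.0, size=200)

# Hold out a quarter of the rows so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit an ordinary least squares model on the training rows.
model = LinearRegression()
model.fit(X_train, y_train)

# Validate: compare predictions against the held-out outcomes.
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

# Interpret: the slope estimates the change in y per one-unit change in x.
slope = model.coef_[0]
intercept = model.intercept_

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"R^2={r2:.3f}, MSE={mse:.3f}")
```

Because the data were generated from a known line, the fitted slope and intercept should land close to 3 and 5. With real historical data there is no known answer to check against, which is why held-out validation and careful interpretation of the coefficients matter.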
Check out this lesson on Programming Historian's website