Examples of regression data and analysis
The Excel files whose links are given below provide examples of linear and logistic regression analysis illustrated with RegressIt. Most of them include detailed notes that explain the analysis and are useful for teaching purposes.
Links for examples of analysis performed with other add-ins are at the bottom of the page. If you normally use Excel's own Data Analysis Toolpak for regression, you should stop right now and visit this link first.
To download an XLSX file from the options below, right-click the file link on a PC or Ctrl-click it on a Mac, choose the save-link-as option, and select a convenient folder. In Chrome or Edge you will also get a security warning message. Click the up-arrow or three dots at the right of the warning bar and choose the "keep" option.
1. Weekly beer sales: This example deals with price/demand relationships and illustrates the use of a nonlinear data transformation--the natural log--which is an important mathematical wrench in the toolkit of linear regression. Its analysis is described in detail on the Features pages, in the User Manual, and on the Statistical Forecasting site.
Beer_sales_with_analysis.xlsx
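The idea behind the log transformation can be sketched outside Excel. The following is a minimal Python illustration, not the analysis in the workbook: the prices and sales are made-up numbers generated from a known power-law demand curve, and `fit_line` is a hypothetical helper written for this sketch. When both variables are logged, the slope of the fitted line is the price elasticity of demand.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up prices and sales (NOT the data in the workbook), generated from
# sales = 5000 * price**(-2) so that the true elasticity is known to be -2.
prices = [14.0, 15.0, 16.0, 18.0, 19.0, 20.0]
sales = [5000.0 * p ** -2.0 for p in prices]

# Regress log(sales) on log(price): the slope is the price elasticity.
log_price = [math.log(p) for p in prices]
log_sales = [math.log(s) for s in sales]
intercept, elasticity = fit_line(log_price, log_sales)

print(f"elasticity = {elasticity:.3f}, scale = {math.exp(intercept):.1f}")
```

Because the relationship is a straight line only after logging, a regression on the raw prices and sales would fit this curve less well.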
2. Automobile fuel economy: This example from the 1983 ASA Data Expo is widely used in teaching and in the machine learning literature and is discussed on the Excel-to-R-and-back pages on this site. The objective is to predict a car's fuel consumption from its physical attributes and its country of origin.
AutoMPGmodels.xlsx
AutoMPG_R_models.xlsx
3. Yearly baseball batting averages: A good example of simple regression is the exercise of predicting a numerical measure of a professional athlete's performance in a given year by a linear function of his or her performance on the same measure in the previous year. Baseball batting averages are particularly good raw material for this kind of analysis because they are averages of almost-independent and almost-identically distributed random variables with large sample sizes, and they measure a skill that needs to be exhibited within acceptable limits by all players, not merely specialists at a position. A thorough discussion of this example can be found on the Statistical Forecasting site.
Baseball_batting_averages_with_analysis.xlsx
Baseball_player_statistics_1960-2004--larger_data_file_with_more_variables.xlsx (2.4M)
4. Monthly stock returns: This example illustrates a classic model in finance theory in which simple regression is used for estimating "betas" of stocks.
Stock_returns_with_analysis.xlsx
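As a rough sketch of what estimating a beta involves, the slope of a simple regression of a stock's returns on the market's returns equals their covariance divided by the variance of the market returns. The Python below assumes made-up monthly returns (not the data in the workbook) and a hypothetical helper named `estimate_beta`.

```python
def estimate_beta(stock_returns, market_returns):
    """Simple regression of stock returns on market returns.

    Returns (alpha, beta), where beta = cov(stock, market) / var(market).
    """
    n = len(market_returns)
    mean_m = sum(market_returns) / n
    mean_s = sum(stock_returns) / n
    cov = sum((m - mean_m) * (s - mean_s)
              for m, s in zip(market_returns, stock_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    beta = cov / var
    alpha = mean_s - beta * mean_m
    return alpha, beta

# Made-up monthly returns (NOT the workbook's data). The stock series is
# constructed as 0.001 + 1.5 * market, so the true beta is 1.5.
market = [0.02, -0.01, 0.03, 0.00, -0.02, 0.04]
stock = [0.001 + 1.5 * m for m in market]

alpha, beta = estimate_beta(stock, market)
print(f"alpha = {alpha:.4f}, beta = {beta:.2f}")
```

Real return data would scatter around the fitted line, so the standard error of the slope matters as much as the point estimate.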
5. Daily web site visitors: This data set consists of 3 months of daily visitor counts on an educational web site. There is a very strong day-of-week effect that provides a good opportunity for using dummy variables to capture a repeating time pattern. The analysis of this series is illustrated on the Forecasting page on this site.
Daily_web_site_visitors_with_analysis.xlsx
Web_site_visitors_2014-2020.xlsx (larger sample with complex seasonality, 2167 days, updated in August 2020)
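The day-of-week dummy variable approach can be sketched as follows. This is an illustrative Python example under stated assumptions, not the model in the workbooks: the visitor counts are made up, Monday is arbitrarily chosen as the omitted baseline day, and `least_squares` is a small hypothetical helper that solves the normal equations directly.

```python
def least_squares(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(row[i] * row[j] for row in rows) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for idx in range(k):
            if idx != col:
                factor = a[idx][col] / a[col][col]
                a[idx] = [v - factor * p for v, p in zip(a[idx], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
# Made-up typical visitor levels by day (NOT the workbook's data)
typical = {"Mon": 1000, "Tue": 1100, "Wed": 1080, "Thu": 1050,
           "Fri": 900, "Sat": 500, "Sun": 600}
week_offsets = [-20, 0, 20]  # three weeks with mild week-to-week variation

rows, visits = [], []
for offset in week_offsets:
    for day in DAYS:
        # Intercept plus six dummies; Monday is the omitted baseline day
        rows.append([1.0] + [1.0 if day == d else 0.0 for d in DAYS[1:]])
        visits.append(float(typical[day] + offset))

coefs = least_squares(rows, visits)
effects = dict(zip(["Intercept"] + DAYS[1:], coefs))
for name, value in effects.items():
    print(f"{name:9s} {value:8.1f}")
```

Only six dummies are used for seven days: the intercept estimates the baseline day's average, and each dummy coefficient is that day's average difference from the baseline.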
6. Fish catch (***new--February 2020***): This classic data set, obtained from the jse.amstat data archive, illustrates the use of regression to predict the weight of a fish from its physical measurements and its species. It's an extension of the standard model that is used in the fishery literature and provides another nice example of the use of dummy variables and the natural log transformation.
Fish_catch_data_with_analysis.xlsx
7. Monthly natural gas consumption in North Carolina: This data set consists of monthly natural gas consumption by end use (commercial, residential, etc.) in North Carolina from 2001 to 2015, together with various measures of average monthly temperature (daily mean/min/max and heating and cooling degree days). The objective is to predict the gas consumption variables from the temperature variables and/or month-of-year dummy variables.
NC_natural_gas_consumption_analysis.xlsx
8. Logistic regression examples
Titanic_logistic_models.xlsx (see the Titanic web page for discussion)
GLOW_logistic_models.xlsx (see the GLOW web page for discussion)
Email_logistic_models.xlsx
Logistic_example_Y_vs_X1.xlsx (example used in logistic regression notes pdf file)
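For readers who want to see what fitting a one-predictor logistic model involves, here is a minimal Python sketch. It is not RegressIt's implementation: the data are made up, `fit_logistic` is a hypothetical helper, and the fitting method shown is plain Newton-Raphson maximization of the log likelihood for the model P(Y=1) = 1 / (1 + exp(-(b0 + b1*X))).

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit P(Y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log likelihood with respect to (b0, b1)
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up data (NOT from the workbooks). The outcomes overlap, so the
# classes are not perfectly separable and the estimates stay finite.
x_values = [1, 2, 3, 4, 5, 6, 7, 8]
y_values = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x_values, y_values)
fitted = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x_values]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

The sketch assumes the two outcomes overlap in X. If they were perfectly separable, the likelihood would have no finite maximum and the iterations would not converge.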
9. Examples of analysis with other Excel add-ins: Analysis Toolpak, StatTools, Analyse-it, XLSTAT, SigmaXL, XLMiner, Unistat. See which ones you like.
What's wrong with Excel's own data analysis add-in for regression (Analysis Toolpak)
Comparison_of_Toolpak_and_RegressIt_models.xlsx
Comparison_of_add-ins_for_regression (slides from talk at International Symposium on Forecasting, 6/22/2015)
Beer_sales_with_StatTools_analysis.xlsx
Beer_sales_with_Analyse-it_analysis.xlsx
Beer_sales_with_XLSTAT_analysis.xlsx
Beer_sales_with_SigmaXL_analysis.xlsx
Examples_of_regression_forecasts_from_4_add-ins.xlsm (macro-enabled workbook)
XLMiner_Regression_Analysis.xlsx
Unistat_linear_and_logistic_regression.xlsx
Also: Stata then and now (regression output frozen in time)
If you have some examples of data analysis with RegressIt that you would like to share, please send them to feedback@regressit.com and we will be happy to consider them for posting here, with attribution.