RegressIt program features

General

- Free add-in for Excel for Windows that performs multivariate descriptive data analysis and multiple linear regression analysis.
- Written in Visual Basic, the program file is less than 500K in size and does not permanently install itself. It can be launched either by opening it from wherever it is stored or by clicking a link on this web site.
- Runs on PC's, and can also be used on a Mac at present with PC virtualization software such as VMware Fusion. A native Mac version is going to be released within a few months.
- Designed to support and and encourage best practices of data analysis by making it easy (even enjoyable) to do the right things. Makes a good complement to other statistical software anyone who is at least an occasional Excel user.
- Provides abundant table and chart output that is presentation-quality and in native Excel format. There is also an option to produce charts in bitmapped form.
- RegressIt can be operated simultaneously with other Excel statistics add-ins such as XLSTAT, Analyse-it, StatTools, and SigmaXL, effectively adding more menu options and better-designed tables and charts to their regression procedures. (See
**these slides**for a direct comparison of regression output produced by the different add-ins.) - Charts are formatted in accordance with good principles of statistical graphics, they are uniquely titled and descriptively labeled, they have properly scaled axes, and their point sizes are adjusted for the size of the data set. (Excel's default formatting is
*not*used.) As Excel chart objects, they can be instantly enlarged to fill the screen by moving them to individual worksheets, and they are subject to further editing or active linking after being copied to Word or Powerpoint files. - Numbers are formatted with enough decimal places but not too many, and they are lined up to the best degree possible in tables. An advantage of the Excel environment is that numbers are stored in full precision within cells, so they do not need to be formatted to show many more digits than are useful.
- Variables are defined as named column ranges, they can be up to 32766 rows in length, and they can have descriptive names up to 30 characters in length. All tables, charts, and dialog boxes are formatted to accommodate long variable names with full visibility.
- The output is well-organized and well-documented. The results of each data analysis or regression analysis are stored on a separate worksheet with an identifying analysis name. Default names indicate the chronological sequence (Data Analysis 1, Data Analysis 2, ..., Model 1, Model 2, etc.) but custom names can be assigned when procedures are executed.
- Output worksheets are pre-formatted for printing. The complete output of a regression model will fit on 8.5" wide paper and can be printed out in its entirety (usually 2 or 3 pages for the default tables and charts) by just hitting the Print button. It is also easy to display and print only selected tables and charts.
- A separate model summary worksheet provides an audit trail for all regression analyses performed in the same workbook, including a table of side-by-side comparisons of model statistics.
- When the program is launched, the user is given the opportunity to enter a personal and/or project name for the analysis session. Every data analysis and regression model worksheet is then stamped at the top with a bitmapped label containing the user name, analysis name, date, and run time. This helps to authenticate individual work and contributes to the overall audit trail.

Data analysis procedure

- Produces a table of descriptive statistics and correlation matrix and optional series plots and scatterplots for up to 50 variables. Autocorrelations at lags 1-12 are also provided for time series data. Click
**here**for an example. - Variables are sorted alphabetically by default in the table and chart arrays, but there is an option to designate a variable to list first, say, the dependent variable in a multiple regression analysis or a common predictor for several dependent variables.
- The series plots, which are stacked one above the other, can show the data as points, lines, points-and-lines, or vertical bars.
- The correlation matrix uses font shading to highlight the most and least significant values, and column names are staggered down the diagonal to allow long variable names to be fully visible. When the dependent variable for a regression model is selected as the variable to list first, the first column of the correlation matrix shows its correlations with all the other variables right next to their names.
- The scatterplot matrix can be either a full square array or it can be limited to those plots in which the variable-to-list-first appears on either the X-axis or Y-axis. Each plot in the matrix is a separate Excel chart that can be independently moved, copied, re-sized, and edited.
- The bottom titles of scatterplots include the correlation and either its squared value or the slope coefficient.
- Scatterplot axes are scaled so that their limits are the minimum and maximum values of the variables, to spread the points out as fully as possible, and those values (plus the midpoints of the ranges) are shown on the axis scales to provide additional summary statistics for the viewer.
- The scatterplots may also optionally include regression lines and/or center-of-mass points.

Regression procedure

- Can handle up to 50 independent variables.
- The regression output worksheet is formatted to place the most important information at the top with as much as possible in view at one time. Click
**here**for an example of the default view of the top of the worksheet for a simple regression model. - The title of every regression table and graph is automatically labeled with the analysis name, dependent variable name, sample size, and number of independent variables. This is helpful for presentation purposes and makes it easy to trace any piece of output back to its source if it is copied elsewhere.
- R-squared and adjusted R-squared, standardized coefficients, t-statistics, F-statistics, and P-values are calculated with live formulas for educational purposes.
- Confidence limits in tables and charts are also calculated with live formulas, allowing the confidence level (stored in cell I10) to be interactively adjusted after fitting the model, which is useful for demonstrations and preparing reports.

- Standard chart output includes plots of actual and predicted values vs. row number (with confidence limits for forecasts, if any), residuals vs. observation number, residuals vs. predicted values, and a residual histogram chart. A line fit plot with confidence bands is also produced for a simple (1-variable) regression model. Click
**here**for an annotated example.

- Optional chart output includes a normal quantile plot and residuals vs. independent variable plots.

- The residuals vs. observation number plot is formatted as a column chart to highlight signs and time patterns.

- Statistical tests of residuals include the adjusted Anderson-Darling statistic and, for time series data, the Durbin-Watson statistic and autocorrelations at lags 1-7 and 12. The Anderson-Darling statistic is also printed on the residual histogram and normal quantile plots and the lag-1 autocorrelation and Durbin-Watson statistic (if present) are also printed on the residuals vs. observation number plot.
- When the
*regression*procedure is launched from an existing regression model sheet, the starting variable selections are the ones used in that model, making it easy to tweak any previously fitted model and explore the model space in a systematic way. - When the
*data analysis*procedure is launched from an existing regression model sheet, the starting variable selections are the ones used in that model, with the dependent variable designated as the first variable, so that a descriptive data analysis report can be instantly produced to match the variables and sample used in the regression model. - The regression model sheet includes an optional residual table in which, by default, rows are sorted in descending order of absolute value, so that outliers appear at the top. The table can be re-sorted by row number using only a few keystrokes, if desired.

Forecasting procedure

- If the "Forecasts for missing or additional values of the dependent variable" box is checked, a forecast table will be included on the regression model output worksheet, and forecasts for any rows where the independent variables were all present and the dependent variable was missing will be produced automatically.

- Additional forecasts can be produced after fitting the model by entering values for the independent variables into additional rows in the forecast table and then hitting the Forecast button on the RegressIt toolbar.

- Forecasts are displayed in a table and also plotted, together with confidence limits for means and forecasts. The actual-and-predicted-versus-observation-number chart will also show the forecasts and their confidence limits if the forecast table and chart are not hidden.
- If the default editable-graphs option was used, the confidence limits shown in the forecast table and charts will respond interactively to changes of the value in the confidence-level cell at the top of the worksheet.

Data transformation procedure

- Allows variables to be easily transformed with a variety of commonly used functions and, in the case of time series data, time transformations.
- Mathematical transformations include natural log, exponential, square root, power, standardization with either the sample or population standard deviation, and creation of dummy variables from either numeric or text data.
- Time transformations include lag by k periods, difference from k periods ago, difference of natural log from k periods ago, percentage change from k periods ago, deflation at a fixed rate, and centered and trailing moving averages.
- Each created variable is automatically assigned a descriptive name that concatenates the variable name and the transformation, such as X_LN for natural log of X or X_LAG1 for X lagged by 1 period, and it is stored in a new column that is inserted immediately to the right of its parent variable. This provides a clear audit trail and also ensures that the transformed variables appear adjacent to their parent variables when their names are alphabetically sorted in tables of statistics or coefficient estimates or in chart arrays.

Model summary worksheet

Click

Also be sure to visit the

- Contains side-by-side comparisons of summary statistics and coefficient estimates (with P-values) for all regression models fitted in the same workbook. This serves as an audit trail for the contents of the workbook as well as a tool for easily comparing model statistics.
- Provides a well-formatted presentation of comparative results from multiple models that is suitable for directly copying into a slide or report.
- Results for models with different dependent variables are arranged in separate blocks of rows, i.e., statistics are arranged side-by-side only for models fitted to the same dependent variable.

Click

**here**to take a tour of RegressIt's various procedures, beginning with descriptive data analysis.Also be sure to visit the

**Advice**page for more detailed information and instructions on how to take full advantage of RegressIt's features.