Here are some additional details of RegressIt's inputs and outputs, as well as some advice on how to use it, to go along with the list of

**features**presented earlier. If you have encountered any problems, see the**tech support**page for more information.**1. Be sure you are using the latest version of RegressIt.**It is currently version 2.2.2, released on May 7. To check your version number, click the Info ("i") button on right end of the RegressIt toolbar. Updating is easy: just replace your existing RegressIt.xlam file with the one on the**download page.**We periodically make enhancements, and they are always backward-compatible.**2. Take advantage of the ability to enter user/session names and analysis names:**You are encouraged to enter an identifying user name and/or session name at the time RegressIt is launched and also to assign your own descriptive names to data analyses and regression analyses at the time they are run, at least for those you think you will wish to keep. The user/session name is stamped in bitmap form at the top of every analysis worksheet, and the analysis name serves as the worksheet name and is also included in the title of every table and chart on a regression output worksheet, as shown in this**example of regression output**. (Default names are of the generic form "Data Analysis n" and "Model n" for the n-th data analysis or regression model.) These program features help to make all of your output self-documenting and presentation-ready to the greatest extent possible. A tiny amount of attention to the naming of sessions and models enhances the audit trail, makes the workbook easier to navigate, and reduces the possibility of mistakes and oversights.

**3. Visualize your data.**Don't just look at test statistics and their P-values. Take advantage of RegressIt's high-quality chart output to see what your variables look like, how their patterns line up, how well the model accounts for those patterns, what adjustments to the data or the model may be needed, and how best to illustrate your findings for others.**RegressIt makes it easy to explore variations on regression models, and it is not uncommon to create a large number of model worksheets in the same workbook. It's OK to delete the ones that are inferior: their summary statistics and coefficient estimates are stored on the model summary worksheet, so they will remain part of the audit trail.**

4. You don't have to save the worksheet for every regression model:4. You don't have to save the worksheet for every regression model:

**Normally it is best to use the default editable-graphs option when running a data analysis or regression model. When this is done, each chart object contains the data that it displays, and it is subject to arbitrary editing and reformatting. Also, editable graphs require less storage space than bitmapped graphs for sample sizes up to around 1000. If your data set is much larger than that, you may wish to turn off the editable-graphs option, at least during the preliminary phase of the analysis when many models are being explored, so as to keep file sizes more manageable**

5. Editable vs. bitmapped graphs:5. Editable vs. bitmapped graphs:

**.**

6.6.

**Running the data analysis procedure immediately after a regression yields matching descriptive statistics:**If you launch the data analysis procedure from a regression model worksheet, the default variable selections are those of the regression model, with the dependent variable designated as the variable to list first in the table and chart arrays. This allows a descriptive statistics report to be instantly generated for the same variables and sample used in a given regression model.**7. The confidence level can be adjusted interactively after fitting a model:**The confidence level that is stored in cell I10 on a regression model sheet can be interactively adjusted after the model is fitted, and all confidence limits in tables will be updated accordingly. If the default editable-graphs option has been chosen, confidence bands on charts will also be updated.**On the output sheet for a simple regression model, the formulas for calculating the regression line and confidence bands on the line fit plot are located behind the plot itself, in a nicely formatted table. If you grab the line fit plot and drag it to the right, you can see the table. It shows the calculations of predictions as well as standard errors and confidence limits for both means and predictions for 5 equally spaced values of the independent variable. The cells in the table contain live formulas, so you can plug in different values for the independent variable if you wish to see the corresponding results for them. It is easy to rescale the axes on the plot to show new values that are outside the original ranges of the variables if necessary: just right-click on an axis scale, choose "Format Axis" from the pop-up menu, and set the minimum and maximum and crossing point to "Auto."**

8. Regression line formulas are behind the line fit plot:8. Regression line formulas are behind the line fit plot:

**9. Check the "Time series statistics" box when working with time series data:**Both the data analysis and regression procedure menus include a check-box for time series statistics. You should generally check this box when working with time series data. It has several functions. First, it causes a table of autocorrelations to be added to a data analysis output worksheet and a table of residual autocorrelations to be added to a regression model output worksheet. (See items 15 and 16 below for more about these.) Second, it activates the time transformation options (such as lag, difference, percent-change, etc.) in the variable transformation procedure. If you have not checked the time series statistics box first, the menu of options that you see when you click the variable-transformation button will not include time transformations. And third, for a regression model, checking this box will cause connecting lines to be drawn between points on the actual-and-predicted-vs-observation-number chart.**10. How to copy and paste tables and charts:**Many tables and charts on regression model worksheets contain numbers computed with live formulas. This allows the interactive adjustment of confidence limits, as noted above, and it has instructional value in showing the equations by which various statistics are computed, including R-squared and adjusted R-squared, t-stats, F-stats, P-values, standardized regression coefficients, standardized residuals, and standard errors of residual autocorrelations. It also has important implications for copying and pasting results. If you copy a table from the output of a regression model to another part of the Excel worksheet, you should paste the results as*values*rather than formulas, so that links do not get misdirected.*For best results, use the Paste-Special/Values-and-Number-Formats command when pasting a table of regression statistics, and immediately afterward, while the range of pasted cells is still selected, also execute the Paste-Special/Formats command to get identical text formatting such as boldface column headings.*If you are pasting a table or chart to Word or Powerpoint, you can use any of the paste options or the paste-link option depending on how much or little further editing updating you wish to be able to do. You can even embed the Excel file in a Powerpoint file by pasting any portion of it as an Excel object, although this is not recommended for large data sets. It is generally best to use one of the*picture*formats when pasting tables and charts into Word or Powerpoint files in order to preserve the integrity of the results. If you wish to edit a chart for presentation, we recommend that you make a second copy of it in the Excel file, edit it there as desired, and then copy and paste the edited version to your document in picture form.**Each data analysis or regression output worksheet is pre-formatted for printing in 8.5" width to a printer or pdf file, and it can all be printed at once by just hitting the print button. However, it is possible that page breaks will occur in the middle of tables or charts if they are not adjusted beforehand. This is easy to do: just go to View/Page Break Preview on the Excel menu. You will then see a reduced-size image of the worksheet on which page breaks (if any) are indicated by blue horizontal dashed lines. If one of these lines falls in the middle of a table or chart, just grab it and drag it upward, then adjust any others below it if needed. Then click the Normal button on the View toolbar to return to the normal worksheet view, and proceed with your printing.**

11. How to adjust page breaks in the print area:11. How to adjust page breaks in the print area:

**It is often desirable to enlarge an editable chart to fill an entire sheet, using the move-or-copy sheet command. The charts are formatted so that they look good by default when enlarged in this way, and they are subject to further editing if needed. It helps to be systematic when doing this, in order to preserve a clean audit trail in the workbook. The following approach is recommended. First, use the ordinary copy-and-paste command to make a second copy of the chart on the original worksheet. Then move the copied chart to a new worksheet by right-clicking on it and choosing Move from the pop-up menu. Immediately after doing this,**

12. The best way to move charts to new sheets:12. The best way to move charts to new sheets:

*move the new chart worksheet to a position to the right of the model summary worksheet by right-clicking its name tab, choosing Move-or-Copy from the pop-up menu**, and scrolling down the "Before Sheet" list to select a place at the bottom.*This will serve two purposes. First, it will keep all the enlarged charts together, which makes it easier to find them and page back and forth among them. Second, and more importantly, it will prevent the worksheet counter from getting confused about where to insert the next data analysis or regression sheet. If you insert a new chart sheet in the middle of the existing data analysis and regression worksheets, any additional data analysis or regression worksheets will be out of order. It is also advisable to rename a chart worksheet immediately after it is created. The default name it is assigned will be "Chart XXX", where XXX is a large number that is determined by the total number of individual charts on all the analysis worksheets that have been created so far. You may want to assign it a more descriptive name such as "Model 2 line fit".**On the data analysis output worksheet, the print width is automatically set to whatever is needed to show the summary statistics table, the autocorrelation table (if any), and the series plots (if any), with full visibility. If the analysis contains a large number of variables, it is possible that the correlation matrix or the scatterplot matrix (if any) will not fit in the same width. If that is the case, and you wish to print them out, you should select their areas individually before printing. In the case of a scatterplot matrix, you are usually only interested in a subset of it anyway. Note that because you have the option to designate the variable to be listed first in the table and chart arrays (say, the dependent variable for the regression models), it is always possible for the correlations and scatterplots which involve that variable to be positioned at the left edge of the default print area.**

13. How to print large correlation matrices and scatterplot matrices:13. How to print large correlation matrices and scatterplot matrices:

**When (and only when) the "Forecasts for missing or additional values" box is checked at the time a regression model is fitted, a forecast table and chart will be created on the output worksheet, and forecasts will be calculated by default for every row in which the dependent variable is missing and the independent variables are all present. If the editable-graphs box was also checked, it is possible to compute additional forecasts after fitting the model: just enter values for the independent variables in one or more additional rows in the forecast table, beneath their respective names, and hit the "Forecasting" button on the RegressIt toolbar. If the forecast table and chart are maximized (i.e., unhidden), the forecasts will also be plotted on the actual-and-predicted-vs-observation-number chart. The forecast table and chart can also be minimized by clicking the "-" box next to them, in which case the forecasts will also disappear from the actual-and-predicted chart.**

14. Manual vs. automatic calculation of forecasts:14. Manual vs. automatic calculation of forecasts:

**When the time series statistics box is checked for a regression model, the output includes a table of residual autocorrelations. Standard errors for the autocorrelations are also calculated, and they are stored in the second row below the autocorrelations themselves, although**

15. Standard errors of residual autocorrelations are available:15. Standard errors of residual autocorrelations are available:

*by default their font color is set to white*in order to hide them so as to prevent visual clutter. If you want to see them, just highlight this row of cells and click the font color button on the Excel toolbar. The standard error of the lag-k autocorrelation is 1/SQRT(n-k), where n is the sample size, and an autocorrelation is significantly different from zero at the 0.05 level if its absolute magnitude is greater than 2 standard errors.**When the time series statistics box is checked for a data analysis, a table of autocorrelations is produced. (It is minimized by default: click the "+" in the sidebar to show it.) The labels in column A of the autocorrelation table are formatted for the easy creation of well-labeled column charts, which is the way autocorrelations are normally plotted. For example, if Y is the name of one of the variables, its row title in the table is "Autocorrelations of Y". To create a nicely formatted autocorrelation plot with that character string as the chart title, proceed as follows. Highlight the row containing the autocorrelations, including the row title in column A. Click the Insert/Column-chart button on the Excel toolbar to create a column chart from the data. Then delete the legend, right-click on the vertical axis scale, choose "Format Axis", and change the limits to +/- 1. This takes about 30 seconds.**

16. How to create an autocorrelation chart in the data analysis procedure:16. How to create an autocorrelation chart in the data analysis procedure:

**When a regression model is run, it is possible to have its predicted values and residuals stored in a table at the bottom of the model sheet, sorted in descending order of absolute value of the residuals by default so as to highlight the largest errors. It is also possible to have their values written back to the data worksheet, where they will be stored in new columns inserted right next to the dependent variable. However, there is no menu command for computing or storing the corresponding standard errors and confidence limits. (These are calculated for out-of-sample predictions, if any, but not for those within the sample.) If you want to do it anyway, it is not hard, provided the data set is not too large. First, copy the entire data range on the data worksheet and use the Paste-Special/Values command to paste the copy immediately below the original range, so that you now have a duplicate set of values for each variable, i.e., twice as many rows and the same number of columns. Second, go to Formulas/Name Manager on the Excel menu and delete all the existing range names. Third, highlight the entire new data range (including the duplicate rows for all the columns) and hit the Create-Variable-Name button to re-assign the original names to the now-twice-as-long variables. Finally,**

17. How to calculate standard errors and confidence intervals for all the predictions:17. How to calculate standard errors and confidence intervals for all the predictions:

*delete the duplicate values for the dependent variable*. Then run or re-run your models using the "Forecast missing or additional values" option. Forecasts and confidence limits for the missing values of the dependent variable in the extra rows will be computed automatically and shown in the forecast table. (Note: these calculations can be time-consuming for large data sets, so with those you will probably not want to turn the forecasting option on for every model.)**There are no universally accepted formulas for calculating R-squared and adjusted R-squared for a regression model that does not include a constant, as explained**

18. Calculation of R-squared and adjusted R-squared for a regression with no constant:18. Calculation of R-squared and adjusted R-squared for a regression with no constant:

**here**, and some software packages do not even report them. RegressIt follows the same convention used by SPSS for no-constant models, namely: R-squared is defined as 1 minus {the sum of squared residuals divided by the sum of squared values of the dependent variable}, and adjusted R-squared is defined as 1 minus {the square of the standard error of the regression divided by the mean squared value of the dependent variable}. These formulas appear in the cells for R-squared and adjusted R-squared on the worksheet, and the regression statistics table shows the root-mean-squared value of the dependent variable rather than its standard deviation. You should generally ignore them, however, and in particular do not try to compare them to the corresponding statistics for models that do include a constant.**There are many different statistical tests for normally distributed errors. The one that is used in RegressIt is the adjusted Anderson-Darling statistic, which is considered by many to be the best for use in regression analysis over a wide range of sample sizes. The formulas for computing it, along with a discussion, can be found**

19. Calculation and interpretation of the adjusted Anderson-Darling statistic:19. Calculation and interpretation of the adjusted Anderson-Darling statistic:

**here**, and an Excel worksheet that illustrates the calculations can be found**here**. However, it is often too much to expect that a regression model's errors will be normally distributed enough to satisfy this test or any of the others, so it should not be used obsessively. In fact, the normality assumption is not strictly necessary in regression analysis except for purposes of calculating confidence intervals for coefficient estimates and predictions. If the Anderson-Darling statistic raises a red flag, the shape of the distribution of the errors (as revealed by the residual histogram and normal quantile plots) and the most extreme errors (which appear by default at the top of the residual table on the regression model worksheet) should be studied carefully to determine whether there is a systematic departure from normality or whether a few very large errors are to blame, and in the latter case, whether those very large errors had a lot of leverage with respect to estimation of the coefficients and whether they were due to circumstances that are likely to be repeated.**The make-dummy-variable tool in the variable transformation procedure can be used to create dummy (0-1) variables for all distinct values of a given input variable. The input variable may have either text or integer values. The names that are automatically assigned to the dummy variables are of the form X_EQ_zzz, where X is the input variable name and zzz is one of its unique values. The cells in the data ranges for the created variables contain**

20. Creating dummy variables:20. Creating dummy variables:

*formulas*to compute the 0's and 1's, not hard-coded values. (This is done in order to leave an audit trail and also allow for editing of the input values later.) These formulas are of the form =IF(TEXT(inputvalue,"0") = zzz, 1, 0). For example, if you have a variable called Month that contains month-of-year data coded in text form (January, February, ...), the names of dummy variables created from it would be Month_EQ_January, Month_EQ_February, etc. If the Month variable is in column A, with data beginning in row 2, then the formula for the value of the Month_EQ_January variable in row 2 would be =IF(TEXT(A2,"0") = "January", 1, 0). The TEXT(.,"0") function rounds off numeric values. This means that*if the input values are numbers that are not integers, the rounded-off values will not match the actual data, and the values in the column will probably consist entirely of 0's*. If the data range for the dummy variables is huge (i.e., if you have a large data set and a large number of dummy variables), you may wish to convert the formulas to values by selecting the range, copying it to the clipboard, and then using the Paste-Special/Values command to paste it back on top of itself in value format.**If you click on the cell containing the run time of a model on the model summary worksheet, a pop-up box will appears that shows the elapsed time needed to perform the analysis and produce the table and chart output. See the**

21. Elapsed times are available on the model summary worksheet:21. Elapsed times are available on the model summary worksheet:

**Model Summary**page on this site for an example.