RegressIt features for R users (and not-yet-R-users)
RegressIt has a novel interface for exchanging models between Excel and RStudio, which allows you to take advantage of the analysis options of both environments. You can use Excel as a menu-driven front end for fitting linear and logistic regression models in RStudio, with no writing of R code, and you can use RStudio as a back end for producing output in Excel, while at the same time getting customized output in RStudio that is more detailed and better formatted than the default outputs of the lm and glm procedures. For a demonstration of how this feature works, watch this 4-minute video: RegressIt_Excel_to_R_interface.mp4. The analysis is described in detail in the next few web pages in this sequence. For instructions in hard-copy form, see this document: RegressItInterfaceWithR.pdf. If you have not previously installed or used R, see the how-to-install-R page.
The code that RegressIt's menu interface exports to RStudio can send output back to Excel on a new worksheet formatted exactly like those produced within Excel. The table output produced within RStudio has also been customized to look similar to what is seen in Excel. This allows you to take advantage of R's additional modeling features such as stepwise variable selection, out-of-sample forecasting, and cross-validation while carrying out all your analysis in the Excel environment. The following flow chart shows the sequence of steps, in which only a few clicks are needed:
RegressIt has a novel interface for exchanging models between Excel and RStudio, which allows you to take advantage of the analysis options of both environments. You can use Excel as a menu-driven front end for fitting linear and logistic regression models in RStudio, with no writing of R code, and you can use RStudio as a back end for producing output in Excel, while at the same time getting customized output in RStudio that is more detailed and better formatted than the default outputs of the lm and glm procedures. For a demonstration of how this feature works, watch this 4-minute video: RegressIt_Excel_to_R_interface.mp4. The analysis is described in detail in the next few web pages in this sequence. For instructions in hard-copy form, see this document: RegressItInterfaceWithR.pdf. If you have not previously installed or used R, see the how-to-install-R page.
The code that RegressIt's menu interface exports to RStudio can send output back to Excel on a new worksheet formatted exactly like those produced within Excel. The table output produced within RStudio has also been customized to look similar to what is seen in Excel. This allows you to take advantage of R's additional modeling features such as stepwise variable selection, out-of-sample forecasting, and cross-validation while carrying out all your analysis in the Excel environment. The following flow chart shows the sequence of steps, in which only a few clicks are needed:
It is not necessary to know anything about writing R code in order to use these features, but you will see it in action and have the opportunity to tweak it, and you will get a gentle introduction to the RStudio environment. The code is written by RegressIt and stored in a script file that can be executed immediately by hitting Ctrl-V and Enter at the command prompt in RStudio, which runs a single line of code that runs the script. The script file also leaves an audit trail and could be edited and re-run later if you wish.
The models you run in R can start from selections of variables that were used in models previously fitted with RegressIt, but they don't have to. The Excel file does not even need to contain the full data set. It only needs to have a worksheet with matching variable names (Excel range names), not the actual data, so you are not constrained by its 1M row limit. Even if you are not ordinarily an Excel user, this could be helpful when building models with very large numbers of variables, so that no editing of long equations is needed in R. The model sheets in Excel that are generated from the R code provide an audit trail (including side-by-side statistical comparisons on the model summaries worksheet) and they can be used to interactively add or remove variables and re-launch the models. We welcome your comments and suggestions for enhancement of this feature.
The Generate-R-Code dialog box is shown below and illustrated with examples on the next page. It offers you the following options:
A picture of the R code options dialog box is shown below, with its options for table and chart output, variable selection, model testing and cross-validation, and exporting formatted output back to Excel. Go on to the next page: Excel to R.
The models you run in R can start from selections of variables that were used in models previously fitted with RegressIt, but they don't have to. The Excel file does not even need to contain the full data set. It only needs to have a worksheet with matching variable names (Excel range names), not the actual data, so you are not constrained by its 1M row limit. Even if you are not ordinarily an Excel user, this could be helpful when building models with very large numbers of variables, so that no editing of long equations is needed in R. The model sheets in Excel that are generated from the R code provide an audit trail (including side-by-side statistical comparisons on the model summaries worksheet) and they can be used to interactively add or remove variables and re-launch the models. We welcome your comments and suggestions for enhancement of this feature.
The Generate-R-Code dialog box is shown below and illustrated with examples on the next page. It offers you the following options:
- A choice between linear and logistic regression for the dependent and independent variables selected in RegressIt.
- An a-la-carte selection of tables and charts.
- A choice between using all variables or a subset chosen by forward or backward stepwise selection, with or without tracing the steps in the output produced.
- A variety of options for training and testing, all of which can be combined any variable selection method:
- Fit a single model to all data or a subset thereof with no out-of-sample testing
- Fit a single model to a specified subset of the data and test on another subset (not necessarily all-other)
- Perform k-fold cross-validation using randomly generated folds or an integer variable in the Excel file to define the folds.
- Fit models to disjoint subsets of the data (with the sets defined by an integer variable in the Excel file) with or without testing each one on the rest of the data.
- Fit models to random subsets of the data (with a choice of how much data to hold out and how many iterations to run) and test each one on the remaining data
- Options for exporting well-formatted tables of predictions for the fitted set and/or test set. They are placed on the clipboard and the program pauses to give you the opportunity to paste them to the Excel file or elsewhere.
- In all of the options that involve fitting multiple models, a table of comparative error measures of all models is produced, and it is automatically copied to the clipboard for pasting to another location.
A picture of the R code options dialog box is shown below, with its options for table and chart output, variable selection, model testing and cross-validation, and exporting formatted output back to Excel. Go on to the next page: Excel to R.