Ribbon interface: navigating the 4 dimensions of an Excel file
Contrary to popular belief, Excel is an excellent environment in which to carry out regression analysis, particularly in settings where data visualization is important and where alternative models are developed and evaluated by multiple individuals: analysts, clients, students, instructors. First of all, nearly everyone has it and knows at least a little about how to use it: it's the lingua franca of desktop analysis. Second, its spreadsheet format is well suited for high-quality table and chart output that is ready-made for presentation and sharing. But what is especially good is that it is intrinsically a 4-dimensional environment, and these dimensions map neatly into the tasks of generating and cataloging alternative models, traversing their output, and drilling down below the surface. However, Excel's own interface is clumsy for navigating this environment. It doesn't provide a roadmap of a file's contents, and it requires tedious and unstructured tabbing and paging and scrolling in order to locate and align results for viewing. RegressIt's workbook organization, worksheet layouts, cell formatting, and ribbon interface are designed for the efficient use and navigation of all four dimensions. The following sections on this web page describe the functions of the 3 groups of buttons that the ribbon provides for exploring the model space, the output worksheet, and the contents of cells.
Model space
The first dimension is the worksheet dimension, which represents the model space. An Excel file may contain a large stack of worksheets, and each worksheet can be used to store all the output of a single descriptive analysis or a single regression model, each of which could include many tables and charts. However, to take full advantage of this representation, it's necessary to have tools for navigating among worksheets in simple and systematic ways. Excel's own navigation system is a merely collection of tabs at the bottom of the screen, and the tab names may provide little detail of what the worksheets contain or how they are logically or temporally related, and only a small number of tabs may be visible at once. There's a tradeoff between having worksheet names that are long enough to be descriptive yet short enough to keep many tabs in view at one time.
In regression analysis it is common to create new models as refinements or extensions of existing models, and while doing this it is good to keep the whole landscape in view and be able to retrace the steps that may have led to a particular result. When there are many variables, the number of inter-related models that are tested could be quite large. And often the end stage of the analysis comes down to detailed comparisons of a few leading candidates among models, whose worksheets could be located anywhere in the file, and there are many aspects of their goodness of fit to the data that may need to be compared. It is also common for work to be performed at different times and on different computers, particularly where collaboration or consulting or instruction are involved, and it is important to keep track of what was done when and where and by whom.
The model space navigation buttons on the Regressit ribbon provide tools to systematize these activities. The VCR buttons allow the user to move one worksheet to the left or right and jump directly to the first sheet (the data sheet) or the last sheet (the Model Summaries sheet) at any time. The Model Summaries worksheet is one of the distinctive features of the program. It shows side by side comparisons of key outputs of all models fitted to any given dependent variable, presented in a journal-article-style table that is suitable for inclusion in a report. Results for models fitted to different dependent variables are stored in different blocks of rows. This worksheet provides a comprehensive audit trail of all models that have ever been fitted in the workbook, even those whose own output worksheets were later deleted. The amount of displayed detail can also be varied: you can show only the summary information for each model, or you can include diagnostics and coefficient estimates. It is also possible to jump directly to a model's worksheet if it is is present, or re-run the model if its worksheet is not present, by launching the regression procedure from the cell above the model's name in this table. Thus, all models in the analysis history can be revisited or regenerated from this one worksheet with just a few mouse clicks.
The additional model space navigation buttons perform other important tasks that are unique among Excel add-ins. The History button displays a list of all worksheets currently in the file, with descriptive information about each one, such as the name of the dependent variable and the number of independent variables and the goodness-of-fit statistics for a regression model, as well as its run time and the name of the computer on which it was created. Descriptive statistics worksheets also appear on the list. The user can jump directly to any worksheet from here, and the list can be kept open on the screen while doing so. It thus provides a compact summary of the entire contents of the workbook, as well as a random-access navigation tool.
The audit trails provided by the history list and the model summaries worksheet are useful for documenting and traversing your own work, and they are especially useful for an instructor who is evaluating the thoroughness and authenticity of work by students. They also allow the instructor to easily tweak a student's models when carrying out grading or giving feedback.
The Relatives button allows the user to move among regression models by parent-child relationships, which is useful for retracing the sequence in which variables were added or deleted and for starting off in another direction from some earlier point. It also provides yet another view of the audit trail.
The Last-Model button allows the user to toggle back and forth between the last two models whose worksheets were viewed, no matter where they are in the file, and the Compare button allows the user to zero in on the same table or chart in the output of both models while doing this. After clicking the Compare button, you can use the Last-Model button to do instant back-and-forth comparisons of the same forecast chart or residual chart produced by two models. The Last-Stats button works similarly to toggle back and forth to the last descriptive statistics worksheet that was viewed.
Another very powerful tool in Excel is the Filter tool, which allows the interactive sorting of tables on arbitrarily selected columns. RegressIt's ribbon includes this tool, which can be used for many purposes in manipulating regression model output and refining models. For example, it can be used to sort the coefficient table on the basis of P-values or standardized coefficients in order to bubble the most or least significant variables to the top. The ribbon also includes a Remove tool which can be used to flag a variable for removal from a model merely by clicking on its row in the coefficient table, which causes it to be grayed-out. When the table has been sorted in the basis of P-values or standardized coefficients, this makes it easy to eliminate insignificant variables individually or in blocks without having to search for them in the main alphabetical variable list. The Filter tool can also be used to sort a list of correlations between the dependent variable and a set of candidate independent variables in order to preselect the ones that are most promising, and it can be used for interactive sorting of the residual table on the basis of any of its statistics, such as absolute standardized residual or leverage.
RegressIt can also exchange regression model code with R, allowing it to be used as a menu-driven front or back end. If the R code button is clicked while positioned on a model worksheet, the code for running the same model in R is copied to the clipboard, from which it can be pasted to the command prompt in RStudio and re-run by hitting Enter. The user is also given a menu of options for table and chart output to include in the R code, such as a standard chart array with model-specific titles attached. Models fitted in R can also be re-run in RegressIt with just a few clicks by copying the text string for the model equation from R to any cell in the workbook and then launching a regression model from there, adding RegressIt's Excel-based output options to those of R. Thus, RegressIt and R can be used to leverage each other.
Altogether these tools provide a detailed aerial view of the contents of the workbook and their history as well as efficient methods of navigating and comparing and refining and exporting models.
The first dimension is the worksheet dimension, which represents the model space. An Excel file may contain a large stack of worksheets, and each worksheet can be used to store all the output of a single descriptive analysis or a single regression model, each of which could include many tables and charts. However, to take full advantage of this representation, it's necessary to have tools for navigating among worksheets in simple and systematic ways. Excel's own navigation system is a merely collection of tabs at the bottom of the screen, and the tab names may provide little detail of what the worksheets contain or how they are logically or temporally related, and only a small number of tabs may be visible at once. There's a tradeoff between having worksheet names that are long enough to be descriptive yet short enough to keep many tabs in view at one time.
In regression analysis it is common to create new models as refinements or extensions of existing models, and while doing this it is good to keep the whole landscape in view and be able to retrace the steps that may have led to a particular result. When there are many variables, the number of inter-related models that are tested could be quite large. And often the end stage of the analysis comes down to detailed comparisons of a few leading candidates among models, whose worksheets could be located anywhere in the file, and there are many aspects of their goodness of fit to the data that may need to be compared. It is also common for work to be performed at different times and on different computers, particularly where collaboration or consulting or instruction are involved, and it is important to keep track of what was done when and where and by whom.
The model space navigation buttons on the Regressit ribbon provide tools to systematize these activities. The VCR buttons allow the user to move one worksheet to the left or right and jump directly to the first sheet (the data sheet) or the last sheet (the Model Summaries sheet) at any time. The Model Summaries worksheet is one of the distinctive features of the program. It shows side by side comparisons of key outputs of all models fitted to any given dependent variable, presented in a journal-article-style table that is suitable for inclusion in a report. Results for models fitted to different dependent variables are stored in different blocks of rows. This worksheet provides a comprehensive audit trail of all models that have ever been fitted in the workbook, even those whose own output worksheets were later deleted. The amount of displayed detail can also be varied: you can show only the summary information for each model, or you can include diagnostics and coefficient estimates. It is also possible to jump directly to a model's worksheet if it is is present, or re-run the model if its worksheet is not present, by launching the regression procedure from the cell above the model's name in this table. Thus, all models in the analysis history can be revisited or regenerated from this one worksheet with just a few mouse clicks.
The additional model space navigation buttons perform other important tasks that are unique among Excel add-ins. The History button displays a list of all worksheets currently in the file, with descriptive information about each one, such as the name of the dependent variable and the number of independent variables and the goodness-of-fit statistics for a regression model, as well as its run time and the name of the computer on which it was created. Descriptive statistics worksheets also appear on the list. The user can jump directly to any worksheet from here, and the list can be kept open on the screen while doing so. It thus provides a compact summary of the entire contents of the workbook, as well as a random-access navigation tool.
The audit trails provided by the history list and the model summaries worksheet are useful for documenting and traversing your own work, and they are especially useful for an instructor who is evaluating the thoroughness and authenticity of work by students. They also allow the instructor to easily tweak a student's models when carrying out grading or giving feedback.
The Relatives button allows the user to move among regression models by parent-child relationships, which is useful for retracing the sequence in which variables were added or deleted and for starting off in another direction from some earlier point. It also provides yet another view of the audit trail.
The Last-Model button allows the user to toggle back and forth between the last two models whose worksheets were viewed, no matter where they are in the file, and the Compare button allows the user to zero in on the same table or chart in the output of both models while doing this. After clicking the Compare button, you can use the Last-Model button to do instant back-and-forth comparisons of the same forecast chart or residual chart produced by two models. The Last-Stats button works similarly to toggle back and forth to the last descriptive statistics worksheet that was viewed.
Another very powerful tool in Excel is the Filter tool, which allows the interactive sorting of tables on arbitrarily selected columns. RegressIt's ribbon includes this tool, which can be used for many purposes in manipulating regression model output and refining models. For example, it can be used to sort the coefficient table on the basis of P-values or standardized coefficients in order to bubble the most or least significant variables to the top. The ribbon also includes a Remove tool which can be used to flag a variable for removal from a model merely by clicking on its row in the coefficient table, which causes it to be grayed-out. When the table has been sorted in the basis of P-values or standardized coefficients, this makes it easy to eliminate insignificant variables individually or in blocks without having to search for them in the main alphabetical variable list. The Filter tool can also be used to sort a list of correlations between the dependent variable and a set of candidate independent variables in order to preselect the ones that are most promising, and it can be used for interactive sorting of the residual table on the basis of any of its statistics, such as absolute standardized residual or leverage.
RegressIt can also exchange regression model code with R, allowing it to be used as a menu-driven front or back end. If the R code button is clicked while positioned on a model worksheet, the code for running the same model in R is copied to the clipboard, from which it can be pasted to the command prompt in RStudio and re-run by hitting Enter. The user is also given a menu of options for table and chart output to include in the R code, such as a standard chart array with model-specific titles attached. Models fitted in R can also be re-run in RegressIt with just a few clicks by copying the text string for the model equation from R to any cell in the workbook and then launching a regression model from there, adding RegressIt's Excel-based output options to those of R. Thus, RegressIt and R can be used to leverage each other.
Altogether these tools provide a detailed aerial view of the contents of the workbook and their history as well as efficient methods of navigating and comparing and refining and exporting models.
Table and chart space
The second and third dimensions of the Excel modeling environment are rows and columns on individual worksheets, which are naturally suited to table output interspersed with chart output, with no a priori limits on the number of columns in tables and matrices and no arbitrary line breaks or page breaks inserted.
A single worksheet may contain a large number of tables and charts, and Excel's own scrolling and paging tools are not efficient for locating and positioning them on the screen for viewing. Line-by-line vertical scrolling can be very slow when sample sizes are large, as editable charts or high-resolution charts are continuously redrawn. The page-up and page-down keys are also inefficient because they don't keep tables and charts aligned with the margins of the viewing area. The worksheet navigation buttons on the RegressIt ribbon are designed to deal with all of these issues. The Up and Down buttons allow the worksheet to be navigated up and down by a whole table or chart at a time, so that the topmost one in view is always aligned with the top of the screen, while minimizing the time lag for re-drawing. The Top button jumps to the top left cell in the worksheet, restoring its original viewpoint. Hitting the Compare button immediately afterward restores the original viewpoints of all worksheets in the file.
Because of the quantity and detail of the output on a worksheet, it is often useful to zoom in and out while navigating up and down or focusing on details. The Zoom buttons can be used to zoom the viewpoint in and out in predefined steps, and they affect all worksheets at once, so that the zoom level remains consistent when jumping between worksheets. This is also useful when transferring work between computers with different monitor or projector resolutions. You can re-set the whole workbook to a desired worksheet zoom level with a few mouse clicks.
The Show buttons allow the user to show or hide all tables or all charts at once, or everything at once. This can speed up navigation and it allows visual clutter to be reduced when needed. Every table or chart also has a descriptive title row above it on the worksheet, and these can be displayed or not by toggling the Titles button. If the title rows are displayed while all tables and charts are hidden, the user sees an outline view of the contents of the worksheet and can open up one table or chart at a time. The title rows are very useful to include when copying tables to reports or slide presentations, because they identify the model from which the table was obtained and enable it to be traced to its source in the Excel file, another thread in the audit trail.
The Conf+ and Conf- buttons interactively increase or decrease the confidence level used in the displayed confidence limits for coefficient estimates and forecasts, and they do so in pre-defined increments. Thus, it is easy to twiddle the confidence level after fitting a model in order to see its effects on interval widths, which can be useful for demonstration and also for breaking the obsession with 95%.
There Utilities group at the far right of the ribbon includes a button for toggling gridlines on and off, which can be used in situations where it is desired to adjust column widths. Ordinarily this is not necessary, though, because numbers in tables are formatted to show enough but not too many decimal places and because tables and matrices are designed so that long variable names don't overlap each other. You can use quite long variable names in RegressIt without worrying that the output will be poorly formatted for viewing or printing.
Altogether these worksheet navigation tools (and the design of the worksheet itself) greatly simplify the user's study of potentially large amounts of table and chart output for a given model.
The second and third dimensions of the Excel modeling environment are rows and columns on individual worksheets, which are naturally suited to table output interspersed with chart output, with no a priori limits on the number of columns in tables and matrices and no arbitrary line breaks or page breaks inserted.
A single worksheet may contain a large number of tables and charts, and Excel's own scrolling and paging tools are not efficient for locating and positioning them on the screen for viewing. Line-by-line vertical scrolling can be very slow when sample sizes are large, as editable charts or high-resolution charts are continuously redrawn. The page-up and page-down keys are also inefficient because they don't keep tables and charts aligned with the margins of the viewing area. The worksheet navigation buttons on the RegressIt ribbon are designed to deal with all of these issues. The Up and Down buttons allow the worksheet to be navigated up and down by a whole table or chart at a time, so that the topmost one in view is always aligned with the top of the screen, while minimizing the time lag for re-drawing. The Top button jumps to the top left cell in the worksheet, restoring its original viewpoint. Hitting the Compare button immediately afterward restores the original viewpoints of all worksheets in the file.
Because of the quantity and detail of the output on a worksheet, it is often useful to zoom in and out while navigating up and down or focusing on details. The Zoom buttons can be used to zoom the viewpoint in and out in predefined steps, and they affect all worksheets at once, so that the zoom level remains consistent when jumping between worksheets. This is also useful when transferring work between computers with different monitor or projector resolutions. You can re-set the whole workbook to a desired worksheet zoom level with a few mouse clicks.
The Show buttons allow the user to show or hide all tables or all charts at once, or everything at once. This can speed up navigation and it allows visual clutter to be reduced when needed. Every table or chart also has a descriptive title row above it on the worksheet, and these can be displayed or not by toggling the Titles button. If the title rows are displayed while all tables and charts are hidden, the user sees an outline view of the contents of the worksheet and can open up one table or chart at a time. The title rows are very useful to include when copying tables to reports or slide presentations, because they identify the model from which the table was obtained and enable it to be traced to its source in the Excel file, another thread in the audit trail.
The Conf+ and Conf- buttons interactively increase or decrease the confidence level used in the displayed confidence limits for coefficient estimates and forecasts, and they do so in pre-defined increments. Thus, it is easy to twiddle the confidence level after fitting a model in order to see its effects on interval widths, which can be useful for demonstration and also for breaking the obsession with 95%.
There Utilities group at the far right of the ribbon includes a button for toggling gridlines on and off, which can be used in situations where it is desired to adjust column widths. Ordinarily this is not necessary, though, because numbers in tables are formatted to show enough but not too many decimal places and because tables and matrices are designed so that long variable names don't overlap each other. You can use quite long variable names in RegressIt without worrying that the output will be poorly formatted for viewing or printing.
Altogether these worksheet navigation tools (and the design of the worksheet itself) greatly simplify the user's study of potentially large amounts of table and chart output for a given model.
Cell content space
The fourth dimension of the Excel environment is one that is under-appreciated, namely the multiple attributes and layers of information that can be stored within individual cells. The cell layer buttons on RegressIt's ribbon provide access to what is underneath.
A cell may contain a variety of things. It may contain a text label or a number, and underneath there could be either a fixed value or a formula. In RegressIt, many of the numerical cells on a regression model output worksheet (and some of the text cells as well) contain formulas rather than values. The formulas serve two purposes. First, they allow the interactive manipulation of confidence intervals for coefficients and forecasts that are displayed in tables and charts, and second, they show the formulas by which many other regression statistics are calculated. In particular, the formulas for computing R-squared and adjusted R-squared, the standard error of the regression, standardized coefficients, t-statistics, F-statistics, and P-values are all shown inside their respective cells. These can be used as an aid in teaching and demystifying the equation logic of regression models.
Cells may also contain comments, which are paragraphs of text that pop up when the user moves the mouse over them, when their indicators (little red flags) are turned on. In RegressIt, cell comments are used to display fairly detailed teaching notes attached to the headers for most the key outputs of a regression model. They are not included by default, but as an option selected at run time. Their flags can be toggled on and off via the Notes button on the ribbon. The View button can be used to display the comment in the currently selected cell and to position it at the top center of the screen rather than off to the side, which works well on computers with small screens or touchscreens. There are presently around 10 thousand words of teaching notes, and more are planned. Let us know if you would be interested in substituting your own notes in an institutional version.
Cell comments are also used to display more layers of detail behind coefficient estimates on the model summaries worksheet. Regardless of how a displayed coefficient is formatted on this sheet (standardized or unstandardized, with or without P-value), the point estimate, standard error, t-statistic, P-value, variance inflation factor, and standardized coefficient are stored within the attached cell comment. And behind the model name on a model worksheet or model summaries worksheet is a comment that contains detailed information about the circumstances under which the model was run, yet another link in the audit trail.
Cells also have attributes that include fill colors and font colors, which are toggled on and off via the Colors and Fonts buttons. These can used to highlight signs and magnitudes of numbers such as correlations and autocorrelations, t-statistics, and standardized coefficients. They serve to focus attention on those numbers that are most significant and to make it easier to spot patterns among them. A pre-defined color scale that ranges from dark blue (very significantly positive) to dark red (very significantly negative) is used.
So, there is much more than first meets the eye in the contents of many cells on a RegressIt output worksheet.
The fourth dimension of the Excel environment is one that is under-appreciated, namely the multiple attributes and layers of information that can be stored within individual cells. The cell layer buttons on RegressIt's ribbon provide access to what is underneath.
A cell may contain a variety of things. It may contain a text label or a number, and underneath there could be either a fixed value or a formula. In RegressIt, many of the numerical cells on a regression model output worksheet (and some of the text cells as well) contain formulas rather than values. The formulas serve two purposes. First, they allow the interactive manipulation of confidence intervals for coefficients and forecasts that are displayed in tables and charts, and second, they show the formulas by which many other regression statistics are calculated. In particular, the formulas for computing R-squared and adjusted R-squared, the standard error of the regression, standardized coefficients, t-statistics, F-statistics, and P-values are all shown inside their respective cells. These can be used as an aid in teaching and demystifying the equation logic of regression models.
Cells may also contain comments, which are paragraphs of text that pop up when the user moves the mouse over them, when their indicators (little red flags) are turned on. In RegressIt, cell comments are used to display fairly detailed teaching notes attached to the headers for most the key outputs of a regression model. They are not included by default, but as an option selected at run time. Their flags can be toggled on and off via the Notes button on the ribbon. The View button can be used to display the comment in the currently selected cell and to position it at the top center of the screen rather than off to the side, which works well on computers with small screens or touchscreens. There are presently around 10 thousand words of teaching notes, and more are planned. Let us know if you would be interested in substituting your own notes in an institutional version.
Cell comments are also used to display more layers of detail behind coefficient estimates on the model summaries worksheet. Regardless of how a displayed coefficient is formatted on this sheet (standardized or unstandardized, with or without P-value), the point estimate, standard error, t-statistic, P-value, variance inflation factor, and standardized coefficient are stored within the attached cell comment. And behind the model name on a model worksheet or model summaries worksheet is a comment that contains detailed information about the circumstances under which the model was run, yet another link in the audit trail.
Cells also have attributes that include fill colors and font colors, which are toggled on and off via the Colors and Fonts buttons. These can used to highlight signs and magnitudes of numbers such as correlations and autocorrelations, t-statistics, and standardized coefficients. They serve to focus attention on those numbers that are most significant and to make it easier to spot patterns among them. A pre-defined color scale that ranges from dark blue (very significantly positive) to dark red (very significantly negative) is used.
So, there is much more than first meets the eye in the contents of many cells on a RegressIt output worksheet.
Intelligent formatting of numbers and tables.
In many statistical analysis languages or packages, numbers are routinely displayed with far more digits than are meaningful, and tables in which they appear are formatted with no consistent alignment of decimal points and mangled by arbitrary line breaks, because there is no other way store the results at full precision in text output with a fixed page width. This is an obstacle to careful study of the values, and it is an insult to the eye of the user. The spreadsheet environment allows values to be displayed with an appropriate number of digits for viewing while storing them at full precision underneath, and there are no a priori constraints on numbers of columns or characters per line. In tables produced by RegressIt, numbers are formatted to show enough but not too many decimal places of precision, with digits to the right of the decimal point added in groups of three to keep alignments as simple as possible, and the tables are designed so that no adjustments of column widths are needed in order to avoid cutting off long variable names in row or column headings. Even other Excel add-ins generally do not take these issues into account.
In many statistical analysis languages or packages, numbers are routinely displayed with far more digits than are meaningful, and tables in which they appear are formatted with no consistent alignment of decimal points and mangled by arbitrary line breaks, because there is no other way store the results at full precision in text output with a fixed page width. This is an obstacle to careful study of the values, and it is an insult to the eye of the user. The spreadsheet environment allows values to be displayed with an appropriate number of digits for viewing while storing them at full precision underneath, and there are no a priori constraints on numbers of columns or characters per line. In tables produced by RegressIt, numbers are formatted to show enough but not too many decimal places of precision, with digits to the right of the decimal point added in groups of three to keep alignments as simple as possible, and the tables are designed so that no adjustments of column widths are needed in order to avoid cutting off long variable names in row or column headings. Even other Excel add-ins generally do not take these issues into account.