Logistic regression


A separate version of the new program, called

**RegressItLogistic** (which runs on PCs only) performs both linear and logistic regression analysis. It has the same ribbon and menu interfaces as RegressItPC and RegressItMac and the same presentation-ready table and chart design, but with additional analysis tools and output options that are specialized for logistic models. Like the other two versions of the program, it can also export R code, so a logistic model previously fitted in RegressIt can be re-run in R in a matter of seconds, with an assortment of outputs.

The logistic version of RegressIt has interactive output features that you will not find in other software. They have been built into the program because one of the objectives of logistic regression analysis is often to explore the tradeoff between the two kinds of classification error (false positives and false negatives), and it is helpful to have immediate visual feedback when doing so. This tradeoff is relevant to the choice of an appropriate "cutoff value" for converting probabilistic predictions into categorical predictions when needed. More generally, the table and chart designs help you to visualize your data, to better understand the mathematical properties of a logistic model, and to be prepared to illustrate them in front of others in a teaching or consulting environment. These features are demonstrated in two examples on the web pages that follow. One example is an analysis of the famous

**Titanic data set** that was the subject of a **Kaggle** data science competition. For comparison, the use of R to fit a similar logistic regression model to the same data can be found at **this link**. The other example is an analysis of the **GLOW data set** that is studied in detail in the classic logistic regression textbook by Hosmer and Lemeshow.

For those who aren't already familiar with it, logistic regression is a tool for making inferences and predictions in situations where the dependent variable is

**binary**, i.e., an indicator for an event that either happens or doesn't. For quantitative analysis, the outcomes to be predicted are coded as 0s and 1s, while the predictor variables may have arbitrary values. An ordinary linear regression model is inappropriate in this situation. You can try it, but the results are often illogical or inapplicable: the errors of a linear regression model fitted to binary data cannot be normally distributed nor identically distributed for all values of the predictions, the predictions and confidence limits can stray outside the unit interval, and such a model is incapable of finely discriminating among probabilities that are close to 0 or 1, which can be important in situations where there are small but non-zero chances of events with large consequences.

A logistic regression model approaches the problem by working in units of

**log odds** rather than probabilities. Let p denote the predicted probability of an event's occurrence. The corresponding log odds value is LogOdds = LN(p/(1-p)), where LN is the natural log function. The inverse relationship is p = EXP(LogOdds)/(1+EXP(LogOdds)), where EXP is the exponential function. As the value of p goes from 0 to 1, the corresponding value of log odds goes from minus infinity to plus infinity, and vice versa. A probability of 1/2 corresponds to a log odds value of 0, and in general the log odds value for probability p is minus the log odds value for probability 1-p.

In a logistic model, a linear equation is used to generate predictions and confidence limits in units of log odds, which are converted back to units of probability by the formula above. The predicted probabilities for the binary dependent variable (and their associated confidence limits) smoothly approach 0 and 1 as values of the independent variables go to their own extremes. If you plot the predictions against a single continuously distributed independent variable while holding the others fixed, you see an S-shaped or reverse-S-shaped "logistic curve."
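The log odds transformation and its inverse can be sketched in a few lines of Python (the function names here are illustrative, not part of any particular package):

```python
import math

def log_odds(p):
    """Convert a probability p (strictly between 0 and 1) to log odds."""
    return math.log(p / (1 - p))

def probability(log_odds_value):
    """Convert a log odds value back to a probability."""
    return math.exp(log_odds_value) / (1 + math.exp(log_odds_value))

# A probability of 1/2 corresponds to a log odds value of 0:
print(log_odds(0.5))                 # 0.0
# The log odds of p is minus the log odds of 1 - p:
print(log_odds(0.8), log_odds(0.2))  # about 1.386 and -1.386
# The two functions are inverses of each other:
print(probability(log_odds(0.9)))    # 0.9 (up to rounding)
```

Note the symmetry: probabilities equidistant from 1/2 map to log odds values of equal magnitude and opposite sign, which is why the logistic curve is symmetric about p = 1/2.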

In some applications of logistic regression the objective of the analysis is to come up with a model to generate a predicted probability of the dependent event under a given set of values for the independent variables. In other settings the objective is to make categorical predictions: a definite 1 or a definite 0 in each case. Usually this is done by adding one more parameter to the model: a cutoff value. If the predicted probability of a positive outcome is greater than the cutoff value, it is categorically predicted to be a 1, otherwise 0. The chosen cutoff value ought to depend on the relative consequences of type 1 and type 2 errors (false positives and false negatives), although a value of 0.5 is often used by default.
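The role of the cutoff value can be illustrated with a small sketch; the data and cutoff values below are hypothetical, chosen only to show the tradeoff:

```python
def classify(predicted_probs, cutoff=0.5):
    """Convert predicted probabilities into categorical 0/1 predictions."""
    return [1 if p > cutoff else 0 for p in predicted_probs]

def error_counts(actual, predicted):
    """Count false positives (predicted 1, actual 0) and false negatives
    (predicted 0, actual 1)."""
    fp = sum(1 for a, yhat in zip(actual, predicted) if a == 0 and yhat == 1)
    fn = sum(1 for a, yhat in zip(actual, predicted) if a == 1 and yhat == 0)
    return fp, fn

actual = [0, 0, 1, 1, 1]
probs  = [0.3, 0.6, 0.4, 0.7, 0.9]

# With the default cutoff of 0.5:
print(error_counts(actual, classify(probs)))        # (1, 1)
# Raising the cutoff eliminates the false positive here:
print(error_counts(actual, classify(probs, 0.65)))  # (0, 1)
# Lowering it eliminates the false negative instead:
print(error_counts(actual, classify(probs, 0.35)))  # (1, 0)
```

Which direction to move the cutoff depends on which kind of error is more costly in the application at hand.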

Basic computer output for a logistic regression model looks very much the same as for a linear regression model, including a number called R-squared, a table of coefficient estimates for independent variables, an analysis-of-variance table, and a residual table. Many other outputs are specialized for logistic regression, including classification tables that show the model's performance in making binary predictions based on a cutoff value as well as a variety of specialized statistics and charts that help you to visualize and test the model's assumptions. These are illustrated in the examples that follow and in the accompanying Excel files on the download page.
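Under the hood, the coefficient estimates in that output are maximum-likelihood estimates. As a rough sketch of the computation (not RegressIt's actual code), a one-predictor logistic model can be fitted by Newton-Raphson iteration on the log likelihood, using a small hypothetical data set:

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit a one-predictor logistic model, P(y=1) = 1/(1+exp(-(b0 + b1*x))),
    by Newton-Raphson maximization of the log likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the gradient (g0, g1) and the negative Hessian
        # (h00, h01, h11) of the log likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: beta += H^-1 * g, with H inverted by the 2x2 formula
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: predictor x and a binary outcome y
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
probs = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
print(b0, b1)
# Predicted probabilities rise smoothly with x along the logistic curve:
print([round(p, 3) for p in probs])
```

Statistical packages (including R's glm, which RegressIt's exported code calls) use the same maximum-likelihood principle, with more numerical safeguards and support for multiple predictors. Note that the method fails if the data are perfectly separated, in which case the maximum-likelihood coefficients are infinite.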