I am interested in calculating the area under the ROC curve (AUC), also known as the c-statistic, by hand for a binary logistic regression model.
In the validation dataset, I have the true value of the dependent variable, retention (1 = retained; 0 = not retained), as well as a predicted value for each observation, ranging from 0 to 1. The predictions come from a model that was built on the training set.
My initial thought was to count the "correct" model classifications and divide that count by the total number of observations to get the c-statistic. By "correct" I mean either of two cases: the true retention status of an observation = 1 and the predicted value is > 0.5, or the true retention status = 0 and the predicted value is < 0.5. I assume a "tie" would occur when the predicted value = 0.5, but that does not happen in my validation dataset.
Conversely, a classification would be "incorrect" if the true retention status of an observation = 1 and the predicted value is < 0.5, or if the true retention status = 0 and the predicted value is > 0.5.
I am aware of TP, FP, FN, and TN, but I do not know how to calculate the c-statistic from this information.
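
To make the calculation described above concrete, here is a minimal Python sketch of it. The function name `proportion_correct`, the variable names, and the six-observation data are hypothetical and only for illustration; they are not my validation data. How ties should be handled is also an assumption: here an observation with a predicted value of exactly 0.5 counts as neither correct nor incorrect but stays in the denominator. Whether the resulting ratio is actually the c-statistic is the part I am unsure about.

```python
def proportion_correct(y_true, y_pred, cutoff=0.5):
    """Count TP, FP, FN, TN at a fixed cutoff and return correct / total.

    Assumption: predictions exactly equal to the cutoff are counted as
    ties (neither correct nor incorrect) but remain in the denominator.
    """
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0, "ties": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == cutoff:
            counts["ties"] += 1
        elif predicted > cutoff:
            # Predicted "retained": correct only if truly retained.
            counts["TP" if actual == 1 else "FP"] += 1
        else:
            # Predicted "not retained": correct only if truly not retained.
            counts["FN" if actual == 1 else "TN"] += 1
    correct = counts["TP"] + counts["TN"]
    return correct / len(y_true), counts


# Hypothetical toy data, not taken from the real validation set.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

ratio, counts = proportion_correct(y_true, y_pred)
print(counts)
print(round(ratio, 3))
```

On this toy data there are 2 TP, 1 FN, 1 FP, and 2 TN, so the ratio is 4 / 6.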