The coefficient of determination, commonly denoted R², is a statistical measure of how well a linear regression model fits the data. It gives the proportion of the variance in the dependent variable that is explained by the independent variables included in the model. In other words, it quantifies how much of the variability in the dependent variable those variables account for.
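R² is usually computed as one minus the ratio of two sums of squares: R² = 1 − SS_res / SS_tot. Here SS_res = Σ(yᵢ − ŷᵢ)² is the residual sum of squares and SS_tot = Σ(yᵢ − ȳ)² is the total sum of squares around the mean. The sketch below implements that definition in plain Python. The function name `r_squared` and the sample values are illustrative assumptions, not part of any particular library.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (illustrative helper)."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: the variation the model fails to explain.
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    # Total sum of squares: the variation of the observed values around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical observations and model predictions.
y = [3.0, 5.0, 7.0, 9.0]
pred = [2.8, 5.3, 6.9, 9.2]
print(r_squared(y, pred))  # approximately 0.991
```

In this example SS_tot = 20 and SS_res = 0.18, so R² = 1 − 0.18 / 20 = 0.991.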
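For an ordinary least squares model that includes an intercept and is evaluated on the data it was fitted to, R² ranges from 0 to 1. Outside that setting, for example with a model that has no intercept or when scoring predictions on new data, R² can be negative. The two endpoints are interpreted as follows:
- R² = 0: The model explains none of the variability in the dependent variable. It does no better than predicting the mean of the dependent variable for every observation.
- R² = 1: The model explains 100% of the variability in the dependent variable, and the predicted values match the observed values exactly.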
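The sketch below checks the two endpoints using the formula R² = 1 − SS_res / SS_tot. It also shows a third set of predictions that fits worse than the mean, which pushes R² below 0. The helper name and data are hypothetical.

```python
def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot (illustrative helper, no input validation)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


y = [3.0, 5.0, 7.0, 9.0]
mean_y = sum(y) / len(y)  # 6.0

print(r_squared(y, [mean_y] * len(y)))     # 0.0: predicting the mean everywhere
print(r_squared(y, y))                     # 1.0: perfect predictions
print(r_squared(y, [9.0, 7.0, 5.0, 3.0]))  # -3.0: worse than predicting the mean
```

In the last case SS_res = 80 is four times SS_tot = 20, so R² = 1 − 4 = −3.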
However, a high R² does not by itself mean that the model is a good one. A model can have a high R² and still suffer from problems such as multicollinearity, overfitting, or omitted variables.
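Overfitting is the easiest of these problems to demonstrate. The sketch below uses made-up data (a linear trend plus random noise). It fits a degree-7 polynomial, which has 8 coefficients, to 8 training points, so the curve passes through every training point and the training R² is essentially 1. It then scores the same curve on separate test points drawn from the same process. The exact test value depends on the random seed, but it is lower than the training value.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot for NumPy arrays (illustrative helper)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot


rng = np.random.default_rng(0)

# Hypothetical data: a linear trend y = 2x plus Gaussian noise.
x_train = np.linspace(0, 1, 8)
y_train = 2 * x_train + rng.normal(scale=0.3, size=x_train.size)
x_test = np.linspace(0.05, 0.95, 8)
y_test = 2 * x_test + rng.normal(scale=0.3, size=x_test.size)

# Degree 7 with 8 points: the polynomial interpolates the training data.
coeffs = np.polyfit(x_train, y_train, deg=7)

train_r2 = r_squared(y_train, np.polyval(coeffs, x_train))
test_r2 = r_squared(y_test, np.polyval(coeffs, x_test))
print(f"train R^2 = {train_r2:.4f}, test R^2 = {test_r2:.4f}")
```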
One limitation of R² is that, for a least squares model fitted to the same data, it never decreases when you add another independent variable, even one that is not a meaningful predictor. For this reason, adjusted R² is often used instead, because it penalizes each additional variable.
Adjusted R² accounts for the number of independent variables in the model. It increases only when a new variable improves the fit enough to offset the degree of freedom it uses up. This makes it a fairer basis than R² for comparing models with different numbers of predictors.
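A common definition of adjusted R² is 1 − (1 − R²)(n − 1) / (n − p − 1), where n is the number of observations and p is the number of predictors, not counting the intercept. The sketch below fits ordinary least squares with an intercept using NumPy's `np.linalg.lstsq`. It then refits after adding a column of pure noise. Plain R² cannot go down, while adjusted R² may fall. The data and the helper names `ols_r2` and `adjusted_r2` are hypothetical.

```python
import numpy as np


def ols_r2(X, y):
    """Fit OLS with an intercept and return the in-sample R^2 (illustrative helper)."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


rng = np.random.default_rng(1)
n = 30
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)  # hypothetical linear relationship
noise = rng.normal(size=n)        # a predictor unrelated to y

r2_small = ols_r2(x.reshape(-1, 1), y)
r2_big = ols_r2(np.column_stack([x, noise]), y)

print(f"1 predictor:  R^2 = {r2_small:.4f}, adjusted = {adjusted_r2(r2_small, n, 1):.4f}")
print(f"2 predictors: R^2 = {r2_big:.4f}, adjusted = {adjusted_r2(r2_big, n, 2):.4f}")
```

Whether adjusted R² actually drops in a given run depends on how much the noise column happens to reduce the residuals. Plain R² for the larger model is never below that of the smaller one.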
In summary, R² is a useful metric for assessing the overall fit of a linear regression model. However, it should be used alongside other evaluation techniques, such as residual analysis and validation on held-out data, to check that the model is both accurate and reliable.