In the previous post we started exploring the statistical side of regression, and today we dive in more deeply. The goal is to understand what each value in R's `summary(model)` output for a linear model tells us.

Here is a screenshot of how this summary looks:

**Significance of residuals?**

- We want our residuals to be roughly normally distributed and centered around zero.
- It is like throwing darts at a dartboard.
- If the throws miss mostly in one direction, the model is biased and there is scope for improvement.
- If they miss equally in all directions, we can try to reduce the spread (the standard deviation).
- Irreducible error should show up as misses scattered in all directions, with no systematic pattern.

- The residual quantiles (Min, 1Q, Median, 3Q, Max) give a first look at symmetry: the median should be close to zero, and 1Q and 3Q should be roughly equal in magnitude.
- R also reports an estimate of the standard deviation of the residuals, called the RSE (residual standard error).
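A minimal sketch of where the residual quartiles and the RSE come from, written in plain Python rather than R so the arithmetic is visible. It assumes a one-predictor model fitted by ordinary least squares and uses made-up data; `fit_simple_ols` and `residual_summary` are hypothetical helper names, not part of any library. The `method="inclusive"` quartiles should use the same linear interpolation as R's default `quantile()`.

```python
import math
import statistics


def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residual_summary(x, y):
    """Residual quartiles and RSE, mimicking the top of summary(lm) output."""
    b0, b1 = fit_simple_ols(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # "inclusive" interpolates like R's default quantile() (type 7)
    q1, med, q3 = statistics.quantiles(resid, n=4, method="inclusive")
    n, p = len(x), 1                  # p counts predictors, not the intercept
    df = n - p - 1                    # residual degrees of freedom
    rss = sum(r * r for r in resid)   # residual sum of squares
    return {"resid": resid, "Min": min(resid), "1Q": q1, "Median": med,
            "3Q": q3, "Max": max(resid), "RSE": math.sqrt(rss / df), "df": df}


# Hypothetical toy data, made up for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
s = residual_summary(x, y)
for key in ("Min", "1Q", "Median", "3Q", "Max"):
    print(f"{key:>6}: {s[key]: .4f}")
print(f"Residual standard error: {s['RSE']:.4f} on {s['df']} degrees of freedom")
```

With an intercept in the model the residuals always sum to zero, so a median far from zero or very unequal 1Q and 3Q values are the first hints of skew.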

**What is the relationship between t value and p-value in the coefficient section?**

- With these values R tests whether each variable has any relationship with the output.
- This is a preset statistical question (the null hypothesis: the true coefficient is zero), and you cannot change it.

- If the true coefficient is zero, the variable contributes nothing to the prediction; otherwise it does.
- The t value is the number of standard errors the estimate lies away from zero (Estimate divided by Std. Error).
- The larger the absolute t value, the more significant the variable.
- All of this rests on probability and sampling.
- Imagine repeatedly drawing samples from a larger population.
- Each sample would give a different coefficient estimate.
- For some samples the estimate might even be zero.

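- So for each coefficient R displays an estimate ("Estimate") and its standard error ("Std. Error").
- The estimated coefficient is a random variable: its value changes from sample to sample, and the standard error measures that spread.
- The estimate lies t standard errors away from zero.
- The p-value answers this question: if the true coefficient were zero, what is the probability of getting an estimate at least |t| standard errors away from zero, in either direction?
- R reports it in the "Pr(>|t|)" column; a small p-value means such an extreme estimate would be unlikely under the null hypothesis.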
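To see the arithmetic behind the "t value" and "Pr(>|t|)" columns, here is a stdlib-only Python sketch (not R's own code). Python's standard library has no Student t distribution, so the two-sided tail probability is computed through the regularized incomplete beta function, using the textbook continued-fraction method popularized by Numerical Recipes; in R itself you would use `pt()`. The coefficient row (Estimate 0.52, Std. Error 0.21, n = 30, p = 2) is hypothetical, and `t_test` is an illustrative helper.

```python
import math


def _betacf(a, b, x, max_iter=1000, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_test(estimate, std_error, df):
    """t value and two-sided p-value, as in the Pr(>|t|) column."""
    t = estimate / std_error
    # P(|T| >= |t|) for Student's t with df degrees of freedom.
    p_value = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, p_value


# Hypothetical coefficient row from a model with n = 30 observations, p = 2 predictors.
n, p = 30, 2
t, p_value = t_test(0.52, 0.21, df=n - p - 1)
print(f"t value = {t:.3f}, Pr(>|t|) = {p_value:.4f}")
```

The degrees of freedom passed in are the residual degrees of freedom, n - p - 1, which is why the same t value gives a smaller p-value in a larger data set.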

**Role of R^2**

How to interpret R^2?

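- It shows the proportion of the variance in the response that is explained by the model: R^2 = 1 - RSS/TSS, where RSS is the residual sum of squares and TSS is the total sum of squares around the mean.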

Why use R^2 over RSE?

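- R^2 is a proportion, always between 0 and 1 for a least-squares fit with an intercept, so it does not depend on the units of the response. RSE is measured in the units of the response, so whether a given RSE is "good" depends on the scale of the data.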
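A quick way to see the unit dependence: fit the same hypothetical data twice, once with y in metres and once in millimetres. The plain-Python sketch below (one predictor, made-up numbers, illustrative helper `r2_and_rse`) should leave R^2 unchanged while RSE grows by a factor of 1000.

```python
import math


def r2_and_rse(x, y):
    """Fit y = b0 + b1*x by least squares; return (R^2, RSE)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)                     # total sum of squares
    return 1.0 - rss / tss, math.sqrt(rss / (n - 2))             # one predictor: df = n - 2


# Hypothetical data; the second fit expresses the same y in millimetres instead of metres.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 2.9, 4.1, 4.4, 6.0, 5.8, 7.4, 7.9]
r2_m, rse_m = r2_and_rse(x, y)
r2_mm, rse_mm = r2_and_rse(x, [1000 * yi for yi in y])
print(f"metres:      R^2 = {r2_m:.4f}, RSE = {rse_m:.4f}")
print(f"millimetres: R^2 = {r2_mm:.4f}, RSE = {rse_mm:.4f}")
```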

What can be considered a good value of R^2?

- A good value of R^2 depends on the problem setting. In physics, when we are sure the data come from a linear model, R^2 close to 1 is expected. In the marketing domain, by contrast, the predictors often explain only a small proportion of the variance, so R^2 = 0.1 can be realistic.

Difference between R^2 (R's "Multiple R-squared") and adjusted R^2?

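- R^2 never decreases when a variable is added, but adjusted R^2 decreases if the added variable does not improve the fit enough.
- The adjusted R^2 formula contains the number of variables p: adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1). So when a variable is added and the gain is small, the result actually decreases.
- A similar effect shows up in RSE = sqrt(RSS/(n - p - 1)): adding a variable always lowers RSS, but it also shrinks n - p - 1, so RSE can increase while RSS decreases.
- Note that RSS and RSE are not adjusted R^2 itself; this is just to show how dividing by the degrees of freedom penalizes extra variables.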
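Plugging hypothetical numbers into both formulas shows the penalty at work. The values below are made up, and `adjusted_r2` and `rse` are illustrative helpers, not library functions.

```python
import math


def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1); p = number of predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def rse(rss, n, p):
    """Residual standard error = sqrt(RSS / (n - p - 1))."""
    return math.sqrt(rss / (n - p - 1))


# Hypothetical: adding a weak third predictor nudges R^2 from 0.500 to 0.505.
n = 30
before = adjusted_r2(0.500, n, p=2)
after = adjusted_r2(0.505, n, p=3)
print(f"adjusted R^2: {before:.4f} -> {after:.4f}")    # 0.4630 -> 0.4479

# Hypothetical: RSS drops from 100 to 98, yet RSE rises.
print(f"RSE: {rse(100.0, 12, 2):.4f} -> {rse(98.0, 12, 3):.4f}")  # 3.3333 -> 3.5000
```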

## F Statistics

Significance of the F-statistic?

- A t-test tells us whether a single variable is significant, while the F-test tells us whether a group of variables is jointly significant.
- The F-statistic also has a p-value associated with it.
- The null hypothesis for the F-test is H0: the intercept-only model fits as well as your model (all slope coefficients are zero).
- While R^2 provides an estimate of the strength of the relationship between your model and the response variable, it does not provide a formal hypothesis test for this relationship. The latter is given by the F-test.
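The sketch below computes the F-statistic for a one-predictor model from the residual sum of squares of your model (RSS) and of the intercept-only model (TSS), checks it against the same formula written with R^2, and shows that with a single predictor F equals the square of the slope's t value, so the two tests coincide. With several predictors they no longer coincide. Plain Python, hypothetical data.

```python
import math

# Hypothetical toy data, one predictor (p = 1).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.8, 2.9, 3.2, 4.8, 5.1, 5.9, 7.4, 7.7, 9.3, 9.8]

n, p = len(x), 1
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))  # error left by our model
tss = sum((yi - y_bar) ** 2 for yi in y)                     # error of the intercept-only model
r2 = 1.0 - rss / tss

# F compares the improvement over the intercept-only model with the leftover noise.
f_stat = ((tss - rss) / p) / (rss / (n - p - 1))
# The same number written in terms of R^2.
f_from_r2 = (r2 / p) / ((1.0 - r2) / (n - p - 1))

# t value of the slope: estimate divided by its standard error.
se_b1 = math.sqrt(rss / (n - p - 1)) / math.sqrt(sxx)
t_slope = b1 / se_b1

print(f"F = {f_stat:.3f}, F from R^2 = {f_from_r2:.3f}, t^2 = {t_slope ** 2:.3f}")
```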

The next question is: why do we need the F-statistic when we already have p-values for the individual coefficients?

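- It might seem that if any one coefficient is significant (has a small p-value), the overall model must be significant too.
- However, this reasoning breaks down when the number of variables p is large: with many predictors, some individual p-values will fall below 0.05 purely by chance even if no variable is related to the response. The F-test does not suffer from this problem because it adjusts for the number of predictors.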
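A back-of-the-envelope calculation shows why. Assuming, as a simplification, that the p individual tests are independent and every true coefficient is zero, the chance that at least one p-value still falls below 0.05 is 1 - 0.95^p. The helper name is illustrative.

```python
def chance_of_false_alarm(p, alpha=0.05):
    """P(at least one of p independent null coefficients has p-value < alpha)."""
    return 1.0 - (1.0 - alpha) ** p


for p in (1, 5, 20, 100):
    print(f"p = {p:>3}: {chance_of_false_alarm(p):.3f}")
```

Under that assumption the chance is already about 64% with 20 predictors and about 99.4% with 100. Real coefficient tests are correlated, so these numbers are only indicative, but they show why a single small p-value among many is weak evidence on its own.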

What is a good value of the F-statistic?

- It depends on the values of n and p
- n = number of observations in the training set
- p = number of independent variables (predictors)

- When n is large, an F-statistic only a little larger than 1 may be enough to reject the null hypothesis.
- But it is better to base the decision on the corresponding p-value, which takes both n and p into account.

**What is degrees of freedom?**

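Although not highlighted in the screenshot, I want to mention that the residual degrees of freedom are the difference between n and the number of estimated coefficients, intercept included, that is n - p - 1. This is the number R prints after "Residual standard error: ... on", and the F-statistic is reported on p and n - p - 1 degrees of freedom.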

**Significance stars (`***`) in the coefficient section?**

R indicates how small each p-value is by printing stars (or a dot) next to it; the "Signif. codes" line under the coefficient table explains the cutoffs.
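The mapping from p-values to symbols can be sketched as a small Python function using the cutoffs printed in R's legend (`0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1`). Treating a p-value that sits exactly on a cutoff as belonging to the more significant bin is an assumption about the boundary, and `signif_stars` is a hypothetical helper, not an R function.

```python
def signif_stars(p_value):
    """Map a p-value to the symbol printed next to it in summary(lm) output."""
    # Cutoffs from the legend: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    # (boundary handling, p == cutoff -> more significant bin, is assumed)
    for cutoff, symbol in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p_value <= cutoff:
            return symbol
    return " "


# Hypothetical p-values.
for pv in (2e-16, 0.004, 0.03, 0.07, 0.4):
    print(f"{pv:<8g} {signif_stars(pv)!r}")
```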

Thanks for reading, hope it helps.

Edit: Found the formula for adjusted R^2 here: