Hands-on Data Learning Session: Introduction to ANOVA

Session Report
Pranav Vijay Sonar

The International Summer School Program on “Data, Monitoring and Evaluation” is a two-month immersive, hands-on online certificate training course organized by IMPRI Impact and Policy Research Institute, New Delhi. After an insightful session on ‘Data Deluge & Public Policy’ in the second week of the program on June 10th, it was time for the ‘Hands-on Data Learning Session’. The second speaker, Prof Nilanjan Banik, Professor and Program Director (BA, Economics and Finance), Mahindra University, Hyderabad, and Visiting Consultant, IMPRI, was welcomed by Fiza Mahajan, visiting researcher at IMPRI. This session continued the Hands-on Data Learning Session that began the previous week. Prof Banik opened the discussion with the ‘Introduction to ANOVA’.

He opens the session by revisiting last week’s topic, probability distributions. He returns to the question he posed in the last session on the mean and deviation and their real-world use, and answers it with an explanation. He then moves on to the cumulative distribution function and the density function, explaining the distinction between the two and how they relate to each other. He says that looking at the distribution function helps to understand where the government or any policy-making organization should intervene and how it should intervene.
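The relationship between the two functions can be illustrated with a minimal Python sketch. It assumes a normal distribution with a made-up mean of 50 and standard deviation of 10 (these numbers are not from the session) and checks numerically that the slope of the cumulative distribution function matches the density function at a chosen point.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Probability that a normal variable is at most x."""
    z = (x - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def cdf_slope(x, mu=0.0, sigma=1.0, h=1e-5):
    """Central-difference slope of the CDF; it should approximate the density."""
    return (normal_cdf(x + h, mu, sigma) - normal_cdf(x - h, mu, sigma)) / (2.0 * h)

# Hypothetical variable: mean 50, standard deviation 10 (illustrative only).
x = 60.0
density = normal_pdf(x, 50.0, 10.0)
share_below = normal_cdf(x, 50.0, 10.0)
slope = cdf_slope(x, 50.0, 10.0)

print(f"density at {x}: {density:.5f}")
print(f"share at or below {x}: {share_below:.4f}")
print(f"slope of the CDF at {x}: {slope:.5f}")
```

The cumulative function answers "what share of the population lies at or below this value", while the density shows where the population is concentrated; the slope check shows they carry the same information in two forms.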

He further explains the Excel formula for the normal distribution and the parameters it involves. He mentions that, in a normal distribution, about 68.3% of the area lies within one standard deviation of the mean, 95.4% within two standard deviations, and 99.7% within three, which is almost the entire area. This much coverage is sufficient for most calculations.
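These coverage figures can be reproduced with a short Python sketch. The report does not say which Excel function was demonstrated; Excel's NORM.DIST(x, mean, standard_dev, cumulative) is one function that returns the same cumulative probabilities. The helper name `share_within` below is introduced only for illustration.

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

# Coverage for one, two, and three standard deviations.
coverage = {k: share_within(k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

The result does not depend on the particular mean or standard deviation, since the distance from the mean is measured in units of the standard deviation.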

Data used as Evidence to Support Results

With reference to his papers in Review of Development Economics and Development Policy Review, Prof Banik shows how results were drawn from real data. Plotting the data as graphs, which was possible with Excel and the material from last week’s session, reveals evidence of twin peaks. These twin peaks suggest that, within the services sector, rich or highly skilled workers are becoming richer relative to less-skilled workers. He observed this in data from 1999 to 2006.

He then explains the importance of the t-test statistic and how it can be used to assess whether one sample mean differs significantly from another. He shows what values the variables in the formula take for the comparison; these variables can be anything, such as access to basic healthcare or education. Next, he introduces the concept of right-tailed, left-tailed, and two-tailed tests. He explains the null hypothesis and the alternative hypothesis and gives the formula to calculate the t value.

After explaining the theory, he shows in practice how to calculate various descriptive statistics using Excel. For this purpose, he uses the stock prices of Tata Power and Adani Power and compares them using a t-test, demonstrating the calculation step by step. He also explains the concept of the p-value, which is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true, and explains what a significant p-value means.
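A comparable calculation can be sketched in Python. The two series below are made-up numbers, not actual Tata Power or Adani Power prices, and the report does not say which variant of the t-test was used in Excel; this sketch assumes the unequal-variance (Welch) form for two independent samples. The function names `describe` and `welch_t` are introduced here for illustration.

```python
import math
import statistics

def describe(values):
    """Basic descriptive statistics for one series."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

def welch_t(a, b):
    """t statistic and approximate degrees of freedom (Welch's test)."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical price series, for illustration only.
series_a = [10, 12, 11, 13, 14]
series_b = [8, 9, 10, 9, 9]

print(describe(series_a))
print(describe(series_b))
t_value, dof = welch_t(series_a, series_b)
print(f"t = {t_value:.3f}, degrees of freedom = {dof:.2f}")
```

A large absolute t value relative to the t distribution with the computed degrees of freedom corresponds to a small p-value, which is what a significant difference in means looks like.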

He then takes up a problem to demonstrate the test. A gym claims that if you join, your weight will fall below 88. The gym’s claim, that a member’s weight will be reduced to less than 88, is taken as the null hypothesis; the alternative hypothesis is that the weight is greater than 88.

Using the weights of gym members and Excel commands, he calculates the mean and standard deviation of the weights, and from these the t-test statistic. He then obtains the p-value corresponding to this t value with the Excel function TDIST. This p-value is almost zero, so the null hypothesis can be rejected. Going by this sample data, it can be concluded that after joining the gym, one’s weight does not fall below 88.
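The steps above can be sketched in Python as a right-tailed one-sample t-test. The weights are hypothetical, since the session's data set is not reproduced in this report, and the p-value is obtained by numerically integrating the Student's t density rather than by calling Excel; it should correspond to the one-tailed value that TDIST(x, deg_freedom, 1) reports. All function names are introduced here for illustration.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on the density between 0 and |x|."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3.0  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for testing the mean against mu0."""
    n = len(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / std_err, n - 1

# Hypothetical weights of gym members, for illustration only.
weights = [92, 95, 90, 97, 94, 93, 96, 91, 95, 97]

t_value, dof = one_sample_t(weights, 88)
p_right = 1.0 - t_cdf(t_value, dof)  # right tail: P(T >= t_value)

print(f"t = {t_value:.3f}, degrees of freedom = {dof}")
print(f"right-tailed p-value = {p_right:.6f}")
```

With these made-up weights the sample mean lies well above 88, so the right-tailed p-value is close to zero and the null hypothesis is rejected, mirroring the conclusion reached in the session.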

Further, he explains what changes when a right-tailed test is performed instead of a left-tailed test. He concludes by listing all the topics covered in the session, and ends the fruitful session by addressing participants’ questions and doubts.

Acknowledgment: Pranav Vijay Sonar is a research intern at IMPRI.

