Data Analytics for Policy Research Cohort 2.0

Event Reports
Aasthaba Jadeja

The Generation Alpha Data Center (GenAlphaDC) at the IMPRI Impact and Policy Research Institute organized the Data Analytics for Policy Research Cohort 2.0, a one-month immersive online hands-on certificate training course, in November 2023 to equip policymakers, researchers, and data enthusiasts with cutting-edge analytical skills.

Day 1

Hands-on Data Learning Sessions: Probability distributions: Density and Cumulative (EXCEL-based Session)

This session by Prof. Nilanjan Banik delved into the world of probability distributions, focusing particularly on probability density functions (PDFs) and their role in analyzing outcome variables for policymaking. Prof. Banik provided a foundation in essential statistical components such as density functions, random variables, and skewness before turning to a hypothetical example of student test scores to illustrate the concept of PDFs.

To bring the concept to life, Prof. Banik employed a real-life example: predicting the probability of rainfall in Delhi using PDFs. The session then transitioned to hands-on practice in Excel, where participants learned to visualize density distributions, generate random variables, and calculate key statistics. To solidify understanding, Prof. Banik provided further examples showcasing the impact of different distribution shapes and their practical implications. The session concluded with a Q&A to address participants' questions and offer resources for further exploration.
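
For readers who want to replicate the exercise outside Excel, here is a minimal Python sketch of the same workflow: visualizing density (PDF) and cumulative (CDF) values, generating random variables, and computing summary statistics. The normal distribution, its parameters, and the score threshold are illustrative assumptions, not values from the session.

```python
# A minimal sketch of the Excel exercise: density and cumulative values,
# random draws, and summary statistics. All numbers are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical test scores: mean 60, standard deviation 12
mu, sigma = 60, 12
scores = rng.normal(mu, sigma, size=1_000)

# Summary statistics analogous to Excel's AVERAGE, STDEV, and SKEW
print("mean:", scores.mean())
print("std dev:", scores.std(ddof=1))
print("skewness:", stats.skew(scores))

# PDF and CDF over a grid (Excel: NORM.DIST with cumulative=FALSE
# for density, cumulative=TRUE for the cumulative distribution)
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 200)
pdf = stats.norm.pdf(x, mu, sigma)
cdf = stats.norm.cdf(x, mu, sigma)
print("peak density:", pdf.max())

# Probability of scoring above 75 under this distribution
print("P(score > 75):", 1 - stats.norm.cdf(75, mu, sigma))
```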

To read a more elaborate session report: click here.

Data Deluge & Public Policy: Promises & Perils

This session by Dr. Soumyadip Chattopadhyay discussed the importance of data in public policymaking in India, covering the characteristics of data, the factors shaping its use, and the progress made in the public policy data sector. The session highlighted the increasing amount of data being generated and the need for proper analysis to inform policy decisions. It explained the five Vs of data (volume, variety, velocity, veracity, and value) and the five characteristics of data quality (accuracy, completeness, timeliness, consistency, and uniqueness), both of which can be used to improve the quality of public policies.

The session also discussed the cost-benefit framework of data, economies of scale and scope in data, and the structure of government data. It emphasized the importance of an integrated data system and explained the Integrated Command and Control Centre (ICCC) in Smart Cities. Finally, the session discussed the Revised National Policy on Official Statistics (NPOS) and its relation to Digital India, concluding with the challenges that lie ahead for data-driven public policy.

To read a more elaborate session report: click here.

Research Ethics in Data Collection and Analysis

This session by Dr. Amar Jesani focused on the crucial role of ethics in research and data collection, particularly when public data is involved. Dr. Jesani highlighted that ethical principles apply even when data is used secondarily by others, and emphasized considering the full range of stakeholders beyond researchers and participants. He elaborated on the research ethics framework, stressing social value, scientific validity, and benefits to participants. He cautioned against potentially harmful or misleading research and emphasized methodological rigor and reciprocity in research methods such as surveys.

The session also covered data protection, emphasizing informed consent, transparency, and voluntary participation. Dr. Jesani highlighted the importance of protecting participant privacy and confidentiality, and warned against promising confidentiality that cannot be upheld. He discussed selection bias and the need for careful examination of data sources. He emphasized understanding data collection and analysis methodologies, data privacy and security practices, and responsible data sharing to prevent misuse. Finally, he discussed data transparency, fabrication, and plagiarism in publication, stressing the need for ethical considerations in data-driven policymaking.

To read a more elaborate session report: click here.

Day 2

Hands-on Data Learning Sessions: Introduction to ANOVA (EXCEL-based Session)

(Report)
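
The detailed report for this session is linked rather than summarized here, so the following is only a minimal sketch of the one-way ANOVA the session title refers to, with made-up group data; Excel's Data Analysis ToolPak (Anova: Single Factor) produces the equivalent F-statistic and p-value.

```python
# A minimal one-way ANOVA sketch with hypothetical data; the groups
# and values are assumptions for illustration only.
from scipy import stats

# Hypothetical outcomes for three policy treatment groups
group_a = [23, 25, 21, 27, 24]
group_b = [30, 28, 33, 29, 31]
group_c = [22, 20, 25, 23, 21]

# One-way ANOVA: tests whether the group means differ
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```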

How to Carry out an Empirical Project: A Step-by-Step Approach

This session offered a comprehensive guide for conducting an empirical project, emphasizing a structured approach for researchers to achieve robust and credible outcomes. The speaker, Dr. Soumyadip Chattopadhyay, outlined key steps, starting with defining clear and focused research questions that directly align with the project’s objectives. This forms the foundation for the entire research process, guiding every subsequent step.

Next, a thorough literature review is crucial to identify existing research gaps and build a strong theoretical framework. This framework serves as the conceptual backbone, informing the relationships between variables and providing a basis for formulating testable hypotheses. Choosing the right research design and data collection methods is also critical, ensuring they align with the objectives and allow for the gathering of relevant data. Standardized tools, clear instructions, and adequate sampling techniques are essential for data quality and reliability.

Following data collection, utilizing appropriate statistical techniques for analysis is vital. Whether quantitative or qualitative, the analysis should address the research questions and test the hypotheses. Interpreting the results in the context of the research questions and existing literature allows for drawing well-supported conclusions and discussing their significance. Acknowledging limitations enhances transparency and credibility, while offering recommendations for future research or real-world applications contributes to the ongoing discourse in the field. Finally, compiling all findings, analyses, and discussions into a comprehensive research report adhering to academic standards ensures clarity and coherence, effectively communicating the research’s contribution to knowledge.

By following these steps and addressing the crucial points highlighted by the speaker, researchers can navigate the complexities of empirical projects with confidence. This systematic approach fosters rigor, reliability, and ultimately, meaningful contributions to the chosen academic or practical domain.

To read a more elaborate session report: click here.

The Statistical System in India and an Introduction to Various Official and Other Databases

This session by Dr. Arjun Kumar explored the crucial role of India’s statistical system in shaping policies, planning, and societal development. The multi-tiered system, involving central and state agencies, gathers and analyzes data to inform evidence-based decisions.

Key official databases include the National Sample Survey (NSS), providing insights into various social and economic aspects; the Census of India, capturing demographic trends; and the Economic Census, detailing economic activity. Other notable databases support policymaking in areas like finance, health, and crime.

The speaker further highlighted the system’s role in infrastructure development, monetary and fiscal policy formulation, and crime prevention strategies. Challenges like survey fatigue, data gaps, and accessibility were also discussed.

Ultimately, comprehensive and reliable statistical data empowers policymakers to address complex challenges, target interventions effectively, and monitor policy impacts. As India progresses, a robust and evolving statistical system remains vital for informed decision-making and navigating a dynamic society.

To read a more elaborate session report: click here.

Day 3

Hands-on Data Learning Sessions: Interpretation of Models

This session by Prof. Nilanjan Banik explored the fundamentals of data analysis for policy research, focusing on statistical tests and interpretations.

The first part covered analyzing income inequality through cumulative distribution and density functions. Prof. Banik explained how these functions help assess data characteristics and guide the choice of appropriate tests. He discussed the limitations of assuming a normal distribution and advocated non-parametric tests and log transformations when dealing with heterogeneous data. He also introduced the Jarque-Bera test statistic for normality checks and shared examples from research papers.
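
As a rough illustration of these normality checks (the session itself worked with research examples rather than code), here is a Python sketch applying the Jarque-Bera test to simulated, right-skewed income data; the lognormal data-generating process is an assumption for demonstration.

```python
# Jarque-Bera normality check before and after a log transformation;
# the lognormal income data are simulated assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
income = rng.lognormal(mean=10, sigma=0.8, size=500)  # right-skewed

# On raw incomes, normality should be strongly rejected
jb_raw, p_raw = stats.jarque_bera(income)
# After a log transformation the data are much closer to normal
jb_log, p_log = stats.jarque_bera(np.log(income))

print(f"raw: JB = {jb_raw:.1f}, p = {p_raw:.4f}")
print(f"log: JB = {jb_log:.1f}, p = {p_log:.4f}")
```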

The second part focused on diagnostic tests for regression models. Using a practical dataset, Prof. Banik explained how to identify and address issues such as autocorrelation, heteroscedasticity, and multicollinearity using the Durbin-Watson, Breusch-Pagan, and White tests along with the Variance Inflation Factor (VIF). He emphasized the importance of interpreting test statistics and choosing appropriate remedies to ensure the quality of regression analysis.
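
A hedged statsmodels sketch of these diagnostics is given below; the simulated regressors and the mild collinearity between them are assumptions, not the session's dataset.

```python
# Regression diagnostics on simulated data: Durbin-Watson,
# Breusch-Pagan, and VIF. The data-generating process is assumed.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(scale=0.5, size=n)  # mildly collinear with x1
y = 1 + 2 * x1 - x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

# Durbin-Watson: values near 2 suggest no first-order autocorrelation
print("Durbin-Watson:", durbin_watson(res.resid))

# Breusch-Pagan: a low p-value signals heteroscedasticity
bp_stat, bp_p, _, _ = het_breuschpagan(res.resid, X)
print(f"Breusch-Pagan: LM = {bp_stat:.2f}, p = {bp_p:.4f}")

# VIF per regressor: values above roughly 10 flag serious multicollinearity
for i, name in enumerate(["const", "x1", "x2"]):
    print(name, "VIF:", variance_inflation_factor(X, i))
```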

Overall, the session provided valuable insights into using statistical tests for data analysis in policy research. Prof. Banik’s hands-on approach with real-world data equipped participants with essential skills for conducting meaningful analyses and interpreting their results. The interactive discussion further solidified the learning and encouraged participants to actively engage with the concepts.

To read a more elaborate session report: click here.

Regression with Time Series Analysis & Forecasting: A Primer

This session by Dr. Soumyadip Chattopadhyay focused on time series analysis for policy research, emphasizing non-forecasting applications and the crucial role of stationarity.

Understanding stationarity is vital because it determines a variable’s behavior and how it should be modeled. Non-stationary variables can lead to spurious regressions, unreliable forecasts, and incorrect policy conclusions. Dr. Chattopadhyay discussed graphical methods, autocorrelation functions, and unit root tests for assessing stationarity, and introduced correlograms for visual inspection.
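
To make the stationarity workflow concrete, here is a small Python sketch using statsmodels rather than the EViews tools from the session; the simulated random walk is an assumed example of a non-stationary series.

```python
# Unit root testing on a simulated random walk (non-stationary by
# construction), in levels and after first-differencing.
import numpy as np
from statsmodels.tsa.stattools import adfuller, acf

rng = np.random.default_rng(2)
random_walk = np.cumsum(rng.normal(size=300))

# Augmented Dickey-Fuller test:
# null hypothesis = the series has a unit root (non-stationary)
stat, pvalue, *_ = adfuller(random_walk)
print(f"ADF on levels:      stat = {stat:.2f}, p = {pvalue:.4f}")

# First-differencing usually removes a single unit root
stat_d, pvalue_d, *_ = adfuller(np.diff(random_walk))
print(f"ADF on differences: stat = {stat_d:.2f}, p = {pvalue_d:.4f}")

# Autocorrelation function (the correlogram's raw values): slow decay
# in levels is a visual hint of non-stationarity
print("ACF (levels, first 5 lags):", acf(random_walk, nlags=5).round(2))
```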

The session then explored cointegration as a way to analyze non-stationary time series. Methods for addressing non-stationarity, like differencing, were covered. The Engle-Granger test was explained as a way to check for cointegration between two variables. Dr. Chattopadhyay used a real dataset on consumption and GDP to demonstrate these concepts, including data import, transformation, stationarity assessment, differencing, and cointegration analysis.
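
The cointegration step can be sketched the same way; the simulated consumption- and GDP-style series below are assumptions standing in for the session's real dataset.

```python
# Engle-Granger cointegration sketch: two non-stationary series that
# share a common stochastic trend. The series are simulated assumptions.
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(3)
trend = np.cumsum(rng.normal(size=300))       # shared random walk
gdp = trend + rng.normal(scale=0.5, size=300)
consumption = 0.8 * trend + rng.normal(scale=0.5, size=300)

# Engle-Granger test: null hypothesis = no cointegration
stat, pvalue, _ = coint(consumption, gdp)
print(f"Engle-Granger: stat = {stat:.2f}, p = {pvalue:.4f}")
```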

Overall, the session provided a comprehensive understanding of time series analysis for policy research, highlighting the importance of stationarity and cointegration for drawing meaningful conclusions and informing policy decisions. Participants received the dataset and materials for further exploration.

To read a more elaborate session report: click here.

Gender Mainstreaming of Data, Monitoring and Evaluation

Dr. Vibhuti Patel’s session highlighted the crucial role of accurate statistics in tackling gender inequalities across various life aspects. Precise data is essential for effective problem-solving and informed policy decisions.

She traced the historical context, referencing the Beijing Platform (1995) where gender-disaggregated data collection was emphasized. However, challenges like gender stereotypes and underreporting of women’s work, particularly in unpaid caregiving roles, persist. These biases lead to misinformed policies and perpetuate inequalities.

Dr. Patel emphasized the need for intersectional data collection, considering diverse characteristics like disability and age. Additionally, clear and inclusive definitions in data collection, especially regarding the workforce, are crucial. Accurate gender statistics empower governments to address issues like gender-based violence, health outcomes, and representation in decision-making bodies.

The session concluded with a call for comprehensive gender mainstreaming strategies, starting with addressing the root causes of gender disparities. Dr. Patel stressed the importance of accurate and inclusive data in guiding effective policy research and development for diverse gender needs. This insightful session equipped participants with a deeper understanding of the challenges and opportunities in using data analytics for gender equality in policymaking.

To read a more elaborate session report: click here.

Day 4

Hands-on Data Learning Sessions: Dummy Variables

Prof. Nilanjan Banik’s session provided a comprehensive exploration of dummy variables, offering practical insights into their application and interpretation. The session delved into the concept of structural breaks, that is, distinct shifts in data patterns, and introduced dummy variables as tools to capture such breaks, exemplified by the economic reforms in India. Their role in interpreting coefficients associated with intercepts and slopes was emphasized, enabling researchers to understand changes in economic dynamics.

Prof. Banik went beyond theory, showcasing practical applications like deseasonalizing sales data using dummy variables. He even introduced the advanced “Spike” command in EViews for addressing combined intercept and slope changes. These practical examples highlighted the versatility of dummy variables beyond structural break detection.
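
A minimal Python sketch of the intercept-and-slope-dummy specification described above follows; the 1991 break year and the simulated series are illustrative assumptions, not the session's data or its EViews workflow.

```python
# Structural-break regression with an intercept dummy and a slope
# (interaction) dummy. The 1991 break year and the data are assumed.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
years = np.arange(1970, 2011)
post = (years >= 1991).astype(float)  # dummy = 1 after the assumed reform

# Simulated outcome whose level and trend both shift after the break
t = years - 1970
y = 2 + 0.5 * t + 3 * post + 0.4 * post * t + rng.normal(size=years.size)

X = sm.add_constant(np.column_stack([t, post, post * t]))
res = sm.OLS(y, X).fit()

# Coefficient on 'post' = intercept shift; on 'post * t' = slope shift
print(res.params.round(2))  # [const, trend, intercept dummy, slope dummy]
```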

Overall, the session empowered participants with valuable tools to handle qualitative variables in regression analysis. The clear explanations, real-world examples, and advanced techniques provided a foundation for researchers and analysts to extract meaningful insights from their data and make informed decisions across various fields. It was a significant step forward in understanding regression analysis in the evolving analytical landscape.

To read a more elaborate session report: click here.

Hands-on Data Learning Sessions: Regression Analysis with Qualitative Variables – Categorical Dependent Variable Regression (including Logit and Probit Models)

Dr. Soumyadip Chattopadhyay’s session tackled the challenges of regression analysis with qualitative variables, specifically focusing on categorical dependent variables. He went beyond theory, providing practical insights and hands-on experience.

Binary choice models were the main focus, with logit and probit models receiving in-depth explanations. Dr. Chattopadhyay clarified their theoretical foundations, practical applications, and how to interpret their results, including marginal effects. He also discussed choosing between logit and probit based on factors like error term distribution and personal preferences.

Moving from theory to practice, Dr. Chattopadhyay analyzed a dataset on modes of employment using a logit model in EViews. He walked the audience through the step-by-step implementation, interpreting coefficients, significance levels, and overall model fit, and covered the manual calculation of marginal effects, acknowledging EViews’ limitations in that regard.
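
For readers working outside EViews, here is a minimal statsmodels analogue of a logit estimation with average marginal effects; the variables (years of schooling, urban residence) and the data are simulated assumptions, not the session's employment dataset.

```python
# Logit model on simulated data, with average marginal effects;
# variables and the data-generating process are assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
education = rng.normal(12, 3, size=n)   # years of schooling (assumed)
urban = rng.integers(0, 2, size=n)      # 1 = urban resident (assumed)

# Simulated binary outcome: 1 = formal employment
latent = -4 + 0.3 * education + 0.8 * urban + rng.logistic(size=n)
formal = (latent > 0).astype(int)

X = sm.add_constant(np.column_stack([education, urban]))
logit_res = sm.Logit(formal, X).fit(disp=False)
print(logit_res.params.round(3))

# Average marginal effects: change in P(formal = 1) per unit change
print(logit_res.get_margeff().summary())
```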

Overall, the session provided a comprehensive understanding of regression analysis with categorical dependent variables. Participants gained theoretical knowledge, practical skills, and valuable tools for conducting their own analyses. This empowers researchers to navigate the complexities of this type of regression and extract meaningful insights from their data.

Read more event reports at IMPRI:

Annual Series of Thematic Deliberations on Union Interim Budget 2024-25

Global and Local Evidence: Women in Leadership in Health Sector

Acknowledgement: Aasthaba Jadeja is a Visiting Researcher at IMPRI.