Course Description

This course is an introduction to the statistical ideas and tools underlying the foundations of data science. The course is broadly divided into 5 modules:

  • Module 1 Descriptive Statistics
  • Module 2 Probability & Random variables
  • Module 3 Estimation & Inference
  • Module 4 Statistical Modeling
  • Module 5 Statistical Computing

Course Syllabus

Elements of descriptive statistics, averages, dispersion, skewness, quantiles; graphical displays, pie charts, bar charts, histograms, scatter plots, box plots, stem and leaf plots.

Probability spaces, conditional probability, independence; Random variables, distribution functions, probability mass and density functions, functions of random variables, standard univariate discrete and continuous distributions; Mathematical expectations, moments, moment generating functions, inequalities; Multidimensional random variables, joint, marginal and conditional distributions, conditional expectations, independence, covariance, correlation, standard multivariate distributions, functions of multidimensional random variables; Forms of convergence, law of large numbers, central limit theorem.
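
As a rough illustration of the central limit theorem listed above, the sketch below standardises sample means of Uniform(0, 1) draws and checks that they behave approximately like a standard Normal. The course demonstrations are in R; this sketch uses Python's standard library instead, and the sample size, number of replications, seed and function name are arbitrary choices made for illustration.

```python
import math
import random
import statistics


def standardized_sample_means(n, reps, rng):
    """Return `reps` standardized means, each based on n Uniform(0, 1) draws."""
    mu = 0.5                   # mean of Uniform(0, 1)
    sigma = math.sqrt(1 / 12)  # standard deviation of Uniform(0, 1)
    out = []
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        # Standardize: subtract the mean, divide by the standard error.
        out.append((xbar - mu) / (sigma / math.sqrt(n)))
    return out


rng = random.Random(241)  # arbitrary seed, only for reproducibility
z = standardized_sample_means(n=30, reps=5000, rng=rng)

clt_mean = statistics.fmean(z)
clt_sd = statistics.pstdev(z)
# Fraction of standardized means inside (-1.96, 1.96); about 0.95 under normality.
clt_within_196 = sum(abs(v) <= 1.96 for v in z) / len(z)

print(f"mean={clt_mean:.3f}, sd={clt_sd:.3f}, within 1.96: {clt_within_196:.3f}")
```

If the approximation holds, the printed mean should be near 0, the standard deviation near 1, and the coverage near 0.95.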

Sampling distributions; Point estimation - estimators, minimum variance unbiased estimation, maximum likelihood estimation, method of moments estimation, Cramer-Rao inequality, consistency; Interval estimation; Testing of hypotheses - tests and critical regions, Neyman-Pearson lemma, uniformly most powerful tests, likelihood ratio tests.
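
To make maximum likelihood estimation concrete, here is a minimal sketch for an Exponential model with rate lambda. The log-likelihood is n*log(lambda) - lambda*sum(x), which is maximised at lambda_hat = n / sum(x). The data values and function names are hypothetical, chosen only for illustration, and the code is Python rather than the R used in the course.

```python
import math


def exp_loglik(rate, data):
    """Log-likelihood of i.i.d. Exponential(rate) observations."""
    return len(data) * math.log(rate) - rate * sum(data)


def exp_mle(data):
    """Closed-form MLE of the rate: n / sum(x), i.e. 1 / sample mean."""
    return len(data) / sum(data)


# Hypothetical observations (not course data).
data = [0.5, 1.2, 0.3, 2.0, 1.0]
rate_hat = exp_mle(data)

# Sanity check: the log-likelihood at the MLE is at least as large
# as at a few other candidate rates.
candidates = [0.5, 0.9, 1.1, 2.0]
mle_is_best = all(
    exp_loglik(rate_hat, data) >= exp_loglik(r, data) for r in candidates
)

print(rate_hat, mle_is_best)
```

The same pattern (write the log-likelihood, then maximise it analytically or numerically) carries over to other parametric families.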

Linear regression, ANOVA, discriminant analysis.
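
A minimal sketch of simple linear regression by least squares, using the closed-form slope S_xy / S_xx and intercept ybar - slope * xbar. The data points and the function name are hypothetical, and the code is Python rather than the R used in the course.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # Sums of cross-products and squares about the means.
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Hypothetical data lying roughly on the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```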

Computing techniques, cross-validation, bootstrap re-sampling.
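
As a sketch of bootstrap re-sampling, the code below estimates the standard error of a sample mean by resampling the data with replacement. The simulated data set, seed, number of bootstrap replicates and function name are assumptions made for illustration, and the code is Python rather than the R used in the course.

```python
import random
import statistics


def bootstrap_se(data, stat, n_boot, rng):
    """Bootstrap standard error of `stat`: the standard deviation of the
    statistic over `n_boot` resamples drawn with replacement from `data`."""
    n = len(data)
    reps = [stat(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(reps)


rng = random.Random(2023)  # arbitrary seed, only for reproducibility
# Simulated sample of size 50 from Normal(mean=10, sd=2).
data = [rng.gauss(10, 2) for _ in range(50)]

se_mean = bootstrap_se(data, statistics.fmean, n_boot=2000, rng=rng)
# For comparison: the usual formula s / sqrt(n).
se_formula = statistics.stdev(data) / len(data) ** 0.5

print(f"bootstrap SE={se_mean:.3f}, formula SE={se_formula:.3f}")
```

For the mean, the bootstrap estimate should be close to s / sqrt(n). The same function accepts other statistics, such as `statistics.median`, for which no simple formula is available.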

Course Logistics

  • Schedule: Slot B, 9:00 am - 9:55 am Monday, 10:00 am - 10:55 am Tuesday, 11:00 am - 11:55 am Wednesday
  • Venue: 5103, Core 5.

Course Evaluation

There will be 5 surprise quizzes, 5 assignments, a mid-semester examination and an end-semester examination, with the following weightage:

  • Quizzes: 20%
  • Assignments: 20%
  • Mid semester exam: 30%
  • End semester exam: 30%

Some references (not an exhaustive list)

  • Hogg, R.V., McKean, J. and Craig, A.T., Introduction to mathematical statistics, 7th edition, Pearson Education, 2012.
  • Rice, J.A., Mathematical statistics and data analysis, 3rd edition, Cengage Learning, 2006.
  • Wasserman, L., All of statistics: a concise course in statistical inference, Volume 26, Springer, New York, 2004.
  • Rohatgi, V.K. and Saleh, A.M.E., An introduction to probability and statistics, 3rd edition, John Wiley & Sons, 2015.
  • DeGroot, M.H. and Schervish, M.J., Probability and statistics, 4th edition, Pearson Education, 2010.

Topics Covered during the weeks

Lecture Date Topic Resources R codes
1 26-Jul-2023
  • First Handout
  • Some examples
2 31-Jul-2023 Data and Questions
3 1-Aug-2023 Graphical Displays: Pie charts, Bar graphs, Histograms, Scatter plots Data Codes
Thanks to Karan Kumawat, we have a nice bar chart of the nuclear powers of different countries, with proper labels, here: Code, Barchart
4 2-Aug-2023 Measures of centre and spread; Skewness; Five-figure summary and Box and Whiskers Plots Book: Elements of Statistics by Daly, F. et al.
5 7-Aug-2023 Probability as a mathematical framework; Setting up a Probabilistic Model: Random experiment; Sample Space; Probability Law
6 8-Aug-2023 Probability axioms; Consequences of Probability axioms; Birthday Problem Birthday Problem
7 9-Aug-2023 Independent events; Newton-Pepys problem; Conditional Probability; Multiplication Rule; Bayes' Rule
8-9 12-Aug-2023 (Make-up class for 16,17-Aug-2023) Conditional Probability; Total Probability Law; Tree Diagrams; Monty Hall Problem Monty Hall Problem Code 1, Code 2
10 14-Aug-2023 Quiz 1 Quiz 1 solutions and marking scheme
15-Aug-2023 Assignment 1 We have a Teams group now: Grp_DA241-July-Nov-2023. If you are a part of this class and have not been added to the group, please write to me immediately. The assignment is uploaded there and the due date is 11:59 pm, 20-Aug-2023. Late turn-ins will stop after 11:59 pm, 21-Aug-2023. Type the solutions (handwritten will not be acceptable) in Word/LaTeX and submit the PDF only.
11 28-Aug-2023 Random Variables; Discrete Random Variables; Probability Mass Function; Examples
12,13 29-Aug-2023 Review Discrete Random Variables and PMFs; Special Discrete Random Variables: Bernoulli, Indicator, Binomial, Hypergeometric, Geometric, Poisson. Poisson Paradigm and Binomial Convergence to Poisson
14 30-Aug-2023 Continuous Random Variables; Probability Density Functions; Cumulative Distribution Functions and their Properties; Discrete Example.
3-Sep-2023 Assignment 2 Please check Teams group: Grp_DA241-July-Nov-2023. The assignment is uploaded there and the due date is 11:59 pm, 9-Sep-2023. Late turn-ins will stop after 11:59 pm, 10-Sep-2023. Type the solutions (handwritten will not be acceptable) in Word/LaTeX and submit the PDF only. All the other rules can be found on Teams. Question 5 solution
15 4-Sep-2023 Continuous Random Variables; Special r.v.s: Uniform, Piecewise Constant, Exponential, Normal; Universality of Uniform; Simulations
16, 17 5-Sep-2023 Normal random variables; Calculating Normal probabilities; Standardising a Normal random variable; Joint Distributions; Marginals; Conditional Density; Independence of random variables; Some examples
18 11-Sep-2023 Expectation of a random variable; Properties of Expectations; Expectations of famous discrete random variables Practice Assignment 1
19, 20 12-Sep-2023 Quiz 2; Expectations continued; Law of the Unconscious Statistician; Expected value of local maxima in a random permutation of integers; Finding means and variances of common continuous distributions; Memorylessness property of Exponentials LOTUS
21 13-Sep-2023 Quiz 2 solutions; Moment Generating Functions; Examples Quiz 2 solutions and marking scheme
Practice Assignment 2
19-Sep-2023 Mid-semester Examination Mid-semester exam solutions
22 25-Sep-2023 Covariance and Correlation; Conditional Expectation; Examples: Two envelope paradox; Patterns in repeated coin flips The Other Person's Envelope is Always Greener by Barry Nalebuff
23 26-Sep-2023 Conditional Expectation continued
24 27-Sep-2023 Inequalities: Cauchy-Schwarz inequality; Jensen's inequality; Markov's inequality; Chebychev's inequality; Convergence of random variables in probability
25 3-Oct-2023 Weak Law of Large Numbers (WLLN); Pollster's problem; Demonstration of WLLN in R WLLN Demonstration; WLLN_animation1; WLLN_animation2
26 4-Oct-2023 Central Limit Theorem (CLT); Pollster's problem revisited; Demonstration of CLT in R Check out some nice demonstrations here: http://www.randomservices.org/random/apps/index.html . Also, some cool animations can be found here: https://yihui.org/animation/ CLT Demonstration; CLT Animation
27 9-Oct-2023 Statistical inference problems: Types of problems; Point Estimation
28 10-Oct-2023 Point Estimation: Some examples
29 11-Oct-2023 Desirable properties of estimators: Unbiasedness; Consistency; "small" Mean Squared Errors; Methods of estimation: Least Squares method; Method of Moments.
30, 31 16-Oct-2023 Maximum Likelihood Estimation; Examples; Quiz 3 Quiz 3 solutions and marking scheme
32 25-Oct-2023 Recall MLEs; Statistical Properties of MLEs; Fisher Information; Cramer-Rao lower bound
28-Oct-2023 Assignment 3 Please check Teams group: Grp_DA241-July-Nov-2023. The assignment is uploaded there and the due date is 11:59 pm, 3-Nov-2023. Late turn-ins will stop after 11:59 pm, same day. You can find the details of submission on Teams.
33 30-Oct-2023 Confidence Intervals; Examples
34 31-Oct-2023 Hypothesis Testing: Big picture; Likelihood Ratio Tests; Examples
35 6-Nov-2023 Hypothesis Testing: More general scenarios; Quiz 4
36 7-Nov-2023 Simple linear regression; Least Squares estimation; Maximum Likelihood Estimation; Probabilistic setting of the problem
37 8-Nov-2023 Multiple Linear Regression Model; Geometric Interpretation
38, 39 14-Nov-2023 A look ahead
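
As a small worked example tied to the Birthday Problem from lecture 6, the sketch below computes the exact probability that at least two of k people share a birthday. It assumes 365 equally likely birthdays and independence between people. The function name is chosen for illustration, and the code is Python rather than the R used in the course.

```python
def birthday_collision_prob(k, days=365):
    """P(at least two of k people share a birthday), assuming `days`
    equally likely birthdays and independence between people."""
    p_distinct = 1.0
    for i in range(k):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_distinct *= (days - i) / days
    return 1.0 - p_distinct


# Smallest group size for which a shared birthday is more likely than not.
smallest_k = next(k for k in range(1, 366) if birthday_collision_prob(k) > 0.5)

print(smallest_k, round(birthday_collision_prob(smallest_k), 4))
```

Under these assumptions the probability first exceeds one half at k = 23.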