A few years ago I gave several talks on Big Data, and among my PowerPoint slides was one entitled ‘Transitions by Generations’, which many participants appreciated. When Sir R. A. Fisher, the father of modern statistics, advanced the idea of drawing statistical inference about a population from a small sample in the early twentieth century, his theories rested on the assumption that the underlying distribution from which the data arose was normal. In Fisher’s day the numerical statistical tables available were mainly for the normal, t, χ², and F distributions, all of which assume that the samples arose from an underlying normal distribution. These tables were created with a great deal of manpower and tedious manual computation.
Because of these computational difficulties, no other underlying distribution was considered in the development of new statistical theories. Just to be able to use the existing numerical tables, Fisher even suggested transforming data that do not come from a normal distribution so that the transformed data are approximately normally distributed and the existing normal-theory analysis can be applied to them. Some of Fisher’s transformations are the square root transformation, the arcsine transformation, the arctan transformation, and the z transformation of the correlation coefficient.
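To make the idea of transforming data toward normality concrete, here is a minimal Python sketch of three of the transformations named above. The function names are my own, chosen for illustration, and the interval calculation assumes the standard large-sample result that Fisher's z is approximately normal with standard error 1/sqrt(n - 3). The arctan transformation is not covered.

```python
import math


def sqrt_transform(count):
    """Square root transformation, commonly applied to count data."""
    return math.sqrt(count)


def arcsine_transform(p):
    """Arcsine (angular) transformation of a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))


def fisher_z(r):
    """Fisher's z transformation of a correlation r in (-1, 1)."""
    return math.atanh(r)  # equals 0.5 * log((1 + r) / (1 - r))


def correlation_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation coefficient.

    Assumes z = atanh(r) is roughly normal with standard error
    1 / sqrt(n - 3); z_crit = 1.96 gives a nominal 95% interval.
    """
    z = fisher_z(r)
    se = 1.0 / math.sqrt(n - 3)
    # Build the interval on the z scale, then map back with tanh.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

For example, `correlation_ci(0.5, 30)` builds the interval on the transformed scale, where normal tables apply, and then maps the endpoints back to the correlation scale.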
With the advent of the modern computer, many computer-intensive areas of statistics that do not need the assumption of an underlying normal distribution found a place in practice. Examples include Bayesian statistics, jackknife and bootstrap resampling methods, and nonparametric methods. With the creation of high-speed computers with very large storage and computing capacity, we have now entered the era of Big Data, in which massive data sets can be handled easily, something unthinkable in Fisher’s time.
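As an illustration of what computer-intensive means here, the following is a minimal sketch of a percentile bootstrap confidence interval, written with only the Python standard library. The function name, the default of 2000 resamples, and the fixed seed are illustrative assumptions, not a recommended recipe. No normality assumption is needed: the computer simply resamples the data many times.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic.

    Draws n_resamples samples with replacement from data, computes
    stat on each, and returns the empirical alpha/2 and 1 - alpha/2
    quantiles of those estimates.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

Calling `bootstrap_ci([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])` returns an interval for the median of a small, skewed sample, the kind of calculation that would have been impractical with hand-computed tables.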
The chart below, which I developed, shows how the development of telephone technology corresponds to the development of computer technology and, in turn, to the development of statistical methodologies from the Small Data era to the current Big Data era.