Q: How can distribution estimates be derived from a relatively small sample?
A: Use the bootstrap to estimate the sampling distribution of a statistic or model parameter.
The basic idea of bootstrapping is that inference about a population from sample data (sample → population) can be modeled by resampling the sample data and performing inference about the sample from the resampled data (resampled → sample).
The bootstrap does not necessarily involve any assumptions about the data, or the sample statistic, being normally distributed. Conceptually, one could replicate the original sample a huge number of times to build a hypothetical population and then draw from it, but in practice this is not necessary. We simply replace each observation after each draw, that is, we sample with replacement. In this way, we effectively create an infinite population in which the probability of an element being drawn remains unchanged from draw to draw. The algorithm for a bootstrap resampling of the mean is as follows, for a sample of size N:
1. Draw a sample value, record it, and replace it.
2. Repeat N times.
3. Record the mean of the N resampled values.
4. Repeat steps 1-3 B times.
5. Use the B results to:
   - Calculate their standard deviation (this estimates the standard error of the sample mean)
   - Produce a histogram or boxplot
   - Find a confidence interval
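
The steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a definitive implementation. The function names `bootstrap_means` and `percentile_interval`, the sample values, B = 2000, and the seed are all assumptions made for the example. The confidence interval uses the simple percentile method, which is one of several ways to build a bootstrap interval.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Return n_resamples bootstrap means of `sample` (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Steps 1-2: draw n values with replacement from the sample.
        resample = rng.choices(sample, k=n)
        # Step 3: record the mean of the n resampled values.
        means.append(statistics.fmean(resample))
    # Step 4 is the loop itself: the result holds B = n_resamples means.
    return means


def percentile_interval(values, level=0.95):
    """Percentile interval: trim (1 - level) / 2 of the values from each tail."""
    ordered = sorted(values)
    alpha = (1.0 - level) / 2.0
    lo_idx = int(alpha * len(ordered))
    hi_idx = int((1.0 - alpha) * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]


# Hypothetical small sample, used only for illustration.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.3]

# Step 5: summarize the B bootstrap means.
boot_means = bootstrap_means(sample, n_resamples=2000, seed=42)
standard_error = statistics.stdev(boot_means)  # estimates the SE of the mean
ci_low, ci_high = percentile_interval(boot_means, level=0.95)

print(f"sample mean:          {statistics.fmean(sample):.3f}")
print(f"bootstrap std. error: {standard_error:.3f}")
print(f"95% percentile CI:    ({ci_low:.3f}, {ci_high:.3f})")
```

The list `boot_means` can also be passed to any plotting library to produce the histogram or boxplot. Swapping `statistics.fmean` for another function (a median, for instance) would bootstrap that statistic instead, with the rest of the procedure unchanged.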