How to Define a Statistic for Sampling

Briefly

Create your statistic as a measure in the original collection. Then, when you attach a measures collection to a sample of that collection, you build up a distribution of the statistic, one value per sample.

To investigate a statistical measure using sampling, define it as a measure in your original data collection (see How to make a new measure).

After you've named the measure, be sure to give it a formula. Double-click the Formula column in the inspector to bring up the formula editor.

Then, to build up a distribution of that statistic through repeated sampling, make a measures collection with your sample collection as its source. The measures collection will hold the statistics you have defined, one for each sample.

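That means you'll have three collections: the original collection, the sample collection drawn from it, and the measures collection that gathers the statistic from each sample.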
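
The relationship among the three collections can also be mimicked outside the program. The Python sketch below is only an illustration, not how the software works internally: the function name collect_measures, its parameters, and the income figures are all made up for this example, and it assumes each sample is drawn with replacement.

```python
import random
import statistics


def collect_measures(original, sample_size, num_samples, statistic, seed=None):
    """Draw repeated samples from `original` and record one statistic per sample.

    `original` plays the role of the original collection, each `sample`
    the sample collection, and the returned list the measures collection.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    measures = []
    for _ in range(num_samples):
        # Assumption: cases are sampled with replacement.
        sample = rng.choices(original, k=sample_size)
        # The statistic is defined once and re-evaluated for every sample.
        measures.append(statistic(sample))
    return measures


# Made-up income values standing in for the original collection.
incomes = [18000, 24000, 31000, 36000, 42000, 55000, 61000, 78000, 120000]

median_incomes = collect_measures(
    incomes,
    sample_size=5,
    num_samples=200,
    statistic=statistics.median,
    seed=1,
)
```

Here median_incomes holds 200 values, one per sample. A histogram of those values would approximate the sampling distribution of the median.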

Hints for formulas for statistical measures

Let's look at an example. Suppose we're investigating the distribution of sample medians. We want to use income data on people for this study.

Your formulas will probably use what are called aggregate functions, which reduce all the values of an attribute to a single number. Typical aggregate functions include mean( ), median( ), and stddev( ). So median(income) is one likely formula for our measure, which we might call medianIncome.
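
To see what an aggregate function does, here is a rough Python analogue using the standard statistics module. The summarize function and the income values are hypothetical. The sketch uses the sample standard deviation (n - 1 denominator), which is an assumption and may not match how stddev( ) is defined in the program.

```python
import statistics


def summarize(values):
    """Reduce a whole column of values to single numbers, as aggregate functions do."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Assumption: sample standard deviation (n - 1 denominator).
        "stddev": statistics.stdev(values),
    }


# Made-up income values for five people.
incomes = [20000, 30000, 40000, 50000, 110000]
summary = summarize(incomes)

median_income = summary["median"]  # plays the role of medianIncome
```

Each entry in summary is a single number computed from all five cases, which is why such a formula belongs in a measure rather than in a per-case attribute.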

The sample collection will have attributes for income, sex, age, and so forth. And it will inherit medianIncome as a measure from its source. The measures collection, however, will have medianIncome as a regular attribute - one value for each sample.