How to Define a Statistic for Scrambling

Briefly

Create your statistic as a measure (a collection attribute) in the original collection. Then, when you attach a measures collection to the scrambled collection, each scramble contributes one value, and you build up a distribution of that statistic.

To investigate a statistic using scrambling, define it as a measure in your original data collection (see How to make a new measure).

After you've named the measure, be sure to give it a formula. Double-click in the Formula column to bring up the formula editor.

Then, to build up a distribution of that statistic through repeated scrambling, hook up a measures collection to your scrambled collection. The measures collection will hold the statistics you have defined, one for each scramble.

That means you'll have three collections: the original collection, the scrambled collection, and the measures collection.
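
Outside the program, the same three-stage flow can be sketched in Python. This is only an illustration under stated assumptions: the names scramble and collect_measures, the layout of the data as (group, value) pairs, and the choice to scramble by shuffling the group labels are all hypothetical and are not features of the software.

```python
import random
from statistics import mean

def diff_in_means(rows):
    """The measure: mean of group "a" minus mean of group "b"."""
    a = [value for group, value in rows if group == "a"]
    b = [value for group, value in rows if group == "b"]
    return mean(a) - mean(b)

def scramble(rows, rng):
    """Return a copy of rows with the group labels shuffled among the cases.

    Assumption: scrambling means permuting one attribute (here, the group).
    """
    labels = [group for group, _ in rows]
    rng.shuffle(labels)
    return [(label, value) for label, (_, value) in zip(labels, rows)]

def collect_measures(rows, measure, n_scrambles, seed=0):
    """Scramble repeatedly and record the measure once per scramble."""
    rng = random.Random(seed)
    return [measure(scramble(rows, rng)) for _ in range(n_scrambles)]

# Original collection (made-up values, for illustration only).
original = [("a", 1.0), ("a", 2.0), ("a", 3.0),
            ("b", 4.0), ("b", 5.0), ("b", 6.0)]

observed = diff_in_means(original)                          # measure on the original data
measures = collect_measures(original, diff_in_means, 200)   # one value per scramble
```

Here original plays the role of the original collection, each call to scramble produces one scrambled collection, and measures is the measures collection. Comparing observed with the spread of measures shows how unusual the original statistic is.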

Hints for formulas for statistical measures

Let's look at an example. Suppose we've planted sunflowers in two garden plots, a sunny one and a shady one.

Your formulas will probably use what are called aggregate functions, which compute a single value from all the cases in a collection. Typical aggregate functions include mean( ), median( ), and stddev( ).

If you're trying to figure out whether plants in the sun or the shade are taller, you have two attributes: whichPlot is categorical and height is continuous. You're probably interested in how height differs between the two plots. In that case, your formula might be the difference in medians between the two garden plots:

median(height, whichPlot = "sunny") - median(height, whichPlot = "shady")
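
To make the filtered-median idea concrete, here is a rough Python equivalent of the formula above. The helper filtered_median and the plant heights are hypothetical, invented for illustration; only the attribute names whichPlot and height come from the example.

```python
from statistics import median

def filtered_median(cases, attribute, where):
    """Median of one attribute over the cases that satisfy a filter.

    Mirrors the two-argument form median(height, whichPlot = "sunny").
    """
    values = [case[attribute] for case in cases if where(case)]
    return median(values)

# Made-up heights, for illustration only.
plants = [
    {"whichPlot": "sunny", "height": 150},
    {"whichPlot": "sunny", "height": 162},
    {"whichPlot": "sunny", "height": 171},
    {"whichPlot": "shady", "height": 120},
    {"whichPlot": "shady", "height": 135},
    {"whichPlot": "shady", "height": 128},
    {"whichPlot": "shady", "height": 140},
]

sunny_median = filtered_median(plants, "height", lambda c: c["whichPlot"] == "sunny")
shady_median = filtered_median(plants, "height", lambda c: c["whichPlot"] == "shady")

# The statistic: difference in medians between the two garden plots.
diff_in_medians = sunny_median - shady_median
```

A positive result means the median plant in the sunny plot is taller than the median plant in the shady plot.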

More about the formula editor.

More about functions

Other uses of formulas:

Writing formulas for attributes

Plotting functions

Plotting values

Writing filters