Statistical Functions for One Attribute

These functions are often the guts of a statistical analysis for one attribute. These are all aggregate functions. You can use them in formulas for attributes or for measures.

Note: All of these functions take an optional last argument, a Boolean expression. That modifies the function so that it applies only to those cases where the condition is true. For example, max( age, ancestry = "Norwegian") finds the age of the oldest Norwegian in the collection.
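
If it helps to see the optional condition spelled out, here is a minimal Python sketch of what max( age, ancestry = "Norwegian") computes. It is only an illustration, not how the software implements the function. The case list and the helper name max_where are hypothetical.

```python
# Hypothetical collection: each case is a dict of attribute values.
cases = [
    {"age": 34, "ancestry": "Norwegian"},
    {"age": 71, "ancestry": "Swedish"},
    {"age": 58, "ancestry": "Norwegian"},
]

def max_where(cases, attribute, condition=lambda case: True):
    """Maximum of `attribute` over the cases that satisfy `condition`.

    With no condition given, every case is included.
    """
    values = [case[attribute] for case in cases if condition(case)]
    return max(values)

# Without the condition: the oldest person in the whole collection.
oldest_overall = max_where(cases, "age")

# With the condition: only the Norwegian cases are considered.
oldest_norwegian = max_where(cases, "age", lambda c: c["ancestry"] == "Norwegian")
```

The same pattern applies to the other functions on this page: the condition filters the cases first, and the function then works on whatever is left.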

Counting and Proportions

count( )

The number of cases in the collection.

count( a )

The number of cases having a valid value for a that is not false.

count( condition )

The number of cases where condition is true. Example: count( age > 65) gives the number of cases where age is greater than 65.

proportion( condition)

Gives the proportion of cases where condition is true.

Example: proportion( a > 12) tells us the proportion of all cases where a is greater than 12. The result is a number between 0 and 1, inclusive.
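
As a rough sketch of the arithmetic behind count( condition ) and proportion( condition ), the Python below counts the cases that meet a condition and divides by the total number of cases. The data and the helper names count_where and proportion_where are hypothetical, and the sketch assumes there are no missing values.

```python
# Hypothetical ages for five cases.
ages = [12, 70, 66, 41, 90]

def count_where(values, condition):
    """Number of values for which `condition` is true."""
    return sum(1 for v in values if condition(v))

def proportion_where(values, condition):
    """Fraction of all values for which `condition` is true (0 to 1)."""
    return count_where(values, condition) / len(values)

# Analogous to count( age > 65 ) and proportion( age > 65 ).
n_over_65 = count_where(ages, lambda age: age > 65)
share_over_65 = proportion_where(ages, lambda age: age > 65)
```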

Measures of Center

mean( a)

The mean value for a in the collection.

median( a)

The median value for a in the collection.

Measures of Spread

iqr( a)

The interquartile range of the attribute a.

iqr( a, condition)

The same, restricted to the cases where condition is true. For example, iqr(height, age=10) gives the interquartile range for the heights of ten-year-olds.

About calculating IQR.

popStdDev( a)

The standard deviation of the attribute you give it, in this case, a. This is the "population standard deviation."

s( a)
stdDev( a)
sampleStdDev( a)

These three are synonyms. Each estimates the standard deviation of a population, given that the cases constitute a simple random sample. The result is just like popStdDev, except that it is a little bigger because the calculation divides by n − 1 instead of n.
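
To see the difference between dividing by n and by n − 1, here is a short Python sketch that computes both versions by hand. The data values are made up for illustration, and the variable names are not part of the software.

```python
import math

# Hypothetical data values for one attribute.
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean = sum(values) / n

# Sum of squared deviations from the mean, shared by both formulas.
sum_sq = sum((v - mean) ** 2 for v in values)

# Population standard deviation: divide by n.
pop_sd = math.sqrt(sum_sq / n)

# Sample standard deviation: divide by n - 1, which gives a slightly larger result.
sample_sd = math.sqrt(sum_sq / (n - 1))
```

Python's standard statistics module offers the same two calculations as statistics.pstdev and statistics.stdev.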

popVariance( a)

The variance of the values in a. This is also popStdDev squared.

sampleVariance( a )
variance( a )

The estimate of the population variance. This is also s squared. (These two are synonyms.)

stdError( a )

An estimate of the standard error of the mean of a, assuming that the cases are a simple random sample. This is the same as s( a ) divided by the square root of n, where n is the number of cases.
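
The relationship between the sample standard deviation and the standard error can be sketched in a few lines of Python. The data are hypothetical, and statistics.stdev here stands in for s( a ).

```python
import math
import statistics

# Hypothetical data values for one attribute.
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)

# Sample standard deviation (divides by n - 1), the analogue of s( a ).
s = statistics.stdev(values)

# Standard error of the mean: s divided by the square root of n.
std_error = s / math.sqrt(n)
```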

Order Statistics

max( a )

The maximum value for a in the collection.

min( a )

The minimum value for a in the collection.

percentile( pct, a)

Gives the value at the given percentile for a, where pct is the percentile.
Example: percentile( 95, height) gives you the 95th percentile of height in the data set.

Other

sum( a )

The sum of the attribute you give it, in this case, the sum of a.

uniqueValues( a)

The number of unique values that the attribute a has in the collection.
Example: uniqueValues( sex ) gives 2 if there are only two values (Male and Female) for sex.

first( a )

Gives you the value of a from the first case in the collection.

last( a )

Gives you the value of a from the last case in the collection.

Q1( a )

The value of a that lies at the 25th percentile; i.e. the first quartile.

Q3( a )

The value of a that lies at the 75th percentile; i.e. the third quartile.
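
Here is a hedged Python sketch of how Q1, Q3, and the interquartile range relate to one another. Software packages use several different rules for locating quartiles, so the "inclusive" method chosen below is an assumption and may not match what this software computes on every data set. The data are hypothetical.

```python
import statistics

# Hypothetical data values for one attribute.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Cut the data into four equal parts; this returns the three cut points.
# method="inclusive" is one common convention, assumed here for illustration.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

# The interquartile range is the distance between the third and first quartiles.
iqr = q3 - q1
```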
