Quote:
Originally Posted by ericp
I've seen the empirical variance expressed two different ways and I don't understand when to use which.
i. Var = F*S/N
ii. Var = F*S*N
The latter looks like the variance of a binomial, which makes sense to me in situations where we want, say, the number of data points below a value, like F(1500). However, this is exactly the case in SOA #227, yet the solution uses the former (i). I don't understand why.
If we wanted the variance of F(1500), why wouldn't it be:
(# pts below 1500 / N) * (# pts above 1500 / N) * N
because this would be the variance of a binomial with F = p, S = 1 - p, and N trials?
When finding the variance for grouped data, specifically when the value we want lies within a group, we use ii. Yet in the simulation section we write the variance as in i.
Can someone explain simply what the difference is or why we use one or the other?
thanks.

F(1500) is not the number of points below 1500.
You must distinguish between a binomial random variable and a binomial proportion random variable.
A binomial random variable ALWAYS assumes only integral values. A binomial random variable can never be 1/2, for example. Your formula ii is the formula for its variance.
A binomial proportion random variable is a binomial random variable divided by its parameter m (or N, as you're calling it), and it is almost never integer valued (unless it is 0 or 1). It is therefore easy to distinguish from a binomial random variable. Your formula i is the formula for its variance. Another way to tell them apart is that a binomial proportion is always between 0 and 1, whereas a binomial random variable can usually exceed 1 (unless m = 1).
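For reference, both formulas come out of one line of algebra. Write X for the number of the N sample points at or below 1500, so X is binomial with N trials and success probability p (the true value of F(1500)). Dividing a random variable by the constant N divides its variance by N squared; plugging in the estimates F for p and S for 1 - p then gives formula ii for the count X and formula i for the proportion X/N, which is F(1500).

```latex
X \sim \mathrm{Bin}(N, p), \qquad \operatorname{Var}(X) = N\,p\,(1-p) \;\approx\; F\,S\,N

\operatorname{Var}\!\left(\frac{X}{N}\right) = \frac{\operatorname{Var}(X)}{N^{2}} = \frac{p\,(1-p)}{N} \;\approx\; \frac{F\,S}{N}
```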
F(1500) assumes fractional values and is between 0 and 1, so it is a binomial proportion random variable.
The number of points below 1500 is always an integer, so it is a binomial random variable.
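A small Python sketch can make the distinction concrete. The helper names `empirical_variances` and `simulate` and the sample of losses are hypothetical (these are not the SOA #227 data), and "below 1500" is taken as "at or below 1500", matching the usual empirical cdf convention. The Monte Carlo part checks that, across many simulated samples, the count varies like F*S*N while the proportion F(x) varies like F*S/N.

```python
import random
from statistics import pvariance


def empirical_variances(data, x):
    """Hypothetical helper: estimated variances for the points at or below x.

    F = empirical cdf at x (fraction of points <= x), S = 1 - F, N = len(data).
    Returns (variance of the count, variance of the proportion F(x)).
    """
    n = len(data)
    f = sum(1 for d in data if d <= x) / n
    s = 1 - f
    return f * s * n, f * s / n  # formula ii (count), formula i (proportion)


def simulate(p, n, trials, seed=0):
    """Hypothetical Monte Carlo check: draw `trials` samples of size n where
    each point lands at or below x with probability p, then return the
    observed variance of the count and of the proportion across samples."""
    rng = random.Random(seed)
    counts = [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]
    props = [c / n for c in counts]
    return pvariance(counts), pvariance(props)


# Made-up sample of 10 losses; 6 of them are at or below 1500, so F = 0.6.
losses = [800, 900, 1100, 1200, 1400, 1500, 1700, 2100, 2500, 3000]
var_count, var_prop = empirical_variances(losses, 1500)
print(var_count, var_prop)  # about 2.4 and 0.024

# With p = 0.3 and n = 20: count variance near 20*0.3*0.7 = 4.2,
# proportion variance near 0.3*0.7/20 = 0.0105.
sim_count, sim_prop = simulate(p=0.3, n=20, trials=20000)
print(sim_count, sim_prop)
```

The two returned variances always differ by a factor of exactly N squared, which is the whole difference between formulas i and ii.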
I leave it to you to go through your other examples and to determine which type of random variable they are.