Just thinking aloud here. I do have similar issues on finding N too.
1) These were my first thoughts.
I believe that part of the issue here is that we use the data for two purposes: one is to find the missing parameter of the Poisson, the other is to find Z. We have data for 2 years, taken as the sample. One way of solving it treats it as a sample of one two-year period (N=1). Another is to treat it as 2 years of data (the sample values then need to be scaled down to 1/2 of the original two-year values) (N=2).
We are trying to figure out the number of claims per policy per year. In one case we figure it for 2 years and then we divide it by 2 to get one year. They could have given the amount for year 1 and the amount for year 2. Or they could have given data for 5 years, and then N=5. The sample would have increased proportionally. The number of policies is used merely to find the average (and the sample standard deviation) of the number of claims per policy per year. And that's what we are trying to find out. You have yearly samples of the number of claims per policy.
That's why N = 1, 2 or 5, depending on the number of years sampled, and not N = 100 (for 2 years).
In sample 197 the exercise mentions that the number of claims per year follows a Poisson. Sample 263 says: "The number of claims incurred in a month by any insured follows a Poisson distribution with mean λ."
2) And after reading about this issue I came up with something that may shed new light on this. Basically we are trying to find out how to split the weight between 2 different means, based on how credible each value is. In the case of the sample, the more data points there are, the more credible the sample mean, x bar, will be relative to the mean of the model, mu.
In the case of the semiparametric, sample 197, we have data that we use to figure out the lambda of the Poisson frequency. In the sample we have basically a 2-year period for one policyholder, that is N(1,2) = 1. That is then 1 exposure, not the 100 policies shown before to evaluate the parameter. These 100 policies are not the sample.
In the case of the Bühlmann-Straub, sample 263, the data sample has a lot more exposures. We have to be careful how it's read, as the number of insureds changes from month to month. Basically there are 500 insured-months of data with 35 claims in total. We will find the estimated value PC for claims per insured-month, so the exposure is insured-months, that is N=500. To project to 12 months and 300 insureds, the result of the credibility then needs to be multiplied by 12 x 300 = 3600.
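To make the sample 263 bookkeeping concrete, here is a minimal Python sketch. The 35 claims, 500 insured-months, 12 months and 300 insureds are the figures quoted above. The prior mean `mu` and the ratio `k = EPV/VHM` are made-up placeholders, since the real ones have to be worked out from the problem's own data, and the function name is just mine.

```python
def buhlmann_straub_claims(total_claims, exposure, mu, k, future_exposure):
    """Credibility estimate of total claims for a future exposure.

    exposure and future_exposure must be in the same unit (here insured-months).
    mu is the prior mean per unit of exposure and k = EPV/VHM; both are inputs.
    """
    x_bar = total_claims / exposure   # observed claims per insured-month
    z = exposure / (exposure + k)     # N is the exposure, not the number of policies
    pc = z * x_bar + (1 - z) * mu     # credibility estimate per insured-month
    return x_bar, z, pc * future_exposure


# 35 claims over 500 insured-months, projected to 12 x 300 = 3600 insured-months.
# mu=0.05 and k=100.0 are HYPOTHETICAL values, for illustration only.
x_bar, z, expected_claims = buhlmann_straub_claims(
    total_claims=35, exposure=500, mu=0.05, k=100.0, future_exposure=12 * 300
)
print(x_bar, z, expected_claims)
```

The point is only that N and the final multiplier are both counted in insured-months, so the per-exposure estimate and the 3600 line up.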
I think that explains the difference on how N in Z is calculated for both sample problems.
Something that is not explained anywhere I could find is that the model in a way needs to be connected to this N. It's like giving a 1 to the model and N to the sample.
What I did find is an alternate way to express it that can perhaps shed additional light:
Z = VHM / (VHM + EPV/N)
where the second term of the denominator, EPV/N, is the variance of the sample mean. The whole denominator is then the total variance of the estimator X bar.
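A quick Python check that this form is the same as the usual Z = N / (N + K) with K = EPV/VHM, and that Z grows with N. The EPV and VHM numbers are invented for the illustration, not taken from either sample problem.

```python
def z_from_k(n, epv, vhm):
    """Usual form: Z = N / (N + K), with K = EPV / VHM."""
    return n / (n + epv / vhm)


def z_from_variances(n, epv, vhm):
    """Alternate form: Z = VHM / (VHM + EPV / N)."""
    return vhm / (vhm + epv / n)


# HYPOTHETICAL values, chosen only so that K = EPV / VHM = 4.
epv, vhm = 2.0, 0.5

for n in (1, 2, 5, 500):
    a = z_from_k(n, epv, vhm)
    b = z_from_variances(n, epv, vhm)
    assert abs(a - b) < 1e-12  # both forms give the same Z
    print(n, round(a, 4))
```

Multiplying the numerator and denominator of the alternate form by N/VHM gives back N / (N + EPV/VHM), which is why the two always agree.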
__________________
German
Last edited by gauchodelpaso; 10-20-2018 at 10:28 PM.
Reason: additional info
