Actuarial Outpost: Does Empirical Estimate of F(X) not involve (n+1) in denominator?

#11
02-04-2012, 01:15 AM
 JCD Join Date: Sep 2011 Posts: 11

When you divide by n you are assuming that your data are telling you everything. What I mean is that you are assuming each observation you see is equally likely and that nothing else can occur. So if you observed 120, 200, 320, 380, 450, then each is equally likely to occur with probability 1/5. Then everything is based on this idea. This is called the empirical distribution.

However, sometimes this doesn't do the best job of predicting what is going on in the underlying distribution. Another way to view the five observations 120, 200, 320, 380, 450 is that these are probably not the only five possible values. There are probably a lot of other possible values in between my observations. In particular, based on the data, I'd like the predicted 50th percentile to be the "middle" of the data. So here I would like the 50th percentile to be 320. In other words, F(320) = P(X <= 320) = 0.5 = 3/6 = 3/(n+1). So to get this idea to work you have to divide by n+1.
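To put numbers on the two conventions, here is a minimal Python sketch using the five observations above. The function names are made up for illustration, and the smoothed version assumes distinct observations and is only evaluated at observed points.

```python
from fractions import Fraction

def empirical_cdf(data, x):
    """Empirical F(x): the share of observations <= x (divide by n)."""
    n = len(data)
    return Fraction(sum(1 for v in data if v <= x), n)

def smoothed_cdf_at_obs(data, x):
    """Smoothed value at an observed point: rank / (n + 1).

    Assumes the observations are distinct and that x is one of them.
    """
    ordered = sorted(data)
    rank = ordered.index(x) + 1  # 1-based rank of x in the sample
    return Fraction(rank, len(ordered) + 1)

data = [120, 200, 320, 380, 450]
for x in data:
    print(x, empirical_cdf(data, x), smoothed_cdf_at_obs(data, x))
```

With n = 5, the empirical value at 320 comes out as 3/5, while the n+1 version gives 3/6 = 1/2, which is the "middle of the data" idea.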

As others mentioned, use the smoothed estimate when doing percentile matching or when doing p-p plots. Otherwise use the actual empirical distribution.

As n-> infinity these two values should approach each other.
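One way to see this, assuming the smoothed value at the i-th ordered observation (i = 1, ..., n) is i/(n+1) and the empirical value is i/n:

```latex
\frac{i}{n} - \frac{i}{n+1} = \frac{i}{n(n+1)} \le \frac{1}{n+1} \longrightarrow 0
\quad \text{as } n \to \infty .
```

So the gap between the two estimates at any observed point is at most 1/(n+1).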
#12
02-04-2012, 02:40 AM
 Ionic Order Member CAS SOA Join Date: Jan 2011 Location: Denver Posts: 1,284

I don't get the part about how you would like the 50th percentile to be 320. Why do you want that? What if you had 6 data points?
#13
02-04-2012, 10:27 AM
 UFActuary Member Join Date: Jul 2005 Posts: 3,431

Quote:
 Originally Posted by HiLine I don't get the part about how you would like the 50th percentile to be 320. Why do you want that? What if you had 6 data points?
HiLine, if the goal is to match your observations and fit them to one of the distribution functions, you can only work with what's given to you. If you have 5 data points, you can't assess what would happen if you had 6. The n+1 is a smoothing technique that artificially creates some space beyond the first and last data points.

I think the smoothing is easiest to comprehend if you look at just 2 datapoints.

Let's say the values are 50 and 100.

If you use the empirical distribution (no smoothing), F(50) would be 0.5 and F(100) would be 1 (100%).

But this would be a very poor assumption if these are sample observations from a model.

If you use n+1 instead:

F(50) is 1/3 and F(100) is 2/3.

Makes much more sense. It's like in Microsoft Word when you Center your text so there's a left-margin and a right-margin.

You have a 1/3 margin to the left, the same 1/3 margin between the points, and an equal 1/3 margin to the right, because there's likely more of the distribution that you're not seeing. The middle of the distribution should not be 50 or 100; it should be the average of the two.

And sure enough, (1/3 + 2/3)/2 = 50%, and that lies at 75.
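The 75 can be checked with a short Python sketch of a smoothed percentile. It assumes linear interpolation between the order statistics at position p(n+1); the function name is hypothetical, and p outside [1/(n+1), n/(n+1)] is rejected because the data give nothing to interpolate between there.

```python
def smoothed_percentile(data, p):
    """Smoothed empirical percentile via linear interpolation at p * (n + 1)."""
    ordered = sorted(data)
    n = len(ordered)
    pos = p * (n + 1)  # 1-based, possibly fractional, rank
    if pos < 1 or pos > n:
        raise ValueError("p must lie between 1/(n+1) and n/(n+1)")
    lower = int(pos)    # integer part of the rank (pos is positive)
    frac = pos - lower  # how far to move toward the next order statistic
    if lower == n:      # landed exactly on the largest observation
        return float(ordered[-1])
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

print(smoothed_percentile([50, 100], 0.5))
print(smoothed_percentile([120, 200, 320, 380, 450], 0.5))
```

For the two-point sample the rank is 0.5 × 3 = 1.5, halfway between 50 and 100. For the five-point sample the rank is 0.5 × 6 = 3, which is the third observation, 320.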

On another note, I read somewhere that you have to have more than 2 observations to really fit the data.... Not sure what the minimum was.
#14
02-04-2012, 11:11 AM
 Ionic Order Member CAS SOA Join Date: Jan 2011 Location: Denver Posts: 1,284

I was wondering about the margin thing too when I had to deal with questions whose empirical estimation uses ogives. So the percentile smoothing method solves this problem! Thanks a lot for your explanation!