

#11




When you divide by n, you are assuming that your data tell you everything. What I mean by this is that you are assuming the values you observed are the only possible values and that each is equally likely. So if you observed 120, 200, 320, 380, 450, then each occurs with probability 1/5, and everything else is built on this idea. This is called the empirical distribution.
However, sometimes this doesn't do the best job of predicting what is going on in the underlying distribution. Another way to view the five observations 120, 200, 320, 380, 450 is that they probably are not the only five possible values. There are probably a lot of other possible values in between my observations. Based on the data, I'd in particular like the predicted 50th percentile to be the "middle" of the data, so here I would like the 50th percentile to be 320. In other words, F(320) = P(X <= 320) = 0.5 = 3/6 = 3/(n+1). To get this idea to work you have to divide by n+1. As others mentioned, use the smoothed estimate when doing percentile matching or p-p plots; otherwise use the actual empirical distribution. As n -> infinity, the two estimates approach each other.
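To make the n versus n+1 comparison concrete, here is a minimal Python sketch using the five observations from the post. The function names `empirical_cdf` and `smoothed_cdf_at_observation` are made up for illustration, and the smoothed version assumes the point is one of the observed values with no ties.

```python
from fractions import Fraction

def empirical_cdf(data, x):
    """Empirical F(x): the share of observations <= x (divide by n)."""
    n = len(data)
    return Fraction(sum(1 for v in data if v <= x), n)

def smoothed_cdf_at_observation(data, x):
    """Smoothed F(x) at an observed value: rank / (n + 1).

    Assumes x is one of the observations and that there are no ties.
    """
    ordered = sorted(data)
    rank = ordered.index(x) + 1  # 1-based rank of x in the sorted sample
    return Fraction(rank, len(ordered) + 1)

data = [120, 200, 320, 380, 450]
for x in data:
    # Columns: observation, divide-by-n value, divide-by-(n+1) value
    print(x, empirical_cdf(data, x), smoothed_cdf_at_observation(data, x))
```

At 320 the empirical value is 3/5, while the smoothed value is 3/6 = 1/2, which is what puts the median at the middle observation.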
#13




Quote:
I think the smoothing is easiest to comprehend if you look at just 2 data points. Let's say the values are 50 and 100. If you use the empirical distribution (no smoothing), F(50) would be 0.5 and F(100) would be 100%. But this would be a very poor assumption if these are sample observations from a model. If you use n+1 instead, F(50) is 1/3 and F(100) is 2/3, which makes much more sense. It's like in Microsoft Word when you center your text so there's a left margin and a right margin: you have a 1/3 margin to the left, the same 1/3 margin between the points, and an equal margin to the right, because it's easy to see there's likely more of the distribution you're not getting. And the middle of the distribution should not be 50 or 100; it should be the average of the two. Sure enough, (1/3 + 2/3)/2 = 50%, and that lies at 75. On another note, I read somewhere that you have to have more than 2 observations to really fit the data. Not sure what the minimum was.
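The "lies at 75" step can be checked with a short Python sketch of the smoothed percentile, assuming linear interpolation between adjacent order statistics at position p(n+1). The function name `smoothed_percentile` is made up for illustration, and the choice to raise an error outside the range 1/(n+1) to n/(n+1) is my own assumption rather than something stated in the thread.

```python
from fractions import Fraction

def smoothed_percentile(data, p):
    """Smoothed empirical percentile via linear interpolation.

    The k-th smallest observation is treated as the k/(n+1) percentile,
    and values in between are interpolated linearly.
    """
    ordered = sorted(data)
    n = len(ordered)
    pos = Fraction(p) * (n + 1)  # 1-based position among the order statistics
    if pos < 1 or pos > n:
        # Assumption: no extrapolation beyond the smallest/largest observation
        raise ValueError("p must lie between 1/(n+1) and n/(n+1)")
    lower = int(pos)             # index of the order statistic at or below pos
    frac = pos - lower           # how far pos sits toward the next one
    if lower == n:
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

print(smoothed_percentile([50, 100], Fraction(1, 2)))                 # 75
print(smoothed_percentile([120, 200, 320, 380, 450], Fraction(1, 2)))  # 320
```

With two points, the position is 0.5 × 3 = 1.5, halfway between the first and second observation, which gives 75. With the five-point sample from the earlier post, the position is exactly 3, so the median is the third observation, 320.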

