Old 12-03-2017, 09:55 PM
Mary Pat Campbell

The specific R functions I used perform a best fit. For regressions, that means minimizing the sum of squares of the residuals, though one can use other weightings/metrics, and the metrics being optimized are different for classification processes.
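To make "minimizing the sum of squares of residuals" concrete, here is a minimal sketch in plain Python rather than R. The function names (fit_line, sum_squared_residuals) and the toy data are made up for illustration; this is not the code from my demonstrations, just the textbook closed-form fit for a single predictor.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """The quantity the fit minimizes: squared gaps between data and line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
best_ssr = sum_squared_residuals(xs, ys, intercept, slope)

# Any other line should do no better on this metric.
nudged_ssr = sum_squared_residuals(xs, ys, intercept, slope + 0.1)
```

Swapping the squared term in sum_squared_residuals for, say, an absolute value would give a different metric, but then the closed-form formulas in fit_line no longer apply and you'd need a numerical optimizer.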

There's nothing particularly special about R, other than it was developed by a community of statistics-minded people and thus has been optimized for specific kinds of analysis and model-fitting. Theoretically, one could implement any of the algorithms used in R via Excel VBA (I DON'T RECOMMEND IT, THOUGH).

R functions have generally been developed by people who know the underlying theory and algorithmic approaches; it's the kind of stuff I used to code in Fortran back in the day, when I took numerical computing classes in the math department.

You can do it in R, Fortran, Python, whatever. The point is that somebody (or, more specifically, somebodies) has already done the work of coding the standard algorithms for people.

I gave multiple examples of linear regressions. In each one, I had to tell the lm() function which variables I wanted to regress against. I also had to tell it what kind of function I wanted to fit (the first examples were linear, but I also did a few other kinds).

So yes, you'd have to stipulate the form of the model you're trying to fit, what you want to optimize, which data to use to do the fit. The functions/procedures give various statistics back to indicate significance of various variables, amount of correlation, etc.
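As a rough illustration of "you stipulate the form, the procedure hands back statistics," here is a sketch in plain Python. It is not lm(); the function fit_form, its transform argument, and the data are all hypothetical. The only statistic it returns besides the coefficients is R-squared, whereas the real R functions report a good deal more.

```python
import math


def fit_form(xs, ys, transform):
    """Least-squares fit of y = intercept + slope * transform(x).

    The caller chooses the model form by passing `transform`.
    """
    zs = [transform(x) for x in xs]
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    szz = sum((z - mean_z) ** 2 for z in zs)
    szy = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
    slope = szy / szz
    intercept = mean_y - slope * mean_z
    ss_res = sum((y - (intercept + slope * z)) ** 2 for z, y in zip(zs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": 1.0 - ss_res / ss_tot,
    }


# Hypothetical data generated exactly from y = 2 + 3 * log(x).
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [2.0 + 3.0 * math.log(x) for x in xs]

linear_fit = fit_form(xs, ys, lambda x: x)  # form: y ~ x
log_fit = fit_form(xs, ys, math.log)        # form: y ~ log(x)
```

Same data, same fitting routine, two different stipulated forms: the log form recovers the coefficients used to build the data, and the straight-line form fits worse. The procedure doesn't pick the form for you.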

As for the various higher-level predictive analytics suites out there, such as those from reinsurers (as you mention), I believe they've fitted and tested a variety of data sets to see what kinds of structures work best for the kinds of models they're trying to fit (or the kinds of problems they're trying to solve).

That's where cross-validation and other techniques come in: they help confirm that particular model structures work well for the kinds of data you're looking at. As new data comes in, the parameters are updated within that particular structure. I mentioned credibility as something similar.
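Here is a minimal sketch of how cross-validation can compare two model structures, again in plain Python with made-up data. The helper names (fit_line, cv_mse), the choice of five folds, and the way folds are assigned by index are all assumptions for illustration, not what any particular vendor suite does.

```python
def fit_line(zs, ys):
    """Least-squares fit of y = intercept + slope * z."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    szz = sum((z - mean_z) ** 2 for z in zs)
    szy = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
    slope = szy / szz
    return mean_y - slope * mean_z, slope


def cv_mse(xs, ys, transform, k=5):
    """k-fold cross-validated mean squared error for y ~ transform(x).

    Each point is held out once; the model is fit on the other folds
    and scored only on the points it did not see.
    """
    zs = [transform(x) for x in xs]
    total = 0.0
    for fold in range(k):
        train = [(z, y) for i, (z, y) in enumerate(zip(zs, ys)) if i % k != fold]
        held_out = [(z, y) for i, (z, y) in enumerate(zip(zs, ys)) if i % k == fold]
        intercept, slope = fit_line([z for z, _ in train], [y for _, y in train])
        total += sum((y - (intercept + slope * z)) ** 2 for z, y in held_out)
    return total / len(xs)


# Hypothetical data: a quadratic trend plus a small alternating wiggle.
xs = [float(i) for i in range(1, 21)]
ys = [0.5 * x * x + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(xs)]

mse_linear = cv_mse(xs, ys, lambda x: x)         # structure: y ~ x
mse_quadratic = cv_mse(xs, ys, lambda x: x * x)  # structure: y ~ x^2
```

The structure with the lower held-out error is the one the data supports better. Once a structure is chosen, refitting on new data just updates the intercept and slope without changing the form.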

For my demonstrations, I used public data on Kaggle because I wanted to easily share it with attendees without them needing to install any particular software.

If one wants to use proprietary data, then yes, you could add it to publicly available data... but you wouldn't be posting it to Kaggle Kernels. R can be used in a variety of environments, and the work needn't be public the way mine was.

For those who want to see the Kaggle Kernel I ran, it's here:

I plan on adding some more comments and some more code over time.
