Actuarial Outpost
 
  #1  
Old 12-31-2017, 05:46 AM
Aussie101 Aussie101 is offline
SOA
 
Join Date: May 2017
College: ECU
Posts: 6
SOA Predictive Analytics - R over Python

Hi,

I know R is the market leader in stats software, but I'm a bit surprised that there was no flexibility to choose Python for the Predictive Analytics assignment.

I have no experience with Python but have seen the ads from Udemy, and web casts / debates on "R vs Python".

Regards,

Aussie 101
  #2  
Old 12-31-2017, 08:44 AM
JohnLocke JohnLocke is online now
Member
SOA
 
Join Date: Mar 2007
Posts: 16,119

Quote:
Originally Posted by Aussie101 View Post
Hi,

I know R is the market leader in stats software, but I'm a bit surprised that there was no flexibility to choose Python for the Predictive Analytics assignment.

I have no experience with Python but have seen the ads from Udemy, and web casts / debates on "R vs Python".

Regards,

Aussie 101
Coin toss, IMO.
__________________
i always post when i'm in a shitty mood. if i didn't do that, i'd so rarely post. --AO Fan

Lucky for you I was raised by people with a good moral center because if that were not the case, you guys would be in a lot of trouble.
So be very, very glad people like me exist. Your future basically depends on it. --jas66kent

The stock market is going to go up significantly due to Trump Economics --jas66kent
  #3  
Old 12-31-2017, 08:32 PM
764dak 764dak is offline
Member
 
Join Date: Jun 2011
Posts: 871

According to Benjamin Johnson at the 2017 Predictive Analytics Symposium:

Quote:
Why choose R over others? • Free • Install R on any computer • Open source • Share as much as you want • Large community • Easily find support online
Quote:
R is a tool, not the chest • Don’t join the argument of R or Excel, R or Python, R or SAS • This isn’t Team Edward vs Team Jacob

Team R (He’s a “vampie-R”)
  #4  
Old 01-02-2018, 04:53 PM
AMedActuary AMedActuary is offline
Member
SOA
 
Join Date: May 2007
College: UCLA Alumni
Posts: 379

I think for most things it's pretty similar, so it would just depend on your background. I wonder, though: does Python have a 'dplyr' equivalent for data wrangling and a markdown equivalent for reporting? Being able to use dplyr and markdown is very useful. I know that is not what is typically meant by 'predictive analytics', but those working in that area will have to do a lot of data wrangling and make reports.
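On the dplyr question, the usual Python counterpart is pandas with method chaining. Below is a minimal sketch, assuming pandas 0.25 or later is installed; the `claims` table and its column names are made up for illustration and do not come from the SOA assignment. Each step is annotated with the dplyr verb it roughly corresponds to.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration.
claims = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "paid": [100.0, 250.0, 80.0, 0.0, 120.0],
    "exposure": [1.0, 1.0, 0.5, 1.0, 1.0],
})

summary = (
    claims
    .query("paid > 0")                                   # roughly dplyr::filter
    .assign(paid_per_exposure=lambda d: d["paid"] / d["exposure"])  # mutate
    .groupby("region", as_index=False)                   # group_by
    .agg(
        n_claims=("paid", "size"),                       # summarise: row count
        mean_paid=("paid_per_exposure", "mean"),         # summarise: mean
    )
    .sort_values("mean_paid", ascending=False)           # arrange(desc(...))
)

print(summary)
```

The chain reads top to bottom much like a `%>%` pipeline, although pandas has no single package that mirrors the whole tidyverse.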

Some of these debates end up being base R vs. Python, which isn't a fair comparison. R does allow multi-core processing with "Microsoft R". Some do complain about the syntax, but I think as long as you're using tidyverse, the syntax is pretty straightforward.

Also, ggplot2 is very useful for plotting. Is there a Python package that can produce great graphics so quickly? I believe the whole 'tidyverse' paradigm might be a good reason to prefer R but I know not everyone likes it or knows it.

Also, I know on Kaggle, Python seems to be more popular for machine learning applications. It may be because when you're doing a very specialized machine learning application, Python is better but I'm not sure.

Last edited by AMedActuary; 01-02-2018 at 05:16 PM..
  #5  
Old 01-02-2018, 06:31 PM
whoanonstop whoanonstop is offline
Member
Non-Actuary
 
Join Date: Aug 2013
Location: Los Angeles, CA
Studying for Spark / Scala
College: College of William and Mary
Favorite beer: Orange Juice
Posts: 5,717
Blog Entries: 1

Quote:
Originally Posted by AMedActuary View Post
Also, I know on Kaggle, Python seems to be more popular for machine learning applications. It may be because when you're doing a very specialized machine learning application, Python is better but I'm not sure.
It is because many of the packages for machine learning algorithms in R are not coded efficiently. They work fine on smaller data sets but do not scale well. This may have changed as I haven't actively been using R for a while. I've switched almost primarily to Python except for occasional ad-hoc visualizations with ggplot.

If you don't have large amounts of data and you're not looking to do production quality code, there isn't much difference besides preferences.

-Riley
__________________
It is impossible to have a professional forum where the majority of your professionals are anonymous.

Map of Actuarial Hiring Companies
  #6  
Old 01-02-2018, 06:39 PM
soyleche soyleche is offline
Member
SOA AAA
 
Join Date: Apr 2005
Posts: 16,803

Quote:
Originally Posted by AMedActuary View Post
Also, ggplot2 is very useful for plotting. Is there a Python package that can produce great graphics so quickly? I believe the whole 'tidyverse' paradigm might be a good reason to prefer R but I know not everyone likes it or knows it.
There is an implementation of ggplot for python: http://ggplot.yhathq.com/

I'm not sure how complete it is.
__________________
I'll never again say that I could never enjoy Bieber sung by a bunch of Mormons - Ben Folds
  #7  
Old 01-04-2018, 01:52 PM
Olrich Olrich is offline
Member
 
Join Date: Jan 2008
Posts: 155

Quote:
Originally Posted by soyleche View Post
There is an implementation of ggplot for python: http://ggplot.yhathq.com/

I'm not sure how complete it is.
There is a newer package called `plotnine` that I think is a more complete/faithful version of ggplot in python.

http://pltn.ca/plotnine-superior-python-ggplot/

http://plotnine.readthedocs.io/en/stable/
  #8  
Old 01-04-2018, 03:01 PM
kevinykuo kevinykuo is offline
CAS
 
Join Date: Nov 2017
Posts: 13

Quote:
Originally Posted by whoanonstop View Post
It is because many of the packages for machine learning algorithms in R are not coded efficiently. They work fine on smaller data sets but do not scale well. This may have changed as I haven't actively been using R for a while. I've switched almost primarily to Python except for occasional ad-hoc visualizations with ggplot.

If you don't have large amounts of data and you're not looking to do production quality code, there isn't much difference besides preferences.

-Riley
R is fine for big data and machine learning at scale because R is an interface language -- you're writing R but really calling implementations in C++ or Java/Scala. Outside of academic intro to stats classes few people are actually using "pure R" implementations of machine learning algorithms. ML in the real world is mostly H2O, Spark, TensorFlow, etc.

A Spark ML pipeline written in R/sparklyr is exactly the same as the one you'd get if you'd written it in Scala. Similarly for keras or TF models with R and Python. So there really is no performance hit here but one can argue that R has better integration with reporting with RMarkdown/shiny. Tidyverse is more opinionated, but you can use as little or as much of it as you want as an R user.
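The "interface language" idea applies to Python just as much as to R. Here is a minimal sketch using only the standard library, assuming the CPython interpreter, where the built-in `sum` is implemented in C. The function names are made up for illustration. Both functions return the same answer; only one of them runs its loop in the interpreter.

```python
import timeit


def total_pure_python(values):
    """Sum the values with an explicit loop that runs in the interpreter."""
    total = 0
    for v in values:
        total += v
    return total


def total_compiled(values):
    """Delegate to the built-in sum, which CPython implements in C."""
    return sum(values)


values = list(range(100_000))

# Same result either way; the difference is where the loop executes.
assert total_pure_python(values) == total_compiled(values)

loop_seconds = timeit.timeit(lambda: total_pure_python(values), number=20)
builtin_seconds = timeit.timeit(lambda: total_compiled(values), number=20)
print(f"interpreted loop: {loop_seconds:.4f}s, compiled builtin: {builtin_seconds:.4f}s")
```

Timings will vary by machine, so none are quoted here. The point is only that the high-level language is a thin layer over the compiled routine doing the work.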
  #9  
Old 01-05-2018, 02:42 AM
whoanonstop whoanonstop is offline
Member
Non-Actuary
 
Join Date: Aug 2013
Location: Los Angeles, CA
Studying for Spark / Scala
College: College of William and Mary
Favorite beer: Orange Juice
Posts: 5,717
Blog Entries: 1

Quote:
Originally Posted by kevinykuo View Post
R is fine for big data and machine learning at scale because R is an interface language -- you're writing R but really calling implementations in C++ or Java/Scala.
You're correct. However, that doesn't mean that the R code is written efficiently to interface with the C++. In fact, in writing a package, it is not a prerequisite to understand the low level language and therefore any efficient execution is lost.
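To make the glue-code point concrete, here is a rough Python analogy, not R and not taken from any package mentioned in this thread. It assumes Python 3.8 or later, where `math.hypot` accepts any number of arguments. Both hypothetical wrappers compute a Euclidean norm with compiled routines, but one crosses into compiled code once per element while the other hands over the whole vector in a single call.

```python
import math


def norm_chatty(values):
    """Call into compiled code once per element; the loop stays interpreted."""
    total = 0.0
    for v in values:
        total += math.pow(v, 2)  # one boundary crossing per element
    return math.sqrt(total)


def norm_single_call(values):
    """Hand the whole vector to one compiled routine (Python 3.8+)."""
    return math.hypot(*values)  # the loop runs inside the C implementation


values = [float(i) for i in range(1, 1001)]

# Both wrappers agree numerically; they differ only in how often
# control passes between the interpreter and compiled code.
assert math.isclose(norm_chatty(values), norm_single_call(values))
```

A package author who only knows the high-level language can easily end up writing the first form without noticing the overhead.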

Quote:
Originally Posted by kevinykuo View Post
Outside of academic intro to stats classes few people are actually using "pure R" implementations of machine learning algorithms.
I agree that they shouldn't be, but it would be naive to suggest that there aren't many people trying to run machine learning algorithms in R. In fact, I'd be willing to bet that a good chunk of people are doing this and in some scenarios, that is probably okay.

I'm suggesting that in cases where clusters aren't being used and the algorithms are being carried out locally, the Python equivalents will outperform R in almost all cases. I can think of a few scenarios a couple of years back where R was tanking and switching over to Python was beneficial. At the time, I was fighting a move away from R because I had been using it for quite some time.

Quote:
Originally Posted by kevinykuo View Post
ML in the real world is mostly H2O, Spark, TensorFlow, etc.
Spark, sure. TensorFlow, ok, although there is an entire rainbow of acceptable packages / modules available for deep learning. I've never used H2O, but just referencing this: https://www.h2o.ai/insurance/ and https://www.h2o.ai/banking/ makes me feel like something is being sold to people with little understanding. I'm going to try to make a note and watch some of the videos of people using H2O, but I swear if it is anything like SAS Enterprise Miner, I'm going to stab someone.

Quote:
Originally Posted by kevinykuo View Post
A Spark ML pipeline written in R/sparklyr is exactly the same as the one you'd get if you'd written it in Scala.
Yeah. They should all be built on Spark SQL and use the same optimizer.

Quote:
Originally Posted by kevinykuo View Post
one can argue that R has better integration with reporting with RMarkdown/shiny.
I agree that RMarkdown, shiny, and ggplot are some strengths of using R, but I don't think Spark is needed to run these efficiently.

For R, Scala, and Python, each language has its strengths and weaknesses, so it really comes down to what you're doing the most. If one is deploying production level models that run on a day to day basis and are customer facing, I wouldn't choose R to do it. If the project is more ad-hoc, where the focus is on model analysis / reporting and the models don't run live, then R is likely a fine choice.

My impression is that "real" ML technology has not been adopted by the majority of insurance companies and in those scenarios, you can't advocate for base R over base Python with respect to performance across all Machine Learning algorithms.

-Riley

Tags
python, r studio, soa
