Actuarial Outpost
 
  #11  
10-10-2017, 09:01 AM
venisonKurry
Member
Join Date: Apr 2010
Posts: 323

Give sparklyr a shot. It's an R interface to Apache Spark:

https://github.com/rstudio/sparklyr
  #12  
10-13-2017, 06:03 PM
DiscreteAndDiscreet
Member
AAA
Join Date: May 2016
Posts: 332

I wonder if Azure throttles transfers for cheap/demo accounts but exempts published tutorial datasets from throttling.

That aside, it may be worth looking at some optimizations. Implementing a task queue that's serviced by multiple threads is easier than you might think. It improves throughput even when the CPU isn't the bottleneck, because disk and memory no longer sit idle while a single handler thread is tied up in a CPU-intensive section.
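
A minimal sketch of that kind of task queue in Python, using only the standard library. Nothing here comes from the thread itself: `run_task_queue`, `handler`, and `num_workers` are made-up names for illustration, and a real handler would be whatever per-file or per-chunk work you need done.

```python
import queue
import threading


def run_task_queue(tasks, handler, num_workers=4):
    """Run handler(task) for every task on a pool of worker threads.

    Results come back in the same order as the input tasks.
    (Hypothetical helper for illustration only.)
    """
    q = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work for this worker
                q.task_done()
                break
            index, task = item
            try:
                value = handler(task)
            except Exception as exc:  # keep the worker alive on a bad task
                value = exc
            with lock:
                results[index] = value
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Enqueue the real work, then one sentinel per worker.
    for item in enumerate(tasks):
        q.put(item)
    for _ in threads:
        q.put(None)

    q.join()  # block until every queued item has been processed
    for t in threads:
        t.join()
    return [results[i] for i in range(len(results))]


if __name__ == "__main__":
    print(run_task_queue([1, 2, 3], lambda x: x * x, num_workers=2))
```

One caveat: in CPython the global interpreter lock keeps pure-Python CPU work from running in parallel, so threads like these pay off mainly when handlers block on disk or network, or call into libraries that release the lock.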
  #13  
10-13-2017, 10:15 PM
BG5150
Member
Non-Actuary
Join Date: Jan 2009
Favorite beer: the one you're buying me
Posts: 19,183

Hire some Indians to do it.
  #14  
10-14-2017, 08:21 AM
Colonel Smoothie
Member
CAS
Join Date: Sep 2010
College: Jamba Juice University
Favorite beer: AO Amber Ale
Posts: 44,693

Quote:
Originally Posted by DiscreteAndDiscreet View Post
I wonder if Azure throttles transfers for cheap/demo accounts but exempts published tutorial datasets from throttling.

That aside, it may be worth looking at some optimizations. Implementing a task queue that's serviced by multiple threads is easier than you might think. It improves throughput even when the CPU isn't the bottleneck, because disk and memory no longer sit idle while a single handler thread is tied up in a CPU-intensive section.

I was able to download at gigabit speed on Azure. I could only load about 4 torrents at a time, though, and downloads would sometimes just stop. Not sure whether that was because of Azure or the seeding side.

Decompressing the data took three days. I assigned multiple threads.
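
For anyone trying the same thing, here is a rough Python sketch of decompressing several archives at once with a thread pool. The thread doesn't say what format the data was in, so gzip is purely an assumption, and `decompress_one` and `decompress_all` are hypothetical names. It only helps when the data is split across many files; a single large archive can't be divided up this way.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def decompress_one(src):
    """Decompress one .gz file next to itself and return the output path."""
    src = Path(src)
    dst = src.with_suffix("")  # "data.csv.gz" -> "data.csv"
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        # Stream in 1 MiB chunks so large files never sit fully in memory.
        shutil.copyfileobj(fin, fout, length=1024 * 1024)
    return dst


def decompress_all(paths, max_workers=4):
    """Decompress many .gz files concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(decompress_one, paths))
```

Whether this speeds things up depends on where the bottleneck is: if the disk is already saturated, more threads won't help much.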

SQL ended up taking way too long. I'm now using Spark on HDI (Azure HDInsight) and it's going really well.
All times are GMT -4.

