Actuarial Outpost
 
  #381  
Old 04-30-2019, 02:59 PM
Meshuga Meshuga is offline
Member
Non-Actuary
 
Join Date: Dec 2001
Posts: 12,776
Default

some good ones here:

Best Chocolate Chip Cookie Bake Off
__________________
I know I don't talk in my sleep. Someone at work would have told me by now.
  #382  
Old 05-15-2019, 10:51 AM
Sredni Vashtar Sredni Vashtar is offline
Member
 
Join Date: Mar 2010
Favorite beer: pilseners
Posts: 8,499
Blog Entries: 1
Default

Just grabbed from a tweet in Bernie's Marxism thread.
Not sure I actually like the details that much, but the idea is clever.

__________________
L'humour est la politesse du désespoir
  #383  
Old 06-10-2019, 05:06 PM
Olrich Olrich is offline
Member
 
Join Date: Jan 2008
Posts: 162
Default

Not exactly data visualization, but I really liked this interview with Hadley Wickham, who created (among other things) a bunch of the R visualization stuff.

https://www.propublica.org/nerds/had...ta-journalists
  #384  
Old 06-10-2019, 06:00 PM
campbell campbell is offline
Mary Pat Campbell
SOA AAA
 
Join Date: Nov 2003
Location: NY
Studying for duolingo and coursera
Favorite beer: Murphy's Irish Stout
Posts: 89,529
Blog Entries: 6
Default

Quote:
Originally Posted by Olrich View Post
Not exactly data visualization, but I really liked this interview with Hadley Wickham, who created (among other things) a bunch of the R visualization stuff.

https://www.propublica.org/nerds/had...ta-journalists
I'm going to do a text dump (and then explain)

Quote:
"Your Default Position Should Be Skepticism" and Other Advice for Data Journalists From Hadley Wickham
The chief scientist at RStudio and developer of open source tools for data scientists on bribes, bears and where your next story is hiding.

Spoiler:
So you want to explore the world through data. But how do you actually *do* it?

Hadley Wickham is a leading developer of open source tools for data science and works as the chief scientist at RStudio. We talked with him about interrogating data, what stories might be hiding in the gaps and how bears can really mess things up. What follows is a transcript of our talk, edited for clarity and length.

ProPublica: You've talked about the way data visualization can help the process of exploratory data analysis. How would you say this applies to data journalism?

Wickham: I'm not sure whether I should have the answers or you should have the answers! I think the question is: How much of data journalism is reporting the data that you have versus finding the data that you don't have ... but you should have ... or want to have ... that would tell the really interesting story.


Hadley Wickham (Courtesy of Hadley Wickham)
I help teach a data science class at Stanford, and I was just looking through this dataset on emergency room visits in the United States. There is a sample of every emergency visit from like 2013 to 2017 ... and then there's this really short narrative, a one-sentence description of what caused the accident.

I think that's a fascinating dataset because there are so many stories in it. I look at the dataset every year, and each time I try and pull out a little different story. This year, I decided to look at knife-related injuries, and there are massive spikes on Memorial Day, Fourth of July, Thanksgiving, Christmas Day and New Year's.

As a generalist you want to turn that into a story, and there are so many questions you can ask. That kind of exploration is really a warmup. If you're more of an investigative data journalist, you're also looking for the data that isn't there. You've got to force yourself to think, well, what should I be seeing that I'm not?

ProPublica: What's a tip for someone who thinks that they have found something that isn't there. What's the next step that you take when you have that intuition?

Wickham: This is one of the things I learned from going to NICAR, which is completely unnatural to me, and that's picking up a phone and talking to someone. Which I would never do. There is no situation in my life in which I would ever do that unless it's life-threatening emergency.

But, I think that's when you need to just start talking to people. I remember one little anecdote. I was helping a biology student analyze their field work data, and I was looking at where they collected data over time.

And one year they had no data for a given field. And so I go talk to them. And I was like: "Well, why is that? This is really weird."

And they're like, well, there was a bear in the field that year. And so we couldn't collect any data.

But kind of an interesting story, right?
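
The bear anecdote comes down to checking coverage: list every year/field combination you would expect and see which ones never show up. A minimal Python sketch of that check follows. It is not from the interview; the records, the field names and the `missing_combinations` helper are all made up for illustration.

```python
# Hypothetical field-work records: (year, field, measurement).
records = [
    (2015, "north", 4.1), (2015, "south", 3.8),
    (2016, "north", 4.4),                      # no "south" row in 2016
    (2017, "north", 4.0), (2017, "south", 3.9),
]

def missing_combinations(rows):
    """Return the sorted (year, field) pairs that never appear in rows."""
    years = {year for year, _, _ in rows}
    fields = {field for _, field, _ in rows}
    seen = {(year, field) for year, field, _ in rows}
    # Every year x field pair is expected; report the ones not seen.
    return sorted((y, f) for y in years for f in fields if (y, f) not in seen)

gaps = missing_combinations(records)
print(gaps)  # [(2016, 'south')]
```

The code only tells you where the hole is. Finding out it was a bear still takes a conversation.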

ProPublica: What advice would you have for editors who are managing or collaborating with highly technical people in a journalism environment but who may not share the same skill set? How can they be effective?

Wickham: Learn a little bit of R and basic data analysis skills. You don't have to be an expert; you don't have to work with particularly large datasets. It's a matter of finding something in your own life that's interesting that you want to dig into.

One [recent example]: I noticed on the account from my yoga class, there was a page that has every single yoga class that I had ever taken.

And so I thought it would be kind of fun to take a look at that. See how things change over time. Everyone has little things like that. You've got a Google Sheet of information about your neighbors, or your baby, or your cat, or whatever. Just find something in life where you have data that you're interested in. Just so you've got that little bit of visceral experience of working with data.

The other challenge is: When you're really good at something, you make it look easy. And then people who don't know so much are like: "Wow, that looks really easy. It must have taken you 30 minutes to scrape those 15,000 Excel spreadsheets of varying different formats."

It sounds a little weird, but it's like juggling. If you're really, really, really good at juggling, you just make it look easy, and people are like: "Oh well. That's easy. I can juggle eight balls at a time." And so jugglers deliberately build mistakes into their acts. I'm not saying that's a good idea for data science, but you've taken this very hard problem, broken it down into several pieces, made the whole thing look easy. How do you also convey that this is something you had to spend a huge amount of time on? It looks easy now, because I've spent so much time on it, not because it was a simple problem.

Data cleaning is hard because it always takes longer than you expect. And it's really, really difficult to predict in advance where the problems are going to lie. At the same time, that's where you get the value and can do stuff that no one has done before. The easy, clean dataset has already been analyzed to death. If you want something that's unique and really interesting, you've got to dig for it.

ProPublica: During that data cleaning process, is that where the journalist comes out? When you're cleaning up the data but you're also getting to know it better and you're figuring out the questions and the gaps?

Wickham: Yeah, absolutely. That's one of the things that really irritates me. I think it's easy to go from "data cleaning" to "Well, you've got a data cleaning problem, you should hire a data janitor to take care of it." And it's not this "janitorial" thing. Actually cleaning your data is when you're getting to know it intimately. That's not something you can hand off to someone else. It's an absolutely critical part of the data science process.

ProPublica: The perennial question. What makes R an effective environment for data analysis and visualization? What does it offer over other tool sets and platforms?

Wickham: I think you have basically four options. You've got R and Python. You've got JavaScript, or you've got something point and click, which obviously encompasses a very, very large number of tools.

The first question you should ask yourself is: Do I want to use something point and clicky, or do I want to use a programming language? It basically comes down to how much time do you spend? Like, if you're doing data analysis every day, the time it takes to learn a programming language pays off pretty quickly because you can automate more and more of what you do.

And so then, if you decided you wanted to use a programming language, you've got the choice of doing R or Python or JavaScript. If you want to create really amazing visualizations, I think JavaScript is a place to do it, but I can't imagine doing data cleaning in JavaScript.

So, I think the main competitors are R and Python for all data science work. Obviously, I am tremendously biased because I really love R. Python is awesome, too. But I think the reason that you can start with R is because in R you can learn how to do data science and then you can learn how to program, whereas in Python you've got to learn programming and data science simultaneously.

R is kind of a bit of a weird creature as a programming language, but one of the advantages is that you can get some basic templates that you copy and paste. You don't have to learn what a function is, exactly. You don't have to learn any programming language jargon. You can just kind of dive in. Whereas with Python you're gonna learn a little bit more that's just programming.

ProPublica: It's true. I've tried to make some plots in Python and it was not pretty.

Wickham: Every team I talked to, there are people using R, and there are people using Python, and it's really important to help those people work together. It's not a war or a competition. People use different tools for different purposes. I think is very important and one project, to that end, it is this thing called Apache Arrow, which Wes [McKinney] has been working on because of this new organization called Ursa.

Basically, the idea of Apache Arrow is to just to sit down and really think, "What is the best way to store data-science-type data in memory?" Let's figure that out. And then once we've figured it out, let's build a bunch of shared infrastructure. So Python can store the data in the same way. R can store the data in the same way. Java can store the data in the same way. And then you can see, and mostly use, the same data in any programming language. So you're not popping it about all the time.

ProPublica: Do you think journalists risk making erroneous assumptions about the accuracy of data or drawing inappropriate conclusions, such as mistaking correlation for causation?

Wickham: One of the challenges of data is that if you can quantify something precisely, people interpret it as being more "truthy." If you've got five decimal places of accuracy, people are more likely to just kind of "believe it" instead of questioning it. A lot of people forget that pretty much every dataset is collected by a person, or there are many people involved. And if you ignore that, your conclusions are going to possibly be fantastically wrong.

I was judging a data science poster competition, and one of the posters was about food safety and food inspection reports. And I ... and this probably says something profound about me ... but I immediately think: "Are there inspectors who are taking bribes, and if there were, how would you spot that from the data?"

You shouldn't trust the data until you've proven that it is trustworthy. Until you've got another independent way of backing it up, or you've asked the same question three different ways and you get the same answer three different times. Then you should feel like the data is trustworthy. But until you've understood the process by which the data has been collected and gathered ... I think you should be very skeptical. Your default position should be skepticism.

ProPublica: That's a good fit for us.
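
A rough Python sketch of the "ask the same question three different ways" advice above. This is not from the interview: the inspection records and the function names are hypothetical, and it only checks that three computations on one dataset agree. It says nothing about how the data was collected, which is the part Wickham says to stay skeptical about.

```python
from collections import Counter

# Hypothetical inspection records: (inspector, restaurant, passed).
inspections = [
    ("A", "r1", True), ("A", "r2", False), ("B", "r3", True),
    ("B", "r4", True), ("C", "r5", False), ("C", "r6", False),
]

def failures_direct(rows):
    """Way 1: count the failed inspections directly."""
    return sum(1 for _, _, passed in rows if not passed)

def failures_by_inspector(rows):
    """Way 2: count failures per inspector, then add up the groups."""
    per_inspector = Counter(insp for insp, _, passed in rows if not passed)
    return sum(per_inspector.values())

def failures_by_complement(rows):
    """Way 3: total inspections minus the ones that passed."""
    return len(rows) - sum(1 for _, _, passed in rows if passed)

answers = {
    failures_direct(inspections),
    failures_by_inspector(inspections),
    failures_by_complement(inspections),
}
# One distinct answer means the three routes agree.
consistent = len(answers) == 1
print(answers, consistent)  # {3} True
```

If the three numbers ever disagree, that is the cue to go back to the cleaning step before reporting anything.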
I've been on the AO for more than 15 years, and I find value in some of the stuff from the early years of the AO... if I could actually read the original content. The Wayback Machine doesn't capture everything (I learned this very painfully as my own old blog posts disappeared), so I like copying over the text so that if this is pertinent 10+ years from now, I can still point to it.
__________________
It's STUMP

LinkedIn Profile
  #385  
Old 06-13-2019, 07:18 AM
Steve Grondin Steve Grondin is offline
Member
SOA AAA
 
Join Date: Nov 2001
Posts: 6,785
Default

Quote:
Originally Posted by campbell View Post
You aren't kidding
Mexico in South America.
Greenland in North America.
  #386  
Old 06-13-2019, 09:07 AM
campbell campbell is offline
Mary Pat Campbell
SOA AAA
 
Join Date: Nov 2003
Location: NY
Studying for duolingo and coursera
Favorite beer: Murphy's Irish Stout
Posts: 89,529
Blog Entries: 6
Default

Visual Capitalist has some good data sets and awful visualizations.
__________________
It's STUMP

LinkedIn Profile
  #387  
Old 06-16-2019, 08:07 AM
campbell campbell is offline
Mary Pat Campbell
SOA AAA
 
Join Date: Nov 2003
Location: NY
Studying for duolingo and coursera
Favorite beer: Murphy's Irish Stout
Posts: 89,529
Blog Entries: 6
Default

https://medium.economist.com/mistake...w-8cdd8a42d368

The Economist looks at bad graphs in its own publications (and fixes them).

__________________
It's STUMP

LinkedIn Profile

Last edited by campbell; 06-16-2019 at 08:12 AM..
  #388  
Old 06-23-2019, 04:19 PM
campbell campbell is offline
Mary Pat Campbell
SOA AAA
 
Join Date: Nov 2003
Location: NY
Studying for duolingo and coursera
Favorite beer: Murphy's Irish Stout
Posts: 89,529
Blog Entries: 6
Default

this site has so many bad visualizations

but here's my favorite bad visualization from them. Currently.

https://howmuch.net/articles/state-o...overnment-debt

__________________
It's STUMP

LinkedIn Profile
  #389  
Old 06-23-2019, 11:18 PM
vjvj vjvj is offline
Note Contributor
Non-Actuary
 
Join Date: Nov 2005
Location: IL
Studying for MFE
Posts: 7,688
Default

Wow.
__________________
.
PLEASE SUPPORT CHILDREN'S CANCER RESEARCH

TO DONATE, OR FOR MORE INFO, CLICK HERE
  #390  
Old 06-24-2019, 09:16 PM
Knoath Knoath is offline
Member
CAS
 
Join Date: Oct 2015
Posts: 89
Default

I guess that rules out government debt as a possible reason for protest in Hong Kong.
Tags
data science, predictive analytics


All times are GMT -4.

