Actuarial Outpost
 


#381
04-30-2019, 02:59 PM
Meshuga
Member
Non-Actuary
Join Date: Dec 2001
Posts: 12,664

some good ones here:

Best Chocolate Chip Cookie Bake Off
__________________
I know I don't talk in my sleep. Someone at work would have told me by now.

#382
05-15-2019, 10:51 AM
Sredni Vashtar
Member
Join Date: Mar 2010
Favorite beer: pilseners
Posts: 8,192
Blog Entries: 1

Just grabbed from a tweet in Bernie's Marxism thread.
Not sure I actually like the details that much, but the idea is clever.

__________________
L’humour est la politesse du désespoir

#383
06-10-2019, 05:06 PM
Olrich
Member
Join Date: Jan 2008
Posts: 162

Not exactly data visualization, but I really liked this interview with Hadley Wickham, who created (among other things) a bunch of the R visualization stuff.

https://www.propublica.org/nerds/had...ta-journalists

#384
06-10-2019, 06:00 PM
campbell
Mary Pat Campbell
SOA AAA
Join Date: Nov 2003
Location: NY
Studying for duolingo and coursera
Favorite beer: Murphy's Irish Stout
Posts: 88,614
Blog Entries: 6

Quote:
Originally Posted by Olrich
Not exactly data visualization, but I really liked this interview with Hadley Wickham, who created (among other things) a bunch of the R visualization stuff.

https://www.propublica.org/nerds/had...ta-journalists
I'm going to do a text dump (and then explain)

Quote:
“Your Default Position Should Be Skepticism” and Other Advice for Data Journalists From Hadley Wickham
The chief scientist at RStudio and developer of open source tools for data scientists on bribes, bears and where your next story is hiding.

Spoiler:
So you want to explore the world through data. But how do you actually *do* it?

Hadley Wickham is a leading developer of open source tools for data science and works as the chief scientist at RStudio. We talked with him about interrogating data, what stories might be hiding in the gaps and how bears can really mess things up. What follows is a transcript of our talk, edited for clarity and length.

ProPublica: You’ve talked about the way data visualization can help the process of exploratory data analysis. How would you say this applies to data journalism?

Wickham: I’m not sure whether I should have the answers or you should have the answers! I think the question is: How much of data journalism is reporting the data that you have versus finding the data that you don’t have ... but you should have ... or want to have ... that would tell the really interesting story.


I help teach a data science class at Stanford, and I was just looking through this dataset on emergency room visits in the United States. There is a sample of every emergency visit from like 2013 to 2017 ... and then there’s this really short narrative, a one-sentence description of what caused the accident.

I think that’s a fascinating dataset because there are so many stories in it. I look at the dataset every year, and each time I try and pull out a little different story. This year, I decided to look at knife-related injuries, and there are massive spikes on Memorial Day, Fourth of July, Thanksgiving, Christmas Day and New Year’s.

As a generalist you want to turn that into a story, and there are so many questions you can ask. That kind of exploration is really a warmup. If you’re more of an investigative data journalist, you’re also looking for the data that isn’t there. You’ve got to force yourself to think, well, what should I be seeing that I’m not?

ProPublica: What’s a tip for someone who thinks that they have found something that isn’t there? What’s the next step that you take when you have that intuition?

Wickham: This is one of the things I learned from going to NICAR, which is completely unnatural to me, and that’s picking up a phone and talking to someone. Which I would never do. There is no situation in my life in which I would ever do that unless it’s a life-threatening emergency.

But, I think that’s when you need to just start talking to people. I remember one little anecdote. I was helping a biology student analyze their field work data, and I was looking at where they collected data over time.

And one year they had no data for a given field. And so I go talk to them. And I was like: “Well, why is that? This is really weird.”

And they’re like, well, there was a bear in the field that year. And so we couldn’t collect any data.

But kind of an interesting story, right?

ProPublica: What advice would you have for editors who are managing or collaborating with highly technical people in a journalism environment but who may not share the same skill set? How can they be effective?

Wickham: Learn a little bit of R and basic data analysis skills. You don’t have to be an expert; you don’t have to work with particularly large datasets. It’s a matter of finding something in your own life that’s interesting that you want to dig into.

One [recent example]: I noticed on the account from my yoga class, there was a page that has every single yoga class that I had ever taken.

And so I thought it would be kind of fun to take a look at that. See how things change over time. Everyone has little things like that. You’ve got a Google Sheet of information about your neighbors, or your baby, or your cat, or whatever. Just find something in life where you have data that you’re interested in. Just so you’ve got that little bit of visceral experience of working with data.

The other challenge is: When you’re really good at something, you make it look easy. And then people who don’t know so much are like: “Wow, that looks really easy. It must have taken you 30 minutes to scrape those 15,000 Excel spreadsheets of varying different formats.”

It sounds a little weird, but it’s like juggling. If you’re really, really, really good at juggling, you just make it look easy, and people are like: “Oh well. That’s easy. I can juggle eight balls at a time.” And so jugglers deliberately build mistakes into their acts. I’m not saying that’s a good idea for data science, but you’ve taken this very hard problem, broken it down into several pieces, made the whole thing look easy. How do you also convey that this is something you had to spend a huge amount of time on? It looks easy now, because I’ve spent so much time on it, not because it was a simple problem.

Data cleaning is hard because it always takes longer than you expect. And it’s really, really difficult to predict in advance where the problems are going to lie. At the same time, that’s where you get the value and can do stuff that no one has done before. The easy, clean dataset has already been analyzed to death. If you want something that’s unique and really interesting, you’ve got to dig for it.

ProPublica: During that data cleaning process, is that where the journalist comes out? When you’re cleaning up the data but you’re also getting to know it better and you’re figuring out the questions and the gaps?

Wickham: Yeah, absolutely. That’s one of the things that really irritates me. I think it’s easy to go from “data cleaning” to “Well, you’ve got a data cleaning problem, you should hire a data janitor to take care of it.” And it’s not this “janitorial” thing. Actually cleaning your data is when you’re getting to know it intimately. That’s not something you can hand off to someone else. It’s an absolutely critical part of the data science process.

ProPublica: The perennial question. What makes R an effective environment for data analysis and visualization? What does it offer over other tool sets and platforms?

Wickham: I think you have basically four options. You’ve got R and Python. You’ve got JavaScript, or you’ve got something point and click, which obviously encompasses a very, very large number of tools.

The first question you should ask yourself is: Do I want to use something point and clicky, or do I want to use a programming language? It basically comes down to how much time do you spend? Like, if you’re doing data analysis every day, the time it takes to learn a programming language pays off pretty quickly because you can automate more and more of what you do.

And so then, if you decided you wanted to use a programming language, you’ve got the choice of doing R or Python or JavaScript. If you want to create really amazing visualizations, I think JavaScript is a place to do it, but I can’t imagine doing data cleaning in JavaScript.

So, I think the main competitors are R and Python for all data science work. Obviously, I am tremendously biased because I really love R. Python is awesome, too. But I think the reason that you can start with R is because in R you can learn how to do data science and then you can learn how to program, whereas in Python you’ve got to learn programming and data science simultaneously.

R is kind of a bit of a weird creature as a programming language, but one of the advantages is that you can get some basic templates that you copy and paste. You don’t have to learn what a function is, exactly. You don’t have to learn any programming language jargon. You can just kind of dive in. Whereas with Python you’re gonna learn a little bit more that’s just programming.

ProPublica: It’s true. I’ve tried to make some plots in Python and it was not pretty.

Wickham: On every team I’ve talked to, there are people using R, and there are people using Python, and it’s really important to help those people work together. It’s not a war or a competition. People use different tools for different purposes. I think that is very important, and one project to that end is this thing called Apache Arrow, which Wes [McKinney] has been working on through this new organization called Ursa.

Basically, the idea of Apache Arrow is just to sit down and really think, “What is the best way to store data-science-type data in memory?” Let’s figure that out. And then once we’ve figured it out, let’s build a bunch of shared infrastructure. So Python can store the data in the same way. R can store the data in the same way. Java can store the data in the same way. And then you can see, and mostly use, the same data in any programming language. So you’re not popping it about all the time.

ProPublica: Do you think journalists risk making erroneous assumptions about the accuracy of data or drawing inappropriate conclusions, such as mistaking correlation for causation?

Wickham: One of the challenges of data is that if you can quantify something precisely, people interpret it as being more “truthy.” If you’ve got five decimal places of accuracy, people are more likely to just kind of “believe it” instead of questioning it. A lot of people forget that pretty much every dataset is collected by a person, or there are many people involved. And if you ignore that, your conclusions are going to possibly be fantastically wrong.

I was judging a data science poster competition, and one of the posters was about food safety and food inspection reports. And I … and this probably says something profound about me ... but I immediately think: “Are there inspectors who are taking bribes, and if there were, how would you spot that from the data?”

You shouldn’t trust the data until you’ve proven that it is trustworthy. Until you’ve got another independent way of backing it up, or you’ve asked the same question three different ways and you get the same answer three different times. Then you should feel like the data is trustworthy. But until you’ve understood the process by which the data has been collected and gathered ... I think you should be very skeptical. Your default position should be skepticism.

ProPublica: That’s a good fit for us.
I've been on the AO for more than 15 years, and I find value in some of the stuff from the early years of the AO... if I could actually read the original content. The Wayback Machine doesn't capture everything (I learned this very painfully as my own old blog posts disappeared), so I like copying over the text so that if this is pertinent 10+ years from now, I can still point to it.
__________________
It's STUMP

LinkedIn Profile

#385
06-13-2019, 07:18 AM
Steve Grondin
Member
SOA AAA
Join Date: Nov 2001
Posts: 6,684

Quote:
Originally Posted by campbell
You aren't kidding
Mexico in South America.
Greenland in North America

#386
06-13-2019, 09:07 AM
campbell
Mary Pat Campbell
SOA AAA
Join Date: Nov 2003
Location: NY
Studying for duolingo and coursera
Favorite beer: Murphy's Irish Stout
Posts: 88,614
Blog Entries: 6

Visual Capitalist has some good data sets and awful visualizations.
__________________
It's STUMP

LinkedIn Profile

#387
06-16-2019, 08:07 AM
campbell
Mary Pat Campbell
SOA AAA
Join Date: Nov 2003
Location: NY
Studying for duolingo and coursera
Favorite beer: Murphy's Irish Stout
Posts: 88,614
Blog Entries: 6

https://medium.economist.com/mistake...w-8cdd8a42d368

The Economist looks at bad graphs in its own publications (and fixes them)
__________________
It's STUMP

LinkedIn Profile

Last edited by campbell; 06-16-2019 at 08:12 AM..

#388
06-23-2019, 04:19 PM
campbell
Mary Pat Campbell
SOA AAA
Join Date: Nov 2003
Location: NY
Studying for duolingo and coursera
Favorite beer: Murphy's Irish Stout
Posts: 88,614
Blog Entries: 6

this site has so many bad visualizations

but here's my favorite bad visualization from them. Currently.

https://howmuch.net/articles/state-o...overnment-debt

__________________
It's STUMP

LinkedIn Profile

#389
06-23-2019, 11:18 PM
vjvj
Note Contributor
Non-Actuary
Join Date: Nov 2005
Location: IL
Studying for MFE
Posts: 7,625

Wow.
__________________
.
PLEASE SUPPORT CHILDREN'S CANCER RESEARCH

TO DONATE, OR FOR MORE INFO, CLICK HERE

#390
06-24-2019, 09:16 PM
Knoath
Member
CAS
Join Date: Oct 2015
Posts: 82

I guess that rules out government debt as a possible reason for protest in Hong Kong.

Tags
data science, predictive analytics


All times are GMT -4.

