Actuarial Outpost
 
Go Back   Actuarial Outpost > Actuarial Discussion Forum > Software & Technology
FlashChat Actuarial Discussion Preliminary Exams CAS/SOA Exams Cyberchat Around the World Suggestions


Reply
 
Thread Tools Search this Thread Display Modes
  #321  
Old 08-28-2016, 11:22 PM
Marcie's Avatar
Marcie Marcie is offline
Member
CAS
 
Join Date: Feb 2015
Posts: 9,679
Default

Quote:
Originally Posted by campbell View Post
- ... people using Excel for serious purposes ...
Reply With Quote
  #322  
Old 08-29-2016, 01:32 AM
Fish Actuary's Avatar
Fish Actuary Fish Actuary is offline
Member
Non-Actuary
 
Join Date: Jun 2006
Location: Australia
Posts: 12,750
Default

Quote:
Originally Posted by Steve Grondin View Post
My guess is that Excel was used to manipulate and save raw data from other systems without the appropriate discipline to maintain data types. The most frequent issue I see is CUSIPs like nnnnnnEnn turned into scientific notation.
From my <limited> experience working with genomics data and helping others, Excel is mostly used for light editing as most of the analysis is done using Python, R and various unix-based scripts. I'd suspect that most of the Excel based errors are going to show up at the end of the analysis rather than the beginning as if Excel messed things up at the beginning, much of the rest of your analysis would blow-up leading to most reasonably competent researchers going back and figuring out what the problem is.

However, that being said, I work with people who have probably published in the journals that were screened. It might be tempting to give a talk on this paper the next time it's my turn to present at the weekly lab meeting. I may be able to name and shame some perpetrators...
Reply With Quote
  #323  
Old 08-29-2016, 07:14 AM
campbell's Avatar
campbell campbell is offline
Mary Pat Campbell
SOA AAA
 
Join Date: Nov 2003
Location: NY
Studying for duolingo and coursera
Favorite beer: Murphy's Irish Stout
Posts: 89,016
Blog Entries: 6
Default

Washington Post:
https://www.washingtonpost.com/news/...business_pop_b

Quote:
An alarming number of scientific papers contain Excel errors

A surprisingly high number of scientific papers in the field of genetics contain errors introduced by Microsoft Excel, according to an analysis recently published in the journal Genome Biology.

A team of Australian researchers analyzed nearly 3,600 genetics papers published in a number of leading scientific journals — like Nature, Science and PLoS One. As is common practice in the field, these papers all came with supplementary files containing lists of genes used in the research.

The Australian researchers found that roughly 1 in 5 of these papers included errors in their gene lists that were due to Excel automatically converting gene names to things like calendar dates or random numbers.

A team of Australian researchers analyzed nearly 3,600 genetics papers published in a number of leading scientific journals — like Nature, Science and PLoS One. As is common practice in the field, these papers all came with supplementary files containing lists of genes used in the research.

The Australian researchers found that roughly 1 in 5 of these papers included errors in their gene lists that were due to Excel automatically converting gene names to things like calendar dates or random numbers.

[This new model for training scientists could create a conflict of interest]

You see, genes are often referred to in scientific literature by symbols — essentially shortened versions of full gene names. The gene "Septin 2" is typically shortened as SEPT2. "Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase" gets mercifully shortened to MARCH1.

But when you type these shortened gene names into Excel, the program automatically assumes they refer to dates — Sept. 2 and March 1, respectively. If you type SEPT2 into a default Excel cell, it magically becomes "2-Sep." It's stored by the program as the date 9/2/2016.

Even worse, there's no easy way to undo this automatic formatting once it has happened. Edit -> Undo simply deletes everything in the cell. You can try to convert the formatting from "General," the default, to "Text," which you might expect to change it back to the original characters you enter. But instead, changing the formatting to "Text" makes the cell contents appear as 42615 — Excel's internal numeric code referring to the date 9/2/2016.

Even more troubling, the researchers note that there's no way to permanently disable automatic date formatting within Excel. Researchers still have to remember to manually format columns to "Text" before you type anything in new Excel sheets — every. single. time.

....
The Australian researchers note that Excel isn't the only spreadsheet program with overly aggressive autoformatting issues — the same errors crop up in open-source programs like LibreOffice Calc and Apache OpenOffice Calc too.

They do note, however, that one perfectly free spreadsheet program did not have any issues storing the gene names as typed — Google Sheets.

__________________
It's STUMP

LinkedIn Profile
Reply With Quote
  #324  
Old 08-29-2016, 09:57 AM
campbell's Avatar
campbell campbell is offline
Mary Pat Campbell
SOA AAA
 
Join Date: Nov 2003
Location: NY
Studying for duolingo and coursera
Favorite beer: Murphy's Irish Stout
Posts: 89,016
Blog Entries: 6
Default

spreadsheet complexity:
http://www.eusprig.org/treschenhofer-2016.pdf

Quote:
A Selection of Complexity Metrics
160707 Reschenhofer et al. - A Conceptual Model for Measuring the Complexity of Spreadsheets @ Annual EuSpRIG Conference 2016 © sebis 15
• We selected metrics from related literature originating from the domain of
Software Engineering and Linguistics
Software Engineering
• Average/Max AST depth per formula
• Number/Ratio of formula cells (to non-empty cells)
• Number/Ratio of input cells (to non-empty cells)
• Number of distinct formulas
• Average/Max fan-out per formula
• Average/Max fan-in per formula
• Average/Max number of conditionals per formula
• Average/Max spreading factor per formula


Linguistics
• Average/Max number of
functions per formula
• Average/Max number of
distinct functions per
formula
• Average/Max number of
elements per formula


• Each aspect/concept as defined by the conceptual model is captured by at
least one metric


https://arxiv.org/ftp/arxiv/papers/1209/1209.3517.pdf

Quote:
Measuring Spreadsheet Formula Understandability
Felienne Hermans, Martin Pinzger & Arie van Deursen
Mekelweg 4, Delft
{f.f.j.hermans, m.pinzger, arie.vandeursen}@tudelft.nl
ABSTRACT
Spreadsheets are widely used in industry, because they are flexible and easy to use. Sometimes they are even used
for business-critical applications. It is however difficult for spreadsheet users to correctly assess the quality of
spreadsheets, especially with respect to their understandability. Understandability of spreadsheets is important,
since spreadsheets often have a long lifespan, during which they are used by several users. In this paper, we
establish a set of spreadsheet understandability metrics. We start by studying related work and interviewing 40
spreadsheet professionals to obtain a set of characteristics that might contribute to understandability problems in
spreadsheets. Based on those characteristics we subsequently determine a number of understandability metrics. To
evaluate the usefulness of our metrics, we conducted a series of experiments in which professional spreadsheet users
performed a number of small maintenance tasks on a set of spreadsheets from the EUSES spreadsheet corpus. We
subsequently calculate the correlation between the metrics and the performance of subjects on these tasks. The
results clearly indicate that the number of ranges, the nesting depth and the presence of conditional operations in
formulas significantly increase the difficulty of understanding a spreadsheet.
__________________
It's STUMP

LinkedIn Profile
Reply With Quote
  #325  
Old 09-06-2016, 01:54 PM
campbell's Avatar
campbell campbell is offline
Mary Pat Campbell
SOA AAA
 
Join Date: Nov 2003
Location: NY
Studying for duolingo and coursera
Favorite beer: Murphy's Irish Stout
Posts: 89,016
Blog Entries: 6
Default

http://www.reuters.com/article/us-so...-idUSKCN11635K

Quote:
Lazard Ltd (LAZ.N), the investment bank that advised SolarCity Corp (SCTY.O) on its $2.6 billion sale to Tesla Motors Inc (TSLA.O), made an error in its analysis that discounted the value of the U.S. solar energy company by $400 million, a regulatory filing by Tesla showed on Wednesday.
.....
An analysis by Lazard for SolarCity that indicated an equity value of between $14.75 and $34.00 per share was wrong because it double-counted some of the company's projected indebtedness, according to Tesla's filing with the U.S. Securities and Exchange Commission.

This was the result of a computational error "in certain SolarCity spreadsheets setting forth SolarCity’s financial information that Lazard used in its discounted cash flow valuation analyses," according to the filing.

The error was not included in the valuation analysis performed by Tesla and its financial adviser, Evercore Partners Inc (EVR.N), the filing said.

After becoming aware of the mistake on Aug. 18, more than two weeks after the signing of the deal, Lazard realized the accurate valuation range was $18.75 to $37.75 per share.

__________________
It's STUMP

LinkedIn Profile
Reply With Quote
  #326  
Old 09-12-2016, 09:47 AM
Baron von Bootstrap's Avatar
Baron von Bootstrap Baron von Bootstrap is offline
Non-Actuary
 
Join Date: Jan 2016
Posts: 28
Default

From The Economist:
http://www.economist.com/blogs/graph...dsciencepapers

Quote:
THREE years ago Thomas Herndon, a young graduate student from the University of Massachusetts Amherst, made a splash among economists. Given an assignment to replicate the analysis behind a published academic paper, he pored over the data used for an influential study on government debt written by Carmen Reinhart and Kenneth Rogoff, both professors at Harvard. Much to the authors’ embarrassment, Mr Herndon found the most elementary of mistakes: they had accidentally omitted five rows of their spreadsheet when calculating an average. When included, the missing figures weakened the paper’s conclusion substantially.

Unsurprisingly, spreadsheet snafus are not unique to economics. A recent study in the journal Genome Biology looked at papers published between 2005 and 2015, and found spreadsheet-related errors in fully one-fifth of articles on genomics that provided supplementary data alongside their text. Although the papers themselves were not necessarily affected, such bugs can create complications for other scientists trying to replicate or build on previous work.

The syntax of genomics makes it particularly difficult for off-the-shelf software to digest. First, spreadsheets often confuse gene symbols for dates. The authors found that Microsoft Excel would often interpret “SEPT2”, which corresponds to the gene Septin 2, as “September 2nd”. The programme also tended to mistake identification codes like “2310009E13” for numbers in scientific notation—in this particular instance, the code would be read as 2.310009 times 1019. Such hiccups can be quickly resolved by diligent researchers. But they are easy to overlook, and analysts using more sophisticated software may not know to check for them.

The paper’s most worrisome finding is that mistakes seem to be proliferating. The authors find that the number of genomics papers packaged with error-ridden spreadsheets is increasing by 15% a year, far above the 4% annual growth rate in the number of genomics papers published. If we extrapolate current trends indefinitely into the future, then by 2025 every spreadsheet attached to a genetics paper will have an error—unless, of course, there is an error in the spreadsheet we used for this calculation.
Reply With Quote
  #327  
Old 12-01-2016, 04:15 PM
campbell's Avatar
campbell campbell is offline
Mary Pat Campbell
SOA AAA
 
Join Date: Nov 2003
Location: NY
Studying for duolingo and coursera
Favorite beer: Murphy's Irish Stout
Posts: 89,016
Blog Entries: 6
Default

spreadsheet error with election connection:

http://www.wrn.com/2016/11/wisconsin...t-3-5-million/

Quote:
Wisconsin presidential recount will cost $3.9 million

......
EDIT: The WEC announced Tuesday afternoon that a spreadsheet error caused them to incorrectly calculate the total cost of the recount. The revised cost estimate has climbed from $3.5 million to $3.9 million.
__________________
It's STUMP

LinkedIn Profile
Reply With Quote
  #328  
Old 12-01-2016, 05:06 PM
whisper's Avatar
whisper whisper is online now
Member
CAS AAA
 
Join Date: Jan 2002
Location: Chicago
Favorite beer: Hefewizen
Posts: 37,605
Default

Quote:
Originally Posted by campbell View Post
spreadsheet error with election connection:

http://www.wrn.com/2016/11/wisconsin...t-3-5-million/

Gives you confidence on the recount....
Reply With Quote
  #329  
Old 12-01-2016, 05:13 PM
1695814's Avatar
1695814 1695814 is offline
Member
SOA AAA
 
Join Date: Aug 2002
Studying for nothing. I quit.
Favorite beer: Root
Posts: 36,668
Default

Quote:
Originally Posted by whisper View Post

Gives you confidence on the recount....
Hey, as long as they're not doing the recount with a spreadsheet...
Reply With Quote
  #330  
Old 12-01-2016, 05:15 PM
campbell's Avatar
campbell campbell is offline
Mary Pat Campbell
SOA AAA
 
Join Date: Nov 2003
Location: NY
Studying for duolingo and coursera
Favorite beer: Murphy's Irish Stout
Posts: 89,016
Blog Entries: 6
Default

If they were using spreadsheets for election counts (or recounts), I would definitely be questioning the reliability of results.

I assume the spreadsheets are just for cost estimates. Probably missed a row when doing a SUM
__________________
It's STUMP

LinkedIn Profile
Reply With Quote
Reply

Tags
machine learning, predictive analytics, speadsheet error, spreadsheets

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off


All times are GMT -4. The time now is 09:34 PM.


Powered by vBulletin®
Copyright ©2000 - 2019, Jelsoft Enterprises Ltd.
*PLEASE NOTE: Posts are not checked for accuracy, and do not
represent the views of the Actuarial Outpost or its sponsors.
Page generated in 0.25951 seconds with 10 queries