IUBio GIL .. BIOSCI/Bionet News .. Biosequences .. Software .. FTP

data analysis software for classes

Eric Ribbens E-Ribbens at wiu.edu
Wed Oct 4 13:52:01 EST 2000


I should have added that although Excel is good for data management and 
graphing, the statistical components should NOT be trusted. Here is an 
excellent summary of the problems:


		Excel Doesn't Properly Create Statistics

Note from Bill Donahue: I've just confirmed that the article described 
exists: B.D. McCullough and Berry Wilson. 1999. On the accuracy of 
statistical procedures in Microsoft Excel 97. Computational Statistics and 
Data Analysis, 31(1): 27-37)

FYI for Excel users

Don't rely on the statistics add-in pack supplied for Excel by Microsoft

IT GIVES THE WRONG ANSWERS!

The following is extracted from an article by Jon Honeyball in PC Pro, 
Issue 62, December 1999, pp 248-255.

"In front of me right now is a paper entitled "On the accuracy of 
statistical procedures in Microsoft Excel 97", reprinted from the Journal 
of Computational Statistics and Data Analysis, which is a highly 
prestigious, refereed academic journal. I might without exaggeration call 
it 'the bible of computational statistics' and there's arguably no higher 
reference in the world. The article comes from volume 31, issue 1, 28 July 
1999.

The abstract for the paper says: 'The reliability of statistical procedures 
in Excel are assessed in three areas: estimation (both linear and 
nonlinear), random number generation, and statistical distributions (such 
as for calculating p-values). Excel's performance in all three areas is 
found to be inadequate. Persons desiring to conduct statistical analyses of 
data are advised not to use Excel.'

As an opening statement,you must admit that it's a bit of a corker.

The paper's authors, BD McCullough and Berry Wilson of the Federal 
Communications Commission in Washington DC, go on to describe in precise 
detail how they applied the recently released StRD (Statistical Reference 
Datasets) from the American National Institute of Standards and Technology 
to assess the performance of Excel in a wide range of statistical tests. 
The results are stunningly bad, and, worse still, the paper refers back to 
work done by Sawitski in 1994 on Excel 4 and the problems reported then are 
still present in Excel 97. I've run some of the tests myself and they're 
still there in Excel 2000. The paper, which can't really be argued with, is 
littered with phrases like 'can be judged inadequate' and 'it can be 
deduced that Excel uses an unstable algorithm'. The authors find fault with 
its univariate summary statistics, analysis of variance, linear regression, 
nonlinear regression, random number generation and so forth. What can I 
say? If you use the statistics add-on package that ships with Excel, you 
really better know your stuff because Excel may well come up with wrong 
numbers.

Excel's statistics add-on pack is riddled with potential disaster areas, 
and since it has been subjected to the best analysis available in the world 
and found to be wholly lacking, the only applicable words are 'avoid' and 
'plague'. Instead, you should buy yourself a decent stats add-on package 
that has numerical methods that are open to peer review and whose authors 
know what they're doing (unfortunately, Microsoft's stats-pack team 
obviously doesn't)."

It's not just stats. I've been performing matrix operations using excel 
functions, and been getting weird answers. For example, if you use excel 
matrix functions to invert a moderate-sized (e.g. 10 by 10) square 
symmetric matrix, the inverse is often horribly asymmetric. On the other 
hand, it is possible to perform reasonably accurate calculations (I haven't 
tried stats) by writing one's own macros in excel.


---






More information about the Plant-ed mailing list

Send comments to us at archive@iubioarchive.bio.net