I should have added that although Excel is good for data management and
graphing, the statistical components should NOT be trusted. Here is an
excellent summary of the problems:
Excel Doesn't Properly Create Statistics
Note from Bill Donahue: I've just confirmed that the article described
exists: B.D. McCullough and Berry Wilson. 1999. On the accuracy of
statistical procedures in Microsoft Excel 97. Computational Statistics and
Data Analysis, 31(1): 27-37)
FYI for Excel users
Don't rely on the statistics add-in pack supplied for Excel by Microsoft
IT GIVES THE WRONG ANSWERS!
The following is extracted from an article by Jon Honeyball in PC Pro,
Issue 62, December 1999, pp 248-255.
"In front of me right now is a paper entitled "On the accuracy of
statistical procedures in Microsoft Excel 97", reprinted from the Journal
of Computational Statistics and Data Analysis, which is a highly
prestigious, refereed academic journal. I might without exaggeration call
it 'the bible of computational statistics' and there's arguably no higher
reference in the world. The article comes from volume 31, issue 1, 28 July
1999.
The abstract for the paper says: 'The reliability of statistical procedures
in Excel are assessed in three areas: estimation (both linear and
nonlinear), random number generation, and statistical distributions (such
as for calculating p-values). Excel's performance in all three areas is
found to be inadequate. Persons desiring to conduct statistical analyses of
data are advised not to use Excel.'
As an opening statement,you must admit that it's a bit of a corker.
The paper's authors, BD McCullough and Berry Wilson of the Federal
Communications Commission in Washington DC, go on to describe in precise
detail how they applied the recently released StRD (Statistical Reference
Datasets) from the American National Institute of Standards and Technology
to assess the performance of Excel in a wide range of statistical tests.
The results are stunningly bad, and, worse still, the paper refers back to
work done by Sawitski in 1994 on Excel 4 and the problems reported then are
still present in Excel 97. I've run some of the tests myself and they're
still there in Excel 2000. The paper, which can't really be argued with, is
littered with phrases like 'can be judged inadequate' and 'it can be
deduced that Excel uses an unstable algorithm'. The authors find fault with
its univariate summary statistics, analysis of variance, linear regression,
nonlinear regression, random number generation and so forth. What can I
say? If you use the statistics add-on package that ships with Excel, you
really better know your stuff because Excel may well come up with wrong
numbers.
Excel's statistics add-on pack is riddled with potential disaster areas,
and since it has been subjected to the best analysis available in the world
and found to be wholly lacking, the only applicable words are 'avoid' and
'plague'. Instead, you should buy yourself a decent stats add-on package
that has numerical methods that are open to peer review and whose authors
know what they're doing (unfortunately, Microsoft's stats-pack team
obviously doesn't)."
It's not just stats. I've been performing matrix operations using excel
functions, and been getting weird answers. For example, if you use excel
matrix functions to invert a moderate-sized (e.g. 10 by 10) square
symmetric matrix, the inverse is often horribly asymmetric. On the other
hand, it is possible to perform reasonably accurate calculations (I haven't
tried stats) by writing one's own macros in excel.
---