Mailing List Archive

Explicit is better than Implicit
Hmmm... Rename genes, fix Excel, or dump Excel in favor of Python? I know
what my choice would have been. :-)

https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates


Skip
--
https://mail.python.org/mailman/listinfo/python-list
Re: Explicit is better than Implicit
On 07/08/2020 05:33, Skip Montanaro wrote:
> Hmmm... Rename genes, fix Excel, or dump Excel in favor of Python? I know
> what my choice would have been. :-)
>
> https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates


At the risk of screaming off-topic...

The article does point out that MS-Excel is attempting to be helpful in
identifying data, and thus formatting it appropriately. The human-error
is exposed: "opens the same spreadsheet in Excel without thinking,
errors will be introduced". So, should the mistake be laid at the feet
of the tool?
(No matter that I/many of us here will agree with your
preference/predilection!)

The reason that a Python solution would not have this problem is less to
do with Python, or even Gene nomenclature. It is because when we
(professional projects) code a solution, we proceed through
design-stages. We think about the data to be transformed, as well as the
process of transformation itself.

Of course, if we develop-by-prototype: adding a chunk of code 'here' and
another chunk 'there', with no top-down view; the very same sort of
problem could so-easily occur!
- despite and/or because of Python's fast-and-loose dynamic typing, for
example.

I postulate that the issue really stems from MSFT's Training Approach.
They start from the level of 'here is a column of numbers let's total
them', and then run through every command on the menus/ribbon. Their
material rarely talks about 'design' - and few individuals have the
patience/are afforded the budget, for the 'advanced courses' that do! NB
the same applies to MS-Word, etc.

MS-Excel (or better: LibreOffice Calc, etc, from the F/LOSS stable) is a
powerful tool with the additional virtue that it is easy to use. Thus,
people are able to concentrate on the demands of their own speciality,
and use of the tool becomes 'automatic' or 'muscle memory'. A mark of
"success" if ever there was one!

Unfortunately, this forms the mind-set of folk creating a worksheet in
an organic (prototype-as-product/design-less) fashion, and certainly
when picking-up someone else's spreadsheet (per quote, above).

However, the article goes on to describe the tool: "It's a widespread
tool and if you are a bit computationally illiterate you will use it".
Using any tool without over-view thought, particularly when also using
someone else's data, is a bit like the old prank of asking some
'innocent' to "format c:" - and ultimately, as fatal.

If we started an MS-product solution from 'design', then we would
commence with templates and styles - that column of the worksheet would
be formatted as a string, eg "MARCH3", and not left to MS-Excel's
'intelligence'/tender mercies.

So, is it an Excel-problem? Is it a human-laziness problem? Is it plain
ignorance? Is it a training/learning issue?

We expect people driving a car to know how to drive - without expecting
them to be professional drivers (racers or truckies). Why don't we
expect people manipulating statistics and other forms of information to
be appropriately-able?


That they would alter the jargon and thinking of an entire discipline to
suit the sub-standard, overly-bossy, commonly-used tool is surely
'putting the cart before the horse'...

That said, names do matter. How often do you search the web for some
detail of/in Python and find an insinuation of snakes nestled amongst
the results - or someone thinks that it is time for a joke about
swallows or parrots? I don't have time to imagine how the folks who use
C or R manage!


PS programming languages also include 'danger zones'. Early in my career
I found a similar embarrassment of 'infallible belief in the tool', with
the same consequence of research papers containing erroneous
numbers/bases/conclusions being published. A suite of programs declared
storage 'arrays' and populated them with (knowingly) incomplete data
(reasonably complete, not exactly "sparse") - but forgot that this
technology required the data-arrays to be zeroed first! So, random data
from previous use of the same storage area, in random formats, threw all
manner of 'spanners in the works'. When you take such news to your boss
and colleagues, do NOT even try to convince yourself that they will not
"shoot the messenger"!
--
Regards =dn
Re: Explicit is better than Implicit
On 07Aug2020 09:40, DL Neil <PythonList@DancesWithMice.info> wrote:
>On 07/08/2020 05:33, Skip Montanaro wrote:
>>Hmmm... Rename genes, fix Excel, or dump Excel in favor of Python? I know
>>what my choice would have been. :-)
>>
>>https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates
>
>
>At the risk of screaming off-topic...
>
>The article does point-out that MS-Excel is attempting to be helpful
>in identifying data, and thus formatting it appropriately. The
>human-error is exposed: "opens the same spreadsheet in Excel without
>thinking, errors will be introduced". So, should the mistake be laid at
>the feet of the tool?

The tool. Anything that reaches into my data and silently decides to
misinterpret it is busted, particularly if it is silent,
irreversible, and untunable.

When I read a CSV file, quoted strings are just strings. "8.1" is a
string. And so forth.
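
A minimal sketch of what I mean, assuming Python's standard csv module
and a made-up two-row sample (the symbols and scores are illustrative
only, not data from the article). Every field arrives as a str, and any
conversion happens only where the code asks for it:

```python
import csv
import io

# Hypothetical sample: gene symbols of the kind Excel reportedly turns into dates.
SAMPLE = 'symbol,score\n"MARCH1","8.1"\n"SEPT2","311"\n'


def read_rows(text):
    """Return the data rows of a CSV document as lists of strings."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [row for row in reader]


rows = read_rows(SAMPLE)
for symbol, score in rows:
    # No implicit upgrade: score stays a str until float() is called on it.
    print(symbol, type(score).__name__, float(score))
```

The symbol column is never inspected for "date-likeness"; it is simply
text until somebody says otherwise.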

When Excel reads a file, it looks for stuff and decides to upgrade its
type. Eg dates etc (particularly pernicious with US-style dates versus
the rest of the planet). Mojibake for data ensues.
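
To illustrate the date ambiguity, here is a small sketch using only the
standard datetime module. The helper name parse_date and the sample
string are my own inventions for the example. The same text yields two
different dates depending on which format the caller states, and a gene
symbol is rejected rather than quietly converted:

```python
from datetime import date, datetime


def parse_date(text, fmt):
    """Parse text using an explicitly stated format; raise ValueError otherwise."""
    return datetime.strptime(text, fmt).date()


ambiguous = "03/04/2020"
us_style = parse_date(ambiguous, "%m/%d/%Y")        # month first
rest_of_planet = parse_date(ambiguous, "%d/%m/%Y")  # day first
print(us_style.isoformat(), rest_of_planet.isoformat())

try:
    parse_date("MARCH1", "%d/%m/%Y")
except ValueError as exc:
    # The caller is told about the mismatch instead of getting a guessed date.
    print("rejected:", exc)
```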

As always, I am reminded of Heuer's Razor:

If it can't be turned off, it's not a feature. - Karl Heuer

Cheers,
Cameron Simpson <cs@cskk.id.au>
Re: Explicit is better than Implicit
>
> When Excel reads a file, it looks for stuff and decides to upgrade its
> type. Eg dates etc (particularly pernicious with US-style dates versus
> the rest of the planet). Mojibake for data ensues.
>
> As always, I am reminded of Heuer's Razor:
>
> If it can't be turned off, it's not a feature. - Karl Heuer
>

Good one. I always remember the start-up days (late 90s) when I developed
and maintained an online concert calendar (Musi-Cal) written in Python. The
technology got bought by another start-up (Mojam) who used Perl for their
web stuff. Both front end systems talked to my Python-based back end
(communication between both front ends and the one back end was via
XML-RPC). I was sometimes frustrated by the stuff Perl did. The one which
stuck with me all these years was its silent conversion of the band name
"311" <https://en.wikipedia.org/wiki/311_(band)> to the integer 311 on which
my Python backend obligingly barfed. I eventually had to put in data type
checks for all fields in my back end (my front end already had that sort of
input validation) as I could no longer assume a sentient front end was
handing it data.
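
For flavour, a rough sketch of that kind of back-end check. This is not
the original Musi-Cal code; the field names, the SCHEMA mapping, the
validate_record helper, and the sample records are all hypothetical:

```python
# Hypothetical schema: field name -> required Python type.
SCHEMA = {"band": str, "venue": str, "price": float}


def validate_record(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"{field}: missing")
        elif type(record[field]) is not expected:
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


# Invented sample records: the second mimics a front end that turned "311" into 311.
good = {"band": "311", "venue": "Some Venue", "price": 35.0}
bad = {"band": 311, "venue": "Some Venue", "price": 35.0}
print(validate_record(good))
print(validate_record(bad))
```

Rejecting the record outright, rather than coercing 311 back to "311",
keeps the back end from guessing in the other direction.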

Skip
Re: Explicit is better than Implicit
On Thu, Aug 06, 2020 at 07:19:01PM -0500, Skip Montanaro wrote:
> >
> > When Excel reads a file, it looks for stuff and decides to upgrade its
> > type. Eg dates etc (particularly pernicious with US-style dates versus
> > the rest of the planet). Mojibake for data ensues.
> >
> > As always, I am reminded of Heuer's Razor:
> >
> > If it can't be turned off, it's not a feature. - Karl Heuer
> >
>
> Good one. I always remember the start-up days (late 90s) when I developed
> and maintained an online concert calendar (Musi-Cal) written in Python. The
> technology got bought by another start-up (Mojam) who used Perl for their
> web stuff. Both front end systems talked to my Python-based back end
> (communication between both front ends and the one back end was via
> XML-RPC). I was sometimes frustrated by the stuff Perl did. The one which
> stuck with me all these years was its silent conversion of the band name
> "311" <https://en.wikipedia.org/wiki/311_(band)> to the integer 311 on which
> my Python backend obligingly barfed. I eventually had to put in data type
> checks for all fields in my back end (my front end already had that sort of
> input validation) as I could no longer assume a sentient front end was
> handing it data.

311 rocks. That Perl code, OTOH... =8^)

Obviously, it's a bug. One that likely is not that hard to correct...
FWIW this is what I've long hated about Perl: Yes, TIMTOWTDI, but
nearly all of them are subtly (or sometimes not-so-subtly) wrong.
You *can* write good Perl but it encourages the inexperienced to find
one of the wrong ways, by providing the illusion that it is easy to
work with.

Python is *actually* easy to work with... most of the time. "If you
want more things for your buck there's no luck..." =8^)

Re: Explicit is better than Implicit
On Thu, Aug 06, 2020 at 07:46:25PM -0500, Python wrote:
> On Thu, Aug 06, 2020 at 07:19:01PM -0500, Skip Montanaro wrote:
> Python is *actually* easy to work with... most of the time. "If you
> want more things for you buck there's no luck..." =8^)

[And yes, I'm aware the line is "beats" not "things" but since Python
usually doesn't have "beats" (though it can!) I paraphrased a bit...]
