Mailing List Archive

Re: $ModuleList =~ s/String::Scanf/Text::Scanf/g;
> From: Jarkko Hietaniemi <jhi@epsilon.hut.fi>
>
> Subject: $ModuleList =~ s/String::Scanf/Text::Scanf/g;

Um, actually that's a tricky subject.

Text::* arrived first. Then String::* got added, after some debate, for
Tom's String::Edit since the name Text::Edit was misleading.

That leaves us (me) in a tricky position.

What's the difference between Text::* modules and String::* modules?

I'm tempted to apply the general principle that String modules are more
abstract and likely to be used on small fragments of text. That's
why I put Scanf into String not Text.

Comments? (CC'd to p5p since I'll need general acceptance of the policy)

Tim.
Re: $ModuleList =~ s/String::Scanf/Text::Scanf/g;
On Mon, 30 Oct 1995, Tim Bunce wrote:

>
> > From: Jarkko Hietaniemi <jhi@epsilon.hut.fi>
> >
> > Subject: $ModuleList =~ s/String::Scanf/Text::Scanf/g;
>
> Um, actually that's a tricky subject.
>
> Text::* arrived first. Then String::* got added, after some debate, for
> Tom's String::Edit since the name Text::Edit was misleading.
>
> That leaves us (me) in a tricky position.
>
> What's the difference between Text::* modules and String::* modules?
>
> I'm tempted to apply the general principle that String modules are more
> abstract and likely to be used on small fragments of text. That's
> why I put Scanf into String not Text.
>
> Comments? (CC'd to p5p since I'll need general acceptance of the policy)

IMO, String modules handle strings. This can be DNA sequences, Japanese
text, TIFF images, or a dirty limerick. Text modules are based around
common concepts of human readable text, such as soundex, parsing based
on quotes and spaces, or wrapping into a paragraph. "String" is an
implementation style, "Text" is a particular usage of that style.

> Tim.

--
Kenneth Albanowski (kjahds@kjahds.com, CIS: 70705,126)
Re: $ModuleList =~ s/String::Scanf/Text::Scanf/g;
Kenneth Albanowski writes:
> IMO, String modules handle strings. This can be DNA sequences, Japanese
> text, TIFF images, or a dirty limerick. Text modules are based around
> common concepts of human readable text, such as soundex, parsing based
> on quotes and spaces, or wrapping into a paragraph. "String" is an
> implementation style, "Text" is a particular usage of that style.

I've lately released two modules, XXX::Scanf (which implements sscanf())
and YYY::Approx (which implements agrepish approximate matching and
substitution) and am now a little bit confused... XXX was originally
String because Tim suggested so. Then I looked at my .../lib/ and
saw no String/, only Text/. Rename. Then I wrote Approx with YYY
as Text.

- should Scanf be Text:: or String::
(it turns scanf % formats into regexps and does the matching)
- should Approx be Text:: or String::
(it turns approximate strings into regexps and the matching)

IMNSHO, either of them, String:: and Text:: must go and be gone. The
division reminds me quite a lot of the utils/ vs tools/ division often
seen in collections of software. Having them both just confuses.
I would personally prefer Text:: leaving and String:: staying but
I guess we are stuck with Text:: because of the {Abbrev,ParseWords,
Soundex,Tabs}.

> ...String...strings...Japanese text...
> ...Text...human readable text...

Pardon me but this "Japanese vs human" sounds quite racist :-)
or should I say languagecist :-)

> ...String...handle strings...
> ...Text...common concepts of human readable text...

Still does not parse for me. Maybe it is just because me does not
English properly speek...

++jhi;
Re: $ModuleList =~ s/String::Scanf/Text::Scanf/g;
On Tue, 31 Oct 1995, Jarkko Hietaniemi wrote:

> - should Scanf be Text:: or String::
> (it turns scanf % formats into regexps and does the matching)
> - should Approx be Text:: or String::
> (it turns approximate strings into regexps and the matching)

Hmm... Do either use particular characteristics of human readable text,
like using commas or quotes or whitespace to separate items? No? Then
they go into String.

> [...]
>
> Still does not parse for me. Maybe it is just because me does not
> English properly speek...

No, just not clear on this end. A "String" function is anything that deals
with things. A "Text" module (again, IMO) is one that uses particular
characteristics of the way humans use strings to work with readable text.
Hence, English text (or Japanese text for that matter) both comes under
Text, & String. Random binary data would only come under String.

If this seems silly, then maybe you're right, and one or the other should
go.

> ++jhi;

--
Kenneth Albanowski (kjahds@kjahds.com, CIS: 70705,126)
Re: $ModuleList =~ s/String::Scanf/Text::Scanf/g;
Kenneth Albanowski writes:
> On Tue, 31 Oct 1995, Jarkko Hietaniemi wrote:
>
> > - should Scanf be Text:: or String::
> > (it turns scanf % formats into regexps and does the matching)
> > - should Approx be Text:: or String::
> > (it turns approximate strings into regexps and the matching)
>
> Hmm... Do either use particular characteristics of human readable text,
> like using commas or quotes or whitespace to separate items? No? Then
> they go into String.

I agree, Scanf and Approx belong to String:: according to your
classification.

> > Still does not parse for me. Maybe it is just because me does not
> > English properly speek...
>
> No, just not clear on this end. A "String" function is anything that deals
> with things. A "Text" module (again, IMO) is one that uses particular
> characteristics of the way humans use strings to work with readable text.
> Hence, English text (or Japanese text for that matter) both comes under
> Text, & String. Random binary data would only come under String.
>
> If this seems silly, then maybe you're right, and one or the other should
> go.

I almost buy this. Almost, because I do not see why Abbrev is Text::,
then? Abbrev does not care which strings it is doing its abbrevs
from. Hysterical raisins, I guess. Some might also argue on
Soundex.pm, it is just a way to match things, so why it should not be
in String:: alongside Approx::? (ok, then there is the point that
Soundex is very narrow-minded: it just reduces "human-readable" text
in such a way that _English_-speaking people are happy with the
matching results).

++jhi;
Re: $ModuleList =~ s/String::Scanf/Text::Scanf/g;
> From: Kenneth Albanowski <kjahds@kjahds.com>
>
> > Text::* arrived first. Then String::* got added, after some debate, for
> > Tom's String::Edit since the name Text::Edit was misleading.
> >
> > That leaves us (me) in a tricky position.
> > What's the difference between Text::* modules and String::* modules?
> >
> > I'm tempted to apply the general principle that String modules are more
> > abstract and likely to be used on small fragments of text. That's
> > why I put Scanf into String not Text.
>
> IMO, String modules handle strings. This can be DNA sequences, Japanese
> text, TIFF images, or a dirty limerick. Text modules are based around
> common concepts of human readable text, such as soundex, parsing based
> on quotes and spaces, or wrapping into a paragraph. "String" is an
> implementation style, "Text" is a particular usage of that style.

Umm, yes. I'm comfortable with that.

> From: Jarkko Hietaniemi <jhi@epsilon.hut.fi>
>
> I've lately released two modules, XXX::Scanf (which implements sscanf())
> and YYY::Approx (which implements agrepish approximate matching and
> substitution) and am now a little bit confused... XXX was originally
> String because Tim suggested so. Then I looked at my .../lib/ and
> saw no String/, only Text/. Rename. Then I wrote Approx with YYY
> as Text.
>
> - should Scanf be Text:: or String::
> (it turns scanf % formats into regexps and does the matching)
> - should Approx be Text:: or String::
> (it turns approximate strings into regexps and the matching)
>
> IMNSHO, either of them, String:: and Text:: must go and be gone. The
> division reminds me quite a lot of the utils/ vs tools/ division often
> seen in collections of software. Having them both just confuses.
> I would personally prefer Text:: leaving and String:: staying but
> I guess we are stuck with Text:: because of the {Abbrev,ParseWords,
> Soundex,Tabs}.

Well, by Kenneth's definition, I think all those could stay in Text::
without a problem.

> From: Kenneth Albanowski <kjahds@kjahds.com>
>
> > - should Scanf be Text:: or String::
> > (it turns scanf % formats into regexps and does the matching)
> > - should Approx be Text:: or String::
> > (it turns approximate strings into regexps and the matching)
>
> Hmm... Do either use particular characteristics of human readable text,
> like using commas or quotes or whitespace to separate items? No? Then
> they go into String.

I've retitled, reordered and reworked that section of the Module List into:

--
11) String Processing, Language Text Processing, Parsing and Searching

Name          DSLI  Description                                  Info
-----------   ----  -------------------------------------------- -----
String::
::Edit        adpf  Assorted handy string editing functions      TOMC
::Scanf       adpf  Implementation of C sscanf function          JHI
::Approx      afpf  Implements approximate matching for strings  JHI +

Language text related modules

Text::
::Abbrev      Supf  Builds hash of all possible abbreviations
::ParseWords  Supf  Parse strings containing shell-style quoting
::Soundex     Supf  Convert a string to a soundex value
::TeX         cdpO  TeX typesetting language input parser        ILYAZ
::Stem        adpf  Porter algorithm for stemming English words  IANPX
::Tabs        Sdpf  Expand and contract tabs ala expand(1)       MUIR
::Wrap        Rdpf  Wraps lines to make simple paragraphs        MUIR

Search::
::Dict        Supf  Search a dictionary ordered text file

Other Text:: modules (these should be under String:: but pre-date it)

Text::
::Trie        adpf  Find common heads and tails from strings     ILYAZ
::Parser      adpO  String parser using patterns and states      PATM
::Parity      adpf  Byte string parity (odd/even) functions      WINKO
--

Putting String first and adding clarifying headers seems to work.

Can we live with that arrangement?

Of the current Text:: modules I think only ::Trie, ::Parser and ::Parity
would fit better into String. I'll CC this to the authors but I'm not
that worried if they'd rather not change.

Tim.
Re: $ModuleList =~ s/String::Scanf/Text::Scanf/g;
On Wed, 1 Nov 1995, Jarkko Hietaniemi wrote:

> I almost buy this. Almost, because I do not see why Abbrev is Text::,
> then? Abbrev does not care which strings it is doing its abbrevs
> from. Hysterical raisins, I guess.

Considering that "String" only came into existence recently, I don't
think there was any thought about it previously.

> Some might also argue on
> Soundex.pm, it is just a way to match things, so why it should not be
> in String:: alongside Approx::? (ok, then there is the point that
> Soundex is very narrow-minded: it just reduces "human-readable" text
> in such a way that _English_-speaking people are happy with the
> matching results).

Ah, but it's only useful for readable English strings, hence Text, not
String.

> ++jhi;

--
Kenneth Albanowski (kjahds@kjahds.com, CIS: 70705,126)