Mailing List Archive

Request for Comments: Wikitax
I've been workin' on a plaintext markup syntax for the purpose of providing
rich formatting capabilities in article submission to laypeople. A syntax
similar to the variants used in wikis, but much more consistent, much more
coherent, and (I would say) much more intelligently designed. I'll start
implementing the beginnings of the logic and regular expressions for parsing
the syntax very soon. I just want to get some preliminary feedback on what
I've come up with thus far.

There is a lot about Wikitax that isn't set in stone yet, or where I'm
questioning myself, or where I just haven't thought much about how to solve
a particular problem. I could use other minds in helpin' to flesh out the
final stages of Wikitax. Anybody wanna take a look and offer ideas,
suggestions, additions, etc.? I've put quite a lot of thought into the
syntax, and into why things should be the way they are. Hopefully it's
pretty obvious to everybody, but I would really appreciate input and help.
I figure the more minds that attack this problem, the better the solution
will be.

Out of all the wiki syntaxes, I enjoy Wikipedia's syntax the most. I remember
the first time I came across a WikiWikiWeb. The early syntax was just shit.
And most wikis still have really crappy syntax. Even though Wikipedia has
evolved the syntax in a number of nice ways, it still feels lacking in many
areas. It would also be nice to have a specification to standardize plaintext
markups: a syntax so consistent, other projects will pick it up and use it.

I started work on this because I'm doing a lot of development on a second
generation community-developed/organized open publishing content management
system for Independent Media Centers. The particular codebase I'm working
on is going to be a sort of Wikipedia + scoop + multimedia
galleries/archives + weblogs + newswire + Kronolith + etc.--all tightly
rolled into one nice, easy-to-use, and consistent package.

You'll find an attached text/plain, Wikitax.txt. The preliminary
specification is contained therein.

If you're sincere about contributing, see also:
<http://meta.wikipedia.org/wiki/Wikitax>

Peace out, g-money homey lokes,

Derek

Re: Request for Comments: Wikitax [ In reply to ]
It appears Hotmail may have messed up the text file I attached to the
original email. Stupid Hotmail. My server's down, so I'm stuck with
Hotmail temporarily. I can't wait to be back home with pine and vi.

In any case, if Wikitax.txt looks all sorts o' messed up, with lines wrapped
unintelligibly, just head over to:

http://meta.wikipedia.org/wiki/Wikitax

Okay, I'm done now,

Derek

Re: Request for Comments: Wikitax [ In reply to ]
On Sat, Dec 21, 2002 at 05:02:30PM -0600, Derek Moore wrote:
> It appears Hotmail may have messed up the text file I attached to the
> original email. Stupid Hotmail. My server's down, so I'm stuck with
> Hotmail temporarily. I can't wait to be back home with pine and vi.
>
> In any case, if Wikitax.txt looks all sorts o' messed up, with lines
> wrapped unintelligibly, just head over to:
>
> http://meta.wikipedia.org/wiki/Wikitax

Two things:
* The openings and closings of tags should not be the same ('' and ''),
  and they should not prefix-overlap (like '' and ''').
  You're just asking for trouble by introducing more such
  tags.
* Don't do it with regular expressions. Be a real man and write
  an LALR grammar. It will be an order of magnitude faster, much easier
  to change, and correct (the current Wikipedia parser fails in some
  more complex cases) that way.
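
To make the prefix-overlap point concrete, here is a minimal sketch (not taken from any wiki's parser) that counts how many ways a run of apostrophes can be read when both '' and ''' are delimiters. The function name `segmentations` is made up for illustration.

```python
def segmentations(run_length, token_lengths=(2, 3)):
    """Return every way to split a run of identical quote characters
    into delimiter tokens of the given lengths ('' = 2, ''' = 3)."""
    if run_length == 0:
        return [[]]
    results = []
    for length in token_lengths:
        if length <= run_length:
            # Take one delimiter of this length, then split the remainder.
            for rest in segmentations(run_length - length, token_lengths):
                results.append([length] + rest)
    return results


if __name__ == "__main__":
    for n in (2, 3, 5, 6):
        print(n, segmentations(n))
```

A run of five apostrophes already has two readings ('' then ''', or ''' then ''), so the parser has to guess or look at context. Delimiters where no one is a prefix of another avoid that guess.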
Re: Request for Comments: Wikitax [ In reply to ]
>Two things:
>* The openings and closings of tags should not be the same ('' and ''),
>  and they should not prefix-overlap (like '' and ''').
>  You're just asking for trouble by introducing more such
>  tags.

Well, the whole single quotes syntax for emphasis is something I haven't
liked much from the get-go. But are you also saying that '!!...!!' and
'//...//' and others are bad choices? What are some good, simple
alternatives? I tend to think that if the wiki syntax is going to be more
than something simple like '!!' or '==', we might as well forget about a
special wiki syntax, and just have people use the basic HTML markup.

But I see what you mean... Maybe something similar to [!bold!], [/italic/],
[=monospace=], [-strike-], etc., for text formatting and other similar tags?
Or {!bold!}, |!bold!|, (!bold!), `!bold!`, ~!bold!~, etc. Hell, I dunno.
Anyways, that's why I came to this list with these issues. *smile*
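
As a rough sketch of how the bracketed variant could be turned into HTML: the four markers come from the list above, but the HTML tag chosen for each one, and the names `TAGS` and `render`, are assumptions made for illustration and not part of any Wikitax spec.

```python
import re

# Marker character -> HTML element (the mapping is an assumption).
TAGS = {"!": "b", "/": "i", "=": "code", "-": "s"}

# "[" + marker + text + same marker + "]"; \1 forces the closing
# marker to match the opening one.
PATTERN = re.compile(r"\[([!/=-])(.+?)\1\]")


def render(text):
    """Replace [!x!], [/x/], [=x=] and [-x-] with HTML elements."""
    def repl(match):
        tag = TAGS[match.group(1)]
        return f"<{tag}>{match.group(2)}</{tag}>"
    return PATTERN.sub(repl, text)


if __name__ == "__main__":
    print(render("Put [!bold!] and [/italic/] next to /etc/X11/ safely."))
```

Because the opening and closing delimiters differ ("[/" versus "/]"), a bare path like /etc/X11/ is left alone. Nested tags are not handled in this sketch.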

>* Don't do it with regular expressions. Be a real man and write
>  an LALR grammar. It will be an order of magnitude faster, much easier
>  to change, and correct (the current Wikipedia parser fails in some
>  more complex cases) that way.

Okay. *grin* I was just thinkin' regexes, 'cause I'm familiar enough with
'em. And because I'd know how to distinguish between '//' in a URL and '//'
in italicizing words, or '!!' in making stuff bold and '!!' at the end of a
sentence, etc. But I'll definitely check into LALR. I just love learnin'
new crap.
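
For illustration, one way the regex approach might tell the cases apart is with lookaround assertions. This is only a heuristic sketch: the patterns and the `render` helper are assumptions, not the planned Wikitax implementation.

```python
import re

# "//" opens italics only when NOT preceded by ":", "/" or a word
# character, which rules out the "//" in "http://...".
ITALIC = re.compile(r"(?<![:/\w])//(\S(?:.*?\S)?)//(?![/\w])")

# "!!" opens bold only when NOT preceded by "!" or a word character,
# which rules out the "!!" ending a sentence like "Wow!!".
BOLD = re.compile(r"(?<![!\w])!!(\S(?:.*?\S)?)!!(?![!\w])")


def render(text):
    """Apply the italic pass, then the bold pass."""
    text = ITALIC.sub(r"<i>\1</i>", text)
    return BOLD.sub(r"<b>\1</b>", text)


if __name__ == "__main__":
    sample = ("Wow!! See http://meta.wikipedia.org/wiki/Wikitax "
              "for the //full// and !!final!! spec.")
    print(render(sample))
```

Rules like these work on simple input, but every new exception means another lookaround, which is part of the argument for a real grammar.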

Thanks for the input, yo.

Derek

Re: Request for Comments: Wikitax [ In reply to ]
On Sat, Dec 21, 2002 at 04:45:37PM -0600, Derek Moore did utter:
> formatting capabilities in article submission to laypeople. A syntax
> similar to the variants used in wikis, but much more consistent, much
> more coherent, and

I couldn't agree more - I've always felt that wiki markup should
resemble well organised (and maybe slightly quirky) .txt "nomarkup"
formatting.

> I started work on this because I'm doing a lot of development on a second
> generation community-developed/organized open publishing content management
> system for Independent Media Centers. The particular codebase I'm working
> on is going to be a sort of Wikipedia + scoop + multimedia
> galleries/archives + weblogs + newswire + Kronolith + etc.--all tightly
> rolled into one nice, easy-to-use, and consistent package.

I'll be VERY interested to see this... (and here we get to the bulk of
my reply) ...since I've been having vague plans of creating (that would
be frankencreating, to anyone who has seen my software creations)
something along the lines you describe for the next generation of my
own personal website (which is currently a mixture of a home-made CMS
hacked onto a pre-existing bloggish system (SIPS), a gallery, and a UseMod
wiki).

As part of my plans for this, I've been designing my own "nomarkup"
wiki-style markup for its use. Currently I've only really got the basic
wiki markup stuff done, and nothing for more advanced CMS and multimedia
handling. Nevertheless, I think we share similar goals.

So, I invite you to look over my markup ramblings on my own wiki:
http://www.nut.house.cx/cgi-bin/nemwiki.pl?NEWS/TextFormatting

(this is a UseMod wiki btw, so it should be fairly familiar to wikipedia
folks if anyone wants to edit in their own comments. :)

.../Nemo

--
Why, do you think I ought to have /\/ <nemo@nut.house.cx>
one? It seems odd to give a bundle \ ...
of vague sensory perceptions a name / DISCLAIMER: Use of advanced messaging
... \ technology does not imply an endorsement
http://www.nut.house.cx/~nemo /\/ of western industrial civilization
Re: Request for Comments: Wikitax [ In reply to ]
On Sun, Dec 22, 2002 at 12:15:37AM +0100, Tomasz Wegrzanowski wrote:
> Two things:
> * The openings and closings of tags should not be the same ('' and ''),
>   and they should not prefix-overlap (like '' and ''').
>   You're just asking for trouble by introducing more such
>   tags.
> * Don't do it with regular expressions. Be a real man and write
>   an LALR grammar. It will be an order of magnitude faster, much easier
>   to change, and correct (the current Wikipedia parser fails in some
>   more complex cases) that way.

And a third thing:
Could we make it an official rule that no change in markup will
even be considered without considering how it would affect CJK and
right-to-left languages?

And I can tell you how it will affect CJK: it will look much, much worse.
Re: Request for Comments: Wikitax [ In reply to ]
>And a third thing:
>Could we make it an official rule that no change in markup will
>even be considered without considering how it would affect CJK and
>right-to-left languages?

I'd never thought specifically about that, but I suppose you're right. I
don't know jack about CJK or Hebrew or anything other than English (and a
wee bit o' French), so I wouldn't really know how to be sympathetic to
issues raised by those languages (until someone educates me on the topic, at
least).

In any case, I would agree. The markup should be comfortable for all
languages.

Re: Request for Comments: Wikitax [ In reply to ]
>I couldn't agree more - I've always felt that wiki markup should
>resemble well organised (and maybe slightly quirky) .txt "nomarkup"
>formatting.

Exactly. That was the line of thought I used while organizing my ideas. In
emails, I'll often employ forward slashes to /emphasize/ a word or /two/.
Or if I really need to create good emphasis, I'll !yell! or *accent* the
word. In the plaintext world, the three tend to be more equal than <b> and
<i>, though. My original thought was, "Why can't wikis just use these
normal conventions?" After a bit more investigation, I convinced myself
that while it's a good idea to base the syntax on these conventions, it'd be
too confusing for the software if you didn't provide something more. You'd
definitely need something about the syntax that says, "Look at me, I'm a
wiki markup."

For example, say we wanted to use /slashes/ for italics. How would the
following line be rendered:

"In UNIX-style operating systems, services' configuration files are located
in /etc/. The X Window System's configuration files, for example, are in
/etc/X11/."

How exactly is the software gonna know when
/[word-boundary]...[word-boundary]/ is to be italicized text, or a
UNIX-style path, or anything else where forward slashes are normally
employed and aren't meant to italicize? It'd be possible to work up some
logic so the software /could/ distinguish between paths and something that's
to be italicized... But what if you wanted to employ italics like this:

"/anti/dis/establishment/arianistically"

You might mean for it to render:

"<i>anti</i>dis<i>establishment</i>arianistically"

But how does the software know if it's a path or not? Plus, it looks really
confusing to the human eye. At a quick glance, one might easily assume that
'dis' was supposed to be italicized as well. Even the double forward slash
(//) syntax I originally proposed is confusing in this way:

"//anti//dis//establishment//arianistically"

Much better is:

"[/anti/]dis[/establishment/]arianistically"

> > I started work on this because I'm doing a lot of development on a
> > second generation community-developed/organized open publishing content
> > management system for Independent Media Centers.
> > [...]
>
>I'll be VERY interested to see this...

Yeah, me too! *grin* It's still in the early stages. Some of the other
second generation projects are much further along, but they're also much
less ambitious. I'm hopefully gettin' a 5 PM to 5 AM night job where I'll
have a lot of free time to develop. If that's the case, a usable
pre-release should be out sooner rather than later. Right now I'm working on
a lot of the conceptual stuff, the framework, the integration of components,
etc.
You have to create a consistent theory before you can create a consistent
product. <g>

>As part of my plans for this, I've been designing my own "nomarkup"
>wiki-style markup for its use. Currently I've only really got the basic
>wiki markup stuff done, and nothing for more advanced CMS and multimedia
>handling.

I knew I wasn't the only one thinkin' about this stuff. *smile*

>Nevertheless, I think we share similar goals.

Absolutely. That's why I think we should all put our heads together to
create a consistent standard for us all. Something so we won't have to
learn 23 dialects of wiki syntax just to contribute to our favorite
websites.

>So, I invite you to look over my markup ramblings on my own wiki:
>http://www.nut.house.cx/cgi-bin/nemwiki.pl?NEWS/TextFormatting

I'm sure your ideas are invaluable. And I'm sure I'll be incorporating your
best ideas into Wikitax.

Peace outside,

Derek

Re: Request for Comments: Wikitax [ In reply to ]
On Sat, Dec 21, 2002 at 10:42:07PM -0600, Derek Moore wrote:
> >And a third thing:
> >Could we make it an official rule that no change in markup will
> >even be considered without considering how it would affect CJK and
> >right-to-left languages?
>
> I'd never thought specifically about that, but I suppose you're right. I
> don't know jack about CJK or Hebrew or anything other than English (and a
> wee bit o' French), so I wouldn't really know how to be sympathetic to
> issues raised by those languages (until someone educates me on the topic,
> at least).
>
> In any case, I would agree. The markup should be comfortable for all
> languages.

So quick and rather simplified guide to CJK:
* They generally use fixed-width fonts in which each character occupies a
  square. Some simple characters like katakana exist in full-width and
  half-width versions, where the half-width version occupies half a square,
  like Latin fixed-width characters. This sometimes includes
  Arabic numerals and Latin characters, but sometimes not.
* They don't use italics, bold, or other such font modifications.
  Fonts aren't meant to be modified in any way.
* They don't use underline. Using underline for links is not a good idea,
  as it makes some distinct characters look the same.
  I think it would be better to use colors instead.
* They don't use spaces, and their concept of a "word" is different from the
  Western one, so the search engine must be fixed too.
* They don't use Western punctuation for normal text (they may use it for
  numbers and other things), or they use fixed-width versions that look
  completely different.
* They are sometimes written vertically, top to bottom and then right to left,
  and sometimes horizontally as in Latin script.
* For pronunciation they usually use furigana/ruby (small type above the word
  in horizontal text, or to its right in vertical text), not parentheses.
* Western punctuation used for formatting looks really ugly when inserted
  into CJK text.
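
The full-width/half-width distinction in the first point is recorded in the Unicode character database, so it can be inspected directly. A small sketch using Python's standard `unicodedata` module follows; the `display_width` helper is an illustrative approximation (it ignores combining characters), not an exact layout rule.

```python
import unicodedata


def display_width(text):
    """Approximate column count: wide ("W") and full-width ("F")
    characters take two columns, everything else takes one."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


if __name__ == "__main__":
    samples = {
        "katakana A, full-width": "\u30a2",
        "katakana A, half-width": "\uff71",
        "Latin A": "A",
        "Latin A, full-width": "\uff21",
    }
    for label, ch in samples.items():
        print(label, unicodedata.east_asian_width(ch), display_width(ch))
```

ASCII markup characters such as ' or / are one column wide, which is one reason they stand out when dropped into otherwise square-celled CJK text.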

Now, I'd like to have someone who knows Arabic, Hebrew, or some other
right-to-left language describe the problems involving them.

It's not really surprising that we don't have many CJK or right-to-left
contributors, as our interface is very inconvenient for them.
Re: Request for Comments: Wikitax [ In reply to ]
Tomasz Wegrzanowski wrote:
>
> So quick and rather simplified guide to CJK:

What is "CJK" ?
Re: Re: Request for Comments: Wikitax [ In reply to ]
On Sun, Dec 22, 2002 at 03:00:50PM +0100, Giskart wrote:
> Tomasz Wegrzanowski wrote:
> >
> >So quick and rather simplified guide to CJK:
>
> What is "CJK" ?

From The Free On-line Dictionary of Computing (12 Sep 2002) [foldoc]:

CJK

<character> In {internationalisation}, a collective term for
Chinese, Japanese, and Korean.

These languages all share the fact that their writing systems
are based partly on {Han characters} (i.e., "hanzi" or
"{kanji}"), which are complex enough of a system to require
16-bit {character encodings}. CJK character encodings should
consist minimally of {Han characters} plus language-specific
phonetic scripts such as pinyin, bopomofo, hiragana, hangul,
etc.

{CJKV} is CJK plus {Vietnamese}.
Re: Re: Request for Comments: Wikitax [ In reply to ]
On Sunday 22 December 2002 09:00, Giskart wrote:
> Tomasz Wegrzanowski wrote:
> > So quick and rather simplified guide to CJK:
>
> What is "CJK" ?

Chinese Japanese Korean.

phma
Re: Request for Comments: Wikitax [ In reply to ]
On Sunday 22 December 2002 08:39, Tomasz Wegrzanowski wrote:
li'o
> Now, I'd like to have someone who knows Arabic, Hebrew, or some other
> right-to-left language describe the problems involving them.

Hebrew is written right-to-left. Latin is written left-to-right. Both start
at the top of the page. When a Hebrew sentence is quoted in a page written in
the Latin alphabet, and a few phrases of Russian are quoted inside the
Hebrew, and the line breaks in the middle of the Russian, it gets tricky.

Arabic numerals, unlike Arabic letters, are written left-to-right. Hebrew
uses Arabic numerals too (the European version). Hebrew numerals,
which are letters assigned numerical values, are written right-to-left. So
numbers quoted in Hebrew have the same problem as Russian quoted in Hebrew.
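
The direction of each character is a Unicode property, which a renderer has to sort out when scripts are mixed on one line. The sketch below only lists those per-character classes with Python's standard `unicodedata` module; it does not implement the reordering itself, and `direction_runs` is a name made up for the example.

```python
import unicodedata
from itertools import groupby


def direction_runs(text):
    """Group consecutive characters that share a Unicode bidi class,
    e.g. "R" (right-to-left), "L" (left-to-right), "EN" (European number)."""
    return [
        (bidi_class, "".join(chars))
        for bidi_class, chars in groupby(text, key=unicodedata.bidirectional)
    ]


if __name__ == "__main__":
    # A Hebrew word, a number, and a Russian word, written with escapes.
    sample = "\u05e9\u05dc\u05d5\u05dd 42 \u043c\u0438\u0440"
    for bidi_class, run in direction_runs(sample):
        print(bidi_class, repr(run))
```

The Hebrew run is class R while the digits and the Cyrillic run are left-to-right, so a single line can hold three runs whose visual order differs from their order in the file.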

When written in the bare abjad, as it usually is, many Hebrew words (usually,
but not always, different forms of the same word) look the same. Many Hebrew
words have variant spellings. For an example of the latter, look up [[Bible
Code]] and follow the "abulafia" link. There's a picture of the Abulafia
Synagogue with three different spellings of "Abulafia" in one picture!

Yiddish, which is written in the Hebrew alphabet, has its own oddities. Two
yoden can act as one letter and take one vowel, for instance.

phma
Re: Request for Comments: Wikitax [ In reply to ]
On Sat, Dec 21, 2002 at 11:20:17PM -0600, Derek Moore did utter:
>
> For example, say we wanted to use /slashes/ for italics. How would the
> following line be rendered:
>
> "In UNIX-style operating systems, services' configuration files are located
> in /etc/. The X Window System's configuration files, for example, are in
> /etc/X11/."
>
> How exactly is the software gonna know when
> /[word-boundary]...[word-boundary]/ is to be italicized text, or a
> UNIX-style path, or anything else where forward slashes are normally
> employed and aren't meant to italicize? It'd be possible to work up some
> logic so the software /could/ distinguish between paths and something
> that's to be italicized... But what if you wanted to employ italics like
> this:
>
> "/anti/dis/establishment/arianistically"

In my NEWS markup notes, you'll see that while I set the "/" to be the
italic control character, a single "/" is not the delimiter for starting
and stopping italic. Rather, a start of line, whitespace, or punctuation,
followed by "/" and then by a non-punctuation character, will
trigger the start of italic. Reverse for the end-of-italic delimiter. (This
might need some tweaking in use, especially with regard to punctuation.)
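
A rough approximation of that delimiter rule as a regular expression, for illustration only. It treats "not a word character" as standing in for start of line, whitespace, or punctuation, which is an assumption and not the exact NEWS definition; `ITALIC` and `render` are hypothetical names.

```python
import re

# Opening "/": not preceded by a word character, followed by one.
# Closing "/": preceded by a word character, not followed by one.
ITALIC = re.compile(r"(?<!\w)/(?=\w)(.+?)(?<=\w)/(?!\w)")


def render(text):
    """Wrap text between an opening and a closing slash in <i>...</i>."""
    return ITALIC.sub(r"<i>\1</i>", text)


if __name__ == "__main__":
    print(render("This is /really/ important."))
    print(render("/anti/dis/establishment/arianistically"))
    print(render("The files are in /etc/."))
```

Under this approximation the slashes inside a word never close an italic run, while a standalone path such as /etc/ still gets italicised unless it is protected some other way.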

> You might mean for it to render:
>
> "<i>anti</i>dis<i>establishment</i>arianistically"

My system would currently have rendered it as
"/anti/dis/establishment/arianistically" exactly ;)

Sure, this means in-word italicising (and bold, etc.) isn't possible - but
I think given the 10/90 rule (10% of features will satisfy 90% of
needs), this is a reasonable tradeoff. I'd maybe suggest allowing pure
HTML for any remaining cases?

Regarding file paths which could be italicised? I think throwing a
simple no-wiki syntax around it would be the best way to keep a path
clean of wiki markup :)

> Much better is:
>
> "[/anti/]dis[/establishment/]arianistically"

I think you're in danger of reinventing an HTML 1.0-shaped wheel, however,
and losing the simplicity of .txt "nomarkup". I'd rather lose a feature
(or only provide it as HTML - e.g., for italicising parts of words) than
overly complicate the markup. The thought-experiment I use is "could I
send this raw code to a non-geek friend, and be confident that they'd
not find the markup confusing, but rather, find it obvious?"

> pre-release should be out sooner rather than later. Right now I'm working
> on a lot of the conceptual stuff, the framework, the integration of
> components, etc.
> You have to create a consistent theory before you can create a consistent
> product. <g>

Indeed - which is why I've been theorising my NEWS markup already.
You're further along than I am though (and when it comes to product
development of my ideas, I'm astoundingly lax :/ ).

> I'm sure your ideas are invaluable. And I'm sure I'll be incorporating
> your best ideas into Wikitax.

And similarly I'll likely nab Wikitax ideas for NEWS. Ultimately, I hope
we find a single compromise between read/learn/use-ability and
features/complexity needed to power what we want :)

.../Nemo
--
Why, do you think I ought to have /\/ <nemo@nut.house.cx>
one? It seems odd to give a bundle \ ...
of vague sensory perceptions a name / DISCLAIMER: Use of advanced messaging
... \ technology does not imply an endorsement
http://www.nut.house.cx/~nemo /\/ of western industrial civilization