I am using Aycock's package (SPARK) to handle some parsing, but I am having
trouble because the language I am parsing is highly context-sensitive. I don't
have any trouble dealing with the context-sensitivity in the so-called
"context-free grammar" part of the package (the parser), but in the scanner
it is killing me.
Let's pretend I am parsing a tagged (but non-SGML) language where there is
an element "URL". Within "URL" elements, the characters < and > are
illegal: they must be escaped as \< and \>.
Elsewhere they may appear unescaped. Here is the grammar I would *like* to
write (roughly):
    Element ::= <URL> urlcontent </URL>
    urlcontent = ([^<>\/:]* ("\<"|"\>"|":"|"/"|"\\"))*
    Element ::= <NOT-A-URL> anychar* </NOT-A-URL>
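To pin down what the urlcontent rule is meant to accept, here is a rough rendering using Python's standard re module. This is not anything from Aycock's package. It rests on my own reading of the rough rule: plain URL characters, including ':' and '/', are anything except <, > and backslash, and those three may appear only as the escapes \<, \> and \\.

```python
import re

# Assumed reading of the rough rule: plain URL characters (including ':' and '/')
# are anything except '<', '>' and backslash; those three appear only escaped.
URL_CONTENT = re.compile(r'(?:[^<>\\]|\\[<>\\])*')


def is_valid_url_content(text):
    """Return True if all of `text` matches the urlcontent rule."""
    return URL_CONTENT.fullmatch(text) is not None


print(is_valid_url_content(r'http://example.com/a\<b\>'))  # True
print(is_valid_url_content('http://example.com/a<b>'))     # False: bare < and >
```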
Of course, this is made-up syntax, because I don't think you can put
regular expressions in Aycock's BNF. I've used tools that do allow this, so
I'm not sure how to handle it otherwise. This is also a made-up (simplified)
example, so demonstrating how I could do it all in the scanner is probably not
helpful.
I could handle it if I could switch scanners mid-stream (for URL elements),
but Aycock's scanner tokenizes the entire input before the parser even starts!
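Switching scanners mid-stream is what flex calls start conditions. The following is a minimal hand-written sketch of the idea in plain Python. It does not use SPARK's API. The token names TAG, TEXT and URLCHARS and the uppercase-only tag pattern are hypothetical. The scanner keeps a mode variable and applies different lexical rules between <URL> and </URL>:

```python
import re

# Hypothetical token patterns for the made-up tagged language (assumptions).
TAG = re.compile(r'</?[A-Z-]+>')
TEXT = re.compile(r'[^<]+|<')                 # outside URLs a stray '<' is just text
URL_PIECE = re.compile(r'\\[<>\\]|[^<>\\]+')  # inside URLs: an escape or a plain run
URL_END = '</URL>'


def tokenize(source):
    """Yield (kind, value) pairs, switching scanner mode inside <URL>...</URL>."""
    pos, mode = 0, 'normal'
    while pos < len(source):
        if mode == 'url':
            if source.startswith(URL_END, pos):
                # Leaving URL mode: emit the closing tag and resume normal rules.
                yield ('TAG', URL_END)
                pos += len(URL_END)
                mode = 'normal'
                continue
            m = URL_PIECE.match(source, pos)
            if m is None:
                raise SyntaxError(f'unescaped character in URL at offset {pos}')
            yield ('URLCHARS', m.group())
        else:
            m = TAG.match(source, pos)
            if m is not None:
                yield ('TAG', m.group())
                if m.group() == '<URL>':
                    mode = 'url'  # entering URL mode changes the lexical rules
            else:
                # TEXT always matches here: every character is '<' or not '<'.
                m = TEXT.match(source, pos)
                yield ('TEXT', m.group())
        pos = m.end()
```

With this design, a bare < is an error inside a URL but ordinary text everywhere else. Only the scanner knows about the mode, so the grammar can treat URLCHARS as an ordinary terminal.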
Should I scan and parse at a high level first, and then rescan and reparse
the URLs? Is there a package that lets me mix the lexical and
syntactic levels more?
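The two-pass alternative could be sketched as follows, again in plain Python with hypothetical pattern and token names. The first pass returns each URL body as a single opaque URLBODY token, which a high-level grammar could treat as one terminal. After parsing, a second scanner with the URL rules is run over each body:

```python
import re

# First pass (hypothetical patterns): everything between <URL> and </URL> is
# captured as one opaque token, so URL rules never apply at this level.
COARSE = re.compile(
    r'<URL>(?P<urlbody>.*?)</URL>|(?P<tag></?[A-Z-]+>)|(?P<text>[^<]+|<)',
    re.DOTALL,
)
# Second pass: a URL body is a sequence of escapes or plain runs.
URL_PIECE = re.compile(r'\\[<>\\]|[^<>\\]+')


def coarse_scan(source):
    """Return (KIND, value) tokens; URL bodies come back unscanned."""
    # lastgroup names the single alternative that matched.
    return [(m.lastgroup.upper(), m.group(m.lastgroup))
            for m in COARSE.finditer(source)]


def rescan_url(body):
    """Split a URL body into pieces, rejecting unescaped <, > or backslash."""
    pieces, pos = [], 0
    while pos < len(body):
        m = URL_PIECE.match(body, pos)
        if m is None:
            raise SyntaxError(f'unescaped character in URL at offset {pos}')
        pieces.append(m.group())
        pos = m.end()
    return pieces
```

One caveat of this approach is that the coarse pass ends each URL at the first literal </URL>. That is only safe because a legal URL body can never contain that sequence. Here it cannot, because an unescaped > is illegal inside URLs.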
--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco
Diplomatic term: "We had a frank exchange of views."
Translation: Negotiations stopped just short of shouting and
table-banging. (Brill's Content, Apr. 1999)