[ProgSoc] Context-affinitive parsing w/ context-free grammar

John Elliot jj5 at jj5.net
Thu Sep 24 00:28:09 EST 2009


I have a tricky problem that someone might be able to offer some advice 
on. I'm working on a parser using Coco/R [1], and I've read the user 
manual [2]. Say I have an example program in my programming language 
like this [a], where I'm listing major items "foo" and "bar" and their 
sub items "a", "b", "reserved_word", etc.

Using an ATG like this [b], an executable like this [c], and compiling 
like this [d], how do I go about supporting the sub item named 
"reserved_word"?

What I've done in the meantime is rewrite the ATG to support escaping 
the name the C#-esque way, with an "@" prefix as in [e], but I think it 
would be cool if I could support the use of the reserved word itself as 
the name of a sub item. The only way I can think of to do that would be 
to take over the scanner and parse sub items manually when inside a 
"reserved_word" block. That would be tricky, especially since I'd also 
have to support pragmas, comments, etc., and it might be harder than 
necessary. Is there some way to use Coco/R and the ATG to support 
context-affinitive parsing, wherein parsing the body of a 
"reserved_word" would exclude the possibility of encountering another 
"reserved_word", thus enabling the use of the symbol "reserved_word" as 
a sub item name?
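
In ATG terms, the effect I'm after might look something like the sketch 
below, which changes only the SubItem production from [b] so that the 
name may be either an ident or the reserved_word token. This is 
untested: that Coco/R accepts the alternative without LL(1) warnings is 
an assumption on my part, not something I've confirmed against the 
manual.

   SubItem
   =                        (. String name = null; .)
     sub_item               (.
     Console.WriteLine( "Matched sub item." ); .)
     ( ident                // an ordinary name
     | reserved_word        // assumption: keyword token accepted as a name
     )                      (.
     name = t.val; Console.WriteLine( "s:" + t.val ); .)
     .

If something like this works, every other reserved word that should be 
usable as a name would presumably need its own alternative as well.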

Love,
Confused in Sydney.

[a:example.syn]

   reserved_word foo {

     sub_item a,
     sub_item b,
     sub_item reserved_word,

   }

   reserved_word bar {

     sub_item x,
     sub_item y,
     sub_item z

   }

[b:context.atg]

   COMPILER Example

   CHARACTERS
     letter = "abcdefghijklmnopqrstuvwxyz".
     cr  = '\r'.
     lf  = '\n'.
     tab = '\t'.

   TOKENS
     ident  = letter {letter}.

     // punctuation and reserved words:
     lbrace = '{'.
     rbrace = '}'.
     comma = ','.
     reserved_word = "reserved_word".
     sub_item = "sub_item".

   IGNORE cr + lf + tab

   PRODUCTIONS
   ////////////////////////////////////////////////////////////////////
   Example
   =
     { ReservedWord }       // zero or more blocks; [a] contains two
     .
   ////////////////////////////////////////////////////////////////////
   ReservedWord
   =                        (. String name = null; .)
     reserved_word          (.
     Console.WriteLine( "Matched reserved word." ); .)
     ident                  (.
     name = t.val; Console.WriteLine( "r:" + t.val ); .)
     lbrace
     {
       SubItem [ comma ]
     }
     rbrace
     .
   ////////////////////////////////////////////////////////////////////
   SubItem
   =                        (. String name = null; .)
     sub_item               (.
     Console.WriteLine( "Matched sub item." ); .)
     ident                  (.
     name = t.val; Console.WriteLine( "s:" + t.val ); .)
     .

   END Example.

[c:context.cs]

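   using System;

   namespace example {

     internal static class EntryPoint {

       // Parses the file named by the first argument; returns 1 on errors.
       internal static int Main ( String[] args ) {

         if ( args.Length != 1 ) {
           Console.Error.WriteLine( "usage: context.exe <file>" );
           return 1;
         }

         // Scanner and Parser are generated by Coco/R from context.atg.
         Scanner scanner = new Scanner( args[0] );
         Parser parser = new Parser( scanner );
         parser.Parse();

         // errors.count is the number of errors the parser reported.
         return parser.errors.count == 0 ? 0 : 1;

       }
     }
   }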

[d:build.sh]

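   #!/bin/bash
   set -e  # stop at the first failing step
   # Generate Scanner.cs and Parser.cs from the grammar.
   mono coco/coco.exe context.atg -namespace example
   # Compile the driver together with the generated sources.
   gmcs -debug+ -warn:2 context.cs Scanner.cs Parser.cs
   # Run the parser on the example input.
   mono context.exe example.syn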

[e:example.fix]

   reserved_word foo {

     sub_item a,
     sub_item b,
     sub_item @reserved_word,

   }

   reserved_word bar {

     sub_item x,
     sub_item y,
     sub_item z

   }

[1] http://ssw.jku.at/coco/
[2] http://ssw.jku.at/Coco/Doc/UserManual.pdf



