[ProgSoc] Context-affinitive parsing w/ context-free grammar
Roland Turner
raz at raz.cx
Thu Sep 24 17:08:20 EST 2009
John Elliot wrote:
> Roland Turner wrote:
>> Why not simply specify "this is a reserved word; that means it's
>> reserved so you can't use it as an identifier"?
>
> That's what I've done. However, there are two answers to your question.
> The first is that I'm curious as to what might be involved in supporting
> such a scenario (maybe there's an easy way that I'm ignorant of?);
I suspect that the answer to your question is fortran.
Chomksy gives us three interesting groups of languages to think about:
- regular,
- context-free and
- context-sensitive.
Regular languages can be described using regular expressions and
recognised using a finite-state automaton. This provides for blistering
speed (think: one memory-access-cycle per byte of input), fixed response
time and particularly compact description. Note that perl expressions
aren't regular (in particular, they include back-reference) despite
perl's use of the term.
Context-free languages can be described using productions and recognised
using a push-down automaton. The need for back-reference (where is the
'{' that this '}' refers to?) introduces the stack which brings with it
worse performance and more variable performance. It does keep the
description of the parser's rules exceptionally simple. Note that most
programming languages aren't strictly context-free - they include some
sort of identifier management - however this is pretty well trodden
ground. The handling of identifiers is very similar across all widely
used languages, so little per-project complexity is introduced.
Context-sensitive languages essentially require the full capability of a
Turing machine to describe and to recognise. This introduces a great
deal of complexity into the description, into the semantics of the
language - often in non-obvious ways (again, think fortran) - and into
runtime behaviour of the result.
In effect, a context-free grammar for a language with identifiers
actually describes a context-sensitive language, but one that we can
pretend most of the time is context free. The moment you "cross the
streams" and take away the ability of the parser to know which camp a
token belongs to (context-free parser or identifier dictionary), the
entire edifice collapses: you're left with a fortran-style
context-sensitive grammar with magnificent, but counter-productive,
ambiguities in it.
> and
> the second is that the use of reserved words as identifiers is required
> in my language,
That doesn't answer the question. Why is this required in your language
(and in particular, given that almost all widely-used languages live
without it, this seems likely to be a rather spurious "requirement")?
- Raz
More information about the Progsoc
mailing list