YaccConstructor


YARD

YARD is a powerful translation specification language. It supports EBNF, metarules, and conditional generation, as well as L-attributed and S-attributed grammars.

Why YARD?

Comparison

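Feature                            | YARD       | ANTLR                               | YACC                                  | Bison
-----------------------------------+------------+-------------------------------------+---------------------------------------+-----------------------------
Parsing algorithm                  | RNGLR, GLL | LL(*)                               | LALR(1)                               | LALR(1), LR(1), IELR(1), GLR
Lexer                              | +          | +                                   | -                                     | -
License                            | Apache 2.0 | BSD                                 | CPL & CDDL                            | GNU GPL
Output languages                   | F#         | C#, Java, Python (ANTLR 4),         | C + specific language implementations | C, C++, Java, XML
                                   |            | C, C++, C#, Java & 7 more (ANTLR 3) |                                       |
Input grammar notation             | EBNF       | EBNF                                | YACC (BNF)                            | YACC (BNF)
Literals                           | +          | +                                   | +                                     | +
Predicates                         | +          | +                                   | -                                     | -
L-attributes                       | +          | +                                   | -                                     | -
Parametrized rules                 | +          | + (limited) [1]                     | -                                     | -
Rules priority                     | +          | +                                   | +                                     | +
Bindings to synthesized attributes | +          | +                                   | -                                     | -
Grammar modularity support         | +          | +                                   | -                                     | -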

[1] In ANTLR, rule arguments are specified using the syntax of the generated language, whereas YARD rules can also accept regular expressions or other rules as arguments.

Lexing

You can use any lexer, as long as you transform its output into a form suitable for the parser. If you want to keep things simple, use the modified version of FsLex that is bundled with YaccConstructor.
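As an illustration of the first option, here is a minimal sketch of a hand-written lexer in F#. The Token type, the tokenize function, and the token names are assumptions made for this example; the token type your parser actually expects depends on your grammar and generator settings.

```fsharp
open System

// Hypothetical token type; the real one is determined by your grammar.
type Token =
    | NUMBER of int
    | PLUS
    | MULT

// Splits the input on whitespace and maps each lexeme to a token.
let tokenize (input: string) : Token list =
    input.Split([| ' '; '\t'; '\n' |], StringSplitOptions.RemoveEmptyEntries)
    |> Array.toList
    |> List.map (fun lexeme ->
        match lexeme with
        | "+" -> PLUS
        | "*" -> MULT
        | _ ->
            // Anything else must be an integer literal in this toy lexer.
            match Int32.TryParse(lexeme) with
            | true, n -> NUMBER n
            | false, _ -> failwithf "Unexpected lexeme: %s" lexeme)

// Yields [NUMBER 1; PLUS; NUMBER 2; MULT; NUMBER 3].
printfn "%A" (tokenize "1 + 2 * 3")
```

A real lexer would also track positions and report errors more gracefully; the point here is only that the parser needs a stream of tokens, however you produce it.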

Grammar structure

Grammar definition

A grammar definition in its general form looks like this:
  info
  tokens
  options    
  head   
  grammar 
  foot

Where
info: free-form text that is not used anywhere.
tokens: the token type specification.
options: command-line arguments, which can be written here instead of being passed on the command line.
head: F# code that is copied to the beginning of the generated F# file. It is usually used for open declarations.
grammar: the grammar description.
foot: F# code that is copied to the end of the generated F# file.

Grammar description

A grammar description is a list of modules. A module in its general form looks like this:

  [<AllPublic>]
  name
  openings
  rules

Where
AllPublic: this annotation makes all rules of the module visible to modules that use it. By default, all rules are private except those explicitly marked public.
name: the name of the module.
openings: references to (openings of) other grammars from .yrd files.
rules: the rules that make up the module.

Rules

A rule in its general form looks like this:

  [<Start>]
  public modifier
  name
  args
  body

Where
Start: this annotation makes the rule a start rule.
public: this modifier makes the rule visible from other modules.
name: the name of the rule.
args: the inherited arguments of the rule.
body: the body (production) of the rule.

Syntax in examples

Grammar

grammar: ('{'header'}')? rules ('{'footer'}')?

Example of YARD grammar file:

{
let helperFunction x y = x+y
}
s: NUMBER;
{
let helperFunction2 x y = x*y
}

Sequence

s: e PLUS e;
or, in EBNF-like notation:
s = e, PLUS, e;

Alternative

s: DECNUMBER|HEXNUMBER;
s: NUMBER (PLUS|MULT) NUMBER;
s: { None }|n=NUMBER{Some n};
Note that an empty branch is an epsilon, so s can produce either epsilon or NUMBER. The empty branch must have action code, because it is needed for type checking.
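To see why both branches need action code, consider how the result might be consumed. The sketch below assumes that NUMBER carries an int, so that s yields an int option; describe is a hypothetical helper, not part of YARD.

```fsharp
// Hypothetical consumer of the value produced by rule s.
// Assumption: the NUMBER token carries an int, so s yields int option.
let describe (parsed: int option) : string =
    match parsed with
    | None -> "empty input (epsilon branch)"
    | Some n -> sprintf "number %d" n

printfn "%s" (describe None)
printfn "%s" (describe (Some 42))
```

Both branches must return the same type. Without { None } on the empty branch there would be no value for the type checker to unify with Some n.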

Zero or more

grammar: rule*;
or, in EBNF-like notation:
grammar = (: rule :);

One or more

grammar: rule+;

Option

s: PLUS?;
or
s: [PLUS];
The square-bracket syntax is useful for large optional subexpressions. Instead of
s: (n PLUS expr)?;
you can write
s: [n PLUS expr];
which combines grouping and optionality.

Literal

stmt_block: 'BEGIN' stmt+ 'END';

Metarules

not_empty_list<item sep>: item (sep item)*;
statements: not_empty_list<statement SEMICOLON>;
args: not_empty_list<arg COMMA>;
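A metarule behaves much like a higher-order function: it takes rules as arguments and yields a new rule. The F# sketch below mirrors not_empty_list<item sep> with hand-written parser functions. It is only an analogy under assumed types (Token, Parser, number, comma, and notEmptyList are all made up for this example) and does not show how YARD processes metarules internally.

```fsharp
// Hypothetical token type for the example.
type Token =
    | NUMBER of int
    | COMMA

// A parser consumes a prefix of the token list and returns a value plus the rest.
type Parser<'a> = Token list -> ('a * Token list) option

let number : Parser<int> =
    function
    | NUMBER n :: rest -> Some (n, rest)
    | _ -> None

let comma : Parser<unit> =
    function
    | COMMA :: rest -> Some ((), rest)
    | _ -> None

// Counterpart of not_empty_list<item sep>: item (sep item)*
let notEmptyList (item: Parser<'a>) (sep: Parser<'b>) : Parser<'a list> =
    fun input ->
        // Try to read one more "sep item" pair; None means the repetition ends.
        let sepThenItem tokens =
            sep tokens |> Option.bind (fun (_, afterSep) -> item afterSep)
        let rec loop acc tokens =
            match sepThenItem tokens with
            | Some (next, rest) -> loop (next :: acc) rest
            | None -> List.rev acc, tokens
        item input |> Option.map (fun (first, rest) -> loop [ first ] rest)

// Counterpart of args: not_empty_list<arg COMMA>, with NUMBER standing in for arg.
let args = notEmptyList number comma

// Yields Some ([1; 2; 3], []).
printfn "%A" (args [ NUMBER 1; COMMA; NUMBER 2; COMMA; NUMBER 3 ])
```

Just as notEmptyList is written once and applied to different item and separator parsers, the metarule is written once and instantiated for statements and args.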

Action code

s: n {printfn "n is detected!!!"};

Bindings

s: n=NUMBER {printfn "Number value is %A" n};
s: <hd::tl>=not_empty_list<NUMBER COLON> {List.fold myFunction hd tl};
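The second binding destructures the list returned by not_empty_list into its head and tail, then folds the tail into the head. Below is a minimal sketch of what myFunction could look like. It assumes that NUMBER carries an int; the summing behavior is chosen purely for illustration.

```fsharp
// Hypothetical folder: accumulates a running sum.
// Assumption: NUMBER carries an int, so the bound list has type int list.
let myFunction (acc: int) (next: int) : int = acc + next

// Values the action code would see for the input 1 : 2 : 3
// once the pattern <hd::tl> has been bound.
let hd, tl = 1, [ 2; 3 ]

let total = List.fold myFunction hd tl
printfn "%d" total // prints 6
```

List.fold expects a function of the form state -> element -> state, so the first argument of myFunction is the accumulator (initially hd) and the second is the next element of tl.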

L-attributes

proc: declarations=var_declarations IN expressions[declarations];
expressions[declarations]: 
     expr_lst=expression+ {check_undeclared_variables_in_expressions expr_lst declarations};
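Here declarations is passed down from proc to expressions as an inherited attribute and is then available in the action code. The helper itself is not shown in the example, so the following is a hypothetical implementation. It assumes that each expression is represented by the list of variable names it uses and that declarations is a list of declared names.

```fsharp
// Hypothetical implementation; the representation of expressions and
// declarations as string lists is an assumption made for this sketch.
let check_undeclared_variables_in_expressions
        (expr_lst: string list list)
        (declarations: string list) : string list =
    let declared = Set.ofList declarations
    expr_lst
    |> List.concat                                      // all names used anywhere
    |> List.filter (fun name -> not (Set.contains name declared))
    |> List.distinct                                    // report each name once

let undeclared =
    check_undeclared_variables_in_expressions [ [ "x"; "y" ]; [ "z"; "x" ] ] [ "x"; "z" ]

printfn "%A" undeclared // the only undeclared name is "y"
```

The helper would typically live in the head section of the grammar file so that the generated parser can call it.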

Conditional generation

exec_literal:
#if ms
("EXEC" | "EXECUTE")
#elif pl
"EXECUTE" "IMMEDIATE"
#endif
LBRACE LITERAL RBRACE
;
This means that if you run the generator with the option -D "ms", you get a parser for the grammar
exec_literal:
("EXEC" | "EXECUTE")
LBRACE LITERAL RBRACE
;
Otherwise, if you run the generator with the option -D "pl", you get a parser for the grammar
exec_literal:
"EXECUTE" "IMMEDIATE"
LBRACE LITERAL RBRACE
;
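The selection logic can be modeled in a few lines of F#. This is a simplified sketch of the behavior described above, not YARD's actual preprocessor: selectBranch is a hypothetical function, and it assumes flat, well-formed #if / #elif / #endif directives without nesting.

```fsharp
// Simplified model of conditional generation: keeps the lines of the first
// branch whose symbol is defined. Assumes flat, well-formed #if/#elif/#endif
// directives separated by single spaces (no nesting, no #else).
let selectBranch (defined: Set<string>) (lines: string list) : string list =
    // State: (a branch was already taken, lines are currently emitted, output reversed).
    let step (taken, emitting, acc) (line: string) =
        match line.Trim().Split(' ') |> Array.toList with
        | "#if" :: symbol :: _ ->
            let active = Set.contains symbol defined
            (active, active, acc)
        | "#elif" :: symbol :: _ ->
            let active = not taken && Set.contains symbol defined
            (taken || active, active, acc)
        | "#endif" :: _ -> (false, true, acc)
        | _ -> (taken, emitting, (if emitting then line :: acc else acc))
    let (_, _, kept) = List.fold step (false, true, []) lines
    List.rev kept

let source =
    [ "exec_literal:"
      "#if ms"
      "(\"EXEC\" | \"EXECUTE\")"
      "#elif pl"
      "\"EXECUTE\" \"IMMEDIATE\""
      "#endif"
      "LBRACE LITERAL RBRACE"
      ";" ]

// Corresponds to running the generator with -D "pl".
selectBranch (set [ "pl" ]) source |> List.iter (printfn "%s")
```

With the set containing "ms" instead, the sketch keeps the ("EXEC" | "EXECUTE") line, matching the first expansion shown above.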

Labels

s: x (@l1(y z)| @l2(a b)) g;
Here @l1 and @l2 are labels. Labels are used for working with dialects.

Experimental syntax

Syntactic factor

Note that although the syntactic factor is implemented in YARD, the generators do not currently support it.
  strict_rule = 5 * " ";

Extended annotation for the repetition rule:

  grammar = rule *[2..5] ;