YARD
YARD is a powerful translation specification language. It supports EBNF, metarules, and conditional generation, as well as L- and S-attributed grammars.
Why YARD?
Comparison
Feature | YARD | ANTLR | YACC | Bison |
Parsing algorithm | RNGLR, GLL | LL(*) | LALR(1) | LALR(1), LR(1), IELR(1), GLR |
Lexer | + | + | - | - |
License | Apache 2.0 | BSD | CPL & CDDL | GNU GPL |
Output languages | F# | C#, Java, Python (ANTLR 4); C, C++, C#, Java & 7 more (ANTLR 3) | C + specific language implementations | C, C++, Java, XML |
Input grammar notation | EBNF | EBNF | YACC (BNF) | YACC (BNF) |
Literals | + | + | + | + |
Predicates | + | + | - | - |
L-attributes | + | + | - | - |
Parametrized rules | + | +(limited) [1] | - | - |
Rules priority | + | + | + | + |
Bindings to synthesized attributes | + | + | - | - |
Grammar modularity support | + | + | - | - |
[1] In ANTLR, rule parameters are specified using the syntax of the generated (target) language, whereas a YARD rule can also accept regular expressions or other rules as arguments.
Lexing
You can use anything as a lexer and then transform its output into a suitable form. If you want to keep things simple, you can use the modified version of FsLex that is bundled with YaccConstructor.
Grammar structure
Grammar definition
A definition in its general form looks like this:
info tokens options head grammar foot
Where
info: free-form informational text; it is not used anywhere.
tokens: token type specification.
options: command-line arguments, which can be written here instead of being passed on the command line.
head: F# code that will be copied to the beginning of the generated F# file. Usually used for open declarations.
grammar: grammar description.
foot: F# code that will be copied to the end of a generated F# file.
Grammar description
A grammar description is a list of modules. A module in its general form looks like this:
[<AllPublic>] name openings rules
Where
AllPublic: this annotation makes all rules visible when using this module. By default, all rules are private except the ones explicitly marked public.
name: name of the module.
openings: imports (openings) of other grammars from .yrd files
rules: the rules that make up the grammar
Rules
A rule in general form looks like this:
[<Start>] public modifier name args body
Where
Start: this annotation makes this rule a starting one.
public: this modifier makes the rule visible from other modules.
name: the name of the rule
args: the rule's inherited arguments
body: rule's body (production)
Syntax in examples
Grammar
grammar: ('{'header'}')? rules ('{'footer'}')?
Example of YARD grammar file:
{ let helperFunction x y = x+y }
s: NUMBER;
{ let helperFunction2 x y = x*y }
Sequence
s: e PLUS e;
or, in EBNF-like notation:
s = e, PLUS, e;
Alternative
s: DECNUMBER|HEXNUMBER;
s: NUMBER (PLUS|MULT) NUMBER;
s: { None }|n=NUMBER{Some n};
Note that an empty branch is an epsilon, so s can produce either epsilon or NUMBER. An empty branch must have action code, because the action code is needed for type checking.
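The type-checking remark can be illustrated outside YARD. The following is a minimal Python sketch, not generated YARD code: `parse_s` is a hypothetical hand-written counterpart of `s: { None }|n=NUMBER{Some n};`, and treating a token made of digits as NUMBER is an assumption made for the example. Both branches return a value of the same optional type, which is why the empty branch needs its own action code.

```python
from typing import List, Optional

def parse_s(tokens: List[str]) -> Optional[int]:
    """Hypothetical counterpart of `s: { None } | n=NUMBER { Some n };`."""
    if tokens and tokens[0].isdigit():
        # NUMBER branch: corresponds to the action code `Some n`.
        return int(tokens[0])
    # Epsilon branch: corresponds to the action code `None`.
    return None
```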
Zero or more
grammar: rule*;
or, in EBNF-like notation:
grammar = (: rule :);
One or more
grammar: rule+;
Option
s: PLUS?;
or
s: [PLUS];
The square-bracket syntax is useful for large optional subexpressions. Instead of
s: (n PLUS expr)?;
you can write
s: [n PLUS expr];
which combines grouping and optionality.
Literal
stmt_block: 'BEGIN' stmt+ 'END';
Metarules
not_empty_list<item sep>: item (sep item)*;
statements: not_empty_list<statement SEMICOLON>;
args: not_empty_list<arg COMMA>;
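A metarule behaves much like a higher-order function: it takes rules as arguments and yields a new rule. The Python sketch below shows that idea with simple greedy parser functions. It only illustrates the concept; it is not how YARD's RNGLR or GLL generators work, and the `token` helper and the list-of-strings token representation are assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple

Tokens = List[str]
# A parser takes the tokens and a position and returns (value, new_position),
# or None when it does not match.
Parser = Callable[[Tokens, int], Optional[Tuple[object, int]]]

def token(kind: str) -> Parser:
    """Hypothetical helper: match exactly one token equal to `kind`."""
    def parse(tokens, pos):
        if pos < len(tokens) and tokens[pos] == kind:
            return tokens[pos], pos + 1
        return None
    return parse

def not_empty_list(item: Parser, sep: Parser) -> Parser:
    """Counterpart of `not_empty_list<item sep>: item (sep item)*;`."""
    def parse(tokens, pos):
        first = item(tokens, pos)
        if first is None:
            return None
        value, pos = first
        items = [value]
        while True:
            after_sep = sep(tokens, pos)
            if after_sep is None:
                break
            following = item(tokens, after_sep[1])
            if following is None:
                break  # a trailing separator is left unconsumed
            items.append(following[0])
            pos = following[1]
        return items, pos
    return parse

# Instantiating the metarule, as in `args` and `statements` above.
args = not_empty_list(token("arg"), token("COMMA"))
statements = not_empty_list(token("statement"), token("SEMICOLON"))
```

Here `args(["arg", "COMMA", "arg"], 0)` yields both items and the position after the last one, while `args(["COMMA"], 0)` returns `None` because the list must contain at least one item.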
Action code
s: n {printfn "n is detected!!!"}
Bindings
s: n=NUMBER {printfn "Number value is %A" n}
s: <hd::tl>=not_empty_list<NUMBER COLON> {List.fold myFunction hd tl}
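The second binding destructures a non-empty list into its head and tail and folds the tail, starting from the head. A rough Python equivalent is sketched below. The source does not define `myFunction`, so `my_function` is a hypothetical stand-in that simply adds its arguments.

```python
from functools import reduce

def my_function(acc, x):
    # Hypothetical stand-in for the unspecified `myFunction`: addition.
    return acc + x

def fold_nonempty(values):
    # Mirrors the <hd::tl> pattern; unpacking raises ValueError on an empty list.
    hd, *tl = values
    # Same argument order as F# `List.fold myFunction hd tl`:
    # the accumulator starts at hd and is combined with each element of tl.
    return reduce(my_function, tl, hd)
```

With this stand-in, `fold_nonempty([1, 2, 3])` evaluates to `6`.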
L-attributes
proc: declarations=var_declarations IN expressions[declarations];
expressions[declarations]: expr_lst=expression+ {check_undeclared_variables_in_expressions expr_lst declarations};
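An L-attribute is a value computed earlier in a rule and passed down to a later rule as an argument. The Python sketch below mirrors that data flow with plain functions. The body of `check_undeclared_variables_in_expressions` is hypothetical (the source names the function but does not define it), and representing each expression as a list of variable names is an assumption made for the example.

```python
def check_undeclared_variables_in_expressions(expr_lst, declarations):
    """Hypothetical body: list variables used in expr_lst but not declared."""
    declared = set(declarations)
    undeclared = []
    for expr in expr_lst:
        for name in expr:
            if name not in declared and name not in undeclared:
                undeclared.append(name)
    return undeclared

def expressions(expr_lst, declarations):
    # `declarations` plays the role of the inherited attribute in
    # `expressions[declarations]`: it is supplied by the calling rule.
    return check_undeclared_variables_in_expressions(expr_lst, declarations)

def proc(declared_vars, expr_lst):
    # proc: declarations=var_declarations IN expressions[declarations];
    declarations = declared_vars
    return expressions(expr_lst, declarations)
```

For example, `proc(["x", "y"], [["x"], ["y", "z"]])` reports `["z"]`, the only variable used without a declaration.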
Conditional generation
exec_literal: #if ms ("EXEC" | "EXECUTE") #elif pl "EXECUTE" "IMMEDIATE" #endif LBRACE LITERAL RBRACE ;
This means that if you run the generator with the key -D "ms", you get a parser for the grammar
exec_literal: ("EXEC" | "EXECUTE") LBRACE LITERAL RBRACE ;
Otherwise, if you run the generator with the key -D "pl", you get a parser for the grammar
exec_literal: "EXECUTE" "IMMEDIATE" LBRACE LITERAL RBRACE ;
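The branch selection described above can be imitated with a small text filter. The Python sketch below is an illustration only, not YARD's actual preprocessor: `select_branches` is a hypothetical name, it splits the rule on whitespace (so it would break string literals containing spaces), and it handles only non-nested `#if`, `#elif`, and `#endif`, the three directives shown in the example.

```python
def select_branches(grammar_text, defined):
    """Keep only the tokens whose #if/#elif branch is selected by `defined`."""
    out = []
    tokens = grammar_text.split()
    active = True   # are tokens currently being emitted?
    taken = False   # has a branch of the current #if already been chosen?
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "#if":
            taken = tokens[i + 1] in defined
            active = taken
            i += 2  # skip the directive and its symbol
        elif tok == "#elif":
            active = (not taken) and tokens[i + 1] in defined
            taken = taken or active
            i += 2
        elif tok == "#endif":
            active = True
            taken = False
            i += 1
        else:
            if active:
                out.append(tok)
            i += 1
    return " ".join(out)

rule = ('exec_literal: #if ms ("EXEC" | "EXECUTE") '
        '#elif pl "EXECUTE" "IMMEDIATE" #endif LBRACE LITERAL RBRACE ;')
ms_rule = select_branches(rule, {"ms"})
pl_rule = select_branches(rule, {"pl"})
```

`ms_rule` and `pl_rule` then hold the two expanded rules, corresponding to the keys `-D "ms"` and `-D "pl"`.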
Labels
s: x (@l1(y z)| @l2(a b)) g;
Here @l1 and @l2 are labels. Labels are used for working with dialects.
Experimental syntax
Syntactic factor
Note that although the syntactic factor is implemented in YARD, it is currently not supported by the generators.
strict_rule = 5 * " ";
Extended annotation for repetition rule:
grammar = rule *[2..5] ;
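The intent of a bounded repetition can be sketched in Python as follows. This is an illustration under stated assumptions, not YARD behaviour confirmed by the source: the bounds are taken to be inclusive, matching is greedy, and `repeat` is a hypothetical helper working on a list of token names.

```python
def repeat(item, low, high):
    """Match `item` between `low` and `high` times (inclusive bounds assumed)."""
    def parse(tokens, pos):
        count = 0
        # Consume at most `high` consecutive occurrences of `item`.
        while count < high and pos < len(tokens) and tokens[pos] == item:
            pos += 1
            count += 1
        # Succeed with the new position only if the lower bound was reached.
        return pos if count >= low else None
    return parse

# Counterpart of `grammar = rule *[2..5] ;`
grammar = repeat("rule", 2, 5)
```

Under the same assumptions, a syntactic factor such as `5 * " "` would correspond to `repeat(" ", 5, 5)`.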