Summary

Future-proof the allowed forms that input to an MBE can take by requiring certain delimiters following NTs in a matcher. In the future, it will be possible to lift these restrictions backwards compatibly if desired.

Key Terminology

  • macro: anything invokable as foo!(...) in source code.
  • MBE: macro-by-example, a macro defined by macro_rules.
  • matcher: the left-hand-side of a rule in a macro_rules invocation, or a subportion thereof.
  • macro parser: the bit of code in the Rust parser that will parse the input using a grammar derived from all of the matchers.
  • fragment: The class of Rust syntax that a given matcher will accept (or "match").
  • repetition: a fragment that follows a regular repeating pattern.
  • NT: non-terminal, the various "meta-variables" or repetition matchers that can appear in a matcher, specified in MBE syntax with a leading $ character.
  • simple NT: a "meta-variable" non-terminal (further discussion below).
  • complex NT: a repetition-matching non-terminal, specified via Kleene closure operators (*, +).
  • token: an atomic element of a matcher; i.e. identifiers, operators, open/close delimiters, and simple NT's.
  • token tree: a tree structure formed from tokens (the leaves), complex NT's, and finite sequences of token trees.
  • delimiter token: a token that is meant to divide the end of one fragment and the start of the next fragment.
  • separator token: an optional delimiter token in a complex NT that separates each pair of elements in the matched repetition.
  • separated complex NT: a complex NT that has its own separator token.
  • delimited sequence: a sequence of token trees with appropriate open- and close-delimiters at the start and end of the sequence.
  • empty fragment: The class of invisible Rust syntax that separates tokens, i.e. whitespace, or (in some lexical contexts) the empty token sequence.
  • fragment specifier: The identifier in a simple NT that specifies which fragment the NT accepts.
  • language: a context-free language.

Example:

macro_rules! i_am_an_mbe {
    (start $foo:expr $($i:ident),* end) => ($foo)
}

(start $foo:expr $($i:ident),* end) is a matcher. The whole matcher is a delimited sequence (with open- and close-delimiters ( and )), and $foo and $i are simple NT's with expr and ident as their respective fragment specifiers.

$($i:ident),* is also an NT; it is a complex NT that matches a comma-separated repetition of identifiers. The , is the separator token for the complex NT; it occurs in between each pair of elements (if any) of the matched fragment.

Another example of a complex NT is $(hi $e:expr ;)+, which matches any fragment of the form hi <expr>; hi <expr>; ... where hi <expr>; occurs at least once. Note that this complex NT does not have a dedicated separator token.
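For concreteness, here is a hedged, runnable sketch of both kinds of complex NT in use (the macro names are invented for illustration): in the first macro the , is a true separator token, while in the second the ; is part of the repeated fragment rather than a separator.

macro_rules! comma_separated {
    ($($i:ident),*) => { [ $( stringify!($i) ),* ] };
}

macro_rules! hi_terminated {
    ($(hi $e:expr ;)+) => { [ $( $e ),+ ] };
}

fn main() {
    let names = comma_separated!(a, b, c);    // `,` is the separator token
    let values = hi_terminated!(hi 1; hi 2;); // `;` is inside the repeated fragment
    println!("{:?} {:?}", names, values);
}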

(Note that Rust's parser ensures that delimited sequences always occur with proper nesting of token tree structure and correct matching of open- and close-delimiters.)

Motivation

In current Rust (version 0.12; i.e. pre 1.0), the macro_rules parser is very liberal in what it accepts in a matcher. This can cause problems, because it is possible to write an MBE which corresponds to an ambiguous grammar. When an MBE is invoked, if the macro parser encounters an ambiguity while parsing, it will bail out with a "local ambiguity" error. As an example of this, take the following MBE:

macro_rules! foo {
    ($($foo:expr)* $bar:block) => (/*...*/)
}

Attempts to invoke this MBE will never succeed, because the macro parser will always emit an ambiguity error rather than make a choice when presented with an ambiguity. In particular, it needs to decide when to stop accepting expressions for foo and look for a block for bar (noting that blocks are valid expressions). Situations like this are inherent to the macro system. On the other hand, it's possible to write an unambiguous matcher that becomes ambiguous due to changes in the syntax for the various fragments. As a concrete example:

macro_rules! bar {
    ($in:ty ( $($arg:ident)*, ) -> $out:ty;) => (/*...*/)
}

When the type syntax was extended to include the unboxed closure traits, an input such as FnMut(i8, u8) -> i8; became ambiguous. The goal of this proposal is to prevent such scenarios in the future by requiring certain "delimiter tokens" after an NT. When extending Rust's syntax in the future, ambiguity need only be considered when combined with these sets of delimiters, rather than with any possible arbitrary matcher.


Another example of a potential extension to the language that motivates a restricted set of "delimiter tokens" is (postponed) RFC 352, "Allow loops to return values other than ()", where the break expression would now accept an optional input expression: break <expr>.

  • This proposed extension to the language, combined with the facts that break and { <stmt> ... <expr>? } are Rust expressions, implies that { should not be in the follow set for the expr fragment specifier.

  • Thus in a slightly more ideal world the following program would not be accepted, because the interpretation of the macro could change if we were to accept RFC 352:

    macro_rules! foo {
        ($e:expr { stuff }) => { println!("{:?}", $e) }
    }

    fn main() {
        loop { foo!(break { stuff }); }
    }

    (in our non-ideal world, the program is legal in Rust versions 1.0 through at least 1.4)

Detailed design

We will tend to use the variable "M" to stand for a matcher, variables "t" and "u" for arbitrary individual tokens, and the variables "tt" and "uu" for arbitrary token trees. (The use of "tt" does present potential ambiguity with its additional role as a fragment specifier; but it will be clear from context which interpretation is meant.)

"SEP" will range

範囲
over separator tokens,
トークン
"OP" over the Kleene operators
演算子
* and +, and "OPEN"/"CLOSE" over matching
一致する、マッチさせる
token
トークン
pairs surrounding
囲う
a delimited sequence
連なり、並び
(e.g. [ and ]).

We also use Greek letters "α" "β" "γ" "δ" to stand for potentially empty token-tree sequences. (However, the Greek letter "ε" (epsilon) has a special role in the presentation and does not stand for a token-tree sequence.)

  • This Greek letter convention is usually just employed when the presence of a sequence is a technical detail; in particular, when I wish to emphasize that we are operating on a sequence of token-trees, I will use the notation "tt ..." for the sequence, not a Greek letter.

Note that a matcher is merely a token tree. A "simple NT", as mentioned above, is a meta-variable NT; thus it is a non-repetition. For example, $foo:ty is a simple NT but $($foo:ty)+ is a complex NT.

Note also that in the context of this RFC, the term "token" generally includes simple NTs.

Finally, it is useful for the reader to keep in mind that according to the definitions of this RFC, no simple NT matches the empty fragment, and likewise no token matches the empty fragment of Rust syntax. (Thus, the only NT that can match the empty fragment is a complex NT.)

The Matcher Invariant

This RFC establishes the following two-part invariant for valid matchers:

  1. For any two successive token tree sequences in a matcher M (i.e. M = ... tt uu ...), we must have FOLLOW(... tt) ⊇ FIRST(uu ...).

  2. For any separated complex NT in a matcher, M = ... $(tt ...) SEP OP ..., we must have SEP ∈ FOLLOW(tt ...).

The first part says that whatever actual token comes after a matcher must be somewhere in the predetermined follow set. This ensures that a legal macro definition will continue to assign the same determination as to where ... tt ends and uu ... begins, even as new syntactic forms are added to the language.

The second part says that a separated complex NT must use a separator token that is part of the predetermined follow set for the internal contents of the NT. This ensures that a legal macro definition will continue to parse an input fragment into the same delimited sequence of tt ...'s, even as new syntactic forms are added to the language.

(This is assuming that all such changes are appropriately restricted, by the definition of FOLLOW below, of course.)
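As a concrete illustration (a minimal sketch, relying on FOLLOW(expr) as defined later in this RFC, which contains ; and , but not +): current rustc accepts the first two matchers below and rejects the commented-out third with a follow-set error.

macro_rules! ok_part_one {
    ($e:expr ; $f:expr) => { ($e, $f) };   // part 1: `;` may follow an expr fragment
}

macro_rules! ok_part_two {
    ($($e:expr),*) => { [ $($e),* ] };     // part 2: the separator `,` is in FOLLOW(expr)
}

// macro_rules! rejected {
//     ($e:expr + $f:expr) => { $e + $f }; // `+` is not in FOLLOW(expr)
// }

fn main() {
    let pair = ok_part_one!(1 ; 2);
    let arr = ok_part_two!(1, 2, 3);
    let _ = (pair, arr);
}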

The above invariant is only formally meaningful if one knows what FIRST and FOLLOW denote. We address this in the following sections.

FIRST and FOLLOW, informally

FIRST and FOLLOW are defined as follows.

A given matcher M maps to three sets: FIRST(M), LAST(M) and FOLLOW(M).

Each of the three sets is made up of tokens. FIRST(M) and LAST(M) may also contain a distinguished non-token element ε ("epsilon"), which indicates that M can match the empty fragment. (But FOLLOW(M) is always just a set of tokens.)

Informally:

  • FIRST(M): collects the tokens potentially used first when matching a fragment to M.

  • LAST(M): collects the tokens potentially used last when matching a fragment to M.

  • FOLLOW(M): the set of tokens allowed to follow immediately after some fragment matched by M.

    In other words: t ∈ FOLLOW(M) if and only if there exist (potentially empty) token sequences α, β, γ, δ where:

    • M matches β,
    • t matches γ, and
    • the concatenation α β γ δ is a parseable Rust program.

We use the shorthand ANYTOKEN to denote the set of all tokens (including simple NTs).

  • (For example, if any token is legal after a matcher M, then FOLLOW(M) = ANYTOKEN.)

(To review one's understanding of the above informal descriptions, the reader at this point may want to jump ahead to the examples of FIRST/LAST before reading their formal definitions.)

FIRST, LAST

Below are formal inductive definitions for FIRST and LAST.

"A B" denotes

指す、示す、表す
set
セットする、集合
union, "A B" denotes
指す、示す、表す
set
セットする、集合
intersection, and "A \ B" denotes
指す、示す、表す
set
セットする、集合
difference (i.e. all elements
要素
of A that are not present
ある
in B).

FIRST(M), defined by case analysis on the sequence M and the structure of its first token-tree (if any):

  • if M is the empty sequence, then FIRST(M) = { ε },

  • if M starts with a token t, then FIRST(M) = { t },

    (Note: this covers the case where M starts with a delimited token-tree sequence, M = OPEN tt ... CLOSE ..., in which case t = OPEN and thus FIRST(M) = { OPEN }.)

    (Note: this critically relies on the property that no simple NT matches the empty fragment.)

  • Otherwise, M is a token-tree sequence starting with a complex NT: M = $( tt ... ) OP α, or M = $( tt ... ) SEP OP α (where α is the (potentially empty) sequence of token trees for the rest of the matcher).

    • Let sep_set = { SEP } if SEP present; otherwise sep_set = {}.

    • If ε ∈ FIRST(tt ...), then FIRST(M) = (FIRST(tt ...) \ { ε }) ∪ sep_set ∪ FIRST(α)

    • Else if OP = *, then FIRST(M) = FIRST(tt ...) ∪ FIRST(α)

    • Otherwise (OP = +), FIRST(M) = FIRST(tt ...)

Note: The ε-case above,

FIRST(M) = (FIRST(tt ...) \ { ε }) ∪ sep_set ∪ FIRST(α)

may seem complicated, so let's take a moment to break it down. In the ε case, the sequence tt ... may be empty. Therefore our first token may be SEP itself (if it is present), or it may be the first token of α; that's why the result includes sep_set ∪ FIRST(α). Note also that if α itself may match the empty fragment, then FIRST(α) will ensure that ε is included in our result, and conversely, if α cannot match the empty fragment, then we must ensure that ε is not included in our result; these two facts together are why we can and should unconditionally remove ε from FIRST(tt ...).


LAST(M), defined by case analysis on M itself (a sequence of token-trees):

  • if M is the empty sequence, then LAST(M) = { ε }

  • if M is a singleton token t, then LAST(M) = { t }

  • if M is the singleton complex NT repeating zero or more times, M = $( tt ... ) *, or M = $( tt ... ) SEP *

    • Let sep_set = { SEP } if SEP present; otherwise sep_set = {}.

    • if ε ∈ LAST(tt ...) then LAST(M) = LAST(tt ...) ∪ sep_set

    • otherwise, the sequence tt ... must be non-empty; LAST(M) = LAST(tt ...) ∪ { ε }

  • if M is the singleton complex NT repeating one or more times, M = $( tt ... ) +, or M = $( tt ... ) SEP +

    • Let sep_set = { SEP } if SEP present; otherwise sep_set = {}.

    • if ε ∈ LAST(tt ...) then LAST(M) = LAST(tt ...) ∪ sep_set

    • otherwise, the sequence tt ... must be non-empty; LAST(M) = LAST(tt ...)

  • if M is a delimited token-tree sequence OPEN tt ... CLOSE, then LAST(M) = { CLOSE }

  • if M is a non-empty sequence of token-trees tt uu ...,

    • If ε ∈ LAST(uu ...), then LAST(M) = LAST(tt) ∪ (LAST(uu ...) \ { ε }).

    • Otherwise, the sequence uu ... must be non-empty; then LAST(M) = LAST(uu ...)

NOTE: The presence or absence of SEP is relevant to the above definitions, but solely in the case where the interior of the complex NT could be empty (i.e. ε ∈ FIRST(interior)). (I overlooked this fact in my first round of prototyping.)

NOTE: The above definition for LAST assumes that we keep our pre-existing rule that the separator token in a complex NT is solely for separating elements; i.e. that such NT's do not match fragments that end with the separator token. If we choose to lift this restriction in the future, the above definition will need to be revised accordingly.

Examples of FIRST and LAST

Below are some examples of FIRST and LAST. (Note in particular how the special ε element is introduced and eliminated based on the interaction between the pieces of the input.)

Our first example is presented in a tree structure to elaborate on how the analysis of the matcher composes. (Some of the simpler subtrees have been elided.)

INPUT:  $(  $d:ident   $e:expr );*    $( $( h )* );*    $( f ; )+   g
            ~~~~~~~~   ~~~~~~~              ~
               |          |                 |
FIRST:   { $d:ident } { $e:expr }         { h }


INPUT:  $(  $d:ident   $e:expr );*    $( $( h )* );*    $( f ; )+   g
        ~~~~~~~~~~~~~~~~~~~~~~~~~~       ~~~~~~~           ~~~
                    |                       |               |
FIRST:        { $d:ident }              { h, ε }          { f }


INPUT:  $(  $d:ident   $e:expr );*    $( $( h )* );*    $( f ; )+   g
        ~~~~~~~~~~~~~~~~~~~~~~~~~~    ~~~~~~~~~~~~~~    ~~~~~~~~~   ~
                    |                       |               |       |
FIRST:       { $d:ident, ε }           { h, ε, ; }        { f }   { g }


INPUT:  $(  $d:ident   $e:expr );*    $( $( h )* );*    $( f ; )+   g
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                      |
FIRST:                      { $d:ident, h, ;, f }

Thus:

  • FIRST($($d:ident $e:expr );* $( $(h)* );* $( f ;)+ g) = { $d:ident, h, ;, f }

Note however that:

  • FIRST($($d:ident $e:expr );* $( $(h)* );* $($( f ;)+ g)*) = { $d:ident, h, ;, f, ε }

Here are similar examples but now for LAST.

  • LAST($d:ident $e:expr) = { $e:expr }
  • LAST($( $d:ident $e:expr );*) = { $e:expr, ε }
  • LAST($( $d:ident $e:expr );* $(h)*) = { $e:expr, ε, h }
  • LAST($( $d:ident $e:expr );* $(h)* $( f ;)+) = { ; }
  • LAST($( $d:ident $e:expr );* $(h)* $( f ;)+ g) = { g }

and again, changing the end part of the matcher changes its last set considerably:

  • LAST($( $d:ident $e:expr );* $(h)* $($( f ;)+ g)*) = { $e:expr, ε, h, g }

FOLLOW(M)

Finally, the definition for FOLLOW(M) is built up incrementally atop more primitive functions.

We first assume a primitive mapping, FOLLOW(NT) (defined below), from a simple NT to the set of allowed tokens for the fragment specifier for that NT.

Second, we generalize FOLLOW to tokens: FOLLOW(t) = FOLLOW(NT) if t is (a simple) NT. Otherwise, t must be some other (non-NT) token; in this case FOLLOW(t) = ANYTOKEN.

Finally, we generalize FOLLOW to arbitrary matchers by composing the primitive functions above:

FOLLOW(M) = FOLLOW(t1) ∩ FOLLOW(t2) ∩ ... ∩ FOLLOW(tN) where { t1, t2, ..., tN } = (LAST(M) \ { ε })

Examples of FOLLOW

(expressed as equality relations between sets, to avoid incorporating details of FOLLOW(NT) in these examples):

  • FOLLOW($( $d:ident $e:expr )*) = FOLLOW($e:expr)
  • FOLLOW($( $d:ident $e:expr )* $(;)*) = FOLLOW($e:expr) ∩ ANYTOKEN = FOLLOW($e:expr)
  • FOLLOW($( $d:ident $e:expr )* $(;)* $( f |)+) = ANYTOKEN

FOLLOW(NT)

Here is the definition for FOLLOW(NT), which maps every simple NT to the set of tokens that are allowed to follow it, based on the fragment specifier for the NT.

The current legal fragment specifiers are: item, block, stmt, pat, expr, ty, ident, path, meta, and tt.

  • FOLLOW(pat) = {FatArrow, Comma, Eq, Or, Ident(if), Ident(in)}
  • FOLLOW(expr) = {FatArrow, Comma, Semicolon}
  • FOLLOW(ty) = {OpenDelim(Brace), Comma, FatArrow, Colon, Eq, Gt, Semi, Or, Ident(as), Ident(where), OpenDelim(Bracket), Nonterminal(Block)}
  • FOLLOW(stmt) = FOLLOW(expr)
  • FOLLOW(path) = FOLLOW(ty)
  • FOLLOW(block) = any token
  • FOLLOW(ident) = any token
  • FOLLOW(tt) = any token
  • FOLLOW(item) = any token
  • FOLLOW(meta) = any token

(Note that close delimiters are valid following any NT.)
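As an informal illustration of where several of the pat entries come from (this sketch is illustrative only, not part of the RFC's definitions), ordinary Rust syntax already places these tokens directly after a pattern:

fn main() {
    let pair = (true, false);
    match pair {
        (true, b) | (b, true) => { let _ = b; } // `|` and `=>` follow patterns in match arms
        (a, _) if !a => {}                      // `if` introduces a guard after a pattern
        _ => {}
    }
    for x in 0..3 {                             // `in` follows the pattern of a `for` loop
        let _ = x;
    }
    let y = 1;                                  // `=` follows the pattern of a `let` binding
    let _ = y;
}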

Examples of valid and invalid matchers

With the above specification in hand, we can present arguments for why particular matchers are legal and others are not.

  • ($ty:ty < foo ,) : illegal, because FIRST(< foo ,) = { < } ⊈ FOLLOW(ty)

  • ($ty:ty , foo <) : legal, because FIRST(, foo <) = { , } is ⊆ FOLLOW(ty).

  • ($pa:pat $pb:pat $ty:ty ,) : illegal, because FIRST($pb:pat $ty:ty ,) = { $pb:pat } ⊈ FOLLOW(pat), and also FIRST($ty:ty ,) = { $ty:ty } ⊈ FOLLOW(pat).

  • ( $($a:tt $b:tt)* ; ) : legal, because FIRST($b:tt) = { $b:tt } is ⊆ FOLLOW(tt) = ANYTOKEN, as is FIRST(;) = { ; }.

  • ( $($t:tt),* , $(t:tt),* ) : legal (though any attempt to actually use this macro will signal a local ambiguity error during expansion).

  • ($ty:ty $(; not sep)* -) : illegal, because FIRST($(; not sep)* -) = { ;, - } is not in FOLLOW(ty).

  • ($($ty:ty)-+) : illegal, because separator - is not in FOLLOW(ty).
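As a hedged, runnable check of two of these cases (macro names invented for illustration): current rustc accepts the first definition below, and uncommenting the second should produce a follow-set error, since < is not in FOLLOW(ty).

macro_rules! legal_example {
    ($ty:ty , foo <) => { () };
}

// macro_rules! illegal_example {
//     ($ty:ty < foo ,) => { () };
// }

fn main() {
    let _ = legal_example!(u8 , foo <);
}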

Drawbacks

It does restrict the input to an MBE, but the choice of delimiters provides reasonable freedom and can be extended in the future.

Alternatives

  1. Fix the syntax that a fragment can parse. This would create a situation where a future MBE might not be able to accept certain inputs because the input uses newer features than the fragment that was fixed at 1.0. For example, in the bar MBE above, if the ty fragment was fixed before the unboxed closure sugar was introduced, the MBE would not be able to accept such a type. While this approach is feasible, it would cause unnecessary confusion for future users of MBEs when they can't put certain perfectly valid Rust code in the input to an MBE. Versioned fragments could avoid this problem but only for new code.
  2. Keep macro_rules unstable. Given the great syntactical abstraction that macro_rules provides, it would be a shame for it to be unusable in a release version of Rust. If ever macro_rules were to be stabilized, this same issue would come up.
  3. Do nothing. This is very dangerous, and has the potential to essentially freeze Rust's syntax for fear of accidentally breaking a macro.

Edit History

  • Updated by https://github.com/rust-lang/rfcs/pull/1209, which added semicolons into the follow set for types.

  • Updated by https://github.com/rust-lang/rfcs/pull/1384:

    • replaced detailed design with a specification-oriented presentation rather than an implementation-oriented algorithm.
    • fixed some oversights in the specification that led to matchers like $e:expr { stuff } being accepted (which match fragments like break { stuff }, significantly limiting future language extensions),
    • expanded the follow sets for ty to include OpenDelim(Brace), Ident(where), Or (since Rust's grammar already requires all of |foo:TY| {}, fn foo() -> TY {} and fn foo() -> TY where {} to work).
    • expanded the follow set for pat to include Or (since Rust's grammar already requires match (true,false) { PAT | PAT => {} } and |PAT| {} to work); see also RFC issue 1336. Also added If and In to the follow set for pat (to make the specification match the old implementation).
  • Updated by https://github.com/rust-lang/rfcs/pull/1462, which added open square bracket into the follow set for types.

  • Updated by https://github.com/rust-lang/rfcs/pull/1494, which adjusted the follow set for types to include block nonterminals.

Appendices

Appendix A: Algorithm for recognizing valid matchers

The detailed design above only sought to provide a specification for what a correct matcher is (by defining FIRST, LAST, and FOLLOW, and specifying the invariant relating FIRST and FOLLOW for all valid matchers).

The above specification can be implemented efficiently; we here give one example algorithm for recognizing valid matchers.

  • This is not the only possible algorithm; for example, one could precompute a table mapping every suffix of every token-tree sequence to its FIRST set, by augmenting FirstSet below accordingly.

    Or one could store a subset of such information during the precomputation, such as just the FIRST sets for complex NT's, and then use that table to inform a forward scan of the input.

    The latter is in fact what my prototype implementation does; I must emphasize the point that the algorithm here is not prescriptive.

  • The intent of this RFC is that the specifications of FIRST and FOLLOW above will take precedence over this algorithm if the two are found to be producing inconsistent results.

The algorithm for recognizing valid matchers M is named ValidMatcher.

To define it, we will need a mapping from submatchers of M to the FIRST set for that submatcher; that is handled by FirstSet.

Procedure FirstSet(M)

input: a token tree M representing a matcher

output: FIRST(M)

Let M = tts[1] tts[2] ... tts[n].
Let curr_first = { ε }.
For i in n down to 1 (inclusive):
    Let tt = tts[i].
    1. If tt is a token, curr_first := { tt }
    2. Else if tt is a delimited sequence `OPEN uu ... CLOSE`, curr_first := { OPEN }
    3. Else tt is a complex NT `$(uu ...) SEP OP`
       Let inner_first = FirstSet(`uu ...`)   (i.e. recursive call)
       if OP == `*` or ε ∈ inner_first
       then curr_first := curr_first ∪ inner_first
       else curr_first := inner_first
return curr_first

(Note: If we were precomputing a full table in this procedure, we would need a recursive invocation on (uu ...) in step 2 of the for-loop.)
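For readers who prefer working code, here is a minimal Rust sketch of the FirstSet procedure above, using a deliberately simplified token-tree representation; the type and function names are invented for illustration and do not correspond to any compiler internals.

use std::collections::HashSet;

// Simplified token trees: a token (or simple NT), a delimited sequence,
// or a complex NT `$( tt ... ) SEP? OP` with OP in {'*', '+'}.
#[allow(dead_code)]
#[derive(Clone, Debug)]
enum Tt {
    Token(String),
    Delimited(String, Vec<Tt>, String),    // OPEN, contents, CLOSE
    Repeat(Vec<Tt>, Option<String>, char), // contents, SEP, OP
}

// Elements of a FIRST set: a token, or ε.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum First {
    Tok(String),
    Epsilon,
}

// FIRST(M), mirroring the right-to-left scan in the pseudocode above.
fn first_set(m: &[Tt]) -> HashSet<First> {
    let mut curr: HashSet<First> = [First::Epsilon].into_iter().collect();
    for tt in m.iter().rev() {
        match tt {
            Tt::Token(t) => {
                curr = [First::Tok(t.clone())].into_iter().collect();
            }
            Tt::Delimited(open, _, _) => {
                curr = [First::Tok(open.clone())].into_iter().collect();
            }
            Tt::Repeat(inner, _sep, op) => {
                let inner_first = first_set(inner); // recursive call
                if *op == '*' || inner_first.contains(&First::Epsilon) {
                    curr.extend(inner_first); // curr_first ∪ inner_first
                } else {
                    curr = inner_first;
                }
            }
        }
    }
    curr
}

fn main() {
    // FIRST($( f ; )+ g) = { f }, matching the tree example earlier in the RFC.
    let m = vec![
        Tt::Repeat(vec![Tt::Token("f".into()), Tt::Token(";".into())], None, '+'),
        Tt::Token("g".into()),
    ];
    println!("{:?}", first_set(&m));
}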

Predicate ValidMatcher(M)

To simplify the specification, we assume in this presentation that all simple NT's have a valid fragment specifier (i.e., one that has an entry in the FOLLOW(NT) table above).

This algorithm works by scanning forward across the matcher M = α β (where α is the prefix we have scanned so far, and β is the suffix that remains to be scanned). We maintain LAST(α) as we scan, and use it to compute FOLLOW(α) and compare that to FIRST(β).

input: a token tree, M, and a set of tokens that could follow it, F.

output: LAST(M) (and also signals failure whenever M is invalid)

Let last_of_prefix = { ε }
Let M = tts[1] tts[2] ... tts[n].
For i in 1 up to n (inclusive):

    // For reference:
    // α = tts[1] .. tts[i]
    // β = tts[i+1] .. tts[n]
    // γ is some outer token sequence; the input F represents FIRST(γ)

    1. Let tt = tts[i].
    2. Let first_of_suffix; // aka FIRST(β γ)
    3. let S = FirstSet(tts[i+1] .. tts[n]);
    4. if ε ∈ S then // (include the follow information if necessary)
           first_of_suffix := S ∪ F
    5. else
           first_of_suffix := S
    6. Update last_of_prefix via case analysis on tt:
       a. If tt is a token:
          last_of_prefix := { tt }
       b. Else if tt is a delimited sequence `OPEN uu ... CLOSE`:
          i.  run ValidMatcher( M = `uu ...`, F = { `CLOSE` })
          ii. last_of_prefix := { `CLOSE` }
       c. Else, tt must be a complex NT, in other words, `NT = $( uu .. ) SEP OP` or `NT = $( uu .. ) OP`:
          i.   If SEP present, let sublast = ValidMatcher( M = `uu ...`, F = first_of_suffix ∪ { `SEP` })
          ii.  else: let sublast = ValidMatcher( M = `uu ...`, F = first_of_suffix)
          iii. If ε ∈ sublast then: last_of_prefix := last_of_prefix ∪ (sublast \ ε)
          iv.  Else: last_of_prefix := sublast
    7. At this point, last_of_prefix == LAST(α) and first_of_suffix == FIRST(β γ).
       For each simple NT token t in last_of_prefix:
       a. If first_of_suffix ⊆ FOLLOW(t), then we are okay so far.
       b. Otherwise, we have found a token t whose follow set is not compatible with FIRST(β γ), and must signal failure.

// After running the above for loop on all of `M`, last_of_prefix == LAST(M)

Return last_of_prefix

This algorithm should be run on every matcher in every macro_rules invocation, with F = { EOF }. If it rejects a matcher, an error should be emitted and compilation should not complete.