Summary
Future-proof the allowed
Key Terminology
macro
: anything invokable asfoo!(...)
in source元のcode.MBE
: macro-by-example, a macro defined定義するbymacro_rules
.matcher
: the left-hand-side of a rule in amacro_rules
invocation,呼び出しor a subportion thereof.macro parser
: the bit of code in the Rust parser that will parse構文解析するthe input using a grammar文法derived from all of the matchers.fragment
: The class分類of Rust syntax文法that a given与えられたmatcher will accept受け付ける、受理する(or "match").repetition
: a fragment that follows繰り返し下記の、次に続く、追従するa regular普通の、正規のrepeating patternNT
: non-terminal,非終端記号the variousさまざまな"meta-variables" or repetition繰り返しmatchers that can appear現れるin a matcher, specified in MBE syntax文法with a leading$
character.文字simple NT
: a "meta-variable" non-terminal非終端記号(further discussion below).complex
: a repetition複素数、複文のNT繰り返しmatching一致する、マッチさせるnon-terminal,非終端記号specified via Kleene closure operators演算子(*
,+
).token
: an atomic elementトークン要素of a matcher; i.e. identifiers,識別子operators,演算子open/close delimiters, and simple NT's.token
: a tree structure formedトークンtree形式、形態、形作るfrom tokensトークン(the leaves), complex複素数、複文のNT's, and finite sequences連なり、並びof tokenトークンtrees.delimiter token
: a tokenトークントークンthat is meant to divide the end of one fragment and the start of the next fragment.separator token
: an optionalトークン必須でないdelimiter tokenトークンin an complex複素数、複文のNT that separates分割するeach pair of elements要素in the matched一致する、マッチさせるrepetition.繰り返しseparated complex
: a complex複素数、複文のNT複素数、複文のNT that has its own separator token.トークンdelimited sequence
: a sequence連なり、並び連なり、並びof tokenトークンtrees with appropriate open- and close-delimiters at the start and end of the sequence.連なり、並びempty
: The class空のfragment分類of invisible Rust syntax文法that separates分割するtokens,トークンi.e. whitespace, or (in some lexical字句的contexts), the empty空のtokenトークンsequence.連なり、並びfragment specifier
: The identifier識別子in a simple NT that specifies特定する、指定する、規定するwhich fragment the NT accepts.受け付ける、受理するlanguage
: a context-free language.言語言語
Example:
(start $foo:expr $($i:ident),* end)
is a matcher. The whole matcher is a delimited sequence(
and )
), and $foo
and $i
are simple NT's with expr
and ident
as their respective
$(i:ident),*
is also an NT; it is a complex,
is the separator token
Another example of a complex$(hi $e:expr ;)+
, which matcheshi <expr>; hi <expr>; ...
where hi <expr>;
occurs
(Note that Rust's parser ensures that delimited sequences
Motivation
In current Rust (version 0.12; i.e. pre 1.0), the macro_rules
parser is very liberal in what it accepts
Attempts to invoke this MBE will never succeed,foo
and look for a block for bar
(noting that blocks are valid
When the type syntaxFnMut(i8, u8) -> i8;
became ambiguous. The goal of this proposal is to prevent
Another example of a potential extension to the language()
", where the break
expressionbreak <expr>
.
-
This proposed extension to the language,
言語combined合体する、組み合わせるwith the facts thatbreak
and{ <stmt> ... <expr>? }
are Rust expressions,式implies that{
should not be in the follow下記の、次に続く、追従するsetセットする、集合for theexpr
fragment specifier. -
Thus
それゆえに、従って、in a slightly more ideal world the following下記の、次に続く、追従するprogram would not be accepted,受け付ける、受理するbecause the interpretation解釈of the macro could change if we were to accept受け付ける、受理するRFC 352:macro_rules! foo { ($e:expr { stuff }) => { println!("{:?}", $e) } } fn main() { loop { foo!(break { stuff }); } }
(in our non-ideal world, the program is legal
(文法的に)適格in Rust versions 1.0 through at least 1.4)
Detailed design設計(する)
We will tend to use the variable
"SEP" will range*
and +
, and "OPEN"/"CLOSE" over matching[
and ]
).
We also use Greek letters
- This Greek letter文字convention is usually just employed when the presence of a sequence連なり、並びis a technical detail; in particular, when I wish to emphasize that we are operating on a sequence連なり、並びof token-trees, I will use the notation記法"tt ..." for the sequence,連なり、並びnot a Greek letter文字
Note that a matcher is merely a token$foo:ty
is a simple NT but $($foo:ty)+
is a complex
Note also that in the context
Finally, it is useful for the reader to keep in mind that according
The Matcher Invariant
This RFC establishes the following
-
For any two successive
連続したtokenトークンtree sequences連なり、並びin a matcherM
(i.e.M = ... tt uu ...
), we must have FOLLOW(... tt
) ⊇ FIRST(uu ...
) -
For any separated complex
複素数、複文のNT in a matcher,M = ... $(tt ...) SEP OP ...
, we must haveSEP
∈ FOLLOW(tt ...
).
The first part says that whatever actual... tt
ends and uu ...
begins, even as new syntactic forms
The second part says that a separated complextt ...
's, even as new syntactic forms
(This is assuming that all such changes are appropriately restricted,
The above invariant is only formally meaningful if one knows what FIRST and FOLLOW
FIRST and FOLLOW,下記の、次に続く、追従する informally非公式に
FIRST and FOLLOW
A given
Each of the three sets
Informally:
-
FIRST(M): collects the tokens
トークンpotentially used first when matching一致する、マッチさせるa fragment to M. -
LAST(M): collects the tokens
トークンpotentially used last when matching一致する、マッチさせるa fragment to M. -
FOLLOW(M): the set
セットする、集合of tokensトークンallowed許可する、可能にするto follow下記の、次に続く、追従するimmediately直後に、直接的にafter some fragment matched一致する、マッチさせるby M.In other words: t ∈ FOLLOW(M) if and only if there exists (potentially empty) token
トークンsequences連なり、並びα, β, γ, δ where:- M matches一致する、マッチさせるβ,
- t matches一致する、マッチさせるγ, and
- The concatenation結合、連接α β γ δ is a parseable Rust program.
- M matches
We use the shorthand
- (For example, if any tokenトークンis legal(文法的に)適格after a matcher M, then FOLLOW(M) = ANYTOKEN.)
(To review one's understanding of the above informal descriptions, the reader at this point may want to jump ahead to the examples of FIRST/LAST before reading their formal definitions.)
FIRST, LAST
Below are formal inductive definitions
"A ∪ B" denotes
FIRST(M), defined
-
if M is the empty
空のsequence,連なり、並びthen FIRST(M) = { ε }, -
if M starts with a token
トークンt, then FIRST(M) = { t },(Note: this covers the case where M starts with a delimited token-tree sequence,
連なり、並びM = OPEN tt ... CLOSE ...
, in which caset = OPEN
and thusそれゆえに、従って、FIRST(M) = {OPEN
}.)(Note: this critically relies on the property
特徴、性質that no simple NT matches一致する、マッチさせるthe empty空のfragment.) -
Otherwise,
さもなければM is a token-tree sequence連なり、並びstarting with a complex複素数、複文のNT:M = $( tt ... ) OP α
, orM = $( tt ... ) SEP OP α
, (whereα
is the (potentially empty) sequence連なり、並びof tokenトークンtrees for the rest of the matcher).-
Let sep_set = { SEP } if SEP present;
あるotherwiseさもなければsep_set = {}. -
If ε ∈ FIRST(
tt ...
), then FIRST(M) = (FIRST(tt ...
) \ { ε }) ∪ sep_set ∪ FIRST(α
) -
Else if OP =
*
, then FIRST(M) = FIRST(tt ...
) ∪ FIRST(α
) -
Otherwise
さもなければ(OP =+
), FIRST(M) = FIRST(tt ...
)
-
Note: The ε-case above,
FIRST(M) = (FIRST(
tt ...
) \ { ε }) ∪ sep_set ∪ FIRST(α
)
may seem complicated, so lets takett ...
may be empty.SEP
itself (if it is present), or it may be the first tokenα
); that's why the resultα
)". Note also that if α
itself may matchα
) will ensureα
cannot matchtt ...
).
LAST(M), defined
-
if M is the empty
空のsequence,連なり、並びthen LAST(M) = { ε } -
if M is a singleton token
トークンt, then LAST(M) = { t } -
if M is the singleton complex
複素数、複文のNT repeating zero or more times,M = $( tt ... ) *
, orM = $( tt ... ) SEP *
-
Let sep_set = { SEP } if SEP present;
あるotherwiseさもなければsep_set = {}. -
if ε ∈ LAST(
tt ...
) then LAST(M) = LAST(tt ...
) ∪ sep_set -
otherwise,
さもなければthe sequence連なり、並びtt ...
must be non-empty; LAST(M) = LAST(tt ...
) ∪ { ε }
-
-
if M is the singleton complex
複素数、複文のNT repeating one or more times,M = $( tt ... ) +
, orM = $( tt ... ) SEP +
-
Let sep_set = { SEP } if SEP present;
あるotherwiseさもなければsep_set = {}. -
if ε ∈ LAST(
tt ...
) then LAST(M) = LAST(tt ...
) ∪ sep_set -
otherwise,
さもなければthe sequence連なり、並びtt ...
must be non-empty; LAST(M) = LAST(tt ...
)
-
-
if M is a delimited token-tree sequence
連なり、並びOPEN tt ... CLOSE
, then LAST(M) = {CLOSE
} -
if M is a non-empty sequence
連なり、並びof token-treestt uu ...
,-
If ε ∈ LAST(
uu ...
), then LAST(M) = LAST(tt
) ∪ (LAST(uu ...
) \ { ε }). -
Otherwise,
さもなければthe sequence連なり、並びuu ...
must be non-empty; then LAST(M) = LAST(uu ...
)
-
NOTE: The presence or absence of SEP is relevant to the above definitions,
NOTE: The above definition
Examples of FIRST and LAST
Below are some examples of FIRST and LAST. (Note in particular how the special ε element
Our first example is presented
INPUT: $( $d:ident $e:expr );* $( $( h )* );* $( f ; )+ g
~~~~~~~~ ~~~~~~~ ~
| | |
FIRST: { $d:ident } { $e:expr } { h }
INPUT: $( $d:ident $e:expr );* $( $( h )* );* $( f ; )+
~~~~~~~~~~~~~~~~~~ ~~~~~~~ ~~~
| | |
FIRST: { $d:ident } { h, ε } { f }
INPUT: $( $d:ident $e:expr );* $( $( h )* );* $( f ; )+ g
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~ ~~~~~~~~~ ~
| | | |
FIRST: { $d:ident, ε } { h, ε, ; } { f } { g }
INPUT: $( $d:ident $e:expr );* $( $( h )* );* $( f ; )+ g
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
FIRST: { $d:ident, h, ;, f }
Thus:
- FIRST(
$($d:ident $e:expr );* $( $(h)* );* $( f ;)+ g
) = {$d:ident
,h
,;
,f
}
Note however that:
- FIRST(
$($d:ident $e:expr );* $( $(h)* );* $($( f ;)+ g)*
) = {$d:ident
,h
,;
,f
, ε }
Here are similar
- LAST(
$d:ident $e:expr
) = {$e:expr
} - LAST(
$( $d:ident $e:expr );*
) = {$e:expr
, ε } - LAST(
$( $d:ident $e:expr );* $(h)*
) = {$e:expr
, ε,h
} - LAST(
$( $d:ident $e:expr );* $(h)* $( f ;)+
) = {;
} - LAST(
$( $d:ident $e:expr );* $(h)* $( f ;)+ g
) = {g
}
and again, changing the end part of matcher changes its last set
- LAST(
$( $d:ident $e:expr );* $(h)* $($( f ;)+ g)*
) = {$e:expr
, ε,h
,g
}
FOLLOW(M)
Finally, the definitionFOLLOW(M)
is built up incrementally atop more primitive functions.
We first assume a primitive mapping, FOLLOW(NT)
(defined below) from a simple NT to the set
Second, we generalize
Finally, we generalize
FOLLOW(M) = FOLLOW(t1) ∩ FOLLOW(t2) ∩ ... ∩ FOLLOW(tN)
where { t1, t2, ..., tN } = (LAST(M) \ { ε })
Examples of FOLLOW
- FOLLOW(
$( $d:ident $e:expr )*
) = FOLLOW($e:expr
) - FOLLOW(
$( $d:ident $e:expr )* $(;)*
) = FOLLOW($e:expr
) ∩ ANYTOKEN = FOLLOW($e:expr
) - FOLLOW(
$( $d:ident $e:expr )* $(;)* $( f |)+
) = ANYTOKEN
FOLLOW(NT)
Here is the definition
The current legalitem
, block
, stmt
, pat
, expr
, ty
, ident
, path
, meta
, and tt
.
FOLLOW(pat)
={FatArrow, Comma, Eq, Or, Ident(if), Ident(in)}
FOLLOW(expr)
={FatArrow, Comma, Semicolon}
FOLLOW(ty)
={OpenDelim(Brace), Comma, FatArrow, Colon, Eq, Gt, Semi, Or, Ident(as), Ident(where), OpenDelim(Bracket), Nonterminal(Block)}
FOLLOW(stmt)
=FOLLOW(expr)
FOLLOW(path)
=FOLLOW(ty)
FOLLOW(block)
= any tokenトークンFOLLOW(ident)
= any tokenトークンFOLLOW(tt)
= any tokenトークンFOLLOW(item)
= any tokenトークンFOLLOW(meta)
= any tokenトークン
(Note that close delimiters are valid
Examples of valid有効な、正しい and invalid matchers
With the above specification
-
($ty:ty < foo ,)
: illegal,文法違反because FIRST(< foo ,
) = {<
} ⊈ FOLLOW(ty
) -
($ty:ty , foo <)
: legal,(文法的に)適格because FIRST(, foo <
) = {,
} is ⊆ FOLLOW(ty
). -
($pa:pat $pb:pat $ty:ty ,)
: illegal,文法違反because FIRST($pb:pat $ty:ty ,
) = {$pb:pat
} ⊈ FOLLOW(pat
), and also FIRST($ty:ty ,
) = {$ty:ty
} ⊈ FOLLOW(pat
). -
( $($a:tt $b:tt)* ; )
: legal,(文法的に)適格because FIRST($b:tt
) = {$b:tt
} is ⊆ FOLLOW(tt
) = ANYTOKEN, as is FIRST(;
) = {;
}. -
( $($t:tt),* , $(t:tt),* )
: legal(文法的に)適格(though any attempt to actually use this macro will signal a local ambiguity曖昧さerror during expansion). -
($ty:ty $(; not sep)* -)
: illegal,文法違反because FIRST($(; not sep)* -
) = {;
,-
} is not in FOLLOW(ty
). -
($($ty:ty)-+)
: illegal,文法違反because separator-
is not in FOLLOW(ty
).
Drawbacks
It does restrict
Alternatives
- Fix the syntax文法that a fragment can parse.構文解析するThis would create a situation where a future MBE might not be able to accept受け付ける、受理するcertain inputs because the input uses newer features than the fragment that was fixed at 1.0. For example, in the
bar
MBE above, if thety
fragment was fixed before the unboxed closure sugar was introduced, the MBE would not be able to accept受け付ける、受理するsuch a type. While this approach is feasible, it would cause起こすunnecessary confusion for future users of MBEs when they can't put certain perfectly valid有効な、正しいRust code in the input to an MBE. Versioned fragments could avoid避ける、回避するthis problem but only for new code. - Keep
macro_rules
unstable. Given与えられたthe great syntactical abstraction thatmacro_rules
provides,与えるit would be a shame for it to be unusable in a release version of Rust. If evermacro_rules
were to be stabilized, this same issue would come up. - Do nothing. This is very dangerous, and has the potential to essentially freeze Rust's syntax文法for fear of accidentally breaking a macro.
Edit History
-
Updated by https://github.com/rust-lang/rfcs/pull/1209, which added
たすsemicolons into the follow下記の、次に続く、追従するsetセットする、集合for types. -
Updated by https://github.com/rust-lang/rfcs/pull/1384:
- replaced detailed design設計(する)with a specification-oriented presentation rather than an implementation-oriented algorithm.
- fixed some oversights in the specification仕様、仕様書that led to matchers like
$e:expr { stuff }
being accepted受け付ける、受理する(which match一致する、マッチさせるfragments likebreak { stuff }
, significantly著しくlimiting future language言語extensions), - expanded the follows下記の、次に続く、追従するsetsセットする、集合for
ty
to includeOpenDelim(Brace), Ident(where), Or
(since Rust's grammar文法already requires all of|foo:TY| {}
,fn foo() -> TY {}
andfn foo() -> TY where {}
to work). - expanded the follow下記の、次に続く、追従するsetセットする、集合for
pat
to includeOr
(since Rust's grammar文法already requiresmatch
and一致する、マッチさせる(true,false) { PAT | PAT => {} }|PAT| {}
to work); see also RFC issue 1336. Also addedたすIf
andIn
to follow下記の、次に続く、追従するsetセットする、集合forpat
(to make the specification仕様、仕様書match一致する、マッチさせるthe old implementation).
- replaced detailed design
-
Updated by https://github.com/rust-lang/rfcs/pull/1462, which added
たすopen square bracket into the follow下記の、次に続く、追従するsetセットする、集合for types. -
Updated by https://github.com/rust-lang/rfcs/pull/1494, which adjusted the follow
下記の、次に続く、追従するsetセットする、集合for types to include block nonterminals.
Appendices
Appendix A: Algorithm for recognizing valid有効な、正しい matchers.
The detailed design
The above specification
-
This is not the only possible algorithm; for example, one could precompute a table mapping every suffix of every token-tree sequence
連なり、並びto its FIRST set,セットする、集合by augmentingFirstSet
below accordingly.Or one could store a subset
部分集合of such information during the precomputation, such as just the FIRST setsセットする、集合for complex複素数、複文のNT's, and then use that table to inform a forward scan of the input.The latter is in fact what my prototype implementation
実装does; I must emphasize the point that the algorithm here is not prescriptive. -
The intent of this RFC is that the specifications
仕様、仕様書of FIRST and FOLLOW下記の、次に続く、追従するabove will takeとるprecedence優先順位over this algorithm if the two are found to be producing産出、産出するinconsistent results.結果、戻り値
The algorithm for recognizing validM
is named ValidMatcher.
To defineFirstSet
.
Procedure FirstSet(M)
input: a tokenM
representing
output: FIRST(M)
Let M = tts[1] tts[2] ... tts[n].
Let curr_first = { ε }.
For i in n down to 1 (inclusive):
Let tt = tts[i].
1. If tt is a token, curr_first := { tt }
2. Else if tt is a delimited sequence `OPEN uu ... ClOSE`,
curr_first := { OPEN }
3. Else tt is a complex NT `$(uu ...) SEP OP`
Let inner_first = FirstSet(`uu ...`) i.e. recursive call
if OP == `*` or ε ∈ inner_first then
curr_first := curr_first ∪ inner_first
else
curr_first := inner_first
return curr_first
(Note: If we were precomputing a full table in this procedure, we would need a recursive invocation
Predicate ValidMatcher(M)
To simplify the specification,
This algorithm works by scanning forward across the matcher M = α β, (where α is the prefix
input: a tokenM
, and a setF
.
output: LAST(M) (and also signals failure whenever M is invalid)
Let last_of_prefix = { ε }
Let M = tts[1] tts[2] ... tts[n].
For i in 1 up to n (inclusive):
// For reference:
// α = tts[1] .. tts[i]
// β = tts[i+1] .. tts[n]
// γ is some outer token sequence; the input F represents FIRST(γ)
1. Let tt = tts[i].
2. Let first_of_suffix; // aka FIRST(β γ)
3. let S = FirstSet(tts[i+1] .. tts[n]);
4. if ε ∈ S then
// (include the follow information if necessary)
first_of_suffix := S ∪ F
5. else
first_of_suffix := S
6. Update last_of_prefix via case analysis on tt:
a. If tt is a token:
last_of_prefix := { tt }
b. Else if tt is a delimited sequence `OPEN uu ... CLOSE`:
i. run ValidMatcher( M = `uu ...`, F = { `CLOSE` })
ii. last_of_prefix := { `CLOSE` }
c. Else, tt must be a complex NT,
in other words, `NT = $( uu .. ) SEP OP` or `NT = $( uu .. ) OP`:
i. If SEP present,
let sublast = ValidMatcher( M = `uu ...`, F = first_of_suffix ∪ { `SEP` })
ii. else:
let sublast = ValidMatcher( M = `uu ...`, F = first_of_suffix)
iii. If ε in sublast then:
last_of_prefix := last_of_prefix ∪ (sublast \ ε)
iv. Else:
last_of_prefix := sublast
7. At this point, last_of_prefix == LAST(α) and first_of_suffix == FIRST(β γ).
For each simple NT token t in last_of_prefix:
a. If first_of_suffix ⊆ FOLLOW(t), then we are okay so far. </li>
b. Otherwise, we have found a token t whose follow set is not compatible
with the FIRST(β γ), and must signal failure.
// After running the above for loop on all of `M`, last_of_prefix == LAST(M)
Return last_of_prefix
This algorithm should be run on every matcher in every macro_rules
invocation,F
= { EOF
}. If it rejects a matcher, an error should be emitted and compilation should not complete.