Conventions
The textual format for WebAssembly modules is a rendering of their abstract syntax into S-expressions.
Like the binary format, the text format is defined by an attribute grammar. A text string is a well-formed description of a module if and only if it is generated by the grammar. Each production of this grammar has at most one synthesized attribute: the abstract syntax that the respective character sequence expresses. Thus, the attribute grammar implicitly defines a parsing function. Some productions also take a context as an inherited attribute that records bound identifiers.
With a few exceptions, the core of the text grammar closely mirrors the grammar of the abstract syntax. However, it also defines a number of abbreviations that are “syntactic sugar” over the core syntax.
The recommended extension for files containing WebAssembly modules in text format is “\(\mathtt{.wat}\)”. Files with this extension are assumed to be encoded in UTF-8, as per Unicode (Section 2.5).
Grammar
The following conventions are adopted in defining grammar rules of the text format. They mirror the conventions used for abstract syntax and for the binary format. In order to distinguish symbols of the textual syntax from symbols of the abstract syntax, \(\mathtt{typewriter}\) font is adopted for the former.
Terminal symbols are either literal strings of characters enclosed in quotes or expressed as Unicode scalar values: \(\mbox{‘}\mathtt{module}\mbox{’}\), \(\mathrm{U{+}0A}\). (All characters written literally are unambiguously drawn from the 7-bit ASCII subset of Unicode.)
Nonterminal symbols are written in typewriter font: \(\mathtt{valtype}, \mathtt{instr}\).
\(T^n\) is a sequence of \(n\geq 0\) iterations of \(T\).
\(T^\ast\) is a possibly empty sequence of iterations of \(T\). (This is a shorthand for \(T^n\) used where \(n\) is not relevant.)
\(T^+\) is a sequence of one or more iterations of \(T\). (This is a shorthand for \(T^n\) where \(n \geq 1\).)
\(T^?\) is an optional occurrence of \(T\). (This is a shorthand for \(T^n\) where \(n \leq 1\).)
\(x{:}T\) denotes the same language as the nonterminal \(T\), but also binds the variable \(x\) to the attribute synthesized for \(T\).
Productions are written \(\mathtt{sym} ::= T_1 \Rightarrow A_1 ~|~ \dots ~|~ T_n \Rightarrow A_n\), where each \(A_i\) is the attribute that is synthesized for \(\mathtt{sym}\) in the given case, usually from attribute variables bound in \(T_i\).
Some productions are augmented by side conditions in parentheses, which restrict the applicability of the production. They provide a shorthand for a combinatorial expansion of the production into many separate cases.
If the same metavariable or nonterminal symbol appears multiple times in a production (in the syntax or in an attribute), then all those occurrences must have the same instantiation.
A distinction is made between lexical and syntactic productions. For the latter, arbitrary white space is allowed in any place where the grammar contains spaces. The productions defining lexical syntax and the syntax of values are considered lexical; all others are syntactic.
Note
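For example, the textual grammar for number types is given as follows:
\[\mathtt{numtype} ::= \mbox{‘}\mathtt{i32}\mbox{’} \Rightarrow \mathsf{i32} ~|~ \mbox{‘}\mathtt{i64}\mbox{’} \Rightarrow \mathsf{i64} ~|~ \mbox{‘}\mathtt{f32}\mbox{’} \Rightarrow \mathsf{f32} ~|~ \mbox{‘}\mathtt{f64}\mbox{’} \Rightarrow \mathsf{f64}\]
The textual grammar for limits is defined as follows:
\[\mathtt{limits} ::= n{:}\mathtt{u32} \Rightarrow \{ \mathsf{min}~n, \mathsf{max}~\epsilon \} ~|~ n{:}\mathtt{u32}~~m{:}\mathtt{u32} \Rightarrow \{ \mathsf{min}~n, \mathsf{max}~m \}\]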
The variables \(n\) and \(m\) name the attributes of the respective \(\href{../text/values.html#text-int}{\mathtt{u32}}\) nonterminals, which in this case are the actual unsigned integers those parse into. The attribute of the complete production then is the abstract syntax for the limit, expressed in terms of the former values.
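To illustrate how an attribute grammar doubles as a parsing function, the following Python sketch treats the limits production as a function from text to abstract syntax. It is only an illustration under stated assumptions: the names `Limits`, `parse_u32`, and `parse_limits` are hypothetical, the abstract syntax is modelled as a record with a minimum and an optional maximum, and the lexical rule for integers is simplified to plain decimal digits.

```python
import re
from typing import NamedTuple, Optional


class Limits(NamedTuple):
    """Assumed abstract syntax for limits: a minimum and an optional maximum."""
    min: int
    max: Optional[int]  # None plays the role of an absent (epsilon) maximum


# Simplified lexical rule (assumption): decimal digits only.
_U32 = re.compile(r"[0-9]+")


def parse_u32(token: str) -> int:
    """Lexical production: synthesizes the unsigned integer the token denotes."""
    if not _U32.fullmatch(token):
        raise ValueError(f"not a u32 literal: {token!r}")
    value = int(token)
    if value >= 2**32:
        raise ValueError(f"u32 out of range: {token!r}")
    return value


def parse_limits(text: str) -> Limits:
    """Syntactic production: one u32 (bound to n) or two u32s (bound to n and m)."""
    tokens = text.split()  # arbitrary white space is allowed between tokens
    if len(tokens) == 1:
        n = parse_u32(tokens[0])
        return Limits(min=n, max=None)
    if len(tokens) == 2:
        n, m = (parse_u32(t) for t in tokens)
        return Limits(min=n, max=m)
    raise ValueError(f"not well-formed limits: {text!r}")
```

Each branch corresponds to one alternative of the production, and the returned value is the attribute synthesized for that alternative from the bound variables.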
Abbreviations
In addition to the core grammar, which corresponds directly to the abstract syntax, the textual syntax also defines a number of abbreviations that can be used for convenience and readability.
Abbreviations are defined by rewrite rules specifying their expansion into the core syntax:
\[\mathit{abbreviation~syntax} \quad\equiv\quad \mathit{expanded~syntax}\]
These expansions are assumed to be applied, recursively and in order of appearance, before applying the core grammar rules to construct the abstract syntax.
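As a rough sketch of what such a rewrite step could look like, the Python function below expands one abbreviation recursively over S-expressions represented as nested lists. The representation, the function names, and the choice of rule are assumptions made for illustration: the rule shown rewrites an anonymous parameter list `(param t1 ... tn)` into the sequence `(param t1) ... (param tn)`, and a real implementation would apply every abbreviation in order of appearance.

```python
def _is_anonymous_param(node):
    """True for a (param ...) list that does not start with a $identifier."""
    if not (isinstance(node, list) and node[:1] == ["param"]):
        return False
    has_id = len(node) > 1 and isinstance(node[1], str) and node[1].startswith("$")
    return not has_id


def expand_params(sexpr):
    """Recursively rewrite (param t1 ... tn) children into (param t1) ... (param tn).

    The expansion is spliced into the enclosing list, so a param list passed
    as the top-level argument itself is returned unchanged.
    """
    if not isinstance(sexpr, list):
        return sexpr  # atoms are left as they are
    out = []
    for child in sexpr:
        child = expand_params(child)  # expand nested forms first
        if _is_anonymous_param(child):
            # One core-syntax (param t) per abbreviated value type.
            out.extend(["param", t] for t in child[1:])
        else:
            out.append(child)
    return out
```

For instance, `expand_params(["func", ["param", "i32", "i64"]])` yields a function form with two separate single-type `param` entries, which the core grammar rules can then process directly.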
Contexts
The text format allows the use of symbolic identifiers in place of indices. To resolve these identifiers into concrete indices, some grammar productions are indexed by an identifier context \(I\) as an inherited attribute that records the declared identifiers in each index space. In addition, the context records the types defined in the module, so that parameter indices can be computed for functions.
It is convenient to define identifier contexts as records \(I\) with abstract syntax as follows:
For each index space, such a context contains the list of identifiers assigned to the defined indices. Unnamed indices are associated with empty (\(\epsilon\)) entries in these lists.
An identifier context is well-formed if no index space contains duplicate identifiers.
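The following Python sketch shows one way an identifier context could be modelled and queried. The representation is an assumption made for illustration: each index space is a list whose positions are the indices, `None` stands for an unnamed (\(\epsilon\)) entry, and the space names used in the usage note (`"funcs"`, `"locals"`) as well as the helpers `well_formed` and `resolve` are hypothetical.

```python
from typing import Dict, List, Optional

# One list per index space; list position = index, None = unnamed entry.
IdContext = Dict[str, List[Optional[str]]]


def well_formed(ctx: IdContext) -> bool:
    """Check that no index space contains the same identifier twice."""
    for ids in ctx.values():
        named = [i for i in ids if i is not None]  # unnamed entries may repeat
        if len(named) != len(set(named)):
            return False
    return True


def resolve(ctx: IdContext, space: str, ident: str) -> int:
    """Resolve a symbolic identifier to its concrete index in one index space."""
    ids = ctx.get(space, [])  # an omitted component is treated as empty
    if ident not in ids:
        raise KeyError(f"unknown identifier {ident!r} in {space!r}")
    return ids.index(ident)
```

With `ctx = {"funcs": ["$f", None, "$g"], "locals": ["$x"]}`, the call `resolve(ctx, "funcs", "$g")` returns the position of `$g` in the function index space, and `well_formed(ctx)` holds because no space repeats an identifier.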
Conventions
To avoid unnecessary clutter, empty components are omitted when writing out identifier contexts. For example, the record \(\{\}\) is shorthand for an identifier context whose components are all empty.
Vectors
Vectors are written as plain sequences, but with a restriction on the length of these sequences.