Path resolution
Lockjaw needs to know the fully qualified path of each type, so types can be compared against each
other. In Rust, all a proc_macro can see is tokens, which is too early a stage to resolve type
paths. When a Foo identifier is encountered, it is difficult for the macro to tell whether it is a
type declared in the local module or a type from somewhere else brought in by a use declaration.
Rust doesn't even tell the macro what the local module is.
Base mod of the file
The first problem is that the proc_macro doesn't even know where the source being compiled is
located. The file!() and module_path!() macros would be a perfect solution to this, but eager macro
expansion is required for a proc_macro to be able to use them. proc_macro2::Span::source_file()
also exists, but it is a nightly feature and requires procmacro2_semver_exempt, which is contagious
(every crate that depends on the macro has to set it as well).
Since, with the cross-macro communication hacks, the user only needs to do this once per file,
we've decided to let the user pass the current path with the prologue!() macro. We will need to
parse the whole file later anyway, so we take the file name and derive the mod path from it.
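As a rough illustration of deriving a mod path from a file name, the sketch below maps a path such as src/foo/bar.rs to crate::foo::bar. The function name mod_path_from_file, the crate:: prefix, and the assumption of a conventional src/ layout are all hypothetical choices for this example, not Lockjaw's actual implementation.

```rust
/// Derive a mod path from a source file path given relative to the crate root.
/// Hypothetical helper: assumes the conventional `src/` layout and `/` separators.
fn mod_path_from_file(file: &str) -> Option<String> {
    // Only files under `src/` with an `.rs` extension are handled.
    let rest = file.strip_prefix("src/")?.strip_suffix(".rs")?;
    let mut segments: Vec<&str> = rest.split('/').collect();

    // `src/lib.rs` and `src/main.rs` are the crate root, and `mod.rs` names
    // its parent directory, so none of them adds a segment of its own.
    let is_root = segments.len() == 1 && matches!(segments[0], "lib" | "main");
    if is_root || segments.last() == Some(&"mod") {
        segments.pop();
    }

    let mut path = String::from("crate");
    for segment in segments {
        path.push_str("::");
        path.push_str(segment);
    }
    Some(path)
}
```

With these assumptions, src/lib.rs maps to crate, src/foo/mod.rs maps to crate::foo, and a file outside src/ yields None.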
To make sure the prologue!() macro is called in every file, it declares a hidden symbol locally
which all other Lockjaw proc_macros will try to use, so if prologue!() is missing, compilation
will fail. In later steps we also verify that prologue!() is the first Lockjaw macro called in the
file, as the current file info is stored in global memory and must be reset in each file.
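The hidden-symbol trick can be imitated with declarative macros, as in the sketch below. This is only an analogy: Lockjaw uses proc_macros, and the names LOCKJAW_PROLOGUE_PATH and uses_prologue! are made up for this example. It relies on the fact that items declared by macro_rules! are not hygienic, so one macro's expansion can name a const declared by another.

```rust
// Stand-in for prologue!(): declares a hidden marker in the calling scope.
// `LOCKJAW_PROLOGUE_PATH` is a hypothetical name chosen for this sketch.
macro_rules! prologue {
    ($path:literal) => {
        #[doc(hidden)]
        #[allow(dead_code)]
        const LOCKJAW_PROLOGUE_PATH: &str = $path;
    };
}

// Stand-in for any other Lockjaw macro: its expansion names the marker, so it
// only compiles in a scope where prologue!() has already been invoked.
macro_rules! uses_prologue {
    () => {
        LOCKJAW_PROLOGUE_PATH
    };
}

prologue!("src/lib.rs");

// Compiles because the marker exists; deleting the prologue!() line above
// turns this into an "unresolved name" compile error.
fn declared_path() -> &'static str {
    uses_prologue!()
}
```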
prologue!() also generates a test to make sure the path passed in matches what file!() would
give. However, using the wrong path will usually cause Lockjaw to fail miserably, since all type
info is messed up, and the test will not even be run, which makes it not too useful.
mod structure and use declarations
A file can still contain nested mods, each importing more symbols with use declarations.
For a given token, Lockjaw needs to know which mod it is in, and what symbols are brought into
that scope. This requires parsing the whole file, so we can record the span of each mod and the
use declarations inside it.
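To make the bookkeeping concrete, here is a minimal sketch of what a per-mod record might hold and how it could turn a bare identifier into a fully qualified path. The ModScope type, its fields, and the resolve method are hypothetical; the fallback of treating an unknown identifier as local to the mod is a simplification that ignores the prelude, glob imports, and renames.

```rust
use std::collections::HashMap;

/// One `mod` scope found while parsing the file (hypothetical shape).
struct ModScope {
    /// Fully qualified path of the mod, e.g. `crate::foo`.
    path: String,
    /// Symbols brought in by `use`, keyed by their local name.
    uses: HashMap<String, String>,
}

impl ModScope {
    /// Resolve an identifier seen inside this mod to a fully qualified path.
    fn resolve(&self, ident: &str) -> String {
        match self.uses.get(ident) {
            // Brought into scope by a `use` declaration.
            Some(full) => full.clone(),
            // Otherwise assume it is declared in the mod itself.
            None => format!("{}::{}", self.path, ident),
        }
    }
}
```

For a mod crate::foo containing use crate::bar::Foo, resolving Foo gives crate::bar::Foo, while an identifier with no matching use, such as Baz, gives crate::foo::Baz.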
syn::parse_file() sounds like a good fit for this, but the tokens it produces do not record
proper spans, so we cannot use it to find the position of each mod.
Lockjaw handles this by using another AST parser (tree_sitter) to parse the file.
Finding which mod a token is in
Position info of a token is encoded in the span object, but currently it is opaque. Lockjaw
forcefully extracts the data from the span's Debug representation, which contains the token's
byte range. Needless to say, this is an awful thing to do.
Once the byte position is known, it can be used to find the deepest enclosing mod.
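The two steps above can be sketched as follows. The sketch assumes the Debug text of a span looks like #0 bytes(95..98); that format is an implementation detail of the compiler and may change. ModSpan, byte_range_from_debug, and deepest_enclosing_mod are hypothetical names for this example. Because mods nest properly, the smallest range that contains a position belongs to the deepest enclosing mod.

```rust
use std::ops::Range;

/// Pull the byte range out of text shaped like `#0 bytes(95..98)`.
/// The shape of the Debug output is an assumption, not a stable API.
fn byte_range_from_debug(debug: &str) -> Option<Range<usize>> {
    let start = debug.find("bytes(")? + "bytes(".len();
    let len = debug[start..].find(')')?;
    let (lo, hi) = debug[start..start + len].split_once("..")?;
    let lo: usize = lo.parse().ok()?;
    let hi: usize = hi.parse().ok()?;
    Some(lo..hi)
}

/// A `mod` and the byte range it covers in the file (hypothetical shape).
struct ModSpan {
    path: String,
    bytes: Range<usize>,
}

/// Among the mods containing `pos`, pick the one with the smallest range.
fn deepest_enclosing_mod(mods: &[ModSpan], pos: usize) -> Option<&ModSpan> {
    mods.iter()
        .filter(|m| m.bytes.contains(&pos))
        .min_by_key(|m| m.bytes.end - m.bytes.start)
}
```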
Handling file mod
Rust currently handles file mods as includes internally. It inserts the content of the file
directly into the token stream, which ends up as one giant stream for the whole crate. As a
consequence, the byte position a proc_macro token actually has is shifted by the inserted files,
and will not match its byte position inside the file that the third-party AST parser sees.
Fortunately, the Lockjaw proc_macro has the span of the prologue!() macro itself, and it knows
the macro must appear only once inside the file. Lockjaw is able to inspect the AST to find the
file position of the prologue!() macro, and calculate the offset between file position and token
position.
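The offset arithmetic is small enough to show directly. In this sketch, file_offset and to_file_pos are hypothetical names, and positions are plain byte indices. It assumes token positions are never smaller than the matching file positions, and uses checked subtraction so a violation is reported as None rather than a panic.

```rust
/// Offset between a position in the crate-wide token stream and the same
/// position inside the file, measured at the prologue!() invocation.
fn file_offset(prologue_token_pos: usize, prologue_file_pos: usize) -> Option<usize> {
    prologue_token_pos.checked_sub(prologue_file_pos)
}

/// Translate a token's byte position into a byte position inside the file.
fn to_file_pos(token_pos: usize, offset: usize) -> Option<usize> {
    token_pos.checked_sub(offset)
}
```

For example, if prologue!() sits at byte 40 of the file but its span reports byte 1040, the offset is 1000, and a token whose span starts at byte 1095 is at byte 95 of the file.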
One side effect is that Lockjaw has to ban any file mod that appears after prologue!(), as it
would invalidate the offset. Theoretically, Lockjaw could recursively calculate the size of the
file mod, but restricting where file mods may appear does not seem too bad.