Implementation

Tokens

quasiquotes works by hooking into Python's file encoding logic. Every source file is marked with an encoding, defaulting to utf-8. This is what the # coding: <encoding> comments at the top of some files declare. The encoding defines the functions needed to convert the raw bytes that come in from the filesystem into Python str objects. Users can also register their own encodings by providing their own conversion functions. quasiquotes sits on top of the utf-8 encoding functions; however, it also tokenizes the incoming files so that it can rewrite certain patterns.
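As a rough illustration of registering a custom encoding, the sketch below wraps the utf-8 decoder using Python's standard codecs module. The codec name demorewrite, the _rewrite helper, and the <<answer>> placeholder are all made up for this example and are not part of quasiquotes. A codec meant for real source files would generally also need incremental and stream decoders, which are omitted here.

.. code-block:: python

   import codecs

   def _rewrite(text):
       # Hypothetical stand-in for the real token rewriting step.
       return text.replace('<<answer>>', '42')

   def _decode(data, errors='strict'):
       # Decode as utf-8 first, then rewrite the resulting text.
       text, consumed = codecs.utf_8_decode(data, errors, True)
       return _rewrite(text), consumed

   def _search(name):
       # Codec search functions receive a normalized (lowercased) name.
       if name != 'demorewrite':
           return None
       utf8 = codecs.lookup('utf-8')
       return codecs.CodecInfo(
           name='demorewrite',
           encode=utf8.encode,
           decode=_decode,
       )

   codecs.register(_search)

   source = b'x = <<answer>>\n'
   print(source.decode('demorewrite'))

The point of the sketch is only that the decode step is an ordinary function, so it is free to change the text before the parser ever sees it.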

Let’s look at some source code and the tokens that come out of it:

.. code-block:: python

   with $qq:
       this should not parse
       but it will

.. code-block:: text

   NAME('with')
   ERROR(' ')
   ERROR('$')
   NAME('qq')
   OP(':')
   NEWLINE('\n')
   <body>
   DEDENT

This says we have the name ‘with’ followed by two ERROR tokens, one for the space and one for the ‘$’. These tokens appear as ERROR because ‘$’ would normally be an invalid token in Python. The next part is the actual name of the quasiquoter you want to use. Finally we have the colon and newline. The body is whatever sequence of tokens makes up the indented region under the quasiquoter, and the DEDENT token marks the end of the body.
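To make the pattern concrete, here is a minimal sketch of recognizing that header in a token sequence. The tokens are written out by hand as (type, string) pairs mirroring the listing above, and find_quoted_stmt is a hypothetical helper, not the function quasiquotes actually uses.

.. code-block:: python

   # Hand-written (type, string) pairs mirroring the token listing.
   tokens = [
       ('NAME', 'with'),
       ('ERROR', ' '),
       ('ERROR', '$'),
       ('NAME', 'qq'),
       ('OP', ':'),
       ('NEWLINE', '\n'),
   ]

   HEADER = [
       ('NAME', 'with'),
       ('ERROR', ' '),
       ('ERROR', '$'),
       ('NAME', None),   # None matches any quasiquoter name
       ('OP', ':'),
       ('NEWLINE', '\n'),
   ]

   def find_quoted_stmt(toks):
       """Return the quasiquoter name if toks starts with the header, else None."""
       if len(toks) < len(HEADER):
           return None
       name = None
       for (typ, string), (want_typ, want_string) in zip(toks, HEADER):
           if typ != want_typ:
               return None
           if want_string is None:
               name = string
           elif string != want_string:
               return None
       return name

   print(find_quoted_stmt(tokens))

A matcher of this shape is strict about what follows the name: if anything other than OP(':') and NEWLINE('\n') comes next, it returns None and the source is left alone.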

By manipulating the tokens, we can change this into something that looks like:

.. code-block:: python

   qq._quote_stmt(0,'    this should not parse\n    but it will')

Here the 0 is the column offset of the quoted statement, and the string is the body of the context manager. The lack of a space after the comma accurately reflects the column offsets of the tokens that the quasiquotes tokenizer emits.

Note

The original indentation is preserved.

We can do this because we still have access to the raw text that makes up each line between the NEWLINE and the DEDENT.
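A minimal sketch of that final assembly step, assuming the quasiquoter name, the column offset, and the raw body lines have already been pulled out of the token stream. build_quote_stmt is a hypothetical helper; it only illustrates how the body text can be carried over unchanged as a string literal.

.. code-block:: python

   def build_quote_stmt(name, col_offset, body_lines):
       # Join the raw lines so the original indentation survives untouched.
       body = '\n'.join(body_lines)
       # repr() produces a valid string literal; there is no space after
       # the comma, matching the rewritten form shown earlier.
       return '{}._quote_stmt({},{!r})'.format(name, col_offset, body)

   raw_lines = ['    this should not parse', '    but it will']
   print(build_quote_stmt('qq', 0, raw_lines))

Using repr() here is a convenience of the sketch: it takes care of escaping newlines and quote characters so that the result is always a valid Python string literal.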

Let’s also look at the quoted expressions:

.. code-block:: python

   [$qq|this is also invalid|]

.. code-block:: text

   OP('[')
   ERROR('$')
   NAME('qq')
   OP('|')
   <body>
   OP('|')
   OP(']')


Just like with quoted statements, we can rewrite this to look more like:


.. code-block:: python

   qq._quote_expr(0,'this is also invalid')

Note

Indentation is also preserved in a quoted expression.
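As an illustration only, the same rewrite can be approximated on a single line of text with a regular expression instead of a token stream. This is not how quasiquotes does it (it works on tokens, as described above); rewrite_quoted_exprs and the pattern are assumptions made for this sketch, as is the choice to use the position of the opening bracket as the column offset.

.. code-block:: python

   import re

   # Matches [$name|body|] where body does not contain '|'.
   QUOTED_EXPR = re.compile(r'\[\$(\w+)\|([^|]*)\|\]')

   def rewrite_quoted_exprs(line):
       def replace(match):
           name, body = match.group(1), match.group(2)
           # match.start() is the column where the opening '[' sits.
           return '{}._quote_expr({},{!r})'.format(name, match.start(), body)
       return QUOTED_EXPR.sub(replace, line)

   print(rewrite_quoted_exprs('[$qq|this is also invalid|]'))
   print(rewrite_quoted_exprs('x = [$qq|  keep  spaces  |]'))

The second call shows that whitespace inside the delimiters is passed through to the string untouched.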

Runtime Lookups

An important thing to notice about the implementation is that the source it builds contains method calls on an object that is looked up by name at runtime. While we do static work to make the parser see the quoted block as valid Python, we do not load the quasiquoter until the function is being executed and we have a running frame. This means that whatever value the quasiquoter's name is bound to at that moment is the one that will be used.
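The effect of this late lookup can be seen with ordinary Python and no special syntax. In the sketch below, Upper and Lower are hypothetical stand-ins for quasiquoters, and the call inside f is written by hand in the shape of the rewritten source.

.. code-block:: python

   class Upper:
       def _quote_expr(self, col_offset, body):
           return body.upper()

   class Lower:
       def _quote_expr(self, col_offset, body):
           return body.lower()

   qq = Upper()

   def f():
       # Hand-written equivalent of the rewritten form of [$qq|Hello|].
       # The name qq is resolved each time f runs, not when f is defined.
       return qq._quote_expr(0, 'Hello')

   first = f()
   qq = Lower()   # rebinding the name changes which quasiquoter is used
   second = f()
   print(first, second)

Because the name is resolved on every call, rebinding qq between the two calls is enough to change the result without touching f.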

Expressions as QuasiQuoters

QuasiQuoters are instances, so one might think that they should be able to do:

.. code-block:: python

   with $MyQQ(some_arg=some_value):
       ...

Unfortunately, this changes the token stream: the name of the quoter is no longer followed directly by OP(':') and NEWLINE('\n'). Currently, we do not detect this case, so the normal Python syntax error will be raised. The same is true for quoted expressions.