It’s time to show off a new aspect of Toccata. Scripting.
Toccata was originally envisioned to be a compiled language that produced executables to run as native apps. That process looks like
As I was implementing all this, I realized I could do something else with the AST. I could interpret it directly without generating and compiling C code. Obviously, performance is much worse. But depending on the script, it could start executing immediately. For little utilities that do tasks on “human scale” time, this could be very useful. I already have the AST, so how hard could it be to add the interpreter?
Yeah, it was that hard. A basic LISP interpreter is pretty easy. But once you start adding things you need for real life, the complexity jumps by leaps and bounds. And I’m pretty sure there are still loose ends to clean up. But it’s close enough I can share it.
Since the compiler already had the code to produce the AST, I just added a module to interpret it, so the same executable does double duty. You just have to add the --script option before the Toccata file name. So let’s look at a script I put together.
 1 #! /home/jim/toccata/toccata --script
 2 (defprotocol Write
 3   (write [_]
 4     ""))
 5 (defn extract-doc [ast]
 6   (-> ast
 7       .doc
 8       .lines
 9       (remove empty?)
10       (interpose "\n")
11       to-str))
12 (extend-type ast/fn-ast
13   Write
14   (write [ast]
15     (for [sym (.fn-sym ast)
16           doc-str (-> ast
17                       .arities
18                       first
19                       (map extract-doc))]
20       (file/stdout ["\n" (str sym) "\n" doc-str "\n"]))))
21 (extend-type ast/definition-ast
22   Write
23   (write [ast]
24     (map (.value-exprs ast) write)))
25 (extend-type ast/prototype-ast
26   Write
27   (write [ast]
28     (let [lines (-> ast
29                     .doc
30                     .lines
31                     (remove empty?))]
32       (file/stdout [(str "\n" (.fn-name ast) "\n")
33                     (extract-doc ast)
34                     "\n"]))))
35 (extend-type ast/protocol-ast
36   Write
37   (write [ast]
38     (map (.prototypes ast)
39          (fn [prototype]
40            (write (.fn-name prototype (str (.protocol-sym ast) "/"
41                                            (.fn-name prototype))))))))
42 (def extract-asts (parse/parser (grammar/none-or-more
43                                  (map reader/top-level write))))
44 (main [[_ file-name]]
45   (extract-asts {'file-name file-name
46                  'line-number 0}
47                 (file/slurp file-name)))
This script reads a Toccata source file and extracts the names and docstrings for top-level functions from their ASTs. Let’s go through it line by line.
As with any shell script, the file starts with a shebang specifying the path to the compiler executable and any options. The name of the file gets appended to this command line and passed to the script.
Lines 2 - 4 are a standard Toccata protocol definition. The write protocol function is what will extract the needed info and output it. It does nothing by default, so calling it on any value that doesn’t explicitly implement it has no effect.
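For readers more at home in another language, the "default that does nothing, specialised per type" idea can be sketched in Python with functools.singledispatch. This is only a rough analogy, not how Toccata implements protocols, and FnNode is a hypothetical stand-in for an AST node type.

```python
from functools import singledispatch


class FnNode:
    """Hypothetical stand-in for a function AST node."""

    def __init__(self, name):
        self.name = name


@singledispatch
def write(node):
    # Default implementation: types that never opt in produce nothing.
    return ""


@write.register
def _(node: FnNode):
    # Only FnNode gets a specialised implementation.
    return f"fn {node.name}"


# A mix of values: one FnNode, two values with no implementation of their own.
results = [write(FnNode("extract-doc")), write(42), write("a comment")]
```

Because the default quietly returns an empty string, the caller can apply write to every value it meets without first checking its type.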
Lines 5 - 11 define a function that extracts the doc string from a given AST node. Any symbol preceded by a “.” is a data type field getter/setter. So
(.doc ast)
is how you get the value of the doc field of the ast value. Doc strings are the first comment block in the body of a function. Like this:
(defn some-fn [x y]
  ;; This is the doc string
  ;; and it might span multiple lines
  (do-something-amazing-with x y))
The -> expression is the same as in Clojure. It threads the result of each step into the first argument slot of the next, adding parentheses where needed. So
(-> ast
    .doc
    .lines
    (remove empty?)
    (interpose "\n")
    to-str)
is equivalent to
(to-str (interpose (remove (.lines (.doc ast)) empty?) "\n"))
but easier to follow what’s going on. For the AST nodes we’ll call extract-doc with, the doc field contains an ast/block-comment-ast value which has a lines field. This field is a Vector of strings. So this function extracts that vector, removes any empty strings, sticks a newline between the lines and constructs a single string value.
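The threading behaviour described above can be mimicked in Python to check that the threaded and nested forms agree. This is a sketch only: thread_first, remove, interpose, to_str and is_empty are hypothetical helpers written for this illustration, with the collection-first argument order that the threaded form relies on.

```python
from functools import reduce


def thread_first(value, *steps):
    # Each step is a callable or a (callable, extra_args...) tuple;
    # the running value always goes into the first argument slot.
    def apply(acc, step):
        if callable(step):
            return step(acc)
        fn, *args = step
        return fn(acc, *args)

    return reduce(apply, steps, value)


def remove(items, pred):
    # Collection first, predicate second.
    return [x for x in items if not pred(x)]


def interpose(items, sep):
    # Put sep between neighbouring items, but not at either end.
    out = []
    for i, x in enumerate(items):
        if i:
            out.append(sep)
        out.append(x)
    return out


def to_str(items):
    return "".join(items)


def is_empty(s):
    return s == ""


doc_lines = ["This is the doc string", "", "and it might span multiple lines"]
threaded = thread_first(doc_lines, (remove, is_empty), (interpose, "\n"), to_str)
nested = to_str(interpose(remove(doc_lines, is_empty), "\n"))
```

Both spellings run the same three steps in the same order; the threaded one simply reads top to bottom instead of inside out.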
Lines 12 - 20 implement write for a function AST node. This requires a side track.
One of my main goals in Toccata was to make the compiler modular and make those modules accessible to Toccata programmers so they could easily write tooling. These modules are brought together in the compiler. Since the compiler already has these available, it would be absurd to require the interpreter to load new copies of them when interpreting scripts. Though it turns out, this would have been much, much easier.
There are a number of these modules provided to scripts as ‘pre-loaded’ namespaces that can be referenced using prefixed symbols. The ones used in this script (with their prefixes) are:
With that in mind, lines 12 - 20 extend the ast/fn-ast AST node type so that it extracts the information from the node and sends it to file/stdout. Breaking this down, line 15 is the beginning of a for comprehension. This is similar to Clojure’s ‘for’ macro for sequence comprehensions, except in Toccata any data type that implements the flat-map and wrap core protocol functions can be used in a comprehension. In this case, it’s the Maybe data type.
Looking at the type definition of the fn-ast type, you’ll see it has a fn-sym field and also one named arities. The fn-sym field is required to be of a Maybe type. So line 15 extracts this value. If it’s not nothing, that is, there’s a value inside it, this inner value is extracted and bound to sym. Otherwise the comprehension quits early. Then on line 16, the ‘->’ expression gets the arities field, which is a vector of ast/fn-arity-ast values, and gets the first one. The first function always returns a Maybe value that contains the first arity if the vector isn’t empty. The call to map then applies extract-doc to that arity value, returning a string inside a Maybe. This inner string is then extracted by the rules of the comprehension and bound to doc-str.
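To make the "quits early" behaviour concrete, here is a small Python sketch of a Maybe type with flat_map and wrap, plus a hand-desugared version of a two-binding comprehension. It is an analogy under stated assumptions, not Toccata’s implementation: the Maybe class, first and describe are hypothetical, arities are simplified to plain strings, and str.strip stands in for extract-doc.

```python
class Maybe:
    """Hypothetical Maybe: holds either nothing or exactly one value."""

    def __init__(self, *value):
        self._value = value  # () means "nothing", (x,) means "maybe x"

    @staticmethod
    def wrap(x):
        return Maybe(x)

    def flat_map(self, fn):
        # fn must return a Maybe; "nothing" short-circuits the chain.
        return fn(self._value[0]) if self._value else self

    def map(self, fn):
        return self.flat_map(lambda x: Maybe.wrap(fn(x)))

    def __eq__(self, other):
        return isinstance(other, Maybe) and self._value == other._value


nothing = Maybe()


def first(items):
    # Like the first described above: a Maybe, empty when the list is.
    return Maybe(items[0]) if items else nothing


def describe(fn_sym, arities):
    # Hand-desugared comprehension: bind sym, then doc_str, then build the result.
    return fn_sym.flat_map(
        lambda sym: first(arities).map(str.strip).flat_map(
            lambda doc_str: Maybe.wrap(f"{sym}: {doc_str}")))
```

If either binding turns out to be nothing, the lambdas after it never run, which is the early exit the comprehension relies on.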
Finally, on line 20, the various strings are put into a vector and fed to file/stdout, which writes them out. I’ll have more to say about STDIN and STDOUT in a future post.
And whew, lot of verbiage for a short function. The rest will go quicker.
Lines 21 - 24 implement write for the definition AST node. An ast/definition-ast node has two fields: sym and value-exprs. The value expressions might be any number of comment blocks with one expression that actually produces a value in there somewhere. We’re only interested in the case where an ast/fn-ast node is in there, so we map over the vector of expressions and apply write to each one. Anything that doesn’t implement write falls through to the do-nothing default.
Hopefully, you can see how protocol definitions are handled in lines 25 - 41.
And now we come to actually parsing a source file. Lines 42 and 43 build the parser. Starting from the inside, reader/top-level is the grammar that specifies all the possible expressions that can appear at the top level of a Toccata source file. In Toccata, map can be implemented for any data type, not just collections. Ordinarily, reader/top-level specifies that an AST node is the result of parsing a top level expression. However, calling map on it creates a new grammar that applies the write function to the parsed AST node and produces the result when a top level expression is parsed. But in this case, write sends strings to STDOUT, so we don’t really care about the result. Yes, I know it’s not pure, so sue me.
Then grammar/none-or-more takes the grammar to parse (and write) a single expression and produces a new grammar that says “parse until you can’t parse no more”. Finally, parse/parser produces an actual parsing function and assigns it to extract-asts.
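The shape of that pipeline (map a side-effecting function over a grammar, repeat it, then turn it into a parsing function) can be sketched with tiny Python parser combinators. This is a toy analogy only; token, map_grammar, none_or_more and parser are hypothetical names and bear no relation to the real Toccata grammar, reader or parse modules.

```python
import re

# A "grammar" here is a function (text, pos) -> (value, new_pos), or None on failure.


def token(pattern):
    rx = re.compile(pattern)

    def parse(text, pos):
        m = rx.match(text, pos)
        return (m.group(0), m.end()) if m else None

    return parse


def map_grammar(grammar, fn):
    # New grammar: same input consumed, but the parsed value is passed through fn.
    def parse(text, pos):
        result = grammar(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return fn(value), new_pos

    return parse


def none_or_more(grammar):
    # Keep applying grammar until it fails (or stops consuming input).
    def parse(text, pos):
        values = []
        while True:
            result = grammar(text, pos)
            if result is None or result[1] == pos:
                return values, pos
            value, pos = result
            values.append(value)

    return parse


def parser(grammar):
    # Turn a grammar into a plain function of the input text.
    return lambda text: grammar(text, 0)[0]


seen = []  # stands in for STDOUT: the mapped function works by side effect
word = token(r"\s*[a-z-]+")
extract = parser(none_or_more(map_grammar(word, lambda w: seen.append(w.strip()))))
extract("defn main def 123")
```

As in the script, the list returned by the parser is uninteresting; the useful work happens as a side effect each time a single item is parsed, and parsing simply stops at the first thing the grammar can’t match.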
And now, the top level, the main event, where it all comes together and the work gets done. And it’s very anticlimactic. The parameter to the main function is a list of strings that come from the command line. The first value is always the name of the file being interpreted. In this case, this list is destructured and the second value, which should be the name of a Toccata file, is bound to file-name. This file is slurped into a string, which is passed to the parser along with a hash-map of values the parser uses to track progress. We don’t really use that info in this application, but might later.
And that’s it. Next time, I intend to show how to extend this script to produce an HTML file of documentation.