Chronicles of a Restless Mind

Wednesday, 2013-10-30

The Proto Interpreter for J

After spending another evening or two I incorporated a bunch of corrections into a better write-up; you probably want to read that instead.

This comes from Appendix A of An Implementation of J by Roger Hui via Eugene Wallingford's post at Knowing and Doing. He commented that I think it will take me a week of hard work to grok this code and...well...that felt like a challenge to see how fast I could work through the code (about two and a half hours, it turns out). I wrote it up as I went along, so here's my attempt at commentary.

Of course, I haven't tried actually using the language, so I can't really say I grok it. It does (amazingly enough) compile on a recent gcc C compiler, so I'll have to play around with it.

Here we go:

Everything is a single letter—digits 0-9, variable names (‘a’-‘z’), and operators (‘+’, ‘{’, ‘~’, ‘<’, ‘#’, and ‘,’). Operators can have both a binary (infix) and a unary version, though currently there are only five operators of each type (and six operator characters). '=' is used for assignment and is built into the interpreter.

The single data type is an array (supposed to be 0 to 3 dimensional AFAICT).

I suspect it's fairly useless as it has no arithmetic or logical operations other than addition (not even negation). And I don't see any way to define functions or procedures or whatever. (10/31: also no parenthesis or other mechanisms for grouping—it's strict left-to-right evaluation with no operator precedence, so you have to assign any partial results to a variable and then use them.) But it's a neat example of an interpreter.

Some abbreviations to support the terse style:

typedef char C;typedef long I;
#define P printf
#define R return

The Array data type:

typedef struct a{
	I t,r,d[3],p[2];

Function prototypes (in K&R style?) for unary and binary operators. That's f(A w) and f(A a, A w).

#define V1(f) A f(w)A w;
#define V2(f) A f(a,w)A a,w;

This is totally straightforward: do x n times.

#define DO(n,x) {I i=0,_n=(n);for(;i<_n;++i){x;}}

Allocate memory (assuming 4-byte integers), and copy n integers from destination to source (odd argument order...I'm guessing it mirrors the assembly-language for whatever his preferred CPU was...).

I *ma(n) { R(I*)malloc(n*4); }
mv(d,s,n) I *d,*s;{

Get the size of an array (number of integers), multiplying the r dimensions from d. I have no idea why it's called tr.

tr(r,d)I *d;{
	I z=1;
	R z;

Get Array”? Allocate a new A from the r dimensions at d, setting its t, r, and d parameters to the arguments given. Note that the allocation size ignores the p[2] field of the A structure.

Note also that tr(0,0) = 1, so this always gives an array with at least one value. It uses 0-dimensional values for scalars, so this is important.

A ga(t,r,d)I *d;{
	A z=(A)ma(5+tr(r,d));
	z->t=t, z->r=r, mv(z->d,d,r);
	R z;

V1(iota) declares a function iota(A w). This treats w as a single integer and returns an array whose values are the sequence 0..n-1. Note that the i there is the loop index provided by the DO macro.

V1(iota) {
	I n=*w->p;
	A z=ga(0,1,&n);
	R z;

Add two arrays element by element, allocating a new array for the result.

V2(plus) {
	I r=w->r, *d=w->d, n=tr(r,d);
	A z=ga(0,r,d);
	DO(n, z->p[i]=a->p[i]+w->p[i]);
	R z;

Remember the V2 prototype: this is from(a, w). It treats a as a single integer and returns w[a]. Basically it drops the first dimension of w and uses the remaining dimensions to compute the size of the new array.

V2(from) {
	I r=w->r-1, *d=w->d+1, n=tr(r,d);
	A z=ga(w->t,r,d);
	mv(z->p, w->p + (n * *a->p), n);
	R z;

Return a single-element array containing a pointer to w

V1(box) {
	A z=ga(1,0,0);
	R z;

Concatenate a and w, as you would expect. Note that it uses the t from the second argument, but you should presumably never use it on arrays with different types, so it shouldn't be a problem.

V2(cat) {
	I an=tr(a->r,a->d), wn=tr(w->r,w->d), n=an+wn;
	A z=ga(w->t,1,&n);
	R z;

The first parameter, a, gives dimensions (a scalar or a one-dimensional array) for a new array. The new array is initialized from the contents of w, which are repeated as necessary to fill the space. Again, I don't understand the name...

Edit 10/31: rsh(a, w) reshapes w to the dimensions given in a.

V2(rsh) {
	I r=a->r?*a->d:1, n=tr(r,a->p), wn=tr(w->r,w->d);
	A z=ga(w->t, r, a->p);
	mv(z->p, w->p, wn=n>wn?wn:n);
	if(n-=wn) mv(z->p+wn,z->p,n);
	R z;

Get the dimensions (edit 10/31: that is, the shape) of an array.

V1(sha) {
	A z=ga(0,1,&w->r);
	R z;

The identity function, obviously.

V1(id) { R w; }

The size of an array (number of elements, i.e. the first dimension). This gives 1 for a scalar (instead of 0), but should otherwise be the same as the 0th element of sha(w).

V1(size) {
	A z=ga(0,0,0);
	R z;

Print integers, newlines, and arrays. Note that this is what tells us the function of the t field in the array header: if it is set, we treat the values as arrays and print them recursively, otherwise we print them as integers.

pi(i) { P("%d ",i); }
nl() { P("\n"); }
pr(w) A w; {
	I r=w->r,*d=w->d, n=tr(r,d);
	if(w->t) DO(n, P("< "); pr(w->p[i]))
	else DO(n, pi(w->p[i]));

The character table (operator names) and the vtables. Each operator has corresponding two- (vd) and one-argument (vm) functions.

10/31: That's “verb table”, I think. I also believe that the function-pointer tables are for “ditransitive” and “monotransitive” verbs...?

C vt[]="+{~<#,";

The state table holds the values of the variables (‘a’ through ‘z’)

I st[26];

Is the token a variable name? (edit 10/31: I think that he thinks of variables as pronouns, because they refer to a value (noun). So this is an abbreviation for “question-pronoun”?)

qp(a) { R a>='a' && a<='z'; }

Is the token an operator (vtable index)? (edit 10/31: “question-verb”) This is an extremely sloppy check, as we will then use the value to index into the vm table (which has 7 elements), but most of this code does no checking at all, so that's ok.

qv(a) { R a<'a'; }

Take a tokenized string which represents an expression and execute it recursively. Returns an array result.

We split the string, giving the value of the first token and the tail of the expression.

If value is a variable name, then (if the next token is ‘=’) we set the variable to ex(tail), otherwise we resolve the name to its value and continue.

Now, if the value is an operator, then do vm[value](ex(tail)). Otherwise, we look at the next token (call it op) and the new tail. If op is null, return value, otherwise do vd[op](value, ex(tail)).

A ex(e) I *e; {
	I a=*e;
	if(qp(a)) {
		if(e[1]=='=') R st[a-'a']=ex(e+2);
		a=st[ a-'a'];
	R qv(a) ?
		(*vm[a])(ex(e+1)) :
		e[1] ? (*vd[e[1]])(a,ex(e+2)) : (A)a;

When tokenizing a string, a noun is a scalar from 1 to 9 (0 means failure). (10/31: oops, 0 is included, too. An integer 0 means failure, a pointer to a scalar A containing 0 is fine.) A verb is a valid operator character (as a 1-based index into the function tables). Other characters are passed through unchanged.

noun(c) {
	A z;
	if(c<'0'||c>'9') R 0;
	R z;

verb(c) {
	I i=0;
	for(;vt[i];) if(vt[i++]==c) R i;
	R 0;

I *wd(s)C *s; {
	I a, n=strlen(s), *e=ma(n+1);
	C c;
	DO(n, e[i] = (a=noun(c=s[i])) ? a : (a=verb(c))?a:c );
	R e;

Then main is the obvious read/tokenize/execute/print loop.

main() {
	C s[99];
	while(gets(s)) pr(ex(wd(s)));

--Josh Grams <>