Introduction To StrongForth

Preface

This introduction to StrongForth has been written for readers who already have some experience with Forth. Although StrongForth is as close to ANS Forth as possible, the reader is not required to have worked with an ANS-compliant Forth system.

The basic idea behind StrongForth is to add strong static type checking to a Forth system. Previous Forth systems and standards (including ANS) are designed to be typeless (untyped), which means that they do not do any type checking at all. The interpreter and the compiler generally accept any word being applied to the operands on the data and return stacks. This behaviour grants the programmer total freedom, but on the other hand it is a frequent cause of type errors, which in turn lead to system crashes and other more or less strange behaviour throughout the whole development phase.

StrongForth does not guarantee bug-free programs. It does not even guarantee the absence of crashes. But type errors will be greatly reduced. Furthermore, since the interpreter and the compiler know the data types of the operands on the stack, they are able to choose the appropriate version of a word if the dictionary contains several words with the same name but different input parameter types. This is called operator overloading. As will be shown in this introduction, operator overloading makes programming much more comfortable. Additionally, you no longer need to invent individual names for words with the same semantics but different data types.

Of course, strong static typing has some drawbacks, which might keep traditional Forth programmers from using it. First, it requires a higher degree of discipline, because all words with a stack effect have to be provided with precise stack diagrams. Second, the interpreter and the compiler prohibit not only dirty tricks, but sometimes also merely unusual operations. For example, adding a flag to an address is not possible, although it might be useful in some cases. And third, relying on a system that does all the type checking itself might lead to more careless programming.

The advantages and disadvantages of strong static type checking have already been discussed in the Forth community. The availability of StrongForth will certainly add more practical aspects to this so far rather theoretical discussion, because you can now simply try it out yourself.

First Steps

Let's begin with a few examples out of the first chapter of Leo Brodie's famous textbook Starting Forth:

15 SPACES                 OK

When interpreting the number 15, the interpreter pushes this value onto the data stack and remembers that it is an unsigned single integer. SPACES is a word that requires an unsigned single integer as its input parameter. Here's a possible definition of SPACES:

: SPACES ( UNSIGNED -- )
  0 ?DO SPACE LOOP ;

Well, this is not very exciting. At first glance, the only more or less interesting thing about it is the stack diagram. Standard Forth systems use ( n -- ), which is nothing but a comment. In StrongForth, it is interpreted source code, which compiles the stack diagram of SPACES into the dictionary. Additionally, it tells the compiler that the definition starts with an item of data type UNSIGNED on the data stack, and is expected to remove this item on exit. Generally, each word in the dictionary includes full information about its stack effect.

So let us now try a second example:

42 EMIT * OK

EMIT is a word that expects a number on the stack and displays the ASCII character associated with this number. We can also write

CHAR * EMIT * OK

instead, because a character is a kind of number. Even the following code works well:

CHAR * . * OK

But wait ... Isn't . supposed to display a number, and not a character? Let's see:

42 . 42  OK

Yes, this still works. But how does . know whether it should print a number or an ASCII character? StrongForth actually provides more than one version of . (dot): one for displaying numbers, and one for displaying characters. The interpreter and the compiler take care of selecting the version that is best suited for the purpose. In this case, a number is displayed as a number, and a character is displayed as a character. When we write 42, the interpreter pushes 42 onto the data stack and keeps in mind that this is a number. When we write CHAR *, the interpreter pushes exactly the same value onto the stack, but this time it makes a note that the item on top of the stack is a character. This note later allows the interpreter to select the correct version of . (dot). EMIT doesn't make this distinction. It displays each and every parameter as an ASCII character.
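To make the idea of a type note concrete, here is a small Python model, not StrongForth code and not its actual implementation: each stack item carries a value and a type tag, a hypothetical `dot` function chooses its output from the tag, and `emit` ignores the tag. The spacing (a trailing space after numbers and flags, none after characters) is inferred from the transcripts in this section and is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Item:
    value: int
    dtype: str  # type tag, e.g. "UNSIGNED", "SIGNED", "CHARACTER", "FLAG"

def dot(item):
    """Hypothetical model of the overloaded . : the tag selects the behaviour."""
    if item.dtype == "CHARACTER":
        return chr(item.value)                      # character: no trailing space
    if item.dtype == "FLAG":
        return ("TRUE" if item.value else "FALSE") + " "
    return f"{item.value} "                         # numbers: value and a space

def emit(item):
    """EMIT ignores the tag and always displays a character."""
    return chr(item.value)

number = Item(42, "UNSIGNED")     # 42
character = Item(42, "CHARACTER") # CHAR *
print(repr(dot(number)), repr(dot(character)), repr(emit(number)))
```

Both items hold the same value; only the tag differs, which is exactly the information the interpreter uses to pick a version of . (dot).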

There are several other versions of . in StrongForth's dictionary. Just have a look at these:

3 4 = . FALSE  OK
-16 . -16  OK

In the first example, = takes two items of data type UNSIGNED and returns an item of data type FLAG. A special version of . for flags delivers the appropriate result. The second example seems to be straightforward, but it is not. Remember that 15, 42, 3 and 4 produced items of data type UNSIGNED. -16 produces an item of data type SIGNED, and the interpreter finds a version of . suited for signed numbers. To enter a positive signed number, you have to precede it with a sign, for example +16. The advantage of distinguishing between signed and unsigned numeric literals becomes obvious when we try larger numbers:

4000000000 . 4000000000  OK
+4000000000 . -294967296  OK

A standard 32-bit Forth system would always display -294967296, because it cannot distinguish between signed and unsigned numbers. You would have to explicitly use U. in order to display 4000000000 as an unsigned number.
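The following Python sketch, assuming 32-bit cells, shows why the same bit pattern prints as 4000000000 or as -294967296 depending on whether it is read as unsigned or as two's-complement signed. The helper names are hypothetical.

```python
CELL_BITS = 32  # assumption: a 32-bit Forth system, as in the example above

def as_unsigned(x, bits=CELL_BITS):
    """Reduce x to the bit pattern of one cell, read as unsigned."""
    return x & ((1 << bits) - 1)

def as_signed(x, bits=CELL_BITS):
    """Read the same cell as a two's-complement signed number."""
    u = as_unsigned(x, bits)
    return u - (1 << bits) if u >= 1 << (bits - 1) else u

cell = as_unsigned(4000000000)
print(as_unsigned(cell))  # 4000000000: the UNSIGNED reading
print(as_signed(cell))    # -294967296: the SIGNED reading
```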

With the knowledge obtained so far, let's try out the compiler, still sticking to the examples in Leo Brodie's Starting Forth:

: STAR [CHAR] * . ;  OK
STAR * OK
CR
 OK
CR STAR CR STAR CR STAR
*
*
* OK
: STARS 0 DO STAR LOOP ;
: STARS 0 DO ? undefined word
UNSIGNED
  OK

Oops. What's that? DO tried to compile its runtime semantics, which expect two numbers of the same data type on the stack, but there was only one. Thus, the compiler could not find an appropriate runtime word for DO in the dictionary and threw an exception. Yes, we have to supply a stack diagram to STARS:

: STARS ( UNSIGNED -- ) 0 DO STAR LOOP ;  OK
5 STARS ***** OK
STARS
STARS ? undefined word

So the compiler starts with an UNSIGNED on the stack, adds another one (0), and now DO's runtime word gets its input parameters. The last line shows that STARS will not be found in the dictionary if the stack is empty.

Finally, let's complete Leo Brodie's example:

: MARGIN CR 30 SPACES ;  OK
: BLIP MARGIN STAR ;  OK
: BAR MARGIN 5 STARS ;  OK
: F BAR BLIP BAR BLIP BLIP CR ;  OK
F
                              *****
                              *
                              *****
                              *
                              *
 OK

Data Types

In the previous section, we have introduced four data types: UNSIGNED, SIGNED, CHARACTER, and FLAG. Actually, StrongForth knows a lot more data types, and it is even possible to define new, application-specific data types.

Data Type Structure

Having several different data types is certainly useful, but a large, unstructured set of data types would cause a serious problem. Since it should be possible to apply words like DUP and DROP to almost every data type, a separate version of these words would be required for each of them. Words with two input parameters, like SWAP, would have to be defined for each possible combination of data types, which already makes 32² = 1024 versions for 32 data types! ROT would be even worse.

To solve this problem, StrongForth arranges all data types in a hierarchical structure. There are four data types at the root of this hierarchy: SINGLE, DOUBLE, TUPLE, and SYS. All other data types are direct or indirect subtypes of these four so-called ancestor data types. The complete data type structure looks like this:

SINGLE
|
+-- INTEGER
|   |
|   +-- UNSIGNED
|   |
|   +-- SIGNED
|   |
|   +-- CHARACTER
|
+-- ADDRESS
|   |
|   +-- CADDRESS
|
+-- LOGICAL
|   |
|   +-- FLAG
|
+-- DEFINITION
|
+-- TOKEN
|   |
|   +-- SEARCH-CRITERION
|
+-- FILE
|
+-- FAM
|
+-- WID
|
+-- R-SIZE
|
+-- CONTROL-FLOW

DOUBLE
|
+-- INTEGER-DOUBLE
|   |
|   +-- UNSIGNED-DOUBLE
|   |   |
|   |   +-- NUMBER-DOUBLE
|   |
|   +-- SIGNED-DOUBLE
|
+-- DATA-TYPE
    |
    +-- STACK-DIAGRAM

TUPLE
|
+-- INPUT-SOURCE

SYS
|
+-- ORIG/DEST
|   |
|   +-- ORIG
|   |
|   +-- DEST
|
+-- COLON-SYS
|   |
|   +-- DOES-SYS
|
+-- DO-SYS
|
+-- CASE-SYS
|
+-- OF-SYS

Whenever the interpreter or compiler tries to find a word in the dictionary, it accepts not only a word whose input parameters match the data types of the items on the stack exactly, but also a word whose input parameters are parents of those. Thus, only two versions of DUP and DROP are required for data types SINGLE and DOUBLE and all their respective subtypes. If, for example, the item on top of the data stack has data type UNSIGNED, the version of DUP for data type SINGLE would match, because UNSIGNED is a (second-generation) subtype of SINGLE. Similarly, four versions of SWAP and eight versions of ROT (instead of 22³ = 10648) are enough:

SWAP ( SINGLE SINGLE -- )
SWAP ( SINGLE DOUBLE -- )
SWAP ( DOUBLE SINGLE -- )
SWAP ( DOUBLE DOUBLE -- )

ROT ( SINGLE SINGLE SINGLE -- )
ROT ( SINGLE SINGLE DOUBLE -- )
ROT ( SINGLE DOUBLE SINGLE -- )
ROT ( SINGLE DOUBLE DOUBLE -- )
ROT ( DOUBLE SINGLE SINGLE -- )
ROT ( DOUBLE SINGLE DOUBLE -- )
ROT ( DOUBLE DOUBLE SINGLE -- )
ROT ( DOUBLE DOUBLE DOUBLE -- )

Well, these are already a lot of versions for ROT, but remember that only two of these eight versions, the first and the last, are defined in ANS Forth (the last one is actually 2ROT). And finally, ROT is one of very few words in StrongForth with so many different versions.
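The matching rule can be sketched as a small Python model. This is an assumption-laden illustration, not StrongForth's implementation: the hypothetical `PARENT` table holds only part of the hierarchy, the dictionary is searched newest first, and a version matches if each actual data type on the stack is equal to, or a subtype of, the corresponding input parameter.

```python
from itertools import product

# Part of the hierarchy above: child -> parent (ancestors have no entry)
PARENT = {
    "INTEGER": "SINGLE", "UNSIGNED": "INTEGER", "SIGNED": "INTEGER",
    "CHARACTER": "INTEGER", "ADDRESS": "SINGLE", "CADDRESS": "ADDRESS",
    "LOGICAL": "SINGLE", "FLAG": "LOGICAL",
    "INTEGER-DOUBLE": "DOUBLE", "UNSIGNED-DOUBLE": "INTEGER-DOUBLE",
    "SIGNED-DOUBLE": "INTEGER-DOUBLE",
}

def is_subtype(t, ancestor):
    """True if t equals ancestor or descends from it."""
    while t is not None:
        if t == ancestor:
            return True
        t = PARENT.get(t)
    return False

ANCESTORS = ("SINGLE", "DOUBLE")
# (name, input types) in definition order
DICTIONARY = [("DUP", (a,)) for a in ANCESTORS]
DICTIONARY += [("SWAP", p) for p in product(ANCESTORS, repeat=2)]
DICTIONARY += [("ROT", p) for p in product(ANCESTORS, repeat=3)]

def find(name, stack):
    """Return the input types of the newest matching version, or None.

    stack lists data types bottom to top, like the output of .S"""
    for word, inputs in reversed(DICTIONARY):
        if word != name or len(inputs) > len(stack):
            continue
        actual = stack[len(stack) - len(inputs):]
        if all(is_subtype(a, d) for a, d in zip(actual, inputs)):
            return inputs
    return None  # corresponds to "? undefined word"

print(find("SWAP", ["FLAG", "SIGNED-DOUBLE"]))  # ('SINGLE', 'DOUBLE')
```

With two ancestors, SWAP needs 2² = 4 versions and ROT 2³ = 8, which is the count listed above.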

Integers

Now, let's have a closer look at the data type structure. Some of the data types correspond to those explicitly specified in ANS Forth: UNSIGNED is u, SIGNED is n and CHARACTER is char. These three data types are subtypes of data type INTEGER, and INTEGER itself is a direct subtype of SINGLE. INTEGER is rarely used explicitly, but it is very useful as a common parent of the three data types. For example,

ALLOT ( INTEGER -- )

can be applied to items of all three data types, without having to define separate versions, but it may not be directly applied to addresses or flags. An even better example might be

+ ( INTEGER INTEGER -- INTEGER )

but this word is actually defined in a different way, as will be explained later.

Addresses

An ADDRESS is not the same as an INTEGER, because an address may not be added to another address (it may only be subtracted from one, giving an INTEGER). There are several other restrictions, for example on multiplication, but also some special features that only apply to addresses.

One of the many ANS Forth words returning an address is BASE:

BASE @ . 10  OK

Okay, that worked as expected. Of course, the address returned by BASE has to be of data type ADDRESS. But how does @ know that the address on top of the data stack is the address of an unsigned single number? Obviously, the interpreter chose the correct version of . to display an unsigned number. Let's try something else:

STATE @ . FALSE  OK

Same question: How does @ know ...? The easiest way to get an answer is to get acquainted with StrongForth's version of .S:

45 -3 TRUE CHAR B .S UNSIGNED SIGNED FLAG CHARACTER  OK

Surprise, surprise! Instead of displaying the data values, .S shows the data types of the items on the stack. Well, what else did you expect from a strongly typed system? The information on data types is in many cases more useful than the actual numerical values. Now things are getting exciting:

DROP DROP DROP DROP \ CLEAN STACK  OK
BASE .S ADDRESS -> UNSIGNED  OK

What does that mean? UNSIGNED, FLAG, CHARACTER, ADDRESS and so on are so-called basic data types. ADDRESS -> UNSIGNED is a compound data type, meaning an address pointing to an unsigned single number. Since StrongForth is strongly typed, addresses have to be specific in the sense that addresses of different data types have to be distinguishable. The rest is easy to understand:

@ .S UNSIGNED  OK
. 10  OK

When @ is supplied with an item of data type ADDRESS -> UNSIGNED, it knows that it has to fetch a single cell from memory and return it as UNSIGNED.

But what about variables? An ANS Forth variable is typeless. It can store anything from a signed number to an execution token. In StrongForth, the word VARIABLE has to be supplied with information about the data type that is supposed to be stored in it. This information can easily be supplied by slightly modifying the semantics of VARIABLE: in StrongForth, VARIABLE initializes the newly created variable with the value of the item on top of the data stack, while simultaneously assuming its data type:

CHAR C VARIABLE X  OK
X .S @ . ADDRESS -> CHARACTER C OK

An item to be stored into a variable must always have exactly the same data type as the one with which the variable had been initialized:

CHAR D X !  OK
X @ . D OK
-13 X !
-13 X ! ? undefined word
SIGNED ADDRESS -> CHARACTER

The error message means that the interpreter cannot find a word with the name ! that accepts the two input parameters SIGNED and ADDRESS -> CHARACTER. Note that the second line of an error message always displays the data types of the items on the data stack at the time the error was detected.

To show how powerful the concept of compound data types is, let's continue playing with variables:

X VARIABLE Y  OK
Y .S ADDRESS -> ADDRESS -> CHARACTER  OK
@ .S ADDRESS -> CHARACTER  OK
@ .S CHARACTER  OK
. D OK

Thus, a compound data type can consist of an arbitrary number of basic data types chained by ->. It is therefore possible to store addresses of specific items in variables and generally operate with addresses of addresses of addresses and so on.
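A compound data type can be modelled as a list of basic data types. In the following Python sketch (a model only, with hypothetical helper names), @ strips the leading address type, and ! accepts an item only if its type is exactly the tail of the address type, mirroring the VARIABLE examples above.

```python
def fetch_type(addr_type):
    """Model of @: fetching through an address of X yields an item of type X."""
    if len(addr_type) < 2 or addr_type[0] not in ("ADDRESS", "CADDRESS"):
        raise TypeError("? undefined word")
    return addr_type[1:]

def store_ok(item_type, addr_type):
    """Model of !: the item must have exactly the type the address points to."""
    return addr_type[:1] in (["ADDRESS"], ["CADDRESS"]) and addr_type[1:] == item_type

def show(t):
    return " -> ".join(t)

x = ["ADDRESS", "CHARACTER"]   # what X leaves on the stack
y = ["ADDRESS"] + x            # X VARIABLE Y: Y points to an item of X's type
print(show(y))                 # ADDRESS -> ADDRESS -> CHARACTER
print(show(fetch_type(y)))     # ADDRESS -> CHARACTER
print(show(fetch_type(fetch_type(y))))  # CHARACTER
```

The model deliberately ignores details such as which types may be stored at a CADDRESS; it only illustrates how chaining with -> lets @ and ! derive their result and input types.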

CADDRESS is a subtype of ADDRESS that points to items of character size. It allows fetching and storing characters from and to character-sized memory locations without having to remember to use the ANS Forth words C@ and C! each time: an address of data type CADDRESS tells the interpreter and the compiler that it points to a character-sized item. C@ and C! do not exist in StrongForth. Instead, special versions of @ and ! dealing with CADDRESS addresses are provided.

CADDRESS is not only useful in connection with characters, but also with other integers or with logical values. A large array of integers can thus be stored efficiently in memory as character-size values, provided all elements of the array can be represented by 8 bits. This obviously would not work with addresses. Furthermore, special versions of @ for the data types CADDRESS -> SIGNED and CADDRESS -> FLAG are supplied, which perform the proper sign extension.
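Sign extension can be illustrated with a short Python model. It assumes 8-bit characters and 32-bit cells and returns the cell's bit pattern; `widen` is a hypothetical name. For SIGNED and FLAG the top bit of the character is copied into all higher bits, so a stored -16 comes back as -16 and a stored TRUE comes back with all bits set, while other data types are zero-extended.

```python
def widen(byte, dtype, cell_bits=32):
    """Return the cell bit pattern produced by a character-size fetch."""
    byte &= 0xFF
    if dtype in ("SIGNED", "FLAG") and byte & 0x80:
        # sign extension: copy bit 7 into every higher bit of the cell
        byte |= ((1 << cell_bits) - 1) & ~0xFF
    return byte

print(hex(widen(0xFF, "FLAG")))       # 0xffffffff: TRUE, all bits set
print(hex(widen(0xFF, "CHARACTER")))  # 0xff: no sign extension
```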

A good example of CADDRESS addresses can be constructed with PAD and TYPE:

CHAR F PAD !  OK
CHAR O PAD 1+ !  OK
CHAR H PAD 4 + !  OK
CHAR T PAD 3 + !  OK
CHAR R PAD 2 + !  OK
PAD .S CADDRESS -> CHARACTER  OK
5 TYPE FORTH OK

Logicals

An item of data type LOGICAL is just a collection of individual bits in a single cell. Naturally, such items may not be involved in arithmetic operations like +, -, *, / or NEGATE. On the other hand, logical operations like AND, OR, XOR and INVERT may only be applied if the top item on the stack is of data type LOGICAL:

HEX 1234 55AA OR .
HEX 1234 55AA OR ? undefined word
UNSIGNED UNSIGNED
HEX 1234 55AA CAST LOGICAL OR . 57BE  OK
DECIMAL  OK

CAST performs an explicit type cast of the item on top of the data stack to an arbitrary data type.

Performing a logical operation on an integer is not necessarily dangerous, but you have to pay attention, because it is a programming trick. Doing the same operation with /, MOD and /MOD is in most cases clearer, because these operations better fit the nature of integers. For example, let's assume >IN contains an offset into a block of 16 lines, each 64 characters long. Now we want to return to the beginning of the current line. Here are two alternatives that both work well in StrongForth:

: RETURN >IN @ -64 CAST LOGICAL AND >IN ! ;  OK
: RETURN >IN @ DUP 64 MOD - >IN ! ;  OK

Considering that the type cast does not compile any code, the first version is probably faster. Now let's assume the line length is an unsigned constant with the value 64:

64 CONSTANT C/L  OK
: RETURN >IN @ C/L NEGATE CAST LOGICAL AND >IN ! ;  OK
: RETURN >IN @ DUP C/L MOD - >IN ! ;  OK

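This will still work perfectly in both cases, until we decide to set the line length to 72, or to any other number that is not a power of 2. Then only the version using MOD remains correct, because ANDing with the negated line length rounds down to a multiple of the line length only if that length is a power of 2.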
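The difference between the two versions of RETURN can be checked with a Python sketch of the arithmetic (not StrongForth code; the function names are hypothetical). Python's `&` on a negative integer behaves like two's-complement AND, which matches a cell-wide AND for the small non-negative offsets used here.

```python
def line_start_and(offset, line_len):
    """Like >IN @ C/L NEGATE CAST LOGICAL AND : clear the low-order bits."""
    return offset & -line_len

def line_start_mod(offset, line_len):
    """Like >IN @ DUP C/L MOD - : subtract the position within the line."""
    return offset - offset % line_len

# With a power of 2, both versions agree for every offset in a 16 x 64 block
assert all(line_start_and(o, 64) == line_start_mod(o, 64) for o in range(16 * 64))
# With 72 they diverge, for example at offset 100
print(line_start_and(100, 72), line_start_mod(100, 72))  # 32 72
```

Only the MOD version returns 72, the correct start of the second 72-character line.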

Of course, LSHIFT and RSHIFT are also words that can only be applied to items of data type LOGICAL. Integers should use *, /, 2* and 2/. Again: logical operations are for logicals, arithmetic operations are for integers.

Finally, FLAG is a subtype of LOGICAL. This ensures that all logical operations can be applied directly to flags, i. e., without the ugly type casts. A FLAG is just a LOGICAL with all bits set to the same value.

Definitions And Tokens

An item of data type DEFINITION is the identifier of a word in StrongForth's dictionary. It is produced by words like ' and :NONAME, while words like >BODY expect one as an input parameter. Unlike in ANS Forth, ' and :NONAME do not directly deliver execution tokens. There are several reasons for this. The most important one is that an execution token cannot be executed directly, because it has no information about the stack diagram of the definition associated with it. More information about this rather complicated subject will be supplied later in connection with a detailed explanation of EXECUTE.

For now, the most interesting thing about DEFINITION is that StrongForth provides another overloaded version of . for it. No, this is not the StrongForth synonym for SEE. It just displays the name and the complete stack diagram of a definition. Here are some examples:

' PARSE-WORD . PARSE-WORD ( -- CADDRESS -> CHARACTER UNSIGNED )  OK
' NUMBER . NUMBER ( CADDRESS -> CHARACTER UNSIGNED -- INTEGER-DOUBLE DATA-TYPE )  OK
' >BODY . >BODY ( DEFINITION -- ADDRESS )  OK

However, the existence of data type DEFINITION does not mean that there are no execution tokens in StrongForth. Items of data type TOKEN are exactly what ANS Forth abbreviates as xt. A definition's execution token can be obtained with the word >TOKEN, but only a few words actually use it. In particular, it is not possible to directly EXECUTE a bare execution token. Actually, you can only execute qualified execution tokens, like items of data type SEARCH-CRITERION. You'll find a detailed explanation of this mechanism at the end of this introduction.

Files

Data type FILE is used for file handles. It is directly derived from data type SINGLE, which prevents arithmetic and logical operations from being applied to file handles. The value returned by SOURCE-ID has data type FILE as well. Two special values of this data type, 0 and -1, indicate that the input source is the user input device or a character string, respectively. All other values are real file handles.

Tightly related to files is data type FAM, which is also directly derived from data type SINGLE. FAM stands for file access method. Words creating and opening files expect a parameter of this data type on the stack. R/O, W/O and R/W are system dependent constants of data type FAM.

Wordlists

WID is an abbreviation for word list identifier. Although this term is first introduced in the ANS Forth Search-Order word set, items of data type WID are already used in StrongForth's Core word set. Even without the Search-Order word set, StrongForth knows three different word lists:

The Forth word list is the usual dictionary, which contains all words that are a part of StrongForth.
The Local word list contains locals, which are dynamically defined during compilation of a new word. Locals are automatically discarded after the definition is finished.
The Environment word list contains one definition for each environment query string. Using a separate word list for environment query strings simplifies the implementation of environment queries and new string definitions.

R-Sizes

R-SIZE is a special data type which is only used by >R and R>. A detailed description follows later in connection with the explanation of the return stack and locals.

Control Flow

Items of data type CONTROL-FLOW are involved whenever control-flow words are executed during compilation:

IF AHEAD ELSE THEN
BEGIN AGAIN UNTIL WHILE REPEAT
CASE OF ENDOF ENDCASE
DO ?DO LOOP +LOOP

The ANS Forth host system manages the compilation of control-flow words using the special data types ORIG, DEST, CASE-SYS, OF-SYS and DO-SYS, which are all direct or indirect subtypes of the ancestor data type SYS. StrongForth, however, has an additional task to accomplish: ensuring data type consistency. For this purpose, a copy of the complete compiler data type heap is created whenever a new program block starts, e. g., at the beginning of an IF branch or a DO loop. An item of data type CONTROL-FLOW is a handle to the saved contents of the compiler data type heap. In order to avoid interference with the data structures created by the host system, these handles must be kept on a dedicated stack. But unless you intend to implement your own control-flow words, you don't need to bother about items of data type CONTROL-FLOW. You won't even see them in the stack diagrams of control-flow words. But you can see what they do whenever you try to compile code containing unbalanced conditional branches or loop bodies. For example, the two branches of an IF ... ELSE ... THEN conditional clause must have exactly the same stack effect with respect to data types:

: MISTAKE ( UNSIGNED -- )
  DUP 5 > IF ." GREATER THAN 5" ELSE . THEN ;
  DUP 5 > IF ." GREATER THAN 5" ELSE . THEN ? data types not congruent
: CORRECT ( UNSIGNED -- )
  DUP 5 > IF ." GREATER THAN 5" DROP ELSE . THEN ;  OK
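The congruence check can be modelled in Python as follows. This is a simplified sketch, not the StrongForth compiler: the compile-time data type stack is copied at the start of each branch (like the saved compiler data type heap), each word is reduced to a hypothetical function describing its effect on the data types, and the data types at the end of both branches are compared.

```python
def run(effects, types):
    """Apply the data type effects of the compiled words in one branch."""
    for effect in effects:
        types = effect(types)
    return types

def string_literal(types):  # ." ..." has no stack effect
    return types

def dot(types):             # . consumes one item
    return types[:-1]

def drop(types):            # DROP consumes one item
    return types[:-1]

def branches_congruent(types_at_if, then_effects, else_effects):
    """Both branches start from a copy of the same saved data types."""
    return run(then_effects, list(types_at_if)) == run(else_effects, list(types_at_if))

# After DUP 5 > IF, the data types on the stack are: UNSIGNED
print(branches_congruent(["UNSIGNED"], [string_literal], [dot]))        # False (MISTAKE)
print(branches_congruent(["UNSIGNED"], [string_literal, drop], [dot]))  # True (CORRECT)
```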

Double Numbers

All members of the DOUBLE branch of the data type structure occupy two cells on the data stack and in memory. This is nothing new in Forth. The big difference between StrongForth and ANS Forth regarding double numbers is the fact that ANS Forth requires special names for words that deal with double numbers, while StrongForth simply overloads the corresponding single-number words. To duplicate a double number, one has to write 2DUP in ANS Forth and DUP in StrongForth. Adding two double numbers is done with D+ in ANS Forth and + in StrongForth, as can be seen in this example:

6000000000. DUP + . 12000000000  OK

. is overloaded as well; in ANS Forth, we would have to write D. instead. Overloading makes programming a lot easier. Actually, the complete StrongForth Double-Number word set consists of overloaded words. Since the interpreter and the compiler know the data types of the items on the data stack, they will always select the proper words.
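What the double-number version of + has to do internally can be sketched in Python, assuming 32-bit cells and the usual Forth layout with the low cell below the high cell. The function names are hypothetical; the point is that the carry out of the low cell must be propagated into the high cell.

```python
CELL = 1 << 32  # assumption: 32-bit cells

def to_cells(ud):
    """Split an unsigned double into (low, high); high is on top of the stack."""
    return ud % CELL, ud // CELL

def from_cells(lo, hi):
    return hi * CELL + lo

def d_plus(a, b):
    """Add two doubles cell by cell, carrying from the low into the high cell."""
    lo = a[0] + b[0]
    carry = lo // CELL
    hi = (a[1] + b[1] + carry) % CELL
    return lo % CELL, hi

x = to_cells(6000000000)
print(x)                          # (1705032704, 1)
print(from_cells(*d_plus(x, x)))  # 12000000000
```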

In analogy to single numbers, StrongForth provides the predefined data types INTEGER-DOUBLE, SIGNED-DOUBLE and UNSIGNED-DOUBLE. The number 6000000000. in the above example is an UNSIGNED-DOUBLE. When prefixed with a positive or negative sign, it is interpreted as SIGNED-DOUBLE.

A new data type is NUMBER-DOUBLE. It is only used between <# and #>, i. e., <# creates an item of this data type, while #> consumes it:

<# ( UNSIGNED-DOUBLE -- NUMBER-DOUBLE )
#> ( NUMBER-DOUBLE -- CADDRESS -> CHARACTER UNSIGNED )

This is an easy way to ensure that these two words are always paired. Since # and #S also work with items of data type NUMBER-DOUBLE, syntax violations will immediately be detected by the compiler. You've already seen a similar technique being used with the control-flow words. As an example using <#, #, #S and #>, here is a possible definition of . for signed double numbers:

: . ( SIGNED-DOUBLE -- )
  DUP 0< SWAP ABS
  <# #S SWAP SIGN #> TYPE SPACE ;

Note that SIGN, unlike in ANS Forth, requires an item of data type FLAG as its input parameter.

Data Types

Another subtype of DOUBLE is DATA-TYPE. An item of data type DATA-TYPE is, well, a data type. Words using items of this data type as input or output parameters are extensively used by the interpreter and the compiler. One of the most obvious applications of DATA-TYPE is NUMBER. Although NUMBER is not a word specified in ANS Forth, most Forth systems have more or less identical versions of it. In StrongForth, NUMBER looks like this:

NUMBER ( CADDRESS -> CHARACTER UNSIGNED -- INTEGER-DOUBLE DATA-TYPE )

The input parameters of NUMBER are the memory address of the first character of a character string and the character count, which together represent a character string. Provided the character string contains a valid number, NUMBER returns its numerical value as an item of data type INTEGER-DOUBLE, and its data type as an item of data type DATA-TYPE. Depending on the character string, the latter can be one of these: SIGNED, UNSIGNED, SIGNED-DOUBLE, and UNSIGNED-DOUBLE. Remember that numbers consisting only of digits are UNSIGNED, while prefixing them with a sign makes them SIGNED. Appending a period makes a number either UNSIGNED-DOUBLE or SIGNED-DOUBLE, depending on the presence of a sign. Let's try it:

PARSE-WORD +5 NUMBER . . SIGNED 5  OK
PARSE-WORD 102030405060708090. NUMBER . . UNSIGNED-DOUBLE 102030405060708090  OK

PARSE-WORD simply gets the next space-delimited word from the input source and returns its address and character count, so NUMBER can directly be applied to PARSE-WORD's output parameters. But there's another interesting detail hidden in this example. Since the data types are displayed in a user-readable form, there must be an overloaded version of . that takes an item of data type DATA-TYPE as input parameter. This word is also used within the definition of .S.
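The classification rules that NUMBER applies can be written down as a Python sketch. It is a model of the rules stated above, not StrongForth's NUMBER: it assumes decimal input only and ignores BASE, and `number` returns a hypothetical (value, type name) pair, or None for an invalid string.

```python
def number(text):
    """Classify a numeric literal by sign and trailing period (decimal only)."""
    double = text.endswith(".")
    body = text[:-1] if double else text
    signed = body[:1] in ("+", "-")
    digits = body[1:] if signed else body
    if not digits or not all(c in "0123456789" for c in digits):
        return None  # not a valid number
    dtype = {
        (False, False): "UNSIGNED", (True, False): "SIGNED",
        (False, True): "UNSIGNED-DOUBLE", (True, True): "SIGNED-DOUBLE",
    }[(signed, double)]
    return int(body), dtype  # int() accepts a leading + or -

print(number("+5"))                   # (5, 'SIGNED')
print(number("102030405060708090."))  # (102030405060708090, 'UNSIGNED-DOUBLE')
```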

DATA-TYPE has itself a subtype called STACK-DIAGRAM. Items of data type STACK-DIAGRAM pass on some tracking information during the creation of a stack diagram between ( and ), for example, the number of basic data types that have so far been added to the stack diagram, and whether input or output parameters are being generated. Although a stack diagram is not a data type, STACK-DIAGRAM has still been made a subtype of DATA-TYPE, because the internal structure of items of these two data types is quite similar.

Tuples

According to the diagram of data types, data type TUPLE is not a subtype of SINGLE or DOUBLE. Does this mean that tuples are neither one nor two cells long? Correct. So, what size does a tuple have? Well, it depends. A tuple is a data type whose size may vary at runtime, and can thus not be determined at compile time. In fact, you can combine an arbitrary number of single-cell and double-cell items in one item of data type TUPLE, and handle them as an entity. The size of a tuple is an attribute depending on the number of cells that have been added to it.

Only a small number of words can be applied to tuples. You can create an empty tuple, add items to it, extract items from it, query its size, and drop it. Other operations may optionally be defined, but they are not included in StrongForth.

What are tuples good for? Primarily, they are required for implementing the ANS Forth words SAVE-INPUT, RESTORE-INPUT, GET-ORDER and SET-ORDER. The stack diagrams of these words in the ANS Forth specification contain the following sequence:

xn ... x1 n

StrongForth cannot deal with stack diagrams that consist of an arbitrary number of parameters. Using a tuple instead of this sequence resolves the problem, because a tuple can be represented by a single well-defined data type. The image of a tuple on the data stack is exactly the same as the above sequence, with an arbitrary number of cells and the count on top of the stack. Thus, the stack diagram of SAVE-INPUT and RESTORE-INPUT can easily be expressed in StrongForth:

SAVE-INPUT ( -- INPUT-SOURCE )           \ ANS: ( -- xn ... x1 n )
RESTORE-INPUT ( INPUT-SOURCE -- FLAG )   \ ANS: ( xn ... x1 n -- flag )

INPUT-SOURCE is a direct subtype of TUPLE. Using INPUT-SOURCE instead of TUPLE enforces that SAVE-INPUT and RESTORE-INPUT are always used in pairs.
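The stack image of a tuple can be modelled in Python with a list used as the data stack (top at the end). This sketch only assumes the layout described above, with the cells first and the count on top; `push_tuple` and `pop_tuple` are hypothetical names. Because the count is always on top, a tuple of any size can be removed as one entity.

```python
def push_tuple(stack, cells):
    """Lay out a tuple: the cells first, then the cell count on top."""
    stack.extend(cells)
    stack.append(len(cells))

def pop_tuple(stack):
    """Remove a whole tuple: read the count on top, then take that many cells."""
    n = stack.pop()
    cells = stack[len(stack) - n:]
    del stack[len(stack) - n:]
    return cells

stack = [7]                    # some unrelated item below the tuple
push_tuple(stack, [1, 2, 3])
print(stack)                   # [7, 1, 2, 3, 3]
print(pop_tuple(stack), stack) # [1, 2, 3] [7]
```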

User-Defined Data Types

33 predefined data types seem like quite a lot. Nevertheless, there will be situations in which you wish to add some more, application-specific data types to the list. A typical situation is when you want to ensure that certain words are always used according to a specific syntax. Typical applications of this technique are the predefined data types NUMBER-DOUBLE and R-SIZE. Or let's assume you want to create an abstract data type with a well-defined set of operations. For example, items of a data type called BCD shall contain BCD coded numbers with the size of two cells. In order to make DUP, DROP, SWAP, OVER, ROT, NIP and TUCK available for the new data type, it makes sense to make it a subtype of data type DOUBLE:

DT DOUBLE PROCREATES BCD  OK

That's all. You now have a new data type called BCD and can start defining arithmetic operations on it, along with an overloaded version of . and any other operations you need. DT DOUBLE returns DOUBLE as an item of data type DATA-TYPE. It works very similarly to the other parsing words ' and CHAR, and StrongForth even provides an immediate variant [DT] to be used during compilation.

In this example, DOUBLE is the parent of BCD. You can determine the parent of any data type with PARENT, and the ancestor with ANCESTOR:

DT BCD PARENT . DOUBLE  OK
DT FLAG PARENT . LOGICAL  OK
DT FLAG ANCESTOR . SINGLE  OK

If you want to define a new data type that does not inherit any operation from an already existing data type, you can define a new ancestor data type by providing PROCREATES with a null data type:

NULL DATA-TYPE PROCREATES QUAD 4 ,  OK

NULL DATA-TYPE is just a shortcut for 0 CAST DATA-TYPE. 4 , tells StrongForth that items of the new ancestor data type have a size of four cells. You thus have created a new data type suitable to hold very large integers. Your next action would typically be to define stack movement words like DUP and DROP for it. But that's left as an exercise for advanced students ...
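PROCREATES, PARENT and ANCESTOR can be modelled with a parent table in Python. This is only a sketch under stated assumptions: `None` stands for the null data type, and the behaviour of `parent` and `ancestor` when applied to an ancestor itself (returning None and the type itself) is assumed, not taken from StrongForth.

```python
# Part of the predefined hierarchy; None marks an ancestor data type
PARENT = {"SINGLE": None, "DOUBLE": None, "INTEGER": "SINGLE",
          "UNSIGNED": "INTEGER", "LOGICAL": "SINGLE", "FLAG": "LOGICAL"}

def procreates(parent, name):
    """Model of PROCREATES: make name a direct subtype of parent (None: new ancestor)."""
    PARENT[name] = parent

def parent(t):
    return PARENT[t]

def ancestor(t):
    """Follow parent links up to the root of t's branch."""
    while PARENT[t] is not None:
        t = PARENT[t]
    return t

procreates("DOUBLE", "BCD")   # DT DOUBLE PROCREATES BCD
procreates(None, "QUAD")      # NULL DATA-TYPE PROCREATES QUAD
print(parent("BCD"), parent("FLAG"), ancestor("FLAG"))  # DOUBLE LOGICAL SINGLE
```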

Stack Diagrams

When experimenting with displaying stack diagrams by using . on definitions, you might have found out that ' always finds the most recent definition in the dictionary that matches the given name. Since many StrongForth words are overloaded, there are typically multiple occurrences of a name in the dictionary. This differs from ANS Forth. To display all overloaded versions of a name, let's define a word that searches the complete dictionary:

: WORDS ( -- )
  LATEST PARSE-WORD LOCALS| COUNT ADDR |
  BEGIN DUP
  WHILE COUNT
     IF DUP ADDR COUNT ROT NAME COMPARE 0= ELSE TRUE THEN
     IF DUP CR . THEN PREV
  REPEAT DROP ;

If WORDS is executed without a name to parse, it simply displays all words in the dictionary. But this is not the semantics we're interested in at this point. So here's a first example that displays all overloaded versions of .:

WORDS .
. ( FLAG -- )
. ( DATA-TYPE -- )
. ( DEFINITION -- )
. ( CHARACTER -- )
. ( SIGNED -- )
. ( SINGLE -- )
. ( SIGNED-DOUBLE -- )
. ( DOUBLE -- )  OK

When experimenting with WORDS, you will almost certainly run into some rather strange stack diagrams that look like these:

WORDS DUP
DUP ( DOUBLE -- 1ST 1ST )
DUP ( SINGLE -- 1ST 1ST )  OK

Looking again at the data type structure, you'll find that 1ST is not one of the predefined data types, nor are 2ND, 3RD and TH in the following examples:

' >NUMBER . >NUMBER ( INTEGER-DOUBLE CADDRESS -> CHARACTER UNSIGNED -- 1ST 2ND 4 TH )  OK
' ACCEPT . ACCEPT ( CADDRESS -> CHARACTER INTEGER -- 3RD )  OK

So, these words obviously must have a special meaning. Let's assume we define XDUP as follows and try it out on an unsigned single number:

: XDUP ( SINGLE -- SINGLE SINGLE ) DUP ;  OK
4 XDUP .S SINGLE SINGLE  OK

Now we have two items of data type SINGLE on the stack instead of two items of data type UNSIGNED. Trying, for example, to add those two items will fail, because + is only defined on INTEGER and ADDRESS, not on SINGLE. That's why we have to use 1ST in the stack diagram of DUP. When interpreting or compiling a word with 1ST as an output parameter, the data type of this parameter will be replaced with the data type of the first actual input parameter:

DROP DROP \ remove the two SINGLEs  OK
4 DUP .S . . UNSIGNED UNSIGNED 4 4  OK
CHAR J DUP .S . . CHARACTER CHARACTER JJ OK
BASE DUP .S . . ADDRESS -> UNSIGNED ADDRESS -> UNSIGNED 4261476 4261476  OK

Now it works as expected. As the last line of the example shows, 1ST also works correctly if the first input parameter has a compound data type. 2ND and 3RD work in a similar way, but reference the second or third basic data type in the input parameter list, respectively. To reference the fourth, fifth, sixth basic data type (and so on), an unsigned number followed by TH has to be used, as in the stack diagram of >NUMBER. This feature is perhaps one of the most important keys to strong static typing in StrongForth. Many words use 1ST, 2ND, 3RD and TH in their stack diagrams.
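The contrast between XDUP and DUP can be sketched as a small Python model. It assumes types are tuples of basic data type names, so ("ADDRESS", "UNSIGNED") stands for ADDRESS -> UNSIGNED, and it models only the reference 1ST, which always denotes the whole first input parameter. The names FIRST, result_types and show are hypothetical.

```python
# Types are tuples of basic data type names: ("ADDRESS", "UNSIGNED")
# stands for the compound type ADDRESS -> UNSIGNED.
FIRST = "1ST"  # the only data type reference this sketch models


def result_types(actual_inputs, declared_outputs):
    """Replace every 1ST in declared_outputs by the first actual input type."""
    return [actual_inputs[0] if out == FIRST else out
            for out in declared_outputs]


def show(types):
    """Render a list of types the way .S prints them."""
    return " ".join(" -> ".join(t) for t in types)


xdup_out = [("SINGLE",), ("SINGLE",)]  # XDUP ( SINGLE -- SINGLE SINGLE )
dup_out = [FIRST, FIRST]               # DUP ( SINGLE -- 1ST 1ST )

print(show(result_types([("UNSIGNED",)], xdup_out)))  # SINGLE SINGLE
print(show(result_types([("UNSIGNED",)], dup_out)))   # UNSIGNED UNSIGNED
print(show(result_types([("ADDRESS", "UNSIGNED")], dup_out)))
# ADDRESS -> UNSIGNED ADDRESS -> UNSIGNED
```

A fixed output type discards what the compiler knew about the operand, while a reference carries it through unchanged.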

You might have noticed a small but important detail in the explanation of 2ND, 3RD and TH. They do not reference the second (or third ...) input parameter, but the second (or third ...) basic data type in the input parameter list of a stack diagram. Why this distinction is necessary becomes clear on closer inspection of the stack diagrams of @:

WORDS @
@ ( CADDRESS -> FLAG -- 2ND )
@ ( CADDRESS -> SIGNED -- 2ND )
@ ( CADDRESS -> SINGLE -- 2ND )
@ ( ADDRESS -> DOUBLE -- 2ND )
@ ( ADDRESS -> SINGLE -- 2ND )  OK

Let's look only at the last one in the list. Although @ has only one input parameter, 2ND references SINGLE, or, more generally, the tail of the compound data type standing for the first input parameter. Thus, when @ is applied to a data address of an unsigned single number, the data type of the output parameter is really that of an unsigned single number. As has been shown in the previous examples with VARIABLE X and VARIABLE Y, this works as expected even if the tail of the referenced input parameter is itself a compound data type.

Another good example is >NUMBER, because this word has quite a lot of parameters:

' >NUMBER . >NUMBER ( INTEGER-DOUBLE CADDRESS -> CHARACTER UNSIGNED -- 1ST 2ND 4 TH )  OK

The first input parameter is of data type INTEGER-DOUBLE, the second one is of data type CADDRESS -> CHARACTER and the third one is of data type UNSIGNED. Only the second input parameter has a compound data type. When the input parameter list is decomposed into basic data types, we get:

INTEGER-DOUBLE
CADDRESS
CHARACTER
UNSIGNED

1ST references the first basic data type, which is INTEGER-DOUBLE and nothing else. 2ND references CADDRESS. But since the basic data type CADDRESS in this input parameter list is the head of a compound data type, 2ND actually references the whole compound data type, namely CADDRESS -> CHARACTER. 3RD would reference the third basic data type, CHARACTER, which is the tail of the second input parameter. Finally, 4 TH references UNSIGNED. UNSIGNED is both the third input parameter and the fourth basic data type within the input parameter list.
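The decomposition and the head/tail rule can be sketched in Python. This is a model under stated assumptions, not StrongForth's implementation: a reference is located in the decomposed declared input list, and the result is then read from the corresponding actual parameter, starting at the same offset. The functions decompose and resolve are hypothetical, and the actual argument types (an unsigned double number for >NUMBER) are only an example.

```python
# A parameter type is a tuple of basic data types: ("CADDRESS", "CHARACTER")
# stands for CADDRESS -> CHARACTER.


def decompose(params):
    """List (parameter index, offset in parameter) for each basic data type."""
    return [(i, k) for i, p in enumerate(params) for k in range(len(p))]


def resolve(ref, declared, actual):
    """Resolve a 1-based data type reference (1ST, 2ND, 3RD, n TH)."""
    i, k = decompose(declared)[ref - 1]
    # A head (k == 0) stands for the whole compound type; a position inside
    # a compound stands for the remaining tail of the actual parameter.
    return actual[i][k:]


# >NUMBER ( INTEGER-DOUBLE CADDRESS -> CHARACTER UNSIGNED -- 1ST 2ND 4 TH )
declared = [("INTEGER-DOUBLE",), ("CADDRESS", "CHARACTER"), ("UNSIGNED",)]
actual = [("UNSIGNED-DOUBLE",), ("CADDRESS", "CHARACTER"), ("UNSIGNED",)]

print(decompose(declared))  # [(0, 0), (1, 0), (1, 1), (2, 0)]
for ref in (1, 2, 3, 4):
    print(ref, " -> ".join(resolve(ref, declared, actual)))
# 1 UNSIGNED-DOUBLE
# 2 CADDRESS -> CHARACTER
# 3 CHARACTER
# 4 UNSIGNED

# @ ( ADDRESS -> SINGLE -- 2ND ) applied to ADDRESS -> ADDRESS -> UNSIGNED
print(" -> ".join(resolve(2, [("ADDRESS", "SINGLE")],
                          [("ADDRESS", "ADDRESS", "UNSIGNED")])))
# ADDRESS -> UNSIGNED
```

The last call shows why 2ND of @ yields a compound type when the fetched item is itself an address.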

Now it should be clear how several other words are defined. Have a look at the common arithmetic operators. As a general rule, the data type of the output parameter is the same as that of the first input parameter, which allows, for example, adding an integer to a character and still having a character on the stack afterwards. This explains why + is not defined as

+ ( INTEGER INTEGER -- INTEGER ) \ wrong!

but as

+ ( INTEGER INTEGER -- 1ST )

The most common application for data type references is in the output parameter list of stack diagrams. But data type references may also be used in the input parameter list, where they have a different meaning. Look at the stack diagrams of the various versions of !:

WORDS !
! ( SINGLE CADDRESS -> 1ST -- )
! ( DOUBLE ADDRESS -> 1ST -- )
! ( SINGLE ADDRESS -> 1ST -- )  OK

Again, it's only the last line we shall investigate. Here, 1ST means that the second input parameter is a data address which points to an item of exactly the same data type as the first input parameter. This is actually a restriction on the interpreter or compiler when trying to find a suitable version of ! in the dictionary. It prevents you from trying to store something at a memory address where it doesn't belong. A simple example might clarify what this means:

CHAR C VARIABLE X  OK
CHAR D X .S CHARACTER ADDRESS -> CHARACTER  OK
!  OK
69 X .S UNSIGNED ADDRESS -> CHARACTER  OK
!
! ? undefined word
UNSIGNED ADDRESS -> CHARACTER

The second ! failed to match, because an unsigned single number may not be stored into a character variable. If for whatever reason you want to do that, you have to use an explicit type cast:

69 CAST CHARACTER X .S CHARACTER ADDRESS -> CHARACTER  OK
!  OK
X @ . E OK
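The effect of 1ST in an input parameter list can be sketched in Python. The type hierarchy below is an assumed, simplified excerpt, and the functions is_a, param_matches and find_store are hypothetical: ordinary declared types accept subtypes, while a trailing 1ST demands that the rest of the address type be exactly the type of the first actual input.

```python
# Assumed, simplified excerpt of the type hierarchy (subtype: parent).
PARENT = {
    "INTEGER": "SINGLE", "UNSIGNED": "INTEGER", "SIGNED": "INTEGER",
    "CHARACTER": "SINGLE", "ADDRESS": "SINGLE",
}


def is_a(actual, declared):
    """True if basic type `actual` is `declared` or one of its subtypes."""
    while actual is not None:
        if actual == declared:
            return True
        actual = PARENT.get(actual)
    return False


def param_matches(declared, actual, first):
    """Match one parameter; a trailing "1ST" demands an exact type."""
    if declared[-1] == "1ST":
        head = declared[:-1]
        return (len(actual) > len(head)
                and all(is_a(a, d) for a, d in zip(actual, head))
                and actual[len(head):] == first)
    return (len(actual) >= len(declared)
            and all(is_a(a, d) for a, d in zip(actual, declared)))


def find_store(actual):
    """Try ! ( SINGLE ADDRESS -> 1ST -- ) against the actual input types."""
    declared = [("SINGLE",), ("ADDRESS", "1ST")]
    first = actual[0]
    return len(actual) == len(declared) and all(
        param_matches(d, a, first) for d, a in zip(declared, actual))


print(find_store([("CHARACTER",), ("ADDRESS", "CHARACTER")]))  # True
print(find_store([("UNSIGNED",), ("ADDRESS", "CHARACTER")]))   # False
```

The second call corresponds to the failed ! above: UNSIGNED is a subtype of SINGLE, but it is not exactly the CHARACTER that X points to.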

Data Type Heaps

To keep track of the data types of the items on the data stack, StrongForth has two data type heaps. Why two? Isn't there just one data stack? Yes, but we need one data type heap for the interpreter and one for the compiler.

The contents of the interpreter's data type heap can be displayed with .S. The items on the data type heap map one-to-one to the items on the data stack. If we have three items on the data stack, we also have three data types on the data type heap, which can be either basic or compound data types. Note that having three items on the data stack does not necessarily mean that they occupy three cells. One or more of them can be double numbers or even tuples, so three cells is just the minimum size three items occupy on the data stack. On the data type heap, three data types occupy a minimum of six cells, because DATA-TYPE is a subtype of DOUBLE. If one or more of them is a compound data type, the data type heap grows even higher. There is no fixed limit except the size allocated for the data type heap.

The interpreter's data type heap is only used by the interpreter. There is no explicit type checking at runtime, because this would cause a tremendous performance penalty. That's the main difference between systems with static and dynamic type checking. Instead of doing dynamic type checking at runtime, StrongForth's compiler does static type checking at compile time. The compiler has its own data type heap, where it keeps the data types of the items that will be on the data stack at runtime. Thus, the compiler data type heap at compile time maps to the data stack at runtime.

Since the interpreter is constantly present during compilation, having two separate data type heaps during compilation is a necessity. Immediate words generally use the interpreter data type heap, because they are immediately executed. All other words are compiled, and use the compiler data type heap. Let's view an example:

: TEST 3 4 .S UNSIGNED UNSIGNED
+ .S UNSIGNED
. .S
;  OK

.S is an immediate word. In interpretation state, it displays the contents of the interpreter data type heap. In compilation state, it displays the contents of the compiler data type heap, as in this example. After two numeric literals have been compiled, the compiler data type heap contains the data type UNSIGNED twice. + is not immediate. The compiler finds a version of + that accepts two unsigned single integers, and compiles it. It also updates the compiler data type heap by replacing the data types corresponding to the input parameters of + with the data type that corresponds to +'s output parameter, which is UNSIGNED. . is also non-immediate. The compiler finds a version suitable for an unsigned single number and removes the data type of its input parameter from the compiler data type heap. Since . has no output parameter, the compiler data type heap is now empty. ; is immediate. Before compiling its runtime semantics, it checks that the contents of the compiler data type heap match the assumed output parameter list of TEST. Both are empty, so there is no error.
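The bookkeeping for TEST can be replayed with a small Python model of the compiler data type heap. It is a sketch under simplifying assumptions: each word is given a single stack diagram instead of being looked up among overloaded versions, only basic types are modelled, and the hierarchy is an assumed excerpt. compile_word and end_definition are hypothetical names.

```python
# Assumed, simplified excerpt of the type hierarchy (subtype: parent).
PARENT = {"UNSIGNED": "INTEGER", "SIGNED": "INTEGER", "INTEGER": "SINGLE"}


def is_a(t, declared):
    """True if basic type t is `declared` or one of its subtypes."""
    while t is not None:
        if t == declared:
            return True
        t = PARENT.get(t)
    return False


def compile_word(heap, name, inputs, outputs):
    """Type-check one compiled word and update the compiler type heap."""
    n = len(inputs)
    if len(heap) < n:
        raise TypeError(f"{name} ? undefined word")
    actual = heap[len(heap) - n:]
    if not all(is_a(a, d) for a, d in zip(actual, inputs)):
        raise TypeError(f"{name} ? undefined word")
    del heap[len(heap) - n:]
    # Data type reference 1ST takes the type of the first actual input.
    heap.extend(actual[0] if o == "1ST" else o for o in outputs)


def end_definition(heap, outputs):
    """What ; checks: the remaining heap equals the declared outputs."""
    if heap != list(outputs):
        raise TypeError("; stack effect does not match the stack diagram")


heap = []                                 # : TEST  (no stack diagram)
heap += ["UNSIGNED", "UNSIGNED"]          # 3 4
compile_word(heap, "+", ["INTEGER", "INTEGER"], ["1ST"])
print(heap)                               # ['UNSIGNED']
compile_word(heap, ".", ["UNSIGNED"], [])
print(heap)                               # []
end_definition(heap, [])                  # ; succeeds
```

No values are involved at any point: the compiler works purely on types, which is why the same check costs nothing at runtime.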

Here's a second example:

: COUNTER ( UNSIGNED -- ) 0 DO I . LOOP ;  OK
10 COUNTER 0 1 2 3 4 5 6 7 8 9  OK

By default, a new definition is assumed to have no stack effect. This time, we have specified an explicit stack diagram. ) initializes the compiler data type heap with one item of data type UNSIGNED, so compilation starts with this item. Compiling 0 adds another UNSIGNED, and the runtime code of DO consumes both of them. I pushes UNSIGNED onto the compiler data type heap, and . consumes it. Before compiling its own runtime semantics, LOOP checks that the contents of the compiler data type heap are the same as they were after DO was executed. Finally, ; checks the congruence between the compiler data type heap and the output parameter list of COUNTER.
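The checks made while compiling COUNTER can be sketched as a Python class. This is a hypothetical model, not StrongForth code: types are compared for exact equality (the real compiler also accepts subtypes), and DO is assumed to consume exactly two UNSIGNED items.

```python
class LoopCompiler:
    """Hypothetical model of the compile-time checks made for COUNTER."""

    def __init__(self, inputs, outputs):
        self.heap = list(inputs)   # ) seeds the heap with the input types
        self.outputs = list(outputs)
        self.snapshots = []        # heap states recorded by DO

    def push(self, t):
        self.heap.append(t)

    def pop(self, expected):
        if not self.heap or self.heap[-1] != expected:
            raise TypeError(f"expected {expected} on heap {self.heap}")
        self.heap.pop()

    def do(self):
        self.pop("UNSIGNED")       # start index
        self.pop("UNSIGNED")       # limit
        self.snapshots.append(list(self.heap))

    def loop(self):
        if self.heap != self.snapshots.pop():
            raise TypeError("LOOP: heap differs from its state after DO")

    def semicolon(self):
        if self.heap != self.outputs:
            raise TypeError("; heap does not match the output parameters")


c = LoopCompiler(["UNSIGNED"], [])   # : COUNTER ( UNSIGNED -- )
c.push("UNSIGNED")                   # 0
c.do()                               # DO
c.push("UNSIGNED")                   # I
c.pop("UNSIGNED")                    # .
c.loop()                             # LOOP
c.semicolon()                        # ;  (no exception: compilation succeeds)
```

Leaving out . in the loop body would make LOOP raise, because the loop body would then change the stack on every iteration.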

That's what happens on the compiler data type heap. But what about the interpreter data type heap? We can easily watch it with .S by temporarily switching to interpretation state:

: COUNTER [ .S ] COLON-SYS
( UNSIGNED -- ) [ .S ] COLON-SYS
0 DO [ .S ] COLON-SYS DO-SYS CONTROL-FLOW
I . LOOP [ .S ] COLON-SYS
;  OK

COLON-SYS, which : pushes onto the interpreter data type heap, identifies the current definition. DO pushes two more items onto the data stack and the interpreter data type heap, which are supposed to contain information for LOOP or +LOOP. Both are actually consumed by LOOP, and ; finally consumes COLON-SYS. If we had tried to execute ; before LOOP, the interpreter would not have found it in the dictionary, because ; requires its input parameter COLON-SYS to be on top of the stack.

Other Differences from ANS Forth

This section is a summary of the main differences between ANS Forth and StrongForth, as far as they have not been mentioned yet. These are generally things a beginner should keep in mind when writing applications in StrongForth or when porting programs from ANS Forth to StrongForth.

Comments

Since the ANS Forth word ( is now used to initiate a stack diagram, it is no longer available for starting a comment. Operator overloading doesn't help here, because the ANS version of ( does not have any parameters that could distinguish the two words. But we are lucky. The ANS Forth Core Extensions word set specifies the word \ to skip the rest of the line.

\ has the same semantics in StrongForth, with one small extension: the comment ends before the end of the line if a second \ is parsed. Of course, this is not exactly the same as the ANS Forth semantics, but it's pretty close. Using different words than ( and ) for stack diagrams, for example braces, would look even stranger. Thus you can write:

3 4 + . \ this does not require a comment 7  OK
' BASE >BODY -> SIGNED \ address of first parameter \ @ ,  OK
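The comment rule can be sketched as a Python function that splits a source line into blank-delimited words and drops comments. The assumption, labelled in the code as well, is that after the word \ everything up to and including the next backslash character is skipped, or the rest of the line if there is none; strip_comments is a hypothetical name, and only spaces are treated as delimiters.

```python
BACKSLASH = "\\"


def strip_comments(line):
    """Return the blank-delimited words of a line with comments removed.

    Assumption: after the word BACKSLASH, everything up to and including the
    next backslash character (or up to the end of the line) is skipped.
    """
    words = []
    pos, n = 0, len(line)
    while pos < n:
        while pos < n and line[pos] == " ":   # skip leading blanks
            pos += 1
        start = pos
        while pos < n and line[pos] != " ":   # read one word
            pos += 1
        word = line[start:pos]
        if word == BACKSLASH:
            end = line.find(BACKSLASH, pos)   # a second backslash ends it
            pos = n if end < 0 else end + 1
        elif word:
            words.append(word)
    return words


print(strip_comments("3 4 + . \\ this does not require a comment"))
# ['3', '4', '+', '.']
print(strip_comments("' BASE >BODY -> SIGNED \\ address of first parameter \\ @ ,"))
# ["'", 'BASE', '>BODY', '->', 'SIGNED', '@', ',']
```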

Integer Arithmetic

In ANS Forth, words performing arithmetic often exist in a standard version for signed single numbers and in modified versions for unsigned single numbers and for double numbers. The version for unsigned single numbers is generally prefixed with U, and the version for double numbers is prefixed with D. Since StrongForth allows operator overloading, these prefixes are no longer required. All versions share the same name, as in this example:

WORDS <
< ( SIGNED-DOUBLE 1ST -- FLAG )
< ( INTEGER-DOUBLE 1ST -- FLAG )
< ( ADDRESS 1ST -- FLAG )
< ( SIGNED 1ST -- FLAG )
< ( INTEGER 1ST -- FLAG )  OK

< actually replaces the ANS Forth words <, U<, D<, and DU<. In many cases, overloaded versions are even provided for arithmetic words for which ANS Forth does not specify unsigned and double versions. A good example is /:

WORDS /
/ ( UNSIGNED-DOUBLE UNSIGNED -- 1ST )
/ ( SIGNED SIGNED -- 1ST )
/ ( UNSIGNED UNSIGNED -- 1ST )  OK

Have a look at StrongForth's dictionary to find out about overloaded arithmetic words.

Address Arithmetic

Address arithmetic is strongly restricted in StrongForth. It is generally not possible to perform multiplication, division or negation on addresses, because that almost always yields meaningless results. Neither is it allowed to add two addresses, or to add an address to an integer (that is, with the address on top of the stack). So the only permitted addition involving addresses is adding an integer to an address.

Subtraction is somewhat different. Of course, it is allowed to subtract an integer from an address, but it is also possible to subtract two addresses, giving an item of data type SIGNED:

- ( ADDRESS 1ST -- SIGNED )

Another interesting feature of StrongForth is operand size adaptation. Whenever an integer is added to an address that explicitly points to a single or double cell, the integer is automatically multiplied by the number of address units per single or double cell before the actual addition takes place. For example, it is no longer necessary to explicitly use CELLS before using an integer as an offset into an array of single numbers, as the following example demonstrates:

CREATE ARRAY ( -- ADDRESS -> UNSIGNED ) 3 , 9 , 27 , 81 , 243 ,  OK
ARRAY @ . 3  OK
ARRAY 1+ @ . 9  OK
ARRAY 3 + @ . 81  OK

This feature works not only with + and -, but with all words doing address arithmetic, like +!, 1+, 1-, +LOOP, and so on. This can easily be seen by looking at all variants of a word:

WORDS +
+ ( INTEGER-DOUBLE SIGNED -- 1ST )
+ ( INTEGER-DOUBLE INTEGER -- 1ST )
+ ( INTEGER-DOUBLE INTEGER-DOUBLE -- 1ST )
+ ( CADDRESS INTEGER -- 1ST )
+ ( ADDRESS -> DOUBLE INTEGER -- 1ST )
+ ( ADDRESS -> SINGLE INTEGER -- 1ST )
+ ( ADDRESS INTEGER -- 1ST )
+ ( INTEGER INTEGER -- 1ST )  OK

Here, the different versions of + for data types ADDRESS, ADDRESS -> SINGLE and ADDRESS -> DOUBLE show that these data types are not treated the same way. There's even a version for adding an integer to a character address. Note also that the version for plain addresses comes last. This is obviously the default, which only applies when the previous versions of + (those that were compiled later) failed to match the data types of the items on the data stack.
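Why the order matters can be sketched with a Python model of the dictionary search over four of the versions of + listed above. The hierarchy (including ADDRESS as a subtype of SINGLE) is an assumed simplification, and is_a, param_matches and find are hypothetical helpers; a declared parameter matches if its basic types match the leading basic types of the actual parameter.

```python
# Assumed, simplified excerpt of the type hierarchy (subtype: parent).
PARENT = {"INTEGER": "SINGLE", "UNSIGNED": "INTEGER", "SIGNED": "INTEGER",
          "ADDRESS": "SINGLE"}


def is_a(t, declared):
    """True if basic type t is `declared` or one of its subtypes."""
    while t is not None:
        if t == declared:
            return True
        t = PARENT.get(t)
    return False


def param_matches(declared, actual):
    """Declared basic types must match the leading actual ones."""
    return (len(actual) >= len(declared)
            and all(is_a(a, d) for a, d in zip(actual, declared)))


# Four of the versions of + shown above, newest (last compiled) first.
PLUS = [
    ("+ ( ADDRESS -> DOUBLE INTEGER -- 1ST )", [("ADDRESS", "DOUBLE"), ("INTEGER",)]),
    ("+ ( ADDRESS -> SINGLE INTEGER -- 1ST )", [("ADDRESS", "SINGLE"), ("INTEGER",)]),
    ("+ ( ADDRESS INTEGER -- 1ST )", [("ADDRESS",), ("INTEGER",)]),
    ("+ ( INTEGER INTEGER -- 1ST )", [("INTEGER",), ("INTEGER",)]),
]


def find(versions, actual):
    """Return the first version, newest first, whose inputs match."""
    for diagram, declared in versions:
        if len(declared) == len(actual) and all(
                param_matches(d, a) for d, a in zip(declared, actual)):
            return diagram
    return None


print(find(PLUS, [("ADDRESS", "UNSIGNED"), ("UNSIGNED",)]))
# + ( ADDRESS -> SINGLE INTEGER -- 1ST )
print(find(PLUS, [("ADDRESS",), ("UNSIGNED",)]))
# + ( ADDRESS INTEGER -- 1ST )
print(find(PLUS, [("UNSIGNED",), ("ADDRESS",)]))
# None: adding an address to an integer is not allowed
```

If the plain ADDRESS version were searched first, it would also accept ADDRESS -> UNSIGNED and the offset would never be scaled.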

By the way, with this address arithmetic feature at hand, CELL+ and CHAR+ are no longer required. Both have been replaced by appropriate versions of 1+. A nice side effect is that 1- provides the semantics of CELL- or CHAR-. These words are not even specified in ANS Forth, but might be useful anyway.

Double-Semantic Words

The ANS Forth words 2!, 2@, 2DROP, 2DUP, 2OVER, 2SWAP and 2ROT have two different semantics. First, they perform the semantics of !, @, DROP, DUP, OVER, SWAP and ROT on double numbers. Second, they apply these words to pairs of single numbers. In StrongForth, double numbers are not just pairs of single numbers. Instead, double numbers are separate data types, which generally cannot be interpreted as two single numbers.

The double number semantics of the words mentioned above is implemented in StrongForth by overloading the respective words for single numbers. That's obvious. Keeping 2!, 2@, 2DROP, 2DUP, 2OVER, 2SWAP and 2ROT for pairs of single numbers was considered, but ultimately rejected. They can easily be defined anyway. Here are two pretty obvious examples:

: 2DUP ( SINGLE SINGLE -- 1ST 2ND 1ST 2ND )
  OVER OVER ;

: 2DROP ( SINGLE SINGLE -- )
  DROP DROP ;

Words With Ambiguous Stack Effects

Strong static typing works only if the interpreter and the compiler know the precise stack effect of each word. However, this requirement conflicts with some ANS Forth words, namely

?DUP
PICK and ROLL
ENVIRONMENT?
FIND
CATCH

The ANS Forth stack diagram of ?DUP is

( x -- 0 | x x )

The stack effect of ?DUP depends on its input parameter at runtime. But since the value of this input parameter is generally not known at compile time, the compiler wouldn't be able to continue after compiling ?DUP. As a result, ?DUP cannot be made available in StrongForth. The most frequent applications of ?DUP are immediately before IF, UNTIL, and WHILE. In the case of IF, using ?DUP can make an ELSE branch unnecessary. So the only penalty in StrongForth is having to add the ELSE branch in which the superfluous item is dropped. Another possible solution is to use ?IF as a direct replacement for ?DUP IF:

: ?IF ( -- ORIG CONTROL-FLOW )
  POSTPONE DUP
  POSTPONE 0=
  POSTPONE IF
  POSTPONE DROP
  POSTPONE ELSE ; IMMEDIATE  OK
: TEST ( UNSIGNED -- )
  ?IF . ." is a non-zero number" THEN ;  OK
3 TEST 3 is a non-zero number OK
0 TEST  OK

The case of PICK and ROLL is different, but the reasoning is similar: since the compiler does not know the value of the input parameter at compile time, it has no means to determine the stack effect of these two words. Like ?DUP, these two words are not available in StrongForth. This is not considered especially regrettable, because the need to access items buried deep in the stack almost always results from bad factoring. If it cannot be avoided, using locals is an alternative to messing around with PICK and ROLL.

In ANS Forth, the data types returned by ENVIRONMENT? depend on the contents of the query string. For a strongly typed system, ENVIRONMENT? is definitely not the ideal means of doing an environment query. Nevertheless, ENVIRONMENT? has been implemented in StrongForth. This is the stack diagram:

ENVIRONMENT? ( CADDRESS -> CHARACTER UNSIGNED -- ADDRESS FLAG )

If the query string exists, FLAG is TRUE and ADDRESS is the address of the value of the queried environment parameter. After casting ADDRESS to a more specific data type, like ADDRESS -> UNSIGNED for the /PAD environment parameter, the actual value of the parameter can be fetched. If the query string does not exist, FLAG is FALSE and ADDRESS is undefined. Here's a small example:

PARSE-WORD /PAD ENVIRONMENT? . -> UNSIGNED @ . TRUE 256  OK
PARSE-WORD /XYZ ENVIRONMENT? . . FALSE 0  OK

FIND has been replaced by SEARCH-ALL, which has an unambiguous stack diagram. For details please see the separate section on FIND and SEARCH-ALL below.

In ANS Forth, the stack effect of CATCH depends on whether an exception is thrown at runtime:

( i*x xt -- j*x 0 | i*x n ) \ ANS Forth

Of course, for the sake of data type consistency, this had to be changed. StrongForth's version of CATCH always has the same stack effect, depending only on the stack effect of the word whose execution token is given:

( i*x xt -- j*x n ) \ StrongForth

Actually, CATCH is an immediate state-smart word that calculates the resulting stack depth in advance. For details, please see the StrongForth.f glossary.

Return Stack And Locals

Due to the potential dangers of direct return stack manipulation, the usage of >R, R@ and R> is strongly restricted in StrongForth. R@ is actually a local that is created by >R and removed by R>. Thus, >R and R> are immediate words, to be used only during compilation.

Furthermore, >R and R> have to be used pairwise with respect to control-flow structures like IF ... ELSE ... THEN, DO ... LOOP and BEGIN ... UNTIL. For example,

... IF ... >R ... THEN ... R> ...

is not possible, while

... IF ... >R ... R> ... THEN ...

as well as

... >R ... IF ... THEN ... R> ...

are allowed. To implement this restriction, >R leaves an item of a special data type, R-SIZE, on the stack, which is consumed by R>. A disadvantage of prohibiting any direct access to the return stack is that some special techniques like backtracking cannot be implemented in the usual easy (and non-portable) way.
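The pairing rule can be sketched in Python as a compile-time stack check. This is a simplified model: IF is assumed to leave a single ORIG item (in StrongForth it leaves ORIG CONTROL-FLOW), only IF/THEN and >R/R> are modelled, and well_nested is a hypothetical name. Each closing word requires its own marker on top, so the constructs must nest.

```python
# What each word leaves on, or expects on top of, the interpreter's
# compile-time stack (simplified: IF really leaves ORIG CONTROL-FLOW).
OPENS = {"IF": "ORIG", ">R": "R-SIZE"}
CLOSES = {"THEN": "ORIG", "R>": "R-SIZE"}


def well_nested(source):
    """True if every >R ... R> and IF ... THEN pair nests properly."""
    stack = []
    for word in source.split():
        if word in OPENS:
            stack.append(OPENS[word])
        elif word in CLOSES:
            if not stack or stack[-1] != CLOSES[word]:
                return False   # the item on top belongs to another construct
            stack.pop()
    return not stack           # nothing may be left open


print(well_nested("... IF ... >R ... THEN ... R> ..."))  # False
print(well_nested("... IF ... >R ... R> ... THEN ..."))  # True
print(well_nested("... >R ... IF ... THEN ... R> ..."))  # True
```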

However, even though >R, R@ and R> are present in StrongForth, their usage is discouraged in favour of locals. Locals are a much more powerful tool, especially since StrongForth provides single-cell as well as double-cell locals, which can be mixed arbitrarily within a LOCALS| ... | phrase. Locals can have descriptive names and do not need to be explicitly removed before exiting the definition. Furthermore, the contents of locals can be changed at any time by using TO. The only drawback is that locals have to be defined in one block near the beginning of a definition.

The advantages of using locals over direct return stack access have led to the decision to implement not only R@, but also the loop indices I and J as locals. This is why they cannot be found in StrongForth's dictionary. They are simply created dynamically as locals and are automatically removed by LOOP or +LOOP.

Strings

ANS Forth supports two kinds of character strings: strings represented by the address of their first character and the character count, and so-called counted strings. Since the use of counted strings is discouraged by the ANS standard, the decision was made to get rid of them entirely in StrongForth. Counted strings are still part of ANS Forth for historical reasons. StrongForth has no such history and is therefore not bound to existing programming techniques. Porting existing programs to StrongForth will require several modifications to the source code anyway, so the additional effort of replacing counted strings seems tolerable.

Only a few words are affected:

COUNT has been removed without any replacement.
FIND has been replaced by SEARCH-ALL, which expects two separate input parameters (CADDRESS -> CHARACTER UNSIGNED) instead of a single counted string.
C" has been removed, and S" has been renamed to ".
WORD has been replaced by PARSE from the ANS Forth Core Extension word set, and PARSE-WORD as described in paragraph A.6.2008 of the ANS Forth standard. PARSE-WORD is more or less a replacement for the frequently used sequence BL WORD COUNT.

EXECUTE

EXECUTE is one of those words that produce stack effects which are normally not known at compile time. Simply removing EXECUTE is not an option, because this would deprive StrongForth of one of the most powerful features of Forth.

The StrongForth interpreter uses an internal word named (EXECUTE) to execute words. (EXECUTE) expects an execution token as input parameter. It does exactly what EXECUTE does in ANS Forth, i.e., it executes a word without regard to its stack effect. But using (EXECUTE) directly in StrongForth would almost certainly corrupt the data type system. What is required is a version of EXECUTE that has the runtime semantics of (EXECUTE) and takes care of the stack effect already at compile time. But this is difficult, because the runtime value of an execution token is generally not known at compile time. How can it be accomplished?

As usual, the data type system provides a solution. Although the compiler does not know the runtime value of the execution token, it should know the stack effect of the word associated with it. For each stack effect of the words to be executed, a separate subtype of TOKEN and a separate version of EXECUTE have to be created. This is what )PROCREATES does:

( UNSIGNED -- FLAG )PROCREATES (U--F)  OK
LATEST PREV . (U--F) ( STACK-DIAGRAM -- 1ST )  OK
LATEST . EXECUTE ( UNSIGNED (U--F) -- FLAG )  OK

(U--F) is a direct subtype of TOKEN and is called a qualified token. In order to create execution tokens of data type (U--F), which can be executed with the specialized version of EXECUTE, you can use ?TOKEN:

' ?TOKEN . ?TOKEN ( DATA-TYPE -- TOKEN )  OK
DT (U--F) ?TOKEN 0= .S TOKEN  OK
5 SWAP CAST (U--F) EXECUTE . FALSE  OK

?TOKEN expects an item of data type DATA-TYPE on the stack, which is supposed to be a subtype of TOKEN. It parses the input stream for the name of a word and tries to find a definition with this name, whose stack diagram matches the one represented by the data type. In this example, ?TOKEN tries to find an (overloaded) version of 0= that can be applied to the stack diagram associated with (U--F). ?TOKEN throws an exception if it doesn't find a word with the given name and a suitable stack diagram in the dictionary. After casting this token to the data type of the qualified token, it can be executed by EXECUTE. Note that the specialized versions of EXECUTE can be used during compilation as well.

Only one qualified token is predefined by StrongForth: items of data type SEARCH-CRITERION are used to pass additional search criteria to SEARCH and SEARCH-ALL, the words that perform the dictionary search in StrongForth. See the next section for details.

FIND

FIND is not available in StrongForth. As a replacement, you have to use SEARCH-ALL:

SEARCH-ALL ( CADDRESS -> CHARACTER UNSIGNED SINGLE SEARCH-CRITERION -- DEFINITION SIGNED )

It is actually not easy to compare the stack diagram of SEARCH-ALL with the original ANS Forth stack diagram of FIND:

( c-addr -- c-addr 0  |  xt 1  |  xt -1 )

These are the differences:

The counted string c-addr has been replaced with a string that is represented by the address CADDRESS -> CHARACTER of the first character and the character count UNSIGNED. As you already know, counted strings are no longer supported by StrongForth.
SEARCH-ALL has two additional input parameters of data types SINGLE and SEARCH-CRITERION. These are necessary in order to distinguish between different search criteria. SEARCH-CRITERION is the qualified token of a word that implements additional search criteria, and SINGLE is an optional parameter for it. For example, if the execution token of ONLY-NAME is provided as SEARCH-CRITERION, SEARCH-ALL just looks for the first occurrence of a word whose name matches the string supplied to it. The value of SINGLE doesn't matter in this case. This corresponds to the semantics of the ANS Forth word FIND, because no additional search criteria are applied. If SEARCH-CRITERION is the execution token of MATCH, and SINGLE is zero, SEARCH-ALL performs an additional input parameter match on the contents of the interpreter or compiler data type heap. This is the search criterion used by the StrongForth interpreter. Several other search criteria are available and are described in the StrongForth glossary.
Instead of an execution token, SEARCH-ALL returns an item of data type DEFINITION. This item allows accessing the word's stack diagram and all other attributes, including the execution token.
Unlike FIND's stack diagram, that of SEARCH-ALL does not depend on the search result. If SEARCH-ALL is not successful, it returns zero as DEFINITION and zero as SIGNED.
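The role of the search criterion can be sketched in Python, with the criterion passed as a callable alongside its SINGLE argument. Several things here are assumptions: the names only_name, make_match and search_all are hypothetical stand-ins for ONLY-NAME, MATCH and SEARCH-ALL; the match compares types for exact equality instead of accepting subtypes; and the returned SIGNED is assumed to follow FIND's convention of 1 for immediate words and -1 otherwise. The point shown is that the result always has the same shape, with (0, 0) for "not found".

```python
from dataclasses import dataclass


@dataclass
class Definition:
    name: str
    inputs: tuple           # input types, simplified to basic type names
    immediate: bool = False


def only_name(definition, single):
    """Search criterion: accept any definition; single is ignored."""
    return True


def make_match(heap):
    """Search criterion: inputs must equal the top of the given type heap."""
    def match(definition, single):
        n = len(definition.inputs)
        return n <= len(heap) and tuple(heap[len(heap) - n:]) == definition.inputs
    return match


def search_all(dictionary, name, single, criterion):
    """Return (definition, 1 or -1), or (0, 0) if nothing is found."""
    for definition in reversed(dictionary):     # newest definition first
        if definition.name == name and criterion(definition, single):
            return definition, (1 if definition.immediate else -1)
    return 0, 0


dictionary = [
    Definition("DUP", ("SINGLE",)),
    Definition("DUP", ("DOUBLE",)),
    Definition(".", ("UNSIGNED",)),
    Definition(".", ("CHARACTER",)),
]
d, flag = search_all(dictionary, "DUP", 0, only_name)
print(d.inputs, flag)                               # ('DOUBLE',) -1
d, flag = search_all(dictionary, ".", 0, make_match(["UNSIGNED"]))
print(d.inputs, flag)                               # ('UNSIGNED',) -1
print(search_all(dictionary, "XYZ", 0, only_name))  # (0, 0)
```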

MOVE

The StrongForth word MOVE exists in three overloaded versions:

WORDS MOVE
MOVE ( CADDRESS -> SINGLE CADDRESS -> 2ND UNSIGNED -- )
MOVE ( ADDRESS -> DOUBLE ADDRESS -> 2ND UNSIGNED -- )
MOVE ( ADDRESS -> SINGLE ADDRESS -> 2ND UNSIGNED -- )  OK

MOVE can copy single-cell, double-cell and character-size items. Applying 2* to the count when moving double cells is not required, because MOVE knows the size of the items to be moved. Since MOVE also allows moving character-size items, it can replace CMOVE and CMOVE> from the ANS Forth String word set in most cases. These two words are therefore not included in StrongForth.

DEPTH

Unlike in ANS Forth, StrongForth's version of DEPTH delivers the height of the data type heap. More accurately, it delivers the number of basic data types on the data type heap, with compound data types counting as two or more basic data types. STATE determines whether the interpreter or the compiler data type heap is meant. Information about the data type heap is considered more important than information about the data stack, because the depth of the data stack can be calculated from the contents of the data type heap, but not the other way around. Since the interpreter updates the data type heap before executing a word, the number DEPTH returns also counts DEPTH's own result if DEPTH is executed by the interpreter. Here's an example:

7 BASE DEPTH .S UNSIGNED ADDRESS -> UNSIGNED UNSIGNED  OK
. 4  OK
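The arithmetic behind this result can be reproduced with a short Python model, which also shows how the data stack depth in cells follows from the data type heap. The cell sizes in CELLS are assumptions, and the model assumes that the head of a compound type decides the size of the stack item (an ADDRESS -> UNSIGNED is a one-cell address). depth and stack_cells are hypothetical names.

```python
# Assumed cell sizes of a few basic data types, used only for the head of a
# compound type (the head decides how large the stack item is).
CELLS = {"UNSIGNED": 1, "ADDRESS": 1, "UNSIGNED-DOUBLE": 2}


def depth(heap):
    """DEPTH in this model: count basic data types, compound ones included."""
    return sum(len(t) for t in heap)


def stack_cells(heap):
    """Data stack depth in cells, derived from the data type heap."""
    return sum(CELLS[t[0]] for t in heap)


heap = [("UNSIGNED",), ("ADDRESS", "UNSIGNED")]   # after 7 BASE
heap.append(("UNSIGNED",))   # interpreter adds DEPTH's result type first
print(depth(heap))           # 4
print(stack_cells(heap))     # 3
```

The reverse direction is impossible: three cells on the data stack could equally be three single numbers or one double number plus one single number.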

File-Access Word Set

Many words from the File-Access word set have been renamed in StrongForth. In particular, FILE has been dropped as a prefix or suffix from the names of words, because a system supporting operator overloading does not need to indicate in the name of a word which kinds of operands it can be applied to. For example, if an item of data type FILE is on top of the stack when interpreting or compiling SIZE, StrongForth will assume that the size of the given file, and not the size of a data type or of a tuple, is to be determined. It is not necessary to distinguish the three versions of SIZE by their names.

WORDS SIZE
SIZE ( FILE -- UNSIGNED-DOUBLE SIGNED )
SIZE ( DATA-TYPE -- UNSIGNED )
SIZE ( TUPLE -- 1ST UNSIGNED )  OK

For the same reason, the ANS Forth words INCLUDE-FILE and INCLUDED have both been renamed to INCLUDE. StrongForth can easily distinguish the two words by their input parameters and thus allows giving them the same name.

Other Obsolete Words

Several words from the ANS Forth Core Extension word set have not been implemented in StrongForth, because their use is discouraged. They are included in ANS Forth for compatibility reasons only. Compatibility with older versions is obviously not an issue in StrongForth. Thus, the following words do not exist in StrongForth:

CONVERT
EXPECT
QUERY
SPAN

Dr. Stephan Becher - February 2nd, 2009