#* 

ABOUT 

   translate.java.pss 

   Translate nom scripts into stand-alone compilable java code.

   This is a nom script which translates nom scripts into java
   code, using the 'pep' tool. The script creates a standalone 
   compilable java program.
   
   The virtual machine and engine are implemented in plain C at
   http://bumble.sf.net/books/pars/object/pep.c  This implements a script
   language with a syntax reminiscent of [sed] and awk (more similar
   to sed than to [awk]).
   
   This code was originally created in a straightforward manner by adapting
   the code in 'translate.js.pss', which compiles scripts to javascript.

NOTES
   
   We use labelled loops and break/continue to implement the 
   parse> label and the .reparse and .restart commands. Breaks are also
   used to implement the quit and bail commands.
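
   The control flow described above can be sketched in plain java. This
   is a toy, not the code the translator emits. The labels script, lex
   and parse follow the "continue script;", "break lex;" and
   "continue parse;" strings generated further down. The class name
   LabelledLoops, the List used as a stack and the reduction rules
   (x x -> y, y y -> z) are made up for illustration.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   class LabelledLoops {
     // Toy machine: reads one char per cycle, pushes it as a token, then
     // reduces "x x" to "y" and "y y" to "z" in the parse phase.
     static List<String> run(String input) {
       List<String> stack = new ArrayList<>();
       int pos = 0;
       script:
       while (true) {
         if (pos >= input.length()) break script;  // end of input
         char c = input.charAt(pos++);             // like read;
         lex: {
           if (c == ' ') continue script;          // like .restart
           if (c == '.') break script;             // like quit;
           stack.add(String.valueOf(c));           // like push;
           break lex;                              // like .reparse before parse>
         }
         // parse>
         parse:
         while (true) {
           int n = stack.size();
           if (n >= 2 && stack.get(n - 1).equals(stack.get(n - 2))) {
             String top = stack.get(n - 1);
             String reduced = top.equals("x") ? "y" : top.equals("y") ? "z" : null;
             if (reduced != null) {
               stack.remove(n - 1);
               stack.remove(n - 2);
               stack.add(reduced);
               continue parse;                     // like .reparse after parse>
             }
           }
           break parse;                            // no more reductions
         }
       }
       return stack;
     }
   }
   ```

   Each behaviour in the toy maps to one command:
   - A space restarts the outer loop without reaching the parse section, like .restart.
   - A '.' leaves every loop, like quit.
   - A chain of reductions keeps re-entering the parse loop, like .reparse. So "x x x x" ends as the single token z.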

TODO

   Convert grammar to the one used in translate.perl.pss
   which is much simpler and allows bracket expression tests.

   Implement the "write <filename>;" and "append <filename>;"
   commands (which are already implemented in the pep interpreter).

   Convert the parsing code to a method which takes an input
   stream as a parameter. This way the same parser/compiler 
   can be used with a string/file/stdin etc and can also be 
   used by other classes/objects.

SEE ALSO
   
   At http://bumble.sf.net/books/pars/

   - /tr/nom.toperl.pss
     A similar script for compiling scripts into the perl language.
     The perl translator is more recent and much better than this java
     translator. All translation scripts in the /tr/ folder with
     the name nom.to<lang>.pss are more recent and much better organised
     than the scripts whose name is translate.<lang>.pss (although the 
     new scripts also have a symlink to the old style name).
   - translate.py.pss
     A script translator for python.
   - /eg/nom.to.pss
     A nom script that can translate any script into any available language
     with a plain English prompt (limited). eg:
     >> pep -f nom.to.pss -i "translate palindrome.pss to ruby"

TESTING

   * testing the multiple escaped until bug
   >> pep.jas 'r;until"c";add".";t;d;' 'ab\\cab\cabc'

   Complex scripts can be translated into java and work,
   including this script itself. 

   eg/natural.language.pss seems to translate well to java.

  * the following is working!
  ----
    pep -f translate.java.pss eg/mark.html.pss > Machine.java
    javac Machine.java
    cat pars-book.txt | java Machine
  ,,,,

   The following is also working!
   -----
     pep -f translate.java.pss translate.java.pss > Machine.java
     javac Machine.java
     cat eg/exp.tolisp.pss | java Machine > Machine.java
     javac Machine.java
     echo "(a+2)*3+4" | java Machine
   ,,,

   This is fairly complex. The script translates itself into
   java, and then that translator is used to translate 
   another script into java, which is then executed....

   But even more complex stuff is also working, such as
   self referentiality cubed!!! 
   -----
     pep -f translate.java.pss translate.java.pss > Machine.java
     javac Machine.java
     cat translate.java.pss | java Machine > Machine.java
     javac Machine.java
     cat eg/exp.tolisp.pss | java Machine > Machine.java
     javac Machine.java
     echo "(a+2)*3+4" | java Machine
   ,,,

   The script can be tested with something like
   ----
     pep -f translate.java.pss -i "r;[aeiou]{a '=vowel\n';t;}d;" > Machine.java
     javac Machine.java; 
     echo "abcdefhijklmnop" | java Machine
   ,,, 

   The output will be java code which is equivalent to the 
   script provided to the -i switch.

   * a very comprehensive test is to run it on itself
   >> pep -f translate.java.pss translate.java.pss > Machine.java

   This is the "Shangri-La" of pep scripts.

   And then we could do!!
   >> cat translate.java.pss | java Machine
   which is self-referentiality squared, but I am not sure what
   its use is.

GOTCHAS

  I was trying to run 
  >> pep -e "r;a'\\';print;d;" -i "abc"
  and I kept getting an unterminated quote message, which I thought I
  had fixed in machine.interp.c (until code). But the problem was actually
  the bash shell which resolves \\ to \ in double quotes, but not single quotes!

BUGS
   
  Xdigit not valid class.

  It's a bit strange to talk about a multicharacter string being "escaped"
  (eg when calling 'until') but this is allowed in the pep engine.

  add "\{"; will generate an "illegal escape character" error
  when trying to compile the generated java code. I need to 
  consider what to do in this situation (eg escape \ to \\ ?)

  Check the "go/mark" code: what happens if the mark is not found?
  Throw an error and exit, I think.

SOLVED BUGS
 
  Found a bug in "replace" code, which was returning from inline code.

  Found and fixed a bug in the (==) code, ie in java (stringa == stringb)
  compares object references, not text, so it doesn't work.
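
  Here is a short java illustration of that bug. StringBuilder.toString()
  returns a newly allocated String, so == is false even when the text is
  the same. The class and method names are hypothetical. tapeTest() uses
  the same equals() comparison that the script now generates for (==).

  ```java
  class StringEquality {
    // The original (==) bug: == compares object references, and each
    // StringBuilder.toString() call returns a newly allocated String.
    static boolean buggyTapeTest(StringBuilder workspace, StringBuilder cell) {
      return workspace.toString() == cell.toString();
    }

    // The fix: compare the characters, as the generated (==) test does.
    static boolean tapeTest(StringBuilder workspace, StringBuilder cell) {
      return workspace.toString().equals(cell.toString());
    }
  }
  ```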

  Found and fixed a bug in java nom://whilenot /while. The code exited if the
  character was not found, which was not correct.

  The delimiter was hard-coded in nom://push.
  Solved a nom://until bug where the java code did not read 
  at least one character.

TASKS 

HISTORY
    
  26 Aug 2025
    Added the pep://accumulator as an exit code for scripts.
    This entire script should be rewritten in the format of 
    /tr/nom.torust.pss or /tr/nom.toperl.pss (these are more recent, and
    much better translation scripts).

  21 Mar 2025
    added addMark() function to fix the duplicate mark bug

  18 Feb 2025
    Allowed go; (go to the mark named in the current tape cell).
    This go syntax is important in type checking,
    I believe.

    Having another look. This can be reorganised. I like the idea
    of adding a "script*" token at (eof) which indicates a 
    successful parse. Would like to rename this "nom.to.java.pss"
    and all the other scripts as well. Also convert to the perl 
    grammar, much better.

  17 June 2022
    Converted the tape and marks arrays to the ArrayList type so that
    they can grow dynamically.
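
    Below is a sketch of an ArrayList-backed tape that grows on ++. The
    put and -- bodies follow the java strings generated later in this
    script. The increment() body and the TapeSketch class are
    assumptions, since the generated mm.increment() method is not shown
    here.

    ```java
    import java.util.ArrayList;

    class TapeSketch {
      final ArrayList<StringBuilder> tape = new ArrayList<>();
      int tapePointer = 0;

      TapeSketch() {
        tape.add(new StringBuilder());   // start with one empty cell
      }

      // ++ : move right, adding an empty cell if we step past the end
      void increment() {
        tapePointer++;
        if (tapePointer >= tape.size()) tape.add(new StringBuilder());
      }

      // -- : move left, but never below the first cell
      void decrement() {
        if (tapePointer > 0) tapePointer--;
      }

      // put : replace the current cell with the workspace text
      void put(CharSequence workspace) {
        tape.get(tapePointer).setLength(0);
        tape.get(tapePointer).append(workspace);
      }
    }
    ```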

  15 July 2021
    probably fixed the multiple escape char "until" bug with
    the countEscaped() function.
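
    Below is a sketch of the escape-counting idea. It assumes that a
    delimiter is escaped when an odd number of escape characters precede
    it. countEscaped() and until() have hypothetical signatures here and
    may not match the generated Machine class. The test string is the
    one from the TESTING section.

    ```java
    class EscapeCount {
      // Count the escape chars immediately before `delim`, assuming
      // `text` already ends with `delim`.
      static int countEscaped(CharSequence text, String delim, char escape) {
        int count = 0;
        int i = text.length() - delim.length() - 1;
        while (i >= 0 && text.charAt(i) == escape) {
          count++;
          i--;
        }
        return count;
      }

      // Read from `input` (starting at `start`) until an unescaped `delim`.
      // Always consumes at least one char; returns everything consumed.
      static String until(String input, int start, String delim, char escape) {
        StringBuilder ws = new StringBuilder();
        int pos = start;
        while (pos < input.length()) {
          ws.append(input.charAt(pos++));
          boolean atDelim = ws.toString().endsWith(delim);
          if (atDelim && countEscaped(ws, delim, escape) % 2 == 0) break;
        }
        return ws.toString();
      }
    }
    ```

    With the input ab\\cab\cabc, the first call stops at "ab\\c" because
    the two backslashes escape each other. A second call continues past
    "\c" and stops at the final c.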

  10 July 2021
    Trying to fix the 'until' code so that we can write 'add "x \\\\";'
    or add "x\\\"x"; fixed in translate.pss, now fix in translate.java.pss

  30 July 2020
    Found "bug", that a begin block with no other code 
    is not allowed as a script.

  29 July 2020
    found a bug in "go" (not getting text from tape).
    Also, delimiter was hard-coded in "push"
    Found a bug in "clop" (a return statement)
    
  25 July 2020

    The translation of /eg/mark.html.pss is now working. That means that many
    complex scripts now work with this script.  (Ed 2025: I now use
    /eg/text.tohtml.pss for rendering html from text) 

    Found another bug in the matches code. Classes must match
    the whole string, not just one character, so they need to 
    be eg: "^[a-z]+$" not just "[a-z]"
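
    Here is a java illustration of the fix. String.matches() must match
    the whole input, so a bare class like "[a-z]" only succeeds on a
    one-character workspace. The class and method names are
    hypothetical. classTest() uses the "^" + class + "+$" wrapping that
    this script generates.

    ```java
    class ClassMatch {
      // The buggy form: the class must match the entire workspace,
      // so it only succeeds on a single character.
      static boolean naiveClassTest(String workspace, String cls) {
        return workspace.matches(cls);
      }

      // The fixed form. The anchors are redundant with matches();
      // the "+" quantifier is what fixes the bug.
      static boolean classTest(String workspace, String cls) {
        return workspace.matches("^" + cls + "+$");
      }
    }
    ```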

    Found a bug in the code for "tapetest*" and "nottapetest*"
    ie (==) and !(==). I was using the wrong equals operator for 
    java. I found this bug by using a new vim command on 
    code in the pars-book.txt. This command translates to java and 
    then compiles and runs pep fragments. This is a useful debugging
    technique.

  24 July 2020

    Very great advances today. See the testing heading for
    a strange but true self compilation example.

    The script successfully translates itself into java!!
    So the following works
    -----
      pep -f translate.java.pss translate.java.pss > Machine.java
      javac Machine.java
      echo "nop;r;t;t;d;" | java Machine
    ,,,,

    This script translates eg/json.parse.pss into a seemingly
    correct java program. It translates eg/mark.html.pss into
    compilable java code, but doesn't transform the text to 
    html correctly.

    Completely changed the way andtestset* and ortestset* tokens
    are parsed. This has greatly simplified the logic.
    First tests show that the script is working, although there will
    be bugs.

  23 July 2020
    
    Extensive revision of this script. Rewriting methods as "inline".
    But revision is incomplete. This script should become 
    a good template for writing similar scripts in other languages.

  22 July 2020

    Changed the stack code to use the java.util.Stack class.  In the process of
    rethinking this script and reforming it. I will include the Machine class
    within the output of the script, so that there are no dependencies on
    external code. Also, I will remove trivial methods from the class.

  Oct 2019
    Made functions ppjjs, ppjjss, ppjjf in helpers.pars.sh so that java
    scripts can be easily run.

  30 sept 2019
    basic scripts working. whilenotPeep and whilePeep need to 
    be written properly. Also, translate unicode categories in
    [:text:] format to java regex.

  27 sept 2019
    Began to adapt this script from translate.js.pss

*# 

  read;
  #--------------
  [:space:] {
    [\n] { nochars; }
    clear; !(eof) { .restart } .reparse
  }

  #---------------
  # We can elide all these single character tests, because
  # the stack token is just the character itself with a *
  # Braces {} are used for blocks of commands, ',' and '.' for concatenating
  # tests with OR or AND logic. 'B' and 'E' for begin and end
  # tests, '!' is used for negation, ';' is used to terminate a 
  # command.
  "{", "}", ";", ",", ".", "!", "B", "E" {
    put; add "*"; push; .reparse 
  }

  #---------------
  # format: "text"
  "\"" {
    # save the start line number (for error messages) in case 
    # there is no terminating quote character.
    clear; add "line "; lines; add " (character "; chars; add ") ";
    put; clear; add '"';
    until '"'; 
    !E'"' { 
      clear; add 'Un-terminated quote character (") starting at ';
      get; add ' !\n'; 
      print; quit;
    }
    put; clear;
    add "quote*"; push;
    .reparse 
  }

 #---------------
 # format: 'text', single quotes are converted to double quotes
 # but we must escape embedded double quotes.
  "'" {
    # save the start line number (for error messages) in case 
    # there is no terminating quote character.
    clear; add "line "; lines; add " (character "; chars; add ") ";
    put; clear;
    until "'"; 
    !E"'" { 
      clear; add "Unterminated quote (') starting at ";
      get; add '!\n'; 
      print; quit;
    }
    clip; escape '"'; put; clear;
    add "\""; get; add "\"";
    put; clear; add "quote*";
    push; .reparse 
  }

  #---------------
  # formats: [:space:] [a-z] [abcd] [:alpha:] etc 
  # should class tests really be multi-line??!
  "[" {
    # save the start line number (for error messages) in case 
    # there is no terminating bracket character.
    clear; add "line "; lines; add " (character "; chars; add ") ";
    put; clear; add "[";
    until "]"; 
    "[]" {
      clear; add "pep script error at line "; lines;
      add " (character "; chars; add "): \n";
      add "  empty character class [] \n";
      print; quit;
    }
    !E"]" { 
      clear; add "Unterminated class text ([...]) starting at "; get; 
      add "
      class text can be used in tests or with the 'while' and 
      'whilenot' commands. For example: 
        [:alpha:] { while [:alpha:]; print; clear; }
      ";
      print; quit;
    }

    # need to escape quotes so they don't interfere with the
    # quotes java needs for .matches("...")
    escape '"';
    # the caret is not a negation operator in pep scripts
    replace "^" "\\\\^";
    # need to solve this fix:
    # replace "\\[" "\\\\["; replace "\\]" "\\\\]";

    # save the class on the tape
    put;
    clop; clop;
    !B"-" {
      # not a range class, eg [abc-], so any '-' chars need escaping
      # java requires a double escape
      clear; get; replace '-' '\\\\-'; put;
    }
    B"-" {
      # a range class, eg [a-z], check if it is correct
      clip; clip; 
      !"-" {
        clear;
        add "Error in pep script at line "; lines;
        add " (character "; chars; add "): \n";
        add " Incorrect character range class "; get;
        add "
   For example:
     [a-g]  # correct
     [f-gh] # error! \n";
        print; clear; quit;

      }
    }
    clear; get;  # restore class text
    B"[:".!E":]" { 
      clear; add "malformed character class starting at ";
      get; add '!\n'; 
      print; quit;
    }
    B"[:".!"[:]" {
      clip; clip; clop; clop;
      # POSIX character classes in java (US-ASCII only by default)
      # and their one-letter abbreviations
      "alnum","N" { clear; add "\\\\p{Alnum}"; }
      "alpha","A" { clear; add "\\\\p{Alpha}"; }
      "ascii","I" { clear; add "\\\\p{ASCII}"; }
      "blank","B" { clear; add "\\\\p{Blank}"; }
      "cntrl","C" { clear; add "\\\\p{Cntrl}"; }
      "digit","D" { clear; add "\\\\p{Digit}"; }
      "graph","G" { clear; add "\\\\p{Graph}"; }
      # or equiv to graph [^\p{Z}\p{C}] as suggested on stack overflow
      "lower","L" { clear; add "\\\\p{Lower}"; }
      "print","P" { clear; add "\\\\p{Print}"; }
      "punct","T" { clear; add "\\\\p{Punct}"; }
      "space","S" { clear; add "\\\\p{Space}"; }
      "upper","U" { clear; add "\\\\p{Upper}"; }
      "xdigit","X" { clear; add "\\\\p{XDigit}"; }
      !B"\\\\p{" {
        put; clear;
        add "Pep script syntax error near line "; lines;
        add " (character "; chars; add "): \n";
        add "Unknown character class '"; get; add "'\n";
        print; clear; quit;
      }
    }
    #*
     alnum - alphanumeric like [0-9a-zA-Z] 
     alpha - alphabetic like [a-zA-Z] 
     blank - blank chars, space and tab 
     cntrl - control chars, ascii 000 to 037 and 177 (del) 
     digit - digits 0-9 
     graph - graphical chars same as :alnum: and :punct: 
     lower - lower case letters [a-z] 
     print - printable chars ie :graph: + space 
     punct - punctuation ie !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~. 
     space - all whitespace, eg \n\r\t vert tab, space, \f 
     upper - upper case letters [A-Z] 
     xdigit - hexadecimal digit ie [0-9a-fA-F] 
    *#
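
    #*
    Below is a java check of the regex forms produced above, written as
    java string literals. PosixCheck is a hypothetical name. Java's
    POSIX classes such as \p{Alpha} only cover US-ASCII unless the
    Pattern.UNICODE_CHARACTER_CLASS flag is used. An accented letter
    therefore does not match.

    ```java
    class PosixCheck {
      // Same wrapping the translator applies: "^" + class + "+$".
      static boolean classTest(String workspace, String javaClass) {
        return workspace.matches("^" + javaClass + "+$");
      }
    }
    ```
    *#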

    put; clear;
    # add quotes around the class and limits around the 
    # class so it can be used with the string.matches() method
    # (must match the whole string, not just one character)
    add '"^'; get; add '+$"'; put; clear;
    add "class*"; push;
    .reparse 
  }

 #---------------
 # formats: (eof) (EOF) (==) etc. 
  "(" {
    clear; until ")"; clip;
    put; 
    "eof","EOF" { clear; add "eof*"; push; .reparse } 
    "==" { clear; add "tapetest*"; push; .reparse } 
    add " << unknown test near line "; lines;
    add " of script.\n";
    add " bracket () tests are \n";
    add "   (eof) test if end of stream reached. \n";
    add "   (==)  test if workspace is same as current tape cell \n";
    print; clear;
    quit;
  }

  #---------------
  # multi-line and single line comments, eg #... and #* ... *#
  "#" {
    clear; read;
    "\n" { clear; .reparse }

    # checking for multi-line comments of the form "#* \n\n\n *#"
    # these are just ignored at the moment (deleted) 
    "*" { 
      # save the line number for possible error message later
      clear; lines; put; clear;
      until "*#"; 
      E"*#" {
        # convert to /* ... */ java multi-line comment
        clip; clip;
        put; clear; add "/*"; get; add "*/";
        # create a "comment" parse token
        put; clear; 
        # comment-out this line to remove multi-line comments from the 
        # compiled java.
        # add "comment*"; push; 
        .reparse  
      }
      # make an unterminated multi-line comment an error
      # to ease debugging of scripts.
      clear; 
      add "unterminated multi-line comment #* ... *# \n";
      add "starting at line number "; get; add "\n";
      print; clear;
      quit;
    }

    # single line comments. some will get lost.
    put; clear; add "//"; get; until "\n"; clip;
    put; clear; add "comment*"; push; 
    .reparse 
  }

 #----------------------------------
 # parse command words (and abbreviations)

 # legal characters for keywords (commands)
 ![abcdefghijklmnopqrstuvwxyzBEKGPRUWS+-<>0^] {
   # error message about a misplaced character
   put; clear;
   add "!! Misplaced character '";
   get;
   add "' in script near line "; lines;
   add " (character "; chars; add ") \n";
   print; clear; quit;
 }

   # my testclass implementation cannot handle complex lists
   # such as [a-z+-], which is why I have to write out the whole alphabet

   while [abcdefghijklmnopqrstuvwxyzBEOFKGPRUWS+-<>0^];
   #----------------------------------
   # KEYWORDS 
   # here we can test for all the keywords (command words) and their
   # abbreviated one letter versions (eg: clip k, clop K etc). Then
   # we can print an error message and abort if the word is not a 
   # legal keyword for the parse-edit language

   # make ll an alias for "lines" and cc an alias for chars
   "ll" { clear; add "lines"; }
   "cc" { clear; add "chars"; }
   # one letter command abbreviations
   "a" { clear; add "add"; }
   "k" { clear; add "clip"; }
   "K" { clear; add "clop"; }
   "D" { clear; add "replace"; }
   "d" { clear; add "clear"; }
   "t" { clear; add "print"; }
   "p" { clear; add "pop"; }
   "P" { clear; add "push"; }
   "u" { clear; add "unstack"; }
   "U" { clear; add "stack"; }
   "G" { clear; add "put"; }
   "g" { clear; add "get"; }
   "x" { clear; add "swap"; }
   ">" { clear; add "++"; }
   "<" { clear; add "--"; }
   "m" { clear; add "mark"; }
   "M" { clear; add "go"; }
   "r" { clear; add "read"; }
   "R" { clear; add "until"; }
   "w" { clear; add "while"; }
   "W" { clear; add "whilenot"; }
   "n" { clear; add "count"; }
   "+" { clear; add "a+"; }
   "-" { clear; add "a-"; }
   "0" { clear; add "zero"; }
   "c" { clear; add "chars"; }
   "l" { clear; add "lines"; }
   "^" { clear; add "escape"; }
   "v" { clear; add "unescape"; }
   "z" { clear; add "delim"; }
   "S" { clear; add "state"; }
   "q" { clear; add "quit"; }
   "s" { clear; add "write"; }
   "o" { clear; add "nop"; }
   "rs" { clear; add "restart"; }
   "rp" { clear; add "reparse"; }

   # some extra syntax for testeof and testtape
   "<eof>","<EOF>" { put; clear; add "eof*"; push; .reparse }
   "<==>" { put; clear; add "tapetest*"; push; .reparse }

   "jump","jumptrue","jumpfalse",
   "testis","testclass","testbegins","testends",
   "testeof","testtape" {
     put; clear;
     add "The instruction '"; get; add "' near line "; lines; 
     add " (character "; chars; add ")\n";
     add "can be used in pep assembly code but not scripts. \n";
     print; clear; quit;
   }
   
   # show information if these "deprecated" commands are used
   "Q","bail","state" {
     put; clear;
     add "The instruction '"; get; add "' near line "; lines; 
     add " (character "; chars; add ")\n";
     add "is no longer part of the pep language (july 2020). \n";
     add "use 'quit' instead of 'bail', and use 'unstack; print;' \n";
     add "instead of 'state'. \n";
     print; clear; quit;
   }
   
   "add","clip","clop","replace","upper","lower","cap","clear","print",
   "pop","push","unstack","stack","put","get","swap",
   "++","--","mark","go","read","until","while","whilenot",
   "count","a+","a-","zero","chars","lines","nochars","nolines",
   "escape","unescape","delim","quit",
   "write","nop","reparse","restart" {
     put; clear;
     add "word*";
     push; .reparse
   }
   
   #------------ 
   # the .reparse command and "parse label" is a simple way to 
   # make sure that all shift-reductions occur. It should be used inside
   # a block test, so as not to create an infinite loop. There is
   # no "goto" in java so we need to use labelled loops to 
   # implement .reparse/parse>

   "parse>" {
     clear; count;
     !"0" {
       clear; 
       add "script error:\n";
       add "  extra parse> label at line "; lines; add ".\n";
       print;
       quit;
     }
     clear; add "// parse>"; put;
     clear; add "parse>*"; push;
     # use accumulator to indicate after parse> label
     a+; .reparse 
   }

   # --------------------
   # implement "begin-blocks", which are only executed
   # once, at the beginning of the script (similar to awk's BEGIN {} rules)
   "begin" {
     put; add "*"; push; .reparse 
   }

   add " << unknown command on line "; lines; 
   add " (char "; chars; add ")"; 
   add " of source file. \n"; 
   print; clear; quit;

# ----------------------------------
# PARSING PHASE:

# Below is the parse/compile phase of the script. Here we pop tokens off the
# stack and check for sequences of tokens eg "word*semicolon*". If we find a
# valid series of tokens, we "shift-reduce" or "resolve" the token series eg
# word*semicolon* --> command*
#
# At the same time, we manipulate (transform) the attributes on the tape, as
# required. 
#

parse>

#-------------------------------------
# 2 tokens
#-------------------------------------
  pop; pop;

  # All of the patterns below are currently errors, but may not
  # be in the future if we expand the syntax of the parse
  # language. Also consider:
  #    begintext* endtext* quoteset* notclass*, !* ,* ;* B* E*
  # It is nice to trap the errors here because we can emit some
  # (hopefully not very cryptic) error messages with a line number.
  # Otherwise the script writer has to debug with
  #   pep -a asm.pp -I scriptfile 
  #

  "word*word*","word*}*","word*begintext*","word*endtext*", "word*!*",
  "word*,*","quote*word*", "quote*class*", "quote*state*", "quote*}*",
  "quote*begintext*", "quote*endtext*", "class*word*", "class*quote*",
  "class*class*", "class*state*", "class*}*", "class*begintext*",
  "class*endtext*", "class*!*", "notclass*word*", "notclass*quote*",
  "notclass*class*", "notclass*state*", "notclass*}*" {
    add " (Token stack) \nValue: \n"; get; 
    add "\nValue: \n"; ++; get; --; add "\n";
    add "Error near line "; lines; add " (char "; chars; add ")"; 
    add " of pep script (missing semicolon?) \n";
    print; clear; 
    quit;
  }  

  "{*;*", ";*;*", "}*;*" {
    push; push;
    add "Error near line "; lines; add " (char "; chars; add ")"; 
    add " of pep script: misplaced semi-colon? ; \n";
    print; clear; quit;
  }

  ",*{*" {
    push; push;
    add "Error near line "; lines; add " (char "; chars; add ")"; 
    add " of script: extra comma in list? \n";
    print; clear; quit;
  }

  "command*;*","commandset*;*" {
    push; push;
    add "Error near line "; lines; add " (char "; chars; add ")"; 
    add " of script: extra semi-colon? \n";
    print; clear; quit;
  }

  "!*!*" {
    push; push;
    add "error near line "; lines; add " (char "; chars; add ")"; 
    add " of script: \n double negation '!!' is not implemented \n";
    add " and probably won't be, because what would be the point? \n";
    print; clear; quit;
  }

  "!*{*","!*;*" {
    push; push;
    add "error near line "; lines;
    add " (char "; chars; add ")"; 
    add " of script: misplaced negation operator (!)? \n";
    add " The negation operator precedes tests, for example: \n";
    add "   !B'abc'{ ... } or !(eof),!'abc'{ ... } \n";
    print; clear; quit;
  }

  ",*command*" {
    push; push;
    add "error near line "; lines;
    add " (char "; chars; add ")"; 
    add " of script: misplaced comma? \n";
    print; clear; quit;
  }

  "!*command*" {
    push; push;
    add "error near line "; lines;
    add " (at char "; chars; add ") \n"; 
    add " The negation operator (!) cannot precede a command \n";
    print; clear; quit;
  }

  ";*{*", "command*{*", "commandset*{*" {
    push; push;
    add "error near line "; lines;
    add " (char "; chars; add ")"; 
    add " of script: no test for brace block? \n";
    print; clear; quit;
  }

  "{*}*" {
    push; push;
    add "error near line "; lines;
    add " of script: empty braces {}. \n";
    print; clear; quit;
  }

  "B*class*","E*class*" {
    push; push;
    add "error near line "; lines;
    add " of script:\n  classes ([a-z], [:space:] etc). \n";
    add "  cannot use the 'begin' or 'end' modifiers (B/E) \n";
    print; clear; quit;
  }

  "comment*{*" {
    push; push;
    add "error near line "; lines;
    add " of script: comments cannot occur between \n";
    add " a test and a brace ({). \n";
    print; clear; quit;
  }

  "}*command*" {
    push; push;
    add "error near line "; lines;
    add " of script: extra closing brace '}' ?. \n";
    print; clear; quit;
  }

  #*
  E"begin*".!"begin*" {
    push; push;
    add "error near line "; lines;
    add " of script: Begin blocks must precede code \n";
    print; clear; quit;
  }
  *#

  #------------ 
  # The .restart command jumps to the first instruction after the
  # begin block (if there is a begin block), or the first instruction
  # of the script.
  ".*word*" {
    clear; ++; get; --;
    "restart" {
      clear; add "continue script;";
      # not required because we have labelled loops, 
      # continue script works both before and after the parse> label
      # "0" { clear; add "continue script;"; }
      # "1" { clear; add "break lex;"; }
      put; clear;
      add "command*";
      push; .reparse 
    }
    "reparse" {
      clear; count; 
      # check accumulator to see if we are in the "lex" block
      # or the "parse" block and adjust the .reparse compilation
      # accordingly.
      "0" { clear; add "break lex;"; }
      "1" { clear; add "continue parse;"; }
      put; clear;
      add "command*";
      push; .reparse 
    }
    push; push;
    add "error near line "; lines;
    add " (char "; chars; add ")"; add " of script:  \n";
    add " misplaced dot '.' (use for AND logic or in .reparse/.restart) \n";
    print; clear; quit;
  }

  #---------------------------------
  # Compiling comments so as to transfer them to the java output
  "comment*command*","command*comment*","commandset*comment*" {
    clear; get; add "\n"; ++; get; --; put; clear;
    add "command*"; push; .reparse
  }
  "comment*comment*" {
    clear; get; add "\n"; ++; get; --; put; clear;
    add "comment*"; push; .reparse
  }

  # -----------------------
  # negated tokens.
  #
  # This is a new more elegant way to negate a whole set of 
  # tests (tokens) where the negation logic is stored on the 
  # stack, not in the current tape cell. We just add "not" to 
  # the stack token.

  # eg: ![:alpha:] ![a-z] ![abcd] !"abc" !B"abc" !E"xyz"
  #  This format is used to indicate a negative test for 
  #  a brace block. eg: ![aeiou] { add "< not a vowel"; print; clear; }

  "!*quote*","!*class*","!*begintext*", "!*endtext*",
  "!*eof*","!*tapetest*" {
    # a simplification: store the token name "quote*/class*/..."
    # in the tape cell corresponding to the "!*" token. 
    replace "!*" "not"; push;
    # this was a bug?? a missing ++; ??
    # now get the token-value
    get; --; put; ++; clear;
    .reparse
  }

  #-----------------------------------------
  # format: E"text" or E'text'
  #  This format is used to indicate a "workspace-ends-with" text before
  #  a brace block.
  "E*quote*" {
     clear; add "endtext*"; push; get; 
     '""' {
       # empty argument is an error
       clear;
       add "pep script error near line "; lines;
       add " (character "; chars; add "): \n";
       add '  empty argument for end-test (E"") \n';
       print; quit;
     }
     --; put; ++;
     clear; .reparse
  } 

  #-----------------------------------------
  # format: B"sometext" or B'sometext' 
  #   A 'B' preceding some quoted text is used to indicate a 
  #   'workspace-begins-with' test, before a brace block.
  "B*quote*" {
     clear; add "begintext*"; push; get; 
     '""' {
       # empty argument is an error
       clear;
       add "pep script error near line "; lines;
       add " (character "; chars; add "): \n";
       add '  empty argument for begin-test (B"") \n';
       print; quit;
     }
     --; put; ++;
     clear; .reparse
  } 

  #--------------------------------------------
  # ebnf: command := word, ';' ;
  # formats: "pop; push; clear; print; " etc
  # all commands need to end with a semi-colon except for 
  # .reparse and .restart
  #
  "word*;*" {
     clear;
     # check if command requires parameter
     get;
     "add","while","whilenot","mark",
     "escape", "unescape","delim","replace" {
       put; clear; add "'"; get; add "'";
       add " command needs an argument, on line "; lines; 
       add " of script.\n";
       print; clear; quit;
     }

     # go;  
     # not implemented in pars/compile.pss yet (feb 2025)
     "go" { 
       clear;
       add "mm.goToMark(mm.tape.get(mm.tapePointer));  /* go (tape) */";
       put;
     }
     "mark" { 
       clear;
       add "mm.addMark(mm.tape.get(mm.tapePointer));  /* mark (tape) */";
       put;
     }

     # the new until; command with no argument
     "until" { 
       clear;
       add "mm.until(mm.tape.get(mm.tapePointer));  /* until (tape) */";
       put;
     }

     "clip" { 
       clear; 
       # the length test is needed: on an empty workspace,
       # delete(-1, 0) would throw StringIndexOutOfBoundsException
       add "if (mm.workspace.length() > 0) { /* clip */\n";
       add "  mm.workspace.delete(mm.workspace.length() - 1, \n";
       add "  mm.workspace.length()); }";
       put; 
     }
     "clop" { 
       clear; 
       add "if (mm.workspace.length() > 0) { /* clop */\n";
       add "  mm.workspace.delete(0, 1); }   /* clop */";
       put; 
     }
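
     #*
     Below is a java sketch of why the clip guard matters. It uses the
     same StringBuilder.delete() calls as the generated code. ClipClop is
     a hypothetical name. For clop the guard is only defensive, because
     delete() clamps an end index that is past the length.

     ```java
     class ClipClop {
       // clip: remove the last char (guarded against an empty workspace)
       static void clip(StringBuilder ws) {
         if (ws.length() > 0) ws.delete(ws.length() - 1, ws.length());
       }

       // clop: remove the first char
       static void clop(StringBuilder ws) {
         if (ws.length() > 0) ws.delete(0, 1);
       }
     }
     ```
     *#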
     "clear" { 
       clear; add "mm.workspace.setLength(0);";
       add "            /* clear */"; put; 
     }
     "upper" { 
       clear; 
       add "/* upper */ \n"; 
       add "for (int i = 0; i < mm.workspace.length(); i++) { \n";
       add "  char c = mm.workspace.charAt(i); \n";
       add "  mm.workspace.setCharAt(i, Character.toUpperCase(c)); } ";
       put; 
     }
     "lower" { 
       clear; 
       add "/* lower */ \n"; 
       add "for (int i = 0; i < mm.workspace.length(); i++) { \n";
       add "  char c = mm.workspace.charAt(i); \n";
       add "  mm.workspace.setCharAt(i, Character.toLowerCase(c)); } ";
       put;
     }
     "cap" { 
       clear; 
       add "/* cap */ \n"; 
       add "for (int i = 0; i < mm.workspace.length(); i++) { \n";
       add "  char c = mm.workspace.charAt(i); \n";
       add "  if (i==0){ mm.workspace.setCharAt(i, Character.toUpperCase(c)); } \n";
       add "  else { mm.workspace.setCharAt(i, Character.toLowerCase(c)); } \n";
       add "}";
       put;
     }
     "print" { 
       clear; add "System.out.print(mm.workspace); /* print */"; put; 
     }
     "pop" { clear; add "mm.pop();"; put; }
     "push" { clear; add "mm.push();"; put; }
     "unstack" { 
        clear; add "while (mm.pop());          /* unstack */"; put; }
     "stack" { 
        clear; add "while(mm.push());          /* stack */"; put; }
     "put" { 
       clear; 
       add "mm.tape.get(mm.tapePointer).setLength(0); /* put */\n";
       add "mm.tape.get(mm.tapePointer).append(mm.workspace); ";
       put; }
     "get" { 
       clear; 
       add "mm.workspace.append(mm.tape.get(mm.tapePointer)); /* get */";
       put;
     }
     "swap" { clear; add "mm.swap();"; put; }
     "++" { 
       clear; add "mm.increment();"; 
       add "                 /* ++ */"; put; }
     "--" { 
       clear; 
       add "if (mm.tapePointer > 0) mm.tapePointer--; /* -- */"; put;
     }
     "read" { clear; add "mm.read(); /* read */"; put; }
     "count" { 
       clear; 
       add "mm.workspace.append(mm.accumulator); /* count */"; put; 
     }
     "a+" { clear; add "mm.accumulator++; /* a+ */"; put; }
     "a-" { clear; add "mm.accumulator--; /* a- */"; put; }
     "zero" { clear; add "mm.accumulator = 0; /* zero */"; put; }
     "chars" { 
       clear; add "mm.workspace.append(mm.charsRead); /* chars */"; put; 
     }
     "lines" { 
       clear; add "mm.workspace.append(mm.linesRead); /* lines */"; put; 
     }
     "nochars" { clear; add "mm.charsRead = 0; /* nochars */"; put; }
     "nolines" { clear; add "mm.linesRead = 0; /* nolines */"; put; }
     # use a labelled loop to quit script.
     "quit" { clear; add "break script;"; put; }
     "write" { clear; add "mm.writeToFile();"; put; }
     # "nop" does nothing, so emit only a comment.
     "nop" { clear; add "/* nop: no-operation eliminated */"; put; }

     clear; add "command*";
     push; .reparse
   }
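  #*
  Java sketch (hypothetical names; not emitted by this script) of the
  labelled-loop idea from NOTES: "quit" becomes "break script;" and
  ".restart" becomes "continue script;" inside the "script:" loop that
  main() puts around the compiled commands. A String stands in for the
  machine's input Reader.

```java
// Hypothetical illustration of the labelled "script:" loop:
// quit -> break script;   .restart -> continue script;
class QuitRestartSketch {
    /** Keeps lower-case letters, restarting on other chars, until '.' */
    static String run(String input) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        script:
        while (pos < input.length()) {          // like: while (!mm.eof)
            char c = input.charAt(pos++);       // like: mm.read();
            if (c == '.') { break script; }     // quit
            if (!Character.isLowerCase(c)) {
                continue script;                // .restart
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("abC1d.ef"));    // prints abd
    }
}
```

  Here upper case letters and digits restart the loop and "." quits it,
  so "abC1d.ef" gives "abd".
  *#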

  #-----------------------------------------
  # ebnf: commandset := command , command ;
  "command*command*", "commandset*command*" {
    clear;
    add "commandset*"; push;
    # join the two command attributes, with the second command on a new line
    --; get; add "\n"; 
    ++; get; --;
    put; ++; clear; 
    .reparse
  } 

  #-------------------
  # here we begin to parse "test*" and "ortestset*" and "andtestset*"
  # 

  #-------------------
  # eg: B"abc" {} or E"xyz" {}
  # transform and markup the different test types
  "begintext*,*","endtext*,*","quote*,*","class*,*",
  "eof*,*","tapetest*,*",
  "begintext*.*","endtext*.*","quote*.*","class*.*",
  "eof*.*","tapetest*.*",
  "begintext*{*","endtext*{*","quote*{*","class*{*",
  "eof*{*","tapetest*{*" {

    B"begin" { clear; add "mm.workspace.toString().startsWith("; }
    B"end" { clear; add "mm.workspace.toString().endsWith("; }
    B"quote" { clear; add "mm.workspace.toString().equals("; }
    B"class" { clear; add "mm.workspace.toString().matches("; }
    # clear the tape cell for the eof and tapetest tests because
    # they take no argument. 
    B"eof" { clear; put; add "mm.eof"; }
    B"tapetest" { 
      clear; put; 
      add 
        "(mm.workspace.toString().equals(mm.tape.get(mm.tapePointer).toString())"; 
    }
    get; !B"mm.eof" { add ")"; }
    put; 
    #*
    #  maybe we could elide the not tests by doing this here:
    B"not" { clear; add "!"; get; put; }
    *#
    clear; add "test*"; push;
    # the trick below pushes the right token back on the stack.
    get; add "*"; push; .reparse
  }
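  #*
  Java sketch (hypothetical helper names) of the boolean expressions
  built above for B"..", E"..", "..", [..] and (==) tests. ws stands in
  for mm.workspace and cell for the current tape cell. Note that
  String.matches() only succeeds when the regex matches the whole
  workspace, not just part of it.

```java
// Hypothetical helpers mirroring the emitted test expressions.
// ws stands in for mm.workspace, cell for the current tape cell.
class TestSketch {
    static boolean begin(StringBuffer ws, String s) {    // B"s"
        return ws.toString().startsWith(s);
    }
    static boolean end(StringBuffer ws, String s) {      // E"s"
        return ws.toString().endsWith(s);
    }
    static boolean quote(StringBuffer ws, String s) {    // "s"
        return ws.toString().equals(s);
    }
    static boolean cls(StringBuffer ws, String regex) {  // [..]
        // matches() succeeds only if the regex matches the WHOLE text
        return ws.toString().matches(regex);
    }
    static boolean tape(StringBuffer ws, StringBuffer cell) { // (==)
        return ws.toString().equals(cell.toString());
    }

    public static void main(String[] args) {
        StringBuffer ws = new StringBuffer("http://x.txt");
        // B"http".!E".pdf"  ->  begin(..) && !end(..)
        System.out.println(begin(ws, "http") && !end(ws, ".pdf")); // true
    }
}
```

  The main() line mirrors how B"http".!E".pdf" is joined with &&
  further down.
  *#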

  #-------------------
  # negated tests
  # eg: !B"xyz" {} !(eof) {} !(==) {}
  #     !E"xyz" {} 
  #     !"abc" {}
  #     ![a-z] {}
  "notbegintext*,*","notendtext*,*","notquote*,*","notclass*,*",
  "noteof*,*","nottapetest*,*",
  "notbegintext*.*","notendtext*.*","notquote*.*","notclass*.*",
  "noteof*.*","nottapetest*.*",
  "notbegintext*{*","notendtext*{*","notquote*{*","notclass*{*",
  "noteof*{*","nottapetest*{*"
  {

    B"notbegin" { clear; add "!mm.workspace.toString().startsWith("; }
    B"notend" { clear; add "!mm.workspace.toString().endsWith("; }
    B"notquote" { clear; add "!mm.workspace.toString().equals("; }
    B"notclass" { clear; add "!mm.workspace.toString().matches("; }
    # clear the tape cell for the noteof and nottapetest tests because
    # they take no argument. 
    B"noteof" { clear; put; add "!mm.eof"; }
    B"nottapetest" { 
      clear; put; 
      add 
        "(!mm.workspace.toString().equals(mm.tape.get(mm.tapePointer).toString())"; 
    }
    get; !B"!mm.eof" { add ")"; }
    put; clear; add "test*"; push; 
    # the trick below pushes the right token back on the stack.
    get; add "*"; push; .reparse
  }

  #-------------------
  # 3 tokens
  #-------------------

  pop;

  #-----------------------------
  # some 3 token errors!!!
 
  # not a comprehensive list of 3 token errors
  "{*quote*;*","{*begintext*;*","{*endtext*;*","{*class*;*",
  "commandset*quote*;*", "command*quote*;*" {
    push; push; push;
    add "[pep error]\n invalid syntax near line "; lines;
    add " (char "; chars; add ")"; 
    add " of script (misplaced semicolon?) \n";
    print; clear; quit;
  }  

  # to simplify subsequent tests, transmogrify a single command
  # to a commandset (multiple commands).
  "{*command*}*" {
    clear; add "{*commandset*}*"; push; push; push;
    .reparse
  }

  # errors! mixing AND and OR concatenation
  ",*andtestset*{*",
  ".*ortestset*{*" {
    # push the tokens back to make debugging easier
    push; push; push; 
    add " error: mixing AND (.) and OR (,) concatenation \n";
    add " in pep script near line "; lines;
    add " (character "; chars; add ") \n";
    add ' 
  For example:
     B".".!E"/".[abcd./] { print; }  # Correct!
     B".".!E"/",[abcd./] { print; }  # Error! \n';
    print; clear; quit;
  }

  #--------------------------------------------
  # ebnf: command := keyword , quoted-text , ";" ;
  # format: add "text";
  "word*quote*;*" {
    clear; get;
    "replace" {
       # error 
       add ": command requires 2 parameters, not 1 \n";
       add "near line "; lines;
       add " of script. \n";
       print; clear; quit;
    }

    # check whether argument is single character, otherwise
    # throw an error
    "escape", "unescape", "while", "whilenot" {
      # This is trickier than I thought it would be.
      clear; ++; get; --; 
      # check that the argument is not empty (an empty quote is
      # only allowed as the second argument of 'replace')
      '""' {
        clear; 
        add "[pep error] near line "; lines;
        add " (or char "; chars; add "): \n"; 
        add "  command '"; get; 
        add '\' cannot have an empty argument ("") \n';
        print; quit;
      }

      # quoted text has the quotes still around it.
      # also handle escape characters like \n \r etc
      clip; clop; clop; clop;
      # B "\\" { clip; } 
      clip; 
      !"" {
        clear; 
        add "Pep script error near line "; lines;
        add " (character "; chars; add "): \n"; 
        add "  command '"; get; 
        add "' takes only a single character argument. \n";
        print; quit;
      }
      clear; get;
    }

    "mark" {
      clear;
      add "/* mark */ \n";
      # added function to remove existing mark with the same name
      add "mm.addMark("; ++; get; --; add "); /* mark */";
      put; clear; add "command*"; push; .reparse
    }

    "go" {
      clear;
      add "mm.goToMark("; ++; get; --; add ");   /* go */";
      put; clear; add "command*"; push; .reparse
    }

    "delim" {
      clear;
      # an older version used only the first character of the argument:
      #   this.delimiter.setCharAt(0, text.charAt(0));
      # the code below makes the whole argument the delimiter.
      add "mm.delimiter.setLength(0); /* delim */\n"; 
      add "mm.delimiter.append("; ++; get; --; 
      add "); ";
      put; clear; add "command*"; push; .reparse
    }

    "add" {
      clear;
      add "mm.workspace.append("; ++; get; --; 
      # handle multi-line text
      replace "\n" '"); \nmm.workspace.append("\\n';
      add "); /* add */";
      put; clear; add "command*"; push; .reparse
    }

   
    "while" {
      clear;
      add "while ((char) mm.peep == "; ++; get; --;
      add ".charAt(0)) /* while */\n "; 
      add " { if (mm.eof) {break;} mm.read(); }"; 
      put; clear; add "command*"; push; .reparse
    }

    "whilenot" {
      clear;
      add "while ((char) mm.peep != "; ++; get; --;
      add ".charAt(0)) /* whilenot */\n ";
      add " { if (mm.eof) {break;} mm.read(); }"; 
      put; clear; add "command*"; push; .reparse
    }

    "until" {
       clear; add "mm.until("; 
       ++; get; --; 
       # error: 'until' cannot have an empty argument
       'mm.until(""' { 
         clear; 
         add "Pep script error near line "; lines;
         add " (character "; chars; add "): \n";
         add " empty argument for 'until' \n";
         add " 
   For example:
     until '.txt'; until \">\";    # correct   
     until '';  until \"\";        # errors! \n";
         print; quit;
       }
       # handle multi-line argument
       replace "\n" "\\n";
       add ');'; put; clear;
       add "command*"; push; .reparse
     }
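    #*
    Java sketch of what the emitted mm.until() call does, simplified
    from until() and countEscaped() in the Machine source: read at least
    one char, then keep reading until the text ends with the suffix and
    the suffix is not preceded by an odd number of escape chars.
    Assumptions: input is a String and the escape char is always a
    backslash; the class and method names are hypothetical.

```java
// Hypothetical simplification of Machine.until() and countEscaped():
// input is a String and the escape char is always a backslash.
class UntilSketch {
    /** Count escape chars immediately before a trailing suffix. */
    static int escapesBefore(CharSequence text, String suffix, char esc) {
        int count = 0;
        int pos = text.length() - suffix.length() - 1;
        while (pos >= 0 && text.charAt(pos) == esc) { count++; pos--; }
        return count;
    }

    /** Read until the text ends with an unescaped suffix or input ends. */
    static String until(String input, String suffix) {
        StringBuilder ws = new StringBuilder();
        int pos = 0;
        if (input.isEmpty()) return "";          // already at end of input
        ws.append(input.charAt(pos++));          // read at least one char
        while (pos < input.length()) {           // stop at end of input
            if (ws.toString().endsWith(suffix)
                    && escapesBefore(ws, suffix, '\\') % 2 == 0) {
                break;                           // unescaped suffix found
            }
            ws.append(input.charAt(pos++));      // like: mm.read();
        }
        return ws.toString();
    }

    public static void main(String[] args) {
        // the first ">" is escaped, so reading continues to the second
        System.out.println(until("a\\>b>c", ">"));  // prints a\>b>
    }
}
```
    *#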

    # But really, can't the "replace" command just be used
    # instead of escape/unescape?? This seems a flaw in the 
    # machine design.
    "escape","unescape" {
       clear; add "mm."; get; add "Char"; 
       add "("; ++; get; --; add '.charAt(0));'; put; clear;
       add "command*"; push; .reparse
     }

     # error, superfluous argument
     add ": command does not take an argument \n";
     add "near line "; lines;
     add " of script. \n";
     print; clear;
     #state
     quit;
   }


   #----------------------------------
   # format: while [:alpha:] ; or whilenot [a-z] ;

   "word*class*;*" {
     clear; get;

     "while" {
       clear;
       add "/* while */ \n";
       add "while (Character.toString((char)mm.peep).matches("; ++; get; --;
       add ")) { if (mm.eof) { break; } mm.read(); }"; 
       put; clear; add "command*"; push; .reparse
     }

     "whilenot" {
       clear;
       add "/* whilenot */ \n";
       add "while (!Character.toString((char)mm.peep).matches("; ++; get; --;
       add ")) { if (mm.eof) { break; } mm.read(); }"; 
       put; clear; add "command*"; push; .reparse
     }

     # error 
     add ": command cannot have a class argument \n";
     add "line "; lines; add ": error in script \n";
     print; clear; quit;
   }
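  #*
  Java sketch (hypothetical names) of the loops emitted for "while" and
  "whilenot", with a quoted char or a class argument. A String replaces
  the Machine Reader, so the end-of-input test moves into the loop
  condition instead of the "if (mm.eof) break;" in the loop body.

```java
// Hypothetical sketch of the generated while/whilenot loops, reading
// from a String instead of the Machine input Reader.
class WhileSketch {
    /** while "c";  read chars while the next char equals c */
    static String whileChar(String input, char c) {
        StringBuilder ws = new StringBuilder();
        int pos = 0;
        while (pos < input.length() && input.charAt(pos) == c) {
            ws.append(input.charAt(pos++));
        }
        return ws.toString();
    }

    /** whilenot [..];  read chars while the next char is NOT in the class */
    static String whileNotClass(String input, String regex) {
        StringBuilder ws = new StringBuilder();
        int pos = 0;
        while (pos < input.length()
                && !Character.toString(input.charAt(pos)).matches(regex)) {
            ws.append(input.charAt(pos++));
        }
        return ws.toString();
    }

    public static void main(String[] args) {
        System.out.println(whileChar("aaab", 'a'));            // aaa
        System.out.println(whileNotClass("123abc", "[a-z]"));  // 123
    }
}
```
  *#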


  # arrange the parse> label loops
  (eof) {
    "commandset*parse>*commandset*","command*parse>*commandset*",
    "commandset*parse>*command*","command*parse>*command*" {
      clear; 
      # indent both code blocks
      add "  "; get; replace "\n" "\n  "; put; clear; ++; ++;
      add "  "; get; replace "\n" "\n  "; put; clear; --; --;
      # add a block so that .reparse works before the parse> label.
      add "lex: { \n";
      add "  if (jumptoparse) { jumptoparse=false;break lex; }\n";
      get; add "\n}\n"; ++; ++;
      # indent code block
      # add "  "; get; replace "\n" "\n  "; put; clear;
      add "parse: \n";
      add "while (true) { \n"; get;
      add "\n  break parse;\n}"; 
      --; --; put; clear;
      add "commandset*"; push; .reparse
    }
  }
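  #*
  Java sketch (hypothetical; a List is used as the token stack) of the
  lex:/parse: layout assembled above. Commands before parse> go in the
  lex block, where .reparse compiles to "break lex;"; commands after it
  go in the parse loop. The sketch assumes that .reparse after the label
  compiles to "continue parse;" (that translation is not shown in this
  section).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not generated code) of the lex:/parse: layout.
// A List is the token stack; "n" is a number token and "+" an operator.
class ParseLabelSketch {
    static List<String> run(String input) {
        List<String> stack = new ArrayList<>();
        boolean jumptoparse = false;   // set by .reparse in a begin block
        int pos = 0;
        script:
        while (pos < input.length()) {                 // while (!mm.eof)
            lex: {
                if (jumptoparse) { jumptoparse = false; break lex; }
                // commands before parse>: one token per input char
                stack.add(String.valueOf(input.charAt(pos++)));
                // a .reparse here would compile to "break lex;"
            }
            parse:
            while (true) {
                int n = stack.size();
                // reduce "n + n" to "n", then reparse
                if (n >= 3 && stack.get(n - 3).equals("n")
                        && stack.get(n - 2).equals("+")
                        && stack.get(n - 1).equals("n")) {
                    stack.subList(n - 3, n).clear();
                    stack.add("n");
                    continue parse;   // assumed translation of .reparse
                }
                break parse;          // end of the parse section
            }
        }
        return stack;
    }

    public static void main(String[] args) {
        System.out.println(run("n+n+n"));   // prints [n]
    }
}
```
  *#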

  # -------------------------------
  # 4 tokens
  # -------------------------------

  pop;

  #-------------------------------------
  # bnf:     command := replace , quote , quote , ";" ;
  # example:  replace "and" "AND" ; 
  "word*quote*quote*;*" {
    clear; get;
    "replace" {
      #---------------------------
      # a command plus 2 arguments, eg replace "this" "that"
      clear; 
      add "/* replace */ \n";
      add "if (mm.workspace.length() > 0) { \n";
      add "  temp = mm.workspace.toString().replace(";
      ++; get; add ", ";
      ++; get; add ");\n"; 
      add "  mm.workspace.setLength(0); \n";
      add "  mm.workspace.append(temp);\n} ";
      --; --; put;
      clear; add "command*"; push; .reparse
    }

    add ": command does not take 2 quoted arguments. \n";
    add "nom script error near line "; lines; 
    add " (character "; chars; add ") \n";
    print; quit;
  }
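  #*
  Java sketch of the code emitted for  replace "." "-";  The generated
  code calls String.replace(CharSequence, CharSequence), which replaces
  every literal occurrence; unlike replaceAll() it does not treat its
  first argument as a regex. The class and method names are
  hypothetical.

```java
// Hypothetical check of the code emitted for:  replace "." "-";
class ReplaceSketch {
    static void replace(StringBuffer workspace, String from, String to) {
        if (workspace.length() > 0) {
            // literal replacement of every occurrence (no regex)
            String temp = workspace.toString().replace(from, to);
            workspace.setLength(0);
            workspace.append(temp);
        }
    }

    public static void main(String[] args) {
        StringBuffer ws = new StringBuffer("a.b.c");
        replace(ws, ".", "-");
        System.out.println(ws);                            // a-b-c
        // replaceAll() would treat "." as "any character"
        System.out.println("a.b.c".replaceAll(".", "-"));  // -----
    }
}
```
  *#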

  #-------------------------------------
  # format: begin { #* commands *# }
  # "begin" blocks are only executed once (they are assembled
  # before the "script:" label). They must come before all other
  # commands.

   "begin*{*commandset*}*" {
      clear; 
      # wrap the code in a labelled 'begin: {}' java block so as to implement
      # .restart and .reparse in the begin block.
      ++; ++; 
      add "begin: {\n"; get; replace "\n" "\n  "; add "\n}"; 
      # make .restart work
      replace "continue script;" "break begin;"; 
      # make .reparse work
      replace "break lex;" "jumptoparse=true;break begin;"; 
      --; --; put; clear;
      add "beginblock*"; push; .reparse
   }
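   #*
   Java sketch (hypothetical names) of the "begin:" block built above.
   Inside it .restart becomes "break begin;" and .reparse becomes
   "jumptoparse=true;break begin;", so when the script has a parse>
   label the first pass of the script loop skips the lex block and goes
   straight to the parse section. The log strings only trace the
   control flow.

```java
// Hypothetical sketch of a begin {} block wrapped in "begin: { }".
// .restart -> "break begin;"   .reparse -> "jumptoparse=true;break begin;"
class BeginBlockSketch {
    static String run(boolean reparse) {
        StringBuilder log = new StringBuilder();
        boolean jumptoparse = false;
        begin: {
            log.append("begin ");
            if (reparse) { jumptoparse = true; break begin; } // .reparse
            log.append("more-begin ");
        }
        // first pass through the script loop
        lex: {
            if (jumptoparse) { jumptoparse = false; break lex; }
            log.append("lex ");
        }
        log.append("parse");
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // begin more-begin lex parse
        System.out.println(run(true));  // begin parse
    }
}
```
   *#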

   # -------------
   # parses and compiles concatenated tests
   # eg: 'a',B'b',E'c',[def],[:space:],[g-k] { ...

   # these 2 tests should be all that is necessary
   "test*,*ortestset*{*",
   "test*,*test*{*" {
     clear; get; add " || ";
     ++; ++; get; --; --; put; clear; 
     add "ortestset*{*";
     push; push;
     .reparse
   }

   # AND (.) and OR (,) concatenations cannot be mixed (see the error check above)

   # -------------
   # AND logic 
   # parses and compiles concatenated AND tests
   # eg: 'a'.B'b'.E'c'.[def].[:space:].[g-k] { ...
   # it is possible to elide this block with the negated block
   # for compactness but maybe readability is not as good.

   # negated tests can be chained with non negated tests.
   # eg: B'http' . !E'.txt' { ... }

   "test*.*andtestset*{*",
   "test*.*test*{*" {
     clear; get; add " && ";
     ++; ++; get; --; --; put; clear; 
     add "andtestset*{*";
     push; push; .reparse
   }

  #-------------------------------------
  # we should not have to check for the {*command*}* pattern
  # because that has already been transformed to {*commandset*}*

  "test*{*commandset*}*",
  "andtestset*{*commandset*}*",
  "ortestset*{*commandset*}*" { 
     clear; 
     # indent the java code for readability
     ++; ++; add "  "; get; replace "\n" "\n  "; put; --; --; 
     clear; add "if ("; get; add ") {\n";
     ++; ++; get;
     add "\n}"; 
     --; --; put; clear;
     add "command*";
     push;
     # always reparse/compile
     .reparse
   }
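  #*
  Java sketch (hypothetical names) of how chained tests and their brace
  block are assembled as source text: tests joined by "," become "||",
  tests joined by "." become "&&", and the commands are indented inside
  if (...) { }.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: build the Java source text for a test block.
class CompoundTestSketch {
    /** tests joined by "," use ||, tests joined by "." use && */
    static String joinTests(List<String> tests, boolean or) {
        return String.join(or ? " || " : " && ", tests);
    }

    /** wrap as if (...) { ... }, indenting the commands by 2 spaces */
    static String wrapIf(String condition, String commands) {
        return "if (" + condition + ") {\n  "
            + commands.replace("\n", "\n  ") + "\n}";
    }

    public static void main(String[] args) {
        String cond = joinTests(Arrays.asList(
            "mm.workspace.toString().startsWith(\"a\")",
            "mm.workspace.toString().endsWith(\"z\")"), true);
        System.out.println(wrapIf(cond, "mm.pop();\nmm.push();"));
    }
}
```

  Because AND and OR chains cannot be mixed (the script reports an
  error), the joined condition needs no extra parentheses.
  *#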

  # -------------
  # multi-token end-of-stream errors
  # not a comprehensive list of errors...
  (eof) {
    E"begintext*",E"endtext*",E"test*",E"ortestset*",E"andtestset*" {
      add "  Error near end of script at line "; lines;
      add ". Test with no brace block? \n";
      print; clear; quit;
    }

    E"quote*",E"class*",E"word*"{
      put; clear; 
      add "Error at end of pep script near line "; lines; 
      add ": missing semi-colon? \n";
      add "Parse stack: "; get; add "\n";
      print; clear; quit;
    }

    E"{*", E"}*", E";*", E",*", E".*", E"!*", E"B*", E"E*" {
      put; clear; 
      add "Error: misplaced terminal character at end of script! (line "; 
      lines; add "). \n";
      add "Parse stack: "; get; add "\n";
      print; clear; quit;
    }
  }

  # put the 4 (or fewer) tokens back on the stack
  push; push; push; push;

  (eof) {
    print; clear;

    # create the java source code for the virtual machine class
    # and the start of main(), and save it in the current tape cell.
    add '

 /* Java code generated by "translate.java.pss" */
 import java.io.*;
 import java.util.regex.*;
 import java.util.*;   // Stack, List, ArrayList

 public class Machine {
   // peep is an int (not a char) so that it can hold -1 at end of
   // stream. Reader.read() returns one UTF-16 code unit per call, so
   // characters outside the BMP (eg emojis) are read as 2 surrogates.
   private int accumulator;         // counter for anything
   private int peep;                // next char in input stream
   private int charsRead;           // No. of chars read so far
   private int linesRead;           // No. of lines read so far
   public StringBuffer workspace;    // text accumulator
   private Stack<String> stack;      // parse token stack
   private int LENGTH;               // current tape length (starts at 100)

   // ArrayLists (not arrays) so that the tape and marks can grow,
   // see increment()
   private List<StringBuffer> tape;      // array of token attributes 
   private List<StringBuffer> marks;     // tape marks
   private int tapePointer;          // pointer to current cell
   private Reader input;             // text input stream
   private boolean eof;              // end of stream reached?
   private boolean flag;             // not used here
   private StringBuffer escape;    // char used to "escape" others "\\"
   private StringBuffer delimiter; // push/pop delimiter (default is "*")
   private boolean markFound;      // if the mark was found in tape
   
   /** make a new machine with a character stream reader */
   public Machine(Reader reader) {
     this.markFound = false; 
     this.LENGTH = 100;
     this.input = reader;
     this.eof = false;
     this.flag = false;
     this.charsRead = 0; 
     this.linesRead = 1; 
     this.escape = new StringBuffer("\\\\");
     this.delimiter = new StringBuffer("*");
     this.accumulator = 0;
     this.workspace = new StringBuffer("");
     this.stack = new Stack<String>();
     this.tapePointer = 0;
     this.tape = new ArrayList<StringBuffer>();
     this.marks = new ArrayList<StringBuffer>();
     for (int ii = 0; ii < this.LENGTH; ii++) {
       this.tape.add(new StringBuffer(""));
       this.marks.add(new StringBuffer(""));
     }

     try
     { this.peep = this.input.read(); } 
     catch (java.io.IOException ex) {
       System.out.println("read error");
       System.exit(-1);
     }
   }

   /** read one character from the input stream and 
       update the machine. */
   public void read() {
     int iChar;
     try {
       if (this.eof) { System.exit(this.accumulator); }
       this.charsRead++;
       // increment lines
       if ((char)this.peep == \'\\n\') { this.linesRead++; }
       this.workspace.append(Character.toChars(this.peep));
       this.peep = this.input.read(); 
       if (this.peep == -1) { this.eof = true; }
     }
     catch (IOException ex) {
       System.out.println("Error reading input stream" + ex);
       System.exit(-1);
     }
   }

   /** increment tape pointer by one */
   public void increment() {
     this.tapePointer++;
     if (this.tapePointer >= this.LENGTH) {
       this.tape.add(new StringBuffer(""));
       this.marks.add(new StringBuffer(""));
       this.LENGTH++;
     }
   }
   
   /** remove escape character  */
   public void unescapeChar(char c) {
     if (workspace.length() > 0) {
       String s = this.workspace.toString().replace("\\\\"+c, c+"");
       this.workspace.setLength(0); workspace.append(s);
     }
   }


  /*
  Perl code. Also allows multiple escape chars
  eg: unescape "+-xyz";
  # this walks the string and determines if the given char 
  # is already escaped or not # eg "ab\cab\\cab\c"
  # allow multiple chars for escape/unescape
  sub unescapeChar {
    my $self = shift;   # the machine
    my $chars = shift;  # list of chars to escape
    my $cc = "";
    my $result = "";
    my $isEscaped = $false;

    foreach $cc (split(//,$self->{"work"})) {
      if (($isEscaped == $false) && ($cc eq $self->{"escape"})) {
        $isEscaped = $true;
      } else { $isEscaped = $false; }
      # remove the last escape character (usually backslash)
      # this allows multiple chars for escaping
      if (($isEscaped == $true) && (index($chars, $cc) != -1)) {
        $result =~ s/.$//s; 
      }
      $result .= $cc; 
    }
    $self->{"work"} = $result;
  }

  */

   /** add escape character  */
   public void escapeChar(char c) {
     if (workspace.length() > 0) {
       String s = this.workspace.toString().replace(c+"", "\\\\"+c);
       workspace.setLength(0); workspace.append(s);
     }
   }

   /** whether trailing escapes \\\\ are even or odd */
   // untested code. check! eg try: add "x \\\\"; print; etc
   public boolean isEscaped(String ss, String sSuffix) {
     int count = 0; 
     if (ss.length() < 2) return false;
     if (ss.length() <= sSuffix.length()) return false;
     if (ss.indexOf(this.escape.toString().charAt(0)) == -1) 
       { return false; }

     int pos = ss.length()-sSuffix.length()-1;  // char before the suffix
     while ((pos > -1) && (ss.charAt(pos) == this.escape.toString().charAt(0))) {
       count++; pos--;
     }
     if (count % 2 == 0) return false;
     return true;
   }

   /* count the escape chars (\\\\) just before the last sSuffix */
   private int countEscaped(String sSuffix) {
     String s = "";
     int count = 0;
     int index = this.workspace.toString().lastIndexOf(sSuffix);
     // remove suffix if it exists
     if (index > 0) {
       s = this.workspace.toString().substring(0, index);
     }
     while (s.endsWith(this.escape.toString())) {
       count++;
       s = s.substring(0, s.lastIndexOf(this.escape.toString()));
     }
     return count;
   }

   /** read the input stream until the workspace ends with sSuffix,
       skipping a suffix preceded by an odd number of escape chars */
   // can test this with
   public void until(String sSuffix) {
     // read at least one character
     if (this.eof) return; 
     this.read();
     while (true) {
       if (this.eof) return;
       if (this.workspace.toString().endsWith(sSuffix)) {
         if (this.countEscaped(sSuffix) % 2 == 0) { return; }
       }
       this.read();
     }
   }

   /** pop the top token from the stack onto the front of the workspace */
   public Boolean pop() {
     if (this.stack.isEmpty()) return false;
     this.workspace.insert(0, this.stack.pop());     
     if (this.tapePointer > 0) this.tapePointer--;
     return true;
   }

   /** push the first token from the workspace to the stack */
   public Boolean push() {
     String sItem;
     // dont increment the tape pointer on an empty push
     if (this.workspace.length() == 0) return false;
     // split on the current delimiter (default "*"), which a
     // "delim" command may set to more than one character
     int iFirstStar = 
       this.workspace.indexOf(this.delimiter.toString());
     if (iFirstStar != -1) {
       int iEnd = iFirstStar + this.delimiter.length();
       sItem = this.workspace.toString().substring(0, iEnd);
       this.workspace.delete(0, iEnd);
     }
     else {
       sItem = this.workspace.toString();
       this.workspace.setLength(0);
     }
     this.stack.push(sItem);     
     this.increment(); 
     return true;
   }

   /** swap current tape cell with the workspace */
   public void swap() {
     String s = new String(this.workspace);
     this.workspace.setLength(0);
     this.workspace.append(this.tape.get(this.tapePointer).toString());
     this.tape.get(this.tapePointer).setLength(0);
     this.tape.get(this.tapePointer).append(s);
   }

   /** save the workspace to file "sav.pp" */
   public void writeToFile() {
     try {
       File file = new File("sav.pp");
       Writer out = new BufferedWriter(new OutputStreamWriter(
          new FileOutputStream(file), "UTF8"));
       out.append(this.workspace.toString());
       out.flush(); out.close();
     } catch (Exception e) { 
       System.out.println(e.getMessage());
     }
   }

   public void goToMark(String mark) {
     this.markFound = false; 
     for (var ii = 0; ii < this.marks.size(); ii++) {
       if (this.marks.get(ii).toString().equals(mark)) { 
         this.tapePointer = ii; this.markFound = true; 
       }
     }
     if (this.markFound == false) { 
       System.out.print("badmark \'" + mark + "\'!"); 
       System.exit(1);
     }
   }

   /* remove existing marks with the same name and add new mark */
   public void addMark(String mark) {
     this.markFound = false; 
     for (var ii = 0; ii < this.marks.size(); ii++) {
       if (this.marks.get(ii).toString().equals(mark)) { 
         this.marks.get(ii).setLength(0); 
       }
     }
     this.marks.get(this.tapePointer).setLength(0);
     this.marks.get(this.tapePointer).append(mark);
   }

   /** parse/check/compile the input */
   public void parse(InputStreamReader input) {
     //this is where the actual parsing/compiling code should go 
     //but this means that all generated code must use
     //"this." not "mm." see nom.todart.pss etc
   }

  public static void main(String[] args) throws Exception { 
    boolean jumptoparse = false;
    String temp = "";    
    Machine mm = new Machine(new InputStreamReader(System.in)); \n';
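    #*
    The Perl comment inside the Machine source above describes an
    unescape that takes several characters at once. A possible Java port
    is sketched here (hypothetical; the generated unescapeChar() still
    handles one char). It checks whether the previous char was an
    unescaped escape char before deciding what to do with the current
    one, so an escaped escape char ("\\") is left alone.

```java
// Hypothetical Java port of the multi-character unescape idea:
// drop an escape char that precedes any char listed in "chars".
class UnescapeSketch {
    static String unescape(String work, String chars, char esc) {
        StringBuilder result = new StringBuilder();
        boolean escaped = false;  // was the previous char an unescaped esc?
        for (int i = 0; i < work.length(); i++) {
            char c = work.charAt(i);
            if (escaped && chars.indexOf(c) != -1) {
                result.setLength(result.length() - 1); // remove the esc char
            }
            // an esc char escapes the next char, unless it is itself escaped
            escaped = !escaped && c == esc;
            result.append(c);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("a\\+b\\-c", "+-", '\\'));  // a+b-c
    }
}
```
    *#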

    # save the code in the current tape cell
    put; clear;

    #---------------------
    # check if the script correctly parsed (there should only
    # be one token on the stack, namely "commandset*" or "command*").
    pop; pop;

    "commandset*", "command*" {
      clear;
      # indent generated code (6 spaces) for readability.
      add "      "; get; 
      replace "\n" "\n      "; put; clear;
      # restore the java preamble from the tape
      ++; get; --;
      add '
    script: 
    while (!mm.eof) {\n'; get;

      add "\n    }";
      add "\n    System.exit(mm.accumulator);";
      add "\n  }";
      add "\n}\n";
      # put a copy of the final compilation into the tapecell
      # so it can be inspected interactively.
      put; print; clear; zero; quit;
    }

    "beginblock*commandset*", "beginblock*command*" {
      clear; 
      # indent begin block code  
      add "    "; get; 
      replace "\n" "\n    "; put; clear; 
      # indent main code for readability.
      ++; add "      "; get; 
      replace "\n" "\n      "; put; clear; --;
      # get java preamble from tape
      ++; ++; get; --; --;

      get; add "\n"; ++; 
      # a labelled loop for "quit" (but quit can just exit?)
      add "    script: \n";
      add "    while (!mm.eof) {\n"; get;
      add "\n    }";
      add "\n    System.exit(mm.accumulator);";
      add "\n  }";
      add "\n}\n";
      # put a copy of the final compilation into the tapecell
      # for interactive debugging.
      put; print; clear; zero; quit;
    }

    push; push;
    # try to explain some more errors
    unstack;
    B"parse>" {
      put; 
      clear; 
      add "[error] pep syntax error:\n";
      add "  The parse> label cannot be the 1st item \n"; 
      add "  of a script \n"; 
      zero; a+; print; quit;
    }
    put; clear;

    clear;
    add "After compiling with 'translate.java.pss' (at EOF): \n ";
    add "  parse error in input script. \n ";
    print; clear; 
    unstack; put; clear;
    add "Parse stack: "; get; add "\n";
    add "   * debug script \n";
    add "   >> pep -If script -i 'some input' \n ";
    add "   *  debug compilation. \n ";
    add "   >> pep -Ia asm.pp script \n ";
    # non-zero accumulator for error condition
    zero; a+; print; 
    quit;

  } # not eof

  # there is an implicit .restart command here (jump start)

