JavaCC: Don't talk back

Implementing a parser-analyser

A list of grammar productions

The final section of the .jj file defines the context-free grammar: a set of grammar productions (rules) that specify the legal syntax of the input data. For a parser that analyses a programming language, this section would contain the grammar rules for that language. In our case we will define a very simple syntax that allows two integers to be added or subtracted. The set of productions for this grammar might be:

expr := element + element
|  element - element
element := number
number := digit
digit := [0 - 9]
EOL := "\n"

We will define this grammar in a JavaCC .jj file. The resultant parser will check the input has a legal format as defined by the grammar file and will then print out the result of the addition or subtraction.

The following is the list of grammar productions defined for our simple example in JavaCC syntax:

void calc():
{ 
    double x;
}
{
   x=expr() <EOL> {System.out.println(x); }   
|  <EOF> {System.exit(0); }
}

double expr():
{
   double a;
   double b;
}
{
   a = element() (
   "+" b=element() {a += b; }
|  "-" b=element() {a -= b;}
   )
   { return a; }
}


double element():
{
   Token t;
}
{
   t=<NUMBER> { System.out.println("t: " + t) ; return Double.parseDouble(t.toString()); }
}

This states that:

A calc consists of:

  • an expression followed by an End Of Line; or
  • an End Of File marker.

If an expression is found, the value returned is printed out; if an end-of-file marker is found, the program terminates. Note that in this way callbacks can be made to arbitrary Java code. It is via this mechanism that the parser can be linked into your own code.
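As a rough illustration of that linkage, the sketch below shows one way the printing action could hand its result to your own code instead. The names ResultHandler, CollectingHandler and report are hypothetical and are not part of JavaCC; the report method merely stands in for the action in calc(), which in a real grammar might read { handler.onResult(x); } with handler declared as a field between PARSER_BEGIN and PARSER_END.

```java
import java.util.ArrayList;
import java.util.List;

public class CallbackSketch {

    /** Hypothetical callback interface; not part of JavaCC. */
    public interface ResultHandler {
        void onResult(double value);
    }

    /** Example handler that collects results instead of printing them. */
    public static class CollectingHandler implements ResultHandler {
        public final List<Double> results = new ArrayList<>();

        @Override
        public void onResult(double value) {
            results.add(value);
        }
    }

    /**
     * Stand-in for the action block in calc(): where the grammar prints x,
     * it could call the handler instead.
     */
    public static void report(double x, ResultHandler handler) {
        handler.onResult(x);
    }

    public static void main(String[] args) {
        CollectingHandler handler = new CollectingHandler();
        report(5.0, handler);   // as if "2 + 3" had just been parsed
        report(-2.0, handler);  // as if "4 - 6" had just been parsed
        System.out.println(handler.results);
    }
}
```

The point of the design is that the grammar only needs to know about the interface, so the code that consumes the results can change without regenerating the parser.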

Note that the format used to define the grammar is JavaCC's adaptation of BNF (Backus-Naur Form) notation.

The remaining grammar productions define what comprises an expression and an element. In particular, an expression is defined to be an element followed by either:

  • a “+” and another element; or
  • a “-” and another element.

Note that in our case we also specify that the two elements should be added or subtracted, and that the result is returned from this production.

Finally, an element is defined as a NUMBER token. When a NUMBER is matched, its value is returned as a double.
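To make the three productions more concrete, here is a minimal hand-written sketch of roughly what they describe, in plain Java. It is not the code JavaCC generates: the class name HandWrittenCalc and its methods are invented for illustration, and it assumes, as in the complete listing, that a NUMBER is a single digit and that spaces and carriage returns are skipped.

```java
public class HandWrittenCalc {
    private final String input;
    private int pos = 0;

    private HandWrittenCalc(String input) {
        this.input = input;
    }

    // Mirrors the SKIP section: ignore spaces and carriage returns.
    private void skip() {
        while (pos < input.length()
                && (input.charAt(pos) == ' ' || input.charAt(pos) == '\r')) {
            pos++;
        }
    }

    // element := NUMBER (a single digit in this grammar)
    private double element() {
        skip();
        if (pos >= input.length()
                || input.charAt(pos) < '0' || input.charAt(pos) > '9') {
            throw new IllegalArgumentException("digit expected at position " + pos);
        }
        return input.charAt(pos++) - '0';
    }

    // expr := element ( "+" element | "-" element )
    private double expr() {
        double a = element();
        skip();
        if (pos >= input.length()
                || (input.charAt(pos) != '+' && input.charAt(pos) != '-')) {
            throw new IllegalArgumentException("'+' or '-' expected at position " + pos);
        }
        char op = input.charAt(pos++);
        double b = element();
        return op == '+' ? a + b : a - b;
    }

    // calc := expr EOL
    public static double eval(String line) {
        HandWrittenCalc parser = new HandWrittenCalc(line);
        double x = parser.expr();
        parser.skip();
        if (parser.pos >= line.length() || line.charAt(parser.pos) != '\n') {
            throw new IllegalArgumentException("end of line expected");
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(eval("2 + 3\n")); // prints 5.0
        System.out.println(eval("4 - 6\n")); // prints -2.0
    }
}
```

Each method corresponds to one production, which is essentially the shape of a recursive-descent parser; JavaCC's value is that it derives this kind of code, plus the tokeniser and error reporting, from the grammar file.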

Generating the Parser

We are now in a position to generate our Java implementation parser class. This is done using the JavaCC program. For example:

javacc -NOSTATIC -OUTPUT_DIRECTORY=source SimpleCalculator.jj

The options included on the command line to the JavaCC program specify:

  • -NOSTATIC – which generates a parser with instance methods (by default JavaCC generates a parser whose methods are static).
  • -OUTPUT_DIRECTORY – which tells the JavaCC compiler where to put the generated source files (the default is the current directory).
  • SimpleCalculator.jj – which is the file containing the specification of the JavaCC grammar.

With the javacc.jar file on the classpath, the resulting output generated is:

Java Compiler Compiler Version 4.0 (Parser Generator)
(type "JavaCC" with no arguments for help)
Reading from file SimpleCalculator.jj . . .
File "TokenMgrError.java" does not exist.  Will create one.
File "ParseException.java" does not exist.  Will create one.
File "Token.java" does not exist.  Will create one.
File "SimpleCharStream.java" does not exist.  Will create one.
Parser generated with 0 errors and 1 warnings.

This generates seven source files, as listed below:

ParseException.java
SimpleCalculator.java
SimpleCalculatorConstants.java
SimpleCalculatorTokenManager.java
SimpleCharStream.java
Token.java
TokenMgrError.java

Together these classes define the SimpleCalculator parser. If you are interested, the primary class to look at is SimpleCalculator.java, as it is both the entry point and the primary implementation of the parser.

To execute this parser we first need a simple file to process. In our case the file is called test.dat and contains the following:

2 + 3
4 - 6

We can now compile the classes generated by JavaCC (again with javacc.jar on the classpath). The parser can then be run by issuing the following:

java SimpleCalculator

The result of running the parser is:

Reading: C:\experis\projects\javacc\test.dat
t: 2
t: 3
5.0
t: 4
t: 6
-2.0

Summary

JavaCC is a very powerful tool that can be used to create parsers for a wide range of data formats, including RTF, Visual Basic, Java itself, HTML, MHEG-5, C, C++, Excel files, numeric formulae and email headers. It can be easily integrated into your own code, as well as being used in stand-alone fashion as illustrated here.

Complete Listing of SimpleCalculator.jj

options {
  LOOKAHEAD=2;
}

PARSER_BEGIN(SimpleCalculator)
import java.io.*;

public class SimpleCalculator {
   public static void main(String [] args) throws Exception {
      // The input file is expected in the current working directory
      File file = new File("test.dat");
      System.out.println("Reading: " + file.getAbsolutePath());
      FileReader reader = new FileReader(file);
      SimpleCalculator sc = new SimpleCalculator(reader);
      // Parse one line per iteration; calc() calls System.exit(0) on <EOF>
      while (true) {
         sc.calc();
      }
   }
}

PARSER_END(SimpleCalculator)

SKIP:
{
   " "
|   "\r"
}

TOKEN:
{
   <NUMBER:(<DIGIT>)>
|  <DIGIT:["0"-"9"]>
|  <EOL: "\n" >

}   

void calc():
{ 
    double x;
}
{
   x=expr() <EOL> {System.out.println(x); }   
|  <EOF> {System.exit(0); }
}

double expr():
{
   double a;
   double b;
}
{
   a = element() (
   "+" b=element() {a += b; }
|  "-" b=element() {a -= b;}
   )
   { return a; }
}


double element():
{
   Token t;
}
{
   t=<NUMBER> { System.out.println("t: " + t) ; return Double.parseDouble(t.toString()); }
}
