How to Create a Programming Language Using ANTLR4


There are thousands of programming languages available today, and new ones show up every year. If you are a programmer, at some point in your life, you must have wondered if you too could ever create your very own language, one that conforms to your ideals. Well, thanks to ANTLR v4, doing so has become easier than ever. In this tutorial, I’ll show you how to create a very simple programming language using ANTLR4 and Java.

Project Setup

I’m going to assume that you already have Java 7 installed on your computer, along with Eclipse.

Use the following command to download ANTLR v4.5.3 as a JAR file:

wget http://www.antlr.org/download/antlr-4.5.3-complete.jar

Place the JAR in the lib folder of your project. Then, in Eclipse’s Package Explorer view, right-click it and select Build Path > Add to Build Path.

Create a Grammar File

To keep this tutorial short, we’ll be creating a very simple programming language. Let’s call it GYOO. Here’s a sample program in GYOO:

begin
    let a be 5
    let b be 10
    add 3 to b
    add b to a
    add a to b
    print b
    print 3
end

The program above demonstrates all the features that we are going to support in this language. It’s going to have three types of statements:

  • an assign statement
  • an add statement
  • a print statement

It can handle only non-negative integers, and every program must start with the begin keyword and finish with the end keyword.
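To make the intended behaviour concrete, here is a minimal sketch that hand-translates the sample program into plain Java, one line per GYOO statement. The class name GyooSemanticsSketch and the method runSample() are made up for illustration and are not part of the project; no parsing is involved here at all.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: the sample GYOO program written out by hand.
final class GyooSemanticsSketch {

    // Returns the text that the sample program is expected to print.
    static String runSample() {
        Map<String, Integer> variables = new HashMap<>();
        StringBuilder output = new StringBuilder();

        variables.put("a", 5);                                        // let a be 5
        variables.put("b", 10);                                       // let b be 10
        variables.put("b", variables.get("b") + 3);                   // add 3 to b (b is now 13)
        variables.put("a", variables.get("a") + variables.get("b"));  // add b to a (a is now 18)
        variables.put("b", variables.get("b") + variables.get("a"));  // add a to b (b is now 31)
        output.append(variables.get("b")).append('\n');               // print b
        output.append(3).append('\n');                                // print 3

        return output.toString();
    }

    public static void main(String[] args) {
        System.out.print(runSample());
    }
}
```

Running it prints 31 and then 3, which is what the finished interpreter should print for the sample program.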

Accordingly, create a new file called GYOO.g4 inside the src folder and add the following grammar to it:

grammar GYOO;
program   : 'begin' statement+ 'end';
          
statement : assign | add | print ;

assign    : 'let' ID 'be' (NUMBER | ID) ;
print     : 'print' (NUMBER | ID) ;
add       : 'add' (NUMBER | ID) 'to' ID ;

ID     : [a-z]+ ;
NUMBER : [0-9]+ ;
WS     : [ \t\r\n]+ -> skip ;

The grammar should be fairly intuitive to you if you are familiar with BNF.
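If the lexer rules at the bottom of the grammar are less familiar, the following sketch may help. It approximates them with java.util.regex to show how a line of GYOO is split into keywords, IDs and NUMBERs while whitespace is dropped. This is not how the lexer generated by ANTLR works internally. GyooTokenSketch and tokenize() are hypothetical names, and the regular expression only mimics ANTLR’s behaviour for this particular grammar, where keywords such as 'let' win over ID only when they match the whole word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a regex approximation of the GYOO lexer rules.
final class GyooTokenSketch {

    // Keywords come first and must not be followed by another letter,
    // so "letter" is read as one ID rather than 'let' followed by "ter".
    private static final Pattern TOKEN = Pattern.compile(
            "(?<KEYWORD>(?:begin|end|let|be|print|add|to)(?![a-z]))"
            + "|(?<ID>[a-z]+)"
            + "|(?<NUMBER>[0-9]+)"
            + "|(?<WS>[ \\t\\r\\n]+)");

    // Splits the source into "TYPE:text" entries, skipping whitespace.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(source);
        int position = 0;
        while (position < source.length()) {
            matcher.region(position, source.length());
            if (!matcher.lookingAt()) {
                throw new IllegalArgumentException(
                        "Unexpected character at index " + position);
            }
            if (matcher.group("WS") == null) { // whitespace is matched but not kept
                String type = matcher.group("KEYWORD") != null ? "KEYWORD"
                        : matcher.group("ID") != null ? "ID" : "NUMBER";
                tokens.add(type + ":" + matcher.group());
            }
            position = matcher.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("add 3 to b"));
    }
}
```

For the input "add 3 to b", the sketch prints [KEYWORD:add, NUMBER:3, KEYWORD:to, ID:b]. An uppercase letter or a symbol such as + makes it throw, because no rule in the grammar covers those characters.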

Generate Parser and Lexer

Now that we have a grammar file, we can pass it as an input to the org.antlr.v4.Tool class and generate a parser and lexer for it. You could use the ANTLR Eclipse plugin to do so. In this tutorial, however, I’ll do it manually on the command line:

java -cp ".:../../../../lib/antlr-4.5.3-complete.jar:$CLASSPATH" \
 org.antlr.v4.Tool -package com.progur.langtutorial GYOO.g4

Make sure that you specify your own classpath and package name while running the command. Once the command completes successfully, you’ll have several new Java classes in your src folder.

Create a Custom Listener

You must now create a new Java class that is a subclass of the GYOOBaseListener class. Call it MyListener.

Inside MyListener, you need to tell ANTLR’s parser what it should do every time it encounters a specific type of statement. For example, every time it encounters an assign statement, it must assign a value to a variable. The generated base class gives you an enterAssign() and an exitAssign() method for this. Because we need the complete statement, we’ll override exitAssign(), which is called once the parser has finished with it. There are similar methods for the print and add statements too.

MyListener also needs a Map object that can store the names and values of all the variables.

Accordingly, add the following code to the class:

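import java.util.HashMap;
import java.util.Map;

// Adjust the package below to the one you passed to -package
import com.progur.langtutorial.GYOOParser.AddContext;
import com.progur.langtutorial.GYOOParser.AssignContext;
import com.progur.langtutorial.GYOOParser.PrintContext;

public class MyListener extends GYOOBaseListener {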

    private Map<String, Integer> variables;
    
    public MyListener() {
        variables = new HashMap<>();
    }
    
    @Override
    public void exitAssign(AssignContext ctx) {
        // This method is called when the parser has finished
        // parsing the assign statement
        
        // Get variable name
        String variableName = ctx.ID(0).getText();
        
        // Get value from variable or number
        String value = ctx.ID().size() > 1 ? ctx.ID(1).getText() 
                : ctx.NUMBER().getText();
        
        // Add variable to map		
        if(ctx.ID().size() > 1)
            variables.put(variableName, variables.get(value));
        else
            variables.put(variableName, Integer.parseInt(value));
    }
    
    @Override
    public void exitAdd(AddContext ctx) {
        // This method is called when the parser has finished
        // parsing the add statement
        
        String variableName = ctx.ID().size() > 1 ? ctx.ID(1).getText() 
                : ctx.ID(0).getText();
        int value = ctx.ID().size() > 1 ? variables.get(ctx.ID(0).getText()) 
                : Integer.parseInt(ctx.NUMBER().getText());
        
        variables.put(variableName, variables.get(variableName) + value);
    }
    
    @Override
    public void exitPrint(PrintContext ctx) {
        // This method is called when the parser has finished
        // parsing the print statement
        
        String output = ctx.ID() == null ? ctx.NUMBER().getText() 
                : variables.get(ctx.ID().getText()).toString();
        System.out.println(output);
    }

}
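
One thing the listener above does not handle is a program that adds to or prints a variable that was never assigned. In that case variables.get() returns null and the listener fails with a NullPointerException. Here is a minimal sketch of one way to tighten this up, assuming you are happy to move the variable handling into a small helper class. VariableStoreSketch and its methods are hypothetical names; they are not part of ANTLR or of the generated code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: keeps GYOO variables and reports undefined ones clearly.
final class VariableStoreSketch {

    private final Map<String, Integer> variables = new HashMap<>();

    // An operand is the text of either a NUMBER token or an ID token.
    int resolve(String operand) {
        char first = operand.charAt(0);
        if (first >= '0' && first <= '9') {
            return Integer.parseInt(operand);
        }
        Integer value = variables.get(operand);
        if (value == null) {
            throw new IllegalStateException("Undefined variable: " + operand);
        }
        return value;
    }

    // let <name> be <operand>
    void assign(String name, String operand) {
        variables.put(name, resolve(operand));
    }

    // add <operand> to <target>
    void add(String operand, String target) {
        variables.put(target, resolve(target) + resolve(operand));
    }

    public static void main(String[] args) {
        VariableStoreSketch store = new VariableStoreSketch();
        store.assign("a", "5");
        store.add("3", "a");
        System.out.println(store.resolve("a"));
    }
}
```

With a helper like this, each exit method could read the operand text from its context with getText() and pass it on, for example store.add(operandText, targetText) in exitAdd(), instead of touching the map directly.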

Run the Parser

At this point, our programming language is almost ready. However, we still need to pass an input file to it and break it down into tokens. In order to do that, add a new class called Main to your project and add a main() method to it.

Inside the method, you must first create an ANTLRInputStream object and pass a FileInputStream to it. Next, you must create a GYOOLexer object based on the ANTLRInputStream. You can now create a stream of tokens by wrapping the lexer in a CommonTokenStream, and pass that as an input to a GYOOParser object.

You must, of course, not forget to register an instance of the MyListener class as a listener on the GYOOParser object.

At this point, you can call the program() method to start the parsing.

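import java.io.FileInputStream;
import java.io.IOException;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

public class Main {

    public static void main(String[] args) {
        // args[0] is the path to the GYOO source file
        try (FileInputStream file = new FileInputStream(args[0])) {
            ANTLRInputStream input = new ANTLRInputStream(file);

            GYOOLexer lexer = new GYOOLexer(input);
            GYOOParser parser = new GYOOParser(new CommonTokenStream(lexer));
            parser.addParseListener(new MyListener());

            // Start parsing
            parser.program();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}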

And now, if you pass the sample GYOO program you saw earlier as the input file, you should see the following output:

31
3

Conclusion

You now know how to create a simple programming language using ANTLR v4. In a future tutorial, I’ll show you how to create a more complex language that can have conditions and loops. If you have any questions, please do write a comment.
