Software Development blog

Monday, November 11, 2019

Antlr 4 grammar with spaces

Intro

This post discusses the difficulties and solutions of creating an Antlr grammar without removing spaces from the text being parsed. But first, a brief explanation of what Antlr really is.

Here is the link https://github.com/antlr/antlr4. Antlr can take a text with a well-defined rules and create a syntax tree out of it. It is basically what it is so good for. The user must create the grammar of the DSL (domain-specific language) and feed it to the tool. The tool generates the parser code. It can be a Java code or Javascript code. Antlr has no less than 5 targets (supported languages). This generated parser code can parse the text and create an abstract syntax tree. Then the user can use the generated visitor (it is the well-known visitor pattern) to walk the AST to get something out of it.

Problem

Most Antlr grammars remove the whitespace in the lexer like this:

WS
	: [ \r\n\t] + -> skip
	;

This definitely makes parsing somewhat easier as it transforms this expression:
q1 + q2 / f2 - ( 4 + 5 )
into a much simpler form:
q1+q2/f2-(4+5)
Without these spaces it is much easier to write the rules for the Antlr grammar.

But in some cases, it is not possible to remove all spaces because they may be part of a variable name. In the case, I was working on variables were with single quotes like this: 'Variable 1'. Spaces were part of the variable name. The grammar had to take into account that a space could be part of the variable name but also could be a "useless" character that the user put in for readability like a space after a comma. Special handling of spaces is required to parse this expression:

'Variable ABC' * 5 + 'Variable Q'*( 1 + 'Variable W')

Solution

The idea is as follows: if we cannot remove spaces from the text we have to "pull" spaces into parenthesis, operation signs, etc.

The rules for the operators and parenthesis must change to pull in spaces. Firstly, we need to declare the tokens for spaces:

ANY_SPACE: SINGLE_SPACE+;

SINGLE_SPACE: ' ';

We cannot use SINGLE_SPACE* because the Antlr tool correctly shows that SINGLE_SPACE* can match an empty string.

Then incorporate these tokens into rules for operators (MUL is '*' and Div is '/', rParameterSeparator is for the comma that separates function actual parameters):

rLeftVarParenthesis: L_VAR_PAREN ANY_SPACE | L_VAR_PAREN;

rArithmeticOperatorMulDiv: ANY_SPACE (MUL | DIV) ANY_SPACE | ANY_SPACE (MUL | DIV) | (MUL | DIV) ANY_SPACE | (MUL | DIV);

rParameterSeparator: ANY_SPACE ',' ANY_SPACE | ANY_SPACE ',' | ',' ANY_SPACE | ',';

Now, this grammar can handle variables with spaces in their names. The visitor when visiting the operator rule can join the text from the children and use trim to remove all unnecessary space. In Javascript it may look like this:

return ctx.children.join('').trim();

This simple and elegant solution allows for parsing variable names with spaces.

Thursday, May 23, 2019

Flexible Styling with Varying Criteria for POI-created documents

Intro

This post explains the difficulties with applying styles to a document based on varying criteria and offers a solution. One of the common tasks in Java programming is to create an Excel report from data that is stored in the database. In these cases, Java programmers use POI :-). These reports often have strict rules for styling and data formats. And the data is usually the result of a SQL query execution.

The goal is usually to turn this:

into this:

Problem

The problem here is that styles can be applied based on varying criteria. For example, numbers may be formatted with different numbers of digits after the decimal separator. Dates may need to be formatted with a special pattern. And the last but not the least, some columns or rows may need to be selected with color.

Applying these styles "directly" in the code will lead to a lot of code that is hard to read and maintain. Every time the style of the report changes a lot of code may need to be changed.

It would be great if styles could have a "pattern" with them like a predicate and could be applied depending on this predicate. But how to implement his?

Solution

The proposed solution solves the predicate problem. I used this solution with POI to generate reports so mostly I'll assume a POI-like library is used.

Style Map Generation

Before any document-creating operations, a style map must be created. This style map will contain the styles mapped to keys. The key must include all values which may affect the style. For example the type of the value (in Java), the row number, the column number and the value itself.

This map will look like this:

Key Sequence Generation

The idea is to generate a List (java.util.List) of keys to try in the correct order. This List is used to define the style of the object. Every key from the List is tried in the iteration order until one has a value in the style map.

The general idea is to start with more detailed keys with all the values filled in and end with the most general style that can be applied to any cell.

For example (ri is row index, ci column index):

List<CellStyleKey> allKeys = new ArrayList<>();

if (val != null) {

if ((ri != null) || (ci != null)) {

allKeys.add(new CellStyleKey(val, val.getClass(), ri, ci));

allKeys.add(new CellStyleKey(val, val.getClass(), ri, null));

allKeys.add(new CellStyleKey(val, val.getClass(), null, ci));

allKeys.add(new CellStyleKey(val, val.getClass(), null, null));

allKeys.add(new CellStyleKey(null, val.getClass(), ri, ci));

allKeys.add(new CellStyleKey(null, val.getClass(), ri, null));

allKeys.add(new CellStyleKey(null, val.getClass(), null, ci));

allKeys.add(new CellStyleKey(null, val.getClass(), null, null));

} else {

allKeys.add(new CellStyleKey(val, val.getClass(), null, null));

allKeys.add(new CellStyleKey(null, val.getClass(), null, null));

}

allKeys.add(new CellStyleKey(null, null, ri, ci));

allKeys.add(new CellStyleKey(null, null, ri, null));

allKeys.add(new CellStyleKey(null, null, null, ci));

allKeys.add(new CellStyleKey(null, null, null, null));

In this code, the most detailed key is the one that contains all values: the value, the class, the row number, and the column number. The least detailed key contains no values.

Style Map Generation

For every key in the list (the list should be as short as possible) the get operation is performed on the style map. If the style map has a value for this key it is the style.

Conclusion

This scheme allows for a simple yet flexible assignment of styles to rows/columns/values/classes in an Excel document.

Monday, March 11, 2019

Insert content with Apache POI

Intro

Everybody knows POI! It is the best library available to create Excel documents in Java. it is really good and can do a lot of things easily. But recently I wanted to do something not so simple. I wanted to use an existing xlsx file as a template and insert some data into it at some position. In this post I'll show why it wasn't so easy with the version 4.0.0.

Available Tools

In order to insert some content in the middle of an existing Excel document all the rows from the insertion point to the last row must be moved down by n rows where n is the number of rows being inserted. It is also preferable to keep the formatting of the rows being moved.

Actually POI has some API to help with these issues. The Sheet interface has the method Sheet.shiftRows(int, int, int) which can be used like this:

sheet.shiftRows(insertPosition, sheet.getLastRowNum(), n);

There is a more advanced method Sheet.shiftRows(int, int, int, boolean, boolean) but for XSSF the boolean parameters make no difference as I could find in the code. For HSSF these parameters could be important.

So this method is supposed to do the trick! What's the problem?

Problem

When I tried to use this method I ran into the dreaded problem "We found a problem with some content..."

It was actually very frustrating. I quickly found that somehow the rows were moved incorrectly. LibreOffice was able to at least open the file but the formatting and the inserted content were all wrong.

Solution

This took 2-3 hours of diligent digging :-). As it turned out POI even in the latest 4.0.0 (most likely even 4.0.1) has a bug and cannot move the rows correctly. Here is the link to the bug:

https://bz.apache.org/bugzilla/show_bug.cgi?id=57423

The method updates the row references in the sheet xml for example

<row r="139"

but doesn't update the cell references

<c r="A138" s="1"/> - wrong

The cell must point to the upper level row in this case it must point to the row 139 like this:<c r="A139" s="1"/> - right

(you can take a look at the sheet xml if you unpack the xlsx file)

The bug shows a workaround. I created a method that implements the shifting and the workaround in one method:

public static void xssfShiftRows(Sheet sh, int firstRow, int last Row, int shiftN) {

int firstShiftedRow = firstRow + shiftN;

int lastShiftedRow = lastRow + shiftN;

sh.shiftRows(firstRow, lastRow, shiftN, true, true);

/*

* This code is a workaround for the bug

* https://bz.apache.org/bugzilla/show_bug.cgi?id=57423

* In the sheet xml the row references are updated like this:

* <row r="139"

* but the cell references are incorrect

* <c r="A138" s="1"/>

*

* The number in the row 139 must match the number in the cell A139.

* This code manually updates these links.

*/

for (int nRow = firstShiftedRow; nRow <= lastShiftedRow; nRow++) {

final Row row = sh.getRow(nRow);

if (row != null) {

String msg = "Row[rownum=" + row.getRowNum()

+ "] contains cell(s) included in a multi-cell array formula. "

+ "You cannot change part of an array.";

for (Cell c : row) {

((XSSFCell) c).updateCellReferencesForShifting(msg);

}

}

}

}

Friday, January 25, 2019

Spring Custom Serializers with @JsonIdentityInfo

Intro

Serialization/Deserialization from/to JSON in Spring is widely used in modern Spring-based applications. It is based on Jackson. Jackson can serialize any POJO into JSON and vice versa with ease. This code is well written. I never encountered any issues. It gets more difficult when custom serializers are involved. This post shows how to use custom serializers in Spring with autowired fields.

Defining a Custom Serializer

Usually a custom serializer for a class is inherited from com.fasterxml.jackson.databind.ser.std.StdSerializer. This class defines some constructors but the framework only need a no-argument constructor that should call the superclass, something like this:

public CustomSerializer() {

this(null);

}

public CustomSerializer(Class<ObjectToSerialize> t) {

super(t);

}

Then there is the main method that must be implemented to actually write the JSON:

@Override

public void serialize(ObjectToSerialize value, JsonGenerator gen, SerializerProvider provider) throws IOException {

gen.writeStartObject();

...

provider.defaultSerializeField("some field", value.getField(), gen);

...

gen.writeEndObject();

}

When the serializer class is created it must be registered as the serializer for ObjectToSerialize. This can be done with the @JsonSerialize annotation on the class:

@JsonSerialize(using = CustomSerializer.class)

public class ObjectToSerialize {

Now Jackson will be using this custom serializer for all instances of this class. If necessary a custom deserializer can be written by subclassing com.fasterxml.jackson.databind.deser.std.StdDeserializer<T>

Circular References and @JsonIdentityInfo

For most commercial applications with Spring and Hibernate the issue of circular references manifests itself sooner or later. Here is a simple example.

We have 2 classes:

public class Building {

@Id

@GeneratedValue(<parameters>)

private Long id;

private Set<Apartment> apartments;

}

public class Apartment {

@Id

@GeneratedValue(<parameters>)

private Long id;

private Building building;

}

If we try to serialize one building that has at least one apartment we get a StackOverflowException.

Jackson has a solution to this problem - @JsonIdentityInfo.

If the annotation @JsonIdentityInfo is added to the class like this:

@JsonIdentityInfo(generator = ObjectIdGenerators.PropertyGenerator.class, property = "id")

public class ObjectToSerialize {

then any ObjectMapper will break the cycle by replacing every occurrence of the object except the first with its id. Like this:

{

"id": 1,

"apartments": [

{

"id": 2,

"building": 1 - the object is replaced with its ID

{

"id": 3,

"building": 1 - the object is replaced with its ID

}

]

}

These are the tools Jackson provides to customize serialization and deal with circular references.

JSON Structure Problem

Problem

@JsonIdentityInfo works well for simple applications. But as the application grows in the default form it could affect the structure of the JSON. For example if some method returns the buildings and the districts in one response here is what may occur:

{

"buildings": [

{

"id": 1,

"apartments": [

{

"id": 2,

"building": 1 - the object is replaced with its ID

{

"id": 3,

"building": 1 - the object is replaced with its ID

}

]

}

"districts": [

{

"buildings": [

{

"id": 5,

...

1, - the object is replaced with its ID

{

"id": 6,

...

}

]

}

]

}

This replacement could be quite unpredictable from the parser's point of view. Within an array it could encounter objects and IDs. And this could happen for any field and any object. Any object where the class is annotated with @JsonIdentityInfo is replaced with its ID if the serialization provider finds it more than once. Every second, third, fourth etc. instance with the same ID found by the serialization provider is replaced with its ID.

Solution

The solution here is to use a separate ObjectMapper to write parts of the JSON. The lists of already seen IDs are stored in the serialization provider which is created by ObjectMapper. By creating a separate ObjectMapper (with a probably different configuration) the lists are reset.

For a "composite" JSON result which returns different objects types a custom serializer can be written. In this custom serializer the "header" is written manually with JsonGenerator methods and when the correct level in the JSON is reached we create a new ObjectMapper and write a much better looking JSON.

{

"buildings": [ - create a new ObjectMapper

{

"id": 1,

"apartments": [

{

"id": 2,

"building": 1 - the object is replaced with its ID

{

"id": 3,

"building": 1 - the object is replaced with its ID

}

]

}

"districts": [ - create a new ObjectMapper

{

"buildings": [

{

"id": 5,

...

{ - the object is written as a JSON Object not an ID

"id": 1,

...

{

"id": 6,

...

}

]

}

]

}

To write the JSON to the original generator we can use ObjectMapper.writeValueAsString and JsonGenerator.writeRawValue(String).

P.S. it is also possible to create a new serialization provider by means of DefaultSerializerProvider.createInstance(SerializationConfig, SerializerFactory) but it is potentially more complicated.

Custom Serializer Autowire Problem

Problem

We'd like to be able to use @Autowire in our custom serializers. It is one of Spring's best features! Actually it works if the default ObjectMapper is used. But if we use the solution to the JSON structure problem it doesn't work for custom serializers instantiated by our own object mappers.

Solution

Our own object mappers must be configured with a special HandlerInstantiator:

// try to use the default configuration as much as possible

ObjectMapper om = Jackson2ObjectMapperBuilder.json().build();

// This instantiator will handle autowiring properties into the custom serializers

om.setHandlerInstantiator(

new SpringHandlerInstantiator(this.applicationContext.getAutowireCapableBeanFactory()));

If the custom object mappers are created inside another custom serializer which is created by the default ObjectMapper then it can autowire the ApplicationContext.