Monday, March 11, 2019

Insert content with Apache POI

Intro

Everybody knows POI! It is the best library available to create Excel documents in Java. it is really good and can do a lot of things easily. But recently I wanted to do something not so simple. I wanted to use an existing xlsx file as a template and insert some data into it at some position. In this post I'll show why it wasn't so easy with the version 4.0.0. 

Available Tools

In order to insert some content in the middle of an existing Excel document all the rows from the insertion point to the last row must be moved down by n rows where n is the number of rows being inserted. It is also preferable to keep the formatting of the rows being moved. 
Actually POI has some API to help with these issues. The Sheet interface has the method Sheet.shiftRows(int, int, int) which can be used like this:

sheet.shiftRows(insertPosition, sheet.getLastRowNum(), n);

There is a more advanced method Sheet.shiftRows(int, int, int, boolean, boolean) but for XSSF the boolean parameters make no difference as I could find in the code. For HSSF these parameters could be important.
So this method is supposed to do the trick! What's the problem?

Problem

When I tried to use this method I ran into the dreaded problem "We found a problem with some content..."

It was actually very frustrating. I quickly found that somehow the rows were moved incorrectly. LibreOffice was able to at least open the file but the formatting and the inserted content were all wrong. 

Solution

This took 2-3 hours of diligent digging :-). As it turned out POI even in the latest 4.0.0 (most likely even 4.0.1) has a bug and cannot move the rows correctly. Here is the link to the bug:

https://bz.apache.org/bugzilla/show_bug.cgi?id=57423

The method updates the row references in the sheet xml for example 
<row r="139"
but doesn't update the cell references
<c r="A138" s="1"/> - wrong
The cell must point to the upper level row in this case it must point to the row 139 like this:<c r="A139" s="1"/> - right
(you can take a look at the sheet xml if you unpack the xlsx file)

The bug shows a workaround. I created a method that implements the shifting and the workaround in one method:

public static void xssfShiftRows(Sheet shint firstRowint last    Rowint shiftN) {
    int firstShiftedRow = firstRow + shiftN;
    int lastShiftedRow = lastRow + shiftN;
    sh.shiftRows(firstRowlastRowshiftNtruetrue);
    /*
     * This code is a workaround for the bug
     * https://bz.apache.org/bugzilla/show_bug.cgi?id=57423
     * In the sheet xml the row references are updated like this:
     * <row r="139"
     * but the cell references are incorrect
     * <c r="A138" s="1"/>
     *
     * The number in the row 139 must match the number in the cell A139.
     * This code manually updates these links.
     */
               for (int nRow = firstShiftedRownRow <= lastShiftedRownRow++)  {
    final Row row = sh.getRow(nRow);
    if (row != null) {
    String msg = "Row[rownum=" + row.getRowNum()
    + "] contains cell(s) included in a multi-cell array         formula.  "
    + "You cannot change part of an array.";
    for (Cell c : row) {
        ((XSSFCell) c).updateCellReferencesForShifting(msg);
    }
  }
}
}