Fishing for POI

Creating Excel or Word files from Java...

3 Big data security analytics techniques

Have you ever needed to create a Microsoft Excel or Word file from Java? If you have, did you try to do it from scratch yourself? If you were working with Excel, did you end up creating comma-separated data in a file (CSV files)?

CSV files work very well as long as all you are interested in is the raw data. But what if you are interested in including formulas in your data or need to format your spreadsheet appropriately (with centring, colours, bold, italics etc)? Did you give up in the end, or work with a compromise solution?

As it happens I was recently asked this very question by some Java developers. They were working with a web-based application and wanted to create both Excel and Word files for senior management to access. These files would hold dynamically generated data relevant to their organisation. They therefore needed to be generated programmatically as and when required.

The POI Project

Creating Excel and Word files is hard, not least due to the complex nature of the file formats used by Microsoft for Excel and Word. That is, formats based upon Microsoft's OLE 2 Compound Document format. However, one of the Apache projects does all the hard work for you and makes it very easy to create, read and update Excel, and soon, Word files. This project is called POI. It has already been in development for several years, starting in April 2001. It is currently in version 2.5. You can download it here.

POI is actually more like a number of combined projects, which allow you to create both Word and Excel format files. It can be divided into several sub-projects, in particular:

* POIFS, the oldest and most stable part of the project, which provides facilities for reading and writing OLE 2 Compound Document files.

* HWPF, a port of the Microsoft Word 97 file format to pure Java.

* HSSF, a port of the Microsoft Excel 97(-2002) file format (BIFF8) to pure Java.

In this column we will focus on the use of POI to create Excel files using HSSF.

HSSF for Excel files.

You may wonder what HSSF stands for. Rather provocatively it stands for Horrible SpreadSheet Format (indeed many of the elements of POI have quite provocative names, e.g. DDF - Dreadful Drawing Format, which is the Microsoft Office Drawing format, otherwise known as Escher format).

HSSF provides a way to create Excel spreadsheets as well as to read, modify and write existing spreadsheets. All together it provides:

* low level structures for those with special needs

* an event model API for efficient read-only access

* a full user model API for creating, reading and modifying XLS files

Creating an Excel file with POI

Let's look at the basics of what is needed to create an Excel file. First, we need to create a workbook and add a sheet to it. We will then need to add values for cells, formulas and the like. The program presented in figure 1 illustrates how we can do this using POI.

The code required to create an Excel file using POI

The first thing that this program does is create a new HSSFWorkBook (in line 14). We then create a sheet from this book (in line 15). Finally, we obtain the first row in the sheet in line 16. Note that in POI rows and columns are numbered from zero. Thus, a cell A1 in Excel is obtained from row 0, and cell 0 (in that row).

We now have a workbook object, with a single sheet in it (called Sheet1). In turn the sheet contains a single row.

Lines 18 to 25 now create a set of cells to form that row in order to hold headings for each column in the sheet. Lines 27 to 35 provide data for our very simple spreadsheet. In all cases we obtain the cells in the sheet by accessing the appropriate row element and retrieving a cell form within that row (note that the createCell(short) method takes a short value rather than an int - we thus need to cast to a short when calling this method using an integer literal).

To set the value within the cell we use the setCellValue method. This is an overloaded method, which can take a Boolean, a string, a double, Date or Calendar object. It can thus represent most types of data held in a spreadsheet. It also helps define the type of the data (e.g. cells set with a string will be textual, whereas cells set with a numeric are numerical).

One cell deviates from this; cell 3 in row 1. In this cell we use the method setCellFormula(String). In fact, we pass the string "B2*C2" to this method. This sets the cell to hold a formula where its value will be calculated by executing this formula.

The setCellFormula method takes a string and uses it as a formula for the cell. In our case, the formula is very simple - it multiplies the value held in cell B2 (the second cell in row 1) with the value held in cell C2 (the third cell in row 1). Notice the cells we obtain are from row 1, and cells 1 and 2, but the formula references them as cells B2 and C2. Also notice that the formula does not include the "=" at the start - this will be automatically added by POI.

Once the Spreadsheet has been defined it can be written out to file. This is done in lines 37-39. This creates a FileOutputStream to a file called "text1.xls" and uses the write method on the Workbook object to write its contents out to file. The end result is that the file saved to the file system is now an Excel file that is indistinguishable from any other Excel file. In Figure 2 I have opened this file using Excel:

What the Excel file generated in fig 1 looks like in Excel

Top three mobile application threats

More from The Register

next story
This time it's 'Personal': new Office 365 sub covers just two devices
Redmond also brings Office into Google's back yard
Batten down the hatches, Ubuntu 14.04 LTS due in TWO DAYS
Admins dab straining server brows in advance of Trusty Tahr's long-term support landing
Inside the Hekaton: SQL Server 2014's database engine deconstructed
Nadella's database sqares the circle of cheap memory vs speed
Microsoft lobs pre-release Windows Phone 8.1 at devs who dare
App makers can load it before anyone else, but if they do they're stuck with it
Oh no, Joe: WinPhone users already griping over 8.1 mega-update
Hang on. Which bit of Developer Preview don't you understand?
Half of Twitter's 'active users' are SILENT STALKERS
Nearly 50% have NEVER tweeted a word
Internet-of-stuff startup dumps NoSQL for ... SQL?
NoSQL taste great at first but lacks proper nutrients, says startup cloud whiz
Windows 8.1, which you probably haven't upgraded to yet, ALREADY OBSOLETE
Pre-Update versions of new Windows version will no longer support patches
IRS boss on XP migration: 'Classic fix the airplane while you're flying it attempt'
Plus: Condoleezza Rice at Dropbox 'maybe she can find ... weapons of mass destruction'
prev story


Top three mobile application threats
Learn about three of the top mobile application security threats facing businesses today and recommendations on how to mitigate the risk.
Combat fraud and increase customer satisfaction
Based on their experience using HP ArcSight Enterprise Security Manager for IT security operations, Finansbank moved to HP ArcSight ESM for fraud management.
The benefits of software based PBX
Why you should break free from your proprietary PBX and how to leverage your existing server hardware.
Five 3D headsets to be won!
We were so impressed by the Durovis Dive headset we’ve asked the company to give some away to Reg readers.
SANS - Survey on application security programs
In this whitepaper learn about the state of application security programs and practices of 488 surveyed respondents, and discover how mature and effective these programs are.