Fishing for POI

Creating Excel or Word files from Java...

Have you ever needed to create a Microsoft Excel or Word file from Java? If you have, did you try to do it from scratch yourself? If you were working with Excel, did you end up creating comma-separated data in a file (CSV files)?

CSV files work very well as long as all you are interested in is the raw data. But what if you want to include formulas in your data, or need to format your spreadsheet appropriately (with centring, colours, bold, italics, etc.)? Did you give up in the end, or work with a compromise solution?

As it happens I was recently asked this very question by some Java developers. They were working with a web-based application and wanted to create both Excel and Word files for senior management to access. These files would hold dynamically generated data relevant to their organisation. They therefore needed to be generated programmatically as and when required.

The POI Project

Creating Excel and Word files is hard, not least because of the complex file formats that Microsoft uses for Excel and Word, which are based upon Microsoft's OLE 2 Compound Document format. However, one of the Apache projects does all the hard work for you and makes it very easy to create, read and update Excel and, soon, Word files. This project is called POI. It has been in development for several years, starting in April 2001, and is currently at version 2.5. It can be downloaded from the Apache website.

POI is actually more like a number of combined projects, which allow you to create both Word and Excel format files. It can be divided into several sub-projects, in particular:

* POIFS, the oldest and most stable part of the project, which provides facilities for reading and writing OLE 2 Compound Document files.

* HWPF, a port of the Microsoft Word 97 file format to pure Java.

* HSSF, a port of the Microsoft Excel 97(-2002) file format (BIFF8) to pure Java.

In this column we will focus on the use of POI to create Excel files using HSSF.

HSSF for Excel files

You may wonder what HSSF stands for. Rather provocatively it stands for Horrible SpreadSheet Format (indeed many of the elements of POI have quite provocative names, e.g. DDF - Dreadful Drawing Format, which is the Microsoft Office Drawing format, otherwise known as Escher format).

HSSF provides a way to create Excel spreadsheets as well as to read, modify and write existing spreadsheets. Altogether it provides:

* low level structures for those with special needs

* an event model API for efficient read-only access

* a full user model API for creating, reading and modifying XLS files

Creating an Excel file with POI

Let's look at the basics of what is needed to create an Excel file. First, we need to create a workbook and add a sheet to it. We will then need to add values for cells, formulas and the like. The program presented in figure 1 illustrates how we can do this using POI.

Figure 1: The code required to create an Excel file using POI
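The listing itself is not reproduced here, so what follows is only a minimal sketch of such a program and not the original figure. It assumes a recent POI release on the classpath, in which createCell accepts an int and HSSFWorkbook can be closed. The class name SimpleSheet, the buildWorkbook helper, the heading names and the sample values are all hypothetical. The line numbers cited in the walkthrough refer to the original listing and will not match this sketch.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

// Hypothetical example class; not the listing from the original figure.
public class SimpleSheet {

    // Builds a workbook with one sheet: a heading row and a single data row.
    static HSSFWorkbook buildWorkbook() {
        HSSFWorkbook workbook = new HSSFWorkbook();
        // Name the sheet explicitly rather than relying on a default name.
        HSSFSheet sheet = workbook.createSheet("Sheet1");

        // Row 0 (row 1 in Excel) holds the column headings.
        HSSFRow header = sheet.createRow(0);
        header.createCell(0).setCellValue("Item");
        header.createCell(1).setCellValue("Quantity");
        header.createCell(2).setCellValue("Unit price");
        header.createCell(3).setCellValue("Total");

        // Row 1 (row 2 in Excel) holds the data.
        HSSFRow data = sheet.createRow(1);
        data.createCell(0).setCellValue("Widget"); // text cell
        data.createCell(1).setCellValue(3.0);      // numeric cell
        data.createCell(2).setCellValue(2.5);      // numeric cell
        // Formula cell: no leading "=" is passed in.
        data.createCell(3).setCellFormula("B2*C2");

        return workbook;
    }

    public static void main(String[] args) throws IOException {
        // Write the workbook to disk; both resources are closed automatically.
        try (HSSFWorkbook workbook = buildWorkbook();
             OutputStream out = new FileOutputStream("text1.xls")) {
            workbook.write(out);
        }
    }
}
```

With the older POI 2.5 API described in this column, the createCell calls would need a short argument, for example createCell((short) 0).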

The first thing that this program does is create a new HSSFWorkbook (in line 14). We then create a sheet from this book (in line 15). Finally, we obtain the first row in the sheet in line 16. Note that in POI rows and columns are numbered from zero. Thus, cell A1 in Excel is obtained from row 0, and cell 0 (in that row).

We now have a workbook object, with a single sheet in it (called Sheet1). In turn the sheet contains a single row.

Lines 18 to 25 now create a set of cells to form that row in order to hold headings for each column in the sheet. Lines 27 to 35 provide data for our very simple spreadsheet. In all cases we obtain the cells in the sheet by accessing the appropriate row element and retrieving a cell from within that row (note that the createCell(short) method takes a short value rather than an int, so we need to cast to a short when calling this method using an integer literal).

To set the value within the cell we use the setCellValue method. This is an overloaded method, which can take a boolean, a String, a double, a Date or a Calendar object. It can thus represent most types of data held in a spreadsheet. It also helps define the type of the data (e.g. cells set with a string will be textual, whereas cells set with a number will be numeric).

One cell deviates from this: cell 3 in row 1. In this cell we use the method setCellFormula(String), passing it the string "B2*C2". This sets the cell to hold a formula, so that its value is calculated by evaluating that formula.

The setCellFormula method takes a string and uses it as a formula for the cell. In our case, the formula is very simple - it multiplies the value held in cell B2 (the second cell in row 1) by the value held in cell C2 (the third cell in row 1). Notice that the cells we obtain are from row 1, and are cells 1 and 2, but the formula references them as cells B2 and C2. Also notice that the formula does not include the "=" at the start - this will be automatically added by POI.
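The mapping between POI's zero-based indices and the Excel-style references used in formulas can be sketched in plain Java. This is only an illustration of the arithmetic; the class CellRef and the method toA1 are hypothetical names and not part of POI.

```java
// Hypothetical helper, not part of POI: converts zero-based indices
// (as used by createRow and createCell) to an A1-style reference.
public class CellRef {

    static String toA1(int row, int col) {
        if (row < 0 || col < 0) {
            throw new IllegalArgumentException("row and col must be >= 0");
        }
        // Column letters work like a base-26 number without a zero digit:
        // 0 -> A, 25 -> Z, 26 -> AA, 27 -> AB, ...
        StringBuilder letters = new StringBuilder();
        int c = col;
        while (c >= 0) {
            letters.insert(0, (char) ('A' + c % 26));
            c = c / 26 - 1;
        }
        // Excel rows are numbered from 1, POI rows from 0.
        return letters.toString() + (row + 1);
    }

    public static void main(String[] args) {
        System.out.println(toA1(0, 0)); // prints A1
        System.out.println(toA1(1, 1)); // prints B2
        System.out.println(toA1(1, 2)); // prints C2
        System.out.println(toA1(1, 3)); // prints D2
    }
}
```

Here row 1, cells 1 and 2 come out as B2 and C2, which is why the formula placed in row 1, cell 3 (D2 in Excel) refers to "B2*C2".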

Once the spreadsheet has been defined it can be written out to file. This is done in lines 37-39, which create a FileOutputStream to a file called "text1.xls" and use the write method on the workbook object to write its contents out to that file. The end result is that the file saved to the file system is an Excel file that is indistinguishable from any other Excel file. In Figure 2 I have opened this file using Excel:

Figure 2: What the Excel file generated in Figure 1 looks like in Excel
