
Fishing for POI

Creating Excel or Word files from Java...


Have you ever needed to create a Microsoft Excel or Word file from Java? If you have, did you try to do it from scratch yourself? If you were working with Excel, did you end up writing comma-separated values (CSV) files instead?

CSV files work very well as long as all you are interested in is the raw data. But what if you want to include formulas, or need to format your spreadsheet appropriately (with centring, colours, bold, italics etc.)? Did you give up in the end, or settle for a compromise?

As it happens, I was recently asked this very question by some Java developers. They were working on a web-based application and wanted to create both Excel and Word files for senior management to access. These files would hold dynamically generated data relevant to their organisation, so they needed to be generated programmatically as and when required.

The POI Project

Creating Excel and Word files is hard, not least because of the complexity of the file formats Microsoft uses for them, which are based on Microsoft's OLE 2 Compound Document format. However, one of the Apache projects, called POI, does all the hard work for you and makes it very easy to create, read and update Excel files, and soon Word files as well. POI has already been in development for several years, having started in April 2001, and is currently at version 2.5. You can download it from the Apache POI project website.

POI is really a collection of related projects which, together, allow you to create both Word and Excel format files. Its main sub-projects are:

* POIFS, the oldest and most stable part of the project, which provides facilities for reading and writing OLE 2 Compound Document files.

* HWPF, a port of the Microsoft Word 97 file format to pure Java.

* HSSF, a port of the Microsoft Excel 97(-2002) file format (BIFF8) to pure Java.

In this column we will focus on the use of POI to create Excel files using HSSF.

HSSF for Excel files

You may wonder what HSSF stands for. Rather provocatively, it stands for Horrible SpreadSheet Format. (Indeed, many parts of POI have similarly provocative names: DDF, for example, stands for Dreadful Drawing Format, POI's name for the Microsoft Office Drawing format, otherwise known as the Escher format.)

HSSF provides a way to create Excel spreadsheets as well as to read, modify and write existing ones. Altogether it provides:

* low-level structures for those with special needs

* an event model API for efficient read-only access

* a full user model API for creating, reading and modifying XLS files

Creating an Excel file with POI

Let's look at the basics of what is needed to create an Excel file. First, we need to create a workbook and add a sheet to it. We then need to fill in cells with values, formulas and the like. The program presented in Figure 1 illustrates how we can do this using POI.

Figure 1: The code required to create an Excel file using POI
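As a rough guide to what such a program looks like, here is a minimal sketch written against the HSSF user model API described in this column (HSSFWorkbook, HSSFSheet, HSSFRow, HSSFCell). It is not the Figure 1 listing itself, so its line numbers do not match those cited in the text; the sheet name, headings and data values are hypothetical. The `(short)` casts match the POI 2.5 `createCell(short)` signature and also compile against later POI releases, where `createCell` takes an `int`.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

public class CreateSpreadsheet {
    public static void main(String[] args) throws IOException {
        // A new, empty workbook with a single sheet in it
        HSSFWorkbook workbook = new HSSFWorkbook();
        HSSFSheet sheet = workbook.createSheet("Sheet1");

        // Row 0 (row 1 in Excel) holds the column headings (hypothetical values)
        HSSFRow headings = sheet.createRow(0);
        String[] titles = {"Item", "Price", "Quantity", "Total"};
        for (int i = 0; i < titles.length; i++) {
            // createCell takes a short in POI 2.5, so the int index is cast
            HSSFCell cell = headings.createCell((short) i);
            cell.setCellValue(titles[i]);
        }

        // Row 1 (row 2 in Excel) holds one line of data (hypothetical values)
        HSSFRow data = sheet.createRow(1);
        data.createCell((short) 0).setCellValue("Widget");
        data.createCell((short) 1).setCellValue(2.5);
        data.createCell((short) 2).setCellValue(4.0);
        // Cell 3 of row 1 is D2 in Excel; its formula multiplies B2 by C2
        data.createCell((short) 3).setCellFormula("B2*C2");

        // Write the workbook to disk in Excel's binary .xls format
        try (OutputStream out = new FileOutputStream("text1.xls")) {
            workbook.write(out);
        }
    }
}
```

Running it leaves `text1.xls` in the working directory; opening it in Excel should show the headings in row 1 and the calculated total in D2.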

The first thing this program does is create a new HSSFWorkbook (line 14). We then create a sheet in this workbook (line 15) and, in line 16, create the first row of that sheet. Note that in POI, rows and columns are numbered from zero. Thus cell A1 in Excel is cell 0 of row 0 in POI.
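Translating an Excel reference such as A1 or B2 into the zero-based row and cell numbers POI expects is mechanical, and a small helper makes the mapping explicit. This is a hypothetical sketch using only the Java standard library (the class `A1Reference` is not part of POI); it treats the column letters as a base-26 number in which A is 1 and Z is 26, then subtracts one from both coordinates.

```java
public class A1Reference {
    public final int row;    // zero-based, as used by POI
    public final int column; // zero-based, as used by POI

    private A1Reference(int row, int column) {
        this.row = row;
        this.column = column;
    }

    /** Parses an Excel-style reference such as "A1", "B2" or "AA10". */
    public static A1Reference parse(String ref) {
        int i = 0;
        int column = 0;
        // Leading letters form a base-26 number in which A = 1 ... Z = 26
        while (i < ref.length()) {
            char c = Character.toUpperCase(ref.charAt(i));
            if (c < 'A' || c > 'Z') {
                break;
            }
            column = column * 26 + (c - 'A' + 1);
            i++;
        }
        if (i == 0 || i == ref.length()) {
            throw new IllegalArgumentException("Not an A1-style reference: " + ref);
        }
        // The remaining digits are the 1-based Excel row number
        int row = Integer.parseInt(ref.substring(i));
        if (row < 1) {
            throw new IllegalArgumentException("Excel row numbers start at 1: " + ref);
        }
        // Excel counts from 1, POI counts from 0
        return new A1Reference(row - 1, column - 1);
    }

    public static void main(String[] args) {
        for (String ref : new String[] {"A1", "B2", "C2"}) {
            A1Reference r = parse(ref);
            System.out.println(ref + " -> row " + r.row + ", cell " + r.column);
        }
    }
}
```

For example, `parse("B2")` yields row 1, cell 1, which is exactly the cell the formula example in this column refers to.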

We now have a workbook object, with a single sheet in it (called Sheet1). In turn the sheet contains a single row.

Lines 18 to 25 then create a set of cells in that row to hold a heading for each column in the sheet. Lines 27 to 35 provide the data for our very simple spreadsheet. In each case we obtain a cell by going to the appropriate row and creating a cell within it. Note that the createCell(short) method takes a short rather than an int, so we need to cast to short when calling it with an integer literal.
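The cast is a Java language rule rather than anything peculiar to POI: an int constant is narrowed to short automatically in an assignment such as `short s = 0;`, but not when it is passed as a method argument, and int loop counters always need the cast. The sketch below uses a hypothetical stand-in method, `createCell(short)`, with the same parameter type as the POI 2.5 method, so it runs without POI on the classpath.

```java
public class ShortCastDemo {
    // Hypothetical stand-in with the same parameter type as POI 2.5's
    // HSSFRow.createCell(short); it simply describes the cell it would create.
    public static String createCell(short column) {
        return "cell " + column;
    }

    public static void main(String[] args) {
        // createCell(0) would not compile here: method arguments are never
        // narrowed from int to short implicitly, even for constants.
        System.out.println(createCell((short) 0));

        String[] headings = {"Item", "Price", "Quantity", "Total"};
        for (int i = 0; i < headings.length; i++) {
            // The loop counter is an int, so it needs the cast as well
            System.out.println(createCell((short) i) + ": " + headings[i]);
        }
    }
}
```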

To set the value of a cell we use the setCellValue method. This method is overloaded to take a boolean, a String, a double, a Date or a Calendar, so it can represent most types of data held in a spreadsheet. The overload used also determines the type of the cell: cells set with a string are textual, whereas cells set with a number are numeric.
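Because Java picks an overload at compile time from the static type of the argument, code that handles values of mixed types (rows fetched from a database as `Object`s, say) has to choose the right `setCellValue` call itself. The helper below is a hypothetical sketch of one way to do that with the HSSF user model; `CellValues.setValue` is not part of POI.

```java
import java.util.Calendar;
import java.util.Date;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

public class CellValues {
    /** Hypothetical helper: calls the setCellValue overload matching the value's runtime type. */
    public static void setValue(HSSFCell cell, Object value) {
        if (value instanceof Boolean) {
            cell.setCellValue(((Boolean) value).booleanValue());
        } else if (value instanceof Number) {
            // All numbers are stored by Excel as doubles
            cell.setCellValue(((Number) value).doubleValue());
        } else if (value instanceof Date) {
            cell.setCellValue((Date) value);
        } else if (value instanceof Calendar) {
            cell.setCellValue((Calendar) value);
        } else {
            // Anything else is stored as text
            cell.setCellValue(String.valueOf(value));
        }
    }

    public static void main(String[] args) {
        HSSFWorkbook workbook = new HSSFWorkbook();
        HSSFRow row = workbook.createSheet("Sheet1").createRow(0);
        Object[] values = {"Widget", 2.5, Boolean.TRUE, new Date()};
        for (int i = 0; i < values.length; i++) {
            setValue(row.createCell((short) i), values[i]);
        }
    }
}
```

Excel stores dates as numbers, so a cell set from a Date or Calendar usually also needs a date format applied through a cell style before Excel displays it as a date; that step is left out here.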

One cell deviates from this: cell 3 in row 1. For this cell we use the setCellFormula(String) method, passing it the string "B2*C2". This makes the cell hold a formula, and its value is calculated by evaluating that formula.

The setCellFormula method takes a string and uses it as the formula for the cell. In our case the formula is very simple: it multiplies the value held in cell B2 (the second cell in row 1) by the value held in cell C2 (the third cell in row 1). Notice that in the code the cells are cells 1 and 2 of row 1, but the formula refers to them as B2 and C2. Also notice that the formula string does not start with the "=" you would type in Excel; POI expects the formula without it, and Excel shows the "=" when the file is opened.
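When formulas are built programmatically, the zero-based indices used in the code have to be turned back into Excel's A1-style references. The following sketch does that conversion using only the standard library; the class and method names are hypothetical, not part of POI. Columns beyond Z continue as AA, AB and so on, which is why the loop works like a base-26 conversion without a zero digit.

```java
public class FormulaRefs {
    /** Converts a zero-based column index to Excel letters: 0 -> A, 25 -> Z, 26 -> AA. */
    public static String columnName(int column) {
        StringBuilder sb = new StringBuilder();
        int n = column + 1; // work in Excel's 1-based column numbering
        while (n > 0) {
            int rem = (n - 1) % 26;
            sb.insert(0, (char) ('A' + rem));
            n = (n - 1) / 26;
        }
        return sb.toString();
    }

    /** Converts a zero-based (row, column) pair to an A1-style reference. */
    public static String ref(int row, int column) {
        return columnName(column) + (row + 1);
    }

    public static void main(String[] args) {
        // Cells 1 and 2 of row 1 in the code are B2 and C2 in Excel
        String formula = ref(1, 1) + "*" + ref(1, 2);
        System.out.println(formula);
    }
}
```

The resulting string ("B2*C2" here) is what would be passed to setCellFormula.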

Once the spreadsheet has been defined, it can be written out to a file. This is done in lines 37 to 39, which create a FileOutputStream for a file called "text1.xls" and call the write method on the HSSFWorkbook object to write its contents to that stream. The result is an Excel file on the file system that is indistinguishable from any other Excel file. In Figure 2 I have opened this file in Excel:
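One quick way to confirm that the output really is in Excel's binary .xls format, rather than CSV text, is to check that it begins with the 8-byte signature that starts every OLE 2 Compound Document (hex D0 CF 11 E0 A1 B1 1A E1). The sketch below does just that with the standard library; the class name is hypothetical. The signature is shared by all OLE 2 files, Word .doc files included, so it confirms the container format only, not that the file is a valid workbook.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class Ole2Check {
    // The 8-byte header signature of an OLE 2 Compound Document
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns true if the stream starts with the OLE 2 signature. */
    public static boolean isOle2(InputStream in) throws IOException {
        byte[] header = new byte[OLE2_SIGNATURE.length];
        int read = 0;
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n < 0) {
                return false; // too short to be an OLE 2 file
            }
            read += n;
        }
        return Arrays.equals(header, OLE2_SIGNATURE);
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "text1.xls";
        try (InputStream in = new FileInputStream(path)) {
            System.out.println(path + (isOle2(in) ? " is" : " is not") + " an OLE 2 compound document");
        }
    }
}
```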

Figure 2: The Excel file generated by the program in Figure 1, as it appears in Excel
