OET News
Tutorial: Generate Excel Reports with Cocoon and POI
Posted on: 04/21/2003
by Daniel McQuillen and Steven Punte
Overview
This article demonstrates how to use Apache's powerful and open source Cocoon framework to generate Microsoft Excel files. Cocoon is based on Java servlets and can be used with any compliant servlet container, like that found in Tomcat or JBoss. Furthermore, Cocoon's well-planned architecture helps you structure your application code for efficient cross-team development.
So why should you care? If you're an open source advocate, you'll probably be interested in the fact that you can generate Excel spreadsheets with freely available tools. On top of that, you'll increase your Excel spreadsheet's power tremendously by populating it on-the-fly with real-time data.
Your users get an interface they know and like, fresh data at every request, and you get a reliable open source application on a Java-based platform. Now that's a nice combination.
Background : Spreadsheets for Corporate Reports
Generating and distributing data-intensive reports is a standard exercise for healthy corporate communications. EIS managers are often required to build a channel for this kind of information (e.g. inventory stats, productivity numbers, or financial data) and make sure that end-users have an easy and useful way of interacting with this data.
Often these kinds of reports are distributed as Excel spreadsheets. Excel is found on most corporate desktops and offers a responsive, intuitive interface for viewing and manipulating complex data sets. Although open source alternatives like OpenOffice (www.openoffice.org) are gaining ground, it's a fair bet that your end-user uses Excel and knows immediately how to make use of a data in an .xls file.
Creating useful reports, however, involves more than simply distributing a static spreadsheet. Often information must be pulled from various data sources like relational databases, text files and even web services and then integrated into a useful form
Although this implies an n-Tier solution, it doesn't have to be a proprietary one. Thanks to Apache's open source Cocoon and POI projects, Excel spreadsheets can be built dynamically within your J2EE container while taking advantage of your EIS resources.
POI, according to its authors, "consists of APIs for manipulating various file formats based upon Microsoft's OLE 2 Compound Document format using pure Java." POI stands for Poor Obfuscation Implementation, a not-so-subtle jibe at Redmond's approach to archive formats for OLE documents.
Furthemore, by leveraging Cocoon's architecture and emphasis on "separation of concerns," a cross-discplinary team can build and maintain the application effectively.
A Hypothetical Business Problem
To illustrate the use of Excel and Cocoon, we'll develop a reporting solution for an imaginary company. ABC Computer Parts wants to allow our Sales department to view past orders. We'd like a simple web interface for selecting the orders, but we want to display the actual invoices in Excel format.
Here's a list of requirements:
- Access report over the Web
- Configure report by selecting the relevant customer
- View spreadsheet in the browser
- Spreadsheet should have up-to-the-minute prices from database
Also, as many parts are coming from overseas, we'd like to pull current exchanges rates from our database and integrate them into the spreadsheet for a more accurate report.
So let's build this report with Cocoon and XML.
Cocoon, XML and Excel
Our report solution comes down to really three core documents used by Cocoon:
- Report Parameters: an XML document to generate an HTML page for report selection
- Report Template: an Excel spreadsheet with the basic look-and-feel, column headers, cell formatting, etc.
- Report Content : an XML document that describes how the report will be built.
Here's a basic idea of how these documents are used by our report generator:
We'll build our report generator with Cocoon. Cocoon's XML pipeline will first display selection page in HTML after receiving an HTTP request for that HTML page. This step will include a database call for all current customer invoices. Once the user has selected a customer invoice, Cocoon will accept HTTP requests for a report, process the report template and scheme, accept optional parameters, access the database and then serialize the report into a binary stream compatible with Excel.
First, let's look at the component parts. Then we'll look at how to use Cocoon to put them together.
The Data
Our example assumes a relational database with some tables for customer information and for invoice data. We also have a simple table for exchange rates. This could be a web service but for now we'll use a static table (see resources below for more on Cocoon and Web Services).
| PART |
| ID |
PARTNUMBER |
DESCRIPTION |
SOURCEID |
NOTES |
PRICE |
| CUSTOMER |
| ID |
IDENTIFIER |
NAME |
COMPANY |
PHONOE |
| INVOICE |
| ID |
CUSTOMERID |
TRANSACTION |
STATUS |
| LINEITEM |
| ID |
INVOICEID |
PARTID |
REFNUMBER |
QUANTITY |
| SOURCE ( For exchange rates) |
| ID |
NAME |
CURRENCY |
CONVERSION |
The Report Selection Page
The web designers have built a simple HTML page to select the appropriate report. This could be a fancy rich-client in Flash, but in the interest of simplicity we're using plain-old HTML -- but styled with the imaginary company's look-and-feel.
 |
| The user's HTML report selection page. |
The Report Template
To keep the marketers happy, the template reflects the look-and-feel as well. Note that we've also used Excel's built-in reporting tools to create a pie-chart that will reflect the item information once populated.
 |
| The Excel report template |
Generating the Selection Page
In order to generate the selection buttons for each model, we'll have to perform a quick query to the database. Ideally query would be in a separate file to keep the HTML free of logic, but for the purposes of this example we'll put the SQL tages directly in the HTML within the ReportParameters.xml file.
(fragment from ReportParameters.xml)
<sql:execute-query xmlns:sql="http://apache.org/cocoon/SQL/2.0">
<sql:query>
SELECT *,
DATE_FORMAT( transaction, '%m/%d/%y') as date,
DATE_FORMAT( transaction, '%r') as time,
invoice.id as invoiceid
FROM invoice, customer
WHERE invoice.customerid = customer.id
</sql:query>
</sql:execute-query>
The Report Content XML Document
ReportContent.xml contains the core of the business logic that constructs our dynamic report. This XML document has a few sections of logic in it:
- Accept Input Parameters
- Primary Data Query
- Data Placement
The fact that these are all located in one file means our business logic is easily managed in central place.
Primary Query
ReportContent.xml will be processed a couple times in the XML pipeline, first to receive the input parameters, then to access the database, then finally to copy the result data to our spreadsheet.
The fact that the ReportContent.xml file includes both the SQL query and the spreadsheet inclusion code is provides centralization of business logic. This is convenient because our developers can access the spreadsheet inclusion code and the related SQL code in the same XML file. And remember that because we're building the spreadsheet on the server we have access to any data source exposed to the servlet container and Cocoon, be it flat files, XML Files, relational databases or web services. This makes for timely reports.
Below is a fragment showing the both the tag that will access the database during one tranformation in our pipline and then the tags used to populate a row of the spreadsheet during the next.
(fragment from ReportContent.xml)
<!-- List all line items associated with the specified invoice -->
<rpt:data-block>
<sql:execute-query>
<sql:query>
SELECT *,
( part.price / source.conversion) AS domesticprice,
( lineitem.quantity * part.price / source.conversion) AS lineprice
FROM lineitem, part, source
WHERE lineitem.partid = part.id
AND part.sourceid = source.id
AND lineitem.invoiceid = <rpt:get-select>invoice-id</rpt:get-select>
ORDER BY lineitem.refnumber
</sql:query>
</sql:execute-query>
<rpt:row-copy-shift>
<rpt:adrs-row>20</rpt:adrs-row>
<rpt:adrs-column>A</rpt:adrs-column>
</rpt:row-copy-shift>
<rpt:column>
<rpt:qry-name>refnumber</rpt:qry-name>
<rpt:adrs-column>B</rpt:adrs-column>
<rpt:adrs-row>20</rpt:adrs-row>
<rpt:type>java.lang.Double</rpt:type>
</rpt:column>
<rpt:column>
<rpt:qry-name>partnumber</rpt:qry-name>
<rpt:adrs-column>C</rpt:adrs-column>
<rpt:adrs-row>20</rpt:adrs-row>
<rpt:type>java.lang.String</rpt:type>
</rpt:column>
<rpt:column>
<rpt:qry-name>description</rpt:qry-name>
<rpt:adrs-column>D</rpt:adrs-column>
<rpt:adrs-row>20</rpt:adrs-row>
<rpt:type>java.lang.String</rpt:type>
</rpt:column>
<rpt:column>
<rpt:qry-name>quantity</rpt:qry-name>
<rpt:adrs-column>E</rpt:adrs-column>
<rpt:adrs-row>20</rpt:adrs-row>
<rpt:type>java.lang.Double</rpt:type>
</rpt:column>
<rpt:column>
<rpt:qry-name>name</rpt:qry-name>
<rpt:adrs-column>F</rpt:adrs-column>
<rpt:adrs-row>20</rpt:adrs-row>
<rpt:type>java.lang.String</rpt:type>
</rpt:column>
<rpt:column>
<rpt:qry-name>price</rpt:qry-name>
<rpt:adrs-column>G</rpt:adrs-column>
<rpt:adrs-row>20</rpt:adrs-row>
<rpt:type>java.lang.Double</rpt:type>
</rpt:column>
<rpt:column>
<rpt:qry-name>domesticprice</rpt:qry-name>
<rpt:adrs-column>H</rpt:adrs-column>
<rpt:adrs-row>20</rpt:adrs-row>
<rpt:type>java.lang.Double</rpt:type>
</rpt:column>
<rpt:column>
<rpt:qry-name>lineprice</rpt:qry-name>
<rpt:adrs-column>I</rpt:adrs-column>
<rpt:adrs-row>20</rpt:adrs-row>
<rpt:type>java.lang.Double</rpt:type>
</rpt:column>
</rpt:data-block>
Controlling the Pipeline
Once our separate XML documents are set-up, we'll need a "master" XML document to control both of our pipelines.
We'll do this by using the sitemap.xmap XML file. For each web application it runs, Cocoon provides a default file called sitemap.xmap. This document contains the instructions for each pipeline, including the initial source of XML, which tranformers to use, and how to serialize the end result.
Here's the XML code in sitemap.xmap for our second pipline.
(fragment from sitemap.xmap)
<map:match pattern="report/*.xls">
<!-- Obtain HTTP Request Parameters -->
<!-- These are parameters needed for the report -->
<map:generate type="request"/>
<!-- use XInclude to obtain parameters and logic from ReportContent.xml -->
<map:transform type="xslt" src="report.xsl">
<map:parameter name="label" value="{1}"/>
</map:transform>
<!-- Pass through most stuff, but execute XInclude statements inserted above -->
<map:transform type="xinclude"/>
<!-- Combine HTTP request parameters into Query statements as specified by ReportContent.xml file -->
<map:transform type="xslt2" src="report.xsl"/>
<!-- Execute Primary SQL Query(s) -->
<map:transform type="sql">
<map:parameter name="use-connection" value="jxrdb"/>
</map:transform>
<!-- Create Report Serializer Primitives base on data -->
<map:transform type="xslt3" src="report2.xsl">
<map:parameter name="label" value="{1}"/>
<map:parameter name="absolute-path" value="/usr/local/customers/jxr/webapps/examples"/>
</map:transform>
<!-- Seriallize into format suitable for Excel -->
<map:serialize type="excel-serializer"/>
</map:match>
The Final Result
After inserting the information into the template, Cocoon serializes the populated spreadsheet into the MIME type that represents an Excel spreadsheet: application/vnd.exe. It then sends this output stream to the browser. Internet Explorer automatically displays the spreadsheet within the browser, which gives our users a nice seamless experience.
Thanks to Cocoon, we managed the interaction of our components (ReportParameters.xml, ReportContent.xml, and the Excel template) with our pipeline document, the sitemap.xmap administration file.
This "XML directed" software provides a couple of benefits for web application developers, which we discuss below.
Solution Implementation Details
Cocoon facilitates XML-directed software. The entire solution is
achieved by directing generic components with the contents of XML files.
The following two figures below demonstrate how the main pipelines are composed of
reusable, generic XML processing components, which are in turn directed by
XML documents.
Apache Cocoon Notation
This article uses the following Apache Cocoon Graphical Notation:
|
The Report Parameter Pipeline
The report generator parameter pipeline is responsible for rendering
the report input parameter page. Such parameters are typically date, staff
member name, company name, and so on -- whatever makes sense as a generic
feature of the report technology. The parameter definitions are obtained
from the report scheme document.
 |
| Input Parameter Cocoon Pipeline |
The Report Rendering Pipeline
This pipeline is the real heart of the solution. It accepts the configuration parameters from a user as an HTTP POST and after a series of transformations generates the spreadsheet.
 |
| Report Rendering Cocoon Pipeline |
This pipeline's processing steps are as follows:
- Incoming HTTP request parameters are converted into an XML
document by the request generator so as to be compatible with later
components.
- An XSL transformer, directed by render.xsl,
generates XInclude
statements so that key elements of the Report Content document can be
incorporated.
- The XInclude statements are executed retrieving the
appropriate ReportContent.xml query and formatting constructs.
- The original HTTP request parameters are properly merged
into the database query statements using a second XSL transformer.
- The database query statements are executed using the standard
Cocoon SQL transformer, retrieving the desired report data.
- The retrieved data is arranged according to the ReportContent.xml
data-block formatting statements for appropriate insertion into the Excel
spreadsheet. This is accomplished by the use of a third XSL transformer.
- The retrieved data is merged with the Excel template and the final
binary Excel report is generated and sent back to the user using the report serializer.
Separation of Concerns
A big motivation for the team that built Cocoon was "separation of concerns," or SoC. Basically, SoC is the concept of separating workers into areas of specialization and then managing the working relationships between these groups with minimum overlap. As the Cocoon team explains on their website,
"It can be observed that separating people with common skills in different working groups increases productivity and reduces management costs, but only if the groups do not overlap and have clear "contracts" that define their operability and their concerns." (source)
Although others have attempted to enforce this separation by using technologies like JSP and ASP, Cocoon goes the extra length to make SoC more natural. It does this by encouraging the separation of content, style and logic into separate XML files and then allowing a producer to "seal" working groups into separate virtual rooms connected only by "contract" pipes. These pipes are created and controlled by management with tools like the sitemap.xmap file.
So how does this approach help our imaginary ABC Computer Parts team? Taking a look at the application we've created, it's quickly apparent that the work of each group has been nicely constrained to separate files with a minimum number of contracts between them.
For example, the designers of the Excel template need only to know the names of the data fields (a contract with the content team) and nothing about insertion code. Likewise, the team controlling the population of the spreadsheet need not touch the template, only know what columns the designers have chosen for the data.
Designers, coders, content managers can each work on their section of the application without stepping on each others toes. Meanwhile, the manager has a nice view of the pipeline in the sitemap.xmap file.
Using Cocoon and Excel in Business Solutions
The authors of this article were so convinced of the benefits of producing Excel spreadsheets with the open source Cocoon platform that we've built a custom report-building services based on that approach.
The service, called JX Reports, is a way for us to provide clients with a quick, cheap and intuitive report generation solution with really nice results. And with Cocoon's built-in extensibility and separation of concerns, maintenance and adaption to new business problems are not a problem.
If you'd like to learn more about JX Reports, please email Steven Punte.
About the Authors
 |
|
Daniel McQuillen is an information designer and programmer who works in a number of media with the intention of "making complex things clear." |
 |
|
Steven Punte, an accomplished architect and system specialist hailing from Hewlett Packard, excels at leading edge technologies and agile software development tactics. |
Resources
For this article:
Other Reading:
Like to learn more about Cocoon? Here's some more resources to help your efforts along: