Need to reorganize data files into spreadsheet output

I much prefer C++ to scripting languages like Perl/Python.

I have to do some bioinformatics work as part of my biomedical PhD training in graduate school (lab research). In order to deal with the data, I would like to be able to take several ASCII text files, parse their contents, and output a spreadsheet file organized as a multi-sheet workbook.

The idea here is to automate the process, and thus CSV as an intermediate is not a desirable option. I am trying to avoid needing to import anything from within the spreadsheet program, but to have my program produce the finished product ready for analysis. I have already implemented a working program that is currently utilizing the free version of libxl http://www.libxl.com/.

However, I am fairly poor (being a grad student), and I can't really afford to buy the library (not to mention all three, since I develop on Mac, Linux, and Windows interchangeably). The free version does some weird things to the output workbooks: it inserts a request to purchase the full version as a merged-cells top row, overwriting any cells present there, and randomly replaces cells with the text "buy me!". Does anybody know of a cheaper library, or even an open source library, that can do similar things in an object-oriented fashion? I don't even care if it uses the Excel format. I am open to producing ODS files (ODS: Calc spreadsheets from OpenOffice).

I realize that the ODS file format description is public domain and available, but I'm trying to write a fairly simple program, and I really don't want to have to implement all the backend file-type handling for something so trivial.

If such a library doesn't yet exist, I'm thinking of designing a sort of modified ASCII format to represent the spreadsheet, so I can combine the source data files and separate the sheets, and then writing a converter in Perl, using the Excel Perl modules, to convert it to the desired format. This introduces an extra step, though, and it would be best if I could avoid it.

For those with TLDR syndrome, or who lost track of the question:

The short and sweet: I'm looking for a library that enables the production of a spreadsheet format output file (either as source files with the implementation in them, or as headers that link to static or dynamic libraries; I have no preference). I have no specific preference for Excel and I am completely OK with using open document formats.

I have found ways of doing this in Java, and I am aware of at least one way to do this in Perl, but I have had no luck in finding a free/cheap library to do this in C++. It is entirely possible that I'm just stupid and don't know where to look.
I have to do some bioinformatics work as part of my biomedical PhD training in graduate school (lab research). In order to deal with the data, I would like to be able to take several ASCII text files, parse their contents, and output a spreadsheet file organized as a multi-sheet workbook.


When you need to deal with a lot of text files and parse their contents, I think Perl/Python really fit the job well in comparison to C++.

Have you heard of Perl bioinformatics? Do a Google search for "perl bioinformatics" and you will see what I mean.

Of course, it is purely my opinion, but if you insist on C++ I am not stopping you either.
As XLS is a proprietary file format, you are unlikely to find very much to directly handle it.

The options, AFAIK, are these:

1. buy libxl (alas)
2. find or write something simple like http://www.codeproject.com/KB/office/BasicExcel.aspx
3. use OLE automation to have the user's copy of Excel do all the dirty work for you
4. use multiple CSV files or one CSV file that is specially formatted, like this person did http://www.ozgrid.com/forum/showthread.php?t=62756&page=1

Good luck!
@Sohguanh
I am aware of BioPerl, and I actually have it, but my mind works so much better in object-oriented, strongly typed languages. I am considering learning Python, but that's on the shelf for now.

@Duoas
I figured as much for the Excel format, but I'm dumbfounded as to why there isn't anything like this for the open formats yet.

Are there any good guides for using OLE automation, and does it work for Mac Excel? (My university loves Macs, and there is a significant lack of Windows machines.)
If no such library exists, I'm thinking of writing to an intermediate ASCII file as follows, and then writing a quick Perl converter to read this and output the desired workbook files. I would probably use the Spreadsheet::WriteExcel module.

Here is the formatting idea. I just wanted to run it by someone to confirm that using these nonprintable ASCII chars will work, provided I read the file correctly.

File[(char)28]: Workbook
Group[(char)29]: Sheet
Record[(char)30]: Cell
Unit[(char)31]: Use to "end" items
End Transmission Block[(char)23]: Indicate New Row

File Definition:
(FileSep)Filename(UnitSep)

Sheet Definition:
(GroupSep)SheetName(UnitSep)

Cell Definition:
(RecordSep)Cell_Type.Contents(UnitSep)

New Row Indication:
(ETB)(UnitSep)

Cell Types:
num: number
str: text
frm: formula
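
For anyone who wants to see the layout above concretely, here is a minimal C++ sketch of a writer for it. Only the separator values and the Cell_Type.Contents layout come from the post; the class name WorkbookWriter, its method names, the sep namespace and the demoWorkbook helper are made up for illustration. It assumes cell contents never contain the unit separator themselves.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Separator bytes from the post (names are hypothetical).
namespace sep {
constexpr char file   = 28;  // FS:  starts a workbook
constexpr char group  = 29;  // GS:  starts a sheet
constexpr char record = 30;  // RS:  starts a cell
constexpr char unit   = 31;  // US:  ends any item
constexpr char etb    = 23;  // ETB: marks a new row
}

enum class CellType { num, str, frm };

inline const char* tag(CellType t) {
    switch (t) {
        case CellType::num: return "num";
        case CellType::str: return "str";
        case CellType::frm: return "frm";
    }
    return "str";  // not reached for valid enum values
}

// Hypothetical helper that emits the intermediate format to any stream.
class WorkbookWriter {
public:
    explicit WorkbookWriter(std::ostream& out) : out_(out) {}

    void beginWorkbook(const std::string& filename) {
        out_ << sep::file << filename << sep::unit;
    }
    void beginSheet(const std::string& name) {
        out_ << sep::group << name << sep::unit;
    }
    void cell(CellType type, const std::string& contents) {
        // Assumption: contents must not contain the unit separator.
        assert(contents.find(sep::unit) == std::string::npos);
        out_ << sep::record << tag(type) << '.' << contents << sep::unit;
    }
    void endRow() { out_ << sep::etb << sep::unit; }

private:
    std::ostream& out_;
};

// Usage example: one workbook, one sheet, one row with two cells.
std::string demoWorkbook() {
    std::ostringstream os;
    WorkbookWriter w(os);
    w.beginWorkbook("results.xls");
    w.beginSheet("Run1");
    w.cell(CellType::str, "gene");
    w.cell(CellType::num, "3.5");
    w.endRow();
    return os.str();
}
```

When writing to a real file instead of a string, opening the std::ofstream with std::ios::binary avoids newline translation on Windows.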
File format:
http://www.cplusplus.com/forum/general/32818/#msg176733

I hear that Perl has a good Excel extension.
While I appreciate the input, it seems you didn't read what I was going for in the first post. I am aware that the older Excel format descriptions have been released (I didn't mention that previously, but I did mention the open document formats, and the same thing applies here). Not to mention, even if I wanted to implement it, I'm not nearly experienced enough to handle the programming required to deal with Microsoft binary storage formats.

On another note, the strategy from my post above has been mostly implemented and seems to work. Polishing it up right now.
Solution worked like a charm, with the caveat that it has dependencies on Perl and Perl's Spreadsheet::WriteExcel module. (Since I'm only expecting to use this on Windows, Linux, or Mac, I was also able to make it smart enough to abort if it can't find its companion Perl script.)
It writes to a temporary file, then orders the Perl script to convert it into Excel format. If the Perl script returns 0, the temp file is purged. If an error value returns from the system call, the temp file remains. The number of Excel files produced is determined by the contents of the temporary binary file.
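
Roughly, the write/convert/clean-up flow described here could look like the C++ sketch below. This is not the actual program: the function name convertAndCleanUp and its parameters are hypothetical, and the converter command is passed in as a plain string so nothing about the real Perl script is assumed.

```cpp
#include <cassert>
#include <cstdio>    // std::remove
#include <cstdlib>   // std::system
#include <fstream>
#include <string>

// Writes payload to tempPath, runs the converter command, and deletes the
// temp file only if the command reports success (status 0).
// Returns true when the conversion succeeded and the temp file was removed.
bool convertAndCleanUp(const std::string& tempPath,
                       const std::string& payload,
                       const std::string& command) {
    assert(!tempPath.empty());

    // Binary mode so the separator bytes are written untouched.
    std::ofstream out(tempPath, std::ios::binary);
    if (!out) return false;
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    out.close();
    if (!out) return false;  // write or close failed

    // Hand the temp file over to the external converter.
    const int status = std::system(command.c_str());
    if (status != 0) return false;  // keep the temp file for inspection

    return std::remove(tempPath.c_str()) == 0;
}
```

A call might look like convertAndCleanUp("sheets.tmp", data, "perl convert.pl sheets.tmp"), where both the file name and the script name are placeholders.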

Useful little format I came up with, actually: sort of like a multi-workbook/multi-worksheet binary version of CSV. (Binary in the sense that it uses non-printable characters to delineate separations and structures within the spreadsheets.)
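
To illustrate the reading side of such a format, here is a small C++ tokenizer sketch. The real converter in this thread is a Perl script; the Token struct and the tokenize function are hypothetical names, and the code assumes the layout posted earlier: every item starts with one separator byte (28, 29, 30 or 23) and ends with the unit separator (31).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

constexpr char FS  = 28;  // workbook
constexpr char GS  = 29;  // sheet
constexpr char RS  = 30;  // cell
constexpr char US  = 31;  // end of item
constexpr char ETB = 23;  // new row

// One item from the stream: its leading separator and the text up to US.
struct Token {
    char kind;
    std::string text;
};

// Splits the raw bytes into items; throws on malformed input.
std::vector<Token> tokenize(const std::string& data) {
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < data.size()) {
        const char kind = data[pos];
        if (kind != FS && kind != GS && kind != RS && kind != ETB) {
            throw std::runtime_error("expected a separator byte");
        }
        const std::size_t end = data.find(US, pos + 1);
        if (end == std::string::npos) {
            throw std::runtime_error("item is missing its unit separator");
        }
        assert(end > pos);  // find() started after pos
        tokens.push_back(Token{kind, data.substr(pos + 1, end - pos - 1)});
        pos = end + 1;
    }
    return tokens;
}
```

A cell token's text still holds the Cell_Type.Contents pair, so the caller would split it at the first '.' to get the type tag. The input should come from a stream opened with std::ios::binary so no bytes are translated on the way in.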