Bio::Affymetrix Perl modules


Got an Affymetrix machine? Or do you know folks who do? Then chances are you will be familiar with the .CHP, .CEL and .CDF files associated with them. Unfortunately, some of these are binary files, so getting data out of them is not easy. Wouldn't it be great if you could use a handy set of Perl modules to parse the information out of them? Strangely enough ...

What does it do?

With these modules you can...

  • Parse CHP files from MAS 5 and GCOS 1.2 software, and obtain expression values and summary statistics. The modules handle the two file formats transparently, so you can write applications that parse either format without trouble
  • Parse CDF files from MAS 5 and obtain all the information about the design of Affymetrix chips
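Handling two file formats "transparently" usually means sniffing a signature at the start of the file and dispatching to the matching parser behind a single entry point. The sketch below is written in Python for brevity and does not use the Bio::Affymetrix API, and we do not claim it mirrors how the modules work internally. Everything format-specific here is hypothetical: the `TEXT_BANNER` and `BINARY_MAGIC` signatures and the record layouts are placeholders, not the real Affymetrix CHP layouts, which are defined in Affymetrix's own file-format documentation.

```python
"""Sketch: read two on-disk formats through one function.

Hypothetical formats (placeholders, NOT the real Affymetrix layouts):
  text:   TEXT_BANNER, then lines of "name<TAB>signal"
  binary: int32 magic, int32 count, then count records of
          (16-byte NUL-padded name, float64 signal), little-endian
"""
import io
import struct

TEXT_BANNER = b"HYPOTHETICAL TEXT HEADER"  # placeholder signature
BINARY_MAGIC = 12345                        # placeholder signature
RECORD = struct.Struct("<16sd")             # hypothetical binary record


def detect_format(stream):
    """Peek at the leading bytes and return "text" or "binary"."""
    head = stream.read(len(TEXT_BANNER))
    stream.seek(0)  # rewind so the chosen parser sees the whole file
    if head == TEXT_BANNER:
        return "text"
    if len(head) >= 4 and struct.unpack("<i", head[:4])[0] == BINARY_MAGIC:
        return "binary"
    raise ValueError("unrecognised file signature")


def _parse_text(stream):
    stream.read(len(TEXT_BANNER))  # skip the banner
    values = {}
    for line in stream.read().decode("ascii").splitlines():
        if line.strip():
            name, value = line.split("\t")
            values[name] = float(value)
    return values


def _parse_binary(stream):
    _magic, count = struct.unpack("<ii", stream.read(8))
    values = {}
    for _ in range(count):
        raw_name, value = RECORD.unpack(stream.read(RECORD.size))
        values[raw_name.rstrip(b"\0").decode("ascii")] = value
    return values


def parse_expression(stream):
    """Single entry point: callers never need to know which format they have."""
    parser = {"text": _parse_text, "binary": _parse_binary}[detect_format(stream)]
    return parser(stream)


if __name__ == "__main__":
    text_file = io.BytesIO(TEXT_BANNER + b"\nAFFX-1\t12.5\nAFFX-2\t3.0\n")
    binary_file = io.BytesIO(
        struct.pack("<ii", BINARY_MAGIC, 2)
        + RECORD.pack(b"AFFX-1", 12.5)
        + RECORD.pack(b"AFFX-2", 3.0)
    )
    # Both formats yield the same dictionary of probe set -> signal.
    print(parse_expression(text_file) == parse_expression(binary_file))  # True
```

Because the sniffing happens once, inside `parse_expression`, application code written against it works unchanged whichever format it is handed, which is the property the list item above describes.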

If you have a lot of CHP files lying around that you need to get data from, these are the modules for you.

Design philosophy

General usage is as follows: first you make an object, then you call one of the parse_ methods to fill it with data. Each file is parsed entirely into memory. This makes manipulating the data very easy, at the expense of using a lot of memory. It is also possible to write a module that reads through the data in a single pass without holding it all in memory. Hopefully these modules will give some clues if you want to write such a system.
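The trade-off in the paragraph above can be sketched in a few lines. This is a language-neutral illustration in Python, not the Bio::Affymetrix API: the `ExpressionData` class, its `parse_from_lines` method (named only to echo the parse_ convention) and the tab-separated input are all hypothetical. The first style builds an object and fills it with every record; the second is a single-pass generator that yields records one at a time and keeps none of them.

```python
"""Sketch: in-memory parsing versus a single-pass (streaming) parse.

Hypothetical input: one probe set per line, "name<TAB>signal".
"""
from typing import Dict, Iterable, Iterator, Tuple


class ExpressionData:
    """Make the object first, then call a parse_ method to fill it."""

    def __init__(self) -> None:
        self.signals: Dict[str, float] = {}

    def parse_from_lines(self, lines: Iterable[str]) -> None:
        # Every record is kept, so later lookups are cheap dict accesses,
        # but memory use grows with the size of the file.
        for line in lines:
            if line.strip():
                name, value = line.rstrip("\n").split("\t")
                self.signals[name] = float(value)


def iter_signals(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Single-pass alternative: yield each record and keep nothing."""
    for line in lines:
        if line.strip():
            name, value = line.rstrip("\n").split("\t")
            yield name, float(value)


if __name__ == "__main__":
    sample = ["AFFX-1\t12.5\n", "AFFX-2\t3.0\n", "AFFX-3\t7.25\n"]

    data = ExpressionData()
    data.parse_from_lines(sample)
    print(data.signals["AFFX-2"])  # random access after parsing: 3.0

    # Streaming: compute a summary without storing the records.
    print(sum(value for _, value in iter_signals(sample)))  # 22.75
```

The in-memory object makes random access and repeated queries easy; the generator suits jobs such as totals or filtering over many large files, where holding everything at once would be wasteful.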

Where can I get them?

These modules are available from CPAN. The distribution also includes perldoc documentation explaining how to use the modules, and some example programs.

Missing Features

Features that we want to include, but have not so far:

  • Parsing GCOS v1.2 CDF files
  • Adding the ability to write files as well as read them. We have a prototype CDF file writer available
  • Any handling of CEL files

Features that are arguably missing, but we do not plan to implement:

  • Non-expression arrays (SNP chips, etc.)

What other ways are there of doing the same thing?

There are various options available. You can pay for one of the Affymetrix developer kits, which provide Microsoft COM access to Affymetrix files and to the GCOS database. Affymetrix also provides a free (LGPL) C++ parser for some of its file formats. Bioconductor can read various Affymetrix file formats. The Bioperl Microarray modules can read some Affymetrix file formats, but they cannot read the latest ones.