Scientists from the National Hellenic Research Foundation, the University of the Aegean, the National Technical University of Athens and Athens Information Technology have used the HellasGrid and EGI grid infrastructures to solve problems in computational biology, medical imaging and distributed systems. The group, which developed the HECTOR web application, aims to implement tools for biological data analysis over parallel and distributed systems in order to exploit the vast processing and storage resources of the HellasGrid infrastructure.
In bioinformatics, research efforts focus on whole-genome functional genomics studies using high-throughput DNA microarray technology, which has been the standard experimental technique over the last six years for studying the whole genome of an organism. Because experiments of this type have a very high potential impact, they are being adopted by a large and ever-increasing number of laboratories. Laboratories and research groups now need powerful computational methodologies for processing microarray datasets, with user-friendly interfaces that support functions related to array fabrication, labeling, hybridization and data analysis. Such tools are a key prerequisite for effective data analysis: they can automate it and ultimately provide new insights into the transcriptomic component of the biological systems under investigation.
The problem that arises with the ever-growing use of microarray technology is the large amount of data produced every day, combined with data preprocessing and statistical selection algorithms that demand substantial computational resources, which makes single-threaded applications time-consuming.
By using the grid infrastructure, the HECTOR team managed to overcome the computational burden of such single-node applications. Starting from ANDROMEDA (Automated aND Robust Microarray Experiments Data Analysis), a set of legacy MATLAB applications originally developed by researchers of the National Hellenic Research Foundation (NHRF), the team created a new parallel statistical analysis pipeline.
The initial step was to port all the required MATLAB functions to an open source alternative, OctaveForge, which is installed on the HellasGrid nodes to perform the data analysis. In addition, all the initial data parsing functions were reimplemented from scratch as Python scripts to further speed up processing. The pipeline was divided into four phases (parsing-preprocessing, normalization, statistical analysis and post-processing) and each was studied for potential parallelization. Because the first two phases are computationally demanding and can run independently for each dataset, they were designed to run on multiple nodes, as seen in Figure 1. MPI made it possible to implement the whole application workflow, orchestrate its execution on multiple nodes of a site, and monitor execution times and any errors. Once the parallelized part completes, the data are sent to the head MPI node for the further statistical processing that requires all the experiment-related data to be present.
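The split between the independent per-array phases and the final combined statistics can be sketched as follows. This is a minimal illustration and not the HECTOR code: a standard-library thread pool stands in for the MPI worker nodes, the tab-separated `gene<TAB>intensity` input format is assumed, and the function names, the log2 transform, the median centring and the per-gene mean are all hypothetical placeholders for the real preprocessing, normalization and statistical steps.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, median
import math


def parse_and_preprocess(raw_text):
    """Phase 1, per array: parse 'gene<TAB>intensity' lines (assumed format),
    drop non-positive spots and log2-transform the rest."""
    values = {}
    for line in raw_text.strip().splitlines():
        gene, intensity = line.split("\t")
        intensity = float(intensity)
        if intensity > 0:
            values[gene] = math.log2(intensity)
    return values


def normalize(values):
    """Phase 2, per array: centre the log intensities on the array median."""
    centre = median(values.values())
    return {gene: value - centre for gene, value in values.items()}


def process_array(raw_text):
    """Work done independently for each array (one 'worker node' each)."""
    return normalize(parse_and_preprocess(raw_text))


def combine_statistics(arrays):
    """Phase 3, 'head node': needs every array at once.
    Here it is just the mean normalized value per gene across arrays."""
    common = set.intersection(*(set(a) for a in arrays))
    return {gene: mean(a[gene] for a in arrays) for gene in sorted(common)}


def run_pipeline(raw_arrays, workers=4):
    # The thread pool is a stand-in for scattering arrays to MPI ranks;
    # collecting the mapped results plays the role of the gather step.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        processed = list(pool.map(process_array, raw_arrays))
    return combine_statistics(processed)


if __name__ == "__main__":
    arrays = ["g1\t2\ng2\t4\ng3\t8", "g1\t4\ng2\t4\ng3\t4"]
    print(run_pipeline(arrays))
```

The point of the design is that `process_array` never looks at another array, so it can be distributed freely, while `combine_statistics` is the only step that has to wait for all results.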
On the HECTOR platform this entire processing workflow is fully automated. A user-friendly, JSP-based web portal enables scientists from the fields of biology and medicine to use the processing power of the grid without any expertise in computer science. When the statistical analysis completes, users are notified to retrieve their result lists. They can then further annotate their experiments using the platform's MIAME XML implementation, and decide whether to make the results public through the distributed database of the HECTOR platform, which supports the HellasGrid storage elements.
- I. Maglogiannis, University of the Aegean, Samos, Greece, imaglo (at) aegean.gr
- A. Chatziioannou, National Hellenic Research Foundation, Greece, achatzi (at) eie.gr
- I. Kanaris, University of the Aegean, Samos, Greece, kanaris.i (at) aegean.gr
- V. Mylonakis, National Technical University of Athens, Athens, Greece, vmil (at) netmode.ntua.gr
- J. Soldatos, Athens Information Technology, Athens, Greece, jsol (at) ait.edu.gr
- I. Maglogiannis, A. Chatziioannou, J. Soldatos, V. Mylonakis, J. Kanaris, “An Application Platform Enabling High Performance Grid Processing of Microarray Experiments”, in Proc. 20th IEEE Conf. on Computer Based Medical Systems (CBMS 2007), pp. 477-482, Maribor, Slovenia.
- J. Soldatos, I. Maglogiannis, A. Chatziioannou, V. Mylonakis, J. Kanaris, “Application Architecture for High Performance Microarray Experiments over the Hellas-Grid Infrastructure”, EGEE User Forum, Manchester, United Kingdom, May 9-11, 2007.
- Kanaris, V. Mylonakis, A. Chatziioannou, I. Maglogiannis, and J. Soldatos, “HECTOR: Enabling Microarray Experiments over the Hellenic Grid Infrastructure,” J. Grid Comput., vol. 7, no. 3, pp. 1–22, Aug. 20