Methods in
Molecular Biology 1986
Verónica Bolón-Canedo
Amparo Alonso-Betanzos
Editors
Microarray
Bioinformatics
METHODS IN MOLECULAR BIOLOGY
Series Editor
John M. Walker
School of Life and Medical Sciences,
University of Hertfordshire, Hatfield,
Hertfordshire, AL10 9AB, UK
For further volumes:
http://www.springer.com/series/7651
Microarray Bioinformatics
Edited by
Verónica Bolón-Canedo and Amparo Alonso-Betanzos
CITIC, Universidade da Coruña, A Coruña, Spain
Editors
Verónica Bolón-Canedo, CITIC, Universidade da Coruña, A Coruña, Spain
Amparo Alonso-Betanzos, CITIC, Universidade da Coruña, A Coruña, Spain
ISSN 1064-3745    ISSN 1940-6029 (electronic)
Methods in Molecular Biology
ISBN 978-1-4939-9441-0    ISBN 978-1-4939-9442-7 (eBook)
https://doi.org/10.1007/978-1-4939-9442-7
© Springer Science+Business Media, LLC, part of Springer Nature 2019
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Humana Press imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature.
The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A.
Preface
Over the last few decades, advances in molecular genetics technologies, such as DNA
microarrays, have stimulated a new line of research in bioinformatics. DNA microarrays
allow us to obtain a global view of the cell, where it is possible to measure the simultaneous expression of tens of thousands of genes. In particular, this technology works by collecting information from tissue and cell samples regarding gene expression differences that could be useful for diagnosing disease or for distinguishing a specific tumor type.
Microarray data quickly became very popular among bioinformatics researchers. In a microarray experiment, there are usually very few samples (often fewer than 100), but the number of features in the raw data ranges up to 60,000. This high dimensionality, together with the almost inherent imbalance of such data, makes the analysis of microarray data very appealing for machine learning and statistical researchers too.
This book provides a comprehensive review of the main, up-to-date methods, tools, and techniques for microarray data analysis. Internationally recognized experts address specific research topics and challenges in their areas of expertise, some of them being from the field of biology, others from the field of computer science, and others from the field of statistics. This interdisciplinarity provides valuable knowledge about the state-of-the-art methods for microarray analysis, covering the necessary steps for the acquisition of the data, its preprocessing, and its subsequent analysis. From the field of biology, this book covers an introduction to bioinformatics, as well as the protocol for DNA microarrays on glass slides and data warehousing. Once the microarray data is ready to be dealt with, machine learning methods for microarray data analysis cover main aspects such as clustering, feature selection, classification, data normalization, and missing value imputation. We have also covered the statistical analysis of the data and presented the most popular computer tools to analyze microarray data. Since the use of high-performance computing (HPC) has become very popular in the field, there is a chapter devoted to HPC tools to deal with microarray data. Finally, a chapter discussing the challenges and future trends for microarray analysis closes this book. The book also contains examples and code of research work using microarray data from published articles that are referred to in the references at the end of each chapter. In this way, interested readers can easily find those proposals and results more directly related to each of the subjects addressed in each chapter.
The book is intended for researchers and graduate students in bioinformatics, with basic knowledge in biology and computer science and with a view to work with microarray datasets. The most used tools in this book are R and Weka, both of which can be downloaded for free (footnotes 1 and 2). Basic understanding of both is needed to fully take advantage of the presented examples. However, the ideas presented do not assume more than basic knowledge of computer science. We hope our readers enjoy reading this book as much as we enjoyed editing it.
1 https://www.cs.waikato.ac.nz/~ml/weka/downloading.html
2 https://www.r-project.org
This book has been made possible thanks to the expert contributors who have carefully put their efforts into writing high-quality chapters about their specific topics. We are grateful to them for making this happen. We are also indebted to the book series editor, John Walker, who specially invited us to edit this book and guided us through the main steps.
A Coruña, Spain    Verónica Bolón-Canedo
Amparo Alonso-Betanzos
Contents
Preface
Contributors
1 Introduction to Bioinformatics
Dilara Ayyildiz and Silvano Piazza
2 Protocol for DNA Microarrays on Glass Slides
Kathleen M. Eyster
3 Data Warehousing with TargetMine for Omics Data Analysis
Yi-An Chen, Lokesh P. Tripathi, and Kenji Mizuguchi
4 A Review of Microarray Datasets: Where to Find Them and Specific Characteristics
Amparo Alonso-Betanzos, Verónica Bolón-Canedo, Laura Morán-Fernández, and Noelia Sánchez-Maroño
5 Statistical Analysis of Microarray Data
Ricardo Gonzalo Sanz and Alex Sánchez-Pla
6 Feature Selection Applied to Microarray Data
Amparo Alonso-Betanzos, Verónica Bolón-Canedo, Laura Morán-Fernández, and Borja Seijo-Pardo
7 Cluster Analysis of Microarray Data
Manuel Franco and Juana-María Vivo
8 Classification of Microarray Data
Noelia Sánchez-Maroño, Oscar Fontenla-Romero, and Beatriz Pérez-Sánchez
9 Microarray Data Normalization and Robust Detection of Rhythmic Features
Yolanda Larriba, Cristina Rueda, Miguel A. Fernández, and Shyamal D. Peddada
10 HPC Tools to Deal with Microarray Data
Jorge González-Domínguez and Roberto R. Expósito
11 ROC Curves for the Statistical Analysis of Microarray Data
Ricardo Cao and Ignacio López-de-Ullibarri
12 Missing-Values Imputation Algorithms for Microarray Gene Expression Data
Kohbalan Moorthy, Aws Naser Jaber, Mohd Arfian Ismail, Ferda Ernawan, Mohd Saberi Mohamad, and Safaai Deris
13 Computer Tools to Analyze Microarray Data
Giuseppe Agapito
14 Challenges and Future Trends for Microarray Analysis
Verónica Bolón-Canedo, Amparo Alonso-Betanzos, Ignacio López-de-Ullibarri, and Ricardo Cao
Index
Contributors
GIUSEPPE AGAPITO • Department of Medical and Surgical Science, University Magna Graecia, Catanzaro, Italy
AMPARO ALONSO-BETANZOS • Research Group LIDIA, Departamento de Computación, CITIC, Universidade da Coruña, A Coruña, Spain
DILARA AYYILDIZ • Department of Medicine, University of Udine, Udine, Italy
VERÓNICA BOLÓN-CANEDO • Research Group LIDIA, Departamento de Computación, CITIC, Universidade da Coruña, A Coruña, Spain
RICARDO CAO • Research Group MODES, Department of Mathematics, CITIC and ITMATI, Universidade da Coruña, A Coruña, Spain
YI-AN CHEN • Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Ibaraki, Osaka, Japan
SAFAAI DERIS • Institute for Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, Kota Bharu, Kelantan, Malaysia
FERDA ERNAWAN • Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang, Kuantan, Pahang, Malaysia
ROBERTO R. EXPÓSITO • Grupo de Arquitectura de Computadores, CITIC, Universidade da Coruña, A Coruña, Spain
KATHLEEN M. EYSTER • Division of Basic Biomedical Sciences, Sanford School of Medicine, University of South Dakota, Vermillion, SD, USA
MIGUEL A. FERNÁNDEZ • Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Valladolid, Spain
OSCAR FONTENLA-ROMERO • Computer Science Department, Universidade da Coruña, A Coruña, Spain
MANUEL FRANCO • CMN, University of Murcia, Murcia, Spain
JORGE GONZÁLEZ-DOMÍNGUEZ • Grupo de Arquitectura de Computadores, CITIC, Universidade da Coruña, A Coruña, Spain
RICARDO GONZALO SANZ • Statistics and Bioinformatics Unit (UEB), Vall d'Hebron Research Institute (VHIR), Barcelona, Spain
MOHD ARFIAN ISMAIL • Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang, Kuantan, Pahang, Malaysia
AWS NASER JABER • Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang, Kuantan, Pahang, Malaysia
YOLANDA LARRIBA • Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Valladolid, Spain
IGNACIO LÓPEZ-DE-ULLIBARRI • Research Group MODES, Department of Mathematics, CITIC, Universidade da Coruña, A Coruña, Spain
KENJI MIZUGUCHI • Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Ibaraki, Osaka, Japan
MOHD SABERI MOHAMAD • Institute for Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, Kota Bharu, Kelantan, Malaysia
KOHBALAN MOORTHY • Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang, Kuantan, Pahang, Malaysia
LAURA MORÁN-FERNÁNDEZ • CITIC, Universidade da Coruña, A Coruña, Spain
SHYAMAL D. PEDDADA • Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
BEATRIZ PÉREZ-SÁNCHEZ • Computer Science Department, Universidade da Coruña, A Coruña, Spain
SILVANO PIAZZA • Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento, Italy
CRISTINA RUEDA • Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Valladolid, Spain
NOELIA SÁNCHEZ-MAROÑO • Computer Science Department, CITIC, Universidade da Coruña, A Coruña, Spain
ALEX SÁNCHEZ-PLA • Statistics and Bioinformatics Unit (UEB), Vall d'Hebron Research Institute (VHIR), Barcelona, Spain; Genetics, Microbiology and Statistics Department, University of Barcelona, Barcelona, Spain
BORJA SEIJO-PARDO • CITIC, Universidade da Coruña, A Coruña, Spain
LOKESH P. TRIPATHI • Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Ibaraki, Osaka, Japan
JUANA-MARÍA VIVO • Department of Statistics and Operations Research, University of Murcia, Murcia, Spain