Haixu Tang

Associate Professor of [|Informatics and Computing]

Adjunct Associate Professor in [|Biology]

Director of Bioinformatics at [|Center for Genomics and Bioinformatics]

**Office**: Lindley Hall 301D

**Phone**: (812)-856-1859

**Fax**: (812)-856-4764

**E-mail**: hatang (at) indiana.edu

Lab: [|Computational Omics Lab (COL)]

Go to [|Bioinformatics programs at IU Bloomington]

**Mailing address:**

School of Informatics and Computing

150 S. Woodlawn Avenue

Bloomington, IN 47405-7104

**Educational Background:**

 * Ph.D. in Biochemistry, Shanghai Institute of Biochemistry, Chinese Academy of Sciences.
 * B.S. in Physics, Department of Physics, Nanjing University, China.

**Research Interests:**

Algorithmic and statistical problems in Bioinformatics, particularly in
 * repeats and segmental duplication in eukaryotic genomes
 * fragment assembly in DNA sequencing
 * mass spectrometry data analysis for proteomics, glycomics and glycoproteomics
 * gene regulatory analysis

**Teaching:**

 * **Fall 2004.** I602: Capstone project for bioinformatics master's students
 * **Spring 2005.** I590: Topics in Informatics: Introduction to genomics (for non-biology students)
 * **Fall 2006.** I519: Introduction to Bioinformatics; I627: Seminar in Bioinformatics
 * **Spring 2006.** I690: Computational techniques in comparative genomics
 * **Fall 2006.** I617: Informatics in life sciences and chemistry
 * **Spring 2007.** I529: Biological sequence analysis
 * **Fall 2007.** I690: Advanced algorithms in bioinformatics
 * **Spring 2008.** I529: Biological sequence analysis
 * **Fall 2008.** I201: Mathematics foundations in Informatics

**Awards:**
 * 1) NSF Early Career Development (CAREER) Award, 2007.
 * 2) Indiana University Outstanding Junior Faculty Award, 2009.

**Books and Book Chapters:**
 * 1) H. Tang, How does influenza virus jump from animals to humans? pp148-1640, in “Bioinformatics for Biologists”, P. A. Pevzner and R. Shamir (Eds), Cambridge University Press, 2011. [|Amazon]
 * 2) S. Kim, H. Tang and E. Mardis, ed. Genome sequencing technology and algorithms, Artech House Publishers (2007). [|Amazon]
 * 3) H. Tang and S. Kim, Bioinformatics: Mining the massive data from high throughput genomics experiments, pp1-24, in [|Analysis of Biological Data: A Soft Computing Approach], edited by Sanghamitra Bandyopadhyay, Ujjwal Maulik and Jason T. L. Wang, World Scientific Press (2007).
 * 4) Y. Ye and H. Tang, Dynamic programming algorithms for sequence and structure comparison, pp9-28, in [|Bioinformatics Algorithms: Techniques and Applications], edited by Ion Mandoiu and Alexander Zelikovsky, Wiley Press (2008).
 * 5) H. Tang, How does influenza virus jump from animals to humans? [|Bioinformatics for Biologists], edited by P. A. Pevzner and R. Shamir, Cambridge University Press, (2011).

**Recent Publications:**
 * 1) M. Rho, Y. Wu, H. Tang, T. Doak and Y. Ye (2011), Diverse CRISPRs evolving in human microbiomes, PLoS Genetics, in press.
 * 2) Y. Zhao, H. Tang and Y. Ye (2011) RAPSearch2: a fast and memory-efficient protein similarity search tool for next generation sequencing data. Bioinformatics, in press. [|Pubmed]
 * 3) S. Lee, M. Kwon, H. Lee, Y. Paik, H. Tang, J. K. Lee and T. Park (2011) Enhanced peptide quantification using spectral count clustering and cluster abundance, BMC Bioinformatics, 12:423. [|Pubmed]
 * 4) X. Lai, L. Wang, H. Tang and F. A. Witzmann (2011) A Novel Alignment Method and Multiple Filters for Exclusion of Unqualified Peptides To Enhance Label-Free Quantification Using Peptide Intensity in LC-MS/MS, Journal of Proteome Research, 10:4799-4812. [|Pubmed]
 * 5) Y. Ye, J. Choi and H. Tang (2011) RAPSearch: a fast protein similarity search tool for short reads, BMC Bioinformatics. 12:159. [|Pubmed]
 * 6) A. Mayampurath, C. Yu and H. Tang (2011), Bioinformatic approach to glycomics and glycoproteomics, Current Proteomics, 8(4):309-324. [|online]
 * 7) Y. Chen, B. Peng, X. Wang and H. Tang (2012), Large-Scale Privacy-Preserving Mapping of Human Genomic Sequences on Hybrid Clouds, Proceedings of the 19th Network & Distributed System Security Symposium (NDSS'12), accepted.
 * 8) X. Zhou, B. Peng, Y. F. Li, Y. Chen, H. Tang and X. Wang (2011) To release or not to release: evaluating information leaks in aggregate human-genome data, Proceedings of the 16th European conference on Research in computer security (ESORICS'11), Lecture Notes in Computer Science, 6879:607-627. [|LNCS]
 * 9) Daphnia Genome Consortium (2011), The expansive genome of Daphnia pulex with environment-dependent gene regulation, Science, 331(6017):555-561. [|Science online], [|Science comment]
 * 10) L. Cherbas, A. Willingham, D. Zhang, L. Yang, Y. Zou, B. D. Eads, J. W. Carlson, J. M. Landolin, P. Kapranov, J. Dumais, A. Samsonova, J.-H. Choi, J. Roberts, C. A. Davis, H. Tang, M. J. van Baren, S. Ghosh, A. Dobin, K. Bell, W. Lin, L. Langton, M. O. Duff, A. E. Tenney, C. Zaleski, M. R. Brent, R. A. Hoskins, T. C. Kaufman, J. Andrews, B. R. Graveley, N. Perrimon, S. E. Celniker, T. R. Gingeras and P. Cherbas (2011) The transcriptional diversity of 25 Drosophila cell lines, Genome Research, 21:301-314. [|Pubmed]
 * 11) N. Shah, H. Tang, T. G. Doak and Y. Ye (2011), Comparing bacterial communities inferred from 16S rRNA gene sequencing and shotgun metagenomics. Pac Symp Biocomput. 2011:165-176. [|Fulltext from PSB online proceeding].
 * 12) S. Li, R. J. Arnold, H. Tang and P. Radivojac (2010) On the accuracy and limits of peptide fragmentation spectrum prediction. Analytical Chemistry, in press. [|Pubmed].
 * 13) M. Rho, H. Tang and Y. Ye (2010) FragGeneScan: predicting genes in short and error-prone reads. Nucl. Acids Res., 38(20):e191. [|Pubmed], [|Free fulltext at NAR online].
 * 14) C. Zhong, H. Tang, and S. Zhang (2010), RNAMotifScan: automatic identification of RNA structural motifs using secondary structural alignment. Nucl. Acids Res., 38(18): e176. [|Pubmed], [|Free fulltext at NAR online].
 * 15) Y. F. Li, R. J. Arnold, H. Tang and P. Radivojac (2010), The importance of peptide detectability for protein identification, quantification, and experiment design in MS/MS proteomics. J. Proteome Research. 9(12): 6288-6297. [|Pubmed].
 * 16) B. C. Bohrer, Y. F. Li, J. P. Reilly, D. E. Clemmer, R. D. DiMarchi, P. Radivojac, H. Tang and R. J. Arnold (2010), Combinatorial Libraries of Synthetic Peptides as a Model for Shotgun Proteomics. Anal. Chem. 82 (15): 6559-6568. [|Pubmed].
 * 17) M. Rho, S. Schaack, X. Gao, S. Kim, M. Lynch and H. Tang (2010), LTR retroelements in the genome of Daphnia pulex, BMC Genomics, 11:425. [|Pubmed].
 * 18) Y. Wu, Y. Mechref, I. Klouckova, A. Mayampurath, M. V. Novotny and H. Tang (2010), Mapping Site-specific Protein N-Glycosylations through Liquid Chromatography/Mass Spectrometry and Targeted Tandem Mass Spectrometry, Rapid Communications in Mass Spectrometry, 24(7):965-72. [|Pubmed].
 * 19) D. Hajkova, Y. Imanishi, V. Palamalai, K. C. S. Rao, C. Yuan, Q. Sheng, H. Tang, R. Zeng, R. M. Darrow, D. T Organisciak and M. Miyagi (2010), Proteomic changes in the photoreceptor outer segment upon intense light exposure, J. Proteome Res., 9(2):1173-1181. [|Pubmed].
 * 20) M. Rho and H. Tang (2009), MGEScan-nonLTR: computational identification and classification of Non-LTR retrotransposons in eukaryotic genomes. Nucleic Acids Res., 37(21):e143. [|Free fulltext at NAR online]
 * 21) R. Wang, F. Y. Li, X. Wang, H. Tang and X. Zhou (2009), Learning your identity and disease from research papers: information leaks in genome wide association study. Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS'09). [|CCS online]
 * 22) R. Wang, X. Wang, Z. Li, H. Tang, M. Reiter and Z. Dong (2009), Privacy-preserving genomic computation through program specialization. Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS'09). [|CCS online]
 * 23) Y. F. Li, R. J. Arnold, Y. Li, P. Radivojac, Q. Sheng and H. Tang (2009), A Bayesian approach to protein inference problem in shotgun proteomics. J. Comp. Biol., 16(8): 1183-1193. [|Pubmed]
 * 24) Y. Ye and H. Tang (2009), An ORFome assembly approach to metagenomics sequences analysis. J. Bioinf. Comp. Biol., 7(3):455-71. [|Pubmed]
 * 25) M. Rho, M. Zhou, X. Gao, S. Kim, H. Tang, M. Lynch (2009), Independent Mammalian Genome Contractions Following the KT Boundary. Genome Biol. Evol. 1:2-12. [|Free Fulltext at GBE online]
 * 26) B. Mann, M. Madera, Q. Sheng, H. Tang, Y. Mechref, M. V. Novotny (2008), ProteinQuant Suite: a bundle of automated software tools for label-free quantitative proteomics, Rapid Comm. Mass Spec., 22:3823-3834. [|Pubmed]
 * 27) Q. Sheng, Y. Mechref, Y. Li, M. V. Novotny, H. Tang (2008), A computational approach to characterizing bond linkage of glycan isomers using MALDI-TOF-TOF mass spectrometry, Rapid Comm. Mass Spec. 22:3561-3569. [|Pubmed]
 * 28) C. Shen, Q. Sheng, J. Dai, Y. Li, R. Zeng, H. Tang (2008), On the estimation of false positives in peptide identifications using decoy search strategy, Proteomics, 9(1):194-204. [|Pubmed]
 * 29) C. Yuan, Q. Sheng, H. Tang, Y. Li, R. Zeng, R. J. Solaro (2008), Quantitative comparison of Sarcomeric phosphoproteomes of neonatal and adult rat hearts, Am. J Physiol. Heart Circ. Physiol., 295(2):H647-56. [|Pubmed]
 * 30) Y. Ye, H. Tang (2008), An ORFome assembly approach to metagenomics sequences analysis. Proceedings of the 7th Annual International Conference on Computational Systems Biology (CSB'08), 3-13. [|CSB online]
 * 31) Y. F. Li, R. J. Arnold, Y. Li, P. Radivojac, Q. Sheng, H. Tang (2008), A Bayesian approach to protein inference problem in shotgun proteomics. Proceedings of the 12th Annual International Conference on Computational Molecular Biology (RECOMB08), LNBI 4955, 167-180. [|LNBI online]
 * 32) S. Saha, S. H. Harrison, C. Shen, H. Tang, P. Radivojac, R. J. Arnold, X. Zhang, J. Y. Chen (2008), HIP2: An online database of human plasma proteins from healthy individuals. BMC Med Genomics. 1:12. [|Pubmed]
 * 33) J. H. Choi, S. Kim, H. Tang, J. Andrew, D. G. Gilbert, J. K. Colbourne (2008), A machine-learning approach to combined evidence validation of genome assemblies, Bioinformatics, 24(6):744-50. [|Pubmed]
 * 34) P. Alves, R. J. Arnold, D. E. Clemmer, Y. Li, J. P. Reilly, Q. Sheng, H. Tang, Z. Xun, R. Zeng, and P. Radivojac (2008), Fast and accurate identification of semi-tryptic peptides in shotgun proteomics, Bioinformatics, 24: 102-109. [|Pubmed]
 * 35) Z. Jiang, H. Tang, M. Ventura, M. F. Cardone, T. Marques-Bonet, X. She, P. A. Pevzner, E. E. Eichler (2007), Ancestral reconstruction of segmental duplications reveals punctuated cores of human genome evolution. Nat Genet. 39:1361-1368. [|Pubmed], [|Commentary on Nature Genetics].
 * 36) H. Tang (2007) Genome assembly, rearrangement and repeats, Chem Rev., 107(8):3391-3406. [|Pubmed].
 * 37) S. H. Bae, H. Tang, J. Wu, J. Xie and S. Kim (2007), dPattern: transcription factor binding site (TFBS) discovery in human genome using a discriminative pattern analysis. Bioinformatics, 23:2619-2621. [|Pubmed].
 * 38) M. Rho, J. H. Choi, S. Kim, M. Lynch and H. Tang (2007), //De novo// identification of LTR retrotransposons in eukaryotic genomes. BMC Genomics, 8:90. [|Pubmed].
 * 39) A. Sundquist, M. Ronaghi, H. Tang, P. A. Pevzner and S. Batzoglou (2007), Whole-genome sequencing and assembly with high-throughput, short-read technologies. PLoS ONE, 2:e484. [|Pubmed].
 * 40) Y. Wu, Y. Mechref, I. Klouckova, M. V. Novotny and H. Tang (2007), A computational approach for the identification of site-specific protein glycosylations through ion-trap mass spectrometry, The Third RECOMB Satellite meeting on Proteomics, Lecture Notes in Bioinformatics, 4532:96-107, [|LNCS online].
 * 41) P. Alves, R. J. Arnold, M. V. Novotny, P. Radivojac, J. P. Reilly and H. Tang (2007), Advancement in protein inference from shotgun proteomics using peptide detectability. Pacific Symposium on Biocomputing, 12:409-420. [|Fulltext from PSB online proceeding].
 * 42) R. Patwardhan, H. Tang, S. Kim and M. Dalkilic (2006), An approximate de Bruijn graph approach to multiple local alignment and motif discovery in protein sequences, The First International Workshop in data mining and bioinformatics, Lecture Notes in Bioinformatics, 4316:158-169.
 * 43) D. Zhi, B. Raphael, A. Price, H. Tang and P. Pevzner (2006), Identifying repeat domains in large genomes, Genome Biology, 7(1):R7, [|Pubmed].
 * 44) D. Zhi, R. Keich, P. Pevzner, S. Heber and H. Tang (2006), Checking for base-calling errors in repeats. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 4(1):54-64, 2007, [|Pubmed]
 * 45) H. Tang, R. J. Arnold, P. Alves, Z. Xun, D. E. Clemmer, M. V. Novotny, J. P. Reilly and P. Radivojac (2006), A computational approach toward label-free protein quantification using predicted peptide detectability. Bioinformatics, 22(14):e481-488, ISMB 2006. [|Pubmed].
 * 46) V. Bafna, H. Tang and Shaojie Zhang (2006), Consensus Folding of Unaligned RNA Sequences Revisited. J. Comp. Biol. 13(2):283-295, [|Pubmed].
 * 47) R. J. Arnold, N. Jayasankar, D. Aggarwal, H. Tang and P. Radivojac (2006), A machine learning approach to predicting peptide fragmentation spectra. Proceedings of the Pacific Symposium on Biocomputing, 11:219-230, [|Fulltext from PSB online proceeding].
 * 48) H. Tang, Y. Mechref and M. Novotny (2005), Automatic Interpretation of MS/MS Spectra of Oligosaccharides. Bioinformatics, 21 Suppl 1:i431-i439, ISMB 2005, [|Pubmed]
 * 49) V. Bafna, H. Tang and Shaojie Zhang (2005), Consensus Folding of Unaligned RNA Sequences Revisited. Proceedings of the Ninth Annual International Conference on Computational Molecular Biology (RECOMB'05), 172-187, May 2005, Boston, USA, ACM.
 * 50) B. Raphael, D. Zhi, H. Tang and P. A. Pevzner, A novel method for multiple alignment of sequences with repeats and shuffled elements. Genome Res. 2004, 14: 2336-2346. [|Pubmed]
 * 51) N. Bandeira, H. Tang, V. Bafna and P. A. Pevzner, Shotgun protein sequencing by tandem mass assembly. Analytical Chemistry, 2004, 76:7221-33. [|Pubmed]
 * 52) P. A. Pevzner, H. Tang and G. P. Tesler, De novo repeat classification and fragment assembly. Genome Res. 2004 Sep; 14(9): 1786-96. [|Pubmed]
 * 53) M. Chaisson, P. A. Pevzner and H. Tang, Fragment assembly with short reads. Bioinformatics. 2004 Sep 1; 20(13): 2067-74. [|Pubmed]
 * 54) P. A. Pevzner, H. Tang and G. P. Tesler, De novo repeat classification and fragment assembly. Proceedings of the Eighth Annual International Conference on Computational Molecular Biology (RECOMB'04), April 2004, San Diego, USA, ACM Press.
 * 55) S. Heber, M. Alekseyev, S. H. Sze, H. Tang and P. A. Pevzner, Splicing graphs and EST assembly problem. Bioinformatics. 2002; 18 Suppl 1 :S181-8 (ISMB 2002 issue). [|Pubmed]
 * 56) P. A. Pevzner and H. Tang, Fragment assembly with double-barreled data. Bioinformatics. 2001 Jun;17 Suppl 1:S225-33 (Special ISMB 2001 issue). [|Pubmed]
 * 57) P. A. Pevzner, H. Tang and M. S. Waterman (2001) A New Approach to Fragment Assembly in DNA Sequencing. Proceedings of the Fifth Annual International Conference on Computational Molecular Biology (RECOMB'01), April 2001, Montreal, Canada, ACM Press.
 * 58) Q. Tu, H. Tang and D. Ding, MedBlast: searching articles related to a biological sequence. Bioinformatics. 2004,20:75-77. [|Pubmed]
 * 59) P. A. Pevzner, H. Tang and M. S. Waterman (2001), An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. USA, 98:9748-9753. [|Pubmed] [|Nature News]
 * 60) S. Kruglyak and H. Tang (2001) A New Estimator of Significance of Correlation in Time Series Data, J. Comp. Biol. 2001,8:463-470. [|Pubmed]
 * 61) S. Kruglyak and H. Tang (2000) Regulation of Adjacent Yeast Genes. Trends in Genetics, 16:109-111. [|Pubmed]