This is an introductory course on big data bioinformatics and its uses in basic research, healthcare, and the biotech and pharmaceutical industries.

• Genes are the basic units of heredity.
• A gene is a sequence of bases that carries the information required for constructing a particular protein (more accurately, a polypeptide).
• Such a gene is said to encode a protein.
• The human genome comprises roughly 20,000 to 25,000 protein-coding genes.

Introduction to biology, biological databases, and high-throughput data sources.

Bioinformatics is an essential infrastructure underpinning biological research.
– At the beginning of the "genomic revolution", a central bioinformatics concern was the creation and maintenance of databases to store biological information.

Goal: given two sequences, find the shortest series of operations needed to transform one into the other.

Bioinformatics research and applications include the analysis of molecular sequence and genomics data; genome annotation, gene/protein prediction, and expression profiling; molecular folding, modeling, and design; building biological networks; and the development of databases and data management systems.

R is rapidly becoming the most important scripting language for both experimental and computational biologists.

Bioinformatics is an interdisciplinary field within the life sciences. Structural bioinformatics combines the study of protein structure and function with applications of physical and chemical principles.

During these three days you will do exercises using public websites and freeware running locally on your PC.

He did a bioinformatics postdoc in soybean genetics and now runs the Genome Informatics Facility at Iowa State University.

Binary search trees; suffix tries and trees; pairwise sequence alignment algorithms (dynamic programming).
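The goal stated above (find the shortest series of operations that transforms one sequence into another) is commonly formalized as the edit-distance problem and solved with dynamic programming. The following is a minimal sketch, not a definitive implementation. It assumes the allowed operations are single-character insertion, deletion, and substitution, each with cost 1 (the Levenshtein model); the text does not specify the operation set or the costs, and the function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn `a` into `b` (unit costs assumed)."""
    rows, cols = len(a) + 1, len(b) + 1

    # dp[i][j] holds the edit distance between the prefixes a[:i] and b[:j].
    dp = [[0] * cols for _ in range(rows)]

    # Base cases: turning a prefix into the empty string (or vice versa).
    for i in range(rows):
        dp[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        dp[0][j] = j  # insert all j characters of b[:j]

    # Fill the table row by row; each cell depends on three neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            substitution_cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,                      # delete a[i-1]
                dp[i][j - 1] + 1,                      # insert b[j-1]
                dp[i - 1][j - 1] + substitution_cost,  # match or substitute
            )

    return dp[-1][-1]
```

For example, `edit_distance("ACGT", "AGT")` returns 1, because deleting the `C` is enough. Pairwise alignment algorithms for biological sequences fill a table in the same way but usually replace the unit costs with a scoring matrix and gap penalties.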
Applications of bioinformatics
• Most basic: use web-based tools
– Primarily need biology
• Use Unix-based tools
– Above, plus the ability to use Unix, write wrappers in Perl/Python, and write shell scripts
• Use Unix tools for high-throughput data
– Above, plus an understanding of data storage and scalability

The Introduction to Genomics course is dedicated to genomic data and the use of Next Generation Sequencing as a tool to analyze and understand the information contained within a genome.

Bioinformatics is integrated into proteomics projects to mine data and is becoming increasingly important.

Orthologs: genes separated by speciation, the process in which a common ancestor gives rise to two subgroups that slowly drift away from their common genetic makeup to become distinct species.