bioinformatics - Details

Levels:

1) Chapter 1
2) genome browsers L(7), L9 (syllabus)
3) L6 (secondary databases) L7 (syllabus)
4) L11. Protein Structure Analysis
5) L13. Protein Functional Divergence

Questions:

104 questions

What does bioinformatics deal with?

Anything that interests biologists: DNA/protein sequences, DNA variation, gene expression (microarrays), experimental data, images/models, and articles.

What were the first methods of DNA sequencing?

Maxam-Gilbert sequencing, which is based on chemical reactions and needs large amounts of purified, end-labeled DNA, so it is not used at large scale; and Sanger sequencing, which is enzymatic and works with small amounts of DNA in any form.

What is shotgun sequencing?

It is large-scale sequencing of random DNA fragments (used for the first genomes). Restriction enzymes cut large fragments into short ones to build genomic libraries, which are then separated and sequenced, and the reads are finally assembled using their overlapping regions. It is expensive and laborious.
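
The "assembled on overlapping regions" step can be illustrated with a toy greedy assembler. This is only a minimal sketch under simplifying assumptions (error-free reads, a single strand, no repeats); the function names `overlap` and `greedy_assemble` and the example reads are hypothetical, and real assemblers are far more sophisticated.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three toy reads that overlap by three bases each
contigs = greedy_assemble(["ATGCGT", "CGTACC", "ACCTGA"])
```

With these toy reads the three fragments collapse into one contig; with real data, repeats and sequencing errors make the greedy choice unreliable.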

What's high-throughput sequencing?

It is intended to lower the cost and increase the speed of DNA sequencing beyond what the standard dye-terminator method allows. It is applied to exome sequencing, genome sequencing, and transcriptome profiling.

What are the main NGS platforms?

Illumina Solexa (1-6 Gb), ABI SOLiD (80-100 Gb), PacBio (100-200 Gb)

What happened to the cost of DNA sequencing in the NGS era?

The cost of sequencing, stimulated by the genome project, decreased drastically (exponentially), and sequencing also became much faster.

Why did the cost of sequencing decrease?

Because of the increase in pre-processing (sample collection) and post-processing (bioinformatic analysis). That is why bioinformatics is important in all labs.

How is the genome analyzed?

Storage of primary sequences, assembly of chromosomal sequences, prediction of gene locations, gene annotation (predicting gene function), and chromosome composition (variation).

What's metagenomics?

It involves directly sequencing samples of living organisms taken from their natural environments in various locations, in order to identify the species, characterize their abundance, and discover new proteins.

What is DNA chip technology?

1) Cell culture/tissue; 2) RNA extraction; 3) synthesis of fluorescent cDNA.

What's transcriptome analysis?

The transcriptome is defined as the set of all RNA molecules transcribed from the genome. Gene expression is tightly regulated: each gene is expressed at a different level depending on cell type, tissue, and time.

What are the two types of proteome analysis? How do they work?

2D-PAGE electrophoresis and ChIP-on-chip analysis. ChIP-on-chip works by first tagging transcription factors with a protein fragment, immobilizing them with fixative agents, fragmenting the DNA, precipitating the DNA-protein complexes, and then unbinding them. DNA enrichment is measured when two extracts are co-hybridized on a microarray (chip) whose spots each contain one DNA fragment likely to bind.

What's an interactome?

A network of complexes.

Why do we need biological databases?

1) To store and communicate large datasets; 2) to make these datasets available to scientists; 3) to make data available in computer-readable form.

What are some examples of biomolecular databases?

Sequence and structure databases (UniProt), genome sequences and annotations (NCBI), molecular functions (ExPASy), biological processes (GeneNet).

What's the difference between primary and secondary (derived) databases?

Primary databases hold experimental results entered directly into the database. Secondary databases hold the results of analyzing primary databases: they aggregate many databases, link to other data items, and combine and consolidate data.

What's the availability of databases?

Databases may be publicly available with no restriction, available but under copyright, accessible but not downloadable, academic but not freely available, or commercial. Every year new databases are created, with the first issue being open access.

What are some nucleic sequence databases?

GenBank (1979, USA), EMBL-EBI (1974, UK), and DDBJ (1986, Japan).

What's the INSDC?

It is the International Nucleotide Sequence Database Collaboration between DDBJ, EMBL-EBI, and NCBI.

What is the rule regarding publishing articles that contain sequences?

The sequence has to be deposited in a reference database, in any one of the three databases; they are automatically synchronized.

What's the sequencing pace?

Nucleic sequences, entire genomes, and protein sequences (obtained by translating genes, not by direct sequencing).

How is data of sequences submitted?

Direct submission from the author by web or email; the sequences are identical between banks.

What are the sequence formats?

FASTA, GenBank (protein ID N)
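
To make the FASTA format concrete: each record is a header line starting with `>` followed by one or more sequence lines. The reader below is a minimal sketch, not a full parser; the function name `parse_fasta` and the example records are made up for illustration.

```python
def parse_fasta(text: str) -> dict:
    """Return {header: sequence} for FASTA-formatted text.

    The header is everything after '>' on the header line; sequence
    lines belonging to one record are concatenated.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical two-record example
example = """>seq1 toy DNA record
ATGCGTAC
GGTA
>seq2 toy protein record
MKV
"""
sequences = parse_fasta(example)
```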

What are the main features of a good genome browser?

Software designed to enable a user to access and display sequence data, provide a visual correlation between different types of information, and organize large amounts of genome sequence data.

What are the common features and differences between genome browsers?

Common features:
• Coordinate system is based on the build
• Zoom in and out
• Gene features aligned to the genome
Major differences:
• Each browser has a very different look and feel
• Navigating through the information

What are the three main genome browser repositories?

Ensembl, NCBI (Entrez) with BLAST, and UCSC with BLAT. They all use the same human genome assembly, but the release timing differs between sites.

What are some features of the UCSC browser?

Genomes from several groups (vertebrates, deuterostomes, insects, nematodes, yeast); entry into the genome sequence via BLAT; the Table Browser; creation of PDFs; access to all the data produced by the project and to the software used to analyze and present it; the site produces and maintains annotation tracks.

What does the genomic data in annotation tracks include?

Known genes, predicted genes, ESTs, mRNAs, CpG islands, assembly gaps and coverage, chromosomal bands, mouse homologies, etc.

How are aligned annotation tracks computed? What can users do with them?

Annotation tracks are both computed at UCSC from publicly available sequence data and provided by collaborators. Users can also add their own custom tracks to the browser.

What is the UCSC outline?

Navigating, configuring the browser, and extracting data.

How is UCSC navigated?

Through a graphical interface that helps you control the graphical display.

What can you find when you search the UCSC Genome Browser, given its vast amount of genomic data?

Genome location, gene ID and amino acid sequence, gene expression across 53 different tissue types, marks from 7 different cell lines, DNase I hypersensitivity clusters, DNA conservation across 100 vertebrates, amino acid conservation across different species, known SNPs as found in dbSNP, and known repetitive or low-complexity regions.

Does UCSC help you find other sources?

Yes, using cross-links to other sources and tools.

What does configuring mean?

The ability to collect that data.

What are the features of the Ensembl database?

It takes genomic sequence assemblies of human, mouse, rat, and mosquito; adds annotation and links (an automated process); and presents all the data on a web site.

What does the Ensembl database contain?

Chromosome summary, synteny view, SNP variation.

What is the page design of the NCBI Genome Data Viewer (GDV) comprised of?

GDV is comprised of a series of page elements (widgets) that are used for different types of interactions with the browser, such as genome searches, analysis of BLAST results, data uploads, or changing the display.

What do widgets do?

The widgets communicate with one another such that an action in one widget causes other widgets on the page to update.

What are secondary databases?

Databases that make use of the publicly available sequence data in primary databases.

What are some biomolecular databases?

UniProt, ExPASy

How is UniProt structured?

What is UniProt and what does this database consist of?

UniProtKB consists of translations of EMBL coding sequences. It has the UniProtKB/Swiss-Prot section (reviewed), which is annotated by experts and is rich in information, and the rest (90% of the entries).

What are some of UniProt's features?

It is the most comprehensive protein database in the world; it is annotated by experts (a huge team of annotators specialized in different types of proteins or organisms); and it is recognized worldwide as an essential resource.

What is ExPASy?

Expert Protein Analysis System

What are some databases of 3D structures of macromolecules?

The Worldwide Protein Data Bank (wwPDB) and the Protein Data Bank (PDB).

What are the 4 data banks that make up the Worldwide Protein Data Bank?

Biological Magnetic Resonance Data Bank (BMRB), Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), PDBj, and PDBe.

Which genome database is used in comparative genomics?

Ensembl Genomes (access to complete genomes and proteomes)

What are the databases for protein domains?

Prosite and CATH.

What do the aligned sequences and logos in the Prosite database show?

The alignment shows the sequences that are used to build the Prosite profile; the sequence logo indicates the level of conservation of each residue.

What is the domain signature in the Prosite database?

The domain signature is a string-based pattern representing the residues that are characteristic of a domain.
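
As an illustration of a string-based pattern, PROSITE patterns use a compact syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats) that maps closely onto regular expressions. The converter below is a rough sketch covering only those common elements; the function name `prosite_to_regex` and the test sequences are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Handles x, [..], {..}, (n), (n,m), and the < / > terminal anchors
    when they appear outside brackets. Other syntax is not supported.
    """
    pattern = pattern.rstrip(".")  # patterns are conventionally ended by '.'
    parts = []
    for element in pattern.split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # forbidden residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # sequence ends
        parts.append(element)
    return "".join(parts)


# N-glycosylation site pattern: N, then not P, then S or T, then not P
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
hit = re.search(regex, "AANGSAA")    # toy sequence containing the motif
miss = re.search(regex, "AANPSAA")   # P after N breaks the motif
```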

What is the CATH database, and how does it cluster proteins?

It is a hierarchical classification of protein domain structures, which clusters proteins at four major levels: Class (C), Architecture (A), Topology (T), and Homologous superfamily (H).

How are the CATH boundaries and assignment for each protein determined?

Using a combination of automated and manual procedures which include computational techniques, empirical and statistical evidence, literature review and expert analysis.

What does Ontology mean?

In philosophy, ontology is the study of being. In information science, an ontology is a way of showing the properties of a subject area and how they are related, by defining a set of concepts and categories that represent the subject.

How can inconsistencies in annotations be addressed?

By using a controlled vocabulary and a hierarchical classification between the terms of that controlled vocabulary.

What are the gene ontology processes?

DNA metabolism (DNA repair, replication, packaging, recombination).

What are the gene ontology molecular functions?

Nucleic acid binding and enzymes.

What are the gene ontology cellular components?

Nucleus and cytoplasm.

What's the Gene Ontology database?

The Gene Ontology (GO) database is a resource for ontological information.

What is the status of GO annotations?

Term definitions (biological processes, molecular functions, cellular components, sequence ontology); genomes with annotation (excluding annotations from UniProt); annotated gene products (total, electronic only, manually curated).

What's QuickGO?

A user-friendly web interface to the Gene Ontology, with a graphical display of the hierarchical relationships between terms that makes it convenient to browse between classes.

What are some remarks on "bio-ontologies"?

They are an improvement over free text. They have nothing to do with the philosophical concept of ontology (they are only a taxonomical classification). Multiple classification criteria are possible, so the classification should remain purpose-based. They do not represent molecular interactions (relationships between objects are only hierarchical).

What is biological function?

A general definition of function: the characteristic action (role) of an element (organ) within a set.

What is the function and gene ontology of biological functions?

Understanding function requires establishing the link between molecular activity and the context in which it takes place (process).

What are the databases that include small compounds, reactions and metabolic pathways?

KEGG (Kyoto Encyclopedia of Genes and Genomes) and BioCyc (metabolic pathways).

How do function and gene ontology show multifunctionality?

The same activity can play different roles in different processes, and a single protein can have multiple activities in a given process.

What are protein functions determined by? What are the folds determined by?

• Protein functions are determined by their structures.
• Essential elements in bioinformatics.
• The conformation (folding) of a protein is determined by its dihedral angles (phi and psi).

What are amino acids linked by?

Amino acids are linked by peptide bonds.

How are the ψ (psi) against φ (phi) angles visualized?

A Ramachandran plot visualizes the backbone dihedral angles ψ (psi) against φ (phi) of the amino acid residues in a protein structure.
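
Each point on such a plot is a pair of dihedral (torsion) angles, and a dihedral angle is defined by four consecutive atoms (for φ: C of the previous residue, N, Cα, C; for ψ: N, Cα, C, N of the next residue). The function below is a minimal sketch of that calculation using only the standard library; the name `dihedral` and the toy coordinates are illustrative, not taken from any structure file.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Signed torsion angle in degrees defined by four points.

    0 degrees is the cis (eclipsed) arrangement and +/-180 is trans.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: the last point is rotated a quarter turn about the central bond
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Computing φ and ψ for every residue and scatter-plotting ψ against φ gives the Ramachandran plot.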

What's the difference between protein structure analysis and prediction?

- Protein structure analysis usually refers to the determination of the protein structure by physical or chemical methods.
- Protein structure prediction refers to the inference of the protein structure using computer algorithms.

What are the four different protein structures?

1- Primary structure: sequence of amino acids.
2- Secondary structure: local structure such as α-helices and β-sheets.
3- Tertiary structure: arrangement of the secondary structural elements to give the 3-dimensional structure of a protein.
4- Quaternary structure: arrangement of the subunits to give a protein complex its 3-dimensional structure.

How are protein structures determined?

• X-ray crystallography
• NMR spectroscopy
• Cryo-electron microscopy
• Neutron diffraction
• Atomic force microscopy

What is used to measure protein structure?

X-ray diffraction analysis: one must first be able to crystallize the protein and then calculate its structure from the way it diffracts X-rays. Determining the protein structure directly is difficult.

How does X-ray crystallography work? What is used to determine quality?

- The protein needs to be grown into a large crystal.
- The X-rays are scattered by the electron clouds surrounding the atoms, and the diffraction patterns are converted into an electron density map.
- The R-factor is used to determine the quality of the model; it ranges from 0.0 to 0.59.
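
For reference, the R-factor is usually written as the normalized disagreement between observed and model-calculated structure-factor amplitudes. The notation below ($F_{\mathrm{obs}}$, $F_{\mathrm{calc}}$, reflection indices $hkl$) is the conventional one and is not taken from the course material:

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```

A value of 0 means perfect agreement between model and data; values near the upper end of the range correspond to a model that fits no better than a random arrangement of atoms.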

What are the two methods used in X-ray crystallography to resolve the structures?

• Molecular replacement
• Multiple isomorphous replacement

What are the steps going from X-ray to atomic model?

How does NMR (Nuclear Magnetic Resonance) work?

- It detects the spin patterns of atomic nuclei in a magnetic field.
- The protein is in solution, so it is mobile and vibrating; thus a number of different models are constructed.
- It is limited to <200 amino acid residues and uses isotope labeling.

What are the limitations of both X-ray diffraction and NMR distance measurement?

• X-ray diffraction
✓ Only a small number of proteins can be made to form crystals
✓ A crystal is not the protein's native environment
✓ Very time consuming
• NMR distance measurement
✓ Not all proteins are found in solution
✓ This method generally looks at isolated proteins rather than protein complexes
✓ Very time consuming

How are structures verified and validated?

• MolProbity
• NQ-Flipper
• Procheck
• CheckMyMetal
• Prosa-web
• Uppsala Electron Density Server
• Verify3D Structure Evaluation Server
• WHAT_CHECK
• WHAT IF

What does a Ramachandran plot look like?

What's cryo-electron microscopy?

• Transmission EM at very low temperatures (liquid nitrogen)
• Very high resolution (3-4 Å)

What's atomic force microscopy?

• Type of Scanning Probe Microscopy (SPM)
• Invented in 1985 by IBM
• Provides resolution of a fraction of a nanometer

What's structure-structure alignment and comparison?

It is done by placing the structures side by side and comparing them.

How are conformational changes analyzed?

By comparing two forms, the open form and the closed form. Example: citrate synthase, which shows ligand-induced conformational changes involving domain motion and small structural distortions.

Why do we define domains?

Because of the link between domain structure and function:
- Different structural domains can be associated with different functions.
- Enzyme active sites are often at domain interfaces; domain movements play a functional role.

What are the methods for identifying domains?

Domain limits are defined by identifying groups of residues such that the number of contacts between groups is minimized.
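
The contact-minimization idea can be sketched for the simplest case: a chain split into two continuous domains by a single cut. Everything here is an illustrative assumption (the function names, the `min_size` constraint that prevents trivial cuts near the chain ends, and the toy contact list); actual domain-assignment methods also handle discontinuous domains and normalize by domain size.

```python
def count_interdomain_contacts(contacts, cut: int) -> int:
    """Number of residue-residue contacts that cross the cut.

    Residues with index < cut form domain 1; the rest form domain 2.
    """
    return sum(1 for i, j in contacts if (i < cut) != (j < cut))


def best_single_cut(n_residues: int, contacts, min_size: int = 2) -> int:
    """Cut position that minimizes contacts between the two groups."""
    candidates = range(min_size, n_residues - min_size + 1)
    return min(candidates, key=lambda cut: count_interdomain_contacts(contacts, cut))


# Toy 8-residue chain: two compact halves joined by one contact (3, 4)
toy_contacts = [(0, 2), (0, 3), (1, 3), (3, 4), (4, 6), (4, 7), (5, 7)]
cut = best_single_cut(8, toy_contacts)
```

In this toy case the best cut falls between residues 3 and 4, where only one contact crosses the boundary.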

What are the advantages of knowing the complete genome sequence?

• All encoded proteins can be predicted and identified.
• The missing functions can be identified and analyzed.
• Peculiarities and novelties in each organism can be studied.
• Predictions can be made and verified.

What has changed in protein science from the 20th century to the 21st century?

20th century:
• Few well-studied proteins
• Mostly globular with enzymatic activity
• Biased protein set
21st century:
• Many "hypothetical" proteins
• Various, often with no enzymatic activity
• Natural protein set

What are the properties of the natural protein set?

• Unexpected diversity of even common enzymes (analogous, paralogous, xenologous)
• Conservation of the reaction chemistry, but not the substrate specificity
• Functional diversity in closely related proteins
• Abundance of new structures

What is conserved in proteins, according to comparative genomics?

• Those amino acids that are conserved in divergent proteins (archaeal and bacterial, hyperthermophilic and mesophilic) are likely to be important for catalytic activity.
• Prediction of the 3D fold and general biochemical function is much easier than prediction of the exact biological (biochemical) function.
• Reaction chemistry often remains conserved even when the sequence diverges almost beyond recognition.

What's Comparative analysis?

Allows us to find subtle sequence similarities in proteins that would not have been noticed otherwise.

What roles do sequence database searches and sequence analysis play?

- Sequence database searches that use exotic or highly divergent query sequences often reveal more subtle relationships than those using queries from humans or standard model organisms (E. coli, yeast, worm, fly).
- Sequence analysis complements structural comparisons and can greatly benefit from them.

What's protein evolution?

• Tree of life and evolution of protein families (Dayhoff, 1978)
• A tree representing the evolution of a protein family can be built from sequences
• Orthologous gene family: organismal and sequence trees match well

What's protein evolution with regard to homologs, orthologs, and paralogs?

• Homolog
✓ Common ancestors
✓ Common 3D structure
✓ Usually at least some sequence similarity (sequence motifs or closer similarity)
• Ortholog
✓ Derived from speciation
• Paralog
✓ Derived from duplication

What's enzyme recruitment?

Minor mutational changes convert a glycerol kinase into a gluconate kinase, which leads to non-orthologous gene displacement.

What are some traditional thoughts?

• Homologous sequences have similar function
• Sites of greatest functional significance are under the strongest selective constraints
• Selective constraints can be measured by the dN/dS ratio
BUT...
• Most synonymous substitutions are selectively neutral and therefore occur at a high rate, i.e., they are inappropriate for detecting functional divergence that occurred long ago (over 150 Mya).

What are some new approaches?

• Non-synonymous (replacement) substitutions are analyzed alone
• Substitution models allow evolutionary rates to vary among sites in a protein-coding sequence according to a gamma distribution:
  • homogeneous: the functional constraints at sites are constant over the entire evolutionary history.
  • heterogeneous: some residues might be subject to changed functional constraints in various branches of the phylogenetic tree.

What's the gamma distribution?

The gamma distribution describes how substitution rates vary among sites: its mean E(r) is the average mutation rate of the selected substitution model, and its variance measures how much the rates differ among sites.

What's the gamma distribution formula?

$V(r) = E(r)^2 / \alpha$, where $\alpha$ is a shape parameter, which describes the shape of the distribution and the substitution rates for all categories of sites.
• $\alpha$ increases as the variation in rates among sites decreases
• when $\alpha \to +\infty$, the gamma model reduces to the single-rate model
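
For a gamma distribution with shape parameter α and mean E(r), the variance equals E(r)²/α, so a small α means strong rate variation among sites and a large α approaches a single rate. The simulation below is a small sketch that checks this numerically with the standard library; the function name `sample_site_rates`, the number of sites, and the seed are arbitrary choices for illustration.

```python
import random
import statistics


def sample_site_rates(alpha: float, mean_rate: float = 1.0,
                      n_sites: int = 50_000, seed: int = 0):
    """Draw per-site substitution rates from a gamma distribution.

    random.gammavariate takes a shape and a scale, and the gamma mean is
    shape * scale, so scale = mean_rate / alpha fixes the mean.
    """
    rng = random.Random(seed)
    scale = mean_rate / alpha
    return [rng.gammavariate(alpha, scale) for _ in range(n_sites)]


for alpha in (0.5, 2.0, 50.0):
    rates = sample_site_rates(alpha)
    mean = statistics.fmean(rates)
    var = statistics.pvariance(rates)
    # The sample variance should be close to mean**2 / alpha
    print(f"alpha={alpha}: mean={mean:.3f} variance={var:.3f} expected={1.0 / alpha:.3f}")
```

As α grows the sampled rates cluster ever more tightly around the mean, which is the single-rate limit.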

What's the difference between homogeneous and non-homogeneous gamma models?

• Homogeneous gamma model:
- gives rise to two descendant populations (D1 and D2), each of which has the same site-specific rates as the ancestral population.
• Non-homogeneous gamma model:
- gives rise to two descendant populations (D3 and D4) in which the site-specific rates can be different from those in the ancestral population.
- each descendant population (D3 and D4) still contains the same number of slow, moderate, and fast sites.

What's the physiological meaning and evolutionary meaning of functional divergence?

• Physiological meaning: the genes (and respective proteins) diverge in their actual physiological (biochemical) functions. This is tested by so-called "wet" experiments.
• Evolutionary meaning: the genes (and respective proteins) diverge in their evolutionary rates, which is assumed to reflect a corresponding divergence in their physiological functions.