As sequencing throughput approaches dozens of gigabases per day, there is a growing need for efficient software for analysis of transcriptome sequencing (RNA-Seq) data. Myrna is a cloud-computing pipeline for calculating differential gene expression in large RNA-Seq datasets. We apply Myrna to the analysis of publicly available data sets and assess the goodness of fit of standard statistical models. Myrna is available from http://bowtie-bio.sf.net/myrna.
Increased reliance on computational approaches in the life sciences has revealed grave concerns about how accessible and reproducible computation-reliant results truly are. Galaxy http://usegalaxy.org, an open web-based platform for genomic research, addresses these problems. Galaxy automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods. Galaxy Pages are interactive, web-based documents that provide users with a medium to communicate a complete computational analysis.
Background:
MicroRNAs (miRNAs), which originate from precursor transcripts with stem-loop structures, are essential gene expression regulators in eukaryotes.
Results:
We report 19 miRNA precursors in Arabidopsis that can yield multiple distinct miRNA-like RNAs in addition to miRNAs and miRNA*s. These miRNA precursor-derived miRNA-like RNAs are often arranged in phase and form duplexes with an approximately two-nucleotide 3'-end overhang. Their production depends on the same biogenesis pathway as their sibling miRNAs and does not require RNA-dependent RNA polymerases or RNA polymerase IV. These miRNA-like RNAs are methylated, and many of them are associated with Argonaute proteins. Some of the miRNA-like RNAs are differentially expressed in response to bacterial challenges, and some are more abundant than the cognate miRNAs. Computational and expression analyses demonstrate that some of these miRNA-like RNAs are potentially functional and they target protein-coding genes for silencing. The function of some of these miRNA-like RNAs was further supported by their target cleavage products from the published small RNA degradome data. Our systematic examination of public small-RNA deep sequencing data from four additional plant species (Oryza sativa, Physcomitrella patens, Medicago truncatula and Populus trichocarpa) and four animals (Homo sapiens, Mus musculus, Caenorhabditis elegans and Drosophila) shows that such miRNA-like RNAs exist broadly in eukaryotes.
Conclusions:
We demonstrate that multiple miRNAs could derive from miRNA precursors by sequential processing of Dicer or Dicer-like proteins. Our results suggest that the pool of miRNAs is larger than was previously recognized, and miRNA-mediated gene regulation may be broader and more complex than previously thought.
The Galaxy package empowers regular users to perform rich DNA sequence analysis through a much-needed and user-friendly graphical web interface.See research article http://genomebiology.com/2010/11/8/R86Research highlightWith the advent of affordable and high-throughput DNA sequencing, sequencing is becoming an essential component in nearly every genetics lab. These data are being generated to probe sequence variations, to understand transcribed, regulated or methylated DNA elements, and to explore a host of other biological features across the tree of life and across a range of environments and conditions. Given this deluge of data, novices and experts alike are facing the daunting challenge of trying to analyze the raw sequence data computationally. With so many tools available and so many assays to analyze, how can one be expected to stay current with the state of the art? How can one be expected to learn to use each tool and construct robust end-to-end analysis pipelines, all while ensuring that input formats, command-line options, sequence databases and program libraries are set correctly? Finally, once the analysis is complete, how does one ensure the results are reproducible and transparent for others to scrutinize and study?In an article published in Genome Biology, Jeremy Goecks, Anton Nekrutenko, James Taylor and the rest of the Galaxy Team (Goecks et al. 1) make a great advance towards resolving these critical questions with the latest update to their Galaxy Project. The ambitious goal of Galaxy is to empower regular users to carry out their own computational analysis without having to be an expert in computational biology or computer science. Galaxy adds a desperately needed graphical user interface to genomics research, making data analysis universally accessible in a web browser, and freeing users from the minutiae of archaic command-line parameters, data formats and scripting languages. Data inputs and computational steps are selected from dynamic graphical menus, and the results are displayed in intuitive plots and summaries that encourage interactive workflows and the exploration of hypotheses. The underlying data analysis tools can be almost any piece of software, written in any language, but all their complexity is neatly hidden inside of Galaxy, allowing users to focus on scientific rather than technical questions.
Background:
Adenocarcinomas of the tongue are rare and represent the minority (20-25%) of salivary gland tumors affecting the tongue. We investigated the utility of massively parallel sequencing to characterize an adenocarcinoma of the tongue, before and after treatment.
Results:
In the pre-treatment tumor we identified 7,629 genes within regions of copy number gain. 1,078 genes exhibited increased expression relative to the blood and unrelated tumors and four genes contained somatic protein-coding mutations. Our analysis suggested the tumor cells were driven by the REToncogene. Genes whose protein products are targeted by the RET inhibitors sunitinib and sorafenib correlated with being amplified and or highly expressed. Consistent with our observations administration of sunitinib was associated withstable disease lasting 4 months, after which the lung lesions began to grow. Administration of sorafenib and sulindac provided disease stabilization for an additional 3 months after which the cancer progressed and new lesions appeared. A recurring metastasis possessed 7,288 genes within copy number amplicons, 385 genes exhibiting increased expression relative to other tumours and 9 new somatic protein coding mutations. The observed mutations and amplifications were consistent with therapeutic resistance arising through activation of the MAPK and AKT pathways.
Conclusions:
We conclude that complete genomic characterization of a rare tumour has the potential to aid in clinical decision making and identifying therapeutic approaches where no established treatment protocols exist. These results also provide directin-vivo genomic evidence for mutational evolution within a tumor under drug selection and potential mechanisms of drug resistance accrual.
Background:
A very early step in splice site recognition is exon definition, a process that is as yet poorly understood. Communication between the two ends of an exon is thought to be required for this step. We report genome-wide evidence for exons being defined through the combinatorial activity of motifs located in flanking intronic regions.
Results:
Strongly co-occurring motifs were found to specifically reside in four intronic regions surrounding a large number of human exons. These paired motifs occur around constitutive and alternative exons but not pseudo exons. Most co-occurring motifs are limited to intronic regions within 100 nucleotides of the exon. They are preferentially associated with weaker exons. Their pairing is conserved in evolution and they exhibit a lower frequency of single nucleotide polymorphism when paired. Paired motifs display specificity with respect to distance from the exon borders and in constitutive versus alternative splicing. Many resemble binding sites for heterogeneous nuclear ribonucleoproteins. Specific pairs are associated with tissue-specific genes, the higher expression of which coincides with that of the pertinent RNA binding proteins. Tested pairs acted synergistically to enhance exon inclusion, and this enhancement was found to be exon-specific.
Conclusions:
The exon-flanking sequence pairs identified here by genomic analysis promote exon inclusion and may play a role in the exon definition step in pre- mRNA splicing. We propose a model in which multiple concerted interactions are required between exonic sequences and flanking intronic sequences to effect exon definition.
Nuclear transcription factors have been detected in mammalian mitochondria and may directly regulate mitochondrial gene expression. Emerging genomics techniques may overcome outstanding challenges in this field.
Maintaining up-to-date annotation on reference genomes is becoming more important, not less, as the ability to rapidly and cheaply resequence genomes expands.
Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source http://bowtie.cbcb.umd.edu.
MirSVR is a new machine learning method for ranking microRNA target sites by a down-regulation score. The algorithm trains a regression model on sequence and contextual features extracted from miRanda-predicted target sites. In a large-scale evaluation, miRanda-mirSVR is competitive with other target prediction methods in identifying target genes and predicting the extent of their downregulation at the mRNA or protein levels. Importantly, the method identifies a significant number of experimentally determined non-canonical and non-conserved sites.