Unification of miRNA and isomiR research: the mirGFF3 format and the mirtop API

Abstract

Background: MicroRNAs (miRNAs) are small RNA molecules (~22 nucleotides long) involved in post-transcriptional gene regulation. Advances in high-throughput sequencing technologies led to the discovery of isomiRs, which are miRNA sequence variants. Although many miRNA-seq analysis tools exist, there is no consensus on how to perform miRNA/isomiR analyses, and the resulting diversity of output formats hinders accurate comparisons between tools and precludes data sharing and the development of common downstream analysis methods. Findings: To overcome this situation, we present a community-based project, miRTOP (miRNA Transcriptomic Open Project), working towards the optimization of miRNA analyses. The aim of miRTOP is to promote the development of downstream analysis tools that are compatible with any existing detection and quantification tool. Based on the existing GFF3 format, we first created a new standard format, mirGFF3, for the output of miRNA/isomiR detection and quantification results from small RNA-seq data. Additionally, we developed a command-line Python tool, mirtop, to manage the mirGFF3 format. Currently, mirtop can convert the outputs of commonly used pipelines, such as seqbuster, miRge2.0, isomiR-SEA, sRNAbench, and Prost!, as well as BAM files, into mirGFF3. Its open architecture enables any tool or pipeline to output results in mirGFF3. Conclusions: Collectively, a comprehensive isomiR categorization system, along with the accompanying mirGFF3 format and mirtop API, provides a complete solution for the standardization of miRNA and isomiR analysis, enabling data sharing, reporting, comparative analyses, and benchmarking, while promoting the development of common miRNA methods focusing on the steps downstream of miRNA detection, annotation, and quantification.

Publication
bioRxiv
Date