About this document
MicroRNA.org is a web-accessible information resource produced and maintained at cBio,
the Computational Biology Center at Memorial Sloan-Kettering Cancer Center (MSKCC).
This document details changes in data, programs, and interface for each release, and also
contains general information and references about this resource.
Hyperlinks to document sections:
General Information
August 2010 (current) release notes
September 2008 release notes
January 2008 release notes
Summary
The MicroRNA.org website is a comprehensive resource of microRNA target predictions
and expression profiles. Target predictions are based on a development of the miRanda
algorithm that incorporates current biological knowledge of target rules and uses
an up-to-date compendium of mammalian microRNAs. The target sites predicted by miRanda
are scored for likelihood of mRNA downregulation using mirSVR, a regression model that is
trained on sequence and contextual features of the predicted miRNA::mRNA duplex.
Expression profiles are derived from a comprehensive sequencing project of a large set of
mammalian tissues and cell lines of normal and disease origin. This website enables users
to explore:
- The set of genes that are potentially regulated by a particular microRNA.
- The co-occurrence of predicted target sites for multiple microRNAs in an mRNA.
- MicroRNA expression profiles in various mammalian tissues.
The microRNA.org resource currently contains:
- 16228619 predicted microRNA target sites in 34911 distinct 3'UTR from isoforms of 19898 human genes.
- 7459149 predicted microRNA target sites in 28287 distinct 3'UTR from isoforms of 19231 mouse genes.
- 586068 predicted microRNA target sites in 6865 distinct 3'UTR from isoforms of 6256 rat genes.
- 345671 predicted microRNA target sites in 12285 distinct 3'UTR from isoforms of 10532 fruitfly genes.
- 230901 predicted microRNA target sites in 12228 distinct 3'UTR from isoforms of 10124 nematode genes.
- target sites for 1100 human microRNAs.
- target sites for 717 mouse microRNAs.
- target sites for 387 rat microRNAs.
- target sites for 186 fruitfly microRNAs.
- target sites for 233 nematode microRNAs.
Send comments and questions to 
Software and data files
The miRanda source code
is available under the LGPL open source license. The microRNA target predictions and expression data
are available as tab delimited files. See the
Downloads tab of the MicroRNA.org website.
Previous releases
Previous releases are available as tab delimited files
through the Downloads tab of the website. Additionally,
the previous release (September 2008) is available
through the website interface for a limited time at
http://cbio.mskcc.org/microrna-previous
References
-
The mirSVR regression method for predicting likelihood of target mRNA down-regulation from sequence and structure features in microRNA/mRNA predicted target sites.
Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites.
Betel D, Koppal A, Agius P, Sander C, Leslie C.
Genome Biology 2010 11:R90
-
MicroRNA target predictions with expression profiles.
The microRNA.org resource: targets and expression.
Betel D, Wilson M, Gabow A, Marks DS, Sander C.
Nucleic Acids Res. 2008 Jan; 36(Database Issue): D149-53.
-
Comprehensive cloning and sequencing effort of 172 human, 64 mouse and 16 rat small RNA
libraries extracted from major organs and cell types.
A mammalian microRNA expression atlas based on small RNA library sequencing.
Landgraf P., et al., Cell 2007 Jun 29;129(7):1401-14.
-
First comprehensive computational prediction of microRNA targets in the human genome and first publication of the
hypothesis that a large fraction of human genes (more than 10%) may be regulated by microRNAs.
Human MicroRNA targets.
John B, Enright AJ, Aravin A, Tuschl T, Sander C, Marks DS.
PLoS Biology 2005 Jul;3(7):e264.
-
First implementation of the miRanda dynamic programming algorithm and comprehensive application to the Drosophila genome.
MicroRNA targets in Drosophila.
Enright AJ, John B, Gaul U, Tuschl T, Sander C and Marks DS
Genome Biology 2003 5:R1
August 2010 release notes
Data Sources
- Mature microRNA sequences for human, mouse, rat, fruitfly, and nematode were downloaded from miRBase 15.0 (April 2010).
- 3' UTR sequences were downloaded from UCSC genome assemblies: human: hg19 (Feb. 2009), mouse: mm9 (July 2007), rat: rn4 (Nov 2004), fruitfly: dm3 (Apr 2006), nematode: ce6 (May 2008).
Changes to the website interface
- mirSVR score is reported for each predicted target site. miRanda alignment score is still
available in the downloadable prediction tables.
- Gene isoforms which have identical 3'UTR are now merged into a single record.
- When a gene has multiple isoforms with distinct 3'UTR, the canonical isoform (from the
UCSC knownCanonical table) defines the default UTR to display in query results. If there
are alternative UTR for a gene, these are accessible under the default UTR in an expandable
list. If a gene's canonical isoform has no UTR, the longest alternative 3'UTR becomes the
default.
- mRNA searches will find matches to transcript identifiers (such as kgid or RefSeq), in
addition to NCBI gene ids, gene symbols, or gene synonyms.
- When viewing the 3'UTR detail view, you may filter the display of predictions. In simple
mode the criteria are conserved v. all mature microRNA, and good v. all mirSVR scores. In
detailed mode there are 4 criteria: conserved v. all mature microRNA, your setting of a
mirSVR score cutoff, your setting of a phastCons conservation score cutoff for the target
site, and 6mer v. 7mer v. all seed classes.
- Predicted target sites are now displayed relative to their 3' end in the UTR. This makes
it easier to observe placement of predicted target site groups for mature microRNA
families.
- Target sites of conserved microRNA can be distinguished visually by color intensity and
a graphical mark in the alignment display.
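The default-UTR rule described in the list above can be sketched as follows. This is only an illustration under stated assumptions, not the website's code: the `Isoform` record, its field names, and the `default_utr` helper are hypothetical, and the `is_canonical` flag is assumed to have been taken from the UCSC knownCanonical table beforehand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Isoform:
    """Hypothetical record for one gene isoform."""
    transcript_id: str
    utr3: str           # 3'UTR sequence; empty string if the isoform has none
    is_canonical: bool  # assumed to be derived from UCSC knownCanonical


def default_utr(isoforms: List[Isoform]) -> Optional[str]:
    """Return the 3'UTR to display by default for one gene.

    The canonical isoform's UTR is used when it exists; otherwise the
    longest alternative 3'UTR becomes the default.
    """
    for iso in isoforms:
        if iso.is_canonical and iso.utr3:
            return iso.utr3
    alternatives = [iso.utr3 for iso in isoforms if iso.utr3]
    return max(alternatives, key=len) if alternatives else None


# Hypothetical gene whose canonical isoform has no 3'UTR.
gene = [
    Isoform("tx1", "", True),
    Isoform("tx2", "AUGC", False),
    Isoform("tx3", "AUGCAUGC", False),
]
print(default_utr(gene))
```

In this made-up case the canonical isoform has no UTR, so the longer of the two alternative UTR is selected.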
Prediction of target sites by miRanda
Because of the website interface's more flexible filtering of predicted target
sites, we have used an alignment score threshold of 120 instead of the previous threshold
of 140. By default, the website interface will filter out many of these lower scoring
sites, but you have several criteria which can be adjusted to allow you to view them.
Scoring of predicted target sites by mirSVR
Details of the mirSVR scoring method can be found in
Betel et al. Genome Biology 2010 11:R90.
Briefly, mirSVR is a regression model that computes a weighted sum of a
number of sequence and context features of the predicted miRNA::mRNA duplex.
The features are broadly divided into three types:
- duplex features, which include base pairing at the seed region
and at the 3' end of the miRNA
- sequence features, which include A/U composition near
the target sites and secondary structure accessibility
- global features, such as
length of the UTR, relative position of the target site in the UTR, and
conservation score.
mirSVR downregulation scores are calibrated to correlate linearly with the
extent of downregulation and therefore enable accurate scoring of genes with
multiple target sites by simple addition of the individual target scores.
Furthermore, the scores can be interpreted as an empirical probability of
downregulation, which provides a meaningful guide for selecting a score cutoff.
In addition, the composite approach of miRanda-generated alignments and mirSVR
scores enables the judicious prediction of non-canonical sites (those with
mismatch or GU wobble in the seed region) without inflating the number of
false predictions. This release includes all target site predictions which have
either a 6-mer or better seed site, or a mirSVR score ≤ -0.1.
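As a minimal sketch of the two rules in this paragraph (adding site scores per gene, and the inclusion criterion for this release), assuming each site is available as a simple tuple: the function names, the boolean seed flag, the gene labels, and the score values below are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

SCORE_CUTOFF = -0.1  # inclusion threshold stated above; more negative is stronger


def is_included(has_6mer_or_better_seed, mirsvr_score):
    """Inclusion rule: a 6-mer or better seed site, or a mirSVR score <= -0.1.

    Representing the seed class as a boolean is an assumption of this sketch.
    """
    return has_6mer_or_better_seed or mirsvr_score <= SCORE_CUTOFF


def gene_scores(sites):
    """Add the mirSVR scores of all sites for each (microRNA, gene) pair.

    `sites` is an iterable of (microRNA, gene, mirSVR score) tuples.
    """
    totals = defaultdict(float)
    for mirna, gene, score in sites:
        totals[(mirna, gene)] += score
    return dict(totals)


# Made-up sites: two for GENE_A, one for GENE_B.
sites = [
    ("hsa-miR-21", "GENE_A", -0.5),
    ("hsa-miR-21", "GENE_A", -0.25),
    ("hsa-miR-21", "GENE_B", -0.125),
]
totals = gene_scores(sites)
print(totals[("hsa-miR-21", "GENE_A")])  # -0.75
```

Because the scores are meant to be additive, a gene with two moderate sites can rank ahead of a gene with a single site of similar strength.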
mirSVR takes conservation into account when computing a score, so there is no
longer any predefined cutoff for filtering target sites. However, users are able
to restrict the interface display using conservation as a criterion (see above).
The determination of conservation score for predicted target sites is now
sensitive to sites which span intron junctions.
New version of miRanda target prediction algorithm
There have been several minor corrections and simplifications to the miRanda source code.
Changes include:
- Default command line arguments altered to -sc 140 -go -9 -ge -4 -en 1 (no energy filtering)
- Limit on UTR length removed
- Overlap filtering of predicted target sites tightened; requires overlap in seed region to filter
- Offsets corrected in "score for this scan" output lines and -keyval output
September 2008 release notes
Data Sources
- Mature microRNA sequences for human, mouse, and rat were downloaded from miRBase 11.0 (April 2008).
- 3' UTR sequences were downloaded from UCSC genome assemblies: human: hg18 (Mar. 2006), mouse: mm9 (July 2007), rat: rn4 (Nov 2004).
Changes to the website interface
Changes to microRNA target prediction algorithm
The terminal microRNA nucleotides (first nucleotide and last two nucleotides)
no longer contribute to the alignment score, regardless of base pairing.
For example, in the following alignment:
3' uuCUUUCUCAGA-ACGAAACAGCGGg 5' hsa-miR-1273
     |||:|: | | || | |||||:|
5' aaGAAGGGAUGUCUG-UCUGUCGUCc 3' MEX3A
   ^^                       ^
the 3'-end uu nucleotides and the 5'-end g nucleotide do not
contribute to the alignment score even
though they are complementary to nucleotides in the mRNA.
Recent structural data show that these nucleotides are inaccessible to
base-pairing and therefore, are not likely to contribute to target specificity.
[Structure of the guide-strand-containing argonaute silencing complex.
Wang Y, Sheng G, Juranek S, Tuschl T, Patel DJ.
Nature advance online publication 27 August 2008]
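To make the masking concrete, the following sketch rebuilds the pairing line of the example alignment above while leaving the three terminal microRNA columns blank. It is an illustration only, not the miRanda implementation: the function name is hypothetical, and it assumes the two alignment rows have equal length and that no gap character falls in a terminal column.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairing_line(mirna_3to5, mrna_5to3):
    """Build the '|' / ':' line for two gapped alignment rows.

    The microRNA row is written 3'->5', so its two leftmost characters are
    the last two nucleotides and its rightmost character is the first
    nucleotide. Those three columns are left blank because they do not
    contribute to the alignment score.
    """
    if len(mirna_3to5) != len(mrna_5to3):
        raise ValueError("alignment rows must have equal length")
    last = len(mirna_3to5) - 1
    symbols = []
    for col, pair in enumerate(zip(mirna_3to5.upper(), mrna_5to3.upper())):
        if col < 2 or col == last:
            symbols.append(" ")   # masked terminal microRNA nucleotide
        elif pair in WATSON_CRICK:
            symbols.append("|")
        elif pair in WOBBLE:
            symbols.append(":")
        else:
            symbols.append(" ")   # mismatch or gap
    return "".join(symbols)


line = pairing_line("uuCUUUCUCAGA-ACGAAACAGCGGg",
                    "aaGAAGGGAUGUCUG-UCUGUCGUCc")
print(repr(line))
```

Without the mask, the leftmost u:a pairs and the rightmost g:c pair would also be marked as matches, since they are complementary.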
Change in conservation score cutoff
For microRNA target predictions in mouse, the
cutoff for minimum phastCons conservation score
at the predicted target site has been
changed from 0.57 to 0.566. The current mm9
phastCons conservation score is based on
30 aligned genomes, while the mm8 phastCons conservation
score is based on 17 aligned genomes.
The new threshold is based on a
comparison of 30way to 17way conservation scores
for all target sites that map to both
mm9 and mm8 genome coordinates using liftOver chains.
Figure: Scatterplot of Target Site Conservation Scores
(phastCons 30way v. phastCons 17way)
January 2008 release notes
Data
MicroRNA sequences were collected from miRBase release 10.0.
Mammalian 3'UTR sequences from the rat (rn4), mouse (mm7) and human (hg18) genomes were downloaded from the
UCSC genome browser.
MicroRNA expression profiles were collected from a recently published comprehensive cloning and sequencing
effort of 172 human, 64 mouse and 16 rat small RNA libraries extracted from major organs and cell types
(Landgraf et al., Cell, 129, (2007), 1401-1414). Expression values represent the number of cloned mature
microRNAs that were sequenced in each library and reported as clone counts. The counts are normalized
by the total number of microRNAs that were cloned in each library.
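The normalization described above can be sketched as below. This is a minimal illustration, assuming the normalized value is expressed as a fraction of the library total; the function name and the clone counts are hypothetical.

```python
def normalize_clone_counts(counts):
    """Divide each microRNA's clone count by the library's total clone count.

    `counts` maps a mature microRNA name to its clone count in one library.
    Returning fractions (rather than, say, percentages) is an assumption.
    """
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count / total for name, count in counts.items()}


# Made-up clone counts for a single small RNA library.
library = {"hsa-miR-21": 50, "hsa-miR-16": 30, "hsa-let-7a": 20}
normalized = normalize_clone_counts(library)
print(normalized["hsa-miR-21"])  # 0.5
```

Normalizing within each library makes values comparable between libraries that were sequenced to different depths.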
Prediction parameters
Predictions were generated using the 2004 version of the miRanda algorithm (John et al., PLoS Biology, 2, (2004), e363)
with the following command-line options:
Score cutoff: S >= 140
Energy cutoff: E <= -7.0
Gap opening: -9.0
Gap extension: -4.0
5' scaling: 4
The miRNA:mRNA alignment score is a sum of match values, optimized by a dynamic
programming algorithm, with the following single-position match values (base-pairing values):
A:U = 5
G:C = 5
G:U = 1
All other base pairs (mismatches) = -3
The match value s(i) is multiplied by a position specific weight w(i) before being evaluated in the alignment algorithm.
The position specific weights reflect the non-homogeneous effect of different positions,
such as the importance of the 'seed' or 'nucleus' generally defined as positions 2-8.
Thus the total score S for a particular alignment is
S = SUM(over i) [ w(i) s(i) ]
where the sum is taken along the alignment trace
(which may include bulges, i.e., unpaired nucleotides, in one of the sequences, evaluated as 'gaps').
| Position i | Weight w(i) |
| 1 | 1.0 |
| 2-8 | 4.0 |
| 9-21 | 1.0 |
Note that a perfect heptamer match in positions 2-8 corresponds to a score of 140 (5*4*7),
where 5 reflects a match, 4 the positional weight and 7 the number of positions.
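The arithmetic can be checked with a short sketch that evaluates S for one fixed, ungapped pairing using the match values and weights listed above. It does not perform the dynamic programming search and ignores gap penalties; the function names are hypothetical, and using a weight of 1.0 beyond position 21 is an assumption, since the table stops there.

```python
# Single-position match values listed above, keyed by (miRNA base, mRNA base).
MATCH_VALUES = {
    ("A", "U"): 5, ("U", "A"): 5,
    ("G", "C"): 5, ("C", "G"): 5,
    ("G", "U"): 1, ("U", "G"): 1,
}
MISMATCH = -3


def position_weight(i):
    """Weight w(i) for 1-based microRNA position i, following the table."""
    return 4.0 if 2 <= i <= 8 else 1.0


def ungapped_score(pairs):
    """Sum w(i) * s(i) over (position, miRNA base, mRNA base) triples.

    Gaps are not modeled: this scores one given pairing only.
    """
    total = 0.0
    for i, mirna_base, mrna_base in pairs:
        s = MATCH_VALUES.get((mirna_base, mrna_base), MISMATCH)
        total += position_weight(i) * s
    return total


# A perfect heptamer match in positions 2-8 (bases chosen arbitrarily).
heptamer = [(2, "G", "C"), (3, "A", "U"), (4, "G", "C"), (5, "G", "C"),
            (6, "U", "A"), (7, "A", "U"), (8, "G", "C")]
print(ungapped_score(heptamer))  # 140.0
```

Replacing one seed match with a G:U pair lowers the total from 140 to 124 (4*1 instead of 4*5), which is why a single wobble in the seed falls below the score cutoff of 140 unless other positions compensate.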
Caveat:
These base-pairing scores were optimized against a limited set of validated targets.
We are currently (early 2008) refining these values and expect to release updated parameters by mid-2008.
Therefore, target predictions may differ in subsequent releases. Users will have access to previous, archived releases through
links on the website.
Conservation
We used the PhastCons conservation score, which measures the evolutionary conservation of sequence blocks across
multiple vertebrates using a phylogenetic hidden Markov model (Siepel et al., Genome Research, 15, (2005),
1034-1050), to filter out less conserved predicted target sites. Only target sites with a PhastCons score of at least 0.57,
which roughly corresponds to conservation across all mammals, are retained.