About JASPAR

What is JASPAR?

JASPAR is a collection of transcription factor DNA-binding preferences, modeled as matrices. These matrices can be converted into Position Weight Matrices (PWMs, also called PSSMs), which are used to scan genomic sequences for potential binding sites.
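To make the matrix-to-PWM step concrete, here is a minimal Python sketch, not JASPAR's own code. It assumes a JASPAR-style count matrix with one row per nucleotide, a uniform 0.25 background, and a total pseudocount of 0.8 split across bases in proportion to the background. The function names and the toy profile are hypothetical. Each position contributes a log2-odds score, window scores are sums over positions, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, background=None, pseudocount=0.8):
    """Convert a count matrix (one row per base, JASPAR-style) to a log2-odds PWM."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumption: uniform base composition
    pwm = {b: [] for b in BASES}
    for i in range(len(pfm["A"])):
        total = sum(pfm[b][i] for b in BASES)
        for b in BASES:
            # Pseudocount is split across bases in proportion to the background.
            prob = (pfm[b][i] + pseudocount * background[b]) / (total + pseudocount)
            pwm[b].append(math.log2(prob / background[b]))
    return pwm

def scan(pwm, sequence):
    """Score every full-length window on the forward strand; returns (start, score)."""
    width = len(pwm["A"])
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        hits.append((start, sum(pwm[base][i] for i, base in enumerate(window))))
    return hits

# Toy count matrix with consensus "ACG" (hypothetical, not a real JASPAR profile).
toy_pfm = {"A": [10, 0, 0], "C": [0, 10, 0], "G": [0, 0, 10], "T": [0, 0, 0]}
hits = scan(pfm_to_pwm(toy_pfm), "TTACGTT")
print(max(hits, key=lambda hit: hit[1]))  # best window starts at position 2 ("ACG")
```

In practice one would also scan the reverse complement and keep only windows above a score threshold; libraries such as Biopython's `Bio.motifs` offer this kind of functionality.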

JASPAR is the only database of this scope whose data can be used without restrictions (open access). For a comprehensive review of these models and how they can be used, please see the following reviews:

All JASPAR CORE profile stats

  • Vertebrates
  • Plants
  • Insects
  • Nematodes
  • Fungi
  • Urochordates

JASPAR CORE and its collections: when should each be used?

The JASPAR CORE database contains a curated, non-redundant set of profiles derived from published collections of experimentally defined transcription factor binding sites for eukaryotes. Its main differences from similar resources (TRANSFAC, etc.) are open data access, non-redundancy and quality.
When should it be used? When seeking models for specific factors or structural classes, or when experimental evidence is paramount.
JASPAR CNE is a collection of 233 matrix profiles derived by Xie et al. (PNAS 2007) by clustering overrepresented motifs from human conserved non-coding elements. While the biochemical and biological role of most of these patterns is still unknown, Xie et al. have shown that the most abundant ones correspond to known DNA-binding proteins, among them the insulator-binding protein CTCF.
When should it be used? When characterizing regulatory inputs in long-range developmental gene regulation in vertebrates.
The JASPAR FAM database consists of models describing the shared binding properties of structural classes of transcription factors. Such models are also called familial profiles, consensus matrices or metamodels. They have two main benefits: 1) Since many factors have similar target sequences, scanning with individual profiles often yields multiple predictions at the same location that correspond to the same site; familial models reduce this redundancy in the results. 2) They can be used to classify newly derived profiles, that is, to predict which structural class the cognate transcription factor belongs to.
When should it be used? When searching large genomic sequences with no prior knowledge, or when classifying new user-supplied profiles.
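One way to picture the classification use of familial profiles is to score a new profile against each family model and pick the most similar one. The sketch below uses column-wise Pearson correlation, averaged over ungapped alignments with a minimum overlap, as its similarity measure; this is a simple stand-in chosen for illustration, not necessarily how JASPAR FAM classification is done. The family names, toy profiles, the `toy_pfm` helper and the `min_overlap` default are all hypothetical, and only one strand is compared.

```python
import math
import statistics

BASES = "ACGT"

def to_freqs(pfm):
    """Turn a count matrix (one row per base) into per-position frequency columns."""
    columns = []
    for i in range(len(pfm["A"])):
        total = sum(pfm[b][i] for b in BASES)
        columns.append([pfm[b][i] / total for b in BASES])
    return columns

def pearson(x, y):
    """Pearson correlation of two 4-element columns; 0.0 if either is flat."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

def similarity(pfm_a, pfm_b, min_overlap=4):
    """Best average column correlation over all ungapped offsets (one strand only)."""
    a, b = to_freqs(pfm_a), to_freqs(pfm_b)
    best = -1.0
    for offset in range(-(len(b) - min_overlap), len(a) - min_overlap + 1):
        # Column i of a is aligned with column i - offset of b.
        pairs = [(a[i], b[i - offset]) for i in range(len(a)) if 0 <= i - offset < len(b)]
        if len(pairs) < min_overlap:
            continue
        best = max(best, sum(pearson(x, y) for x, y in pairs) / len(pairs))
    return best

def classify(new_pfm, families):
    """Return the name of the family profile most similar to new_pfm."""
    return max(families, key=lambda name: similarity(new_pfm, families[name]))

def toy_pfm(consensus, hit=8, miss=1):
    """Hypothetical helper: build a count matrix favouring one base per position."""
    return {b: [hit if c == b else miss for c in consensus] for b in BASES}

# Hypothetical families and query profile, for illustration only.
families = {"family_X": toy_pfm("CACGTG"), "family_Y": toy_pfm("TAATTA")}
query = toy_pfm("GCACGTGT")
print(classify(query, families))  # family_X
```

Trying every offset lets a longer or shifted profile still match its family model; a fuller comparison would also test the reverse complement.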
All the PBM collections were built using a newer in vitro technique, protein-binding microarrays (PBMs), based on arrays that cover all k-mers of a given length. PBM-derived matrix models also have their own specialized database, UniPROBE. The PBM collection is the set derived by Badis et al. (Science 2009) from the binding preferences of 104 mouse transcription factors.
When should it be used? When characterizing regulatory inputs in long-range developmental gene regulation in vertebrates.
The PBM HLH collection is the set derived by Grove et al. (Cell 2008). It holds 19 C. elegans bHLH transcription factor models.
When should it be used? When it is important that each matrix was derived using the same protocol, with a focus on bHLH factors.
Another PBM collection is the set derived by Berger et al. (Cell 2008), comprising 176 profiles of mouse homeodomain proteins.
When should it be used? When it is important that each matrix was derived using the same protocol, with a focus on homeobox factors.
The JASPAR PHYLOFACTS database consists of 174 profiles extracted from phylogenetically conserved gene upstream elements. See Xie et al., "Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals", Nature 434, 338-345 (2005), and its supplementary material for details.
When should it be used? The JASPAR PHYLOFACTS matrices are a mix of known and as yet undefined motifs. They are useful when one expects that other factors might determine promoter characteristics, such as structural aspects and tissue specificity. They are highly complementary to the JASPAR CORE matrices, so they are best used in combination with that matrix set.
The JASPAR POLII database consists of models describing patterns found in RNA polymerase II (Pol II) promoters. Some of these correspond to sites bound by a known protein (such as the TATA box, bound by TBP), while others have no known specific interactor (such as the DPE). Models are taken from published literature or public databases.
When should it be used? When investigating core promoters of multicellular eukaryotes.
JASPAR SPLICE is a small collection of matrix profiles of human canonical and non-canonical splice sites, given as matching donor:acceptor pairs. It currently contains only 6 highly reliable profiles, derived from the human genome by Chong et al. (2004).
When should it be used? When analyzing splice sites and alternative splicing.

JASPAR CORE data growth per release