Comparative Genomics Tools

This interface allows the user to define a set of genomes and displays a tree representing their genomic similarity.

The genomic similarity is estimated using Mash, a tool that computes a distance between two genomes. This distance is related to the average nucleotide identity (ANI) by D ≈ 1 - ANI.
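To make the relation D ≈ 1 - ANI concrete, here is a minimal Python sketch of a Mash-style distance computed from bottom-s MinHash sketches, using the Mash formula D = -(1/k) ln(2j / (1 + j)), where j is the estimated Jaccard index. It assumes Mash's default k = 21 and s = 1000, but hashes k-mers with SHA-1 and ignores reverse complements, so the values are illustrative and will not match Mash's output exactly. All function names, including the genome simulation helpers, are hypothetical.

```python
import hashlib
import math
import random


def kmers(seq, k):
    """All k-mers of a DNA sequence (forward strand only; Mash also uses reverse complements)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=21, s=1000):
    """Bottom-s MinHash sketch: the s smallest 64-bit hash values of the k-mers."""
    hashes = {int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
              for km in kmers(seq, k)}
    return set(sorted(hashes)[:s])


def jaccard_estimate(sk_a, sk_b, s=1000):
    """Estimate the Jaccard index from the s smallest hashes of the union of two sketches."""
    union_bottom = set(sorted(sk_a | sk_b)[:s])
    if not union_bottom:
        return 0.0
    return len(union_bottom & sk_a & sk_b) / len(union_bottom)


def mash_distance(j, k=21):
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); D is approximately 1 - ANI."""
    if j == 0:
        return 1.0  # no shared k-mers: maximal distance
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


# Hypothetical helpers to simulate a genome and a diverged copy of it.
def random_genome(length, seed=0):
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def mutate(seq, n_subs, seed=1):
    """Introduce n_subs point substitutions at distinct random positions."""
    rng = random.Random(seed)
    bases = list(seq)
    for pos in rng.sample(range(len(bases)), n_subs):
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)


if __name__ == "__main__":
    g1 = random_genome(20000)
    g2 = mutate(g1, 200)  # 1% substitutions, i.e. an ANI of about 99%
    d = mash_distance(jaccard_estimate(sketch(g1), sketch(g2)))
    print(f"Mash-style distance: {d:.4f}, estimated ANI: {1 - d:.2%}")
```

With 1% random substitutions the distance comes out close to 0.01, which is the D ≈ 1 - ANI relation in action.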

From all the pairwise distances of the genome set, a tree is constructed dynamically using the neighbor-joining JavaScript package.
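The interface builds the tree in JavaScript. The Python sketch below of the standard neighbor-joining algorithm is only meant to show how a tree is derived from a pairwise distance matrix such as the Mash distances; it is not the code of the JavaScript package, and the function name and Newick output format are assumptions.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree, as a Newick string, from a symmetric distance matrix.

    names: list of leaf labels; dist: list of lists with dist[i][j] the distance
    between names[i] and names[j] (e.g. Mash distances).
    """
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]  # working copy of the matrix
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each node to all others
        # Q-criterion: join the pair (i, j) minimizing (n - 2) * d[i][j] - r[i] - r[j]
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and to j
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new internal node to every remaining node
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Connect the last two nodes with a single branch
    return f"({nodes[0]}:{d[0][1]:.4f},{nodes[1]}:0.0000);"


if __name__ == "__main__":
    labels = ["a", "b", "c", "d", "e"]
    matrix = [[0, 5, 9, 9, 8],
              [5, 0, 10, 10, 9],
              [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3],
              [8, 9, 7, 3, 0]]
    print(neighbor_joining(labels, matrix))
    # -> ((c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000,(d:2.0000,e:1.0000):0.0000);
```

On this additive five-taxon matrix the leaf branch lengths are recovered exactly (a = 2, b = 3, c = 4, d = 2, e = 1). Real Mash distances are not perfectly additive, so some internal branches may come out slightly negative.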

The tree displays cluster annotations. The clusters were computed from all pairwise distances ≤ 0.06 (≈94% ANI), which corresponds to the standard ANI threshold used to delineate a species.

The clustering was computed with the Louvain community detection algorithm.
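The two steps described above, thresholding the pairwise distances and then detecting communities, can be sketched in pure Python. This is a minimal, unoptimized Louvain implementation under stated assumptions: edges are unweighted, only pairs with D ≤ 0.06 are connected, and ties are broken by iteration order. MicroScope's actual implementation and parameters may differ, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def threshold_graph(names, dist, cutoff=0.06):
    """Unweighted graph linking every pair of genomes with distance <= cutoff.

    dist is a callable dist(a, b) returning the (Mash) distance between two genomes.
    Returns an adjacency dict: node -> {neighbor: weight}.
    """
    adj = {n: {} for n in names}
    for a, b in combinations(names, 2):
        if dist(a, b) <= cutoff:
            adj[a][b] = adj[b][a] = 1.0
    return adj


def local_moving(adj):
    """Louvain phase 1: move each node to the neighboring community with the best modularity gain."""
    k = {n: sum(nbs.values()) for n, nbs in adj.items()}  # weighted degrees (self-loops included)
    two_m = sum(k.values())
    comm = {n: n for n in adj}  # every node starts in its own community
    if two_m == 0:
        return comm, False
    tot = dict(k)  # sum of the degrees of the nodes in each community
    moved_any, improved = False, True
    while improved:
        improved = False
        for n in adj:
            old = comm[n]
            tot[old] -= k[n]  # take n out of its community
            links = defaultdict(float)  # weight from n to each community (self-loop excluded)
            for nb, w in adj[n].items():
                if nb != n:
                    links[comm[nb]] += w
            # Modularity gain (up to a constant factor) of inserting n into community c
            best, best_gain = old, links[old] - tot[old] * k[n] / two_m
            for c, w_in in links.items():
                gain = w_in - tot[c] * k[n] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            comm[n] = best
            tot[best] += k[n]
            if best != old:
                improved = moved_any = True
    return comm, moved_any


def aggregate(adj, comm):
    """Louvain phase 2: collapse each community into a single node."""
    new = defaultdict(lambda: defaultdict(float))
    for n, nbs in adj.items():
        row = new[comm[n]]  # also keeps communities without any edge
        for nb, w in nbs.items():
            row[comm[nb]] += w
    return {c: dict(nbs) for c, nbs in new.items()}


def louvain(adj):
    """Alternate both phases until no node moves; return clusters as sorted lists."""
    membership = {n: n for n in adj}  # original node -> current super-node
    graph = adj
    while True:
        comm, moved = local_moving(graph)
        if not moved:
            break
        membership = {n: comm[s] for n, s in membership.items()}
        graph = aggregate(graph, comm)
    clusters = defaultdict(list)
    for n, c in membership.items():
        clusters[c].append(n)
    return sorted(sorted(members) for members in clusters.values())
```

For example, two tight groups of three genomes joined by a single pair at D = 0.05, plus one distant genome, yield three clusters: the two groups and the isolated genome.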

This interface allows the user to search for genes/regions that are either shared between a query genome/replicon and other genomes/replicons, or specific to the query. The compared genomes/replicons are chosen from those available in our PkGDB database (i.e., (re)annotated bacterial genomes or complete proteomes downloaded from the RefSeq/WGS sections).
This interface allows the user to search for genes potentially acquired by horizontal gene transfer (HGT) that are clustered in genomic regions (Regions of Genomic Plasticity, RGPs). The RGP_Finder method first identifies synteny breaks between a query genome and other closely related genomes chosen from those available in our PkGDB database. It then searches the query genome for HGT features (tRNA hotspots, mobility genes) and for compositional bias (AlienHunter (Vernikos and Parkhill, 2006), SIGI-HMM (Waack et al., 2006), and GC deviation computation). RGP_Finder can characterize genomic regions presenting both a synteny break and several features specific to Genomic Islands, regions with HGT features only, and regions associated with a synteny break only. The graphical interfaces associated with this tool are useful for exploring the predicted regions in detail, also using the comparative genomic context available in MaGe.
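Of the compositional signals listed above, GC deviation is the simplest to illustrate. The sketch below flags fixed-size windows whose GC content deviates from the mean window GC content by more than a z-score cutoff. The window size, step and cutoff are hypothetical values, and this is not the exact computation performed by RGP_Finder.

```python
import statistics


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_deviant_windows(genome, window=5000, step=5000, z_cutoff=2.0):
    """Return (start, end, gc) for windows whose GC content lies more than
    z_cutoff population standard deviations away from the mean window GC content.
    Coordinates are 0-based, end-exclusive."""
    windows = [(start, start + window, gc_content(genome[start:start + window]))
               for start in range(0, len(genome) - window + 1, step)]
    if len(windows) < 2:
        return []
    gcs = [gc for _, _, gc in windows]
    mean, sd = statistics.mean(gcs), statistics.pstdev(gcs)
    if sd == 0:
        return []  # perfectly homogeneous composition: nothing deviates
    return [w for w in windows if abs(w[2] - mean) / sd > z_cutoff]


if __name__ == "__main__":
    # Synthetic genome: 19 windows at 50% GC and one AT-rich window at 20% GC
    background = "ACGT" * 1250        # 5 kb at 50% GC
    island = "AATTTGCAAT" * 500       # 5 kb at 20% GC
    genome = background * 10 + island + background * 9
    print(gc_deviant_windows(genome))  # -> [(50000, 55000, 0.2)]
```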
This interface allows the user to search for genes potentially acquired by horizontal gene transfer (HGT) that are clustered in genomic regions (Regions of Genomic Plasticity, RGPs). The PanRGP tool relies on a partitioned pangenome graph computed with the PPanGGOLiN method. From the partitioned genome, it applies a score-based algorithm to predict RGPs longer than 3 kbp.
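A simplified sketch of a score-based RGP search along one contig: genes of the persistent partition lower the score, shell and cloud genes raise it, and a region is reported where the score peaks, provided it reaches a minimum score and length. The penalty, gain and minimum score values, as well as the exact segment rule, are assumptions for illustration rather than PanRGP's documented implementation; only the 3 kbp minimum length comes from the description above.

```python
def predict_rgps(genes, penalty=3, gain=1, min_score=4, min_length=3000):
    """Score-based search for regions of genomic plasticity along one contig.

    genes: ordered list of (start, end, partition) with partition in
    {"persistent", "shell", "cloud"}; coordinates are 1-based and inclusive.
    Returns a list of (start, end, score) for the predicted regions.
    """
    regions = []
    score = best = 0
    seg_start = best_end = None

    def close_segment():
        # Keep the segment up to its score peak if it is high-scoring and long enough
        if best > 0 and best >= min_score:
            start, end = genes[seg_start][0], genes[best_end][1]
            if end - start + 1 >= min_length:
                regions.append((start, end, best))

    for i, (_, _, partition) in enumerate(genes):
        step = -penalty if partition == "persistent" else gain
        if score + step > 0:
            if score == 0:  # a new candidate segment starts at this gene
                seg_start = i
            score += step
            if score > best:  # remember where the segment peaks
                best, best_end = score, i
        else:  # the score would fall to zero or below: the segment ends here
            close_segment()
            score = best = 0
    close_segment()
    return regions
```

With 1 kb genes, a run of six shell genes flanked by persistent genes is reported as one region of 6 kb, whereas three isolated cloud genes score too low to be kept.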
This tool draws a global comparison between two bacterial genomes, based on synteny results (the synteny group size can be selected by the user). The picture gives an overview of the conservation of synteny groups between the query genome and another genome chosen from those available in our PkGDB database (i.e., (re)annotated bacterial genomes or complete proteomes downloaded from the RefSeq/WGS sections).
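As an illustration of what a synteny group is, the following sketch greedily chains homologous gene pairs (for instance BBH pairs given as gene ranks in each genome) into collinear groups, tolerating small gaps and inversions, and keeps the groups that reach a user-selected minimum size. The gap tolerance and the greedy chaining rule are assumptions; MicroScope's synteny computation may use a different method.

```python
def synteny_groups(pairs, max_gap=2, min_size=3):
    """Greedily chain (query_rank, target_rank) homolog pairs into collinear groups.

    A pair extends a group when both ranks differ from the group's last pair by at
    most max_gap and the target direction (forward or inverted) stays consistent.
    Only groups with at least min_size pairs are returned.
    """
    groups = []
    for q, t in sorted(pairs):
        for group in groups:
            last_q, last_t = group["pairs"][-1]
            dq, dt = q - last_q, t - last_t
            direction = 1 if dt > 0 else -1
            if 0 < dq <= max_gap and 0 < abs(dt) <= max_gap and group["dir"] in (0, direction):
                group["pairs"].append((q, t))
                group["dir"] = direction  # 0 means "not decided yet"
                break
        else:  # no existing group can be extended: start a new one
            groups.append({"pairs": [(q, t)], "dir": 0})
    return [g["pairs"] for g in groups if len(g["pairs"]) >= min_size]
```

Raising `min_size` plays the role of the user-selected synteny group size: small, possibly spurious groups disappear from the overview.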
This tool provides a list of candidate genes of a query genome potentially involved in a fusion or fission event. These events are computed from the synteny results obtained with the genomes available in the PkGDB database. They are ranked by a score that reflects the "originality" of the event. The lowest scores are generally associated with events predicted because of pseudogenes, either in the query genome (fission) or in the compared genomes (fusion).
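A hypothetical sketch of how a fusion candidate can be spotted from alignments: a query gene whose successive segments align to two adjacent genes of a compared genome. Swapping the roles of the two genomes gives fission candidates. The input format and the overlap tolerance are assumptions, and MicroScope's "originality" score is not reproduced here.

```python
from collections import defaultdict


def fusion_candidates(alignments, max_overlap=10):
    """alignments: (query_gene, target_rank, qstart, qend) tuples, where target_rank is
    the position of the aligned gene along the compared genome and qstart/qend are
    coordinates on the query protein.

    Returns sorted (query_gene, target_rank_1, target_rank_2) triples for query genes
    whose successive, barely overlapping segments match two adjacent target genes.
    """
    by_query = defaultdict(list)
    for query, target_rank, qstart, qend in alignments:
        by_query[query].append((target_rank, qstart, qend))
    candidates = set()
    for query, alns in by_query.items():
        for t1, s1, e1 in alns:
            for t2, s2, _ in alns:
                adjacent = abs(t2 - t1) == 1
                successive = s2 > s1 and e1 - s2 <= max_overlap
                if adjacent and successive:
                    candidates.add((query, min(t1, t2), max(t1, t2)))
    return sorted(candidates)
```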
This tool provides statistics on the similarity results between the selected organism and all the genomes available in our PkGDB database. For each pair of compared genomes, the computed values include the number and percentage of genes in BBH (Bidirectional Best Hit) and in synteny groups, the number and size of synteny groups, etc. Note that, because of the MicroScope re-annotation procedure applied to public genomes integrated in PkGDB, these values can differ slightly from those obtained in the "RefSeq Synteny Statistics" section.
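The BBH counts above rest on a simple rule: genes a and b form a Bidirectional Best Hit when b is the best hit of a and a is the best hit of b. A minimal sketch, assuming hits are given as (query, subject, bitscore) tuples in each direction; ties are resolved by keeping the first best hit seen, which is an arbitrary choice.

```python
def best_hits(hits):
    """hits: iterable of (query, subject, bitscore). Keep the top-scoring subject per query."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bbh_pairs(hits_ab, hits_ba):
    """Bidirectional best hits between genome A and genome B."""
    ab, ba = best_hits(hits_ab), best_hits(hits_ba)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


def bbh_percentage(pairs, n_genes_a):
    """Percentage of the genes of genome A that are in BBH."""
    return 100.0 * len(pairs) / n_genes_a if n_genes_a else 0.0
```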
This tool provides statistics on the similarity results between the selected organism and all the bacterial genomes available in the RefSeq/WGS sections of NCBI. For each pair of compared genomes, the computed values include the number and percentage of genes in BBH (Bidirectional Best Hit) and in synteny groups, the number and size of synteny groups, etc.
This interface provides an analysis of the pan-genome and its components (core-genome, variable-genome) for a set of organisms. It uses the MicroScope gene families (MICFAM), which are computed with the SiLiX software (« Ultra-fast sequence clustering from similarity networks with SiLiX. », Miele V et al., 2011).
It allows users to:
  • Compute pan-genome and core-genome sizes and their evolution as genomes are added to a set
  • Determine the proportions of common and variable genome for each genome
  • Exclude the pan-, core- and variable-genome of another genome set from the analysis
  • Extract core-genome, variable-genome and strain-specific sequences and annotations.
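Given gene family assignments, the quantities listed above reduce to set operations. A minimal sketch, assuming each genome is described by the set of family identifiers it contains (MICFAM families in MicroScope; the identifiers and function names here are hypothetical).

```python
def pangenome_summary(presence):
    """presence: dict genome -> set of gene family IDs.

    Returns the pan-genome, core-genome and variable-genome family sets, the
    strain-specific families of each genome, and each genome's core fraction.
    """
    families = list(presence.values())
    pan = set().union(*families)
    core = set.intersection(*families)
    variable = pan - core
    specific, core_fraction = {}, {}
    for genome, fams in presence.items():
        others = set().union(*(f for g, f in presence.items() if g != genome))
        specific[genome] = fams - others
        core_fraction[genome] = len(fams & core) / len(fams) if fams else 0.0
    return pan, core, variable, specific, core_fraction


def accumulation_curve(presence, order):
    """Pan- and core-genome sizes as genomes are added one by one in the given order."""
    pan, core, curve = set(), None, []
    for genome in order:
        pan |= presence[genome]
        core = set(presence[genome]) if core is None else core & presence[genome]
        curve.append((len(pan), len(core)))
    return curve
```

The pan-genome grows and the core-genome shrinks as genomes are added, which is what the evolution curves show.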
The Comprehensive Antibiotic Resistance Database (CARD) is a manually curated resource containing high-quality reference data on the molecular basis of antimicrobial resistance (AMR), with an emphasis on the genes, proteins and mutations involved in AMR. CARD is ontologically structured, model-centric, and spans the breadth of AMR drug classes and resistance mechanisms, including intrinsic, mutation-driven and acquired resistance. It is built upon the Antibiotic Resistance Ontology (ARO), a custom-built, interconnected and hierarchical controlled vocabulary allowing advanced data sharing and organization. Its design allows the development of novel genome analysis tools, such as the Resistance Gene Identifier (RGI) for resistome prediction from raw genome sequences.
The MicroScope virulence database has been built from VFDB and VirulenceFinder data. The VFDB virulence factor classification has been extended, as far as possible, with new terms and gene associations. The database is divided into three categories: experimentally demonstrated VFDB data, VirulenceFinder data, and the main E. coli virulence genes. Results are obtained by running BLASTp with the organism's proteins against the MicroScope virulence database.
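BLASTp results in tabular format (for example produced with `blastp -outfmt 6`, whose 12 default columns are listed below) can be post-processed to keep the best virulence database hit per protein. The identity and e-value cutoffs are hypothetical; the thresholds applied by MicroScope are not stated here.

```python
import csv

# Default columns of BLAST tabular output (-outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_blast_tab(lines):
    """Yield one dict per hit from tabular BLAST output lines (comment lines skipped)."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(FIELDS, row))
        for key in INT_FIELDS:
            rec[key] = int(rec[key])
        for key in FLOAT_FIELDS:
            rec[key] = float(rec[key])
        yield rec


def best_virulence_hits(lines, min_identity=80.0, max_evalue=1e-10):
    """Keep, for each query protein, the highest-bitscore hit passing both cutoffs."""
    best = {}
    for rec in parse_blast_tab(lines):
        if rec["pident"] < min_identity or rec["evalue"] > max_evalue:
            continue
        query = rec["qseqid"]
        if query not in best or rec["bitscore"] > best[query]["bitscore"]:
            best[query] = rec
    return best
```

A result file can be passed directly, e.g. `with open("hits.tsv", newline="") as fh: best_virulence_hits(fh)`, where `hits.tsv` is a hypothetical file name.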
Integrons are major genetic elements, notorious for their role in the spread of antibiotic resistance genes. More generally, integrons are gene-capturing platforms whose broader evolutionary role remains poorly understood. IntegronFinder v2.0.2 detects integrons in DNA sequences with high accuracy. Its detection method combines HMM profiles, to detect integron integrases, and covariance models, to detect attC sites.

This page displays the macromolecular systems found in the genome. We use two tools to detect such systems.


MacSyFinder v1.0.2 is a program to model and detect macromolecular systems, genetic pathways, etc. in protein datasets. In prokaryotes, these systems often have evolutionarily conserved properties: they are made of conserved components and are encoded in compact loci (conserved genetic architecture). Its detection method searches for the system components by sequence similarity using Hidden Markov Models (HMM), then analyzes the content and organization of the candidate systems.
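The content-and-organization check can be sketched as follows: component hits are grouped into loci when few enough genes separate them, and a locus is reported if it contains enough mandatory and total components and no forbidden one. The parameters mirror concepts used in MacSyFinder models (mandatory, accessory and forbidden components, maximal inter-gene space), but the system definition, function names and values below are hypothetical.

```python
def group_into_loci(hits, max_space):
    """Group (gene_rank, component) hits into loci: consecutive hits separated by at
    most max_space intervening genes belong to the same locus."""
    loci = []
    for rank, component in sorted(hits):
        if loci and rank - loci[-1][-1][0] - 1 <= max_space:
            loci[-1].append((rank, component))
        else:
            loci.append([(rank, component)])
    return loci


def detect_systems(hits, mandatory, accessory, min_mandatory, min_total,
                   max_space, forbidden=frozenset()):
    """Return the loci whose content satisfies a (hypothetical) system definition."""
    systems = []
    for locus in group_into_loci(hits, max_space):
        components = {c for _, c in locus}
        if components & forbidden:
            continue  # a forbidden component rules the locus out
        n_mandatory = len(components & mandatory)
        n_total = n_mandatory + len(components & accessory)
        if n_mandatory >= min_mandatory and n_total >= min_total:
            systems.append(locus)
    return systems
```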


CRISPRCasFinder v4.2.19 predicts Clustered regularly interspaced short palindromic repeats (CRISPR) arrays.

This page displays prophages and defense systems found in the genome.


A prophage is a bacteriophage genome integrated within a prokaryotic genome. We use Phigaro to detect such regions.


A defense system is a molecular system used by prokaryotes to defend themselves against bacteriophages. We use DefenseFinder to detect such systems.
