
Orthologous Gene Clusters and Taxon Signature Genes for Viruses of Prokaryotes (2013)


  • Phage Orthologous Groups (POGs)
  • About 1,000 genomes, including genomes of dsDNA (88%), ssDNA (10%), and ssRNA and dsRNA (2%) viruses. Also includes archaeal viruses (still called POGs, though). 93% of the dsDNA genomes belonged to tailed phages of the order Caudovirales.
  • Orthologous groups built with the Edge-Search algorithm. Only hits with E values < 10 that cover at least 50% of the protein length are used. A protein assigned to multiple groups is always an error (this affects only 1% of all proteins).
  • Signature genes were sought for 57 taxa: only taxa with at least 3 distinct viruses, excluding temporary collections and unclassified viruses. Selection required 100% precision (the POG occurs only in genomes of the taxon), then the highest recall (the POG occurs in as many genomes of the taxon as possible).
  • Hosts: taken from the GenBank host listing (archaeal and bacterial species).
  • The COG-building method was applied to 97,731 proteins (or domains) from 1,027 virus genomes, which were clustered into 4,542 POGs.
  • POG size ranges from a minimum of 3 proteins from 3 distinct viruses up to 673 proteins from 378 viruses. Most of the POGs are small, with a median size of 5 proteins from 5 viruses.
  • No POG is found in more than 37% of the 1,027 virus genomes. Only 1% of the POGs are shared by more than a fifth of the genomes. The most distant connections are made by different genes, e.g. the ssDNA Microviridae and Inoviridae each share a single, different POG with dsDNA viruses.
  • Functional classification of POGs: a substantial fraction, including 10 of the 100 largest POGs, is completely uncharacterized.
  • Taxon signature genes: listed in File S5. Top-quality POG signatures are in Table 1.
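
The signature-gene selection noted above (100% precision first, then highest recall) can be sketched in a few lines of Python. This is only an illustration of the idea, not the paper's code: the function name `signature_pogs`, the dictionary input format, and the toy genome and POG identifiers are all assumptions made for the example.

```python
def signature_pogs(pog_genomes, taxon_genomes, min_viruses=3):
    """Rank candidate signature POGs for a single taxon.

    pog_genomes:   dict mapping POG id -> set of genome ids that contain it
                   (hypothetical input format)
    taxon_genomes: set of genome ids that belong to the taxon
    min_viruses:   taxa with fewer distinct viruses are not considered
    Returns (pog_id, recall) pairs with 100% precision, best recall first.
    """
    if len(taxon_genomes) < min_viruses:
        raise ValueError("taxon needs at least %d distinct viruses" % min_viruses)

    ranked = []
    for pog, genomes in pog_genomes.items():
        # 100% precision: every genome carrying the POG is in the taxon.
        if genomes and genomes <= taxon_genomes:
            # Recall: fraction of the taxon's genomes that carry the POG.
            recall = len(genomes) / len(taxon_genomes)
            ranked.append((pog, recall))

    # Highest recall first; ties broken by POG id for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Toy data (invented for illustration only).
taxon = {"g1", "g2", "g3", "g4"}
pogs = {
    "POG_A": {"g1", "g2", "g3"},              # precise, recall 3/4
    "POG_B": {"g1", "g2", "g3", "g4", "g9"},  # hits g9 outside the taxon: rejected
    "POG_C": {"g1", "g2"},                    # precise, recall 2/4
}
ranked = signature_pogs(pogs, taxon)
print(ranked)
```

Filtering on precision before ranking by recall mirrors the order in the notes: a POG that appears in even one genome outside the taxon is discarded, no matter how widespread it is inside the taxon.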

Personal Notes

  • Bacillus phage G: largest known phage genome.
  • Two methods of viral reproduction: lysogenic cycle and lytic cycle.
  • Caudovirales (from Latin cauda, tail; an order of viruses). Families in that order share no DNA or amino acid sequence similarity, just morphology.
  • Inovirus (inos means muscle in Greek). ssDNA virus.

