Review for "Modeling methyl-sensitive transcription factor motifs with an expanded epigenetic alphabet"

Completed on 12 Feb 2017 by Afif Elghraoui. Sourced from

Comments to author

I agree that a single-character representation for modified bases is important, since they are essentially slightly different nucleotides. I also think it is a much better way of representing the methylome than GenBank's current approach (an associated list of motifs that have been found to be modified).

The publication for the original IUPAC nucleotide nomenclature [1], however, specifically explains why it did not assign separate symbols to modified bases:
5.2. Modified nucleotides
In a number of organisms DNA and RNA are modified
at certain positions. For instance, the DNA of Escherichia
coli is usually methylated on N-6 of the adenine residue in the
sequence 5'-GATC-3' [23]. The present nomenclature does not
allocate any specific symbol to these modified nucleotides for
the following reasons. (1) The presence or absence of a given
modification depends upon the location of the DNA.
Sequences modified in one organism may not be modified in
another. (2) Modification is usually statistical, in that only a
proportion of possible sites for modification may actually be
utilised in vivo. Modification of a nucleotide or base in a
given polynucleotide is not a function of the sequence per se.

It might be a good idea to specifically address those points in your manuscript.

Regarding the nomenclature you recommend, I have two concerns. One is that you don't take advantage of modified Latin letters [2] to make for a more intuitive nomenclature:

My second concern is that the manuscript should have considered how existing tools already use lower-case letters (c vs. C). In any case, the need for lower-case letters will likely go away if you take advantage of Latin letters with diacritics.
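
To illustrate both concerns, here is a minimal Python sketch. The symbols in it are hypothetical and chosen only for illustration (they are not taken from the manuscript or from [2]): "Ć" (U+0106) stands in for a modified cytosine and "Ȧ" (U+0226) for a modified adenine. It assumes a tool that upper-cases its input, which would silently erase a lower-case modification symbol, whereas a precomposed letter with a diacritic survives case conversion, still occupies a single character, and can be reduced to its unmodified base through Unicode decomposition.

```python
import unicodedata

# Hypothetical symbols for illustration only; not a proposed standard.
MOD_C = "\u0106"  # "Ć", Latin capital C with acute: stand-in for a modified cytosine
MOD_A = "\u0226"  # "Ȧ", Latin capital A with dot above: stand-in for a modified adenine


def strip_modifications(seq: str) -> str:
    """Return the unmodified sequence by removing diacritics.

    NFD normalization splits each precomposed letter into its base letter
    plus combining marks; the combining marks are then dropped.
    """
    decomposed = unicodedata.normalize("NFD", seq)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Lower-case scheme: "c" marks a modified cytosine.
lowercase_seq = "GATc"
# A tool that upper-cases its input loses the modification.
lost = lowercase_seq.upper()

# Diacritic scheme: the modification survives case conversion.
diacritic_seq = "G" + MOD_A + "T" + MOD_C
kept = diacritic_seq.upper()

print(lost == "GATC")                      # modification erased by upper-casing
print(kept == diacritic_seq)               # modification preserved
print(len(diacritic_seq))                  # still one character per base
print(strip_modifications(diacritic_seq))  # underlying unmodified sequence
```

A caveat with this approach is that it only works for tools that handle Unicode text; tools that assume single-byte ASCII input would need changes either way.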