In our TP53 example, a 400-nucleotide sequence is expanded into more than 77,000 k-mers ("subsequences"), totalling 10,735,712 nucleotides.
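Both figures are consistent with counting every contiguous subsequence of at least 8 nucleotides (the k > 7 rule used in the steps that follow), up to the full 400-nt length. The minimal sketch below assumes that definition; `kmer_totals` is a hypothetical helper, not part of Codondex.

```python
def kmer_totals(n: int, k_min: int = 8) -> tuple[int, int]:
    """Count every contiguous subsequence of length k_min..n in a length-n
    sequence, and the total number of nucleotides those subsequences contain."""
    count = 0
    total_nt = 0
    for k in range(k_min, n + 1):
        windows = n - k + 1       # number of length-k windows
        count += windows
        total_nt += windows * k   # nucleotides contributed by length-k windows
    return count, total_nt


print(kmer_totals(400))  # (77421, 10735712)
```

Under this assumption the exact k-mer count is 77,421, and the nucleotide total matches the figure quoted above.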

Example with four mRNA or protein sequences [P1-4] of a cell/gene:

  1. Decompose the ncDNA sequence into k-mers longer than 7 letters (k ≥ 8)
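    • For the sequence 'TGTGGGCCCACA', associated with transcript protein or mRNA [P1], its k-mers are:
      k=8: TGTGGGCC, GTGGGCCC, TGGGCCCA, GGGCCCAC, GGCCCACA
      k=9: TGTGGGCCC, GTGGGCCCA, TGGGCCCAC, GGGCCCACA
      k=10: TGTGGGCCCA, GTGGGCCCAC, TGGGCCCACA
      k=11: TGTGGGCCCAC, GTGGGCCCACA
      k=12: TGTGGGCCCACA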
  2. Associate each of these subsequences with the signature of P1.
  3. Its highest-recurring k-mer is TGGGCCCA.
  4. Compute the k-mers of the ncDNA sequences associated with [P2-4].
  5. Use the ncDNA k-mers and their P1-P4 signature associations to discover the ncDNA k-mer topology conferred on the set.
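The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Codondex implementation: the helper names (`kmers`, `kmer_signatures`, `group_by_signature`) are hypothetical, the P2 sequence in the usage lines is invented toy data, and grouping k-mers by the set of transcripts that share them is only one possible reading of the "topology" in step 5.

```python
from collections import Counter, defaultdict


def kmers(seq: str, k_min: int = 8):
    """Step 1: yield every contiguous subsequence of seq of length >= k_min (k > 7)."""
    n = len(seq)
    for k in range(k_min, n + 1):
        for i in range(n - k + 1):
            yield seq[i:i + k]


def kmer_signatures(ncdna_by_transcript: dict[str, list[str]], k_min: int = 8):
    """Steps 2 and 4: tag each k-mer with the transcript labels (e.g. P1..P4)
    whose ncDNA contains it, and count its total recurrence."""
    signatures = defaultdict(set)  # k-mer -> set of transcript labels
    recurrence = Counter()         # k-mer -> total number of occurrences
    for label, sequences in ncdna_by_transcript.items():
        for seq in sequences:
            for kmer in kmers(seq, k_min):
                signatures[kmer].add(label)
                recurrence[kmer] += 1
    return signatures, recurrence


def group_by_signature(signatures):
    """Step 5 (hypothetical reading): group k-mers by the exact set of
    transcripts that share them."""
    groups = defaultdict(list)
    for kmer, labels in signatures.items():
        groups[frozenset(labels)].append(kmer)
    return groups


# Toy data: the P1 sequence is from the example; the P2 sequence is invented.
signatures, recurrence = kmer_signatures(
    {"P1": ["TGTGGGCCCACA"], "P2": ["AATGGGCCCATT"]}
)
print(recurrence.most_common(1))       # [('TGGGCCCA', 2)]  (step 3)
print(sorted(signatures["TGGGCCCA"]))  # ['P1', 'P2']
```

In this toy data TGGGCCCA is the only k-mer shared by both transcripts, so it is both the highest-recurring k-mer and the sole member of the {P1, P2} signature group.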

From this we derive the Codondex i-Score and two varieties of Protein Vector, which use intron-protein/mRNA pairs to expose fine distinctions between the k-mers of transcripts of the same gene.

We also discovered and ranked thousands of statistically dominant k-mers across multiple gene transcripts. These k-mers are unrelated in sequence, yet are of equal length and recur with equal frequency. This symmetry is unrelated to reverse complements and inverted repeats, and we can predict it precisely using k-mer recurrence data alone.
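Given a table of k-mer recurrence counts, one hypothetical way to surface equal-length, equal-frequency classes while discounting reverse-complement pairs is to group k-mers by (length, count) and keep a single representative of each reverse-complement pair. The function names and the counts in the usage lines are invented for illustration, and this is not presented as the method behind the ranking described above.

```python
from collections import Counter, defaultdict

# Assumes upper-case DNA letters only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def equal_frequency_classes(recurrence: Counter, min_members: int = 2):
    """Group k-mers sharing both length and recurrence count. Within each class,
    collapse every reverse-complement pair to one representative, so class
    membership is not inflated by a k-mer and its reverse complement."""
    classes = defaultdict(list)
    for kmer, count in recurrence.items():
        classes[(len(kmer), count)].append(kmer)

    result = {}
    for key, members in classes.items():
        member_set = set(members)
        kept = sorted(
            m for m in members
            # keep m if its partner is absent, or if m is the lexicographically
            # smaller of the pair (palindromes, where m == its RC, are kept once)
            if reverse_complement(m) not in member_set or m <= reverse_complement(m)
        )
        if len(kept) >= min_members:
            result[key] = kept
    return result


# Invented counts for illustration only.
counts = Counter({"TGGGCCCA": 5, "AACCGGTT": 5, "ACGTACGA": 5,
                  "TCGTACGT": 5, "GGGGAAAA": 3})
print(equal_frequency_classes(counts))
# {(8, 5): ['AACCGGTT', 'ACGTACGA', 'TGGGCCCA']}
```

Here TCGTACGT is dropped as the reverse complement of ACGTACGA, and GGGGAAAA is dropped because no other 8-mer shares its count.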

