A count of subsequence recurrence or inter-inclusiveness based on a one-nucleotide advance.
A sequence must be patterned before its Codondex i-Score can be calculated.
Step 1: Pattern Function
For each character Si in a given sequence S0S1...Sn, a pattern function generates a set of subsequences that, inter alia, preserve the order of adjacent letters in the original string S0S1...Sn. Each subsequence is preferably ordered, preferably consists of Si together with the characters directly preceding it, and has a length greater than or equal to 8.
Pattern(Si) = Si-7...Si | Si-8...Si | Si-9...Si | ... | S0S1...Si
For example: G0T1G2G3G4C5C6C7A8G9A10C11
Pattern(G0) = Empty (no subsequence ending at G0 is 8 or more characters long; the same holds for T1 through C6)
Pattern(C7) = GTGGGCCC
Pattern(A8) = TGGGCCCA | GTGGGCCCA
Pattern(G9) = GGGCCCAG | TGGGCCCAG | GTGGGCCCAG
...
Pattern(C11) = GCCCAGAC | GGCCCAGAC | GGGCCCAGAC | TGGGCCCAGAC | GTGGGCCCAGAC
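As a rough illustration, the pattern function can be sketched in Python. The function name `pattern` and the `min_len` parameter are hypothetical (the default of 8 follows the minimum length stated above), and positions are assumed to be zero-based as in the example. Subsequences are returned shortest first, matching the order used in the examples.

```python
def pattern(seq: str, i: int, min_len: int = 8) -> list[str]:
    """Return every contiguous substring of seq that ends at position i
    and is at least min_len characters long, shortest first."""
    # Start positions run from i - min_len + 1 down to 0, so lengths grow
    # from min_len up to i + 1 (the whole prefix S0...Si). When
    # i < min_len - 1 the range is empty and so is the result.
    return [seq[start:i + 1] for start in range(i - min_len + 1, -1, -1)]


seq = "GTGGGCCCAGAC"
print(pattern(seq, 0))  # []
print(pattern(seq, 9))  # ['GGGCCCAG', 'TGGGCCCAG', 'GTGGGCCCAG']
```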
Then all the subsequences are collected together, in order of position.
For example, after patterning, GTGGGCCCAGAC yields:
GTGGGCCC | TGGGCCCA | GTGGGCCCA | GGGCCCAG | TGGGCCCAG | GTGGGCCCAG | GGCCCAGA | GGGCCCAGA | TGGGCCCAGA | GTGGGCCCAGA | GCCCAGAC | GGCCCAGAC | GGGCCCAGAC | TGGGCCCAGAC | GTGGGCCCAGAC
Each subsequence is identified by its offset, i.e. its zero-based position in this list:
Subsequence0 = GTGGGCCC
Subsequence1 = TGGGCCCA
...
Subsequence14 = GTGGGCCCAGAC
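Continuing the same hypothetical sketch, patterning a whole sequence concatenates Pattern(Si) for every position in order, so the list index of each subsequence is its offset. Repeated subsequences are kept as separate entries; that is an assumption here, chosen because the score is meant to reflect recurrence.

```python
def pattern(seq, i, min_len=8):
    # Substrings ending at position i with length >= min_len, shortest first.
    return [seq[start:i + 1] for start in range(i - min_len + 1, -1, -1)]


def pattern_sequence(seq, min_len=8):
    """Pool Pattern(Si) for every position i, in position order.
    The list index of each subsequence is its offset."""
    subsequences = []
    for i in range(len(seq)):
        subsequences.extend(pattern(seq, i, min_len))
    return subsequences


subs = pattern_sequence("GTGGGCCCAGAC")
print(len(subs))                    # 15
print(subs[0], subs[2], subs[14])   # GTGGGCCC GTGGGCCCA GTGGGCCCAGAC
```

With the minimum length of 8, a sequence of L >= 8 letters produces (L - 7)(L - 6)/2 subsequences, which gives 15 for the 12-letter example.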
Step 2: Calculate Counts
2.1 Offset Count Bigger Length
For each subsequence X, compare it against every subsequence Y in the set; the count increments by 1 for each Y that is longer than X and contains X:
For example:
GTGGGCCC | TGGGCCCA | GTGGGCCCA | GGGCCCAG | TGGGCCCAG | GTGGGCCCAG | GGCCCAGA | GGGCCCAGA | TGGGCCCAGA | GTGGGCCCAGA | GCCCAGAC | GGCCCAGAC | GGGCCCAGAC | TGGGCCCAGAC | GTGGGCCCAGAC
Offset_Count_Bigger_Length(Subsequence2) = 3 (matches offset5, offset9, offset14)
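A minimal sketch of Offset Count Bigger Length, assuming "contains" means contains as a contiguous substring (Python's `in` operator on strings). `SUBSEQUENCES` is the 15-entry list from the example above, and the function names are hypothetical. Returning the matching offsets makes the result easy to check against the example.

```python
SUBSEQUENCES = [
    "GTGGGCCC", "TGGGCCCA", "GTGGGCCCA", "GGGCCCAG", "TGGGCCCAG",
    "GTGGGCCCAG", "GGCCCAGA", "GGGCCCAGA", "TGGGCCCAGA", "GTGGGCCCAGA",
    "GCCCAGAC", "GGCCCAGAC", "GGGCCCAGAC", "TGGGCCCAGAC", "GTGGGCCCAGAC",
]


def bigger_length_matches(subs, k):
    """Offsets of every subsequence that is longer than subs[k] and contains it."""
    x = subs[k]
    return [j for j, y in enumerate(subs) if len(y) > len(x) and x in y]


def offset_count_bigger_length(subs, k):
    return len(bigger_length_matches(subs, k))


print(bigger_length_matches(SUBSEQUENCES, 2))       # [5, 9, 14]
print(offset_count_bigger_length(SUBSEQUENCES, 2))  # 3
```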
2.2 Offset Count Ignore Length
For each subsequence X, compare it against every subsequence Y in the set; the count increments by 1 for each Y that either contains X or is contained in X, regardless of length. X contains itself, so it is always counted once:
For example:
GTGGGCCC | TGGGCCCA | GTGGGCCCA | GGGCCCAG | TGGGCCCAG | GTGGGCCCAG | GGCCCAGA | GGGCCCAGA | TGGGCCCAGA | GTGGGCCCAGA | GCCCAGAC | GGCCCAGAC | GGGCCCAGAC | TGGGCCCAGAC | GTGGGCCCAGAC
Offset_Count_Ignore_Length(Subsequence2) = 6 (matches offset0, offset1, offset2, offset5, offset9, offset14)
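The ignore-length variant differs only in the condition: containment in either direction, with no length check. This sketch again assumes contiguous-substring containment and uses the same hypothetical `SUBSEQUENCES` list and naming scheme.

```python
SUBSEQUENCES = [
    "GTGGGCCC", "TGGGCCCA", "GTGGGCCCA", "GGGCCCAG", "TGGGCCCAG",
    "GTGGGCCCAG", "GGCCCAGA", "GGGCCCAGA", "TGGGCCCAGA", "GTGGGCCCAGA",
    "GCCCAGAC", "GGCCCAGAC", "GGGCCCAGAC", "TGGGCCCAGAC", "GTGGGCCCAGAC",
]


def ignore_length_matches(subs, k):
    """Offsets of every subsequence that contains subs[k] or is contained in it.
    subs[k] matches itself, so its own offset is always included."""
    x = subs[k]
    return [j for j, y in enumerate(subs) if x in y or y in x]


def offset_count_ignore_length(subs, k):
    return len(ignore_length_matches(subs, k))


print(ignore_length_matches(SUBSEQUENCES, 2))       # [0, 1, 2, 5, 9, 14]
print(offset_count_ignore_length(SUBSEQUENCES, 2))  # 6
```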
Step 3: Calculate Codondex i-Score
Codondex i-Score(Subsequencei) = ( Offset_Count_Ignore_Length(Subsequencei) - Offset_Count_Bigger_Length(Subsequencei) ) ÷ Length(Subsequencei)
For the example above:
Codondex i-Score(Subsequence2) = (6 - 3) ÷ 9 = 0.3333333333333333
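Putting both counts together, the score can be sketched as below. As before, "contains" is assumed to mean contiguous-substring containment, and `codondex_i_score` is a hypothetical name, not an established API.

```python
SUBSEQUENCES = [
    "GTGGGCCC", "TGGGCCCA", "GTGGGCCCA", "GGGCCCAG", "TGGGCCCAG",
    "GTGGGCCCAG", "GGCCCAGA", "GGGCCCAGA", "TGGGCCCAGA", "GTGGGCCCAGA",
    "GCCCAGAC", "GGCCCAGAC", "GGGCCCAGAC", "TGGGCCCAGAC", "GTGGGCCCAGAC",
]


def codondex_i_score(subs, k):
    """(Offset_Count_Ignore_Length - Offset_Count_Bigger_Length) / Length."""
    x = subs[k]
    # Containment in either direction, any length.
    ignore = sum(1 for y in subs if x in y or y in x)
    # Strictly longer subsequences that contain x.
    bigger = sum(1 for y in subs if len(y) > len(x) and x in y)
    return (ignore - bigger) / len(x)


print(codondex_i_score(SUBSEQUENCES, 2))  # 0.3333333333333333
```

Under the contiguous-substring assumption, any Y that contains X is either identical to X or strictly longer, so the numerator reduces to the number of pooled subsequences contained in X, X itself included (offsets 0, 1 and 2 for Subsequence2, giving 3 ÷ 9).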
Step 4: PhV (hash) and PiV (Codondex i-Score) - Optional
The reverse complementary offset (coffset) and inverted repeats are as conventionally defined in molecular biology.
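The reverse complement itself is a standard operation. A minimal sketch for DNA (A, C, G, T) follows; how coffset is derived from it is not specified in this section, so only the textbook operations are shown, and the helper names `reverse_complement` and `is_inverted_repeat` are hypothetical.

```python
# Watson-Crick pairing for DNA; RNA (U) is not handled in this sketch.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_repeat(left: str, right: str) -> bool:
    """True if `right` is the reverse complement of `left`, i.e. the two
    segments form an inverted repeat when `right` lies downstream of `left`."""
    return right == reverse_complement(left)


print(reverse_complement("GTGGGCCC"))  # GGGCCCAC
print(reverse_complement("GGGCCC"))    # GGGCCC (its own reverse complement)
```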
The Protein Codondex i-Score Vector (PiV) and Protein hash Vector (PhV) are measures used to determine the relative ranking of each subsequence within the subsequence-protein/mRNA set derived from the same gene transcripts. They are described in further detail in patent application PCT/US2015/030478.