Codondex i-Score - sequence amplification

A count of subsequence recurrence or inter-inclusiveness based on a one-nucleotide advance.

A sequence must be patterned before its Codondex i-Score can be calculated.

Step 1: Pattern Function

For each Si in a given sequence S0S1...Sn, a pattern function generates a set of subsequences that, inter alia, preserve the sequences between letters in the original string S0S1...Sn. Each subsequence is preferably ordered, preferably includes the characters directly adjacent to Si in the forward direction, and has a length greater than or equal to 8. In the definition below, every subsequence ends at Si and extends back toward S0, one letter at a time.

Pattern(Si) = Si-7...Si | Si-8...Si | Si-9...Si | ... | S0S1...Si

For example: G0T1G2G3G4C5C6C7A8G9A10C11

Pattern(G0) = Empty (no subsequence ending at G0 has length 8 or more)

Pattern(C7) = GTGGGCCC

Pattern(A8) = TGGGCCCA | GTGGGCCCA

Pattern(G9) = GGGCCCAG | TGGGCCCAG | GTGGGCCCAG

...

Pattern(C11) = GCCCAGAC | GGCCCAGAC | GGGCCCAGAC | TGGGCCCAGAC | GTGGGCCCAGAC
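
The Pattern function can be sketched in Python as follows. This is a minimal illustration rather than the reference implementation: the name pattern_at, the min_len parameter and the zero-based index are assumptions chosen to mirror the notation above.

```python
def pattern_at(seq, i, min_len=8):
    """Return the subsequences of seq that end at index i, shortest first.

    Hypothetical helper mirroring Pattern(Si) = Si-7...Si | ... | S0S1...Si.
    """
    # The shortest subsequence starts at i - (min_len - 1). Each following
    # one starts one letter further back, until the start of the sequence.
    first_start = i - min_len + 1
    return [seq[start:i + 1] for start in range(first_start, -1, -1)]


seq = "GTGGGCCCAGAC"
print(pattern_at(seq, 0))  # []
print(pattern_at(seq, 7))  # ['GTGGGCCC']
print(pattern_at(seq, 8))  # ['TGGGCCCA', 'GTGGGCCCA']
```

For positions G0 through C6 the range is empty, so no subsequence is produced, which matches Pattern(G0) = Empty.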

Then we put all the subsequences together.

For example: GTGGGCCCAGAC

After patterning:

GTGGGCCC | TGGGCCCA | GTGGGCCCA | GGGCCCAG | TGGGCCCAG | GTGGGCCCAG | GGCCCAGA | GGGCCCAGA |
TGGGCCCAGA | GTGGGCCCAGA | GCCCAGAC | GGCCCAGAC | GGGCCCAGAC | TGGGCCCAGAC | GTGGGCCCAGAC

Among them,

Subsequence0 = GTGGGCCC

Subsequence1 = TGGGCCCA

...

Subsequence14 = GTGGGCCCAGAC
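
Collecting the subsequences for a whole sequence, and numbering them by offset, might look like the sketch below. The function name pattern_sequence and the min_len parameter are illustrative assumptions, and the ordering (by end position, then shortest first) is inferred from the example above.

```python
def pattern_sequence(seq, min_len=8):
    """Collect Pattern(Si) for every position i into one flat list."""
    subsequences = []
    for end in range(len(seq)):
        # Start at the shortest allowed subsequence and grow toward S0.
        for start in range(end - min_len + 1, -1, -1):
            subsequences.append(seq[start:end + 1])
    return subsequences


subs = pattern_sequence("GTGGGCCCAGAC")
for offset, sub in enumerate(subs):
    print(f"Subsequence{offset} = {sub}")
```

For the 12-letter example this yields 1 + 2 + 3 + 4 + 5 = 15 subsequences, numbered 0 to 14.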

Step 2: Calculate Counts

2.1 Offset Count Bigger Length

Each subsequence is compared against all the subsequences. Its count increments by 1 for every compared subsequence that is longer than it and contains it:

For example:

GTGGGCCC | TGGGCCCA | GTGGGCCCA | GGGCCCAG | TGGGCCCAG | GTGGGCCCAG | GGCCCAGA | GGGCCCAGA | TGGGCCCAGA | GTGGGCCCAGA | GCCCAGAC | GGCCCAGAC | GGGCCCAGAC | TGGGCCCAGAC | GTGGGCCCAGAC

Offset_Count_Bigger_Length(Subsequence2) = 3      (matches offset5, offset9, offset14)
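
A short Python sketch of this count, using the 15 subsequences from the example. It assumes that "contains" means contiguous substring containment; the function name offsets_bigger_length is hypothetical.

```python
SUBSEQUENCES = [
    "GTGGGCCC", "TGGGCCCA", "GTGGGCCCA", "GGGCCCAG", "TGGGCCCAG",
    "GTGGGCCCAG", "GGCCCAGA", "GGGCCCAGA", "TGGGCCCAGA", "GTGGGCCCAGA",
    "GCCCAGAC", "GGCCCAGAC", "GGGCCCAGAC", "TGGGCCCAGAC", "GTGGGCCCAGAC",
]


def offsets_bigger_length(subs, i):
    """Offsets of the subsequences that are longer than subs[i] and contain it."""
    current = subs[i]
    return [
        j for j, other in enumerate(subs)
        if len(other) > len(current) and current in other
    ]


matches = offsets_bigger_length(SUBSEQUENCES, 2)
print(len(matches), matches)  # 3 [5, 9, 14]
```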

2.2 Offset Count Ignore Length

Each subsequence is compared against all the subsequences, including itself. Its count increments by 1 whenever it contains the compared subsequence or the compared subsequence contains it:

For example:

GTGGGCCC | TGGGCCCA | GTGGGCCCA | GGGCCCAG | TGGGCCCAG | GTGGGCCCAG | GGCCCAGA | GGGCCCAGA | TGGGCCCAGA | GTGGGCCCAGA | GCCCAGAC | GGCCCAGAC | GGGCCCAGAC | TGGGCCCAGAC | GTGGGCCCAGAC

Offset_Count_Ignore_Length(Subsequence2) = 6      (matches offset0, offset1, offset2, offset5, offset9, offset14)
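
The same comparison without the length condition can be sketched as below. As before, containment is assumed to mean contiguous substring containment, and offsets_ignore_length is a hypothetical name. A subsequence matches itself, which is why offset2 appears in the result.

```python
SUBSEQUENCES = [
    "GTGGGCCC", "TGGGCCCA", "GTGGGCCCA", "GGGCCCAG", "TGGGCCCAG",
    "GTGGGCCCAG", "GGCCCAGA", "GGGCCCAGA", "TGGGCCCAGA", "GTGGGCCCAGA",
    "GCCCAGAC", "GGCCCAGAC", "GGGCCCAGAC", "TGGGCCCAGAC", "GTGGGCCCAGAC",
]


def offsets_ignore_length(subs, i):
    """Offsets where subs[i] contains the other subsequence, or vice versa."""
    current = subs[i]
    return [
        j for j, other in enumerate(subs)
        if current in other or other in current
    ]


matches = offsets_ignore_length(SUBSEQUENCES, 2)
print(len(matches), matches)  # 6 [0, 1, 2, 5, 9, 14]
```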

Step 3: Calculate Codondex i-Score

Codondex i-Score(Subsequencei) = ( Offset_Count_Ignore_Length(Subsequencei) - Offset_Count_Bigger_Length(Subsequencei) ) ÷ Length(Subsequencei)

For the above example:

Codondex i-Score(Subsequence2) = (6 - 3) ÷ 9 = 0.3333333333333333
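
Putting the three steps together, an end-to-end sketch in Python could look like this. All function names are illustrative assumptions, containment is taken to mean contiguous substring containment, and the minimum length of 8 follows Step 1.

```python
def pattern_sequence(seq, min_len=8):
    """Step 1: all subsequences of length >= min_len, by end position."""
    return [
        seq[start:end + 1]
        for end in range(len(seq))
        for start in range(end - min_len + 1, -1, -1)
    ]


def count_bigger_length(subs, i):
    """Step 2.1: subsequences longer than subs[i] that contain it."""
    cur = subs[i]
    return sum(1 for other in subs if len(other) > len(cur) and cur in other)


def count_ignore_length(subs, i):
    """Step 2.2: subsequences that contain subs[i] or are contained in it."""
    cur = subs[i]
    return sum(1 for other in subs if cur in other or other in cur)


def i_score(subs, i):
    """Step 3: (ignore-length count - bigger-length count) / length."""
    diff = count_ignore_length(subs, i) - count_bigger_length(subs, i)
    return diff / len(subs[i])


subs = pattern_sequence("GTGGGCCCAGAC")
print(i_score(subs, 2))  # 0.3333333333333333
```

Because every longer containing subsequence is counted in both totals, the numerator reduces to the number of subsequences contained in Subsequencei, itself included.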

Step 4: PhV (hash) and PiV (Codondex i-Score) - Optional

Reverse complementary offset (coffset) and inverted repeats are as described in classical biology.

Protein Codondex i-Score Vector (PiV) and Protein hash Vector (PhV) are measures used to determine the relative ranking of each subsequence in the subsequence-protein/mRNA set derived from the same gene transcripts. They are described in further detail in patent application PCT/US2015/030478.