Now, for the question "What if TAG immediately follows ATG?" That's an excellent question. It would constitute a trivial case, and such a substring would be rejected. In reality, one can expect a number of substrings that satisfy all three conditions.
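To make the rejection of the trivial case concrete, here is a minimal sketch of how one reading frame could be scanned. It is only an illustration under my own assumptions: the function name `find_orfs` and its parameters are hypothetical, TAG is treated as the only stop codon (as in this thread), the first in-frame TAG is taken to close every ATG that is still open, and the `frame` argument is a 0-based offset (0, 1, 2 for +1, +2, +3).

```python
def find_orfs(seq, frame=0, start="ATG", stop="TAG"):
    """Return (begin, end) index pairs of ATG...TAG substrings in one frame.

    Assumptions (not confirmed by the thread): TAG is the only stop codon,
    and the first in-frame TAG closes every ATG seen since the previous TAG.
    """
    orfs = []
    open_starts = []  # positions of ATGs seen since the last in-frame TAG
    # Walk the sequence codon by codon, staying in the chosen frame.
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == start:
            open_starts.append(i)
        elif codon == stop:
            for s in open_starts:
                # Reject the trivial case where TAG immediately follows ATG.
                if i - s > 3:
                    orfs.append((s, i + 3))
            open_starts = []
    return orfs


candidates = find_orfs("ATGAAAATGCCCTAGGGGTAG")
# Pick the longest candidate, if any were found.
longest = max(candidates, key=lambda p: p[1] - p[0]) if candidates else None
```

With this input, both ATGs are closed by the same TAG, and `max` then picks the candidate that begins at the first ATG. Other stop codons (TAA, TGA) could be handled by checking membership in a set instead of comparing against a single string.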
As an absolute rookie in this area, I have the following question:
Is it possible to have the following sequence within a
single reading frame (+1, +2, or +3)?
ATG ... ATG ... TAG
i.e. must every ATG be strictly followed by a TAG in a
single reading frame?
I am already aware that two sequences can overlap in
different reading frames, e.g. the MT-ATP6 and MT-ATP8 genes.
We usually choose the longest one, as the sequence with the highest probability of coding for a protein.
What do you mean by "longest one"? If every ATG is supposed to be strictly followed by a TAG (in a single frame), then shouldn't they be considered separate entities for processing?
Otherwise, if an ATG can be followed by another ATG (in a single frame), which one is considered the "longest one"?
ATG(1) ... ATG(2) ... TAG(3) ... TAG(4)
* from (1) to (4), with TAG(3) ending ATG(2), or
* from (1) to (3) and from (2) to (4), i.e. TAG(3) ends ATG(1) and TAG(4) ends ATG(2)
As I said, I'm a rookie in genetics, so forgive me for the stupid questions.