Lazarus

Free Pascal => Beginners => Topic started by: BornAgain on August 30, 2021, 11:23:46 pm

Title: How to determine if a result is an integer? [SOLVED]
Post by: BornAgain on August 30, 2021, 11:23:46 pm
I have a for loop where I am parsing through a string. Every time I parse through 3 characters in the string, I need to test the triplet of characters to see if it is a specific combination of letters (say ATT). I thought I would declare Num as an integer, assign the value J/3 to Num (where J is the index variable in the for loop) and see if Num is an integer. If it is, then J is a multiple of 3 and I can check the triplet. If it's not, then I want the loop to check the next character in the string.

Here is what I tried:

Code: Pascal
procedure TForm1.RunCmdBtnClick(Sender: TObject);
var
  J, Num: integer;

  Str, Triplet, : string;
  IntArray: Array of integer;
begin
....
  for J := 1 to Length(Str) do
  begin
    Num := J/3;
    if Num in IntArray then
    begin
      //
    end
    else continue;
...

This did not work. It did not recognize "if Num in IntArray."

So I tried the following:

Code: Pascal
procedure TForm1.RunCmdBtnClick(Sender: TObject);
var
  J, Num: integer;

  Str, Triplet, : string;
  IntArray: Array of integer;
begin
....
  for J := 1 to Length(Str) do
  begin
    Num := J/3;
    try
      Num := J/3;
    except
      on e: Incompatible types do
      continue;
    end;
...

I thought that if J/3 was not an integer, it would throw an exception and the "except on" would catch it. Unfortunately, I cannot even compile it, as "Num := J/3" is the problem (two errors: incompatible types, not overloaded).

A bit stuck. Help would be appreciated.

 

Title: Re: How to determine if a result is an integer?
Post by: BornAgain on August 31, 2021, 12:32:49 am
Thank you, @cdesim. Yes, I apologize for not being very clear. Let me explain. I am working with DNA sequences. A DNA sequence can look like this:

ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG

I need to find a substring here that 1) begins with the triplet ATG, 2) ends with the triplet TAG, and 3) has a length that is a multiple of 3.

I have marked the substring here with brackets:

ACTGCTA[ATGATTTGGACTTGGTAG]CGTTACCTG

In my code, I began to search for triplets, starting with the very first letter, so the triplets I would be checking would be ACT, GCT, AAT, GAT, TTG, GAC, TTG, GTA, GCG, TTA, CCT, and a leftover G. Clearly, there is no ATG among these triplets. So I now skip the first letter and start with C (that is, I look at the second frame in the string). These triplets would be CTG, CTA, ATG, ATT, TGG, ACT, TTG, TAG. I needn't go any further because all three of my criteria have been met (starting with ATG, ending with TAG, and the substring length is a multiple of 3). I will need to check the third frame too (the one where I would skip the first two letters, so the triplets would begin with TGC).

Since I wanted to look at triplets, I knew J/3 had to be an integer.

Hope this helps.

Title: Re: How to determine if a result is an integer?
Post by: jamie on August 31, 2021, 01:10:34 am
This looks like homework.

That being so, you can investigate Pos, PosEx, and mod 3 operations. But I bet you can't use those?


Title: Re: How to determine if a result is an integer?
Post by: alpine on August 31, 2021, 01:11:49 am
@BornAgain
To check J is divisible by N (in your case N=3) you do:
Code:
if J mod N = 0 then
begin
  ...
end;
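
A minimal sketch of the mod test above as a complete console program. It assumes the sequence is held in a constant (here the first 25 letters of the example from this thread) and uses div, which yields an integer, where the original attempt used "/", which always yields a real:

```pascal
program TripletDemo;
{$mode objfpc}{$H+}

const
  Seq = 'ACTGCTAATGATTTGGACTTGGTAG';  // sample data, assumed for illustration
var
  J: Integer;
begin
  // "Num := J / 3" does not compile for an integer Num:
  // "/" always produces a floating-point value in Pascal.
  for J := 1 to Length(Seq) do
    if J mod 3 = 0 then  // J is a multiple of 3, so a triplet ends here
      WriteLn('Triplet ', J div 3, ': ', Copy(Seq, J - 2, 3));
end.
```

Changing the loop start (and the offset in Copy) by one or two positions would walk the second and third reading frames in the same way.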
Title: Re: How to determine if a result is an integer?
Post by: BornAgain on August 31, 2021, 01:23:41 am
Thank you, @y.ivanov. I am still a newbie, so thanks for the help.

And @jamie, I assure you it is not homework, unless you count a self-imposed task as homework. Anyway, thanks for your suggestions too, although I don't get the need for sarcasm. This is a beginner's post, after all.
Title: Re: How to determine if a result is an integer?
Post by: speter on August 31, 2021, 01:54:32 am
I think Jamie was giving you the correct advice.

Code: Pascal
procedure TForm1.Button1Click(Sender: TObject);
const
  seq = 'ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG';
  m1 = 'ATG';
  m2 = 'TAG';
var
  a,b : byte;
begin
  a := pos(m1,seq);
  if a > 0 then
    begin
      memo1.append(m1+' found at pos '+a.tostring);
      b := pos(m2,seq,a+1);

      if b > 0 then
        begin
          memo1.append(m2+' found at pos '+b.tostring);
          memo1.append('sequence = '+copy(seq,a,b-a+3));
        end
      else
        memo1.append(m2+' not found');
    end
  else
    memo1.append(m1+' not found');
end;

The code above assumes there is a Memo (memo1) and Button (button1) on a form...

cheers
S.
Title: Re: How to determine if a result is an integer?
Post by: alpine on August 31, 2021, 02:07:49 am
@speter
Then you should check that (b-a) is divisible by 3 :)
Let BornAgain find his own way; IMHO that is best for a newbie.
Maybe he'll guess that he can speed things up by skipping groups, so there is no need to check divisibility, etc.
Title: Re: How to determine if a result is an integer?
Post by: BornAgain on August 31, 2021, 02:36:29 am
Thank you, @speter. The concept in your code is exactly what I was writing code for, but your code is much more concise and no doubt more efficient. I will try and learn from your code.

By the way, I did thank @jamie. What I thought was not necessary was the sarcasm. But then, I suppose that's the price one pays when one asks for help!
Title: Re: How to determine if a result is an integer?
Post by: winni on August 31, 2021, 03:24:23 am
Hi!

Keep it simple:

Code: Pascal
uses ....., strUtils;

procedure TForm1.Button2Click(Sender: TObject);
const data : string = 'ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG';
      TripStart = 'ATG';
      TripEnd  = 'TAG';
var p, q, offset, delta: Integer;
    msg : string;
begin
  offset := 1;
  repeat
    p := PosEx(TripStart,data,offset);
    if p > 0 then
    begin
      offset := p+3;
      q := PosEx(TripEnd,data,offset);
      if q > 0 then
      begin
        delta := (q-p) mod 3;
        if delta = 0 then msg := 'Hit' else msg := 'No hit';
        msg := msg + lineEnding + TripStart + ' at ' + IntToStr(p) +
               lineEnding + TripEnd + ' at ' + IntToStr(q);
        showMessage(msg);
        offset := q+3;
      end;
    end;
  until (p=0) or (q=0);
  showMessage('No more hits');
end;

Winni
Title: Re: How to determine if a result is an integer?
Post by: BornAgain on August 31, 2021, 07:47:00 am
Oh wow, now the experts are arguing over the best code! I love it. Thank you all. And yes, @cdesim, this is interesting indeed. It is called ORF finding ("Open Reading Frame", if you're interested). This helps us search for protein coding sequences in DNA. NCBI (National Center for Biotechnology Information) has a site where they do a great job and show you the ORFs in graphical form. Unfortunately, you can only search for ORFs for one sequence at a time there. I want to batch-process a number of sequences in one go, and that is why I am attempting to write this code. So yes, I guess this is my "homework." I don't write code as a routine...only when I have to. And then, clearly, I see that I can use help!

Now, for the question "What if TAG immediately follows ATG?" That's an excellent question. It would constitute a trivial case and such a substring would be rejected. In reality, one can expect a number of substrings that would satisfy all three conditions. We usually choose the longest one, as the sequence with the highest probability of coding for a protein.

Thanks for your interest.
Title: Re: How to determine if a result is an integer?
Post by: Kays on August 31, 2021, 08:23:32 am
I have a for loop where I am parsing through a string. […]
Maybe of interest for you is a forum topic from a year ago: A codon challenge (https://forum.lazarus.freepascal.org/index.php/topic,51319.0.html).

[…] Unfortunately, I cannot even compile it, as "Num := J/3" is the problem (two errors: incompatible types, not overloaded). […]
As it’s already pointed out to you, in Pascal you use the div operator (https://wiki.freepascal.org/Div) for integer/integer division yielding an integer. You can always use trunc (https://wiki.freepascal.org/Trunc)/round (https://wiki.freepascal.org/Round) to get an integer value anyway. [I just wanted to give you some links here.]

[…] a useful way to determine if a string is an integer is StrToIntDef. […]
This does not consider all possibilities to denote an integer value though. Confer Rosetta Code: Determine if a string is numeric (https://www.rosettacode.org/wiki/Determine_if_a_string_is_numeric#Free_Pascal).
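
The options mentioned in this reply can be compared side by side. The sketch below is only an illustration with made-up values: it shows the "/" operator with Frac, the div and mod operators, and TryStrToInt from SysUtils as an exception-free way to test a string. For index arithmetic such as the triplet check, mod is the more robust choice, because a floating-point test depends on the quotient being exactly representable.

```pascal
program IntegerChecks;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  J, N: Integer;
  Q: Double;
begin
  J := 9;                     // example value, assumed for illustration
  Q := J / 3;                 // "/" yields a real, so Q must be a floating-point type
  if Frac(Q) = 0 then         // no fractional part: the quotient is a whole number
    WriteLn(J, ' / 3 is the integer ', Trunc(Q));

  // Integer-only arithmetic: quotient and remainder
  WriteLn(J, ' div 3 = ', J div 3, ', remainder ', J mod 3);

  // TryStrToInt returns False instead of raising an exception
  if TryStrToInt('42', N) then
    WriteLn('"42" parses to ', N);
  if not TryStrToInt('4.2', N) then
    WriteLn('"4.2" is not an integer string');
end.
```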
Title: Re: How to determine if a result is an integer?
Post by: winni on August 31, 2021, 08:39:24 am

@winini

You are not doing it right. Try CTGCTAATGGTATGAGGACTTGGTAG and you'll see:

Code: Pascal
No hit
ATG at 8
TAG at 25
No more hits

Hi!

Try to type my name correctly.

And

(25 - 8) mod 3 = 2

Only when mod 3 = 0 is it a hit.
So this is totally OK.

Winni
Title: Re: How to determine if a result is an integer?
Post by: engkin on August 31, 2021, 09:17:44 am
...but there's no mod 3. Can we do this with a regular expression?

Not going to be the fastest, but here is one:
(ATG(?:[A-Z]{3})+TAG)



It needs a better PosEx that moves in multiples of 3, or even a SIMD one.
Title: Re: How to determine if a result is an integer?
Post by: Seenkao on August 31, 2021, 10:21:45 am
This is how you can find out whether a number is divisible by 3, provided it is an integer.
Code: Pascal
numOst := j mod 3;
if numOst = 0 then
  Num := j div 3;
...

P.S. The solution had already been posted... I hadn't read the whole thread.
Title: Re: How to determine if a result is an integer?
Post by: alpine on August 31, 2021, 11:45:20 am
...but there's no mod 3. Can we do this with a regular expression?

Not going to be the fastest, but here is one:
(ATG(?:[A-Z]{3})+TAG)

Hmm, it is a greedy one... maybe it should be something like
Code:
(ATG([ACGT]{3})+?TAG)
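
A hedged sketch of how the lazy pattern could be tried from Free Pascal, assuming the RegExpr unit (TRegExpr) that ships with FPC and the example sequence from this thread. The outer parentheses of the pattern are dropped here because the whole match is available as Match[0]. Note that ExecNext resumes after the end of the previous match, so overlapping candidates in other reading frames would not be reported by this approach.

```pascal
program RegexOrf;
{$mode objfpc}{$H+}

uses
  RegExpr;  // TRegExpr, assumed to be available from the FPC packages

const
  Seq = 'ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG';
var
  Re: TRegExpr;
begin
  Re := TRegExpr.Create;
  try
    // Lazy quantifier: stop at the first TAG that is a whole number
    // of triplets away from the ATG.
    Re.Expression := 'ATG([ACGT]{3})+?TAG';
    if Re.Exec(Seq) then
      repeat
        WriteLn('Match at ', Re.MatchPos[0], ', length ', Re.MatchLen[0],
                ': ', Re.Match[0]);
      until not Re.ExecNext;  // continues after the previous match
  finally
    Re.Free;
  end;
end.
```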
Title: Re: How to determine if a result is an integer?
Post by: alpine on August 31, 2021, 02:38:45 pm
@cdesim
BornAgain did not say the sequences can overlap or include each other:
*snip*
I need to find a substring here that 1) begins with the triplet ATG, 2) ends with the triplet TAG, and 3) has a length that is a multiple of 3.
*snip*
He also didn't say whether he is looking for the shortest or the longest possible match.
I suspect there are more domain-specific rules involved, which I'm not familiar with, so I can't say either.
As long as only the rules above were given, winni's solution should be OK.
Title: Re: How to determine if a result is an integer?
Post by: alpine on August 31, 2021, 03:23:21 pm
@cdesim
You're right, I sincerely apologize for my previous post. Apparently I didn't read the entire thread.

Edit: Same for my Reply #17 - it shall not be considered for the same reason.

But, we know that when comparing intervals, we can have the following cases:
* one is entirely outside the other
* they overlap
* they are nested

While the 1-st case is clear, what about the other two cases? Are they possible in these sequences?
When they overlap can we consider their union for a separate group?
When they are nested which one shall be considered, the inner or the outer?
Title: Re: How to determine if a result is an integer?
Post by: alpine on August 31, 2021, 03:42:15 pm
Yes, each class is independent. Each class will produce only one response (0 or more characters)

This is by design. You cannot mix the mod 1 with the mod 2, they may as well be on different planets. Amirite?
Sorry, I didn't get you. Classes? As of your use of 'class' in the:
Actually I asked those questions and we're interested in the longest match.

Here is another proposal:

Create a class with 2 lists, one containing the positions of ATG and the other of TAG.

We'll need 3 instances of this class, 1 for numbers where mod = 2, another for mod = 1 and finally another where mod = 0.

*snip*
Title: Re: How to determine if a result is an integer?
Post by: Zvoni on August 31, 2021, 04:03:04 pm
Chipping in.
From an algorithm's POV I'd use at least one (more likely two) loop(s) with PosEx, using the sample from #20: CTGCTAATGGTATGAGGACTTGGTAG

Outer loop: use PosEx to find an occurrence of "ATG", starting from the last found "ATG" position (on the first run, LastATG would be 1)
  If ATG found then inner loop: use PosEx (or maybe RegEx) to look for "TAG", starting from the position of the found "ATG"
    If TAG found then check if Length(ResultString) mod 3 = 0 and if the current length > the length of the last result
    If yes, save the result; if no, continue to look for "TAG" using the last "TAG" position as the starting point
  End inner loop
End outer loop


That said: I've no idea if i just talked crap :-)
Title: Re: How to determine if a result is an integer?
Post by: Bart on August 31, 2021, 04:24:14 pm
Thank you, @cdesim. Yes, I apologize for not being very clear. Let me explain. I am working with DNA sequences. A DNA sequence can look like this:

ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG

I need to find a substring here that 1) begins with the triplet ATG, 2) ends with the triplet TAG, and 3) has a length that is a multiple of 3.

Normally there would be a sequence that starts the DNA codon sequence (like AUG, see the related forum post about decoding codons (https://forum.lazarus.freepascal.org/index.php/topic,51319.0.html) ).
Once you know the start, you can just iterate 3 chars at a time to find ATG and then count triplets from there until you hit TAG.

Bart
Title: Re: How to determine if a result is an integer?
Post by: alpine on August 31, 2021, 06:38:49 pm
Now, for the question "What if TAG immediately follows ATG?" That's an excellent question. It would constitute a trivial case and such a substring would be rejected. In reality, one can expect a number of substrings that would satisfy all three conditions.
As an absolute rookie in that matter, I have the following question:
Is it possible to have the following sequence in a single reading frame (+1, +2, or +3)?
Code:
ATG ... ATG ... TAG
i.e., must every ATG be strictly followed by a TAG in a single reading frame?

I am already aware that two sequences can overlap into multiple reading frames, e.g. MT-ATP6, MT-ATP8 genes.

We usually choose the longest one, as the sequence with the highest probability of coding for a protein.
What do you mean by "longest one"? If every ATG is supposed to be strictly followed by a TAG (in a single frame), then shouldn't they be considered separate entities for processing?
Otherwise, if an ATG can be followed by another ATG (in a single frame), which one is considered the "longest one"?
Code:
ATG(1) ... ATG(2) ... TAG(3) ... TAG(4)
* from (1) to (4), i.e. TAG(3) ends ATG(2)
* the longer of (1) to (3) and (2) to (4), i.e. TAG(3) ends ATG(1) and TAG(4) ends ATG(2)

As I said, I'm a rookie in genetics, so forgive me for the stupid questions.  :-[
Title: Re: How to determine if a result is an integer?
Post by: winni on August 31, 2021, 07:32:49 pm
Hi!

"If my grandma had wheels she would be a bus," said former German chancellor Willy Brandt.

We should wait until BornAgain tells us the exact parameters and conditions.
Especially regarding y.ivanov's question.

Winni
Title: Re: How to determine if a result is an integer?
Post by: BornAgain on August 31, 2021, 09:09:59 pm
My goodness! Needless to say, I am very happy that this has triggered so much interest among the experts. Alas, also needless to say, I am completely lost :-).

First, thanks @Kays for the codon example (although that addresses a different problem). By the way, a triplet of DNA molecules (called nucleotides) that codes for another type of molecule called an amino acid is called a codon (for anyone who is interested). So, when strings of nucleotide triplets code for amino acids, this is called an ORF (or open reading frame). Note that I am simplifying this here and ignoring other details that are not relevant.

The first triplet (codon) of an ORF (for our purposes) has to be ATG. This denotes the start of an ORF. And then there are a bunch of codons until a termination codon is reached (and there are three options here: TAG, TGA, and TAA; I had mentioned only one of them).

Next, in an actual ORF, any other ATGs found among the codons after the initial ATG and before the termination codon are just like any other codon and do not perform the role of the start codon. Also, the moment a termination codon is encountered that is in the same frame as the start codon, the ORF terminates.

So in the example given by @y.ivanov, ATG....ATG....TAG....TAG

the ORF is likely from the first ATG to the first TAG. I say "likely" because proteins are complex molecules and so one usually assumes that the longer ORF is more likely to code for a protein than a shorter one.

There were a couple other questions about overlapping and nested ORFs. I hope my answers here help answer these questions as well. Thank you all so much for your interest.

Let me know if there are other questions.
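
The rules described in this post can be sketched as a frame-by-frame scan. This is only an illustration under stated assumptions, not the code anyone in the thread posted: the function names IsStop and LongestOrf are hypothetical, only the forward strand is scanned, the first in-frame stop codon (TAG, TGA, or TAA) closes the ORF, and an ATG directly followed by a stop codon is rejected as the trivial case.

```pascal
program OrfScan;
{$mode objfpc}{$H+}

// True when the three characters starting at P form a stop codon.
function IsStop(const S: string; P: Integer): Boolean;
var
  C: string;
begin
  C := Copy(S, P, 3);
  Result := (C = 'TAG') or (C = 'TGA') or (C = 'TAA');
end;

// Returns the longest ORF (start codon through stop codon) found in the
// three forward frames, or '' when there is none.
function LongestOrf(const S: string): string;
var
  Frame, P, StartPos, Len: Integer;
begin
  Result := '';
  for Frame := 1 to 3 do
  begin
    StartPos := 0;                 // 0 means "no open ORF in this frame"
    P := Frame;
    while P + 2 <= Length(S) do    // step by 3, so P always stays in frame
    begin
      if (StartPos = 0) and (Copy(S, P, 3) = 'ATG') then
        StartPos := P              // later ATGs inside an open ORF are ordinary codons
      else if (StartPos > 0) and IsStop(S, P) then
      begin
        Len := P + 3 - StartPos;
        // Len > 6 skips the trivial case of ATG directly followed by a stop
        if (Len > 6) and (Len > Length(Result)) then
          Result := Copy(S, StartPos, Len);
        StartPos := 0;             // ORF closed; look for the next start
      end;
      Inc(P, 3);
    end;
  end;
end;

begin
  WriteLn(LongestOrf('ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG'));
  // prints ATGATTTGGACTTGGTAG
end.
```

Because each frame is walked in steps of three, no divisibility test is needed at all; the frame itself guarantees that start and stop are a whole number of codons apart.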


Title: Re: How to determine if a result is an integer?
Post by: winni on August 31, 2021, 11:27:57 pm
Hi!

Read your text twice.
My example from #8 fits your needs, but I enhanced it a bit:

It now shows the string that meets the requirements and its length measured in triplets, including start and end.

So 2 triplets mean a string of length zero.

Code: Pascal
procedure TForm1.Button2Click(Sender: TObject);
const data : string = 'ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG';
      TripStart = 'ATG';
      TripEnd  = 'TAG';
var p, q, offset, delta: Integer;
    msg, DNAhit : string;
begin
  offset := 1;
  repeat
    p := PosEx(TripStart,data,offset);
    if p > 0 then
    begin
      offset := p+3;
      q := PosEx(TripEnd,data,offset);
      if q > 0 then
      begin
        delta := (q-p) mod 3;
        if delta = 0 then msg := 'Hit' else msg := 'No hit';
        msg := msg + lineEnding + TripStart + ' at ' + IntToStr(p) +
               lineEnding + TripEnd + ' at ' + IntToStr(q) + LineEnding;
        if delta = 0 then
        begin
          DNAhit := copy(data,p,q+3-p);
          msg := msg + DNAhit + LineEnding + 'Length Triplets: ' + IntToStr(Length(DNAhit) div 3);
        end;
        showMessage(msg);
      end;
    end;
  until (p=0) or (q=0);
  showMessage('No more hits');
end;

Last time faced with DNA was in school. Long time ago.
But always learning ....

Winni
Title: Re: How to determine if a result is an integer?
Post by: alpine on September 01, 2021, 12:34:51 am
@winni
Not quite right, IMHO. For data : string = 'ATGAATGTGGTAGTTTAGT';
Code:
No hit
ATG at 1
TAG at 11

Hit
ATG at 5
TAG at 11
ATGTGGTAG
Length Triplets: 3
No more hits
One hit. But in my understanding:
Code: Pascal
data : string = 'ATGAATGTGGTAGTTTAGT';
// should be considered as
'ATG AAT GTG GTA GTT TAG T' // hit, frame +1, length 6
'A TGA ATG TGG TAG TTT AGT' // hit, frame +2, length 3
2 hits in different frames.

It is not the best example, but should be good enough to illustrate overlapping in multiple reading frames.

Also, ATGTAG gives a hit.

Title: Re: How to determine if a result is an integer?
Post by: BornAgain on September 01, 2021, 04:08:54 am
You're right, @y.ivanov. Two hits in two frames. All three frames need to be looked into. Actually, there is yet another interesting twist, which is that DNA is double stranded (I have shown only one strand here) and the longest ORF may be on the other strand, and so that needs to be checked out too (all three frames). There is a method to figuring out what the other strand is, but I have already coded that part. Just thought I'd mention it since there is interest in understanding the biology here.
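
BornAgain's own code for the other strand is not shown in the thread. The standard way to derive it is the reverse complement: read the strand backwards and swap A with T and C with G. A minimal sketch, with a hypothetical function name and an assumed fallback of 'N' for any unexpected character:

```pascal
program RevComp;
{$mode objfpc}{$H+}

// Reverse complement: walk the input from the end and swap A<->T, C<->G.
function ReverseComplement(const S: string): string;
var
  I, N: Integer;
begin
  N := Length(S);
  SetLength(Result, N);
  for I := 1 to N do
    case S[N - I + 1] of
      'A': Result[I] := 'T';
      'T': Result[I] := 'A';
      'C': Result[I] := 'G';
      'G': Result[I] := 'C';
    else
      Result[I] := 'N';  // assumption: anything unexpected becomes N
    end;
end;

begin
  WriteLn(ReverseComplement('ATGAATGTGGTAG'));
  // prints CTACCACATTCAT
end.
```

Running the same three-frame search on the result covers the remaining three reading frames.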
Title: Re: How to determine if a result is an integer?
Post by: alpine on September 01, 2021, 01:59:08 pm
Hi there!

It is the most KISSed code I was able to produce. Hope it is correct:
Code: Pascal
program project1;

const
  Data: String = 'ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG';
  //Data: String = 'AATGAATGTGGTAGTTTAGTTATGTAG';
  CodonStart: String = 'ATG';
  CodonEnd: String = 'TAG';
  NoStart = MaxInt;

procedure FindFrames(AData: String);
var
  I, DataLen, FN: Integer;
  Start: array[0..2] of Integer;
  Starts: Integer;
begin
  DataLen := Length(AData);
  Start[0] := NoStart;
  Start[1] := NoStart;
  Start[2] := NoStart;
  Starts := 0;

  I := 1;
  while (I < DataLen - 1) do
  begin
    FN := I mod 3; // Frame number

    // For start codons
    if (AData[I] = CodonStart[1]) then
    begin
      // All 3 letters match?
      if (AData[I + 1] = CodonStart[2]) and (AData[I + 2] = CodonStart[3]) then
      begin
        // No start codon for the frame?
        if (Start[FN] = NoStart) then
        begin
          Start[FN] := I; // Save start
          Inc(Starts); // Increment Starts count
        end;
        Inc(I, 3); // Skip codon
      end
      else
        Inc(I); // Skip one
    end

    // For end codons
    else if (Starts > 0) and (AData[I] = CodonEnd[1]) then
    begin
      // All 3 letters match?
      if (AData[I + 1] = CodonEnd[2]) and (AData[I + 2] = CodonEnd[3]) then
      begin
        Inc(I, 3); // Skip codon

        // Start codon for the frame?
        if (Start[FN] <> NoStart) then
        begin

          // Show frame --------------------------------------
          if (I - Start[FN] < 3 * 3) then
            WriteLn('Empty frame.')
          else
          begin
            WriteLn('Frame +', (2 + FN) mod 3 + 1, // 0->3, 1->1, 2->2
              ', Start: ', Start[FN],
              ', End: ', I,
              ', Seq: ', Copy(AData, Start[FN], I - Start[FN]));
          end;
          //---------------------------------------------------

          // Reset start for this frame
          // Q: Shall we reset on empty frame?
          Start[FN] := NoStart;
        end;
      end
      else
        Inc(I); // Skip one
    end

    else
      Inc(I); // Skip one

  end;
end;

begin
  FindFrames(Data);
end.

Title: Re: How to determine if a result is an integer?
Post by: BornAgain on September 02, 2021, 06:48:20 am
@y.ivanov, that looks FAR more efficient than the code I am still trying to complete. I am at a stage where it outputs multiple ORFs, but I am trying to sort the output (a TStringList) to find the longest. Surprisingly, there seems to be no easy way of sorting a TStringList based on string length.

Title: Re: How to determine if a result is an integer?
Post by: bytebites on September 02, 2021, 08:45:26 am
Easy way is

Code: Pascal
function longestfirst(List: TStringList; Index1, Index2: Integer): Integer;
begin
  result := Length(list[index2]) - Length(list[index1]);
end;

astringlist.customSort(@longestfirst);
Title: Re: How to determine if a result is an integer?
Post by: alpine on September 02, 2021, 06:08:50 pm
@BornAgain
If you need just the longest sequence there is no need to put them all in TStringList and sort, you can just keep track which is the longest:
Code: Pascal
program project1;

const
  Data: String = 'ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG';
  //Data: String = 'AATGAATGTGGTAGTTTAGTTATGTAG';
  CodonStart: String = 'ATG';
  CodonEnd: String = 'TAG';
  NoStart = MaxInt;

procedure FindFrames(AData: String; out AStart, ALen: Integer);
var
  I, DataLen, FN: Integer;
  Start: array[0..2] of Integer;
  Starts: Integer;
  LLen, FLen: Integer;
begin
  DataLen := Length(AData);
  Start[0] := NoStart;
  Start[1] := NoStart;
  Start[2] := NoStart;
  Starts := 0;

  LLen := 0;
  AStart := 0;
  ALen := 0;
  I := Pos(CodonStart, AData);
  if I > 0 then while (I < DataLen - 1) do
  begin
    FN := I mod 3; // Frame number

    // For start codons
    if (AData[I] = CodonStart[1]) then
    begin
      // All 3 letters match?
      if (AData[I + 1] = CodonStart[2]) and (AData[I + 2] = CodonStart[3]) then
      begin
        // No start codon for the frame?
        if (Start[FN] = NoStart) then
        begin
          Start[FN] := I; // Save start
          Inc(Starts); // Increment Starts count
        end;
        Inc(I, 3); // Skip codon
      end
      else
        Inc(I); // Skip one
    end

    // For end codons
    else if (Starts > 0) and (AData[I] = CodonEnd[1]) then
    begin
      // All 3 letters match?
      if (AData[I + 1] = CodonEnd[2]) and (AData[I + 2] = CodonEnd[3]) then
      begin
        Inc(I, 3); // Skip codon

        // Start codon for the frame?
        if (Start[FN] <> NoStart) then
        begin
          FLen := I - Start[FN];

          // Show frame --------------------------------------
          if (FLen < 3 * 3) then
            WriteLn('Empty frame.')
          else
          begin

            // Keeping track of the longest one
            if FLen > LLen then
            begin
              LLen := FLen;
              AStart := Start[FN];
            end;

            WriteLn('Frame +', (2 + FN) mod 3 + 1, // 0->3, 1->1, 2->2
              ', Start: ', Start[FN],
              ', End: ', I,
              ', Seq: ', Copy(AData, Start[FN], FLen));

          end;
          //---------------------------------------------------

          // Reset start for this frame
          // Q: Shall we reset on empty frame?
          Start[FN] := NoStart;
        end;
      end
      else
        Inc(I); // Skip one
    end

    else
      Inc(I); // Skip one

  end;

  // Return length of the longest
  ALen := LLen;
end;

var
  BestStart, BestLen: Integer; // receive the two output arguments
begin
  FindFrames(Data, BestStart, BestLen);
  WriteLn('Longest: start ', BestStart, ', length ', BestLen);
end.
I have added two output arguments to the procedure: AStart, ALen, where the start index and the length of the longest sequence will be returned.

Also added a small modification before the scan, instead of assigning I := 1 used the I := Pos(CodonStart, AData) assuming it will be a little bit faster than the sequential scan. Not sure, though.

Further enhancements:
* Logic can be included for premature exit in case we need only the longest sequence and there is no chance to find longer one into the unscanned portion
* Boyer-Moore like pattern scan, but then two scan counters should be used because of the shift irregularities
 
Title: Re: How to determine if a result is an integer?
Post by: BornAgain on September 02, 2021, 07:56:06 pm
Thank you, @bytebites. I had tried this, but it didn't work:

Code: Pascal
for J := 0 to ORFList.Count - 2 do
begin
  s := Length(ORFList[J]);
  for K := (J+1) to ORFList.Count-1 do
  begin
    t := Length(ORFList[ORFList.Count - 1]);
    if s < t then ORFList[J] := ORFList[ORFList.Count - 1]
    else continue;
  end;
end;

I tried your code, (which seems incredibly simple!), but it wasn't clear what I pass to the function for the Index1 and Index2. Also, if the two indexes are constants, how does the single line of code in the function find the longest string?
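
On the question of what to pass for Index1 and Index2: nothing is passed by the caller. CustomSort calls the comparison function itself, many times, with different pairs of indexes, and uses the sign of the result to decide which of the two items comes first. A small self-contained sketch with made-up list contents:

```pascal
program SortByLength;
{$mode objfpc}{$H+}

uses
  Classes;

// Called by CustomSort for many pairs of indexes. The sign of the result
// decides the order: negative keeps Index1 first, positive puts Index2 first.
function LongestFirst(List: TStringList; Index1, Index2: Integer): Integer;
begin
  Result := Length(List[Index2]) - Length(List[Index1]);
end;

var
  ORFList: TStringList;
begin
  ORFList := TStringList.Create;
  try
    ORFList.Add('ATGTGGTAG');            // sample entries, assumed for illustration
    ORFList.Add('ATGAATGTGGTAGTTTAG');
    ORFList.Add('ATGTTTGGGTAG');
    ORFList.CustomSort(@LongestFirst);   // pass the function itself, without arguments
    WriteLn('Longest: ', ORFList[0]);
    // prints Longest: ATGAATGTGGTAGTTTAG
  finally
    ORFList.Free;
  end;
end.
```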
Title: Re: How to determine if a result is an integer?
Post by: BornAgain on September 02, 2021, 08:47:40 pm
Thank you, @y.ivanov. Really appreciate your help. I finally couldn't resist the temptation of just running your code. I didn't want to do so since I didn't understand much of it. But now I finally did :-). It gave me an EInOutError (File not open). I was wondering about that when I saw a WriteLn statement. Perhaps it was obvious that I needed to have opened a text file, and that the WriteLn was to this text file? Sorry for the newbie questions.
Title: Re: How to determine if a result is an integer?
Post by: alpine on September 02, 2021, 10:16:54 pm
It must be compiled as "Simple console program". Then WriteLn with no specified file as 1-st argument writes to the standard output, which is the console.

Edit:
Sorry, I should have assumed that you're a Pascal novice (you asked about the modulo operation).

The EInOutError is probably because you're trying the code inside a GUI program and the standard INPUT and OUTPUT files are not open in that case. Those are assumed when you use Read/ReadLn or Write/WriteLn without Text file as 1-st argument. See https://www.freepascal.org/docs-html/rtl/system/write.html, the second remark in yellow.

As an alternative (in GUI) you can use:
Code: Pascal
// msg declared as:
//    var msg: String;
WriteStr(msg, 'Frame +', (2 + FN) mod 3 + 1, // 0->3, 1->1, 2->2
              ', Start: ', Start[FN],
              ', End: ', I,
              ', Seq: ', Copy(AData, Start[FN], FLen));
ShowMessage(msg);
Title: Re: How to determine if a result is an integer?
Post by: BornAgain on September 03, 2021, 07:46:59 pm
Thanks, @y.ivanov. I certainly am! So thanks for the explanation. You are right - I was trying it inside the GUI program I'd written (the only type I know). I'll try your modification for the GUI. I wish there was a REAL beginner's manual or book to help someone like me understand what they are doing! For example, I have no idea what the initial parts of a program even are (type, public, private, implementation, etc.). I just blindly follow an example that works! Anyway, thanks for being patient with me.

Just checked by placing your code inside my GUI code - and it works. Thank you! When I try with a real sequence, however, it gives me an error during compilation: "String exceeds line" (the sequence is 1501 characters, but I thought "string" essentially had no limit to its length).
Title: Re: How to determine if a result is an integer?
Post by: alpine on September 03, 2021, 08:18:33 pm
@BornAgain
Probably the compiler itself has a limit when parsing the source file. Try splitting the string in smaller chunks:
Code: Pascal
Data: String =
  'ACTGCTAATGA...' +
  'TTTGGACTTGG...' +
  'TAGCGTTACCTG...' +
  ...
Title: Re: How to determine if a result is an integer?
Post by: winni on September 03, 2021, 09:34:46 pm
Hi!

Turbo Pascal had a limitation of 132 chars per line.

FPC allows AnsiStrings, which can hold up to 2 GB.

Must be enough to put a whole unit in only one line.
Delete the lineEndings ....

And enough for very long DNA strings.

Winni
Title: Re: How to determine if a result is an integer?
Post by: winni on September 03, 2021, 10:54:31 pm
Quote
Just checked by placing your code inside my GUI code - and it works. Thank you! When I try with a real sequence, however, it is giving me an error during compilation: "String exceeds line" (the sequence is 1501 characters, but I thought "string" essentially had no limit to the length).

Hi!

This message tells you that the closing ' at the end of the line is missing.
Nothing else.

Winni
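
A minimal console sketch of the two suggestions above, assuming the long sequence is stored as a typed constant. Every source line carries one complete chunk with its own opening and closing quote, and the chunks are joined with +. The constant name Data and the chunk contents are placeholders, not the real 1501-character sequence.

```pascal
program LongLiteral;
{$mode objfpc}{$H+}

const
  // Each chunk is closed with its own quote before the line ends;
  // a chunk left open at the end of a line is what the compiler
  // reports as "String exceeds line".
  Data: AnsiString =
    'ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG' +
    'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG' +
    'ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG';

begin
  // The joined constant behaves like one long string.
  WriteLn('Length = ', Length(Data));
end.
```

In a GUI program the WriteLn call would be replaced by something like ShowMessage, as discussed earlier in the thread.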
Title: Re: How to determine if a result is an integer?
Post by: BobDog on September 03, 2021, 11:26:42 pm
Going to the beginning.
Create an artificial string.
Tally all the ATG and get their positions
Tally all the TAG and get their positions
Compare positions, if they suit (multiple of three apart), then write the relevant parts of the original string.
Code: Pascal
program tally;

Type
  intArray = Array of integer;

// =========  number of partstring in somestring =============//
// Two passes: the first counts the matches, the second stores their
// 1-based positions in arr[1..count]; arr[0] holds the count.
function tally(somestring:pchar;partstring:pchar;var arr: intarray ):integer;
var
  i,j,ln,lnp,count,num:integer ;
  filler:boolean;
label
  skip ,start,return;
begin
  ln:=length(somestring);
  lnp:=length(partstring);
  filler:=false;
start:
  count:=0;
  i:=-1;
  repeat
    i:=i+1;
    if somestring[i] <> partstring[0] then goto skip ;
    if somestring[i] = partstring[0] then
    begin
      for j:=0 to lnp-1 do
      begin
        if somestring[j+i]<>partstring[j] then goto skip;
      end;
      count:=count+1;
      if filler = true then arr[count]:=i+1 ;
      i:=i+lnp-1;
    end ;
  skip:
  until i>=ln-1 ;
  SetLength(arr,count+1); // slot 0 plus count positions; size is now known, repeat the operation to fill arr
  arr[0]:=count;          // save tally in [0]
  num:=count;
  if filler=true then goto return;
  filler:=true;
  goto start;
return:
  result:=num;
end; {tally}

//=========== Use =========== //

var
  arr,arr2:intarray;
  p:pchar;
  s,s2,s3:ansistring;
  i,j,num,diff:integer;
  comma:string;
label
  lbl;

begin
  s:='ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG';
  s2:='XACTGCTAXATTTGGACTTGGTAGCGTTACCTG';
  s3:='XYACTGCTAATGATTTGGACTTGGTAGCGTTACCTG';
  for i:=1 to 3 do
  begin
    s:=s+s2+s3;
  end;

  p:=pchar(s);        // cast
  writeln('The string',' length = ',length(s));
  writeln(s);
  writeln;
  num:=tally(p,'ATG',arr);
  writeln('Tally of ATG  ',num);

  writeln('Positions:');
  for i:=1 to arr[0] do
  begin
    if i<arr[0] then comma:=',' else comma:='';
    write(arr[i],comma);
  end;
  writeln;
  writeln;

  num:=tally(p,'TAG',arr2);
  writeln('Tally of TAG  ',num);

  writeln('Positions:');
  for i:=1 to arr2[0] do
  begin
    if i<arr2[0] then comma:=',' else comma:='';
    write(arr2[i],comma);
  end;
  writeln;
  writeln;

  // pair every ATG position with every TAG position
  for i:=1 to arr[0]  do
    for j:=1 to arr2[0] do
    begin
      if ((i>arr[0]) or (j>arr2[0])) then goto lbl; // outwith bounds.
      diff:= abs(arr[i]-arr2[j]);
      if (( diff mod 3=0) and (arr[i] < arr2[j])) then writeln(s[arr[i] .. arr2[j]+2],'   ',arr[i],' to ',arr2[j]+2);
    end;
lbl:
  writeln;
  writeln('Press enter to end');

  readln;
end.
Title: Re: How to determine if a result is an integer?
Post by: alpine on September 04, 2021, 12:57:16 am
@BobDog
Projecting the i occurrences of ATG over the j occurrences of TAG (pairing each with each) in:
Code: Pascal
for i:=1 to arr[0]  do
for j:=1 to arr2[0] do
begin
begin
  ...
will give you two sequences in the case of:
Code:
ATG xxx TAG xxx TAG
instead of one (the second TAG has no start codon of its own).

The projection won't do the job; there is no point in scanning the ATG and TAG positions separately.
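
To make the objection concrete, here is a minimal single-pass sketch. It is not the code from the earlier reply; the function name FirstStopInFrame and the test sequence are made up for illustration, and only TAG is treated as a stop codon, as in the example above. For every ATG it walks forward one triplet at a time and stops at the first TAG in the same frame, so the second TAG is never paired with the start codon.

```pascal
program OrfScan;
{$mode objfpc}{$H+}

// Returns the 1-based position of the first 'TAG' that lies in the same
// reading frame as the start codon at StartPos, or 0 if there is none.
function FirstStopInFrame(const Seq: AnsiString; StartPos: Integer): Integer;
var
  P: Integer;
begin
  Result := 0;
  P := StartPos + 3;              // first triplet after the start codon
  while P + 2 <= Length(Seq) do
  begin
    if Copy(Seq, P, 3) = 'TAG' then
    begin
      Result := P;
      Exit;
    end;
    Inc(P, 3);                    // stay in frame: advance one whole triplet
  end;
end;

var
  Seq: AnsiString;
  I, StopPos: Integer;
begin
  Seq := 'ATGxxxTAGxxxTAG';       // hypothetical test sequence
  I := 1;
  while I + 2 <= Length(Seq) do
  begin
    if Copy(Seq, I, 3) = 'ATG' then
    begin
      StopPos := FirstStopInFrame(Seq, I);
      if StopPos > 0 then
        WriteLn(Copy(Seq, I, StopPos + 3 - I), '   ', I, ' to ', StopPos + 2);
    end;
    Inc(I);                       // try every position as a possible start
  end;
end.
```

Because the inner walk ends at the first in-frame stop, each start codon yields at most one segment.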
Title: Re: How to determine if a result is an integer?
Post by: BobDog on September 04, 2021, 02:34:18 am

Hello y.ivanof.
I really have to bundle the two arrays to get all starts and stops.
If I make the initial string shorter and on one line then things look clearer.
Code: Pascal
program tally;
uses
  crt;

Type
  intArray = Array of integer;

// =========  number of partstring in somestring =============//
// Two passes: the first counts the matches, the second stores their
// 1-based positions in arr[1..count]; arr[0] holds the count.
function tally(somestring:pchar;partstring:pchar;var arr: intarray ):integer;
var
  i,j,ln,lnp,count,num:integer ;
  filler:boolean;
label
  skip ,start,return;
begin
  ln:=length(somestring);
  lnp:=length(partstring);
  filler:=false;
start:
  count:=0;
  i:=-1;
  repeat
    i:=i+1;
    if somestring[i] <> partstring[0] then goto skip ;
    if somestring[i] = partstring[0] then
    begin
      for j:=0 to lnp-1 do
      begin
        if somestring[j+i]<>partstring[j] then goto skip;
      end;
      count:=count+1;
      if filler = true then arr[count]:=i+1 ;
      i:=i+lnp-1;
    end ;
  skip:
  until i>=ln-1 ;
  SetLength(arr,count+1); // slot 0 plus count positions; size is now known, repeat the operation to fill arr
  arr[0]:=count;          // save tally in [0]
  num:=count;
  if filler=true then goto return;
  filler:=true;
  goto start;
return:
  result:=num;
end; {tally}

//=========== Use =========== //

var
  arr,arr2:intarray;
  p:pchar;
  s,s2,s3:ansistring;
  i,j,num,diff:integer;
  comma:string;
label
  lbl;

begin
  s:='ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG';
  s2:='XACTGCTAXATTTGGACTTGGTAGCGTTACCTG';
  s3:='XYACTGCTAATGATTTGGACTTGGTAGCGTTACCTG';
  for i:=1 to 1 do
  begin
    s:=s+s2+s3;
  end;

  p:=pchar(s);        // cast
  writeln('The string',' length = ',length(s));
  writeln(s);
  writeln;
  num:=tally(p,'ATG',arr);
  writeln('Tally of ATG  ',num);

  writeln('Positions:');
  for i:=1 to arr[0] do
  begin
    if i<arr[0] then comma:=',' else comma:='';
    write(arr[i],comma);
  end;
  writeln;
  writeln;

  num:=tally(p,'TAG',arr2);
  writeln('Tally of TAG  ',num);

  writeln('Positions:');
  for i:=1 to arr2[0] do
  begin
    if i<arr2[0] then comma:=',' else comma:='';
    write(arr2[i],comma);
  end;
  writeln;
  writeln;

  // pair every ATG position with every TAG position
  for i:=1 to arr[0]  do
    for j:=1 to arr2[0] do
    begin
      if ((i>arr[0]) or (j>arr2[0])) then goto lbl; // outwith bounds.
      diff:= abs(arr[i]-arr2[j]);
      if (( diff mod 3=0) and (arr[i] < arr2[j])) then writeln(s[arr[i] .. arr2[j]+2],'   ',arr[i],' to ',arr2[j]+2);
    end;
lbl:
  writeln;

  // show each found segment in green inside the whole string
  for i:=1 to length(s) do
  begin
    TextColor(white);
    if (i>=8) and (i<=25) then textcolor(green);
    write(s[i]);
  end;
  writeln;

  for i:=1 to length(s) do
  begin
    TextColor(white);
    if (i>=8) and (i<=58) then textcolor(green);
    write(s[i]);
  end;
  writeln;

  for i:=1 to length(s) do
  begin
    TextColor(white);
    if (i>=8) and (i<=94) then textcolor(green);
    write(s[i]);
  end;
  writeln;

  for i:=1 to length(s) do
  begin
    TextColor(white);
    if (i>=77) and (i<=94) then textcolor(green);
    write(s[i]);
  end;
  writeln;

  writeln;
  writeln('Press enter to end');

  readln;
end.
If you could give an example of your method (console, I don't use Lazarus or forms), maybe I'll see what you mean.


Title: Re: How to determine if a result is an integer?
Post by: alpine on September 04, 2021, 10:43:07 am
@BobDog,
I already wrote it in my Reply #26. If you have time to read the last few posts, you'll see that BornAgain had some difficulty trying it in console mode.

My user name is y.ivanov, not y.ivanof
Title: Re: How to determine if a result is an integer?
Post by: BobDog on September 04, 2021, 11:30:01 am

Thanks y.ivanov, I see your method now.
Title: Re: How to determine if a result is an integer?
Post by: BobDog on September 04, 2021, 08:22:40 pm
I am unsure how long a DNA sequence can be.
Forensic detectives (on telly) don't go so deeply into the subject; catching villains is their priority.
However, as a test I have created quite a long fake sequence
and checked that my parser can cope with it.
Code: Pascal
program tally;

type
  stringsegment=record
    seg:ansistring;
    pos:int32;
    lngth:int32;
  end;

Type
  intArray = Array of int32;
  segarray = array of stringsegment;

// =========  number of partstring in somestring =============//
// Two passes: the first counts the matches, the second stores their
// 1-based positions in arr[1..count]; arr[0] holds the count.
function tally(somestring:pchar;partstring:pchar;var arr: intarray ):integer;
var
  i,j,ln,lnp,count,num:integer ;
  filler:boolean;
label
  skip ,start,return;
begin
  ln:=length(somestring);
  lnp:=length(partstring);
  filler:=false;
start:
  count:=0;
  i:=-1;
  repeat
    i:=i+1;
    if somestring[i] <> partstring[0] then goto skip ;
    if somestring[i] = partstring[0] then
    begin
      for j:=0 to lnp-1 do
      begin
        if somestring[j+i]<>partstring[j] then goto skip;
      end;
      count:=count+1;
      if filler = true then arr[count]:=i+1 ;
      i:=i+lnp-1;
    end ;
  skip:
  until i>=ln-1 ;
  SetLength(arr,count+1); // slot 0 plus count positions; size is now known, repeat the operation to fill arr
  arr[0]:=count;          // save tally in [0]
  num:=count;
  if filler=true then goto return;
  filler:=true;
  goto start;
return:
  result:=num;
end; {tally}

// sort by segment length, then by start position within equal lengths
procedure dubblesort(var arr :array of stringsegment);
var
  n1,n2:int32;
  temp:stringsegment;
begin
  for n1:=low(arr) to high(arr)-1 do
  begin
    for n2:=n1+1 to high(arr)  do
    begin
      if length(arr[n1].seg) > length(arr[n2].seg) then
      begin
        temp:=arr[n1];
        arr[n1]:=arr[n2];
        arr[n2]:=temp;
      end;
    end;
  end;
  // now sort the start pos
  for n1:=low(arr) to high(arr)-1 do
  begin
    for n2:=n1+1 to high(arr)  do
    begin
      if ((length(arr[n1].seg) = length(arr[n2].seg)) and (arr[n1].pos > arr[n2].pos))  then
      begin
        temp:=arr[n1];
        arr[n1]:=arr[n2];
        arr[n2]:=temp;
      end;
    end;
  end;
end;

procedure getsegments(s:ansistring;first:ansistring;second:ansistring;var segs:segarray);
var
  p:pchar;
  arr,arr2:intarray; // same type as the var parameter of tally
  i,j,diff,counter:int32;
label
  lbl;
begin
  counter:=0;
  diff:=0;
  p:=pchar(s);
  tally(p,pchar(first),arr);
  tally(p,pchar(second),arr2);
  for i:=1 to arr[0]  do
    for j:=1 to arr2[0] do
    begin
      if ((i>arr[0]) or (j>arr2[0])) then goto lbl; // outwith bounds.
      diff:= abs(arr[i]-arr2[j]);
      if (( diff mod 3=0) and (arr[i] < arr2[j])) then
      begin
        setlength(segs,counter+1);
        segs[counter].seg:=s[arr[i] .. arr2[j]+2];
        segs[counter].pos:=arr[i];
        segs[counter].lngth:=length(segs[counter].seg);
        counter:=counter+1;
      end;
    end;
lbl:
  dubblesort(segs);
end;

//=========== Use =========== //

var
  arr,arr2:intarray;
  p:pchar;
  s,s1,s2,s3:ansistring;
  i,j,num,diff,lastlength:int32;
  segs:array of stringsegment;
label
  lbl;

begin
  s1:= 'ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG';
  s:= 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG';
  s2:='ACTGCTAXATTTGGACTTGGTAGCGTTACCTG';
  s3:='ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG';
  for i:=1 to 70 do
  begin
    s:=(s1+s2+s3+s);
  end;

  writeln ('The string');
  writeln(s);
  writeln('Length = ',length(s));
  writeln;

  getsegments(s,'ATG','TAG',segs);

  for i:=low(segs) to high(segs) do
  begin
    if ( length(segs[i].seg) < 100 ) then
      write(segs[i].seg,' [start = ':5,segs[i].pos:4,', length = ',segs[i].lngth,']');

    // long segments are abbreviated; a blank line separates each new length
    if ( length(segs[i].seg) >= 100 ) then
    begin
      if (lastlength <>   length(segs[i].seg)) then writeln;
      write('. . . [start = ':5,segs[i].pos:4,', length = ',segs[i].lngth,']');
    end;
    lastlength:= length(segs[i].seg);
    writeln;
  end;

  writeln('Press enter to end');

  readln;
end.
Or if the end triplet is not repeated in the segment:
Code: Pascal
program tally;

type
  stringsegment=record
    seg:ansistring;
    pos:int32;
    lngth:int32;
  end;

Type
  intArray = Array of int32;
  segarray = array of stringsegment;

// =========  number of partstring in somestring =============//
// Two passes: the first counts the matches, the second stores their
// 1-based positions in arr[1..count]; arr[0] holds the count.
function tally(somestring:pchar;partstring:pchar;var arr: intarray ):int32;
var
  i,j,ln,lnp,count,num:integer ;
  filler:boolean;
label
  skip ,start,return;
begin
  ln:=length(somestring);
  lnp:=length(partstring);
  filler:=false;
start:
  count:=0;
  i:=-1;
  repeat
    i:=i+1;
    if somestring[i] <> partstring[0] then goto skip ;
    if somestring[i] = partstring[0] then
    begin
      for j:=0 to lnp-1 do
      begin
        if somestring[j+i]<>partstring[j] then goto skip;
      end;
      count:=count+1;
      if filler = true then arr[count]:=i+1 ;
      i:=i+lnp-1;
    end ;
  skip:
  until i>=ln-1 ;
  SetLength(arr,count+1); // slot 0 plus count positions; size is now known, repeat the operation to fill arr
  arr[0]:=count;          // save tally in [0]
  num:=count;
  if filler=true then goto return;
  filler:=true;
  goto start;
return:
  result:=num;
end; {tally}

// sort by segment length, then by start position within equal lengths
procedure dubblesort(var arr :array of stringsegment);
var
  n1,n2:int32;
  temp:stringsegment;
begin
  for n1:=low(arr) to high(arr)-1 do
  begin
    for n2:=n1+1 to high(arr)  do
    begin
      if length(arr[n1].seg) > length(arr[n2].seg) then
      begin
        temp:=arr[n1];
        arr[n1]:=arr[n2];
        arr[n2]:=temp;
      end;
    end;
  end;
  // now sort the start pos
  for n1:=low(arr) to high(arr)-1 do
  begin
    for n2:=n1+1 to high(arr)  do
    begin
      if ((length(arr[n1].seg) = length(arr[n2].seg)) and (arr[n1].pos > arr[n2].pos))  then
      begin
        temp:=arr[n1];
        arr[n1]:=arr[n2];
        arr[n2]:=temp;
      end;
    end;
  end;
end;

procedure getsegments(s:ansistring;first:ansistring;second:ansistring;var segs:segarray);
var
  p:pchar;
  arr,arr2:intarray; // same type as the var parameter of tally
  i,j,diff,counter,t2:int32;
label
  lbl;
begin
  counter:=0;
  diff:=0;
  p:=pchar(s);
  tally(p,pchar(first),arr);
  tally(p,pchar(second),arr2);
  for i:=1 to arr[0]  do
    for j:=1 to arr2[0] do
    begin
      if ((i>arr[0]) or (j>arr2[0])) then goto lbl; // outwith bounds.
      diff:= abs(arr[i]-arr2[j]);
      t2:=pos(second,s[arr[i] .. arr2[j]]); // 0 if the end triplet does not occur before position arr2[j]
      if (( diff mod 3=0) and (arr[i] < arr2[j])) and (t2=0)    then
      begin
        setlength(segs,counter+1);
        segs[counter].seg:=s[arr[i] .. arr2[j]+2];
        segs[counter].pos:=arr[i];
        segs[counter].lngth:=length(segs[counter].seg);
        counter:=counter+1;
      end;
    end;
lbl:
  dubblesort(segs);
end;

//=========== Use =========== //

var
  arr,arr2:intarray;
  p:pchar;
  s,s1,s2,s3:ansistring;
  i,j,num,diff,lastlength:int32;
  segs:array of stringsegment;
label
  lbl;

begin
  s1:='ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG';
  s:= 'ACTGCTAATGATTTGGAATTTGGACTTGGTAGCGTTACCTG';
  s2:='ACTGCTAATGATTTGGACTTTGGAATTTGGACTTGGGTAGCGTTACCTG';
  s3:='ACTGCTAATGATTTGGACTTGGACTTGGTAGCGTTACCTG';
  for i:=1 to 70 do
  begin
    s:=(s1+s2+s3+s);
  end;

  writeln ('The string');
  writeln(s);
  writeln('Length = ',length(s));
  writeln;

  getsegments(s,'ATG','TAG',segs);

  for i:=low(segs) to high(segs) do
  begin
    if ( length(segs[i].seg) < 100 ) then
      write(segs[i].seg,' [start = ':5,segs[i].pos:4,', length = ',segs[i].lngth,']');

    // long segments are abbreviated; a blank line separates each new length
    if ( length(segs[i].seg) >= 100 ) then
    begin
      if (lastlength <>   length(segs[i].seg)) then writeln;
      write('. . . [start = ':5,segs[i].pos:4,', length = ',segs[i].lngth,']');
    end;
    lastlength:= length(segs[i].seg);
    writeln;
  end;
  writeln;

  writeln('Press enter to end');

  readln;
end.
Title: Re: How to determine if a result is an integer?
Post by: alpine on September 07, 2021, 10:43:28 am
@BobDog,
Regarding your extensive use of goto statement instead of else clauses - I am just curious, have you started programming on a Basic language?

@BornAgain,
Quote
I wish there was a REAL beginner's manual or book to help someone like me understand what they are doing! For example, I have no idea what the initial parts of a program even are (type, public, private, implementation, etc.). I just blindly follow an example that works!
I can't recommend a single book or course on Free Pascal, but you can browse https://wiki.lazarus.freepascal.org/Pascal_and_Lazarus_Books_and_Magazines; maybe you will find something helpful there.
Since FPC is mostly compatible with Delphi, perhaps you can try some of Marco Cantù's books from the "Mastering Delphi" series, or "Essential Pascal".
Title: Re: How to determine if a result is an integer?
Post by: Jonvro on September 07, 2021, 11:40:53 am
@BornAgain: I found a useful PDF where all this stuff is explained to some extent: https://pdfcoffee.com/learningpdf-learninghowardpageclark-blanc-pdf-free.html
Title: Re: How to determine if a result is an integer?
Post by: BobDog on September 07, 2021, 12:17:57 pm
Quote
@BobDog,
Regarding your extensive use of goto statement instead of else clauses - I am just curious, have you started programming on a Basic language?
. . .
. . .
 
Hello y.ivanov.
I have no problems using goto in Pascal; with all variables already declared, there is no chance of jumping across variable creations.
In Basic, however, care must be taken in this regard.
I could of course use clauses as you say, but I find them cumbersome and bloated when a simple goto does the job.
I just like to get a task done.
Basic and Pascal are both educational languages, Pascal more for modular programming. I started both in the 1970s, but I have only recently returned to Pascal via this forum, so I am still a little bit twitchy here.
I have been using FreeBASIC for many years as a hobby coder; that's all I do regarding computers - a hobby.
Thanks.
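
For comparison only, here is a rough sketch of how the same non-overlapping tally could be written without labels, using PosEx from the StrUtils unit. The names FindAll and TIntArray are made up for this illustration, and unlike the tally function above, the array holds only positions (the count is simply its length).

```pascal
program TallyNoGoto;
{$mode objfpc}{$H+}

uses
  StrUtils; // PosEx

type
  TIntArray = array of Integer;

// Collects the 1-based positions of the non-overlapping occurrences
// of Part in Source, scanning from left to right.
function FindAll(const Part, Source: AnsiString): TIntArray;
var
  P: SizeInt;
  Count: Integer;
begin
  Result := nil;
  Count := 0;
  if Part = '' then
    Exit;
  P := PosEx(Part, Source, 1);
  while P > 0 do
  begin
    SetLength(Result, Count + 1);
    Result[Count] := P;
    Inc(Count);
    P := PosEx(Part, Source, P + Length(Part)); // continue after this match
  end;
end;

var
  Hits: TIntArray;
  I: Integer;
begin
  Hits := FindAll('ATG', 'ACTGCTAATGATTTGGACTTGGTAGCGTTACCTG');
  WriteLn('Tally of ATG  ', Length(Hits));
  for I := 0 to High(Hits) do
    WriteLn(Hits[I]);
end.
```

Growing the array one element at a time keeps the sketch short; for very long sequences a two-pass count, as in the goto version, would avoid the repeated SetLength calls.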
 



Title: Re: How to determine if a result is an integer?
Post by: BornAgain on October 15, 2021, 08:14:48 pm
Thanks to the many who contributed to answering this question. I need to do a little more work in this project and now have a question related to the same general project and so thought I'd post it right here. I remember having read somewhere that it is not good form to use "break" in a program, but I don't know what else I can use in this context. Worse, "break" seems to be misbehaving! This is what is happening:

I have a number of DNA sequences in a file. For each of them (a string) I needed to find a substring that begins with "ATG" and ends with "TGA", "TAG" or "TAA" in the same frame (that is, if you count triplets of letters, the final three need to be one of "TGA", "TAG" or "TAA"), so the length of the substring needs to be a multiple of 3. Thanks to your help, I have found such a substring for each sequence. Let us say that for the first sequence I have found a substring that ends with TGA. Now I need to find out whether there are any "TGAs", "TAGs" or "TAAs" nested within the substring in the same frame. If there is even one, the substring is of no use and I need to move to the next ATG in the DNA sequence and find the substring that begins with this ATG and ends with one of the three termination triplets. Without going into the actual code, I have this nested loop structure:


Code: Pascal
for I := 0 to SequenceFile.Count - 1 do //SequenceFile is a TStringList
begin
   .
   .
   .
   for J := 1 to NumATGs do //NumATGs is the number of ATGs in the Ith sequence
   begin
      //here I find the sequence that meets the criteria of beginning with ATG and ending with TGA
      .
      .
      .
      //here I am looking for any nested TGAs (except for the last three which are already TGA). If I find one, then I need to get out of this loop and move to the next where I need to find any nested TAGs.
      PosTGA := Pos('TGA', TryORF) + 3;//TryORF is a substring that begins with ATG and ends with one of three termination triplets
      for K := 1 to NumTGAs_Temp do
      begin
         if (PosTGA < lenTryORF) then //if the TGA is in the middle and not the end of the ORF; lenTryORF is the length of TryORF
         begin
            if ((PosTGA - 1) mod 3 = 0) then //if there is a nested ORF then move to next ATG
               break
            else PosTGA := PosEx('TGA', TempStr, PosTGA) + 3;//see if the next TGA is nested
         end
         else PosTGA := PosEx('TGA', TempStr, PosTGA) + 3;
      end;

      //here I am looking for any nested TAGs (just like above).
      PosTAG := Pos('TAG', TryORF) + 3;
      for K := 1 to NumTAGs_Temp do
      begin
         if (PosTAG < lenTryORF) then //if the TAG is in the middle and not the end of the ORF
         begin
            if ((PosTAG - 1) mod 3 = 0) then //if there is a nested ORF then move to next ATG
               break
            else PosTAG := PosEx('TAG', TempStr, PosTAG) + 3;
         end
         else PosTAG := PosEx('TAG', TempStr, PosTAG) + 3;
      end;

      //here I am looking for any nested TAAs (just like above).
      PosTAA := Pos('TAA', TryORF) + 3;
      for K := 1 to NumTAAs_Temp do
      begin
         if (PosTAA < lenTryORF) then //if the TAA is in the middle and not the end of the ORF
         begin
            if ((PosTAA - 1) mod 3 = 0) then //if there is a nested ORF then move to next ATG
               break
            else PosTAA := PosEx('TAA', TempStr, PosTAA) + 3;
         end
         else PosTAA := PosEx('TAA', TempStr, PosTAA) + 3;
      end;

   end;
end;


By the way, if my program finds a nested termination codon (say TGA in the first loop for index K), it is supposed to break from that loop and either move to the next ATG in the sequence (that is, the J loop) or move to the next K loop (to look for any nested TAGs). Instead, it moves to the I loop, abandoning the current sequence and moving to the next sequence in the file.

What am I doing wrong?
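
Independently of the Break question, the nested-stop test described above can be sketched as one helper that steps through the candidate in triplets, so that only in-frame codons are compared and no early exit from several K loops is needed. The name HasNestedStop and the sample strings are hypothetical; the sketch assumes the candidate starts at its ATG and that its length is a multiple of 3.

```pascal
program NestedStops;
{$mode objfpc}{$H+}

// True if TGA, TAG or TAA occurs in frame before the final triplet of ORF.
// Assumes ORF begins with its start codon and Length(ORF) is a multiple of 3.
function HasNestedStop(const ORF: AnsiString): Boolean;
var
  P: Integer;
  Codon: AnsiString;
begin
  Result := False;
  P := 4;                                // skip the leading ATG
  while P + 2 <= Length(ORF) - 3 do      // stop before the last triplet
  begin
    Codon := Copy(ORF, P, 3);
    if (Codon = 'TGA') or (Codon = 'TAG') or (Codon = 'TAA') then
      Exit(True);                        // one nested stop is enough
    Inc(P, 3);                           // next triplet in the same frame
  end;
end;

begin
  // 'ATGAAATGA' contains TGA at position 2, but out of frame
  WriteLn(HasNestedStop('ATGAAATGA'));
  // 'ATGTAGAAATGA' has TAG as its second in-frame triplet
  WriteLn(HasNestedStop('ATGTAGAAATGA'));
end.
```

In the J loop, a call such as `if HasNestedStop(TryORF) then Continue;` would then move on to the next ATG.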



Title: Re: How to determine if a result is an integer?
Post by: BornAgain on October 16, 2021, 12:00:09 am
@jamie, you got it right in your first educated guess! I did indeed have a "Try..finally..end" in my program, and all the code was nested between the Try and Finally. I removed it and now the program nicely skips over to the next K loop when it encounters a nested termination triplet. So thank you!

Since I'm not terribly familiar with the etiquette of the forum, am I supposed to somehow acknowledge when my question was answered? I mean, apart from a "thank you." If there is a way, I'd be happy to do it.
Title: Re: How to determine if a result is an integer?
Post by: BornAgain on October 27, 2021, 09:21:46 pm
Thank you. Will do, but I thought there was a way to acknowledge that your answer helped. Anyway, sorry for the delay in replying.