
Author Topic: Calculation of the pI of proteins according to Skoog and Wichman  (Read 7413 times)

VTwin

  • Hero Member
  • *****
  • Posts: 1215
  • Former Turbo Pascal 3 user
Re: Calculation of the pI of proteins according to Skoog and Wichman
« Reply #15 on: April 26, 2019, 03:26:38 pm »
Although it's a little hard to read the code ...

Indeed!  :D

In my experience that was not uncommon in the scientific literature of the 1980s and 1990s, though. Nowadays you are required to provide a GitHub link, which avoids tedious retyping and typos.  ;)
“Talk is cheap. Show me the code.” -Linus Torvalds

Free Pascal Compiler 3.2.2
macOS 12.1: Lazarus 2.2.6 (64 bit Cocoa M1)
Ubuntu 18.04.3: Lazarus 2.2.6 (64 bit on VBox)
Windows 7 Pro SP1: Lazarus 2.2.6 (64 bit on VBox)

Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: Calculation of the pI of proteins according to Skoog and Wichman
« Reply #16 on: April 26, 2019, 04:01:58 pm »
For starters I don't even understand the input questions.
At some point it asked me for a number: I answered 1 and the whole program just crashed.

So, if I want to test bradykinin, what should be the answers to the input questions?
Same for the other examples (with expected output) you gave.

Bart

valdir.marcos

  • Hero Member
  • *****
  • Posts: 1106
Re: Calculation of the pI of proteins according to Skoog and Wichman
« Reply #17 on: April 26, 2019, 05:23:41 pm »
There are some mistakes in your source code.
My attached example for Lazarus compiles, but it needs correct input values and expected output for testing.

Just curious: more than three decades after 1986, is this calculation still valid?

My supervisor and a prof. requested to calculate it like that (because they say it's accurate) but as I tried out two sequences it shows major differences to the ones you get from https://web.expasy.org/compute_pi/  (which is common practice). Here you just insert the letters for the amino acids if you want to double check:

Alanine          Ala   A
Arginine         Arg   R
Asparagine       Asn   N
Aspartic acid    Asp   D
Cysteine         Cys   C
Glutamine        Gln   Q
Glutamic acid    Glu   E
Glycine          Gly   G
Histidine        His   H
Isoleucine       Ile   I
Leucine          Leu   L
Lysine           Lys   K
Methionine       Met   M
Phenylalanine    Phe   F
Proline          Pro   P
Serine           Ser   S
Threonine        Thr   T
Tryptophan       Trp   W
Tyrosine         Tyr   Y
Valine           Val   V

I tried with Bradykinin (ArgProProGlyPheSerProPheArg --> RPPGFSPFR): calculated by Lazarus: pI 9.605; calculated by ExPASy: 12.00.
I also tried with a C-phycocyanin beta subunit (MetPheAspAlaPheThrLysValValSerGlnAlaAspThrArgGlyGluMetLeuSerThrAlaGlyIleAsp -->
MFDAFTKVVS QADTRGEMLS TAQID): the value calculated by Lazarus was something really high, while ExPASy comes to 4.23.
If you want to get amino acid sequences you can find all of them at https://www.uniprot.org/

I don't know how much you know about peptides: the N terminus is on the left-hand side and the C terminus on the right-hand side, just in case you want to check the code yourself.

Now I have some questions. Might it be that the sequences are too big because max_chains is set to 5?
Also, it is quite difficult to enter the number of tyrosines because the field is at the bottom of the box. How do I move it further up?
Last but not least, I want to change the way the program asks for the C and N subunits (first all the N subunits and afterwards all the C subunits). Is that possible?

Thank you all so far! It's really a great help!  :-[ :-*
Be aware that I just took your source code, put it on Lazarus, compared it to your pdf file's source code and fixed it until it got compiled.
I don't know enough Bio-informatics to fix its logic.

It seems you are going too fast...
First, in my view, you should learn a bit more about Free Pascal and Lazarus.
Second, you should learn how to debug your programs.
Third, you should understand Skoog and Wichman's algorithm for calculating the pI of proteins well enough to perform it using only paper and pencil.
And finally, you should double check my solution to fix its errors and logic.

As an example, you could start here:
Devstructor.com/DelphiTutorialsTk
https://www.youtube.com/channel/UCGT_qnRHoLnmrUxZrgaDSUQ

And here:
http://wiki.lazarus.freepascal.org/Main_Page
https://forum.lazarus.freepascal.org/index.php?action=search


And here:

Tutorials
https://forum.lazarus.freepascal.org/index.php/topic,42003.0.html

Best Tutorial
https://forum.lazarus.freepascal.org/index.php/topic,45078.msg317864.html#msg317864

Pascal Books
https://forum.lazarus.freepascal.org/index.php/topic,43921.0.html

New user introduction
https://forum.lazarus.freepascal.org/index.php/topic,43494.0.html

Conversion from Delphi
https://forum.lazarus.freepascal.org/index.php/topic,42231.msg294453.html#msg294453

Book recommendation on programming language concepts and design ?
https://forum.lazarus.freepascal.org/index.php/topic,42156.0.html

Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: Calculation of the pI of proteins according to Skoog and Wichman
« Reply #18 on: April 27, 2019, 03:27:00 pm »
For example, imagine a protein consisting of all the charged amino acids (arg, asp, cys, glu, his, lys, tyr), each of them only once, with the total peptide consisting only of these 7 chains: it calculates a pI of 1.967.

So that peptide would be arg-asp-cys-glu-his-lys-tyr ?

If I enter the order RDCEHKY (short forms of the amino acid short forms  :D) in expasy it comes to a pI of 6.74.

It seems that if I enter this in any order of amino-acids, the result is always the same.
Does that make sense?

Bart

Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: Calculation of the pI of proteins according to Skoog and Wichman
« Reply #19 on: April 27, 2019, 03:49:04 pm »
Code: Pascal
unit Unit1;

// Calculation of the pI of proteins according to Skoog and Wichman
// https://forum.lazarus.freepascal.org/index.php/topic,45188.0.html
...
end.

I refactored your code a bit.
Used SpinEdits for input.
Removed the unnecessary Temp variable.
Replaced Check() with LowerCase.
Renamed Input to GetInput (because CodeTools doesn't agree with it).
Removed "C_term[n] := TermC" after the first Findend call in GetInput, because it is not there in the original source code.
Added a sanity check in Findend (previously any input for Test would be processed).
Input could be made more robust by letting the user select amino acids from a drop-down list instead of a free-text edit field.

However: I have no idea what all the input variables mean whatsoever.
The original code is horrible. Global variables all over the place.
Why does it compute an answer if all the numbers of aminoacids are set to zero?

@TS: how would I enter  Bradykinin (ArgProProGlyPheSerProPheArg) into the attached program (GUI version, see attachment) or the original one?
I have absolutely no idea.

Bart

Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: Calculation of the pI of proteins according to Skoog and Wichman
« Reply #20 on: April 28, 2019, 01:35:04 pm »
Refactored some more.
No more global variables!
Procedures that use the form are now method of the form.
Renamed some procedures.
Made the NtermpK and CtermpK constant arrays.

As a result the code flow is better to follow.
Still, I have absolutely no idea how it is supposed to work.

It will make testing a bit easier though.

Bart

Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: Calculation of the pI of proteins according to Skoog and Wichman
« Reply #21 on: April 28, 2019, 02:01:40 pm »
@topic starter: since the (original) code gives results that differ from your reference tool, it might be worthwhile to check whether the Skoog and Wichman article has ever been cited or followed up.
The original code as it appeared in that article may be flawed, and was perhaps corrected later on?

Bart

Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: Calculation of the pI of proteins according to Skoog and Wichman
« Reply #22 on: April 28, 2019, 02:18:47 pm »
Later research seems to have used the calculations of Skoog and Wichman. See: http://www.prodoric.de/download/hiller03.pdf (Section: Materials and Methods) or https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=31&ved=2ahUKEwi2k9a84vLhAhWKr6QKHUpDB7g4HhAWMAB6BAgAEAI&url=https%3A%2F%2Fpermegear.com%2F%3Fmdocs-file%3D8621&usg=AOvVaw0UmDnZ4ksa_NjBUICxWsYN (section 3.1).

So, there seems to be some validity to this program?

Bart

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Calculation of the pI of proteins according to Skoog and Wichman
« Reply #23 on: April 28, 2019, 02:44:19 pm »
On the other hand, based on ExPASy's docs:
Quote
1-Protein pI is calculated using pK values of amino acids described in Bjellqvist et al., which were defined by examining polypeptide migration between pH 4.5 to 7.3 in an immobilised pH gradient gel environment with 9.2M and 9.8M urea at 15°C or 25°C. Prediction of protein pI for highly basic proteins is yet to be studied and it is possible that current Compute pI/Mw predictions may not be adequate for this purpose.

2-The buffer capacity of a protein will affect the accuracy of its predicted pI, with poor buffer capacity leading to greater error in prediction (Bjellqvist et al.). Because of this, pI predictions for small proteins can be problematic.

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Calculation of the pI of proteins according to Skoog and Wichman
« Reply #24 on: April 30, 2019, 12:08:29 am »
There are 20 amino acids. I suggest changing TAminoAcidEnums to include them all:
Code: Pascal
  TAminoAcidEnums = (Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, Val);

One letter code for amino acids could be used to count them:
Code: Pascal
function Amino1ToCounts(AStr: String): TAminoAcidCounts;
var
  LChar: Char;
begin
  Result := Default(TAminoAcidCounts);
  for LChar in AStr do
  begin
    case LChar of
    'A':inc(Result[Ala]);
    'R':inc(Result[Arg]);
    'N':inc(Result[Asn]);
    'D':inc(Result[Asp]);
    'C':inc(Result[Cys]);
    'Q':inc(Result[Gln]);
    'E':inc(Result[Glu]);
    'G':inc(Result[Gly]);
    'H':inc(Result[His]);
    'I':inc(Result[Ile]);
    'L':inc(Result[Leu]);
    'K':inc(Result[Lys]);
    'M':inc(Result[Met]);
    'F':inc(Result[Phe]);
    'P':inc(Result[Pro]);
    'S':inc(Result[Ser]);
    'T':inc(Result[Thr]);
    'W':inc(Result[Trp]);
    'Y':inc(Result[Tyr]);
    'V':inc(Result[Val]);
    else
       // Maybe report an error
    end;
  end;
end;

It would be nice to provide a way to switch between the one-letter and three-letter encodings, Amino3To1 and Amino1To3:
Code: Pascal
// Converts a three-letter sequence such as 'ArgProPro' to one-letter codes.
// Note: "case" on a string is a Free Pascal feature.
function Amino3To1(AStr: string):string;
var
  i:integer;
  LStr: string;
begin
  Result := '';
  i := 1;
  // "<=" (not "<") so that a trailing one-character remainder is
  // reported as '?' instead of being dropped silently
  while i <= Length(AStr) do
  begin
    LStr := lowercase(copy(AStr, i, 3));
    case LStr of
    'ala':Result := Result + 'A';
    'arg':Result := Result + 'R';
    'asn':Result := Result + 'N';
    'asp':Result := Result + 'D';
    'cys':Result := Result + 'C';
    'gln':Result := Result + 'Q';
    'glu':Result := Result + 'E';
    'gly':Result := Result + 'G';
    'his':Result := Result + 'H';
    'ile':Result := Result + 'I';
    'leu':Result := Result + 'L';
    'lys':Result := Result + 'K';
    'met':Result := Result + 'M';
    'phe':Result := Result + 'F';
    'pro':Result := Result + 'P';
    'ser':Result := Result + 'S';
    'thr':Result := Result + 'T';
    'trp':Result := Result + 'W';
    'tyr':Result := Result + 'Y';
    'val':Result := Result + 'V';
    else
       Result := Result + '?';   // unknown or incomplete code
    end;
    inc(i,3);
  end;
end;

// Converts one-letter codes (upper case only) to a three-letter sequence.
function Amino1To3(AStr: string):string;
var
  i:integer;
begin
  Result := '';
  for i := 1 To Length(AStr) do
  begin
    case AStr[i] of
    'A':Result := Result + 'Ala';
    'R':Result := Result + 'Arg';
    'N':Result := Result + 'Asn';
    'D':Result := Result + 'Asp';
    'C':Result := Result + 'Cys';
    'Q':Result := Result + 'Gln';
    'E':Result := Result + 'Glu';
    'G':Result := Result + 'Gly';
    'H':Result := Result + 'His';
    'I':Result := Result + 'Ile';
    'L':Result := Result + 'Leu';
    'K':Result := Result + 'Lys';
    'M':Result := Result + 'Met';
    'F':Result := Result + 'Phe';
    'P':Result := Result + 'Pro';
    'S':Result := Result + 'Ser';
    'T':Result := Result + 'Thr';
    'W':Result := Result + 'Trp';
    'Y':Result := Result + 'Tyr';
    'V':Result := Result + 'Val';

    else
       Result := Result + '???';   // unknown letter
    end;
  end;
end;
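
As a possible alternative to the two long case statements, the same mapping can be kept in a pair of lookup tables. This is only a sketch: the names OneLetter, ThreeLetter, ThreeToOne and OneToThree are made up for this example and are not part of the attached project. It uses the Bradykinin sequence from earlier in the thread as input.

```pascal
program AminoCodes;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  // Hypothetical lookup tables: position k in OneLetter matches ThreeLetter[k].
  OneLetter: string = 'ARNDCQEGHILKMFPSTWYV';
  ThreeLetter: array[1..20] of string = (
    'Ala', 'Arg', 'Asn', 'Asp', 'Cys', 'Gln', 'Glu', 'Gly', 'His', 'Ile',
    'Leu', 'Lys', 'Met', 'Phe', 'Pro', 'Ser', 'Thr', 'Trp', 'Tyr', 'Val');

// Three-letter sequence -> one-letter sequence; unknown codes become '?'.
function ThreeToOne(const ASeq: string): string;
var
  i, k: Integer;
  Code: string;
  Found: Boolean;
begin
  Result := '';
  i := 1;
  while i <= Length(ASeq) do
  begin
    Code := Copy(ASeq, i, 3);
    Found := False;
    for k := 1 to 20 do
      if SameText(Code, ThreeLetter[k]) then   // case-insensitive compare
      begin
        Result := Result + OneLetter[k];
        Found := True;
        Break;
      end;
    if not Found then
      Result := Result + '?';
    Inc(i, 3);
  end;
end;

// One-letter sequence -> three-letter sequence; unknown letters become '???'.
function OneToThree(const ASeq: string): string;
var
  i, k: Integer;
begin
  Result := '';
  for i := 1 to Length(ASeq) do
  begin
    k := Pos(UpCase(ASeq[i]), OneLetter);
    if k > 0 then
      Result := Result + ThreeLetter[k]
    else
      Result := Result + '???';
  end;
end;

begin
  WriteLn(ThreeToOne('ArgProProGlyPheSerProPheArg'));  // RPPGFSPFR
  WriteLn(OneToThree('RPPGFSPFR'));  // ArgProProGlyPheSerProPheArg
end.
```

With tables, adding the ambiguity codes (asx/B, glx/Z) would mean extending two constants rather than two case statements.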

It seems that only seven of them (arg, asp, cys, glu, his, lys, tyr) hold a charge (arg, his and lys are the positive ones). Which means that instead of:
Code: Pascal
   pKa3_arg       = 12.5;
   pKa3_asp       = 3.9;
   pKa3_cys       = 8.3;
   pKa3_glu       = 4.3;
   pKa3_his       = 6.0;
   pKa3_lys       = 10.5;
   pKa3_tyr       = 10.1;

An array could be used; most of its elements are zeros, and three are positive values. Probably not needed.
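
A rough sketch of what such an array might look like, indexed by the enumeration suggested above. The names TAminoAcid, SideChainPK and SideChainSign are assumptions for this example only; the pKa numbers are the seven constants listed above, and 0.0 is used here merely as a marker for "no ionisable side chain". The second array is one reading of the "three are positive" remark: it stores the sign of the charge.

```pascal
program SideChainTable;
{$mode objfpc}{$H+}

type
  TAminoAcid = (Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile,
                Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, Val);

const
  // Side-chain pKa values from the constants above; 0.0 = not ionisable.
  SideChainPK: array[TAminoAcid] of Double = (
    0.0, 12.5, 0.0, 3.9, 8.3, 0.0, 4.3, 0.0, 6.0, 0.0,
    0.0, 10.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.1, 0.0);

  // +1 = basic side chain, -1 = acidic side chain, 0 = uncharged.
  SideChainSign: array[TAminoAcid] of Integer = (
    0, 1, 0, -1, -1, 0, -1, 0, 1, 0,
    0, 1, 0, 0, 0, 0, 0, 0, -1, 0);

var
  AA: TAminoAcid;
  Charged: Integer = 0;
begin
  for AA := Low(TAminoAcid) to High(TAminoAcid) do
    if SideChainSign[AA] <> 0 then
    begin
      Inc(Charged);
      // Free Pascal can write the name of an enumeration value directly.
      WriteLn(AA, '  pKa = ', SideChainPK[AA]:0:1, '  sign = ', SideChainSign[AA]);
    end;
  WriteLn('Charged side chains: ', Charged);
end.
```

Indexing by the enumeration keeps each value tied to its amino acid by name, so a lookup such as SideChainPK[Lys] cannot be off by one the way a numeric index could.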

Edit:
Code: Pascal
    alanine,                     ala, A
    arginine,                    arg, R
    asparagine,                  asn, N
    aspartic acid,               asp, D
  //asparagine or aspartic acid, asx, B
    cysteine,                    cys, C
    glutamic acid,               glu, E
    glutamine,                   gln, Q
  //glutamine or glutamic acid,  glx, Z
    glycine,                     gly, G
    histidine,                   his, H
    isoleucine,                  ile, I
    leucine,                     leu, L
    lysine,                      lys, K
    methionine,                  met, M
    phenylalanine,               phe, F
    proline,                     pro, P
    serine,                      ser, S
    threonine,                   thr, T
    tryptophan,                  trp, W
    tyrosine,                    tyr, Y
    valine,                      val, V
« Last Edit: April 30, 2019, 12:12:06 am by engkin »

howardpc

  • Hero Member
  • *****
  • Posts: 4144
Re: Calculation of the pI of proteins according to Skoog and Wichman
« Reply #25 on: April 30, 2019, 06:09:08 pm »
It seems that if I enter this in any order of amino-acids, the result is always the same.
Does that make sense?
Yes, the order of the amino acid in the peptide sequence is immaterial. Wherever it sits in the chain, if it has a charged side chain, it makes the same contribution to the overall charge.
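
To illustrate why the order does not matter, here is a minimal sketch that works only from residue counts. It is not the Skoog and Wichman program and not the code in the attached project: it uses the seven side-chain pKa values posted earlier in the thread and deliberately ignores the N- and C-terminal groups, so its numbers are not comparable with the pI values quoted above. All identifiers (TCharged, CountCharged, NetCharge, ZeroChargePH) are invented for the example. Each group contributes a fractional charge according to the Henderson-Hasselbalch relation, and the pH of zero net charge is found by bisection.

```pascal
program NetChargeSketch;
{$mode objfpc}{$H+}

uses
  Math;

type
  TCharged = (cArg, cAsp, cCys, cGlu, cHis, cLys, cTyr);
  TCounts = array[TCharged] of Integer;

const
  // Side-chain pKa values as posted earlier in the thread.
  PK: array[TCharged] of Double = (12.5, 3.9, 8.3, 4.3, 6.0, 10.5, 10.1);
  // +1 = basic group, -1 = acidic group.
  Polarity: array[TCharged] of Integer = (1, -1, -1, -1, 1, 1, -1);

// Counts the charged residues in a one-letter sequence; other letters are skipped.
function CountCharged(const ASeq: string): TCounts;
var
  G: TCharged;
  Ch: Char;
begin
  for G := Low(TCharged) to High(TCharged) do
    Result[G] := 0;
  for Ch in ASeq do
    case Ch of
      'R': Inc(Result[cArg]);
      'D': Inc(Result[cAsp]);
      'C': Inc(Result[cCys]);
      'E': Inc(Result[cGlu]);
      'H': Inc(Result[cHis]);
      'K': Inc(Result[cLys]);
      'Y': Inc(Result[cTyr]);
    end;
end;

// Net side-chain charge at a given pH (terminal groups NOT included).
function NetCharge(const Counts: TCounts; pH: Double): Double;
var
  G: TCharged;
begin
  Result := 0.0;
  for G := Low(TCharged) to High(TCharged) do
    if Polarity[G] > 0 then
      Result := Result + Counts[G] / (1.0 + Power(10.0, pH - PK[G]))
    else
      Result := Result - Counts[G] / (1.0 + Power(10.0, PK[G] - pH));
end;

// Bisection on pH 0..14; the net charge falls monotonically as pH rises.
function ZeroChargePH(const Counts: TCounts): Double;
var
  PHMin, PHMax, PHMid: Double;
  i: Integer;
begin
  PHMin := 0.0;
  PHMax := 14.0;
  for i := 1 to 60 do
  begin
    PHMid := (PHMin + PHMax) / 2;
    if NetCharge(Counts, PHMid) > 0 then
      PHMin := PHMid
    else
      PHMax := PHMid;
  end;
  Result := (PHMin + PHMax) / 2;
end;

var
  A, B: TCounts;
begin
  A := CountCharged('RDCEHKY');
  B := CountCharged('YKHECDR');   // same residues, reversed order
  WriteLn('Net charge at pH 7.0, order 1: ', NetCharge(A, 7.0):0:4);
  WriteLn('Net charge at pH 7.0, order 2: ', NetCharge(B, 7.0):0:4);
  WriteLn('Side-chain-only zero-charge pH: ', ZeroChargePH(A):0:2);
end.
```

Because only the counts enter NetCharge, both orders print the same charge. A fuller calculation would add the two terminal groups as one extra basic and one extra acidic term with their own pK values.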

I've included a few pre-built peptide/protein examples in the attached project, and three different sources for the experimentally obtained pKa values. Although there are slight variations between the sets of data, it rarely makes more than a difference of 0.1 to the theoretical pI calculated value. Nevertheless, I think the calculated pI value should only ever have 2 significant digits. Values such as 8.53 have a false precision, not warranted by the spread of experimental values for the pKa data among the masses of published tables for these determinations.

The visualisation of the amino acid sequence in the attached project is shown starting at the NH2 end and ending at the COOH end. Of course, if these molecules are in aqueous solution, they most likely exist as zwitterions, and should then  be shown as H3N+... ...COO-

Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: Calculation of the pI of proteins according to Skoog and Wichman
« Reply #26 on: May 01, 2019, 02:13:34 pm »
I did some more refactoring on the original code:
  • Separated all calculation-related stuff into a separate unit
  • Renamed almost all types.
  • The type TAminoAcid now in fact represents an amino acid (as an enum)
  • Redesigned the C_Term and N_Term arrays into an array of records.
  • Got rid of all global variables
  • Exposed only those types the end user needs
  • Exposed amino acid names for convenience
  • Redesigned the GUI
  • Made a custom dialog that asks for terminating amino acids (foolproof input)
  • Use (Float)SpinEdits for input variables

Bart

JLWest

  • Hero Member
  • *****
  • Posts: 1293
Re: Calculation of the pI of proteins according to Skoog and Wichman
« Reply #27 on: May 01, 2019, 07:39:26 pm »
@Bart

Wow

Downloaded to look at the code. A Lot neater than mine.

Nice, Very nice.
FPC 3.2.0, Lazarus IDE v2.0.4
 Windows 10 Pro 32-GB
 Intel i7 770K CPU 4.2GHz 32702MB Ram
GeForce GTX 1080 Graphics - 8 Gig
4.1 TB

Bart

  • Hero Member
  • *****
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: Calculation of the pI of proteins according to Skoog and Wichman
« Reply #28 on: May 01, 2019, 11:21:32 pm »
One more adjustment.
The internal type TpK_Values is now an array[TAminoAcid] of Double, instead of array[1..20].

Bart

s_bue

  • Newbie
  • Posts: 6
Re: Calculation of the pI of proteins according to Skoog and Wichman
« Reply #29 on: May 05, 2019, 07:42:52 pm »
One more adjustment.
The internal type TpK_Values is now an array[TAminoAcid] of Double, instead of array[1..20].

Bart

WOW That's such a great job you did there, and all the others as well! In fact I started trying to do more by myself but I had a tough week at work  %) and therefore not a lot of time. Now I come back to the topic to ask more questions and look at this! Thanks a lot everyone!  :-*

 
