Lazarus

Title: Question about statistics
Post by: Bart on February 23, 2020, 10:56:50 pm
Hi,

Sorry for being off-topic here.
(Well, I do plan to program this in Pascal if at all applicable).

It has been more than 20 years since I was taught basic statistics at uni.

Given that I have a sample of patients.
Some of them have characteristic X (at start of treatment), some don't.
They all receive the same treatment.
Some have the desired outcome, some don't.

                   Desired Outcome   Not Desired Outcome
Has X                     A                   B
Does not have X           C                   D

(Unfortunately the table does not show any borders, bbcode doesn't seem to support that.)

A, B, C and D are the respective numbers of patients.

What test should I use to test whether or not the presence of X has a significant effect on whether or not we achieve the desired outcome?

The total number of patients (A+B+C+D) is somewhere around 150 (and may grow to 300 or more).

Bart
Title: Re: Question about statistics
Post by: TheLastCayen on February 23, 2020, 11:47:32 pm
Not too sure if I follow your thought, but I assume you have to store the data in a database of some sort...
If that's the case, you can just create the field HasX as a boolean (if supported by your SQL).

With SQLite, for example, you would create the field as an INTEGER (boolean not supported): 0 = false, 1 = true
Code: SQL
  ...
  CREATE TABLE "Data"(
    "ID" INTEGER NOT NULL PRIMARY KEY,
    "HasX" INTEGER,
    "DesiredOutcome" INTEGER,
    "NotDesiredOutcome" INTEGER,
    ...
  );

I assumed you will store the data in the fields "DesiredOutcome" and "NotDesiredOutcome" as integers, since you mention the 150 to 300 results... but you could also use another boolean type and save the numbers in another variable... The magic happens when you query your data from the database.

Code: SQL
  SELECT "DesiredOutcome", "NotDesiredOutcome" FROM "Data" WHERE "HasX" = 0

This will only return the result for your patients who don't have X.

Code: SQL
  SELECT "DesiredOutcome", "NotDesiredOutcome" FROM "Data" WHERE "HasX" = 1

And this will return the result for patients who had X...

I would give you more details about how to do it in FPC, but since you don't use it at the moment, this link will be more useful:

https://wiki.freepascal.org/Lazarus_Database_Overview
Title: Re: Question about statistics
Post by: Bart on February 23, 2020, 11:54:53 pm
Determining how many patients are in each cell of the table is not the issue (that's trivial, even if my data are not in a database-like structure and I cannot use SQL).

Calculating whether there is a significant relationship between having X (or not) and having the desired outcome is.

Bart
Title: Re: Question about statistics
Post by: MSABC on February 24, 2020, 12:00:08 am
Hi all,

I think this question relates to statistics rather than programming.
There are several methods to prove significance for different situations.
See:
https://en.wikipedia.org/wiki/Statistical_significance

I'm not really good at this science, but in this case something like Student's t-test may be the solution.


Title: Re: Question about statistics
Post by: MaxCuriosus on February 24, 2020, 12:02:34 am
In my opinion there is no test, because (most likely for logical or ethical reasons) you cannot "inverse" (which does not mean stop) a treatment and see how many from A or B would have the characteristic X or not X. So there is no correlation between the two sets, and thus no meaningful test.
Title: Re: Question about statistics
Post by: TheLastCayen on February 24, 2020, 12:49:15 am
If you don't use a database, you can still create an array of your own type to store HasX, DesiredOutcome and NotDesiredOutcome. It would not affect the result of A+B+C+D...
Code: Pascal
  ...
  Result := 0;
  // Sum both outcome counts over every record to get A+B+C+D.
  for I := 0 to High(Data) do
    Result := Result + Data[I].DesiredOutcome + Data[I].NotDesiredOutcome;

In the same way, with a few clever If statements, you can now count how many Data.HasX records have the desired outcome and compare that with whatever you need... It will tell you whether X had an impact or not, and the result...
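
A minimal sketch of that counting step, assuming one record per patient with two boolean fields. The names TPatient, TTable2x2, MakePatient and CountTable are hypothetical and not from the thread; the cells A to D follow the table in the first post.

```pascal
program CountCells;
{$mode objfpc}{$H+}

type
  // Hypothetical per-patient record; the field names are assumptions.
  TPatient = record
    HasX: Boolean;
    DesiredOutcome: Boolean;
  end;

  // The four cells of the 2x2 table from the first post.
  TTable2x2 = record
    A, B, C, D: Integer;
  end;

// Small helper to build a patient record.
function MakePatient(AHasX, ADesired: Boolean): TPatient;
begin
  Result.HasX := AHasX;
  Result.DesiredOutcome := ADesired;
end;

// Walk the data once and increment the matching cell for each patient.
function CountTable(const Patients: array of TPatient): TTable2x2;
var
  I: Integer;
begin
  Result.A := 0;
  Result.B := 0;
  Result.C := 0;
  Result.D := 0;
  for I := 0 to High(Patients) do
    if Patients[I].HasX then
    begin
      if Patients[I].DesiredOutcome then Inc(Result.A) else Inc(Result.B);
    end
    else
    begin
      if Patients[I].DesiredOutcome then Inc(Result.C) else Inc(Result.D);
    end;
end;

var
  Sample: array[0..4] of TPatient;
  T: TTable2x2;
begin
  // Made-up sample data, only to exercise the function.
  Sample[0] := MakePatient(True, True);
  Sample[1] := MakePatient(True, True);
  Sample[2] := MakePatient(True, False);
  Sample[3] := MakePatient(False, True);
  Sample[4] := MakePatient(False, False);
  T := CountTable(Sample);
  WriteLn('A=', T.A, ' B=', T.B, ' C=', T.C, ' D=', T.D);
end.
```

The counts alone do not say whether a difference is significant; they are only the input for whatever test is chosen.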

This is how I would do it as a programmer, but let's see if anyone else can come up with a mathematical way ;) Just don't restrict yourself ;)

Good luck
Title: Re: Question about statistics
Post by: Bart on February 24, 2020, 11:00:39 am
Quote from: TheLastCayen on February 24, 2020, 12:49:15 am
If you don't use a database, you can still create an array of your own type to store HasX, DesiredOutcome and NotDesiredOutcome. It would not affect the result of A+B+C+D...

As I said before: that part is not the problem.

Bart
Title: Re: Question about statistics
Post by: Bart on February 24, 2020, 11:15:44 am
Quote from: MaxCuriosus on February 24, 2020, 12:02:34 am
In my opinion there is no test, because (most likely for logical or ethical reasons) you cannot "inverse" (which does not mean stop) a treatment and see how many from A or B would have the characteristic X or not X. So there is no correlation between the two sets, and thus no meaningful test.

That doesn't really make sense to me.
Let's say that my patients have breast cancer.
(To be clear: this is absolutely NOT the case!)

Condition X could then be: having a BRCA mutation.
Treatment for all patients is drug Y (which looked like a potentially new and exciting drug in preliminary tests).

                      No metastasis present   Metastasis present
Has BRCA                       100                    12
Does not have BRCA             250                    96

There seems to be a relationship between having BRCA and a good outcome of the treatment.
If this turns out to be significant (and is confirmed in larger studies), this would imply that drug Y is not suitable for breast cancer patients who do not have the BRCA mutation.

And before you even start asking:

Bart
Title: Re: Question about statistics
Post by: wp on February 24, 2020, 11:59:51 am
If you can read German, maybe these lecture notes are useful for you: http://www.math.uni-duesseldorf.de/~braun/bio1112/printout1318.pdf - they explain the statistics for evaluating a blood pressure medication in comparison to a placebo. The only difference to your problem is that the blood pressure is measured as a continuous number, while your results are given as booleans ("Has X", "Does not have X"). But maybe you can generalize the calculation to the values 1 or 0.

Note that fcl's NumLib has functions to calculate the t statistics, tdist(), and its inverse, invtdist.
Title: Re: Question about statistics
Post by: Bart on February 24, 2020, 02:28:56 pm
Quote from: wp on February 24, 2020, 11:59:51 am
If you can read German, maybe these lecture notes are useful for you: http://www.math.uni-duesseldorf.de/~braun/bio1112/printout1318.pdf ...
I'll certainly have a look at it.

Quote from: wp on February 24, 2020, 11:59:51 am
Note that fcl's NumLib has functions to calculate the t statistics, tdist(), and its inverse, invtdist.

Now you tell me.
I already implemented a function that returns Min, Max, Mean, Median, Std. dev.
The math unit has those as separate functions, so if you need more than one of them, it will run through the loop many times.

The numlib library needs to be rewritten, so that all functions have a self-explanatory name.

Bart
Title: Re: Question about statistics
Post by: wp on February 24, 2020, 04:01:41 pm
Quote from: Bart on February 24, 2020, 02:28:56 pm
The numlib library needs to be rewritten, so that all functions have a self-explanatory name.
Yes, but it must be rewritten from zero because - believe it or not - there is some software out there which uses it. And writing a full math package is a huge task... To avoid this I wrote the NumLib wiki article (https://wiki.lazarus.freepascal.org/NumLib_Documentation) - at least something.

As for statistics programs: Recently I came across this post: https://forum.lazarus.freepascal.org/index.php/topic,8437.msg40697.html. It introduces LazStats, a statistics program written in Lazarus. I copied the sources to my GitHub (https://github.com/wp-xyz/LazStats) and began to improve the layout and several minor weaknesses. Seeing that the code is full of GoTo's and endless procedures, this would be another project worth rewriting, but knowing my poor knowledge of statistics I won't do this. Again, it is better than nothing for someone who cannot use the expensive statistics programs.
Title: Re: Question about statistics
Post by: MaxCuriosus on February 24, 2020, 10:29:36 pm
Bart,
there is a confusion of logic in your table:

If Set-A has condition X (mutation), saying that the Set-B condition is not-X is misleading and suggests that Set-B has either no condition or that its condition is somehow the opposite of Set-A's. The correct way of saying it is that Set-B has a condition Y, with Y<>X (since Set-B has the same disease, it must have a condition that leads to the undesired outcome, presence of metastasis).

That leads to this revised table:


                            Probability of desired outcome (no metastasis)
Set      Condition          before treatment        after treatment
Set-A    X                  low                     100/112
Set-B    Y                  low                     250/96

Since there is no correlation between conditions X and Y, there isn't any between the two sets of clinical trials, and therefore your search for a "test" is in vain.

The only trivial conclusion is that the treatment is more effective when the disease is caused by one condition vs. the other.
Title: Re: Question about statistics
Post by: Bart on February 24, 2020, 10:46:26 pm
Quote from: MaxCuriosus on February 24, 2020, 10:29:36 pm
The only trivial conclusion is that the treatment is more effective when the disease is caused by one condition vs. the other.

I disagree.
Has X vs. does not have X is a prognostic factor for the treatment in question.
The table is a 2 x 2 dichotomous one: both the explanatory variable (has X / does not have X) and the outcome (no metastases / metastases) have only 2 options.

From the figures alone, you cannot say that the current distribution is NOT by chance, given the H0 hypothesis that both conditions have an equal "effect" on the outcome.
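
To make the H0 idea concrete, here is a rough sketch that computes the cell counts one would expect if X had no effect (row total times column total divided by N) and the Pearson chi-square statistic for the 2 x 2 example above. The chi-square statistic is an assumption on my part as one common choice for such a table; nobody in this thread has settled on it, and the function names ExpectedCount and ChiSquare2x2 are hypothetical.

```pascal
program ExpectedUnderH0;
{$mode objfpc}{$H+}

// Expected count of a cell under H0 (no relationship between X and outcome).
function ExpectedCount(RowTotal, ColTotal, N: Integer): Double;
begin
  Result := 1.0 * RowTotal * ColTotal / N;
end;

// Pearson chi-square statistic for a 2x2 table with cells
// A, B (row "has X") and C, D (row "does not have X").
// Returns 0 if any row or column total is empty.
function ChiSquare2x2(A, B, C, D: Integer): Double;
var
  R1, R2, C1, C2, N, Diff: Double;
begin
  R1 := A + B;  R2 := C + D;   // row totals
  C1 := A + C;  C2 := B + D;   // column totals
  N := R1 + R2;
  if (R1 = 0) or (R2 = 0) or (C1 = 0) or (C2 = 0) then
    Exit(0.0);
  Diff := 1.0 * A * D - 1.0 * B * C;
  Result := N * Sqr(Diff) / (R1 * R2 * C1 * C2);
end;

begin
  // Example numbers from the BRCA table earlier in this post.
  WriteLn('Expected A under H0: ', ExpectedCount(112, 350, 458):0:2);
  WriteLn('Chi-square:          ', ChiSquare2x2(100, 12, 250, 96):0:2);
end.
```

With one degree of freedom, a statistic above 3.841 corresponds to p < 0.05. For small expected counts an exact test is usually preferred over this approximation.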

B.t.w. "Has X" does not necessarily cause the disease. Instead of BRCA, I might have chosen ER+ or ER- receptor status of the tumor.
That one actually has been proven to be a prognostic factor for hormonal treatment: if the tumor is ER+, there are significantly fewer patients with metastasis after hormonal treatment than in the group with ER- tumors.
You would get a similar table as in my example, and given the correct statistical test, significance can be proven.

Quote from: MaxCuriosus on February 24, 2020, 10:29:36 pm
there isn't any between the two sets of clinical trials

I guess you misunderstand me.
There are not 2 separate trials.
There is 1 group of patients, they all get the same treatment and outcome (seems to) differ(s) between the two groups.

The factor that separates the two groups is often only identified after treatment of the whole group (of course you must have considered it as a possibility and registered that "property" of the individual patient).

Bart
Title: Re: Question about statistics
Post by: zamronypj on February 24, 2020, 11:48:33 pm
My statistics are rather rusty, so correct me if I'm wrong.

Initial condition (I) has two values: has X = 1, does not have X = 0.
Outcome (O): desired outcome = 1, not desired = 0.

Patient   I   O
1         1   0
2         0   1
... and so on

Then you can find the correlation between the initial condition (I) and the outcome (O) using statistical correlation analysis.
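
For two 0/1 variables, the ordinary Pearson correlation reduces to the phi coefficient, which can be computed directly from the four cell counts. A minimal sketch, assuming the cells A, B, C, D from the first post; the function name Phi2x2 is hypothetical.

```pascal
program PhiCoefficient;
{$mode objfpc}{$H+}

// Phi coefficient: the Pearson correlation between two binary variables,
// computed from the cells of the 2x2 table.
// Returns 0 if any row or column total is empty.
function Phi2x2(A, B, C, D: Integer): Double;
var
  R1, R2, C1, C2, Denominator: Double;
begin
  R1 := A + B;  R2 := C + D;   // row totals
  C1 := A + C;  C2 := B + D;   // column totals
  Denominator := Sqrt(R1 * R2 * C1 * C2);
  if Denominator = 0 then
    Exit(0.0);
  Result := (1.0 * A * D - 1.0 * B * C) / Denominator;
end;

begin
  // Example numbers from Bart's BRCA table.
  WriteLn('phi = ', Phi2x2(100, 12, 250, 96):0:4);
end.
```

The coefficient measures the strength of the association; on its own it is not a significance test.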
Title: Re: Question about statistics
Post by: MSABC on February 25, 2020, 12:03:05 am
Hi all,

I guess it is a question of statistical hypothesis testing.
See: https://en.wikipedia.org/wiki/Statistical_hypothesis_testing for a start.

It should be common knowledge for medical and pharmacy students, although I doubt they do it properly ...
(not to be mixed up with the "Student test" :))

I was more into it some years ago and still have trouble with the negations:
 - I want to show my measure works
 - I have to build the hypothesis that it doesn't work
 - I have to prove that it doesn't work
 - If the proof fails significantly...
 - ... I have proved that my measure works
... or so - I'm still confused.

And there are still some statistical traps that lead to the wrong result, as far as I remember.

@WP: I haven't read the article you mentioned yet - maybe I'm just repeating your contribution.
Title: Re: Question about statistics
Post by: MaxCuriosus on February 26, 2020, 12:19:52 am
Bart,
following your clarification here is my suggested formula:

P = ( Pa + Pd ) / 2

where

Pa = A / ( A + B )
Pd = D / ( C + D )

With your numeric example that leads to P = 0.58, meaning that there is a 58% chance the treatment applied to a patient with condition X will result in the desired outcome.
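
The arithmetic behind that figure, as a small sketch. It only reproduces the formula as posted, using Bart's example numbers, and makes no claim about whether the formula answers the original question. The function name AveragedP is hypothetical, and both rows are assumed to be non-empty.

```pascal
program AverageProportion;
{$mode objfpc}{$H+}

// P = (Pa + Pd) / 2 with Pa = A / (A + B) and Pd = D / (C + D),
// exactly as written in the post. Assumes A + B > 0 and C + D > 0.
function AveragedP(A, B, C, D: Integer): Double;
var
  Pa, Pd: Double;
begin
  Pa := A / (A + B);
  Pd := D / (C + D);
  Result := (Pa + Pd) / 2;
end;

begin
  // A = 100, B = 12, C = 250, D = 96 from the BRCA example.
  WriteLn('P = ', AveragedP(100, 12, 250, 96):0:3);
end.
```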
Title: Re: Question about statistics
Post by: Otto on March 05, 2020, 12:07:25 pm
Hello.
Bart, have you solved your statistical correlation problem yet?
Title: Re: Question about statistics
Post by: Bart on March 05, 2020, 06:03:27 pm
From what I've read, I decided to use Fisher's exact test.
That in itself presented another problem, because I now had to calculate with large factorials, which tend to overflow if you can't use the 80-bit Extended type.

Fortunately I remembered (from way back in high school) that I could use logarithms to avoid that.
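
A rough sketch of how that can look; this is not my actual code, just the idea. ln(n!) is built up as a sum of logarithms, the probability of each possible table (with the row and column totals fixed) is computed in log space, and only the final result goes through Exp(). The function names are made up for this example, and the two-sided p-value here uses one common convention (sum over all tables that are no more likely than the observed one), which is an assumption.

```pascal
program FisherExact;
{$mode objfpc}{$H+}

uses
  Math;

// ln(n!) as a sum of logarithms, so no large factorial is ever formed.
function LnFactorial(N: Integer): Double;
var
  I: Integer;
begin
  Result := 0.0;
  for I := 2 to N do
    Result := Result + Ln(I);
end;

// Log of the hypergeometric probability of one 2x2 table
// given its row and column totals.
function LnTableProb(A, B, C, D: Integer): Double;
begin
  Result := LnFactorial(A + B) + LnFactorial(C + D)
          + LnFactorial(A + C) + LnFactorial(B + D)
          - LnFactorial(A + B + C + D)
          - LnFactorial(A) - LnFactorial(B)
          - LnFactorial(C) - LnFactorial(D);
end;

// Two-sided p-value: sum the probabilities of all tables with the same
// totals that are no more likely than the observed table.
function FisherTwoSided(A, B, C, D: Integer): Double;
var
  R1, C1, N, X, XMin, XMax: Integer;
  LnObs, LnP: Double;
begin
  R1 := A + B;            // first row total
  C1 := A + C;            // first column total
  N := A + B + C + D;
  XMin := Max(0, R1 + C1 - N);   // smallest possible top-left cell
  XMax := Min(R1, C1);           // largest possible top-left cell
  LnObs := LnTableProb(A, B, C, D);
  Result := 0.0;
  for X := XMin to XMax do
  begin
    LnP := LnTableProb(X, R1 - X, C1 - X, N - R1 - C1 + X);
    // Small tolerance so equally likely tables are not lost to rounding.
    if LnP <= LnObs + 1e-9 then
      Result := Result + Exp(LnP);
  end;
  if Result > 1.0 then
    Result := 1.0;
end;

begin
  // Example numbers from the BRCA table earlier in the thread.
  WriteLn('p = ', FisherTwoSided(100, 12, 250, 96):0:6);
end.
```

Double is enough here because the logarithms stay small even for a few hundred patients; only the probabilities themselves are exponentiated.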

Bart
Title: Re: Question about statistics
Post by: Otto on March 05, 2020, 10:08:16 pm
Very well Bart, have you implemented everything in Lazarus/FPC?
Title: Re: Question about statistics
Post by: Bart on March 05, 2020, 10:21:28 pm
Quote from: Otto on March 05, 2020, 10:08:16 pm
Very well Bart, have you implemented everything in Lazarus/FPC?

Of course I have. O:-)

Bart