Forum > Databases

Random record from a database

(1/1)

jollytall:
Disclaimer: I am not sure that it is at all FPC related, probably more MySQL related, but as I found it through FPC and this is a very helpful community, (unlike some others), I share it here.

From a large database I wanted to find a random record, actually in two steps; first select a random value of Field1 and then from those records a random one. As I understood the best, recommended way is as follows:

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---function GetRandomField1 : string;  begin  result := '';  try    aSQLQuery.SQL.Text:='select distinct(Field1) from Table1 where not Field1 = "ExcludedValue" order by rand() limit 1';    aSQLQuery.Open;    result := aSQLQuery.Fields[0].AsString;  finally    aSQLQuery.Close;    end;  end;  and the second step

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---function GetRandomField2(const aField1 : string) : tDateTime;  begin  result := 0;  try    aSQLQuery.SQL.Text:='select Field2 from Table1 where Field1 = "' + aField1 + '" order by rand() limit 1';    aSQLQuery.Open;    result := aSQLQuery.Fields[0].AsDateTime;  finally    aSQLQuery.Close;    end;  end; With these I indeed got reasonably random values.

BUT

I used it for some AI research. So, I generated INSIDE ONE PROGRAM in a loop N datasets, each of M random elements made with the above program and ran on them O learning cycles (but the same dataset was not used again later). When the next dataset was used for the first time (i.e. before it was used for training) I measured the quality of the model (cost) and not surprisingly as it progressed through the N datasets, the result got better and better. However it was really surprising that the result got MUCH better than expected. To be even more surprising, when I restarted the program, the result dropped back to about a reasonable level.
I looked for many places (if e.g. I save/restore the model correctly, etc.), but whatever I did it was always so. A new random dataset got improved results as the training progressed within a single program, but dropped back when the program was started again.
So, finally I started suspecting that the SQL random is not really random. That, sort-of, explains it. If the N datasets are not totally random, but are somewhat overlapping, then the training progresses on "already known" data, so it is not surprising that the results improve very fast and when the program restarted and the next dataset is independent then the result drops back.

To check it I re-wrote the two SQL subroutines as follows:

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---function GetRandomField1 : string;  var    Count : integer;  begin  result := '';  try    aSQLQuery.SQL.Text:='select distinct(Field1) from Table1 where not Field1 = "ExcludedValue"';    aSQLQuery.Open;    aSQLQuery.Last;    Count := aSQLQuery.RecordCount;    Count := Random(Count);    aSQLQuery.First;    aSQLQuery.MoveBy(Count);    result := aSQLQuery.Fields[0].AsString;  finally    aSQLQuery.Close;    end;  end;  and
--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---function GetRandomField2(const aField1 : string) : tDateTime;  var    Count : integer;  begin  result := 0;  try    aSQLQuery.SQL.Text:='select Field2 from Table1 where Field1 = "' + aField1 + '" order by Field2'; // I know SQL inject problem, but here it is OK    aSQLQuery.Open;    aSQLQuery.Last;    Count := aSQLQuery.RecordCount;    Count := Random(Count);    aSQLQuery.First;    aSQLQuery.MoveBy(Count);    result := aSQLQuery.Fields[0].AsDateTime;  finally    aSQLQuery.Close;    end;  end; and the problem described earlier, has just disappeared.

So, it seems that the first version of random did not work as intended. However I still do not understand why:
- I could imagine a non randomize like problem, but that would mean that when the program is run again and again, the generated randoms are the same. In this case the effect would be the opposite of what I observed. A new run would generate the same datasets, so a new run would have better results than a continued run of a single program.
- I could also imagine some sort of random precision problem, i.e. that the records are "random" but can only take up certain values due to some rounding, but then a new run would still be in the same set of values than a longer old run, so the result would not drop back.

Anybody has any experience with it?

P.S. Because of the large database, large datasets and long runs it is not easy to make a kind of correlation analysis on the different runs within one program to prove that indeed the later datasets are not independent of the old ones; but after the workaround described, I am quite convinced that somewhere the SQL random has a problem.

Navigation

[0] Message Index

Go to full version