Recent

Author Topic: Sorting and Counting  (Read 36426 times)

mpknap

  • Full Member
  • ***
  • Posts: 155
Re: Sorting and Counting
« Reply #195 on: July 03, 2020, 07:36:42 am »
 
That's why it's important to first think of what you want, set it on paper, have a design, and then (and not sooner) go programming.

I understand everything you want to advise me. And thank you.

The point is that nobody knows what we're looking for and how to find it. These are just my guesses.
This can be compared to "looking for rain drops that will fall on three flowers in a large garden at the same time. If this event occurs again within a few minutes (at least once), there is a suspicion of success."

It's about searching for cascades of cosmic rays.
I conduct research on my own, in my free time. I'm not a programmer, engineer or scientist that's why so much chaos in my questions;)

And my progress is only thanks to you and your knowledge.

Quote
The only problem is that i do not know exactly what you mean by "it is not possible to display in the DEVICEID output column all the Devices participating in the multi event"
TRON.
If the value in the DUP column is, for example, "2", we still do not know which Device_ID make up it. There would have to be an additional column in which it will show Devices numbers ... See screen in the attachment
 

440bx

  • Hero Member
  • *****
  • Posts: 3946
Re: Sorting and Counting
« Reply #196 on: July 03, 2020, 08:03:42 am »
The point is that nobody knows what we're looking for and how to find it.
Just a very general comment.  When you believe there might be something to be found in the data but don't even know what, that's when SQL databases are great.  (not the only thing they are excellent for but, that's one of them.)

SQL allows you to "play" with the data with little effort.  Of course, you'll get the most of out of it by acquiring a fairly decent level of knowledge in SQL.  Fortunately, SQL is quite easy and there are a lot of forums with helpful users willing to help when you hit a brick wall.

Personally, I like Postgres but, when it comes to user support, some of the users that participate in the Oracle SQL forum are literally incredible.  Both are extremely capable DBMS systems and they'll allow you to look at data just about every which way you want to look at it, in just a few lines of SQL.

In long winded way, what I'm saying is that I probably wouldn't use Pascal for what you're doing.  I'd use something that requires less work to try random things on the fly.


(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

TRon

  • Hero Member
  • *****
  • Posts: 2435
Re: Sorting and Counting
« Reply #197 on: July 03, 2020, 08:05:02 am »
The point is that nobody knows what we're looking for and how to find it. These are just my guesses.
I have no idea what your program is suppose to be doing as a final result or how this should be presented to the user, since you are the programmer you are the one in control. So, yes unless you do not know what you wish to achieve/accomplish in the end then we do not know either  ;)

Quote
This can be compared to "looking for rain drops that will fall on three flowers in a large garden at the same time. If this event occurs again within a few minutes (at least once), there is a suspicion of success."
The answer to that question is 42 btw.

Quote
If the value in the DUP column is, for example, "2", we still do not know which Device_ID make up it. There would have to be an additional column in which it will show Devices numbers ... See screen in the attachment
Yeah, and that is impossible to realise because the dupcount is/can be made up of multiple Device_ID's.

Come to think about it, why do you need a duplicate-count to begin with ? imho it isn't helpful at all to know the number of duplicates, unless you have a specific purpose for it ? (which is currently unknown to us, or at least to me).

afaik this is how you manage to create a list of unique duplicates (assuming data is the name of the SQL table):
Code: SQL  [Select][+][-]
  1. SELECT DISTINCT t1.datetime, t1.device_id FROM DATA AS t1 INNER JOIN DATA AS t2 ON t1.datetime = t2.datetime WHERE t1.device_id <> t2.device_id ORDER BY t1.datetime, CAST(t1.device_id AS INTEGER);
  2.  

And, again afaik, this creates a list of the duplicate items.
Code: SQL  [Select][+][-]
  1. SELECT t1.datetime, t1.device_id, t1.user_id FROM DATA AS t1 INNER JOIN (SELECT datetime, device_id, COUNT(*) AS dupcount FROM DATA GROUP BY datetime, device_id HAVING dupcount > 1) AS t2 ON t1.datetime = t2.datetime AND t1.device_id = t2.device_id ORDER BY t1.datetime, t1.device_id;
  2.  

Both show the device_id's that have a duplicate datetime field.

edit: and that picture ... is exactly the kind of cascading that makes those rays act in chaos.... you will never see those figures, at least not by the provided raindrops  ;D
« Last Edit: July 03, 2020, 10:37:48 am by TRon »

mpknap

  • Full Member
  • ***
  • Posts: 155
Re: Sorting and Counting
« Reply #198 on: July 05, 2020, 04:36:52 pm »

Code: Pascal  [Select][+][-]
  1. procedure TForm1.FormCreate(Sender: TObject);
  2. begin
  3.   FOriginalCSV := TStringList.Create;
  4.   try
  5.     FOriginalCSV.LoadFromFile('pewniacy_odstycznia_true.csv');
  6.     CollectDuplicateDates(FOriginalCSV, DuplicatesMemo);
  7.   finally
  8.     FOriginalCSV.Free;
  9.   end;
  10. end;
  11.  
  12.  
  13. procedure TForm1.CollectDuplicateDates(aCSVList: TStrings; aMemo: TMemo);
  14. var
  15.   i, dups: integer;
  16.   dups_string: String;
  17.   deviceids: TStringList;
  18.   A: array of string;
  19. begin
  20.   deviceids := TStringList.Create;
  21.   try
  22.  
  23.     deviceids.Duplicates := dupAccept;
  24.     deviceids.Sorted := True;
  25.  
  26.     // make a sorted list with DATE+TIME as first entry
  27.     for i := 0 to Pred(aCSVList.Count) do
  28.     begin
  29.       A := aCSVList[i].Split(',');
  30.       if (Length(A) > 2) then deviceids.Add(A[2] + ',' + A[0] + ',' + A[1]);
  31.     end;
  32.  
  33.     dups := 0;
  34.     dups_string := '';
  35.     for i := 1 to Pred(deviceids.Count) do
  36.     begin
  37.       // ONLY check date+time entry, first 19 characters
  38.       if copy(deviceids[i - 1], 1, 19) = copy(deviceids[i], 1, 19) then
  39.       begin
  40.         Inc(dups);
  41.         if dups = 1 then dups_string := deviceids[i - 1];
  42.         dups_string := dups_string + ' // ' + deviceids[i];
  43.       end
  44.       else
  45.       begin
  46.         A := deviceids[i].Split(',');
  47.         if dups > 0 then
  48.           aMemo.Lines.Add('"%s"  count=%d   device ids: "%s"', [A[0], dups + 1, dups_string]);
  49.         dups := 0;
  50.         dups_string := '';
  51.       end;
  52.     end;
  53.  
  54.   finally
  55.     deviceids.Free;
  56.   end;
  57. end;


RVK.
I need one more condition in your code.

I want them to be displayed in Memo, only records where DeviceID are not the same.
If there are 3 dup for DateTime and DeviceID are 3 times the same then we reject it.
I try to do it myself but it doesn't work out.

This will bring me closer to finding "raindrops";)

rvk

  • Hero Member
  • *****
  • Posts: 6111
Re: Sorting and Counting
« Reply #199 on: July 05, 2020, 05:48:42 pm »
I need one more condition in your code.

I want them to be displayed in Memo, only records where DeviceID are not the same.
If there are 3 dup for DateTime and DeviceID are 3 times the same then we reject it.
I try to do it myself but it doesn't work out.

This will bring me closer to finding "raindrops";)
Is you user-id always the same as device-id on the same date+time?
In that case my initial thought was correct and you can just match the entire line during sorting. Set duplicates to dupIgnore and same user,device,datetimes are ignored.

So this should be sufficient
Code: Pascal  [Select][+][-]
  1. deviceids.Duplicates := dupIgnore;

mpknap

  • Full Member
  • ***
  • Posts: 155
Re: Sorting and Counting
« Reply #200 on: July 05, 2020, 07:06:01 pm »

Is you user-id always the same as device-id on the same date+time?
In that case my initial thought was correct and you can just match the entire line during sorting. Set duplicates to dupIgnore and same user,device,datetimes are ignored.
 

Yes . UserID and DeciceID are the same and repeat themselves.

UserID is the user number.
DeviceID is the smartphone number for the user. Users can have multiple devices.

A flower in the garden is just a smartphone.
And the raindrop is the detection of cosmic radiation in the phone.

All data comes from the CREDO IFJ Poland project. :)

I am looking for whether two phones will catch radiation detection in the same second, and whether they will repeat themselves in a short interval of time for the same devices. If they are found, there is suspicion of "Air Shower"

https://en.wikipedia.org/wiki/Air_shower_(physics)
 8)


Quote
So this should be sufficient
Code: Pascal  [Select][+][-]
  1. deviceids.Duplicates := dupIgnore;

Its work :)

TRon

  • Hero Member
  • *****
  • Posts: 2435
Re: Sorting and Counting
« Reply #201 on: July 05, 2020, 08:39:40 pm »
Its work :)
So does
Code: SQL  [Select][+][-]
  1. SELECT rowid, datetime, device_id, user_id, COUNT(datetime) AS dupes, GROUP_CONCAT(DISTINCT device_id || ' (' || user_id || ')' ) AS dup_ids FROM DATA GROUP BY datetime HAVING dupes > 1 AND instr(dup_ids, ',') > 0;
It still doesn't mean that picture of yours is reproducible or representable for your data ...  ;)

mpknap

  • Full Member
  • ***
  • Posts: 155
Re: Sorting and Counting
« Reply #202 on: July 05, 2020, 08:51:06 pm »
Its work :)
So does
Code: SQL  [Select][+][-]
  1. SELECT rowid, datetime, device_id, user_id, COUNT(datetime) AS dupes, GROUP_CONCAT(DISTINCT device_id || ' (' || user_id || ')' ) AS dup_ids FROM DATA GROUP BY datetime HAVING dupes > 1 AND instr(dup_ids, ',') > 0;
It still doesn't mean that picture of yours is reproducible or representable for your data ...  ;)

You are genius!!!! You don't even know how much I was looking for! Revelation :)
Thx!!!

TRon

  • Hero Member
  • *****
  • Posts: 2435
Re: Sorting and Counting
« Reply #203 on: July 05, 2020, 09:47:38 pm »
You are genius!!!!

Albert Einstein was a genius, so was Stephen Hawking. I even consider those that work on compilers such as Free Pascal, or an IDE as Lazurus genius. I am merely someone ploughing my way through boring documentation and attempt to apply what I've just read  :)

The sad part about it really is that I've been sitting on that since I have edited my reply at #197 but, wasn't able to share because you were still at an intermediate step/position in your quest for an answer (one wrong turn heading towards your destination w/could have rendered that SQL line completely useless, in which case I would have had to read even more boring documentation :D ).

Quote
You don't even know how much I was looking for! Revelation :)
I'm pleased to learn that it is useful for you.

I wish you much happy raindrops and balanced cosmic rays !

mpknap

  • Full Member
  • ***
  • Posts: 155
Re: Sorting and Counting
« Reply #204 on: July 05, 2020, 10:47:52 pm »
 
[/quote]

Albert Einstein was a genius, so was Stephen Hawking.
[/quote]
iam Bob. Bob the Builder ;) Simple worker. I love Pascal :)

mpknap

  • Full Member
  • ***
  • Posts: 155
Re: Sorting and Counting
« Reply #205 on: July 11, 2020, 08:27:32 pm »
Its work :)
So does
Code: SQL  [Select][+][-]
  1. SELECT rowid, datetime, device_id, user_id, COUNT(datetime) AS dupes, GROUP_CONCAT(DISTINCT device_id || ' (' || user_id || ')' ) AS dup_ids FROM DATA GROUP BY datetime HAVING dupes > 1 AND instr(dup_ids, ',') > 0;
It still doesn't mean that picture of yours is reproducible or representable for your data ...  ;)

Welcome back. ;)
Ultimately, I'm using the code in this form:
Code: MySQL  [Select][+][-]
  1. [code=mysql]SELECT  datetime(timestamp/1000,'unixepoch') as czas,  
  2. COUNT(datetime(timestamp/1000,'unixepoch')) AS dupes,
  3. GROUP_CONCAT( DISTINCT device_id  ) AS dup_ids
  4. FROM detections
  5.  
  6. GROUP BY datetime(timestamp/1000,'unixepoch') HAVING dupes > 2
  7.  
  8.         AND instr(dup_ids, ',') >0
[/code]
It works great, but I thought about something.
You can make the condition that only those DUP_IDS are displayed where the number of "," is greater than 3 (comma).

Because in this way I can have a triplet shown but different user_ID. It shows your code but also shows most triplets where there really are 2 users which is also correct.

In the JPG attachment with explanation ;)



TRon

  • Hero Member
  • *****
  • Posts: 2435
Re: Sorting and Counting
« Reply #206 on: July 11, 2020, 09:38:13 pm »
You can make the condition that only those DUP_IDS are displayed where the number of "," is greater than 3 (comma).
Yes that is possible to realise, just not very reliable (it involves deleting characters from the original string and comparing the length of both strings in order to determine how many comma's there are).

However, this is starting to turn into a string manipulation contest. At least SQLite was not designed to do such things (in an easy manner). Other SQL databases perhaps might though.

Have you considered creating your own custom function(s) using Pascal ? see also: http://www.sqlite.org/c3ref/create_function.html as unfortunately SQLite does not seem to support the statement "create function".

Quote
In the JPG attachment with explanation ;)
Just for the record. In the dataset you shared with us, there is no such data. In that selection, there doesn't seem to exist any data that matches the criteria with having more than 3 3 or more distinct ID's.

edit: stupid typo.
« Last Edit: July 11, 2020, 09:47:14 pm by TRon »

rvk

  • Hero Member
  • *****
  • Posts: 6111
Re: Sorting and Counting
« Reply #207 on: July 11, 2020, 10:25:33 pm »
You can make the condition that only those DUP_IDS are displayed where the number of "," is greater than 3 (comma).
Yes that is possible to realise, just not very reliable (it involves deleting characters from the original string and comparing the length of both strings in order to determine how many comma's there are).
Something with having count(DISTINCT device_id) > 2 or likewise???

So (but I can't test this)
Code: SQL  [Select][+][-]
  1. SELECT  datetime(TIMESTAMP/1000,'unixepoch') AS czas,  
  2. COUNT(datetime(TIMESTAMP/1000,'unixepoch')) AS dupes,
  3. GROUP_CONCAT( DISTINCT device_id  ) AS dup_ids
  4. FROM detections
  5. GROUP BY datetime(TIMESTAMP/1000,'unixepoch')
  6. HAVING dupes > 2 AND COUNT(DISTINCT device_id) > 2

I'm not sure you even need the dupes column then??
Code: SQL  [Select][+][-]
  1. SELECT  datetime(TIMESTAMP/1000,'unixepoch') AS czas,  
  2. GROUP_CONCAT( DISTINCT device_id  ) AS dup_ids
  3. FROM detections
  4. GROUP BY datetime(TIMESTAMP/1000,'unixepoch')
  5. HAVING COUNT(DISTINCT device_id) > 2
??
« Last Edit: July 11, 2020, 10:31:01 pm by rvk »

TRon

  • Hero Member
  • *****
  • Posts: 2435
Re: Sorting and Counting
« Reply #208 on: July 11, 2020, 11:16:04 pm »
Something with having count(DISTINCT device_id) > 2 or likewise???
Interesting.

I didn't know you where allowed to do that, thank you rvk.

Seems to work like a charm.

So, your first statement does the job, the second one (and I agree that it is tempting to want to try) unfortunately seem to include more results then the original statement we started out with. I haven't been able to determine which data exactly it concerns (so unable to tell why).

Other than that I am also unable to test it further as the provided dataset does not contain any data matching the criteria.

TRon

  • Hero Member
  • *****
  • Posts: 2435
Re: Sorting and Counting
« Reply #209 on: July 12, 2020, 12:17:30 am »
...the second one (and I agree that it is tempting to want to try) unfortunately seem to include more results then the original statement we started out with. I haven't been able to determine which data exactly it concerns (so unable to tell why).
Oh, wait... seems I made an error in my verification SQL statement there.  :-[ Sorry about that.

Yes, your second solution @rvk seems to work also.

@mpknap: as stated before: Change the objective and you can start redesigning your statement(s)  ;)

 

TinyPortal © 2005-2018