Recent

Author Topic: distinct values from very large table  (Read 1332 times)

Zvoni

  • Hero Member
  • *****
  • Posts: 2690
Re: distinct values from very large table
« Reply #15 on: September 06, 2024, 10:16:15 am »
It’s a shame that sql doesn’t support passing different tables as a parameter to a procedure. It seems painfully redundant compared to pascal. I guess that is the big difference between compiled and scripting languages?
That doesn't have to do with SQL.
Not a single DBMS i know allows passing of tablenames or Columnames as a Parameter.
It is as it is


Quote
Zvoni the colum values in this particular table are not allowed to be null. Default is 0. Also I probably should have mentioned that this table is a lookup table which is not supposed to be modified by users.
OK, the separate AFTER INSERT Triggers can be coalesced into one single Trigger
Code: SQL  [Select][+][-]
  1. CREATE TRIGGER trg_joanna_master_AI AFTER INSERT ON tbl_joanna_master
  2. FOR EACH ROW
  3. BEGIN
  4.         INSERT INTO tbl_joanna_distinct(DistValue) VALUES(NEW.Col1) ON CONFLICT(DistValue) DO UPDATE SET DistCount=DistCount+1 WHERE DistValue=NEW.Col1;  
  5.         INSERT INTO tbl_joanna_distinct(DistValue) VALUES(NEW.Col2) ON CONFLICT(DistValue) DO UPDATE SET DistCount=DistCount+1 WHERE DistValue=NEW.Col2;
  6.         INSERT INTO tbl_joanna_distinct(DistValue) VALUES(NEW.Col3) ON CONFLICT(DistValue) DO UPDATE SET DistCount=DistCount+1 WHERE DistValue=NEW.Col3;    
  7.         -- and so on
  8. END;

Quote
I’m glad you gave code for update and deleting though, I will save it in case I ever need it.
I would use them in any case. Because they still work if you do something from a different DB-Client (DBBrowser, DBeaver etc.)

Quote
If I want to populate the distinct table from existing table as a test , how would I do it? I’m guessing something along the lines of
Insert into tbl_distinct select columns I want ?
Close.
it would be
1) Drop ForeignKey-Constraints between your Big table and any Child-tables that have a reference to the primary key of your big table
2) rename your big table to "tbl_joanna_master_old"
3) create "tbl_joanna_master" anew with exactly the same columns and attributes
4) create "tbl_joanna_distinct" and create the Triggers
5) execute
Code: SQL  [Select][+][-]
  1. INSERT INTO tbl_joanna_master
  2. SELECT * FROM tbl_joanna_master_old

6) Go for Lunchbreak  :D :D :D :D

7) Create the ForeignKey-Constraints again
8 ) Check plausibilty in "tbl_joanna_distinct"
9) drop "tbl_joanna_master_old"

Of course: Adjust column- and tablenames to your scenario!
« Last Edit: September 06, 2024, 10:22:12 am by Zvoni »
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

Zvoni

  • Hero Member
  • *****
  • Posts: 2690
Re: distinct values from very large table
« Reply #16 on: September 06, 2024, 10:17:41 am »
  insert into distinct_table select distinct field1 from master_table;

would work.
No it wouldn't.
It's missing the Counts, nevermind, that Joanna needs the Distinct values across all 8 columns, meaning an 8-way UNION with deduplication, which KABOOM's her Computer
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

Zvoni

  • Hero Member
  • *****
  • Posts: 2690
Re: distinct values from very large table
« Reply #17 on: September 06, 2024, 10:48:59 am »
One interesting fact about Triggers:
https://sqlite.org/forum/info/2a650a70a6136537fa2d08789b63f1863ad28a55b871a7e7e9564664c8c7a87e

Triggers are always executed within the context of the CALLING Transaction.
Meaning: If you do an INSERT into your Big-Table, it fires all necessary Triggers with all succeeding "changes" to wherever
But if you rollback your INSERT, it also rolls back all other changes done by the triggers (Updates/Deletes)!!
It also means: If any Statement fails (your original statement, or a Statement from within a Trigger), EVERYTHING fails/gets rolled back

That said: Especially if you use triggers, it's always good practice to use explicit transactions
« Last Edit: September 06, 2024, 10:51:33 am by Zvoni »
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

Aruna

  • Sr. Member
  • ****
  • Posts: 452
Re: distinct values from very large table
« Reply #18 on: September 06, 2024, 04:27:22 pm »
Hi everyone
I have an sqlite  table with over 73 million rows that has 8 numerical columns. I want to gather the distinct values from the 8 columns and put them into a combobox.
Hello @Joanna, What if you query a smaller portion of the data-set at a time, say 1 million records at a time? Limit Result Sets using LIMIT and OFFSET to fetch smaller chunks of data. Optimize Queries to retrieve only the columns and rows you need. Use Proper Query Filters to reduce the amount of data processed.

The process of getting distinct values for even one column seems to be very slow and consumes a lot of cpu cycles.
Returning very large result sets can be slow, especially if all results need to be processed or transferred. Slow disk I/O, limited memory, or insufficient CPU resources can impact query performance. I am curious @Joanna would it be ok if I request a small subset of your 73 million record whopper? The database structure and any relations, please. I want to see how this is laid out, if I may please.

Is there any other way to do this ?
I would use indexes, fetch only a subset of the 73 million records at a time and paginate Results using LIMIT and OFFSET to fetch data in smaller chunks. ( How much RAM do you have in your system and what kind of processor is it? )
« Last Edit: September 06, 2024, 04:31:45 pm by Aruna »
Debian GNU/Linux 11 (bullseye)
https://pascal.chat/

Joanna

  • Hero Member
  • *****
  • Posts: 1069
Re: distinct values from very large table
« Reply #19 on: September 07, 2024, 02:26:35 am »
Zvoni, when generating data how often do you recommend doing the commits ? I remember that permanently writing to database was more time consuming than just inserting. I think I was doing it every 10,000 rows
✨ 🙋🏻‍♀️ More Pascal enthusiasts are needed on IRC .. https://libera.chat/guides/ IRC.LIBERA.CHAT  Ports [6667 plaintext ] or [6697 secure] channel #fpc  #pascal Please private Message me if you have any questions or need assistance. 💁🏻‍♀️

egsuh

  • Hero Member
  • *****
  • Posts: 1449
Re: distinct values from very large table
« Reply #20 on: September 07, 2024, 02:55:55 am »
I'm sorry that I didn't understand the program correctly at the beginning.

I think extracting distinct values across over all 8 fields (and count of each value) is the first target.
To get values only (not counts), SQL select distinct f1 from... and then selecting distinct values from them again would be the simplest method. And this takes quite long time.

As the whole number of distinct values does not seem large for now, I think we can count it in Lazarus.

Code: Pascal  [Select][+][-]
  1.            valcounts:= TFPGMap<integer, integer>.Create;
  2.            valcounts.sorted := True;
  3.  
  4.            with sqlquery do begin
  5.                sql.text := 'select * from table1';  
  6.                open;
  7.                first;
  8.                while not eof
  9.                    for (f in fields) do begin
  10.                          v := f.asinteger;
  11.                          if valcounts.find(v, j)
  12.                             then inc(valcounts.data[j])
  13.                             else valcounts.add(v, 1);
  14.                    end;
  15.                    next;
  16.                end;
  17.                close;
  18.           end;
  19.  

Conceptually this will give all the distinct values and counts.
           
But in reality, simply opening the sqlquery will stop PC (for several hours at least).

So, I think of opening subsets of table several times, using limit and offset, for example,

           select ... limit 10000 offset 10000


I hope this approach provides some manageable points.

Once you have the values here, then you may add a new table with these values, and triggers Zvoni provided.

silvercoder70

  • New Member
  • *
  • Posts: 48
    • Tim Coates
Re: distinct values from very large table
« Reply #21 on: September 07, 2024, 09:38:32 am »
Putting aside any code related ideas have you tried these configuration settings...

https://phiresky.github.io/blog/2020/sqlite-performance-tuning/
P Plate on FPC | YouTube - https://www.youtube.com/@silvercoder70

Joanna

  • Hero Member
  • *****
  • Posts: 1069
Re: distinct values from very large table
« Reply #22 on: September 07, 2024, 04:53:23 pm »
Thanks for all the information, there is so much to know about sqlite ..
✨ 🙋🏻‍♀️ More Pascal enthusiasts are needed on IRC .. https://libera.chat/guides/ IRC.LIBERA.CHAT  Ports [6667 plaintext ] or [6697 secure] channel #fpc  #pascal Please private Message me if you have any questions or need assistance. 💁🏻‍♀️

Zvoni

  • Hero Member
  • *****
  • Posts: 2690
Re: distinct values from very large table
« Reply #23 on: September 07, 2024, 06:55:09 pm »
Zvoni, when generating data how often do you recommend doing the commits ? I remember that permanently writing to database was more time consuming than just inserting. I think I was doing it every 10,000 rows
It depends, how often data comes in.
Of course, if you get a new insert every 0.1 seconds, it doesn’t make sense to commit after each insert.
It’s something only you can know
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

 

TinyPortal © 2005-2018