Recent

Author Topic: [solved] Best practise - foreign key versus own field  (Read 8218 times)

Zvoni

  • Hero Member
  • *****
  • Posts: 2963
Re: Best practise - foreign key versus own field
« Reply #30 on: January 30, 2024, 03:21:19 pm »
Is it unusual for one table to be used to reference alothrr table though?
Like in my project I have a table in dbgrid that serves sort of as a table of contents for viewing what is in larger table.
For instance something to look up the history of multiple events which happened on certain date..
Not unsual at all. On the contrary: It's the main principle behind "Normalization" --> "Don't store Data twice (or more times)"

Imagine a Database having the Addresses of all Citizens of China, and you want to store the Adresses of Citizens of Shanghai, China (Shanghai has some 25 Million Citizens)
If you would have only one table with Fields like (very simplified!!)
First Name, Last Name, Street, ZIP-Code, City

What's the Consequence?
Exactly, you would have 25 Million times the word/text/string "Shanghai" in that table
Since in this era Unicode is common it would mean, that each Entry of "Shanghai" would take up 16 Bytes (or more if you use UTF32)
25,000,000 x 16 Bytes = 400,000,000 Bytes (400MB) (DON'T START ON CONVERSION OF BYTE TO MB!! THIS IS JUST TO ILLUSTRATE)
Result: You "wasted" 399,999,984 Bytes

In a relational model you would have a Table "City" which has only one single Record of "Shanghai" with an additional Field "ID" which would be an Auto_ID Integer Primary Key.
That additional Field "ID" takes up 4 (or 8 ) Bytes, depending which Integer-Type you use.
So for this entry, we need 20 Bytes (4 Bytes for the ID, 16 Bytes for "Shanghai")
In a second table you would actually have the Addresses of the People with Fields like
First Name, Last Name, Address, ZIP-Code, ForeignKeyToCity_ID
And in that last Field "ForeignKeyToCity_ID" you store the Value of the ID you get from "City", which is again only 4 (or 8 ) Bytes
Result: a "save" of 300,000,000 bytes (4 Bytes instead of 16)

Yes, i'm aware that this can be broken down even further (ZIP-Code in its own table), but as i said: This is to illustrate, WHY you (want) to have/use Keys referencing other Tables
« Last Edit: January 30, 2024, 03:25:46 pm by Zvoni »
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

Joanna from IRC

  • Hero Member
  • *****
  • Posts: 1380
Re: Best practise - foreign key versus own field
« Reply #31 on: January 30, 2024, 03:56:07 pm »
If you have table for every city that would be a lot of cities and what if someone moves to another city? How about a table for zip codes and their cities then only use zip codes instead of city name for addresses?
✨ 🙋🏻‍♀️ More Pascal enthusiasts are needed on IRC .. https://libera.chat/guides/ IRC.LIBERA.CHAT  Ports [6667 plaintext ] or [6697 secure] channel #fpc  #pascal Please private Message me if you have any questions or need assistance. 💁🏻‍♀️

rvk

  • Hero Member
  • *****
  • Posts: 6714
Re: Best practise - foreign key versus own field
« Reply #32 on: January 30, 2024, 04:07:19 pm »
If you have table for every city that would be a lot of cities and what if someone moves to another city?
No no, you don't have a table for every city. You have a tables WITH ALL the cities.

Not normalized:

CLIENT:
ID bigint
NAME varchar(50)
STREET varchar(50)
POSTALCODE varchar(50)
CITY varchar(50)
COUNTRY varchar(50)

The CITY and COUNTRY takes up much space... especially if most of your clients live in the same city and country it's better to normalize.
https://en.wikipedia.org/wiki/Database_normalization

CLIENT:
ID bigint
STREET varchar(50)
POSTALCODE varchar(50)
CITY_ID bigint foreign key to city
COUNTRY bigint foreign key to country

CITY:
ID bigint
NAME varchar(50)

COUNTRY:
ID bigint
NAME varchar(50)

Here the each city name only take up 1x varchar(50) PER city + per client a pointer to that city per client. But's that way less then repeating every city per client.

Now when the clients moves to another city you can just lookup the city in the table. If it doesn't exist you can add it and use the foreign key to point to its ID.

That's how normalization works (which is a normal stage of database design).

Now... you can go too far with this and sometimes it defeats its purpose.

I myself, DO have the city name in my table for client (because it changes a lot per client). BUT I have a separate table per country. This is because I need a table for country anyway to put the specific VAT percentages, country export code, etc. per country. And because I already have that table I can just as well use it.




Zvoni

  • Hero Member
  • *****
  • Posts: 2963
Re: Best practise - foreign key versus own field
« Reply #33 on: January 30, 2024, 04:30:48 pm »
If you have table for every city that would be a lot of cities and what if someone moves to another city? How about a table for zip codes and their cities then only use zip codes instead of city name for addresses?
Joanna, my example was just to ILLUSTRATE.
As rvk pointed out: This could be broken down even further, nevermind running into "m:m"-scenarios,
but you will reach a point, Normalization defeats itself, specifically if a ForeignKey takes up the same amount of space as the actual Data you want to reference.
That's why we were all so puzzled why someone would use a Date-Field  (which is basically just an Integer!) in one table as a reference-Field for a ForeignKey in another table, because "space"-wise you gain nothing
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

alpine

  • Hero Member
  • *****
  • Posts: 1379
Re: Best practise - foreign key versus own field
« Reply #34 on: January 30, 2024, 05:10:58 pm »
Is it unusual for one table to be used to reference alothrr table though?
Оn the contrary, it is the normal way.

Like in my project I have a table in dbgrid that serves sort of as a table of contents for viewing what is in larger table.
For instance something to look up the history of multiple events which happened on certain date..
Splitting tables in a master-detail ones is actually a result of database Normalization and that is the process of structuring the database in a way that prevents insertion, update and deletion anomalies, see the above link. That is the primary aim of it, not the size reduction. The reduction comes as a result of normalization. Most of the experienced database designers even make that normalization intuitively.

Your example with the larger-smaller (consolidated) table is not an example of normalized relation, since the data into the smaller table is actually contained into the larger one. Redundancy isn't allowed.

Sometimes the opposite process exists, de-normalization, but that is even more advanced topic as it looks like you're not yet familiar with the former one.
« Last Edit: January 30, 2024, 06:14:11 pm by alpine »
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

Nicole

  • Hero Member
  • *****
  • Posts: 1095
Re: Best practise - foreign key versus own field
« Reply #35 on: January 30, 2024, 07:52:00 pm »
The migration is complete. The database is quick like hell.
Space? - about 60 percent larger than before.


Zvoni

  • Hero Member
  • *****
  • Posts: 2963
Re: Best practise - foreign key versus own field
« Reply #36 on: January 30, 2024, 07:59:50 pm »
The migration is complete. The database is quick like hell.
Space? - about 60 percent larger than before.
Then there is still something wrong with your design, but at least good news on performance
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

rvk

  • Hero Member
  • *****
  • Posts: 6714
Re: Best practise - foreign key versus own field
« Reply #37 on: January 30, 2024, 08:57:05 pm »
The migration is complete. The database is quick like hell.
Space? - about 60 percent larger than before.
Then there is still something wrong with your design, but at least good news on performance
That does indeed sound strange but we've never seen anything about the database. So as long as this stays the same, we can't say anything for certain.

Maybe the old database didn't have any indexes etc.
But there can also be something terribly wrong with the new DB.
(Huge default page, inefficient table design etc.)

Anyway... With gstat you can get a lot of statistics about the DB.
https://firebirdsql.org/file/documentation/html/en/firebirddocs/gstat/firebird-gstat.html

Joanna from IRC

  • Hero Member
  • *****
  • Posts: 1380
Re: Best practise - foreign key versus own field
« Reply #38 on: January 30, 2024, 11:41:57 pm »
Maybe her new database is very well indexed and the indices take a lot of space ?
 I’ve never thought about converting redundant strings into one string with numerical code before.

About using the date field in two tables , it could have different purposes in the different tables. In my case selecting the date in one dbgrid viewing the table1 will display a dataset in a different dbgrid from the Much larger table with everything related to the date in smaller table.

Just because both contain dates does not mean that the dates should be combined into one place because there is different information attached to the dates in table1.

There is something that concerns me though. I’m storing the date time as a float. Does that take up too much space?
« Last Edit: January 30, 2024, 11:45:31 pm by Joanna »
✨ 🙋🏻‍♀️ More Pascal enthusiasts are needed on IRC .. https://libera.chat/guides/ IRC.LIBERA.CHAT  Ports [6667 plaintext ] or [6697 secure] channel #fpc  #pascal Please private Message me if you have any questions or need assistance. 💁🏻‍♀️

rvk

  • Hero Member
  • *****
  • Posts: 6714
Re: Best practise - foreign key versus own field
« Reply #39 on: January 30, 2024, 11:53:39 pm »
Just because both contain dates does not mean that the dates should be combined into one place because there is different information attached to the dates in table1.
Exactly. And that last piece of information (having different information attached to that date field) was not present in the original question. Just a table with a single date, nothing else. In your case (date with the extra info) it's even recommended to create its own table.

But with both dates in their own table it might not even be necessary to link them together with a foreign key. They are both their own identity (dates) and you can still use them when JOINing them together (you might want to add an index on them in that case but for that a constraint/fk isn't strictly needed). Unless you need to check that the date exists in both tables, then you can add the constraint (i.e. foreign key).

rvk

  • Hero Member
  • *****
  • Posts: 6714
Re: Best practise - foreign key versus own field
« Reply #40 on: January 30, 2024, 11:58:44 pm »
There is something that concerns me though. I’m storing the date time as a float. Does that take up too much space?
In Firebird?

DATE is 32 bits. FLOAT is also 32 bits.
https://firebirdsql.org/file/documentation/chunk/en/refdocs/fblangref40/fblangref40-datatypes.html

But why did you use float?

Edit: O wait. Date time. Did you mean TIMESTAMP?


Joanna from IRC

  • Hero Member
  • *****
  • Posts: 1380
Re: Best practise - foreign key versus own field
« Reply #41 on: January 31, 2024, 12:05:45 am »
I use sqlite. It says that tdatetime is also a float so I decided it would be easier to store as float. I’m storing dates for both noon and midnight so two different timestamps per date.
✨ 🙋🏻‍♀️ More Pascal enthusiasts are needed on IRC .. https://libera.chat/guides/ IRC.LIBERA.CHAT  Ports [6667 plaintext ] or [6697 secure] channel #fpc  #pascal Please private Message me if you have any questions or need assistance. 💁🏻‍♀️

rvk

  • Hero Member
  • *****
  • Posts: 6714
Re: Best practise - foreign key versus own field
« Reply #42 on: January 31, 2024, 12:21:51 am »
I use sqlite. It says that tdatetime is also a float so I decided it would be easier to store as float. I’m storing dates for both noon and midnight so two different timestamps per date.
That's a whole different story.
SQLite has its own datatypes and translates those to a few types.
https://www.sqlite.org/datatype3.html

SQLite is a bit of an odd duck regarding datatypes.

DATE can be stored as different types.
The fields there are actually typeless. So you can store anything in every field. It depends on the way you execute the insert. It's completely different from standard SQL in that way.

https://www.sqlite.org/datatypes.html

It's useful for a lot of things but sometimes you do need to remind yourself about some of its quirks ;)
https://www.sqlite.org/quirks.html

BTW it depends how you do the date to float. Are you converting it in code and storing it? Float is not an extract datatype so I wouldn't trust it with a datetime for milliseconds. Days and time to servings should be fine I think.

« Last Edit: January 31, 2024, 12:28:18 am by rvk »

Joanna from IRC

  • Hero Member
  • *****
  • Posts: 1380
Re: Best practise - foreign key versus own field
« Reply #43 on: January 31, 2024, 12:48:54 am »
To represent noon, tdatetime puts a .5 at end . I’m not sure how other times are represented. I’m not doing anything precise with milliseconds or anything like that
« Last Edit: January 31, 2024, 12:54:13 am by Joanna »
✨ 🙋🏻‍♀️ More Pascal enthusiasts are needed on IRC .. https://libera.chat/guides/ IRC.LIBERA.CHAT  Ports [6667 plaintext ] or [6697 secure] channel #fpc  #pascal Please private Message me if you have any questions or need assistance. 💁🏻‍♀️

Zvoni

  • Hero Member
  • *****
  • Posts: 2963
Re: Best practise - foreign key versus own field
« Reply #44 on: January 31, 2024, 08:27:27 am »
To represent noon, tdatetime puts a .5 at end . I’m not sure how other times are represented. I’m not doing anything precise with milliseconds or anything like that

Other times are represented like this:
xxx.75 --> 6 PM / 18:00h ("Three quarters of the Day")
xxx.25 --> 6 AM / 06:00h ("One quarter of the Day")
and so on....

Then use TEXT DataType in SQLite, and store everything in ISO-Format as a String and be done with it. SQLite understands datetime-strings natively.
Upside: If for whatever reason you have to dive into the Database/Table directly with a Management-Program (e.g. DB Browser for SQLite) you get "readable" raw dates.

Fetching those dates is easy enough with
Code: SQL  [Select][+][-]
  1. SELECT DATE(MyDateTimeAsText) AS MyDateTime FROM MyTable WHERE .....
and an Entry for MyDateTimeAsText looks like
'2024-01-31 08:27:25'

and even sorting by the raw DateTime-String works
...ORDER BY MyDateTimeAsText

I would be careful with floating-point Datatypes for DateTimes.
as rvk pointed out, Float is 4 Bytes, a TDateTime is actually an alias for Double, which is 8 Bytes

That said: I'm aware, that a DateTime as String takes up more space as a simple Float (or REAL how it's called in SQLite), but you have to weigh the Pros and Cons of it
« Last Edit: January 31, 2024, 08:33:22 am by Zvoni »
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

 

TinyPortal © 2005-2018