How this makes difference at the file storage level (CHAR vs VARCHAR)? Explain please.
AFAIK they differ in returned values to the client, CHAR being padded to its field length and VARCHAR being not. But on a storage level they must accommodate the biggest possible length , which is max number of chars x4 for UTF-8. Is it somehow related to the MVCC?
No, the VARCHAR doesn't need to reserve all the space. It just saves the record with the length of the string +2 characters for length.
It's in the link I gave.
VARCHAR is the basic string type for storing texts of variable length, up to a maximum of 32,765 bytes. The stored structure is equal to the actual size of the data plus 2 bytes where the length of the data is recorded.
https://firebirdsql.org/file/documentation/chunk/en/refdocs/fblangref30/fblangref30-datatypes-chartypes.htmlYes, I increased the number of digits of my character in some of my fields, as VarChar, not as Char.
The main idea was, that rare cases of longer entries should work as well.
Most of my fields are defined as VarChar(2).
I'm not sure if that's very efficient. I would define it as CHAR(2).
VARCHAR(2) is 2 bytes for length and 8 bytes for the characters UTF-8.
While CHAR(2) is just only the 8 bytes.
(BTW. I wouldn't even store that, what was it? region digit?, in a CHAR or VARCHAR, I would make a separate table for it. If it's a region, branche or something, you may want to add something else specific about that entity. And that's where the separate table comes in. Just linked with a BIGINT.
For CHAR(x) you do need to take into account that the string is padded with spaces to the length of 2.
This UTF8 - I thought, this 8 stands for 8 bits = 1 byte?
I thought, my VarChar(2) would take 2 bytes. Is this correct?
Yes and No. UTF-8 does stand for 8 bits encoding but characters can be encoded to multiple bytes.
So while a to z just take 1 byte, other characters (for example Chinese or some accent characters) are encoded in multiple characters.
See an example here:
https://en.wikipedia.org/wiki/UTF-8So £ is saved as $C2 + $A3
code points in the ASCII range (0-127) are represented by a single byte
code points in the range (128-2047) are represented by two bytes
code points in the range (2048-65535) are represented by three bytes
and code points in the range (65536-1114111) are represented by four bytes.
(This may seem like a lot of possible characters, but keep in mind that in Chinese alone, there are 100,000s of characters.)
https://www.freecodecamp.org/news/what-is-utf-8-character-encodingSo UTF-8 will be maximum 4 bytes long and Firebird needs to reserve those bytes for a (per) CHAR.
And just to make it more confusing... it's possible to reach even more characters by combining the code points.
A "character" can take more than 4 bytes because it is made of more than one code point. For instance a national flag character takes 8 bytes since it is "constructed from a pair of Unicode scalar values" both from outside the BMP.
So if you save 1 national flag character, both code points of your CHAR(2) will be occupied and you can't add anything more (you will get a string too long error when saving).