Recent

Author Topic: What is Indexing?  (Read 1015 times)

OC DelGuy

  • Full Member
  • ***
  • Posts: 121
What is Indexing?
« on: March 29, 2023, 04:17:17 am »
What is Indexing?
What exactly is happening that makes an indexed file faster to search?

Let's start with that and then I'll probably have a million questions after I read some of your comments.   :o   :D :D   8-)
Free Pascal Lazarus Version #: 2.2.4
Date: 24 SEP 2022
FPC Version: 3.2.2
Revision: Lazarus_2_2_4
x86_64-win64-win32/win64

Zvoni

  • Hero Member
  • *****
  • Posts: 2317
Re: What is Indexing?
« Reply #1 on: March 29, 2023, 08:22:13 am »
Imagine a file with some thousands of (unique) strings in no particular order, each string in its own line

Now to find something in that file you only have one way: running through it sequentially (one line after the next).

Now, if you have an index on that file, the index being built on that "attribute" (that string for example), your search wouldn't run sequentially through the file, but through the index, which is sorted!! Usually, an index is a binary tree, so the search would be way more faster.
When you find your string in the index, the search returns the Position of that string in your original file, which you can use to retrieve that information

In laymans terms: Basically, an index is a Lookup-table to get the "Position" of the information in your original file

Note: An Index only makes sense if the Attribute you want to "index" is unique!
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: What is Indexing?
« Reply #2 on: March 29, 2023, 09:11:30 am »
An index is basically the important bit of your lines/records. Like, for a record, the number or name. And a pointer to where that record is. If you have more than one important part, you need multiple indexes.

To search stuff in a sorted index, you start with the one in the middle. If what you are looking for comes before that, you take the one at a quarter, etc. If that still takes too much time, you can make an index of the index, where you store, say, only the first character.

Zvoni

  • Hero Member
  • *****
  • Posts: 2317
Re: What is Indexing?
« Reply #3 on: March 29, 2023, 12:40:59 pm »
If that still takes too much time, ......
Considering i do have such a file in production use with over a 100K records, my searches take under 1 second to find a record using a sorted index-approach
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

alpine

  • Hero Member
  • *****
  • Posts: 1038
Re: What is Indexing?
« Reply #4 on: March 29, 2023, 03:11:29 pm »
What is Indexing?
What exactly is happening that makes an indexed file faster to search?

Generally, the "indexing" means maintaining an additional data structure (or maintaining the file in a particular condition) in order to reduce the algorithmic complexity of the search. See https://www.geeksforgeeks.org/analysis-algorithms-big-o-analysis/

As Zvony mentioned, searching sequentially through your file will have a complexity of  O(n), which considering the high cost of file I/O operations is not the best thing you can achieve.

OTOH if you maintain the file ordered by one of the fields then you can achieve complexity of O(log n), which is fine, but the cost of maintaining file in order will be unbearable, even more if you want to search by several fields.

For example, in the RDBMS systems there is one field chosen for a primary key (may be segmented) and the file/table is maintained into an ordered (B-tree) structure according to that key. That ensures search complexity with something near to the O(log n). 

The additional "Indexes" on other fields are actually a separate tables consisting of pairs: <value of the field, physical address of the row>, they are also B-trees, the search into the index is with the same ~O(log n) complexity, and you can seek on the actual row directly by its physical address.   

There are also other data structures besides the B-tree that can be used for the purpose, but the B-trees are the most common. 
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

eny

  • Hero Member
  • *****
  • Posts: 1634
All posts based on: Win10 (Win64); Lazarus 2.0.10 'stable' (x64) unless specified otherwise...

 

TinyPortal © 2005-2018