Forum > General

What is Indexing?

(1/2) > >>

OC DelGuy:
What is Indexing?
What exactly is happening that makes an indexed file faster to search?

Let's start with that and then I'll probably have a million questions after I read some of your comments.   :o   :D :D   8-)

Imagine a file with some thousands of (unique) strings in no particular order, each string in its own line

Now to find something in that file you only have one way: running through it sequentially (one line after the next).

Now, if you have an index on that file, the index being built on that "attribute" (that string for example), your search wouldn't run sequentially through the file, but through the index, which is sorted!! Usually, an index is a binary tree, so the search would be way more faster.
When you find your string in the index, the search returns the Position of that string in your original file, which you can use to retrieve that information

In laymans terms: Basically, an index is a Lookup-table to get the "Position" of the information in your original file

Note: An Index only makes sense if the Attribute you want to "index" is unique!

An index is basically the important bit of your lines/records. Like, for a record, the number or name. And a pointer to where that record is. If you have more than one important part, you need multiple indexes.

To search stuff in a sorted index, you start with the one in the middle. If what you are looking for comes before that, you take the one at a quarter, etc. If that still takes too much time, you can make an index of the index, where you store, say, only the first character.


--- Quote from: SymbolicFrank on March 29, 2023, 09:11:30 am ---If that still takes too much time, ......

--- End quote ---
Considering i do have such a file in production use with over a 100K records, my searches take under 1 second to find a record using a sorted index-approach


--- Quote from: OC DelGuy on March 29, 2023, 04:17:17 am ---What is Indexing?
What exactly is happening that makes an indexed file faster to search?

--- End quote ---

Generally, the "indexing" means maintaining an additional data structure (or maintaining the file in a particular condition) in order to reduce the algorithmic complexity of the search. See

As Zvony mentioned, searching sequentially through your file will have a complexity of  O(n), which considering the high cost of file I/O operations is not the best thing you can achieve.

OTOH if you maintain the file ordered by one of the fields then you can achieve complexity of O(log n), which is fine, but the cost of maintaining file in order will be unbearable, even more if you want to search by several fields.

For example, in the RDBMS systems there is one field chosen for a primary key (may be segmented) and the file/table is maintained into an ordered (B-tree) structure according to that key. That ensures search complexity with something near to the O(log n). 

The additional "Indexes" on other fields are actually a separate tables consisting of pairs: <value of the field, physical address of the row>, they are also B-trees, the search into the index is with the same ~O(log n) complexity, and you can seek on the actual row directly by its physical address.   

There are also other data structures besides the B-tree that can be used for the purpose, but the B-trees are the most common. 


[0] Message Index

[#] Next page

Go to full version