If you have a Users table with 1 billion rows, and you execute SELECT * FROM Users WHERE id = 500;, how does the database find that exact row?
Without an index, the DBMS must perform a Full Table Scan. It reads row 1 from the hard disk, checks the ID, reads row 2, checks the ID, and so on. Reading 1 billion rows from a mechanical hard disk takes several minutes.
An Index is an auxiliary data structure (usually a Tree or a Hash Table) used to speed up the retrieval of records in response to certain search conditions. It allows the DBMS to find the row in milliseconds.
Think of an index in a database exactly like the index at the back of a textbook.
Algorithm: Page 412.A database index works the same way. It stores the search key (e.g., the User ID) and a physical pointer to the exact disk block where that entire row is stored.
In an ordered index, index entries are stored sorted on the search key value.
A primary index is an ordered index whose search key also defines the sequential order of the file.
PRIMARY KEY column of a table.If the data file is sequentially ordered on a non-key attribute (an attribute that does not have distinct values, like Department_ID), the index is called a clustering index.
What if the table is physically sorted by ID, but you frequently search by Email? You can create a Secondary Index on the Email column.
Email, but the actual data blocks on the disk remain sorted by ID.The Hidden Cost of Indexing: Why not just create an index on every single column? Because every time you INSERT, UPDATE, or DELETE a row, the DBMS must update the table AND every single index attached to it. Adding too many indices makes read queries blazing fast, but makes write operations dreadfully slow.