Difference between Primary Index, Secondary Index and Clustered Index

Updated Jul 31, 2024

Primary Index vs. Secondary Index vs. Clustered Index

Indexing is a data structure technique to properly retrieve records from the database files based on some attributes on which the indexing has been done. Here's a comprehensive look at the different types of indexing:

Primary Index

Definition: The primary index is defined on an ordered data file. The data file is ordered on a key field. The key field is generally the primary key of the relation.

Characteristics:

Unique: Makes sure every value in the index is distinct.
Sparse Index: Each data block has a single index entry.

Example:

Consider a table with a primary key column EmployeeID. The primary index will sort the table based on EmployeeID.

Advantages:

Quick access to documents.
Preserves the main key's uniqueness.

Disadvantages:

Primary index can only be created on unique keys.
Slightly slower for insertions and deletions as the index needs to be updated.

Secondary Index

Definition: A secondary index can be created on any column, including non-key columns. It can be on a candidate key (unique) or non-key field (with duplicate values).

Characteristics:

Non-Unique: Allows the index to contain duplicate values.
Dense Index: Contains an index entry for each table record.

Example:

If a table has a LastName column, a secondary index on LastName will create an entry for each record, allowing quick searches by last name.

Advantages:

Enhances non-primary key column query performance.
Flexible indexing across several columns.

Disadvantages:

Needs more storage space.
Write operations (insert, update, delete) may become slower as a result.

Clustering Index

Definition: A clustering index is defined on an ordered data file. The data file is ordered on a non-key field.

Characteristics:

Non-Unique Key: Indexes non-unique fields in which a clustering field has arranged the data.
Physical Data Order: The order in which the index and actual data rows are stored is the same.

Example:

In a table with a Department column, a clustering index on Department would arrange the table so that all employees in the same department are stored together.

Advantages:

Effective for sorting and range queries.
Minimizes I/O operations by grouping records that are similar together.

Disadvantages:

Each table may only have one clustering index created.
Not appropriate for columns with lots of distinct values.

Comparison Table: Primary Index vs Secondary Index vs Clustered Index

Feature	Primary Index	Secondary Index	Clustering Index
Uniqueness	Unique	Unique or Non-Unique	Non-Unique
Data Order	Key Field Order	Not Ordered	Non-Key Field Order
Type of Index	Sparse	Dense	Physical Data Order
Creation	Automatic with PK	Manual	Manual
Storage Requirements	Less	More	Moderate
Use Case	Fast PK lookups	Optimizing non-PK queries	Range queries, sorting

Practical Examples

Primary Index

Scenario: Consider a table Employees with columns EmployeeID (primary key), FirstName, and LastName.

When a query searches for a specific employee using EmployeeID, the primary index facilitates fast retrieval because the table is ordered by EmployeeID.

Example Query: SELECT * FROM Employees WHERE EmployeeID = 123;

Secondary Index

Scenario: The same Employees the table has a column LastName.

To optimize queries searching by LastName, create a secondary index on this column.

Example Query: SELECT * FROM Employees WHERE LastName = 'Smith';

Clustering Index

Scenario: In a table Employees, there is a column Department.

A clustering index on Department orders the actual data rows by the department.

Example Query: SELECT * FROM Employees WHERE Department = 'Sales';

Conclusion

In order to maximize query performance in databases, indexes are essential. Comprehending the distinctions among primary, secondary, and clustering indexes facilitates the development of effective and adaptable database architectures. Utilizing the right kind of index for your particular requirements can greatly improve the speed at which data is retrieved as well as the overall functionality of your application.

Comments

03 Jun 2016 3:59 pm Abhijeet Mudgal Helpful Answer

Indexing is a data structure technique to properly retrieve records from the database files based on some attributes on which the indexing has been done.
Indexing has following types:
Primary Index − Primary index is defined on an ordered data file. The data file is ordered on a key field. The key field is generally the primary key of the relation.
Secondary Index − Secondary index may be generated from a field which is a candidate key and has a unique value in every record, or a non-key with duplicate values.
Clustering Index − Clustering index is defined on an ordered data file. The data file is ordered on a non-key field.