Primary key looks similar to the clustered index but it’s not the same.
Primary key is the logical concept. Clustered index is the physical concept. By default SQL Server creates the clustered index on the table primary key. But this is default behavior only
How to choose correct clustered index? Obviously it would be beneficial for the system if clustered index covers most frequent/important queries. Unfortunately in real life it’s not always possible. Although there are 3 criteria you need to use. Those criteria are straightforward if you remember the structure of the indexes. I will post 2 old images here:
So what are the criteria?
- Clustered index needs to be unique. If you don’t define the clustered index as unique, SQL Server adds 4 bytes hidden “uniquifier” to the row. This is the method that helps SQL Server to distinguish between the rows with the same clustered index values (think about non-clustered index rows for example). If you need to create clustered index on non-unique column(s), I would suggest to add another identity column as the last column in the index. Same storage size but it gives you better control. You will be able to update row via clustered index values for example.
- Clustered index needs to be static. Remember, clustered index controls actual data sorting order. If you change it, SQL Server needs to move the row to the different place in the table. Besides that it needs to update all non-clustered index rows with the new clustered index values. Very expensive operations
- Clustered index needs to be narrow. Again, it exists in the every non-clustered leaf (and for non-unique indexes non-leaf) rows. Large clustered index means large non-clustered index value -> less rows per page -> more data pages -> more IO operations. So less efficient non-clustered indexes. And more storage space, of course
Those 3 factors are important. You have to keep them in mind when you design the table.
Update (2011-11-24). I covered Unifuifiers in much greater depth in that post