Also, sometimes what appears to be the natural primary key, from the point of view of the data model and retrieval use cases, turns out to be the wrong primary key from the point of view of the storage model and how the data is generated.
A real-world example: I once worked for a large bank (which shall remain anonymous) that was building an online banking platform for its commercial banking customers. Of course this contained a "Transactions" table (unsurprisingly the largest table in the db).
In terms of the data model, the natural key for the transactions was account_id, date, transaction_sequence_in_day (henceforth sequence, for concision). In terms of the use case, it was pretty clear that customers using the online banking app would want to retrieve the transactions for a particular account, sorted by date and by sequence within the day. So they built the clustered index on this natural key: account_id, date, sequence, in that order. In testing, the retrieval performance was great.
So, what's the problem?
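The transactions were uploaded from the mainframe every night. So every 24 hours you added a day's worth of transactions, for all accounts. Because the clustered index led with account_id, those new rows did not land at the end of the index. They were scattered right through it, a few at the end of every account's range, splitting pages as they went. Result? Massive fragmentation. To the extent that in production they found they had to rebuild the clustered index 2-3 times a week (!) to maintain acceptable performance. A total nightmare.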
Given the storage model and how the data was uploaded, they really needed to change the order of the key to date, account_id, sequence. Each nightly upload would then have been appended at the end of the index, and most of the fragmentation would have disappeared. They could have added a non-clustered index on account_id if necessary.
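To make the difference concrete, here is a rough toy simulation in Python. It is not how any real database engine lays out its pages; it is a minimal sketch that assumes fixed-capacity sorted pages which split in half when a row has to go into a full page that is not the tail of the index. The page capacity, the row counts and the names (`count_mid_page_splits`, `nightly_loads`) are all made up for illustration.

```python
from bisect import bisect_right, insort

PAGE_CAPACITY = 8  # rows per page; deliberately tiny, purely illustrative


def count_mid_page_splits(keys_in_arrival_order, capacity=PAGE_CAPACITY):
    """Insert keys into a toy list of sorted pages.

    Returns (mid_page_splits, page_count). A row that sorts after everything
    already stored just starts a new page at the end (no split); a row that
    lands in a full page anywhere else forces that page to split in half.
    """
    pages = []
    mid_page_splits = 0
    for key in keys_in_arrival_order:
        if not pages:
            pages.append([key])
            continue
        # Find the page whose key range should hold this key.
        first_keys = [page[0] for page in pages]
        idx = max(bisect_right(first_keys, key) - 1, 0)
        page = pages[idx]
        if len(page) < capacity:
            insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])  # pure append at the tail of the index
        else:
            half = capacity // 2
            left, right = page[:half], page[half:]
            insort(left if key < right[0] else right, key)
            pages[idx:idx + 1] = [left, right]
            mid_page_splits += 1
    return mid_page_splits, len(pages)


def nightly_loads(days, accounts, per_day):
    """Yield (account_id, day, sequence) in the order the nightly upload delivers rows."""
    for day in range(days):
        for account in range(accounts):
            for seq in range(per_day):
                yield (account, day, seq)


rows = list(nightly_loads(days=20, accounts=40, per_day=3))

# Key order the bank chose: account_id, date, sequence.
account_first_splits, account_first_pages = count_mid_page_splits(
    [(a, d, s) for (a, d, s) in rows]
)
# Reordered key: date, account_id, sequence.
date_first_splits, date_first_pages = count_mid_page_splits(
    [(d, a, s) for (a, d, s) in rows]
)

print("account_id first:", account_first_splits, "splits,", account_first_pages, "pages")
print("date first:      ", date_first_splits, "splits,", date_first_pages, "pages")
```

With the date leading the key, every row arrives in key order, so the toy index only ever grows at its tail. With account_id leading, each night's rows are wedged into the middle of already full pages, which is the same pattern that drove the constant rebuilds.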
Although, given how wide the key is, there's an argument that it would have been a good candidate for an internal surrogate key based, essentially, on upload sequence. Even if you kept the date, account_id, sequence composite key as the clustered key (with the size penalty a wide clustered key imposes on every non-clustered index), such a surrogate key could be handy for partitioning and archiving as part of future data management.