Friday, January 8. 2010OK, I'm joining SSSKATrackbacks
Trackback specific URI for this entry
No Trackbacks
Comments
Display comments as
(Linear | Threaded)
SSSKA, that's awesome.
I think a lot of naive database developers get it in their head that "primary key" means a serial integer column. I tell them a primary key is a constraint, not a column.
The following excerpt
"When you create a requisition, for example, SAP creates an electronic document for that particular transaction. SAP gives the transaction a document number and adds the document to the transaction data that is already in the system. Whenever you complete a transaction in SAP, that is, when you create, change, or print a document in SAP, this document number appears at the bottom of the screen. Tip Always write down the SAP document number. If you need to view a document in the future, using the document number is the easiest way to find to it." from http://web.mit.edu/sapr3/docs/webdocs/getstarted/gsOVERVIEW.html gives me the impression that their every table has a column to store serial numbers as the primary keys. Do I misunderstand that system or am I ignorant of the benefits/power of such design?
There is nothing wrong with using surrogate keys per say. However, many people mistakenly think that surrogate keys are the HOLY GRAIL of normalized database schema design.
Only later do they learn from experience that surrogate key do not guarantee a normalized database.
it all depends on what is your primary key. If it's varchar, you are better off with another serial. It is easier on index. So, 'junction' tables, connecting PKs serials is nothing wrong. As long as it is done for a reason!.
In this particular case, the related keys were integers. And if the related key were a varchar then that would mean that the FK on the foreign table was a varchar. If it's good enough in one place it's good enough in another.
I don't buy this idea that all PKs must be simple integers. I have seen complex PKs and PKs with varchars work perfectly well, and on large databases. There is a case for surrogate key use, but there are definitely places where they don't belong, and this was one of them.
I have been very opposed to the surrogate key mania, but as I am using ORMs more and more, it just becomes more and more convinient to just slap an auto gen PK onto all models. Of course a proper ORM can also deal with string PKs and multi column PKs, but its just easier to not bother.
when you join large number of tables, using variable length types as keys hurts performance. Especially if it is longer than 8 bytes. And usually varchar keys are. I've been through that. Also, I've been through joining on multiple values, cos they were PK. Doesn't work fast either.
There's good reason people fallback on surrogate keys. And I don't understand why would you consider that malpractice. The only reason people are usually afraid of doing it, is because they can't normalize tables, or because they don't know how to join tables. In either case the queries with multiple subselects obviously don't perform well. That's why I always put so much emphasis on using JOINs instead of subselects, especially when dealing with postgresql. Please do tests, and present them with results.
You are missing the point. Again. I am not against use of surrogate keys. I am against mis-use of surrogate keys, and this was a very clear instance. This wasn't somebody falling back to surrogate keys. It was somebody using them stupidly, mindlessly, unnecessarily.
If people think "Oh, I need a PK, so I'll just add a serial field," that is malpractice in my view. If they use one after proper consideration that is quite different.
It is true that there are appropriate times to use a surrogate key versus a natural key, but I never hear anyone give a concrete reason for choosing one over the other.
So I'll offer a few: 1. A meaningful difference between the two is that with a surrogate, information is *deferred*. With a natural key, information is *immediate*. Surrogates act much as a sort of template placeholder, where it stands in for the real value (hence, the name "surrogate"). BUT, it must be used as if there is a unique set of values it replaces, otherwise it's just a meaningless number that does little more than number the rows in a table without enforcing real-life information constraints. Here's a practical trade-off between immediately having the value versus deferring the value: READ PERFORMANCE VERSUS WRITE PERFORMANCE. You can keep consistency with natural keys if you are educated enough to know about cascading updates and such. But natural keys with cascading updates hurts write performance for the sake of helping read performance, because changes have to be propagated. Surrogate keys hurt reads because there are more tables and less help to the query planner. In most cases, it's better to use natural keys because genuine keys change so rarely that you don't worry about write performance, only read performance. 2. If you foreign key to a natural key in another table, you are saying: "this set of attributes shared by the thing this table row represents must also exist as a set of attributes in the other table". If you instead foreign key to a surrogate key, you are saying: "make sure that some row with a meaningless number in this table matches some row in the other table that has the same meaningless number, BUT WE DON'T KNOW OR CARE IF THE REAL ATTRIBUTES MATCH". It's NOT the attributes that are checked by the foreign key, it's the meaningless number. That's a huge problem with surrogates. In order to ensure you protect the business logic, you have to also add triggers to make up for the fact that the foreign key didn't help you much. Because people are rarely dilligent about this stuff, they often simply use surrogates alone. The result is the worst of all worlds: bad performance AND bad data. 3. Much of the problem with the key-choice debate is that SQL talks about "foreign keys" and "primary keys", but it requires the DBA to be educated what is IMPLICIT about keys: surrogates, natural keys, alternate keys, candidate keys, etc. Because it's not explicit in SQL or DBMS documentation, people don't know this stuff exists and must require careful thought and planning.
I attempted to define surrogate key abuse a little more precisely here:
http://thoughts.j-davis.com/2007/12/11/terminology-confusion/ |
My Links etcCalendar
QuicksearchArchivesCategoriesBlog Administration |
|||||||||||||||||||||||||||||||||||||||||||||||||