[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [ProgSoc] NULL values in primary keys in MySQL
Myles Byrne wrote:
If we’d used the ISBN as the primary key in a table of books, we’d have to
go through and update each row to reflect this change. But then we’d have
another problem. There’ll be other tables in the database that reference
rows in the books table via the primary key. We can’t change the key in
the
books table unless we first go through and update all of these references.
And that will involve dropping foreign key constraints, updating tables,
updating the books table, and finally reestablishing the constraints.
All in all, something of a pain.
This is the red-herring I referred to RE: Jedd's question about
immutable data. The above glosses over real (as opposed to imagined)
complexity, possibly lulling you into a false sense of security.
(Although I use surrogate keys for much the same reason, that is, so
that I can have a 'generic' keying infrastructure that is not at the
mercy of domain specific data-types/formats. Changing values of natural
keys is a separate matter, though it is also related to surrogate vs.
natural keys.)
Adding the indirection of surrogate keys does make it easier to modify
the natural key data, but that only makes it easier for you to
completely bollocks up your database, because if you're mutating the
data in candidate keys (i.e. the 'natural keys', likely being
externalised and used as keys in the problem domain, e.g. "Hello,
welcome to Acme Inc. Can I please have your account number sir?") then
it might take you a little while before you start to begin feeling the
world of pain you've unleashed upon yourself, but you will eventually
feel it (someone will come to your office and ask you to regenerate last
months report, or they'll come and enquire why having just re-run the
end-of-year report the one they see now differs from the one that the
CEO released to share-holders last week, etc.).
Also, if you let surrogate keys 'leak' out of the system into domain
usage then you will create a problem. E.g. "Can I please have your
Account Number sir? ABC. No, I'm sorry sir, that looks like an Account
Code, you should have your Account Number on the top of the TPS report.
Oh, 456. Um, sir, I think you've given me your Account Id, I need your
Account Number." etc. (On a related note, I recently signed up with
Melbourne IT for a simple SMTP hosting package, and by the time I was
finished I'd been given at least 4 usernames.)
The book I referred to provided definitions for these orthogonal 'types'
of of things [1]. If you think surrogate keys make changing natural keys
easy, and that's important to you, then presumably you care because
you're mutating natural keys, in which case you'd better understand the
difference between all of these concepts, or else it's time to start
reading [2]. :P
Some databases can afford to re-write history, but most serious apps
need to know things like "this person's last name used to be Smith, but
as of the 12th it's Bloggs, according to our call-centre operator Mary
who logged a Change Of Details form on behalf of the customer last
week", or "account number 123 was active on the 23rd when this
transaction occurred, but de-activated on the 30th by Joe in Operations
because he found that the person whose last name used to be Smith
fraudulently claimed they'd been married and changed address", etc.
Accountants learnt the virtues of double-entry book-keeping and
immutable data a long time ago. Time-series data and 'version control
systems' can be difficult, but if you need that functionality, then you
need to tackle the complexity.
The real problem with mutations to natural keys is that usually changing
the value of the natural key changes the semantics of that entity in the
problem domain, meaning, often, that entity no longer represents what it
used to represent, meaning that any historical facts referring to the
'old' meaning now pick up the 'new' meaning, which might turn fact into
fiction. For example, say I have a schema like this:
CATEGORY( CAT_ID, NAME )
QUESTION( QUE_ID, CAT_ID, TEXT )
ANSWER( ANS_ID, USER_ID, QUE_ID, VALUE )
Say I have some data like this:
CAT_ID NAME
------ ----
1 Category A
QUE_ID CAT_ID TEXT
------ ------ ----
1 1 What is your favourite colour?
ANS_ID USER_ID QUE_ID VALUE
------ ------- ------ -----
1 123 1 Blue
2 456 1 Red
We're using a surrogate key QUE_ID on the QUESTION table, though {
CAT_ID, TEXT } is a natural key, thus a candidate key.
So, seeing as though it's now *so easy* for me to do so, why wouldn't I
want to do something like this..?
UPDATE QUESTION
SET ( TEXT = 'Is blue your favourite colour?' )
WHERE QUE_ID = 1
[1]
Temporal data-types:
- Instant
- Interval
- Period
Temporal domains:
- User-defined time
- Valid time
- Transaction time
Temporal statements:
- Current
- Sequenced
- Non-sequenced
[2] http://www.cs.arizona.edu/people/rts/tdbbook.pdf
-
You are subscribed to the progsoc mailing list. To unsubscribe, send a
message containing "unsubscribe" to progsoc-request@xxxxxxxxxxxxxxxxxxx
If you are having trouble, ask owner-progsoc@xxxxxxxxxxxxxxxxxx for help.