[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [ProgSoc] NULL values in primary keys in MySQL



jedd wrote:
 First -- GUID's main benefit are when generating client-side,
 right?

Right. The other benefit is that they are homogeneous (and therefore readily generalisable).


 On a web app where everything comes out of the server,
 then auto-increment, or some other way of creating unique (surrogate)
 keys are fine, yeah?

A web-application is a *client* of your database. Or, it might be a client of an application server; in which case the application server would be the client of your database.


The point is that when I used the phrase 'client-side', I wasn't referring to physical architecture. You use GUIDs so that you can create new entities (i.e. 'objects') and then arrange their relationship on your web server. Then, at the very last moment, when everything is ready to go, you begin a serializable transaction against your database and insert your new data. I'll explain:

Say you need to run through the business process (use case, whatever; pick your terminology. (There should be a TLA for that... PYT.)) "Register New Account". There will be some role authorised to conduct this process, and from your schema and with a superficial understanding of your requirements, you might have a business document something like this:

 Account Registration Form (TPS-37B)
 ===================================

 Customer Information
 --------------------

 First Name: _______________________
 Initial:    _______________________
 Last Name:  _______________________
 (blah)

 Residential Address
 -------------------

 (blah)

 Postal Address
 --------------

 (blah)

 Contact Details
 ---------------

 (blah)

 Account Details
 ---------------

 Joint Account:   [ ] (Attach Joint Holder's Details TPS-42A)
 Opening Balance: $ ________ . ______


So, your application server (web server) would need to collect all the fields for this document. Then, when they were all ready, it would need to submit that document to the service which handles the "Register New Account" business process (that 'service' might simply be a 'function' in your PHP code (I know sfa about PHP)). Then that service would validate all the information (and return error if invalid), then it would arrange a set of inserts for the database. We'd need a new account record, at least one new member record, and an opening transaction record. The member would need to know what account it is for, and the transaction would need to know what account it is for. Problem. The account is not yet identifiable, because the database allocates keys. This means that you need to begin a transaction with the database, and while that transaction locks database resources you need to request the insert of a new account record, when that completes you need to ask the database to return the new account number, then you need to create the member specifying the (now known) account number, and then you need to insert that into the database, and so on; you get it. The problem with this is that you have to configure all of your entity relationships while you are holding a database transaction open. This creates two problems: 1) resource contention in the database, 2) data-access code and business-logic code are necessarily conflated under this scheme.


If you could allocate keys on the client (i.e. in the web application, in your case), then you could postpone (and minimise the duration of) the 'critical section'. I.e. you get the database transaction at the last minute and have everything ready to go (in the right order) as quickly as possible. That is to say that you could create a new account entity on the web server (which would imply a new guid account key), then you could create a new member entity (which would imply a new guid member key), then you could tell the new member that it's for the account identified by the new account key (note: I'm assuming you don't really want a 1:n account:member relationship, but I'll get to that), then you could create a new transaction entity (which would imply a new guid transaction key), then you could tell the new transaction that it's for the account identified by the new account key. *Then* you could begin your serializable database transaction, bang, bang, bang, commit; and you're done.

Also, in having a 'homogeneous' keying mechanism, you can generalise the implementation. I.e. if all PKs are GUIDs, then not only are they all client allocatable (is that a word?), but they are a single column, they are the same data type, they are the same data format, and they are unique in a domain-wide space (i.e. if you stumble across a random guid, then the guid will *imply* its type if you look in the database across all PKs; whereas if you stumble across a random integer, it might occur as an identifier in many contexts within your database, and none of those contexts would be implied by that integer). Having domain-wide keys is useful. The simplest way of saying why is that it means that all 'business entities' derive from 'entity', and each entity is identifiable independently of its specialisation (i.e. derived type).

Also... as a relational schema is implicitly a directed acyclic graph, you can do some pretty funky things provided you have a generic keying mechanism. (i.e. you can generate a planar graph and know the order that various types of entities must be CRUDed in, which means that all your data-persistence layer really needs to know how to do is receive a 'set' of 'entities' which require some action (i.e. CREATE, UPDATE, DELETE) and then it can (with a priori knowledge of the planar graph implied by your schema) serialise those transactions in a way to guarantee there are no constraint violations, and this has the added benefit of removing the possibility of dead-lock.) This is slightly more advanced, but it's an option that is more readily available to you if you have a generic keying mechanism. (Also, you don't actually need to persist GUIDs in every table in your database. It is possible to map GUIDs to INTs in a big [entity] table. You can then encapsulate your database with stored-procedures which encapsulate this implementation detail. Though I doubt you're likely to care about this in your scenario.)

 If the nature of your schema relies upon *not* being interfered with
 by external influences, then are you okay to go ahead and use a
 natural key?

I don't quite understand your agenda here. Why are you fighting it? It strikes me that you're trying to be pragmatic. The thing is, auto-incremented integer keys are *not* pragmatic. Natural keys are *not* pragmatic. GUIDs epitomise pragmatism.


IOW, if any changes that may un-uniquify your natural
key happens, then it's almost certain that the whole thing would need to be rebuilt .. does this provide an excuse?

No. ...but why are you trying to make excuses? :P

account (number(int), balance(int), created(date), locked(bool))

Are you aware that storing the balance in the account is denormalising your schema? Typically a summation of transactions would render the balance. I doubt that's likely to create a performance problem in your environment.


Also, I don't think it's a good idea for you to develop your database to require the 1:n account:member relationship. I accept that your business requirements at this time need nothing more that this. However, I find this unnatural, and I know exactly what the real world implications of this decision are likely to be. A single individual will want two accounts for some reason. How knows why, but it will happen. Your system can't handle members having two accounts. First they will complain that they can't have two accounts. Then, in light of the bureaucratic lockdown that is becoming evident to them, this single individual will get an additional membership. Problem. Said 'real member' now has two 'member' records. This muddies your waters. (How many people are using your system?) They forget they've done this two years later. They send you a change of details form with their new address. You modify a single 'member' record, blah, blah, blah. Complaints. Can't fix it. Encoded in the database schema. Would require a re-write. Programmer no longer available. Etc. You can enforce 1:n for now in your application-layer, but at the database layer I suggest you use an m:n mapping of accounts and members. Personally I find it more natural that a single member would have n accounts than a single account having n members.

I'd separate these business processes: "Register Member", "Create Account", and "Register Account Membership". Register Member turns a user into a member. Create Account creates a new account with an initial member. Register Account Membership allocates an additional member to an existing account. Instead of Register Account Membership you might prefer Register Joint-Account, which would have appropriately different semantics.

 member (number(int), jba(1 char), passwd, name(s), addr(s), locked,
      other contact details and preferences ...)

 JBA set to 0 (zero) by default.  For genuine joint-bank-accounts, ie
 wherever > 1 person will be attached to an account, the JBA field gets
 populated with an arbitrary char, say the first letter of the person's
 first name.

JBA(0) entities can easily become genuine JBA's,

So... let me get this straight. You say that a member's natural key is { number, jba }. Then you promise that your natural keys are immutable. Then you say that a jba mutation is possible. (Hint: I see a problem with this logic... :P)


I.e. JBA(0) -> Genuine JBA is a mutation. Per your schema you've basically changed a member's username.

 but I can't envisage
 a situation where that goes in the other direction.  I'm happy to put
 an arbitrary rule in that says you simply can't do that -- divorces
 (etc) require new separate accounts be created, monies split, and the
 old one locked in perpetuity -- a manual, straightforward process.

I am very unhappy with this proposed schema.

In the problem domain the *natural* (in a human sense) 'domain key' for a member is their username. That is, a member identifies themself to the business with their username. (Not with their { member.number, member.jba }, as you propose). So, I reckon you should have a user table, and forget about this { member.number, member.jba } business.

You only allow a member to have a single account (at this time). So, once the user has identified themself with their username it is possible for you to infer their account. In the long run, you'll probably want to expose account.number in the problem domain, but per your requirements you only need 'username'.

So... per your current requirements you don't need to externalise the primary keys of accounts or members, and the primary key on member is a foreign key to user.

We'll talk about what to do with account:member...

 JBA(char) entities can't be changed to a different char.  That is,
 I'm 27j, and I can't change to any other 27? identifier.  The ident
 you get at account-creation time, or at JBA-creation time, is what
 you're stuck with, even if you change your first name, sod you.
 Who'd change their first name, anyway?  Regardless -- at that point
 the user can simply see their identifier as a meaningless, rather than
 semi-meaningless, credential.

Why are you making this so difficult? Just give them a username. Let the username be independent of members and accounts.


To me it is pretty clear that the jba status of a member potentially varies as a function of time (i.e. from number=c, jba=0 to number=c, jba=x, number=c, jba=y; this happens when an existing account is 'split' for two members).

This means that there's no problem with obsoleted or changed data
that's attached to various transaction logs, etc. In other words, as far as I can tell I think that all my 'key' data here is immutable.

...except that it's not? :P

Now ... in this scenario, are there any problems you can
envisage with me sticking with natural primary keys of:
account (number) and member (number, jba)

I'm seriously confused dude. Why can't members have multiple accounts? Why does accounts being shared by multiple members have to impact their username? Why do you need an account before you can become a member? Isn't there a window of time during 'registration' where you would typically be a member with potentially no account? I.e. how can you 'log in' to 'register for an account' without a membership? ...and as a 'membership' requires an a priori 'account' aren't you sool?) Isn't a 'joint account' really an attribute of an account, implied by the number of members registered with an account (i.e. n > 1)?


In any case, I think it's extremely unwise to lock yourself into your database like this at such an early stage. Just make member:account m:n, you can enforce n:1 for now by only developing that functionality in your application.

I reckon you should be thinking more along these lines:

-- tables:

create table [user] (
  [user_key]             uniqueidentifier     not null,
  [username]             nvarchar(32)         not null,
  [passwd_hash]          nvarchar(64)         not null,
  [is_disabled]          bit                  not null
)

create table [member] (
  [member_key]           uniqueidentifier     not null
)

create table [account] (
  [account_key]          uniqueidentifier     not null,
  [account_number]       int  identity(1, 1)  not null
)

create table [account_member] (
  [account_member_key]   uniqueidentifier     not null,
  [account_key]          uniqueidentifier     not null,
  [member_key]           uniqueidentifier     not null
)

-- primary keys:

alter table [user] add
  constraint [pk_user] primary key ( [user_key] )

alter table [member] add
  constraint [pk_member] primary key ( [member_key] )

alter table [account] add
  constraint [pk_account] primary key ( [account_key] )

alter table [account_member] add
  constraint [pk_account_member] primary key ( [account_member_key] )

-- foreign keys:

-- note: this requires all members have a user record.
-- Then you can 'lock' user-accounts; which makes more sense than
-- 'locking' members. You might also 'freeze' accounts, but you'd
-- want to have an exceptional legal team...
-- note: this is an example of what you might call
-- 'relational inheritance'. I.e. 'member' inherits 'user': not all
-- users are members, but all members are users.
alter table [member] add
  constraint [fk_member_user] foreign key ( [member_key] )
  references [user] ( [user_key] )

alter table [account_member] add
  constraint [fk_account_member_account] foreign key ( [member_key] )
  references [member] ( [member_key] )

alter table [account_member] add
  constraint [fk_account_member_member] foreign key ( [account_key] )
  references [account] ( [account_key] )

-- just for fun:

create view [joint_account] as
  select
    [account].*
  from
    [account]
  where
    [account_key] in (
      select
        [account_key]
      from
        [account_member]
      group by
        [account_key]
      having
        count(*) > 1
    )

Disclaimer: I haven't tested, verified, or deeply considered any of the above. And there's undoubtedly a sexier way to create that view. (And I can begin a sentence with a conjunction if I want. (So there.))

 Once you make the mistake of proving you're bored enough to have
 read this far .. I'll ask again about a schema to handle trading post
 style data.  It's giving me a migraine.

I can't wait. :)

p.s. I have two server racks full of semi-configured gear, a heavily under-utilised 1.5 Mb DSL line, and zero time for sys-admin. If you pop over to my place and help me set it up, I can host your application on your preferred flavour of *nix on either sparc or i386 for free (although I can't extend any guarantees RE: availability or continuity of service).















-
You are subscribed to the progsoc mailing list. To unsubscribe, send a
message containing "unsubscribe" to progsoc-request@xxxxxxxxxxxxxxxxxxx
If you are having trouble, ask owner-progsoc@xxxxxxxxxxxxxxxxxx for help.