Re: Text search language field

From: Daniel Staal <DStaal(at)usa(dot)net>
To: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, PGSQL-Novice <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Text search language field
Date: 2012-05-13 17:32:34
Message-ID: 812551699419D670EC653837@mac-pro.magehandbook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

--As of May 13, 2012 3:37:22 PM +1200, Gavin Flower is alleged to have said:

>> Generally I have, though not always I'll admit.  'Primary key' implies
>> it, and that was the only one that actually has that restriction in the
>> selection of fields I showed.
>
> I am amazed at the number of times I see people specifying 'NOT NULL' and
> PRIMARY KEY' for the same field!  Mind you, these same people could
> probably justifiable laugh at the daft things I do!  :-)

I don't mind the belt-and-suspenders approach on this, if you want to use
it. Sure, it's redundant, but it's also clearer and a decent reminder to
the human reading the SQL.

>>>   • keeping primary and foreign key fields separate from user
>>> visible data.
>>
>> I disagree strongly here.  ;)  Primary keys should be whatever is
>> suitable for the data; creating an artificial primary key (when not
>> needed) is just storing more data and doing more work.  It also gives a
>> feeling of 'safety' which doesn't actually exist, as you can then create
>> data that is invalid, but that fits because your table design allows
>> it.  In particular, the language table keeping the the 'code' unique and
>> distinct is the *entire point* of the table, so there is no good reason
>> to use anything else as the primary key.  (I do have a generated ID in
>> the resource table, though it's a much more complex generation than a
>> simple serial.  Again, I didn't feel the need to show it.)
>
> Actually, If I had thought about it a bit more, and something I would (or
> should!) do if I was designing the table for real, would be to use an
> UNIQUE qualifier for the code field.
>
> Unless there are performance and/or data storage, or some such
> constraints - I prefer to linking tables with (non)user visible things 
One
> production database I worked on had 5 tables in a chain of parent/child
> relationships, and each child primary key was a concatenation of a
> character field with the fields comprising the primary key of its parent
> – could be over 45 bytes in characters in length!
>
> The current database I am designing is very complicated, but likely never
> to have more than a few thousand records in any table, and is likely to
> have many more reads (with some quite complicated queries) than writes.
> So I focus on trying to work in a very standardizing way, without having
> to worry over much about performance. Knowing my luck, my next project
> will be the exact opposite!

This project has the potential to be very performance-critical, so I'm
trying to operate on that assumption. ;)

And again, I don't see the point of creating extra fields and data just to
enforce some artificial separation between 'user-visible' and 'database'
fields. There's no performance benefit, there's a maintenance *penalty*
(in that your data is more complicated), and a programming penalty.
(Again: your data is more complicated.)

In this case, for example, doing it with a separate 'id' field would either
require a hash lookup in the application this database is being created to
support, or a separate lookup on nearly every write to the resources table
(AKA: The second-most common operation I'm expecting), just to get the
language code id. The first has obvious maintenance problems, and they
both have a performance penalty. And none of this has any benefit to
anyone, that I can see. So why?

> I remember many years ago, that there was a big argument about systematic
> as distinct from meaningful names. I was programming in COBOL (names
> could be up to 30 characters long), I thought the argument was silly, as
> it depends... In fact in one COBOL program I adopted both approaches, as
> short systematic names are better for use in a set complicated numerical
> expressions and only in a self contained stretch of code, and meaningful
> names for variables used throughout a program.

The only time it's not silly is when it's being decided upon as a
'company-wide standard'. Then it's tragic. ;) (Good guidelines are
invaluable. Enforced standards, especially given by those who aren't doing
any of the actual work, are not.)

> Never too sure what other people know, being helpful can run the risk of
> seemingly be patronising!  I remember in one job I was given a task whee
> for part of it I was effectivly at the level of a trainess, and for other
> parts i had greater experience - a little unsettling!

That's been most of my jobs. ;)

> P.S. about top posting before, I got a bit distracted by a work related
> call.

No problem. Now, if I could only get you to respect the 'Reply-To:'
header... (I'm also not a fan of HTML email, but I can live with that.)

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author. Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes. This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Rob Richardson 2012-05-14 13:57:13 Unwanted time zone conversion
Previous Message Gavin Flower 2012-05-13 09:05:08 Re: Text search language field