Re: How to setup default value "0000-00-00" for "date"

From: Michal Taborsky <michal(at)taborsky(dot)cz>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, emilu(at)cs(dot)concordia(dot)ca
Subject: Re: How to setup default value "0000-00-00" for "date"
Date: 2004-08-20 08:39:42
Message-ID: 4125B8CE.5010801@taborsky.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Richard Huxton wrote:
> Where you don't have a valid date to store you should use NULL. This
> business of storing zeroes is a horrible MySQL design mistake.

Well, yes and no. It certainly is a design mistake and introduces
incosistency into the database, but after I was bitten several times by
NULL values I'd go for solution like this any day. Let me explain.

We had a table of messages, which was inserted to randomly and every few
minutes we'd walk through the unprocessed messages and perform some work
on them. I, trying to have the database as clean as possible, used this
table definition (simplified):

messages (
id serial,
data text,
arrived timestamp default now,
processed timestamp)

So after the message arrived, it had the processed field set to null,
which was perfectly consistent and represented what it realy was--an
unknown value.

We'd then simply SELECT * FROM messages WHERE processed IS NULL and were
happy ever after, only to realise after the table had grown to few
thousands rows, that the SELECT took ages, because the system had to
perform seqscan. Aha! So we added an index on processed, because common
sense told me, that as long as there are 100k rows and only 10 of them
are NULL, the index would be very selective and therefore useful.

I guess you know where it ends--the index is not used for IS [NOT] NULL
expressions. The obvious workaround was to add DEFAULT value to
"processed" in form of kind of anchor (we used '-infinity') and then do
SELECT * FROM messages WHERE processed='-infinity'.

Bingo! The thing went 100x faster. So we could choose to have
standards-compliant, very clean database design OR the thing that does
what it's supposed to do in reasonable time. And believe me, it's kind
of difficult to explain to our logistics department that we could have
done the thing to return results in milliseconds instead of 10 secs, but
chose not to for sake of clean design.

It'd be really nice if we didn't have to use such hacks, but hey, life's
inperfect.

And so that this would not be just a literary exercise and to answer
Emi's question--you can't do that, but use some valid date which you are
never going to use for your ordinary data (like the '-infinity', or
1.1.1970 00:00). Just make sure you make a note of it somewhere and my
suggestion is you write a COMMENT ON that column for future generations.

--
Michal Taborsky
http://www.taborsky.cz

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mark Gibson 2004-08-20 08:51:20 Re: Forwarding kerberos credentials
Previous Message Christian Kratzer 2004-08-20 08:20:42 Re: How to setup default value "0000-00-00" for "date"