Re: problems maintaining boolean columns in a large table

From: "Timo Klecker" <klecker(at)decoit(dot)de>
To: "'Ben Campbell'" <ben(at)scumways(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: problems maintaining boolean columns in a large table
Date: 2010-02-11 08:23:59
Message-ID: 000301caaaf3$8fb79b30$af26d190$@de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Ben,

could you post your trigger function? When you need to rebuild the index,
you could disable the trigger setting the flag if the article is modified.
This could speed up your UPDATE.

Mit freundlichen Grüßen
Timo Klecker

-----Ursprüngliche Nachricht-----
Von: pgsql-general-owner(at)postgresql(dot)org
[mailto:pgsql-general-owner(at)postgresql(dot)org] Im Auftrag von Ben Campbell
Gesendet: Dienstag, 9. Februar 2010 12:26
An: pgsql-general(at)postgresql(dot)org
Betreff: [GENERAL] problems maintaining boolean columns in a large table

I've got a database that holds a bunch of articles in a table called
'article'. It has a bunch of columns, and each row might hold a few KB
of data, say.
I'm maintaining a separate fulltext database, and so I added a boolean
flag, 'needs_indexing' to my 'article' table to keep track of which
articles have been indexed (and I have some trigger functions on
'article' to automatically set the flag if the article is modified).

It all works fine.
Except when I want to rebuild my index from scratch. I need to set all
those flags, but it takes _ages_ to do "UPDATE article SET
needs_indexing=true;" (many hours at least - I've never let it run to
completion)

I _think_ the reason it takes so long is that postgresql doesn't modify
rows in place - it creates an entry for the modified row and zaps the
old one. So by touching _every_ row I'm basically forcing it to rebuild
my whole database... I've got about 2 million rows in 'articles'.
There are a few indexes on columns in 'articles' which obviously will
slow things down too.

I've had a minor attempt at tuning (increased checkpoint_segments) an d
I'm sure there are a bunch of other tricks I could use to bulk-set that
flag in much less time...

But my gut feeling is that the flag would be better off in it's own
table anyway, eg:

CREATE TABLE needs_indexing (
article_id integer references article(id)
);

So, if an article is listed in this table, it needs indexing.
(maybe with a constraint to ensure uniqueness - I only need articles
entered once in this table)

Does this sound like a reasonable way to go?
Any advice or insight welcome!

Thanks,
Ben.

--
Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Greg Smith 2010-02-11 08:26:26 Re: Warning when selecting column from pg_stat_user_tables.
Previous Message Carsten Kropf 2010-02-11 07:54:12 Re: Extending SQL in C using VARIABLE length type