Re: problems maintaining boolean columns in a large table

From: Richard Huxton <dev(at)archonet(dot)com>
To: Ben Campbell <ben(at)scumways(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: problems maintaining boolean columns in a large table
Date: 2010-02-09 11:45:01
Message-ID: 4B714ABD.4010608@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 09/02/10 11:25, Ben Campbell wrote:
[snip]
> I need to set all
> those flags, but it takes _ages_ to do "UPDATE article SET
> needs_indexing=true;"
[snip]
> I _think_ the reason it takes so long is that postgresql doesn't modify
> rows in place - it creates an entry for the modified row and zaps the
> old one. So by touching _every_ row I'm basically forcing it to rebuild
> my whole database... I've got about 2 million rows in 'articles'.
> There are a few indexes on columns in 'articles' which obviously will
> slow things down too.

Exactly so.

> I've had a minor attempt at tuning (increased checkpoint_segments) an d
> I'm sure there are a bunch of other tricks I could use to bulk-set that
> flag in much less time...

Probably not, actually.

> But my gut feeling is that the flag would be better off in it's own
> table anyway, eg:
>
> CREATE TABLE needs_indexing (
> article_id integer references article(id)
> );
>
> So, if an article is listed in this table, it needs indexing.
> (maybe with a constraint to ensure uniqueness - I only need articles
> entered once in this table)

That sounds sensible to me - "needs indexing" depends on two things:
1. Is the article more recent than it's FTS index
2. Have you asked for it to be re-indexed anyway.
The first point concerns two tables, and the second is (from the
database point of view) arbitrary. A separate table containing indexing
jobs to do seems the sensible approach. Lets you batch up your
re-indexing work too.

Oh - you might want to consider how/whether to handle multiple entries
for the same article in your queue.

--
Richard Huxton
Archonet Ltd

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Vincenzo Romano 2010-02-09 12:26:30 Re: string reverse fucntion?
Previous Message Magnus Hagander 2010-02-09 11:41:40 Re: vacuumdb ERROR: out of memory