Re: Catalog Access (was: [GENERAL] Concurrency problem)

From: Wes <wespvp(at)syntegra(dot)com>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Wes <wespvp(at)syntegra(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Zeugswetter Andreas DCP SD <ZeugswetterA(at)spardat(dot)at>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Catalog Access (was: [GENERAL] Concurrency problem)
Date: 2006-04-26 23:19:26
Message-ID: C0756A2E.B127%wespvp@syntegra.com
Lists: pgsql-hackers

On 4/26/06 5:34 PM, "Jim C. Nasby" <jnasby(at)pervasive(dot)com> wrote:

> Try running a first index build by itself and then running them in
> parallel. Hopefully once pg_class has an exact tuple count the
> conflicting update won't happen. If you actually have an exact tuple
> count you could also try updating pg_class manually beforehand, but
> that's not exactly a supported option...

I thought about that. It would work fine for the table with 3 indexes (1/2
either way), but it would mean an extra pass on the one with 4 (1/2/1
instead of 2/2).

> Another possibility would be to patch the code so that if the tuplecount
> found by CREATE INDEX is within X percent of what's already in pg_class
> it doesn't do the update. Since there's already code to check to see if
> the count is an exact match, this patch should be pretty simple, and the
> community might well accept it into the code as well.

I don't think that would help here. I assume after the COPY, the tuple
count is zero, and after the first index build, it is exact.
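To make the objection concrete, the "within X percent" check might look roughly like this (a Python sketch with hypothetical names, not the actual backend code):

```python
# Sketch of the proposed tolerance check: skip the pg_class update
# when CREATE INDEX's tuple count is close enough to what's already
# on record.  Names and the 10% tolerance are illustrative.

def should_update_reltuples(counted, recorded, tolerance=0.10):
    """Return True if pg_class.reltuples should be rewritten."""
    if recorded == 0:
        # A zero on record (as right after a fresh COPY into a new
        # table) is never "close enough" to a real count.
        return counted != 0
    return abs(counted - recorded) / recorded > tolerance

# Right after the COPY, reltuples is still 0, so the first index
# build updates it regardless -- which is why this check alone
# wouldn't avoid the conflict described above.
print(should_update_reltuples(1_000_000, 0))        # -> True (update)
print(should_update_reltuples(1_000_000, 990_000))  # -> False (within 10%, skip)
```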

Dumb question... Since COPY has to lock the table, why doesn't it take the
current count in pg_class and increment it by the number of rows inserted?
If you're doing a clean load of a table, that would result in an exact
count.

What about your idea of retrying the request if it detects a conflict?

> BTW, why are you limiting yourself to 2 indexes at once? I'd expect that
> for a table larger than memory you'd be better off building all the
> indexes at once so that everything runs off a single sequential scan.

I don't know enough about the index build process. My presumption was that
while you might gain during the read pass, head contention during the
sort/write phase would be a killer. I don't have enough spindles (yet) to
separate out the different indexes. I think you'd only want to do one table
at a time to avoid head contention during the read pass. Wouldn't building
them all at once tend to exacerbate the current problem, too? In
this specific case, I guess I could do 1,2 and 1,3 parallel builds (for the
3 index/4 index tables).

Right now I'm running with rather restricted hardware (1GB memory, two
2-disk RAID 0's and a single disk). If the pilot proves what I think is
possible, and I can get real hardware (i.e. an intelligent caching array and
some serious memory), things change a bit.

Wes
