Re: arrays and indexes

From: "Ross J(dot) Reedstrom" <reedstrm(at)rice(dot)edu>
To: Greg Stark <gsstark(at)mit(dot)edu>, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: arrays and indexes
Date: 2004-07-26 18:47:03
Message-ID: 20040726184703.GA27820@cnx.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Jul 26, 2004 at 02:27:20AM -0400, Greg Stark wrote:
>
> "Ross J. Reedstrom" <reedstrm(at)rice(dot)edu> writes:
>
> > In the new schema, the same thing is:
> >
> > SELECT * from content where 42 = ANY (authors);
> >
> > Works fine, but for the life of me I can't find nor figure out how to
> > build an index that will be used to speed this along. Any ideas?
>
> Well that's basically the problem with denormalized data like this.
>
> Have you resolved what you're going to do if two sessions try to add a user to
> the same group at the same time? Or how you'll go about removing a user from
> all his groups in one shot?

We've got plenty of interlocks in the middleware to handle the first
(mainly because this is an authoring system where everyone has to agree
to participate, and acknowledge the open license on the materials)

Second, they _can't_ be removed: we're effectively a write only archive.
Even if we weren't it would be a rare event and could go slowly (loop
over groups in the middleware, probably)

>
> Basically, if you denormalize in this fashion it becomes hard to use the
> groups as anything but single monolithic objects. Whereas normalized data can
> be queried and updated from other points of view like in the case you name
> above.

These groups _really are_ ideal for Joe Conway's work on arrays: we need
ordered vectors, so we'd be sorting all the time, otherwise. They're
static, and they're read only. The one thing they're not is fixed, known
size (Sorry Merlin). They work fine for the query as shown: the only
issue is performance.

> Postgres does have a way to do what you ask, though. It involves GiST
> indexes and the operators from the contrib/intarray directory from the
> Postgres source.

Well, yes, that's how it used to be done. I figured the new array
support should be able to handle it without the addon, however.

> However I warn you in advance that this is fairly esoteric stuff and
> will take some time to get used to. And at least in my case I found
> the indexes didn't actually help much for my data sets, probably
> because they just weren't big enough to benefit.

I know that they should help in this case: we've got lots of content.
Any particular author or maintainter will be in a small fraction of
those. i.e.: it's ideal for an index. And the current joined case uses
an index, when it's available. I'll take a look at the GiST/contrib work,
anyway.

Thanks -

Ross
--
Ross Reedstrom, Ph.D. reedstrm(at)rice(dot)edu
Research Scientist phone: 713-348-6166
The Connexions Project http://cnx.rice.edu fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Pierre-Frédéric Caillaud 2004-07-26 20:35:01 Re: arrays and indexes
Previous Message Harmon S. Nine 2004-07-26 18:37:48 Re: Timestamp-based indexing