Re: join with redundant results VS simpler join plus multiple selects

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: WireSpot <wirespot(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: join with redundant results VS simpler join plus multiple selects
Date: 2008-11-20 13:05:24
Message-ID: 49256094.3030107@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

WireSpot wrote:
> I have a schema with galleries, people and images. Each person has a
> bunch of private images. People can join any number of galleries and
> can publish any of their images to the galleries they join (or not).
>
> I'd like to retrieve a data set where for a given gallery id I get all
> the people AND all the images they've published to that gallery.
>
> I can do this in two ways.
> 1) Do a join that will give me the people that belong to said gallery,
> then loop in the code and do simple selects to retrieve images in that
> gallery for each of them.
> 2) Do a join between all three tables. The end result will have as
> many rows as total images for all the people in the gallery.
> Obviously, there's going to be redundant data, since a person's info
> will be repeated for each image.
>
> Which is better in terms of performance? I used EXPLAIN ANALYZE and
> actual queries and it seems to suggest that option 2, while returning
> redundant info, is faster.

That's probably going to be the case. PostgreSQL won't need to read the
redundant info in from disk each time, and relative to the image data
it's going to be pretty small. By doing it all in one join you're
avoiding the overhead of all those network round trips (if on a
network), statement preparation and planning, etc etc etc. Additionally,
PostgreSQL is probably going to be using a join plan that's much more
efficient than anything you'll get by looping over each user and asking
for images.

If you wanted to avoid returning too much redundant info, you could
always do it in two queries:

- Find a list of all users belonging to the gallery and any other
non-image data associated with them; then

- Retrieve all images in one query using a join against the list of
users who're members of the gallery, but only actually return (eg) the
user id, gallery id, and image data for each image.

In all honestly, though, it probably doesn't matter unless there's a LOT
of additional data you want to obtain about each user.

Note, however, that when testing method (1) in your post you will REALLY
need to make sure that you're using parameterized prepared statements
for the image queries.

--
Craig Ringer

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Frank Bax 2008-11-20 13:22:25 Re: [SQL] date range query help
Previous Message Craig Ringer 2008-11-20 12:59:52 Re: hidden errors calling a volatile function inside a stable function