Re: need much better query perfomance

From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: andy(at)mixonic(dot)com (Andy), pgsql-general(at)postgresql(dot)org
Subject: Re: need much better query perfomance
Date: 2003-01-29 16:49:51
Message-ID: 20030129164951.61C81103E8@polaris.pinpointresearch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

First the basics - have you run vacuum analyze? Also, if you can have
exclusive access to the database for a while, "vacuum full".

I haven't had my coffee but the query doesn't look right. If I understand the
question you want to find those tracks which are marked for deletion and do
not appear on any "active" (non-deleted) album.

I don't think union is the correct solution. I'd try something like:

select track_id from track where track.deleted = 'Y'
and
not exists (select * from album_track where
album_track.track_id = track.track_id
and
album_track.album_id = album.album.id
and
album.deleted is null)

Cheers,
Steve

> from track t
> where t.deleted = 'Y'
> UNION
> select track_id from track where track_id not in (
> select at.track_id from album_track at, album alb
> where at.album_id = alb.album_id and alb.deleted is null
> )

On Friday 24 January 2003 3:28 pm, Andy wrote:
> Here's the problem:
>
> Table track has 100,000 rows
> Table album has 20,000 rows
> Table album_track is a lookup table linking tracks to albums and has
> 80,000 rows
>
> The track table has a PK, a deleted field ('Y' for deleted tracks /
> null otherwise), and several track-specific fields.
> The album table also has a PK and a deleted field ('Y' or null), as
> well as several album-specific fields.
> The album_track table simple has two columns: The PKs of the album and
> track tables.
>
> A deleted track may still exist in an not-deleted album. A deleted
> track is simply one that cannot be added to new albums.
> Likewise, a deleted album may contain not-deleted tracks. It is
> simply an album that the user has deleted.
>
> So even though we've marked deleted albums & tracks in the database,
> we haven't removed "deleted" tracks from our fileserver. Now we're
> almost out of space, so we need to find all tracks that are marked as
> deleted and which are not a part of any non-deleted albums.
>
> I wrote what must be a very naive query to find such tracks. Doing an
> explain on this query gave a huge time estimate. My query is below.
> Would someone please suggest a faster approach?
>
> select t.track_id
> from track t
> where t.deleted = 'Y'
> UNION
> select track_id from track where track_id not in (
> select at.track_id from album_track at, album alb
> where at.album_id = alb.album_id and alb.deleted is null
> )
>
> The above query should work, but it takes too long to execute. The
> sample query below, however, is fast but incorrect:
>
> select t.track_id
> from track t
> where t.deleted = 'Y'
> UNION
> select at.track_id
> from album_track at, album alb
> where at.album_id = alb.album_id and alb.deleted = 'Y'
>
> This query is incorrect because it the bottom-most select is getting
> all deleted albums, but a track might be in both deleted AND
> non-deleted albums. This incorrect query will return all the
> "completely deleted" tracks, but will also return tracks that are part
> of both deleted and non-deleted albums, and we don't want this.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Stephan Szabo 2003-01-29 16:51:30 Re: index on timestamp performance
Previous Message Tom Lane 2003-01-29 16:49:19 Re: monitoring postgres