Re: Slow query on OS X box

From: "Patrick Hatcher" <PHatcher(at)macys(dot)com>
To: "Rod Taylor" <rbt(at)rbt(dot)ca>
Cc: Postgresql Performance <pgsql-performance(at)postgresql(dot)org>, pgsql-performance-owner(at)postgresql(dot)org
Subject: Re: Slow query on OS X box
Date: 2003-01-23 00:05:39
Message-ID: OF2F8C390D.61ED20AE-ON88256CB7.00006621@fds.com
Lists: pgsql-performance


Thanks everyone. I'll give your suggestions a try and report back.

Patrick Hatcher
Macys.Com
Legacy Integration Developer
415-422-1610 office
HatcherPT - AIM


From: Rod Taylor <rbt(at)rbt(dot)ca>
Sent by: pgsql-performance-owner(at)postgresql(dot)org
To: Patrick Hatcher <PHatcher(at)macys(dot)com>
cc: Postgresql Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [PERFORM] Slow query on OS X box
Date: 01/22/2003 02:54 PM

Yup -- you still need to pull everything off the disk (the slowest
part), and that's quite a bit of data. You're simply dealing with a lot
of data for a single query; there's not much you can do.

Is this a dedicated box -- one client doing big selects like this?

Knock your shared_buffers down to about 2000, and bump your sort_mem up
to around 32MB (128MB or so if it's a dedicated box where the vast
majority of queries look like the one below).
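For reference, the equivalent postgresql.conf fragment might look something like this (a sketch only; shared_buffers is in 8k pages and sort_mem in kB on PostgreSQL of this era, and the exact numbers are the suggestions above, not measured values):

```
# postgresql.conf -- hypothetical values matching the advice above
shared_buffers = 2000      # 2000 x 8k pages, roughly 16MB
sort_mem = 32768           # kB, roughly 32MB per sort (131072 for ~128MB if dedicated)
```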

Okay, now we need to do something about the rest of the data. 13 million
rows * 2k each is a big number. Do you have a set of columns that are
rarely used? If so, move them into a separate table and link the two via
a unique identifier (int4). It'll cost extra when you do hit those
columns, but moving a few of the larger ones out would buy quite a bit.
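A rough sketch of that vertical split (all table and column names here are hypothetical, not taken from Patrick's actual schema):

```sql
-- Move rarely-used, wide columns into a side table keyed by an int4 id.
CREATE TABLE search_log_detail (
    log_id    int4 PRIMARY KEY,   -- same unique identifier as in search_log
    raw_line  varchar(2000),      -- wide, rarely-queried columns live here
    referrer  varchar(500)
);

-- Queries that do need the wide columns pay one extra join:
SELECT s.wizard, d.raw_line
  FROM search_log s
  JOIN search_log_detail d ON d.log_id = s.log_id;
```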

Now, wizard. For that particular query it would be best if entries were
made for all the distinct values of wizard in a lookup table, and
search_log.wizard were changed into a reference to that entry in the
lookup. Index the lookup table well (a primary key on the int4 id, and a
unique index on the 'wizard' varchar). Group by the number, and join to
the lookup table for the name.

Any other values with highly repetitive data? Might want to consider
doing the same for them.
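A sketch of the lookup-table approach for wizard (table, column names, and varchar width are assumptions):

```sql
CREATE TABLE wizard_lookup (
    wizard_id  int4 PRIMARY KEY,            -- referenced by search_log
    wizard     varchar(50) NOT NULL UNIQUE  -- the original string value
);

-- After search_log.wizard is replaced by an int4 wizard_id column,
-- the aggregate groups by the small integer and joins back for the name:
SELECT l.wizard, count(*)
  FROM search_log s
  JOIN wizard_lookup l ON l.wizard_id = s.wizard_id
 WHERE s.sdate BETWEEN '2002-12-01' AND '2003-01-15'
 GROUP BY l.wizard;
```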

In search_log, index the numeric representation of 'wizard' (the key
from the lookup table), but don't bother indexing values that occur
frequently. Look up how to create a partial index. I.e. the value
'Keyword' could be skipped, as it occurs in one of every four tuples --
too often for an index to be useful.
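For example, a partial index along these lines (this assumes the lookup-table scheme above, with a hypothetical id of 1 standing in for 'Keyword'):

```sql
-- Index only the uncommon wizard values; rows failing the index's
-- WHERE predicate are simply left out of the index.
CREATE INDEX search_log_wizard_rare_idx
    ON search_log (wizard_id)
 WHERE wizard_id <> 1;   -- skip 'Keyword', which appears in ~1 of 4 rows
```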

On Wed, 2003-01-22 at 15:49, Patrick Hatcher wrote:
> Sorry I'm being really dense today. I didn't even notice the 3.2 million
> row being returned. :(
>
> To answer your question, no, all fields would not have data. The data we
> receive is from a Web log file. It's parsed and then uploaded to this
> table.
>
> I guess the bigger issue is that when trying to do aggregates, grouping by
> the wizard field, it takes just as long.
>
> Ex:
> mdc_oz=# explain analyze select wizard,count(wizard) from search_log
> where sdate between '2002-12-01' and '2003-01-15' group by wizard;
>                                                                   QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=1083300.43..1112411.55 rows=388148 width=10) (actual time=229503.85..302617.75 rows=14 loops=1)
>    ->  Group  (cost=1083300.43..1102707.84 rows=3881482 width=10) (actual time=229503.60..286014.83 rows=3717161 loops=1)
>          ->  Sort  (cost=1083300.43..1093004.14 rows=3881482 width=10) (actual time=229503.57..248415.81 rows=3717161 loops=1)
>                Sort Key: wizard
>                ->  Seq Scan on search_log  (cost=0.00..575217.57 rows=3881482 width=10) (actual time=91235.76..157559.58 rows=3717161 loops=1)
>                      Filter: ((sdate >= '2002-12-01'::date) AND (sdate <= '2003-01-15'::date))
>  Total runtime: 302712.48 msec
> (7 rows)

> On Wed, 2003-01-22 at 13:26, Patrick Hatcher wrote:
> > I have a table that contains over 13 million rows. This query takes an
> > extremely long time to return. I've vacuum full, analyzed, and re-indexed
> > the table. Still the results are the same. Any ideas?

> > mdc_oz=# explain analyze select wizard from search_log where wizard
> > ='Keyword' and sdate between '2002-12-01' and '2003-01-15';
> >                                                           QUERY PLAN
> > -----------------------------------------------------------------------------------------------------------------------------
> >  Seq Scan on search_log  (cost=0.00..609015.34 rows=3305729 width=10) (actual time=99833.83..162951.25 rows=3280573 loops=1)
> >    Filter: ((wizard = 'Keyword'::character varying) AND (sdate >= '2002-12-01'::date) AND (sdate <= '2003-01-15'::date))
> >  Total runtime: 174713.25 msec
> > (3 rows)
> --
> Rod Taylor <rbt(at)rbt(dot)ca>
>
> PGP Key: http://www.rbt.ca/rbtpub.asc
> (See attached file: signature.asc)
>
>
> ______________________________________________________________________
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
--
Rod Taylor <rbt(at)rbt(dot)ca>

PGP Key: http://www.rbt.ca/rbtpub.asc
(See attached file: signature.asc)

Attachment Content-Type Size
signature.asc application/octet-stream 194 bytes
