Re: EXPLAIN detail

From: "Luigi N(dot) Puleio" <npuleio(at)rocketmail(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: EXPLAIN detail
Date: 2008-04-09 14:44:23
Message-ID: 981759.47577.qm@web33504.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

>> With all that casting, is it possible that appropriate indexes aren't
>> being used because your WHERE / ON clauses aren't an exact type match
>> for the index?
>
> You mean to put an index on date with timestamptz datatype column?...

> Er ... I'm not quite sure what you mean. Do you mean an index on a cast
> of the column, eg:

> CREATE INDEX some_idx_name ON some_table ( some_timestamp_field::date )

> then ... maybe. It's hard to be sure when there is so little information
> available. It shouldn't be necessary, but there are certainly uses for
> that sort of thing - for example, I use a couple of functional indexes
> in the schema I'm working on at the moment. It's probably a good idea to
> look at ways to avoid doing that first, though.

>> Can you post the full EXPLAIN ANALYZE from the query? This snippet
>> doesn't even show how records are being looked up.
>
> HashAggregate (cost=52236.31..52236.33 rows=1 width=34) (actual time=7004.779...7004.782 rows=1 loops=1)
> -> Nested Loop (cost=0.00..52236.30 rows=1 width=34) (actual time=3939.450..7004.592 rows=1 loops=1)
> Join filter: (("inner".calldate)::time without time zone => ("outer".calldate)::time without time zone)
> -> Seq Scan on table a (cost=0.00..27444.03 rows=1 width=26) (actual time=2479.199..2485.266 rows=3 loops=1)
> Filter: (((calldate)::date = '2008-04-09'::date) AND ((src)::text = '410'::text) AND (substr((dst)::text, 1, 4)='*100'::text) AND ((lastdata)::text ='/dati/ita/loginok'::text))
> ->Seq Scan on table b (cost=0.00..24792.22 rows=3 width=16) (actual time=1504.508..1506.374 rows=1 loops=3)
> Filter: ((((lastdata)::text ='/dati/ita/logoutok'::text) AND ('410'::text=(src)::text) AND ('2008-04-09'::date = (calldate)::date))
> Total runtime: 7005.706 ms

> Personally, I'd want to get rid of all those casts first. Once that's
> cleaned up I'd want to look at creating appropriate indexes on your
> tables. If necessary, I might even create a composite index on
> (lastdata,src,calldate) .

>> What about a \d of the table from psql, or at least a summary of the
>> involved column data types and associated indexes?
>
> this table has an acctid column which is PK then most of the other columns are varchar(80) or so....

> Do you mean that the columns involved in your WHERE and ON clauses, the
> ones you're casting to date, timestamp, etc, are stored as VARCHAR? If
> so, it's no surprise that the query is slow because you're forcing
> PostgreSQL to convert a string to a date, timestamp, or time datatype to
> do anything with it ... and you're doing it many times in every query.
> That will be VERY slow, and prevent the use of (simple) indexes on those
> columns.

> If you're really storing dates/times as VARCHAR, you should probably
> look at some changes to your database design, starting with the use of
> appropriate data types.

> That's all guesswork, because you have not provided enough information.

> Can you please post the output of psql's \d command on the table in
> question?

> If for some reason you cannot do that, please at least include the data
> type of the primary key and all fields involved in the query, as well as
> a list of all the indexes on both tables.

> The easy way to do that is to just launch "psql" then run:

> \d table

> and paste the output to an email.

> So for 4 records result, 7 seconds are too way a lot I guess... but as I said before I'm gonna wait if > the responsible of the server did a VACUUM on the table...
>
> What do you think?...

> If you're really casting VARCHAR to DATE, TIME, TIMESTAMP, etc on demand
> then personally I really doubt that dead rows are your problem.

Well, this table has a primary key index on first column called acctid which is an integer; instead the calldate column is a TIMESTAMPTZ and in fact I'm using to do (calldate)::date in the ON clause because since the time part of that column is always different and in the nesting I have to identificate the date is the same...

the other two columns (src and lastdata) are both VARCHAR(80) and the query is this one:

EXPLAIN ANALYZE
SELECT
(a.calldate)::date,
a.src,
a.dst,
MIN(e.calldate) - a.calldate
FROM
cdr a
INNER JOIN cdr e
ON ((e.calldate)::date = (a.calldate)::date AND e.src = a.src
AND e.lastdata = '/dati/ita/logoutok' AND e.calldate >= a.calldate)
WHERE
(a.calldate)::date = '2008-04-09'
AND a.src = '410'
AND substr(a.dst, 1, 4) = '*100'
AND a.lastdata = '/dati/ita/loginok'
GROUP BY
a.calldate, a.src, a.dst

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Craig Ringer 2008-04-09 15:28:51 Re: EXPLAIN detail
Previous Message Craig Ringer 2008-04-09 14:15:06 Re: EXPLAIN detail