Re: Simple join doesn't use index

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Alex Vinnik <alvinnik(dot)g(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Simple join doesn't use index
Date: 2013-02-02 16:39:42
Message-ID: CAMkU=1w2xdj3tupyOkyBNku6N6uE_+Yg93Yzhr9RVSxYu+kLkA@mail.gmail.com
Lists: pgsql-performance

On Monday, January 28, 2013, Alex Vinnik wrote:

> It sure turned out that the default settings are not a good fit. Setting random_page_cost
> to 1.0 made the query run in 2.6 seconds; I can clearly see in the explain plan that
> indexes are being used, and IO utilization is close to 0.
>

This is not surprising. The default settings are aimed at quite small
servers, while you seem to have a rather substantial one. Have you done
anything yet to fix effective_cache_size?
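For reference, both settings can be changed per session to experiment before touching postgresql.conf. The values below are purely illustrative, for a machine with a large OS cache and fast storage, not a recommendation:

```sql
-- Illustrative values only; tune for your actual hardware.
SET random_page_cost = 1.0;        -- default 4.0 assumes slow random I/O on spinning disks
SET effective_cache_size = '24GB'; -- roughly shared_buffers plus the OS page cache
```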

> QUERY PLAN
> Sort (cost=969787.23..970288.67 rows=200575 width=8) (actual
> time=2176.045..2418.162 rows=241238 loops=1)
>

> However I noticed that sorting is done using disk ("external sort Disk:
> 4248kB")
>

As far as pgsql is concerned, it is using the disk. But the kernel is
probably buffering that to an extent that the disk is probably being
touched rather little. So I wouldn't worry about it.

> which prompted me to take a look at work_mem. But it turned out that a
> small increase to 4MB from the default 1MB turns off index usage and the query gets
> 10x slower. IO utilization jumped to 100% from literally nothing. So back
> to square one...
>
> QUERY PLAN
> Sort (cost=936642.75..937144.19 rows=200575 width=8) (actual
> time=33200.762..33474.443 rows=241238 loops=1)
>

And why should the IO utilization have jumped? Is everything in memory, or
is it not? You should run your EXPLAINs with (analyze, buffers), and also
turn on track_io_timing, at least in the local session; that will give us
some insight.
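Concretely, that looks like the following in psql (substitute your actual query for the placeholder):

```sql
-- Enable per-query I/O timing for this session only,
-- then capture buffer hit/read counts alongside the plan.
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- your query here
```

With BUFFERS, each plan node reports `shared hit` (found in shared_buffers) versus `read` (fetched from the kernel or disk), and track_io_timing adds the time spent on those reads, which settles the "is it really in memory?" question.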

If everything is in memory, then why is the seq scan taking so long? If
not, then why is the nested loop such a good idea? (In my hands, when
everything does *not* fit in memory, the nested loop is very, very bad.)

You seem to have a bit of an infatuation with Dec 15th, running that one
query over and over and over. Why? If the real live query is not just for
that one day repeatedly, then you should test with different days, not just
one day repeatedly. (And if your real query really is like the movie
"Groundhog Day", you should probably cluster or partition with that in
mind.)
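If one day's rows really are read repeatedly, clustering the table on its date index keeps those rows physically adjacent, so the index scan touches far fewer pages. A minimal sketch, with hypothetical table and index names:

```sql
-- Hypothetical names. Note CLUSTER rewrites the whole table and
-- holds an ACCESS EXCLUSIVE lock while it runs.
CLUSTER visits USING visits_created_at_idx;
ANALYZE visits;  -- refresh statistics (including correlation) afterwards
```

Clustering is a one-time physical reorder and decays as new rows arrive, so it suits append-mostly, time-ordered data.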

Anyway, there was an issue introduced in 9.2.0, to be removed in 9.2.3,
which over-penalized nested loops that had large indexes on the inner side.
Since your different plans are so close to each other in estimated cost, I
think this issue would be enough to tip the planner into the seq scan. Your
poor setting of effective_cache_size might also be enough to tip it, and
both combined almost certainly are.

But ultimately, I think you are optimizing for a case that does not
actually exist.

Cheers,

Jeff
