From: Robert Haas <robertmhaas@gmail.com>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Dmitriy Sarafannikov <dsarafannikov@yandex.ru>, "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>, Borodin Vladimir <root@simply.name>, Хомик Кирилл <khomikki@yandex-team.ru>
Subject: Re: [PROPOSAL] Use SnapshotAny in get_actual_variable_range
Date: 2017-04-29 01:07:17
Message-ID: CA+TgmobC==3OrzK8uoGx8yNQw-gvPwedfOrVreUBVo8wVhDJQA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 28, 2017 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You are confusing number of tuples in the index, which we estimate from
> independent measurements such as the file size, with endpoint value,
> which is used for purposes like guessing whether a mergejoin will be
> able to stop early. For purposes like that, we do NOT want to include
> dead tuples, because the merge join is never gonna see 'em.

I spent several hours today thinking about this and, more than once,
thought I'd come up with an example demonstrating why my idea was
better than yours (despite the fact that, as you point out, the merge
join is never gonna see 'em). However, in each instance, I eventually
realized that I was wrong, so I guess I'll have to concede this point.
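
For anyone following along, here's a minimal sketch of the kind of
case in question (the schema and sizes are invented for illustration,
not taken from either of our tests):

CREATE TABLE t_big   (a int PRIMARY KEY);
CREATE TABLE t_small (a int PRIMARY KEY);
INSERT INTO t_big   SELECT generate_series(1, 1000000);
INSERT INTO t_small SELECT generate_series(1, 1000);
ANALYZE t_big;
ANALYZE t_small;

-- Force a merge join so the endpoint estimate matters:
SET enable_hashjoin = off;
SET enable_nestloop = off;

-- Because t_small's keys stop at 1000, the merge join can stop after
-- scanning only the low end of t_big's index.  The planner's estimate
-- of that early stop depends on the endpoint values it gets from
-- get_actual_variable_range(), and dead tuples beyond the live
-- endpoint are irrelevant to it, because the join never reads them.
EXPLAIN SELECT * FROM t_big JOIN t_small USING (a);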

> Or put another way: the observed problem is planning time, not that the
> resulting estimates are bad. You argue that we should cut planning
> time by changing the definition of the estimate, and claim that the
> new definition is better, but I think you have nothing but wishful
> thinking behind that. I'm willing to try to cut planning time, but
> I don't want the definition to change any further than it has to.

OK, I guess that makes sense. There can be scads of dead tuples at the
end of the index, and there's no guarantee that the query actually
requires touching that portion of the index at all apart from
planning, so it seems a bit unfortunate to burden planning with the
overhead of cleaning them up. But with your proposed new definition,
at least that cleanup can only happen once. After that, there may
still be a bunch of pages to skip at the end of the index before we
actually find a live tuple, but they should at least be empty. Right
now, you can end up skipping the same pile of dead tuples over again
for every query. I tested a two-table join with a million
committed-deleted-but-not-yet-dead tuples at the end of one index;
that increased planning time from ~0.25ms to ~90ms. A
two-and-a-half-order-of-magnitude increase in CPU time spent planning
is obviously not a welcome development on a production system.
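
A rough reconstruction of that test, reusing the tables from the
sketch above (the exact script and settings here are approximations,
not the original test):

-- Add a million rows at the top of the index, then delete them while
-- another session holds an old snapshot, so they stay committed
-- deleted but not yet dead (i.e. not removable):
--
--   session 2:  BEGIN ISOLATION LEVEL REPEATABLE READ;
--               SELECT 1;  -- first statement acquires the snapshot
--
INSERT INTO t_big SELECT generate_series(1000001, 2000000);
DELETE FROM t_big WHERE a > 1000000;

-- Planning this join calls get_actual_variable_range() to find the
-- live endpoint of t_big.a, and has to step over all million
-- not-yet-removable entries at the top of t_big_pkey every time the
-- query is planned:
EXPLAIN ANALYZE SELECT * FROM t_big JOIN t_small USING (a);

The "Planning time" line that EXPLAIN ANALYZE prints is enough to see
the difference; on my machine it was the ~0.25ms vs. ~90ms cited
above.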

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
