Re: Erroneous cost estimation for nested loop join

From: KAWAMICHI Ryoji <kawamichi(at)tkl(dot)iis(dot)u-tokyo(dot)ac(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Erroneous cost estimation for nested loop join
Date: 2015-11-12 08:17:39
Message-ID: 503167833.13041710.1447316259819.JavaMail.zimbra@tkl.iis.u-tokyo.ac.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> TBH, you lost me there already. I know of no hardware on which that would
> be a sane depiction of reality, so I think you've probably overfitted the
> model to some particular case it was already inaccurate on. Any results
> you're getting using this setting will likely fall into the category of
> "garbage in, garbage out".
>
> What led you to choose that number?

Uh, it seems that the scale of our machine caused confusion.
It has a bit large disk array, and I think random_page_cost = 92.89 is reasonable
(the reason is described later).

This problem could be observed in normal environment like random_page_cost = 4.
I checked estimated cost of query (1) and query (2) with random_page_cost = 4,
and there were noticeable cost estimation difference between them (see an attached graph).
As I explained in the first mail, execution time of query (1) and query (2) are
expected to be almost the same. If so, there is something wrong with cost estimation
logic, and this problem is not specific to particular environment. The result I
showed was just an example of confirmation of the problem.

Next, I'd like to explain our environment and the reason of random_page_cost value.
Our machine has a RAID6 array with 24 NL-SAS HDDs and its raw I/O performance is:
Sequential access: 2.5GB/s
Random access (8KB block): 1.6MB/s (=200 IOPS * 8KB )

For Postgres, we have filesystems and some calculations like tuple processing or
aggregations, so sequential access becomes slower. Here I show basic performance
values of I/O intensive queries like simple scan on a large table:
Sequential access: 200-300MB/s(*)
Random access: 200 IOPS * 8KB = 1.6MB/s
(*) depends on aggregation or any other calculations

So sequential access is 125-188 times faster than random access. We assume OLAP
workload, so cache hit ratio is not so high. That means performance gap between
sequential I/O and random I/O would not be mitigated by cache effect.
Therefore we think random_page_cost should be around 100 in our environment.

random_page_cost = 92.89 is the result of calibration with multiple TPC-H queries,
which covers from simple scan on single table to complex joins of multiple tables,
and this value gives good cost estimation for TPC-H queries.

Thanks.
Ryoji

Attachment Content-Type Size
Figure2-1.png image/png 278.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message KAWAMICHI Ryoji 2015-11-12 08:19:17 Re: Erroneous cost estimation for nested loop join
Previous Message Craig Ringer 2015-11-12 07:56:38 Re: pglogical_output - a general purpose logical decoding output plugin