From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: bitmaps and correlation |
Date: | 2020-01-07 05:26:06 |
Message-ID: | 20200107052606.GP12066@telsasoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Jan 07, 2020 at 09:21:03AM +0530, Dilip Kumar wrote:
> On Tue, Jan 7, 2020 at 1:29 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> >
> > Find attached cleaned up patch.
> > For now, I updated the regress/expected/, but I think the test maybe has to be
> > updated to do what it was written to do.
>
> I have noticed that in "cost_index" we have used the indexCorrelation
> for computing the run_cost, not the number of pages whereas in your
> patch you have used it for computing the number of pages. Any reason
> for the same?
As Jeff has pointed out, high correlation has two effects in cost_index():
1) the number of pages read will be less;
2) the pages will be read more sequentially;
cost_index reuses the pages_fetched variable, so (1) isn't particularly clear,
but does essentially:
/* max_IO_cost is for the perfectly uncorrelated case (csquared=0) */
pages_fetched(MIN) = index_pages_fetched(tuples_fetched,
baserel->pages,
(double) index->pages,
root);
max_IO_cost = pages_fetchedMIN * spc_random_page_cost;
/* min_IO_cost is for the perfectly correlated case (csquared=1) */
pages_fetched(MAX) = ceil(indexSelectivity * (double) baserel->pages);
min_IO_cost = (pages_fetchedMAX - 1) * spc_seq_page_cost;
My patch 1) changes compute_bitmap_pages() to interpolate pages_fetched using
the correlation; pages_fetchedMIN is new:
> Patch
> - pages_fetched = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
> + pages_fetchedMAX = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
> +
> + /* pages_fetchedMIN is for the perfectly correlated case (csquared=1) */
> + pages_fetchedMIN = ceil(indexSelectivity * (double) baserel->pages);
> +
> + pages_fetched = pages_fetchedMAX + indexCorrelation*indexCorrelation*(pages_fetchedMIN - pages_fetchedMAX);
And, 2) also computes cost_per_page by interpolation between seq_page and
random_page cost:
+ cost_per_page_corr = spc_random_page_cost -
+ (spc_random_page_cost - spc_seq_page_cost)
+ * (1-correlation*correlation);
Thanks for looking. I'll update the name of pages_fetchedMIN/MAX in my patch
for consistency with cost_index.
Justin
From | Date | Subject | |
---|---|---|---|
Next Message | Nagaraj Raj | 2020-01-07 06:15:09 | pg_repack failure |
Previous Message | Amit Langote | 2020-01-07 05:01:49 | Re: adding partitioned tables to publications |