Re: Keeping temporary tables in shared buffers

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Asim Praveen <apraveen(at)pivotal(dot)io>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, David Kimura <dkimura(at)pivotal(dot)io>
Subject: Re: Keeping temporary tables in shared buffers
Date: 2018-05-28 08:25:19
Message-ID: CAA4eK1JF-C+OD6cziQvNztXpaNVEcRECywU+vTbYeQGyPUZxsA@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 25, 2018 at 6:33 AM, Asim Praveen <apraveen(at)pivotal(dot)io> wrote:
> Hello
>
> We are evaluating the use of shared buffers for temporary tables. The
> advantage being queries involving temporary tables can make use of parallel
> workers.
>

This is one way, but I think there are other choices as well. We can
identify and flush all the dirty (local) buffers of the relation
being accessed in parallel. Then, once the parallel operation has
started, we won't allow any write operations on them. This could be
expensive if a particular relation has a lot of dirty local buffers.
If we are worried about the cost of those writes, we can try a
different way to parallelize temporary table scans. At the beginning
of the scan, the leader backend remembers which blocks are dirty in
its local buffers and shares that list with the parallel workers. The
workers skip those blocks, and at the end the leader scans all of the
skipped blocks itself. This shouldn't incur much additional cost, as
the skipped blocks should already be present in the leader's local
buffers.

I understand that none of these alternatives are straightforward, but
I think it is worth considering whether we have any better way to
allow parallel temporary table scans.

> Challenges:
> 1. We lose the performance benefit of local buffers.

Yeah, I think cases where we need to drop temp relations will become
costlier, as they will have to traverse all the shared buffers instead
of just the local buffers.
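The following is a simplified model of why the drop gets costlier. Invalidating a relation's buffers is modeled as a linear pass over every buffer header in the pool, so the work tracks the pool size rather than the relation size. The 8kB block size and the 128MB shared_buffers and 8MB temp_buffers figures are PostgreSQL's defaults. The list-based pool and the `temp_rel` name are hypothetical.

```python
BLCKSZ = 8192                       # default PostgreSQL page size, bytes
SHARED_BUFFERS = 128 * 1024 * 1024  # default shared_buffers (128MB)
TEMP_BUFFERS = 8 * 1024 * 1024      # default temp_buffers (8MB)


def make_pool(pool_bytes):
    """A buffer pool modeled as a list of slots: (rel, blkno) or None."""
    return [None] * (pool_bytes // BLCKSZ)


def drop_relation_buffers(pool, rel):
    """Invalidate all buffers of `rel`; return how many slots were inspected.

    Simplified model: every slot is examined, however few pages `rel`
    has cached, so the cost is proportional to the size of the pool.
    """
    inspected = 0
    for i, tag in enumerate(pool):
        inspected += 1
        if tag is not None and tag[0] == rel:
            pool[i] = None
    return inspected


shared = make_pool(SHARED_BUFFERS)
local = make_pool(TEMP_BUFFERS)
for blkno in range(3):  # a tiny temp relation with 3 cached pages
    shared[blkno] = ("temp_rel", blkno)
    local[blkno] = ("temp_rel", blkno)

print(drop_relation_buffers(local, "temp_rel"))   # 1024
print(drop_relation_buffers(shared, "temp_rel"))  # 16384
```

Even for a three-page temp table, the model inspects 16x as many slots when the pages live in shared buffers at the default settings.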

I think if we use shared buffers for temp relations, there will be
some overhead for other backends as well, especially when backends
need to evict buffers. If the relation is in local buffers, it is
quite possible that we never write it at all, whereas moving it to
shared buffers increases the probability that it will be written to
disk.
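Here is a toy illustration of the eviction point. It uses a plain LRU pool as a stand-in for the clock-sweep replacement PostgreSQL actually uses; the class, page names and pool size are assumptions. A session dirties four temp pages. With no other traffic they are never written, but once another backend's reads flow through the same pool they get evicted and written out:

```python
from collections import OrderedDict


class LRUPool:
    """A tiny buffer pool with plain LRU replacement (not clock sweep)."""

    def __init__(self, nslots):
        self.nslots = nslots
        self.pages = OrderedDict()  # (rel, blkno) -> dirty flag
        self.writes = []            # pages written out when evicted dirty

    def access(self, page, dirty=False):
        if page in self.pages:
            self.pages.move_to_end(page)  # mark as most recently used
            self.pages[page] = self.pages[page] or dirty
            return
        if len(self.pages) >= self.nslots:
            victim, victim_dirty = self.pages.popitem(last=False)
            if victim_dirty:
                self.writes.append(victim)  # dirty victim must hit disk
        self.pages[page] = dirty


def temp_page_writes(pool, other_backend_reads):
    """Dirty 4 temp pages, replay other traffic, count temp-page writes."""
    for blkno in range(4):
        pool.access(("temp_rel", blkno), dirty=True)
    for blkno in range(other_backend_reads):
        pool.access(("other_rel", blkno))
    return sum(1 for rel, _ in pool.writes if rel == "temp_rel")


# Local buffers: only this session's temp pages use the pool.
print(temp_page_writes(LRUPool(8), other_backend_reads=0))  # 0
# Shared buffers: another backend's 8 page reads evict the temp pages.
print(temp_page_writes(LRUPool(8), other_backend_reads=8))  # 4
```

The model only captures isolation from other backends. In reality, a session whose temp data exceeds temp_buffers will also evict and write its own local pages.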

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
