Re: optimizing vacuum truncation scans

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: optimizing vacuum truncation scans
Date: 2015-09-28 06:05:47
Message-ID: CAJrrPGc3o+HBWQBEhFn33b0bxDWWzss=AuwNhRk4bn1=tJJNrw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 4, 2015 at 2:18 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Mon, Jul 27, 2015 at 1:40 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>
>> On 22 July 2015 at 17:11, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>>
>>> On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>>
>>>> On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>>> > Attached is a patch that implements the vm scan for truncation. It
>>>> > introduces a variable to hold the last blkno which was skipped during
>>>> > the forward portion. Any blocks after both this blkno and after the
>>>> > last inspected nonempty page (which the code is already tracking) must
>>>> > have been observed to be empty by the current vacuum. Any other process
>>>> > rendering the page nonempty is required to clear the vm bit, and no
>>>> > other process can set the bit again during the vacuum's lifetime. So if
>>>> > the bit is still set, the page is still empty without needing to
>>>> > inspect it.
>>>>
>>>> Urgh. So if we do this, that forever precludes having HOT pruning set
>>>> the all-visible bit.
>>>
>>>
>>> I wouldn't say forever, as it would be easy to revert the change if
>>> something more important came along that conflicted with it.
>>
>>
>> I think what is being said here is that someone is already using this
>> technique, or if not, then we actively want to encourage them to do so as an
>> extension or as a submission to core.
>>
>> In that case, I think the rely-on-VM technique sinks again, sorry Jim,
>> Jeff. Probably needs code comments added.
>
>
> Sure, that sounds like the consensus. The VM method was very efficient, but
> I agree it is pretty fragile and restricting.
>
>>
>>
>> That does still leave the prefetch technique, so all is not lost.
>>
>> Can we see a patch with just prefetch, probably with a simple choice of
>> stride? Thanks.
>
>
> I probably won't get back to it this commit fest, so it can be set to
> returned with feedback. But if anyone has good ideas for how to set the
> stride (or detect that it is on SSD and so is pointless to try) I'd love to
> hear about them anytime.

I found the following way to determine whether a data file is stored on a
rotating disk or on an SSD.

1. Find the device that holds the table's data file, using the following
command or similar:

df -P "filename" | awk 'NR==2{print $1}'

2. If the device is a /dev/sd* device, treat it as a disk and proceed
with the prefetch optimization.

3. If we cannot find the device details directly, we need to get the
information from the device-mapper layer.

Usually the devices are mapped as follows:

/dev/mapper/v** points to ../dm-*

4. Take the "dm-*" name from the above and check whether the device is an
SSD with the following command:

cat /sys/block/dm-*/queue/rotational

5. If the value is 0, the device is an SSD; 1 means a rotating disk.
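The steps above could be scripted roughly as follows. This is a minimal sh sketch for Linux, not a tested tool: the function names `rotational_flag` and `file_on_rotating_disk` and the `SYSFS_ROOT` override (defaulting to /sys, only there so the logic can be tried against a fake tree) are hypothetical. It uses `readlink -f` from GNU coreutils or BusyBox, assumes /dev/mapper entries are symlinks to ../dm-N as on udev-based systems, and reads queue/rotational for /dev/sd* devices too, since SSDs attached over SATA also show up as /dev/sd*. Partitions such as sda1 have no queue/ directory of their own, so the flag is read from the parent disk.

```shell
#!/bin/sh
# Sketch (Linux only): is a file stored on a rotating disk?
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.

# Print the queue/rotational flag (1 = rotating, 0 = SSD) for a kernel
# block device name such as sda, sda1, nvme0n1p2 or dm-0.
rotational_flag() {
    name=$1
    root=${SYSFS_ROOT:-/sys}
    # Whole disks and device-mapper nodes appear directly under block/.
    if [ -r "$root/block/$name/queue/rotational" ]; then
        cat "$root/block/$name/queue/rotational"
        return 0
    fi
    # Partitions have no queue/ directory of their own; their sysfs
    # directory is nested inside the parent disk's directory.
    part=$(readlink -f "$root/class/block/$name") || return 1
    parent=${part%/*}
    if [ -r "$parent/queue/rotational" ]; then
        cat "$parent/queue/rotational"
        return 0
    fi
    return 1
}

# Exit status: 0 = rotating disk (prefetch is worth trying),
# 1 = SSD, 2 = unknown (tmpfs, overlay, NFS, ...).
file_on_rotating_disk() {
    # Step 1: the device that holds the file.
    dev=$(df -P "$1" 2>/dev/null | awk 'NR==2 {print $1}')
    case $dev in
        /dev/*) ;;
        *) return 2 ;;
    esac
    # Step 3: /dev/mapper/<name> is usually a symlink to ../dm-N.
    dev=$(readlink -f "$dev") || return 2
    # Steps 4 and 5: read the rotational flag of the resolved device.
    flag=$(rotational_flag "${dev##*/}") || return 2
    [ "$flag" = 1 ]
}
```

For example, `file_on_rotating_disk /path/to/relation/file; echo $?` would report 0, 1 or 2. Keeping "unknown" separate from "SSD" leaves the caller free to choose whether prefetching is the default when the device cannot be identified.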

The procedure described above works only on Linux; I haven't checked
other operating systems yet.
Is it worth considering?

Regards,
Hari Babu
Fujitsu Australia
