Re: Relation extension scalability

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Relation extension scalability
Date: 2016-03-26 09:48:44
Message-ID: CAFiTN-tkX6gs-jL8VrPxg6OG9VUAKnObUq7r7pWQqASzdF5OwA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 26, 2016 at 8:07 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I think we need to start testing these patches not only in terms of
> how *fast* they are but how *large* the relation ends up being when
> we're done. A patch that inserts the rows slower but the final
> relation is smaller may be better overall. Can you retest v13, v14,
> and master, and post not only the timings but the relation size
> afterwards? And maybe post the exact script you are using?
>

I have tested the size and performance, scripts are attached in the mail.

COPY 1-10 bytes tuple from 32 Clients
Base V13 V14
-------- ---------
---------
TPS 123 874 446
No. Of Tuples 148270000 1049980000 536370000
Relpages 656089 4652593 2485482
INSERT 1028 bytes Tuples From 16 Clients
Base V13 V14
-------- --------
---------
TPS 42 211 120
No. Of Tuples 5149000 25343000 14524000
Rel Pages 735577 3620765 2140612

As per above results If we calculate the tuple number of tuples and
respective relpages, then neither in v13 nor v14 there are extra unused
pages.

As per my calculation for INSERT (1028 byte tuple) each page contain 7
tuples so
number of pages required
Base: 5149000/7 = 735571 (from relpages we can see 6 pages are extra)
V13 : 25343000/7= 3620428 (from relpages we can see ~300 pages are extra).
V14 : 14524000/7= 2074857 (from relpages we can see ~70000 pages are
extra).

With V14 we have found max pages number of extra pages, I expected V13 to
have max unused pages, but it's v14 and I tested it in multiple runs and
v13 is always the winner. I tested with multiple client count also like 8,
32 and v13 always have only ~60-300 extra pages out of total ~2-4 Million
Pages.

Attached files:
-------------------
test_size_ins.sh --> automated script to run insert test and calculate
tuple and relpages.
test_size_copy --> automated script to run copy test and calculate tuple
and relpages.
copy_script -> copy pg_bench script used by test_size_copy.sh
insert_script --> insert pg_bench script used by test_size_ins.sh

> Maybe something like this would help:
>
> if (needLock)
> {
> if (!use_fsm)
> LockRelationForExtension(relation, ExclusiveLock);
> else if (!ConditionLockRelationForExtension(relation,
> ExclusiveLock))
> {
> BlockNumber last_blkno =
> RelationGetNumberOfBlocks(relation);
>
> targetBlock = GetPageWithFreeSpaceExtended(relation,
> last_blkno, len + saveFreeSpace);
> if (targetBlock != InvalidBlockNumber)
> goto loop;
>
> LockRelationForExtension(relation, ExclusiveLock);
> targetBlock = GetPageWithFreeSpace(relation, len +
> saveFreeSpace);
> if (targetBlock != InvalidBlockNumber)
> {
> UnlockRelationForExtension(relation, ExclusiveLock);
> goto loop;
> }
> RelationAddExtraBlocks(relation, bistate);
> }
> }
>
> I think this is better than what you had before with lastValidBlock,
> because we're actually interested in searching the free space map at
> the *very end* of the relation, not wherever the last target block
> happens to have been.
>
> We could go further still and have GetPageWithFreeSpace() always
> search the last, say, two pages of the FSM in all cases. But that
> might be expensive. The extra call to RelationGetNumberOfBlocks seems
> cheap enough here because the alternative is to wait for a contended
> heavyweight lock.
>

I will try the test with this also and post the results.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
copy_script application/octet-stream 50 bytes
insert_script application/octet-stream 41 bytes
test_size_copy.sh application/x-sh 1.1 KB
test_size_ins.sh application/x-sh 1.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Piotr Stefaniak 2016-03-26 11:12:44 Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c
Previous Message Tatsuo Ishii 2016-03-26 09:18:33 Re: multivariate statistics v14