From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: pabloa98 <pabloa98(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Lewis <mlewis(at)entrata(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Rob Sargent <robjsargent(at)gmail(dot)com>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Could postgres12 support millions of sequences? (like 10 million)
Date: 2020-03-20 17:26:15
Message-ID: 032b7a7f-e08a-fb7d-abf9-4a227f815f7a@aklaver.com
Lists: pgsql-general
On 3/20/20 9:59 AM, Adrian Klaver wrote:
> On 3/19/20 10:31 PM, pabloa98 wrote:
>> I see.
>>
>> Any suggestion? It should behave like a sequence in the sense that
>> concurrent transactions will get different numbers from this
>> alternative, sequence-like solution.
>>
>> In our case, we will need to call nextval('some_seq') (or
>> similar) from different processes no more than twice every minute.
>>
>>
>> It would be nice to have a sequence data type. Correct me if I am
>> wrong, but it seems to me that a sequence data type would cost the
>> same or less than the current sequence implementation.
>>
>> The schema would be more clear too. We could have a table like:
>>
>> CREATE TABLE pair(
>> group INT NOT NULL,
>> element INT NOT NULL,
>> seq SEQUENCE INCREMENT 1
>> START 1
>> CACHE 1
>> MINVALUE 1
>> MAXVALUE 99999999
>> NOT NULL,
>> CONSTRAINT PRIMARY KEY (group, element)
>> );
>>
>> And then:
>>
>> INSERT INTO event(group, element, code)
>> VALUES (
>> 1,
>> 1,
>> ( SELECT seq.nextval('code_1_1_seq') FROM pair p WHERE
>> p.group=1 and p.element=1 )
>> );
>>
>> Or perhaps storing all the sequences in the same table as rows will
>> have the same behavior.
>
> If code is just meant to record the order of insertion for (group,
> element) combinations, then maybe something like below:
>
> CREATE TABLE event(
> group_id INT NOT NULL, --changed as group is reserved word
> element INT NOT NULL,
> insert_ts timestamptz NOT NULL DEFAULT clock_timestamp(),
> PRIMARY KEY(group_id, element, insert_ts)
> );
>
>
> insert into event(group_id, element) VALUES
> (1, 1),
> (1, 1),
> (1, 1),
> (2, 1),
> (1, 1),
> (1, 3),
> (1, 1),
> (1, 3),
> (2, 1),
> (2, 1);
>
>
> select * from event ;
> group_id | element | insert_ts
> ----------+---------+--------------------------------
> 1 | 1 | 03/20/2020 09:51:12.675926 PDT
> 1 | 1 | 03/20/2020 09:51:12.675985 PDT
> 1 | 1 | 03/20/2020 09:51:12.675991 PDT
> 2 | 1 | 03/20/2020 09:51:12.675996 PDT
> 1 | 1 | 03/20/2020 09:51:12.676 PDT
> 1 | 3 | 03/20/2020 09:51:12.676004 PDT
> 1 | 1 | 03/20/2020 09:51:12.676008 PDT
> 1 | 3 | 03/20/2020 09:51:12.676012 PDT
> 2 | 1 | 03/20/2020 09:51:12.676016 PDT
> 2 | 1 | 03/20/2020 09:51:12.67602 PDT
> (10 rows)
>
>
> select group_id, element, row_number() OVER (partition by (group_id,
> element) order by (group_id, element)) AS code from event;
> group_id | element | code
> ----------+---------+------
> 1 | 1 | 1
> 1 | 1 | 2
> 1 | 1 | 3
> 1 | 1 | 4
> 1 | 1 | 5
> 1 | 3 | 1
> 1 | 3 | 2
> 2 | 1 | 1
> 2 | 1 | 2
> 2 | 1 | 3
> (10 rows)
Oops, the above actually returned:
select group_id, element, row_number() OVER (partition by (group_id,
element) order by (group_id, element)) AS code, insert_ts from event;
group_id | element | code | insert_ts
----------+---------+------+--------------------------------
1 | 1 | 1 | 03/20/2020 09:51:12.675926 PDT
1 | 1 | 2 | 03/20/2020 09:51:12.675985 PDT
1 | 1 | 3 | 03/20/2020 09:51:12.675991 PDT
1 | 1 | 4 | 03/20/2020 09:51:12.676008 PDT
1 | 1 | 5 | 03/20/2020 09:51:12.676 PDT
1 | 3 | 1 | 03/20/2020 09:51:12.676004 PDT
1 | 3 | 2 | 03/20/2020 09:51:12.676012 PDT
2 | 1 | 1 | 03/20/2020 09:51:12.67602 PDT
2 | 1 | 2 | 03/20/2020 09:51:12.676016 PDT
2 | 1 | 3 | 03/20/2020 09:51:12.675996 PDT
(10 rows)
Needs to be as below. The ORDER BY must include insert_ts; ordering
only by the partition keys leaves the order within each partition
undefined, which is why the codes above don't follow insert order:
select group_id, element, row_number() OVER (partition by (group_id,
element) order by (group_id, element, insert_ts)) AS code, insert_ts
from event;
group_id | element | code | insert_ts
----------+---------+------+--------------------------------
1 | 1 | 1 | 03/20/2020 09:51:12.675926 PDT
1 | 1 | 2 | 03/20/2020 09:51:12.675985 PDT
1 | 1 | 3 | 03/20/2020 09:51:12.675991 PDT
1 | 1 | 4 | 03/20/2020 09:51:12.676 PDT
1 | 1 | 5 | 03/20/2020 09:51:12.676008 PDT
1 | 3 | 1 | 03/20/2020 09:51:12.676004 PDT
1 | 3 | 2 | 03/20/2020 09:51:12.676012 PDT
2 | 1 | 1 | 03/20/2020 09:51:12.675996 PDT
2 | 1 | 2 | 03/20/2020 09:51:12.676016 PDT
2 | 1 | 3 | 03/20/2020 09:51:12.67602 PDT
(10 rows)
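
The row_number() approach computes the codes at read time. If a code
has to be handed out at insert time instead (closer to Pablo's idea of
storing all the sequences as rows in one table), a single counter table
is a common alternative. A minimal sketch, with hypothetical table and
column names not from this thread:

CREATE TABLE pair_counter (
    group_id INT NOT NULL,
    element  INT NOT NULL,
    seq      BIGINT NOT NULL,
    PRIMARY KEY (group_id, element)
);

-- Claim the next value for pair (1, 1). The upsert row-locks that
-- pair's row, so concurrent claimants on the same pair serialize; and
-- because a rolled-back transaction also rolls back the increment, the
-- numbering stays gap-free (unlike a real sequence).
INSERT INTO pair_counter AS pc (group_id, element, seq)
VALUES (1, 1, 1)
ON CONFLICT (group_id, element)
DO UPDATE SET seq = pc.seq + 1
RETURNING pc.seq;

At the no-more-than-twice-a-minute rate mentioned upthread, the
per-pair serialization should be harmless.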
>
>>
>> Pablo
>>
>>
>> On Thu, Mar 19, 2020 at 7:56 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>> wrote:
>>
>> Michael Lewis <mlewis(at)entrata(dot)com> writes:
>> > On Thu, Mar 19, 2020, 5:48 PM David G. Johnston
>> > <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>> >> However, one other consideration with sequences: do you care that
>> >> PostgreSQL will cache/pin (i.e., no release) every single
>> sequence you
>> >> touch for the lifetime of the session? (I do not think DISCARD
>> matters here
>> >> but I'm just guessing)
>>
>> > Would you expand on this point or is there someplace specific
>> in the
>> > documentation on this?
>>
>> I think what David is worried about is that a sequence object is a
>> one-row table in PG's implementation. Thus
>>
>> (1) each sequence requires a dozen or two rows in assorted system
>> catalogs (not sure exactly how many offhand).
>>
>> (2) each sequence uses up 8KB on disk for its table file.
>>
>> (3) each sequence you actually access within a particular session
>> results in creation of relcache and catcache entries in that
>> session's local memory. I'm not sure offhand how big those are
>> either, but a few KB per sequence would be a reasonable guess.
>>
>> (4) each sequence competes for space in the shared-buffer arena,
>> since its 8K block has to be swapped into there whenever you try
>> to access/increment the sequence's value.
>>
>> This is all mighty inefficient of course, and there's been talk
>> of trying to reduce the per-sequence overhead; but I don't know
>> of anyone actively working on that. As things stand, I think
>> having millions of sequences would be quite painful performance-
>> wise, especially if your workload were actively hitting a lot
>> of them concurrently. It would work, for some value of "work",
>> but it wouldn't perform very well.
>>
>> Also, as multiple people mentioned already, this still wouldn't
>> guarantee gap-free sequences of ID values.
>>
>> regards, tom lane
>>
>
>
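
To put rough numbers on Tom's points (1), (2) and (4) for a given
installation, a quick probe is easy. A sketch, not from the thread;
exact catalog row counts vary by version:

CREATE SEQUENCE probe_seq;

SELECT pg_relation_size('probe_seq');     -- 8192: one 8KB block per sequence

SELECT count(*) FROM pg_attribute
WHERE attrelid = 'probe_seq'::regclass;   -- per-column catalog rows

SELECT count(*) FROM pg_depend
WHERE refclassid = 'pg_class'::regclass
  AND refobjid = 'probe_seq'::regclass;   -- dependency entries pointing at it

Point (2) alone, multiplied by ten million sequences, comes to roughly
80 GB of 8KB table files before a single row of real data is stored.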
--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com