Re: Stack overflow issue

From: Егор Чиндяскин <kyzevan23(at)mail(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Stack overflow issue
Date: 2023-01-03 15:40:57
Message-ID: 1672760457.940462079@f306.i.mail.ru
Lists: pgsql-hackers


>Wednesday, 26 October 2022, 21:47 +07:00 from Egor Chindyaskin <kyzevan23(at)mail(dot)ru>:

>24.08.2022 20:58, Tom Lane writes:
>> Nice work! I wonder if you can make the regex crashes reachable by
>> reducing the value of max_stack_depth enough that it's hit before
>> reaching the "regular expression is too complex" limit.
>>
>> regards, tom lane
>
>Hello everyone! It's been a while since Alexander Lakhin and I
>published a list of functions that are prone to stack overflow. We are
>back to tell you more about such places.
>During our analysis we concluded that some functions can be
>crashed without changing any parameters, while others crash
>only after some settings are changed.
>
>The first function crashes without any changes:
>
># CheckAttributeType
>
>(n=60000; printf "create domain dint as int; create domain dinta0 as
>dint[];"; for ((i=1;i<=$n;i++)); do printf "create domain dinta$i as
>dinta$(( $i - 1 ))[]; "; done; ) | psql
>psql -c "create table t(f1 dinta60000[]);"
>
>Some of the others crash if we raise the "max_locks_per_transaction"
>parameter:
>
># findDependentObjects
>
>max_locks_per_transaction = 200
>
>(n=10000; printf "create table t (i int); create view v0 as select *
>from t;"; for ((i=1;i<$n;i++)); do printf "create view v$i as select *
>from v$(( $i - 1 )); "; done; ) | psql
>psql -c "drop table t"
>
># ATExecDropColumn
>
>max_locks_per_transaction = 300
>
>(n=50000; printf "create table t0 (a int, b int); "; for
>((i=1;i<=$n;i++)); do printf "create table t$i() inherits(t$(( $i - 1
>))); "; done; printf "alter table t0 drop b;" ) | psql
>
># ATExecDropConstraint
>
>max_locks_per_transaction = 300
>
>(n=50000; printf "create table t0 (a int, b int, constraint bc check (b
> > 0));"; for ((i=1;i<=$n;i++)); do printf "create table t$i()
>inherits(t$(( $i - 1 ))); "; done; printf "alter table t0 drop
>constraint bc;" ) | psql
>
># ATExecAddColumn
>
>max_locks_per_transaction = 200
>
>(n=50000; printf "create table t0 (a int, b int);"; for
>((i=1;i<=$n;i++)); do printf "create table t$i() inherits(t$(( $i - 1
>))); "; done; printf "alter table t0 add column c int;" ) | psql
>
># ATExecAlterConstrRecurse
>
>max_locks_per_transaction = 300
>
>(n=50000;
>printf "create table t(a int primary key); create table pt (a int
>primary key, foreign key(a) references t) partition by range (a);";
>printf "create table pt0 partition of pt for values from (0) to (100000)
>partition by range (a);";
>for ((i=1;i<=$n;i++)); do printf "create table pt$i partition of pt$((
>$i - 1 )) for values from ($i) to (100000) partition by range (a); "; done;
>printf "alter table pt alter constraint pt_a_fkey deferrable initially
>deferred;"
>) | psql
>
>This is where the fun begins. Tom Lane suggested that decreasing
>max_stack_depth could expose new crashes, but it turned out that
>Alexander found new crashes precisely by increasing this parameter.
>In these tests ulimit -s was left at its default value of 8MB.
>
># eval_const_expressions_mutator
>
>max_stack_depth = '7000kB'
>
>(n=10000; printf "select 'a' "; for ((i=1;i<$n;i++)); do printf "
>collate \"C\" "; done; ) | psql
>
>If this did not crash for you (it did not for me when Alexander shared
>his finding), your server was probably built with the -Og optimization
>flag. While trying to break this function, we concluded that the
>reachable recursion depth depends on the optimization flag
>(-O0/-Og). With optimization, each function's stack frame is smaller,
>so the limit is reached more slowly and the server withstands deeper
>recursion. This query will therefore crash on a server built with the
>-O0 flag.
>
>Whether the next function crashes depends not only on the optimization
>flag but on a number of other things as well. While researching, we
>noticed that postgres enforces a gap of ~400kB between max_stack_depth
>and ulimit -s. We thought we could hit the max_stack_depth limit and
>then hit the OS limit as well. So Alexander wrote a recursive SQL
>function that eats up the stack allowed by max_stack_depth and
>includes a query that eats up the remaining ~400kB. And this causes a
>crash.
>
># executeBoolItem
>
>max_stack_depth = '7600kB'
>
>create function infinite_recurse(i int) returns int as $$
>begin
>   raise notice 'Level %', i;
>   begin
>     perform jsonb_path_query('{"a":[1]}'::jsonb, ('$.a[*] ? (' ||
>repeat('!(', 4800) || '@ == @' || repeat(')', 4800) || ')')::jsonpath);
>   exception
>     when others then raise notice 'jsonb_path_query error at level %,
>%', i, sqlerrm;
>   end;
>   begin
>     select infinite_recurse(i + 1) into i;
>   exception
>     when others then raise notice 'Max stack depth reached at level %,
>%', i, sqlerrm;
>   end;
>   return i;
>end;
>$$ language plpgsql;
>
>select infinite_recurse(1);
>
>To sum it all up, we have not yet decided on a general approach to such
>functions. Some functions are definitely subject to stack overflow. Some
>are definitely not: this can be seen from the code, where either a recurse
>flag is passed or a stack-checking function is called before the
>recursive call. Some require special conditions: for example, the query
>must first be parsed and planned, and at that stage the stack is
>consumed (and checked) faster than in the function we are interested in.
>
>We keep researching and hope to come up with a good solution sooner or
>later.
Hello! Continuing the stack overflow topic: Alexander Lakhin has found a few more similar places.

The first function crashes only if the server is built with asserts enabled.
Even then, the server crashes only after 2-3 hours.
 
#MemoryContextCheck
(n=1000000; printf "begin;"; for ((i=1;i<=$n;i++)); do printf "savepoint s$i;"; done; printf "release s1;" ) | psql >/dev/null
 
The other functions can be crashed without asserts enabled.
 
#CommitTransactionCommand
(n=1000000; printf "BEGIN;"; for ((i=1;i<=$n;i++)); do printf "SAVEPOINT s$i;"; done; printf "ERROR; COMMIT;") | psql >/dev/null
 
#MemoryContextStatsInternal
(n=1000000; printf "BEGIN;"; for ((i=1;i<=$n;i++)); do printf "SAVEPOINT s$i;"; done; printf "SELECT pg_log_backend_memory_contexts(pg_backend_pid())") | psql >/dev/null
 
#ShowTransactionStateRec
(n=1000000; printf "BEGIN;"; for ((i=1;i<=$n;i++)); do printf "SAVEPOINT s$i;"; done; printf "SET log_min_messages = 'DEBUG5'; SAVEPOINT sp;") | psql >/dev/null
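
The four savepoint reproductions above share one shape: open a transaction, nest n savepoints, then run a single trailing statement. As a minimal sketch of that shape, the helper below generates the SQL for a small n so it can be inspected before anything is piped to psql. The name gen_savepoints and its arguments are hypothetical, not part of the original scripts, and it is written in POSIX sh rather than with the bash-only for ((...)) loop.

```shell
#!/bin/sh
# Hypothetical helper (not from the original message): emit BEGIN, then
# n nested SAVEPOINT statements, then a caller-supplied trailing statement.
gen_savepoints() {
  gs_n=$1      # number of nested savepoints
  gs_tail=$2   # statement(s) appended after the last savepoint
  gs_i=1
  printf 'BEGIN;'
  while [ "$gs_i" -le "$gs_n" ]; do
    printf 'SAVEPOINT s%d;' "$gs_i"
    gs_i=$((gs_i + 1))
  done
  printf '%s' "$gs_tail"
}

# Inspect a tiny instance; nothing is sent to a server here.
gen_savepoints 3 'RELEASE s1;'
echo
```

With n raised to 1000000 and the output piped to psql >/dev/null, this should produce the same statement stream as the MemoryContextCheck one-liner; swapping the trailing statement gives the other three cases.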
 
The next two functions call each other; the stack can be overflowed as follows (with a modified server configuration):
 
#MemoryContextDeleteChildren with MemoryContextDelete
 
max_connections = 1000
max_stack_depth = '7600kB'
 
create table idxpart (a int) partition by range (a);
 
select 'create index on idxpart (a)' from generate_series(1, 40000);
\gexec
 
create function infinite_recurse(level int) returns int as $$
declare l int;
begin
   begin
     select infinite_recurse(level + 1) into level;
   exception
     when others then raise notice 'Max stack depth reached at level %, %', level, sqlerrm;
 
     create table idxpart1 partition of idxpart for values from (1) to (2) partition by range (a);  
 
   end;
   return level;
end;
$$ language plpgsql;
 
select infinite_recurse(1);
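
This reproduction, like the executeBoolItem one in the quoted message, depends on how little stack is left between max_stack_depth and the OS limit. The arithmetic can be sketched as below, assuming ulimit -s is 8MB (8192kB) as stated in the quoted message. The helper name stack_headroom_kb is hypothetical, and it takes both values as plain kB numbers rather than querying a server.

```shell
#!/bin/sh
# Hypothetical helper: headroom in kB between the OS stack limit and
# max_stack_depth.
#   $1: OS stack limit in kB (the number `ulimit -s` prints on Linux)
#   $2: max_stack_depth in kB, digits only (7600 for '7600kB')
stack_headroom_kb() {
  echo $(( $1 - $2 ))
}

# Assumed 8MB ulimit against max_stack_depth = '7600kB'
stack_headroom_kb 8192 7600
# Same ulimit against max_stack_depth = '7000kB'
stack_headroom_kb 8192 7000
```

On a live system the first argument could be taken from ulimit -s in the shell that starts the server, keeping in mind that it may print "unlimited" instead of a number.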
 
Finally, there are two more recursive functions in mcxt.c:

#MemoryContextResetChildren - could be vulnerable, but is not used at all after eaa5808e.
 
#MemoryContextMemAllocated - at present called only with local contexts.
