| From: | Achilleas Mantzios <a(dot)mantzios(at)cloud(dot)gatewaynet(dot)com> |
|---|---|
| To: | Mladen Marinović <marin(at)kset(dot)org>, "Efrain J(dot) Berdecia" <ejberdecia(at)yahoo(dot)com> |
| Cc: | SERHAD ERDEM <serhade(at)hotmail(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Different execution plans in PG17 and pgBouncer... |
| Date: | 2025-05-05 14:09:01 |
| Message-ID: | 5cf4d905-8f2f-4b02-a77a-b633c087c54a@cloud.gatewaynet.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-general |
On 5/5/25 14:26, Mladen Marinović wrote:
> Hi,
>
> Mystery not solved...but identified. The pool is in transaction mode
> and some connections use set enable_mergejoin=off, but they do not set
> it back to on. Upon getting the connection from the pool the parameter
> is still set to off causing the planner to not use this kind of join
> which results in different plans when using this tainted pgbouncer
> connection instead of the clean one from pg17.
>
> The problem is that server_reset_query is not used when the pool is in
> transaction mode. Now, we have to see how to fix this problem.
But you've got this : https://www.pgbouncer.org/config.html
"
server_reset_query_always
Whether |server_reset_query| should be run in all pooling modes. When
this setting is off (default), the |server_reset_query| will be run only
in pools that are in sessions-pooling mode. Connections in
transaction-pooling mode should not have any need for a reset query.
This setting is for working around broken setups that run applications
that use session features over a transaction-pooled PgBouncer. It
changes non-deterministic breakage to deterministic breakage: Clients
always lose their state after each transaction.
"
>
> Regards,
> Mladen Marinović
>
> On Mon, May 5, 2025 at 3:10 PM Efrain J. Berdecia
> <ejberdecia(at)yahoo(dot)com> wrote:
>
> Is the query using parameter markers? Is the source executing the
> query forcing a "bad" data type casting?
>
> Yahoo Mail: Search, Organize, Conquer
> <https://mail.onelink.me/107872968?pid=nativeplacement&c=US_Acquisition_YMktg_315_SearchOrgConquer_EmailSignature&af_sub1=Acquisition&af_sub2=US_YMktg&af_sub3=&af_sub4=100002039&af_sub5=C01_Email_Static_&af_ios_store_cpp=0c38e4b0-a27e-40f9-a211-f4e2de32ab91&af_android_url=https://play.google.com/store/apps/details?id=com.yahoo.mobile.client.android.mail&listing=search_organize_conquer>
>
> On Mon, May 5, 2025 at 8:52 AM, Mladen Marinović
> <marin(at)kset(dot)org> wrote:
>
>
> On Mon, May 5, 2025 at 2:38 PM SERHAD ERDEM
> <serhade(at)hotmail(dot)com> wrote:
>
> Hi , you had better try vacuum analyze for the whole db
> , pgbouncer connection layer can not causeslow queries.
>
>
> I did that already. But the slow query is the consequence of
> the different plan, not the statistics.
>
> ------------------------------------------------------------------------
> *From:* Mladen Marinović <marin(at)kset(dot)org>
> *Sent:* Monday, May 5, 2025 12:27 PM
> *To:* Achilleas Mantzios <a(dot)mantzios(at)cloud(dot)gatewaynet(dot)com>
> *Cc:* pgsql-general(at)lists(dot)postgresql(dot)org
> <pgsql-general(at)lists(dot)postgresql(dot)org>
> *Subject:* Re: Different execution plans in PG17 and
> pgBouncer...
>
>
> On Mon, May 5, 2025 at 12:07 PM Achilleas Mantzios
> <a(dot)mantzios(at)cloud(dot)gatewaynet(dot)com> wrote:
>
>
> On 5/5/25 11:00, Mladen Marinović wrote:
>
>
>
> On Mon, May 5, 2025 at 11:24 AM Achilleas Mantzios
> <a(dot)mantzios(at)cloud(dot)gatewaynet(dot)com> wrote:
>
>
> On 5/5/25 09:52, Mladen Marinović wrote:
>
> Hi,
>
> We recently migrated our production
> instances from PG11 to PG17. While doing
> so we upgraded our pgBouncer instances
> from 1.12 to 1.24. As everything worked on
> the test servers we pushed this to
> production a few weeks ago. We did not
> notice any problems until a few days ago
> (but the problems were here from the
> start). The main manifestation of the
> problems is a service that runs a fixed
> query to get a backlog of unprocessed data
> (limited to a 1000 rows). When testing the
> query using pgAdmin connected directly to
> the database we get a result in cca. 20
> seconds. The same query runs for 2 hours
> when using pgBouncer to connect to the
> same database.
>
>
> That's a huge jump, I hope you guys did
> extensive testing of your app. In which
> language is your app written? If java, then
> define prepareThreshold=0 in your jdbc and set
> max_prepared_statements = 0 in pgbouncer.
>
> Mainly python, but the problem was noticed in a
> java service.
> Prepare treshold was already set to 0. We changed
> the max_prepared_statements to 0 from the default
> (200) but no change was noticed.
>
> How about search paths ? any difference on
> those between the two runs ? Do you set
> search_path in pgbouncer ? what is "cca." btw ?
>
>
> The more interesting part is that when we
> issue an explain of the same query we get
> different plans. We did this a few seconds
> apart so there should be no difference in
> collected statistics. We ruled out
> prepared statements, as we suspected the
> generic plan might be the problem, but it
> is not. Is there any pgBouncer or PG17
> parameter that might be the cause of this?
>
>
> Does this spawn any connections (such as
> dblink) ? are there limits per user/db
> pool_size in pgbouncer ?
>
> No additional connection nor dbling. Just plain
> SQL (CTE, SELECT, INSERT, UPDATE, DELETE,...)
> There are limits, but they are not hit. The query
> just uses a different plan and runs slower because
> of that.
>
> Pgbouncer, in contrast to its old friend
> PgPool-II is completely passive, just passes
> through SQL to the server as fast as possible
> as it can. But I am sure you know that. Good
> luck, keep us posted!
>
> Yes, that is what puzzles me.
>
> What is the pgbouncer's timeout in the server
> connections ?
>
> How about "idle in transaction" ? do you get any of
> those? What's the isolation level ?
>
> How about the user ? is this the same user doing
> pgadmin queries VS via the app ?
>
> Can you identify the user under which the problem is
> manifested and :
>
> ALTER user "unlucky_user" SET log_statement = 'all';
>
> ALTER user "unlucky_user" SET
> log_min_duration_statement = 0; -- to help you debug
> the prepared statements .. just in case , and other
> stuff not printed by log_statement = all.
>
> None of those parameters should affect the fact that when
> issuing the explain select query (the statement is not
> prepared) from psql directly gives a different result than
> issuing it over the pgbouncer connection. The result is
> repeatable.
>
> We have rolled back pgbouncer to 1.12. and it seems the
> problem persists. This is one of the weirdest things I
> have ever seen with PostgreSQL.
>
> Regards,
> Mladen Marinović
>
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Merlin Moncure | 2025-05-05 23:15:35 | Re: Is anyone up for hosting the online PG game "Schemaverse"? |
| Previous Message | Álvaro Herrera | 2025-05-05 14:08:03 | Re: Different execution plans in PG17 and pgBouncer... |