From: | Jakub Glapa <jakub(dot)glapa(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | alvherre(at)2ndquadrant(dot)com, pryzby(at)telsasoft(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: dsa_allocate() faliure |
Date: | 2018-11-30 19:20:49 |
Message-ID: | CAJk1zg2kgnAbWAuM3oG20EN_Fvin2Z5OtWJHTR711S2jbNwQQA@mail.gmail.com |
Lists: | pgsql-hackers pgsql-performance |
Hi, just a small update.
I've configured the OS for taking crash dumps on Ubuntu 16.04 with the
following (maybe somebody will find it helpful):
I've added LimitCORE=infinity to /lib/systemd/system/postgresql@.service
under the [Service] section.
I've reloaded the service config with: sudo systemctl daemon-reload
Changed the core pattern with: echo
/var/lib/postgresql/core.%p.sig%s.%ts | sudo tee -a /proc/sys/kernel/core_pattern
I tested it with kill -ABRT <pid of a backend> and it behaved correctly: a
crash dump was written.
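For anyone reusing the core pattern: %p is the PID, %s the signal number and %t the dump time in seconds since the epoch, so the trailing "s" after %t is just a literal character. A rough sketch that expands the template by hand to show the resulting file name; the helper name expand_core_pattern and the sample PID and timestamp are made up for illustration, and it only handles these three specifiers:

```shell
# Expand a core_pattern template the way the kernel would for %p, %s and %t.
# Hypothetical helper for illustration only; the kernel supports more specifiers.
# Arguments: $1 = pattern, $2 = PID, $3 = signal number, $4 = epoch seconds
expand_core_pattern() {
  printf '%s\n' "$1" | sed -e "s/%p/$2/g" -e "s/%s/$3/g" -e "s/%t/$4/g"
}

# Signal 6 is SIGABRT on Linux, matching the kill -ABRT test.
expand_core_pattern '/var/lib/postgresql/core.%p.sig%s.%ts' 20584 6 1543562449
```

With these sample values the name ends in "sig6.1543562449s", which makes it easy to tell an ABRT test dump from a real SIGSEGV (signal 11) dump.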
Over the last few days of monitoring, no segfault occurred, but the
dsa_allocate error did.
I'm starting to doubt if the segfault I've found in dmesg was actually
related.
I've grepped the postgres log for dsa_allocate:
Why do the messages occur sometimes as FATAL and sometimes as ERROR?
2018-11-29 07:59:06 CET::@:[20584]: FATAL: dsa_allocate could not find 7 free pages
2018-11-29 07:59:06 CET:127.0.0.1(40846):user@db:[19507]: ERROR: dsa_allocate could not find 7 free pages
2018-11-30 09:04:13 CET::@:[27341]: FATAL: dsa_allocate could not find 13 free pages
2018-11-30 09:04:13 CET:127.0.0.1(41782):user@db:[25417]: ERROR: dsa_allocate could not find 13 free pages
2018-11-30 09:28:38 CET::@:[30215]: FATAL: dsa_allocate could not find 4 free pages
2018-11-30 09:28:38 CET:127.0.0.1(45980):user@db:[29924]: ERROR: dsa_allocate could not find 4 free pages
2018-11-30 16:37:16 CET::@:[14385]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET::@:[14375]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(55004):user@db:[14386]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(54964):user@db:[14379]: ERROR: dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(54916):user@db:[14370]: ERROR: dsa_allocate could not find 7 free pages
2018-11-30 16:45:11 CET:212.186.105.45(55356):user@db:[14555]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET::@:[15359]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET::@:[15363]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(54964):user@db:[14379]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(54916):user@db:[14370]: ERROR: dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(55842):user@db:[14815]: ERROR: dsa_allocate could not find 7 free pages
2018-11-30 16:56:11 CET:212.186.105.45(57076):user@db:[15638]: FATAL: dsa_allocate could not find 7 free pages
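To look for a pattern in the FATAL/ERROR split, the entries can be tallied by severity and by whether the log prefix carries a client session. This is only a sketch: it assumes one log entry per line with a prefix of the form "DATE TIME TZ:client:user@db:[pid]:" as in the lines above, and the function name tally_dsa_errors and the labels "no-session" / "client-session" are my own. It does not explain the difference, it only counts.

```shell
# Tally dsa_allocate failures read from stdin.
# Assumption: whitespace-separated fields, where field 3 is
# "TZ:client:user@db:[pid]:" and field 4 is the severity ("FATAL:" or "ERROR:").
tally_dsa_errors() {
  awk '/dsa_allocate could not find/ {
         # "::@:" appears in field 3 when client address, user and db are all empty.
         kind = ($3 ~ /::@:/) ? "no-session" : "client-session"
         sev = $4
         sub(/:$/, "", sev)   # drop the trailing colon from the severity
         count[kind " " sev]++
       }
       END { for (k in count) print k, count[k] }' | sort
}
```

Usage would be something like tally_dsa_errors < postgresql.log, where the log file name is a placeholder for the actual server log.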
There are quite a few errors from today, but I was launching the problematic
query in parallel from 2-3 sessions.
Sometimes it failed, sometimes it didn't.
I couldn't find any pattern.
The workload on this db is not really constant; it is rather bursty.
--
regards,
Jakub Glapa
On Tue, Nov 27, 2018 at 9:03 AM Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
wrote:
> On Tue, Nov 27, 2018 at 4:00 PM Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> > Hmm. I will see if I can come up with a many-partition torture test
> > reproducer for this.
>
> No luck. I suppose one theory that could link both failure modes
> would be a buffer overrun, where in the non-shared case it trashes a
> pointer that is later dereferenced, and in the shared case it writes
> past the end of allocated 4KB pages and corrupts the intrusive btree
> that lives in spare pages to track available space.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>