Re: BUG #16363: Memory increases for each table accessed until connection is closed

From: Eduardo Barreto Alexandre <sezdocs(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16363: Memory increases for each table accessed until connection is closed
Date: 2020-04-16 14:18:04
Message-ID: CAG6JkqN7zUOaNdAsEru91UQYk=ov-rWHxZCSOm=MnwXyf+hsyg@mail.gmail.com
Lists: pgsql-bugs

Sending this reply again, since it seems it didn't go through to the
mailing list.

Hello Tom, first of all, thanks for the response!

Sorry for not including a lot of the needed information; I will try to
rectify that in this message.

So, first, here is what I'm trying to do. In my backend I store financial
data (candlesticks and trades) in Postgres + TimescaleDB, with each
exchange in its own database and each market/timeframe combination in its
own table. For example, given an `ExchangeA` with 1 market (EUR/USD) and 4
timeframes, I would get these 5 tables:
candles_eur_usd_one_minute
candles_eur_usd_five_minutes
candles_eur_usd_third_minutes
candles_eur_usd_one_hour
trades_eur_usd

Normally my backend has, per database (exchange), ~750 markets and
~16 timeframes, so 750 * 16 = 12000 tables.
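To make the layout concrete, the naming scheme can be sketched like this (an illustrative Python sketch, not my actual Elixir code; the market and timeframe lists are stand-ins):

```python
# Sketch of the per-market table-naming scheme described above.
markets = ["eur_usd"]                       # ~750 markets in production
timeframes = ["one_minute", "five_minutes",
              "third_minutes", "one_hour"]  # ~16 timeframes in production

def table_names(markets, timeframes):
    """Return the candle and trade table names for every market."""
    names = [f"candles_{m}_{tf}" for m in markets for tf in timeframes]
    names += [f"trades_{m}" for m in markets]
    return names

print(table_names(markets, timeframes))
# With 1 market and 4 timeframes this yields the 5 tables listed above.
```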

I noticed that Postgres would continuously increase memory usage per
connection under my normal workload (inserting candles and trades, ~2000
rows per minute) until I got an OOM error. On a system with 16GB it takes
about 1 to 3 hours to hit the OOM. On my other system, which has 64GB, it
takes about 2 days, but eventually Postgres uses all the memory and crashes
too.

As for versions, I was using Postgres 11 with the latest TimescaleDB.
Since I was not sure whether this was a TimescaleDB issue, I first created
a minimal example that triggers the issue faster (the one I mentioned in my
first e-mail, which uses 700 tables) and then tested it without TimescaleDB
on both Postgres 11 and 12. I got the same result in both, which takes
TimescaleDB out of the equation.

As for the operating system, I'm using Ubuntu, but I also tested on my
Gentoo machine with the same results, and I also tested with Postgres
running in Docker.

For postgresql.conf, I first used one tuned by the TimescaleDB tuner tool,
but then switched to a default one to see if it would make a difference.

My backend is written in Elixir, so I'm using the Ecto library to connect
to Postgres. Ecto creates a pool of connections that stays alive as long as
my backend does. To make it easier to test the issue, I limited the pool to
1 connection so I can keep an eye on that connection process's memory
usage, etc.

> Yes, Postgres caches some information about every table your session
> has accessed, and no you don't have a lot of control over that, and
> no it isn't a bug.

> Having said that, the amount of memory consumed this way is in the
> vicinity of a few tens of KB per table, according to some simple
> experiments I just did. I couldn't get a process touching 700 simple
> tables to eat more than ~70MB of space. So either you are talking about
> tables with enormously complex schemas, or there's something else going
> on. (Please note that the SO post you are referencing describes an
> attempt to use circa 100000 tables/views in one session, not 700.)

Here are the details of one of the tables I'm using for my test (all of
them are identical except for the name):
test_dev=# \d+ trades_eur_usd
                                        Table "public.trades_eur_usd"
  Column   |            Type             | Collation | Nullable | Default | Storage | Stats target | Description
-----------+-----------------------------+-----------+----------+---------+---------+--------------+-------------
 id        | integer                     |           | not null |         | plain   |              |
 timestamp | timestamp without time zone |           | not null |         | plain   |              |
 amount    | numeric                     |           | not null |         | main    |              |
 price     | numeric                     |           | not null |         | main    |              |
Indexes:
    "trades_eur_usd_id_desc_index" btree (id DESC)
    "trades_eur_usd_timestamp_desc_index" btree ("timestamp" DESC)
Access method: heap

One thing that I forgot to explain better is how I'm touching those
tables. Basically, I created a little Elixir backend with Ecto that keeps
1 connection alive in the pool and bulk-inserts rows into one of the 700
tables. The steps are:
1) Choose one of the 700 tables at random;
2) Insert 13000 rows into that table in a single query;
3) Go back to step 1 and repeat.
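The loop above can be sketched like this (an illustrative Python sketch; the real test is Elixir/Ecto over a single pooled connection, and the table names and row values here are hypothetical stand-ins):

```python
import random

# Stand-ins for the 700 identical tables in my test database.
TABLES = [f"trades_market_{i}" for i in range(700)]
ROWS_PER_BATCH = 13000

def build_bulk_insert(table, rows):
    """Build one multi-row INSERT matching the trades_eur_usd schema."""
    values = ", ".join(
        f"({r['id']}, '{r['timestamp']}', {r['amount']}, {r['price']})"
        for r in rows
    )
    return f'INSERT INTO {table} (id, "timestamp", amount, price) VALUES {values}'

def one_iteration():
    table = random.choice(TABLES)          # step 1: pick a table at random
    rows = [{"id": i, "timestamp": "2020-04-16 00:00:00",
             "amount": 1.0, "price": 1.0} for i in range(ROWS_PER_BATCH)]
    sql = build_bulk_insert(table, rows)   # step 2: 13000 rows in one query
    return table, sql                      # step 3: the caller loops forever
```

Executing the generated statement against the database on every iteration is what makes the connection's memory climb; the key point is only that step 1 picks a *different* table each time.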

This will trigger the issue instantly and will make the memory usage of
that connection grow very fast as shown in the last e-mail.

One very important thing to note: if I change the above algorithm to do the
same thing but always choose the same table in step 1 (i.e., always
bulk-insert into 1 table instead of one of the 700), I do not trigger the
issue; memory usage does not go up at all. It only triggers if I do the
insertion into a different table each time.

> The memory consumption curve you showed looks suspiciously like you have
> circa 4GB shared buffers allocated and the reported number is mostly about
> how many of those buffers the process has touched so far. This is a
> common misunderstanding; "top" and similar tools tend to be really bad
> about accounting for shared memory usage in any useful way. In any case,
> the process you've shown us stopped growing its space consumption
> some time ago, so I wonder where the OOM complaint actually came from.

> The short answer is that you're probably barking up the wrong tree,
> but there's not enough information here to provide any useful guesses
> about which is the right tree.

I guess you are right; maybe the issue is with the shared buffers or some
other cache. I tried lowering every relevant option I found in
postgresql.conf to its minimum value, but I still got the same OOM result.
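For reference, the kind of minimal-memory settings I mean are along these lines (illustrative values near the documented minimums, not my exact file):

```
# postgresql.conf fragment (illustrative)
shared_buffers = 128kB        # default is 128MB
work_mem = 64kB
maintenance_work_mem = 1MB
temp_buffers = 800kB
```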

Thanks a lot for the help, and feel free to ask for more information!
