
From: Daniel Migowski <dmigowski(at)ikoffice(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, "Thomas Munro" <thomas(dot)munro(at)gmail(dot)com>
Cc: "'pgsql-bugs(at)lists(dot)postgresql(dot)org'" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: RE: BUG #15923: Prepared statements take way too much memory.
Date: 2019-07-28 11:14:45
Message-ID: 41ED3F5450C90F4D8381BC4D8DF6BBDCF02E19B4@EXCHANGESERVER.ikoffice.de
Lists: pgsql-bugs

Hello,

Thanks for all your replies; I will try to combine my answers to the last few mails on the subject.

------ In reply to Tom Lane ------

Creating a new type of MemoryContext (FlatMemoryContext?) sounds like a good idea to me because it seems minimally invasive; I will try that. I believe it doesn't need a maximum block size, does it? One could specify a size for the blocks that are requested via malloc, and every time someone pallocs a larger area, that is malloced and returned directly. I saw that AllocSet uses an initial block size that is doubled every time the previous size is no longer sufficient. Should the new MemoryContext behave identically?

I wonder if this MemoryContext could make good use of a "finalize" function that is called once no further work in the context is expected, and that realloc's the allocated blocks down to the size actually used, so the C library can reuse the unused memory. Maybe even AllocSet could profit from that as a first step. Checking the stats, the ratio of used to malloced memory is at a low 60%. (One caveat: realloc() is allowed to move a block, which would invalidate every pointer handed out from it, so such trimming would only be safe if the allocator can shrink blocks in place.)

> This wouldn't take very much new code, and it would potentially offer up to perhaps 2X space savings compared to aset.c, thanks to cutting the per-chunk overhead. Whether that's enough to float your boat with respect to cached plans isn't too clear, though.

No, the boat will still sink, but it's a good start, and servers are less likely to crash.

> A totally different idea is to make a variant version of copyObject that is intended to produce a compact form of a node tree, and does not create a separate palloc allocation for each node but just packs them as tightly as it can in larger palloc chunks. ..

Regarding the copyObject code that should be autogenerated: I thought a lot about how this would be possible with macros, but couldn't come up with anything, because one macro invocation would have to produce source code in two distinct places (in at least two functions). Is there similar code somewhere else in the source tree?

See below for more on that subject.

> ...serialization of node tree...

I also don't like serialization very much. I once rewrote the standard serialization mechanism in Java quite successfully and know about the problems that arise with graph-like object dependencies (which don't seem to exist in node trees, because otherwise copyObject would also have to cope with them, and it doesn't). If everything else fails it might still be a good idea in the end. However, I believe the rewritten query plan is many times the size of the original node tree and is used often, so it might not be a good one.

------ In reply to Thomas Munro ------

Regarding your comments on the variant version of copyObject: at first glance I also believe that a FlatMemoryContext would result in nearly the same memory usage, except for the chunk headers Tom Lane also mentioned. One should measure how much space can really be saved when the chunk headers are not needed.

But I like the calculate-the-size-first variant one would get if the copyObject code were generated. The two-phase variant would not even be slower, because the overhead of pallocing a chunk for every single node must be higher than a simple calculation of the required space. This feels a bit like inlining the allocation logic into the copyObject code, and it would be the best variant regarding both speed and size.

------ In reply to Andres Freund ------

> Yea, I think we'd have to autogenerate them to go for that. I'm kinda hoping that maybe John Naylor would take a look, after he made all our lives quite a bit easier with the other catalog script work... In the 'pipe dream' version of this, that script would be able to generate code to adjust pointers after a memcpy... That'd make copyObject() of such flattened trees much cheaper.

Thanks for the input, I will have a look at it. Since copyObject on such trees is such a common operation, it might be feasible to store the length of such a read-only tree somewhere within the tree itself, so one knows the exact length of a node including all its subnodes, which "should" always be in a contiguous range of memory anyway; we would then just have to apply an offset to all pointers within the copied area.

Or, coming back to the two-pass idea of preallocating a large chunk of memory for the nodes, one could use the same function to calculate the required size on the fly.

The problem I see here is that the current implementation of copyObject (which is used in many places in the code) would have to be aware that the source nodes are allocated in a flat MemoryContext for this to work everywhere in the codebase. This could be achieved if the chunk header for the nodes is not omitted (although the chunk header would then be set not by the MemoryContext but by the copyNode implementation for FlatMemoryContext itself; if the source memory isn't flat, normal pallocs have to be done). Does any code exist that assumes every node resides in its own allocated chunk?

> It'd sure be interesting to write a function that computes stats about which types of nodes consume how much memory. Shouldn't be too hard. I'm mildly suspecting that a surprisingly large overhead in complex query trees comes from target lists that are the same in many parts of the tree.

This would be the second part of the refactoring. I was wondering whether one could reduce the amount of data stored in the tree in the first place by using references to existing structures instead.

------ Conclusion ------

* A FlatMemoryContext would reduce memory usage by perhaps 40%, and the current copyNode infrastructure could remain as is.
* Using a code generator for the nodes is commonly agreed to be a good idea (and the number of node types in nodes.h makes me believe that the amount of hand-written code would drop significantly when using one).
* An autogenerated copyObjectFlat would be able to determine the total size of a node and its descendants and could copy a tree with a single palloc, making FlatMemoryContext obsolete and yielding an even higher memory reduction.
* IF we keep the chunk header in FlatMemoryContext, then the "standard" copyNode would be able to fall back to a very fast memcpy-plus-pointer-adjustment variant. The gain here might outweigh the few percent of memory we save by skipping the header. But then we have no memory gains versus using the FlatMemoryContext alone.

So:

* A FlatMemoryContext is a good idea and I will try to implement it.
* Code generation could speed things up significantly, and a flat-copy variant would only contribute additional memory savings if we skip the chunk headers (if that is allowed at all).
* There is more work to do afterwards, because the trees are still too big.

------------------------------------------------

Some more questions:
* Why are MemoryContextAllocZeroAligned and MemoryContextAllocZero the same function?
* I wonder whether we could just drop the raw_parse_tree for large statements altogether once the query tree has been generated. I know it is needed to recreate the query tree in case the definitions of tables etc. change, but wouldn't it be possible to recreate the raw_parse_tree from the query_string in that very rare event as well? The query_string is already the most compact form of the raw_parse_tree, right?
* Does code exist with the assumption that every node resides in its own allocated chunk? (asked already above)
* Where can I find the existing code generation script? Is it called during make or before checkin?

------------------------------------------------

Thanks for reading through this. I sadly cannot provide an example, because that would mean including my company's whole database schema, and I gave up after spending 30 minutes trying to clean it up enough; the complexity of the view can still be seen at https://explain.depesz.com/s/gN2

Regards,
Daniel Migowski
