Re: NAMEDATALEN increase because of non-latin languages

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Денис Романенко <deromanenko(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: NAMEDATALEN increase because of non-latin languages
Date:	2021-08-19 12:47:42
Message-ID:	CAEze2WjyrWF_1tsYF0ijrZ_aEKhwCtdpeCfRpQUYnDqGXC1DPw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, 19 Aug 2021 at 13:44, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > Another fun thing --- and, I fear, another good argument against just
> > raising NAMEDATALEN --- is what about TupleDescs, which last I checked
> > used an array of fixed-width pg_attribute images. But maybe we could
> > replace that with an array of pointers. Andres already did a lot of
> > the heavy code churn required to hide that data structure behind
> > TupleDescAttr() macros, so changing the representation should be much
> > less painful than it would once have been.
>
> I was recently wondering if we shouldn't go to a completely bespoke
> datastructure for TupleDesc->attrs, rather than reusing FormData_pg_attribute.
>
> Right now every attribute uses nearly two cachelines (112 bytes). Given how
> frequent a task tuple [de]forming is, and how often it's a bottleneck,
> increasing the cache efficiency of tupledescs would be worth quite a bit of
> effort - I do see tupledesc attr cache misses in profiles. A secondary benefit
> would be that we do create a lot of short-lived descs in the executor,
> slimming those down obviously would be good on its own. A third benefit would
> be that we could get rid of attcacheoff in pg_attribute, that always smelled
> funny to me.
>
> One possible way to structure such future tupledescs would be to have multiple
> arrays in struct TupleDescData. With an array of just the data necessary for
> [de]forming at the place ->attrs is, and other stuff in one or more separate
> arrays. The other option could perhaps be omitted for some tupledescs or
> computed lazily.
>
> For deforming we just need attlen (2byte), attbyval (1 byte), attalign (1byte)
> and optionally attcacheoff (4 byte), for forming we also need attstorage (1
> byte). Naively that ends up being 12 bytes - 5 attrs / cacheline is a heck of
> a lot better than ~0.5.

I tried to implement this 'compact attribute access descriptor' a few
months ago in my effort to improve btree index performance.

I abandoned the idea at the time, as I didn't find any measurable
difference in the (limited!) tests I ran. The workload was mainly
reindexing, SELECT * INTO, and similar operations, run while
benchmarking reindexing on the 'pp-complete' dataset. But seeing that
there might be interest outside this effort, for reasons separate from
plain performance, I'll share the results.

Attached is the latest version of my patch that I could find; it might
be incorrect or fail, as this is something I sent to myself between two
of my systems during development of the patch. It is attached as .txt
because I don't want any CFBot coverage on this (it is not proposed for
inclusion; it is just a show of work, and might be a basis for future
work).

The patch allocates an array of 'TupleAttrAlignData'-structs at the
end of the attrs-array, fills it with the correct data upon
TupleDesc-creation, and uses this TupleAttrAlign-data for forming
and deforming tuples.

One main difference from what you described is that I used a union
for storing attbyval and attstorage, as the latter is only applicable
to attlen < 0, and the former only to attlen >= 0. This keeps the
whole structure in 8 bytes, whilst also being usable in both tuple
forming and deforming.
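
To make the size difference concrete, here is a minimal sketch of the two
layouts discussed above. It is not taken from the attached patch: the struct
names, the helper functions, and the 64-byte cache line are assumptions made
for illustration, and the sizes assume a typical ABI with a 1-byte bool and
4-byte alignment for int32_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on most current x86-64 hardware. */
#define CACHELINE_BYTES 64
#define ATTRS_PER_CACHELINE(type) (CACHELINE_BYTES / sizeof(type))

/* Hypothetical "naive" layout: one field per pg_attribute column needed. */
typedef struct NaiveAttrSketch
{
    int32_t attcacheoff;    /* cached offset, or -1 if not known */
    int16_t attlen;         /* >= 0: fixed width; < 0: variable length */
    bool    attbyval;
    char    attalign;
    char    attstorage;     /* 9 bytes of data, padded to 12 */
} NaiveAttrSketch;

/* Hypothetical compact layout: attbyval and attstorage share one byte. */
typedef struct CompactAttrSketch
{
    int32_t attcacheoff;
    int16_t attlen;
    char    attalign;
    union
    {
        bool attbyval;      /* meaningful only when attlen >= 0 */
        char attstorage;    /* meaningful only when attlen < 0 */
    } u;
} CompactAttrSketch;

static_assert(sizeof(NaiveAttrSketch) == 12, "expected 12 bytes with padding");
static_assert(sizeof(CompactAttrSketch) == 8, "expected exactly 8 bytes");

/* Fill the union member that matches the sign of attlen. */
static CompactAttrSketch
compact_attr_make(int16_t attlen, bool attbyval, char attalign, char attstorage)
{
    CompactAttrSketch a;

    a.attcacheoff = -1;
    a.attlen = attlen;
    a.attalign = attalign;
    if (attlen >= 0)
        a.u.attbyval = attbyval;
    else
        a.u.attstorage = attstorage;
    return a;
}

/* Variable-length attributes are never passed by value. */
static bool
compact_attr_byval(const CompactAttrSketch *a)
{
    return a->attlen >= 0 && a->u.attbyval;
}

/* Fixed-width attributes are treated as plain ('p') storage here. */
static char
compact_attr_storage(const CompactAttrSketch *a)
{
    return a->attlen >= 0 ? 'p' : a->u.attstorage;
}
```

Under these assumptions the naive layout fits 5 attributes in a cache line and
the union-based layout fits 8. The accessors always check the sign of attlen
before reading the union, so the inactive member is never read.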

I hope this is useful; otherwise, sorry for the noise.

Kind regards,

Matthias van de Meent

Attachment	Content-Type	Size
0001-Some-work-on-storing-attribute-access-fields-more-co.patch.txt	text/plain	30.1 KB
