Re: Performance degradation in commit ac1d794

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Васильев Дмитрий <d(dot)vasilyev(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in commit ac1d794
Date: 2016-03-17 02:04:26
Message-ID: 20160317020426.uz5oobr6b72mpcdw@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2016-03-16 15:41:28 -0400, Robert Haas wrote:
> On Wed, Mar 16, 2016 at 3:25 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >> > - Given current users we don't need a large amount of events, so having
> >> > to iterate through the registered events doesn't seem bothersome. We
> >> > could however change the api to be something like
> >> >
> >> > int WaitLatchEventSet(LatchEventSet *set, OccurredEvents *, int nevents, long timeout);
> >> >
> >> > which would return the number of events that happened, and would
> >> > basically "fill" one of the (usually stack allocated) OccurredEvent
> >> > structures with what happened.
> >>
> >> I definitely think something along these lines is useful. I want to
> >> be able to have an Append node with 100 ForeignScans under it and kick
> >> off all the scans asynchronously and wait for all of the FDs at once.
> >
> > So you'd like to get only an event for the FD with data back? Or are you
> > ok with iterating through hundred elements in an array, to see which are
> > ready?
>
> I'd like to get an event back for the FD with data. Iterating sounds
> like it could be really slow. Say you get lots of little packets back
> from the same connection, while the others are idle. Now you've got
> to keep iterating through them all over and over again. Blech.

I've a (working) WIP version that works like I think you want. It's
implemented in patch 05 (rework with just poll() support) and (add epoll
suppport). It's based on patches posted here earlier, but these aren't
interesting for the discussion.

The API is now:

typedef struct WaitEventSet WaitEventSet;

typedef struct WaitEvent
{
int pos; /* position in the event data structure */
uint32 events; /* tripped events */
int fd; /* fd associated with event */
} WaitEvent;

/*
* Create a WaitEventSet with space for nevents different events to wait for.
*
* latch may be NULL.
*/
extern WaitEventSet *CreateWaitEventSet(int nevents);

/* ---
* Add an event to the set. Possible events are:
* - WL_LATCH_SET: Wait for the latch to be set
* - WL_POSTMASTER_DEATH: Wait for postmaster to die
* - WL_SOCKET_READABLE: Wait for socket to become readable
* can be combined in one event with WL_SOCKET_WRITEABLE
* - WL_SOCKET_WRITABLE: Wait for socket to become readable
* can be combined with WL_SOCKET_READABLE
*
* Returns the offset in WaitEventSet->events (starting from 0), which can be
* used to modify previously added wait events.
*/
extern int AddWaitEventToSet(WaitEventSet *set, uint32 events, int fd, Latch *latch);

/*
* Change the event mask and, if applicable, the associated latch of of a
* WaitEvent.
*/
extern void ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch);

/*
* Wait for events added to the set to happen, or until the timeout is
* reached. At most nevents occurrent events are returned.
*
* Returns the number of events occurred, or 0 if the timeout was reached.
*/
extern int WaitEventSetWait(WaitEventSet *set, long timeout, WaitEvent* occurred_events, int nevents);

I've for now left the old latch API in place, and only converted
be-secure.c to the new style. I'd appreciate some feedback before I go
around and convert and polish everything.

Questions:
* I'm kinda inclined to merge the win32 and unix latch
implementations. There's already a fair bit in common, and this is
just going to increase that amount.

* Right now the caller has to allocate the WaitEvents he's waiting for
locally (likely on the stack), but we also could allocate them as part
of the WaitEventSet. Not sure if that'd be a benefit.

* I can do a blind rewrite of the windows implementation, but I'm
obviously not going to get that entirely right. So I need some help
from a windows person to test this.

* This approach, with a 'stateful' wait event data structure, will
actually allow to fix a couple linering bugs we have on the windows
port. C.f. http://www.postgresql.org/message-id/4351.1336927207@sss.pgh.pa.us

- Andres

Attachment Content-Type Size
0001-Access-the-correct-pollfd-when-checking-for-socket-e.patch text/x-patch 1.7 KB
0002-Make-it-easier-to-choose-the-used-waiting-primitive-.patch text/x-patch 4.2 KB
0003-Error-out-if-waiting-on-socket-readiness-without-a-s.patch text/x-patch 2.2 KB
0004-Only-clear-unix_latch.c-s-self-pipe-if-it-actually-c.patch text/x-patch 5.8 KB
0005-WIP-WaitEvent-API.patch text/x-patch 14.4 KB
0006-WIP-Use-epoll-for-Wait-Event-API-if-available.patch text/x-patch 10.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-03-17 02:30:06 Re: [HACKERS] pgbench -C -M prepared gives an error
Previous Message Michael Paquier 2016-03-17 02:03:37 Re: RFC: replace pg_stat_activity.waiting with something more descriptive