RE: Random pg_upgrade test failure on drongo

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Alexander Lakhin' <exclusion(at)gmail(dot)com>
Cc: "'andrew(at)dunslane(dot)net'" <andrew(at)dunslane(dot)net>, "'pgsql-hackers(at)lists(dot)postgresql(dot)org'" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Random pg_upgrade test failure on drongo
Date: 2023-11-23 12:15:22
Message-ID: TY3PR01MB988963F49BF9528CD8DEED3CF5B9A@TY3PR01MB9889.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Alexander,

>
> I can easily reproduce this failure on my workstation by running 5 tests
> 003_logical_slots in parallel inside Windows VM with it's CPU resources
> limited to 50%, like so:
> VBoxManage controlvm "Windows" cpuexecutioncap 50
>
> set PGCTLTIMEOUT=180
> python3 -c "NUMITERATIONS=20;NUMTESTS=5;import os;tsts='';exec('for i in
> range(1,NUMTESTS+1):
> tsts+=f\"pg_upgrade_{i}/003_logical_slots \"'); exec('for i in
> range(1,NUMITERATIONS+1):print(f\"iteration {i}\");
> assert(os.system(f\"meson test --num-processes {NUMTESTS} {tsts}\") == 0)')"
> ...
> iteration 2
> ninja: Entering directory `C:\src\postgresql\build'
> ninja: no work to do.
> 1/5 postgresql:pg_upgrade_2 / pg_upgrade_2/003_logical_slots
> ERROR 60.30s exit status 25
> ...
> pg_restore: error: could not execute query: ERROR: could not create file
> "base/1/2683": File exists
> ...

Great. I do not have such an environment, so I could not reproduce the failure
myself. This suggests that the failure occurs when the system is busy.

> I agree with your analysis and would like to propose a PoC fix (see
> attached). With this patch applied, 20 iterations succeeded for me.

Thanks, here are my comments. I'm not very familiar with Windows, so I may be
wrong about some of this.

* I'm not sure why the file/directory name is changed before doing the unlink.
Could you add a comment explaining it?
* IIUC, the important part is the latter one, which waits until the status has
changed. Based on that, can we remove the double rmtree() from cleanup_output_dirs()?
It seems to have been added for a similar reason.

```
+    loops = 0;
+    while (lstat(curpath, &st) < 0 && lstat_error_was_status_delete_pending())
+    {
+        if (++loops > 100)      /* time out after 10 sec */
+            return -1;
+        pg_usleep(100000);      /* us */
+    }
```
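
For reference, the waiting pattern in the hunk above can be sketched in a
self-contained form. This is only an illustration, not the patch itself: the
names wait_while_pending() and countdown_pending() are hypothetical, a callback
stands in for the lstat()/STATUS_DELETE_PENDING check, and POSIX nanosleep()
stands in for pg_usleep().

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns true while the condition is pending. */
typedef bool (*pending_check) (void *arg);

/*
 * Poll until check() reports false, sleeping sleep_us microseconds between
 * attempts.  Returns 0 once the condition clears, or -1 if it is still
 * pending after max_loops sleeps (same loop shape as the hunk above).
 */
static int
wait_while_pending(pending_check check, void *arg, int max_loops, long sleep_us)
{
    int         loops = 0;

    assert(check != NULL);

    while (check(arg))
    {
        struct timespec ts;

        if (++loops > max_loops)
            return -1;          /* timed out */

        /* Assumption: nanosleep() used here in place of pg_usleep(). */
        ts.tv_sec = sleep_us / 1000000L;
        ts.tv_nsec = (sleep_us % 1000000L) * 1000L;
        nanosleep(&ts, NULL);
    }
    return 0;
}

/* Example predicate: stays pending until a countdown reaches zero. */
static bool
countdown_pending(void *arg)
{
    int        *remaining = arg;

    if (*remaining <= 0)
        return false;
    (*remaining)--;
    return true;
}
```

With max_loops = 100 and sleep_us = 100000 this gives the same 10 second upper
bound as the comment in the patch.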

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
