Re: Random pg_upgrade test failure on drongo

Поиск
Список
Период
Сортировка
От Alexander Lakhin
Тема Re: Random pg_upgrade test failure on drongo
Дата
Msg-id 7fedc503-b0a0-53b1-e5d9-0cedaa862d15@gmail.com
обсуждение исходный текст
Ответ на RE: Random pg_upgrade test failure on drongo  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Ответы RE: Random pg_upgrade test failure on drongo  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Список pgsql-hackers
Hello Kuroda-san,

23.11.2023 15:15, Hayato Kuroda (Fujitsu) wrote:
>
>> I agree with your analysis and would like to propose a PoC fix (see
>> attached). With this patch applied, 20 iterations succeeded for me.
> Thanks, here are comments. I'm quite not sure for the windows, so I may say
> something wrong.
>
> * I'm not sure why the file/directory name was changed before doing a unlink.
>    Could you add descriptions?

Please look at the simple test program attached. It demonstrates the
failure for me when running in two sessions as follows:
unlink-open test 150 1000
                 unlink-open test2 150 1000
...
iteration 60
iteration 61
fopen() after unlink() failed (13)

Process Monitor shows:
...
         9:16:55.9249792 AM    unlink-open.exe    3232 WriteFile    C:\src\test2    SUCCESS    Offset: 138,412,032, 
Length: 1,048,576
### unlink() performed for the file "test":
9:16:55.9852903 AM    unlink-open.exe    4968    CreateFile C:\src\test    SUCCESS    Desired Access: Read Attributes,

Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, 
Delete, AllocationSize: n/a, OpenResult: Opened
9:16:55.9853637 AM    unlink-open.exe    4968 QueryAttributeTagFile    C:\src\test    SUCCESS    Attributes: A, 
ReparseTag: 0x0
### file "test" gets into state DELETE PENDING:
9:16:55.9853756 AM    unlink-open.exe    4968 SetDispositionInformationFile    C:\src\test    SUCCESS    Delete: True
9:16:55.9853888 AM    unlink-open.exe    4968    CloseFile C:\src\test    SUCCESS
### concurrent operations with file "test2":
         9:16:55.9866362 AM    unlink-open.exe    3232 WriteFile    C:\src\test2    SUCCESS    Offset: 139,460,608, 
Length: 1,048,576
...
         9:16:55.9972373 AM    unlink-open.exe    3232 WriteFile    C:\src\test2    SUCCESS    Offset: 157,286,400, 
Length: 1,048,576
         9:16:55.9979040 AM    unlink-open.exe    3232 CloseFile    C:\src\test2    SUCCESS
### unlink() for "test2":
         9:16:56.1029981 AM    unlink-open.exe    3232 CreateFile    C:\src\test2    SUCCESS    Desired Access: Read 
Attributes, Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: 
Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
         9:16:56.1030432 AM    unlink-open.exe    3232 QueryAttributeTagFile    C:\src\test2    SUCCESS    Attributes:

A, ReparseTag: 0x0
### file "test2" gets into state DELETE PENDING:
         9:16:56.1030517 AM    unlink-open.exe    3232 SetDispositionInformationFile    C:\src\test2    SUCCESS    
Delete: True
         9:16:56.1030625 AM    unlink-open.exe    3232 CloseFile    C:\src\test2    SUCCESS
### and then it opened successfully:
         9:16:56.1189503 AM    unlink-open.exe    3232 CreateFile    C:\src\test2    SUCCESS    Desired Access: Generic

Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N,

ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created
         9:16:56.1192016 AM    unlink-open.exe    3232 CloseFile    C:\src\test2    SUCCESS
### operations with file "test2" continued:
         9:16:56.1193394 AM    unlink-open.exe    3232 CreateFile    C:\src\test2    SUCCESS    Desired Access: Read 
Attributes, Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: 
Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
         9:16:56.1193895 AM    unlink-open.exe    3232 QueryAttributeTagFile    C:\src\test2    SUCCESS    Attributes:

A, ReparseTag: 0x0
         9:16:56.1194042 AM    unlink-open.exe    3232 SetDispositionInformationFile    C:\src\test2    SUCCESS    
Delete: True
         9:16:56.1194188 AM    unlink-open.exe    3232 CloseFile    C:\src\test2    SUCCESS
         9:16:56.1198459 AM    unlink-open.exe    3232 CreateFile    C:\src\test2    SUCCESS    Desired Access: Generic

Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N,

ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created
         9:16:56.1200302 AM    unlink-open.exe    3232 WriteFile    C:\src\test2    SUCCESS    Offset: 0, Length: 
1,048,576, Priority: Normal
...
         9:16:56.1275871 AM    unlink-open.exe    3232 WriteFile    C:\src\test2    SUCCESS    Offset: 10,485,760, 
Length: 1,048,576

### at the same time, CreateFile() for file "test" failed:
9:16:56.1276453 AM    unlink-open.exe    4968    CreateFile C:\src\test    DELETE PENDING    Desired Access: Generic 
Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N,

ShareMode: Read, Write, AllocationSize: 0
9:16:56.1279359 AM    unlink-open.exe    3232    WriteFile C:\src\test2    SUCCESS    Offset: 11,534,336, Length:
1,048,576
9:16:56.1283452 AM    unlink-open.exe    3232    WriteFile C:\src\test2    SUCCESS    Offset: 12,582,912, Length:
1,048,576
...

But with rename(MoveFileEx), I see:
unlink-open test 150 1000 rename
...
9:38:01.7035286 AM    unlink-open.exe    10208    WriteFile C:\src\test    SUCCESS    Offset: 156,237,824, Length:
1,048,576
9:38:01.7075621 AM    unlink-open.exe    10208    WriteFile C:\src\test    SUCCESS    Offset: 157,286,400, Length:
1,048,576
9:38:01.7101299 AM    unlink-open.exe    10208    CloseFile C:\src\test    SUCCESS
9:38:01.7130802 AM    unlink-open.exe    10208    CreateFile C:\src\test    SUCCESS    Desired Access: Read Attributes,

Delete, Synchronize, Disposition: Open, Options: Synchronous IO Non-Alert, Open Reparse Point, Attributes: n/a, 
ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
9:38:01.7132844 AM    unlink-open.exe    10208 QueryAttributeTagFile    C:\src\test    SUCCESS    Attributes: A, 
ReparseTag: 0x0
9:38:01.7133420 AM    unlink-open.exe    10208 QueryBasicInformationFile    C:\src\test    SUCCESS    CreationTime: 
11/24/2023 9:38:01 AM, LastAccessTime: 11/24/2023 9:38:01 AM, LastWriteTime: 11/24/2023 9:38:01 AM, ChangeTime: 
11/24/2023 9:38:01 AM, FileAttributes: A
9:38:01.7135191 AM    unlink-open.exe    10208    CreateFile C:\src    SUCCESS    Desired Access: Write Data/Add File,

Synchronize, Disposition: Open, Options: , Attributes: n/a, ShareMode: Read, Write, AllocationSize: n/a, OpenResult:
Opened

### file "test" renamed to "test.tmp", it doesn't get into state DELETE PENDING
9:38:01.7136221 AM    unlink-open.exe    10208 SetRenameInformationFile    C:\src\test    SUCCESS ReplaceIfExists:
True,
 
FileName: C:\src\test.tmp
9:38:01.8384110 AM    unlink-open.exe    10208    CloseFile C:\src    SUCCESS
9:38:01.8388203 AM    unlink-open.exe    10208    CloseFile C:\src\test.tmp    SUCCESS

### then file "test.tmp" deleted as usual:
9:38:01.8394278 AM    unlink-open.exe    10208    CreateFile C:\src\test.tmp    SUCCESS    Desired Access: Read 
Attributes, Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: 
Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
9:38:01.8396534 AM    unlink-open.exe    10208 QueryAttributeTagFile    C:\src\test.tmp    SUCCESS    Attributes: A, 
ReparseTag: 0x0
9:38:01.8396885 AM    unlink-open.exe    10208 SetDispositionInformationFile    C:\src\test.tmp    SUCCESS Delete:
True
9:38:01.8397312 AM    unlink-open.exe    10208    CloseFile C:\src\test.tmp    SUCCESS
9:38:01.9162566 AM    unlink-open.exe    10208    CreateFile C:\src\test    SUCCESS    Desired Access: Generic Write, 
Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, 
ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created
9:38:01.9167628 AM    unlink-open.exe    10208    CloseFile C:\src\test    SUCCESS

So the same test run with MoveFileEx():
unlink-open test 150 1000 rename
                 unlink-open test2 150 1000 rename
successfully passes for me in the same environment (Windows VM slowed down to 50%).

That is, my idea was to try removing a file through renaming it as a fast
path (thus avoiding that troublesome state DELETE PENDING), and if that
fails, to perform removal as before. May be the whole function might be
simplified, but I'm not sure about special cases yet.

> * IIUC, the important points is the latter part, which waits until the status is
>    changed. Based on that, can we remove a double rmtree() from cleanup_output_dirs()?
>    They seems to be add for the similar motivation.

I couldn't yet reproduce a failure, which motivated that doubling (IIUC, it
was observed in [1]), with c28911750 reverted, so I need more time to
research that issue to answer this question.

[1] https://www.postgresql.org/message-id/20230131172806.GM22427%40telsasoft.com

Best regards,
Alexander
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Improving the comments in pqsignal()
Следующее
От: "Drouvot, Bertrand"
Дата:
Сообщение: Re: Synchronizing slots from primary to standby