Re: Random pg_upgrade test failure on drongo
От | Alexander Lakhin |
---|---|
Тема | Re: Random pg_upgrade test failure on drongo |
Дата | |
Msg-id | 7fedc503-b0a0-53b1-e5d9-0cedaa862d15@gmail.com обсуждение исходный текст |
Ответ на | RE: Random pg_upgrade test failure on drongo ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
Ответы |
RE: Random pg_upgrade test failure on drongo
("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
|
Список | pgsql-hackers |
Hello Kuroda-san, 23.11.2023 15:15, Hayato Kuroda (Fujitsu) wrote: > >> I agree with your analysis and would like to propose a PoC fix (see >> attached). With this patch applied, 20 iterations succeeded for me. > Thanks, here are comments. I'm quite not sure for the windows, so I may say > something wrong. > > * I'm not sure why the file/directory name was changed before doing a unlink. > Could you add descriptions? Please look at the simple test program attached. It demonstrates the failure for me when running in two sessions as follows: unlink-open test 150 1000 unlink-open test2 150 1000 ... iteration 60 iteration 61 fopen() after unlink() failed (13) Process Monitor shows: ... 9:16:55.9249792 AM unlink-open.exe 3232 WriteFile C:\src\test2 SUCCESS Offset: 138,412,032, Length: 1,048,576 ### unlink() performed for the file "test": 9:16:55.9852903 AM unlink-open.exe 4968 CreateFile C:\src\test SUCCESS Desired Access: Read Attributes, Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened 9:16:55.9853637 AM unlink-open.exe 4968 QueryAttributeTagFile C:\src\test SUCCESS Attributes: A, ReparseTag: 0x0 ### file "test" gets into state DELETE PENDING: 9:16:55.9853756 AM unlink-open.exe 4968 SetDispositionInformationFile C:\src\test SUCCESS Delete: True 9:16:55.9853888 AM unlink-open.exe 4968 CloseFile C:\src\test SUCCESS ### concurrent operations with file "test2": 9:16:55.9866362 AM unlink-open.exe 3232 WriteFile C:\src\test2 SUCCESS Offset: 139,460,608, Length: 1,048,576 ... 9:16:55.9972373 AM unlink-open.exe 3232 WriteFile C:\src\test2 SUCCESS Offset: 157,286,400, Length: 1,048,576 9:16:55.9979040 AM unlink-open.exe 3232 CloseFile C:\src\test2 SUCCESS ### unlink() for "test2": 9:16:56.1029981 AM unlink-open.exe 3232 CreateFile C:\src\test2 SUCCESS Desired Access: Read Attributes, Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened 9:16:56.1030432 AM unlink-open.exe 3232 QueryAttributeTagFile C:\src\test2 SUCCESS Attributes: A, ReparseTag: 0x0 ### file "test2" gets into state DELETE PENDING: 9:16:56.1030517 AM unlink-open.exe 3232 SetDispositionInformationFile C:\src\test2 SUCCESS Delete: True 9:16:56.1030625 AM unlink-open.exe 3232 CloseFile C:\src\test2 SUCCESS ### and then it opened successfully: 9:16:56.1189503 AM unlink-open.exe 3232 CreateFile C:\src\test2 SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created 9:16:56.1192016 AM unlink-open.exe 3232 CloseFile C:\src\test2 SUCCESS ### operations with file "test2" continued: 9:16:56.1193394 AM unlink-open.exe 3232 CreateFile C:\src\test2 SUCCESS Desired Access: Read Attributes, Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened 9:16:56.1193895 AM unlink-open.exe 3232 QueryAttributeTagFile C:\src\test2 SUCCESS Attributes: A, ReparseTag: 0x0 9:16:56.1194042 AM unlink-open.exe 3232 SetDispositionInformationFile C:\src\test2 SUCCESS Delete: True 9:16:56.1194188 AM unlink-open.exe 3232 CloseFile C:\src\test2 SUCCESS 9:16:56.1198459 AM unlink-open.exe 3232 CreateFile C:\src\test2 SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created 9:16:56.1200302 AM unlink-open.exe 3232 WriteFile C:\src\test2 SUCCESS Offset: 0, Length: 1,048,576, Priority: Normal ... 9:16:56.1275871 AM unlink-open.exe 3232 WriteFile C:\src\test2 SUCCESS Offset: 10,485,760, Length: 1,048,576 ### at the same time, CreateFile() for file "test" failed: 9:16:56.1276453 AM unlink-open.exe 4968 CreateFile C:\src\test DELETE PENDING Desired Access: Generic Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: 0 9:16:56.1279359 AM unlink-open.exe 3232 WriteFile C:\src\test2 SUCCESS Offset: 11,534,336, Length: 1,048,576 9:16:56.1283452 AM unlink-open.exe 3232 WriteFile C:\src\test2 SUCCESS Offset: 12,582,912, Length: 1,048,576 ... But with rename(MoveFileEx), I see: unlink-open test 150 1000 rename ... 9:38:01.7035286 AM unlink-open.exe 10208 WriteFile C:\src\test SUCCESS Offset: 156,237,824, Length: 1,048,576 9:38:01.7075621 AM unlink-open.exe 10208 WriteFile C:\src\test SUCCESS Offset: 157,286,400, Length: 1,048,576 9:38:01.7101299 AM unlink-open.exe 10208 CloseFile C:\src\test SUCCESS 9:38:01.7130802 AM unlink-open.exe 10208 CreateFile C:\src\test SUCCESS Desired Access: Read Attributes, Delete, Synchronize, Disposition: Open, Options: Synchronous IO Non-Alert, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened 9:38:01.7132844 AM unlink-open.exe 10208 QueryAttributeTagFile C:\src\test SUCCESS Attributes: A, ReparseTag: 0x0 9:38:01.7133420 AM unlink-open.exe 10208 QueryBasicInformationFile C:\src\test SUCCESS CreationTime: 11/24/2023 9:38:01 AM, LastAccessTime: 11/24/2023 9:38:01 AM, LastWriteTime: 11/24/2023 9:38:01 AM, ChangeTime: 11/24/2023 9:38:01 AM, FileAttributes: A 9:38:01.7135191 AM unlink-open.exe 10208 CreateFile C:\src SUCCESS Desired Access: Write Data/Add File, Synchronize, Disposition: Open, Options: , Attributes: n/a, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened ### file "test" renamed to "test.tmp", it doesn't get into state DELETE PENDING 9:38:01.7136221 AM unlink-open.exe 10208 SetRenameInformationFile C:\src\test SUCCESS ReplaceIfExists: True, FileName: C:\src\test.tmp 9:38:01.8384110 AM unlink-open.exe 10208 CloseFile C:\src SUCCESS 9:38:01.8388203 AM unlink-open.exe 10208 CloseFile C:\src\test.tmp SUCCESS ### then file "test.tmp" deleted as usual: 9:38:01.8394278 AM unlink-open.exe 10208 CreateFile C:\src\test.tmp SUCCESS Desired Access: Read Attributes, Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened 9:38:01.8396534 AM unlink-open.exe 10208 QueryAttributeTagFile C:\src\test.tmp SUCCESS Attributes: A, ReparseTag: 0x0 9:38:01.8396885 AM unlink-open.exe 10208 SetDispositionInformationFile C:\src\test.tmp SUCCESS Delete: True 9:38:01.8397312 AM unlink-open.exe 10208 CloseFile C:\src\test.tmp SUCCESS 9:38:01.9162566 AM unlink-open.exe 10208 CreateFile C:\src\test SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created 9:38:01.9167628 AM unlink-open.exe 10208 CloseFile C:\src\test SUCCESS So the same test run with MoveFileEx(): unlink-open test 150 1000 rename unlink-open test2 150 1000 rename successfully passes for me in the same environment (Windows VM slowed down to 50%). That is, my idea was to try removing a file through renaming it as a fast path (thus avoiding that troublesome state DELETE PENDING), and if that fails, to perform removal as before. May be the whole function might be simplified, but I'm not sure about special cases yet. > * IIUC, the important points is the latter part, which waits until the status is > changed. Based on that, can we remove a double rmtree() from cleanup_output_dirs()? > They seems to be add for the similar motivation. I couldn't yet reproduce a failure, which motivated that doubling (IIUC, it was observed in [1]), with c28911750 reverted, so I need more time to research that issue to answer this question. [1] https://www.postgresql.org/message-id/20230131172806.GM22427%40telsasoft.com Best regards, Alexander
Вложения
В списке pgsql-hackers по дате отправления: