Discussion: pg_test_fsync: "Invalid argument" in the middle of a test
Hi list,

I'm in the middle of setting up a new machine and there's something odd in the pg_test_fsync output. Does anyone have ideas why the open_sync tests would fail in the middle?

        4 * 4kB open_sync writes         89.322 ops/sec   11195 usecs/op
        8 * 2kB open_sync writes       write failed: Invalid argument

It happens every time I run it. strace reveals that the first 2kB write fails:

open("./pg_test_fsync.out", O_RDWR|O_SYNC|O_DIRECT) = 5
alarm(5)                                = 0
write(5, "[...]", 2048)                 = -1 EINVAL (Invalid argument)

This is on Ubuntu 13.10 (kernel 3.11) with XFS (mounted with noatime, no other customizations). I'm using the LSI SAS 2008 RAID controller branded as Fujitsu D2607 (latest firmware) with the megaraid_sas driver. There are no warnings or anything else in dmesg or other logs. This does not occur on other Ubuntu 13.10 installations, which have different storage stacks.

The timings are too fast as well, since it's backed by four 15k drives in RAID10, with no battery and no cache. The write failure does not occur on ext4 in the same setup, but the timings are still too fast.

Regards,
Marti

----

Full pg_test_fsync output:

5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                  1575.148 ops/sec     635 usecs/op
        fdatasync                      1460.741 ops/sec     685 usecs/op
        fsync                          1362.300 ops/sec     734 usecs/op
        fsync_writethrough                  n/a
        open_sync                      1528.402 ops/sec     654 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                   106.022 ops/sec    9432 usecs/op
        fdatasync                      1300.160 ops/sec     769 usecs/op
        fsync                          1353.178 ops/sec     739 usecs/op
        fsync_writethrough                  n/a
        open_sync                       108.378 ops/sec    9227 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write open_sync sizes.)
        1 * 16kB open_sync write       1405.532 ops/sec     711 usecs/op
        2 * 8kB open_sync writes        108.439 ops/sec    9222 usecs/op
        4 * 4kB open_sync writes         89.322 ops/sec   11195 usecs/op
        8 * 2kB open_sync writes       write failed: Invalid argument
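The "timings are too fast" remark rests on rotational latency, and the arithmetic can be sketched as below. This is a rough model under stated assumptions: each synced write to the same spot on disk waits about one full platter rotation, and no cache acknowledges the write early. A 15k RPM drive turns 15000 / 60 = 250 times per second, so roughly 250 synced writes per second is the ceiling; RAID10 mirroring should not raise that for a single stream of synced writes, since each one still has to land on a mirrored pair. The helper names are hypothetical and not part of pg_test_fsync.

```c
#include <assert.h>
#include <stdbool.h>

/* Rough ceiling on synced writes per second for one spinning disk with no
 * write cache, ASSUMING each synced write waits about one full platter
 * rotation. Hypothetical helper, not part of pg_test_fsync. */
static double max_synced_writes_per_sec(double rpm)
{
    assert(rpm > 0.0);
    return rpm / 60.0; /* rotations per minute -> rotations per second */
}

/* True if a measured rate beats that ceiling, which hints that some cache
 * (controller or drive) is acknowledging writes before they reach the
 * platters. */
static bool suspiciously_fast(double measured_ops_per_sec, double rpm)
{
    return measured_ops_per_sec > max_synced_writes_per_sec(rpm);
}
```

With rpm = 15000 the ceiling is 250 ops/sec: the 1460.741 ops/sec fdatasync figure in the output above is far beyond it, while the 108.378 ops/sec two-write open_sync figure is below it.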
On Tue, Feb 11, 2014 at 12:20 AM, Marti Raudsepp <marti@juffo.org> wrote:
> This is on Ubuntu 13.10 (kernel 3.11) with XFS (mounted with noatime,
> no other customizations).

I managed to track this down; XFS doesn't allow using O_DIRECT for writes smaller than the filesystem's sector size (probably same on other FSes). The XFS filesystem created by the Ubuntu installer uses 4kB sectors, for some weird reason:

# xfs_info /dev/sda1
meta-data=/dev/disk/by-uuid/987c0579-bd67-4f80-bbc6-50f975ee4c1d isize=256    agcount=16, agsize=4341104 blks
         =                       sectsz=4096  attr=2
[...]

Yet the storage stack knows they're 512-byte sectors:

# cat /sys/block/sda/queue/logical_block_size
512
# cat /sys/block/sda/queue/physical_block_size
512

A freshly created filesystem also correctly uses 512B sectors:

# mkfs.xfs /dev/sda5
meta-data=/dev/sda5              isize=256    agcount=4, agsize=489856 blks
         =                       sectsz=512   attr=2, projid32bit=0
[...]

I will be submitting a patch for pg_test_fsync so it can survive write failures in this situation.

----

I could still use some help with this part... Does anyone have experience setting up megaraid_sas for reliable fsyncs?

> open_datasync                  1575.148 ops/sec     635 usecs/op
> fdatasync                      1460.741 ops/sec     685 usecs/op
> fsync                          1362.300 ops/sec     734 usecs/op
> fsync_writethrough                  n/a
> open_sync                      1528.402 ops/sec     654 usecs/op

Regards,
Marti
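The finding above reduces to a rule that can be checked before issuing an O_DIRECT write. A minimal sketch, assuming the rule is that both the file offset and the write length must be whole multiples of the filesystem's sector size (sectsz=4096 on the installer-created XFS volume). Note that the sector size has to come from the filesystem (xfs_info), not from /sys/block/sda/queue/logical_block_size, because on this machine the two disagree (4096 vs 512). Linux O_DIRECT also generally requires the user buffer itself to be suitably aligned, which this check does not cover. `dio_write_aligned` and `round_up_to_sector` are hypothetical helpers, not pg_test_fsync functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical check: would a write of len bytes at file offset off meet the
 * ASSUMED O_DIRECT rule that offset and length are both multiples of the
 * filesystem sector size? Buffer memory alignment is not checked here. */
static bool dio_write_aligned(off_t off, size_t len, size_t sector_size)
{
    assert(sector_size > 0);
    return off >= 0
        && off % (off_t) sector_size == 0
        && len % sector_size == 0;
}

/* Hypothetical helper: round a requested write size up to whole sectors. */
static size_t round_up_to_sector(size_t len, size_t sector_size)
{
    assert(sector_size > 0);
    return (len + sector_size - 1) / sector_size * sector_size;
}
```

Under this model, the 2048-byte writes of the "8 * 2kB" test fail the check with a 4096-byte sector size but pass with 512, matching the EINVAL seen on the installer-created volume and its absence on a freshly created one; the "4 * 4kB" writes pass either way.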
On Tue, Feb 11, 2014 at 01:28:01AM +0200, Marti Raudsepp wrote:
> On Tue, Feb 11, 2014 at 12:20 AM, Marti Raudsepp <marti@juffo.org> wrote:
> > This is on Ubuntu 13.10 (kernel 3.11) with XFS (mounted with noatime,
> > no other customizations).
>
> I managed to track this down; XFS doesn't allow using O_DIRECT for
> writes smaller than the filesystem's sector size (probably same on
> other FSes). The XFS filesystem created by the Ubuntu installer uses
> 4kB sectors, for some weird reason:

I have added the attached, applied C comment about Direct I/O write failures and mismatched block sizes.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
Bruce Momjian wrote:
> On Tue, Feb 11, 2014 at 01:28:01AM +0200, Marti Raudsepp wrote:
> > On Tue, Feb 11, 2014 at 12:20 AM, Marti Raudsepp <marti@juffo.org> wrote:
> > > This is on Ubuntu 13.10 (kernel 3.11) with XFS (mounted with noatime,
> > > no other customizations).
> >
> > I managed to track this down; XFS doesn't allow using O_DIRECT for
> > writes smaller than the filesystem's sector size (probably same on
> > other FSes). The XFS filesystem created by the Ubuntu installer uses
> > 4kB sectors, for some weird reason:
>
> I have added the attached, applied C comment about Direct I/O write
> failures and mismatched block sizes.

Would it be more useful to report the test as failed and continue with other tests?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Feb 12, 2014 at 10:46 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Would it be more useful to report the test as failed and continue with
> other tests?

Yeah, I think so; I'm planning to code this during the week.

It's harder than it sounds because the alarm() timer is still ticking when a write fails. On POSIX it can be cancelled with alarm(0), but the Windows code spawns a separate thread for timing. It seems that TerminateThread [1] could be used on Windows. It has many caveats, but should be safe for our purposes. Or we could implement error handling only on POSIX and call exit(1) on Windows.

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/ms686717%28v=vs.85%29.aspx

Regards,
Marti
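Alvaro's suggestion combined with the alarm(0) approach can be sketched for the POSIX case only. Assumptions: each timed test is a loop that runs until a SIGALRM handler sets a flag, and each operation returns nonzero with errno set on failure. All names here (`alarm_triggered`, `process_alarm`, `run_timed_test`, `failing_op`, `noop_op`) are illustrative and not copied from pg_test_fsync's source; the Windows timer thread and TerminateThread are not modeled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Set asynchronously by the SIGALRM handler to end the current test. */
static volatile sig_atomic_t alarm_triggered = 0;

static void process_alarm(int sig)
{
    (void) sig;
    alarm_triggered = 1;
}

static void install_alarm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = process_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);
}

/* One operation of a test; returns nonzero and sets errno on failure. */
typedef int (*test_op)(void);

/* Runs op until the alarm fires. On failure, cancels the pending alarm,
 * reports the error and returns -1 so the caller can go on to the next test
 * instead of exiting. */
static long run_timed_test(const char *label, test_op op, unsigned secs)
{
    long ops = 0;

    alarm_triggered = 0;
    alarm(secs);
    while (!alarm_triggered)
    {
        if (op() != 0)
        {
            int save_errno = errno;   /* keep errno before other calls */

            alarm(0);                 /* cancel the still-ticking timer */
            printf("%-30s write failed: %s\n", label, strerror(save_errno));
            return -1;
        }
        ops++;
    }
    printf("%-30s %.3f ops/sec\n", label, (double) ops / secs);
    return ops;
}

/* Stand-ins for real writes: one always fails with EINVAL, one succeeds. */
static int failing_op(void)
{
    errno = EINVAL;
    return -1;
}

static int noop_op(void)
{
    return 0;
}
```

Because alarm(0) is called on the failure path, the failed test leaves no pending alarm behind (a later alarm(0) returns 0), so no stray SIGALRM from it arrives while the program moves on to the next test.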