Thread: pg_fallocate

pg_fallocate

From
Mitsumasa KONDO
Date:
Hi,

I'd like to add the fallocate() system call to improve sequential read/write performance. fallocate() differs from posix_fallocate(), which uses a zero-fill algorithm to reserve contiguous disk space. fallocate() reserves contiguous disk space with less overhead than posix_fallocate().

It will be needed for sorted checkpoints and a faster VACUUM command in the near future.


For more detailed information, please see the Linux manual.

I'm going sightseeing in Dublin with Ishii-san now :-)

 

Regards,

--

Mitsumasa KONDO

NTT Open Source Software


Re: pg_fallocate

From
Robert Haas
Date:
On Thu, Oct 31, 2013 at 9:16 AM, Mitsumasa KONDO
<kondo.mitsumasa@gmail.com> wrote:
> I'd like to add the fallocate() system call to improve sequential read/write
> performance. fallocate() differs from posix_fallocate(), which uses a
> zero-fill algorithm to reserve contiguous disk space. fallocate() reserves
> contiguous disk space with less overhead than posix_fallocate().
>
> It will be needed for sorted checkpoints and a faster VACUUM command in
> the near future.
>
>
> For more detailed information, please see the Linux manual.
>
> I'm going sightseeing in Dublin with Ishii-san now :-)

Our last attempts to improve performance in this area died in a fire
when it turned out that code that should have been an improvement fell
down over inexplicable ext4 behavior.  I think, therefore, that
extensive benchmarking of this or any other proposed approach is
absolutely essential.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_fallocate

From
Peter Eisentraut
Date:
On 10/31/13, 9:16 AM, Mitsumasa KONDO wrote:
> I'd like to add the fallocate() system call to improve sequential read/write
> performance. fallocate() differs from posix_fallocate(), which uses a
> zero-fill algorithm to reserve contiguous disk space. fallocate() reserves
> contiguous disk space with less overhead than posix_fallocate().

Your patch seems to be missing a bit that defines HAVE_FALLOCATE,
probably something in configure.in.




Re: pg_fallocate

From
Oskari Saarenmaa
Date:
On Thu, Oct 31, 2013 at 01:16:44PM +0000, Mitsumasa KONDO wrote:
> --- a/src/backend/storage/file/fd.c
> +++ b/src/backend/storage/file/fd.c
> @@ -383,6 +383,21 @@ pg_flush_data(int fd, off_t offset, off_t amount)
>      return 0;
>  }
>  
> +/*
> + * pg_fallocate --- advise the OS to pre-allocate contiguous file segments
> + * on the physical disk.
> + *
> + * Not all platforms have fallocate. Some platforms only have posix_fallocate,
> + * but it performs zero fill to pre-allocate file segments. That performs
> + * poorly when extending new segments, so we don't use posix_fallocate.
> + */
> +int
> +pg_fallocate(File file, int flags, off_t offset, off_t nbytes)
> +{
> +#if defined(HAVE_FALLOCATE)
> +    return fallocate(VfdCache[file].fd, flags, offset, nbytes);
> +#endif
> +}

You should set errno to ENOSYS and return -1 if HAVE_FALLOCATE isn't
defined.
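
A minimal sketch of that fallback, for illustration only. It assumes a stand-alone function (the hypothetical name pg_fallocate_sketch) that takes a raw file descriptor instead of the patch's File/VfdCache lookup, and it assumes HAVE_FALLOCATE would be supplied by configure; neither is taken from the actual patch.

```c
/* HAVE_FALLOCATE is assumed to come from configure; it is not set here. */
#ifdef HAVE_FALLOCATE
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc declares fallocate() only with this */
#endif
#include <fcntl.h>
#endif
#include <errno.h>
#include <sys/types.h>

/*
 * Hypothetical stand-alone variant of the patch's pg_fallocate(): it takes
 * a plain file descriptor rather than a PostgreSQL File.
 */
int
pg_fallocate_sketch(int fd, int flags, off_t offset, off_t nbytes)
{
#if defined(HAVE_FALLOCATE)
    return fallocate(fd, flags, offset, nbytes);
#else
    /* No fallocate() on this platform: report "not implemented". */
    (void) fd;
    (void) flags;
    (void) offset;
    (void) nbytes;
    errno = ENOSYS;
    return -1;
#endif
}
```

With this shape the function returns a value on every code path, and callers can tell "not available here" (ENOSYS) apart from a real allocation failure.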

> --- a/src/backend/storage/smgr/md.c
> +++ b/src/backend/storage/smgr/md.c
> @@ -24,6 +24,7 @@
>  #include <unistd.h>
>  #include <fcntl.h>
>  #include <sys/file.h>
> +#include <linux/falloc.h>

This would have to be wrapped in #ifdef HAVE_FALLOCATE or
HAVE_LINUX_FALLOC_H; if you want to create a wrapper around fallocate() you
should add PG defines for the flags, too.  Otherwise it's probably easier to
just call fallocate() directly inside an #ifdef block as you did in xlog.c.
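
One way the guarded include and a PG-level flag define could look, as a rough sketch. The macro name PG_FALLOC_FL_KEEP_SIZE is hypothetical, HAVE_LINUX_FALLOC_H is assumed to be set by configure, and the fallback value 0 is only a placeholder for platforms where the wrapper ignores its flags.

```c
/* HAVE_LINUX_FALLOC_H is assumed to be defined by configure when available. */
#ifdef HAVE_LINUX_FALLOC_H
#include <linux/falloc.h>
/* Hypothetical PG-level alias, so callers never name the Linux flag directly. */
#define PG_FALLOC_FL_KEEP_SIZE FALLOC_FL_KEEP_SIZE
#else
/* Placeholder: without the header the wrapper cannot honor flags anyway. */
#define PG_FALLOC_FL_KEEP_SIZE 0
#endif
```

Callers such as md.c would then pass PG_FALLOC_FL_KEEP_SIZE and compile unchanged on platforms that lack <linux/falloc.h>.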

> @@ -510,6 +511,10 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
>       * if bufmgr.c had to dump another buffer of the same file to make room
>       * for the new page's buffer.
>       */
> +
> +    if(forknum == 1)
> +        pg_fallocate(v->mdfd_vfd, FALLOC_FL_KEEP_SIZE, 0, RELSEG_SIZE);
> +

The return value should be checked; if it's -1 and errno is anything other
than ENOSYS or EOPNOTSUPP, the disk space allocation failed and you must
return an error.
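
The check described above could be sketched as a small predicate. The helper name prealloc_failure_is_fatal is hypothetical and not part of the patch; it only encodes the rule that ENOSYS and EOPNOTSUPP mean "preallocation unavailable" while any other errno is a real failure.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: rc is the return value of the preallocation call and
 * err is the errno captured immediately after it.
 */
bool
prealloc_failure_is_fatal(int rc, int err)
{
    if (rc == 0)
        return false;           /* allocation succeeded */
    if (err == ENOSYS || err == EOPNOTSUPP)
        return false;           /* not supported here: safe to skip */
    return true;                /* e.g. ENOSPC: must be reported */
}
```

In mdextend() the fatal case would presumably be raised through the usual error reporting path rather than silently ignored.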

/ Oskari