Discussion: Bottlenecks with large number of relation segment files

Bottlenecks with large number of relation segment files

From: Amit Langote

Hello,

I am looking at the effect of having a large number of relation files under
$PGDATA/base/ (for example, in cases where I choose a lower segment size
using --with-segsize). Consider a case where I am working with a large
database with large relations, for example a database similar in size
to what "pgbench -i -s 3500" would create.

May the routines in fd.c become a bottleneck with a large number of
concurrent connections to the above database, say something like "pgbench
-j 8 -c 128"? Is there any other place I should be paying attention
to?

--
Amit Langote


Re: Bottlenecks with large number of relation segment files

From: KONDO Mitsumasa

Hi Amit,

(2013/08/05 15:23), Amit Langote wrote:
> May the routines in fd.c become a bottleneck with a large number of
> concurrent connections to the above database, say something like "pgbench
> -j 8 -c 128"? Is there any other place I should be paying attention
> to?
What kind of file system did you use?

When we open a file, the ext3 or ext4 file system seems to search the
directory sequentially for the file's inode.
Also, PostgreSQL limits each process to 1000 FDs, which seems too small.
Please change "max_files_per_process = 1000;" in src/backend/storage/file/fd.c;
if we rewrite it, we can change the FD limit per process. I have already created
a fix patch for this problem in postgresql.conf, and will submit it to the next CF.
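
As a side note, whatever max_files_per_process is set to, the kernel's own
per-process descriptor limit still applies; a minimal sketch of checking it
(the same value "ulimit -n" reports):

/* Minimal sketch: query the kernel's per-process open-file limit,
 * which caps how many descriptors a backend can actually hold. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
    {
        perror("getrlimit");
        return 1;
    }
    printf("per-process fd limit: soft %llu, hard %llu\n",
           (unsigned long long) rl.rlim_cur,
           (unsigned long long) rl.rlim_max);
    return 0;
}
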
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center



Re: Bottlenecks with large number of relation segment files

From: Amit Langote

On Mon, Aug 5, 2013 at 5:01 PM, KONDO Mitsumasa
<kondo.mitsumasa@lab.ntt.co.jp> wrote:
> Hi Amit,
>
>
> (2013/08/05 15:23), Amit Langote wrote:
>>
>> May the routines in fd.c become a bottleneck with a large number of
>> concurrent connections to the above database, say something like "pgbench
>> -j 8 -c 128"? Is there any other place I should be paying attention
>> to?
>
> What kind of file system did you use?
>
> When we open a file, the ext3 or ext4 file system seems to search the
> directory sequentially for the file's inode.
> Also, PostgreSQL limits each process to 1000 FDs, which seems too small.
> Please change "max_files_per_process = 1000;" in
> src/backend/storage/file/fd.c; if we rewrite it, we can change the FD limit
> per process. I have already created a fix patch for this problem in
> postgresql.conf, and will submit it to the next CF.

Thank you for replying, Kondo-san.
The file system is ext4.
So, within the limits of max_files_per_process, the routines of file.c
should not become a bottleneck?


--
Amit Langote


Re: Bottlenecks with large number of relation segment files

From: John R Pierce

On 8/5/2013 1:01 AM, KONDO Mitsumasa wrote:
> When we open a file, the ext3 or ext4 file system seems to search the
> directory sequentially for the file's inode.

no, ext3/4 uses H-tree structures to search directories over 1 block
long quite efficiently.


--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: Bottlenecks with large number of relation segment files

From: KONDO Mitsumasa

(2013/08/05 17:14), Amit Langote wrote:
> So, within the limits of max_files_per_process, the routines of file.c
> should not become a bottleneck?
It may not become a bottleneck.
One FD consumes 160 bytes on a 64-bit system; see the Linux manual page for "epoll".

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center




Re: Bottlenecks with large number of relation segment files

From: Andres Freund

On 2013-08-05 18:40:10 +0900, KONDO Mitsumasa wrote:
> (2013/08/05 17:14), Amit Langote wrote:
> >So, within the limits of max_files_per_process, the routines of file.c
> >should not become a bottleneck?
> It may not become a bottleneck.
> One FD consumes 160 bytes on a 64-bit system; see the Linux manual page for "epoll".

That limit is about max_user_watches, not the general cost of an
fd. AFAIR they take up a good deal more than that. Also, there are global
limits to the number of filehandles that can be simultaneously opened on a
system.
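
A minimal sketch of checking those global numbers on Linux, assuming the
usual three-field format of /proc/sys/fs/file-nr (allocated handles, free
handles, and the fs.file-max ceiling):

/* Minimal sketch: read the system-wide file handle counters on Linux. */
#include <stdio.h>

int main(void)
{
    unsigned long allocated, unused, max;
    FILE       *f = fopen("/proc/sys/fs/file-nr", "r");

    if (f == NULL)
    {
        perror("fopen");
        return 1;
    }
    if (fscanf(f, "%lu %lu %lu", &allocated, &unused, &max) != 3)
    {
        fclose(f);
        fprintf(stderr, "unexpected /proc/sys/fs/file-nr format\n");
        return 1;
    }
    fclose(f);
    printf("file handles: allocated %lu, free %lu, system max %lu\n",
           allocated, unused, max);
    return 0;
}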


Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Bottlenecks with large number of relation segment files

From: Florian Weimer

On 08/05/2013 10:42 AM, John R Pierce wrote:
> On 8/5/2013 1:01 AM, KONDO Mitsumasa wrote:
>> When we open a file, the ext3 or ext4 file system seems to search the
>> directory sequentially for the file's inode.
>
> no, ext3/4 uses H-tree structures to search directories over 1 block
> long quite efficiently.

And the Linux dentry cache is rather aggressive, so most of the time,
only the in-memory hash table will be consulted.  (The dentry cache only
gets flushed on severe memory pressure.)
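
For the curious, a minimal sketch of watching that cache, assuming the
/proc/sys/fs/dentry-state layout whose first two fields are the total and
unused dentry counts:

/* Minimal sketch: report the size of the Linux dentry cache. */
#include <stdio.h>

int main(void)
{
    long        nr_dentry, nr_unused;
    FILE       *f = fopen("/proc/sys/fs/dentry-state", "r");

    if (f == NULL)
    {
        perror("fopen");
        return 1;
    }
    if (fscanf(f, "%ld %ld", &nr_dentry, &nr_unused) != 2)
    {
        fclose(f);
        fprintf(stderr, "unexpected /proc/sys/fs/dentry-state format\n");
        return 1;
    }
    fclose(f);
    printf("dentry cache: %ld entries, %ld unused\n", nr_dentry, nr_unused);
    return 0;
}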

--
Florian Weimer / Red Hat Product Security Team


Re: Bottlenecks with large number of relation segment files

From: Tom Lane

Andres Freund <andres@2ndquadrant.com> writes:
> ... Also, there are global
> limits to the number of filehandles that can be simultaneously opened on a
> system.

Yeah.  Raising max_files_per_process puts you at serious risk that
everything else on the box will start falling over for lack of available
FD slots.  (PG itself tends to cope pretty well, since fd.c knows it can
drop some other open file when it gets EMFILE.)  We more often have to
tell people to lower that limit than to raise it.
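
To illustrate that coping strategy, here is a minimal, hypothetical sketch
of the retry-on-EMFILE pattern; it is not the actual fd.c code, and the
little fd cache below is only a stand-in for fd.c's pool of virtual file
descriptors:

/* Hypothetical sketch (not fd.c): when open() fails with EMFILE/ENFILE,
 * close one of our own cached descriptors and retry instead of failing. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define CACHE_SIZE 16

static int  fd_cache[CACHE_SIZE];   /* stand-in for a pool of cached open fds */
static int  fd_cache_used = 0;

/* Close one cached fd (here simply the last one added); return 0 if none. */
static int
release_one_cached_fd(void)
{
    if (fd_cache_used == 0)
        return 0;
    close(fd_cache[--fd_cache_used]);
    return 1;
}

/* Open a file, freeing one of our own fds and retrying on EMFILE/ENFILE. */
static int
open_with_retry(const char *path, int flags)
{
    for (;;)
    {
        int fd = open(path, flags);

        if (fd >= 0)
            return fd;
        if ((errno != EMFILE && errno != ENFILE) || !release_one_cached_fd())
            return -1;              /* give up; caller inspects errno */
    }
}

int
main(void)
{
    int fd = open_with_retry("/etc/hostname", O_RDONLY);

    if (fd >= 0)
        close(fd);
    else
        perror("open_with_retry");
    return 0;
}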

            regards, tom lane


Re: Bottlenecks with large number of relation segment files

From: KONDO Mitsumasa

(2013/08/05 19:28), Andres Freund wrote:
> On 2013-08-05 18:40:10 +0900, KONDO Mitsumasa wrote:
>> (2013/08/05 17:14), Amit Langote wrote:
>>> So, within the limits of max_files_per_process, the routines of file.c
>>> should not become a bottleneck?
>> It may not become a bottleneck.
>> One FD consumes 160 bytes on a 64-bit system; see the Linux manual page for "epoll".
>
> That limit is about max_user_watches, not the general cost of an
> fd. AFAIR they take up a good deal more than that.
Oh! That was my mistake... I re-read about FDs in the Linux manual page for "proc".
It seems that the FDs a process holds can be seen in /proc/[pid]/fd/.
Each entry appears as a symbolic link and seems to consume 64 bytes of memory per FD.
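
A minimal sketch of counting a process's open descriptors that way, by
listing the entries under /proc/self/fd:

/* Minimal sketch: count the current process's open file descriptors by
 * listing the symbolic links under /proc/self/fd. */
#include <dirent.h>
#include <stdio.h>

int main(void)
{
    DIR        *dir = opendir("/proc/self/fd");
    struct dirent *ent;
    int         count = 0;

    if (dir == NULL)
    {
        perror("opendir");
        return 1;
    }
    while ((ent = readdir(dir)) != NULL)
    {
        if (ent->d_name[0] != '.')  /* skip "." and ".." */
            count++;
    }
    closedir(dir);

    /* The count includes the fd that opendir() itself is using. */
    printf("open fds: %d\n", count);
    return 0;
}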

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

Re: [HACKERS] Bottlenecks with large number of relation segment files

From: KONDO Mitsumasa

(2013/08/05 21:23), Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> ... Also, there are global
>> limits to the number of filehandles that can be simultaneously opened on a
>> system.
>
> Yeah.  Raising max_files_per_process puts you at serious risk that
> everything else on the box will start falling over for lack of available
> FD slots.
Is it really? When I use Hadoop-like NoSQL storage, I set a large number of FDs.
Actually, the Hadoop Wiki says the following:

http://wiki.apache.org/hadoop/TooManyOpenFiles
> Too Many Open Files
>
> You can see this on Linux machines in client-side applications, server code or even in test runs.
> It is caused by per-process limits on the number of files that a single user/process can have open, which was
> introduced in the 2.6.27 kernel. The default value, 128, was chosen because "that should be enough".
>
> In Hadoop, it isn't.
~
> ulimit -n 8192

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


Re: Bottlenecks with large number of relation segment files

From: KONDO Mitsumasa

(2013/08/05 20:38), Florian Weimer wrote:
> On 08/05/2013 10:42 AM, John R Pierce wrote:
>> On 8/5/2013 1:01 AM, KONDO Mitsumasa wrote:
>>> When we open a file, the ext3 or ext4 file system seems to search the
>>> directory sequentially for the file's inode.
>>
>> no, ext3/4 uses H-tree structures to search directories over 1 block
>> long quite efficiently.
>
> And the Linux dentry cache is rather aggressive, so most of the time, only the
> in-memory hash table will be consulted.  (The dentry cache only gets flushed on
> severe memory pressure.)
Really? When I put a large number of files in the same directory and open
one, it is very, very slow. But opening the directory itself is not.
So I think the cost is only in the directory search, not in searching for a
file within the same directory. I had heard about this before, too.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center






Re: [HACKERS] Bottlenecks with large number of relation segment files

From: Andres Freund

On 2013-08-06 19:19:41 +0900, KONDO Mitsumasa wrote:
> (2013/08/05 21:23), Tom Lane wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> >> ... Also, there are global
> >> limits to the number of filehandles that can be simultaneously opened on a
> >> system.
> >
> > Yeah.  Raising max_files_per_process puts you at serious risk that
> > everything else on the box will start falling over for lack of available
> > FD slots.
> Is it really? When I use Hadoop-like NoSQL storage, I set a large number of FDs.
> Actually, the Hadoop Wiki says the following:
>
> http://wiki.apache.org/hadoop/TooManyOpenFiles
> > Too Many Open Files
> >
> > You can see this on Linux machines in client-side applications, server code or even in test runs.
> > It is caused by per-process limits on the number of files that a single user/process can have open, which was
> > introduced in the 2.6.27 kernel. The default value, 128, was chosen because "that should be enough".

The first paragraph (which you're quoting with 128) is talking about
epoll, which we don't use. The second paragraph indeed talks about the
max number of fds. Of *one* process.
Postgres uses a *process*-based model. So, max_files_per_process is about
the number of fds in a single backend. You need to multiply it by
max_connections + a bunch to get to the overall number of FDs.
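
For example, with the default max_files_per_process = 1000 and the
"pgbench -c 128" scenario from earlier in this thread, the 128 backends
alone could in principle hold up to 128 * 1000 = 128,000 descriptors,
before counting the postmaster and auxiliary processes.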

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Bottlenecks with large number of relation segment files

From: KONDO Mitsumasa

(2013/08/06 19:33), Andres Freund wrote:
> On 2013-08-06 19:19:41 +0900, KONDO Mitsumasa wrote:
>> (2013/08/05 21:23), Tom Lane wrote:
>>> Andres Freund <andres@2ndquadrant.com> writes:
>>>> ... Also, there are global
>>>> limits to the number of filehandles that can be simultaneously opened on a
>>>> system.
>>>
>>> Yeah.  Raising max_files_per_process puts you at serious risk that
>>> everything else on the box will start falling over for lack of available
>>> FD slots.
>> Is it really? When I use Hadoop-like NoSQL storage, I set a large number of FDs.
>> Actually, the Hadoop Wiki says the following:
>>
>> http://wiki.apache.org/hadoop/TooManyOpenFiles
>>> Too Many Open Files
>>>
>>> You can see this on Linux machines in client-side applications, server code or even in test runs.
>>> It is caused by per-process limits on the number of files that a single user/process can have open, which was
>>> introduced in the 2.6.27 kernel. The default value, 128, was chosen because "that should be enough".
>
> The first paragraph (which you're quoting with 128) is talking about
> epoll, which we don't use. The second paragraph indeed talks about the
> max number of fds. Of *one* process.
Yes, I already understood it that way.

> Postgres uses a *process*-based model. So, max_files_per_process is about
> the number of fds in a single backend. You need to multiply it by
> max_connections + a bunch to get to the overall number of FDs.
Yes, that too. I think max_files_per_process seems too small. For NoSQL systems,
a large number of FDs is recommended. However, I do not know whether the default
is really enough in PostgreSQL. If we use PostgreSQL with big data, we might need
to change max_files_per_process, I think.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


Re: Bottlenecks with large number of relation segment files

From: Florian Weimer

On 08/06/2013 12:28 PM, KONDO Mitsumasa wrote:
> (2013/08/05 20:38), Florian Weimer wrote:
>> On 08/05/2013 10:42 AM, John R Pierce wrote:
>>> On 8/5/2013 1:01 AM, KONDO Mitsumasa wrote:
>>>> When we open a file, the ext3 or ext4 file system seems to search the
>>>> directory sequentially for the file's inode.
>>>
>>> no, ext3/4 uses H-tree structures to search directories over 1 block
>>> long quite efficiently.
>>
>> And the Linux dentry cache is rather aggressive, so most of the time, only
>> the in-memory hash table will be consulted.  (The dentry cache only gets
>> flushed on severe memory pressure.)

> Really? When I put a large number of files in the same directory and open
> one, it is very, very slow. But opening the directory itself is not.

The first file name resolution is slow, but subsequent resolutions
typically happen from the dentry cache.  (The cache is not populated
when the directory is opened.)

--
Florian Weimer / Red Hat Product Security Team


Re: Bottlenecks with large number of relation segment files

From: KONDO Mitsumasa

(2013/08/06 20:19), Florian Weimer wrote:
> The first file name resolution is slow, but subsequent resolutions typically
> happen from the dentry cache.  (The cache is not populated when the directory is
> opened.)
I see. Now I understand why the ext file system is slow when we put a large
number of files in one directory.
Thank you for your advice!

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center