Discussion: Large # of Tables, Getting ready for the enterprise


Large # of Tables, Getting ready for the enterprise

From
"carl garland"
Date:
As postgres becomes better and more in the spotlight, there are a couple of 
issues I think the hacker group might want to address to better prepare 
it for the enterprise and high-end production systems.  Currently postgres 
will support an incredible number of tables, whereas Interbase only supports 
64K, but the efficiency and performance of the pg backend quickly 
degenerate after 1000 tables.  I know that most people will think the 
filesystem will be the bottleneck, but as XFS nears completion the problem 
will shift back to pg.  It is my understanding that lookups on the system 
tables are always done sequentially, not using any more optimized (btree 
etc.) solution.  I also think this may apply to toastable objects where 
large #s of objects occur.  I want to start looking at the code to maybe 
help out, but have a few questions:
1)  When referencing a table, is it looked up only once and then cached, or 
does a scan of the system table occur once per session?
2)  Which files should I look at in the tree?
3)  Any tips, suggestions, or pitfalls I should remember?

Thanx for the pointers,
Carl Garland
________________________________________________________________________
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com



Re: Large # of Tables, Getting ready for the enterprise

From
"Ross J. Reedstrom"
Date:
On Thu, Aug 17, 2000 at 06:17:01AM -0400, carl garland wrote:
> As postgres becomes better and more in the spotlight there are a couple of 
> issues I think that the hacker group might want to address to better prepare 
> it for the enterprise and highend production systems.  Currently postgres 
> will support an incredible amount of tables whereas Interbase only supports 
> 64K, but the efficiency and performance of the pg backend quickly 
> degenerates after 1000 tables.  I know that most people will think that 
> filesystem will be the bottleneck but as XFS nears completion the problem 

Realize that pg is a cross-platform program. The existence (or not) of a 
particular filesystem cannot be assumed, nor does the release of a new FS
on Linux, for example, impact the entire PostgreSQL community.

> will shift back to pg.  It is my understanding that the system tables where 
> lookups on tables occur are always done sequentially and not using any more 
> optimized (btree etc) solution.  I also think this may be applicable to the 

Nope. The pg_class table has indices on oid and relname, which are used when
appropriate. In addition, there is a cache for system tables (syscache).
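
For a rough idea of what that looks like from the C side, here is a minimal
sketch of a catalog lookup going through the syscache. The function names
(SearchSysCache1, GETSTRUCT, ReleaseSysCache) are assumptions on my part --
check the sources in your tree for the exact interface:

    /* Sketch: look up a relation's pg_class row through the syscache. */
    #include "postgres.h"
    #include "access/htup_details.h"
    #include "catalog/pg_class.h"
    #include "utils/syscache.h"

    static void
    print_relkind(Oid relid)
    {
        HeapTuple     tup;
        Form_pg_class classForm;

        /* Served from the RELOID cache; a miss falls back to an indexed
         * scan of pg_class, not a sequential one. */
        tup = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
        if (!HeapTupleIsValid(tup))
            elog(ERROR, "cache lookup failed for relation %u", relid);

        classForm = (Form_pg_class) GETSTRUCT(tup);
        elog(NOTICE, "relation %s has relkind '%c'",
             NameStr(classForm->relname), classForm->relkind);

        ReleaseSysCache(tup);
    }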

> toastable objects where large # of objects occur.  I want to start to look 
> at the code to maybe help out but have a few questions:
> 1)  When referencing a table is it only looked up once and then cached or 
> does a scan of the system table occur only once per session.

This would be the syscache, for the system tables, or the relcache, which
caches relation (i.e., table) descriptors for the general case. There's
been some discussion and work regarding cleaning up the separation between
accessing relations and storing them in particular files. At one point,
Tom Lane mentioned that he was wondering if the knowledge of the relcache
needs to be moved out of the bufmgr/smgr interfaces (that's the buffer
manager and storage manager, respectively) as part of that cleanup.
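
At the relation level, the relcache hands back a whole relation descriptor
rather than a raw catalog tuple. A minimal sketch (again, names like
RelationIdGetRelation and RelationClose are assumed from the relcache
headers -- verify against your tree):

    /* Sketch: fetch a cached relation descriptor from the relcache. */
    #include "postgres.h"
    #include "utils/rel.h"
    #include "utils/relcache.h"

    static void
    show_relation_natts(Oid relid)
    {
        /* Built on first use, then served from the cache. */
        Relation rel = RelationIdGetRelation(relid);

        if (rel == NULL)
            elog(ERROR, "could not open relation %u", relid);

        elog(NOTICE, "relation %s has %d attributes",
             RelationGetRelationName(rel),
             RelationGetNumberOfAttributes(rel));

        RelationClose(rel);   /* drop our reference; descriptor stays cached */
    }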

> 2)  Which files should I look at in tree.

Hmm, an awful lot of them. 

$ find . -name \*.[chyl] | xargs grep -l syscache | wc -l
98
$ find . -name \*.[chyl] | xargs grep -l relcache | wc -l
43

And that's not nearly exhaustive.

> 3)  Any tips, suggestions, pitfalls I should remember.

Take a look at:

http://postgresql.org/docs/faq-dev-english.html

and 

file:/where/ever/you/put/pgsql/src/tools/backend/index.html

Both are very useful for getting the broad overview, with descriptions and
links into your local file tree.

This is a very complex bit of code, right at the heart of the
database. I've found it a mighty steep learning curve. Perhaps not the
best place to start for a first backend coding project. But dig in: for
me, there's no better way to learn than breaking code and fixing it. The
core developers are good at not letting stupid ideas get committed to CVS.

One thing not mentioned above: memory management is quite complex, as well,
and underlies all the other code (of course). Take a look at the mmgr code.
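
A minimal sketch of the palloc/memory-context pattern you will see all over
the backend (the context name "demo context" is just for illustration, and
ALLOCSET_DEFAULT_SIZES is my assumption about the current constructor
arguments):

    /* Sketch: allocate in a private memory context, free it all at once. */
    #include "postgres.h"
    #include "utils/memutils.h"

    static void
    scratch_work(void)
    {
        MemoryContext scratch;
        MemoryContext old;
        char         *buf;

        scratch = AllocSetContextCreate(CurrentMemoryContext,
                                        "demo context",
                                        ALLOCSET_DEFAULT_SIZES);
        old = MemoryContextSwitchTo(scratch);

        buf = palloc(1024);            /* lives in "demo context" */
        snprintf(buf, 1024, "scratch data");

        MemoryContextSwitchTo(old);
        MemoryContextDelete(scratch);  /* frees buf and everything else in it */
    }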

In fact, if you've got any skill at code analysis and documentation,
further diagramming and describing of the interrelationships between
the different caches and managers (syscache, relcache, bufcache, bufmgr,
smgr, mmgr, etc.) would be a welcome addition to the work Bruce has done
with the developers' FAQ, above.

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005



Re: Large # of Tables, Getting ready for the enterprise

From
Tom Lane
Date:
"carl garland" <carlhgarland@hotmail.com> writes:
> Currently postgres will support an incredible amount of tables whereas
> Interbase only supports 64K, but the efficiency and performance of the
> pg backend quickly degenerates after 1000 tables.

Current sources fix some problems with large numbers of indexes
(pg_index was being sequentially scanned in several places).  Offhand
I'm not aware of any other significant real-world performance problems
in this area; can you be more specific about what's bothering you?
        regards, tom lane