Discussion: tsearch profiling - czech environment - take 55MB

tsearch profiling - czech environment - take 55MB

From

Pavel Stehule

Date:

11 March 2010, 14:29:07

Hello

There is something wrong in our implementation of NISortDictionary. After
initialisation, the ts_cache memory context is 55MB large and pg takes
190MB.

dispell_init
cspell: 1024 total in 1 blocks; 136 free (1 chunks); 888 used
After dictionary loading
cspell: 3072 total in 2 blocks; 568 free (5 chunks); 2504 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (12 chunks); 19904424 used
After AffFile loading
cspell: 816952 total in 78 blocks; 18072 free (18 chunks); 798880 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used
After stop words loading
cspell: 816952 total in 78 blocks; 13360 free (13 chunks); 803592 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used
After dictionary sort
cspell: 55706480 total in 6775 blocks; 140200 free (1728 chunks); 55566280 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used
After Affixes sort
cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (34 chunks); 19904424 used
final
cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (34 chunks); 19904424 used

Regards
Pavel Stehule

Re: tsearch profiling - czech environment - take 55MB

From

Pavel Stehule

Date:

11 March 2010, 14:34:53

2010/3/11 Pavel Stehule <pavel.stehule@gmail.com>:
> Hello
>
> There is something wrong in our implementation of NISortDictionary. After
> initialisation, the ts_cache memory context is 55MB large and pg takes
> 190MB.
>
> dispell_init
> cspell: 1024 total in 1 blocks; 136 free (1 chunks); 888 used
> After dictionary loading
> cspell: 3072 total in 2 blocks; 568 free (5 chunks); 2504 used
>  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
> free (12 chunks); 19904424 used
> After AffFile loading
> cspell: 816952 total in 78 blocks; 18072 free (18 chunks); 798880 used
>  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
> free (20 chunks); 19904424 used
> After stop words loading
> cspell: 816952 total in 78 blocks; 13360 free (13 chunks); 803592 used
>  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
> free (20 chunks); 19904424 used
> After dictionary sort
> cspell: 55706480 total in 6775 blocks; 140200 free (1728 chunks); 55566280 used
>  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
> free (20 chunks); 19904424 used
> After Affixes sort
> cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
>  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
> free (34 chunks); 19904424 used
> final
> cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
>  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
> free (34 chunks); 19904424 used
>

The mkSPNode call takes 45MB:

Conf->Dictionary = mkSPNode(Conf, 0, Conf->nspell, 0);

> Regards
> Pavel Stehule
>

Re: tsearch profiling - czech environment - take 55MB

From

Tom Lane

Date:

11 March 2010, 14:52:54

Pavel Stehule <pavel.stehule@gmail.com> writes:
> There is something wrong in our implementation of NISortDictionary. After
> initialisation, the ts_cache memory context is 55MB large and pg takes
> 190MB.

What's your tsearch configuration exactly?
        regards, tom lane

Re: tsearch profiling - czech environment - take 55MB

From

Pavel Stehule

Date:

11 March 2010, 15:03:28

2010/3/11 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> There is something wrong in our implementation of NISortDictionary. After
>> initialisation, the ts_cache memory context is 55MB large and pg takes
>> 190MB.
>
> What's your tsearch configuration exactly?
>

files: http://www.pgsql.cz/data/czech.tar.gz

configuration:

CREATE TEXT SEARCH DICTIONARY cspell  (template=ispell, dictfile = czech, afffile=czech, stopwords=czech);
CREATE TEXT SEARCH CONFIGURATION cs (copy=english);
ALTER TEXT SEARCH CONFIGURATION cs  ALTER MAPPING FOR word, asciiword WITH cspell, simple;

then try:

select * from ts_debug('cs','Příliš žluťoučký kůň se napil žluté vody');

The same memory statistics, this time with timings (taken with the clock() function):

cspell: 1024 total in 1 blocks; 136 free (1 chunks); 888 used
After dictionary loading 320000
cspell: 3072 total in 2 blocks; 568 free (5 chunks); 2504 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (12 chunks); 19904424 used
After AffFile loading 330000
cspell: 816952 total in 78 blocks; 18072 free (18 chunks); 798880 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used
After stop words loading 330000
cspell: 816952 total in 78 blocks; 13360 free (13 chunks); 803592 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used
****** 1 ******
cspell: 816952 total in 78 blocks; 9240 free (12 chunks); 807712 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used
****** 2 ****** 380000
cspell: 825144 total in 79 blocks; 8440 free (10 chunks); 816704 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used
****** 2.5 ****** 490000
// mkSPNode
cspell: 825144 total in 79 blocks; 8440 free (10 chunks); 816704 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used
****** 3 ****** 580000
cspell: 55706480 total in 6775 blocks; 140200 free (1728 chunks); 55566280 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used
After dictionary sort 580000
cspell: 55706480 total in 6775 blocks; 140200 free (1728 chunks); 55566280 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used
After Affixes sort 580000
cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (34 chunks); 19904424 used
final 580000
cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (34 chunks); 19904424 used
executor start



>                        regards, tom lane
>
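A note on reading the timing column in the message above: assuming the numbers after each label are raw clock() return values, they are CPU-time ticks and have to be divided by CLOCKS_PER_SEC to get seconds. A minimal sketch of that conversion follows; the helper names ticks_to_seconds and phase_seconds are hypothetical and are not part of the instrumentation used in the thread.

```c
#include <assert.h>
#include <time.h>

/* Convert a clock() reading, or a difference of two readings, to seconds. */
double
ticks_to_seconds(clock_t ticks)
{
    return (double) ticks / CLOCKS_PER_SEC;
}

/* CPU time spent between two clock() readings, in seconds. */
double
phase_seconds(clock_t before, clock_t after)
{
    assert(after >= before);    /* readings are expected in call order */
    return ticks_to_seconds(after - before);
}
```

On systems where CLOCKS_PER_SEC is 1000000 (as POSIX XSI requires), a reading of 580000 would correspond to 0.58 s of CPU time, and the step from 490000 to 580000 around mkSPNode to about 0.09 s.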

Re: tsearch profiling - czech environment - take 55MB

From

Pavel Stehule

Date:

11 March 2010, 16:09:08

2010/3/11 Pavel Stehule <pavel.stehule@gmail.com>:
> 2010/3/11 Tom Lane <tgl@sss.pgh.pa.us>:
>> Pavel Stehule <pavel.stehule@gmail.com> writes:
>>> There is something wrong in our implementation of NISortDictionary. After
>>> initialisation, the ts_cache memory context is 55MB large and pg takes
>>> 190MB.
>>
>> What's your tsearch configuration exactly?
>>

I am running 64-bit Linux.

The problem is the very large number of small allocations: there are 853215 nodes.

Memory use can be reduced by allocating in blocks:

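static char *data = NULL;       /* next free byte in the current block */
static size_t allocated = 0;    /* bytes still free in the current block */

static void
binit(void)
{
    data = NULL;
    allocated = 0;
}

/*
 * Carve zeroed chunks out of 100kB palloc'd blocks, so that each node does
 * not pay the per-chunk palloc overhead.  Chunks cannot be pfree'd singly.
 */
static char *
balloc(size_t size)
{
    char *result;

    size = MAXALIGN(size);      /* keep every chunk suitably aligned */

    if (data == NULL || size > allocated)
    {
        /* assumes size <= 1024 * 100; the tail of the old block is abandoned */
        data = palloc(1024 * 100);
        allocated = 1024 * 100;
    }

    result = data;
    data += size;
    allocated -= size;
    memset(result, 0, size);

    return result;
}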

I replaced palloc0 inside mkSPNode with balloc:

cspell: 25626352 total in 349 blocks; 11048 free (2 chunks); 25615304 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (34 chunks); 19904424 used

versus

cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (34 chunks); 19904424 used

Regards
Pavel
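As a rough check of the two cspell figures in the message above, dividing the drop in "used" bytes by the node count gives the average saving per node. This sketch assumes the whole difference is attributable to the 853215 nodes built by mkSPNode, which ignores the other allocations in the cspell context. The function name is hypothetical.

```c
#include <assert.h>

/*
 * Average number of bytes saved per node when the context's "used" figure
 * drops from used_before to used_after (integer division, rounded down).
 */
unsigned long
avg_bytes_saved_per_node(unsigned long used_before,
                         unsigned long used_after,
                         unsigned long nodes)
{
    assert(nodes > 0);
    assert(used_before >= used_after);
    return (used_before - used_after) / nodes;
}
```

With the numbers reported here, avg_bytes_saved_per_node(55723528, 25615304, 853215) comes to 35 bytes per node. That is the order of magnitude one would expect from a per-chunk header plus size rounding on a 64-bit build.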

Re: tsearch profiling - czech environment - take 55MB

From

Tom Lane

Date:

11 March 2010, 16:18:56

Pavel Stehule <pavel.stehule@gmail.com> writes:
> The problem is the very large number of small allocations: there are 853215 nodes.
> I replaced palloc0 inside mkSPNode with balloc

This goes back to the idea we've discussed from time to time of having a
variant memory context type in which pfree() is a no-op and we dispense
with all the per-chunk overhead.  I guess that if there really isn't any
overhead there then pfree/repalloc would actually crash :-( but for the
particular case of dictionaries that would probably be OK because
there's so little code that touches them.
        regards, tom lane
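The idea in the message above can be sketched outside PostgreSQL as a small arena: chunks carry no header, freeing a single chunk does nothing, and the memory is released only all at once. This is a minimal illustration built on malloc/free, not the MemoryContext API, and every name in it (Arena, arena_alloc, arena_free, arena_destroy) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ARENA_BLOCK_SIZE (100 * 1024)   /* same block size as balloc uses */
#define ARENA_ALIGN _Alignof(max_align_t)

/* Block header; the union forces the payload after it to be max-aligned. */
typedef union ArenaBlock
{
    struct
    {
        union ArenaBlock *next;     /* previously filled block */
        size_t used;                /* payload bytes handed out so far */
    } hdr;
    max_align_t align;
} ArenaBlock;

typedef struct Arena
{
    ArenaBlock *blocks;             /* newest block first */
    size_t nblocks;
} Arena;

/* Hand out a zeroed chunk with no per-chunk header. Returns NULL on OOM. */
void *
arena_alloc(Arena *arena, size_t size)
{
    ArenaBlock *block = arena->blocks;
    char *result;

    size = (size + ARENA_ALIGN - 1) & ~(ARENA_ALIGN - 1);
    assert(size <= ARENA_BLOCK_SIZE);   /* sketch: no oversized requests */

    if (block == NULL || block->hdr.used + size > ARENA_BLOCK_SIZE)
    {
        block = malloc(sizeof(ArenaBlock) + ARENA_BLOCK_SIZE);
        if (block == NULL)
            return NULL;
        block->hdr.next = arena->blocks;
        block->hdr.used = 0;
        arena->blocks = block;
        arena->nblocks++;
    }

    result = (char *) (block + 1) + block->hdr.used;
    block->hdr.used += size;
    memset(result, 0, size);
    return result;
}

/* Freeing one chunk is deliberately a no-op: there is no header to consult. */
void
arena_free(Arena *arena, void *ptr)
{
    (void) arena;
    (void) ptr;
}

/* Release everything at once, as deleting a whole context would. */
void
arena_destroy(Arena *arena)
{
    while (arena->blocks != NULL)
    {
        ArenaBlock *next = arena->blocks->hdr.next;

        free(arena->blocks);
        arena->blocks = next;
    }
    arena->nblocks = 0;
}
```

The trade-off is the one described above: without a chunk header there is nothing from which a generic free or realloc could recover the chunk size, so the design only suits data that is built once and dropped as a whole, such as a loaded dictionary.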

Re: tsearch profiling - czech environment - take 55MB

From

Pavel Stehule

Date:

11 March 2010, 16:34:37

2010/3/11 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> The problem is the very large number of small allocations: there are 853215 nodes.
>> I replaced palloc0 inside mkSPNode with balloc
>
> This goes back to the idea we've discussed from time to time of having a
> variant memory context type in which pfree() is a no-op and we dispense
> with all the per-chunk overhead.  I guess that if there really isn't any
> overhead there then pfree/repalloc would actually crash :-( but for the
> particular case of dictionaries that would probably be OK because
> there's so little code that touches them.

That makes sense. I was surprised how much memory is needed :(. Smarter
allocation saves 50% - 2.5GB for 100 users, which is significant - but
I think this data really should be shared. I had hoped preloading would
help, but it is problematic: no data is available at shared preload
time, and the allocated size is too big.

Pavel

>
>                        regards, tom lane
>

Re: tsearch profiling - czech environment - take 55MB

From

Alvaro Herrera

Date:

11 March 2010, 18:30:14

Pavel Stehule wrote:
> 2010/3/11 Tom Lane <tgl@sss.pgh.pa.us>:
> > Pavel Stehule <pavel.stehule@gmail.com> writes:
> >> The problem is the very large number of small allocations: there are 853215 nodes.
> >> I replaced palloc0 inside mkSPNode with balloc
> >
> > This goes back to the idea we've discussed from time to time of having a
> > variant memory context type in which pfree() is a no-op and we dispense
> > with all the per-chunk overhead.  I guess that if there really isn't any
> > overhead there then pfree/repalloc would actually crash :-( but for the
> > particular case of dictionaries that would probably be OK because
> > there's so little code that touches them.
> 
> That makes sense. I was surprised how much memory is needed :(. Smarter
> allocation saves 50% - 2.5GB for 100 users, which is significant - but
> I think this data really should be shared. I had hoped preloading would
> help, but it is problematic: no data is available at shared preload
> time, and the allocated size is too big.

Could it be mmapped and shared that way?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
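One way to picture the mmap suggestion above, purely as a sketch: nothing in this thread says PostgreSQL works this way. It assumes a hypothetical precompiled dictionary image file, mapped read-only by each process so the kernel can back all mappings with the same pages. Because each process may map the file at a different address, nodes in such an image would have to refer to each other by byte offset, not by pointer. The names map_dictionary_image, ImageNode and image_child are illustrative only.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical on-disk node: children are located by offset, not pointer. */
typedef struct ImageNode
{
    unsigned int child_offset;  /* byte offset from image start; 0 = none */
    unsigned char ch;           /* character stored at this node */
} ImageNode;

/*
 * Map the image file read-only and shared.  Returns the base address and
 * stores the mapped length in *len, or returns NULL on any failure.
 */
const void *
map_dictionary_image(const char *path, size_t *len)
{
    struct stat st;
    void *base;
    int fd;

    assert(path != NULL && len != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0)
    {
        close(fd);
        return NULL;
    }

    base = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping remains valid after close */
    if (base == MAP_FAILED)
        return NULL;

    *len = (size_t) st.st_size;
    return base;
}

/* Resolve a node's child relative to wherever this process mapped the image. */
const ImageNode *
image_child(const void *base, const ImageNode *node)
{
    if (node->child_offset == 0)
        return NULL;
    return (const ImageNode *) ((const char *) base + node->child_offset);
}
```

A caller would release the mapping with munmap(base, len) when the dictionary is no longer needed. The cost of this layout is that the pointer-based tree built by mkSPNode could not be used as is; it would first have to be serialized into the offset-based form.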

Re: tsearch profiling - czech environment - take 55MB

From

Pavel Stehule

Date:

11 March 2010, 18:32:59

2010/3/11 Alvaro Herrera <alvherre@commandprompt.com>:
> Pavel Stehule wrote:
>> 2010/3/11 Tom Lane <tgl@sss.pgh.pa.us>:
>> > Pavel Stehule <pavel.stehule@gmail.com> writes:
>> >> The problem is the very large number of small allocations: there are 853215 nodes.
>> >> I replaced palloc0 inside mkSPNode with balloc
>> >
>> > This goes back to the idea we've discussed from time to time of having a
>> > variant memory context type in which pfree() is a no-op and we dispense
>> > with all the per-chunk overhead.  I guess that if there really isn't any
>> > overhead there then pfree/repalloc would actually crash :-( but for the
>> > particular case of dictionaries that would probably be OK because
>> > there's so little code that touches them.
>>
>> That makes sense. I was surprised how much memory is needed :(. Smarter
>> allocation saves 50% - 2.5GB for 100 users, which is significant - but
>> I think this data really should be shared. I had hoped preloading would
>> help, but it is problematic: no data is available at shared preload
>> time, and the allocated size is too big.
>
> Could it be mmapped and shared that way?

I don't know - I have never worked with mmap.

Pavel

>
> --
> Alvaro Herrera                                http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.
>
