Re: performance issue using DBI

Поиск
Список
Период
Сортировка
От Joshua b. Jore
Тема Re: performance issue using DBI
Дата
Msg-id Pine.BSO.4.44.0206060832210.2703-100000@kitten.greentechnologist.org
обсуждение исходный текст
Ответ на performance issue using DBI  ("Nicolas Nolst" <nnolst@hotmail.com>)
Ответы Re: performance issue using DBI
Список pgsql-general
How much data is 20,000 lines? 20KB? 100MB? You might do well to just
process this in memory and then just COPY the right data to the table.
This gets away from doing the SELECT/INSERT/SELECT/INSERT thing which is
going to be painful for batch jobs.

See, the thing is that on every insert the indexes have to be updated.
They won't be used until after you VACUUM the tables so it does no good
inside your transaction. Drop the indexes.

Also, leave off with your use of currval/nextval. The point of a serial
type is that the column increments each time automagically. For what you
are doing you could turn that serial into an integer, create your sequence
separately, exec nextval('yourseq') and then just use the value you
retrieved. Don't use currval since some other process might alter the
sequence between the time you call nextval and currval. Just store the
value.

I think this problem is better solved on the perl side than on the
PostgreSQL side. Consider using data structures like so. You can probably
create a better structure since you know your data and I don't.

sessions
{ remote_ip => { phone_type => { phone_number => session_id,
                                 phone_number => session_id
                               },
                 phone_type => { phone_number => session_id,
                                 phone_number => session_id
                               }
               },
  remote_ip => { phone_type => { phone_number => session_id,
                                 phone_number => session_id
                               },
                 phone_type => { phone_number => session_id,
                                 phone_number => session_id
                               }
               }
}

actions - now session_id is the array offset.
[ [ url, timestamp ],
  [ url, timestamp ],
  [ url, timestamp ] ],
[ [ url, timestamp ],
  [ url, timestamp ],
  [ url, timestamp ] ],
[ [ url, timestamp ],
  [ url, timestamp ],
  [ url, timestamp ] ],


> If the session already exists I add a line in the table actions with a INSERT
>
> If the session doesn't exist or if the criteria is true, I add a line in the
> table sessions with an INSERT and then add  a line with a INSERT in the table actions (I use nextval and currval).
>
> I have put indexes on sessions(session_id), sessions(msisdn),
> actions(session_id) and actions(timestamp). I process one log file of about 20000 lines every day. All the lines are
> processed in one transaction (autocommit set to 0).
>
> My problem is that populating my database is slower  when the data gets bigger
> and the performance falls dramatically. I thought that is would be improve with
> my indexes but the problem still persists.
>
> Could you please give me some clues that could solve this issue.
>
> Thanks.
>
>
>
> Nicolas Nolst
> [belgium_gs.gif]
>
>
______________________________________________________________________________________________________________________________
> MSN Photos is the easiest way to share and print your photos: Click Here
>
>





В списке pgsql-general по дате отправления:

Предыдущее
От: "Kasper (swebase)"
Дата:
Сообщение: pg_shadow
Следующее
От: Stephan Szabo
Дата:
Сообщение: Re: pg_shadow