Re: any solution for doing a data file import spawning it on multiple processes

From: hb@101-factory.eu
Subject: Re: any solution for doing a data file import spawning it on multiple processes
Msg-id: 9E7215C2-C9C6-4B5A-A4D9-825ED610FF84@101-factory.eu
In reply to: Re: any solution for doing a data file import spawning it on multiple processes  (Edson Richter <edsonrichter@hotmail.com>)
List: pgsql-general

Thanks all, I will look into it.

Kind regards,

Henk

On 16 jun. 2012, at 18:23, Edson Richter <edsonrichter@hotmail.com> wrote:

> On 16/06/2012 12:59, hb@101-factory.eu wrote:
>> Thanks. I thought about splitting the file, but that did not work out well.
>>
>> We receive 2 files every 30 seconds and need to import them as fast as possible.
>>
>> We do not run Java currently, but maybe it's an option.
>> Are you willing to share your code?
>>
>> I was also thinking of using Perl for it.
>>
>>
>> henk
>>
>> On 16 jun. 2012, at 17:37, Edson Richter <edsonrichter@hotmail.com> wrote:
>>
>>> On 16/06/2012 12:04, hb@101-factory.eu wrote:
>>>> hi there,
>>>>
>>>> I am trying to import large data files into pg.
>>>> For now I use the Linux xargs command to hand the file out line by line to parallel processes, using the maximum available connections.
>>>>
>>>> We use pgpool as the connection pool to the database, and so try to maximize the concurrency of the file import.
>>>>
>>>> The problem is that it seems to work well, but we miss a line once in a while, and that is not acceptable. It also creates zombies ;(
>>>>
>>>> does anybody have any other tricks that will do the job?
>>>>
>>>> thanks,
>>>>
>>>> Henk
>>> I've used a custom Java application with connection pooling (limited to 1000 connections, meaning 1000 concurrent file imports).
>>>
>>> I'm able to import more than 64000 XML files (about 13 KB each) in 5 minutes, with neither memory leaks nor zombies, and (of course) no missing records.
>>>
>>> Besides having each thread import a separate file, I have another setup where separate threads import different lines of the same file. No problems at all. Do not forget to check your OS "file open" limits (this was a big issue for me in the past because of the Lucene indexes generated during import).
>>>
>>> Server: 8-core Xeon, 16 GB, 15000 rpm SAS disks, PgSQL 9.1.3, Linux CentOS 5, Sun Java 1.6.27.
>>>
>>> Regards,
>>>
>>> Edson Richter
>>>
>>>
>>> --
>>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general
> I'm not allowed to publish my company's code, but the logic is very easy to understand (you will have to "invent" your own solution; the code below is bare bones):
>
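> import java.io.File;
> import java.io.FileFilter;
>
> class MainThread implements Runnable {
>    // volatile so that a stop request from another thread is seen by run()
>    private volatile boolean keepRunning = true;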
>
>    public void run() {
>        while(keepRunning) {
>            try {
>                executeFiles();
>                Thread.sleep(30000); // sleep 30 seconds
>            } catch(Exception ex) {
>                ex.printStackTrace();
>            }
>        }
>    }
>
>    private void executeFiles() {
>        File monitorDir = new File("/var/mydatafolder/");
>        File processingDir = new File("/var/myprocessingfolder/");
>
>        // I'll import only files with names like "data20120621.csv":
>        FileFilter fileFilter = new FileFilter() {
>            public boolean accept(File file) {
>                // Skip directories, hidden files and anything that is not a regular file
>                if(!file.isFile() || file.isHidden()) return false;
>                String fname = file.getName();
>                return fname.startsWith("data") && fname.endsWith(".csv");
>            }
>        };
>
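>        // listFiles() returns an array, or null if the directory cannot be read
>        File[] forProcessing = monitorDir.listFiles(fileFilter);
>        if(forProcessing == null) return;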
>
>        for(File fileFound : forProcessing) {
>            // FileUtil is a utility class, you will have to create your own...
>            // your move method will vary according to your operating system
>            FileUtil.move(fileFound, processingDir);
>            // ProcessFile is a class that implements Runnable, and do your stuff there...
>            Thread t = new Thread(new ProcessFile(processingDir, fileFound.getName()));
>            t.start();
>        }
>    }
>
>    /** Use this method to stop the thread from another place in your complex system! */
>    public synchronized void stopWorker() {
>        keepRunning = false;
>    }
>
>    public static void main(String [] args) {
>        Thread t = new Thread(new MainThread());
>        t.start();
>    }
> }
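
The post leaves FileUtil.move to the reader. One possible version is sketched below. It is only an illustration, not the code Edson used: it assumes Java 8 or later (the server in the thread ran Sun Java 1.6, where java.nio.file is not available), and the class body is hypothetical. It tries an atomic rename first, so that a file never shows up half-moved in the processing folder, and falls back to a plain move when the two folders are on different file systems.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical stand-in for the FileUtil class referenced in the post. */
final class FileUtil {
    private FileUtil() {}

    /** Moves source into targetDir under the same name and returns the new location. */
    static File move(File source, File targetDir) {
        Path from = source.toPath();
        Path to = targetDir.toPath().resolve(source.getName());
        try {
            try {
                // Atomic rename: other processes see either the old or the new path.
                Files.move(from, to, StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                // Typically a move across file systems: fall back to a non-atomic move.
                Files.move(from, to, StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            // Unchecked so it can be called from a method without a throws clause.
            throw new UncheckedIOException(e);
        }
        return to.toFile();
    }
}
```

Because the exception is unchecked, the call in executeFiles() compiles unchanged, and a failed move ends up in the catch block of run().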
>
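
ProcessFile is likewise only named in the post. A rough sketch of such a worker follows, assuming Java 8 or later and UTF-8 input. The batchSize parameter and the batchSink callback are assumptions added for illustration (the post's constructor takes only the directory and the file name). The sink stands in for the actual database write. Every line read is handed to the sink exactly once, including the final partial batch, which is the property the original xargs approach was missing.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical worker: reads one file and hands its lines on in batches. */
final class ProcessFile implements Runnable {
    private final File file;
    private final int batchSize;
    // Assumption: the sink wraps whatever writes a batch to the database.
    private final Consumer<List<String>> batchSink;

    ProcessFile(File dir, String fileName, int batchSize, Consumer<List<String>> batchSink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.file = new File(dir, fileName);
        this.batchSize = batchSize;
        this.batchSink = batchSink;
    }

    @Override
    public void run() {
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader =
                 Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    batchSink.accept(batch);
                    batch = new ArrayList<>(batchSize); // start a fresh batch
                }
            }
            if (!batch.isEmpty()) {
                batchSink.accept(batch); // flush the final, partial batch
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real import the sink would probably issue one JDBC batch or one COPY per call, inside a transaction, so that a failure is reported instead of silently dropping a line.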
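
The code in the post starts one new thread per file found. A possible alternative, shown here as a sketch rather than as what Edson's application does, is a fixed-size thread pool from java.util.concurrent, so that the number of simultaneous imports can never exceed the number of connections the pool in front of the database allows. The class name BoundedImporter and its methods are hypothetical, and the lambda in the usage note assumes Java 8 or later.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that runs import jobs with a hard cap on concurrency. */
final class BoundedImporter {
    private final ExecutorService pool;

    BoundedImporter(int maxConcurrentImports) {
        // At most maxConcurrentImports jobs run at once; the rest wait in a queue.
        this.pool = Executors.newFixedThreadPool(maxConcurrentImports);
    }

    /** Queues every job; none is dropped, each runs when a thread is free. */
    void submitAll(List<? extends Runnable> jobs) {
        for (Runnable job : jobs) {
            pool.execute(job);
        }
    }

    /** Stops accepting new jobs and waits for the queued ones to finish. */
    boolean shutdownAndWait(long timeout, TimeUnit unit) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(timeout, unit);
    }
}
```

Usage would look something like `importer.submitAll(jobs)` once per polling cycle, with `shutdownAndWait` called when the application stops. Queued jobs are kept in memory, so if files arrive faster than they can be imported the queue will grow.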
