Re: COPY enhancements
От | Robert Haas |
---|---|
Тема | Re: COPY enhancements |
Дата | |
Msg-id | 603c8f070910071930t42c2a416tbef4d57b7fe3ca7f@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: COPY enhancements (Emmanuel Cecchet <manu@asterdata.com>) |
Ответы |
Re: COPY enhancements
Re: COPY enhancements Re: COPY enhancements Re: COPY enhancements |
Список | pgsql-hackers |
On Fri, Sep 25, 2009 at 10:01 AM, Emmanuel Cecchet <manu@asterdata.com> wrote: > Robert, > > Here is the new version of the patch that applies to CVS HEAD as of this > morning. > > Emmanuel I took a look at this patch tonight and, having now read through some of it, I have some more detailed comments. It seems quite odd to me that when COPY succeeds but there are errors, the transaction commits. The only indication that some of my data didn't end up in the table is that the output says "COPY n" where n is less than the total number of rows I attempted to copy. On the other hand, it would be equally bad if the transaction aborted, because then my error logging table would not get populated - I note that that's the behavior we do get if the max errors threshold is exceeded. I'm not sure what the right answer is here, but it needs some thought and discussion. I think at a minimum the caller needs some indication of the number of FAILED rows, not just the number of succesful ones. What's really bad about this is that a flag called "error_logging" is actually changing the behavior of the command in a way that is far more dramatic than (and doesn't actually have much to do with) error logging. It's actually making a COPY command succeed that would otherwise have failed. That's not just a logging change. I think the underlying problem is that there is more than one feature here, and they're kind of mixed together. The fact that the partitioning code needs to be separated from the error logging stuff has already been discussed, but I think it actually needs to be broken down even further. I think there are three separate features in the error logging code: A. Ability to ignore a certain number of errors (of certain types?) and still get the other tuples into the table anyway. B. Ability to return error information in a structured format rather than as an exception message. C. Ability to direct the structured error messages to a table. Each of those features deserves a separate discussion to decide whether we want it and how best to implement it. Personally, I think we should skip (C), at least as a starting point. Instead of logging to a table, I think we should consider making COPY return the tuples indicating the error as a query result, the same way EXPLAIN returns the query plan. It looks like this would eliminate the need for at least three of the new COPY options without losing much functionality. I also think that, if possible, (A) and (B) should be implemented as separate features, so that each one can be used separately or both can be used in combination. Some other notes: 1. The copy_errorlogging.out regression test fails, at least on my machine. I guess this problem was discussed previously, but perhaps you should change the setup of the test in some way so that you can generate a patch that doesn't require manual fiddling to get it the regression tests to pass. 2. The error logging doesn't seem to work in all cases. rhaas=# copy foo from '/home/rhaas/x' (error_logging, error_logging_table_name 'errlogging'); ERROR: duplicate key value violates unique constraint "foo_pkey" DETAIL: Key (id)=(1) already exists. CONTEXT: COPY foo, line 1: "1 Robert" I assume that this fails because the code is trained to catch some specific category of errors and redirect those to the error logging table while letting everything take its natural course. I don't feel that that's a terribly useful behavior, and on the other hand, I'm not sure it's going to be possible to fix it without killing performance. 3. The option checking for the error_logging options is insufficient. For example, if ERROR_LOGGING_TABLE_NAME is specified but ERROR_LOGGING is not specified, COPY still works (and just ignores the ERROR_LOGGING_TABLE_NAME). Note that this is quite different from what HEADER does, for example. 4. Specifiying an error_logging_table_name with spaces or other funny characters in it fails. COPY foo FROM STDIN (ERROR_LOGGING, ERROR_LOGGING_TABLE_NAME 'foo bar baz'); 5. The assumption that the public schema is an appropriate default does not seem right to me. I would think that we would want to default to the first schema in the user's search path. 6. errorlogging.c is poorly named, because it has specifically to do with logging errors in COPY, rather than with error logging in general; there's also no reason for it to live in utils/misc. Also, it does not include the same headers as the other files in the same directory. 7. There's no need for the OidLinkedList structure; we already have a system for handling lists of OIDs. See pg_list.h. 8. The OID cache is interesting; have you benchmarked this to see how much it improves performance? If so, you should describe your methodology and performance results. It might be worthwhile to leave this out of the first version of the partitioning patch for simplicity and add submit as a follow-on performance patch. Do we really need a GUC to size this cache, or is there a size beyond which it doesn't help much? 9. The style of the code is not consistent with what PostgreSQL does elsewhere. The use of '\see' is very distracting. The style of comments does not consistently patch the PostgreSQL style: in places comments start with "/**", and there's one in copy.c that reads "/* <--- Error logging function ---> */". Also, the standard way to write a multiline comment is with the text starting on the line following "/*", not the "/*" line itself. Also, some variables are declared as "type * name" rather than "type *name". 10. This patch twiddles with the regression tests in ways that are not related to the functionality it is adding; consistent with the idea that a patch should do one thing and do it well, you should revert this part. Overall, I think that this patch is not very close to being committable and accordingly I am going to mark it as Returned with Feedback. However, I hope that the discussion will continue and that we can get consensus on some changes for next CommitFest. ...Robert
В списке pgsql-hackers по дате отправления: