Thread: FW: Slow query performance


FW: Slow query performance

From:
"Dann Corbit"
Date:

From: Kevin Galligan [mailto:kgalligan@gmail.com]
Sent: Wednesday, October 29, 2008 3:16 PM
To: Dann Corbit
Subject: Re: [GENERAL] Slow query performance

 

Columns are as follows:

account  | integer               |
city     | character varying(20) |
zip      | character(5)          |
dincome  | character(1)          |
sex      | character(1)          |
mob      | boolean               |
religion | character(1)          |
groupcd  | character(1)          |
lastdata | character varying(4)  |
countycd | character varying(3)  |
state    | character varying(2)  |
dutype   | character varying(1)  |
orders   | integer               |
countysz | character varying(1)  |
ethnic   | character varying(2)  |
language | character varying(2)  |
cbsacode | character varying(5)  |
cbsatype | character varying(1)  |
age      | smallint              |
dob      | date                  |
ccard    | boolean               |
lor      | integer               |
bankcard | boolean               |
lastord  | date                  |
total    | integer               |
lmob     | boolean               |
homeown  | character varying(1)  |
ord1st   | date                  |
ordlast  | date                  |
married  | boolean               |
deptcard | boolean               |

>>

You did not show us the indexes.

If you have pgadmin III, go to the table and copy/paste the actual definition, including indexes.
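If pgAdmin is not available (for example, when working over ssh), psql or the catalogs show the same thing; a minimal sketch, assuming the table is named bigdatatable:

-- in psql: table definition, including indexes
\d bigdatatable

-- or pull the index definitions from the catalog
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'bigdatatable';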

<<
From here it's about another 100 columns, either booleans or smallints, mostly null values.

I eventually killed the vacuum.  I will run it again, but was just going through the indexes.  All were of this format...

"INFO:  index "idx_jordlast" now contains 265658026 row versions in 728409 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.88s/0.13u sec elapsed 90.38 sec."

An example of a slow query is...

select count(*) from bigdatatable where age between 22 and 40 and state = 'NY';
>>

Is there an index on state and age?

Is there an index on state?

Is there an index on age?

That is important missing information.

If there is no index on either column, then you will do a table scan.

If all of your slow queries look like the above, then create a clustered index on (state, age).
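PostgreSQL has no clustered indexes in the SQL Server sense, but a composite index plus a one-time CLUSTER approximates one, and is workable here because the table is read-only once loaded; a sketch, with an illustrative index name:

CREATE INDEX idx_state_age ON bigdatatable (state, age);
CLUSTER bigdatatable USING idx_state_age;  -- physically reorders the table by that index (locks and rewrites the table)
ANALYZE bigdatatable;                      -- refresh planner statistics afterwards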

<<
I know count(*) is not optimized on PostgreSQL like it is on MySQL, due to transaction isolation (at least that's what I've read; makes sense to me).  I understand it'll take time to actually count the rows.  However, here's the output of explain analyze for: select count(*) from bigdatatable where age between 22 and 40 and state = 'NY';

 Aggregate  (cost=5179639.55..5179639.56 rows=1 width=0) (actual time=389529.895..389529.897 rows=1 loops=1)
   ->  Bitmap Heap Scan on bigdatatable  (cost=285410.65..5172649.63 rows=2795968 width=0) (actual time=6727.848..387159.175 rows=2553273 loops=1)
         Recheck Cond: ((state)::text = 'NY'::text)
         Filter: ((age >= 22) AND (age <= 40))
         ->  Bitmap Index Scan on idx_jstate  (cost=0.00..284711.66 rows=15425370 width=0) (actual time=6298.950..6298.950 rows=16821828 loops=1)
               Index Cond: ((state)::text = 'NY'::text)
 Total runtime: 389544.088 ms

It looks like the index scans are around 6 seconds or so each, which is fine.  Then it looks like "Aggregate" suddenly jumps up to 6 minutes.

I know the database design is crude.  It's really just a big flat table.  I didn't put too much into weeding out which columns should be indexed and which shouldn't (I just slapped an index on each).  Happy to do that work, but right now I'm in panic mode and just need to figure out which way to start going.  I had a design on MySQL which worked pretty well at 10 to 20% of full size, but degraded quite a bit at full size.  Compounding this, there is another implementation we've seen that uses the full size of similar data and returns actual results in seconds.  (I originally planned to use a 5% size db for estimated results, then the full size for getting the actual data.  This plan was rejected :( )

Any thoughts?  It seemed to work OK when I had a table with 10 cols but about the same data length.  That may have been an artificial test, though.

Again: this is read-only once the data is set up.  The client wants to run pretty much arbitrary queries, so it's hard to isolate certain things for optimization, although there are some "knowns".

Will start the full vacuum process again.

Thanks in advance,
-Kevin

On Wed, Oct 29, 2008 at 4:52 PM, Dann Corbit <DCorbit@connx.com> wrote:
>> -----Original Message-----
>> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
>> owner@postgresql.org] On Behalf Of Kevin Galligan
>> Sent: Wednesday, October 29, 2008 1:18 PM
>> To: pgsql-general@postgresql.org
>> Subject: [GENERAL] Slow query performance
>>
>> I'm approaching the end of my rope here.  I have a large database.
>> 250 million rows (ish).  Each row has potentially about 500 pieces of
>> data, although most of the columns are sparsely populated.
>>
>> What I'm trying to do is, essentially, search for sub-sets of that
>> data based on arbitrary queries of those data columns.  the queries
>> would be relatively simple ("dirbtl is not null and qqrq between 20
>> and 40").  After the database is built, it is read only.
>>
>> So, I started with maybe 10-15 fields in a main table, as most records
>> have values for those fields.  Then had individual tables for the
>> other values.  The idea is that the percentage of rows with values
>> drops off significantly after those main tables.  That, and each
>> individual query looks at probably 3 or 4 fields in total.  The
>> performance of those queries was pretty bad.  It's got to join large
>> numbers of values, which didn't really work out well.
>>
>> So, went the other direction completely.  I rebuilt the database with
>> a much larger main table.  Any column filled in for 5% or more of the
>> rows was added to this table.  Maybe 130 columns.  Indexes applied to
>> most of these.  Some limited testing with a smaller table seemed to
>> indicate that queries on a single table without a join would work much
>> faster.
>>
>> So, I built that huge table.  Now query time is terrible.  Maybe a
>> minute or more for simple queries.
>>
>> I'm running vacuum/analyze right now (which is also taking forever,
>> BTW).
>>
>> The box has 15 GB of RAM.  I set shared_buffers to 8 or 9 GB.  My
>> first question: what would be better to bump up to increase
>> performance?  I thought that was the setting to jack up to improve
>> query time or index caching, but I've read conflicting advice.  The
>> full 15 GB of RAM is available.
>>
>> I originally had this in MySQL.  Also bad performance.  I understand
>> how to optimize that much better, but it just wasn't cutting it.
>>
>> Anyway, help with tuning the settings would be greatly appreciated.
>> Advice on how best to lay this out would also be helpful (I know it's
>> difficult without serious detail).
>
> Show the full schema for your table and the output of:
> VACUUM VERBOSE ANALYZE <your_table>
>
> Show the query that is slow.
>
>
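On the shared_buffers question quoted above: common guidance for PostgreSQL of this era is a moderate shared_buffers (roughly a quarter of RAM) plus a large effective_cache_size, rather than giving most of the machine to shared_buffers.  A sketch of postgresql.conf for a 15 GB, read-mostly box (values are illustrative, not from this thread):

# postgresql.conf -- illustrative values only
shared_buffers = 3GB           # ~25% of RAM is a common starting point; 8-9 GB is usually too high
effective_cache_size = 12GB    # planner hint for how much data the OS cache can hold
work_mem = 256MB               # per-sort/per-hash memory; generous is fine for a few read-only sessions
maintenance_work_mem = 1GB     # speeds up VACUUM and index builds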

Re: FW: Slow query performance

From:
"Dann Corbit"
Date:

From: Kevin Galligan [mailto:kgalligan@gmail.com]
Sent: Wednesday, October 29, 2008 4:34 PM
To: Dann Corbit
Cc: pgsql-general@postgresql.org
Subject: Re: FW: [GENERAL] Slow query performance

 

Sorry for the lack of detail.  There are indexes on both state and age, but not a clustered index on both together, as the queries are fairly arbitrary.  (That's the short answer.  The long answer is that, at least with those columns, something like that MAY be an option later, but I don't know enough now.)

I don't have the GUI admin set up, as I'm doing this over ssh.  Will get the full table definition in a bit.  The short answer is simple btree indexes on the columns being searched.  I was applying simple indexes on all the columns, as there will be queries like "where [col] is not null", although in retrospect that's fairly pointless unless the column has very little data.  Even then, maybe.
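For the "where [col] is not null" pattern on sparsely populated columns, a partial index is usually a better fit than a full index, since it stores only the non-null rows; a minimal sketch on one of the sparse smallint columns (index name illustrative):

CREATE INDEX idx_payvisa_notnull ON bigdatatable (payvisa)
    WHERE payvisa IS NOT NULL;  -- only rows with data end up in the index, so it stays small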

Anyway, looking at the output, the time goes from 6727.848 to 387159.175 during the bitmap heap scan (I was reading it wrong about the Aggregate line).  Considering the size involved, is this something that Postgres has decided is too big to be done in memory?  That would be my wild guess.

Ran another query, this one even simpler.  Still quite long...

explain analyze select count(*) from bigdatatable where age between 22 and 23 and state = 'NY';
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=37.41..37.42 rows=1 width=0) (actual time=217998.706..217998.707 rows=1 loops=1)
   ->  Index Scan using idx_jage on bigdatatable  (cost=0.00..37.41 rows=1 width=0) (actual time=247.209..217988.584 rows=10303 loops=1)
         Index Cond: ((age >= 22) AND (age <= 23))
         Filter: ((state)::text = 'NY'::text)
 Total runtime: 217998.800 ms

Abbreviated schema below.  The table is huge.  Originally I had a design with a main "anchor" table that had all records, and most of the other columns were in tables I would join for the search.  This didn't perform very well, so I decided to go the other way and see how it worked.  I did a count on all data, and any column with values in fewer than 5 percent of the rows went into its own table.  Everything else is in this big one.  The indexing strategy is a joke right now; I just applied one to each column.  This is still in the testing phase.

I had set this up on MySQL.  The joins on the full-size db turned out to be terrible.  I'm currently setting up the "one large table" design on MySQL to see how that works.

The obvious answer would be that the table is huge, so when the query is running, it's grabbing all that data and not using much of it.  True.  However, I'm not sure how to approach the design now.  It's rarely going to need data from anything other than a few columns, but joining across tens or hundreds of millions of records didn't seem like much fun either.  Thoughts?

CREATE TABLE bigdatatable (
    account integer,
    city character varying(20),
    zip character(5),
    dincome character(1),
    sex character(1),
    mob boolean,
    religion character(1),
    groupcd character(1),
    lastdata character varying(4),
    countycd character varying(3),
    state character varying(2),
    dutype character varying(1),
    orders integer,
    countysz character varying(1),
    language character varying(2),
    cbsacode character varying(5),
    cbsatype character varying(1),
    age smallint,
    dob date,
    ccard boolean,
    lor integer,
    bankcard boolean,
    lastord date,
    total integer,
    lmob boolean,
    homeown character varying(1),
    ord1st date,
    ordlast date,
    married boolean,
    deptcard boolean,
    ordtotm smallint,
    ordlastm date,
    ord1stm date,
    orddolm smallint,
    pcuser boolean,
    homeval character varying(1),
    mailresp boolean,
    lhomepc boolean,
    dirrspby boolean,
    mgift boolean,
    lebuyer boolean,
    payother smallint,
    lhomdecr boolean,
    driver boolean,
    ordtote smallint,
    ord1ste date,
    ordlaste date,
    orddole smallint,
    mhmdecor boolean,
    oddsnend smallint,
    aptot smallint,
    apwk smallint,
    apdol smallint,
    payccrd smallint,
    landval smallint,
    mfapparl boolean,
    mgengift boolean,
    homeblt smallint,
    homebydt date,
    educate character varying(1),
    children boolean,
    payvisa smallint,
    hmfr smallint,
    maghlth smallint,
    homebypx integer,
    gfhol smallint,
    mbeauty boolean,
    apwmtot smallint,
    apwmwk smallint,
    apwmdol smallint,
    travlseg integer,
    lhealth boolean,
    lcharity boolean,
    moutdoor boolean,
    occupatn character varying(4),
    fundrais boolean,
    msports boolean,
    hg smallint,
    magfam smallint,
    melectrc boolean,
    lelectrc boolean,
    bankcrd1 boolean,
    lfoodck boolean,
    mfood boolean,
    finance boolean,
    hmfrntot smallint,
    hmfrnwk smallint,
    hmfrndol smallint,
    paymastr smallint,
    lgarden boolean,
    lartantq boolean,
    martantq boolean,
    hb smallint,
    mmaleap boolean,
    lhwrkshp boolean,
    hghmctot smallint,
    hmhmcwk smallint,
    hghmcdol smallint,
    paycash smallint,
    studntln boolean,
    lwelness boolean,
    opportun smallint,
    gftot smallint,
    gfwk smallint,
    giftinst smallint,
    gfdol smallint,
    mchildrn boolean,
    chtot smallint,
    chwk smallint,
    chdol smallint,
    hbhltot smallint,
    hbhltwk smallint,
    lifestyl smallint,
    hmhwrtot smallint,
    hmhwrwk smallint,
    lsports boolean,
    hmhwrdol smallint,
    hbhltdol smallint,
    mfemplus boolean,
    mhightkt boolean,
    apmntot smallint,
    apmnwk smallint,
    sltot smallint,
    slwk smallint,
    hmlintot smallint,
    hmlinwk smallint,
    hmlindol smallint,
    sldol smallint,
    genmwk smallint,
    genmtot smallint,
    genmdol smallint,
    apmndol smallint,
    chprods smallint,
    lfitness boolean,
    lculturl boolean
);


CREATE INDEX idx_jage ON bigdatatable USING btree (age);
CREATE INDEX idx_japdol ON bigdatatable USING btree (apdol);
CREATE INDEX idx_japtot ON bigdatatable USING btree (aptot);
CREATE INDEX idx_japwk ON bigdatatable USING btree (apwk);
CREATE INDEX idx_japwmtot ON bigdatatable USING btree (apwmtot);
CREATE INDEX idx_jbankcard ON bigdatatable USING btree (bankcard);
CREATE INDEX idx_jcbsacode ON bigdatatable USING btree (cbsacode);
CREATE INDEX idx_jcbsatype ON bigdatatable USING btree (cbsatype);
CREATE INDEX idx_jccard ON bigdatatable USING btree (ccard);
CREATE INDEX idx_jchildren ON bigdatatable USING btree (children);
CREATE INDEX idx_jcountycd ON bigdatatable USING btree (countycd);
CREATE INDEX idx_jcountysz ON bigdatatable USING btree (countysz);
CREATE INDEX idx_jdeptcard ON bigdatatable USING btree (deptcard);
CREATE INDEX idx_jdirrspby ON bigdatatable USING btree (dirrspby);
CREATE INDEX idx_jdob ON bigdatatable USING btree (dob);
CREATE INDEX idx_jdriver ON bigdatatable USING btree (driver);
CREATE INDEX idx_jdutype ON bigdatatable USING btree (dutype);
CREATE INDEX idx_jeducate ON bigdatatable USING btree (educate);
CREATE INDEX idx_jfundrais ON bigdatatable USING btree (fundrais);
CREATE INDEX idx_jgfhol ON bigdatatable USING btree (gfhol);
CREATE INDEX idx_jhmfr ON bigdatatable USING btree (hmfr);
CREATE INDEX idx_jhomeblt ON bigdatatable USING btree (homeblt);
CREATE INDEX idx_jhomebydt ON bigdatatable USING btree (homebydt);
CREATE INDEX idx_jhomeown ON bigdatatable USING btree (homeown);
CREATE INDEX idx_jhomeval ON bigdatatable USING btree (homeval);
CREATE INDEX idx_jlandval ON bigdatatable USING btree (landval);
CREATE INDEX idx_jlanguage ON bigdatatable USING btree (language);
CREATE INDEX idx_jlastord ON bigdatatable USING btree (lastord);
CREATE INDEX idx_jlebuyer ON bigdatatable USING btree (lebuyer);
CREATE INDEX idx_jlhealth ON bigdatatable USING btree (lhealth);
CREATE INDEX idx_jlhomdecr ON bigdatatable USING btree (lhomdecr);
CREATE INDEX idx_jlhomepc ON bigdatatable USING btree (lhomepc);
CREATE INDEX idx_jlmob ON bigdatatable USING btree (lmob);
CREATE INDEX idx_jlor ON bigdatatable USING btree (lor);
CREATE INDEX idx_jmaghlth ON bigdatatable USING btree (maghlth);
CREATE INDEX idx_jmailresp ON bigdatatable USING btree (mailresp);
(a bunch more in here...)
CREATE INDEX idx_jstate ON bigdatatable USING btree (state);
CREATE INDEX idx_jtotal ON bigdatatable USING btree (total);

>> 

These indexes are not going to be very selective.  For instance, if you use the state index, you will scan 1/50th of the data on average, and for big states like NY, CA, and TX, a lot more than 1/50th.  Similarly for AGE, especially when you ask for a range.  Wading through even a small percentage of the data using an index is surprisingly expensive when neither the data nor the index is clustered.  Indexes come into their own when you trim the data by several orders of magnitude with a specific sort of query.
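One way to put numbers on that selectivity is the pg_stats view (populated once ANALYZE has run); a sketch for the two columns in question:

-- null_frac and n_distinct show how much a single-column index can narrow things down
SELECT attname, null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'bigdatatable'
  AND attname IN ('state', 'age');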

If you have an index on (STATE, AGE), for instance, then you will filter through perhaps 1/50th as much as you would with a single-column index.

Having a zillion indexes like that will make updates really slow.

It might be worth considering a column database model.

<< 

Re: FW: Slow query performance

From:
"Kevin Galligan"
Date:
I agree with the concept.  The issue is that the application is an open-ended query tool, so pretty much whatever they feel like entering is valid.  I totally understand the indexes aren't very selective.  I guess I just don't know what the next step is.  There aren't a lot of assumptions I can make about how the queries are run.

I'm in the uniquely bad situation that there exists something with very similar data that works, so it's impossible to tell them "You can't run a state query by itself with some other data; you always need an age bound" or whatever.  The other application works reasonably fast.  I also don't know how it's built, which makes my life that much more difficult.

All of my experience is with small or medium-sized databases.  The front end itself is done, but I've had the client on the hook for a while now trying to sort this out.  Not sure what the next step is.

I tried the other extreme: age in its own table, with just 'account integer' and 'fval smallint', where fval is the age value.

explain analyze select count(*) from jage where fval between 22 and 33;
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1499316.66..1499316.67 rows=1 width=0) (actual time=75985.403..75985.404 rows=1 loops=1)
   ->  Bitmap Heap Scan on jage  (cost=365374.78..1456268.39 rows=17219307 width=0) (actual time=7930.354..56879.811 rows=18016538 loops=1)
         Recheck Cond: ((fval >= 22) AND (fval <= 33))
         ->  Bitmap Index Scan on idx_tjage  (cost=0.00..361069.96 rows=17219307 width=0) (actual time=7084.535..7084.535 rows=18016538 loops=1)
               Index Cond: ((fval >= 22) AND (fval <= 33))
 Total runtime: 76015.215 ms


Still took over a minute to do the count.  It seems like the index scan happens pretty fast, but the last steps go from 7 or 8 seconds to over a minute.
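A likely reason the last step dominates: the bitmap index scan only touches the index, but the bitmap heap scan then has to visit roughly 18 million rows scattered across the much larger heap, and count(*) has to check visibility on every one of them.  Comparing the physical sizes makes this concrete (a sketch using pg_class; relpages counts 8 kB pages):

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('jage', 'idx_tjage');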
