postgresql - Postgres two subqueries poor performance [EAV model] -

August 15, 2015

i have table called "meta" 2.2 million records. table contains 3 columns: productid, key , value. key , value 2 columns containing meta description of product.

the following query takes 2.6 sec , returns 676 results (postgresql 8.4.13, centos 6.4 64-bit). query used retrieve possible meta descriptions filter (size) user filtered on 2 other filters (year , source).

i tried array solution topic made worse: postgresql in operator subquery poor performance

the 2 subqueries pretty fast (75ms , 178ms), combining them causes performance issues. there way rewrite query?

this current query:

select distinct on(value) key, value    "meta"   key = 'size'     , "productid" in (select "productid"            "meta"           "value" = 'ibm'             , "key" = 'source' )     , "productid" in (select "productid"           "meta"         "value" >= '1920'             , "value" <= '2010'              , "key" = 'year' )  order  value

with following explain analyze:

unique  (cost=38829.46..38843.19 rows=564 width=15) (actual time=2674.474..2690.856 rows=676 loops=1)   ->  sort  (cost=38829.46..38836.32 rows=2745 width=15) (actual time=2674.471..2681.333 rows=66939 loops=1)         sort key: public."meta".value         sort method:  quicksort  memory: 8302kb         ->  hash join  (cost=32075.86..38672.69 rows=2745 width=15) (actual time=472.158..2472.002 rows=66939 loops=1)               hash cond: (public."meta"."originalid" = public."meta"."productid")               ->  nested loop  (cost=15079.41..21563.33 rows=13109 width=23) (actual time=113.873..1013.113 rows=104307 loops=1)                     ->  hashaggregate  (cost=15079.41..15089.21 rows=980 width=4) (actual time=113.802..163.805 rows=105204 loops=1)                           ->  bitmap heap scan on "meta"  (cost=315.39..15051.42 rows=11196 width=4) (actual time=24.540..68.237 rows=105204 loops=1)                                 recheck cond: (((key)::text = 'source'::text) , ((value)::text = 'kadaster_woii_raf_usaaf'::text))                                 ->  bitmap index scan on "productmetadatakeyvalueindex"  (cost=0.00..312.60 rows=11196 width=0) (actual time=23.506..23.506 rows=105204 loops=1)                                       index cond: (((key)::text = 'source'::text) , ((value)::text = 'ibm'::text))                     ->  index scan using "idx_productid" on "meta"  (cost=0.00..6.59 rows=1 width=19) (actual time=0.006..0.008 rows=1 loops=105204)                           index cond: (public."meta"."productid" = public."meta"."productid")                           filter: ((public."meta".key)::text = 'size'::text)               ->  hash  (cost=16954.58..16954.58 rows=3350 width=4) (actual time=358.214..358.214 rows=184571 loops=1)                     ->  hashaggregate  (cost=16921.08..16954.58 rows=3350 width=4) (actual time=258.149..319.154 rows=184571 loops=1)                           ->  bitmap heap scan on "meta"  (cost=1172.62..16825.39 rows=38273 width=4) (actual time=86.725..167.110 rows=184571 loops=1)                                 recheck cond: (((key)::text = 'year'::text) , ((value)::text >= '1920'::text) , ((value)::text <= '2010'::text))                                 ->  bitmap index scan on "productmetadatakeyindex"  (cost=0.00..1163.05 rows=38273 width=0) (actual time=83.992..83.992 rows=184571 loops=1)                                       index cond: (((key)::text = 'year'::text) , ((value)::text >= '1920'::text) , ((value)::text <= '2010'::text)) total runtime: 2696.276 ms

defined indexes:

idx_productid   create index "idx_productid" on "meta" using btree ("productid")     productmetaunique_id    create unique index "productmetaunique_id" on "meta" using btree ("productid", key)      productmetadatakeyindex create index "productmetadatakeyindex" on "meta" using btree (key)   productmetadatakeyvalueindex    create index "productmetadatakeyvalueindex" on "meta" using btree (key, value)

first off, postgresql 8.1.4 antique. upgrade that, because every release since (8.2, 8.3, 8.4, 9.0, 9.1 , 9.2) have seen improvements query planner.

next, rewrite query use joins , group by, , possibly better plan.

select m1."value" meta m1 join meta m2 on m2."productid" = m1."productid"                , m2."key" = 'source'                , m2."value" = 'ibm' join meta m3 on m3."productid" = m1."productid"                , m3."key" = 'year'                , m3."value" between 1920 , 2010 m1."key" = 'size' group m1."value"

the latter use index on (key, product_id) , (product_id, key, value) index-only scan in pg 9.2, avoiding table lookups altogether.

next, case should have put data in products table directly. if you're querying against it, doesn't belong in meta in first place.

lastly, if really want keep stuff in meta, might case pays go in there exist statements:

select val unnest(array['known', 'sizes', 'go', 'here']::text[]) val exists (     select 1     meta m1     join meta m2 on m2."productid" = m1."productid"                    , m2."key" = 'source'                    , m2."value" = 'ibm'     join meta m3 on m3."productid" = m1."productid"                    , m3."key" = 'year'                    , m3."value" between 1920 , 2010     m1."key" = 'size'       , m1."value" = val   );

doing spare expensive group by, sort , unique operations.

Search This Blog

DIs

postgresql - Postgres two subqueries poor performance [EAV model] -

Comments

Post a Comment

Popular posts from this blog

php - cannot display multiple markers in google maps v3 from traceroute result -

php - Boolean search on database with 5 million rows, very slow -

css - Text drops down with smaller window -