A quick comparison of pgroonga and pg_bigm

January 22, 2015 (category: Daily life)

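Until now I had been using senna (together with textsearch_senna) to build full-text search indexes in PostgreSQL.
After moving the database to 9.4 it stopped working, so I am currently looking into alternatives.
As a first step I gave these two a trial run:
・pgroonga
・pg_bigm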

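■Environment
OS: CentOS 5.5 (32-bit)
PostgreSQL: 9.4
Note: this is a fairly old test machine, so it is 32-bit.
I probably should have tested on a 64-bit machine. In fact, I will.
The data set is about 60,000 news headlines.
[Edited 2015-01-22, 21:00] Moved this environment section to the top.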

For now I will just paste the results.

■pgroonga
QUERY PLAN
Bitmap Heap Scan on items (cost=170.91..2826.87 rows=21517 width=79) (actual time=18.649..19.660 rows=429 loops=1)
  Recheck Cond: (disp_title %% 'インフルエンザ'::text)
  Heap Blocks: exact=368
  -> Bitmap Index Scan on items_disp_title_pgroonga_index (cost=0.00..165.53 rows=21517 width=0) (actual time=18.585..18.585 rows=429 loops=1)
       Index Cond: (disp_title %% 'インフルエンザ'::text)
Planning time: 0.105 ms
Execution time: 20.210 ms

QUERY PLAN
Bitmap Heap Scan on items (cost=114.42..2662.79 rows=10758 width=79) (actual time=2.510..2.656 rows=43 loops=1)
  Recheck Cond: ((disp_title %% 'インフルエンザ'::text) AND (disp_title %% '小児'::text))
  Heap Blocks: exact=42
  -> Bitmap Index Scan on items_disp_title_pgroonga_index (cost=0.00..111.73 rows=10758 width=0) (actual time=2.488..2.488 rows=43 loops=1)
       Index Cond: ((disp_title %% 'インフルエンザ'::text) AND (disp_title %% '小児'::text))
Planning time: 0.113 ms
Execution time: 2.798 ms
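Note: the AND search result is wrong. In this case only the results for '小児' (children) are returned.
[Update 2015-01-22, 21:00] It looks like this has been fixed.
[Update 2015-01-23, 17:00] I have confirmed the fix. The @@ operator has been implemented as well. Thank you!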

■pg_bigm
QUERY PLAN
Bitmap Heap Scan on items (cost=82.59..1791.51 rows=851 width=79) (actual time=0.791..4.576 rows=429 loops=1)
  Recheck Cond: (disp_title ~~ '%インフルエンザ%'::text)
  Heap Blocks: exact=368
  -> Bitmap Index Scan on items_disp_title_bigm_idx (cost=0.00..82.38 rows=851 width=0) (actual time=0.695..0.695 rows=429 loops=1)
       Index Cond: (disp_title ~~ '%インフルエンザ%'::text)
Planning time: 0.474 ms
Execution time: 5.297 ms

QUERY PLAN
Bitmap Heap Scan on items (cost=88.00..92.02 rows=1 width=79) (actual time=0.340..0.346 rows=2 loops=1)
  Recheck Cond: ((disp_title ~~ '%インフルエンザ%'::text) AND (disp_title ~~ '%小児%'::text))
  Heap Blocks: exact=2
  -> Bitmap Index Scan on items_disp_title_bigm_idx (cost=0.00..88.00 rows=1 width=0) (actual time=0.323..0.323 rows=2 loops=1)
       Index Cond: ((disp_title ~~ '%インフルエンザ%'::text) AND (disp_title ~~ '%小児%'::text))
Planning time: 0.513 ms
Execution time: 0.394 ms
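
pg_bigm indexes text by 2-grams (bigrams), which is why an ordinary LIKE '%...%' condition can use the index in the plans above. The following is only a rough sketch of that idea in Python, not pg_bigm's actual implementation: the sample titles and the function names (`bigrams`, `build_index`, `candidates`, `search`) are made up for illustration, and details such as how the real extension pads or stores its bigrams are left out. It also shows why the original condition still has to be rechecked against each candidate row.

```python
def bigrams(text):
    """Return the set of overlapping 2-character substrings of text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def build_index(titles):
    """Map each bigram to the set of row ids whose title contains it."""
    index = {}
    for row_id, title in enumerate(titles):
        for gram in bigrams(title):
            index.setdefault(gram, set()).add(row_id)
    return index


def candidates(index, titles, keyword):
    """Rows that contain every bigram of the keyword (may include false hits)."""
    grams = bigrams(keyword)
    if not grams:
        # A keyword shorter than 2 characters has no bigrams: every row is a candidate.
        return set(range(len(titles)))
    return set.intersection(*(index.get(gram, set()) for gram in grams))


def search(index, titles, keyword):
    """Narrow down with the index, then recheck the real substring condition."""
    return sorted(i for i in candidates(index, titles, keyword) if keyword in titles[i])


# Hypothetical sample data, not taken from the real data set.
titles = ["東京都の天気", "東京と京都の旅", "大阪の天気"]
index = build_index(titles)

print(candidates(index, titles, "東京都"))  # rows 0 and 1 both contain "東京" and "京都"
print(search(index, titles, "東京都"))      # only row 0 survives the recheck
```

Row 1 contains both bigrams of the keyword but not the keyword itself, so the index alone would return a false hit. The recheck step removes it.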

■No index
QUERY PLAN
Seq Scan on items (cost=0.00..2924.93 rows=851 width=79) (actual time=0.174..47.946 rows=429 loops=1)
  Filter: (disp_title ~~ '%インフルエンザ%'::text)
  Rows Removed by Filter: 42605
Planning time: 0.292 ms
Execution time: 48.517 ms

QUERY PLAN
Seq Scan on items (cost=0.00..3032.51 rows=1 width=79) (actual time=0.387..47.178 rows=2 loops=1)
  Filter: ((disp_title ~~ '%インフルエンザ%'::text) AND (disp_title ~~ '%小児%'::text))
  Rows Removed by Filter: 43032
Planning time: 0.474 ms
Execution time: 47.213 ms
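
To make the numbers easier to compare, here is a small Python sketch that turns the "Execution time" values pasted above into speedup ratios against the sequential scan. The dictionary layout and the `speedups` helper are just my own illustration. These are single runs on one old 32-bit machine, so treat the ratios as rough. Also note that the pgroonga AND query returned 43 rows instead of 2 at the time of measurement, so its timing is not a like-for-like comparison.

```python
# Execution times in milliseconds, copied from the EXPLAIN ANALYZE output in this post.
times_ms = {
    "single keyword": {"pgroonga": 20.210, "pg_bigm": 5.297, "no index": 48.517},
    "AND of two keywords": {"pgroonga": 2.798, "pg_bigm": 0.394, "no index": 47.213},
}


def speedups(times, baseline="no index"):
    """Return how many times faster each entry is than the baseline entry."""
    base = times[baseline]
    return {name: base / t for name, t in times.items() if name != baseline}


for query, times in times_ms.items():
    for name, ratio in speedups(times).items():
        print(f"{query}: {name} is {ratio:.1f}x faster than the sequential scan")
```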

That is all for now.

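I would also like to try textsearch_ja, but that will have to wait.
Going by speed alone, pg_bigm comes out ahead. Still, the query syntax (the %% and @@ operators) was extremely convenient when I used it with textsearch_senna, so I have high hopes for pgroonga, which has only just gotten started!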


judgeOX(Cnt 0)
