请选择 进入手机版 | 继续访问电脑版
  • 设为首页
  • 点击收藏
  • 手机版
    手机扫一扫访问
    迪恩网络手机版
  • 关注官方公众号
    微信扫一扫关注
    公众号

indexing - PostgreSQL index not used for query on IP ranges

[复制链接]
菜鸟教程小白 发表于 2022-6-22 22:05:12 | 显示全部楼层 |阅读模式 打印 上一主题 下一主题

I'm using PostgreSQL 9.2 and have a table of IP ranges. Here's the SQL:

CREATE TABLE ips (
  id serial NOT NULL,
  begin_ip_num bigint,
  end_ip_num bigint,
  country_name character varying(255),
  CONSTRAINT ips_pkey PRIMARY KEY (id )
)

I've added plain B-tree indices on both begin_ip_num and end_ip_num:

CREATE INDEX index_ips_on_begin_ip_num ON ips (begin_ip_num);
CREATE INDEX index_ips_on_end_ip_num ON ips (end_ip_num );

The query being used is:

SELECT ips.* FROM ips
WHERE 3065106743 BETWEEN begin_ip_num AND end_ip_num;

The problem is that my BETWEEN query is only using the index on begin_ip_num. After using the index, it filters the result using end_ip_num. Here's the EXPLAIN ANALYZE result:

Index Scan using index_ips_on_begin_ip_num on ips  (cost=0.00..2173.83 rows=27136 width=76) (actual time=16.349..16.350 rows=1 loops=1)
Index Cond: (3065106743::bigint >= begin_ip_num)
Filter: (3065106743::bigint <= end_ip_num)
Rows Removed by Filter: 47596
Total runtime: 16.425 ms

I've already tried various combinations of indices including adding a composite index on both begin_ip_num and end_ip_num.



Best Answer-推荐答案


Try a multicolumn index, but with reversed order on the second column:

CREATE INDEX index_ips_begin_end_ip_num ON ips (begin_ip_num, end_ip_num DESC);

Ordering is mostly irrelevant for a single-column index, since it can be scanned backwards almost as fast. But it is important for multicolumn indexes.

With the index I propose, Postgres can scan the first column and find the address, where the rest of the index fulfills the first condition. Then it can, for each value of the first column, return all rows that fulfill the second condition, until the first one fails. Then jump to the next value of the first column, etc.
This is still not very efficient and Postgres may be faster just scanning the first index column and filtering for the second. Very much depends on your data distribution.

Either way, CLUSTER using the multicolumn index from above can help performance:

CLUSTER ips USING index_ips_begin_end_ip_num

This way, candidates fulfilling your first condition are packed onto the same or adjacent data pages. Can help performance a lot with if you have lots of rows per value of the first column. Else it is hardly effective.
(There are also non-blocking external tools for the purpose: pg_repack or pg_squeeze.)

Also, is autovacuum running and configured properly or have you run ANALYZE on the table? You need current statistics for Postgres to pick appropriate query plans.

What would really help here is a GiST index for a int8range column, available since PostgreSQL 9.2.

Further reading:

If your IP ranges can be covered with one of the built-in network types inet or cidr, consider to replace your two bigint columns. Or, better yet, look to the additional module ip4r by Andrew Gierth (not in the standard distribution. The indexing strategy changes accordingly.

Barring that, you can check out this related answer on dba.SE with using a sophisticated regime with partial indexes. Advanced stuff, but it delivers great performance:

回复

使用道具 举报

懒得打字嘛,点击右侧快捷回复 【右侧内容,后台自定义】
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

关注0

粉丝2

帖子830918

发布主题
阅读排行 更多
广告位

扫描微信二维码

查看手机版网站

随时了解更新最新资讯

139-2527-9053

在线客服(服务时间 9:00~18:00)

在线QQ客服
地址:深圳市南山区西丽大学城创智工业园
电邮:jeky_zhao#qq.com
移动电话:139-2527-9053

Powered by 互联科技 X3.4© 2001-2213 极客世界.|Sitemap