
domain-crawling: a fully automatic script for crawling domain registration information, easy to extend and thoroughly commented


Project name:

domain-crawling

Project URL:

https://gitee.com/ainilili/domain-crawling

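Project description:

Purpose

Written in my spare time, this script checks the registration status of every domain name of a given length built from the characters [a-z0-9], and writes the unregistered domains to a specified file for you to analyze yourselves!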
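The candidate set described above can be sketched as follows. This is only a minimal illustration, not the script's actual code: the helper name `candidate_domains` and the default suffix `com` are assumptions made for the example.

```python
import itertools
import string

# Characters allowed in the generated names: a-z followed by 0-9.
ALPHABET = string.ascii_lowercase + string.digits


def candidate_domains(length, suffix="com"):
    """Yield every name of the given length over a-z0-9, with the suffix appended.

    Hypothetical helper for illustration; not taken from the script's source.
    """
    for chars in itertools.product(ALPHABET, repeat=length):
        yield "".join(chars) + "." + suffix


if __name__ == "__main__":
    # Print the first three candidates of length 4.
    for name in itertools.islice(candidate_domains(4), 3):
        print(name)
```

Using a generator keeps memory use flat even though the number of candidates grows as 36 to the power of the length.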

Language

Although I am a Java developer, this script is written in Python.

Usage

First clone domain-crawling to your machine:

git clone https://gitee.com/ainilili/domain-crawling.git

Change into the directory to run the Python script:

cd domain-crawling

Use the --help option to view the help:

shell>> py domain-crawling.py -h
usage: domain-crawling.py [-h] [-p PATH] [-l LENGTH] [-o {y,n}] [-d DELAYED]

Nico domain name crawler script

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  The available domain name storage path after
                        detection.
  -l LENGTH, --length LENGTH
                        The length of the domain you want to detect is all
                        combinations of a-z0-9.
  -o {y,n}, --openproxy {y,n}
                        Open the IP proxy mode.
  -d DELAYED, --delayed DELAYED
                        The interval between each climb, Unit s
  -s SUFFIX, --suffix SUFFIX
                        Domain suffix
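For readers who want to extend the command line, the options in the help text could be declared with argparse roughly as below. This is a reconstruction from the help output, not the script's source: the argument types (`int`, `float`) and the `com` default for the suffix are assumptions, while the defaults for length, proxy mode, and delay follow the values stated in this README.

```python
import argparse


def build_parser():
    """Rebuild a parser that matches the help text (a reconstruction, not the original)."""
    parser = argparse.ArgumentParser(
        prog="domain-crawling.py",
        description="Nico domain name crawler script",
    )
    parser.add_argument("-p", "--path", default=None,
                        help="The available domain name storage path after detection.")
    parser.add_argument("-l", "--length", type=int, default=4,
                        help="The length of the domain you want to detect "
                             "is all combinations of a-z0-9.")
    parser.add_argument("-o", "--openproxy", choices=["y", "n"], default="n",
                        help="Open the IP proxy mode.")
    parser.add_argument("-d", "--delayed", type=float, default=0.1,
                        help="The interval between each climb, Unit s")
    # The default suffix is an assumption inferred from the .com examples.
    parser.add_argument("-s", "--suffix", default="com", help="Domain suffix")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the demo does not depend on sys.argv.
    args = build_parser().parse_args(["--length", "5", "--suffix", "cn"])
    print(args.length, args.suffix, args.openproxy, args.delayed)
```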

Start the script directly:

shell>> py domain-crawling.py

Start the script with a specific domain length:

shell>> py domain-crawling.py --length 4

The crawled domain names now have a length of 4 (which is also the default), for example:

aaaa.com
bbbb.com
cccc.com

Specify the domain suffix to crawl (for example cn):

shell>> py domain-crawling.py --suffix cn

To enable the proxy (disabled by default):

shell>> py domain-crawling.py --openproxy y

To change the file the results are saved to (the default is <timestamp>.txt):

shell>> py domain-crawling.py --path data1
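As an illustration of the storage behaviour, the sketch below builds a timestamp-based default file name and appends one available domain per line. The helper names and the exact timestamp format (whole Unix seconds) are assumptions; the script's own naming may differ.

```python
import os
import tempfile
import time


def default_output_path(now=None):
    """Return '<unix timestamp>.txt'; the exact timestamp format is an assumption."""
    stamp = int(time.time() if now is None else now)
    return f"{stamp}.txt"


def record_available(domain, path):
    """Append one available domain per line to the result file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(domain + "\n")


if __name__ == "__main__":
    # Write into a temporary directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, default_output_path())
        record_available("aaaa.com", target)
        with open(target, encoding="utf-8") as handle:
            print(handle.read(), end="")
```

Opening the file in append mode means an interrupted run keeps the results found so far.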

Set the interval between requests (the default is 0.1s) to 0.5s:

shell>> py domain-crawling.py --delayed 0.5
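The interval has a large effect on total run time, because the number of candidates is 36 raised to the domain length. The sketch below estimates a lower bound, assuming the script checks candidates one at a time and waits the full interval after each one (an assumption about its behaviour); request latency is ignored.

```python
ALPHABET_SIZE = 36  # 26 letters (a-z) plus 10 digits (0-9)


def candidate_count(length):
    """Number of distinct names of the given length over a-z0-9."""
    return ALPHABET_SIZE ** length


def minimum_runtime_hours(length, delay_seconds):
    """Lower bound on run time: one delay per candidate, ignoring request latency."""
    return candidate_count(length) * delay_seconds / 3600


if __name__ == "__main__":
    for delay in (0.1, 0.5):
        hours = minimum_runtime_hours(4, delay)
        print(f"length=4 delay={delay}s: {candidate_count(4)} names, "
              f"at least {hours:.1f} h")
```

For length 4 this gives 1,679,616 names, so roughly 46.7 hours at 0.1s and 233.3 hours at 0.5s.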

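Notes

The author uses the data5u crawler proxy service. If you want to enable proxy mode, the orderId bundled with the script has most likely expired long ago, so please register your own and use that. If you have a better proxy, you can modify the source code to swap it in.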
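If you do swap in your own proxy, one way to route requests through it with only the Python standard library is sketched below. How the script obtains addresses from data5u is not shown in this README, so this is not its implementation; the proxy address and the helper name `build_proxy_opener` are hypothetical.

```python
import urllib.request

# Hypothetical proxy address; replace it with one issued by your own provider.
PROXY_ADDRESS = "http://127.0.0.1:8080"


def build_proxy_opener(proxy_address):
    """Return an opener that routes HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler(
        {"http": proxy_address, "https": proxy_address}
    )
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    opener = build_proxy_opener(PROXY_ADDRESS)
    # opener.open(url, timeout=10) would now send the request via the proxy.
    print(type(opener).__name__)
```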
