I am interested in doing web crawling. I was looking at Solr.
solr
Does Solr do web crawling, or what are the steps to do web crawling?
Solr 5+ DOES in fact now do basic web crawling: its bundled post tool can fetch web pages and follow links recursively. http://lucene.apache.org/solr/
Older Solr versions do not crawl the web on their own. Historically, Solr is a search server that provides full-text search capabilities, and it is built on top of Lucene.
If you need to crawl web pages with a separate project and feed the results into Solr, then you have a number of options, including:
If you want to make use of the search facilities provided by Lucene or Solr, you'll need to build indexes from the web crawl results.
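As a rough illustration of building an index from crawl results, here is a minimal sketch using only the Python standard library. It turns one fetched HTML page into a JSON document and posts it to Solr's JSON update handler. The field names `id`, `title`, and `content`, the helper names `PageParser`, `build_solr_doc`, and `post_to_solr`, and the core name in the usage note are all assumptions for illustration; your schema and setup may differ.

```python
import json
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects the title, visible text, and outgoing links of one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title_parts = []
        self.text_parts = []
        self.links = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def build_solr_doc(url, html):
    """Return (document, links) for one crawled page.

    The field names below are hypothetical; they must match your Solr schema.
    """
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    doc = {
        "id": url,
        "title": "".join(parser.title_parts).strip(),
        "content": " ".join(parser.text_parts),
    }
    return doc, parser.links


def post_to_solr(update_url, docs):
    """POST a list of documents as a JSON array to a Solr update handler."""
    body = json.dumps(docs).encode("utf-8")
    request = urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Assuming a local Solr instance with a core named `mycore` (a made-up name), you would call something like `post_to_solr("http://localhost:8983/solr/mycore/update?commit=true", [doc])`. The links returned by `build_solr_doc` are what a crawler would queue up to visit next. A real crawler also needs politeness delays, robots.txt handling, and duplicate detection, which this sketch leaves out.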
See this also:
Lucene crawler (it needs to build lucene index)