Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


php - Make a JavaScript-aware Crawler

I want to write a script that crawls a website and returns the locations of all the banners shown on each page.

The banners usually come from known domains. However, they are not embedded in the HTML as a plain image or .swf file; most of the time JavaScript is used to display them.

So if a .swf file or image file is loaded from a banner domain, the script should return that URL.

Is that possible? And roughly how could I do it?

Ideally it would also return the landing page of each ad. How can I solve that?


1 Answer


You could use Selenium to open the pages in a real browser and then access the DOM.
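As a rough sketch of that approach (in Python rather than PHP, and not a definitive implementation): the browser is driven through Selenium, the list of fetched resources is read from the browser's Resource Timing API, and the result is filtered against the known banner domains. The domain list, the extension list, and all function names below are hypothetical placeholders. Note also that resources loaded inside iframes are generally not reported in the top-level page's timing entries, so ads served through iframes would need extra handling.

```python
from urllib.parse import urlparse

# Hypothetical banner domains; replace with the domains you actually know.
BANNER_DOMAINS = {"ads.example.com", "banners.example.net"}

# File types the question cares about (Flash and common image formats).
BANNER_EXTENSIONS = (".swf", ".gif", ".jpg", ".jpeg", ".png")


def is_banner_url(url, banner_domains=BANNER_DOMAINS):
    """Return True if url is a banner-like file on a known banner domain."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Match the domain itself or any of its subdomains.
    on_banner_domain = any(
        host == d or host.endswith("." + d) for d in banner_domains
    )
    # Only the path is checked, so query strings do not interfere.
    return on_banner_domain and parsed.path.lower().endswith(BANNER_EXTENSIONS)


def filter_banner_urls(urls, banner_domains=BANNER_DOMAINS):
    """Keep only banner URLs, preserving order and dropping duplicates."""
    seen = set()
    result = []
    for url in urls:
        if url not in seen and is_banner_url(url, banner_domains):
            seen.add(url)
            result.append(url)
    return result


def loaded_resource_urls(page_url):
    """Open page_url in a real browser and list the resources it fetched."""
    # Imported lazily so the filtering helpers work without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(page_url)
        # The Resource Timing API reports resources fetched by the page,
        # including those requested by JavaScript after the initial load.
        return driver.execute_script(
            "return performance.getEntriesByType('resource').map(e => e.name);"
        )
    finally:
        driver.quit()
```

With Selenium and a matching browser driver installed, `filter_banner_urls(loaded_resource_urls("http://example.com/"))` would return the banner URLs for that page. Ads that load late may require waiting before reading the entries.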

PhantomJS might also be worth a look - it's a headless browser built on WebKit (the engine behind Safari and, historically, Chrome).

However, neither of those solutions is pure PHP. If that's a requirement, you'll probably have to write your own JavaScript engine in PHP (which is not something I'd ask my worst enemy to do ;))

