I'm working in PHP and have written a function that extracts the links from a submitted URL.
The code works, but it also picks up links that don't lead to a page, such as mailto: and javascript:void(0).
How can I skip <a> tags whose href starts with "mailto:", "tel:" or "javascript:"?
Thanks in advance.
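One possible approach, sketched here and not a tested fix, is to look at the href's scheme before building the absolute URL. The strpos() test in the function below cannot catch these links, for two reasons. First, it runs on the result of parse_url($href, PHP_URL_PATH), which keeps only the part after the scheme (for example "someone@example.com" for "mailto:someone@example.com"). Second, strpos() returns an integer or false, never true. The helper name is_followable_href() is hypothetical. It reads the scheme with a regular expression based on the RFC 3986 scheme grammar and accepts only http and https, so other schemes such as data: are rejected too.

```php
<?php
// Hypothetical helper: returns true only for hrefs that point to a crawlable page.
function is_followable_href(string $href): bool
{
    // Ignore whitespace around the attribute value.
    $href = trim($href);

    // Empty hrefs and same-page fragments ("#", "#top") lead nowhere new.
    if ($href === '' || $href[0] === '#') {
        return false;
    }

    // RFC 3986 scheme: a letter, then letters, digits, "+", "-" or ".", then ":".
    if (preg_match('/^([a-z][a-z0-9+.-]*):/i', $href, $m)) {
        // Allow-list: only web schemes pass; mailto:, tel:, javascript: etc. do not.
        return in_array(strtolower($m[1]), ['http', 'https'], true);
    }

    // No scheme: a relative or protocol-relative URL such as "/about" or "//cdn.example.com/x".
    return true;
}
```

In the loop, the check would go right after reading the attribute and before $new_url is built, e.g. `if (!is_followable_href($link->getAttribute('href'))) { continue; }`.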
```php
function check_all_links($url) {
    $doc = new DOMDocument();
    @$doc->loadHTML(file_get_contents($url)); // @ hides warnings about malformed HTML
    $linklist = $doc->getElementsByTagName("a");

    $full_url = goodUrl($url); // goodUrl() is my own helper, defined elsewhere
    $scheme   = parse_url($url, PHP_URL_SCHEME);
    $links    = array();
    $linkNo   = array();

    if ($scheme == "http" || $scheme == "https") {
        foreach ($linklist as $link) {
            $href     = strtolower($link->getAttribute('href'));
            $page_url = parse_url($href, PHP_URL_PATH);
            $new_url  = $scheme . "://" . $full_url . '/' . ltrim($page_url, '/');

            // check if href has mailto: or # or javascript: or tel:
            // (never matches: strpos() returns an int or false, never true)
            if (strpos($page_url, "tel:") === true) {
                continue;
            }

            // Skip URLs that were already collected.
            if (!in_array($new_url, $linkNo)) {
                echo $new_url . "<br>";
                $linkNo[] = $new_url;
                $links[] = array('Links' => $new_url);
            }
        }
    }

    return $links;
}
```
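An alternative sketch, assuming the page HTML has already been fetched as a string (for example with file_get_contents($url), as in the function above): DOMXPath can select only the anchors whose href does not begin with one of the unwanted prefixes, so the loop needs no per-link check. The function name followable_anchor_hrefs() is hypothetical. XPath 1.0 has no lower-case() function, so translate() is used to make the prefix test case-insensitive.

```php
<?php
// Hypothetical function: returns the href of every <a> that is not a
// mailto:, tel:, javascript:, fragment-only or empty link.
function followable_anchor_hrefs(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of using @, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Trim the href with normalize-space(), then fold ASCII letters to lower case.
    $lowerHref = "translate(normalize-space(@href), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')";
    $query = "//a[@href"
        . " and not(starts-with($lowerHref, 'mailto:'))"
        . " and not(starts-with($lowerHref, 'tel:'))"
        . " and not(starts-with($lowerHref, 'javascript:'))"
        . " and not(starts-with($lowerHref, '#'))"
        . " and $lowerHref != '']";

    $hrefs = [];
    foreach ($xpath->query($query) as $a) {
        // Return the original attribute value, in document order.
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}
```

This is a deny-list, so schemes that are not listed (data:, ftp: and so on) still get through; an allow-list check on http/https in PHP is stricter if that matters.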
question from:
https://stackoverflow.com/questions/65871012/how-to-avoid-url-with-mailto