I know this has been asked a thousand times before (apologies), but after searching SO, Google, etc. I have yet to find a conclusive answer.
Basically, I need a JS function which, when passed a string, identifies and extracts all URLs based on a regex, returning an array of all those found, e.g.:
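function findUrls(searchText) {
    var regex = ???; // the pattern I am looking for
    // declared with var so that result does not leak into the global scope
    var result = searchText.match(regex);
    // match() returns null when nothing is found; return false in that case
    return result || false;
}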
The function should be able to detect and return any potential URLs. I am aware of the inherent difficulties/issues with this (closing parentheses etc.), so I have a feeling the process needs to be:
Split the string (searchText) into distinct content chunks, each bounded on either side by nothing (the start or end of the string), a space or a carriage return, e.g. do a split.
For each content chunk that results from the split, see whether it fits the logic for a URL of any construction, namely: does it contain a period immediately followed by text (the one constant rule for qualifying a potential URL)?
The regex should check whether the period is immediately followed by other text of the type allowable for a TLD, directory structure and query string, and preceded by text of the type allowable for a URL.
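To make the steps above concrete, here is a rough sketch of what I have in mind. The name findUrlCandidates and the pattern itself are only my assumptions for illustration; it is deliberately loose, not a full URL grammar, and stripping trailing punctuation will wrongly trim a URL that legitimately ends in a closing parenthesis.

```javascript
// Hypothetical sketch of the split-then-test process described above.
// The pattern is an assumption, intentionally permissive (false positives expected).
function findUrlCandidates(searchText) {
  // Optional scheme, then at least two dot-separated labels,
  // then an optional path, query string or fragment.
  var candidate = /^(?:[a-z][a-z0-9+.-]*:\/\/)?[a-z0-9-]+(?:\.[a-z0-9-]+)+(?:[\/?#]\S*)?$/i;
  var found = [];

  // Step 1: split on any run of whitespace (spaces, tabs, line breaks).
  var chunks = searchText.split(/\s+/);

  for (var i = 0; i < chunks.length; i++) {
    // Strip punctuation that commonly wraps or trails a URL in prose.
    var chunk = chunks[i]
      .replace(/^[("'<\[]+/, '')
      .replace(/[)"'>\].,;:!?]+$/, '');

    // Steps 2 and 3: keep the chunk if a period is preceded and followed
    // by characters plausible for a host name.
    if (candidate.test(chunk)) {
      found.push(chunk);
    }
  }
  return found;
}

var sample = 'Visit http://www.google.com, google.com or will.i.am (see ftp.google.com/a?b=1&c=2).';
console.log(findUrlCandidates(sample));
```

Unlike findUrls above, this sketch returns an empty array rather than false when nothing is found, and it keeps the query string because everything after the first /, ? or # up to the next whitespace is treated as part of the candidate.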
I am aware false positives may result; however, any returned values will then be checked with a call to the URL itself, so these can be ignored. The other functions I have found often don't return the URL's query string, if present.
From a block of text, the function should thus be able to return any type of URL, even if it means identifying will.i.am as a valid one!
e.g. http://www.google.com, google.com, www.google.com, http://google.com, ftp.google.com, https:// etc., and any derivation thereof with a query string, should be returned.
Many thanks, and apologies again if this exists elsewhere on SO, but my searches haven't returned it.