My answer builds on the technique suggested here.
The tricky part is formulating the correct query string:
http://en.wikipedia.org/w/api.php?action=query&generator=random&prop=extracts&exchars=500&format=json&callback=onWikipedia
generator=random
selects a random page
prop=extracts
and exchars=500
retrieves a 500 character extract
format=json
returns JSON-formatted data
callback=
causes that data to be wrapped in a function call so it can be treated like any other <script>
and injected into your page (see JSONP), thus bypassing cross-domain barriers.
requestid
can optionally be added, with a new value each time, to avoid stale results from the browser cache (required in IE9)
The page served by the query is something that looks like this (I've added whitespace for readability):
onWikipedia(
{"query":
{"pages":
{"12362520":
{"pageid":12362520,
"ns":0,
"title":"Power Building",
"extract":"<p>The <b>Power Building</b> is a historic commercial building in
the downtown of Cincinnati, Ohio, United States. Built in 1903, it
was designed by Harry Hake. It was listed on the National Register
of Historic Places on March 5, 1999. One week later, a group of
buildings in the northeastern section of downtown was named a
historic district, the Cincinnati East Manufacturing and Warehouse
District; the Power Building is one of the district's contributing
properties.</p>
<h2> Notes</h2>"
} } } }
)
Of course you'll get a different article each time.
Here's a full, working example which you can try out on JSBin.
<HTML><BODY>
<p><textarea id="textbox" style="width:350px; height:150px"></textarea></p>
<p><button type="button" id="button" onclick="startFetch(100, 500)">
Fetch random Wikipedia extract</button></p>
<script type="text/javascript">
var textbox = document.getElementById("textbox");
var button = document.getElementById("button");
var tempscript = null, minchars, maxchars, attempts;
function startFetch(minimumCharacters, maximumCharacters, isRetry) {
if (tempscript) return; // a fetch is already in progress
if (!isRetry) {
attempts = 0;
minchars = minimumCharacters; // save params in case retry needed
maxchars = maximumCharacters;
button.disabled = true;
button.style.cursor = "wait";
}
tempscript = document.createElement("script");
tempscript.type = "text/javascript";
tempscript.id = "tempscript";
tempscript.src = "http://en.wikipedia.org/w/api.php"
+ "?action=query&generator=random&prop=extracts"
+ "&exchars="+maxchars+"&format=json&callback=onFetchComplete&requestid="
+ Math.floor(Math.random()*999999).toString();
document.body.appendChild(tempscript);
// onFetchComplete invoked when finished
}
function onFetchComplete(data) {
document.body.removeChild(tempscript);
tempscript = null
var s = getFirstProp(data.query.pages).extract;
s = htmlDecode(stripTags(s));
if (s.length > minchars || attempts++ > 5) {
textbox.value = s;
button.disabled = false;
button.style.cursor = "auto";
} else {
startFetch(0, 0, true); // retry
}
}
function getFirstProp(obj) {
for (var i in obj) return obj[i];
}
// This next bit borrowed from Prototype / hacked together
// You may want to replace with something more robust
function stripTags(s) {
return s.replace(/<w+(s+("[^"]*"|'[^']*'|[^>])+)?>|</w+>/gi, "");
}
function htmlDecode(input){
var e = document.createElement("div");
e.innerHTML = input;
return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
}
</script>
</BODY></HTML>
One downside of generator=random
is you often get talk pages or generated content that are not actual articles. If anyone can improve the query string to limit it to quality articles, that would be great!
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…