Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
1.4k views
in Technique by (71.8m points)

powershell - Use GetElementsByClassName in a script

I'm trying to write a PowerShell script that gets the text of every element with the class "newstitle" on a website.

This is what I have:

function check-krpano {
    $geturl=Invoke-WebRequest http://krpano.com/news/
    $news=$geturl.parsedhtml.body.GetElementsByClassName("newstitle")[0]
    Write-Host  "$news"
}

check-krpano

It obviously needs much more tweaking, but so far, it doesn't work.

I managed to write a script using GetElementById, but I don't know the syntax for GetElementsByClassName, and to be honest, I haven't been able to find much information about it.

NOTE:

I've accepted the answer that correctly addresses my question, but it isn't the solution I chose to use in my script.

Although I was able to get the content of tags with a given class using two methods, both were much slower than searching the links.

Here is the output using Measure-Command:

  • Search for divs with class 'newstitle' using ParsedHtml.body -> 29.6 seconds
  • Search for divs with class 'newstitle' using AllElements -> 10.4 seconds
  • Search for links whose 'href' attribute contains #news -> 2.4 seconds

So I have marked the answer that uses the Links method as useful.
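
For anyone who wants to repeat the comparison, here is a minimal sketch of how timings like these could be collected with Measure-Command. The helper name Measure-Approach and the Start-Sleep workloads are placeholders I made up so the snippet runs without network access; the real script blocks would hold the ParsedHtml, AllElements and Links queries against $geturl.

```powershell
# Hypothetical helper: times each named script block and returns the results
# sorted from fastest to slowest.
function Measure-Approach {
    param(
        [Parameter(Mandatory)]
        [System.Collections.IDictionary]$Approaches   # name -> script block
    )

    $results = foreach ($name in $Approaches.Keys) {
        # Measure-Command returns a TimeSpan for the script block it runs.
        $elapsed = Measure-Command -Expression $Approaches[$name]
        [pscustomobject]@{
            Approach = $name
            Seconds  = [math]::Round($elapsed.TotalSeconds, 1)
        }
    }
    $results | Sort-Object Seconds
}

# Placeholder workloads (assumption); swap in the real queries against $geturl.
$timings = Measure-Approach -Approaches ([ordered]@{
    'Slow placeholder' = { Start-Sleep -Milliseconds 300 }
    'Fast placeholder' = { Start-Sleep -Milliseconds 50 }
})
$timings | Format-Table -AutoSize
```

Timing each approach a few times and comparing the spread would give a fairer picture than a single run, since the first ParsedHtml access tends to carry one-off start-up cost.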

This is my final script:

function check-krpano {
    Clear-Host
    $geturl=Invoke-WebRequest http://krpano.com/news
    $news = ($geturl.Links | Where href -match '#news\d+' | where class -NotMatch 'moreinfo+' )
    $news.outertext | Select-Object -First 5
}

check-krpano
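
To see what the two filters do without hitting the site, here is a small sketch that runs them over made-up link objects. The sample hrefs, classes and texts are invented for illustration, and it assumes the intended href pattern is '#news\d+' (the text "#news" followed by one or more digits). The trailing + in 'moreinfo+' only repeats the final "o", so plain 'moreinfo' matches the same links.

```powershell
# Made-up stand-ins for the objects in $geturl.Links (assumption).
$links = @(
    [pscustomobject]@{ href = '#news12'; class = '';         outerText = 'First headline' }
    [pscustomobject]@{ href = '#news12'; class = 'moreinfo'; outerText = 'more info' }
    [pscustomobject]@{ href = '#news13'; class = '';         outerText = 'Second headline' }
    [pscustomobject]@{ href = 'docu/';   class = '';         outerText = 'Documentation' }
)

$titles = $links |
    Where-Object { $_.href -match '#news\d+' } |       # keep anchors such as #news12
    Where-Object { $_.class -notmatch 'moreinfo' } |    # drop the "more info" links
    Select-Object -ExpandProperty outerText -First 5    # at most five titles

$titles
```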

Fight dragons for too long and you become a dragon yourself; gaze into the abyss for too long and the abyss gazes back…

1 Answer

0 votes
by (71.8m points)

If you figure out how to get GetElementsByClassName to work, I'd like to know. I just ran into this yesterday and ran out of time so I came up with a workaround:

$geturl.ParsedHtml.body.getElementsByTagName('div') | 
    Where {$_.getAttributeNode('class').Value -eq 'newstitle'}
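
One caveat with the -eq comparison is that it only matches elements whose class attribute is exactly 'newstitle'; an element with several classes would be skipped. Below is a rough sketch of a token-based check, run over made-up objects shaped like the entries of the AllElements property mentioned in the question. Test-HasClass is a hypothetical helper, not a built-in cmdlet, and the sample data is invented. As far as I know, ParsedHtml and AllElements are only available in Windows PowerShell, not in PowerShell 6 and later.

```powershell
# Hypothetical helper: true when $ClassAttribute contains $Name as a whole token.
function Test-HasClass {
    param(
        [string]$ClassAttribute,
        [Parameter(Mandatory)][string]$Name
    )
    # Split on whitespace so 'newstitle first' matches but 'newstitles' does not.
    # Note that -contains compares case-insensitively.
    ($ClassAttribute -split '\s+') -contains $Name
}

# Made-up elements standing in for $geturl.AllElements (assumption).
$elements = @(
    [pscustomobject]@{ tagName = 'DIV'; class = 'newstitle';       innerText = 'Release A' }
    [pscustomobject]@{ tagName = 'DIV'; class = 'newstitle first'; innerText = 'Release B' }
    [pscustomobject]@{ tagName = 'DIV'; class = 'newstext';        innerText = 'Body text' }
)

$found = $elements |
    Where-Object { Test-HasClass -ClassAttribute $_.class -Name 'newstitle' } |
    ForEach-Object { $_.innerText }

$found
```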
