Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.3k views
in Technique[技术] by (71.8m points)

json - Web Scraping with VBA (when HTML <> DOM)

I have having a horrible time scraping this particular webpage's data... Basically I can see the information that I need in the 'DOM Explorer' when I load the URL in a browser and hit F12 manually, but when I programmatically attempt to do the same (see below) the HTMLDoc does not contain the same information that I can see in the 'DOM Explorer'...

Public Sub testCode()

    Dim IE As SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument
    Set IE = New SHDocVw.InternetExplorer
    With IE
        .navigate "https://www.wunderground.com/cgi-bin/findweather/getForecast?query=EIDW"
        While .Busy = True Or .ReadyState <> READYSTATE_COMPLETE: Wend
        Set HTMLDoc = .Document
    End With

End Sub

Can someone please help me access the information in the 'DOM Explorer'? I know that HTML is not always what you see in a browser, but rather the instructions to create what you can see in the browser, but then there must be a way to programmatically create the DOM from the HTML...

Also, I believe that the data I am after is being generated by scripts or iFrames, but I have been unable to generate the data I am looking for from messing around with either....

UPDATE

See picture of DOM Explorer below:

DOM

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

The outline:

Actually web browser do almost the same stuff each time you open that webpage.

You may use the below VBA code to parse response and output result. Import JSON.bas module into the VBA project for JSON processing.

Sub TestScrapeWunderground()

    Dim sContent As String
    Dim sKey As String
    Dim sLocation As String
    Dim vJSON As Variant
    Dim sState As String
    Dim oDays As Object
    Dim oHours As Object
    Dim vDay As Variant
    Dim vHour As Variant
    Dim aRows() As Variant
    Dim aHeader() As Variant

    ' GET XHR to retrieve location and key
    With CreateObject("MSXML2.ServerXMLHTTP")
        .Open "GET", "https://www.wunderground.com/cgi-bin/findweather/getForecast?query=EIDW", False
        .Send
        sContent = .responseText
    End With
    ' Extract location and key from HTML content
    sLocation = Split(Split(sContent, "var query = 'zmw:' + '", 2)(1), "'", 2)(0)
    sKey = Split(Split(sContent, vbTab & "k: '", 2)(1), "'", 2)(0)
    ' GET XHR to retrieve JSON data
    With CreateObject("MSXML2.ServerXMLHTTP")
        .Open "GET", "https://api-ak-aws.wunderground.com/api/" & sKey & "/forecast10day/hourly10day/labels/conditions/astronomy10day/lang:en/units:metric/v:2.0/bestfct:1/q/zmw:" & sLocation & ".json", False
        .Send
        sContent = .responseText
    End With
    ' Parse JSON response to data structure
    JSON.Parse sContent, vJSON, sState
    ' Populate dictionaries with daily and hourly forecast data
    Set oDays = CreateObject("Scripting.Dictionary")
    Set oHours = CreateObject("Scripting.Dictionary")
    For Each vDay In vJSON("forecast")("days")
        oDays(vDay("summary")) = ""
        For Each vHour In vDay("hours")
            oHours(vHour) = ""
        Next
    Next
    ' Convert daily forecast data to arrays
    JSON.ToArray oDays.Keys(), aRows, aHeader
    ' Output daily forecast data to table
    With Sheets(1)
        .Cells.Delete
        OutputArray .Cells(1, 1), aHeader
        Output2DArray .Cells(2, 1), aRows
        .Columns.AutoFit
    End With
    ' Convert hourly forecast data to arrays
    JSON.ToArray oHours.Keys(), aRows, aHeader
    ' Output hourly forecast data to table
    With Sheets(2)
        .Cells.Delete
        OutputArray .Cells(1, 1), aHeader
        Output2DArray .Cells(2, 1), aRows
        .Columns.AutoFit
    End With
    ' Convert response data to arrays
    JSON.ToArray Array(vJSON("response")), aRows, aHeader
    ' Output response transposed data to table
    With Sheets(3)
        .Cells.Delete
        Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)
        Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)
        .Columns.AutoFit
    End With
    ' Convert current data to arrays
    JSON.ToArray Array(vJSON("current_observation")), aRows, aHeader
    ' Output current transposed data to table
    With Sheets(4)
        .Cells.Delete
        Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)
        Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)
        .Columns.AutoFit
    End With
    ' Populate dictionary with daily astronomy data
    Set oDays = CreateObject("Scripting.Dictionary")
    For Each vDay In vJSON("astronomy")("days")
        oDays(vDay) = ""
    Next
    ' Convert daily astronomy data to arrays
    JSON.ToArray oDays.Keys(), aRows, aHeader
    ' Output daily astronomy transposed data to table
    With Sheets(5)
        .Cells.Delete
        Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)
        Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)
        .Columns.AutoFit
    End With
    ' Convert hourly history data to arrays
    JSON.ToArray vJSON("history")("days")(0)("hours"), aRows, aHeader
    ' Output hourly history data to table
    With Sheets(6)
        .Cells.Delete
        OutputArray .Cells(1, 1), aHeader
        Output2DArray .Cells(2, 1), aRows
        .Columns.AutoFit
    End With
    MsgBox "Completed"

End Sub

Sub OutputArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize( _
                1, _
                UBound(aCells) - LBound(aCells) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

Sub Output2DArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize( _
                UBound(aCells, 1) - LBound(aCells, 1) + 1, _
                UBound(aCells, 2) - LBound(aCells, 2) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

The second XHR returns JSON data, to make it clear how the necessary data is being extracted from it, you may save the JSON to file, copy the contents and paste it to any JSON viewer for further study. I use online tool http://jsonviewer.stack.hu, root element structure is shown below:

JSON structure

There are 6 main sections, the relevant part of the data is extracted and output to 6 worksheets (which have to be created manually before run):

Sheet1 - Daily forecast
Sheet2 - Horly forecast
Sheet3 - Response data (transposed)
Sheet4 - Current data (transposed)
Sheet5 - Astronomy (transposed)
Sheet6 - Hourly history data

Having that example you can extract the data you need from that JSON response.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...