Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
169 views
in Technique[技术] by (71.8m points)

java - How to convert HTML <table> to a 2D array


Lets say I copy a complete HTML table (when each and every tr and td has extra attributes) into a String. How can I take all the contents (what is between the tags) and create an 2D array that is organized like the original table?

For example for this table:

<table border="1">
    <tr align= "center">
        <td align="char">TD1</td>
        <td>td1</td>
        <td align="char">TD1</td>
        <td>td1</td>
    </tr>
    <tr>
        <td>TD2</td>
        <td>tD2</td>
        <td class="bold>Td2</td>
        <td>td2</td>
    </tr>
</table>

I want this array: array

PS: I know I can use regex but it would be extremely complicated. I want a tool like JSoup that can do all the work automatically without much code writing

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

This is how it could be done using JSoup (srsly, don't use regexp for HTML).

Document doc = Jsoup.parse(html);
Elements tables = doc.select("table");
for (Element table : tables) {
    Elements trs = table.select("tr");
    String[][] trtd = new String[trs.size()][];
    for (int i = 0; i < trs.size(); i++) {
        Elements tds = trs.get(i).select("td");
        trtd[i] = new String[tds.size()];
        for (int j = 0; j < tds.size(); j++) {
            trtd[i][j] = tds.get(j).text(); 
        }
    }
    // trtd now contains the desired array for this table
}

Also, the class attribute value is not closed properly here in your example:

<td class="bold>Td2</td>

it should be

<td class="bold">Td2</td>

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...