selenium webdriver - ChromeDriver --print-to-pdf after page load

Question

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

According to the docs, Chrome can be started in headless mode with --print-to-pdf in order to export a PDF of a web page. This works well for pages accessible with a GET request.
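For reference, the GET-only workflow the docs describe amounts to launching Chrome with a couple of flags. Here is a minimal sketch of doing that from Java with ProcessBuilder. The class and method names are hypothetical, and the Chrome binary name (google-chrome, chrome, chrome.exe, ...) depends on the platform. Chrome fetches the URL itself in a fresh process, so nothing done earlier in a Selenium session carries over.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper: runs headless Chrome with --print-to-pdf on a single URL. */
final class HeadlessPdf {
  private HeadlessPdf() {}

  // Builds the argument list; the binary name is platform-specific.
  static List<String> command(String chromeBinary, Path pdfOut, String url) {
    return List.of(
        chromeBinary,
        "--headless",                                 // --print-to-pdf only works headless
        "--print-to-pdf=" + pdfOut.toAbsolutePath(),  // where Chrome writes the PDF
        url);                                         // Chrome loads this with a plain GET
  }

  // Starts Chrome, waits for it to exit and returns its exit code.
  static int run(String chromeBinary, Path pdfOut, String url)
      throws IOException, InterruptedException {
    Process process = new ProcessBuilder(command(chromeBinary, pdfOut, url))
        .inheritIO()   // show Chrome's console output in ours
        .start();
    return process.waitFor();
  }
}
```

Usage would be something like HeadlessPdf.run("google-chrome", Path.of("page.pdf"), "https://example.com").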

I'm trying to find a print-to-PDF solution that lets me export a PDF after performing several navigation steps within Chrome. For example: open google.com, enter a search query, click the first result link, export to PDF.

Looking at the (very limited) docs and samples available, I couldn't find a way to instruct Chrome to export a PDF after a page has loaded. I'm using the Java ChromeDriver.

One possible solution that doesn't involve Chrome is to use a tool like wkhtmltopdf. Going down this path would force me to do the following before sending the HTML to the tool:

- save the HTML to a local file
- traverse the DOM and download all linked files (images, JS, CSS, etc.)
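To give an idea of what the second step involves, here is a rough, dependency-free sketch that finds quoted src/href values with a regular expression and resolves them against the page URL with java.net.URI. The class name is hypothetical, and the regex is only a stand-in for a real HTML parser. Downloading the files and rewriting their paths for wkhtmltopdf would still be left to do.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: list the absolute URLs of resources an HTML page references. */
final class ResourceLinks {
  private ResourceLinks() {}

  // Naive: matches src="..." or href='...' (either quote style); not a real HTML parser.
  private static final Pattern ATTR =
      Pattern.compile("(?i)\\b(?:src|href)\\s*=\\s*([\"'])(.*?)\\1");

  static Set<URI> collect(String html, URI pageUrl) {
    Set<URI> urls = new LinkedHashSet<>();  // keeps document order, drops duplicates
    Matcher m = ATTR.matcher(html);
    while (m.find()) {
      String value = m.group(2).trim();
      if (value.isEmpty() || value.startsWith("#") || value.startsWith("data:")
          || value.startsWith("javascript:")) {
        continue;  // nothing to download for these
      }
      try {
        urls.add(pageUrl.resolve(value));  // relative -> absolute against the page URL
      } catch (IllegalArgumentException e) {
        // skip values that are not valid URIs (e.g. unescaped spaces)
      }
    }
    return urls;
  }
}
```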

I'd rather not take this path, as I assume it would require a lot of tinkering on my part to get the downloaded files' paths right so that wkhtmltopdf can read them.

Is there a way to instruct Chrome to print to PDF, but only after a page loads?

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:36:18+0000

As there are no answers, I will explain my workaround. Instead of trying to find out how to make Chrome print the current page, I went down another route.

In this example, we'll export Google's results page for the query 'example' to PDF:

1. Navigate with driver.get("https://www.google.com"), enter the query 'example' and click 'Google Search'.
2. Wait for the results page to load.
3. Retrieve the page source with driver.getPageSource().
4. Parse the source with e.g. Jsoup and remap all relative links to point to an endpoint defined for this purpose (explained below), e.g. localhost:8080. The link './style.css' would become 'http://localhost:8080/style.css'.
5. Save the HTML to a file, e.g. named 'query-example'.
6. Run chrome --headless --print-to-pdf http://localhost:8080/search/html?id=query-example
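Step 4 could look roughly like this. The answer uses Jsoup; to keep the sketch dependency-free it swaps in a regular expression over quoted src/href attributes, which is much less robust than a real HTML parser. The class name is hypothetical. Resolving against http://localhost:8080/ assumes, as in the Google example, that the original page sits in the site's root directory, so './style.css' on google.com/search means google.com/style.css.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for the Jsoup step: point relative src/href values at the local controller. */
final class LinkRemapper {
  private LinkRemapper() {}

  // group 1 = 'href=' / 'src=' prefix, group 2 = quote char, group 3 = attribute value
  private static final Pattern ATTR =
      Pattern.compile("(?i)(\\b(?:src|href)\\s*=\\s*)([\"'])(.*?)\\2");

  static String remap(String html, URI proxyBase) {
    Matcher m = ATTR.matcher(html);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String value = m.group(3);
      String replacement = value;
      if (isRelative(value)) {
        try {
          // './style.css' against http://localhost:8080/ -> http://localhost:8080/style.css
          replacement = proxyBase.resolve(value).toString();
        } catch (IllegalArgumentException e) {
          // leave values that are not valid URIs untouched
        }
      }
      m.appendReplacement(out,
          Matcher.quoteReplacement(m.group(1) + m.group(2) + replacement + m.group(2)));
    }
    m.appendTail(out);
    return out.toString();
  }

  // Relative = no scheme, not protocol-relative ("//host/..."), not an in-page fragment.
  private static boolean isRelative(String value) {
    if (value.isEmpty() || value.startsWith("#") || value.startsWith("//")) {
      return false;
    }
    return !value.matches("(?is)^[a-z][a-z0-9+.-]*:.*");
  }
}
```

For example, remap(pageSource, URI.create("http://localhost:8080/")) turns href="./style.css" into href="http://localhost:8080/style.css" and leaves absolute links such as https://example.org/ alone.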

What happens is that Chrome requests the HTML from our controller. Because we remapped the relative links, every resource referenced in that HTML is also requested from our controller, which in turn redirects the request to the resource's real location on google.com. Below is an example Spring controller; note that it is incomplete and is meant only as guidance.

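```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.servlet.http.HttpServletRequest;   // jakarta.servlet on Spring Boot 3
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class InternationalOffloadRestController {
  // Directory holding the saved HTML files
  private static final Path HTML_DIR = Paths.get("location of the HTML file");

  // Serves the saved page: Chrome loads /search/html?id=query-example
  @GetMapping("/search/html")
  public String getHtml(@RequestParam("id") String id) throws IOException {
    return new String(Files.readAllBytes(HTML_DIR.resolve(id)), StandardCharsets.UTF_8);
  }

  // Catch-all for the remapped links: redirect each request to google.com
  @RequestMapping("/**")
  public void forward(HttpServletRequest request, HttpServletResponse response) {
    // getRequestURI() and getQueryString() are still percent-encoded; concatenating
    // avoids the multi-argument URI constructor re-encoding '%'
    String query = request.getQueryString();
    String target = "https://google.com" + request.getRequestURI()
        + (query != null ? "?" + query : "");
    response.setHeader("Location", target);
    response.setStatus(HttpServletResponse.SC_MOVED_PERMANENTLY);
  }
}
```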
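Steps 5 and 6 could be glued together roughly as follows. This is a hypothetical sketch: the class name, the UTF-8 encoding and the http://localhost:8080 address are assumptions, with the URL chosen to match the /search/html mapping in the controller. The id is URL-encoded so that names with spaces or '&' survive the query string.

```java
import java.io.IOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical glue for steps 5 and 6: save the remapped HTML, then point Chrome at the controller. */
final class PrintJob {
  private PrintJob() {}

  // Assumed controller address (local Spring app on port 8080).
  static final String CONTROLLER = "http://localhost:8080/search/html?id=";

  // Step 5: write the HTML into the directory the controller reads from.
  static Path save(Path htmlDir, String id, String remappedHtml) throws IOException {
    Files.createDirectories(htmlDir);
    return Files.writeString(htmlDir.resolve(id), remappedHtml, StandardCharsets.UTF_8);
  }

  // Step 6: the URL Chrome should load, with the id query-encoded.
  static String printUrl(String id) {
    return CONTROLLER + URLEncoder.encode(id, StandardCharsets.UTF_8);
  }

  // Step 6: headless Chrome command that prints that URL to pdfOut.
  static List<String> chromeCommand(String chromeBinary, Path pdfOut, String id) {
    return List.of(chromeBinary, "--headless",
        "--print-to-pdf=" + pdfOut.toAbsolutePath(), printUrl(id));
  }
}
```

The returned list can be passed straight to new ProcessBuilder(...).start() once the controller is running.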
