I am downloading large batches of PDFs from parliament websites. I scraped the PDF URLs and am now trying to download the files.
To do this, I set up a Debian instance on a university cloud.
This worked fine for most of them, but for 4 parliaments I instead downloaded a cookie-consent page: an HTML page saved with a .pdf file extension whose content is mainly the question whether I accept cookies.
This error does not happen on either Ubuntu or Windows 10; I figure that is because I had already accepted the cookies in the browser on those machines. So I switched my code to RCurl and exported the cookies to txt files, following the two related answers I found on Stack Overflow.
Here is a minimal example. As mentioned, it works on Windows and Ubuntu, but there it also works without the cookiefile.
library(RCurl)

# the PDF to download
appURL <- "http://www.dokumentation.landtag-mv.de/parldok/dokument/44970/eu_ratspraesidentschaft.pdf"

# reuse one handle so cookies persist across requests
curl <- getCurlHandle()
curlSetOpt(cookiefile = "cookiesmv.txt", followLocation = TRUE, curl = curl)

pdfData <- getBinaryURL(appURL, curl = curl)
writeBin(pdfData, "test2.pdf")
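On Debian I can tell the download failed because the saved file is HTML, not a PDF. A quick check of the PDF magic bytes makes that visible (a small sketch of my own, not part of the original script; every real PDF starts with the bytes "%PDF"):

# a real PDF starts with "%PDF"; the consent page is HTML
if (identical(rawToChar(pdfData[1:4]), "%PDF")) {
  message("looks like a real PDF")
} else {
  warning("got an HTML page (probably the cookie consent page) instead")
}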
To reproduce, here is the cookiefile (cookiesmv.txt):
www.landtag-mv.de FALSE / FALSE 1641900313 cookieconsent_status dismiss
www.landtag-mv.de FALSE / FALSE 1641900313 dp_cookieconsent_status {"dp--cookie-statistics":true,"dp--cookie-marketing":true}
www.dokumentation.landtag-mv.de FALSE / FALSE 1641907216 cookieconsent_dismissed yes
www.dokumentation.landtag-mv.de FALSE / FALSE 0 ASP.NET_SessionId ejtlcpjr0saw40ahceu4akb1
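Note that libcurl's Netscape cookie format expects the seven fields per line to be tab-separated; the spacing above may have been flattened when pasting. To rule out a whitespace problem, the file can be written from R directly (a sketch using the values above):

fields <- list(
  c("www.landtag-mv.de", "FALSE", "/", "FALSE", "1641900313",
    "cookieconsent_status", "dismiss"),
  c("www.landtag-mv.de", "FALSE", "/", "FALSE", "1641900313",
    "dp_cookieconsent_status",
    '{"dp--cookie-statistics":true,"dp--cookie-marketing":true}'),
  c("www.dokumentation.landtag-mv.de", "FALSE", "/", "FALSE", "1641907216",
    "cookieconsent_dismissed", "yes"),
  c("www.dokumentation.landtag-mv.de", "FALSE", "/", "FALSE", "0",
    "ASP.NET_SessionId", "ejtlcpjr0saw40ahceu4akb1")
)
# each line of the Netscape cookie format is seven tab-separated fields
writeLines(vapply(fields, paste, character(1), collapse = "\t"), "cookiesmv.txt")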
Maybe somebody has insights into where RCurl actually reads its cookies from, or why the cookiefile seems to be ignored on Debian...
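In case it helps with diagnosing this: libcurl's verbose mode prints the request and response headers, so it shows whether a Cookie: header is being sent at all (a sketch, same setup as above):

curl <- getCurlHandle()
curlSetOpt(cookiefile = "cookiesmv.txt", followLocation = TRUE,
           verbose = TRUE, curl = curl)
# the headers, including any outgoing Cookie: header, are printed
# to the console during the request
pdfData <- getBinaryURL(appURL, curl = curl)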
Best regards and thank you in advance; I hope I have given all the necessary info!