Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to download images while numbering (with multiprocessing)

I want to save the files numbered in the order of the list (e.g. bbb.jpg -> 001.jpg, aaa.jpg -> 002.jpg, ...).

Because the files keep their original names, they end up sorted alphabetically (aaa.jpg, bbb.jpg, ccc.jpg, ...) instead of in list order.

I could also sort the files by download time, but that doesn't work with multiprocessing, because the parallel downloads finish in an unpredictable order.

So my question is: how can I save the files in the order I want, or under the names I want?

Here is my code.

import os
import time
from functools import partial
from multiprocessing import Pool
from urllib.request import Request, urlopen


mylist = ['https://examsite.com/bbb.jpg',
          'https://examsite.com/aaa.jpg',
          'https://examsite.com/ddd.jpg',
          'https://examsite.com/eee.jpg',
          'https://examsite.com/ccc.jpg']

def image_URL_download(path, html):
    # The file keeps its original name from the URL (e.g. "bbb.jpg")
    originNames = html.split('/')[-1]
    # os.path.join adds the missing separator ('./down' + 'bbb.jpg' gave './downbbb.jpg')
    PathandNames = os.path.join(path, originNames)
    req = Request(html, headers={'User-Agent': 'Mozilla/5.0'})
    data = urlopen(req).read()  # `request.urlopen` was a NameError
    with open(PathandNames, 'wb') as savefile2:
        savefile2.write(data)
    print(f"download {originNames}")

if __name__ == "__main__":
    start = time.time()
    path = './down'
    os.makedirs(path, exist_ok=True)  # make sure the target folder exists
    pool = Pool(processes=4)
    img_down = partial(image_URL_download, path)
    pool.map(img_down, mylist)
    pool.close()
    pool.join()
    print("DONE! time :", time.time() - start)
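A minimal sketch of one way to get list-order names while keeping the question's multiprocessing setup: number each URL with `enumerate()` *before* handing it to the pool, so every file name is fixed up front and no longer depends on which worker finishes first. The helpers `numbered_jobs`, `download` and `download_all` are hypothetical names chosen for this illustration; numbering starts at 1 to match the question's example (bbb.jpg -> 001.jpg), and the URLs are assumed to end in a file extension.

```python
import os
import posixpath
from multiprocessing import Pool
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def numbered_jobs(urls, dest, start=1, width=3):
    """Pair each URL with a target path named after its position in `urls`.

    The number is assigned before any download starts, so it reflects list
    order rather than the order in which the worker processes finish.
    """
    jobs = []
    for i, url in enumerate(urls, start=start):
        # Keep the extension from the URL path, ignoring any ?query part
        ext = posixpath.splitext(urlsplit(url).path)[1]
        jobs.append((url, os.path.join(dest, f"{i:0{width}d}{ext}")))
    return jobs


def download(url, target):
    """Fetch one URL and write the raw bytes to `target`."""
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(req) as resp, open(target, 'wb') as f:
        f.write(resp.read())
    return target


def download_all(urls, dest, processes=4):
    """Download every URL in parallel, saving them as 001.jpg, 002.jpg, ..."""
    os.makedirs(dest, exist_ok=True)
    with Pool(processes=processes) as pool:
        # starmap unpacks each (url, target) pair into download(url, target)
        return pool.starmap(download, numbered_jobs(urls, dest))
```

Called as `download_all(mylist, './down')` inside the `if __name__ == "__main__":` guard, this would save bbb.jpg as 001.jpg, aaa.jpg as 002.jpg, and so on. `Pool.starmap` is used instead of `map` plus `partial` because each job now carries two values.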
Question from: https://stackoverflow.com/questions/65871742/how-to-download-images-while-numbering-with-multiprocessing

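Fight with dragons for too long and you become a dragon yourself; gaze into the abyss for too long and the abyss gazes back into you…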

1 Answer

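Here is a full example that takes a bunch of images (thumbnails, here) from Wikimedia Commons and saves them numbered 000.jpg, 001.jpg, etc. (in /tmp, but of course adjust as needed). The key is to pass each URL together with its position in the list (via enumerate()) and use that index as the file name, so the order in which the downloads finish doesn't matter. It uses a thread pool rather than processes, which works well for I/O-bound downloads. Bonus: it displays an animated progress bar during download, courtesy of tqdm: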

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm


tld = 'https://commons.wikimedia.org'
url = '/wiki/Category:Images'

# Collect the .jpg thumbnail URLs on the category page, in page order
soup = BeautifulSoup(requests.get(urljoin(tld, url)).content, 'html.parser')
imglist = [x.get('src') for x in soup.find_all('img', src=True)]
imglist = [urljoin(tld, x) for x in imglist if x.endswith('.jpg')]

def load_img(i_url):
    # i is the position of url in imglist; it becomes the file name,
    # so the order in which the threads finish doesn't matter
    i, url = i_url
    img = requests.get(url).content
    with open(f'/tmp/{i:03d}.jpg', 'wb') as f:
        f.write(img)
    return True

def load_all(imglist):
    with ThreadPoolExecutor() as executor:
        # enumerate() pairs each URL with its index; tqdm shows the progress
        results = list(tqdm(
            executor.map(load_img, enumerate(imglist)),
            total=len(imglist), unit=' images'))
    return results

results = load_all(imglist)
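To check that the enumerate trick really is independent of completion order, here is a small offline sketch of the same pattern. The network call is replaced by a hypothetical `fake_fetch` that sleeps for a random time (so downloads finish out of order) and returns the URL itself as the "image" bytes; `demo` is likewise a made-up helper that writes into a temporary directory instead of /tmp.

```python
import random
import tempfile
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from pathlib import Path


def fake_fetch(url):
    """Hypothetical stand-in for requests.get(url).content.

    Sleeps a random time so the downloads finish out of order, and returns
    the URL itself as the "image" bytes so we can see which file got what.
    """
    time.sleep(random.uniform(0, 0.05))
    return url.encode()


def load_img(i_url, dest, finished, lock):
    # Same shape as load_img above: the index i becomes the file name
    i, url = i_url
    data = fake_fetch(url)
    (dest / f'{i:03d}.jpg').write_bytes(data)
    with lock:
        finished.append(url)  # record the order in which downloads completed
    return True


def demo(urls):
    """Return (completion order, URLs read back from 000.jpg, 001.jpg, ...)."""
    finished, lock = [], threading.Lock()
    with tempfile.TemporaryDirectory() as tmp:
        dest = Path(tmp)
        worker = partial(load_img, dest=dest, finished=finished, lock=lock)
        with ThreadPoolExecutor(max_workers=4) as executor:
            list(executor.map(worker, enumerate(urls)))
        # Sorting by name gives list order because the numbers are zero-padded
        saved = [p.read_bytes().decode() for p in sorted(dest.glob('*.jpg'))]
    return finished, saved
```

Running `demo` on the question's five URLs, `finished` will usually come out in a different order from run to run, while `saved` always matches the input list. Note that the three-digit padding only keeps name order correct up to 1000 files; a larger list needs a wider format such as `{i:04d}`.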
