Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to use glob to read a limited set of files with numeric names?

How can I use glob to read only a limited set of files?

I have JSON files named with numbers from 50 to 20000 (e.g. 50.json, 51.json, 52.json, ..., 19999.json, 20000.json), all in the same directory. I want to read only the files numbered from 15000 to 18000.

To do so I'm using glob, as shown below, but it returns an empty list every time I try to filter by number. I've tried my best to follow the documentation (https://docs.python.org/2/library/glob.html), but I'm not sure what I'm doing wrong.

>>> directory = "/Users/Chris/Dropbox"
>>> read_files = glob.glob(directory+"/[15000-18000].*")
>>> print read_files
[]

Also, what if I wanted files with any number greater than 18000?


1 Answer

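You are using the glob syntax incorrectly; a [..] sequence matches a single character, not a range of numbers. The following glob matches the names 15000 through 18999, which is as close to your range as a single pattern gets (for exactly 15000 to 18000 you would combine '1[5-7][0-9][0-9][0-9].*' with '18000.*'):

'1[5-8][0-9][0-9][0-9].*'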

Under the covers, glob uses fnmatch, which translates the pattern to a regular expression. Your pattern translates to:

>>> import fnmatch
>>> fnmatch.translate('[15000-18000].*')
'[15000-18000]\\..*\\Z(?ms)'

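which matches exactly one character before the '.': a 0, 1, 5 or 8. Nothing else. A name such as 15000.json has five characters before the dot, so it never matches, and you get an empty list.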
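To see this without touching the filesystem, the two patterns can be tried against a handful of names with fnmatch.fnmatchcase. This is only a small sketch; the sample names are made up for illustration and are not taken from the question's directory.

```python
import fnmatch

# Hypothetical sample names, chosen only to illustrate the matching rules.
names = [
    "0.json", "5.json", "8.json", "50.json",
    "15000.json", "18000.json", "18999.json", "19000.json",
]

# The original pattern: ONE character from the set {0, 1, 5, 8}, then ".".
broken = [n for n in names if fnmatch.fnmatchcase(n, "[15000-18000].*")]

# The per-character pattern: "1", a digit 5-8, then three more digits.
fixed = [n for n in names if fnmatch.fnmatchcase(n, "1[5-8][0-9][0-9][0-9].*")]

print(broken)  # only the single-character names 0, 5 and 8
print(fixed)   # 15000, 18000 and 18999, but not 19000
```

Note that the per-character pattern still lets 18999.json through, so it is wider than the 15000 to 18000 range asked for.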

glob patterns are quite limited, and matching numeric ranges with them is not easy; you'd have to create a separate glob for each sub-range and combine the results, for example glob('1[8-9][0-9][0-9][0-9].*') + glob('2[0-9][0-9][0-9][0-9].*') for 18000 through 29999.
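For illustration, here is roughly what the multi-glob approach looks like when the bounds have to be exact. The helper name glob_many and the two pattern lists are assumptions made for this sketch, not part of the glob module; the lists assume every file is named <number>.json with at most five digits.

```python
import glob
import os


def glob_many(directory, patterns):
    """Return the sorted union of the matches of several glob patterns."""
    matches = set()
    for pattern in patterns:
        # glob.escape keeps special characters in the directory name literal.
        matches.update(glob.glob(os.path.join(glob.escape(directory), pattern)))
    return sorted(matches)


# Exactly 15000-18000: 15000-17999, plus the single name 18000.
RANGE_15000_18000 = [
    "1[5-7][0-9][0-9][0-9].json",
    "18000.json",
]

# Strictly greater than 18000, for names of up to five digits.
ABOVE_18000 = [
    "1800[1-9].json",                  # 18001-18009
    "180[1-9][0-9].json",              # 18010-18099
    "18[1-9][0-9][0-9].json",          # 18100-18999
    "19[0-9][0-9][0-9].json",          # 19000-19999
    "[2-9][0-9][0-9][0-9][0-9].json",  # 20000-99999
]
```

Calling glob_many(directory, RANGE_15000_18000) would then return the matching paths. The number of patterns needed for one simple bound shows why filtering in Python is usually the easier route.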

Do your own filtering instead:

import os

directory = "/Users/Chris/Dropbox"

for filename in os.listdir(directory):
    basename, ext = os.path.splitext(filename)
    if ext != '.json':
        continue  # not a JSON file
    try:
        number = int(basename)
    except ValueError:
        continue  # not numeric
    if 15000 <= number <= 18000:  # use `number > 18000` for the second case
        # process file
        filename = os.path.join(directory, filename)
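The same filtering can be wrapped in a reusable function so that both cases from the question (a closed range, and "anything greater than 18000") go through one code path. This is a minimal sketch; the function name numbered_files and its parameters are hypothetical, chosen for this example.

```python
import os


def numbered_files(directory, low=None, high=None, ext=".json"):
    """Return paths of files named <integer><ext> with low <= integer <= high.

    Either bound may be None, meaning no limit on that side.
    Paths are returned in numeric (not alphabetical) order.
    """
    selected = []
    for filename in os.listdir(directory):
        basename, file_ext = os.path.splitext(filename)
        if file_ext != ext:
            continue  # wrong extension
        try:
            number = int(basename)
        except ValueError:
            continue  # name is not numeric
        if low is not None and number < low:
            continue
        if high is not None and number > high:
            continue
        selected.append((number, os.path.join(directory, filename)))
    selected.sort()  # sort by the integer, so 9000 comes before 15000
    return [path for _, path in selected]


# Usage, assuming `directory` points at the folder from the question:
#   numbered_files(directory, 15000, 18000)   # 15000.json .. 18000.json
#   numbered_files(directory, low=18001)      # everything above 18000
```

Sorting on the parsed integer matters here because os.listdir returns names in arbitrary order, and a plain string sort would place 9000.json after 15000.json.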
