Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


unix - Get files created in the last 5 minutes in Hadoop using a shell script

I have files in HDFS like this:

drwxrwx---   - root supergroup          0 2016-08-19 06:21 /tmp/logs/root/logs/application_1464962104018_1639064
drwxrwx---   - root supergroup          0 2016-08-19 06:21 /tmp/logs/root/logs/application_1464962104018_1639065

The /tmp/logs/root/logs/ directory continuously receives new files. I want to get the files created in the last five minutes, relative to the current time, and then copy them to my local machine.



1 Answer


How about this:

hdfs dfs -ls /tmp | tr -s " " | cut -d' ' -f6-8 | grep "^[0-9]" | awk 'BEGIN{ MIN=5; LAST=60*MIN; "date +%s" | getline NOW } { cmd="date -d'\''"$1" "$2"'\'' +%s"; cmd | getline WHEN; close(cmd); DIFF=NOW-WHEN; if(DIFF < LAST){ print $3 }}'

Explanation:

List all the entries in the directory:

hdfs dfs -ls /tmp

Squeeze runs of spaces into a single space:

tr -s " "

Keep the required columns (date, time and path):

cut -d' ' -f6-8

Drop the rows that do not start with a digit (for example the "Found N items" header):

grep "^[0-9]"
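To see what these first stages do without a cluster, here is a small sketch that runs them on a sample listing. The helper names sample_listing and extract_columns are made up for illustration, and the sample assumes the usual "Found N items" header line that hdfs dfs -ls prints before the entries.

```shell
# Sample listing in the format printed by `hdfs dfs -ls`
# (header line plus the two entries from the question).
sample_listing() {
  cat <<'EOF'
Found 2 items
drwxrwx---   - root supergroup          0 2016-08-19 06:21 /tmp/logs/root/logs/application_1464962104018_1639064
drwxrwx---   - root supergroup          0 2016-08-19 06:21 /tmp/logs/root/logs/application_1464962104018_1639065
EOF
}

# Squeeze spaces, keep fields 6-8 (date, time, path),
# then drop rows that do not start with a digit.
extract_columns() {
  tr -s " " | cut -d' ' -f6-8 | grep "^[0-9]"
}

sample_listing | extract_columns
```

Each surviving row has the form "date time path", which is what the awk stage expects as $1, $2 and $3.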

Process the remaining rows with awk:

awk

Initialize the time window (LAST, in seconds) and the current time (NOW):

MIN=5; LAST=60*MIN; "date +%s" | getline NOW

Build a command that returns the epoch value for the timestamp of the file on HDFS:

cmd="date -d'\''"$1" "$2"'\'' +%s";

Execute the command to get the epoch value for the HDFS file, then close the pipe:

cmd | getline WHEN; close(cmd);
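The conversion that awk delegates to date can be tried directly in the shell. This is a minimal sketch that assumes GNU date (the -d option used here is not portable to BSD date); to_epoch is a hypothetical helper name.

```shell
# Convert a "YYYY-MM-DD HH:MM" timestamp (two arguments) to epoch seconds.
# Assumes GNU date; the timestamp is interpreted in the local time zone.
to_epoch() {
  date -d "$1 $2" +%s
}

file_epoch=$(to_epoch 2016-08-19 06:21)   # timestamp from the sample listing
now_epoch=$(date +%s)                     # current time in epoch seconds
echo "age in seconds: $(( now_epoch - file_epoch ))"
```

The difference of the two epoch values is the same quantity the awk script stores in DIFF.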

Compute the age of the file in seconds:

DIFF=NOW-WHEN;

Print the path if the file is younger than the time window:

if(DIFF < LAST){ print $3 }
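The filtering step can be checked on fixed input by passing the reference time in from the shell instead of reading the clock. The sketch below is a variation on the awk stage, not the original command: recent_paths is a hypothetical function name, the application_old path is invented sample data, and GNU date is assumed. It builds the date command with double quotes, which avoids escaping single quotes inside the awk program.

```shell
# recent_paths NOW_EPOCH MINUTES
# Reads "date time path" rows on stdin and prints the paths whose timestamp
# is less than MINUTES minutes before NOW_EPOCH. Assumes GNU date.
recent_paths() {
  awk -v now="$1" -v min="$2" '
    {
      cmd = "date -d \"" $1 " " $2 "\" +%s"   # command that prints the epoch value
      cmd | getline when                        # read its single line of output
      close(cmd)                                # release the pipe before the next row
      if (now - when < 60 * min) print $3
    }'
}

# Fixed reference time, two minutes after the first sample row.
ref=$(date -d "2016-08-19 06:23" +%s)
printf '%s\n' \
  "2016-08-19 06:21 /tmp/logs/root/logs/application_1464962104018_1639064" \
  "2016-08-19 06:10 /tmp/logs/root/logs/application_old" |
  recent_paths "$ref" 5
```

Only the first path is printed, because the second row is 13 minutes older than the reference time.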

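To change the time window, set MIN to the number of minutes you need (here it is 5), and replace /tmp with the directory you want to scan, such as /tmp/logs/root/logs/. Note that the timestamp printed by hdfs dfs -ls is the modification time, and that date -d is a GNU date option. HTH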
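The question also asks for the matching files to be copied to the local machine, which the command above does not do. One possible way, sketched here under assumptions, is to do the age check in a shell loop and pass each recent path to hdfs dfs -get. The function name copy_recent and its argument order are hypothetical, GNU date is assumed, and paths containing spaces are not handled.

```shell
# copy_recent HDFS_DIR LOCAL_DIR [MINUTES]
# Copies entries of HDFS_DIR modified within the last MINUTES minutes
# (default 5) into LOCAL_DIR. Assumes GNU date and paths without spaces.
copy_recent() {
  src_dir=$1
  dest_dir=$2
  minutes=${3:-5}
  now=$(date +%s)
  mkdir -p "$dest_dir" || return 1

  hdfs dfs -ls "$src_dir" | tr -s " " | cut -d' ' -f6-8 | grep "^[0-9]" |
    while read -r day time path; do
      when=$(date -d "$day $time" +%s) || continue
      if [ $(( now - when )) -lt $(( 60 * minutes )) ]; then
        # </dev/null keeps the command from consuming the loop's input
        hdfs dfs -get "$path" "$dest_dir" </dev/null
      fi
    done
}
```

For example, copy_recent /tmp/logs/root/logs /path/to/local/dir 5 would fetch entries modified in the last five minutes, where /path/to/local/dir is a placeholder for a local directory of your choice.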

