I need to merge two files into a new file.
The two files have over 300 million pipe-separated records, with the first column as the primary key. The rows aren't sorted, and the second file may contain records that the first file does not.
Sample File 1:
1001234|X15X1211,J,S,12,15,100.05
Sample File 2:
1231112|AJ32,,,18,JP
1001234|AJ15,,,16,PP
Output:
1001234,X15X1211,J,S,12,15,100.05,AJ15,,,16,PP
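In other words, for each key in File 1 I want its record followed by the matching File 2 record, comma-joined as above. A minimal sketch of that join (the filenames file1.in / file2.in are placeholders, and it only illustrates the intended output, not the 300-million-row scale):

use strict;
use warnings;

# Load File 2 into a hash keyed on the first pipe-separated column.
my %file2;
open my $f2, '<', 'file2.in' or die $!;
while (my $line = <$f2>) {
    chomp $line;
    my ($key, $rest) = split /\|/, $line, 2;
    $file2{$key} = $rest;
}
close $f2;

# Stream File 1 and append the matching File 2 record (empty if there is none).
open my $f1, '<', 'file1.in' or die $!;
while (my $line = <$f1>) {
    chomp $line;
    my ($key, $rest) = split /\|/, $line, 2;
    print join(',', $key, $rest, $file2{$key} // ''), "\n";
}
close $f1;

For the sample rows this reproduces the Output line above.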
I am using the following piece of code:
use POSIX qw(strftime);
use Tie::File::AsHash;

# Note: the split option is used as a split pattern, so the pipe must be escaped.
tie my %hash_REP, 'Tie::File::AsHash', 'rep.in', split => '\|'
    or die "Problem tying hash: $!";

my $counter = 0;
while (my ($key, $val) = each %hash_REP) {
    # print a timestamp on the first iteration only
    print strftime("%a %b %e %H:%M:%S %Y", localtime), "\n" if $counter == 0;
    $counter++;
}
It takes almost an hour just to prepare the associative array.
Is that reasonable, or is it really bad?
Is there a faster way to handle this many records in an associative array?
Any suggestion in any scripting language would really help.
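For example, would pre-sorting both files on the key (e.g. with GNU sort) and then doing a streaming merge join be expected to scale better? A rough sketch of what I mean (untested; the *.sorted filenames and the external sort step are assumptions, and output goes to STDOUT):

#!/usr/bin/perl
# Assumes rep.in and APP.in have first been sorted on column 1, e.g.
#   sort -t'|' -k1,1 rep.in > rep.sorted
#   sort -t'|' -k1,1 APP.in > APP.sorted
# so that neither file has to be held in memory.
use strict;
use warnings;

open my $f1, '<', 'rep.sorted' or die $!;
open my $f2, '<', 'APP.sorted' or die $!;

# Read one "key|value" line from a handle; returns () at end of file.
sub next_rec {
    my ($fh) = @_;
    defined(my $line = <$fh>) or return;
    chomp $line;
    return split /\|/, $line, 2;
}

my ($k1, $v1) = next_rec($f1);
my ($k2, $v2) = next_rec($f2);

while (defined $k1) {
    if (defined $k2 && $k2 lt $k1) {
        # File 2 is behind: skip records with no counterpart in File 1.
        ($k2, $v2) = next_rec($f2);
    }
    else {
        my $match = (defined $k2 && $k2 eq $k1) ? $v2 : '';
        print join(',', $k1, $v1, $match), "\n";
        ($k1, $v1) = next_rec($f1);
    }
}

The trade-off would be the cost of the two external sorts up front versus not building a 300-million-entry hash in memory.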
Thanks,
Nitin T.
I also tried the following program, which also took over an hour:
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

my $now_string = strftime "%a %b %e %H:%M:%S %Y", localtime;
print "$now_string\n";

# Load APP.in into a hash keyed on the first pipe-separated column.
my %hash;
open my $app_fh, '<', 'APP.in' or die $!;
while (my $line = <$app_fh>) {
    chomp $line;
    my ($key, $val) = split /\|/, $line, 2;   # the pipe must be escaped in the regex
    $hash{$key} = $val;
}
close $app_fh;

my $filename = 'report.txt';
open my $fh, '>', $filename or die "Could not open file '$filename' $!";

# Stream rep.in and append the matching APP.in record to each row.
open my $rep_fh, '<', 'rep.in' or die $!;
while (my $line = <$rep_fh>) {
    chomp $line;
    my @words = split /\|/, $line;
    for my $i (1 .. $#words) {                # skip the key in column 0
        print $fh $words[$i] . "|^";
    }
    print $fh ($hash{$words[0]} // '') . "\n";
}
close $rep_fh;
close $fh;

print "done\n";

$now_string = strftime "%a %b %e %H:%M:%S %Y", localtime;
print "$now_string\n";