Speeding up file reads and pattern search

Hi,

I have very large files named outfile1 to outfile4, and I want to grep for "test" in all of them. I wrote a script, "program1.pl", which works fine, but the run time is very high (10.7 sec for 4 files). If I run it on more (>100) large files it runs for a long time, so I want to speed it up. Can anyone suggest how to improve the speed?

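######### program1.pl ##########
#!/usr/bin/perl
use strict;
use warnings;

# Collect every line containing "test" from outfile1 .. outfile4 into grepfile.
open(my $out, '>', 'grepfile') or die "Cannot open grepfile: $!";
for my $i (1 .. 4) {
    my $infile = "outfile$i";    # outfile1 .. outfile4
    open(my $in, '<', $infile) or die "Cannot open $infile: $!";
    while (my $line = <$in>) {
        if ($line =~ /test/) {
            print $line;                 # $line already ends with a newline
            print $out "line: $line";
        }
    }
    close($in);
}
close($out);
#########################################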
  1. The first question that comes to mind is "why not just use grep?".

    That aside, you could try reading the whole file into an array in a single shot. It would require much more memory, but it should capitalize on sequential HDD reads instead of much slower random access.

    my $data_file = "wrestledata.cgi";
    open(my $dat, '<', $data_file) or die "Could not open $data_file: $!";
    my @raw_data = <$dat>;    # list context reads every line at once
    close($dat);
  2. Hi Zenthar,
    Thank you for your post.
    Actually, most of the time goes into opening the files. Matching the pattern line by line takes about the same time as matching against an array (no significant difference), so I need help reducing the time spent opening files. I also tried using the split command to break the large files into smaller pieces and fork to process each piece separately, and even ran the grep in parallel across multiple files, but splitting the files took more time than processing each big file directly. Is there a way to get a reference to line numbers in a file and use that reference for pattern matching instead of opening the file?
  3. Have you tried putting some timers in your code to identify which part takes the most time? The only other suggestion I have is to try a producer/consumer pattern using threads. One thread would read the file into a queue, and another would do the pattern match and output (the output could even be done in a third thread). However, this would mostly be useful if the reader thread isn't the bottleneck.

    You can find information on Perl threading here and an example of a Perl producer/consumer implementation here (3rd example: prodcons.cygperl).
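
The timing suggestion in the last reply could be sketched roughly as follows. This is a minimal sketch, not the original poster's code: the helper name time_grep is hypothetical, the file names and the pattern are taken from the question, and Time::HiRes (a core Perl module) is assumed for sub-second timestamps. It reports the time spent in open() separately from the time spent reading and matching, which should show whether opening the files is really the slow part.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);    # replaces time() with a floating-point version

# Hypothetical helper: time the open() call and the read/match loop
# separately for one file.
# Returns (seconds in open, seconds reading and matching, match count).
sub time_grep {
    my ($path, $pattern) = @_;

    my $t0 = time;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    my $t1 = time;

    my $matches = 0;
    while (my $line = <$in>) {
        $matches++ if $line =~ $pattern;
    }
    close($in);
    my $t2 = time;

    return ($t1 - $t0, $t2 - $t1, $matches);
}

# File names follow the question (outfile1 .. outfile4);
# files that do not exist are skipped.
for my $i (1 .. 4) {
    my $file = "outfile$i";
    next unless -e $file;
    my ($open_s, $scan_s, $count) = time_grep($file, qr/test/);
    printf "%s: open %.6fs, read+match %.6fs, %d matching lines\n",
        $file, $open_s, $scan_s, $count;
}
```

If the open time turns out to be negligible next to the read+match time, the cost is more likely in reading the data from disk than in opening the files, and that would be the part worth optimizing.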