Tom's Hardware > Forum > Applications > Programming > Speeding file read and pattern search

Speeding file read and pattern search

Hi,

I have very large files named outfile1 to outfile4 and I want to grep for "test" in all of them, so I wrote a script, "program1.pl". It works fine, but the run time is very high (10.7 sec for 4 files). If I run it on more (>100) big files it runs for a long time, so I want to speed it up. Can anyone suggest how to improve the speed?

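######### program1.pl ##########
#!/usr/bin/perl
use strict;
use warnings;

# Matching lines are echoed to STDOUT and also written to "grepfile".
open(my $grep_fh, '>', 'grepfile') or die "Cannot write grepfile: $!";
for my $i (1 .. 4) {
    my $outfile = "outfile$i";    # outfile1 .. outfile4
    open(my $in_fh, '<', $outfile) or die "Cannot open $outfile: $!";
    while (my $line = <$in_fh>) {
        if ($line =~ /test/) {
            print "$line \n";
            print {$grep_fh} "line: $line ";
        }
    }
    close($in_fh);
}
close($grep_fh);
#########################################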

First question that comes to mind is "why not just use grep?".

That aside, maybe you can try reading the whole file into an array in a single shot. It would require much more memory, but it would also capitalize on sequential HDD reads instead of the much slower random access.

Code :
  my $data_file = "wrestledata.cgi";
  open(my $dat, '<', $data_file) or die("Could not open $data_file: $!");
  my @raw_data = <$dat>;    # list context reads every line at once
  close($dat);

Reply to Zenthar

Hi Zenthar,
thank you for your post.
Actually, the time is being spent opening the files. The time taken to grep a file is about the same whether I match line by line or against an array (not much difference), so I need help avoiding the time spent on opening files. I even used the split command to break the large files into smaller ones and the fork function to process them individually, and I also ran the grep in parallel across multiple files; there too I found that splitting the files took more time than processing each big file. Is there a way to get a reference to line numbers in a file and use that reference for pattern matching instead of opening the files?

Reply to sgravi

Have you tried putting some time counters in your code to identify which part takes the most time? The only other suggestion I have is to maybe try a producer/consumer pattern using threads. In one thread you would read the file into a queue, and in the other one you would do the pattern match and output (the output could even be done in a 3rd thread). However, this would be mostly useful if the reader thread isn't the bottleneck.
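
As a rough idea of what such time counters could look like, here is a minimal sketch using the core Time::HiRes module. The helper name timed_scan is made up for illustration and is not part of program1.pl; it simply times the open, the read-and-match loop, and the close separately for one file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);    # sub-second resolution replacement for time()

# Hypothetical helper: scan one file for a pattern and record how long
# the open, the read+match loop, and the close each took (in seconds).
sub timed_scan {
    my ($path, $pattern) = @_;
    my %elapsed;
    my @matches;

    my $t0 = time();
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    $elapsed{open} = time() - $t0;

    $t0 = time();
    while (my $line = <$fh>) {
        push @matches, $line if $line =~ $pattern;
    }
    $elapsed{scan} = time() - $t0;

    $t0 = time();
    close($fh);
    $elapsed{close} = time() - $t0;

    return (\@matches, \%elapsed);
}

# Example run over the files from the original post (missing files are skipped).
for my $i (1 .. 4) {
    my $path = "outfile$i";
    next unless -e $path;
    my ($matches, $elapsed) = timed_scan($path, qr/test/);
    printf "%s: open %.4fs, scan %.4fs, close %.4fs, %d matching lines\n",
        $path, @{$elapsed}{qw(open scan close)}, scalar @$matches;
}
```

If the "open" figure really dominates, that would back up the suspicion that opening is the slow part; if "scan" dominates, the time is going into reading and matching instead.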

You can find information on perl threading here and an example of a perl producer/consumer implementation here (3rd example: prodcons.cygperl).
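
For reference, a two-thread version of the producer/consumer idea might be sketched as below. This is not the linked example. It assumes a perl built with thread support and uses the core threads and Thread::Queue modules; the helper name grep_with_queue and its arguments are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical helper: one thread reads lines from the given files into a
# queue, a second thread pattern-matches them and writes hits to $out_path.
# Returns the number of matching lines.
sub grep_with_queue {
    my ($pattern, $out_path, @files) = @_;
    my $queue = Thread::Queue->new();

    # Producer: read each file line by line and hand the lines over.
    my $producer = threads->create(sub {
        for my $path (@files) {
            open(my $in, '<', $path) or do {
                warn "Skipping $path: $!\n";
                next;
            };
            while (my $line = <$in>) {
                $queue->enqueue($line);
            }
            close($in);
        }
        $queue->enqueue(undef);    # sentinel: no more input
    });

    # Consumer: match and write until the sentinel arrives.
    my $consumer = threads->create(sub {
        my $re = qr/$pattern/;     # compile the pattern inside the thread
        open(my $out, '>', $out_path) or die "Cannot write $out_path: $!";
        my $hits = 0;
        while (defined(my $line = $queue->dequeue())) {
            next unless $line =~ $re;
            print {$out} "line: $line";
            $hits++;
        }
        close($out);
        return $hits;
    });

    $producer->join();
    my $hits = $consumer->join();
    return $hits;
}
```

It could be called as grep_with_queue('test', 'grepfile', map { "outfile$_" } 1 .. 4). Queueing every line separately has a cost of its own, so it is worth timing this against the plain single-threaded loop before relying on it.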

Reply to Zenthar