I have a file from which I need to extract words of different lengths into different files: 2-letter words into file2, 3-letter words into file3, and so on.
I did it with grep in a shell script:
for (( i=1; i<7; i++))
do
egrep -o '\<\(?[a-zA-Z]{'"$i"'}\)?\>' "$1" | sort -u -f | tr '[:upper:]' '[:lower:]' > "file$i"
done
But it is too slow. Any better ideas? Thanks in advance.
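Much of the slowness comes from scanning the whole file once per word length. A single extraction pass followed by a length-based dispatch avoids that. A minimal sketch, assuming GNU grep and a small sample input written to `infile` (the optional-parenthesis handling from the original regex is dropped for brevity):

```shell
# Sample input (illustrative).
printf 'This is an example to\ntest if\nmy perl program works\nas expected.\n' > infile

# One pass: pull out every word, lower-case, dedupe, then let awk route
# each word to fileN by its length (lengths 1-6, as in the loop above).
grep -oE '[[:alpha:]]+' infile |
tr '[:upper:]' '[:lower:]' |
sort -u |
awk '{ if (length($0) >= 1 && length($0) <= 6) print > ("file" length($0)) }'
```

After this, file2 holds the unique two-letter words (an, as, if, is, my, to), file4 the four-letter ones, and so on; the seven- and eight-letter words are skipped, matching the 1..6 range of the original loop.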
$ cat script.pl
use warnings;
use strict;
@ARGV == 1 or die "Usage: perl $0 <input-file>\n";
my %word_length;
while ( <> ) {
    chomp;
    my @words = split /[^[:alpha:]]+/;
    my %repeated_word;
    for my $word ( @words ) {
        next unless length $word;    # split leaves an empty leading field when a line starts with a non-letter
        push @{ $word_length{ length $word } }, $word unless $repeated_word{ $word }++;
    }
}

for my $length ( keys %word_length ) {
    my $outfile = "file" . $length;
    open my $fh, ">", $outfile or do {
        warn "Cannot open $outfile: $!\n";
        next;
    };
    for my $word ( @{ $word_length{ $length } } ) {
        printf $fh "%s\n", $word;
    }
    close $fh or warn "Cannot close $outfile: $!\n";
}
$ cat infile
This is an example to
test if
my perl program works
as expected.
$ perl script.pl
Usage: perl script.pl <input-file>
$ perl script.pl infile
$ ls -1 file*
file2
file4
file5
file7
file8
#!/usr/bin/awk -f
BEGIN { FS = "[^A-Za-z]" }
{
    for (i = 1; i <= NF; i++)
        if ((len = length($i)) < 7 && len >= 1)
            a[tolower($i)]++
}
END {
    for (e in a)
        print e >> "file" length(e) ".txt"
}
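A usage sketch for the awk answer (the file name split_words.awk and the sample input are illustrative, not from the original post; note the END block appends with >>, so remove any stale fileN.txt before re-running):

```shell
# Save the awk program above as split_words.awk (name is arbitrary).
cat > split_words.awk <<'EOF'
BEGIN { FS = "[^A-Za-z]" }
{
    for (i = 1; i <= NF; i++)
        if ((len = length($i)) < 7 && len >= 1)
            a[tolower($i)]++
}
END {
    for (e in a)
        print e >> "file" length(e) ".txt"
}
EOF

printf 'This is an example to\ntest if\nmy perl program works\nas expected.\n' > infile
awk -f split_words.awk infile

# Each fileN.txt holds the unique, lower-cased N-letter words; the words
# come out of the array in arbitrary order, so sort when inspecting:
sort file2.txt
```

With this input, file2.txt contains an, as, if, is, my, to; the len < 7 guard mirrors the 1..6 range of the loop in the question, so "example" and "expected" are dropped entirely.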