Awk to extract lines with a defined number of characters

Xterra · June 16, 2010, 11:47pm

This is my problem: my file (file A) contains the following information:

Now, I would like to create a file (file B) containing only the lines with at least 10 but fewer than 20 characters, together with their corresponding IDs:

Then, I need to compare the entries and determine their frequency. Thus, I will generate a third file (C) with the following information:

I think it could be done using AWK or grep.
Any help will be greatly appreciated.

clx · June 17, 2010, 12:18am

Try:

$ cat x
> ID 1
DFNSALKDNJRGNLANGKNGRIIGINREVN
> ID 2
KJDFKDSJGNIHG
> ID 3
BDSBGOBAOEURBOUEABG
> ID 4
DNFKSAD
> ID 5
KJDFKDSJGNIHG
> ID 6
BDSBGOBAOEURBOUEABG
$ awk '/^>/ {R=$0} !/>/ && length($0) >= 10 && length($0) < 20 {print R "\n" $0}' x
> ID 2
KJDFKDSJGNIHG
> ID 3
BDSBGOBAOEURBOUEABG
> ID 5
KJDFKDSJGNIHG
> ID 6
BDSBGOBAOEURBOUEABG
$

I didn't understand the file_c requirement.

guruprasadpr · June 17, 2010, 12:26am

Try this:

awk '/\> ID/{x=$0 ; next}{if ( length >= 10 && length < 20 ){a[$0]++;b[x]=$0}}END {for (i in a) for (j in b) if(i==b[j]){print j "\t freq " a[i] "\n" i;break;}}' file
> ID 3   freq 2
BDSBGOBAOEURBOUEABG
> ID 2   freq 2
KJDFKDSJGNIHG

Guru.

Xterra · June 17, 2010, 12:48am

Anchar,

Your answer works very well! File C should compare all the entries with each other and record how often each one occurs.
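
One way to read this requirement is: for every sequence kept in file B, count how many times it occurs and list the IDs that carry it. The following is a minimal sketch under that assumption, not a confirmed solution. The sample data is the file `x` from clx's post. The output layout (sequence, count, IDs, separated by tabs) and the output file name `file_c` are invented for illustration, since the real layout of file C is not shown, and the array names `count`, `ids`, and `order` are arbitrary.

```shell
# Sample input copied from the `cat x` listing earlier in the thread.
cat > x <<'EOF'
> ID 1
DFNSALKDNJRGNLANGKNGRIIGINREVN
> ID 2
KJDFKDSJGNIHG
> ID 3
BDSBGOBAOEURBOUEABG
> ID 4
DNFKSAD
> ID 5
KJDFKDSJGNIHG
> ID 6
BDSBGOBAOEURBOUEABG
EOF

# Count each sequence of 10 to 19 characters and collect the IDs that carry it.
awk '
/^>/ { sub(/^> */, ""); id = $0; next }   # header: keep "ID n", drop the "> "
length($0) >= 10 && length($0) < 20 {
    if (count[$0]++ == 0) {                # first time this sequence is seen
        order[++n] = $0                    # remember first-seen order
        ids[$0] = id
    } else {
        ids[$0] = ids[$0] ", " id          # append a further ID
    }
}
END {
    for (k = 1; k <= n; k++)
        printf "%s\tfreq %d\t%s\n", order[k], count[order[k]], ids[order[k]]
}' x > file_c
```

Keeping a separate `order` array makes the output follow the order in which sequences first appear, instead of the unspecified order of `for (key in array)`.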

Guru,

I am not getting the same result. This is what I am getting:

guruprasadpr · June 17, 2010, 12:59am

Hi
I think yours is a Sun machine. Use 'nawk' in place of 'awk'; it should work fine.

Guru.

Xterra · June 17, 2010, 1:03am

Guru,

It did not work.

guruprasadpr · June 17, 2010, 1:06am

Hi
Which Unix flavor are you using? Try 'gawk' if it is Linux.

Guru.
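
One caveat with gawk, offered as a possible explanation and not something the thread confirms: inside a gawk regular expression, `\>` is a GNU extension that matches the empty string at the end of a word, not a literal `>`. If that applies here, `/\> ID/` never matches the header lines, `x` stays empty, and every sequence ends up under the same empty key in `b`. The sketch below is the same one-liner with the header pattern written as `/^>/`, which does not depend on how `\>` is interpreted. It runs against the sample file `x` from clx's post; the output file name `out.txt` is arbitrary.

```shell
# Sample input copied from the `cat x` listing earlier in the thread.
cat > x <<'EOF'
> ID 1
DFNSALKDNJRGNLANGKNGRIIGINREVN
> ID 2
KJDFKDSJGNIHG
> ID 3
BDSBGOBAOEURBOUEABG
> ID 4
DNFKSAD
> ID 5
KJDFKDSJGNIHG
> ID 6
BDSBGOBAOEURBOUEABG
EOF

# Same logic as the one-liner above; only the header pattern differs.
awk '
/^>/ { x = $0; next }                      # header line: remember the ID
length($0) >= 10 && length($0) < 20 {
    a[$0]++                                # frequency of this sequence
    b[x] = $0                              # sequence stored under its ID
}
END {
    for (i in a)
        for (j in b)
            if (i == b[j]) {               # report one ID per sequence
                print j "\t freq " a[i] "\n" i
                break
            }
}' x > out.txt
```

The order in which `for (i in a)` visits keys is unspecified, so which of the matching IDs is printed, and in what order, can differ between awk implementations.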

Xterra · June 17, 2010, 1:11am

I usually use Red Hat, but I am now using Cygwin. I tried gawk and I got the same result: