Xterra
June 16, 2010, 11:47pm
1
Here is my problem: my file (file A) contains the following information:
Now, I would like to create a file (file B) containing only the lines with 10 or more characters but fewer than 20, along with their corresponding IDs:
Then, I need to compare the entries and determine their frequency. Thus, I will generate a third file (C) with the following information:
I think it could be done using AWK or grep.
Any help will be greatly appreciated.
clx
June 17, 2010, 12:18am
2
try:
$ cat x
> ID 1
DFNSALKDNJRGNLANGKNGRIIGINREVN
> ID 2
KJDFKDSJGNIHG
> ID 3
BDSBGOBAOEURBOUEABG
> ID 4
DNFKSAD
> ID 5
KJDFKDSJGNIHG
> ID 6
BDSBGOBAOEURBOUEABG
$ awk '/^>/ {R=$0} !/>/ && length($0) >= 10 && length($0) < 20 {print R"\n"$0}' x
> ID 2
KJDFKDSJGNIHG
> ID 3
BDSBGOBAOEURBOUEABG
> ID 5
KJDFKDSJGNIHG
> ID 6
BDSBGOBAOEURBOUEABG
$
I didn't understand the file_c requirement.
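For readers who want the length filter spelled out, here is a minimal sketch of the same idea as a shell function. The name filter_by_length is hypothetical and not from the thread; it assumes the input alternates "> ID n" lines with one sequence line each, as in the sample above.

```shell
# Hypothetical helper (not from the thread): keep sequences of 10 to 19
# characters, each preceded by its ID line. Reads the named file(s) or stdin.
filter_by_length() {
  awk '
    /^>/ { header = $0; next }             # remember the most recent ID line
    length($0) >= 10 && length($0) < 20 {  # 10 or more, but fewer than 20
      print header
      print $0
    }
  ' "$@"
}

# Usage (file names are placeholders):
#   filter_by_length fileA > fileB
```

Unlike the one-liner, this version skips to the next line after storing a header, so the length test only ever sees sequence lines.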
Try this:
awk '/^> ID/{x=$0 ; next}{if ( length >= 10 && length < 20 ){a[$0]++;b[x]=$0}}END {for (i in a) for (j in b) if(i==b[j]){print j "\t freq " a[i]"\n" i;break;}}' file
> ID 3 freq 2
BDSBGOBAOEURBOUEABG
> ID 2 freq 2
KJDFKDSJGNIHG
Guru.
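The frequency count can also be sketched in a form whose output order does not depend on how awk walks its arrays. This is only an illustration under assumptions: the exact layout wanted for file C is not shown in the thread, so the "ID, tab, freq N" line followed by the sequence simply mirrors the output above, and the name count_sequences is hypothetical.

```shell
# Hypothetical helper (not from the thread): count how often each sequence of
# 10 to 19 characters occurs, reporting it under the first ID it appeared with.
count_sequences() {
  awk '
    /^>/ { header = $0; next }             # remember the most recent ID line
    length($0) >= 10 && length($0) < 20 {
      if (!($0 in count)) {                # first time this sequence is seen
        order[++n] = $0                    # keep the input order
        first[$0] = header                 # and the ID it first appeared under
      }
      count[$0]++
    }
    END {
      for (k = 1; k <= n; k++) {
        seq = order[k]
        print first[seq] "\tfreq " count[seq]
        print seq
      }
    }
  ' "$@"
}

# Usage (file names are placeholders):
#   count_sequences fileA > fileC
```

On the six-record sample from earlier in the thread, this reports "> ID 2" and "> ID 3", each with freq 2, in that order.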
Xterra
June 17, 2010, 12:48am
4
Anchar,
Your answer works very well! File C should compare each and every line and record the frequency.
Guru,
I am not getting the same result. This is what I am getting:
Hi
I think yours is a Sun machine. Use 'nawk' in place of 'awk'. It should work fine.
Guru.
Hi
Which Unix flavor are you using? Try 'gawk' if it is Linux.
Guru.
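Differences between awk variants often come down to regular-expression escapes. In gawk, for example, \> is documented as an end-of-word operator rather than a literal ">", so a header pattern is more portable when it is anchored with ^ and leaves ">" unescaped. Whether that explains the differing results here is only a guess, since the output is not shown. A minimal sketch (the function name list_headers is hypothetical, not from the thread):

```shell
# Hypothetical helper (not from the thread): print only the ID lines.
# The pattern is anchored at the start of the line and ">" is not escaped,
# so it should behave the same in awk, nawk, and gawk.
list_headers() {
  awk '/^>/ { print }' "$@"
}

# To check which awk is installed (gawk answers to --version;
# other variants may not recognise the option):
#   awk --version 2>/dev/null | head -n 1
```

If list_headers prints every "> ID n" line but a pattern with an escaped ">" does not, the escape is the likely culprit.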
Xterra
June 17, 2010, 1:11am
8
I usually use RedHat but I am now using Cygwin. I tried gawk and I got the same result: