lColli
September 9, 2010, 1:40pm
1
Dear Forum,
I would like to know how could I found every result from one column at a table based at two table.
Exemplo:
Table:
Red 4
Red 5
Red 10
Black 33
Black 44
Black 5
Green 2
Green 55
Green 78
I would like to have every color with the lower result from the second column:
Red 4
Black 5
Green 2
Thank you very much!
Leandro
Try:
awk '!a[$1]{a[$1]=$2} $2<a[$1]{a[$1]=$2} END{for(i in a)print i, a}' file
1 Like
Another way, but with two commands:
# myFile="TestFile.txt"
# numDiff=`cut -f1 -d" " "${myFile}" | sort | uniq | wc -l`
# sort -n -k2 "${myFile}" | head -"${numDiff}"
Green 2
Red 4
Black 5
## Or in one!
# sort -n -k2 "${myFile}" | head -`cut -f1 -d" " "${myFile}" | sort | uniq | wc -l`
Regards!
1 Like
lColli
September 9, 2010, 3:19pm
4
Dear Sir Franklin52,
Thank you very much! How can I change the code for print more columns \(like, column 3, 4 and 5?\)
Thank you again,
Best Regards, Leandro
---------- Post updated at 02:19 PM ---------- Previous update was at 01:34 PM ----------
Dear Felipe.vinturin,
I tried to use your code, but I don't know why when I use for larger data it prints repetitive objetcs from the first column. My file has more columns, could be this?
Thank you very much!
Best Regards, Leandro
Hi,
Maybe, if your file columns does not match the ones you put here!
In my command, I sort using the second column, if it is not the column with the number, it will print repetitive data.
Can you post an example of your file? With all columns?
Regards.
Post an example of the input file and the desired output.
lColli
September 9, 2010, 3:37pm
7
Dear Felipe,
Here is the data and I would like just the lines in red!
Thanks again!
Best Regard, Leandro
Tag Database rank order Accession UniGene cluster number
AAAAAAAAAC 46 _ Hs.621716
AAAAAAAAAC 46 AA636074 _
AAAAAAAAAC 77 AI861786 _
AAAAAAAAAC 77 BE536886 _
AAAAAAAAAC 77 BE548727 _
AAAAAAAAAC 77 BF691443 _
AAAAAAAAAC 79 BF754052 _
AAAAAAAAAC 80 BI004369 _
AAAAAAAAAC 86 BU535655 _
AAAAAAAAAC 86 BU539770 _
AAAAAAAAAC 86 BX111054 _
AAAAAAAAAC 86 BX643752 _
AAAAAAAAAC 86 CA390277 _
AAAAAAAAAC 94 CD522477 _
AAAAAAAAAC 94 DR159089 _
AAAAAAAAAG 5 _ Hs.190440
AAAAAAAAAG 26 _ Hs.445936
AAAAAAAAAG 46 _ Hs.567437
AAAAAAAAAG 74 _ Hs.674701
AAAAAAAAAG 77 _ Hs.700965
AAAAAAAAAG 77 AI034458 _
AAAAAAAAAG 77 AV699522 _
AAAAAAAAAG 77 AV719142 _
AAAAAAAAAG 79 AW150843 _
AAAAAAAAAG 80 AW801953 _
AAAAAAAAAG 80 AW857024 _
AAAAAAAAAG 86 BC033980 Hs.721040
AAAAAAAAAG 86 BE072635 _
AAAAAAAAAG 86 BF104499 _
AAAAAAAAAG 86 BF897128 _
AAAAAAAAAG 93 BG537298 _
AAAAAAAAAG 93 BG706646 _
AAAAAAAAAG 94 BI025564 _
AAAAAAAAAG 94 BX425052 _
AAAAAAAAAG 100 F24242 _
---------- Post updated at 02:37 PM ---------- Previous update was at 02:37 PM ----------
Dear Frank,
The output would be
AAAAAAAAAC 46 _ Hs.621716
AAAAAAAAAG 5 _ Hs.190440
The awk code gives me:
AAAAAAAAAC 46
AAAAAAAAAG 5
Thank you!
Something like this?
awk '!a[$1]{a[$1]=$2;b[$1]=$3 FS $4} $2<a[$1]{a[$1]=$2} END{for(i in a)print i, a, b}' file
A basic change:
myFile="TestFile.txt"
cut -f1 -d" " "${myFile}" | sort | uniq | \
while read fKey
do
egrep '^'"${fKey}"'' "${myFile}" | sort -n -k2 | head -1
done
lColli
September 9, 2010, 4:13pm
10
Dear Frank,
Thank you for the answer.
But the new code shown partially the line.
The output were AAAAAAAAAC 46 _ , but should be AAAAAAAAAC 46 _ Hs.621716
Thank you very much.
Leandro Colli
---------- Post updated at 03:13 PM ---------- Previous update was at 03:00 PM ----------
Dear Felipe,
The new code shown that:
$ cut -f1 -d" " "${myFile}" | sort | uniq | while read fKey; do egrep '^'"${fKey}"'' "${myFile}" | sort -n -k2 | head -1; done
AAAAAAAAAC 46 AA636074 _
AAAAAAAAAC 46 _ Hs.621716
AAAAAAAAAC 77 AI861786 _
AAAAAAAAAC 77 BE536886 _
AAAAAAAAAC 77 BE548727 _
AAAAAAAAAC 77 BF691443 _
AAAAAAAAAC 79 BF754052 _
AAAAAAAAAC 80 BI004369 _
AAAAAAAAAC 86 BU535655 _
AAAAAAAAAC 86 BU539770 _
AAAAAAAAAC 86 BX111054 _
AAAAAAAAAC 86 BX643752 _
AAAAAAAAAC 86 CA390277 _
AAAAAAAAAC 94 CD522477 _
AAAAAAAAAC 94 DR159089 _
AAAAAAAAAG 100 F24242 _
AAAAAAAAAG 26 _ Hs.445936
AAAAAAAAAG 46 _ Hs.567437
AAAAAAAAAG 5 _ Hs.190440
Am I doing any thing wrong?
Thank you very much for your help!
Leandro
Sorry, I've adapt the code after posting, try it again:
awk '!a[$1]{a[$1]=$2;b[$1]=$3 FS $4} $2<a[$1]{a[$1]=$2} END{for(i in a)print i, a, b}' file
lColli
September 9, 2010, 4:28pm
12
Dear, Frank!
Do not sorry! You've helped very much!!
It's working!!!
Thank you very much!
Best Regars, Leandro
One way to do it with Perl -
$
$
$ cat f15
AAAAAAAAAC 46 _ Hs.621716
AAAAAAAAAC 46 AA636074 _
AAAAAAAAAC 77 AI861786 _
AAAAAAAAAC 77 BE536886 _
AAAAAAAAAC 77 BE548727 _
AAAAAAAAAC 77 BF691443 _
AAAAAAAAAC 79 BF754052 _
AAAAAAAAAC 80 BI004369 _
AAAAAAAAAC 86 BU535655 _
AAAAAAAAAC 86 BU539770 _
AAAAAAAAAC 86 BX111054 _
AAAAAAAAAC 86 BX643752 _
AAAAAAAAAC 86 CA390277 _
AAAAAAAAAC 94 CD522477 _
AAAAAAAAAC 94 DR159089 _
AAAAAAAAAG 5 _ Hs.190440
AAAAAAAAAG 26 _ Hs.445936
AAAAAAAAAG 46 _ Hs.567437
AAAAAAAAAG 74 _ Hs.674701
AAAAAAAAAG 77 _ Hs.700965
AAAAAAAAAG 77 AI034458 _
AAAAAAAAAG 77 AV699522 _
AAAAAAAAAG 77 AV719142 _
AAAAAAAAAG 79 AW150843 _
AAAAAAAAAG 80 AW801953 _
AAAAAAAAAG 80 AW857024 _
AAAAAAAAAG 86 BC033980 Hs.721040
AAAAAAAAAG 86 BE072635 _
AAAAAAAAAG 86 BF104499 _
AAAAAAAAAG 86 BF897128 _
AAAAAAAAAG 93 BG537298 _
AAAAAAAAAG 93 BG706646 _
AAAAAAAAAG 94 BI025564 _
AAAAAAAAAG 94 BX425052 _
AAAAAAAAAG 100 F24242 _
$
$
$ ##
$ perl -lane 'if ($.==1) {
$rec = $_;
$p=$F[0];
$m=$F[1];
} elsif ($p eq $F[0]) {
$rec = $_ if $F[1] < $m;
} else {
push @x, $rec;
$rec = $_;
$p=$F[0];
$m=$F[1];
}
END {push @x, $rec; print for (@x)}
' f15
AAAAAAAAAC 46 _ Hs.621716
AAAAAAAAAG 5 _ Hs.190440
$
$
tyler_durden
1 Like