How to filter a table by two columns

lColli · September 9, 2010, 1:40pm

Dear Forum,

 I would like to know how could I found every result from one column at a table based at two table.

Exemplo:

Table:

Red    4
Red    5
Red    10
Black   33
Black   44
Black   5
Green  2
Green 55
Green 78

I would like to have every color with the lower result from the second column:

Red   4
Black 5
Green 2

Thank you very much!

Leandro

Franklin52 · September 9, 2010, 1:55pm

Try:

awk '!a[$1]{a[$1]=$2} $2<a[$1]{a[$1]=$2} END{for(i in a)print i, a}' file

felipe.vinturin · September 9, 2010, 2:03pm

Another way, but with two commands:

# myFile="TestFile.txt"
# numDiff=`cut -f1 -d" " "${myFile}" | sort | uniq | wc -l`
# sort -n -k2 "${myFile}" | head -"${numDiff}"
Green 2
Red 4
Black 5

## Or in one!
# sort -n -k2 "${myFile}" | head -`cut -f1 -d" " "${myFile}" | sort | uniq | wc -l`

Regards!

lColli · September 9, 2010, 3:19pm

Dear Sir Franklin52,

Thank you very much! How can I change the code for print more columns \(like, column 3, 4 and 5?\)

Thank you again,

Best Regards, Leandro

---------- Post updated at 02:19 PM ---------- Previous update was at 01:34 PM ----------

Dear Felipe.vinturin,

I tried to use your code, but I don't know why when I use for larger data it prints repetitive objetcs from the first column. My file has more columns, could be this?

Thank you very much!

Best Regards, Leandro

felipe.vinturin · September 9, 2010, 3:33pm

Hi,

Maybe, if your file columns does not match the ones you put here!

In my command, I sort using the second column, if it is not the column with the number, it will print repetitive data.

Can you post an example of your file? With all columns?

Regards.

Franklin52 · September 9, 2010, 3:35pm

Post an example of the input file and the desired output.

lColli · September 9, 2010, 3:37pm

Dear Felipe,

Here is the data and I would like just the lines in red!

Thanks again!

Best Regard, Leandro

Tag Database rank order Accession UniGene cluster number

AAAAAAAAAC 46 _ Hs.621716
AAAAAAAAAC 46 AA636074 _
AAAAAAAAAC 77 AI861786 _
AAAAAAAAAC 77 BE536886 _
AAAAAAAAAC 77 BE548727 _
AAAAAAAAAC 77 BF691443 _
AAAAAAAAAC 79 BF754052 _
AAAAAAAAAC 80 BI004369 _
AAAAAAAAAC 86 BU535655 _
AAAAAAAAAC 86 BU539770 _
AAAAAAAAAC 86 BX111054 _
AAAAAAAAAC 86 BX643752 _
AAAAAAAAAC 86 CA390277 _
AAAAAAAAAC 94 CD522477 _
AAAAAAAAAC 94 DR159089 _
AAAAAAAAAG 5 _ Hs.190440
AAAAAAAAAG 26 _ Hs.445936
AAAAAAAAAG 46 _ Hs.567437
AAAAAAAAAG 74 _ Hs.674701
AAAAAAAAAG 77 _ Hs.700965
AAAAAAAAAG 77 AI034458 _
AAAAAAAAAG 77 AV699522 _
AAAAAAAAAG 77 AV719142 _
AAAAAAAAAG 79 AW150843 _
AAAAAAAAAG 80 AW801953 _
AAAAAAAAAG 80 AW857024 _
AAAAAAAAAG 86 BC033980 Hs.721040
AAAAAAAAAG 86 BE072635 _
AAAAAAAAAG 86 BF104499 _
AAAAAAAAAG 86 BF897128 _
AAAAAAAAAG 93 BG537298 _
AAAAAAAAAG 93 BG706646 _
AAAAAAAAAG 94 BI025564 _
AAAAAAAAAG 94 BX425052 _
AAAAAAAAAG 100 F24242 _

---------- Post updated at 02:37 PM ---------- Previous update was at 02:37 PM ----------

Dear Frank,

The output would be

AAAAAAAAAC 46 _ Hs.621716
AAAAAAAAAG 5 _ Hs.190440

The awk code gives me:

AAAAAAAAAC 46
AAAAAAAAAG 5

Thank you!

Franklin52 · September 9, 2010, 3:45pm

Something like this?

awk '!a[$1]{a[$1]=$2;b[$1]=$3 FS $4} $2<a[$1]{a[$1]=$2} END{for(i in a)print i, a, b}' file

felipe.vinturin · September 9, 2010, 3:46pm

A basic change:

myFile="TestFile.txt"
cut -f1 -d" " "${myFile}" | sort | uniq | \
while read fKey
do
	egrep '^'"${fKey}"'' "${myFile}" | sort -n -k2 | head -1
done

lColli · September 9, 2010, 4:13pm

Dear Frank,

Thank you for the answer.

But the new code shown partially the line.

The output were AAAAAAAAAC 46 _ , but should be AAAAAAAAAC 46 _ Hs.621716

Thank you very much.

 Leandro Colli

---------- Post updated at 03:13 PM ---------- Previous update was at 03:00 PM ----------

Dear Felipe,

The new code shown that:

$ cut -f1 -d" " "${myFile}" | sort | uniq | while read fKey; do egrep '^'"${fKey}"'' "${myFile}" | sort -n -k2 | head -1; done
AAAAAAAAAC      46      AA636074        _
AAAAAAAAAC      46      _       Hs.621716
AAAAAAAAAC      77      AI861786        _
AAAAAAAAAC      77      BE536886        _
AAAAAAAAAC      77      BE548727        _
AAAAAAAAAC      77      BF691443        _
AAAAAAAAAC      79      BF754052        _
AAAAAAAAAC      80      BI004369        _
AAAAAAAAAC      86      BU535655        _
AAAAAAAAAC      86      BU539770        _
AAAAAAAAAC      86      BX111054        _
AAAAAAAAAC      86      BX643752        _
AAAAAAAAAC      86      CA390277        _
AAAAAAAAAC      94      CD522477        _
AAAAAAAAAC      94      DR159089        _
AAAAAAAAAG      100     F24242  _
AAAAAAAAAG      26      _       Hs.445936
AAAAAAAAAG      46      _       Hs.567437
AAAAAAAAAG      5       _       Hs.190440

Am I doing any thing wrong?

Thank you very much for your help!

Leandro

Franklin52 · September 9, 2010, 4:15pm

Sorry, I've adapt the code after posting, try it again:

awk '!a[$1]{a[$1]=$2;b[$1]=$3 FS $4} $2<a[$1]{a[$1]=$2} END{for(i in a)print i, a, b}' file

lColli · September 9, 2010, 4:28pm

Dear, Frank!

Do not sorry! You've helped very much!!

It's working!!!

Thank you very much!

Best Regars, Leandro

durden_tyler · September 9, 2010, 4:31pm

One way to do it with Perl -

$
$
$ cat f15
AAAAAAAAAC 46 _ Hs.621716
AAAAAAAAAC 46 AA636074 _
AAAAAAAAAC 77 AI861786 _
AAAAAAAAAC 77 BE536886 _
AAAAAAAAAC 77 BE548727 _
AAAAAAAAAC 77 BF691443 _
AAAAAAAAAC 79 BF754052 _
AAAAAAAAAC 80 BI004369 _
AAAAAAAAAC 86 BU535655 _
AAAAAAAAAC 86 BU539770 _
AAAAAAAAAC 86 BX111054 _
AAAAAAAAAC 86 BX643752 _
AAAAAAAAAC 86 CA390277 _
AAAAAAAAAC 94 CD522477 _
AAAAAAAAAC 94 DR159089 _
AAAAAAAAAG 5 _ Hs.190440
AAAAAAAAAG 26 _ Hs.445936
AAAAAAAAAG 46 _ Hs.567437
AAAAAAAAAG 74 _ Hs.674701
AAAAAAAAAG 77 _ Hs.700965
AAAAAAAAAG 77 AI034458 _
AAAAAAAAAG 77 AV699522 _
AAAAAAAAAG 77 AV719142 _
AAAAAAAAAG 79 AW150843 _
AAAAAAAAAG 80 AW801953 _
AAAAAAAAAG 80 AW857024 _
AAAAAAAAAG 86 BC033980 Hs.721040
AAAAAAAAAG 86 BE072635 _
AAAAAAAAAG 86 BF104499 _
AAAAAAAAAG 86 BF897128 _
AAAAAAAAAG 93 BG537298 _
AAAAAAAAAG 93 BG706646 _
AAAAAAAAAG 94 BI025564 _
AAAAAAAAAG 94 BX425052 _
AAAAAAAAAG 100 F24242 _
$
$ 
$ ##
$ perl -lane 'if ($.==1) {
                $rec = $_;
                $p=$F[0];
                $m=$F[1];
              } elsif ($p eq $F[0]) {
                $rec = $_ if $F[1] < $m;
              } else {
                push @x, $rec;
                $rec = $_;
                $p=$F[0];
                $m=$F[1];
              }
              END {push @x, $rec; print for (@x)}
             ' f15
AAAAAAAAAC 46 _ Hs.621716
AAAAAAAAAG 5 _ Hs.190440
$
$

tyler_durden