Help removing lines with duplicated columns

yahyaaa · May 16, 2008, 6:27pm

Hi Guys...

Could you please help me with the following?

aaaa bbbb cccc sdsd
aaaa bbbb cccc qwer

As you can see, the two lines match in three fields...
How can I delete this duplicate? I mean, delete the second line if three fields are duplicated?

Thanks

fabtagon · May 16, 2008, 6:51pm

i) Use useful topics!
ii)

awk -F" " '! something[$3]++' inputfile
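For reference, the one-liner above keys on the third field only. A minimal sketch of the same idea keyed on the first three fields, assuming whitespace-separated input and a nawk, gawk, mawk, or other POSIX-style awk; the array name "seen" and the variable names are arbitrary choices for this illustration:

```shell
# Sample input from the question above.
input='aaaa bbbb cccc sdsd
aaaa bbbb cccc qwer'

# Key on fields 1-3. The commas join the fields with awk's SUBSEP, so
# field pairs such as "ab","c" and "a","bc" cannot produce the same key.
# A line is printed only the first time its key is seen.
result=$(printf '%s\n' "$input" | awk '!seen[$1,$2,$3]++')
printf '%s\n' "$result"
```

With the two sample lines, only the first one is kept.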

yahyaaa · May 16, 2008, 6:56pm

Could you explain this command for me ....

yahyaaa · May 16, 2008, 6:58pm

and what do u mean by " Use useful topics "...

Regards

fabtagon · May 16, 2008, 8:08pm

You've named this thread - your question - simply "help". It would be much better to use something like "removing duplicate lines".

And well, I see you've finally used the board search http://www.unix.com/shell-programming-scripting/62574-finding-duplicates-columns-removing-lines.html#post302196106 .

Let's put it together:
i) you've used a meaningless topic
ii) you've posted your problem three times
iii) you've already found a thread in which your problem is answered and explained

Do you really believe this will motivate me to explain this to you? I don't.

yahyaaa · May 16, 2008, 10:02pm

Dear fabtagon.....

I posted this topic in 3 different places to make sure that my problem can be handled by someone who can deal well enough with Unix.. and as I expected, someone came, like you, and gave a solution that makes no sense at all... that's why I had to post it in different places.. GOT IT !!

Please if you have a straight answer... be my guest, otherwise... go and practice some unix commands.....
Cappich ???

Warm regards Bozo

yahyaaa · May 16, 2008, 10:34pm

I went back and checked the problem stated here: http://www.unix.com/shell-programmin...#post302196106
Believe me, it's different....

I have 3 matched fields out of 4, NOT one out of 4.
Here is another example.

SSSSS DDDDDDD 10:10:00 15:22:22
XXXXX AAAAAAA 00:00:11 00:02:11
XXXXX AAAAAAA 00:00:11 06:02:10
EEEEEE VVVVVVV 04:12:00 01:10:02
EEEEEE VVVVVVV 04:12:00 05:12:00
SSSSS DDDDDDD 10:10:00 13:23:21
EEEEEE FFFFFFFF 20:20:20 24:00:00

I want the output to be like

XXXXX AAAAAAA 00:00:11 06:02:10
EEEEEE VVVVVVV 04:12:00 05:12:00
SSSSS DDDDDDD 10:10:00 13:23:21
EEEEEE FFFFFFFF 20:20:20 24:00:00

If 2 lines are matched by 3 fields, I want to delete the first.

Thanks...

era · May 17, 2008, 5:03am

If you reverse the file, the same solution can be used.

tac file |
awk '!a[$1 $2 $3]++'

This uses the three first fields to decide whether it's seen the same data before.
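Note that the pipeline above prints the surviving lines in reversed order. A hedged sketch that reverses them back, assuming tac is available (it is part of GNU coreutils) and using a temporary file created only for this illustration; the comma-separated subscript is a swapped-in variation that joins the fields with SUBSEP instead of plain concatenation:

```shell
# Sample data from the question, written to a scratch file for the demo.
file=$(mktemp)
cat > "$file" <<'EOF'
SSSSS DDDDDDD 10:10:00 15:22:22
XXXXX AAAAAAA 00:00:11 00:02:11
XXXXX AAAAAAA 00:00:11 06:02:10
EEEEEE VVVVVVV 04:12:00 01:10:02
EEEEEE VVVVVVV 04:12:00 05:12:00
SSSSS DDDDDDD 10:10:00 13:23:21
EEEEEE FFFFFFFF 20:20:20 24:00:00
EOF

# Reverse, keep the first line seen per key (the last one in the
# original order), then reverse again to restore the original order.
result=$(tac "$file" | awk '!a[$1,$2,$3]++' | tac)
printf '%s\n' "$result"
rm -f "$file"
```

On the sample data this yields the four lines listed as the wanted output.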

If you don't have the tac command, maybe you can sort the input before feeding it to awk.

There are certainly ways to make awk print the last instead of the first line; you can search the forums for a plethora of examples of this.
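One such way, as a sketch only: read the file twice with awk alone. The first pass records the line number of the last occurrence of each key, the second pass prints only those lines. This assumes the input is a regular file (not a pipe) and an awk that supports comma-separated subscripts; the file and array names are made up for the example:

```shell
# Sample data from the question, written to a scratch file for the demo.
file=$(mktemp)
cat > "$file" <<'EOF'
SSSSS DDDDDDD 10:10:00 15:22:22
XXXXX AAAAAAA 00:00:11 00:02:11
XXXXX AAAAAAA 00:00:11 06:02:10
EEEEEE VVVVVVV 04:12:00 01:10:02
EEEEEE VVVVVVV 04:12:00 05:12:00
SSSSS DDDDDDD 10:10:00 13:23:21
EEEEEE FFFFFFFF 20:20:20 24:00:00
EOF

# Pass 1 (NR == FNR): remember the last line number seen for each key.
# Pass 2: print a line only if it is the last occurrence of its key.
result=$(awk 'NR == FNR { last[$1,$2,$3] = FNR; next }
              last[$1,$2,$3] == FNR' "$file" "$file")
printf '%s\n' "$result"
rm -f "$file"
```

This keeps the original line order and does not need tac or sort.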

yahyaaa · May 17, 2008, 6:31am

Dear era,

I tried to use the tac command, but the Unix didn't recognize it at all. Also, when I used the awk command alone, it gave an error ( Bailing out )...
Could you tell me about the "a" in the awk, what does it stand for ?

Thanks so much for your kind help.

fabtagon · May 17, 2008, 6:33am

Have a look at the forum FAQ, "Simple rules of the UNIX.COM forums". Duplicating and crossposting are strongly discouraged.

If you aren't able to understand the above command line even after a quite similar one has been explained in detail in another thread, maybe you should start really practising shell programming (which consists of reading man pages/online resources) instead of demanding a solution from someone being as kind as to sacrifice his free time for you.

yahyaaa · May 17, 2008, 6:38am

Dear fabtagon,

Read the last example and you will find that it's not duplicated, it's another Question, NOT THE SAME AS THE ONE WHOSE ANSWER YOU COPIED AND PASTED TO MINE..... Look again if you are interested, otherwise, you have my best regards.

era · May 17, 2008, 6:51am

There are many different variants of awk. If your awk does not understand that script, see if you can find nawk or mawk or gawk instead. On some systems (Sun, HP-UX) you might be able to find an "XPG4" version of awk which is more modern than the bare-bones "old awk".
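A rough sketch of how one might look for a usable awk, assuming a POSIX-style shell with "command -v". The candidate list and its order are only a guess; /usr/xpg4/bin/awk is where Solaris keeps its XPG4 awk, and other systems may differ:

```shell
# Try the more modern implementations first, plain "awk" last.
awk_cmd=
for candidate in gawk nawk mawk /usr/xpg4/bin/awk awk; do
    if command -v "$candidate" >/dev/null 2>&1; then
        awk_cmd=$candidate
        break
    fi
done

if [ -n "$awk_cmd" ]; then
    echo "using: $awk_cmd" >&2
    # Quick check with the one-liner from this thread on two test lines.
    result=$(printf 'a b c 1\na b c 2\n' | "$awk_cmd" '!a[$1 $2 $3]++')
    printf '%s\n' "$result"
else
    echo "no awk found" >&2
fi
```

If the check prints only the first test line, that awk understands the script.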

The name of awk comes from the family names of its creators Alfred Aho, Peter Weinberger, and Brian Kernighan.

If you are unable to abide by the forum rules in spite of several remarks by forum users, perhaps these forums are not for you.

yahyaaa · May 17, 2008, 7:02am

I meant the "a" in the command you wrote (awk '!a[$1 $2 $3]++'), because it was not clear enough for me... I'm new to awk and I needed a quick solution.

and I do abide by the forum rules, see for yourself above... I dare you to find a thread similar to this one or even close to it..

Nevertheless, thanks for your help,

era · May 17, 2008, 7:19am

The forums' own search tool stupidly treats "awk" as a stop word, so I took a detour via Google.

site:unix.com awk duplicate - Google Search

a is just the name of a variable; if the associative array already contains a value for the given key, we have already seen that key before, and suppress printing. (The default if no action is given is to print anything matching the condition.)
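Spelled out step by step, the condition '!a[key]++' behaves roughly like the longer script below. This is only an illustrative expansion, run on the two sample lines from the first post; the variable "key" and the space separator inside it are additions for readability, not part of the original command:

```shell
input='aaaa bbbb cccc sdsd
aaaa bbbb cccc qwer'

result=$(printf '%s\n' "$input" | awk '
{
    # Build the lookup key from the first three fields.
    key = $1 " " $2 " " $3
    # An element that was never set reads as 0, so 0 means "not seen yet".
    if (a[key] == 0)
        print
    # Count this occurrence; later lines with the same key are skipped.
    a[key]++
}')
printf '%s\n' "$result"
```

Only the first line is printed, because the second one has the same key.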

yahyaaa · May 17, 2008, 7:33am

I used the nawk and it worked,
Thanks for your kind help, really appreciated.