Delete repeated occurrences and print on the same line by appending values

shaliniyadav · June 25, 2009, 5:30am

Hi All,

I am trying to work on the text below:

a b c 1
a b c 2
a b c 3
x y z 6
x y z 44
a b c 89

I need to delete the repeated occurrences and get a single line per pattern, like below:

a b c 1 2 3 89
x y z 6 44 89

Please help me; I am new to Unix scripting.

---------- Post updated at 03:00 PM ---------- Previous update was at 02:55 PM ----------

Even using an Excel pivot table does not help, as the output is something like this:
abc 1
2
3
89
Xyz 6
44
89

The values do not end up in the same row.

How can I use awk to get this?

rakeshawasthi · June 25, 2009, 5:43am

awk '/a b c/ {t=t" "$4} /x y z/ {s=s" "$4} END {print "a b c" t;print "x y z" s}' inputfile

shaliniyadav · June 25, 2009, 5:48am

Hey, thanks, but this handles only "x y z" and "a b c"; my data file is huge, with many different patterns occurring many times.

panyam · June 25, 2009, 7:44am

 
awk '{ a[$1" "$2" "$3]=a[$1" "$2" "$3]" "$4 } END { for (i in a) print i a[i] }' input_file.txt
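For anyone following along, here is a small sketch of how the associative-array approach behaves on the sample data from the first post. The function name group_by_key is made up for illustration, and it assumes the key is always the first three fields. The trailing sort is only there because awk's `for (k in vals)` visits keys in an unspecified order.

```shell
# group_by_key: hypothetical helper name, not from the thread.
# Reads lines of the form "k1 k2 k3 value" on stdin and prints one line per
# distinct key (first three fields) with all of its values appended.
group_by_key() {
  awk '{
    key = $1 " " $2 " " $3          # the first three fields identify a group
    vals[key] = vals[key] " " $4    # append the 4th field to that group
  }
  END {
    for (k in vals) print k vals[k] # key order is unspecified in awk
  }' | sort                         # sort only to make the line order stable
}

printf '%s\n' 'a b c 1' 'a b c 2' 'a b c 3' 'x y z 6' 'x y z 44' 'a b c 89' | group_by_key
```

On that sample input this should print "a b c 1 2 3 89" followed by "x y z 6 44".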

rakeshawasthi · June 25, 2009, 7:53am

Check whether this meets your requirement:

/bin/sort inputfile | awk ' BEGIN { FS=OFS=" ";getline;prev=$1" "$2" "$3;s=$4 }

{
  curr=$1" "$2" "$3
#  print curr
  if ( prev == curr )
    s=s" "$4
  if ( prev != curr )
   {
   print prev " " s
   prev=curr;s=$4
   }
}
END {
   print prev " " s
}'

panyam's solution is cool and exact..

ghostdog74 · June 25, 2009, 8:22am

Why does "x y z" have an extra 89?

rakeshawasthi · June 25, 2009, 8:25am

Looks like a typo to me.

shaliniyadav · June 26, 2009, 12:40am

@panyam and @Rakesh, thanks a lot. It did serve my purpose. I am still working on it and will get back, as my original input file is different and I am trying to implement this on it. I think it will be better to have a delimiter to differentiate.

@ghostdog74 Yes, that was a typing error; sorry for the confusion.

---------- Post updated 06-26-09 at 10:10 AM ---------- Previous update was 06-25-09 at 05:59 PM ----------

Originally quoted by Rakesh:

----------------------------------------
/bin/sort inputfile | awk ' BEGIN { FS=OFS=" ";getline;prev=$1" "$2" "$3;s=$4 }

{
  curr=$1" "$2" "$3
#  print curr
  if ( prev == curr )
    s=s" "$4
  if ( prev != curr )
   {
   print prev " " s
   prev=curr;s=$4
   }
}
END {
   print prev " " s
}'
------------------------------------------

This piece of code worked fine, but there is one more thing I would like to highlight. You have sorted the entire input file, so the values are not printed in the order in which they occur. See the example:

inputfile:
a b c 1
a b c 2
a b c 3
x y z 6
x y z 44
a b c 89
x y z 9
b s c 100
a b c 19

Output of above code:
a b c 1 19 2 3 89
b s c 100
x y z 44 6 9

So here all the appended values are sorted as well.
Instead I should get the following:

a b c 1 2 3 89 19
b s c 100
x y z 6 44 9

I don't mind the order of the values 1, 2, 3, ..., N; sorting only up to N-1 would be fine, because the last value is the one that differs.

Please help in this regard.

rakeshawasthi · June 26, 2009, 1:27am

In that case I would use panyam's code (I really liked it and learnt from it) with a little modification:

sort -n -k4,4 inputfile | awk '{a[$1" "$2" "$3]=a[$1" "$2" "$3]" "$4}END{for (i in a) print i a[i]}'
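If the values have to stay in the order they appear in the input, one possible variation is to skip the sort and remember when each key is first seen. This is only a sketch, assuming the key is always the first three fields; the name group_in_input_order and the arrays order and vals are made up for illustration.

```shell
# group_in_input_order: hypothetical helper name, not from the thread.
# Groups lines by their first three fields and appends the 4th field of each
# line in input order. Reads the named files, or stdin when none are given.
group_in_input_order() {
  awk '{
    key = $1 " " $2 " " $3
    if (!(key in vals)) order[++n] = key   # record each key once, when first seen
    vals[key] = vals[key] " " $4           # values accumulate in input order
  }
  END {
    # walk the keys in first-seen order instead of using for (k in vals)
    for (j = 1; j <= n; j++) print order[j] vals[order[j]]
  }' "$@"
}

printf '%s\n' 'a b c 1' 'a b c 2' 'a b c 3' 'x y z 6' 'x y z 44' \
  'a b c 89' 'x y z 9' 'b s c 100' 'a b c 19' | group_in_input_order
```

On the nine-line example input this should give "a b c 1 2 3 89 19", "x y z 6 44 9" and "b s c 100", with the lines in first-seen order. Piping the result through sort would order the lines by key without disturbing the values within each line.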

---------- Post updated at 10:57 AM ---------- Previous update was at 10:56 AM ----------

I suggest you post the code you have tried when asking questions. You could also have tried
man sort.