Help with Data Sorting Command

793589 · September 29, 2009, 4:16pm

Hi,

I have a problem with data sorting. For example, my file looks like this:

123 123/789 aaa bbb ccc ddd (adf)
112 112/123 aaa bbb ccc (ade)
102 1a3/7g9 (adf)03
110 12b/129 aaa bbb ccc ddd fff(a8f)03
117 42f/8c9 aaa bbb ccc ddd (adf)
142 120/tyu fff
612 023/w03 bbb ccc ddd (adf)03 lll kkk
152 d68/b56 {eng} aaa ddd (a)05
710 129/po9 aaa bbb ccc ddd (adf)
189 822/78y aaa ccc ddd (adf)03
345 g67/239 aaa bbb ccc ddd (adf) eng jkl ggg
189 822/78y (adf)03

I'm not sure which command I can use to extract and display only the data from column 3 onward (e.g. aaa, bbb, ccc), and then eliminate the lines without aaa, bbb, or ccc, along with any other unmatched attributes (e.g. eng, ggg, ddd...).

example output:
123 123/789 aaa bbb ccc
112 112/123 aaa bbb ccc
110 12b/129 aaa bbb ccc
117 42f/8c9 aaa bbb ccc
612 023/w03 bbb ccc
152 d68/b56 aaa
710 129/po9 aaa bbb ccc
189 822/78y aaa ccc
345 g67/239 aaa bbb ccc

I appreciate your help!

sanjay.login · September 29, 2009, 5:54pm

Hi,

Try this code:

grep -e "aaa" -e "bbb" -e "ccc" data|egrep -o '.*ccc'

It gives the output

123 123/789 aaa bbb ccc
112 112/123 aaa bbb ccc
110 12b/129 aaa bbb ccc
117 42f/8c9 aaa bbb ccc
612 023/w03 bbb ccc
710 129/po9 aaa bbb ccc
189 822/78y aaa ccc
345 g67/239 aaa bbb ccc

Regards,
Sanjay
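
To make the behaviour of that pipeline easier to see, here is a minimal sketch on three lines. The first two come from the sample data above; the third is an assumed line with an extra token (fff) in front of bbb. The function name filter_upto_ccc is made up for this illustration. The first grep keeps any line containing aaa, bbb or ccc; the second prints only the part of each line up to the last "ccc", so a line without "ccc" disappears, and anything sitting before "ccc" is kept.

```shell
# Two lines from the sample data plus one assumed line with an extra token.
sample='612 023/w03 bbb ccc ddd (adf)03 lll kkk
152 d68/b56 {eng} aaa ddd (a)05
612 023/w03 fff bbb ccc'

# Same two stages as the posted pipeline (grep -E -o in place of egrep -o).
filter_upto_ccc() {
  grep -e 'aaa' -e 'bbb' -e 'ccc' | grep -E -o '.*ccc'
}

printf '%s\n' "$sample" | filter_upto_ccc
```

The 152 line passes the first grep but is dropped by the second because it has no "ccc", and the fff token in the third line survives because it sits before "ccc".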

793589 · September 29, 2009, 10:02pm

I've tried the method above, but the result it returned is inaccurate.
Is there any other way to sort the data, other than grep?

117 42f/8c9 aaa bbb ccc
612 023/w03 fff bbb ccc
710 129/po9 aaa bbb ccc

dr.house · September 30, 2009, 3:27am

Sanjay's code works on the input posted - which seems to differ from the input you're actually using: there's no line '612 ... fff ...' in the data given to us, and to my knowledge, grep is not that creative

[house@leonov] cat test.file | grep -e 'aaa' -e 'bbb' -e 'ccc' | egrep -o '.*ccc'
[...]
612 023/w03 bbb ccc
[...]

793589 · September 30, 2009, 10:45pm

I'm sorry, actually I didn't give the full set of data from my file (I just copied a part of it), and that's why you did not see the line with "612 023/w03 fff bbb ccc".

Maybe I was in too much of a rush and did not give enough details.
My file is actually like this:
0123123:56:Y01:S32 123/00/00/000T ddasr_#3T aaa#6 bbb ccc ddd (adf) 88
6897112:46:R51:B00 112/32/03/003M ca_a#6 aaa# ddpsr_2A_RA#3T ddasr_#3T bbb# ccc (ade) ce_b#6 27
5548102:66:Y03:B02 1a3/7g9/65/7YY (adf)03 ddmsrsb2sgy_2g#188T ddMgfdus1889rsb2sgy_ART_2g#36T
1645810:97:Y87:B55 12b/129/00/110 prefsdb_sgdbf_bdfb_aaa zzgthmk5af_6#6 aaa zmzgjklphmk5af#6 ddp35sad2Cr_2C_RA#35T bbb ccc# ddd fff(a8f)03 63
5348117:02:R89:B31 42f/8c9/28/7YU prefsdb_sgdbf_bdfb_aaa aaa#6 bbb ccc ddd (adf) 99
1479992:93:R22:B85 120/tyu/36/DFU prefsdb_sgdbf_bdfb_aaa zzgthmk5af_6#6 fff 26
6745512:76:Y65:S99 023/w03/10/11P ddmsrsb2sgy_2g#188T bbb ccc ddd# (adf)03 lll kkk#
0234152:85:R00:S55 d68/b56/65/11U {eng} aaa ddd (a)05 ddmsrsb2sgy_2g#88
7689710:65:R01:B45 129/po9/85/027 aaa bbb ccc ddd (adf)
1001289:10:R01:B76 822/78y/64/008 prefsdb_sgdbf_bdfb_aaa zzgthmk5af_6#6 fff 26
3356745:23:Y66:B96 g67/239/65/11M aaa bbb#7 ccc ddd (adf) eng jkl ggg prefsdb_sgdbf_bdfb_aaa zzgthmk5af_6#6 fff 26
1889429:15:Y02:S88 822/78y/00/04M (adf)03

My 1st question, for the column in bold:
What is a suitable command (other than grep) to eliminate all unwanted letters/symbols/numbers and display exactly "aaa", "bbb" and "ccc"?

the output should look like this
0123123:56:Y01:S32 123/00/00/000T aaa bbb ccc
6897112:46:R51:B00 112/32/03/003M aaa bbb ccc
1645810:97:Y87:B55 12b/129/00/110 aaa bbb ccc
5348117:02:R89:B31 42f/8c9/28/7YU aaa bbb ccc
6745512:76:Y65:S99 023/w03/10/11P bbb ccc
0234152:85:R00:S55 d68/b56/65/11U aaa
7689710:65:R01:B45 129/po9/85/027 aaa bbb ccc
3356745:23:Y66:B96 g67/239/65/11M aaa bbb ccc

My 2nd question, for column 2 with "/":
What is the most appropriate/easy command to remove the 1st and 2nd "/" and then replace the 3rd "/" with "."?

I expect the output to look like this:
0123123:56:Y01:S32 1230000.000T aaa bbb ccc
6897112:46:R51:B00 1123203.003M aaa bbb ccc
1645810:97:Y87:B55 12b12900.110 aaa bbb ccc
5348117:02:R89:B31 42f8c928.7YU aaa bbb ccc
6745512:76:Y65:S99 023w0310.11P bbb ccc
0234152:85:R00:S55 d68b5665.11U aaa
7689710:65:R01:B45 129po985.027 aaa bbb ccc
3356745:23:Y66:B96 g6723965.11M aaa bbb ccc

Thanks in advance

sanjay.login · October 1, 2009, 5:52am

Hi 793589,

Hope this code will work for you.

awk '{x=$1" "$2; for (i=3;i<=NF;i++){ if ($i~"^aaa"||$i~"^bbb"||$i~"^ccc"){$i=substr($i,1,3);x=x" "$i}}; print x;x="" }' infile |awk 'NF>2{print $0}'

input :

0123123:56:Y01:S32 123/00/00/000T ddasr_#3T aaa#6 bbb ccc ddd (adf) 88
6897112:46:R51:B00 112/32/03/003M ca_a#6 aaa# ddpsr_2A_RA#3T ddasr_#3T bbb# ccc (ade) ce_b#6 27
5548102:66:Y03:B02 1a3/7g9/65/7YY (adf)03 ddmsrsb2sgy_2g#188T ddMgfdus1889rsb2sgy_ART_2g#36T
1645810:97:Y87:B55 12b/129/00/110 prefsdb_sgdbf_bdfb_aaa zzgthmk5af_6#6 aaa zmzgjklphmk5af#6 ddp35sad2Cr_2C_RA#35T bbb ccc# ddd fff(a8f)03 63
5348117:02:R89:B31 42f/8c9/28/7YU prefsdb_sgdbf_bdfb_aaa aaa#6 bbb ccc ddd (adf) 99
1479992:93:R22:B85 120/tyu/36/DFU prefsdb_sgdbf_bdfb_aaa zzgthmk5af_6#6 fff 26
6745512:76:Y65:S99 023/w03/10/11P ddmsrsb2sgy_2g#188T bbb ccc ddd# (adf)03 lll kkk#
0234152:85:R00:S55 d68/b56/65/11U {eng} aaa ddd (a)05 ddmsrsb2sgy_2g#88
7689710:65:R01:B45 129/po9/85/027 aaa bbb ccc ddd (adf)
1001289:10:R01:B76 822/78y/64/008 prefsdb_sgdbf_bdfb_aaa zzgthmk5af_6#6 fff 26
3356745:23:Y66:B96 g67/239/65/11M aaa bbb#7 ccc ddd (adf) eng jkl ggg prefsdb_sgdbf_bdfb_aaa zzgthmk5af_6#6 fff 26
1889429:15:Y02:S88 822/78y/00/04M (adf)03

output:

0123123:56:Y01:S32 123/00/00/000T aaa bbb ccc
6897112:46:R51:B00 112/32/03/003M aaa bbb ccc
1645810:97:Y87:B55 12b/129/00/110 aaa bbb ccc
5348117:02:R89:B31 42f/8c9/28/7YU aaa bbb ccc
6745512:76:Y65:S99 023/w03/10/11P bbb ccc
0234152:85:R00:S55 d68/b56/65/11U aaa
7689710:65:R01:B45 129/po9/85/027 aaa bbb ccc
3356745:23:Y66:B96 g67/239/65/11M aaa bbb ccc

For your second question (remove the first and second "/" and replace the third "/" with "."), you can use this command:

sed 's/\//./3g
s/\///g' file_name
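
As far as I know, combining a number with g (the 3g flag above) is a GNU sed extension, so here is a sketch of the same idea using only the plain numeric flag. The function name squash_slashes is made up for this example, and | is used as the s delimiter just to avoid escaping "/". It assumes the line contains exactly three "/", all in column 2, as in the data shown.

```shell
# Step 1: turn the 3rd "/" on the line into ".".
# Step 2: delete every "/" that is still left (the former 1st and 2nd).
squash_slashes() {
  sed -e 's|/|.|3' -e 's|/||g'
}

printf '%s\n' '0123123:56:Y01:S32 123/00/00/000T aaa bbb ccc' | squash_slashes
```

Note that sed counts the slashes across the whole line, not per column, so a "/" in any other field would shift the result.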

So, as a single command line, you can use:

awk '{x=$1" "$2; for (i=3;i<=NF;i++){ if ($i~"^aaa"||$i~"^bbb"||$i~"^ccc"){$i=substr($i,1,3);x=x" "$i}}; print x;x="" }' infile |awk 'NF>2 {print $0}'|sed 's/\//./3g
s/\///g'

which will give the desired output:

0123123:56:Y01:S32 1230000.000T aaa bbb ccc
6897112:46:R51:B00 1123203.003M aaa bbb ccc
1645810:97:Y87:B55 12b12900.110 aaa bbb ccc
5348117:02:R89:B31 42f8c928.7YU aaa bbb ccc
6745512:76:Y65:S99 023w0310.11P bbb ccc
0234152:85:R00:S55 d68b5665.11U aaa
7689710:65:R01:B45 129po985.027 aaa bbb ccc
3356745:23:Y66:B96 g6723965.11M aaa bbb ccc
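
If a single process is preferred over the three-stage pipeline, the two steps could also be folded into one awk program. This is only a sketch under the same assumptions as above (column 2 has the form a/b/c/d, and the wanted tokens start with aaa, bbb or ccc); the function name extract_and_reformat and the variable names are made up for illustration.

```shell
extract_and_reformat() {
  awk '
  {
    # Column 2: split on "/" and rejoin, putting "." only before the 4th piece.
    n = split($2, part, "/")
    col2 = part[1]
    for (j = 2; j <= n; j++)
      col2 = col2 (j == 4 ? "." : "") part[j]

    # Columns 3..NF: keep the first 3 characters of fields starting with aaa/bbb/ccc.
    out = $1 " " col2
    found = 0
    for (i = 3; i <= NF; i++)
      if ($i ~ /^(aaa|bbb|ccc)/) {
        out = out " " substr($i, 1, 3)
        found = 1
      }

    # Print only lines where at least one token matched.
    if (found) print out
  }'
}

# Usage: extract_and_reformat < infile
```

Because column 2 is rewritten by splitting that field alone, a "/" elsewhere on the line does not affect the result.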

enjoy

Regards,
Sanjay

793589 · October 1, 2009, 6:54am

Thank you very much...
To help me understand it better, could you please give a brief explanation of each part of the awk command?

sanjay.login · October 1, 2009, 8:12am

x=$1" "$2

At the very beginning, $1 (field 1), a space, and $2 (field 2) are stored in the variable x.

Then the for loop starts from the third field onward and tests each field against the if condition:

if ($i~"^aaa"||$i~"^bbb"||$i~"^ccc"){$i=substr($i,1,3);x=x" "$i}

That is, if the current field starts with "aaa", "bbb" or "ccc", then $i (the current field) is cut down to its first 3 characters (for example, if the field is aaa#6, then "aaa" is stored in $i), and x is updated to x" "$i (its previous content, a space, and the current field).

Finally, print x prints the content of x, which is the first two fields of the line followed by all the matches of aaa, bbb and ccc. The second awk (NF>2) then prints only the lines that have more than two fields, so lines with no match are dropped.
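
The test and the substr() call can be tried in isolation on single tokens. The sketch below feeds a few tokens taken from the sample data, one per line, through the same condition; the function name classify_tokens and the "-> skipped" label are made up for this illustration.

```shell
classify_tokens() {
  awk '{
    # Same test as in the one-liner: does the token START with aaa, bbb or ccc?
    if ($1 ~ "^aaa" || $1 ~ "^bbb" || $1 ~ "^ccc") {
      print $1, "->", substr($1, 1, 3)   # keep only the first 3 characters
    } else {
      print $1, "-> skipped"
    }
  }'
}

printf '%s\n' 'aaa#6' 'bbb#' 'ccc' 'prefsdb_sgdbf_bdfb_aaa' 'ca_a#6' | classify_tokens
```

The token prefsdb_sgdbf_bdfb_aaa is skipped because the ^ anchors the match to the start of the field, so "aaa" at the end does not count.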

You can also remove the code shown in red; it gives the same output:

awk '{x=$1" "$2; for (i=3;i<=NF;i++){ if ($i~"^aaa"||$i~"^bbb"||$i~"^ccc"){$i=substr($i,1,3);x=x" "$i}}; print x;x="" }' infile |awk 'NF>2 {print $0}'|sed 's/\//./3g
s/\///g'

regards,
Sanjay