awk to search similar strings and arrange in a specified pattern

prashu_g · February 18, 2012, 8:25am

Hi,

I'm running a DB query which returns names of people and writes them to a text file, as shown below:

Carey, Jim; Cena, John
Cena, John
Sen, Tim; Burt, Terrence
Lock, Jessey; Carey, Jim
Norris, Chuck; Lee, Bruce
Rock, Dwayne; Lee, Bruce

I want to use awk to get all the names (each name only once, with no duplicates) and write them to a separate text file in the format shown below:

Carey, Jim; Cena, John; Sen, Tim; Burt, Terrence; Lock, Jessey; Norris, Chuck; Lee, Bruce; Rock, Dwayne

Please advise how this can be done. Also, can this be done in Perl?

P.S. I'm working in ksh.

Thanks & Regards,
Prashu

jim_mcnamara · February 18, 2012, 8:37am

awk -F ';'  '{for(i=1; i<=NR; i++) {arr[$i]++}
                END{for(i in arr){printf "%s;", i)} {print ""} }' oldfile > newfile
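As posted, this command does not parse: the ")" after printf "%s;", i has no matching "(", which is what produces the awk syntax error reported later in the thread. The loop bound also uses NR (the number of records read so far) where NF (the number of fields on the current line) is meant. A minimal repair, wrapped in a shell function with a hypothetical name so it reads standard input, might look like this:

```shell
# Hypothetical wrapper around the repaired one-liner.
#  - loop to NF (fields on this line), not NR (lines read so far)
#  - no stray ")" after printf, so the program parses
unique_names_unordered() {
    awk -F ';' '
        { for (i = 1; i <= NF; i++) arr[$i]++ }
        END { for (i in arr) printf "%s;", i; print "" }'
}
```

Used as unique_names_unordered < oldfile > newfile. Two limitations remain: with -F ';' the blank after each semicolon stays attached, so " Cena, John" and "Cena, John" are still counted as different names, and for (i in arr) visits the names in no particular order.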

bartus11 · February 18, 2012, 8:37am

awk -F"; " '{for (i=1;i<=NF;i++) a[$i]=1}END{for (i in a) print i}' file

prashu_g · February 18, 2012, 8:49am

awk -F"; " '{for (i=1;i<=NF;i++) a[$i]=1}END{for (i in a) print i}' file
This is throwing an error:

awk: syntax error near line 1
awk: bailing out near line 1

---------- Post updated at 07:19 PM ---------- Previous update was at 07:15 PM ----------

Sorry, this was the code throwing the error:

awk -F ';'  '{for(i=1; i<=NR; i++) {arr[$i]++}
                END{for(i in arr){printf "%s;", i)} {print ""} }' oldfile > newfile

As for this one:

awk -F"; " '{for (i=1;i<=NF;i++) a[$i]=1}END{for (i in a) print i}' file

The code above does work, but the output is not in the format I am looking for. The output is:

Cena, John
Norris, Chuck
Carey, Jim
Sen, Tim
 Cena, John
 Lee, Bruce
Rock, Dwayne
 Burt, Terrence
 Carey, Jim
Lock, Jessey

but I want it like this:

Carey, Jim; Cena, John; Sen, Tim; Burt, Terrence; Lock, Jessey; Norris, Chuck; Lee, Bruce; Rock, Dwayne

complex.invoke · February 18, 2012, 9:24am

sed -e 's#; #\n#g' infile | sort -u | awk '{printf $0"; "}' | sed 's/; $//'

prashu_g · February 18, 2012, 9:30am

It does not return anything.

complex.invoke · February 18, 2012, 9:38am

[root@node2 ~]# cat data 
Carey, Jim; Cena, John
Cena, John
Sen, Tim; Burt, Terrence
Lock, Jessey; Carey, Jim
Norris, Chuck; Lee, Bruce
Rock, Dwayne; Lee, Bruce
[root@node2 ~]# sed -e 's#; #\n#g' data | sort -u | awk '{printf $0"; "}' | sed 's/; $//'
Burt, Terrence; Carey, Jim; Cena, John; Lee, Bruce; Lock, Jessey; Norris, Chuck; Rock, Dwayne; Sen, Tim
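The transcript above is on a system where sed turns \n in a replacement into a newline, which GNU sed does but POSIX does not guarantee; the portable form is a backslash followed by an actual newline. The awk stage also uses each name as the printf format string, which misbehaves if a name ever contains a "%". A sketch of the same sort-based pipeline written with those two points in mind (the function name is hypothetical):

```shell
# Hypothetical helper: distinct names, sorted, joined with "; ".
# The sed replacement is a backslash plus a real newline (portable); "; *"
# also splits a ";" that has no blank after it. Each name is then passed to
# printf as an argument rather than as the format string.
sorted_unique_names() {
    sed 's/; */\
/g' "$@" |
        sort -u |
        awk 'NR > 1 { printf "; " } { printf "%s", $0 } END { print "" }'
}
```

Used as sorted_unique_names oldfile > newfile. The exact sort order can depend on the locale; setting LC_ALL=C gives plain byte order.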

agama · February 18, 2012, 9:50am

prashu_g:

awk -F"; " '{for (i=1;i<=NF;i++) a[$i]=1}END{for (i in a) print i}' file

The code above does work, but the output is not in the format I am looking for. The output is:

Cena, John
Norris, Chuck
Carey, Jim
Sen, Tim
 Cena, John
 Lee, Bruce
Rock, Dwayne
 Burt, Terrence
 Carey, Jim
Lock, Jessey

but I want it like this:

Carey, Jim; Cena, John; Sen, Tim; Burt, Terrence; Lock, Jessey; Norris, Chuck; Lee, Bruce; Rock, Dwayne

A simple change to the awk command you already have working:

awk -F"; " '{for (i=1;i<=NF;i++) a[$i]=1}END{for (i in a) printf( "%s; ", i ); printf( "\n" ); }' file
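The output quoted in the thread contains entries such as " Cena, John" with a leading blank, which suggests the awk in use split on ";" alone rather than on the two-character separator "; " (an inference from the output, not something confirmed here). A sketch that splits on ";" and trims the blanks itself does not depend on that, keeps the names in first-seen order, and leaves no trailing "; ". It assumes a POSIX awk (it uses sub() and the ?: operator), for example nawk on Solaris; the function name is hypothetical:

```shell
# Hypothetical helper: distinct names in first-seen order, joined with "; ".
unique_names_in_order() {
    awk -F ';' '
        {
            for (i = 1; i <= NF; i++) {
                name = $i
                sub(/^[ \t]+/, "", name)      # strip leading blanks
                sub(/[ \t]+$/, "", name)      # strip trailing blanks
                if (name != "" && !(name in seen)) {
                    seen[name] = 1            # mark as already printed
                    order[++n] = name         # remember first-seen order
                }
            }
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s%s", (i > 1 ? "; " : ""), order[i]
            print ""
        }'
}
```

Used as unique_names_in_order < oldfile > newfile. On the sample data this prints the names in the same order as the format requested in the original question.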

prashu_g · February 18, 2012, 9:58am

That works, but some names still appear more than once, which I want to avoid.

Current output of the code:

Cena, John; Norris, Chuck; Carey, Jim; Sen, Tim;  Cena, John;  Lee, Bruce; Rock, Dwayne;  Burt, Terrence;  Carey, Jim; Lock, Jessey;

I want each name printed only once, i.e. no name should appear more than once, like this:

Burt, Terrence; Carey, Jim; Cena, John; Lee, Bruce; Lock, Jessey; Norris, Chuck; Rock, Dwayne; Sen, Tim

Sorry to bother you with this.

Thanks,
Prashu

agama · February 18, 2012, 10:52am

I just copied and pasted both the script and your list of names, and this is the output I get:

Burt, Terrence; Lee, Bruce; Cena, John; Sen, Tim; Lock, Jessey; Carey, Jim; Rock, Dwayne; Norris, Chuck;

That seems to be what you want.

What version of awk are you running (try the command awk --version; GNU awk understands that option, though some other awks may not), and are you on Solaris? If you are on Solaris, try nawk instead of awk and see if that makes a difference.

And no bother at all!
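Following the nawk suggestion, a script can choose its awk once and then call it through a variable. This is a sketch: the function name pick_awk and the fallback order are assumptions. On Solaris, /usr/xpg4/bin/awk is another POSIX-style awk besides nawk. It also assumes the shell supports command -v, as POSIX shells do; on an older ksh, whence nawk performs the same lookup.

```shell
# Hypothetical helper: print the name of an awk that understands the newer
# (nawk/POSIX) syntax, falling back to plain "awk" if nothing better is found.
pick_awk() {
    if command -v nawk >/dev/null 2>&1; then
        echo nawk                      # Solaris "new awk", or an alias elsewhere
    elif [ -x /usr/xpg4/bin/awk ]; then
        echo /usr/xpg4/bin/awk         # Solaris XPG4 (POSIX) awk
    else
        echo awk                       # usually fine on Linux and BSD
    fi
}
```

A script would then run AWK=$(pick_awk) once and use "$AWK" -F"; " '...' file wherever it previously called awk.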