awk to search similar strings and arrange in a specified pattern

Hi,

I'm running a DB query that returns names of people and writes them to a text file, as shown below:

Carey, Jim; Cena, John
Cena, John
Sen, Tim; Burt, Terrence
Lock, Jessey; Carey, Jim
Norris, Chuck; Lee, Bruce
Rock, Dwayne; Lee, Bruce

I want to use awk to collect all the names (without duplicates) and write them to a separate text file in the pattern shown below:

Carey, Jim; Cena, John; Sen, Tim; Burt, Terrence; Lock, Jessey; Norris, Chuck; Lee, Bruce; Rock, Dwayne

Please advise how this can be done. Also, can this be done in perl?

P.S. I'm working in ksh.

Thanks & Regards,
Prashu

awk -F ';'  '{for(i=1; i<=NR; i++) {arr[$i]++}
                END{for(i in arr){printf "%s;", i)} {print ""} }' oldfile > newfile
awk -F"; " '{for (i=1;i<=NF;i++) a[$i]=1}END{for (i in a) print i}' file

awk -F"; " '{for (i=1;i<=NF;i++) a[$i]=1}END{for (i in a) print i}' file
This is throwing an error:

awk: syntax error near line 1
awk: bailing out near line 1

---------- Post updated at 07:19 PM ---------- Previous update was at 07:15 PM ----------

Sorry. This was the code throwing the error:

awk -F ';'  '{for(i=1; i<=NR; i++) {arr[$i]++}
                END{for(i in arr){printf "%s;", i)} {print ""} }' oldfile > newfile

And

awk -F"; " '{for (i=1;i<=NF;i++) a[$i]=1}END{for (i in a) print i}' file

The above code does work, but the output is not in the format that I am looking for. The output comes out as:

Cena, John
Norris, Chuck
Carey, Jim
Sen, Tim
 Cena, John
 Lee, Bruce
Rock, Dwayne
 Burt, Terrence
 Carey, Jim
Lock, Jessey

but I want it like:

Carey, Jim; Cena, John; Sen, Tim; Burt, Terrence; Lock, Jessey; Norris, Chuck; Lee, Bruce; Rock, Dwayne
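
For what it's worth, the first command seems to trip over two things: the stray `)` after `i` in the printf, and the loop bound NR (records read so far) where NF (fields on the current line) was presumably meant. Below is a rough sketch of a repaired version, not the poster's exact code. It assumes an awk that accepts a regular expression as the field separator (nawk, gawk, or another POSIX awk), and the names `unique_names`, `seen`, `out`, `sep`, `names.txt` and `newfile.txt` are placeholders of mine. It keeps the names in the order they first appear, which is the order used in the wanted output.

```shell
# Sample input copied from the first post (the file name is a placeholder).
cat > names.txt <<'EOF'
Carey, Jim; Cena, John
Cena, John
Sen, Tim; Burt, Terrence
Lock, Jessey; Carey, Jim
Norris, Chuck; Lee, Bruce
Rock, Dwayne; Lee, Bruce
EOF

# Print every distinct name once, joined by "; ", in first-seen order.
# Assumption: this awk treats a multi-character -F value as a regex.
unique_names() {
    awk -F'; *' '
    {
        # NF is the number of fields on the current line (NR is the line count).
        for (i = 1; i <= NF; i++)
            if ($i != "" && !seen[$i]++) {   # true only the first time a name is met
                out = out sep $i
                sep = "; "                   # separator is empty before the first name
            }
    }
    END { print out }' "$1"
}

unique_names names.txt > newfile.txt
```

Building the line in `out` and printing it once in END avoids a trailing separator, so no clean-up step is needed afterwards.
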
sed -e 's#; #\n#g' infile | sort -u | awk '{printf $0"; "}' | sed 's/; $//'

That does not return anything.

[root@node2 ~]# cat data 
Carey, Jim; Cena, John
Cena, John
Sen, Tim; Burt, Terrence
Lock, Jessey; Carey, Jim
Norris, Chuck; Lee, Bruce
Rock, Dwayne; Lee, Bruce
[root@node2 ~]# sed -e 's#; #\n#g' data | sort -u | awk '{printf $0"; "}' | sed 's/; $//'
Burt, Terrence; Carey, Jim; Cena, John; Lee, Bruce; Lock, Jessey; Norris, Chuck; Rock, Dwayne; Sen, Tim
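
I can't tell from here why the same pipeline printed nothing on the other machine. Two things in it are not guaranteed everywhere, though: `\n` in the replacement part of a sed `s` command is an extension that not every sed supports, and the awk printf leaves the line without a trailing newline, which some tools may handle differently. A sketch that sidesteps both, assuming only the standard tr, sed, sort and paste utilities (the function name `sorted_names` and the file name `names.txt` are placeholders of mine):

```shell
# Sample input copied from the first post (the file name is a placeholder).
cat > names.txt <<'EOF'
Carey, Jim; Cena, John
Cena, John
Sen, Tim; Burt, Terrence
Lock, Jessey; Carey, Jim
Norris, Chuck; Lee, Bruce
Rock, Dwayne; Lee, Bruce
EOF

# Steps: split on ";" so each name gets its own line, trim blanks around
# each name, sort and drop duplicates, join the lines with ";" and finally
# put the space back after every ";".
sorted_names() {
    tr ';' '\n' < "$1" |
        sed -e 's/^ *//' -e 's/ *$//' |
        sort -u |
        paste -s -d ';' - |
        sed 's/;/; /g'
}

sorted_names names.txt
```

Because paste ends its output with a newline, the final sed always receives a complete line. The result is sorted alphabetically rather than kept in first-seen order.
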

Simple change to the awk you have working:

awk -F"; " '{for (i=1;i<=NF;i++) a[$i]=1}END{for (i in a) printf( "%s; ", i ); printf( "\n" ); }' file

That works fine, but some names appear more than once, which I want to avoid.

Current output of the code:

Cena, John; Norris, Chuck; Carey, Jim; Sen, Tim;  Cena, John;  Lee, Bruce; Rock, Dwayne;  Burt, Terrence;  Carey, Jim; Lock, Jessey;

I want each name to be printed only once, i.e. a name should not appear more than once. Like:

Burt, Terrence; Carey, Jim; Cena, John; Lee, Bruce; Lock, Jessey; Norris, Chuck; Rock, Dwayne; Sen, Tim

Sorry to bother you with this.

Thanks,
Prashu

I just cut and pasted both the script and your list of names and this is the output I am getting:

Burt, Terrence; Lee, Bruce; Cena, John; Sen, Tim; Lock, Jessey; Carey, Jim; Rock, Dwayne; Norris, Chuck;

Which seems to be what you want.

What version of awk are you running (try the command awk --version to see), and are you running on Solaris? If you are using Solaris then try nawk rather than awk and see if that makes a difference.

And no bother at all!
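
If nawk is not an option: the leading spaces in front of some names in your output suggest that your awk split on ";" alone rather than on "; ", so " Cena, John" and "Cena, John" were stored as two different keys. That is only my guess from the output. A rough sketch that does not depend on a two-character separator is to split on ";" and trim the blanks by hand with substr (the function name `unique_trimmed`, the variable `name` and the file name `names.txt` are placeholders of mine):

```shell
# Sample input copied from the first post (the file name is a placeholder).
cat > names.txt <<'EOF'
Carey, Jim; Cena, John
Cena, John
Sen, Tim; Burt, Terrence
Lock, Jessey; Carey, Jim
Norris, Chuck; Lee, Bruce
Rock, Dwayne; Lee, Bruce
EOF

# Split on a single ";" and strip blanks from both ends of each field
# before using it as an array key, so spacing cannot create duplicates.
unique_trimmed() {
    awk -F';' '
    {
        for (i = 1; i <= NF; i++) {
            name = $i
            while (substr(name, 1, 1) == " ")             # drop leading blanks
                name = substr(name, 2)
            while (substr(name, length(name), 1) == " ")  # drop trailing blanks
                name = substr(name, 1, length(name) - 1)
            if (name != "")
                a[name] = 1
        }
    }
    END {
        for (name in a) printf("%s; ", name)   # order of "in" is not defined
        printf("\n")
    }' "$1"
}

unique_trimmed names.txt
```

As with the earlier one-liner, the names come out in whatever order awk walks the array, and the line still ends with a trailing "; ".
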