awk script to split field data

forumthreads · July 21, 2009, 11:08am

Hi Experts,

I have an Input.txt document which contains data fields separated by tabs. There are 4 fields in total, named UNIQUE, ORDER, CONTACT and WINS. The UNIQUE field contains a unique ID, and in some records the CONTACT field contains values separated by commas. I am looking to write an awk script which will split the CONTACT field values by comma and produce an output file with the split data and the unique ID, as shown in the attachment. Any help is appreciated very much.

Many Thanks.

Input.txt

###########################################
UNIQUE	ORDER	CONTACT	WINS
G001	R01	SP,SD	solution
G002	R01A	B,DR,P	
G003	R02	AF	fix
G004	R03	PO,AC	build 
###########################################

Output.txt

###########################################
UNIQUE	CONTACT
G001	SP
G001	SD
G002	B
G002	DR
G002	P
G003	AF
G004	PO
G004	AC
###########################################

zaxxon · July 21, 2009, 11:22am

Next time, please use CODE tags to improve readability (the # button in the editor window when writing a post) instead of attaching a file, ty.

$> awk '/^###/ {print} {split($3,arr,","); for(e in arr){print $1"\t"arr[e]} }' infile
###########################################
UNIQUE  CONTACT
G001    SP
G001    SD
G002    B
G002    DR
G002    P
G003    AF
G004    PO
G004    AC
###########################################

forumthreads · July 22, 2009, 9:41am

Many thanks, your code works perfectly. I need some more help from you. When my CONTACT field values contain spaces between words, the code produces only the first word and ignores the rest of the field value. For example, please see the 1st record.

If my input file is in the below format

UNIQUE	ORDER	CONTACT		WINS
G001	R01	Swr Pq,SD	solution
G002	R01A	B,DR,P	
G003	R02	A F		fix
G004	R03	PO,A C		build

then I want the output to be

UNIQUE	CONTACT
G001	Swr Pq
G001	SD
G002	B
G002	DR
G002	P
G003	A F
G004	PO
G004	A C

However, I get the following output

UNIQUE	CONTACT
G001	Swr
G002	B
G002	DR
G002	P
G003	A
G004	PO

Any idea how I can correct it, that is, allow spaces between words?

Many Thanks again.

summer_cherry · July 22, 2009, 10:04pm

awk 'NR==1{print $1"  "$3;next;} {n=split($3,arr,",");for(i=1;i<=n;i++){print $1" "arr[i];}}' yourfile

danmero · July 22, 2009, 10:33pm

A spin on the previous solutions:

awk '{n=split($3,arr,",");for(i=0;i++< n;){printf "%s\t%s\n",$1,arr[i];}}/#/' file

Ygor · July 22, 2009, 11:23pm

I would think that the field-separator needs to be set to a tab.
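
A minimal sketch of why this matters, using one record from the sample above. By default awk splits fields on any run of blanks or tabs, so the space inside the CONTACT value ends the third field early. The variable names (line, default_fs, tab_fs) are only for illustration.

```shell
# One record from the thread's sample, with real tab characters between fields.
line=$(printf 'G001\tR01\tSwr Pq,SD\tsolution')

# Default FS: blanks and tabs both separate fields, so $3 stops at the space.
default_fs=$(printf '%s\n' "$line" | awk '{print $3}')

# FS set to tab: the space stays inside the third field.
tab_fs=$(printf '%s\n' "$line" | awk -F '\t' '{print $3}')

echo "default FS: $default_fs"
echo "tab FS:     $tab_fs"
```

With the default separator the third field is just Swr; with -F '\t' it is the whole Swr Pq,SD value, which split() can then break on commas.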

danmero · July 23, 2009, 12:24am

That's the output on my OS.

Ygor · July 23, 2009, 2:07am

$ awk '{n=split($3,arr,",");for(i=0;i++< n;){printf "%s\t%s\n",$1,arr[i];}}/#/' file1
UNIQUE  CONTACT
G001    Swr
G002    B
G002    DR
G002    P
G003    A
G004    PO
G004    A

With FS set to tab..

$ awk -F '\t' '{n=split($3,arr,",");for(i=0;i++< n;){printf "%s\t%s\n",$1,arr[i];}}/#/' file1
UNIQUE  CONTACT
G001    Swr Pq
G001    SD
G002    B
G002    DR
G002    P
G003    A F
G004    PO
G004    A C

forumthreads · July 23, 2009, 5:32am

Many thanks for helping me. I tried this code but it returned the following result

G001	Pq
G001	sD
G003	F
G004	c

This is not what I need. I need to achieve the following result

UNIQUE	CONTACT
G001	Swr Pq
G001	SD
G002	B
G002	DR
G002	P
G003	A F
G004	PO
G004	A C

Could you please help me achieve this?

Many Many thanks

Franklin52 · July 23, 2009, 6:02am

Have you tried the solution of Ygor?

danmero · July 23, 2009, 7:08am

Right, I checked only the original requirement :rolleyes:

Next time, try each solution provided by forum users.

forumthreads · July 23, 2009, 9:15am

ygor:

$ awk '{n=split($3,arr,",");for(i=0;i++< n;){printf "%s\t%s\n",$1,arr[i];}}/#/' file1
UNIQUE  CONTACT
G001    Swr
G002    B
G002    DR
G002    P
G003    A
G004    PO
G004    A

With FS set to tab..

$ awk -F '\t' '{n=split($3,arr,",");for(i=0;i++< n;){printf "%s\t%s\n",$1,arr[i];}}/#/' file1
UNIQUE  CONTACT
G001    Swr Pq
G001    SD
G002    B
G002    DR
G002    P
G003    A F
G004    PO
G004    A C

Ygor,
I tried this code and got the following message. I typed it exactly as you showed.
awk: syntax error near line 1
awk: bailing out near line 1
Any idea what I am doing wrong?

Many Thanks for all your responses.

Franklin,

Yes, I did try Ygor's solution (with tab) but I get a syntax error. Any idea what could be going wrong?

Thanks

Franklin52 · July 23, 2009, 9:27am

Use nawk or /usr/xpg4/bin/awk on Solaris.
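
For reference, a sketch that pulls together the pieces discussed in this thread: FS set to tab, split() on the comma, and a numeric loop so the values come out in input order. The function name split_contacts and the file name sample.txt are made up for the example, and on Solaris the awk call would need to be one of the commands named above.

```shell
# split_contacts is a hypothetical helper name, not something from the thread.
# On Solaris, replace awk with nawk or /usr/xpg4/bin/awk.
split_contacts() {
    awk -F '\t' '
    {
        n = split($3, arr, ",")        # n = number of comma-separated values
        for (i = 1; i <= n; i++)       # numeric loop keeps the values in input order
            printf "%s\t%s\n", $1, arr[i]
    }' "$@"
}

# Header plus two records from the sample input, written with real tabs
# to a scratch file (sample.txt is an assumed name).
printf 'UNIQUE\tORDER\tCONTACT\tWINS\nG001\tR01\tSwr Pq,SD\tsolution\nG004\tR03\tPO,A C\tbuild\n' > sample.txt
split_contacts sample.txt
```

The header line needs no special case here: its third field is CONTACT with no comma, so it prints as a single UNIQUE/CONTACT pair like any other record.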

Regards