Getting fields from a file having multiple delimiters

dev.devil.1983 · March 9, 2017, 3:20am

Hi All,

I have a file with a single row containing the following text:

ABC.ABC.ABC,Database,New123,DBNAME,F,ABC.ABC.ABC_APP,"@FUNCTION1("ENT1") ,@FUNCTION2("ENT2")",R,

I want an output in the following format

 
ABC.ABC.ABC DBNAME ABC.ABC.ABC_APP '@FUNCTION1("ENT1") ,@FUNCTION2("ENT2")' R

Note: the commas should be replaced with spaces, except for the ones between the double quotes. Also, the outer double quotes have to be replaced with single quotes.
Lastly, the length of the 'functions' (the statements between the double quotes) is not fixed and can vary.

Looking forward to your expert advice. Thanks in advance.

Regards,
Devender

Don_Cragun · March 9, 2017, 3:42am

Assuming that your input sample is representative:

awk -F, -v sq="'" '{match($0, /".*"/); print $1, $4, $6, sq substr($0, RSTART+1, RLENGTH-2) sq, $(NF-1)}' file

seems to do what you want.

If you are running this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk .
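
As a quick check, the one-liner can be fed the sample line from the first post on standard input instead of from a file. This is only a sketch: the shell variable names line and out are made up for this illustration, and it assumes the sample line is representative of the real data.

```shell
# Sample line copied from the original question (single-quoted so the shell
# leaves the embedded double quotes alone).
line='ABC.ABC.ABC,Database,New123,DBNAME,F,ABC.ABC.ABC_APP,"@FUNCTION1("ENT1") ,@FUNCTION2("ENT2")",R,'

# Run the same awk command, reading the line from a pipe rather than a file.
out=$(printf '%s\n' "$line" |
	awk -F, -v sq="'" '{match($0, /".*"/); print $1, $4, $6, sq substr($0, RSTART+1, RLENGTH-2) sq, $(NF-1)}')

# Show the result so it can be compared with the requested output format.
printf '%s\n' "$out"
```

Note that the line ends with a comma, so the last field is empty; that is why the R is picked up with $(NF-1) rather than $NF.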

dev.devil.1983 · March 9, 2017, 4:06am

Thanks Don, seems to work.

Could you please explain how it works as well?

Regards,
Dev

Don_Cragun · March 9, 2017, 2:47pm

Hi dev.devil.1983,
Here is a copy of the awk script with comments...

# Invoke the awk utility with the input field separator set to a comma (-F,)
# and a variable named sq that is set to a string containing a single-quote
# character (-v sq="'").
awk -F, -v sq="'" '
{	# For each line read from the input file...
	# Search for the longest string in the current input line that starts
	# and ends with a double-quote character.  If a match is found, set
	# RSTART to the index of the 1st double-quote character and set RLENGTH
	# to the number of characters matched:
	match($0, /".*"/)
	# Print the:
	#	1st field from the input line ($1),
	#	print the output field separator (,), (Note that the default
	#	   OFS is a <space> character and the default is not overridden
	#	   in this script.)
	#	4th field from the input line ($4),
	#	print the output field separator (,),
	#	6th field from the input line ($6),
	#	print the output field separator (,),
	#	print a single-quote character (sq),
	#	print the substring of the input line starting after the 1st
	#	   double-quote character up to, but not including, the last
	#	   double-quote character (substr(...)),
	#	print a single-quote character (sq),
	#	print the output field separator (,),
	#	print the next to the last field from the input line ($(NF-1)),
	#	and print the output record separator.  (Note that the default
	#	  ORS is a <newline> character and the default is not overridden
	#	  in this script.)
	print $1, $4, $6, sq substr($0, RSTART + 1, RLENGTH - 2) sq, $(NF - 1)
}' file	# Terminate the awk script and name the input file(s) to be processed.
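
To see what match() leaves in RSTART and RLENGTH, a small sketch like the following can be run against the sample line. The shell variable names line and info and the "no quoted string found" message are made up for this illustration; they are not part of the script above.

```shell
# Sample line copied from the original question.
line='ABC.ABC.ABC,Database,New123,DBNAME,F,ABC.ABC.ABC_APP,"@FUNCTION1("ENT1") ,@FUNCTION2("ENT2")",R,'

# Print the start index and length of the match, followed by the text between
# the first and last double-quote characters on the line.
info=$(printf '%s\n' "$line" | awk '{
	if (match($0, /".*"/))
		print RSTART, RLENGTH, substr($0, RSTART + 1, RLENGTH - 2)
	else
		print "no quoted string found"
}')
printf '%s\n' "$info"
```

Because .* matches as much as it can, the match runs from the first double quote on the line to the last one, so the inner quotes around ENT1 and ENT2 stay inside the extracted text. When there is no match, match() returns 0 and sets RSTART to 0 and RLENGTH to -1, which is why the sketch tests the return value before using them.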

Does this help?