Shuffle Columns in Pipe delimited data

My sample data contains escape characters followed by the delimiter. I'm stuck writing an awk command to swap the columns. Please help me out.


Sample Data: 12345678|ABN\|XYZ MED CHEM PTY. LTD.|C||100.00|22|AB"C\|Corp|"XYZ|CDEF"|

Expected Output Data: 12345678|C|ABN\|XYZ MED CHEM PTY. LTD.|22|AB"C\|Corp||100.00|"XYZ|CDEF"|

Thanks,
Brahma

This is a very straightforward task in awk. Please post the code you are stuck on.

In my pre-processing stage, when the data contains the delimiter I'm replacing it with \|, and when I try to swap the columns I'm not getting the proper results.

Below is the code I'm using to swap the columns.

echo '12345678|ABN\|XYZ MED CHEM PTY. LTD.|C||100.00|22|AB"C\|Corp|"XYZ|CDEF"|'| awk 'BEGIN{FS=OFS="|"}{print $1, $3, $2, $3, $6, $7, $4,$5,$8,$9}'

Output I'm getting: 12345678|XYZ MED CHEM PTY. LTD.|ABN\|XYZ MED CHEM PTY. LTD.|100.00|22|C||AB"C\|Corp

Expected Output Data: 12345678|C|ABN\|XYZ MED CHEM PTY. LTD.|22|AB"C\|Corp||100.00|"XYZ|CDEF"|

Please provide your suggestions.

One way would be to first replace all the \| with some character that will not appear in the text, then do the awk, and then replace those characters back to \|, e.g.

sed 's/\\|/\x01/g' file | awk ... | sed 's/\x01/\\|/g'
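
For example, with the sample line from the first post, the whole pipeline might look like this. This is only a sketch: it assumes GNU sed (for the \x01 escape), uses the CTRL-A byte as a placeholder that is assumed never to occur in the data, and guesses the field order from the expected output shown above:

# NOTE: \x01 (CTRL-A) is an assumed placeholder; pick any byte that never appears in your data
echo '12345678|ABN\|XYZ MED CHEM PTY. LTD.|C||100.00|22|AB"C\|Corp|"XYZ|CDEF"|' |
  sed 's/\\|/\x01/g' |
  awk 'BEGIN{FS=OFS="|"}{print $1, $3, $2, $6, $7, $4, $5, $8, $9, $10}' |
  sed 's/\x01/\\|/g'
12345678|C|ABN\|XYZ MED CHEM PTY. LTD.|22|AB"C\|Corp||100.00|"XYZ|CDEF"|

With the escaped pipes hidden behind the placeholder, the field numbers line up with the logical columns, so the awk print list stays simple.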

Just using awk, which ignores the escape char:

echo '12345678|ABN\|XYZ MED CHEM PTY. LTD.|C||100.00|22|AB"C\|Corp|"XYZ|CDEF"|'| awk 'BEGIN{FS=OFS="|"}{print $1, $4, $2, $3, $7, $8, $9, $5, $6, $10, $11, $12}'
12345678|C|ABN\|XYZ MED CHEM PTY. LTD.|22|AB"C\|Corp||100.00|"XYZ|CDEF"|

This won't solve my problem. Somehow awk should not consider \| as a delimiter while swapping the columns.

Why doesn't Scrutinizer's suggestion solve your problem?

BrahmaNaiduA,
The awk solution I provided ignores the escape characters, and it produces output that matches your expected output. If you want the escape characters to make awk ignore the escaped delimiters, then I think you need to refine your expected output.

Hi,
An example of an awk program that transforms fields:

$ echo '12345678|ABN\|XYZ MED CHEM PTY. LTD.|C||100.00|22|AB"C\|Corp|"XYZ|CDEF"|' | awk -F\| '{A=1;for(i=1;i<NF;i++){ if ($i ~ /\\$/) {T[A++]=$i"|"$(++i)} else {T[A++]=$i}};for(i=1;i<A;i++)$i=T[i];NF=A;print}' OFS="\n"
12345678
ABN\|XYZ MED CHEM PTY. LTD.
C

100.00
22
AB"C\|Corp
"XYZ
CDEF"
"XYZ

Here, each line represents one field.
Just change OFS and your print.
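
For instance, to print the fields in the order asked for in the first post, a sketch along the same lines (it assumes that a field ending in a backslash should always be glued to the field that follows it, and it takes the column order from the expected output shown earlier):

echo '12345678|ABN\|XYZ MED CHEM PTY. LTD.|C||100.00|22|AB"C\|Corp|"XYZ|CDEF"|' |
  awk 'BEGIN{FS=OFS="|"}
  {
    split("", T)                      # clear the array for every record
    n = 0
    for (i = 1; i <= NF; i++) {
      if ($i ~ /\\$/ && i < NF) {     # field ends in a backslash: glue it to the next field
        T[++n] = $i OFS $(i+1)
        i++
      } else {
        T[++n] = $i
      }
    }
    # T[1..n] now holds the logical fields; print them in the desired order
    print T[1], T[3], T[2], T[6], T[7], T[4], T[5], T[8], T[9], T[10]
  }'
12345678|C|ABN\|XYZ MED CHEM PTY. LTD.|22|AB"C\|Corp||100.00|"XYZ|CDEF"|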

Regards.

Don,

I'll have many fields of this type in my file; that's why I'm requesting suggestions to do it in an efficient way.

Thanks,
Brahma

Did you try:

sed 's/\\|/\x01/g' file | awk ... | sed 's/\x01/\\|/g'

with your awk script in the middle of the pipeline as Scrutinizer suggested? How long did it take? Does it run so slowly that you can't complete the processing before you need to run it again?

It might be able to be done more efficiently in a more complex awk script. It might be able to be done more efficiently in a C program. But unless you can explain to us why this simple solution won't work (other than that you don't like it), there isn't much incentive for us to try to guess at something that might run faster. And since we don't know anything about the sizes of your files or the hardware you're using (disk types and how they're connected, operating system, CPU, memory, etc.), there is no way we can guess at a more efficient way to do what this simple pipeline does.

There are execution costs, and there are maintenance costs. The maintenance costs for simple code are much lower.
