Proper column-wise matching

My code below works fine if none of the columns contains a pipe in its content. If any value does contain a pipe, the rest of that value moves into the next column.

I want my code to work even if a column contains a pipe in addition to the delimiter pipes.

NOTE: Any pipe that is part of the content, rather than a delimiter, is escaped with a backslash (\|).

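[CODE]
#set -x
awk '
# Header line: remember the column count in cc and keep the header text in t.
NR==1 {for (cc=1; cc<=NF; cc++) n[$cc]=$cc; t=$0; next}
{
   # Count filled values per column; column 1 counts as filled unless it is 0.
   # (Double quotes here: a single quote would end the shell-quoted program.)
   if ($1 != "0") c[1]++
   for (i=2; i<=NF; i++) if ($i != "NA" && $i != "null" && $i != "") c[i]++
}
END {
   print t
   --NR                     # exclude the header from the row count
   r=""
   for (i=1; i<cc; i++) {
      p=(c[i]/NR)*100       # percentage of filled values in column i
      r=(i == 1) ? "" p : r OFS p
   }
   print r
}
' FS="|" OFS="|" "$1"
[/CODE]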

Not sure I understand what you are up to. How about a decent input sample, the desired result, and the logic connecting them?

To ignore escaped delimiters, replace them by a token upfront, work on the modified file, and then reverse the replacement.
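
The hint above could be sketched roughly as follows. This is only an illustration, not a tested solution for the real data: it assumes the byte \001 never occurs in the file (so it is safe as a token), the function name unescaped_fields is made up for this example, and the input line is adapted from the CHICK-FIL-A row of the sample.

```shell
#!/bin/sh
# Sketch of "replace, work, reverse" for escaped pipes.
# Assumption: the byte \001 never appears in the data.
# unescaped_fields is a hypothetical helper name.
unescaped_fields() {
    awk -F'|' '
    BEGIN { tok = "\001" }
    {
        # 1. Hide every escaped pipe; changing $0 makes awk re-split the
        #    record, now only on the real delimiters.
        gsub(/\\\|/, tok)
        for (i = 1; i <= NF; i++) {
            f = $i
            # 3. Reverse the replacement before showing the value.
            gsub(tok, "\\|", f)
            # 2. Work on the field; here it is just printed with its number.
            printf "%d:%s\n", i, f
        }
    }'
}

# Adapted sample line: the escaped pipe stays inside field 2.
printf '%s\n' '100441566|CHICK-\|FIL-A|Chick|TP' | unescaped_fields
```

Because the masking happens on each record in memory, no modified copy of the file has to be written to disk.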

[sdp@blr-qe101 .nikhil]$ sh filler.sh c10.txt 
unique_bank_transaction_id|merchant name_GT|MERCHANT_NAME_TDE|output
100|100|100|100
[sdp@blr-qe101 .nikhil]$ sh filler.sh 10.txt 
unique_bank_transaction_id|merchant name_GT|MERCHANT_NAME_TDE|output
100|100|100|100
[sdp@blr-qe101 .nikhil]$ cat 10.txt
unique_bank_transaction_id|merchant name_GT|MERCHANT_NAME_TDE|output
076679010|WALMART|Walmart|TP
2242937867|PUBLIX SUPER MARKETS INC|Publix Super Markets|TP
100441566|CHICK-FIL-A|Chick|jacke|TP
1000549208|BURLINGTON - BURLINGTON COAT FACTORY|Burlington Coat Factory|TP
1000146040284|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
1000146428873|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
1000539406|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
10005847326|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
100056070|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
[sdp@blr-qe101 .nikhil]$ cat c10.txt  
unique_bank_transaction_id|merchant name_GT|MERCHANT_NAME_TDE|output
076679010|WALMART|Walmart|TP
2242937867|PUBLIX SUPER MARKETS INC|Publix Super Markets|TP
100441566|CHICK-\|FIL-A|Chick||TP
1000549208|BURLINGTON - BURLINGTON COAT FACTORY|Burlington Coat Factory|TP
1000146040284|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
1000146428873|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
1000539406|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
10005847326|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
100056070|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP

---------- Post updated 08-12-16 at 01:57 PM ---------- Previous update was 08-11-16 at 06:48 PM ----------

Can anyone please help? In the content above, if you look at the CHICK-FIL-A row, you will see that it contains an extra pipe.

My question is this: if there is an extra pipe preceded by a backslash (\|), it should be ignored and not treated as the start of the next column.

Did you try the hint given?

Rudi,

It is a huge file of about 8 GB, and we are short on disk space, so I can't try it.

Uhm, what stops you from trying it with a short example like the one you have already given? This is about the principle behind the problem, not about processing an 8 GB file...

Zaxxon,

I'll try implementing it if you give the solution for a small file as well.
Please help.
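
For a small file, one possible way to combine the token hint with the counting logic of the original script is sketched below. It is a sketch under assumptions, not a verified fix for the 8 GB file: the byte \001 is assumed never to occur in the data, fill_rate is a made-up function name, and the sample rows are invented for illustration. The file is read once and each record is masked in memory, so no second copy is written to disk.

```shell
#!/bin/sh
# Per-column fill percentage, treating \| as content instead of a delimiter.
# Assumption: the byte \001 never appears in the data.
# fill_rate is a hypothetical name; it reads the named files or stdin.
fill_rate() {
    awk -F'|' -v OFS='|' '
    BEGIN { tok = "\001" }
    NR == 1 { ncol = NF; print; next }   # header: keep column count, echo it
    {
        gsub(/\\\|/, tok)                # mask escaped pipes, record is re-split
        if ($1 != "0") c[1]++            # column 1 is filled unless it is 0
        for (i = 2; i <= NF; i++)
            if ($i != "NA" && $i != "null" && $i != "") c[i]++
    }
    END {
        rows = NR - 1                    # data rows, header excluded
        if (rows < 1) exit
        r = ""
        for (i = 1; i <= ncol; i++) {
            p = (c[i] / rows) * 100
            r = (i == 1) ? "" p : r OFS p
        }
        print r
    }' "$@"
}

# Invented sample: the first data row has an escaped pipe inside column 2.
printf '%s\n' 'id|name|out' '1|A\|B|TP' '2|NA|TP' '0|C|null' '3||TP' | fill_rate
```

On a real file the call would be something like fill_rate c10.txt, or the function body could replace the awk command in filler.sh.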