Replacing character "|" in given character range

Hi

I have a file:

1|2443094                  |FUNG SIU TO |CLEMENT
2|2443095                  |FUNG KIL FO |REMENT

This file contains only 3 fields delimited by "|". The last field is a description field, and it can itself contain a "|" character. Because of this, my output breaks into 4 fields. I need to replace the last "|" on the line, the one inside the description field, so that "FUNG SIU TO |CLEMENT" becomes "FUNG SIU TO _CLEMENT".

Can someone guide me on how to do this using awk or sed?
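A quick way to see which records are affected is to count the "|"-separated fields on each line. The sketch below (count_bad_lines is just a hypothetical helper name) prints every line that does not have exactly the expected 3 fields:

```shell
# Hypothetical helper: report every record whose "|"-separated field count
# differs from the expected 3, as "line-number: N fields".
count_bad_lines() {
  awk -F'|' 'NF != 3 { print NR ": " NF " fields" }' "$@"
}
```

For example, count_bad_lines file on the two sample lines above would report both records as having 4 fields.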


The length of the last field is also known and fixed: it starts at character 29 and is 20 characters long.
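Since the description's position is fixed, one option is to cut each line by character position with awk's substr() and translate "|" only inside that slice. This is a minimal sketch that assumes every line follows exactly the stated layout (description at column 29, 20 characters wide). The name fix_desc_fixed is hypothetical:

```shell
# Hypothetical helper. ASSUMES every line has the description at columns
# 29-48 (start 29, length 20, as stated in the post); only that slice is edited.
fix_desc_fixed() {
  awk '{
    head = substr($0, 1, 28)     # fields 1-2 and the separator before field 3
    desc = substr($0, 29, 20)    # the fixed-width description field
    tail = substr($0, 49)        # anything after it (normally empty)
    gsub(/[|]/, "_", desc)       # turn every "|" inside the description into "_"
    print head desc tail
  }' "$@"
}
```

Run it as fix_desc_fixed file. If any line deviates from that layout, the wrong characters get cut, so the field-count based approaches below are safer when the layout is not guaranteed.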

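awk 'BEGIN { FS = OFS = "|" }
NF > 3 {
  # glue fields 4..NF back onto field 3, joined with "_" instead of "|"
  for (i = 4; i <= NF; i++) $3 = $3 "_" $i
  NF = 3   # drop the merged fields; gawk rebuilds $0 here, some older awks may not
}
1' file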


perl solution

perl -lne '@A = split(/\|/, $_, 3);
  $A[2] =~ s/\|/_/g;
  print join("|", @A)' file


sed (the 3g flag, which replaces the 3rd and every later match, is a GNU sed extension)

sed 's/|/_/3g' file
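Because combining a number with g is not portable across sed implementations, here is a sketch of the same "keep the first two separators, replace the rest" logic in plain awk, using only index(), substr() and gsub(). The function name is hypothetical:

```shell
# Hypothetical helper: keep the first two "|" (the real separators) and turn
# every later "|" on the line into "_", like GNU sed 's/|/_/3g'.
fix_after_second_pipe() {
  awk '{
    rest = $0; head = ""
    for (n = 0; n < 2; n++) {       # peel off the first two separators
      p = index(rest, "|")
      if (p == 0) break             # fewer than two "|": leave the line as-is
      head = head substr(rest, 1, p)
      rest = substr(rest, p + 1)
    }
    if (n == 2) gsub(/[|]/, "_", rest)  # only the third field is touched
    print head rest
  }' "$@"
}
```

Usage: fix_after_second_pipe file.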

Hi Srini

Awesome, but my actual case is a little different.

The actual file may look like this:
1|24xx|x96 |wewewewewe|Aps (ueasTng) Ltd(00101|2500000)|001012561|558 |NYL|GB |G179300844|1012561038 |Orriva P|LC|GB |O718483442|Y

This is one record. Here field 2 (24xx|x96 ), field 4 (Aps (ueasTng) Ltd(00101|2500000)) and field 11 (Orriva P|LC) contain an embedded '|'. (The fields containing the '|' were highlighted in red in my original post.) Each field has a fixed length.


Do you think this matches your initial requirement?

Why couldn't you post this actual case earlier?


This is not a little different. There is a HUGE difference between changing all "|" characters after the first 3 on a line to "_" characters and changing an unknown number of "|" characters in the middle of a line to some other unspecified character(s).

What are the exact field widths for this new file format (or what is the format of the file that specifies the file format for the file(s) you want to process)? Are embedded "|" characters all supposed to be changed to "_", or is a different character used in some fields? Do all fields need to be checked? If not, how will your script know which fields should be checked?

What have you tried to solve this problem?

Hi

Here is the field description:

1|24xx|x96 |wewe|Aps (ueasTng) Ltd(00101|2500000)|001012561|558 |NYL|GB |G179300844|1012561038 |Orriva P|LC|GB |O718483442|Y

Field   Length   Value
1       1        1
2       4        24xx
3       3        x96
4       4        wewe
5       33       Aps (ueasTng) Ltd(00101|2500000)
6       9        001012561
7       3        558
8       3        NYL
9       10       G179300844
10      10       1012561038
11      14       Orriva P|LC|GB
12      15       O718483442
13      1        Y

The field separator is '|'. I need to replace every '|' inside fields 5 and 11.

I can do this using the substr() function, like this:

str1 = substr()   # fields 1 to 4
str2 = substr()   # field 5 (then gsub() to replace | with _)
str3 = substr()   # fields 6 to 10
str4 = substr()   # field 11 (then gsub() to replace | with _; sub() would only change the first one)
str5 = substr()   # fields 12 to 13

Finally, I join all of these together. However, I am looking for a better approach.
Please let me know if this is clear now.
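One way to generalise the substr() idea is to drive it from a list of field widths instead of hand-written offsets. This is only a sketch under assumptions the thread never confirms: each field is exactly its listed width, fields are joined by exactly one "|", and the widths and field numbers passed in are correct. The function fix_fixed_width and its two arguments are hypothetical. It warns on stderr whenever the expected separator position does not hold a "|":

```shell
# Hypothetical helper: cut each record into fixed-width fields and turn "|"
# into "_" only inside the selected fields.
# ASSUMES every field is exactly its listed width and fields are joined by
# one "|" (none after the last field). Adjust the widths to the real layout.
#   $1 - space-separated field widths, in order
#   $2 - space-separated numbers of the fields to clean
fix_fixed_width() {
  widths=$1 fields=$2
  shift 2
  awk -v widths="$widths" -v fix="$fields" '
  BEGIN {
    nw = split(widths, w, " ")
    nfix = split(fix, tmp, " ")
    for (i = 1; i <= nfix; i++) clean[tmp[i]] = 1
  }
  {
    pos = 1; out = ""
    for (i = 1; i <= nw; i++) {
      f = substr($0, pos, w[i])
      if (i in clean) gsub(/[|]/, "_", f)
      # sanity check: a real separator should follow every field but the last
      if (i < nw && substr($0, pos + w[i], 1) != "|")
        printf "record %d: no \"|\" after field %d\n", NR, i > "/dev/stderr"
      sep = (i > 1) ? "|" : ""
      out = out sep f
      pos += w[i] + 1    # skip the field plus its one-character separator
    }
    print out
  }' "$@"
}
```

With the widths from the table above, the call would look like fix_fixed_width "1 4 3 4 33 9 3 3 10 10 14 15 1" "5 11" < file. Note that the table and the sample record do not agree, so the widths must be verified before trusting the output.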

Sorry, that description doesn't help either. You say the third field is 3 chars long, but your line holds 4 chars: "x96 ". The same goes for field 7 ("558 "). And an entire field is missing between fields 8 and 9: the GB in your line is not reflected in the table. So which one should we rely on?
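To settle which description to trust, it can help to print where each "|" actually sits in the data and compare that with the claimed widths. A small diagnostic sketch (pipe_columns is a hypothetical name):

```shell
# Hypothetical helper: print the record number followed by the column of
# every "|" on that line, e.g. "1: 2 7 11".
pipe_columns() {
  awk '{
    cols = ""
    for (i = 1; i <= length($0); i++)
      if (substr($0, i, 1) == "|") cols = cols " " i
    print NR ":" cols
  }' "$@"
}
```

On a line such as a|bc|d it prints 1: 2 5.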
BTW - can't you address the problem at the root and persuade the generating application to use different field separators?

In addition to what RudiC said in the previous post, it is also interesting to note that the "|" between fields 2 and 3 was described in post #3 in this thread as an embedded character (that needed to be changed) in the 2nd field rather than a separator between fields.

The problem described in the original posting has been solved long ago.

The new problem has been inconsistently and incompletely described.

This thread is closed.