Large file masking happening incorrectly: Ç delimiter issue

The OS version is Red Hat Enterprise Linux Server release 6.10.
I have a script that masks some columns with **** in a data file that is delimited with �.
I am using awk for the masking. When I mask a small file, awk works fine and masks the required column,
but when the file is large, the masked file gets �<96><92> appended and the masking does not happen properly.

The data in the files looks like this:

"D"�"20181224"�183593739656��"C"��865�"Test TEST"���������"1262548446"���"CLIENT"�"Y"�������"009171562000"��XXX���4�"Status Not Known"�2738000.000000000000000��"SSS"����2843382.526000000000000�����0.050000000�������"912810QU51"����"SS"�"SSSSSS"���XXX���"99991231"������XXX����������������"531648568"�19��"31648568"��"PARTY"�"1648568"�"4"�"COMB"�"D2792331"�"D2812619"

The script is as follows:

columnArray=(30 61)
for i in "${columnArray[@]}"
do
    echo "replacing values for column number $i"
    position=$i
    # replaceval, file_count, filename, filename_pre and filename_ext are set elsewhere in the full script
    awk -v col="$position" -v var="$replaceval" -v last="$file_count" -F "�" 'BEGIN {OFS = FS} NR==1; NR > 1 && NR < last {$col = var; print}; END{print}' "$filename" > "${filename_pre}_masked.${filename_ext}"
    chmod 775 "${filename_pre}_masked.${filename_ext}"
    mv "${filename_pre}_masked.${filename_ext}" "$filename"
done

I have attached the complete sh file for reference.
Any quick help is appreciated.

Sure? The variables "replaceval", "file_count", "filename_pre", "filename_ext", and "filename" seem to be undefined.

For your problem:

  • any error messages?

  • for how many lines does the script work correctly? When do the errors start? Any structural difference between the last line working and the first one failing?

  • that "�<96><92>" is appended where?

  • "does not happen properly" means what? Partial replacement? No replacement?
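
One way to answer the "appended where?" question is to look at the raw bytes at the end of a suspect line. The following is only a sketch, assuming bash plus the usual sed, tail, od and awk; the helper name tail_bytes and its arguments are made up for illustration and are not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last COUNT bytes (default 8, at most 16)
# of line LINENO of FILE as space-separated hex values.
tail_bytes() {
    local file=$1 lineno=$2 count=${3:-8}
    # LC_ALL=C keeps every tool working on raw bytes, whatever the encoding.
    LC_ALL=C sed -n "${lineno}p" "$file" \
        | tail -c "$count" \
        | od -An -tx1 \
        | LC_ALL=C awk '{ $1 = $1; print }'   # collapse od's padding
}
```

For example, `tail_bytes "$filename" 2 12` would show the last twelve bytes of the second line, so the original and the masked file can be compared side by side.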

Please find my answers below

Sure? The variables "replaceval", "file_count", "filename_pre", "filename_ext", and "filename" seem to be undefined.

Yes, they are defined and working.

For your problem:

  • any error messages?
    No.

  • for how many lines does the script work correctly?
    It is a delimiter issue; it causes the problem even for a few lines.
    When do the errors start? There are no error messages.
    Any structural difference between the last line working and the first one failing? No.

  • that "�<96><92>" is appended where?
    At the end of each line.

  • "does not happen properly" means what? Partial replacement? No replacement?
    No replacement.

However, when I first ran the command below on my data file, my script worked fine:

iconv -f ISO-8859-1 -t UTF-8 testact.data_orig.data >testact.data 

Can you explain why it works after running this command on my file?

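In ISO 8859-1, the delimiter is the single byte E7, not a multibyte sequence as it is in UTF-8. If the separator in the script and the separator in the data are in different encodings, they never match. Garbage in, garbage out.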
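
To illustrate the mismatch, here is a minimal sketch, assuming bash and any POSIX awk; the sample record and the variable names are invented for the demonstration. It splits one ISO-8859-1 record first with the two-byte UTF-8 form of the separator (C3 A7), then with the single byte E7.

```shell
#!/usr/bin/env bash
# Sample record in ISO-8859-1: three fields separated by the single byte E7.
line=$(printf 'a\347b\347c')

# The separator as a UTF-8 encoded script would contain it: bytes C3 A7.
utf8_sep=$(printf '\303\247')

# LC_ALL=C makes awk compare raw bytes, independent of the current locale.
# The two-byte separator never occurs in the data, so the line stays one field.
nf_utf8=$(printf '%s\n' "$line" | LC_ALL=C awk -F "$utf8_sep" '{ print NF }')

# The one-byte separator splits the record as intended.
nf_latin1=$(printf '%s\n' "$line" | LC_ALL=C awk -F $'\xE7' '{ print NF }')

echo "fields with UTF-8 separator:   $nf_utf8"
echo "fields with byte E7 separator: $nf_latin1"
```

When awk sees only one field, assigning to a higher-numbered column cannot replace anything; it can only add new fields at the end of the line, which may be what shows up as the appended bytes.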

Is there any way to resolve this issue without running the following command?

iconv -f ISO-8859-1 -t UTF-8 testact.data_orig.data >testact.data

Using the ISO-8859-1 field separator of -F$'\xE7' perhaps?

That is bash syntax; other shells might need to do VAR=$(printf "\xe7") or the like (the octal form, printf '\347', is the portable spelling) to get that byte into a string so you can do -F"$VAR"

Or convince whatever's generating these files to use your preferred character set.
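
Putting the byte-separator suggestion together, something like the following could work. It is a sketch under assumptions, not the poster's script: the function name mask_column is hypothetical, the file is assumed to be ISO-8859-1 and to end with a newline, and the first and last lines are left unmasked as in the original loop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mask column COL with MASK in FILE, whose fields are
# separated by the single byte E7. Result goes to standard output.
mask_column() {
    local col=$1 mask=$2 file=$3
    local sep=$'\xE7'        # bash-only; in plain sh use sep=$(printf '\347')
    local last
    last=$(wc -l < "$file")  # line count, assuming a trailing newline
    # LC_ALL=C makes awk treat input and separator as raw bytes.
    LC_ALL=C awk -v col="$col" -v mask="$mask" -v last="$last" -F "$sep" '
        BEGIN { OFS = FS }
        NR > 1 && NR < last { $col = mask }   # mask data lines only
        { print }                             # first and last lines pass through
    ' "$file"
}
```

It would be called as `mask_column 30 '****' "$filename" > "$filename.tmp"`, followed by a mv as in the original loop. No iconv step is needed because the data is never reinterpreted as UTF-8.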