Hi,
I have pipe separated file which contains some data having comma(,) in it. I want to remove the comma(,) only from particular column without changing data in other columns.
Below is the sample data file, I want to remove the comma(,) only from 5th column.
$ cat file1
ABC | DEF, HIJ| LMN| opq,| 12,345|XYZ
PQR,123 | LMN| RWX|pqr| 6,12|LMN
13,67| PQR,123 | LMN| RWX|7,89|XY
LMN,SDF|ABC,S|OPQ|SLM,SD|9,00|P
123,23|45,789|12,567.04|78,765|6,90|9,00,875
I tried using sed as below but, I am not able to figure out why the used command is removing comma(,) from only 5th column for the 1st 4 lines and not removing any comma(,) from last line.
$ sed 's/\(.*\),\(.*\)/\1\2/g' file1
ABC | DEF, HIJ| LMN| opq,| 12345|XYZ
PQR,123 | LMN| RWX|pqr| 612|LMN
13,67| PQR,123 | LMN| RWX|789|XY
LMN,SDF|ABC,S|OPQ|SLM,SD|900|P
123,23|45,789|12,567.04|78,765|6,90|9,00875
$
I also tried using awk as below but, it is not removing comma(,). To be frank, I did not understand why I have to use "\\1" in awk, because in sed as used above I can use \1 for representing groups. It would be helpful if somebody can explain the difference in using regex in sed and awk for grouping data.
$ awk -F"|" '{print gensub(/\(.*\),\(.*\)/,"\\1""\\2",g,$5);}' file1
12,345
6,12
7,89
9,00
6,90
I am thinking whether we could use sed inside awk for particular column, but not sure how to do it.
Expected output is as follows:
ABC | DEF, HIJ| LMN| opq,| 12345|XYZ
PQR,123 | LMN| RWX|pqr| 612|LMN
13,67| PQR,123 | LMN| RWX|789|XY
LMN,SDF|ABC,S|OPQ|SLM,SD|900|P
123,23|45,789|12,567.04|78,765|690|9,00,875
Thanks in advance for any help on this.