Replacing Comma delimiter coming inside the data.

Hello,

     I am having flat file \(Comma Delimiter\) and the data in the file is as given below.

EMPNO, ENAME, DESIGNATION, SALARY
10979, Arun Kumar, Cosultant, 35000
13555, Bidhu Shekar, Senior Consultant, 45000
15000, Kiran, Kumar, Senior, Consultant, 40000

           If you notice the 3rd record, there are Comma's presents in the column \(Column 2: Kiran, Kumar   Column 3: Senior, Consultant\). Please help me in fixing this problem? 

Thanks in Advance!!

what exactly needs to be done

You can see that it's a comma delimiter. It means each and every record is identified by a comma.
But if you notice the last record, The 2nd column is separated by 2 commas, and also 3rd column is separated by 2 commas which means it contains 6 columns. But actually we are having only 4 columns.
How can we supress the commas that are present inside the data of a column?

if you just want to suppress comma for this particular text, here's the command that works for me:

cat filename | sed 's/Kiran,/Kiran/g' | sed 's/Senior,/Senior/g'

1 Like

Try:

perl -F, -ape 's/(\d, \w+),/\1/ && s/, (\w+, \d+$)/ \1/ if $#F==5' file
1 Like

Thanks for the reply bartus11
Can we do the samething with SED or AWK command? If yes can you please give me the answer?

awk -F, 'NF==6{$0=gensub ("([0-9]+, [[:alpha:]]+),","\\1",1);$0=gensub (", ([[:alpha:]]+, [0-9]+$)"," \\1",1)}1' file
1 Like

Hi bartus11,

          Can you please explain what the above 'awk' is doing?

Thanks to all who replied to my question :slight_smile:

It is just the perl code translated to AWK ;). It removes the second comma from beginning, and second comma from the end of the line if number of fields is equal 6.

# cat tst
EMPNO, ENAME, DESIGNATION, SALARY
10979, Arun Kumar, Cosultant, 35000
13555, Bidhu Shekar, Senior Consultant, 45000
15000, Kiran, Kumar, Senior, Consultant, 40000
# sed '$s/,//4;$s/,//2' tst
EMPNO, ENAME, DESIGNATION, SALARY
10979, Arun Kumar, Cosultant, 35000
13555, Bidhu Shekar, Senior Consultant, 45000
15000, Kiran Kumar, Senior Consultant, 40000

But in my opinion you should care about formatting the file directly at generation step (SQL?) since - according to the example provided, there are no deterministic rule(**) for identifying which of the coma is superfulous (all the more if you have big files with many different values in field $2 and $3 and that those fields may contains some additionnal unexpected comas.)

(**)Example:

32134, Tom, Billy, Joe, Corporation, 90909

Could as well be :

32134, Tom Billy, Joe Corporation, 90909
32134, Tom, Billy Joe Corporation, 90909