Grab text after pattern and replace

gpk_newbie · November 4, 2015, 8:14am

I have a file containing comma-separated data. I want to change the text after the 3rd comma (the 4th field) by inserting a "-" after every 4 digits.
The input file looks like this:

abcdef,11/02/2015 11:55:47,1001,1234567812345678,12364,,abc
abcdefg,11/02/2015 11:55:47,01,1234567812345678,123,,abc
abcdefhih,11/02/2015 11:55:47,1001,1234567812345678,1234,,abc
abcdef,11/02/2015 11:55:47,001,1234567812345678,1236487,,abc

I want the output to look like this:

abcdef,11/02/2015 11:55:47,1001,1234-5678-1234-5678,12364,,abc
abcdefg,11/02/2015 11:55:47,01,1234-5678-1234-5678,123,,abc
abcdefhih,11/02/2015 11:55:47,1001,1234-5678-1234-5678,1234,,abc
abcdef,11/02/2015 11:55:47,001,1234-5678-1234-5678,1236487,,abc

I was able to insert a "-" after every 4 digits using awk and sed as shown below, but I am not sure how to make it work on the whole file and then redirect the output to a different file.

awk -F',' '{print $4}' file | sed -n -e "s_\(....\)\(....\)\(....\)\(....\)_\1-\2-\3-\4_p"
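For comparison, the same change can be made with sed alone. It works on whole lines so the other fields are kept, and its output is redirected to a new file. This is a sketch that assumes a sed supporting `-E` (recent GNU sed and BSD sed do) and a fourth field that is always exactly 16 digits. `file` and `output_file` are placeholder names, and the sample lines are copied from the question.

```shell
# Build a small sample input (two lines copied from the question).
printf '%s\n' \
  'abcdef,11/02/2015 11:55:47,1001,1234567812345678,12364,,abc' \
  'abcdefg,11/02/2015 11:55:47,01,1234567812345678,123,,abc' > file

# Group 1 keeps the first three fields and their commas; groups 3-6 are the
# four 4-digit chunks of field 4, rejoined with "-".
# Assumes field 4 is exactly 16 digits followed by a comma.
sed -E 's/^(([^,]*,){3})([0-9]{4})([0-9]{4})([0-9]{4})([0-9]{4}),/\1\3-\4-\5-\6,/' file > output_file

cat output_file
```

Lines whose fourth field does not match the pattern are printed unchanged, because without `-n` sed outputs every line.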

RavinderSingh13 · November 4, 2015, 9:03am

Hello gpk_newbie,

Could you please try the following and let me know if this helps you.

awk -F, '{gsub(/..../,"&-",$4);sub(/\-$/,X,$4);} 1' OFS=,  Input_file
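The question also asked about writing the result to a different file. Plain shell redirection (`>`) handles that. POSIX awk has no option for editing a file in place, so a common pattern for updating the original is to write to a temporary file and move it over the original only if awk succeeded. `Input_file`, `Output_file` and `Input_file.tmp` are placeholder names, and the sample lines come from the question.

```shell
# Sample input: two lines taken from the question.
printf '%s\n' \
  'abcdefhih,11/02/2015 11:55:47,1001,1234567812345678,1234,,abc' \
  'abcdef,11/02/2015 11:55:47,001,1234567812345678,1236487,,abc' > Input_file

# Same command as above; ">" sends the output to a new file and leaves Input_file untouched.
awk -F, '{gsub(/..../,"&-",$4);sub(/\-$/,X,$4);} 1' OFS=, Input_file > Output_file

# To change Input_file itself: write to a temporary file, then replace the
# original only if awk exited successfully (&&).
awk -F, '{gsub(/..../,"&-",$4);sub(/\-$/,X,$4);} 1' OFS=, Input_file > Input_file.tmp &&
  mv Input_file.tmp Input_file
```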

The output will be as follows:

abcdef,11/02/2015 11:55:47,1001,1234-5678-1234-5678,12364,,abc
abcdefg,11/02/2015 11:55:47,01,1234-5678-1234-5678,123,,abc
abcdefhih,11/02/2015 11:55:47,1001,1234-5678-1234-5678,1234,,abc
abcdef,11/02/2015 11:55:47,001,1234-5678-1234-5678,1236487,,abc

Thanks,
R. Singh

gpk_newbie · November 4, 2015, 9:15am

Hi RavinderSingh13

Thanks a lot, that worked as required. Can you please explain how it works?

RavinderSingh13 · November 4, 2015, 9:25am

Hello gpk_newbie,

The following may help you understand the command.

awk -F, '                  # -F, makes the comma (,) the field separator
{gsub(/..../,"&-",$4);     # global substitution: in $4, replace every 4 characters/digits with themselves plus "-" (&-), as per your request
sub(/\-$/,X,$4);}          # gsub also puts a "-" at the end of the string, so I remove it with sub, which substitutes only once; that is the difference between global and single substitution in awk. X is an unset variable, so it acts as an empty string
1                          # awk works on condition then action: 1 is a TRUE condition with no action given, so awk takes its default action and prints the complete line
' OFS=, Input_file         # OFS=, sets the output field separator; awk rebuilds the line with OFS after $4 changes, so this keeps the commas
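To see the two substitution steps and the role of OFS separately, each can be run on its own. This sketch writes the trailing-hyphen pattern as `/-$/` and the replacement as an explicit empty string `""`, which should behave the same as `/\-$/` and the unset `X` in the command above.

```shell
# Step 1: gsub appends "-" after EVERY 4-character match,
# so a 16-digit value ends up with a trailing "-".
echo 1234567812345678 | awk '{gsub(/..../, "&-"); print}'
# prints 1234-5678-1234-5678-

# Step 2: sub replaces only the first match of /-$/, i.e. that trailing "-".
echo 1234567812345678 | awk '{gsub(/..../, "&-"); sub(/-$/, ""); print}'
# prints 1234-5678-1234-5678

# Why OFS=, matters: assigning to a field makes awk rebuild the line
# with OFS, which defaults to a single space.
echo 'a,b,c' | awk -F, '{$2 = "X"} 1'
# prints a X c
echo 'a,b,c' | awk -F, '{$2 = "X"} 1' OFS=,
# prints a,X,c
```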

Thanks,
R. Singh

gpk_newbie · November 4, 2015, 9:41am

Thanks a lot, RavinderSingh13!