Replace a pattern in a file with a generated number using sed or awk

my file has thousands of line but let me show what i want to achieve... here is one line from that file

 
cat fileName.txt
(2,'','user3002,user3003','USER_DATA_SINGLE',1,0,0,'BACKUP',2,NULL,0,450,NULL,NULL,'','2011-05-10 09:22:06','/var/opt/ipc/service_backup_restore','LOCAL',NULL,'2012-06-16 08:12:24');

here you can observe that for

'user3002,user3003'

there is a comma inbetween single quotes... here instead of this comma i want to replace a varchar say Xox9870563xoX

we cant directly cut using this single quotes since there are others strings having single quotes... one way we could do is if there is any comma between ,' and ', then replace with that number.... is this achievable using sed or awk...?

the output should look something like below

 
cat fileName.txt
(2,'','user3002Xox9870563xoXuser3003','USER_DATA_SINGLE',1,0,0,'BACKUP',2,NULL,0,450,NULL,NULL,'','2011-05-10 09:22:06','/var/opt/ipc/service_backup_restore','LOCAL',NULL,'2012-06-16 08:12:24');

any help is deeply appreciated....

try this

sed "s/'\([[:alnum:]][[:alnum:]]*\),\([[:alnum:]][[:alnum:]]*\)'/\1Xox9870563xoX\2/" infile
[jun19]# cat vkk
INSERT INTO `Table55` (`id`, `backupReason`, `backupScope`, `backupType`, `controllingZoneId`, `delaySeconds`, `immediateBackup`, `jobType`, `maxRetries`, `performingZoneId`, `recurring`, `requestingUserId`, `schemaDifference_blob_reserved`, `schemaDifference_reserved`, `scpKey`, `startDateTime`, `storageFile`, `storageType`, `zoneScope`, `lastModified`) VALUES (2,'','user3002,user3003','USER_DATA_SINGLE',1,0,0,'BACKUP',2,NULL,0,450,NULL,NULL,'','2011-05-10 09:22:06','/var/opt/ipc/service_backup_restore','LOCAL',NULL,'2012-06-16 08:12:24');
[jun19]#  sed "s/'\([[:alnum:]][[:alnum:]]*\),\([[:alnum:]][[:alnum:]]*\)'/\1Xox9870563xoX\2/"  vkk
INSERT INTO `Table55` (`id`, `backupReason`, `backupScope`, `backupType`, `controllingZoneId`, `delaySeconds`, `immediateBackup`, `jobType`, `maxRetries`, `performingZoneId`, `recurring`, `requestingUserId`, `schemaDifference_blob_reserved`, `schemaDifference_reserved`, `scpKey`, `startDateTime`, `storageFile`, `storageType`, `zoneScope`, `lastModified`) VALUES (2,'',user3002Xox9870563xoXuser3003,'USER_DATA_SINGLE',1,0,0,'BACKUP',2,NULL,0,450,NULL,NULL,'','2011-05-10 09:22:06','/var/opt/ipc/service_backup_restore','LOCAL',NULL,'2012-06-16 08:12:24');

the code partially works.. because its deleting the enclosing single quotes for

user3002Xox9870563xoXuser3003

lets add the single quotes

sed "s/'\([[:alnum:]][[:alnum:]]*\),\([[:alnum:]][[:alnum:]]*\)'/'\1Xox9870563xoX\2'/" infile

thanks it working good :slight_smile:

If your file is always formatted the same way :

sed 's/,/Xox9870563xoX/3' yourfile

but wont that replace all comma's in the file.... i wanted to replace only if it comes between ,' ',

---------- Post updated at 02:05 PM ---------- Previous update was at 01:56 PM ----------

@ygemici

the code wont work for below scenario... how to modify the code to work for below scenario

(1,NULL,'FULL',1,0,'Immediate Backup,Database backup successfull,License backup successfull,Completed backup operation successfully',5,'COMPLETED','BACKUP',2,100,NULL,NULL,NULL,NULL,'2011-04-04 05:33:33');

there are multiple comma's between ,' and ',

Is this working? Changed the script to multiple patterns.

awk 'BEGIN{FS=OFS="'"}{for(i=1;i<=NF;i++) if($i ~ /user.*,user.*|Immediate Backup|whatever/) gsub(/,/,"Xox9870563xoX",$i)};1' infile >outfile

its not working some syntax problem i guess... with quotes

awk 'BEGIN{FS=OFS="'"}{for(i=1;i<=NF;i++) if($i ~ /user.*user/) sub(",","Xox9870563xoX",$i)};1'  vo
>
>
>

here vo is file name and whn i execute nothing happens..

Filename vo is not the problem. Try this code. Have changed sub to gsub

awk 'BEGIN{FS=OFS="'"}{for(i=1;i<=NF;i++) if($i ~ /user.*,user.*|Immediate Backup|whatever/) gsub(/,/,"Xox9870563xoX",$i)};1' vo

@sdf one more doubt are these

user.*,user.*|Immediate Backup|whatever/)

specific to file contents.... what if there are different names instead of "user"

it should be irrespective of any words in the file... :frowning:

and still above code dint run... i am guessing my linux machine doesnt support gsub or sub... can it be this reason...?

Nope, the sed statement i proposed (post #6 ) will only replace the third coma of every line.

That's why it supposes that the format of your file doesn't vary.

Just another one in another style:

sed "s/\([\'][^\'][^\']*\),\([^\'][^\']*\)/\1Xox9870563xoX\2/" yourfile

---------- Post updated at 03:03 PM ---------- Previous update was at 01:49 PM ----------

awk '{OFS=NR%2?",":"Xox9870563xoX";$1=$1}1' RS="\'" FS=,  ORS="\'" yourfile
# cat tst
a,b,'jklfds','a,b,a,c','f','g','c','c,d'
a,b,'jklfds','a,b,a,c','f','g','c','c,d'
# awk '{OFS=NR%2?",":"#";$1=$1}1' RS="\'" FS=,  ORS="\'" tst
a,b,'jklfds','a#b#a#c','f','g','c','c#d'
a,b,'jklfds','a#b#a#c','f','g','c','c#d'
'# awk '{OFS=NR%2?",":"Xox9870563xoX";$1=$1}1' RS="\'" FS=,  ORS="\'" tst
a,b,'jklfds','aXox9870563xoXbXox9870563xoXaXox9870563xoXc','f','g','c','cXox9870563xoXd'
a,b,'jklfds','aXox9870563xoXbXox9870563xoXaXox9870563xoXc','f','g','c','cXox9870563xoXd'
'#

Oops ... there is a trailing simple quote to remove ...

1 Like

thanks will try this now.. i am getting a warning is this normal

awk '{OFS=NR%2?",":"Xox9870563xoX";$1=$1}1' RS="\'" FS=,  ORS="\'" vo
awk: cmd. line:1: warning: escape sequence `\'' treated as plain `''

---------- Post updated at 06:50 PM ---------- Previous update was at 06:39 PM ----------

thanks this code is working superb:-)

thanks a lot ctsgnd

try this

# awk -F"'[,']" -vc="Xox9870563xoX" -vx="'" -vxx="," '{for(i=1;i<=NF;i++)if($i~x ".*" xx){l=split($i,a,xx x);;for(j=1;j<l;j++)
lx=lx?lx a[j]:a[j] xx x;;gsub(xx,c,a[l]);;$i=lx a[l] x xx;printf "%s",$i;lx=""}else if(i!=NF)printf "%s%s",$i,x xx;else printf "%s",$i}END{print ""}' file
(1,NULL,'FULL',1,0,'Immediate BackupXox9870563xoXDatabase backup successfullXox9870563xoXLicense backup successfullXox9870563xoXCompleted backup operation successfully',5,'COMPLETED','BACKUP',2,100,NULL,NULL,NULL,NULL,'2011-04-04 05:33:33');