Replace a pattern in a file with a generated number using sed or awk

vivek_d_r · June 20, 2012, 1:37am

my file has thousands of line but let me show what i want to achieve... here is one line from that file

 
cat fileName.txt
(2,'','user3002,user3003','USER_DATA_SINGLE',1,0,0,'BACKUP',2,NULL,0,450,NULL,NULL,'','2011-05-10 09:22:06','/var/opt/ipc/service_backup_restore','LOCAL',NULL,'2012-06-16 08:12:24');

here you can observe that for

'user3002,user3003'

there is a comma inbetween single quotes... here instead of this comma i want to replace a varchar say Xox9870563xoX

we cant directly cut using this single quotes since there are others strings having single quotes... one way we could do is if there is any comma between ,' and ', then replace with that number.... is this achievable using sed or awk...?

the output should look something like below

 
cat fileName.txt
(2,'','user3002Xox9870563xoXuser3003','USER_DATA_SINGLE',1,0,0,'BACKUP',2,NULL,0,450,NULL,NULL,'','2011-05-10 09:22:06','/var/opt/ipc/service_backup_restore','LOCAL',NULL,'2012-06-16 08:12:24');

any help is deeply appreciated....

ygemici · June 20, 2012, 1:52am

try this

sed "s/'\([[:alnum:]][[:alnum:]]*\),\([[:alnum:]][[:alnum:]]*\)'/\1Xox9870563xoX\2/" infile

vivek_d_r · June 20, 2012, 2:04am

[jun19]# cat vkk
INSERT INTO `Table55` (`id`, `backupReason`, `backupScope`, `backupType`, `controllingZoneId`, `delaySeconds`, `immediateBackup`, `jobType`, `maxRetries`, `performingZoneId`, `recurring`, `requestingUserId`, `schemaDifference_blob_reserved`, `schemaDifference_reserved`, `scpKey`, `startDateTime`, `storageFile`, `storageType`, `zoneScope`, `lastModified`) VALUES (2,'','user3002,user3003','USER_DATA_SINGLE',1,0,0,'BACKUP',2,NULL,0,450,NULL,NULL,'','2011-05-10 09:22:06','/var/opt/ipc/service_backup_restore','LOCAL',NULL,'2012-06-16 08:12:24');
[jun19]#  sed "s/'\([[:alnum:]][[:alnum:]]*\),\([[:alnum:]][[:alnum:]]*\)'/\1Xox9870563xoX\2/"  vkk
INSERT INTO `Table55` (`id`, `backupReason`, `backupScope`, `backupType`, `controllingZoneId`, `delaySeconds`, `immediateBackup`, `jobType`, `maxRetries`, `performingZoneId`, `recurring`, `requestingUserId`, `schemaDifference_blob_reserved`, `schemaDifference_reserved`, `scpKey`, `startDateTime`, `storageFile`, `storageType`, `zoneScope`, `lastModified`) VALUES (2,'',user3002Xox9870563xoXuser3003,'USER_DATA_SINGLE',1,0,0,'BACKUP',2,NULL,0,450,NULL,NULL,'','2011-05-10 09:22:06','/var/opt/ipc/service_backup_restore','LOCAL',NULL,'2012-06-16 08:12:24');

the code partially works.. because its deleting the enclosing single quotes for

user3002Xox9870563xoXuser3003

ygemici · June 20, 2012, 2:10am

lets add the single quotes

sed "s/'\([[:alnum:]][[:alnum:]]*\),\([[:alnum:]][[:alnum:]]*\)'/'\1Xox9870563xoX\2'/" infile

vivek_d_r · June 20, 2012, 2:19am

thanks it working good

ctsgnb · June 20, 2012, 4:22am

If your file is always formatted the same way :

sed 's/,/Xox9870563xoX/3' yourfile

vivek_d_r · June 20, 2012, 4:35am

but wont that replace all comma's in the file.... i wanted to replace only if it comes between ,' ',

---------- Post updated at 02:05 PM ---------- Previous update was at 01:56 PM ----------

@ygemici

the code wont work for below scenario... how to modify the code to work for below scenario

(1,NULL,'FULL',1,0,'Immediate Backup,Database backup successfull,License backup successfull,Completed backup operation successfully',5,'COMPLETED','BACKUP',2,100,NULL,NULL,NULL,NULL,'2011-04-04 05:33:33');

there are multiple comma's between ,' and ',

sdf · June 20, 2012, 4:57am

Is this working? Changed the script to multiple patterns.

awk 'BEGIN{FS=OFS="'"}{for(i=1;i<=NF;i++) if($i ~ /user.*,user.*|Immediate Backup|whatever/) gsub(/,/,"Xox9870563xoX",$i)};1' infile >outfile

vivek_d_r · June 20, 2012, 6:21am

its not working some syntax problem i guess... with quotes

awk 'BEGIN{FS=OFS="'"}{for(i=1;i<=NF;i++) if($i ~ /user.*user/) sub(",","Xox9870563xoX",$i)};1'  vo
>
>
>

here vo is file name and whn i execute nothing happens..

sdf · June 20, 2012, 6:38am

Filename vo is not the problem. Try this code. Have changed sub to gsub

awk 'BEGIN{FS=OFS="'"}{for(i=1;i<=NF;i++) if($i ~ /user.*,user.*|Immediate Backup|whatever/) gsub(/,/,"Xox9870563xoX",$i)};1' vo

vivek_d_r · June 20, 2012, 6:43am

@sdf one more doubt are these

user.*,user.*|Immediate Backup|whatever/)

specific to file contents.... what if there are different names instead of "user"

it should be irrespective of any words in the file...

and still above code dint run... i am guessing my linux machine doesnt support gsub or sub... can it be this reason...?

ctsgnb · June 20, 2012, 9:03am

Nope, the sed statement i proposed (post #6 ) will only replace the third coma of every line.

That's why it supposes that the format of your file doesn't vary.

Just another one in another style:

sed "s/\([\'][^\'][^\']*\),\([^\'][^\']*\)/\1Xox9870563xoX\2/" yourfile

---------- Post updated at 03:03 PM ---------- Previous update was at 01:49 PM ----------

awk '{OFS=NR%2?",":"Xox9870563xoX";$1=$1}1' RS="\'" FS=,  ORS="\'" yourfile

# cat tst
a,b,'jklfds','a,b,a,c','f','g','c','c,d'
a,b,'jklfds','a,b,a,c','f','g','c','c,d'
# awk '{OFS=NR%2?",":"#";$1=$1}1' RS="\'" FS=,  ORS="\'" tst
a,b,'jklfds','a#b#a#c','f','g','c','c#d'
a,b,'jklfds','a#b#a#c','f','g','c','c#d'
'# awk '{OFS=NR%2?",":"Xox9870563xoX";$1=$1}1' RS="\'" FS=,  ORS="\'" tst
a,b,'jklfds','aXox9870563xoXbXox9870563xoXaXox9870563xoXc','f','g','c','cXox9870563xoXd'
a,b,'jklfds','aXox9870563xoXbXox9870563xoXaXox9870563xoXc','f','g','c','cXox9870563xoXd'
'#

Oops ... there is a trailing simple quote to remove ...

vivek_d_r · June 20, 2012, 9:20am

thanks will try this now.. i am getting a warning is this normal

awk '{OFS=NR%2?",":"Xox9870563xoX";$1=$1}1' RS="\'" FS=,  ORS="\'" vo
awk: cmd. line:1: warning: escape sequence `\'' treated as plain `''

---------- Post updated at 06:50 PM ---------- Previous update was at 06:39 PM ----------

thanks this code is working superb:-)

thanks a lot ctsgnd

ygemici · June 20, 2012, 10:06am

vivek d r:

but wont that replace all comma's in the file.... i wanted to replace only if it comes between ,' ',

---------- Post updated at 02:05 PM ---------- Previous update was at 01:56 PM ----------

@ygemici

the code wont work for below scenario... how to modify the code to work for below scenario
(1,NULL,'FULL',1,0,'Immediate Backup,Database backup successfull,License backup successfull,Completed backup operation successfully',5,'COMPLETED','BACKUP',2,100,NULL,NULL,NULL,NULL,'2011-04-04 05:33:33');
there are multiple comma's between ,' and ',

try this

# awk -F"'[,']" -vc="Xox9870563xoX" -vx="'" -vxx="," '{for(i=1;i<=NF;i++)if($i~x ".*" xx){l=split($i,a,xx x);;for(j=1;j<l;j++)
lx=lx?lx a[j]:a[j] xx x;;gsub(xx,c,a[l]);;$i=lx a[l] x xx;printf "%s",$i;lx=""}else if(i!=NF)printf "%s%s",$i,x xx;else printf "%s",$i}END{print ""}' file
(1,NULL,'FULL',1,0,'Immediate BackupXox9870563xoXDatabase backup successfullXox9870563xoXLicense backup successfullXox9870563xoXCompleted backup operation successfully',5,'COMPLETED','BACKUP',2,100,NULL,NULL,NULL,NULL,'2011-04-04 05:33:33');