sed or awk command to replace a string pattern with another string based on the position of the string

here is what i want to achieve... consider a file with the contents below. the file is large, about 60 MB

 
cat dump.sql
 
INSERT INTO `table1` (`id`, `action`, `date`, `descrip`, `lastModified`) VALUES (1,'Change','2011-05-05 00:00:00','Account Updated','2012-02-10 13:00:00'),(2,'dsfsd','2000-04-06 00:00:00','Updated','2011-02-10 13:00:00'),(3,'Change','2011-05-05 00:00:00','Account Updated','2012-02-10 13:00:00')
.
.
.
INSERT INTO `table2` (`id`, `action`, `date`, `lastModified`, `dynamics`) VALUES (1,'Change','2011-05-05 00:00:00','2012-02-10 13:00:00',null),(2,'dsfsd','2000-04-06 00:00:00','2011-02-10 13:00:00',null),(3,'Change','2011-05-05 00:00:00','2012-02-10 13:00:00',123)
.
 
.
.many tables

the above is just sample data... in some tables the lastModified field can be at the end, in others somewhere in between.... what i want to achieve is to replace every corresponding lastModified value with '0000-00-00 00:00:00' instead of the existing values. i could do this by updating the database directly, but i want to modify this dump file instead of updating the database first and then taking the dump. so is this feasible..? the logic should work like this: scan from one "INSERT INTO" to the next "INSERT INTO", and within that range find the position of lastModified and replace it with the zeros.. a table can have many timestamp columns, but the command or script should modify only the corresponding lastModified column. any help would be appreciated.

I'm not sure if I understood your problem, but is this what you're looking for?

awk -F, -vQ="'" '/^INSERT INTO/&&$8=$13=$18=Q"0000-00-00 00:00:00"Q' OFS=, file

let me explain once again..

cat dump.sql
 
INSERT INTO `table1` 
(`id`, `action`, `date`, `descrip`, `lastModified`) VALUES
(1,'Change','2011-05-05 00:00:00','Account Updated','2012-02-10 13:00:00'),
(2,'dsfsd','2000-04-06 00:00:00','Updated','2011-02-10 13:00:00'),
(3,'Change','2011-05-05 00:00:00','Account Updated','2012-02-10 13:00:00');
.
.
.
INSERT INTO `table2` (`id`, `action`, `date`, `lastModified`, `dynamics`) VALUES 
(1,'Change','2011-05-05 00:00:00','2012-02-10 13:00:00',null),
(2,'dsfsd','2000-04-06 00:00:00','2011-02-10 13:00:00',null),
(3,'Change','2011-05-05 00:00:00','2012-02-10 13:00:00',123);
.
 
.
.many tables
 

here consider Table1
lastModified column is at the last field in table. so REPLACE all last field before every closing bracket with this '0000-00-00 00:00:00' with single quotes till you end up with ';' which marks the end of that table query. if lastModified field comes any position in filed sequence as shown in Table2 where the lastModified field comes one position before last. so replace corresponding fields after this till you hit ';' which marks the end of that table.

output should look like below

cat dump.sql
 
INSERT INTO `table1` 
(`id`, `action`, `date`, `descrip`, `lastModified`) VALUES
(1,'Change','2011-05-05 00:00:00','Account Updated','0000-00-00 00:00:00'),
(2,'dsfsd','2000-04-06 00:00:00','Updated','0000-00-00 00:00:00'),
(3,'Change','2011-05-05 00:00:00','Account Updated','0000-00-00 00:00:00');
.
.
.
INSERT INTO `table2` (`id`, `action`, `date`, `lastModified`, `dynamics`) VALUES 
(1,'Change','2011-05-05 00:00:00','0000-00-00 00:00:00',null),
(2,'dsfsd','2000-04-06 00:00:00','0000-00-00 00:00:00',null),
(3,'Change','2011-05-05 00:00:00','0000-00-00 00:00:00',123);
.
 
.
.many tables
 

Hi

$ cat file
INSERT INTO `table1` (`id`, `action`, `date`, `descrip`, `lastModified`) VALUES
(1,'Change','2011-05-05 00:00:00','Account Updated','2012-02-10 13:00:00'),
(2,'dsfsd','2000-04-06 00:00:00','Updated','2011-02-10 13:00:00'),
(3,'Change','2011-05-05 00:00:00','Account Updated','2012-02-10 13:00:00');
INSERT INTO `table2` (`id`, `action`, `date`, `lastModified`, `dynamics`) VALUES
(1,'Change','2011-05-05 00:00:00','2012-02-10 13:00:00',null),
(2,'dsfsd','2000-04-06 00:00:00','2011-02-10 13:00:00',null),
(3,'Change','2011-05-05 00:00:00','2012-02-10 13:00:00',123);

Output:

$ awk -F, '/INSERT/{for(i=1;i<=NF;i++){if ($i ~ /lastModified/){x=i;print;getline;break;}}}{y="0000-00-00 00:00:00";sub(/[-0-9: ]+/,y,$x);}1'  OFS=, file
INSERT INTO `table1` (`id`, `action`, `date`, `descrip`, `lastModified`) VALUES
(1,'Change','2011-05-05 00:00:00','Account Updated','0000-00-00 00:00:00'),
(2,'dsfsd','2000-04-06 00:00:00','Updated','0000-00-00 00:00:00'),
(3,'Change','2011-05-05 00:00:00','Account Updated','0000-00-00 00:00:00');
INSERT INTO `table2` (`id`, `action`, `date`, `lastModified`, `dynamics`) VALUES
(1,'Change','2011-05-05 00:00:00','0000-00-00 00:00:00',null),
(2,'dsfsd','2000-04-06 00:00:00','0000-00-00 00:00:00',null),
(3,'Change','2011-05-05 00:00:00','0000-00-00 00:00:00',123);

Guru.

1 Like
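Roughly, the one-liner above breaks down as follows. This is an annotated re-formatting of the same logic, with the thread's own sample data inlined via a heredoc so it runs standalone, and the replacement string written as the '0000-00-00 00:00:00' requested in the opening post:

```shell
out=$(awk -F, -v OFS=, '
/INSERT/ {                        # header line: find which column is lastModified
    for (i = 1; i <= NF; i++)
        if ($i ~ /lastModified/) { x = i; print; getline; break }
}                                 # print passes the header through untouched;
                                  # getline then loads the first data row into $0
{ sub(/[-0-9: ]+/, "0000-00-00 00:00:00", $x) }   # blank out column x
1                                 # always true, i.e. print the current line
' <<'EOF'
INSERT INTO `table1` (`id`, `action`, `date`, `descrip`, `lastModified`) VALUES
(1,'Change','2011-05-05 00:00:00','Account Updated','2012-02-10 13:00:00'),
(2,'dsfsd','2000-04-06 00:00:00','Updated','2011-02-10 13:00:00');
EOF
)
printf '%s\n' "$out"
```

The key trick is that `print; getline` emits the header untouched and immediately loads the first data row, so the header never reaches the substitution block, while the bare `1` at the end prints every processed line. Because header and data rows are both split on commas, the column index `x` found in the header lines up with the field to blank out in each row.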

totally awesome... :slight_smile: a million thanks guruprasad...
if you have time, can you please explain how this command works?

---------- Post updated at 05:20 PM ---------- Previous update was at 05:16 PM ----------

is the command specific to the 'number of columns'..? cause just now i tried it with a big file and it didn't work :frowning:

the above won't work for

INSERT INTO `Table44` (`id`, `action`, `date`, `description`, `modifiedId`, `object`, `performedById`, `schemaDifference_blob_reserved`, `schemaDifference_reserved`, `sourceIPAddress`, `zoneId`, `lastModified`) VALUES 
(1,'Change','2011-05-05 00:00:00','User Account Updated',3,'Account',1,NULL,NULL,'',1,'2012-06-09 13:56:24');

When I ran the same command on the file you specified, it came out properly.

$ awk -F, '/INSERT/{for(i=1;i<=NF;i++){if ($i ~ /lastModified/){x=i;print;getline;break;}}}{y="0000-00-00 00:00:00";sub(/[-0-9: ]+/,y,$x);}1'  OFS=, file
INSERT INTO `Table44` (`id`, `action`, `date`, `description`, `modifiedId`, `object`, `performedById`, `schemaDifference_blob_reserved`, `schemaDifference_reserved`, `sourceIPAddress`, `zoneId`, `lastModified`) VALUES
(1,'Change','2011-05-05 00:00:00','User Account Updated',3,'Account',1,NULL,NULL,'',1,'0000-00-00 00:00:00');

Make sure your file is in the format you specified in the earlier sample, like:

INSERT INTO . ............ VALUES
(.....................),
(.....................);

not all on the same line.

oh.... hmmmm the file is very huge, with thousands of records.... and all of them are on a single line.... :frowning: what to do now...
i gave the sample file just to match the columns and show the format.. the original lines are as shown in my first post in this thread

is there any way to insert a newline after every ") VALUES" and "),(" string in the whole file...
or is there a way to modify the command you gave?

Hi

$ sed -e 's/VALUES */&\n/;s/),/&\n/g' file |
> awk -F, '/INSERT/{for(i=1;i<=NF;i++){if ($i ~ /lastModified/){x=i;print;getline;break;}}}{y="0000-00-00 00:00:00";sub(/[-0-9: ]+/,y,$x);}1'  OFS=,
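The sed step just breaks each one-line statement apart so the awk script sees the header and each tuple on separate lines. A quick demonstration of that step alone (GNU sed assumed, since `\n` in the replacement is a GNU extension; the sample statement is made up for the demo):

```shell
# newline after the first "VALUES " and after every "),"
out=$(printf '%s\n' 'INSERT INTO `t` (`id`,`v`) VALUES (1,x),(2,y);' |
      sed 's/VALUES */&\n/;s/),/&\n/g')
printf '%s\n' "$out"
```

This prints the statement on three lines: the header ending in `VALUES`, then one `(...)` tuple per line, which is exactly the multi-line shape the earlier awk command expects.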

Try this... a bit ugly though...

awk -F"(" 'BEGIN{OFS=FS} {
# the column list sits in field 2 (after the first "("): find lastModified
split($2,t,", ");
for(i in t)
{
 if(t[i] ~ /lastModified/)
  posit=i
}
# rewrite that position in every "(...)" value tuple
for(i=3;i<=NF;i+=1)
{
 n=split($i,s,",")
 sub(/[-0-9: ]+/,"0000-00-00 00:00:00",s[posit])
 # rebuild the tuple, re-joining the fields with commas
 temp=""
 for(j=1;j<=n;j++)
 {
  if(j!=1)
   temp=temp","s[j]
  else
   temp=s[j]
 }
 $i=temp
}
print
}' inputfile
1 Like
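For completeness, here is the same approach run standalone against the single-line format from the opening post (the heredoc sample is the thread's `table2` data; note the `t[i]` subscript in the lastModified lookup):

```shell
out=$(awk -F'(' '
BEGIN { OFS = FS }
{
    # column list is in field 2; find the position of lastModified
    split($2, t, ", ")
    for (i in t)
        if (t[i] ~ /lastModified/) posit = i
    # overwrite that position in every "(...)" value tuple
    for (i = 3; i <= NF; i++) {
        n = split($i, s, ",")
        sub(/[-0-9: ]+/, "0000-00-00 00:00:00", s[posit])
        temp = s[1]
        for (j = 2; j <= n; j++) temp = temp "," s[j]
        $i = temp
    }
    print
}' <<'EOF'
INSERT INTO `table2` (`id`, `action`, `date`, `lastModified`, `dynamics`) VALUES (1,'Change','2011-05-05 00:00:00','2012-02-10 13:00:00',null),(2,'dsfsd','2000-04-06 00:00:00','2011-02-10 13:00:00',null);
EOF
)
printf '%s\n' "$out"
```

Splitting on `(` isolates the column list from each value tuple, so the script works whether the statement spans one line or many, as long as the values themselves contain no commas or parentheses.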

thanks elixir, your code works awesome too.. :slight_smile: :slight_smile: thanks a lot

:frowning:
the code is not working for the table below :frowning:

INSERT INTO `table66` (`id`, `backupReason`, `backupScope`, `backupType`, `controllingZoneId`, `delaySeconds`, `immediateBackup`, `jobType`, `maxRetries`, `performingZoneId`, `recurring`, `requestingUserId`, `schemaDifference_blob_reserved`, `schemaDifference_reserved`, `scpKey`, `startDateTime`, `storageFile`, `storageType`, `zoneScope`, `lastModified`) VALUES (2,'','user3002,user3003','USER_DATA_SINGLE',1,0,0,'BACKUP',2,NULL,0,450,NULL,NULL,'','2011-05-10 09:22:06','/var/opt/ipc/service_backup_restore','LOCAL',NULL,'2012-06-16 08:12:24');

how to fix this...
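One guess at what is going wrong (not confirmed in the thread): `table66` has the value 'user3002,user3003', which contains a comma inside the quotes, so any split on bare commas shifts every field position after it, and the substitution lands on the wrong field. A sketch of a quote-aware split that skips commas inside single quotes; the `qsplit` helper, the guard on `posit`, and the heredoc sample line are mine, not from the thread:

```shell
out=$(awk -F'(' -v OFS='(' '
# split str on commas that fall OUTSIDE single quotes; fields go into arr
function qsplit(str, arr,    n, i, c, inq) {
    n = 1; arr[1] = ""
    for (i = 1; i <= length(str); i++) {
        c = substr(str, i, 1)
        if (c == "\047") inq = !inq          # toggle in/out of a quoted value
        if (c == "," && !inq) arr[++n] = ""  # field break only outside quotes
        else arr[n] = arr[n] c
    }
    return n
}
{
    # field 2 (after the first "(") is the column list: find lastModified
    nc = split($2, t, ", ")
    posit = 0
    for (i = 1; i <= nc; i++)
        if (t[i] ~ /lastModified/) posit = i
    if (posit)
        for (i = 3; i <= NF; i++) {          # every "(...)" value tuple
            n = qsplit($i, s)
            sub(/[-0-9: ]+/, "0000-00-00 00:00:00", s[posit])
            tmp = s[1]
            for (j = 2; j <= n; j++) tmp = tmp "," s[j]
            $i = tmp
        }
    print
}' <<'EOF'
INSERT INTO `t` (`id`, `tags`, `lastModified`, `flag`) VALUES (1,'a,b','2012-06-16 08:12:24',NULL),(2,'c','2011-02-10 13:00:00',0);
EOF
)
printf '%s\n' "$out"
```

Like the original script, this still assumes no parentheses or escaped quotes (`\'`) inside quoted values; it only makes the comma counting quote-aware, which is enough for the `table66` line shown above.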