Regex to Parse data

Experts and Informed folks,

Need some help here in parsing the log file.

1389675	Opera_ShirtCatalog INSERT INTO Opera_ShirtCatalog(COL1, COL2) VALUES (1, 'TEST1'), (2,'TEST2');
1389685	Opera_ShirtCatlog_Wom INSERT INTO Opera_ShirtCatlog_Wom(col1, col2, col3) VALUES (9,'Siz12, FormFit', 'Test');

Now, I need to print the rest of line ie., from INSERT INTO �. till ";" which is the end of the line. This part is easy and all I have to do is -

cat file | sed 's/) VALUES/,NEWCOLUMN) VALUES/g' |  sed 's/.*INSERT//g' & that will print it. 

However the trickiest part in here is, I need to move the 1st column as the last "insertable column" inside the values clause. So, what I would necessarily want to see as final output would be as below -

INSERT INTO Opera_ShirtCatalog(COL1, COL2, NEWCOLUMN) VALUES (1, 'TEST1', 1389675), (2,'TEST2', 1389675);
INSERT INTO Opera_ShirtCatlog_Wom(col1, col2, col3, NEWCOLUMN) VALUES (9,'Siz12, FormFit', 'Test', 1389685);

I cant seem to come up a regex that would not only mark the value but also move it inside just before the closing paranthesis.

So to make it clearer,

Input file -

1389675	Opera_ShirtCatalog INSERT INTO Opera_ShirtCatalog(COL1, COL2) VALUES (1, 'TEST1'), (2,'TEST2');
1389685	Opera_ShirtCatlog_Wom INSERT INTO Opera_ShirtCatlog_Wom(col1, col2, col3) VALUES (9,'Siz12, FormFit', 'Test');

Output Expected -

INSERT INTO Opera_ShirtCatalog(COL1, COL2, NEWCOLUMN) VALUES (1, 'TEST1', 1389675), (2,'TEST2', 1389675);
INSERT INTO Opera_ShirtCatlog_Wom(col1, col2, col3, NEWCOLUMN) VALUES (9,'Siz12, FormFit', 'Test', 1389685);

Any and all help is duly appreciated.

Using awk:

awk '{
   gsub(/)/, ", "$1")");
   gsub(", "$1") VALUES", ", NEWCOLUMN) VALUES");
}
gsub("^"$1".* INSERT","INSERT")' file

Chubler_XL,

At the outset, let me express my heartfelt thanks in addressing my issue.

Unfortunately this is erroring out as follows & I tried to muck with it but I dont understand what you wrote in here so, I am a bit helpless. I am running this on bash if that provides any context in here....

/Users/ManoharChandran 07:49:39 $cat NEWOUTPUT | grep 'INSERT' | cut  -f 5- | awk '{print $2=$3=$4=""}1' | awk '{
gsub(/)/, ", "$1")");
gsub(", "$1") VALUES", ", NEWCOLUMN) VALUES");
}
gsub("^"$1".* INSERT","INSERT")'
awk: illegal primary in regular expression ) at 
 source line number 2
 context is
    gsub(/)/, ", >>>  "$1")") <<< 
/Users/ManoharChandran 07:50:04 $

Might have to escape the ) characters:

awk '{
gsub(/\)/, ", "$1")");
gsub(", "$1"\) VALUES", ", NEWCOLUMN) VALUES");
}
gsub("^"$1".* INSERT","INSERT")'

Chubler_XL,

Sorry, even escaping the brackets didnt do the trick ... here it is

/Users/ManoharChandran 10:33:43 $cat NEWOUTPUT | grep 'INSERT' | cut  -f 5- | awk '{print $2=$3=$4=""}1' | awk '{
> gsub(/\)/, ", "$1")");
> gsub(", "$1"\) VALUES", ", NEWCOLUMN) VALUES");
> }
> gsub("^"$1".* INSERT","INSERT")'
awk: syntax error in regular expression , ) VALUES at  VALUES
 input record number 1, file 
 source line number 3

Please advise !

regards,
Manohar.

Something on these lines:

awk -F\( 'BEGIN{OFS=FS} {for(i=2;i<=NF;i++) sub(/)/,", NEWCOLUMN)",$i);print}' inputfile

elixir_sinari,

Thanks for taking time and addressing my issue.

I dont like to say this but it is still not working. I am not sure as to why.

regards,
Manohar.

/Users/ManoharChandran 14:07:33 $cat NEWOUTPUT | grep 'INSERT' | cut  -f 5- | awk '{print $2=$3=$4=""}1' | awk -F\( 'BEGIN{OFS=FS} {for(i=2;i<=NF;i++) sub(/)/,", NEWCOLUMN)",$i);print}' 
awk: illegal primary in regular expression ) at 
 source line number 1
 context is
    BEGIN{OFS=FS} {for(i=2;i<=NF;i++) sub(/)/,", >>>  NEWCOLUMN)",$i) <<<

Looks like a broken regular expression parser. ) should not be treated as a special character in an extended regular expression unless it occurs outside a bracket expression in conjunction with a preceeding ( that is also outside a bracket expression.

If this fails ...

awk 'BEGIN {s ~ /)/}'

but this succeeds ...

awk 'BEGIN {s ~ /\)/}'

... then I'm almost certainly correct.

\) is technically an undefined sequence, but that workaround has been known to work with some buggy implementations.

Regards,
Alister

Hi Everyone,

For some strange reason, I cant seem to get an answer on a topic that pales in comparison with some of the tough questions that have been posed here.

At any rate, after mucking with what was posted here by elixir_sinari here is what I got.

/Users/ManoharChandran 21:36:21 $cat ANOTHER | awk -F\( 'BEGIN{OFS=FS} {for(i=2;i<=NF;i++) sub(/\)/,", NEWCOLUMN)",$i);print}'
1389675 Opera_ShirtCatalog INSERT INTO Opera_ShirtCatalog(COL1, COL2, NEWCOLUMN) VALUES (1, 'TEST1', NEWCOLUMN), (2,'TEST2', NEWCOLUMN);
1389685 Opera_ShirtCatlog_Wom INSERT INTO Opera_ShirtCatlog_Wom(col1, col2, col3, NEWCOLUMN) VALUES (9,'Siz12, FormFit', 'Test', NEWCOLUMN);

Now, the LAST REMAINING ISSUE is I dont want the NEWCOLUMN in the values, I want the first value to be moved as "VALUE" to be inserted ... essentially what I need is below -

/Users/ManoharChandran 21:36:31 $cat ANOTHER | awk -F\( 'BEGIN{OFS=FS} {for(i=1;i<=NF;i++) sub(/\)/,", NEWCOLUMN)",$i);print}' 	
Opera_ShirtCatalog INSERT INTO Opera_ShirtCatalog(COL1, COL2, NEWCOLUMN) VALUES (1, 'TEST1', 1389675), (2,'TEST2', 1389675);	
Opera_ShirtCatlog_Wom INSERT INTO Opera_ShirtCatlog_Wom(col1, col2, col3, NEWCOLUMN) VALUES (9,'Siz12, FormFit', 'Test', 1389685);

NOT

/Users/ManoharChandran 21:36:31 $cat ANOTHER | awk -F\( 'BEGIN{OFS=FS} {for(i=1;i<=NF;i++) sub(/\)/,", NEWCOLUMN)",$i);print}'
1389675 Opera_ShirtCatalog INSERT INTO Opera_ShirtCatalog(COL1, COL2, NEWCOLUMN) VALUES (1, 'TEST1', NEWCOLUMN), (2,'TEST2', NEWCOLUMN);
1389685 Opera_ShirtCatlog_Wom INSERT INTO Opera_ShirtCatlog_Wom(col1, col2, col3, NEWCOLUMN) VALUES (9,'Siz12, FormFit', 'Test', NEWCOLUMN);

If someone can push this over this last bump, I would be truly grateful.

Please advise !

regards,
Manohar.

I had deliberately used "NEWCOLUMN" instead of any value as I didn't know the logic behind inserting those particular values (and still don't know it :))...

I presumed that you understood what needs to be done in here ... Anyway, let me rehash as to what is needed .... sorry if this wasnt clear to begin with...

Lets look at first ROW, the first column, 1389675 is the value of the NEWCOLUMN and as such, it has to be inserted in the table ... well let me show you clearly as to what is the input and what is the expected output....

INPUT-

1389675 Opera_ShirtCatalog INSERT INTO Opera_ShirtCatalog(COL1, COL2) VALUES (1, 'TEST1'), (2,'TEST2');
1389685 Opera_ShirtCatlog_Wom INSERT INTO Opera_ShirtCatlog_Wom(col1, col2, col3) VALUES (9,'Siz12, FormFit', 'Test');

OUTPUT -

INSERT INTO Opera_ShirtCatalog(COL1, COL2, NEWCOLUMN) VALUES (1, 'TEST1', 1389675), (2,'TEST2', 1389675);
INSERT INTO Opera_ShirtCatlog_Wom(col1, col2, col3, NEWCOLUMN) VALUES (9,'Siz12, FormFit', 'Test', 1389685);

I saved and viewed this couple of times to make sure that necessary values are properly highlighted and please bear with me on this as I m new to this edit / highlighting ... If something is not clear, please let me know and I shall respond right away...

Once again thank you Elixir_sinari.

I am sorry for not seeing the value. I thought it was just another part of the line..:o

Anyway, this should work:

awk -F\( 'BEGIN{OFS=FS} {split($1,a," ");sub(/.*/,"INSERT INTO "a[2],$1);for(i=2;i<=NF;i++)i==2?sub(/)/,", NEWCOLUMN)",$i):sub(/)/,", "a[1]")",$i);print}'  inputfile

In any case, Chubler_XL's solution is much simpler and better than this. I'd rather use his solution for your purpose than this one.

1 Like

elixir_sinari,

Can u please explain this to me as to what exactly are you doing here as this bit as this is working finally....

Now, the bigger question...why cant we do this using SED ? I got it working up to the last portion as below -

/Users/ManoharChandran 00:37:37 $cat ANOTHER | grep 'INSER' | sed 's/) VALUES/,NEWCOLUMN) VALUES/g'
1389675	Opera_ShirtCatalog INSERT INTO Opera_ShirtCatalog(COL1, COL2,NEWCOLUMN) VALUES (1, 'TEST1'), (2,'TEST2');
1389685	Opera_ShirtCatlog_Wom INSERT INTO Opera_ShirtCatlog_Wom(col1, col2, col3,NEWCOLUMN) VALUES (9,'Siz12, FormFit', 'Test');

The only pending thing in the above statement is mark up the first set of numbers and move them inside the closing brackett ... can u please come up with something on that line as that is what I need. Also, that is the reason as to why I titled it as REGEX help coz I couldnt come up with the mover for the closing bracket when there are more than one closing brackets

I understand as I did not get what you are doing inside the awk statement in here ...

Btw, thanks a lot for your help !

regards,
Manohar.