How to search & compare paragraphs between two files

NARESH1302 · August 7, 2009, 3:16am

Hello Guys, Greetings to All.

I am stuck in my work here today while trying to comapre paragraphs between two files, I need your help on urgent basis, without your inputs I can not proceed. Kindly find some time to answer my question, I'll be grateful to you for ever. My detailed issue is as follows-

I have extracted DDLs of some tables from Production server (saved in PROD.log file), and DDLs of same tables from DEV server (saved in DEV.log file).

The snippet of the contents of these files like this-

$ cat PROD.log
CREATE TABLE "HELLO"."TABLE1"  (
          "SALARY" DECIMAL(18,0) NOT NULL ,
          "JOB" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;

CREATE TABLE "HELLO"."TABLE2"  (
          "NAME" VARCHAR(18) NOT NULL ,
          "AGE" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;

$ cat DEV.log
CREATE TABLE "HELLO"."TABLE1"  (
          "SALARY" DECIMAL(18,0) NOT NULL ,
          "JOB" DECIMAL(18,0) NOT NULL )
         DISTRIBUTE BY HASH("SALARY")
           IN "DAT1" INDEX IN "IDX1" ;

CREATE TABLE "HELLO"."TABLE2"  (
          "NAME" VARCHAR(18) NOT NULL ,
          "AGE" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;

You can cleary notice, both the files have DDLs of two tables, TABLE1 & TABLE2. Out of these two, TABLE1 DDL is not same but TABLE2 DDL is same in both the files.
My requirement is- I need to write a shell script which will compare both the files PROD.log and Dev.log and will give me the output of whole DDL paragraph which is not matching/existing in other file, the output should look like this-

CREATE TABLE "HELLO"."TABLE1"  (
          "SALARY" DECIMAL(18,0) NOT NULL ,
          "JOB" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;

Guys please reply soon, I'll be waiting for your replies eagerly.

Thank you very much.
Naresh

Franklin52 · August 7, 2009, 3:57am

Try this:

awk 'NR==FNR{a[$3]=$0;next}
a[$3] && a[$3]!=$0 {print $0 RS}
' RS=";" DEV.log PROD.log

Regards

NARESH1302 · August 7, 2009, 4:23am

Wow !! the script is working fine, Thank you very mcuh Franklin for your timely help. Many many thanks...

Just a small concern, though few paragraphs are same in text content, but in other file DEV.log the same para have few more blankspaces at the start and also in between texts for which I am not getting the desired output. The situation is like this-

$cat PROD.log
CREATE TABLE "HELLO"."TABLE2" (
"NAME" VARCHAR(18) NOT NULL ,
"AGE" DECIMAL(18,0) NOT NULL )
IN "DAT1" INDEX IN "IDX1" ;

$cat DEV.log
CREATE TABLE "HELLO"."TABLE2" (
"NAME" VARCHAR(18) NOT NULL ,
"AGE" DECIMAL(18,0) NOT NULL )
IN "DAT1" INDEX IN "IDX1" ;

Because of these blank spaces, these paragraphs are also getting listed in output which should not be. Any workaround for this? I tried to manipulate the code, but all in vain.

Thanks,
Naresh

thanhdat · August 7, 2009, 5:25am

Maybe u want to remove leading whitespace before front of each line:

sed 's/^[ \t]*//' PROD.log > PROD.tmp
sed 's/^[ \t]*//' DEV.log > DEV.tmp

and compare these new files

awk 'BEGIN{RS=";"} NR==FNR{a[$0]} !($0 in a) {print $0 RS}' DEV.tmp PROD.tmp

ranjithpr · August 7, 2009, 5:35am

Hope this will be helpful for you.

$ cat PROD.log
CREATE TABLE "HELLO"."TABLE1"  (
          "SALARY" DECIMAL(18,0) NOT NULL ,
          "JOB" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;

CREATE TABLE "HELLO"."TABLE2"  (
          "NAME" VARCHAR(18) NOT NULL ,
          "AGE" DECIMAL(18,0) NOT NULL )
               IN "DAT1" INDEX IN "IDX1" ;
$ cat DEV.log
CREATE TABLE "HELLO"."TABLE1"  (
          "SALARY" DECIMAL(18,0) NOT NULL ,
          "JOB" DECIMAL(18,0) NOT NULL )
           DISTRIBUTE BY HASH("SALARY")
           IN "DAT1" INDEX IN "IDX1" ;

CREATE TABLE "HELLO"."TABLE2"  (
          "NAME" VARCHAR(18) NOT NULL ,
          "AGE" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;
$ awk '{$1=$1} NR==FNR {a[$3]=$0;next} a[$3] && a[$3]!=$0 {print $0 RS} ' RS=";" DEV.log PROD.log
CREATE TABLE "HELLO"."TABLE1" ( "SALARY" DECIMAL(18,0) NOT NULL , "JOB" DECIMAL(18,0) NOT NULL ) IN "DAT1" INDEX IN "IDX1";
$

NARESH1302 · August 7, 2009, 5:49am

Thanks All for your replies

Thanhdat, I am not only trying to remove first tab lines, but also want to remove the blank spaces between text. My script should recognize them as one blankspaces instead of many. For ex-

$ cat file
CREATE TABLE "HELLO"."TABLE1" <b><b><b> ("SALARY" DECIMAL(18,0) NOT NULL )

Output should be like this-
CREATE TABLE "HELLO"."TABLE1"<b>("SALARY" DECIMAL(18,0) NOT NULL)

Please do suggest.

thanhdat · August 7, 2009, 7:18am

so, you can change the sed command to:

sed 's/^[ \t]*//;s/[ \t]\{1,\}/ /g' yourfile > newfile

Franklin52 · August 7, 2009, 8:26am

Try this:

awk 'NR==FNR{ 
  gsub(/^[ \t]+/, "")
  gsub(/[ \t]+$/, "")
  gsub(/[ ]+/," ")
  a[$3]=$0
  next
}
{  
  gsub(/^[ \t]+/, "")
  gsub(/[ \t]+$/, "")
  gsub(/[ ]+/," ")
}
a[$3] && a[$3]!=$0 {
  print $0 RS
}
' RS=";" DEV.log PROD.log

Regards

NARESH1302 · August 7, 2009, 9:30am

Hi Franklin,

Many many thanks for finding some time to clear my doubts. Your first command worked very fine, this cmd is giving little bit different than my desired output. Still I will try to work on this, will let you know if I could resove the issue.

Can I have your direct contact please if you won't mind?

Thanks,
Naresh

Franklin52 · August 7, 2009, 9:59am

That's against rule 10 of the forum rules:

(10) Don't post your email address and ask for an email reply. Don't send a private message with a technical question. The forums are for the benefit of all, so all Q&A should take place in the forums.

Regards

NARESH1302 · August 7, 2009, 10:04am

Oh! I am sorry, did not know that. I joined this forum today only, and I am very glad to join this forum. Today itself I could clear most of my doubts. Thanks to all who replied to my queries, also to those who are participating in this forum, though actively or passively.