Compare values in two files. For matching rows print corresponding values from File 1 in File2.

  • I have two files (File 1 and File 2) and the contents of the files are mentioned below.

  • I am trying to compare the values of Column1 of File1 with Column1 of File2. If a match is found, print the corresponding value from Column2 of File1 in Column5 of File2.

  • I tried to modify and use the solution provided in the forum for a similar problem, but did not get the required result.

awk 'NR==FNR{A[$1]=$2}A[$3]{sub($3,A[$3]);print}' file2 file1 
=>cat File1.out
19296   RPRRYM
19223   BRIJAA3
19222   ARAFAY
19439   LRATAG
19227   HRAEAV
19154   ARADA1
19226   FRAJAX
19225   SRAJAS
19369   OCRILL
19224   ARPJAD
82395   IRKJAY
 
 
=>cat File2.out
19296       WAITING              04/27/2012-10:05:15   DWYGSTC         -              OPEN                  SELECT EMPID,EMPNAME,EMPADD
                                                                                                                                 FROM EMPLOYEE
                                                                                                                               WHERE EMPID=101
19223       WAITING              04/27/2012-13:00:00   SARMLTE         -              NONE                  -
19222       WAITING              04/27/2012-13:20:42   SMBRMTSE        -            NONE                  -
19439       WAITING              04/27/2012-13:13:51   ZMDRTVEL        -              STATIC_ROLLBAK        -
19445       EXECUTING           04/27/2012-13:23:06   SARMLTE         -              NONE                  -
19227       WAITING              04/27/2012-11:47:17   KKAYAL          -                EXECUTE               INSERT INTO ADDRESS (SELECT CITY, STATE, COUNTRY
                                                                                                            FROM EMP_ADDRESS
                                                                                                            WHERE BIRTHCITY='XYZ'
19154       WAITING              04/27/2012-13:22:31   SARMLTET        -              NONE                  -
19226       WAITING              04/27/2012-13:00:00   VKAZA1          -              NONE                  -
19225       CONNECTED          04/27/2012-08:20:35   BMCRTSEL        -                FETCH                 SELECT COUNT(1) FROM COUNTRY WITH UR
19369       WAITING              04/27/2012-11:35:40   GCZILLG         -                STATIC_COMMIT         -
19224       WAITING              04/27/2012-08:20:35   SMQRTQEL        -              NONE                  -
82395       WAITING              04/27/2012-13:21:14   JSARTAEL        -              PREP_COMMIT           -
 
 
 
Required Result:
 
19296       WAITING              04/27/2012-10:05:15   DWYGSTC         RPRRYM     OPEN                  SELECT EMPID,EMPNAME,EMPADD
                                                                                                                                   FROM EMPLOYEE
                                                                                                                           WHERE EMPID=101
19223       WAITING              04/27/2012-13:00:00   SARMLTE         BRIJAA3     NONE                   -
19222       WAITING              04/27/2012-13:20:42   SMBRMTSE       ARAFAY     NONE                    -
19439       WAITING              04/27/2012-13:13:51   ZMDRTVEL       LRATAG     STATIC_ROLLBAK       -
19445       EXECUTING            04/27/2012-13:23:06   SARMLTE         -       NONE                  -
19227       WAITING              04/27/2012-11:47:17   KKAYAL         HRAEAV    EXECUTE              INSERT INTO ADDRESS (SELECT CITY, STATE, COUNTRY
                                                                                                            FROM EMP_ADDRESS
                                                                                                            WHERE BIRTHCITY='XYZ'
19154       WAITING              04/27/2012-13:22:31   SARMLTET       ARADA1     NONE                  -
19226       WAITING              04/27/2012-13:00:00   VKAZA1         FRAJAX     NONE                  -
19225       CONNECTED           04/27/2012-08:20:35   BMCRTSEL       SRAJAS     FETCH                 SELECT COUNT(1) FROM COUNTRY WITH UR
19369       WAITING              04/27/2012-11:35:40   GCZILLG         OCRILL      STATIC_COMMIT     -
19224       WAITING              04/27/2012-08:20:35   SMQRTQEL       ARPJAD     NONE                  -
82395       WAITING              04/27/2012-13:21:14   JSARTAEL       IRKJAY      PREP_COMMIT         -
 
 
awk 'NR==FNR{A[$1]=$2;next} A[$1]{$5=A[$1];print}' file2 file1 

You may lose the tabs which separate the fields, however.

Thank you very much for the response Otheus. I tried using the solution, column2 in the output printed below seems fine against the two columns 1 & 3. But the output is missing the rest of the columns from file2. The last column has SQL Statment which runs into multiple lines, so I am mainly concerned as to how it will show up once that column is included in the output.

=> awk 'NR==FNR{A[$1]=$2;next} A[$1]{$5=A[$1];print}' file2 file1
19296 RPRRYM   WAITING
19223 BRIJAA3  WAITING
19222 ARAFAY   WAITING
19439 LRATAG   WAITING
19227 HRAEAV   WAITING
19154 ARADA1   WAITING
19226 FRAJAX   WAITING
19225 SRAJAS   CONNECTED
19369 OCRILL   WAITING
19224 ARPJAD   WAITING
82395 IRKJAY   WAITING

I think my answer is correct but you have the files on the command line in the wrong order.

awk 'NR==FNR{A[$1]=$2;next} A[$1]{$5=A[$1];print}' file1 file2
1 Like

Thank You so much. It worked, as you said the spaces between the fields are gone.

---------- Post updated 05-17-12 at 12:25 AM ---------- Previous update was 05-16-12 at 11:36 AM ----------

In some environments, the file has around 800 lines hence it is difficult to read from the file. This output will be generated 30 mins once and 24/7. I am trying to add spaces upon identification of a space, but this is failing to give the output in a formatted manner. Can you please help. A portion of output is printed below.

19296 WAITING 04/27/2012-10:05:15 DWYGSTC RPRRYM OPEN SELECT EMPID,EMPNAME,EMPADD
FROM EMPLOYEE
WHERE EMPID=101
19223 WAITING 04/27/2012-13:00:00 SARMLTE BRIJAA3 NONE -
19222 WAITING 04/27/2012-13:20:42 SMBRMTSE ARAFAY NONE -
19439 WAITING 04/27/2012-13:13:51 ZMDRTVEL LRATAG STATIC_ROLLBAK -
19445 EXECUTING 04/27/2012-13:23:06 SARMLTE - NONE -
19227 WAITING 04/27/2012-11:47:17 KKAYAL HRAEAV EXECUTE INSERT INTO ADDRESS (SELECT CITY, STATE, COUNTRY
FROM EMP_ADDRESS
WHERE BIRTHCITY='XYZ'
19154 WAITING 04/27/2012-13:22:31 SARMLTET ARADA1 NONE -
19226 WAITING 04/27/2012-13:00:00 VKAZA1 FRAJAX NONE  -
19225 CONNECTED 04/27/2012-08:20:35 BMCRTSEL SRAJAS FETCH SELECT COUNT(1) FROM COUNTRY WITH UR
19369 WAITING 04/27/2012-11:35:40 GCZILLG OCRILL STATIC_COMMIT -
19224 WAITING 04/27/2012-08:20:35 SMQRTQEL ARPJAD NONE -
82395 WAITING 04/27/2012-13:21:14 JSARTAEL IRKJAY PREP_COMMIT -

would you be able to break this down pls so i can understand how it's achieving the result? I have a similar problem and i think i should be able to tweak this if i can understand it. thanks

Hello Jack,

This is what I am trying to do :

1) Running the below query against the DB2 database.

db2 "SELECT SUBSTR(T1.AGENT_ID,1,8) AS APPL_HANDLE,
        CASE APPL_STATUS
        WHEN 'AUTONOMOUS_WAIT' THEN 'AUTONOMOUS_WAIT'
        WHEN 'BACKUP' THEN 'DB_BACKUP'
        WHEN 'COMMIT_ACT' THEN 'COMMIT'
        WHEN 'COMP' THEN 'COMPILING'
        WHEN 'CONNECTED' THEN 'CONNECTED'
        WHEN 'CONNECTPEND' THEN 'CONNECT_PENDING'
        WHEN 'CREATE_DB' THEN 'CREATE_DB'
        WHEN 'DECOUPLED' THEN 'DECOUPLED'
        WHEN 'DISCONNECTPEND' THEN 'DISCONNECT_PENDING'
        WHEN 'INTR' THEN 'REQUEST_INTERRUPTED'
        WHEN 'IOERROR_WAIT' THEN 'TBLSPCE_DISABLE_WAIT'
        WHEN 'LOAD' THEN 'LOAD'
        WHEN 'LOCKWAIT' THEN 'LOCKWAIT'
        WHEN 'QUIESCE_TABLESPACE' THEN 'QUIESCE_TBLSPCE'
        WHEN 'RECOMP' THEN 'RECOMPILING'
        WHEN 'REMOTE_RQST' THEN 'FEDRATD_RQST_PENDING'
        WHEN 'RESTART' THEN 'DB_RESTART'
        WHEN 'RESTORE' THEN 'DB_RESTORE'
        WHEN 'ROLLBACK_ACT' THEN 'ROLLBACK'
        WHEN 'ROLLBACK_TO_SAVEPOINT' THEN 'ROLLBACK_SAVEPOINT'
        WHEN 'TEND' THEN 'TRANSACTION_ENDED'
        WHEN 'THABRT' THEN 'GLOBAL_TRANSN_ROLLBK'
        WHEN 'THCOMT' THEN 'GLOBAL_TRANSN_COMMIT'
        WHEN 'TPREP' THEN 'TRANSACTION_PREPARED'
        WHEN 'UNLOAD' THEN 'DATA_FAST_UNLOAD'
        WHEN 'UOWEXEC' THEN 'EXECUTING'
        WHEN 'UOWQUEUED' THEN 'QUEUED'
        WHEN 'UOWWAIT' THEN 'WAITING'
        WHEN 'WAITFOR_REMOTE' THEN 'PENDING_REMOTE_RQST'
        ELSE ''
        END AS APPL_STATUS,
        CAST(T1.STATUS_CHANGE_TIME AS DATE)||'-'||CAST(T1.STATUS_CHANGE_TIME AS TIME) as STATUS_CHANGE_TIME,
        SUBSTR(PRIMARY_AUTH_ID,1,15) AS AUTH_ID,
        SUBSTR(T1.TPMON_CLIENT_USERID,1,15) as CLIENT_USERID,
        T2.STMT_OPERATION,
        SUBSTR(T2.STMT_TEXT,1,2000) as STMT_TEXT
FROM TABLE(SNAP_GET_APPL_INFO(CAST(NULL AS VARCHAR(128)),-1)) AS T1,
     TABLE(SNAP_GET_STMT('',-1)) AS T2
WHERE T1.AGENT_ID=T2.AGENT_ID"|sed -e 's/[ \t]*$//' -e '5d' -e 'N;$!P;$!D;$d' > File1.out

2) The field "SUBSTR(T1.TPMON_CLIENT_USERID,1,15) as CLIENT_USERID" in the query is not returning any value but the rest of fields in the SELECT are.

3) Since I am not getting the value for the field - "SUBSTR(T1.TPMON_CLIENT_USERID,1,15) as CLIENT_USERID", I am running the below query to get the output. - (Checked in the DB2 Forums about it. Yet to get a response on this)

db2 "get snapshot for applications on $2 AT DBPARTITIONNUM $CORD_PART "| grep -e "Application handle                         =
Client login ID                            = "|awk -F "=" '{print $2}' > Intermediate.out

4) I am then doing a row to column transpose of the output present in the file - Intermediate.out

paste -d' ' - - < CON_AUTH_CLNT.out > File2.out

5) Comparing the value "SUBSTR(T1.AGENT_ID,1,8) AS APPL_HANDLE" in File1.out with "Application handle" value in File2.out, if there is a match printing the value "Client login ID" from File2.out in the 5th column of File1.out. (Suggested by Otheus)

awk 'NR==FNR{A[$1]=$2;next} A[$1]{$5=A[$1];print}' File2.out File1.out > Final.out

6) The script will be run 30 mins once and 24/7 in multiple database environments. A team will review the results periodically. The output can sometimes be a 1000 lines which makes it hard to read (SQL Statements can themselves run into 100's of lines).

7) Below is a portion of the output.

19296 WAITING 04/27/2012-10:05:15 DWYGSTC RPRRYM OPEN SELECT EMPID,EMPNAME,EMPADD
FROM EMPLOYEE
WHERE EMPID=101
19223 WAITING 04/27/2012-13:00:00 SARMLTE BRIJAA3 NONE -
19222 WAITING 04/27/2012-13:20:42 SMBRMTSE ARAFAY NONE -
19439 WAITING 04/27/2012-13:13:51 ZMDRTVEL LRATAG STATIC_ROLLBAK -
19445 EXECUTING 04/27/2012-13:23:06 SARMLTE - NONE -
19227 WAITING 04/27/2012-11:47:17 KKAYAL HRAEAV EXECUTE INSERT INTO ADDRESS (SELECT CITY, STATE, COUNTRY
FROM EMP_ADDRESS
WHERE BIRTHCITY='XYZ'
19154 WAITING 04/27/2012-13:22:31 SARMLTET ARADA1 NONE -
19226 WAITING 04/27/2012-13:00:00 VKAZA1 FRAJAX NONE  -
19225 CONNECTED 04/27/2012-08:20:35 BMCRTSEL SRAJAS FETCH SELECT COUNT(1) FROM COUNTRY WITH UR
19369 WAITING 04/27/2012-11:35:40 GCZILLG OCRILL STATIC_COMMIT -
19224 WAITING 04/27/2012-08:20:35 SMQRTQEL ARPJAD NONE -
82395 WAITING 04/27/2012-13:21:14 JSARTAEL IRKJAY PREP_COMMIT -

Thanks

What do you want to do with the very long SQL statements? Do you want to indent them on second/third lines?

BTW: This is what perl was designed for:

#!/usr/bin/perl
format =
@######  @<<<<<<<<  @<<<<<<<<<<<<<<<  ^<<<<<<<<<<<<<<<<<<<<<<<<<<<
$pid, $status, $datetime, $sql
~~       ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$sql                                                                                           .
while (<>) {
  ($pid,$status,$datetime,$sql) = split(" ",$_,4);
  write;
}

pipe the output of awk into this program and save as your report. It will look real pretty :slight_smile:

If you need more help with formatting, do man perlform

---------- Post updated at 06:45 PM ---------- Previous update was at 06:38 PM ----------

Jack, here's a break-down of the awk command:

awk 'NR==FNR{A[$1]=$2;next} A[$1]{$5=A[$1];print}'

In awk, NR is the total input lines seen, while FNR is the number of input lines seen in the current file. So this essentially means: if we're processing the first file, "do this" (the code in the first set of braces {...}).

That code sets an associate array to the value of the 2nd column (in the first file, remember), where the index is the 1st column -- which is common in both files.

Now do a "next" which means do not process any more code for the current line. This ensures the rest of the awk script is not executed for the first file.

So the first "pattern/program" applies to the first file -- and only the first -- while the second "pattern/program" applies to the second (and subsequent) file(s).

The second pattern/program looks at each line (in the second file) and if the first column is found in the array A, and if the value is not null or not blank, it runs the portion between the braces.

The code in the braces simply replaces the 5th field of that line with the contents of what was seen in line indexed by the first column in the first file.

1 Like

thanks otheus, very well explained. one question though. "and if the value is not null or not blank"

which bit is doing the above validation?

The A[$1] before the code block: A[$1]{$5=A[$1];print}

nulls and blanks are considered false and won't run the code, anything else is true and will run the code block.

3 Likes

Thank You guys for the clear and detailed explanation.

man perlform has great options for formatting. I will try to use those options.