I have two files, file A and file B. I need every entry of file A to be compared with file B, line by line. If the entry exists in file B, save it to file C; if not, save it to file D.
Note: all the columns of each line of file A need to be compared except the last two (date & time).
---------- Post updated at 07:06 AM ---------- Previous update was at 07:03 AM ----------
I wrote a script which compares the entries and saves them to fileC if they exist in both files. But I am not able to add a condition for the entries which do not exist, so that they are saved to fileD.
cat fileB | while read STATUS CLIENT DB POLICY SCHEDULE DATE TIME
do
grep -w "$DB" fileA | grep -w "$CLIENT" | grep -w "$POLICY" | grep -w "$SCHEDULE" | grep -w "$DATE" >> fileC
done
You could also try this logic, which builds a temp file from fileB without the last two fields and then just uses grep -f. However, if fileB is large, the script may be a little lacking in performance. I've used the shell's built-in parameter expansion rather than some convoluted echo $line through some sort of field counter, subtract two, then echo $line | cut -f -$wanted, which spawns several processes for each record trimmed and is a lot slower. I've seen that quite a lot elsewhere, and probably used it myself too before I found a better way:
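#!/bin/ksh
# Strip the last two space-separated fields (date and time) from each fileB line
while read -r line
do
    printf '%s\n' "${line% * *}"
done < fileB > temp-fileB
# -F treats each temp-fileB line as a fixed string, so dots in names are not wildcards
grep -F -f temp-fileB fileA > fileC      # fileA lines that contain a fileB entry
grep -v -F -f temp-fileB fileA > fileD   # fileA lines that do not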
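If exact field-by-field matching matters more than substring matching, the same split can be sketched as a single awk pass. This is only a sketch under assumptions: the function name split_by_key is made up for illustration, fields are taken to be whitespace separated, and the last two fields are taken to be the date and time.

```shell
#!/bin/sh
# split_by_key FILEA FILEB OUT_MATCH OUT_NOMATCH   (hypothetical helper name)
# Assumes whitespace-separated fields whose last two are date and time.
split_by_key() {
    : > "$3"; : > "$4"                  # start with empty output files
    awk -v hit="$3" -v miss="$4" '
        # Join every field except the last two into one lookup key
        function key(    i, k) {
            k = ""
            for (i = 1; i <= NF - 2; i++) k = k $i SUBSEP
            return k
        }
        # First file on the command line (fileB): remember its keys
        NR == FNR { seen[key()] = 1; next }
        # Second file (fileA): route each line by whether its key was seen
        {
            if (key() in seen) print > hit
            else               print > miss
        }
    ' "$2" "$1"
}
```

It would be called as split_by_key fileA fileB fileC fileD. Both files are read once and no process is spawned per record. Note that the NR == FNR idiom misbehaves if fileB is empty, so check for that case first.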
cat fileA | while read CLIENT DB POLICY SCHEDULE DATE TIME
do
if ( grep -w "$DB" fileB | grep -w "$CLIENT" | grep -w "$POLICY" | grep -w "$SCHEDULE" )
then echo $DB $CLIENT $POLICY $SCHEDULE $DATE $TIME >> fileC
else echo $DB $CLIENT $POLICY $SCHEDULE $DATE $TIME >> fileD
fi
done
---------- Post updated at 06:37 AM ---------- Previous update was at 06:30 AM ----------
Now the situation has become more complicated; we need to apply two more conditions.
INITIAL SETUP (base condition)
Values from columns 1 to 5 of fileA should match fileB.
If they match, put the line in fileC; if not, in fileD.
NOW
Values from columns 1 to 5 of fileA should match fileB
and
the values of columns 6 & 7 of fileA should be greater than those in fileB.
If both hold, put the line in fileC; if not, in fileD.
I wrote this script:
cat fileA | while read STATUS CLIENT DB POLICY SCHEDULE DATE TIME ; do
if ( grep -w "$DB" fileB | grep -w "$CLIENT" | grep -w "$POLICY" | grep -w "$SCHEDULE" ) ; if ( $6 < "'$DATE'" ) ; if ( $7 < "'$TIME'" )
then echo $STATUS $DB $CLIENT $POLICY $SCHEDULE $DATE $TIME >> fileC
else echo $STATUS $DB $CLIENT $POLICY $SCHEDULE $DATE $TIME >> fileD
fi
done
But it's erroring out as below:
line 14: syntax error near unexpected token `done'
For a large fileB, you will be spawning lots of grep processes, 4 for each record, and that will take time.
You are also assuming that the dates can be compared that easily. You will need to reformat them so they come out as yyyy/mm/dd, otherwise your comparison would treat a date of 15/01/2011 as "newer" than 10/02/2011.
You could call a conversion for each record, but that could get rather complex. I will have a think. I would still recommend against the grep | grep | grep approach though. It could cripple your system for seriously large files.
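For what it's worth, the reordering itself can be done with parameter expansion alone, without calling cut or sed per record. A minimal sketch, assuming the dates are always zero-padded dd/mm/yyyy as in the examples above; the function name to_sortable is made up for illustration:

```shell
#!/bin/sh
# to_sortable DD/MM/YYYY -> prints YYYYMMDD   (hypothetical helper name)
# Assumes day and month are zero-padded and separated by slashes.
to_sortable() {
    d=${1%%/*}      # text before the first slash: day
    y=${1##*/}      # text after the last slash: year
    m=${1#*/}       # drop the leading "DD/" ...
    m=${m%/*}       # ... then the trailing "/YYYY", leaving the month
    echo "$y$m$d"
}
```

With that, [ "$(to_sortable 15/01/2011)" -lt "$(to_sortable 10/02/2011)" ] is true, because 20110115 is numerically smaller than 20110210.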
The files are not so large. As for the dates, yes, you are correct: I need to split them and then compare. But once I have a base script, I can modify the date part later. Any idea why the syntax error is occurring?
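On the syntax error: every if opens a new statement that needs its own then and fi. With three ifs on one line and a single then/else/fi, only the innermost if gets closed, so the two outer ones are still waiting for their then when the shell reaches done. Also, ( $6 < "'$DATE'" ) is not a comparison: inside the parentheses, < is an input redirection. One way to write it is a single if with the tests chained by &&. The sketch below is hedged: the name classify_lines is made up, and it assumes column 6 is already yyyymmdd and column 7 is HHMMSS so that date and time can be compared as one number.

```shell
#!/bin/sh
# classify_lines FILEA FILEB OUT_MATCH OUT_NOMATCH   (hypothetical helper name)
# Assumes 7 space-separated columns, with column 6 as yyyymmdd and
# column 7 as HHMMSS, so date+time compare as one plain number.
classify_lines() {
    : > "$3"; : > "$4"                      # start with empty output files
    while read -r STATUS CLIENT DB POLICY SCHEDULE DATE TIME
    do
        # First fileB record that contains this line's first five columns
        hit=$(grep -F -- "$STATUS $CLIENT $DB $POLICY $SCHEDULE " "$2" | head -n 1)
        bstamp=${hit% *}                    # drop the time column ...
        bstamp=${bstamp##* }${hit##* }      # ... keep the date, then append the time
        # One if, tests chained with &&, one then/else, one fi
        if [ -n "$hit" ] && [ "$DATE$TIME" -gt "$bstamp" ]
        then echo "$STATUS $CLIENT $DB $POLICY $SCHEDULE $DATE $TIME" >> "$3"
        else echo "$STATUS $CLIENT $DB $POLICY $SCHEDULE $DATE $TIME" >> "$4"
        fi
    done < "$1"
}
```

Called as classify_lines fileA fileB fileC fileD. It still runs one grep per record, so it is only reasonable for small files.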
I tried test1.sh and test3.sh and neither works: they keep looping, never create fileC, and dump lots of data into fileD.
Please try them on the files below.
CONDITIONS
Values from columns 1 to 5 of fileA should match fileB
and
the values of columns 6 & 7 of fileA should be greater than those in fileB.
If both hold, put the matching fileA entries in fileC; if not, put the non-matching fileA entries in fileD.
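These conditions could be sketched in one awk pass, which avoids both the per-record greps and the date-order problem. Treat it as a sketch under assumptions rather than a tested answer for the real data: the name split_newer is made up, column 6 is assumed to be dd/mm/yyyy and column 7 HH:MM:SS (both zero-padded), and if fileB has several records with the same first five columns, only the last one is used.

```shell
#!/bin/sh
# split_newer FILEA FILEB OUT_MATCH OUT_NOMATCH   (hypothetical helper name)
# Assumes 7 whitespace-separated columns; column 6 is DD/MM/YYYY and
# column 7 is HH:MM:SS, both zero-padded.
split_newer() {
    : > "$3"; : > "$4"                  # start with empty output files
    awk -v hit="$3" -v miss="$4" '
        # Reorder DD/MM/YYYY and append the time: a string that sorts by age
        function stamp(d, t,    p) { split(d, p, "/"); return p[3] p[2] p[1] t }
        # Lookup key built from the first five columns
        function key() { return $1 SUBSEP $2 SUBSEP $3 SUBSEP $4 SUBSEP $5 }
        # First file on the command line (fileB): remember each key and its stamp
        NR == FNR { last[key()] = stamp($6, $7); next }
        # Second file (fileA): matched and strictly newer goes to hit, rest to miss
        {
            k = key(); out = miss
            if ((k in last) && stamp($6, $7) > last[k]) out = hit
            print > out
        }
    ' "$2" "$1"
}
```

Called as split_newer fileA fileB fileC fileD. As with any NR == FNR script, an empty fileB needs to be handled separately.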