Dynamic Array Issue

ddedic · March 11, 2007, 1:48pm

Could one of you please provide some input on the problem below?

I have 2 files that I need to make sure are identical before processing:

First, I sort both files.

Second, I run diff file1 file2 > file3, which gives me the differences.

Now I need to read file3 into an array and remove the differing records, if any, from both files.

filename=/.../.../.../...   # input file
set -A array_name $(< "$filename")   # ksh: load the file's words into an array

Could I use sed '/expression/d' to remove the line?

Please advise.

Don

Perderabo · March 11, 2007, 2:39pm

Just:
cp file1 file2
or copy the larger one to the smaller one (or vice versa).

ddedic · March 11, 2007, 4:44pm

Sorry for not being more specific.

I am processing 2 large files (10 million records).

One is master and the other is detail. Because there is no uniqueness between them I have to rely on the row count of each file before processing.

With that said, I run cut -d "|" -f2 on both files and write the results to aa.out and ab.out.

Once this is complete, I sort both files and run diff to find the records that are in one file but not the other.

Now that I have diff.out (most of the time it will contain only a record or two), I need to remove those records from the original files before application processing.

I will attempt to illustrate:

File 1
111111
222222
333333
555555

File 2
111111
222222
333333
444444

cat diff.out
03c4400
444444
03c5556
555555

I now need to remove those two records from both files. Please let me know if you have any ideas.
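
For reference, the extract, sort, and compare steps described above can be sketched as follows. This is only an illustration under assumptions: master.orig and detail.orig are hypothetical file names with made-up rows, the key is taken from field 2 as in the cut command above, and comm is used in place of diff because it prints bare keys with no change markers. Run it in an empty scratch directory.

```shell
# Hypothetical sample data; the key is field 2, fields are separated by "|".
printf '%s\n' 'a|111111|x' 'a|222222|x' 'a|333333|x' 'a|555555|x' > master.orig
printf '%s\n' 'b|111111|y' 'b|222222|y' 'b|333333|y' 'b|444444|y' > detail.orig

# Extract the key column and sort it; comm requires sorted input.
cut -d '|' -f2 master.orig | sort > aa.out
cut -d '|' -f2 detail.orig | sort > ab.out

# comm -23 keeps lines found only in the first file,
# comm -13 keeps lines found only in the second file.
comm -23 aa.out ab.out > only_in_master.out
comm -13 aa.out ab.out > only_in_detail.out

cat only_in_master.out   # 555555
cat only_in_detail.out   # 444444
```

Keeping the two "only in" lists separate makes it clear which keys have to be removed from which original file.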

Perderabo · March 11, 2007, 6:29pm

I assume you have something like file1.orig and file2.orig with more fields but using | as a field separator. Here is a way to do this with comm and sed:

$
$ cat file2.orig
111111|sdjksd
222222|sdjksd
333333|sdjksd
444444|sdjksd
$ comm -13 file1 file2 | sed 's=^=/^=;s=$=\|/d=' > file2.sed
$ sed -f file2.sed < file2.orig
111111|sdjksd
222222|sdjksd
333333|sdjksd
$
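
To make the trick above easier to follow, here is a small sketch that prints the generated sed script before applying it. It reuses the sample data from this thread and, as an assumption on my part, builds each delete command with a single substitution using & (the matched key) rather than the two substitutions shown above; the resulting script should be the same. Run it in an empty scratch directory.

```shell
# Sorted key lists and the original file, as in the example above.
printf '%s\n' 111111 222222 333333 555555 > file1
printf '%s\n' 111111 222222 333333 444444 > file2
printf '%s\n' '111111|sdjksd' '222222|sdjksd' '333333|sdjksd' '444444|sdjksd' > file2.orig

# Turn every key found only in file2 into a sed command of the form /^KEY|/d,
# which deletes lines that start with that key followed by the delimiter.
comm -13 file1 file2 | sed 's=.*=/^&|/d=' > file2.sed

cat file2.sed                    # /^444444|/d
sed -f file2.sed < file2.orig    # prints the 111111, 222222 and 333333 rows
```

Anchoring on ^ and including the trailing | keeps a key such as 44444 from accidentally matching 444444. Keys that contain regular expression metacharacters would need escaping first, which this sketch does not attempt.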

ddedic · March 11, 2007, 10:38pm

Thank you for your prompt response. This is exactly what I was looking for.

ddedic · March 11, 2007, 11:08pm

Correct, I am using "|"-delimited files, and the only common field between the 2 files is the one I cut (-f3) before comparing; the rest of the fields are different.

Specifically, I am struggling with the step that removes those records from the individual files once the difference has been determined.

Please let me know if you can help.

Example.

File 1 - master
ab|dc|11111|2|3|4|5|6|
ab|dc|22222|2|3|4|5|6|
ab|dc|33333|2|3|4|5|6|
ab|dc|55555|2|3|4|5|6|

File 2 - detail
xa|zd|qs|11111|w|3|4|66|b
xa|zd|qs|22222|w|3|4|8|d
xa|zd|qs|33333|w|3|4|3|s
xa|zd|qs|44444|w|3|4|1|f

===============================

After the compare, the result should be:
File 1 - master
ab|dc|11111|2|3|4|5|6|
ab|dc|22222|2|3|4|5|6|
ab|dc|33333|2|3|4|5|6|

File 2 - detail
xa|zd|qs|11111|w|3|4|66|b
xa|zd|qs|22222|w|3|4|8|d
xa|zd|qs|33333|w|3|4|3|s

Perderabo · March 11, 2007, 11:56pm

$ cat file2.orig
aa|bb|111111|sdjksd
aa|bb|222222|sdjksd
aa|bb|333333|sdjksd
aa|bb|444444|sdjksd
$ comm -13 file1 file2 | sed 's=^=/^[^|]*|[^|]*|=;s=$=\|/d='  > file2.sed
$ sed -f file2.sed < file2.orig
aa|bb|111111|sdjksd
aa|bb|222222|sdjksd
aa|bb|333333|sdjksd
$
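
The same idea can be applied to both files in one pass. The sketch below is an illustration, not a tested production script: it uses the sample rows posted earlier, assumes the key is field 3 in the master file and field 4 in the detail file (as those rows suggest), and the names master.orig, detail.orig and the .keys, .sed and .clean files are all hypothetical. Run it in an empty scratch directory.

```shell
# Sample rows from the earlier post.
printf '%s\n' 'ab|dc|11111|2|3|4|5|6|' 'ab|dc|22222|2|3|4|5|6|' \
              'ab|dc|33333|2|3|4|5|6|' 'ab|dc|55555|2|3|4|5|6|' > master.orig
printf '%s\n' 'xa|zd|qs|11111|w|3|4|66|b' 'xa|zd|qs|22222|w|3|4|8|d' \
              'xa|zd|qs|33333|w|3|4|3|s' 'xa|zd|qs|44444|w|3|4|1|f' > detail.orig

# Extract and sort the key column of each file (field 3 and field 4 here).
cut -d '|' -f3 master.orig | sort > master.keys
cut -d '|' -f4 detail.orig | sort > detail.keys

# Keys only in master: skip two leading fields, then match the key.
comm -23 master.keys detail.keys |
  sed 's=.*=/^[^|]*|[^|]*|&|/d=' > master.sed

# Keys only in detail: skip three leading fields, then match the key.
comm -13 master.keys detail.keys |
  sed 's=.*=/^[^|]*|[^|]*|[^|]*|&|/d=' > detail.sed

# Apply each generated script to its own original file.
sed -f master.sed < master.orig > master.clean
sed -f detail.sed < detail.orig > detail.clean

cat master.clean   # rows for 11111, 22222, 33333
cat detail.clean   # rows for 11111, 22222, 33333
```

Each [^|]*| in the generated pattern skips exactly one field, so the number of repetitions has to match the position of the key column in that file.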