I'm writing a bash shell script. I have two large files (around 1.2 GB of data) containing keys and values. I need to compare the two files by key and store the difference in the values in a third file.
File 2 will always be a subset of File 1. I just need to find the values (per key) that are in File 1 but not in File 2, along with the keys that appear only in File 1.
File 1:
test1 marco;polo;angus
test2 mike;zen;liza
test3 tom;harry;alan
test4 bob;june;janet
1332239_44557576_CONTI Lased & Micro kjd $353.50_30062020_lsdf3_no-rule 343323H;343434311H;454656556H;343343432H
1332240_44557576_CONTI Mazed & Micro kjd $353.50_30062020_lsdf3_some-rule 232324L;2226556H;343223432H
File 2:
test1 polo;angus
test2 mike
test4 bob;janet
1332240_44557576_CONTI Mazed & Micro kjd $353.50_30062020_lsdf3_some-rule 232324L;343223432H
For each line of file1, I would like to look up its key in file2 (searching through the entire contents of file2). If the key matches, print the values that are in file1 but not in file2, then move on to the next line of file1, and so on. Keys that appear only in file1 should also be printed, with all their values.
Expected Output:
test1 marco
test2 zen;liza
test3 tom;harry;alan
test4 june
1332239_44557576_CONTI Lased & Micro kjd $353.50_30062020_lsdf3_no-rule 343323H;343434311H;454656556H;343343432H
1332240_44557576_CONTI Mazed & Micro kjd $353.50_30062020_lsdf3_some-rule 2226556H
The files are huge, containing about 100,000 lines, so I would like the execution to be fast. This runs in a bash shell script. file1 and file2 are text files, with this as the key (1332239_44557576_CONTI Lased & Micro kjd $353.50_30062020_lsdf3_no-rule) and these as the values (343323H;343434311H;454656556H;343343432H).
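
For reference, here is a minimal sketch of the comparison I am describing, done in a single awk pass over each file. It rests on assumptions that hold for the samples above but may not hold for every line of the real data: the value list is the last whitespace-separated field and contains no spaces, the key is everything before it, and values are separated by ";". The function name diff_values and the array names are hypothetical. It keeps one entry per file2 key in memory, and LC_ALL=C is there only in the hope of faster byte-wise string handling.

```shell
#!/usr/bin/env bash
# diff_values FILE1 FILE2
# For every key in FILE1, print the values that FILE2 does not list for
# that key. Keys missing from FILE2 are printed with all their values.
diff_values() {
    LC_ALL=C awk '
    NF < 2 { next }                        # skip blank or key-only lines

    {
        vals = $NF                         # assumed: values are the last field
        key = $0
        sub(/[ \t]+[^ \t]+[ \t]*$/, "", key)   # key = line minus last field
    }

    FILENAME == ARGV[1] {                  # first argument to awk: FILE2
        # Remember the value list per key (append if a key repeats).
        subset[key] = (key in subset ? subset[key] ";" : "") vals
        next
    }

    {                                      # second argument to awk: FILE1
        split("", have)                    # portable way to empty an array
        if (key in subset) {
            m = split(subset[key], s, ";")
            for (i = 1; i <= m; i++) have[s[i]] = 1
        }
        n = split(vals, v, ";")
        out = ""
        for (i = 1; i <= n; i++)
            if (!(v[i] in have))
                out = (out == "" ? v[i] : out ";" v[i])
        if (out != "") print key, out      # keys with no difference are dropped
    }
    ' "$2" "$1"
}
```

It would be called as: diff_values file1 file2 > file3. File 2 is read first so that only the smaller file is held in memory while File 1 is streamed line by line.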