In the attached files, I am trying to use import.txt to find what is missing from all.txt and write the missing lines to missing.txt. I used SQL to import a list into a database, got errors, and need to figure out what didn't import correctly. The script below is close, I think, but doesn't produce the desired output (all the lines that do not have a match in import.txt).
So if the text in import.txt is in all.txt, that line is not printed; however, if the text in import.txt is not in all.txt, then the entire line is written to missing.txt. Thank you :).
awk '{ h[$0] = ! h[$0] } END { for (k in h) if (h[k]) print k }' import.txt all.txt > missing.txt
cmccabe,
I'm having trouble making out the lines in the smaller file you attached (the other one is 87.7 MB). Can you attach smaller example files? Also, can you show your expected results and the undesired results you are getting?
If you are comparing entire lines (i.e. not matching on specific keys) and are only looking for lines in import.txt that are not in all.txt, another option is the command below, provided both files are sorted (or you pre-sort them):
comm -23 import.txt all.txt > missing.txt
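If the files are not already sorted, a pre-sort step along these lines might do. This is only a sketch on made-up sample data: the contents of the two files and the .sorted file names are assumptions for illustration, and it runs in a scratch directory so the real files are not overwritten. LC_ALL=C is set so that sort and comm agree on the ordering.

```shell
# Work in a scratch directory so real files are not clobbered.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data standing in for the real files.
printf '%s\n' 'PXL-3' 'ABC-1' 'PXL-1' > import.txt
printf '%s\n' 'ABC-1' 'XYZ-9' > all.txt

# comm expects both inputs sorted in the same collation order.
LC_ALL=C sort import.txt > import.sorted
LC_ALL=C sort all.txt > all.sorted

# -2 drops lines found only in all.txt, -3 drops lines common to both,
# leaving only the lines unique to import.txt.
LC_ALL=C comm -23 import.sorted all.sorted > missing.txt
cat missing.txt
```

With this sample data, missing.txt should hold the two PXL- lines, since ABC-1 appears in both files.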
Re: using SQL to import into a database and needing to figure out what didn't import correctly: are you not capturing the records that failed to load at that point? For example, with Oracle SQL*Loader and a .bad file, you can store the records that failed to load during the insert.
I have attached smaller files of each. Basically, the desired output.txt would be all the lines that do not match import.txt (should be 6 out of the 10) - All the PXL- do not match so they are written to output.txt. Thank you :).
Good, you're welcome. But have you looked at identifying these records (or the next ones that you might load) that failed at insert time? Or perhaps this was a one-time load and not something you will repeat?
This was a one-time load that, hopefully, will not be repeated. I think the import timed out and only completed half of the files, but I had no idea which ones until now... thanks again :).
Actually, you don't need to store the long $0 in memory; map[$1]=1 is enough.
Most awks do not even need a value: a bare reference such as map[$1] creates the key, and the "in" operator looks up a key without creating it when it is not found.
awk 'FNR==NR{map[$1];next} ($18 in map)' import.txt all.txt
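To illustrate the idiom on a small scale, here is a sketch with made-up data. The file contents and the use of $1 as the key in both files are assumptions for the example (the real all.txt evidently keeps its key in $18). The second command negates the condition, which prints the lines of all.txt whose key is absent from import.txt instead of the matching ones.

```shell
# Work in a scratch directory so real files are not clobbered.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data; the key is assumed to be field 1 in both files.
printf '%s\n' 'ABC-1' 'ABC-2' > import.txt
printf '%s\n' 'ABC-1 ok' 'PXL-1 x' 'ABC-2 ok' 'PXL-2 y' > all.txt

# While reading the first file (FNR==NR), referencing map[$1] creates the
# key without storing a value. For the second file, "in" tests membership
# without creating new array elements.
awk 'FNR==NR { map[$1]; next } ($1 in map)' import.txt all.txt > matched.txt

# Same lookup, negated: keep lines whose key was never seen in import.txt.
awk 'FNR==NR { map[$1]; next } !($1 in map)' import.txt all.txt > unmatched.txt

cat unmatched.txt
```

With this sample data, matched.txt gets the two ABC- lines and unmatched.txt gets the two PXL- lines.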