Two files, remove lines from second based on lines in first

I have two files, a keepout.txt and a database.csv. They're unsorted, but could be sorted.

keepout:

user1
buser3
anuser19
notheruser27

database:

user1,2343,"information about",field,blah,34
user2,4231,"mo info",etc,stuff,43
notheruser27,4344,"hiya",thing,more thing,423
user5,6666,"test text",info,stuff,833

Output would be:

user2,4231,"mo info",etc,stuff,43
user5,6666,"test text",info,stuff,833

Make sense? If a line in keepout matches a field in database (ideally just the first field), then I want to drop that line from the output.

I've tried awky ways and greppy ways but can't get it just right. It's not entire lines that need to match, just the first field. Kind of like a loopy grep -v.
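
By "greppy" I mean something like the sketch below (using the file names from above); it's close, but it matches the names anywhere on the line, not just in the first field:

grep -v -F -f keepout database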

Thanks for your help!

awk -F'[, ]' 'NR==FNR{A[$1];next}!($1 in A)' keepout database
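
Spelled out, that one-liner reads:

awk -F'[, ]' '
    NR==FNR { A[$1]; next }   # reading keepout: remember each name as an array key
    !($1 in A)                # reading database: print lines whose first field was not remembered
' keepout database

-F'[, ]' splits fields on a comma or a space, and NR==FNR is only true while the first file is being read, so every name in keepout becomes a key in A, and any database line whose first field is one of those keys is filtered out.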

Awesome, except it isn't working on my "real" files. It works correctly on the sample data I gave, so your answer is right.

I wonder if one of the hacks I had tried before might have worked...

In the real world, with messy data, what might be keeping this from working? Some unprintable nonsense? CRLF vs. CR vs. LF line-ending crud? Any tips on where this might be brittle?

Thanks!

Use hexdump or od to check if you have any such characters in your file.
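
For instance, with od -c a DOS-style line ending shows up as \r \n instead of a bare \n at the end of each line (a quick check on the file from the question):

od -c database.csv | head -4

If \r characters turn up, one fix is to strip them (dos2unix does the same job where it's installed):

tr -d '\r' < database.csv > database.unix.csv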


Craaaaaap. I think I had this solved on my own but was bitten by a trip through an editor that changed the line endings. So, takeaways for my internet self when I search for this again:

1) Yoda was right! Thanks for the help!
2) Self, it might not be your code but the input. Cut your test files from the real data instead of making them up before asking for help...
3) Line endings matter (CRLF vs. LF), and they seem to be what broke this for you. (Plain LF is what you needed, and converting to it fixed things; see the CR-tolerant sketch below.)
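
For next time, a CR-tolerant variant of the accepted one-liner (my tweak, not from the answer above; it strips a trailing carriage return from each record before comparing, so CRLF files work too):

awk -F'[, ]' '{ sub(/\r$/, "") } NR==FNR { A[$1]; next } !($1 in A)' keepout database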