Delete lines containing key words dynamically

weknowd · May 15, 2017, 5:21pm

Hi Friends,

I have a requirement where I need to delete lines containing keywords, and I am using the command below to do that:

sed '/UNIX/d' inputfile > output

But now I have one more requirement: there will be a reference file containing the IDs to be deleted from the master file.

Eg:

Master File

ID,NAME
1,XX
2,YY
3,ZZ
4,DD

Reference File

ID
2
3

So my output file should be

ID,NAME
1,XX
4,DD

It would be helpful if you could let me know how we can achieve this dynamically.

drl · May 15, 2017, 6:02pm

Hi.

It is unclear to me what is meant by "dynamically". Here is a solution with grep:

#!/usr/bin/env bash

# @(#) s1       Demonstrate delete lines with reference file, grep.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f "$C" ] && "$C" grep

FILE=${1-data1}
E=expected-output.txt

pl " Input data file and reference file data2:"
head data[12]

pl " Expected output:"
cat "$E"

pl " Results:"
grep -vf data2 data1 |
tee f1

pl " Verify results if possible:"
C=$HOME/bin/pass-fail
[ -f "$C" ] && "$C" || ( pe; pe " Results cannot be verified." ) >&2

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.7 (jessie) 
bash GNU bash 4.3.30
grep (GNU grep) 2.20

-----
 Input data file and reference file data2:
==> data1 <==
1,XX
2,YY
3,ZZ
4,DD

==> data2 <==
2
3

-----
 Expected output:
1,XX
4,DD

-----
 Results:
1,XX
4,DD

-----
 Verify results if possible:

-----
 Comparison of 2 created lines with 2 lines of desired results:
 Succeeded -- files (computed) f1 and (standard) expected-output.txt have same content.

See man grep for details.
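Stripped of the demonstration scaffolding, the core of the script above is the single grep call. A minimal sketch, assuming the sample data is recreated in a scratch directory (the tmp directory and the result variable are illustrative names, not part of the original script):

```shell
# Recreate the sample data in a scratch directory (names are illustrative).
tmp=$(mktemp -d)
printf '%s\n' 1,XX 2,YY 3,ZZ 4,DD > "$tmp/data1"
printf '%s\n' 2 3 > "$tmp/data2"

# -f FILE : read the patterns from FILE, one per line
# -v      : print only the lines that match none of the patterns
result=$(grep -v -f "$tmp/data2" "$tmp/data1")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Each line of data2 is treated as a regular expression that may match anywhere in a line of data1, which is enough for this sample data.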

Best wishes ... cheers, drl

weknowd · May 15, 2017, 11:16pm

Thanks drl for your time. It works and once again thanks for your step by step explanation.

Don_Cragun · May 16, 2017, 1:55am

Hi weknowd,
Note that according to post #1 in this thread, both input files have a header line and the desired output included the header from the master file, while the output produced by:

grep -vf file2 file1

deleted the header line (the ID header in file2 is itself used as a pattern, and it matches ID,NAME). Note also that if the ID values are of varying lengths and one of the IDs in file2 to be removed from the master file (file1) also appears as a substring of another ID or appears in the NAME field, the suggested code may remove additional lines whose IDs are not included in file2. For example, if file1 contained:

ID,NAME
123,Jane Doe
2,John Doe
312,Jack Smith
421,Jim Taylor
567,Fred Zahn

and file2 contained:

ID
2
3

the given code will not only remove the header line, a line with ID 2, and a line with ID 3; it will also remove lines with IDs 12, 21-29, 32, 42, ... and lines with IDs 13, 23, 31, 33-39, 43, ...; i.e., the only line left in the results would be:

567,Fred Zahn

not:

ID,NAME
123,Jane Doe
312,Jack Smith
421,Jim Taylor
567,Fred Zahn
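One way to stay with grep and still get the output above is to anchor each ID to the start of the line and to the comma that ends the first field. This is only a sketch, assuming the ID is always the first comma-separated field, the IDs contain no regular-expression metacharacters, and the first line of file2 is a header; the scratch directory and the patterns file are illustrative names:

```shell
# Recreate the example files in a scratch directory (names are illustrative).
tmp=$(mktemp -d)
printf '%s\n' 'ID,NAME' '123,Jane Doe' '2,John Doe' '312,Jack Smith' \
    '421,Jim Taylor' '567,Fred Zahn' > "$tmp/file1"
printf '%s\n' ID 2 3 > "$tmp/file2"

# Drop the header of file2 (line 1), then turn every remaining ID into the
# pattern "^ID," so that it can only match a complete first field.
sed '1d; s/.*/^&,/' "$tmp/file2" > "$tmp/patterns"

# Remove only the lines whose first field is one of the listed IDs.
anchored=$(grep -v -f "$tmp/patterns" "$tmp/file1")
printf '%s\n' "$anchored"

rm -rf "$tmp"
```

Because the header of file2 never becomes a pattern, the ID,NAME line of the master file is kept, and 123, 312, and 421 no longer match the patterns built from 2 and 3.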

With your problem statement, we have no idea whether or not the code drl suggested works for real data you may need to process or just works for the sample data you provided.
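If the real data follows the same layout as the samples, a field-based comparison avoids regular expressions entirely. The following is a sketch rather than a tested solution for your data; it assumes comma-separated files, the ID in the first field, a one-line header in both files, and a non-empty file2 (the scratch directory and the kept variable are illustrative names):

```shell
# Recreate the example files in a scratch directory (names are illustrative).
tmp=$(mktemp -d)
printf '%s\n' 'ID,NAME' '123,Jane Doe' '2,John Doe' '312,Jack Smith' \
    '421,Jim Taylor' '567,Fred Zahn' > "$tmp/file1"
printf '%s\n' ID 2 3 > "$tmp/file2"

# First file (file2): remember every ID after the header line.
# Second file (file1): print the header and every line whose ID was not listed.
kept=$(awk -F, '
    NR == FNR { if (FNR > 1) drop[$1]; next }
    FNR == 1 || !($1 in drop)
' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$kept"

rm -rf "$tmp"
```

The IDs are compared as whole strings, so ID 2 removes only the line for ID 2, and text in the NAME field is never examined.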