sed to delete items in an array from a file

I need to create a shell script to delete multiple items (strings) at a time from a file.

I need to iterate through a list of strings.

My plan is to create an array and then iterate through the array.
My code is not working:

#!/bin/bash -x


declare -a array=(one, two, three, four)

for element in "${array[@]}"
do
    sed "s/"$element"//g" input
    #awk "{gsub("$element", "");print}" input
done

The file input contains:
one
two
three
four

Why do you want an array?

for element in one two three four
do
...
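
Filled out, with sed's output actually captured somewhere, that might look something like this untested sketch (output.remove and output.tmp are just placeholder names, and it assumes the words contain nothing special to sed):

#!/bin/bash
# work on a copy so the original file is left untouched
cp input output.remove
for element in one two three four
do
    # delete every occurrence of the current word, then replace the copy
    sed "s/$element//g" output.remove > output.tmp && mv output.tmp output.remove
done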

The input file contains:
one
two
three
four

I moved away from the array to reading the list from an input file. The script hangs, as shown here:

#!/bin/bash -x


for i in `cat input`
#while true
do
    sed -e 's/'"$i"'/'" "'/g'
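    # with no file argument, sed reads from standard input here, which is why the script hangs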
#    echo $i
done > output.remove

shell$ ./remove-these.sh

++ cat input
+ for i in '`cat input`'
+ sed -e 's/one/ /g'

#!/bin/bash -x

for i in `cat input`
do
    sed -e 's/'"$i"'/'" "'/g' input > output.remove
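    # each pass reads input and rewrites output.remove, so only the last word's substitution survives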
#done < input > outfile1
done

Where are the items to delete,
and what is the file you want to delete the items from?

If your items are from the file "input", then it does not make sense to delete the items from the file "input"; it will become empty. Or is that exactly what you want to prove?

The items to delete are in the input file; this script is just to prove the loop works. input is the file I want to delete from. I could use sed's in-place editing (-i) or not. Some internet posts said the sed -i option does not work in a loop, but I did not verify that.

I have an 18 MB data file with a lot of redundant, repeated lines that need to be deleted. Keeping a list of those lines in a separate file works best for this script. I want to read the source file in the loop and write the output to a second file. The source file has about 15 MB of repeated, useless data to remove, so looping over the file seems the best choice. I don't really care which construct I use, but I am not having luck with the ones I have tested.

You say you want to remove useless repeated data, but the code you are using replaces every copy of the data (not just repeated data) with a <space> character (even if the data you want to "remove" is at the start, at the end, or in the middle of a longer string).

Please be very clear about:

  1. whether you want to replace occurrences of the strings you find in a file with a <space> character or want to remove occurrences of those strings,
  2. whether you want to replace or remove all occurrences you find or just want to remove duplicate occurrences,
  3. whether you want to replace or remove occurrences even if they are in the middle of larger "words", only if they are separate "words", or only if they are the complete contents of an input line of text, and
  4. whether you want to remove lines that have been turned into blank lines or empty lines by the changes made above or keep those blank or empty lines in your updated file.

If you want to completely remove complete lines of matched text (as in your example), consider using something more like:

grep -Fvx -f file_of_lines_to_remove file_to_be_updated > updated_file

to do the entire job in one pass instead of one pass per word "removed". If you aren't removing complete lines, consider using awk to process each input file once instead of using sed to process each input file once per word to be "removed".
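
An untested sketch of that awk approach, assuming the strings in file_of_lines_to_remove contain no regular-expression metacharacters (gsub() treats them as regexes):

awk 'NR == FNR { words[$0]; next }            # first file: remember each string to delete
     { for (w in words) gsub(w, ""); print }  # data file: strip every occurrence, print the line
' file_of_lines_to_remove file_to_be_updated > updated_file

That leaves empty lines wherever a line was nothing but a deleted string; pipe the result through grep -v '^$' (or add a test in the awk script) if those should go as well.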
