Need help with how to search a file for a variable string and delete that line

newbie_01 · February 23, 2019, 12:21am

Hi,

I have a working script.

It does what I intend, but I am a bit confused about whether the sed part is supposed to work or not. Further down is the script, with the sed part that should have worked but doesn't, and the grep -v part, which is the workaround that I am using at the moment.

I am posting this to get some help on how to get the sed version to work. I've run out of things to try. OS is Solaris 5.9. I'm not sure how to check the sed version. I can't use sed -i; it gives an error.

list.all is the input file that we want to search and delete line. Below is just an example. The original file is 10K lines long, it contains the absolute file path plus some other commands of sorts in between.

  
  '/one/one/one/one',
  '/two/two/two/two',
  '/three/three/three/three',
  '/four/four/four/four',
  '/five/five/five/five',
  '/six/six/six/six',
  '/seven/seven/seven/seven',
  '/eight/eight/eight/eight',
  '/nine/nine/nine/nine',
  '/ten/ten/ten/ten',

list.err contains the strings that we want to search for in list.all. For each line in list.err, we want to take the first field and delete every line in list.all that contains that string.

/one/one/one/one: No such file or directory
/three/three/three/three: No such file or directory
/five/five/five/five: No such file or directory
/seven/seven/seven/seven: No such file or directory
/nine/nine/nine/nine: No such file or directory

Below is the script that I am using at the moment:

$: cat fix.ksh
#!/bin/ksh
#

source="list.all"
cp -p ${source}.save ${source}

echo
echo "------------------------------------------"
echo
echo "- This is the list that we want to cleanse"
echo
cat ${source}
echo
echo "------------------------------------------"
echo

echo ""
echo "============================"
echo "- sed not working :("
echo "============================"
echo ""
count=0
while read line
do
   let count=$count+1
   datafile=`echo $line | awk -F":" '{ print $1 }'`
   echo "- $count // Checking $datafile ... "
   grep -i "${datafile}" ${source}
   sed "#${datafile}#d" ${source} > ${source}.tmp
   cp -p ${source}.tmp ${source}
   echo ""
done < list.err
echo
echo "==> Not working :("
echo "Contents of ${source} is as below:"
echo
cat ${source}

echo
echo "============================"
echo "- Using grep -v "
echo "============================"
echo
cp -p ${source}.save ${source}
count=0
while read line
do
   let count=$count+1
   datafile=`echo $line | awk -F":" '{ print $1 }'`
   echo "- $count // Checking $datafile ... "
   grep -v "${datafile}" ${source} > ${source}.tmp
   cp -p ${source}.tmp ${source}
   echo ""
done < list.err
echo
echo "==> Required Working Output:"
echo "Contents of ${source} is as below:"
echo
cat ${source}

echo
echo "============================"
echo "- Using grep and sed"
echo "============================"
echo
cp -p ${source}.save ${source}
count=0
while read line
do
   let count=$count+1
   datafile=`echo $line | awk -F":" '{ print $1 }'`
   echo "- $count // Checking $datafile ... "
   n=`grep -n "${datafile}" ${source} | awk -F":" '{ print $1 }'`
   sed "${n}d" ${source} > ${source}.tmp
   cp -p ${source}.tmp ${source}
   echo ""
done < list.err
echo
echo "==> Required Working Output:"
echo "Contents of ${source} is as below:"
echo
cat ${source}

Below is an example run of the script:

$: ./fix.ksh

------------------------------------------

- This is the list that we want to cleanse

  '/one/one/one/one',
  '/two/two/two/two',
  '/three/three/three/three',
  '/four/four/four/four',
  '/five/five/five/five',
  '/six/six/six/six',
  '/seven/seven/seven/seven',
  '/eight/eight/eight/eight',
  '/nine/nine/nine/nine',
  '/ten/ten/ten/ten',

------------------------------------------


============================
- sed not working :(
============================

- 1 // Checking /one/one/one/one ...
  '/one/one/one/one',

- 2 // Checking /three/three/three/three ...
  '/three/three/three/three',

- 3 // Checking /five/five/five/five ...
  '/five/five/five/five',

- 4 // Checking /seven/seven/seven/seven ...
  '/seven/seven/seven/seven',

- 5 // Checking /nine/nine/nine/nine ...
  '/nine/nine/nine/nine',


==> Not working :(
Contents of list.all is as below:

  '/one/one/one/one',
  '/two/two/two/two',
  '/three/three/three/three',
  '/four/four/four/four',
  '/five/five/five/five',
  '/six/six/six/six',
  '/seven/seven/seven/seven',
  '/eight/eight/eight/eight',
  '/nine/nine/nine/nine',
  '/ten/ten/ten/ten',

============================
- Using grep -v
============================

- 1 // Checking /one/one/one/one ...

- 2 // Checking /three/three/three/three ...

- 3 // Checking /five/five/five/five ...

- 4 // Checking /seven/seven/seven/seven ...

- 5 // Checking /nine/nine/nine/nine ...


==> Required Working Output:
Contents of list.all is as below:

  '/two/two/two/two',
  '/four/four/four/four',
  '/six/six/six/six',
  '/eight/eight/eight/eight',
  '/ten/ten/ten/ten',

============================
- Using grep and sed
============================

- 1 // Checking /one/one/one/one ...

- 2 // Checking /three/three/three/three ...

- 3 // Checking /five/five/five/five ...

- 4 // Checking /seven/seven/seven/seven ...

- 5 // Checking /nine/nine/nine/nine ...


==> Required Working Output:
Contents of list.all is as below:

  '/two/two/two/two',
  '/four/four/four/four',
  '/six/six/six/six',
  '/eight/eight/eight/eight',
  '/ten/ten/ten/ten',

I used

sed "#${datafile}#d" ${source}

instead of

sed "/${datafile}/d" ${source}

because the latter gives the error

First RE may not be null

For the moment, I am sticking with the grep -v option because the grep -n and sed "${n}d" option doesn't work if there are multiple matches for the variable string. I've tried some of the suggestions posted to the forum for similar questions, and they don't work for me. The sed that I have doesn't allow the -i option.

Any advice is much appreciated. Thanks in advance.

Don_Cragun · February 23, 2019, 2:07am

You're so close, just off by one character. When you use a character other than / as the delimiter in a BRE search pattern, you have to escape the first use of that character with a backslash (i.e., \ ). So, if you change:

   sed "#${datafile}#d" ${source} > ${source}.tmp

to:

   sed "\#${datafile}#d" ${source} > ${source}.tmp

it works.
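
To see the difference on a small scale, here is a minimal sketch using a throwaway sample file (the file name under /tmp is made up for illustration). Without the backslash, the sed script begins with #, which sed treats as the start of a comment, so no command is run and every line is printed unchanged:

```shell
# Build a small sample file (hypothetical name, for illustration only).
sample=/tmp/sed_delim_demo.$$
printf "%s\n" "  '/one/one/one/one'," "  '/two/two/two/two'," > "$sample"

datafile=/one/one/one/one

# Without the backslash the script starts with '#', which sed reads as a
# comment, so nothing is deleted and both lines come back.
unescaped=$(sed "#${datafile}#d" "$sample")

# With the backslash, '#' is taken as the pattern delimiter and the
# matching line is deleted.
escaped=$(sed "\#${datafile}#d" "$sample")

echo "$escaped"
rm -f "$sample"
```

This also explains why the original loop ran without any error message and still left list.all untouched.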

Note, however, that there is no need to use awk to split fields from the lines you read from list.err . There are several things you can do to get the same results without needing to use a command substitution invoking awk (which is a rather costly and slow way of doing what you want to do). One easy way is to let read do the field splitting for you since you're already calling it. For example, if you change:

while read line
do
   let count=$count+1
   datafile=`echo $line | awk -F":" '{ print $1 }'`

in all places in your script where it occurs to:

while IFS=':' read datafile line
do
   let count=$count+1

you'll get the same results.
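
As a rough, self-contained sketch of that loop (the sample file name and the variable names rest and names are made up for illustration, and POSIX arithmetic is used in place of let so the snippet also runs in shells other than ksh):

```shell
# Sample input in the same format as list.err (hypothetical temp file).
errs=/tmp/read_split_demo.$$
printf "%s\n" "/one/one/one/one: No such file or directory" \
              "/three/three/three/three: No such file or directory" > "$errs"

count=0
names=""
# IFS=':' applies only to this read: the text before the first colon goes
# into datafile and everything after it goes into rest.
while IFS=':' read datafile rest
do
   count=$((count + 1))
   names="$names$datafile "
   echo "- $count // Checking $datafile ... "
done < "$errs"
rm -f "$errs"
```

No awk process is started for any input line; read does the splitting inside the shell.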

RudiC · February 23, 2019, 5:21am

On top of what Don Cragun already said, your script runs several commands per error list line, which might be considered not too efficient, esp. when dealing with long files. How about condensing the entire script down to a one liner with one grep and one sed only, making use of "process substitution" available in ksh et al.? Like

grep -vf<(sed -n 's/:.*$//; 1h;1!H; ${g;p}' list.err) list.all
'/two/two/two/two',
'/four/four/four/four',
'/six/six/six/six',
'/eight/eight/eight/eight',
'/ten/ten/ten/ten',

newbie_01 · February 25, 2019, 10:27pm

Thanks Don, I feel so small looking at what was missing. When // didn't work I thought that was it, so I changed it to a # but never thought about the \.

--- Post updated at 10:22 PM ---

Hi RudiC,

I like the one-liner one but it gives me error as below:

Using

grep -vf<(sed -n 's/:.*$//; 1h;1!H; ${g;p}' list.err) list.all

Gives the following error:

sed: command garbled: s/:.*$//; 1h;1!H; ${g;p}
grep: illegal option -- f
Usage: grep -hblcnsviw pattern file . . .

Using

/usr/xpg4/bin/grep -vf<(sed -n 's/:.*$//; 1h;1!H; ${g;p}' list.err) list.all

or

/usr/xpg4/bin/grep -vf<(/usr/xpg4/bin/sed -n 's/:.*$//; 1h;1!H; ${g;p}' list.err) list.all

Gives the following error:

sed: command garbled: s/:.*$//; 1h;1!H; ${g;p}

--- Post updated at 10:27 PM ---

Hi RudiC

FYI, the one-liner does work as expected on Linux, but not on the Solaris one.

Don_Cragun · February 25, 2019, 10:57pm

Hi newbie_01,
There is no reason to feel small. We're all here to learn. Next time you'll remember what you need to do.

Unfortunately you didn't tell us what operating system you're using. With the 1988 vintage ksh found on many UNIX systems (including the Solaris system we can now guess that you're using), you can't get process substitution (you need a ksh93 for that). This is why it is so important for you to tell us what operating system and shell (including version numbers) you're using each time you start a thread, so we know which operating system and shell extensions have a chance of working in your environment.

Even without process substitution, one could use something like:

#!/bin/ksh
IAm=${0##*/}
TmpFile=$IAm.$$
trap 'rm -rf "$TmpFile"' 0

sed 's/:.*$//' list.err > "$TmpFile"
/usr/xpg4/bin/grep -vf "$TmpFile" list.all

It isn't a one-liner, but it just invokes grep once, sed once, and rm once at the end to remove the temporary file it creates.

This should work on both Solaris and Linux systems (but you'll need to remove the /usr/xpg4/bin/ on the Linux system).
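
If you would rather stay with sed, as in the original question, a similar single-pass idea can be sketched by turning list.err into a sed script of delete commands and running it with sed -f. This is only a sketch that has not been tried on Solaris: the scratch file names are made up, the sample data stands in for the real list.all and list.err, and it assumes none of the paths contain a # character.

```shell
# Hypothetical scratch files; the real list.all and list.err would be used instead.
all=/tmp/sedscript_all.$$
err=/tmp/sedscript_err.$$
script=/tmp/sedscript_cmds.$$

printf "%s\n" "  '/one/one/one/one'," "  '/two/two/two/two'," \
              "  '/three/three/three/three'," > "$all"
printf "%s\n" "/one/one/one/one: No such file or directory" \
              "/three/three/three/three: No such file or directory" > "$err"

# Strip everything from the first colon on, then wrap each path as \#path#d.
sed -e 's/:.*$//' -e 's/.*/\\#&#d/' "$err" > "$script"

# Apply all of the generated delete commands in a single pass over the list.
result=$(sed -f "$script" "$all")
echo "$result"
rm -f "$all" "$err" "$script"
```

Like the grep -vf version, this reads list.all once instead of once per line of list.err.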

newbie_01 · February 26, 2019, 7:10am

Hi Don

OS Version and ksh version below:

$: uname -a
SunOS <hostname> 5.8 Generic_Virtual sun4v sparc sun4v
$: strings /bin/ksh | grep -i version
@(#)Version M-11/16/88i
$: what /bin/ksh
/bin/ksh:
        Version M-11/16/88i
        SunOS 5.8 Generic 110662-26 Mar 2011

Had tried your script and that works superbly. Love the way you use the trap. I think I should start adding that in all of the scripts especially for SIGKILL=9 when we have to do a kill -9.

Thanks a lot again.

Don_Cragun · February 26, 2019, 8:21am

Hi newbie_01,
I'm glad it is working for you.

Unfortunately, kill -9 can't be caught, so a trap won't work in that case. Fortunately, that is the only case that won't work. You can use trap to clean up on normal exit, SIGINT (kill -2 or ctl-c), and SIGQUIT (kill -3 or ctl-\). The SIGINT and SIGQUIT cases are significant since they have keyboard shortcuts as well as kill commands that can be used to generate them. Control-C and control-\ are the usual defaults on American keyboards, but you can change the characters used to generate both of those signals using the stty utility if you want different keyboard shortcuts.

If you use SIGKILL (kill -9) to terminate a job (any job), you'll have to manually go in and delete any temporary files that job may have created. Always try using SIGINT first to give jobs a chance to clean up after themselves, and use SIGKILL only as a last resort.
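
A small sketch of that cleanup pattern, assuming a made-up temp file name and a child shell that sends SIGTERM to itself to stand in for someone running kill on it:

```shell
# Hypothetical temp file name, for illustration only.
tmpfile=/tmp/trap_demo.$$

# Run the demo in a child shell so its traps and exit do not affect this one.
sh -s "$tmpfile" <<'EOF'
TmpFile=$1
# Remove the temp file whenever the shell exits.
trap 'rm -f "$TmpFile"' 0
# On SIGINT (2), SIGQUIT (3) or SIGTERM (15), exit; that fires the exit trap.
trap 'exit 1' 2 3 15
echo scratch > "$TmpFile"
# Simulate a catchable signal arriving from outside.
kill -15 $$
EOF

if [ -f "$tmpfile" ]
then
   echo "temp file left behind"
else
   echo "temp file cleaned up"
fi
```

Replacing the kill -15 with kill -9 in this sketch would leave the file behind, since neither trap gets a chance to run.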

RudiC · February 26, 2019, 8:50am

I don't have access to a Solaris system so can't test anything, but looking at (and shamelessly stealing from) Don Cragun's post, the following one liner might work without using a temp file:

sed 's/:.*$//' list.err | /usr/xpg4/bin/grep -vf- list.all

Don_Cragun · February 26, 2019, 10:20am

Hi RudiC,
That is certainly worth a try.

I no longer have access to a Solaris system either, but I did think of that and tried it on macOS Mojave (version 10.14.3) which uses a BSD derived grep . It doesn't work there.

Furthermore, the standards allow (but do not require) file operands specified as - to be treated by grep as a synonym for reading from standard input. The standards do not say anything about allowing the -f pattern_file option-argument to be treated the same way. Therefore, I doubt that a Solaris /usr/xpg4/bin/grep will accept a - as anything other than the actual name of a real file for that option-argument even if they do treat - that way when used as a file operand.

Hi newbie_01,
Would you like to try that for us on your Solaris system and let us know what happens?

bakunin · February 26, 2019, 12:02pm

Expanding on what Don Cragun has already explained: The difference between "9" and all other signals a process can get is that "9" (as in kill -9) is not handled by the process itself. It is handled by the system (that is: a part of the kernel).

When you issue the command kill -n <PID> (for n being some number - see kill -l for a list of all legal values) this generates a "signal" which is then passed to the process and handled by it. In principle a process can react however it wants (or even not at all) to a specific signal, but there are conventions and therefore expected reactions of a process to a certain signal. For example, kill -1 usually causes a process to re-read its configuration file(s) and then resume operation using this new configuration.

The ultimate signal you can send to a process is "15" which means "terminate immediately". A well-behaved process will then terminate, no matter what it is doing. Still, "terminating" here means not just "exit" but to perform every cleanup possible: if there are temporary files open, then close them and remove them, if there are network connections open, close them and free up the sockets, etc.

This is the main difference between signal 15 and 9: with 9 the process will not be notified but immediately (and in the most brutal way possible) removed by the OS. It will simply have no time to clean up, in fact it isn't even aware of it being killed.

Therefore, when you want to stop a process, always try kill -15 at least once (better: two to three times) and only if the process is proven not to be able to terminate itself any more try a kill -9 - but ONLY then. Your system will remain much more stable because processes will have the chance to clean up (and well-behaved UNIX processes do that!).
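
That escalation could look roughly like the sketch below. The background sleep is only a stand-in for the job you want to stop, and the three-second grace period is an arbitrary choice:

```shell
# Hypothetical stand-in for a long-running job.
sleep 60 &
pid=$!

# Ask politely first: signal 15 can be caught, so the job may clean up.
kill -15 "$pid" 2>/dev/null

# Give it a few seconds to go away; kill -0 sends no signal and only
# tests whether the PID can still be signalled.
tries=0
while kill -0 "$pid" 2>/dev/null && [ "$tries" -lt 3 ]
do
   sleep 1
   tries=$((tries + 1))
done

# Only if it is still there, fall back to signal 9.
if kill -0 "$pid" 2>/dev/null
then
   kill -9 "$pid"
fi

# Reap the child; its exit status is not of interest here.
wait "$pid" 2>/dev/null || true
```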

I hope this helps.

bakunin

i4ismail · February 26, 2019, 5:41pm

Thank you