Remove a range of lines from a file using sed

Andy82 · January 19, 2012, 10:15am

Hi

I am having an issue editing a file with sed.

What I want to do is, in a loop pass a variable to a sed command. Sed should then search a file for a line that matches that variable, then remove all lines below until it reaches a line starting with a constant.

I have managed to write a sed command that gives me the block of text, but I cannot figure out how to return the contents of the file excluding the text I have searched for.

The code I have so far is:

sed -n '/'"${var[$count]}"'/,/'"Con       ST"'q/p'  < input | head -5> output

shamrock · January 19, 2012, 10:34am

You have the logic backwards: instead of printing out the unwanted lines, just delete them.

sed '/'"${var[$count]}"'/,/'"Con       ST"'q/d' file
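As a minimal sketch of the delete-a-range idea, assuming hypothetical marker lines `START` and `END` (neither comes from the original post): the `d` command applied to an address range drops every line from the first match of the start pattern through the next match of the end pattern, and sed prints everything else by default.

```shell
# Hypothetical sample data; START and END are illustrative markers only.
sample=$(mktemp)
printf '%s\n' keep1 START drop1 END keep2 > "$sample"

# Delete from the line matching START through the next line matching END.
# Lines outside the range are printed because -n is not given.
result=$(sed '/START/,/END/d' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Note that the range closes at the first line matching the end pattern, not at a later one.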

Corona688 · January 19, 2012, 10:41am

Running sed many times to process one file is extremely wasteful and slow. What you want to do, if I understand your intention, can be done with one single execution of awk.

# Array of lines to skip, initialized in the BASH way
ARRAY=( 3 4 5 )
# Feed the array into awk in the X variable.
# Then split it into the array L[1]=3, L[2]=4, L[3]=5
# Then turn that into the array S[3]=1, S[4]=1, S[5]=1.
# Then we can use the simple statement !S[NR] to check if the
# line should be skipped or not and print it accordingly.
awk -v X="${ARRAY[*]}" 'BEGIN { split(X,L," "); for(Z in L) S[L[Z]]=1; }; !S[NR]' filename

Andy82 · January 25, 2012, 10:31am

Thanks for the feedback. The command is nearly working. I am still having an issue getting sed to stop searching when it finds the second string in the range.

What it should do is:

Search the file for the first string in the range. This string has a unique value for each occurrence in the file, so I am using a variable.
Take the lines under that string up to and including the second occurrence of the second string in the range. The second string is not unique in the file, which is why I want sed to stop searching when it finds it for the second time after it has found the first string.

File looks like this

1st String. Remove 1
Data
Data
2nd String
2nd String
1st String. Do not remove
Data
Data
2nd String
2nd String
1st String Remove 2
Data
Data
2nd String
2nd String

When I am finished I would want the file to look like

1st String
Data
Data
2nd String
2nd String

When the command runs it removes everything from the file under 1st String. Remove 1

Is this even possible in sed, or should I be looking into awk? I used sed because I have some experience with it and none with awk.
The code I am using is the one from the second post above:

 sed '/'"${var[$count]}"'/,/'"Con       ST"'q/d' file
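One possible explanation for the behavior described, offered as an assumption rather than a confirmed diagnosis: after the shell joins the quoted pieces, the `q` ends up inside the second address, so sed searches for the literal text `Con       STq`. If no line contains that text, the range never closes and `d` deletes through the end of the file. The sketch below uses hypothetical markers (`HEADER`, `END`, `ENDq`) to show the difference.

```shell
# Hypothetical sample data; the marker names are illustrative only.
sample=$(mktemp)
printf '%s\n' HEADER data END after > "$sample"

# End address matches a line: the range closes and later lines survive.
closed=$(sed '/HEADER/,/END/d' "$sample")

# End address with a stray trailing q matches nothing: the range runs to EOF.
unclosed=$(sed '/HEADER/,/ENDq/d' "$sample")

printf 'closed range:   [%s]\nunclosed range: [%s]\n' "$closed" "$unclosed"
rm -f "$sample"
```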

Corona688 · January 25, 2012, 10:33am

I'd still suggest awk, since it allows you to do rational if/then/else statements in code blocks instead of being completely restricted to inscrutable line-matching regular expressions. It has those, too, but you get to do what you want with them.

But your outline here is still incomplete: we still have no idea what the contents of your array are, and hence what relation your output has to your input.

Andy82 · January 25, 2012, 10:52am

In the example above the array would have 2 elements.
1st = Remove 1
2nd = Remove 2

These values come from another file that identifies the String 1 records that need to be removed. So in the separate file there would be no line equal to "Do not remove".

The array values are read using a loop.

So on the first pass through of the file the block of text
1st String. Remove 1
Data
Data
2nd String
2nd String
Should be removed. On the second pass the block of text
1st String Remove 2
Data
Data
2nd String
2nd String
Should be removed.

Leaving the block of text below untouched.
1st String. Do not remove
Data
Data
2nd String
2nd String
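The requirement as described can be sketched as a single awk pass. This is only one way to read it, under these assumptions: a block starts at any line containing one of the keys, it ends at the second line that is exactly `2nd String`, and keys are matched as plain substrings with `index()`. The names `keys`, `K`, `skip` and `ends` are made up for illustration.

```shell
# Sample input copied from the post above.
data=$(mktemp)
cat > "$data" <<'EOF'
1st String. Remove 1
Data
Data
2nd String
2nd String
1st String. Do not remove
Data
Data
2nd String
2nd String
1st String Remove 2
Data
Data
2nd String
2nd String
EOF

# Keys identifying the blocks to drop, joined with "|" for awk (assumed format).
keys='Remove 1|Remove 2'

result=$(awk -v K="$keys" '
BEGIN { n = split(K, key, "|") }
# Not currently skipping: does this line contain one of the keys?
!skip {
    for (i = 1; i <= n; i++)
        if (index($0, key[i])) { skip = 1; ends = 0; break }
}
# While skipping, count terminator lines and stop after the second one.
skip {
    if ($0 == "2nd String" && ++ends == 2) skip = 0
    next
}
{ print }
' "$data")

printf '%s\n' "$result"
rm -f "$data"
```

With the sample data this leaves only the "Do not remove" block, header line included, and it handles every key in one pass instead of one pass per key.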

kaaliakahn · January 25, 2012, 11:11am

Corona688, can you look at my post as well?

Thanks

Corona688 · January 25, 2012, 11:18am

Why do this in multiple passes? Why not do it in one pass? Do you want different output to appear in different files?

---------- Post updated at 10:18 AM ---------- Previous update was at 10:16 AM ----------

Crosspost. Oh, I see. That's good to know. Seeing the actual question always, always helps.

kaaliakahn · January 25, 2012, 11:28am

Please see the post again; I have updated it.

Andy82 · January 25, 2012, 11:35am

Is this post for me?

I have to do a separate pass for each different value of the 1st String.

Corona688 · January 25, 2012, 11:38am

Do you? Really? If you've got millions of lines as your other thread says, the performance would just be the pits, having to repeat the work 9 times...

Check your other thread.

Andy82 · January 25, 2012, 11:53am

That thread is from a different user with a slightly different problem.

The files that I will be processing will not be that large.

Corona688 · January 25, 2012, 12:01pm

So what? There's still no good reason to do it in 9 passes unless you absolutely have to...

---------- Post updated at 11:01 AM ---------- Previous update was at 10:57 AM ----------

Why is it removing the text that says "do not remove" anyway?

Andy82 · January 25, 2012, 12:14pm

I think there is a misunderstanding.

It is not removing the text that says do not remove. It is leaving it untouched.

Can multiple ranges be passed into sed? I know there is the -e option but I did not think this worked with ranges.

It still leaves me with the issue of stopping sed when it finds the second string in the range.
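On the question of multiple ranges: sed accepts several `-e` expressions, each with its own address range, and applies all of them in a single pass over the file. A minimal sketch, assuming hypothetical markers (`A1`/`B1` and `A2`/`B2` are illustrative, not from the real data):

```shell
# Hypothetical sample data with two independent ranges to remove.
sample=$(mktemp)
printf '%s\n' keep1 A1 x B1 keep2 A2 y B2 keep3 > "$sample"

# Each -e carries its own start,end range; both are applied in one pass.
result=$(sed -e '/A1/,/B1/d' -e '/A2/,/B2/d' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Each range still closes at the first match of its end pattern, so this alone does not solve the "second occurrence" requirement.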

Corona688 · January 25, 2012, 12:28pm

Your output data's not consistent with your input.

1st String. Remove 1

The stuff in red, you seem to want printed for the first match. But next time it happens, you don't?

It would help a lot if you used code tags. Right now it's extremely difficult to tell your input apart from your output.

---------- Post updated at 11:28 AM ---------- Previous update was at 11:20 AM ----------

Here's what I have now:

$ ARR=( "Remove 1" "Remove 2" )
$ OLDIFS="$IFS"
$ IFS="|"
$ cat filter.awk

BEGIN {
        split(I,ARR,"|");
        P=1
}

P {
        for(N in ARR)
        if(match($0, ARR[N]))
        {
                print substr($0, 1, RSTART-1);
                E=N;
                P=0
        }
} P

(!P) && /Do not remove/ { P=1; }

$ cat data

1st String. Remove 1
Data
Data
2nd String
2nd String
1st String. Do not remove
Data
Data
2nd String
2nd String
1st String Remove 2
Data
Data
2nd String
2nd String

$ awk -v I="${ARR[*]}" -f filter.awk data

1st String.
Data
Data
2nd String
2nd String
1st String

$