Grabbing a sub section of a file between 2 specific values

Big_Jeffrey · March 4, 2020, 11:38am

Hi. I have a file that contains one continuous line of output which is made up of multiple repeating sections of code with just some of the values being unique. No spaces and no carriage returns.

Effectively the file could be divided up into multiple repeating blocks of info. I need to grab the sub section of the line around one of these values.
So, I have a unique value for HostName (test12-42213.devserver.com) and I need all the info from that value until the first occurrence of a non-unique delimiter called "EndTime".

I have tried the following but it's not filtering anything out. It only seems to work on simple strings:

cat example.out | sed 's/.*"test12-42213.devserver.com"\(.*\)"EndTime"/\1/'

An example of the file with one continuous line would be (no spaces and no carriage returns):

Translation,occurs,if,-d,is,not,given,and,both,SET1,and,SET2,appear.-tmaybeusedonlywhentranslating.SET2,is,extended,to,length,of"SET1",
by,repeating,its,last,character:as:necessary,"HostName":"test12-42213.devserver.com",Excess_characters_of,SET2,are,ignored.Only[:lower:]and[:upper:]are,guaranteed,
to,expand,in,ascending:order;"EndTime":null}Translation,occurs,if,-d,is,not,given,and,both,SET1,and,SET2,appear.-tmaybeusedonlywhentranslating.
SET2,is,extended,to,length,of"SET1",by,repeating,its,last,character:as:necessary,"HostName":"test99-9999.devserver.com",Excess_characters_of,SET2,are,ignored.
Only[:lower:]and[:upper:]are,guaranteed,to,expand,in,ascending:order;"EndTime":null}

Scrutinizer · March 4, 2020, 12:58pm

Unix utilities typically need a linefeed at the end of the line. You could try awk:

awk '/test12-42213.devserver.com/,/EndTime/; END{printf "\n"}' RS=, ORS=, file

or

awk -v s="test12-42213.devserver.com" '$0~s,/EndTime/; END{printf "\n"}' RS=, ORS=, file

Does that produce output?

Big_Jeffrey · March 4, 2020, 1:06pm

Yes it does indeed. Many many thanks for your help!

Scrutinizer · March 4, 2020, 2:14pm

You're welcome.

Awk is somewhat unique in the sense that it allows you to specify a different record separator, other than the typical newline, which is absent in your case.

By specifying a different record separator, a comma in this case ( RS=, ), most awks are able to work around this. They chop the line up into smaller pieces that do not exceed the maximum line length, even though, strictly speaking, a file without a closing newline is not in Unix file format (the other utilities produce no output either for this reason or because the line-length limit is exceeded*).

By also specifying a comma as the output record separator ( ORS=, ), the comma-separated records are printed as a single comma-separated line. The necessary closing newline character is then provided in the END section.

S.

--

* Strictly speaking, according to the standards, awk is not required to be able to interpret files without a closing newline terminator, but in my experience most, if not all, versions do, as long as a different record separator is used and the resulting record length does not exceed line-length limitations.
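
The record-separator approach described above can be tried on a small input. This is only a sketch: the `sample` string is a made-up, shortened stand-in for the real file, `-v RS=, -v ORS=,` is used in place of the trailing `RS=, ORS=,` assignments, and `index($0, s)` replaces the regex match so that the dots in the host name are taken literally.

```shell
# Hypothetical sample: one continuous line, written without a trailing newline.
sample='a,b,"HostName":"test12-42213.devserver.com",c,d;"EndTime":null}e,f,"HostName":"test99-9999.devserver.com",g;"EndTime":null}'

# RS=,  : every comma-separated chunk becomes one record.
# ORS=, : printed records are joined with commas again.
# The range pattern prints from the record containing the host name
# through the first following record that contains EndTime.
result=$(printf '%s' "$sample" |
  awk -v RS=, -v ORS=, -v s='test12-42213.devserver.com' \
      'index($0, s), /EndTime/; END { printf "\n" }')

printf '%s\n' "$result"
# prints: "HostName":"test12-42213.devserver.com",c,d;"EndTime":null}e,
```

Note that the range works at comma granularity: the last printed record runs up to the next comma, so it includes whatever follows "EndTime" in that chunk (here `:null}e`), plus the trailing comma from ORS.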

RudiC · March 4, 2020, 4:10pm

Your own approach wasn't too far off, either. You just need to make sure it stops after the first occurrence of "EndTime" and suppresses the rest of the line, like this:

sed 's/.*"test12-42213.devserver.com"\([^}]*\)"EndTime".*$/\1\n/' file
,Excess_characters_of,SET2,are,ignored.Only[:lower:]and[:upper:]are,guaranteed,to,expand,in,ascending:order;
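
A slightly more portable variant of the same idea might look like the sketch below. It assumes that each block ends in `}` and that no `}` appears between the host name and "EndTime". The `sample` string is a made-up, shortened stand-in for the real file. The dots in the host name are escaped so they match literally, the `\n` in the replacement (not understood by every sed) is dropped, and a newline is appended to the input so that sed receives a properly terminated line.

```shell
# Hypothetical sample: a shortened stand-in for the one-line file.
sample='a,b,"HostName":"test12-42213.devserver.com",c,d;"EndTime":null}e,f,"HostName":"test99-9999.devserver.com",g;"EndTime":null}'

# [^}]* cannot cross the closing brace of the block, so the capture
# stops at the "EndTime" that belongs to the wanted host; the final
# .*$ consumes (and thereby discards) the rest of the line.
result=$(printf '%s\n' "$sample" |
  sed 's/.*"test12-42213\.devserver\.com"\([^}]*\)"EndTime".*$/\1/')

printf '%s\n' "$result"
# prints: ,c,d;
```

Unlike the awk range, this keeps only the text strictly between the quoted host name and "EndTime".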