awk to combine lines from line with pattern match to a line that ends in a pattern

Wes_Kem · February 20, 2016, 7:39pm

I am trying to combine lines with these conditions:

First line starts with text of "libname VALUE db2 datasrc" where VALUE can be any text.
If condition1 is met then continue to combine lines through a line that ends with a semicolon.
Ignore case when matching patterns and remove any leading spaces from line when joining.

I have tried to code this using awk or sed without success.

Input file:

libname &wrk_schema DB2 database = %sysget( DB2DBDFT ) schema = &wrk_schema read_isolation_level = ur ;
libname schema db2 datasrc=%sysfunc(sysget(DB2DBDFT)) schema=&qmt_schema read_isolation_level=ur;
libname db2lib db2 datasrc=crd_prod ;
libname server db2 datasrc=%sysfunc(sysget(DB2DBDFT)) ril=ur;
libname server db2 datasrc=%sysfunc(sysget(DB2DBDFT));
libname server db2 datasrc=%sysfunc(sysget(DB2DBDFT))
            schema = &build_schema
            access = readonly
            connection = globalread
            read_isolation_level=ur;
libname server2 db2 datasrc=%sysfunc(sysget(DB2DBDFT))
             schema = &build_schema2
             access = readonly
             connection = globalread
             read_isolation_level=ur;

Desired Output file:

libname &wrk_schema DB2 database = %sysget( DB2DBDFT ) schema = &wrk_schema read_isolation_level = ur ;
libname schema db2 datasrc=%sysfunc(sysget(DB2DBDFT)) schema=&qmt_schema read_isolation_level=ur;
libname db2lib db2 datasrc=crd_prod ;
libname server db2 datasrc=%sysfunc(sysget(DB2DBDFT)) ril=ur;
libname server db2 datasrc=%sysfunc(sysget(DB2DBDFT));
libname server db2 datasrc=%sysfunc(sysget(DB2DBDFT)) schema = &build_schema access = readonly connection = globalread read_isolation_level=ur; 
libname server2 db2 datasrc=%sysfunc(sysget(DB2DBDFT)) schema = &build_schema2 access = readonly connection = globalread read_isolation_level=ur;

Don_Cragun · February 21, 2016, 2:12am

What awk and sed code have you tried to solve this problem?

Wes_Kem · February 21, 2016, 9:06pm

This is the awk I tried, but it keeps looping through to the last line of the file.

awk 'BEGIN{IGNORECASE=1} /libname/&&/DB2 datasrc/ {print;while($0!~ /:;/){getline;print;}}' file

Don_Cragun · February 21, 2016, 10:17pm

Your code doesn't make an exception for lines containing libname, db2, and datasrc that already end with a semicolon. It doesn't verify that libname is at the start of a line, doesn't verify that db2 is in the 3rd field, and looks for a colon immediately followed by a semicolon (which never appears in your sample input) to end the set of lines being joined.

You might want to try something more like:

#!/bin/ksh
awk '
$1 ~ "^[Ll][Ii][Bb][Nn][Aa][Mm][Ee]$" && $3 ~ "^[Dd][Bb]2$" &&
$4 ~ /^[Dd][Aa][Tt][Aa][Ss][Rr][Cc]=/ && $0 !~ /;$/ {
	printf("%s ", $0)
	j = 1
	next
}
j {	for(i = 1; i < NF; i++)
		printf("%s ", $i)
	printf("%s%s", $NF, (j = ($NF !~ /;$/)) ? " " : "\n")
	next
}
1' file

The awk IGNORECASE variable works in some versions of awk, but it is not in the standards and several standards-conforming versions of awk (including the awk on BSD and OS X systems) do not provide that extension. The code above works with any standards-conforming version of awk, but obviously needs more complicated regular expressions to perform case-insensitive matches.

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
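As a middle ground between IGNORECASE and the bracketed character classes, the standard tolower() function can lower-case the line before matching. The following is only a sketch under a few assumptions: join_libname is a made-up function name, the statement buffer is printed once the closing semicolon is seen, and trailing blanks after the semicolon are tolerated. It also strips leading blanks from continuation lines, as the original request asked.

```shell
#!/bin/sh
# join_libname is a hypothetical helper name, not something from this thread.
# With no file arguments it reads standard input.
join_libname() {
    awk '
    # Not joining yet: does this line open a multi-line libname statement?
    !joining && tolower($0) ~ /^libname[ \t]+[^ \t]+[ \t]+db2[ \t]+datasrc/ && $0 !~ /;[ \t]*$/ {
        buf = $0
        joining = 1
        next
    }
    # Joining: strip leading blanks, append, and stop at a trailing semicolon.
    joining {
        sub(/^[ \t]+/, "")
        buf = buf " " $0
        if ($0 ~ /;[ \t]*$/) {
            print buf
            joining = 0
        }
        next
    }
    # Every other line passes through unchanged.
    { print }
    # Flush an unterminated statement at end of input instead of dropping it.
    END { if (joining) print buf }
    ' "$@"
}
```

It would be called as join_libname file. Unlike the field-by-field version above, this one keeps the internal spacing of each continuation line as it was in the input.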

RudiC · February 22, 2016, 2:51am

Would this help?

awk '
tolower(T = $0) ~ "^libname [^ ]+ db2 datasrc" \
        {while (! /;$/ && (getline X) > 0) {    # stop at end of input as well
                         sub (/^ +/, " ", X)
                         $0 = $0 X
                        }
        }
1
' file
libname &wrk_schema DB2 database = %sysget( DB2DBDFT ) schema = &wrk_schema read_isolation_level = ur ;
libname schema db2 datasrc=%sysfunc(sysget(DB2DBDFT)) schema=&qmt_schema read_isolation_level=ur;
libname db2lib db2 datasrc=crd_prod ;
libname server db2 datasrc=%sysfunc(sysget(DB2DBDFT)) ril=ur;
libname server db2 datasrc=%sysfunc(sysget(DB2DBDFT));
libname server db2 datasrc=%sysfunc(sysget(DB2DBDFT)) schema = &build_schema access = readonly connection = globalread read_isolation_level=ur;
libname server2 db2 datasrc=%sysfunc(sysget(DB2DBDFT)) schema = &build_schema2 access = readonly connection = globalread read_isolation_level=ur;

Wes_Kem · February 23, 2016, 7:11pm

Don, thanks for the code! It works great!

I did modify it to use IGNORECASE=1, and it handled the case differences OK.

I also put the code in a loop so that it makes the changes to every file whose name begins with a given prefix.

 
#!/bin/ksh
# Combine lines that start with "libname VALUE db2 datasrc" and do not end in a semicolon.
# IGNORECASE is a gawk extension; other awks silently ignore it.
for f in pre*; do
awk 'BEGIN{IGNORECASE=1} /libname/&&/DB2/ && $0 !~ /;$/ {
	printf("%s ", $0)
	j = 1
	next
}
j {	for(i = 1; i < NF; i++)
		printf("%s ", $i)
	printf("%s%s", $NF, (j = ($NF !~ /;$/)) ? " " : "\n")
	next
}
1' "$f" > fifo &&
mv fifo "$f"
done
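
One possible refinement of the loop above, offered only as a sketch: write each result to its own temporary file instead of a fixed name like fifo, and replace the original only if awk succeeded. The function name rewrite_in_place is invented for illustration, and the sketch assumes a mktemp utility that accepts a template argument (GNU and BSD systems provide one, but it is not required by POSIX).

```shell
#!/bin/sh
# rewrite_in_place is a hypothetical helper, not part of the thread's code.
# Usage: rewrite_in_place 'awk program text' file...
rewrite_in_place() {
    prog=$1
    shift
    for f in "$@"; do
        # Skip anything that is not a regular file (e.g. an unmatched pre* glob).
        [ -f "$f" ] || continue
        # Create the temporary file next to the original so mv stays on one filesystem.
        tmp=$(mktemp "$f.XXXXXX") || return 1
        if awk "$prog" "$f" > "$tmp"; then
            mv "$tmp" "$f"
        else
            # Leave the original untouched if awk failed.
            rm -f "$tmp"
            return 1
        fi
    done
}
```

For example, rewrite_in_place '{ print }' pre* would rewrite every matching file with the given awk program. Note that mktemp creates the temporary file with restrictive permissions, so the replaced file may end up with a different mode than the original.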