Extract multiple line text from log between special words

Hi all,

I have a text file (log), it prints some SQL statements in the log. It is a bit irregular, I need to extract SQL statement into as separate file.

  1. Some statements are printed as just SQL statements, in multiple lines.
  2. Some SQL statements are printed line by line but after log details at the beginning of the line. So somehow I need to ignore the first part of the line but need to keep query.

What I found as a pattern is, all SQL statements have "DISTINCT" keyword in the beginning
and after each statement there are "quit;" statements.
This doesn't have to be at the beginning of the lines, it's enough for me to extract text between DISTINCT and quit.

This is where I confused and couldn't find a way to solve it with sed or awk.

Sample log structure 1:

/* clear syscc before new query */
%let syscc=0;
%let sysrc=0;
/* MAmodel_C */
SELECT  DISTINCT
        table0.SUBSCRIBER_KEY AS SUBSCRIBER_KEY LABEL='Maestro Subscriber Key' FORMAT=F13.0
FROM
        CMD.ACCOUNTS_MAESTRO table0 Inner join CMD.PRODUCT_MAESTRO table1 on  ( ( (table1.ACCOUNT_ID = table0.ACCOUNT_ID AND table1.ACCOUNT_KEY = table0.ACCOUNT_KEY)  AND table1.COMPANY_NUMBER = table0.COMPANY_NUMBER)  AND table1.SUBSCRIBER_KEY = table0.SUBSCRIBER_KEY)
WHERE
        table1.PRODUCT_DISCOUNT_CODE IS NULL
) table2 ;
quit;
-----------------------------------------------------
2020-03-17 09:46:11,531 INFO 

Structure 2: (each line is after log details)


2020-03-17 09:45:57,733 INFO  [Exec-11] [02205e834:-68e0] [user1140] .execution.instructions - 2FD66078-1B0B: SELECT  DISTINCT
2020-03-17 09:45:57,733 INFO  [Exec-11] [02205e834:-68e0] [user1140] .execution.instructions - 2FD66078-1B0B:        table0.SUBSCRIBER_KEY AS SUBSCRIBER_KEY LABEL='Maestro Subscriber Key' FORMAT=F13.0
2020-03-17 09:45:57,733 INFO  [Exec-11] [02205e834:-68e0] [user1140] .execution.instructions - 2FD66078-1B0B: FROM
2020-03-17 09:45:57,733 INFO  [Exec-11] [02205e834:-68e0] [user1140] .execution.instructions - 2FD66078-1B0B:        CMDM.ACCOUNTS_MAESTRO table0 Inner join CMDM.PRODUCT_MAESTRO table1 on  ( ( (table1.ACCOUNT_ID = table0.ACCOUNT_ID AND table1.ACCOUNT_KEY = table0.ACCOUNT_KEY)  AND table1.COMPANY_NUMBER = table0.COMPANY_NUMBER)  AND table1.SUBSCRIBER_KEY = table0.SUBSCRIBER_KEY)
2020-03-17 09:45:57,734 INFO  [Exec-11] [02205e834:-68e0] [user1140] .execution.instructions - 2FD66078-1B0B: WHERE
2020-03-17 09:45:57,734 INFO  [Exec-11] [02205e834:-68e0] [user1140] .execution.instructions - 2FD66078-1B0B:        table1.PRODUCT_DISCOUNT_CODE IS NULL
2020-03-17 09:45:57,734 INFO  [Exec-11] [02205e834:-68e0] [user1140] .execution.instructions - 2FD66078-1B0B: ) table2 ;
2020-03-17 09:45:57,734 INFO  [Exec-11] [02205e834:-68e0] [user1140] .execution.instructions - 2FD66078-1B0B: quit;

Hi
I have problems with English so answer what is wrong in the output of this code?

sed -n '/DISTINCT/,/quit/s/^.*: //p' file

If you showed your achievements, it would be much clearer

Try this adaption of nezabudka's proposal to support both structures:

sed -n '/DISTINCT/,/quit/{s/^.*: //; p}' file

, or try also

awk '/202[0-9 :,-]*/ {sub ("^.*" $9, "")} /SELECT/, /quit/' file
2 Likes