Shell script to split data on a multi-character delimiter containing special characters

Hi Team,

I have a file a1.txt with data as follows.

dfjakjf...asdfkasj</EnableQuotedIDs><SQL><SelectStatement modified='1' type='string'><![CDATA[ SELECT

The delimiter string:

<SelectStatement modified='1' type='string'><![CDATA[
dlm="<SelectStatement modified='1' type='string'><![CDATA["
head -1 a1.txt | awk -F"$dlm" '{print $2}'

The above command does not work when the delimiter consists of multiple characters that include special characters.
Expected output is as follows.

SELECT

Can anyone please help me fix this issue?

Thanks
Krishna

WHAT "is not working"? Any error messages?

If it's about awk not being happy with the field separator try escaping the square brackets:

dlm="<SelectStatement modified='1' type='string'><\!\\[CDATA\\["
awk -F"$dlm" '{print $2}' file
 SELECT
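
A caveat on the escaping, offered as a sketch rather than a definitive fix: inside double quotes the shell collapses \\[ to \[, and awk then applies its own escape processing to the -F value, so the regex can end up with a bare [ again. Keeping the backslashes inside single quotes lets them survive the shell. The sample line below is made up for illustration, and the ! is single-quoted as well so that interactive bash leaves it alone.

```shell
# Made-up sample line; only the delimiter text comes from the thread.
line="junk<SQL><SelectStatement modified='1' type='string'><"'![CDATA[ SELECT'

# The double-quoted part holds the literal single quotes.
# The single-quoted part holds the ! and the \\[ pairs, which awk
# reduces to \[ (a literal bracket in the field-separator regex).
fs="<SelectStatement modified='1' type='string'><"'!\\[CDATA\\['

result=$(printf '%s\n' "$line" | awk -F"$fs" '{print $2}')
printf '%s\n' "$result"
```

The second field keeps the space that follows the delimiter in the sample line.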

Yes, sir, it is still not working. Here are the execution messages.

-sh-4.2$ dlm="<SelectStatement modified='1' type='string'><\!\\[CDATA\\["
-sh-4.2$ head -1 T24CustAuthSignerRlshpToXfmLoad.sql.txt
<?xml version='1.0' encoding='UTF-16'?><Properties version='1.1'><Common><Context type='int'>1</Context><![CDATA[0]]></EnableQuotedIDs><SQL><SelectStatement modified='1' type='string'><![CDATA[SELECT
-sh-4.2$ head -1 T24CustAuthSignerRlshpToXfmLoad.sql.txt | awk -F"${dlm}" '{print $2}'
awk: warning: escape sequence `\!' treated as plain `!'
awk: warning: escape sequence `\[' treated as plain `['
awk: fatal: Unmatched [ or [^: /<SelectStatement modified='1' type='string'><![CDATA[/
-sh-4.2$
-sh-4.2$ uname -a
Linux  3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
-sh-4.2$
dlm="<SelectStatement modified='1' type='string'><![CDATA["
awk 'NR==1 && index($0, dlm) {print substr($0, index($0, dlm) + length(dlm))}' dlm="$dlm" a1.txt

Thank you for your reply rdxtr1. It still did not work.

Here's the error.

-sh-4.2$ dlm="<SelectStatement modified='1' type='string'><![CDATA["
-sh: ![CDATA[": event not found
-sh-4.2$

In bash - when used interactively - you need to turn off history expansion/substitution:

set +H

to keep the shell from interpreting the ! character within double quotes
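
When changing shell options is not desirable, an alternative sketch is to keep the ! inside single quotes, which bash does not history-expand. History expansion is only enabled by default in interactive shells, so this matters at the prompt rather than in a script.

```shell
# Only the part containing ! needs single quotes; the double-quoted
# part keeps the literal single quotes around 1 and string.
dlm="<SelectStatement modified='1' type='string'><"'![CDATA['
printf '%s\n' "$dlm"
```

The two quoted pieces are adjacent, so the shell joins them into one word before the assignment.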

Thank you Scrutinizer. It worked.

Similarly I have another scenario. The last line of the record is as follows.

  FROM XYZ.[dbo].[STG_PRQ_UVW] ]]><ReadStatementFromFile type='bool'><![CDATA[0]]></ReadStatementFromFile><Tables collapsed='1'></Tables><Parameters collapsed='1'></Parameters><Columns collapsed='1'></Columns></SelectStatement><EnablePartitioning collapsed='1' type='bool'><![CDATA[0]]></EnablePartitioning></SQL><Transaction><RecordCount modified='1' type='int'><![CDATA[20000]]></RecordCount><EndOfWave collapsed='1' type='int'><![CDATA[0]]></EndOfWave></Transaction><Session><IsolationLevel type='int'><![CDATA[1]]></IsolationLevel><AutocommitMode type='int'><![CDATA[0]]></AutocommitMode><ArraySize modified='1' type='int'><![CDATA[20000]]></ArraySize><SchemaReconciliation><FailOnSizeMismatch type='bool'><![CDATA[1]]></FailOnSizeMismatch><FailOnTypeMismatch type='bool'><![CDATA[1]]></FailOnTypeMismatch><FailOnCodePageMismatch type='bool'><![CDATA[0]]></FailOnCodePageMismatch></SchemaReconciliation><PassLobLocator collapsed='1' type='bool'><![CDATA[0]]></PassLobLocator><CodePage collapsed='1' type='int'><![CDATA[0]]></CodePage></Session><BeforeAfter collapsed='1' type='bool'><![CDATA[0]]></BeforeAfter><LimitRows collapsed='1' type='bool'><![CDATA[0]]></LimitRows></Usage></Properties >

The output should be as follows.

  FROM XYZ.[dbo].[STG_PRQ_UVW]

We need to look for the code snippet "]]><ReadStatementFromFile type" and strip out the snippet and everything after it, keeping only the text before it.

I tried tweaking the suggested awk command, but it did not work. Can you please help me out?

dlm="<SelectStatement modified='1' type='string'><![CDATA["
dlm2="]]><ReadStatementFromFile type"
awk '
NR==1 && index($0, dlm) {print substr($0, index($0, dlm) + length(dlm))}
index($0, dlm2) {print substr($0, 1, index($0, dlm2)-1)}
' dlm="$dlm" dlm2="$dlm2" a1.txt
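
For anyone who wants to try the idea without the original file, here is a minimal sketch on made-up input. The helper name extract_sql, the sample lines, and the choice to pass the middle lines through unchanged are assumptions for illustration and not part of the commands above. It also assumes the end delimiter appears on the last line of interest, so awk stops there.

```shell
# Delimiters from the thread; the ! is single-quoted so interactive bash
# does not attempt history expansion.
dlm="<SelectStatement modified='1' type='string'><"'![CDATA['
dlm2="]]><ReadStatementFromFile type"

# Hypothetical helper: reads the data on stdin. index() and substr()
# treat both delimiters as plain strings, so nothing needs regex escaping.
extract_sql() {
    awk -v dlm="$dlm" -v dlm2="$dlm2" '
        # Line 1: drop everything up to and including the start delimiter.
        NR == 1 && (p = index($0, dlm)) { $0 = substr($0, p + length(dlm)) }
        # End delimiter found: print the text before it and stop.
        (p = index($0, dlm2))           { print substr($0, 1, p - 1); exit }
        # Any other line is printed as it stands.
        { print }
    '
}

# Made-up three-line sample built from the delimiters themselves.
result=$(printf '%s\n' \
    "<Properties><SQL>${dlm}SELECT" \
    "  col_a" \
    "  FROM XYZ.[dbo].[STG_PRQ_UVW] ${dlm2}='bool'></Properties>" | extract_sql)
printf '%s\n' "$result"
```

With the real file the call would be extract_sql < a1.txt. The last output line keeps the trailing space that precedes ]]> in the data.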