Extract some characters from lines based on pattern

mad_man · May 17, 2017, 7:34am

Hi All,

I would like some help extracting certain characters from a grepped line.

blahblah{1:F01IRVTUS30XXXX0000000001}{2:I103IRVTDEF0XXXXN}{4:blah
blahblah{1:F01IRVTUS30XXXX0000000001}{2:I103IRVTDEF0XXXXN}{4:blah
blahblah{1:F01IRVTUS30XXXX0000000001}{2:I103IRVTDEF0XXXXN}{4:blah

In the above snapshot I am going to grep for the tag '{1:F', and I want to get the 3 characters after the tag '{2:I', that is, the value '103'.

The tags '{1:F' and '{2:I' are fixed, so my goal is simply to get the 3 characters that follow '{2:I'.

Either sed or awk would be preferable.

I tried something like

 awk '/{2:I/{print $NF}'

sed -n -e '/{1:F/ s/.*\{2:I *//p'

But neither was fruitful.

Note : I am using AIX 6.0 version UNIX.

Thanks in advance

bakunin · May 17, 2017, 8:19am

Is the format of the lines fixed? (That is, will "{1:F" and "{2:I" always appear at a certain column?) If so, you won't even need to search for it.

First, you won't need grep because sed can do everything grep can do too:

sed -n '/{1:F/p' /your/file

If you need "{1:F" to appear on a certain column, modify the line like this:

sed -n '/^.\{xx\}{1:F/p' /your/file

where "xx" is the number of characters before the search string.

In the same way you output the characters in question via a so-called "back-reference":

sed -n '/^.\{xx\}{1:F/ s/^\(.\{yy\}{2:I\)\(...\).*/\2/p;' /your/file

Again, enter sensible integer values for "xx" and "yy".
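As a worked example of the fixed-column approach, here is a minimal sketch against the sample line from the first post. The offsets are derived by counting that one sample and are an assumption about the real data: "blahblah" is 8 characters and the {1:...} block is 29 characters, so "xx" would be 8 and "yy" would be 37. Note that the two values differ, because each counts the characters before its own tag.

```shell
# Sample line copied from the first post.
sample='blahblah{1:F01IRVTUS30XXXX0000000001}{2:I103IRVTDEF0XXXXN}{4:blah'

# xx = 8  : characters before {1:F
# yy = 37 : characters before {2:I (8 + 29)
result=$(printf '%s\n' "$sample" |
    sed -n '/^.\{8\}{1:F/ s/^\(.\{37\}{2:I\)\(...\).*/\2/p')

echo "$result"    # prints 103
```

If either tag moves, both offsets have to be recounted, which is the main drawback of matching by column.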

I hope this helps.

bakunin

mad_man · May 17, 2017, 8:40am

Hi,

Thanks for your reply. Actually, the tags I mentioned are at fixed positions for now, but nobody knows when their positions might change for future requirements. So I want to fetch the value based on the {2:I tag only.

Thanks.

Aia · May 17, 2017, 9:29am

 perl -nle '/\{2:I(\d{3})/ and print $1' mad_man.example

mad_man · May 17, 2017, 10:27am

Hi,

I am going to try both the solutions tomorrow and will let you know the results.

Thanks for the reply.

RudiC · May 17, 2017, 4:35pm

Try also

awk 'match ($0, /{1:F.*2:I.../) {print substr ($0, RSTART+RLENGTH-3,3)}' file
103
103
103

RavinderSingh13 · May 17, 2017, 4:46pm

Hello mad_man,

Could you please try following and let me know if this helps you.
Solution 1st:

awk '/1:F.*2:I/{sub(/.*2:I/,"");print substr($0,1,3)}'   Input_file

Solution 2nd:

awk '/1:F.*2:I/{print substr($0,index($0,"2:I")+3,3)}'   Input_file

Thanks,
R. Singh

MadeInGermany · May 18, 2017, 1:49am

In sed you need to mark the wanted 3 chars in a \( \) group and restore them via a back reference.

sed -n '/{1:F/ s/.*{2:I\(...\).*/\1/p'
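A quick check of this tag-based form against the sample line from the first post might look like the sketch below. The second command is a hypothetical variation, not part of the suggestion above: it only accepts three digits after the tag. In a basic regular expression a bare { is a literal brace, while \{ opens an interval such as \{3\}, which is likely why the earlier attempt with \{2:I did not match.

```shell
# One tagged line from the first post plus a line without any tags.
sample='blahblah{1:F01IRVTUS30XXXX0000000001}{2:I103IRVTDEF0XXXXN}{4:blah'

# Position-independent: capture any 3 characters that follow {2:I.
result=$(printf '%s\n' "$sample" 'no tags on this line' |
    sed -n '/{1:F/ s/.*{2:I\(...\).*/\1/p')

# Variation (assumption: the wanted value is always numeric).
digits_only=$(printf '%s\n' "$sample" |
    sed -n '/{1:F/ s/.*{2:I\([0-9]\{3\}\).*/\1/p')

echo "$result"         # prints 103
echo "$digits_only"    # prints 103
```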

mad_man · May 18, 2017, 9:44am

bakunin:

Is the format of the lines fixed? (That is, will "{1:F" and "{2:I" always appear at a certain column?) If so, you won't even need to search for it.

First, you won't need grep because sed can do everything grep can do too:
sed -n '/{1:F/p' /your/file
If you need "{1:F" to appear on a certain column, modify the line like this:
sed -n '/^.\{xx\}{1:F/p' /your/file
where "xx" is the number of characters before the search string.

In the same way you output the characters in question via a so-called "back-reference":
sed -n '/^.\{xx\}{1:F/ s/^\(.\{yy\}{2:I\)\(...\).*/\2/p;' /your/file
Again, enter sensible integer values for "xx" and "yy".

I hope this helps.

bakunin

Hi, I tried this command as shown below:

document_type=`sed -n '/^.\{30\}{1:F/ s/^\(.\{30\}{2:I\)\(...\).*/\2/p;' $eachfile`

eachfile holds the path of the file. '30' was the position of the required value tag {2:I. This gave me an empty value in the document type field. I am not sure whether the values I entered in the command are correct.

Thanks for your reply.

---------- Post updated at 06:30 PM ---------- Previous update was at 06:28 PM ----------

Hi,

I tried the command below and it gave me exactly the value '103' I was looking for.

document_type=`perl -nle '/\{2:I(\d{3})/ and print $1' $eachfile | sort | uniq | sed -e "s/^[ ]*//g" | sed -e "s/[ ]*$//g"`

Thanks.

---------- Post updated at 07:14 PM ---------- Previous update was at 06:30 PM ----------

Hi,

I tried the command below and it gave me exactly the value '103' I was looking for.

document_type=`awk 'match ($0, /{1:F.*2:I.../) {print substr ($0, RSTART+RLENGTH-3,3)}' $eachfile | sort | uniq | sed -e "s/^[ ]*//g" | sed -e "s/[ ]*$//g"`

Thanks.

RudiC · May 18, 2017, 9:50am

Why the loooong pipe? If you want to remove leading spaces, do so in awk. If you want to remove trailing spaces, do so in awk. If you want unique values, do so in awk.
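One way this could look, as a rough sketch rather than the only option: a single awk call that extracts the value and prints each distinct value once, so sort, uniq and the two sed calls are not needed. The variable names (line, i, v, seen) are made up for illustration, and the three input lines are the samples from the first post.

```shell
line='blahblah{1:F01IRVTUS30XXXX0000000001}{2:I103IRVTDEF0XXXXN}{4:blah'

document_type=$(printf '%s\n' "$line" "$line" "$line" | awk '
    # index() does a plain string search, so the braces need no escaping
    index($0, "{1:F") && (i = index($0, "{2:I")) {
        v = substr($0, i + 4, 3)    # the 3 characters after "{2:I"
        if (!(v in seen)) {         # print each value only once
            seen[v] = 1
            print v
        }
    }')

echo "$document_type"    # prints 103
```

Because substr() returns exactly 3 characters, there is no surrounding whitespace to trim. Unlike sort | uniq, this keeps values in first-seen order rather than sorted order.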

mad_man · May 18, 2017, 9:56am

ravindersingh13:

Hello mad_man,

Could you please try following and let me know if this helps you.
Solution 1st:
awk '/1:F.*2:I/{sub(/.*2:I/,"");print substr($0,1,3)}'   Input_file
Solution 2nd:
awk '/1:F.*2:I/{print substr($0,index($0,"2:I")+3,3)}'   Input_file
Thanks,
R. Singh

Hi

I tried both of these commands and got the required value '103'.

Thanks.

---------- Post updated at 07:26 PM ---------- Previous update was at 07:21 PM ----------

Hi All,

Thanks for your help. I got the solutions.

Thanks a ton.