Extract the text between the nth occurrence of square brackets

Subhadeep_Sahu · June 12, 2013, 10:14am

Please can someone help with this?

I have a file with lines as follows:

word1 word2 word3 [11231] word4 word5 word6 [242] word7 word8
word1 word2 word3 [34534] word4 word5 word6 [34] word7 word8
word1 word2 word3 [345] word4 word5 word6 [2354] word7 word8
word1 word2 word3 [2342354123] word4 word5 word6 [23441] word7 word8

When I use the command

perl -lne 'print $1 while (/\[(.*?)\]/g)'

I get the results as follows.

11231
242
34534
34
345
2354
2342354123
23441

How do I modify my script to get only the numbers inside the first set of square brackets, giving an output as follows?

11231
34534
345
2342354123

Also, how can I get only the numbers inside the second set of square brackets, giving an output as follows?

242
34
2354
23441

It isn't necessary to use Perl; sed, awk, or anything else would do.

Yoda · June 12, 2013, 10:25am

Using awk:

awk -F'[][]' -v n=1 '{ print $(2*n) }' file

Change the value of n as required (1 for the first set, 2 for the second set, and so on).
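
Why this works: the field separator [][] is a bracket expression that matches either [ or ], so the text outside the brackets lands in the odd-numbered fields and each bracketed value lands in an even-numbered field. The nth value is therefore field 2*n. A minimal sketch, assuming an awk that accepts -v and using two of the sample lines held in a shell variable (the variable names are only illustrative):

```shell
# Two of the sample lines from the question, kept in a variable for the demo.
sample='word1 word2 word3 [11231] word4 word5 word6 [242] word7 word8
word1 word2 word3 [345] word4 word5 word6 [2354] word7 word8'

# Dump the fields of the first line to see how the split falls:
# odd fields hold the text outside the brackets, even fields the bracket contents.
printf '%s\n' "$sample" |
    awk -F'[][]' 'NR == 1 { for (i = 1; i <= NF; i++) printf "%d=<%s>\n", i, $i }'

# Print the nth bracketed value of every line.
first=$(printf '%s\n' "$sample" | awk -F'[][]' -v n=1 '{ print $(2*n) }')
second=$(printf '%s\n' "$sample" | awk -F'[][]' -v n=2 '{ print $(2*n) }')
echo "$first"
echo "$second"
```

If a line has fewer than n bracketed values, $(2*n) is an empty field and awk prints an empty line for it.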

jim_mcnamara · June 12, 2013, 10:32am

If your example is representative, meaning the first [123] is always field 4 and the second [123] is always field 8:

tr  -d '[:punct:]' < inputfile | awk '{print $4 > "outfile1";print $8 > "outfile2"}'

This creates two files: outfile1 with the values from the first set of brackets, and outfile2 with the values from the second set.

Subhadeep_Sahu · June 12, 2013, 11:14am

Thanks Yoda. I am getting the following error.
awk: syntax error near line 1
awk: bailing out near line 1
Any clue what that means?

Sorry Jim, the fields in square brackets are not in the same position on every line.

pamu · June 12, 2013, 11:22am

Try this:

Use var=1 to print the values in the first set of square brackets.
Use var=2 to print the values in the second set of square brackets.

nawk -F "[][]" -v var="2" '{print $(var*2)}'  file

Yoda · June 12, 2013, 11:26am

On SunOS or Solaris, use /usr/xpg4/bin/awk instead. The default /usr/bin/awk there is the old awk, which does not support the -v option.
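
For a script that has to run on Solaris as well as on other systems, one option is to choose the awk binary at run time. A minimal sketch: the variable name AWK is only illustrative, and the fallback assumes that whatever awk is found on PATH elsewhere accepts -v.

```shell
# Prefer the XPG4 awk where it exists (Solaris/SunOS); otherwise use the awk on PATH.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
else
    AWK=awk
fi

# One sample line from the question; n=2 selects the second bracketed value.
line='word1 word2 word3 [11231] word4 word5 word6 [242] word7 word8'
result=$(printf '%s\n' "$line" | "$AWK" -F'[][]' -v n=2 '{ print $(2*n) }')
echo "$result"
```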

Subhadeep_Sahu · June 12, 2013, 11:33am

Thanks a lot, Yoda. It's working now.

elixir_sinari · June 12, 2013, 12:57pm

Using Perl:

perl -lne 'print +(/\[(.*?)\]/g)[0]' file

Change the subscript to print the required match. Remember that array/list subscripting begins with 0.