Hello,
My initial thought was that this would be straightforward, and that something like this one line of awk would work:
$ cat test.txt
1234 84 test1 file:about user_detials(bankUser) :jpeg 10 40 xys
1356 2 branches dir:list of files(dirlisting:1) :directory 20 80 abc
$ awk '{ gsub("\\(", " ") ; gsub("\\)", " ") ; print $1,$2,$3,$6,$8,$9,$10}' test.txt
1234 84 test1 bankUser 10 40 xys
1356 2 branches files :directory 20 80
$
So the idea is that we use awk, first replacing the brackets with spaces, and then do a straightforward print of the fields we need.
But as you'll have seen, the output we get doesn't quite match what you want. It turns out that the reason for this is that the number of fields is inconsistent, and changes from one line of your test data to the other.
To be specific, the first line has nine fields, whereas the second line has ten:
$ head -1 test.txt | awk '{print NF}'
9
$ tail -1 test.txt | awk '{print NF}'
10
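To see exactly where the fields shift, here is a small sketch that reports, for each line, the field count and the position of the field containing the opening bracket. The function name show_bracket_field is just something I made up for illustration, and the sample data is fed in through a here-document so the snippet runs on its own:

```shell
#!/bin/bash
# Hypothetical helper (not part of the original one-liner): for every input
# line, report NF and the number of the field that contains "(".
show_bracket_field() {
    awk '{
        for (i = 1; i <= NF; i++)
            if (index($i, "(") > 0)
                print "line " NR ": NF=" NF ", bracketed field is $" i
    }' "$@"
}

# Same two lines as test.txt above
show_bracket_field <<'EOF'
1234 84 test1 file:about user_detials(bankUser) :jpeg 10 40 xys
1356 2 branches dir:list of files(dirlisting:1) :directory 20 80 abc
EOF
```

On the sample data this reports the bracketed field as $5 on the first line and $6 on the second, because "dir:list of files" contains one more space than "file:about user_detials". Every field after that point is shifted by one, which is why fixed field numbers pick up the wrong values on the second line.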
Others more familiar with awk than I am may well be able to come up with an elegant one-line way to deal with this, but personally I think you have to make sure that spaces in your data are used strictly as field separators, or at least that any fields containing spaces are quoted.
The following script would seem to do what you need:
$ cat script.sh
#!/bin/bash
file=test.txt
while read -r line
do
    # Text between the brackets
    type1=$(echo "$line" | awk -F'(' '{print $2}' | awk -F')' '{print $1}')
    # Everything after the closing bracket, split on whitespace
    num1=$(echo "$line" | awk -F')' '{print $2}' | awk '{print $2}')
    num2=$(echo "$line" | awk -F')' '{print $2}' | awk '{print $3}')
    text1=$(echo "$line" | awk -F')' '{print $2}' | awk '{print $4}')
    # Quote every expansion so empty or unusual values cannot break the awk call
    echo "$line" | awk -v type1="$type1" -v num1="$num1" -v num2="$num2" -v text1="$text1" '{print $1,$2,$3,type1,num1,num2,text1}'
done < "$file"
$ ./script.sh
1234 84 test1 bankUser 10 40 xys
1356 2 branches dirlisting:1 20 80 abc
$
So we use the field separators we can always count on (the brackets, or so I'm hoping!) to strip out the fields we need one at a time, and then we build up our awk line at the end.
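For what it's worth, the same bracket-based idea can probably be squeezed into a single awk call, which avoids starting several processes per line. This is only a sketch and rests on the same assumption as the script: every line contains exactly one pair of brackets, with three wanted fields before it and one unwanted field plus three wanted fields after it. The function name extract_fields is my own invention for illustration:

```shell
#!/bin/bash
# Hypothetical helper: split each line on "(" and ")", then split the outer
# pieces on whitespace. Assumes exactly one bracket pair per line.
extract_fields() {
    awk -F'[()]' '{
        split($1, pre, " ")    # text before "(": the first three words are wanted
        split($3, post, " ")   # text after ")": skip word 1, keep words 2 to 4
        print pre[1], pre[2], pre[3], $2, post[2], post[3], post[4]
    }' "$@"
}

# Same two lines as test.txt above
extract_fields <<'EOF'
1234 84 test1 file:about user_detials(bankUser) :jpeg 10 40 xys
1356 2 branches dir:list of files(dirlisting:1) :directory 20 80 abc
EOF
```

On the sample data this prints the same two lines as script.sh. Since the function passes its arguments through to awk, it can also be given a file name, as in extract_fields test.txt.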
Hope this helps. If it doesn't, let me know in what respects it fails and we can take things from there. The main thing with this type of problem is to make your input as consistently and reliably formatted as you possibly can.