$ cat infile
this is spam [i need this] and i need this too
this is spam [i need this][this is spam] and i need this too
$ perl -nwe '$_ =~ /[^\]]+ \[([^\]]+)\]\[?[^\]]*\]? ([^\]\[]+)$/; print "$1 - $2\n";' infile
i need this - too
i need this - and i need this too
I am not sure how many occurrences of these square brackets will show up. At the moment I assume there is at least one, maybe two, but I always need the contents of the first pair and the complete text after the last closing square bracket. As the output for the first line shows, my expression doesn't work.
I'm not familiar with Perl regex, but you could do it in sed by using two substitutions:
$ cat ~/tmp/file.txt
this is spam [i need this] and i need this too
this is spam [i need this][but not this] and i need this too
this is spam [i need this][but not this] [or this ] and i need this too
$ sed 's/^[^[]*\[\([^]]*\)\]/\1/g; s/\[[^]]*\]//g' ~/tmp/file.txt
i need this and i need this too
i need this and i need this too
i need this and i need this too
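To see what each of the two substitutions contributes, here is a small sketch that runs them one at a time on one of the sample lines. The variable names (line, step1, step2) are only for illustration, and the closing bracket is written unescaped, which behaves the same here.

```shell
# One of the sample lines from file.txt above.
line='this is spam [i need this][but not this] and i need this too'

# Step 1: drop everything up to the first '[' and unwrap the first bracket group.
step1=$(printf '%s\n' "$line" | sed 's/^[^[]*\[\([^]]*\)]/\1/')

# Step 2: delete every bracket group that is still left on the line.
step2=$(printf '%s\n' "$step1" | sed 's/\[[^]]*]//g')

printf '%s\n%s\n' "$step1" "$step2"
```

The first line printed still contains [but not this]; the second is the final result, i need this and i need this too.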
@neutronscott:
Because it needs to run on different platforms, including Windows, I will have to use Perl, so I can't use awk. Sorry, I forgot to point this out.
@CarloM:
I am curious whether I can solve it in one expression. Maybe other gods of RegExp can shed some light; I am stuck at the moment. If there is no other way, I will use the two separate statements.
perl -nwe '@a=split(/\[|\]/,$_); print "$a[1] - $a[$#a]\n";' infile
i need this - and i need this too
i need this - and i need this too
i need this - and i need this too
infile
this is spam [i need this] and i need this too
this is spam [i need this][this is spam] and i need this too
this is spam [i need this][this is spam][this is spam too] and i need this too
I decided to use neutronscott's solution, which I understand except the effect of these two expressions:
sed 's/[^[]*[[]\([^]]*\)]\([[][^]]*]\)*\([^]]*\)$/\1 -- \3/g' infile
            ^^^            ^^^
A group which consists of a single square bracket? I would have written the single square bracket without the enclosure but this does not work obviously.
If you understood the meaning of a repetition of a non-matching bracket expression (such as [^[]* which matches zero or more occurrences of any character except [ ), I'm surprised that you didn't understand the meaning of the matching bracket expression [[] which matches one occurrence of the [ character. Similarly, [^]] matches any character other than ] and []] matches a ] .
You have to use the bracket expression [[] or escape the opening bracket \[ to distinguish it as a character to be matched (rather than the start of a bracket expression). In some contexts, you do not need to use a bracket expression or an escape to specify a closing bracket, but the meaning is the same if you use []] and it is symmetric with the [[] if you have an editor that pairs up opening and closing parentheses, braces, and brackets.
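A quick way to convince yourself that the two spellings are equivalent is to run the same substitution both ways. This is only a sketch on a sample line modeled on infile; the variable names are made up for the example.

```shell
# Sample line modeled on the thread's infile.
line='this is spam [i need this] and i need this too'

# Literal brackets written as bracket expressions: [[] and []]
with_brackets=$(printf '%s\n' "$line" | sed 's/[^[]*[[]\([^]]*\)[]].*/\1/')

# Opening bracket escaped as \[ ; the closing bracket needs nothing here
with_escapes=$(printf '%s\n' "$line" | sed 's/[^[]*\[\([^]]*\)].*/\1/')

printf '%s\n%s\n' "$with_brackets" "$with_escapes"
```

Both commands print i need this.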
Ok, I would have written it escaped. So far I never used the grouping to avoid escaping - I was not aware this is an "allowed" usage of the square brackets.
It's now clear to me, thanks.
Both suggestions could be shortened a bit by taking sed's greedy matching into account: \[.*\] matches everything from the first opening square bracket to the last closing one, starting from the point where sed is looking at that moment. Thus:
CarloM's approach, with the two dashes inserted:
sed 's/[^[]*\[\([^]]*\)\]/\1 --/; s/\[.*\]//' file
And NeutronScott's approach:
sed 's/[^[]*[[]\([^]]*\)]\(\[.*\]\)*/\1 --/' file
(the original will fail with more than two square bracket episodes)
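The difference between the greedy .* and the bracket-bounded [^]]* is easy to see side by side. A small sketch, using a made-up <> marker as the replacement so the matched span is visible; the sample line has four bracket groups.

```shell
line='this is spam [i need this][this is spam][another one][last] and i need this too'

# Greedy: .* runs on to the LAST closing bracket on the line.
greedy=$(printf '%s\n' "$line" | sed 's/\[.*]/<>/')

# Bounded: [^]]* cannot cross a closing bracket, so only the first group matches.
first_only=$(printf '%s\n' "$line" | sed 's/\[[^]]*]/<>/')

printf '%s\n%s\n' "$greedy" "$first_only"
```

The greedy form leaves this is spam <> and i need this too, while the bounded form replaces only [i need this] and leaves the other three groups in place. Keep in mind that the greedy form also swallows any ordinary text sitting between two bracket groups.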
Yes, sorry. It's just clever escaping, like one would write ps | grep [s]omething . I began to use this more because in awk, depending on quoting and context, you often need to escape your escapes since they're really processed twice. That gets ugly, so I tend to avoid \ when possible now.
The * after the grouping allows it to repeat. But I see mine doesn't perform correctly in the last two cases.
mute@thedoctor:~$ cat input
this is spam [i need this][this is spam][another one][last] and i need this too
this is spam [i need this] probably everything [here] too
this is spam [i need this] and probably i need ] everything here too?
mute@thedoctor:~$ sed 's/[^[]*[[]\([^]]*\)]\([[][^]]*]\)*\([^]]*\)$/\1 -- \3/g' input
i need this -- and i need this too
this is spam [here -- too
this is spam [i need this] and probably i need ] everything here too?
mute@thedoctor:~$ sed 's/[^[]*[[]\([^]]*\)]\(\[.*\]\)*/\1 --/' input
i need this -- and i need this too
i need this -- probably everything [here] too
i need this -- and probably i need ] everything here too?
I also didn't think of not needing to match & sub the last part. That's definitely shorter.
Note that it isn't just slightly shorter; it also corrects the output to match Zaxxon's requirement. My original produced different output, since it kept any non-leading text that was not inside brackets.