Hello,
I am currently learning sed and have run into some trouble.
I wrote this command:
sed -ne 's/[^-<>]*\([-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*\).*\([-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*\).*/\1\2/p'
and some of the output I get is:
->stockholm->paris<-stockholmpi<-tokyo->paris<-stockholmpi
->stockholm<-stockholm->tokyo<-tokyo<-paris->stockholmtao
->paris<-stockholm<-tokyo<-paris<-tokyo->stockholm
<-tokyo<-stockholm->tokyo<-tokyo->stockholm->paris
As you can see, the output does not always end with stockholm, paris or tokyo, because my pattern still matches the extra letters that follow the city. How would I change my pattern to avoid this?
I tried (stockholm|tokyo|paris), but then I don't get the last city (stockholmpi, for example, which should be stockholm only).
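A small sketch of why both attempts behave this way, assuming a sed that accepts -E for extended regular expressions (GNU and BSD sed do). The variable names and the one-line sample input are made up for illustration. Square brackets define a set of single characters, so [stockholm,paris,tokyo]* matches any run of those letters and commas. In basic regular expressions an unescaped ( | ) is taken literally, which is probably why that attempt found nothing:

```shell
# Hypothetical sample input: one arrow, one city, two stray letters.
sample='->stockholmpi'

# Bracket expression: a set of characters, so "p" and "i" are consumed too.
bracket=$(printf '%s\n' "$sample" | sed -n 's/^->\([stockholm,paris,tokyo]*\).*/\1/p')

# Extended regex (-E): ( | ) is an alternation between whole city names.
alternation=$(printf '%s\n' "$sample" | sed -nE 's/^->(stockholm|paris|tokyo).*/\1/p')

# Basic regex without escaping: ( | ) are literal characters, so nothing matches.
literal=$(printf '%s\n' "$sample" | sed -n 's/->(stockholm|paris|tokyo)/X/p')

echo "bracket:     $bracket"      # bracket:     stockholmpi
echo "alternation: $alternation"  # alternation: stockholm
echo "literal:     [$literal]"    # literal:     []
```

In basic mode, GNU sed also accepts the escaped form \(stockholm\|paris\|tokyo\), but \| is an extension there, so -E is likely the more portable spelling.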
EDIT: Here is some of the data I use:
Wed3.14153<-paris<-stockholm->tokyo'->paris<-stockholm->parisphi$$
fubartao<-tokyo<-stockholm<-tokyoJul->paris->tokyo<-parisRed3.14153
$chi<-tokyo<-paris<-stockholmMar->tokyo<-stockholm->tokyoGreen
Feb3.14153<-tokyo->tokyo<-parisBLACK<-paris<-tokyo->tokyoMar
1011102.8<-stockholm<-tokyo<-tokyoblah<-stockholm<-stockholm<-tokyo3.14153001111
taoBLACK<-tokyo->paris->paris ->stockholm<-paris->stockholmThu3.14153
MayJun<-paris->paris<-stockholmSun->stockholm->tokyo->stockholm011011Green
NILLNULL->tokyo<-paris<-parisSep->stockholm->tokyo<-parisJunFri
AugFeb->stockholm<-stockholm->parisBLACK<-tokyo<-paris<-tokyoVOIDpi
<-paris->paris->parisfoo->stockholm->paris->stockholm$NULL
chi3.14153<-paris<-paris<-tokyofoo<-stockholm<-paris->stockholm`100110
foo$$<-tokyo<-stockholm<-stockholm101101<-paris<-tokyo<-tokyo"Purple
fubarPurple->tokyo<-paris->paris ->tokyo<-paris<-tokyo`3.14
BlueMay->paris->stockholm<-stockholmVOID->stockholm->paris<-tokyoYellowphi
0101002.8<-tokyo->paris<-tokyotao<-tokyo<-tokyo->stockholmfooNULL
RedWed->paris->paris<-stockholmNILL<-tokyo<-paris->tokyoPurple
100100$$$->paris->paris<-tokyo001011<-paris->paris->tokyoMonSep
Jan010001->paris->paris<-stockholmAug->tokyo<-paris->stockholmPurpleSep
->paris->paris<-tokyoblah<-stockholm<-stockholm<-paris010001tao
Purplefubar->stockholm<-paris->tokyoDec->paris->stockholm->tokyo$3.1415
010001->paris<-stockholm->tokyoVOID->tokyo<-stockholm<-tokyoMarFeb
SunFri->tokyo->paris<-tokyoJan->paris<-stockholm->tokyoWHITEMon
EDIT After RudiC's post:
Okay, so the logic behind this pattern is:
- It starts with a '->' or a '<-' followed by a city, for example: ->tokyo.
- After the city comes another arrow followed by another city, for example: ->tokyo->paris.
- Then again an arrow followed by a city, for example: ->tokyo->paris<-tokyo.
- Then some random text comes in between. If you look at the last line in the data I've posted, you can see that after "->tokyo->paris<-tokyo" comes "Jan", which is random text that we don't want.
- Then we meet our pattern again, the same pattern as before.
This is the ideal result: ->tokyo->paris<-tokyo->paris<-stockholm->tokyo
Which I do get on this specific line, but on some other lines I get output like this:
->stockholm->paris<-stockholmpi<-tokyo->paris<-stockholmpi
Here the third city and the last city each have two extra letters (pi). That is because in my pattern I write:
[stockholm,paris,tokyo]*
which matches any run of those letters, so the trailing 'p' and 'i' (both letters of paris) are matched as well.
Now, how would I force sed to choose between the exact strings I provided, namely stockholm, paris and tokyo?
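One way this could look, as a sketch rather than a definitive answer: use an alternation for every city, not a bracket expression. It assumes sed supports -E. The variables arrow, city and triple and the function extract are hypothetical names introduced here only to keep the pattern readable, and the two input lines are taken from the data above.

```shell
# Hypothetical helper variables (not from the original command).
arrow='[-><]{2}'                  # same arrow matcher as in the post
city='(stockholm|paris|tokyo)'    # alternation: exactly one whole city name
triple="$arrow$city$arrow$city$arrow$city"

# Group numbering: \1 is the first triple (its three cities are \2 to \4),
# \5 is the second triple.
extract() {
    sed -nE "s/[^-<>]*($triple).*($triple).*/\1\5/p"
}

printf '%s\n' \
    'SunFri->tokyo->paris<-tokyoJan->paris<-stockholm->tokyoWHITEMon' \
    '0101002.8<-tokyo->paris<-tokyotao<-tokyo<-tokyo->stockholmfooNULL' |
    extract
# prints:
# ->tokyo->paris<-tokyo->paris<-stockholm->tokyo
# <-tokyo->paris<-tokyo<-tokyo<-tokyo->stockholm
```

Because each city is now a capture group of its own, the second triple is \5 rather than \2. With a real file the call would be something like extract < yourfile.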
EDIT: Solved it by using parentheses. Here is the solution:
sed -ne 's/[^-<>]*\([-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*\).*\([-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}\(stockholm\|paris\|tokyo\)\{1\}\).*/{Phil}2053,\1{5872Phil}\2[->->]/p' datasets/q14target.txt