Grep for multiple strings

ahfze · February 9, 2018, 3:09am

I'm learning grep and I need some help.

I have an input word file like this:

fish map.this is
go to.that is

I want a sed, awk, or grep command to extract the following, in this format:

someword SPACE someword.

So the output will be:

fish map.
go to.

RudiC · February 9, 2018, 3:19am

No hard criteria given... Try:

sed 's/\..*$/./' file
fish map.
go to.
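Here is a minimal sketch for trying this without creating a file. It assumes a POSIX shell and any sed, and `trim_after_first_stop` is a hypothetical helper name, not part of RudiC's answer. In the regex, `\.` matches the first literal full stop, `.*$` matches everything after it, and the whole run is replaced by a single `.`:

```shell
#!/bin/sh
# Hypothetical helper wrapping RudiC's substitution: keep each line up to and
# including its first full stop. Lines without a full stop pass through
# unchanged, because sed prints every line by default.
trim_after_first_stop() {
  sed 's/\..*$/./'
}

# Sample lines from the first post, piped in instead of read from a file.
printf '%s\n' 'fish map.this is' 'go to.that is' | trim_after_first_stop
```

Because unmatched lines are printed as-is, any line without a full stop survives untouched. That is what happens with the longer input in the next post.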

ahfze · February 9, 2018, 3:26am

Thanks.
OK, I'll be a bit more specific.
The input is a huge text file with lots of sentences.

fish map.this is (Kubernetes, Mesos DC/OS, and Docker Swarm) and Azure Service Fabric are indispensable for any
production-ready microservice-based application and for any multi-container applica
go to.that is needs are moving you toward complex containerized apps, you will find it useful to seek out
additional resources for learning more about orchestrators

Now I want the output to be:

fish map.
go to.

With your command, the output is now:

fish map.
production-ready microservice-based application and for any multi-container applica
go to.
additional resources for learning more about orchestrators

Can you do anything about this?

RudiC · February 9, 2018, 3:50am

You see the importance of a careful, detailed, and precise specification. Try:

sed -n '/\..*$/s//./p' file

or even

sed -n 's/\..*$/./p' file
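Here is a sketch of the `-n`/`p` variant under the same assumptions. `keep_up_to_first_stop` is a hypothetical name, and the sample lines are abridged from the second input. `-n` turns off automatic printing, and the `p` flag prints only the lines where the substitution succeeded:

```shell
#!/bin/sh
# Hypothetical helper: -n disables automatic printing and the p flag prints
# only the lines where the substitution succeeded, so lines without a full
# stop are dropped instead of passed through.
keep_up_to_first_stop() {
  sed -n 's/\..*$/./p'
}

# Abridged lines from the second sample input.
printf '%s\n' \
  'fish map.this is (Kubernetes, Mesos DC/OS, and Docker Swarm)' \
  'production-ready microservice-based application' \
  'go to.that is needs are moving you toward complex containerized apps' \
  'additional resources for learning more about orchestrators' |
  keep_up_to_first_stop
```

The first form, `sed -n '/\..*$/s//./p'`, behaves the same way. An empty regex in `s//./` reuses the last regex applied, which here is the address `/\..*$/`.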

abdulbadii · February 9, 2018, 8:52am

Use the extended regular expression option (-E):

sed -E 's/^(\w+)\s+(\w+\.).*$/\1 \2/g' file
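`\w` and `\s` are GNU sed extensions. The sketch below is a possibly more portable version of the same substitution. It assumes a sed that accepts `-E` (GNU and BSD sed do) and uses POSIX bracket classes instead of the escapes. `first_two_words` is a hypothetical helper name. The `g` flag is dropped because a `^`-anchored pattern can only match once per line:

```shell
#!/bin/sh
# Hypothetical helper: portable variant of the sed -E command above.
# [[:alnum:]_] stands in for \w and [[:space:]] for \s (GNU-only escapes).
# Lines that do not match are still printed unchanged (no -n).
first_two_words() {
  sed -E 's/^([[:alnum:]_]+)[[:space:]]+([[:alnum:]_]+\.).*$/\1 \2/'
}

printf '%s\n' 'fish map.this is' 'go to.that is' | first_two_words
```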

ahfze · February 12, 2018, 11:36am

Guys, I tried all the commands you gave me, but they fail if the input file is a bit long and complex (longer sentences).

I am trying this input:

A circumferential abdominoplasty is an extended abdominoplasty plus a d lift. The resulting scar runs all the way around the body, and the operation is also called a Belt Lipectomy or lower body lift. This operation is most appropriate for patients who have undergone massive weight loss.
      -only Pharmacokinetic dataProtein binding
Esterases, CYP3A4, SULT2A1[1]Biological half-life
12 ± 5 hours[1]Excretion

The output should be:

d lift.
body lift.
weight loss.

Please note the output should be in the format:

someword SPACE someword.

Can you guys help?

RudiC · February 12, 2018, 11:47am

The commands don't fail, especially NOT for long or complex lines; they behave exactly as specified previously. This is the second time you've changed your mind. Will this now be the final version? Phrased like: find the second-to-last and last word before any full stop and print them separated by a space character.
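The original question also allowed awk, so here is a sketch of that restated spec in POSIX awk, wrapped in a hypothetical shell function `pairs_awk`. It assumes a "word" is any run of non-space characters. It calls `match()` repeatedly to find the next "word word." on the line:

```shell
#!/bin/sh
# Hypothetical helper: print each "word word." pair ending at a full stop.
# A "word" is assumed to be any run of non-space characters.
pairs_awk() {
  awk '{
    s = $0
    # match() sets RSTART and RLENGTH for the leftmost match, or returns 0.
    while (match(s, /[^ ]+ [^ ]+\./)) {
      print substr(s, RSTART, RLENGTH)
      # Keep scanning from just after the full stop.
      s = substr(s, RSTART + RLENGTH)
    }
  }'
}

# Usage on a real file would be: pairs_awk < file
printf '%s\n' 'Belt Lipectomy or lower body lift. This operation is most appropriate for patients who have undergone massive weight loss.' | pairs_awk
```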

ahfze · February 12, 2018, 12:18pm

Yes, this is perfect.
Sorry, I'm not changing my mind, just trying to get the correct command.
I'm new here, so apologies for any inconvenience caused.

RudiC · February 12, 2018, 12:22pm

I can only repeat what I said in post #4. Your chances of getting reasonable help increase with the amount of carefully gathered detail in your spec.
Try:

sed -rn 's/([^ ]* [^ ]*\.)/\n\n\1\n/; T; s/\n$//; s/^.*\n\n//; P; D;' file
d lift.
body lift.
weight loss.
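The one-liner above is dense. Here is the same GNU sed program laid out one command per line with comments, wrapped in a hypothetical shell function `pairs_sed`. It still needs GNU sed, because `-r`, `\n` in the replacement, and `T` are GNU extensions:

```shell
#!/bin/sh
# Hypothetical wrapper around RudiC's GNU sed program, one command per line.
pairs_sed() {
  sed -rn '
    # Surround the first "word word." with markers: two newlines before, one after.
    s/([^ ]* [^ ]*\.)/\n\n\1\n/
    # No successful substitution since the last input line was read:
    # jump to the end of the script, and -n means nothing is printed.
    T
    # Remove the trailing marker when the match ended the pattern space.
    s/\n$//
    # Remove everything up to and including the two marker newlines.
    s/^.*\n\n//
    # Print up to the first newline, which is the matched pair.
    P
    # Delete through the first newline and rerun the script on the rest
    # without reading new input; with no newline left, act like d.
    D
  '
}

printf '%s\n' 'Belt Lipectomy or lower body lift. This operation is most appropriate for patients who have undergone massive weight loss.' | pairs_sed
```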

ahfze · February 12, 2018, 12:26pm

Yes, this is exactly what I want.
Sorry about the confusion, and thanks again for helping.

---------- Post updated at 12:26 PM ---------- Previous update was at 12:22 PM ----------

OK, checking the command now.

abdulbadii · February 12, 2018, 6:10pm

@ahfze, because your first example had the words at the start of the line and you then changed that, my sed command no longer works. Use a grep PCRE regex instead:

$ echo 'A circumferential abdominoplasty is an extended abdominoplasty plus a d lift. The resulting scar runs all the way around the body, and the operation is also called a Belt Lipectomy or lower body lift. This operation is most appropriate for patients who have undergone massive weight loss.' >string

$ grep -Po -e '(\w+)\s+(\w+\.)' string
d lift.
body lift.
weight loss.
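`-P` is specific to GNU grep. Here is a sketch of a possibly more portable equivalent, assuming a grep with `-o` (GNU and BSD grep both have it). `pairs_grep` is a hypothetical helper name. POSIX classes stand in for `\w` and `\s`. The capture groups are dropped because `-o` prints the whole match anyway:

```shell
#!/bin/sh
# Hypothetical helper: print every "word word." match on its own line.
# -E selects POSIX extended regexes; [[:alnum:]_] and [[:space:]] replace
# the PCRE escapes \w and \s, so GNU-specific -P is not required.
pairs_grep() {
  grep -Eo '[[:alnum:]_]+[[:space:]]+[[:alnum:]_]+\.'
}

printf '%s\n' 'Belt Lipectomy or lower body lift. This operation is most appropriate for patients who have undergone massive weight loss.' | pairs_grep
```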

ahfze · February 13, 2018, 3:26am

Thanks, abdulbadii.
The grep command you provided works better.
Much appreciated.