How to delete everything to the left (or right) of a substring?

Hello,

I have a list of lines from which I am trying to pick a substring and write it to a CSV file. The substring I want to extract is in the middle of each line, so I was wondering how I can delete everything to the left and right of it. I have tried sed, cut and awk, but couldn't get the desired results.

Below are a few of the lines; from each one I am trying to pick the PXCNUMBER_1 part and append it to a CSV file.

> grep -hrs "PROTOBUF_DIRS" --include=*.{spec,mk} $REPOROOT | grep PXC | sort | uniq                              
PROTOBUF_DIRS += $(GIT_TOP)/AIS_MSGS_PXC2010286_1/inc
PROTOBUF_DIRS += $(GIT_TOP)/C2RCI_PXC1106956_1/ifModel
PROTOBUF_DIRS += $(GIT_TOP)/FRUPLI_PXC1107046_1/ifModel
PROTOBUF_DIRS += $(GIT_TOP)/GPBEXTENSIONS_PXC2010263_1/ifModel
PROTOBUF_DIRS += $(GIT_TOP)/ICEUI_PXC2010238_1/ifModel
PROTOBUF_DIRS += $(GIT_TOP)/L1PMI_PXC1107130_1/inc
PROTOBUF_DIRS += $(GIT_TOP)/UHLI_PXC2010327_1/ifModel
PROTOBUF_DIRS += $(GIT_TOP)/URI_PXC2010247_1/ifModel

Using awk and cut I got the result below. Because the lines are not consistent, I failed to pick out PXCNUMBER_1 precisely: as you can see, the last line of the output still has MSGS_ in front of PXCNUMBER_1.

> grep -hrs "PROTOBUF_DIRS" --include=*.{spec,mk} $REPOROOT | grep PXC | awk -F '/' '{print $2}' | cut -d"_" -f2- | sort | uniq
PXC1106956_1
PXC1107046_1
PXC1107130_1
PXC2010238_1
PXC2010247_1
PXC2010263_1
PXC2010327_1
MSGS_PXC2010286_1

The desired result is:

PXC1106956_1
PXC1107046_1
PXC1107130_1
PXC2010238_1
PXC2010247_1
PXC2010263_1
PXC2010327_1
PXC2010286_1

I know I could use sed to replace MSGS_ in the last line with an empty string, but there are more lines than I have shown above, and I was also curious whether there is a command to cut everything to the left/right of a substring. Any pointers would be a great help :)

Linux distribution: SUSE
Shell: Bash

Thank you!

How about

awk '/PROTOBUF_DIRS/ && match ($0, /PXC[^_]*_1/) {print substr ($0, RSTART, RLENGTH)}' *.spec *.mk

awk unfortunately lacks both the -r (--recursive) and the --include options that grep provides, but find could help in either case.
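One way the find part could look, as a rough sketch only: find does the recursion and the file-name filtering, then hands the files straight to awk. The function name extract_pxc is made up for illustration, and the start directory is whatever you pass in.

```shell
# Sketch: find replaces grep's -r and --include, awk does the extraction.
# extract_pxc is a hypothetical helper name; $1 is the directory to search.
extract_pxc() {
    find "$1" -type f \( -name '*.spec' -o -name '*.mk' \) \
        -exec awk '/PROTOBUF_DIRS/ && match($0, /PXC[^_]*_1/) {
            # Print only the matched part, dropping everything left and right of it
            print substr($0, RSTART, RLENGTH)
        }' {} +
}
```

Called as extract_pxc "$REPOROOT" | sort | uniq it should give the same kind of list as the grep pipeline. The "{} +" form passes many files to a single awk invocation, and awk is not started at all if find matches nothing.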

@Rudic

Thank you, it worked.

find . -type f \( -name \*.spec -o -name \*.mk \) | xargs grep -E 'PROTOBUF_DIRS|PXC' |awk '/PROTOBUF_DIRS/ && match ($0, /PXC[^_]*_1/) {print substr ($0, RSTART, RLENGTH)}' | sort | uniq

Can you please briefly explain the ($0, /PXC[^_]*_1/) part of the command above, and how it finds the substring of interest?

man awk :

The match function searches $0 (the current line) for your target string PXC..._1 (the ellipsis expressed as [^_]*, i.e. any run of non-"_" characters) and, if found, sets RSTART and RLENGTH accordingly for immediate use by the substr function.
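To see those two variables in action, here is a small sketch that runs match on a single line taken from the listing in the question and prints RSTART, RLENGTH and the resulting substring. The helper name show_match is only for illustration.

```shell
# Sample line copied from the listing in the question.
line='PROTOBUF_DIRS += $(GIT_TOP)/AIS_MSGS_PXC2010286_1/inc'

# match() returns the position of the match (0 if there is none) and sets
# RSTART (where the match begins) and RLENGTH (how long it is);
# substr() then keeps just that slice of the line.
show_match() {
    printf '%s\n' "$1" | awk 'match($0, /PXC[^_]*_1/) {
        print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    }'
}

show_match "$line"   # prints: 38 12 PXC2010286_1
```

So the match starts at character 38 and is 12 characters long; everything before (including MSGS_) and after it is simply never printed.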

BTW, that awk scriptlet doesn't need the upfront grep (it already checks each line for PROTOBUF_DIRS on its own), and it can be adapted to make the uniq redundant as well.
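A possible adaptation, sketched under the assumption that first-seen order is acceptable: an awk array remembers which IDs were already printed, so each one comes out only once. The names extract_unique_pxc, id and seen are just illustrative.

```shell
# Sketch: de-duplicate inside awk instead of piping through sort | uniq.
# Reads the files given as arguments, or stdin if none are given.
extract_unique_pxc() {
    awk '/PROTOBUF_DIRS/ && match($0, /PXC[^_]*_1/) {
        id = substr($0, RSTART, RLENGTH)
        # seen[id]++ is 0 (false) the first time an ID shows up, so only
        # the first occurrence is printed
        if (!seen[id]++) print id
    }' "$@"
}
```

For example, extract_unique_pxc *.spec *.mk. The output is in order of first appearance, so pipe it through sort if a sorted list is wanted.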