Extract filepath names between two strings

John_K · May 25, 2018, 12:52pm

OS : Fedora Linux 26
Shell : bash

I have a file with around 5000 lines like below.

file /usr/share/icons/Papirus/16x16/actions/papirus-icon-theme-20180501-1.noarch conflicts with file ... 
file /usr/share/icons/Papirus/16x16/actions/align-horizontal-left-to-anchor.svg conflicts between .... 
file /usr/share/icons/Papirus/16x16/actions/align-horizontal-left.svg conflicts between ... 
file /usr/share/icons/Papirus/16x16/actions/align-horizontal-right-out.svg conflicts between ... 
file /usr/share/icons/Papirus/16x16/actions/align-horizontal-right-to-anchor.svg conflicts between ..

I want to extract the filepath between the words " file " and " conflicts "

So, the output would be like below without any leading and trailing empty spaces

/usr/share/icons/Papirus/16x16/actions/papirus-icon-theme-20180501-1.noarch
/usr/share/icons/Papirus/16x16/actions/align-horizontal-left-to-anchor.svg
/usr/share/icons/Papirus/16x16/actions/align-horizontal-left.svg
/usr/share/icons/Papirus/16x16/actions/align-horizontal-right-out.svg
/usr/share/icons/Papirus/16x16/actions/align-horizontal-right-to-anchor.svg

Is there any way I could do this?

RudiC · May 25, 2018, 12:55pm

Any attempts / ideas / thoughts from your side?

John_K · May 26, 2018, 8:03am

awk '{print $2}' <fileName> did the trick
I didn't know that a pathname like /usr/share/icons/Papirus/16x16/actions/address-book-new.svg would be treated as a single column.

So I just printed the second column using the awk command above.

RudiC · May 26, 2018, 8:14am

Brilliant!
Yes, it will do as long as there are NO SPACES in the file names.
And the file name is considered a single column as long as it doesn't have field separators in it, i.e. you don't define awk's FS to be (or contain) /.
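To illustrate the caveat about spaces, here is a small sketch. The input line and its path are made up for the demonstration and do not come from the original file:

```shell
# Hypothetical input line whose path contains a space (an assumption).
line='file /usr/share/icons/My Theme/a.svg conflicts between ...'

# Default whitespace splitting: $2 ends at the first space inside the path.
by_field=$(printf '%s\n' "$line" | awk '{print $2}')

# Anchoring on the surrounding words keeps the whole path.
by_words=$(printf '%s\n' "$line" | sed 's/^file //; s/ conflicts .*$//')

printf '%s\n' "$by_field" "$by_words"
```

The first result is cut off at "My", while the second keeps the full path including the space.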

There are of course several other approaches as well, e.g.:

cut -d" " -f2 file
sed 's/^[^ ]* //; s/ .*$//' file
grep -o '/[^ ]*' file

Of course, if you need to fulfill your full-blown spec (everything between "file" and "conflicts"), this

sed 's/^file //; s/ conflict.*$//' file

or

sed 's/^file \| conflict.*$//g' file

would be the way to go.

John_K · May 26, 2018, 9:48am

Thank you, RudiC.
The two solutions you provided for the full-blown spec have one minor issue: the word "file" is still printed, as shown in the output below.
But I just need the filepath.

sed 's/^file //; s/ conflict.*$//' <filename>

and

sed 's/^file \| conflict.*$//g' <filename>

---- Output of above commands -----

file /usr/share/icons/Papirus/24x24/actions/zoom-fit-drawing.svg
file /usr/share/icons/Papirus/24x24/actions/zoom-fit-height.svg
file /usr/share/icons/Papirus/24x24/actions/zoom-fit-page.svg
file /usr/share/icons/Papirus/24x24/actions/zoom-fit-selection.svg
.
.
.

RudiC · May 26, 2018, 11:00am

Not with the sed and file versions that I have:

sed 's/^file \| conflict.*$//g' file
/usr/share/icons/Papirus/16x16/actions/papirus-icon-theme-20180501-1.noarch
/usr/share/icons/Papirus/16x16/actions/align-horizontal-left-to-anchor.svg
/usr/share/icons/Papirus/16x16/actions/align-horizontal-left.svg
/usr/share/icons/Papirus/16x16/actions/align-horizontal-right-out.svg
/usr/share/icons/Papirus/16x16/actions/align-horizontal-right-to-anchor.svg

Any special chars (e.g. a <TAB> in lieu of a space) in your file? What is your sed version?
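One way to check for hidden characters is sed's "l" command, which prints each line in an unambiguous form (a tab appears as \t and the end of the line is marked with $). A minimal sketch, assuming a sample line with a TAB after "file" rather than the real data:

```shell
# Sample line with a TAB after "file" (an assumption for illustration).
sample=$(printf 'file\t/path/to/file conflicts ...')

# 'l' shows non-printing characters visibly; -n suppresses the normal output.
visible=$(printf '%s\n' "$sample" | sed -n 'l')

printf '%s\n' "$visible"
```

On the real file, something like sed -n 'l' <filename> | head would show the first few lines the same way. With GNU sed, sed --version prints the version; other sed implementations may not accept that option.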

bakunin · May 26, 2018, 1:37pm

I suspect the same thing RudiC does: perhaps the blanks are not what they seem to be (tabs instead of spaces, or the like). Modify your code above to:

sed 's/^file[[:space:]]*//; s/[[:space:]]*conflict.*$//' <filename>

to get some additional flexibility. Notice that the regexp "[[:space:]]*" matches tabs (or any other form of whitespace) as well as spaces, and the asterisk makes sure they are removed completely even if there are several of them instead of the expected one. For example, these lines:

file  /path/to/file  conflicts...
file /this/is/a/normal/line conflicts...

(notice the multiple spaces in the first one) would, without the asterisks, produce output with leading/trailing blanks:

 /path/to/file 
/this/is/a/normal/line
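The difference the asterisk makes can be tried out with a short sketch like the following. The input is a variation of the two sample lines, with a TAB in the second one added as an assumption to exercise [[:space:]]:

```shell
# Line 1 has double spaces; line 2 uses a TAB after "file" (assumed input).
input=$(printf 'file  /path/to/file  conflicts...\nfile\t/this/is/a/normal/line conflicts...')

# With the asterisk: every run of whitespace next to the keywords is removed.
with_star=$(printf '%s\n' "$input" |
  sed 's/^file[[:space:]]*//; s/[[:space:]]*conflict.*$//')

# Without it: exactly one whitespace character is removed on each side.
without_star=$(printf '%s\n' "$input" |
  sed 's/^file[[:space:]]//; s/[[:space:]]conflict.*$//')

printf '[%s]\n' "$with_star" "$without_star"
```

The brackets in the final printf are only there to make any leftover blanks visible.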

I hope this helps.

bakunin

Scrutinizer · May 26, 2018, 5:56pm

Note: The use of \| for alternation is a GNU extension to BRE (Basic Regular Expressions) and is not supported in standard sed.
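For reference, two ways to avoid \| are sketched below: splitting the work into two substitutions, or switching to extended regular expressions with -E, where a plain | is the alternation operator. The -E option is accepted by GNU and BSD sed, but very old implementations may lack it, so treat that variant as an assumption about your system:

```shell
# One of the sample lines from the question.
input='file /usr/share/icons/Papirus/16x16/actions/align-horizontal-left.svg conflicts between ...'

# Two separate BRE substitutions; no alternation needed.
two_cmds=$(printf '%s\n' "$input" | sed -e 's/^file //' -e 's/ conflict.*$//')

# ERE variant: | is alternation when -E is given.
ere=$(printf '%s\n' "$input" | sed -E 's/^file | conflict.*$//g')

printf '%s\n' "$two_cmds" "$ere"
```

Both commands should print only the path for this input.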