awk to print filename words along with delimiter

Hi,
I have a filename like this:

010020001_S-FOR-Sort-SYEXC_20180109_062320_0100.x937

I need the first 5 words of my filename along with their respective delimiters.
I tried this:

f=010020001_S-FOR-Sort-SYEXC_20180109_062320_0100.x937
echo $f | awk -F '[_-]' '{print $1$2$3$4$5}'
010020001SFORSortSYEXC

But I want the output to keep the delimiters, like this:

010020001_S-FOR-Sort-SYEXC

I don't want to add the delimiters manually in the print statement, because they are not fixed: I may get all underscores, all hyphens, or a mix of both.
Is there any way to print the first 5 words of the filename along with the delimiters?

TIA

Hello gnnsprapa,

Could you please try the following and let me know if it helps?

awk 'END{split(FILENAME, array,"_");print array[1] array[2]}' 010020001_S-FOR-Sort-SYEXC_20180109_062320_0100.x937

Thanks,
R. Singh

Thanks for suggesting the solution. I got this:

010020001S-FOR-Sort-SYEXC

It's missing the first underscore (_) between the first 2 words of the filename.

Hello gnnsprapa,

Apologies, I missed that in my hurry. Please try the following and let me know if it helps.

awk 'END{split(FILENAME, array,"_");print array[1] "_"  array[2]}' 010020001_S-FOR-Sort-SYEXC_20180109_062320_0100.x937

Thanks,
R. Singh

Try also shell parameter expansion

echo ${f%[-_]${f#*[_-]*[-_]*[_-]*[_-]*[_-]*}}
010020001_S-FOR-Sort-SYEXC

@RudiC
While trying this, I got the following output:

f=010020001_S-FOR-Sort-SYEXC_20180109_062320_0100.x937
echo ${f%[-_]${f#*[_-]*[-_]*[_-]*[_-]*[_-]*}}
010020001_S-FOR-Sort-SYEXC_20180109_062320_0100.x937

I don't want the words after the 5th word of the filename, i.e. I want only:

010020001_S-FOR-Sort-SYEXC

@Ravinder
As I said, I don't want to hardcode any delimiter. With your solution

awk 'END{split(FILENAME, array,"_");print array[1] "_"  array[2]}' 010020001_S-FOR-Sort-SYEXC_20180109_062320_0100.x937

I will get the correct result, but if my filename has "-" instead of "_", it will still give me "_" because I have hardcoded it.
For example, I have 2 files:

f=010020001_S-FOR-Sort-SYEXC_20180109_062320_0100.x937
f1=010020001-S-FOR-Sort-SYEXC_20180109_062320_0100.x937

I want the output to be:

010020001_S-FOR-Sort-SYEXC
010020001-S-FOR-Sort-SYEXC

I appreciate your help so far, thanks.

What's your shell version? Does it offer "parameter expansion / Remove matching prefix/suffix pattern"?

my shell version:

echo $BASH_VERSION
4.1.2(1)-release

Then it should work:

$ f=010020001_S-FOR-Sort-SYEXC_20180109_062320_0100.x937
$ echo "${f%_*_*_*}"
010020001_S-FOR-Sort-SYEXC
$ echo "${f%[-_]*[-_]*[-_]*}"
010020001_S-FOR-Sort-SYEXC
$ echo "${f%[-_]"${f#*[_-]*[-_]*[_-]*[_-]*[_-]*}"}"
010020001_S-FOR-Sort-SYEXC

Are you sure that it was executed in that bash4 shell?
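
In case the nested expansion is hard to read, here is a sketch that splits it into two steps. The names first_five, rest and head are made up for illustration, and it assumes the delimiters are only - and _ as stated earlier in the thread.

```shell
# first_five is a hypothetical helper name, not something from the posts above.
first_five() {
    # Step 1: remove the shortest prefix that contains five delimiters,
    # which leaves everything after the fifth delimiter.
    rest=${1#*[-_]*[-_]*[-_]*[-_]*[-_]}
    # Step 2: remove that remainder, plus the single delimiter in front of it,
    # from the end of the name. Quoting "$rest" makes it match literally.
    head=${1%[-_]"$rest"}
    printf '%s\n' "$head"
}

f=010020001_S-FOR-Sort-SYEXC_20180109_062320_0100.x937
f1=010020001-S-FOR-Sort-SYEXC_20180109_062320_0100.x937
first_five "$f"     # 010020001_S-FOR-Sort-SYEXC
first_five "$f1"    # 010020001-S-FOR-Sort-SYEXC
```

If the name has fewer than six words, step 1 matches nothing, so step 2 has nothing to strip and the name is printed unchanged.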

The following should give you what you want even if the filename contains white-space characters:

printf '%s\n' "$f" | awk -F'[-_]' '{print substr($0, 1, length($1$2$3$4$5)+4)}'

Note that echo on many systems will interpret some sequences of characters as escape sequences and perform various transformations that you don't want. Using printf as shown above instead of echo avoids that issue.
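
As a quick check of the command above, here is a small sketch that wraps it in a function and runs it on both sample names from this thread plus one name containing a space. The function name first_five_awk and the third filename are invented for the example.

```shell
# first_five_awk is a hypothetical wrapper around the awk command shown above.
first_five_awk() {
    # printf passes the name through unchanged; awk splits it on - or _
    # and prints the first five fields together with the four delimiters
    # that sit between them.
    printf '%s\n' "$1" |
        awk -F'[-_]' '{ print substr($0, 1, length($1 $2 $3 $4 $5) + 4) }'
}

first_five_awk '010020001_S-FOR-Sort-SYEXC_20180109_062320_0100.x937'
first_five_awk '010020001-S-FOR-Sort-SYEXC_20180109_062320_0100.x937'
first_five_awk 'my file_S-FOR-Sort-SYEXC_20180109.x937'
```

The three calls should print 010020001_S-FOR-Sort-SYEXC, 010020001-S-FOR-Sort-SYEXC and my file_S-FOR-Sort-SYEXC respectively.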

Scrutinizer,
I think gnnsprapa is saying that any of the underscores (not just the first one) could be hyphens instead of underscores. Your suggestion assumes that the number of words stored in the variable is a constant (which has not been stated) and that the last five words are separated by underscores only.

As long as the given value assigned to the variable f contains at least six words, the following should work:

f_tail=${f#*[-_]*[-_]*[-_]*[-_]*[-_]}
printf '%s\n' "${f:0:$((${#f} - ${#f_tail} - 1))}"

with his version of bash or a recent ksh. The awk suggestion above should work even if there are only five words in the value assigned to f and should work with any shell using Bourne shell syntax.

Hey RudiC, Don, Scrutinizer, thanks a lot for your solutions. Every solution worked, so I am using

echo "${f%[-_]"${f#*[_-]*[-_]*[_-]*[_-]*[_-]*}"}"

It parses all the types of filename I receive and is also easy to understand. If possible, could you help me understand this code:

printf '%s\n' "$f" | awk -F'[-_]' '{print substr($0, 1, length($1$2$3$4$5)+4)}'

Thanks, your solution worked every time for me.

The awk command extracts a fixed count of characters from $0, the unmodified input line, starting at the beginning of the line. That count is the sum of the lengths of the first 5 fields plus 4, one for each field separator between them.
Try also

awk -v f="$f" -F'[_-]' 'BEGIN {$0 = f; sub (FS $6 ".*$", ""); print}'
010020001_S-FOR-Sort-SYEXC
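
To make the arithmetic in the substr explanation concrete: for the sample name the first five fields have lengths 9, 1, 3, 4 and 5, which sum to 22, and adding the 4 separators gives 26 characters. The sketch below generalizes that count to the first N words. The function name first_n_words and its second argument are assumptions made for illustration, not part of any solution posted above.

```shell
# first_n_words NAME N  (hypothetical helper)
first_n_words() {
    printf '%s\n' "$1" |
        awk -F'[-_]' -v n="$2" '{
            m = (n > NF) ? NF : n       # fewer words than requested: keep them all
            len = m - 1                 # one delimiter between adjacent words
            for (i = 1; i <= m; i++)
                len += length($i)       # add the length of each kept word
            print substr($0, 1, len)
        }'
}

first_n_words '010020001_S-FOR-Sort-SYEXC_20180109_062320_0100.x937' 5
first_n_words '010020001_S-FOR-Sort-SYEXC_20180109_062320_0100.x937' 2
```

The first call should print 010020001_S-FOR-Sort-SYEXC and the second 010020001_S.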