Extract substring at a specific position and length from each line of a file

Hi gurus,
I am trying to figure out how to extract a substring from every line of a file, given a specified position and a specified length.

Example input (file lines)

dhaskjdsa dsadhkjsa dhsakjdsad hsadkjh
dsahjdksahdsad sahkjd sahdkjsahd sajkdh adhjsak

I want to extract the substring at position 10, length 4.

Example output:

 dsa
hdsa

I have been trying to do this with sed, grep, etc., with no luck.
Any help appreciated.

Try cut.
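
For the question's example, a minimal sketch of the cut approach could look like the following. It assumes "position 10, length 4" means characters 10 through 13 counting from 1, which is how cut -c numbers them. The temporary file only stands in for the real input file. Be aware that some cut implementations count bytes rather than characters, which matters for multibyte text.

```shell
# Sample input from the question, written to a temporary file
# (a stand-in for the real input file).
tmp=$(mktemp)
printf '%s\n' 'dhaskjdsa dsadhkjsa dhsakjdsad hsadkjh' \
              'dsahjdksahdsad sahkjd sahdkjsahd sajkdh adhjsak' > "$tmp"

# -c10-13 selects characters 10 through 13 (1-based, inclusive),
# i.e. position 10, length 4.
result=$(cut -c10-13 "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

For a general position P and length L, the range is P through P+L-1.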

Another option is using bash sub-string expansion:

Syntax:

${parameter:offset}
${parameter:offset:length}

E.g.

#!/bin/bash
while read line
do
   echo ${line:10:4}
done < filename

Check bash manual for further reference:

man bash

No offense intended, but that's a terrible solution. There are quite a few bugs in that short script.

First of all, we don't know anything about the data, so we can't make any assumptions.

If there is leading or trailing whitespace, the field splitting done by read will discard it. This affects the result of the parameter expansion: the substring starts later in the line than desired, and characters at the end of the substring are lost if they were discarded trailing whitespace.

Without -r, read treats backslashes as escape characters and removes them. If there are backslashes in the data, again, an incorrect substring is the result.

Even if the correct substring is extracted, it could still fail to print properly if echo takes it for a valid option or if it contains something echo treats as an escape sequence.

What if there's an asterisk, a question mark, or a bracketed expression? Those may trigger pathname expansion (aka file globbing) since the parameter expansion is unquoted.

Troublesome sample data:

1234567890-n a
     678901234
\2\4\6\8\01234
1234567890* *?
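
To see the read-related problems in isolation, here is a small sketch that feeds two of those lines through a plain read (default IFS, no -r) and prints what ends up in the variable. The variable name naive is just for illustration, and printf is used for output so that only the effect of read is visible.

```shell
# Plain read: default IFS, no -r.
naive=$(printf '%s\n' '     678901234' '\2\4\6\8\01234' |
    while read line; do
        printf '%s\n' "$line"
    done)

printf '%s\n' "$naive"
# prints:
# 678901234    (leading spaces stripped, so every position shifts by 5)
# 246801234    (backslashes consumed, so every position shifts again)
```

Any substring taken from these variables is counted from the wrong starting point.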

If you wanted to do this correctly with bash builtins and parameter expansion, the following is the way:

while IFS= read -r line; do
    printf '%s\n' "${line:10:4}"
done < filename
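
One more detail worth checking: bash offsets are zero-based, so ${line:10:4} starts at the 11th character. If "position 10" is meant 1-based (as the hdsa line in the expected output suggests), the offset would be 9. A sketch wrapping the loop in a function, where substr_lines is a hypothetical helper name and the 1-based convention is an assumption:

```shell
#!/bin/bash
# Hypothetical helper: for each line on stdin, print the substring
# starting at 1-based position $1 with length $2.
substr_lines() {
    local pos=$1 len=$2 line
    # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
    # The || test also processes a last line that lacks a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # Bash offsets are zero-based, so position N is offset N-1.
        printf '%s\n' "${line:pos-1:len}"
    done
}

# Run it on the troublesome sample data.
out=$(printf '%s\n' '1234567890-n a' '     678901234' \
                    '\2\4\6\8\01234' '1234567890* *?' | substr_lines 10 4)
printf '%s\n' "$out"
```

Each output line is exactly the characters at positions 10 to 13 of the corresponding input line, including the trailing space on the first one and the literal asterisks on the last.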

Ygor's suggestion is probably simplest and best.

Regards,
Alister

No offence taken. My bad: I didn't consider scenarios like backslash escaping and globbing. I really appreciate your feedback.

Thank you!

With the default IFS, read does not preserve leading and trailing spaces, but internal ones are kept:

$ echo ' a b   c d '|while read l ;do echo ">$l<";done
>a b   c d<
$
 
 
sed is happy to do this:
 
$ sed 's/^.\{9\}\(.\{4\}\).*/\1/' in_file > out_file
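
One thing to watch with that command: a line shorter than 13 characters does not match the pattern, so sed prints it unchanged. A possible variant, offered as a sketch rather than a drop-in replacement, uses -n with the p flag so that only lines reaching position 10 are printed, and accepts 1 to 4 characters so that a line ending inside the window still yields what it has:

```shell
# Sample input: a line from the question, a line that is too short,
# and a line that ends partway through the 4-character window.
out=$(printf '%s\n' 'dhaskjdsa dsadhkjsa dhsakjdsad hsadkjh' \
                    'short' \
                    '1234567890ab' |
    # Skip 9 characters, capture up to 4, drop the rest.
    # -n plus the p flag prints only lines where the substitution succeeded.
    sed -n 's/^.\{9\}\(.\{1,4\}\).*/\1/p')

printf '%s\n' "$out"
```

Here the line "short" produces no output at all. Whether short lines should be dropped, printed empty, or passed through depends on what the downstream consumer expects.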