Split string by position

Hello,
I have a file whose lines are fixed-length strings, like this:

1234ABXX234ABC123456

And I want to split that line as it is colored. The output should look like this:

1234A;BX;X234;ABC1234;56

I am currently trying with substring, but I believe there is a better way.
Can you help, please?
Thanks in advance!

It would help to know what the positions for the ; are, and how/where they are retrieved... not from the colours, I guess. Any preferred tools?

OK, sorry. Here are the lengths of the fields:

str1=5
str2=2
str3=4
str4=7
str5=2

Are those in a file? A variable? On paper, in your mind... just kidding.

Okay, so you can use variable substitution to slice up the variables in the current shell, i.e. you don't need to call cut etc.
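
As a minimal sketch of that idea (the variable names here are only for illustration): ${var#pattern} removes the shortest leading match of the pattern and ${var%pattern} removes the shortest trailing match, so a run of ? characters peels off a fixed number of characters.

```shell
record=1234ABXX234ABC123456
rest="${record#?????}"        # drop the first five characters
first="${record%"$rest"}"     # strip that remainder from the end, leaving the first five
printf '%s\n%s\n' "$first" "$rest"
```

For the sample record this should print 1234A on the first line and BXX234ABC123456 on the second.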

If you know the field lengths are fixed, then perhaps something like this will work:-

c=0
while IFS= read -r record                 # IFS= and -r keep whitespace and backslashes intact
do
   ((c=c+1))                              # Increment record counter
   remainder="${record#?????}"            # 5x? to cut off first five characters
   p1="${record%"${remainder}"}"          # Get the first part of the record dropping the remainder
   record="${remainder}"

   remainder="${record#??}"               # 2x? to cut off first two characters
   p2="${record%"${remainder}"}"          # Get the next part of the record dropping the remainder
   record="${remainder}"

   remainder="${record#????}"             # 4x? to cut off first four characters
   p3="${record%"${remainder}"}"          # Get the next part of the record dropping the remainder
   record="${remainder}"

   remainder="${record#???????}"          # 7x? to cut off first seven characters
   p4="${record%"${remainder}"}"          # Get the next part of the record dropping the remainder
   record="${remainder}"

   remainder="${record#??}"               # 2x? to try to cut off first two characters
   if [ -n "${record}" ] && [ -z "${remainder}" ]
   then
      p5="${record}"                      # Use the whole of the remaining record if there are no other characters
   else
      p5=""                               # Do not carry over p5 from the previous record
      printf "Error line %d\n" "${c}" >&2 # Write to standard error
   fi
   
   printf "%s;%s;%s;%s;%s\n" "$p1" "$p2" "$p3" "$p4" "$p5"
done < in_file > out_file  2> err_file

Does that sort of structure help? There may be a neater way in awk to read and split the record and I'd be happy to learn.

Robin

Given the substring lengths are found in a file structured like above, and an empty field separator splits a line into single characters (as GNU awk does; POSIX leaves this unspecified), try

awk '
NR == FNR       {P[MX = NR] = $2
                 next
                }
                {POS = 0
                 for (i=1; i<MX; i++)   {POS+=P[i]; $POS = $POS ";"
                                        }
                }
1
' FS="=" file2 FS="" OFS="" file1
1234A;BX;X234;ABC1234;56

If the lengths are in a variable, a slight adaptation will yield the same result.
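
One possible shape for that adaptation, as a sketch only: it assumes the widths arrive as a space-separated string, and the function name split_fixed and the variable names are made up for illustration. It uses substr() rather than an empty FS, so it should not depend on any particular awk.

```shell
# Hypothetical helper: split_fixed "WIDTHS" [FILE...]
split_fixed() {
    widths=$1
    shift
    awk -v widths="$widths" '
    BEGIN { n = split(widths, w, " ") }          # w[1..n] hold the field lengths
    {
        out = ""
        pos = 1                                  # substr() positions start at 1
        for (i = 1; i <= n; i++) {
            out = out (i > 1 ? ";" : "") substr($0, pos, w[i])
            pos += w[i]
        }
        print out
    }' "$@"
}

printf '%s\n' 1234ABXX234ABC123456 | split_fixed "5 2 4 7 2"
```

With the sample line this should give 1234A;BX;X234;ABC1234;56. Characters beyond the summed widths are dropped silently, which may or may not be what you want.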

Hi,
In the end I did it with substring expansion:

while IFS= read -r string
do
   str1=${string:0:5}    # offset 0, length 5
   str2=${string:5:2}
   str3=${string:7:4}
   str4=${string:11:7}
   str5=${string:18:2}
   printf '%s;%s;%s;%s;%s\n' "$str1" "$str2" "$str3" "$str4" "$str5"
done < input.txt

But the two examples above are very useful and helpful.
Thanks for the help!
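
For what it is worth, the same substring expansion can probably be driven by a list of widths instead of hard-coded offsets. This is only a sketch and needs bash: the function name split_by_widths and the widths array are assumptions for illustration, not something from the posts above.

```shell
#!/usr/bin/env bash
# Sketch only: split_by_widths STRING WIDTH...
split_by_widths() {
    local string=$1; shift
    local out="" pos=0 w
    for w in "$@"; do
        out+="${out:+;}${string:pos:w}"   # add ";" only once something is already in out
        pos=$((pos + w))                  # advance the offset by the field width
    done
    printf '%s\n' "$out"
}

widths=(5 2 4 7 2)
split_by_widths 1234ABXX234ABC123456 "${widths[@]}"
```

Inside a while read loop you would pass "$string" instead of the literal sample line.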

With GNU awk:

awk '{$1=$1}1' FIELDWIDTHS="5 2 4 7 2" OFS=\; file

--
Regular sed:

sed 's/./&;/18; s/./&;/11; s/./&;/7; s/./&;/5' file