Split string by position

apenkov · February 10, 2017, 3:56am

Hello,
I have a file in which every line is a fixed-length string, like this:

1234ABXX234ABC123456

I want to split that line where the colours change. The output should look like this:

1234A;BX;X234;ABC1234;56

I am trying now with substring, but I believe there is a better way.
Can you help please!
Thanks in advance!

RudiC · February 10, 2017, 4:58am

If people knew what the positions for the ; were, and how/where they are retrieved... not by the colours, I guess. Any preferred tools?

apenkov · February 10, 2017, 5:04am

OK, sorry. Here are the lengths of the fields:

str1=5
str2=2
str3=4
str4=7
str5=2

RudiC · February 10, 2017, 5:19am

Are those in a file? A variable? on paper, your mind ... just kidding.

rbatte1 · February 10, 2017, 7:30am

Okay, so you can use variable substitution to slice up the variables in the current shell, i.e. you don't need to call cut etc.

If you know the field lengths are fixed, then perhaps something like this will work:-

c=0
while IFS= read -r record                 # IFS= and -r keep leading blanks and backslashes intact
do
   ((c=$c+1))                             # Increment record counter
   remainder="${record#?????}"            # 5x? to cut off first five characters
   p1="${record%${remainder}}"            # Get the first part of the record dropping the remainder
   record="${remainder}"

   remainder="${record#??}"               # 2x? to cut off first two characters
   p2="${record%${remainder}}"            # Get the next part of the record dropping the remainder
   record="${remainder}"

   remainder="${record#????}"             # 4x? to cut off first four characters
   p3="${record%${remainder}}"            # Get the next part of the record dropping the remainder
   record="${remainder}"

   remainder="${record#???????}"          # 7x? to cut off first seven characters
   p4="${record%${remainder}}"            # Get the next part of the record dropping the remainder
   record="${remainder}"

   remainder="${record#??}"               # 2x? to cut off the last two characters
   if [ -z "${remainder}" ]               # Nothing may be left over after the fifth field
   then
      p5="${record}"                      # Use the whole of the remaining record if there are no other characters
   else
      printf "Error line %d\n" "${c}" >&2 # Write to standard error
   fi
   
   printf "%s;%s;%s;%s;%s\n" "$p1" "$p2" "$p3" "$p4" "$p5"
done < in_file > out_file  2> err_file

Does that sort of structure help? There may be a neater way in awk to read and split the record and I'd be happy to learn.

Robin

RudiC · February 10, 2017, 7:49am

Assuming the substring lengths are in a file structured as above, and that your awk splits a line into single characters when the field separator is empty (gawk and mawk do; POSIX leaves this unspecified), try

awk '
NR == FNR       {P[MX = NR] = $2
                 next
                }
                {POS = 0
                 for (i=1; i<MX; i++)   {POS+=P[i]; $POS = $POS ";"
                                        }
                }
1
' FS="=" file2 FS="" OFS="" file1
1234A;BX;X234;ABC1234;56

If the lengths are in a variable, a slight adaptation will yield the same result.
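
For the case where the lengths sit in a variable, here is one possible sketch. It is not the adaptation RudiC had in mind: it swaps the empty-FS trick for substr(), which any POSIX awk should handle. The function name split_by_widths and the space-separated width list are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: $1 is a space-separated list of field widths.
# Reads lines on stdin and prints each one with ';' between the fields.
split_by_widths() {
    awk -v widths="$1" '
    BEGIN { n = split(widths, W, " ") }       # W[1..n] holds the widths
    {
        out = ""
        pos = 1                                # substr() positions start at 1
        for (i = 1; i <= n; i++) {
            out = out (i > 1 ? ";" : "") substr($0, pos, W[i])
            pos += W[i]
        }
        print out
    }'
}

# Usage with the sample record from this thread
printf '%s\n' 1234ABXX234ABC123456 | split_by_widths "5 2 4 7 2"
```

With the sample line this should print 1234A;BX;X234;ABC1234;56. Characters beyond the summed widths are silently dropped, so add a length check if that matters.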

apenkov · February 10, 2017, 11:03am

Hi,
In the end I did it with substring expansion:

while read string
do
str1=${string:0:5}
str2=${string:5:2}
str3=${string:7:4}
str4=${string:11:7}
str5=${string:18:2}
done < input.txt

But the two examples above are very useful and helpful.
Thanks for the help!
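
The same substring expansion can be driven by a list of widths instead of hard-coded offsets. This is a minimal sketch, assuming bash (the ${var:offset:length} form is not POSIX sh); the function name split_fixed is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split_fixed STRING WIDTH...
# Prints STRING cut into fields of the given widths, joined with ';'.
split_fixed() {
    local string=$1
    shift
    local offset=0 out="" sep="" width
    for width in "$@"; do
        out+="${sep}${string:$offset:$width}"   # offsets are 0-based in bash
        sep=";"
        offset=$((offset + width))
    done
    printf '%s\n' "$out"
}

# Usage with the sample record from this thread
split_fixed 1234ABXX234ABC123456 5 2 4 7 2
```

Inside a `while IFS= read -r string` loop you would call it as split_fixed "$string" 5 2 4 7 2.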

Scrutinizer · February 10, 2017, 2:27pm

With GNU awk:

awk '{$1=$1}1' FIELDWIDTHS="5 2 4 7 2" OFS=\; file

--
Regular sed:

sed 's/./&;/18; s/./&;/11; s/./&;/7; s/./&;/5' file
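
The sed positions are the cumulative field lengths (5, 7, 11, 18), applied from right to left so that an inserted ; never shifts a position that still has to be processed. If the widths change often, the script can be generated rather than typed. A rough sketch, with build_sed_script as a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: build_sed_script WIDTH...
# Prints a sed script that inserts ';' after each field except the last.
# Note: script, pos and w are plain globals, since POSIX sh has no 'local'.
build_sed_script() {
    script=""
    pos=0
    for w in "$@"; do
        if [ "$pos" -gt 0 ]; then
            # Prepend, so the highest position ends up first in the script
            script="s/./&;/$pos${script:+;$script}"
        fi
        pos=$((pos + w))
    done
    printf '%s\n' "$script"
}

# Usage with the sample record from this thread
printf '%s\n' 1234ABXX234ABC123456 | sed "$(build_sed_script 5 2 4 7 2)"
```

For the widths 5 2 4 7 2 the generated script is s/./&;/18;s/./&;/11;s/./&;/7;s/./&;/5, i.e. the same commands as the hand-written one-liner.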