Cut output to same byte position

HealthyGuy · November 21, 2006, 3:08pm

Hi folks

I have a file with thousands of lines with fixed length fields:

sample (assume x is a blank space)

111333xx444TTTLKOPxxxxxxxxx

I need to make a copy of this file that keeps only some of the field positions. For example, I'd like to print bytes 4-5 and 15-16 and have them stay at the same byte positions in the new file, so the sample would become:

xxx33xxxxxxxxxLKxxxxxxxxxxx

I started looking at cut -b4-5,15-16, but it puts those bytes in positions 1-4 instead of keeping them at 4-5 and 15-16 with blanks everywhere else.

Any help would be appreciated.
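As a rough illustration of why cut collapses the fields, the Python sketch below models what cut -b does with a list like 4-5,15-16: it collects the selected byte positions and writes them back to back, with nothing in between. parse_list and cut_bytes are hypothetical helper names, and only the N and N-M list forms are handled (not open-ended ones like -5 or 10-).

```python
def parse_list(spec):
    """Turn a cut-style list such as "4-5,15-16" into a set of 1-based positions."""
    positions = set()
    for item in spec.split(","):
        if "-" in item:
            start, end = (int(n) for n in item.split("-"))
        else:
            start = end = int(item)
        positions.update(range(start, end + 1))
    return positions


def cut_bytes(line, spec):
    """Model of cut -b: selected positions are emitted in order, with nothing between them."""
    positions = parse_list(spec)
    return "".join(ch for i, ch in enumerate(line, start=1) if i in positions)


sample = "111333  444TTTLKOP         "  # the sample above, with each x written as a real blank
print(repr(cut_bytes(sample, "4-5,15-16")))  # '33LK'
```

The selected bytes come out correct but packed together, which is why the replies below rebuild each line instead.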

Manish_Jha · November 21, 2006, 3:59pm

Try awk. Use the substr($0,Starting_Position,Length) function to cut specific bytes from each line.

e.g.
111333xx444TTTLKOPxxxxxxxxx

awk '
{
filler1=substr($0,1,3)     # bytes 1-3 (awk positions start at 1)
fild1=substr($0,4,2)       # bytes 4-5
filler2=substr($0,6,9)     # bytes 6-14
fild2=substr($0,15,2)      # bytes 15-16
filler3=substr($0,17,11)   # bytes 17-27
printf("%s%s%s%s%s\n",filler1,fild1,filler2,fild2,filler3)
}' file1 > out_file

Note: Adjust the field positions to match your actual file layout.

--Manish Jha
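Because the filler lengths have to be worked out by hand, an off-by-one between fields is easy to make. A small Python sketch, assuming the fields are given as ascending, non-overlapping 1-based (start, length) pairs plus a fixed record length, can derive the (start, length) of every filler so the pieces always cover exactly the bytes between the fields. filler_layout is a hypothetical name.

```python
def filler_layout(fields, record_len):
    """Return 1-based (start, length) pairs for the gaps around the given fields.

    fields: ascending, non-overlapping 1-based (start, length) pairs.
    record_len: total line length in bytes.
    """
    fillers = []
    pos = 1  # next byte not yet covered by a field or a filler
    for start, length in fields:
        if start > pos:
            fillers.append((pos, start - pos))
        pos = start + length
    if pos <= record_len:
        fillers.append((pos, record_len - pos + 1))
    return fillers


# Layout of the first sample: 27-byte records, fields at bytes 4-5 and 15-16.
print(filler_layout([(4, 2), (15, 2)], 27))  # [(1, 3), (6, 9), (17, 11)]
```

Each pair can be used directly as the start and length arguments of substr($0,start,length).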

HealthyGuy · November 22, 2006, 7:50am

Hi Manish

This does print the right fields, but it doesn't replace the bytes between them with the same number of blanks, so the fields don't end up in their original positions. For example, take the following 2 lines:

ABC123DEF
GEH456JKL
Say I want to print positions 1-2 and 6-8, with everything in between turned into blanks, so that each byte keeps the same position in the output as in the input.

With x standing for a blank space, the output should look like:
ABxxx3DEx
GExxx6JKx

The suggested awk with substr gives me the right substrings but in the wrong position in the new file:
AB3DE
GE6JK

Thanks for any suggestions

anbu23 · November 22, 2006, 8:36am

echo "ABC123DEF" | sed "s/\(.\{2\}\)\(.\{3\}\)\(.\{3\}\).*/\1   \3 /"

How the pattern breaks down:

\(.\{2\}\) matches the first two characters (group 1)
\(.\{3\}\) matches the next three characters (group 2)
\(.\{3\}\) matches the three characters after those (group 3)
.* matches the rest of the line

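In the replacement, three blanks between \1 and \3 stand in for the dropped group 2 (characters 3-5), and one blank after \3 stands in for the last character (position 9), so the kept characters stay at their original positions.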
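The same substitution can be tried with Python's re module, which is handy for checking the grouping before running sed over a whole file. This is a sketch: the pattern is the sed expression translated to Python regex syntax, where groups are plain ( ) and the repeat count {n} needs no backslashes.

```python
import re

# Group 1 = chars 1-2 (kept), group 2 = chars 3-5 (dropped), group 3 = chars 6-8 (kept); .* eats the rest.
pattern = re.compile(r"(.{2})(.{3})(.{3}).*")
line = "ABC123DEF"
# Three blanks replace group 2 and one blank replaces char 9, as in the sed replacement.
print(repr(pattern.sub(r"\1   \3 ", line)))  # 'AB   3DE '
```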

matrixmadhan · November 22, 2006, 8:41am

Is this OK?

awk ' { printf "%sxxx%sx\n", substr($0, 1, 2),  substr($0, 6, 3) }'  filename

ABxxx3DEx
GExxx6JKx

HealthyGuy · November 22, 2006, 9:03am

Thanks everyone.

My samples are simplified; my actual lines are 300 bytes long.

I need to print bytes 1-3, 203-240, and 260-289, with blanks in between so the output stays in the same positions. That is a problem for sed, because typing 200 spaces into the replacement doesn't make sense. The suggestions above are fine for my 10-12 byte samples, but not for 300-byte lines where I only need a few bytes from across the line.

This is an interesting issue; I never thought it would be so tricky to figure out when I started.

Thanks for all suggestions
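Rather than typing 200 blanks into a replacement by hand, the pattern and replacement can be generated from the list of ranges. A Python sketch, assuming 1-based, inclusive, ascending ranges and lines at least as long as the end of the last range (shorter lines simply won't match and are left unchanged); build_substitution and the test line are made up for illustration.

```python
import re


def build_substitution(ranges, record_len):
    """Build a regex and replacement that keep `ranges` and blank every other byte."""
    pattern, replacement = [], []
    pos = 1  # first byte not yet described by the pattern
    for group, (start, end) in enumerate(ranges, start=1):
        if start > pos:  # bytes before this range: match them, emit the same number of blanks
            pattern.append(".{%d}" % (start - pos))
            replacement.append(" " * (start - pos))
        pattern.append("(.{%d})" % (end - start + 1))  # the range itself, captured
        replacement.append("\\g<%d>" % group)
        pos = end + 1
    pattern.append(".*")  # whatever follows the last range ...
    replacement.append(" " * max(record_len - pos + 1, 0))  # ... becomes blanks up to record_len
    return re.compile("".join(pattern)), "".join(replacement)


regex, repl = build_substitution([(1, 3), (203, 240), (260, 289)], 300)
line = "".join(chr(ord("A") + i % 26) for i in range(300))  # made-up 300-byte test line
out = regex.sub(repl, line)
print(len(out), out[:3] == line[:3], out[202:240] == line[202:240])  # 300 True True
```

The same generated pieces could be written out as a sed expression, with \( \) groups and \1 to \3 back-references; POSIX basic regular expressions allow back-references \1 through \9, which is enough for this layout.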

ghostdog74 · November 22, 2006, 9:51pm

If you have Python, here's an alternative.
Sample input:
-------------------
111333  444TTTLKOP
122333  444DDDLKOP
422333  4445DDLTlR

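#!/usr/bin/env python3
start1, end1 = 3, 6    # 0-based slice 3:6 = positions 4-6
start2, end2 = 14, 17  # 0-based slice 14:17 = positions 15-17
with open("test.txt") as f:
        for line in f:
                chars = list(line.rstrip("\n"))  # drop only the newline; strip() would also eat leading blanks and shift positions
                chars[start1:end1] = " " * (end1 - start1)  # blank positions 4-6
                chars[start2:end2] = " " * (end2 - start2)  # blank positions 15-17
                print("".join(chars))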

output:

111     444TTT   P
122     444DDD   P
422     4445DD   R
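The thread talks about byte positions, while Python 3 str slicing counts characters, so the two can differ if the file ever holds multi-byte text. A minimal sketch that works on raw bytes instead, assuming 1-based inclusive ranges to keep (the opposite of the script above, which lists ranges to blank); keep_byte_ranges and process are hypothetical names.

```python
def keep_byte_ranges(record, ranges, fill=b" "):
    """Return a copy of `record` (bytes) where only the 1-based inclusive ranges survive."""
    out = bytearray(fill * len(record))  # start from an all-blank record of the same length
    for start, end in ranges:
        out[start - 1:end] = record[start - 1:end]  # both slices clamp to the same length
    return bytes(out)


def process(path_in, path_out, ranges):
    # Binary mode: no decoding, so offsets are byte offsets and line endings pass through.
    with open(path_in, "rb") as src, open(path_out, "wb") as dst:
        for raw in src:
            body = raw.rstrip(b"\r\n")  # keep the line terminator out of the ranges
            dst.write(keep_byte_ranges(body, ranges) + raw[len(body):])


print(keep_byte_ranges(b"ABC123DEF", [(1, 2), (6, 8)]))  # b'AB   3DE '
```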

matrixmadhan · November 23, 2006, 12:31am

Try this. It's not efficient, but it gives some pointers to proceed with:

#! /bin/zsh

# gen N M: set var1 to N copies of appVar and var2 to M copies
gen()
{
 appVar="x"
 cnt1=$1
 cnt2=$2
 var1=""
 var2=""

 while [ $cnt1 -gt 0 ]
 do
   cnt1=$(($cnt1 - 1))
   var1=$var1$appVar
 done

 while [ $cnt2 -gt 0 ]
 do
   cnt2=$(($cnt2 - 1))
   var2=$var2$appVar
 done
}

gen 3 1

# print the two kept fields, then rebuild each line with the fillers in between
awk '{ printf "%s %s\n", substr($0, 1, 2), substr($0, 6, 3) }' filename | while read line1 line2
do
echo $line1$var1$line2$var2
done

exit 0

Ygor · November 23, 2006, 1:23am

Try...

awk '{for(i=1;i<=length;i++)
        printf("%s", (i>=1&&i<=3||i>=203&&i<=240||i>=260&&i<=289)?substr($0,i,1):"x")
      printf ORS}' file1 > file2
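For checking the logic outside awk, the same per-character test translates directly to Python. A sketch, assuming 1-based inclusive ranges and a fill character that defaults to a blank (the awk version prints "x" so the result is easy to see); mask_line is a hypothetical name.

```python
def mask_line(line, ranges, fill=" "):
    """Keep characters whose 1-based position falls in any range; replace the rest with fill."""
    return "".join(
        ch if any(start <= i <= end for start, end in ranges) else fill
        for i, ch in enumerate(line, start=1)
    )


print(mask_line("ABC123DEF", [(1, 2), (6, 8)], fill="x"))  # ABxxx3DEx
```

Like the awk loop, the output is always as long as the input line; for the 300-byte records the ranges would be [(1, 3), (203, 240), (260, 289)].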

HealthyGuy · November 23, 2006, 12:29pm

Thanks everyone, this last post from Ygor looks like the easiest option. I've been playing with it this morning but getting a syntax error I can't get past.

awk: syntax error near line 2
awk: illegal statement near line 2

Any ideas?

Thanks all!

Ygor · November 23, 2006, 7:12pm

On Solaris, use nawk.

HealthyGuy · November 24, 2006, 9:14am

nawk works better, but it's still not 100%. The code is now:

nawk '{for(i=1;i<=length;i++)
printf((i>=1&&i<=3||i>=203&&i<=240||i>=260&&i<=289), substr($0,i,1))}' file_in > file_out

My input file looks like:
ABC SOLARIS 2200 MAIN STREET

My out file is getting 1's in place of the characters I want to print and 0's in place of the blank spaces I want:
1110000000000011111111100000111111111

I'm sure it's something simple but I'm not seeing it

Thanks for any input

Ygor · November 26, 2006, 7:55pm

You have altered the awk script. Why not try the code as posted?

HealthyGuy · November 27, 2006, 7:21am

You are correct, thanks Ygor! I was getting a syntax error with awk, so I altered the script, then switched to nawk without undoing the changes I had made. It does exactly what I need with a space in place of the "x".

Thanks very much!!