AWK print and retain original format

I have a file with very specific fixed-column spacing.

I wish to do the following:

awk '{print $1, $2, $3, $4, $5, $6, $19-$7, $20-$8, $21-$9, $10, $11, $12}' merge.pdb > vector.pdb

but the format gets ruined.

I have tried with printf, but to no avail.

Please post sample before and after data, making it clear whether this is a fixed-length record and whether fields are left or right justified.

ATOM      1  N   ALA B   1      -9.995  -6.835   2.255  0.00  0.00      BH      ATOM      1  N   ALA B   1     -13.079 -16.435   0.105  0.00  0.00      BH
ATOM      2  HT2 ALA B   1     -10.828  -7.444   2.585  0.00  0.00      BH      ATOM      2  HT2 ALA B   1     -12.045 -16.716  -0.054  0.00  0.00      BH
ATOM      3  HT3 ALA B   1     -10.119  -6.623   1.230  0.00  0.00      BH      ATOM      3  HT3 ALA B   1     -13.318 -16.970   0.909  0.00  0.00      BH
ATOM      4  CA  ALA B   1     -10.201  -5.652   3.107  0.00  0.00      BH      ATOM      4  CA  ALA B   1     -12.997 -14.938   0.408  0.00  0.00      BH
ATOM      5  HA  ALA B   1     -10.804  -4.989   2.587  0.00  0.00      BH      ATOM      5  HA  ALA B   1     -13.464 -14.356  -0.413  0.00  0.00      BH
ATOM      6  CB  ALA B   1     -10.761  -6.301   4.361  0.00  0.00      BH      ATOM      6  CB  ALA B   1     -13.865 -14.850   1.756  0.00  0.00      BH
ATOM      7  HB1 ALA B   1     -11.677  -6.864   4.178  0.00  0.00      BH      ATOM      7  HB1 ALA B   1     -14.845 -15.340   1.646  0.00  0.00      BH
ATOM      8  HB2 ALA B   1     -10.016  -7.019   4.786  0.00  0.00      BH      ATOM      8  HB2 ALA B   1     -13.341 -15.381   2.572  0.00  0.00      BH
ATOM      9  HB3 ALA B   1     -11.114  -5.604   5.113  0.00  0.00      BH      ATOM      9  HB3 ALA B   1     -14.123 -13.823   2.072  0.00  0.00      BH
ATOM     10  C   ALA B   1      -8.943  -4.890   3.362  0.00  0.00      BH      ATOM     10  C   ALA B   1     -11.715 -14.231   0.505  0.00  0.00      BH
ATOM     11  O   ALA B   1      -8.904  -3.716   3.057  0.00  0.00      BH      ATOM     11  O   ALA B   1     -10.624 -14.820   0.363  0.00  0.00      BH
ATOM     12  N   ALA B   2      -7.940  -5.583   3.917  0.00  0.00      BH      ATOM     12  N   ALA B   2     -11.810 -12.904   0.758  0.00  0.00      BH
ATOM     13  HN  ALA B   2      -8.114  -6.501   4.241  0.00  0.00      BH      ATOM     13  HN  ALA B   2     -12.700 -12.511   0.709  0.00  0.00      BH
ATOM     14  CA  ALA B   2      -6.616  -5.044   4.344  0.00  0.00      BH      ATOM     14  CA  ALA B   2     -10.704 -12.044   0.971  0.00  0.00      BH
ATOM     15  HA  ALA B   2      -6.828  -4.512   5.228  0.00  0.00      BH      ATOM     15  HA  ALA B   2     -10.083 -12.526   1.731  0.00  0.00      BH

Yes, fields are left-justified.

My output loses the format:

ATOM 1 N ALA B 1 -3.084 -9.6 -2.15 0.00 0.00 BH
ATOM 2 HT2 ALA B 1 -1.217 -9.272 -2.639 0.00 0.00 BH
ATOM 3 HT3 ALA B 1 -3.199 -10.347 -0.321 0.00 0.00 BH
ATOM 4 CA ALA B 1 -2.796 -9.286 -2.699 0.00 0.00 BH
ATOM 5 HA ALA B 1 -2.66 -9.367 -3 0.00 0.00 BH
ATOM 6 CB ALA B 1 -3.104 -8.549 -2.605 0.00 0.00 BH
ATOM 7 HB1 ALA B 1 -3.168 -8.476 -2.532 0.00 0.00 BH
ATOM 8 HB2 ALA B 1 -3.325 -8.362 -2.214 0.00 0.00 BH
ATOM 9 HB3 ALA B 1 -3.009 -8.219 -3.041 0.00 0.00 BH
ATOM 10 C ALA B 1 -2.772 -9.341 -2.857 0.00 0.00 BH
ATOM 11 O ALA B 1 -1.72 -11.104 -2.694 0.00 0.00 BH
ATOM 12 N ALA B 2 -3.87 -7.321 -3.159 0.00 0.00 BH
ATOM 13 HN ALA B 2 -4.586 -6.01 -3.532 0.00 0.00 BH
ATOM 14 CA ALA B 2 -4.088 -7 -3.373 0.00 0.00 BH
ATOM 15 HA ALA B 2 -3.255 -8.014 -3.497 0.00 0.00 BH
ATOM 16 CB ALA B 2 -4.279 -5.691 -5.06 0.00 0.00 BH
ATOM 17 HB1 ALA B 2 -3.994 -5.943 -4.604 0.00 0.00 BH
ATOM 18 HB2 ALA B 2 -5.97 -5.762 -6.094 0.00 0.00 BH

---------- Post updated 03-29-12 at 02:02 AM ---------- Previous update was 03-28-12 at 09:51 AM ----------

Here is an example of a script that deals with this kind of format:

# This file is fixpdb.awk.
# Usage: awk -f fixpdb.awk [segid=wxyz] [chainID=X]   <pdbfile.in >file.out
#                                       [resname=abc] 
# Extracts segments from pdb files and converts to a format acceptable by charmm.
# In command line can specify up to a four character segid with wxyz, e.g. prot. This 
#  field is ignored by current CHARMM versions, but needed for older versions. 
# Can specify a one character chainID. If is specified on command line, extracts
#  only lines whose character in column 22 matches chainID X. Use to extract specific 
#  subunit from pdb file.
# Instead, can specify a three character resname to select HOH or ligands like ARA.
# If resname is specified, extracts only lines whose resname in columns 18-20 
#  matches resname abc value.
# Writes header line as a remark.
# Ignores all other lines not beginning with ATOM or HETATM.
# If a single coordinate value for an atom is present, takes that. 
# If multiple coordinates are present, signified by A, B, .. in column 17, takes only A.
# If protein and HOH lines are present and protein lacks a chainID, takes the 
#  protein lines only.
# Converts HOH to TIP and adds a 3, making TIP3, HIS to HSD, CD1 to CD_ for ILE, 
#  adds the segid in columns 73-76. Converts OXT or OCT1 to OT1 and OCT2 to OT2.
# Renumbers atoms starting from 1.
# Fields: Atom, Atom No, Space, Atom name, Alt Conf indic, Resname, Space, 
#  Chain Ident, Res Seq No, Spaces, x, y, z, Occup, Temp fact, Spaces, Segment ID

BEGIN {FIELDWIDTHS=" 6 5 1 4 1 3 1 1 4 1 3 8 8 8 6 6 6 4"} 
{
	if ($1 == "HEADER")
		print "REMARK" substr($0, 7, 69)
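	if ($1 != "ATOM  " && $1 != "HETATM")
		next	# skip anything that is not an ATOM or HETATM record
	else if ($5 != " " && $5 != "A")
		next	# skip alternate conformations other than blank or A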
	else if ($6 == resname || $8 == chainID || ($8 == " " && $1 != "HETATM")) 
	{
		atomno++
		if ($6 == "HOH")
		{	$4 = " OH2"
			$6 = "TIP"
			$7 = "3"
		}
		if ($1 == "HETATM")
			$1 = "ATOM  "
		if ($6 == "HIS")
			$6 = "HSD"
		if ($6 == "ILE" && $4 == " CD1")
			$4 = " CD "
		if ($4 == " OXT" || $4 == "OCT1") 
			$4 = " OT2"
		if ($4 == "OCT2")
			$4 = " OT1"
		printf "%6s",$1
		printf "%5d", atomno
		printf "%1s", " "
		printf "%4s", $4
		printf "%1s", " "
		printf "%3s", $6
		printf "%1s", $7
		printf "%1s", " "
		printf "%4s", $9
		printf "%4s", "    "
		printf "%8s", $12
		printf "%8s", $13
		printf "%8s", $14
		printf "%6s", $15
		printf "%6s", $16
		printf "%6s", "      "
		printf "%4s\n", segid
	}

}
END {printf "%3s\n", "END"}

Why not use printf, as in your example script?

Why don't you try another program, or ask other people who specialize in IT? I'm sure they can help you. :)

Ok I tried writing the script:

BEGIN {FIELDWIDTHS=" 6 5 1 4 1 3 1 1 4 1 3 8 8 8 6 6 6 13 6 5 1 4 1 3 1 1 4 1 3 8 8 8 6 6 6 4"}
{ printf "%6s",$1
                printf "%6d", $1
                printf "%5d", $2
                printf "%1s", $3
                printf "%4s", $4
                printf "%1s", $5
                printf "%3s", $6
                printf "%1s", $7
                printf "%1s", $8
                printf "%4s", $9
                printf "%1s", $10
                printf "%3s", $11
                printf "%8s", $12-$30
                printf "%8s", $13-$31
                printf "%8s", $14-$32
                printf "%6s", $15
                printf "%6s", $16
                printf "%6s", $17
                printf "%13s\n", $18
}
END {printf "%3s\n", "END"}

But the output is still poor:

  ATOM     0    1N ALAB  1-9.995-6.8352.2550.000.00       0       0       1     N   ALA     B            1
  ATOM     0    2HT2 ALAB  1-10.828-7.4442.5850.000.00       0       0       2   HT2   ALA     B            1
  ATOM     0    3HT3 ALAB  1-10.119-6.6231.2300.000.00       0       0       3   HT3   ALA     B            1
  ATOM     0    4CA ALAB  1-10.201-5.6523.1070.000.00       0       0       4    CA   ALA     B            1
  ATOM     0    5HA ALAB  1-10.804-4.9892.5870.000.00       0       0       5    HA   ALA     B            1
  ATOM     0    6CB ALAB  1-10.761-6.3014.3610.000.00       0       0       6    CB   ALA     B            1
  ATOM     0    7HB1 ALAB  1-11.677-6.8644.1780.000.00       0       0       7   HB1   ALA     B            1
  ATOM     0    8HB2 ALAB  1-10.016-7.0194.7860.000.00       0       0       8   HB2   ALA     B            1
  ATOM     0    9HB3 ALAB  1-11.114-5.6045.1130.000.00       0       0       9   HB3   ALA     B            1
  ATOM     0   10C ALAB  1-8.943-4.8903.3620.000.00       0       0      10     C   ALA     B            1
  ATOM     0   11O ALAB  1-8.904-3.7163.0570.000.00       0       0      11     O   ALA     B            1
  ATOM     0   12N ALAB  2-7.940-5.5833.9170.000.00       0       0      12     N   ALA     B            2
  ATOM     0   13HN ALAB  2-8.114-6.5014.2410.000.00       0       0      13    HN   ALA     B            2
  ATOM     0   14CA ALAB  2-6.616-5.0444.3440.000.00       0       0      14    CA   ALA     B            2
  ATOM     0   15HA ALAB  2-6.828-4.5125.2280.000.00       0       0      15    HA   ALA     B            2
END

Alternatively, try this:

awk 'sub(" *"$7" *"$8" *"$9,sprintf("%12.3f%8.3f%8.3f",$19-$7, $20-$8, $21-$9))' infile | cut -c-74
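
If the sub() approach feels fragile, another option is to copy the label columns verbatim and rebuild only the three coordinate columns. This is a minimal sketch, assuming the column layout seen in the sample (x, y, z in columns 31-54, occupancy through segment ID in columns 55-74) and assuming the second record starts exactly 80 columns after the first, as it does in the posted data. It uses only substr and printf, so it should not need gawk:

```shell
# Two sample records from the thread, written to a scratch copy of merge.pdb.
cat > merge.pdb <<'EOF'
ATOM      1  N   ALA B   1      -9.995  -6.835   2.255  0.00  0.00      BH      ATOM      1  N   ALA B   1     -13.079 -16.435   0.105  0.00  0.00      BH
ATOM      2  HT2 ALA B   1     -10.828  -7.444   2.585  0.00  0.00      BH      ATOM      2  HT2 ALA B   1     -12.045 -16.716  -0.054  0.00  0.00      BH
EOF

awk '{
    head = substr($0, 1, 30)    # columns 1-30: record name through residue number, unchanged
    tail = substr($0, 55, 20)   # columns 55-74: occupancy, temperature factor, segment ID
    # Coordinates are 8 columns wide; the second record is assumed to sit 80 columns further right.
    dx = substr($0, 111, 8) - substr($0, 31, 8)
    dy = substr($0, 119, 8) - substr($0, 39, 8)
    dz = substr($0, 127, 8) - substr($0, 47, 8)
    printf "%s%8.3f%8.3f%8.3f%s\n", head, dx, dy, dz, tail
}' merge.pdb > vector.pdb

cat vector.pdb
```

Because the coordinates are read by column position instead of by whitespace-separated field number, this also copes with rows where two coordinates touch or where the chain ID is blank.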

That's because your field widths don't match the data: for example, $3 is being written with a width of 1, but in some rows it is 3 characters long (plus a space, if you want to pad it).
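
To make that concrete, here is a rough sketch of a single printf whose widths follow the column layout of the sample data. It splits on whitespace, as the original print command did, so it assumes no field is ever blank and that coordinates never run together. The rule that atom names shorter than four characters get one leading space is inferred from the sample rows, not stated anywhere in the thread, and the file names sample.pdb and fixed.pdb are just placeholders:

```shell
# Two sample records from the thread, written to a scratch file.
cat > sample.pdb <<'EOF'
ATOM      1  N   ALA B   1      -9.995  -6.835   2.255  0.00  0.00      BH      ATOM      1  N   ALA B   1     -13.079 -16.435   0.105  0.00  0.00      BH
ATOM      2  HT2 ALA B   1     -10.828  -7.444   2.585  0.00  0.00      BH      ATOM      2  HT2 ALA B   1     -12.045 -16.716  -0.054  0.00  0.00      BH
EOF

awk '{
    # Atom names occupy 4 columns; short names start in the second of them (assumed from the sample).
    name = (length($3) < 4) ? " " $3 : $3
    # Widths: record 6, serial 5, blank, name 4, blank, residue 3, blank, chain 1,
    # residue number 4, 4 blanks, three coordinates of 8, two values of 6, 6 blanks, segment ID.
    printf "%-6s%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f      %s\n",
        $1, $2, name, $4, $5, $6, $19 - $7, $20 - $8, $21 - $9, $10, $11, $12
}' sample.pdb > fixed.pdb

cat fixed.pdb
```

Each width in the format string corresponds to one column group of the input, which is what keeps rows such as N and HT2 aligned with each other.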