Remove a block of Text at regular intervals

Hello all,

I have text files that consist of blocks of text. Each block represents a set of Cartesian coordinates for a molecule. Each block starts with a line that has only a number, which is equal to the total number of atoms in the molecule. After this number is a line with "Frame #", where # is the number of the frame, i.e. 1, 2, 3... After this line come the Cartesian coordinates, one set (x, y, z) per line. There are as many coordinate lines as there are atoms in the molecule.

Here is a sample:

14
Frame 1
Ir 0.4482 -1.2980 -0.2902
P 1.8759 -2.1654 1.4038
P -1.2305 -0.8418 -1.9134
H -2.5605 -0.7775 -1.4067
H -1.3820 -1.8515 -2.9058
H -1.1987 0.3321 -2.7223
H 2.5359 -3.3920 1.1065
H 1.2161 -2.5072 2.6182
H 2.9669 -1.3899 1.8960
C 1.3685 0.2571 -0.5341
O 1.9671 1.2795 -0.7004
Cl -0.8142 -3.4101 0.0318
H -0.8380 -0.5636 2.1141
H -1.0869 -0.4380 2.8141
14
Frame 2
Ir 0.4490 -1.2978 -0.2903
P 1.8738 -2.1613 1.4076
P -1.2367 -0.8359 -1.9047
H -2.5634 -0.7722 -1.3897
H -1.3955 -1.8409 -2.9008
H -1.2083 0.3415 -2.7087
H 2.5186 -3.3989 1.1226
H 1.2155 -2.4815 2.6289
H 2.9753 -1.3923 1.8863
C 1.3731 0.2542 -0.5394
O 1.9771 1.2723 -0.7122
Cl -0.8091 -3.4129 0.0288
H -0.9491 -0.6267 2.0929
H -1.2309 -0.4996 2.7795
.
.
.
14
Frame 72
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
H -2.2321 -1.8577 -1.8841
H -1.2106 -0.1808 -2.8791
H 2.8807 -2.5381 1.3446
H 0.9971 -3.2308 2.2364
H 1.8734 -1.3013 2.8541
C 1.7346 -0.5673 -1.1992
O 2.6174 -0.1436 -1.8621
Cl 0.2051 -3.5805 -0.9057
H 0.1695 0.2982 0.6158
H -0.9713 -1.5095 0.9427

I would like to be able to remove every other frame (which corresponds to a block of text that starts with the number of atoms in the molecule, followed by the "Frame #" line, followed by the coordinates).

The trick is that not every file has the same number of atoms, so the script has to have a variable such that the user can tell it how many atoms are in the coordinates section and then it should remove every other block of data.

So I am trying to remove Frames 2, 4, 6, 8 and so on.

Thus the above sample would become:

14
Frame 1
Ir 0.4482 -1.2980 -0.2902
P 1.8759 -2.1654 1.4038
P -1.2305 -0.8418 -1.9134
H -2.5605 -0.7775 -1.4067
H -1.3820 -1.8515 -2.9058
H -1.1987 0.3321 -2.7223
H 2.5359 -3.3920 1.1065
H 1.2161 -2.5072 2.6182
H 2.9669 -1.3899 1.8960
C 1.3685 0.2571 -0.5341
O 1.9671 1.2795 -0.7004
Cl -0.8142 -3.4101 0.0318
H -0.8380 -0.5636 2.1141
H -1.0869 -0.4380 2.8141
.
.
.
14
Frame 71
Ir 0.2798 -1.1422 -0.0743
P 1.5833 -2.0636 1.6848
P -1.4955 -0.6824 -1.5761
H -2.5237 0.2149 -1.1727
H -2.2322 -1.8577 -1.8839
H -1.2099 -0.1819 -2.8797
H 2.8810 -2.5379 1.3438
H 0.9977 -3.2312 2.2359
H 1.8739 -1.3017 2.8539
C 1.7345 -0.5673 -1.1991
O 2.6175 -0.1439 -1.8621
Cl 0.2052 -3.5805 -0.9056
H 0.1693 0.2983 0.6159
H -0.9714 -1.5096 0.9427

Also, if possible, I would like also to be able to remove every 3rd frame as well.

i.e. remove frame 3, 6, 9, 12, 15... and so on.

To print the blocks 2, 4, 6 and so on:

awk 'END { 
  if (!(rc % v) && rc != prc)
    print r 
  }
NF == 1 { 
  if (rc && !(rc % v)) {
    print r; prc = rc
    }
  r = x; rc ++
  }
{ r = r ? r RS $0 : $0 }
  ' v=2 infile

To print 3, 6, 9 and so on, set v to 3: ... v=3 infile

Try:

awk '/^[0-9]+$/{a=$1;p=0} $1=="Frame"&&$2%m{print a;p=1}p' m=3 infile

Set m to 2, 3, 4, etc. to remove every 2nd, 3rd, 4th, ... frame.
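
For anyone who finds the one-liner dense, here is the same logic spread over several lines with comments. Treat it as a sketch only: the input is a made-up two-atom trajectory, and the names `count` and `keep` are mine, not from the post above.

```shell
# Made-up two-atom trajectory with three frames, laid out like the data in this thread.
sample=$(for i in 1 2 3; do printf '2\nFrame %d\nH 0 0 0\nH 0 0 %d\n' "$i" "$i"; done)

out=$(printf '%s\n' "$sample" | awk -v m=2 '
  /^[0-9]+$/ {                  # atom-count line: hold it back until the frame is known
    count = $1; keep = 0; next
  }
  $1 == "Frame" {               # frame header: decide whether this block survives
    keep = ($2 % m != 0)        # every m-th frame is dropped
    if (keep) print count       # kept frame: emit the held-back count line first
  }
  keep                          # print header and coordinate lines of kept frames
')
printf '%s\n' "$out"
```

With m=2 this should leave frames 1 and 3, each still preceded by its own count line.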

With "Frame" as the record separator, the first record contains only the leading count line (the "^14$" line), so the record numbers are offset by one from the frame numbers. That is why the condition uses NR-1 instead of NR.

awk -F"\n" 'BEGIN {RS="Frame" ;ORS="Frame";printf "%s","Frame"} ; ((NR-1)%2!=0)&&((NR-1)%3!=0)' infile

or

awk -F"\n" -vRS="Frame" -vORS="Frame" 'BEGIN{printf "%s",RS};((NR-1)%2!=0)&&((NR-1)%3!=0)' infile
# cat infile
14
Frame 1
Ir 0.4482 -1.2980 -0.2902
H -0.8380 -0.5636 2.1141
...
H -1.0869 -0.4380 2.8141
14
Frame 2
Ir 0.4490 -1.2978 -0.2903
P 1.8738 -2.1613 1.4076
...
H -1.2309 -0.4996 2.7795
14
Frame 3
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
14
Frame 4
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
14
Frame 5
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
14
Frame 6
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
14
Frame 7
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
14
Frame 8
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
14
Frame 9
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
14
Frame 10
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
14
Frame 11
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
.
.
.
14
Frame 72
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
H -2.2321 -1.8577 -1.8841
H -1.2106 -0.1808 -2.8791
H 2.8807 -2.5381 1.3446
H 0.9971 -3.2308 2.2364
H 1.8734 -1.3013 2.8541
C 1.7346 -0.5673 -1.1992
O 2.6174 -0.1436 -1.8621
Cl 0.2051 -3.5805 -0.9057
H 0.1695 0.2982 0.6158
H -0.9713 -1.5095 0.9427
# awk -F"\n" -vRS="Frame" -vORS="Frame" 'BEGIN{printf "%s",RS};((NR-1)%2!=0)&&((NR-1)%3!=0)' infile
Frame 1
Ir 0.4482 -1.2980 -0.2902
H -0.8380 -0.5636 2.1141
...
H -1.0869 -0.4380 2.8141
14
Frame 5
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
14
Frame 7
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
14
Frame 11
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
.
.
.
14
#
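
To make the NR-1 offset visible, the sketch below prints which frame number each record carries when RS is "Frame". The input is a made-up two-atom, two-frame file. It assumes an awk that accepts a multi-character RS (gawk and mawk do; POSIX leaves the behaviour unspecified).

```shell
# Made-up two-atom, two-frame input in the same layout as infile.
out=$(printf '2\nFrame 1\nH 0 0 0\nH 0 0 1\n2\nFrame 2\nH 0 0 0\nH 0 0 2\n' |
  awk -v RS='Frame' '
    NR == 1 { print "record 1 holds only the leading count: " $1; next }
            { print "record " NR " holds frame " $1 }
  ')
printf '%s\n' "$out"
```

Note also that each record ends with the count line of the *next* frame, which is why the output above starts without a leading 14 and ends with a stray one.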

This works exactly as you said, but is there a way to make it print blocks 1,3,5,7 and 1,4,7,10 etc.?

---------- Post updated at 01:36 PM ---------- Previous update was at 01:29 PM ----------

Thanks for your reply but this doesn't work.

It seems that this command adds spaces into certain spots of the file at the beginning but does remove the correct blocks towards the end. Here are the first few blocks:

Frame 1
Rh -0.3793 -0.3188 -0.3617
C -1.4181 0.5185 -1.7491
P 1.8313 -0.3365 -1.4708
H 2.0493 0.2251 -2.7751
H 2.9159 0.3283 -0.8096
H 2.5143 -1.5807 -1.7041
P -1.0881 -2.6725 -0.6945
H -0.1078 -3.7030 -0.9106
H -1.7533 -3.3073 0.4056
H -2.0033 -3.0952 -1.7185
H 0.4760 -0.9847 0.8641
O -2.0655 1.0426 -2.5964
C -1.5647 0.4833 1.2803
C -0.4855 1.3783 1.0023
H -2.5722 0.7321 0.9603
H -1.5075 -0.1757 2.1410
H -0.6724 2.3044 0.4670
H 0.3854 1.3885 1.6488
18

Frame 2
Rh -0.3799 -0.3194 -0.3618
C -1.4163 0.5199 -1.7496
P 1.8296 -0.3352 -1.4732
H 2.0452 0.2282 -2.7771
H 2.9144 0.3296 -0.8125
H 2.5134 -1.5785 -1.7089
P -1.0872 -2.6741 -0.6913
H -0.1063 -3.7039 -0.9069
H -1.7499 -3.3073 0.4112
H -2.0037 -3.0998 -1.7128
H 0.4765 -0.9847 0.8635
O -2.0594 1.0478 -2.5979
C -1.5655 0.4814 1.2805
C -0.4871 1.3774 1.0028
H -2.5732 0.7295 0.9607
H -1.5077 -0.1779 2.1409
H -0.6748 2.3035 0.4679
H 0.3839 1.3880 1.6492

and here are the blocks at the end:

Frame 287
Rh -0.4360 -0.2696 -0.6220
C -1.1646 1.0045 -1.7330
P 1.7166 -0.5355 -1.8645
H 1.8842 0.0740 -3.1515
H 2.8434 0.0882 -1.2285
H 2.3789 -1.7764 -2.1787
P -0.6673 -2.6176 0.1277
H 0.3305 -3.6308 0.3709
H -1.4676 -2.8685 1.2952
H -1.4465 -3.3795 -0.8014
H -0.3658 -0.4659 2.3069
O -1.7130 1.6942 -2.5406
C -1.8697 0.1962 0.8475
C -0.9955 0.4085 2.0867
H -2.4370 1.0967 0.5998
H -2.5883 -0.6253 0.9828
H -0.3321 1.2698 1.9625
H -1.6058 0.5920 2.9832
18

Frame 289
Rh -0.4362 -0.2701 -0.6236
C -1.1618 1.0091 -1.7307
P 1.7142 -0.5392 -1.8686
H 1.8803 0.0675 -3.1571
H 2.8442 0.0827 -1.2365
H 2.3725 -1.7824 -2.1820
P -0.6607 -2.6166 0.1343
H 0.3410 -3.6262 0.3757
H -1.4536 -2.8644 1.3074
H -1.4437 -3.3837 -0.7873
H -0.3808 -0.4699 2.3130
O -1.7083 1.7048 -2.5345
C -1.8742 0.1934 0.8435
C -1.0063 0.4059 2.0871
H -2.4418 1.0934 0.5946
H -2.5926 -0.6289 0.9752
H -0.3391 1.2644 1.9635
H -1.6207 0.5944 2.9797
18
Frame 291
Rh -0.4365 -0.2705 -0.6252
C -1.1590 1.0137 -1.7283
P 1.7117 -0.5429 -1.8727
H 1.8763 0.0608 -3.1628
H 2.8450 0.0772 -1.2446
H 2.3659 -1.7885 -2.1852
P -0.6541 -2.6156 0.1410
H 0.3512 -3.6217 0.3813
H -1.4405 -2.8599 1.3191
H -1.4403 -3.3880 -0.7734
H -0.3944 -0.4731 2.3185
O -1.7034 1.7154 -2.5282
C -1.8788 0.1901 0.8395
C -1.0170 0.4034 2.0873
H -2.4472 1.0893 0.5894
H -2.5964 -0.6335 0.9678
H -0.3474 1.2601 1.9648
H -1.6358 0.5956 2.9761

As you can see, it does remove every second block at the end but it also adds in spaces for some reason...

---------- Post updated at 01:39 PM ---------- Previous update was at 01:36 PM ----------

Thanks for your suggestion. This does remove blocks of text but it seems it is removing 2 or 3 or 4 blocks rather than removing every second block.

Here is a sample:

Frame 1
Rh -0.3793 -0.3188 -0.3617
C -1.4181 0.5185 -1.7491
P 1.8313 -0.3365 -1.4708
H 2.0493 0.2251 -2.7751
H 2.9159 0.3283 -0.8096
H 2.5143 -1.5807 -1.7041
P -1.0881 -2.6725 -0.6945
H -0.1078 -3.7030 -0.9106
H -1.7533 -3.3073 0.4056
H -2.0033 -3.0952 -1.7185
H 0.4760 -0.9847 0.8641
O -2.0655 1.0426 -2.5964
C -1.5647 0.4833 1.2803
C -0.4855 1.3783 1.0023
H -2.5722 0.7321 0.9603
H -1.5075 -0.1757 2.1410
H -0.6724 2.3044 0.4670
H 0.3854 1.3885 1.6488
18
Frame 5
Rh -0.3814 -0.3213 -0.3621
C -1.4104 0.5238 -1.7517
P 1.8239 -0.3291 -1.4809
H 2.0315 0.2434 -2.7820
H 2.9094 0.3353 -0.8209
H 2.5109 -1.5685 -1.7278
P -1.0864 -2.6784 -0.6804
H -0.1049 -3.7078 -0.8957
H -1.7414 -3.3064 0.4296
H -2.0081 -3.1123 -1.6940
H 0.4791 -0.9850 0.8613
O -2.0384 1.0622 -2.6046
C -1.5674 0.4749 1.2816
C -0.4913 1.3740 1.0054
H -2.5758 0.7210 0.9626
H -1.5076 -0.1858 2.1409
H -0.6814 2.3004 0.4719
H 0.3800 1.3854 1.6513
18
Frame 7
Rh -0.3823 -0.3227 -0.3624
C -1.4064 0.5263 -1.7531
P 1.8201 -0.3247 -1.4859
H 2.0225 0.2538 -2.7852
H 2.9058 0.3396 -0.8263
H 2.5096 -1.5612 -1.7402
P -1.0862 -2.6812 -0.6731
H -0.1046 -3.7105 -0.8884
H -1.7358 -3.3054 0.4422
H -2.0117 -3.1204 -1.6809
H 0.4809 -0.9852 0.8597
O -2.0243 1.0716 -2.6091
C -1.5686 0.4703 1.2824
C -0.4940 1.3715 1.0071
H -2.5775 0.7151 0.9639
H -1.5074 -0.1914 2.1408
H -0.6858 2.2982 0.4746
H 0.3775 1.3835 1.6527
18
Frame 11
Rh -0.3843 -0.3255 -0.3630
C -1.3984 0.5314 -1.7560
P 1.8126 -0.3158 -1.4960
H 2.0050 0.2739 -2.7918
H 2.8983 0.3492 -0.8369
H 2.5076 -1.5464 -1.7639
P -1.0859 -2.6867 -0.6582
H -0.1046 -3.7161 -0.8740
H -1.7243 -3.3027 0.4680
H -2.0199 -3.1361 -1.6538
H 0.4848 -0.9857 0.8561
O -1.9959 1.0901 -2.6179
C -1.5709 0.4610 1.2839
C -0.4995 1.3664 1.0108
H -2.5808 0.7031 0.9666
H -1.5070 -0.2029 2.1405
H -0.6944 2.2936 0.4804
H 0.3725 1.3793 1.6557
.
.
.

For removing every second block only (2,4,6,8, ...) :

awk -F"\n" -vRS="Frame" -vORS="Frame" 'BEGIN{printf "%s",RS};((NR-1)%2!=0)' infile

But you initially stated: "Also, if possible, I would like also to be able to remove every 3rd frame as well."

That is why I also included the code to remove blocks 3,6,9,12,15 ...

Hi, on my system my suggestion prints:
m=2: 1,3,5,7,... so 2,4,6,... are deleted
m=3: 1,2,4,5,7,... so 3,6,9,... are deleted
etc.
Does it not do that on your system? What platform are you on?

OK.

Sorry - I was unclear in my initial post.

I meant to remove 2,4,6,8.... and then output this as one file. This essentially cuts the total number of frames by half.

Another option would be to remove frames 3,6,9,12... and then output this as a different file. This essentially cuts the total number of frames by a third.
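
If both reduced files are wanted, one pass over the input can write them side by side. This is a rough sketch under my own assumptions: the file names `half.xyz` and `two_thirds.xyz` and the six one-atom frames are invented for the demo, and it works line by line rather than with RS="Frame".

```shell
dir=$(mktemp -d)
# Made-up input: six one-atom frames in the layout used in this thread.
for i in 1 2 3 4 5 6; do printf '1\nFrame %d\nH 0 0 %d\n' "$i" "$i"; done > "$dir/infile"

awk -v half="$dir/half.xyz" -v twothirds="$dir/two_thirds.xyz" '
  /^[0-9]+$/ { count = $0; next }          # hold the atom count until the frame number is known
  $1 == "Frame" {
    k2 = ($2 % 2 != 0)                     # survives when every 2nd frame is removed
    k3 = ($2 % 3 != 0)                     # survives when every 3rd frame is removed
    if (k2) print count > half
    if (k3) print count > twothirds
  }
  k2 { print > half }                      # header and coordinates for the first file
  k3 { print > twothirds }                 # header and coordinates for the second file
' "$dir/infile"

grep '^Frame' "$dir/half.xyz"
grep '^Frame' "$dir/two_thirds.xyz"
```

The first file should keep frames 1, 3, 5 and the second frames 1, 2, 4, 5.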

By the way, I tried the new code you suggested:

awk -F"\n" -vRS="Frame" -vORS="Frame" 'BEGIN{printf "%s",RS};((NR-1)%2!=0)' infile

and it works perfectly!

Is there a way to modify it to take out every third frame only i.e. frames 3,6,9,12...?

You have to tell us more about what you want to keep or exclude:
Do you want to keep only the frames that are multiples of 3 (Frame 6 and Frame 9)?
Or do you want to display Frame 9 but exclude Frame 6?
Of course, if we use a formula that filters out all multiples of 3, Frame 9 will be filtered out too...

# awk -F"\n" -vRS="Frame" -vORS="Frame" 'BEGIN{printf "%s",RS};((NR-1)%3==1)' infile
Frame 1
Ir 0.4482 -1.2980 -0.2902
H -0.8380 -0.5636 2.1141
...
H -1.0869 -0.4380 2.8141
14
Frame 4
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
14
Frame 7
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
14
Frame 10
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
14
# 

---------- Post updated at 07:23 PM ---------- Previous update was at 07:20 PM ----------

To keep only frames 3, 6, 9, 12:

awk -F"\n" -vRS="Frame" -vORS="Frame" 'BEGIN{printf "%s",RS};((NR-1)%3==0)' infile
# awk -F"\n" -vRS="Frame" -vORS="Frame" 'BEGIN{printf "%s",(($0==14)?14:RS)};(NR!=1)&&((NR-1)%3==0)' infile
Frame 3
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
14
Frame 6
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
14
Frame 9
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
14
Frame 72
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
H -2.2321 -1.8577 -1.8841
H -1.2106 -0.1808 -2.8791
H 2.8807 -2.5381 1.3446
H 0.9971 -3.2308 2.2364
H 1.8734 -1.3013 2.8541
C 1.7346 -0.5673 -1.1992
O 2.6174 -0.1436 -1.8621
Cl 0.2051 -3.5805 -0.9057
H 0.1695 0.2982 0.6158
H -0.9713 -1.5095 0.9427
#

Here is the output from "uname -a":

Linux 2.6.18-194.11.4.e15

---------- Post updated at 02:26 PM ---------- Previous update was at 02:24 PM ----------

I want to remove all the frames that are a multiple of 3.

That is, I want to remove, 3,6,9,12,15,....until the end.

Then I want the output to contain frames 1,2,4,5,7,8,10,11, etc.

Hope that helps.

It does exactly that when I use m=3. This should work on your platform, does it not?

To filter out the 3,6,9,12,15 ... :

# awk -F"\n" -vRS="Frame" -vORS="Frame" 'BEGIN{printf "%s",RS};((NR-1)%3!=0)' infile
Frame 1
Ir 0.4482 -1.2980 -0.2902
H -0.8380 -0.5636 2.1141
...
H -1.0869 -0.4380 2.8141
14
Frame 2
Ir 0.4490 -1.2978 -0.2903
P 1.8738 -2.1613 1.4076
...
H -1.2309 -0.4996 2.7795
14
Frame 4
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
14
Frame 5
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
14
Frame 7
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
14
Frame 8
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
14
Frame 10
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
14
Frame 11
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
.
.
.
14
#

... to secure the display of the leading 14

awk -F"\n" -vRS="Frame" -vORS="Frame" 'BEGIN{printf "%s",14"\n"RS};(NR!=1)&&((NR-1)%3!=0)' infile
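
Since not every file has 14 atoms, an alternative that avoids hard-coding the count is to read it from the first line of each block and step through the file block by block. This is only a sketch and a different technique from the RS="Frame" approach above: it assumes every block is exactly (atoms + 2) lines with no blank lines in between, and the names `left` and `block` are mine.

```shell
# Made-up input: four two-atom frames in the layout used in this thread.
out=$(for i in 1 2 3 4; do printf '2\nFrame %d\nH 0 0 0\nH 0 0 %d\n' "$i" "$i"; done |
  awk -v m=3 '
    left == 0 { left = $1 + 2; block++ }   # count line starts a block of (atoms + 2) lines
    { left-- }                             # one more line of the current block consumed
    block % m != 0                         # print only blocks that are not every m-th one
  ')
printf '%s\n' "$out"
```

Because blocks are counted by position, this does not depend on the frame numbers in the file, and every kept frame keeps its own count line.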

A Perl solution follows -

$
$
$ # show the content of the input data file "f1"
$ cat f1
14
Frame 1
Ir 0.4482 -1.2980 -0.2902
P 1.8759 -2.1654 1.4038
P -1.2305 -0.8418 -1.9134
H -2.5605 -0.7775 -1.4067
H -1.3820 -1.8515 -2.9058
H -1.1987 0.3321 -2.7223
H 2.5359 -3.3920 1.1065
H 1.2161 -2.5072 2.6182
H 2.9669 -1.3899 1.8960
C 1.3685 0.2571 -0.5341
O 1.9671 1.2795 -0.7004
Cl -0.8142 -3.4101 0.0318
H -0.8380 -0.5636 2.1141
H -1.0869 -0.4380 2.8141
14
Frame 2
Ir 0.4490 -1.2978 -0.2903
P 1.8738 -2.1613 1.4076
P -1.2367 -0.8359 -1.9047
H -2.5634 -0.7722 -1.3897
H -1.3955 -1.8409 -2.9008
H -1.2083 0.3415 -2.7087
H 2.5186 -3.3989 1.1226
H 1.2155 -2.4815 2.6289
H 2.9753 -1.3923 1.8863
C 1.3731 0.2542 -0.5394
O 1.9771 1.2723 -0.7122
Cl -0.8091 -3.4129 0.0288
H -0.9491 -0.6267 2.0929
H -1.2309 -0.4996 2.7795
14
Frame 3
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
H -2.2321 -1.8577 -1.8841
H -1.2106 -0.1808 -2.8791
H 2.8807 -2.5381 1.3446
H 0.9971 -3.2308 2.2364
H 1.8734 -1.3013 2.8541
C 1.7346 -0.5673 -1.1992
O 2.6174 -0.1436 -1.8621
Cl 0.2051 -3.5805 -0.9057
H 0.1695 0.2982 0.6158
H -0.9713 -1.5095 0.9427
14
Frame 4
Ir 0.1111 -1.1111 -0.1111
P 1.2222 -2.2222 1.2222
P -1.3333 -0.3333 -1.3333
H -2.4444 0.4444 -1.4444
H -2.5555 -1.5555 -1.5555
H -1.6666 -0.6666 -2.6666
H 2.7777 -2.7777 1.7777
H 0.8888 -3.8888 2.8888
H 1.9999 -1.9999 2.9999
C 1.1010 -0.1010 -1.1010
O 2.1111 -0.1111 -1.1111
Cl 0.1212 -3.1212 -0.1212
H 0.1313 0.1313 0.1313
H -0.1414 -1.1414 0.1414
14
Frame 5
Ir 0.1515 -1.1515 -0.1515
P 1.1616 -2.1616 1.1616
P -1.1717 -0.1717 -1.1717
H -2.1818 0.1818 -1.1818
H -2.1919 -1.1919 -1.1919
H -1.2020 -0.2020 -2.2020
H 2.2121 -2.2121 1.2121
H 0.2222 -3.2222 2.2222
H 1.2323 -1.2323 2.2323
C 1.2424 -0.2424 -1.2424
O 2.2525 -0.2525 -1.2525
Cl 0.2626 -3.2626 -0.2626
H 0.2727 0.2727 0.2727
H -0.2828 -1.2828 0.2828
14
Frame 6
Ir 0.2929 -1.2929 -0.2929
P 1.3030 -2.3030 1.3030
P -1.3131 -0.3131 -1.3131
H -2.3232 0.3232 -1.3232
H -2.3333 -1.3333 -1.3333
H -1.3434 -0.3434 -2.3434
H 2.3535 -2.3535 1.3535
H 0.3636 -3.3636 2.3636
H 1.3737 -1.3737 2.3737
C 1.3838 -0.3838 -1.3838
O 2.3939 -0.3939 -1.3939
Cl 0.4040 -3.4040 -0.4040
H 0.4141 0.4141 0.4141
H -0.4242 -1.4242 0.4242
$
$
$ # show the content of the Perl program "f1.pl"
$
$ cat f1.pl
#!perl -w
print "Enter number of atoms: ";
chomp($x = <STDIN>);
print "Enter <n> if every nth frame is to be removed: ";
chomp($y = <STDIN>);
$file = "f1";
open (F, $file) or die "Can't open $file: $!";
while (<F>) {                                  # iterate through the records
  chomp;                                       # chomp newline
  if (/^\d+$/) {                               # if the number is here all by itself
    $atoms = $_;                               # then it is the number of atoms
    $iter = 1;                                 # initialize the iterator
    $show = 0;                                 # don't show/print the records just yet
  } elsif (/^Frame (\d+)$/ and $1%$y != 0) {   # if the frame number is not a multiple of <n>
    print $atoms,"\n";                         # then print the number of atoms
    print $_,"\n";                             # and print the current line
    $show = 1;                                 # and set the flag so that the block could be printed
  } elsif ($show and $iter <= $x) {            # if flag is set and we haven't reached the upper limit
    print $_,"\n";                             # then keep printing the record
    $iter++;                                   # and incrementing the iterator
  }
}
close (F) or die "Can't close $file: $!";      # clean up after ourselves
$
$
$ # test the Perl program
$
$ perl f1.pl
Enter number of atoms: 14
Enter <n> if every nth frame is to be removed: 2
14
Frame 1
Ir 0.4482 -1.2980 -0.2902
P 1.8759 -2.1654 1.4038
P -1.2305 -0.8418 -1.9134
H -2.5605 -0.7775 -1.4067
H -1.3820 -1.8515 -2.9058
H -1.1987 0.3321 -2.7223
H 2.5359 -3.3920 1.1065
H 1.2161 -2.5072 2.6182
H 2.9669 -1.3899 1.8960
C 1.3685 0.2571 -0.5341
O 1.9671 1.2795 -0.7004
Cl -0.8142 -3.4101 0.0318
H -0.8380 -0.5636 2.1141
H -1.0869 -0.4380 2.8141
14
Frame 3
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
H -2.2321 -1.8577 -1.8841
H -1.2106 -0.1808 -2.8791
H 2.8807 -2.5381 1.3446
H 0.9971 -3.2308 2.2364
H 1.8734 -1.3013 2.8541
C 1.7346 -0.5673 -1.1992
O 2.6174 -0.1436 -1.8621
Cl 0.2051 -3.5805 -0.9057
H 0.1695 0.2982 0.6158
H -0.9713 -1.5095 0.9427
14
Frame 5
Ir 0.1515 -1.1515 -0.1515
P 1.1616 -2.1616 1.1616
P -1.1717 -0.1717 -1.1717
H -2.1818 0.1818 -1.1818
H -2.1919 -1.1919 -1.1919
H -1.2020 -0.2020 -2.2020
H 2.2121 -2.2121 1.2121
H 0.2222 -3.2222 2.2222
H 1.2323 -1.2323 2.2323
C 1.2424 -0.2424 -1.2424
O 2.2525 -0.2525 -1.2525
Cl 0.2626 -3.2626 -0.2626
H 0.2727 0.2727 0.2727
H -0.2828 -1.2828 0.2828
$
$
$ # once more
$
$ perl f1.pl
Enter number of atoms: 14
Enter <n> if every nth frame is to be removed: 3
14
Frame 1
Ir 0.4482 -1.2980 -0.2902
P 1.8759 -2.1654 1.4038
P -1.2305 -0.8418 -1.9134
H -2.5605 -0.7775 -1.4067
H -1.3820 -1.8515 -2.9058
H -1.1987 0.3321 -2.7223
H 2.5359 -3.3920 1.1065
H 1.2161 -2.5072 2.6182
H 2.9669 -1.3899 1.8960
C 1.3685 0.2571 -0.5341
O 1.9671 1.2795 -0.7004
Cl -0.8142 -3.4101 0.0318
H -0.8380 -0.5636 2.1141
H -1.0869 -0.4380 2.8141
14
Frame 2
Ir 0.4490 -1.2978 -0.2903
P 1.8738 -2.1613 1.4076
P -1.2367 -0.8359 -1.9047
H -2.5634 -0.7722 -1.3897
H -1.3955 -1.8409 -2.9008
H -1.2083 0.3415 -2.7087
H 2.5186 -3.3989 1.1226
H 1.2155 -2.4815 2.6289
H 2.9753 -1.3923 1.8863
C 1.3731 0.2542 -0.5394
O 1.9771 1.2723 -0.7122
Cl -0.8091 -3.4129 0.0288
H -0.9491 -0.6267 2.0929
H -1.2309 -0.4996 2.7795
14
Frame 4
Ir 0.1111 -1.1111 -0.1111
P 1.2222 -2.2222 1.2222
P -1.3333 -0.3333 -1.3333
H -2.4444 0.4444 -1.4444
H -2.5555 -1.5555 -1.5555
H -1.6666 -0.6666 -2.6666
H 2.7777 -2.7777 1.7777
H 0.8888 -3.8888 2.8888
H 1.9999 -1.9999 2.9999
C 1.1010 -0.1010 -1.1010
O 2.1111 -0.1111 -1.1111
Cl 0.1212 -3.1212 -0.1212
H 0.1313 0.1313 0.1313
H -0.1414 -1.1414 0.1414
14
Frame 5
Ir 0.1515 -1.1515 -0.1515
P 1.1616 -2.1616 1.1616
P -1.1717 -0.1717 -1.1717
H -2.1818 0.1818 -1.1818
H -2.1919 -1.1919 -1.1919
H -1.2020 -0.2020 -2.2020
H 2.2121 -2.2121 1.2121
H 0.2222 -3.2222 2.2222
H 1.2323 -1.2323 2.2323
C 1.2424 -0.2424 -1.2424
O 2.2525 -0.2525 -1.2525
Cl 0.2626 -3.2626 -0.2626
H 0.2727 0.2727 0.2727
H -0.2828 -1.2828 0.2828
$
$
$

tyler_durden

sure, replace the %2 with a %3

I don't know if I am doing something wrong, but it doesn't work on my end.

Any suggestions to try and log what is going wrong?

---------- Post updated at 07:57 AM ---------- Previous update was at 07:54 AM ----------

Would it be possible to modify any of the suggestions so far such that frames 2,3,5,6,8,9,11,12, etc would be removed and the output file would have only frames 1,4,7,10,13,16,19,...etc?

This would remove 2 frames at a time instead of one.

Many many thanks to all of you for your help with this.
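
For what it's worth, keeping only frames 1, 4, 7, 10, ... looks like a small change to the one-liner posted earlier in the thread: instead of dropping multiples of m, keep a frame only when (frame number - 1) is a multiple of m. A sketch on made-up data (seven one-atom frames), not tested on the real files:

```shell
# Made-up input: seven one-atom frames in the layout used in this thread.
out=$(for i in 1 2 3 4 5 6 7; do printf '1\nFrame %d\nH 0 0 %d\n' "$i" "$i"; done |
  awk -v m=3 '/^[0-9]+$/{a=$1;p=0} $1=="Frame"&&($2-1)%m==0{print a;p=1}p')
printf '%s\n' "$out"
```

With m=3 this should print frames 1, 4 and 7; m=2 gives 1, 3, 5, 7.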

You could pass a comma-delimited list of frame numbers that should not be displayed. Frame numbers not in the list would be displayed by default.

$
$
$ cat frames1.pl
#!perl -w
print "Enter number of atoms: ";
chomp($x = <STDIN>);
print "Enter comma-delimited list of frame nos. that should be removed: ";
chomp($y = <STDIN>);
@noshow = split(",", $y);                         # store non-displaying frames in array @noshow
$file = "frames";
open (F, $file) or die "Can't open $file: $!";
while (<F>) {                                     # iterate through the records
  chomp;                                          # chomp newline
  if (/^\d+$/) {                                  # if the number is here all by itself
    $atoms = $_;                                  # then it is the number of atoms
    $iter = 1;                                    # initialize the iterator
    $show = 0;                                    # don't show/print the records just yet
  } elsif (/^Frame (\d+)$/) {                     # if this line has the Frame number
    $frame = $1;                                  # save it, since $1 is not reliable inside grep
    if (grep($_ eq $frame, @noshow) == 0) {       # if the frame is not in the removal list
      print $atoms,"\n";                          # then print the number of atoms
      print $_,"\n";                              # and print the current line
      $show = 1;                                  # and set the flag so that the block could be printed
    }
  } elsif ($show and $iter <= $x) {               # if flag is set and we haven't reached the upper limit
    print $_,"\n";                                # then keep printing the record
    $iter++;                                      # and incrementing the iterator
  }
}
close (F) or die "Can't close $file: $!";         # clean up after ourselves
$
$ perl frames1.pl
Enter number of atoms: 14
Enter comma-delimited list of frame nos. that should be removed: 2,3,5
14
Frame 1
Ir 0.4482 -1.2980 -0.2902
P 1.8759 -2.1654 1.4038
P -1.2305 -0.8418 -1.9134
H -2.5605 -0.7775 -1.4067
H -1.3820 -1.8515 -2.9058
H -1.1987 0.3321 -2.7223
H 2.5359 -3.3920 1.1065
H 1.2161 -2.5072 2.6182
H 2.9669 -1.3899 1.8960
C 1.3685 0.2571 -0.5341
O 1.9671 1.2795 -0.7004
Cl -0.8142 -3.4101 0.0318
H -0.8380 -0.5636 2.1141
H -1.0869 -0.4380 2.8141
14
Frame 4
Ir 0.1111 -1.1111 -0.1111
P 1.2222 -2.2222 1.2222
P -1.3333 -0.3333 -1.3333
H -2.4444 0.4444 -1.4444
H -2.5555 -1.5555 -1.5555
H -1.6666 -0.6666 -2.6666
H 2.7777 -2.7777 1.7777
H 0.8888 -3.8888 2.8888
H 1.9999 -1.9999 2.9999
C 1.1010 -0.1010 -1.1010
O 2.1111 -0.1111 -1.1111
Cl 0.1212 -3.1212 -0.1212
H 0.1313 0.1313 0.1313
H -0.1414 -1.1414 0.1414
14
Frame 6
Ir 0.2929 -1.2929 -0.2929
P 1.3030 -2.3030 1.3030
P -1.3131 -0.3131 -1.3131
H -2.3232 0.3232 -1.3232
H -2.3333 -1.3333 -1.3333
H -1.3434 -0.3434 -2.3434
H 2.3535 -2.3535 1.3535
H 0.3636 -3.3636 2.3636
H 1.3737 -1.3737 2.3737
C 1.3838 -0.3838 -1.3838
O 2.3939 -0.3939 -1.3939
Cl 0.4040 -3.4040 -0.4040
H 0.4141 0.4141 0.4141
H -0.4242 -1.4242 0.4242
$
$

tyler_durden
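
The same comma-delimited idea can also be sketched in awk, for anyone who prefers to stay with the earlier one-liners. This is my own illustration, not part of the Perl script above: the variable `list` and the arrays `nums` and `drop` are invented names, and the input is six made-up one-atom frames.

```shell
# Made-up input: six one-atom frames in the layout used in this thread.
out=$(for i in 1 2 3 4 5 6; do printf '1\nFrame %d\nH 0 0 %d\n' "$i" "$i"; done |
  awk -v list='2,3,5' '
    BEGIN { n = split(list, nums, ","); for (i = 1; i <= n; i++) drop[nums[i]] = 1 }
    /^[0-9]+$/    { count = $0; keep = 0; next }              # hold the atom count back
    $1 == "Frame" { keep = !($2 in drop); if (keep) print count }
    keep                                                      # header and coordinates of kept frames
  ')
printf '%s\n' "$out"
```

With the list 2,3,5 this should leave frames 1, 4 and 6, matching the Perl run above.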

---------- Post updated at 11:46 AM ---------- Previous update was at 11:13 AM ----------

In the script posted earlier, it might be a tad cumbersome to input a comma-delimited list of all frame numbers that should not be displayed.

If "S" means "Show" and "NS" means "(Do) Not Show", then you could input the smallest non-repeating display pattern to the script. And the script can keep on implementing the pattern for all the frames.

For example, if the input pattern is "S,NS,NS", then it means:

(1)  Show block 1.
(2)  Do not show block 2.
(3)  Do not show block 3.
(4)  Repeat the above pattern (i.e. "S,NS,NS") for each block of 3 frames that follows.

You could come up with more complex patterns for varying block counts.
For example, an input pattern of "S,NS,NS,NS,S" means -

(1)  Show block 1.
(2)  Do not show block 2.
(3)  Do not show block 3.
(4)  Do not show block 4.
(5)  Show block 5.
------
(6)  Show block 6.
(7)  Do not show block 7.
(8)  Do not show block 8.
(9)  Do not show block 9.
(10)  Show block 10.
------
and so on...

The boundary condition - "S" means "show all blocks" and "NS" means "do not show any block".

Here's the code -

$
$
$ cat frames2.pl
#!perl -w
print "Enter number of atoms: ";
chomp($x = <STDIN>);
print "Enter comma-delimited display pattern: ";
chomp($y = <STDIN>);
@action = split(",", $y);                         # store the "show/noshow" action in an array
@acopy  = @action;                                # copy the array because we'll modify the original
$file = "frames";
open (F, $file) or die "Can't open $file: $!";
while (<F>) {                                     # iterate through the records
  chomp;                                          # chomp newline
  if (/^\d+$/) {                                  # if the number is here all by itself
    $atoms = $_;                                  # then it is the number of atoms
    $iter = 1;                                    # initialize the iterator
    $show = 0;                                    # don't show/print the records just yet
  } elsif (/^Frame (\d+)$/) {                     # if this line has the Frame number
    $act = shift @action;                         # then pick up an element from array @action
    if ($act eq "S") {                            # if this action is "Show"
      print $atoms,"\n";                          # then print the number of atoms
      print $_,"\n";                              # and print the current line
      $show = 1;                                  # and set the flag so that the block could be printed
    }
    if ($#action == -1) {                         # if the original array @action is empty
      @action = @acopy;                           # then replenish it
    }
  } elsif ($show and $iter <= $x) {               # if flag is set and we haven't reached the upper limit
    print $_,"\n";                                # then keep printing the record
    $iter++;                                      # and incrementing the iterator
  }
}
close (F) or die "Can't close $file: $!";         # clean up after ourselves
$
$
$ perl frames2.pl
Enter number of atoms: 14
Enter comma-delimited display pattern: S,NS,NS
14
Frame 1
Ir 0.4482 -1.2980 -0.2902
P 1.8759 -2.1654 1.4038
P -1.2305 -0.8418 -1.9134
H -2.5605 -0.7775 -1.4067
H -1.3820 -1.8515 -2.9058
H -1.1987 0.3321 -2.7223
H 2.5359 -3.3920 1.1065
H 1.2161 -2.5072 2.6182
H 2.9669 -1.3899 1.8960
C 1.3685 0.2571 -0.5341
O 1.9671 1.2795 -0.7004
Cl -0.8142 -3.4101 0.0318
H -0.8380 -0.5636 2.1141
H -1.0869 -0.4380 2.8141
14
Frame 4
Ir 0.1111 -1.1111 -0.1111
P 1.2222 -2.2222 1.2222
P -1.3333 -0.3333 -1.3333
H -2.4444 0.4444 -1.4444
H -2.5555 -1.5555 -1.5555
H -1.6666 -0.6666 -2.6666
H 2.7777 -2.7777 1.7777
H 0.8888 -3.8888 2.8888
H 1.9999 -1.9999 2.9999
C 1.1010 -0.1010 -1.1010
O 2.1111 -0.1111 -1.1111
Cl 0.1212 -3.1212 -0.1212
H 0.1313 0.1313 0.1313
H -0.1414 -1.1414 0.1414
$
$
$ perl frames2.pl
Enter number of atoms: 14
Enter comma-delimited display pattern: S,NS
14
Frame 1
Ir 0.4482 -1.2980 -0.2902
P 1.8759 -2.1654 1.4038
P -1.2305 -0.8418 -1.9134
H -2.5605 -0.7775 -1.4067
H -1.3820 -1.8515 -2.9058
H -1.1987 0.3321 -2.7223
H 2.5359 -3.3920 1.1065
H 1.2161 -2.5072 2.6182
H 2.9669 -1.3899 1.8960
C 1.3685 0.2571 -0.5341
O 1.9671 1.2795 -0.7004
Cl -0.8142 -3.4101 0.0318
H -0.8380 -0.5636 2.1141
H -1.0869 -0.4380 2.8141
14
Frame 3
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
H -2.2321 -1.8577 -1.8841
H -1.2106 -0.1808 -2.8791
H 2.8807 -2.5381 1.3446
H 0.9971 -3.2308 2.2364
H 1.8734 -1.3013 2.8541
C 1.7346 -0.5673 -1.1992
O 2.6174 -0.1436 -1.8621
Cl 0.2051 -3.5805 -0.9057
H 0.1695 0.2982 0.6158
H -0.9713 -1.5095 0.9427
14
Frame 5
Ir 0.1515 -1.1515 -0.1515
P 1.1616 -2.1616 1.1616
P -1.1717 -0.1717 -1.1717
H -2.1818 0.1818 -1.1818
H -2.1919 -1.1919 -1.1919
H -1.2020 -0.2020 -2.2020
H 2.2121 -2.2121 1.2121
H 0.2222 -3.2222 2.2222
H 1.2323 -1.2323 2.2323
C 1.2424 -0.2424 -1.2424
O 2.2525 -0.2525 -1.2525
Cl 0.2626 -3.2626 -0.2626
H 0.2727 0.2727 0.2727
H -0.2828 -1.2828 0.2828
$
$
$ perl frames2.pl
Enter number of atoms: 14
Enter comma-delimited display pattern: NS
$
$
$
$

HTH,
tyler_durden
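
For comparison, the repeating S/NS pattern can also be sketched in awk by indexing into the pattern with a modulo instead of shifting and refilling an array. This is an illustration on made-up data (five one-atom frames), not a drop-in replacement for frames2.pl: the names `pat`, `act` and `seen` are invented, and an empty pattern is not handled.

```shell
# Made-up input: five one-atom frames in the layout used in this thread.
out=$(for i in 1 2 3 4 5; do printf '1\nFrame %d\nH 0 0 %d\n' "$i" "$i"; done |
  awk -v pat='S,NS,NS' '
    BEGIN { n = split(pat, act, ",") }     # act[1..n] holds the show/no-show pattern
    /^[0-9]+$/    { count = $0; keep = 0; next }
    $1 == "Frame" {
      keep = (act[seen % n + 1] == "S")    # cycle through the pattern, one entry per frame
      seen++
      if (keep) print count
    }
    keep                                   # header and coordinates of shown frames
  ')
printf '%s\n' "$out"
```

With the pattern S,NS,NS this should show frames 1 and 4 only.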