Hello,
I have some files in a directory and a short list of strings. I want to loop through the files, remove every line that contains any of the strings, and then renumber the remaining lines.
There are some issues. The first is that the strings can contain troublesome characters like single quotes and parentheses. Here is one list of strings,
1-[3-(3,3-dimethyl-2-oxobutylidene)(1,4-diazaperhydroin-2-ylidene)]-3,3-dimethylbutan-2-one
N-{(1E)-2-[4-(methylethyl)phenyl]-1-azaprop-1-enyl}-2-[(2-methylphenyl)amino]acetamide
1-acetyl-3-(5,6-dimethylisoindolin-2-yl)benzene
2-[(6-hydroxy-4,4-dimethyl-2-oxocyclohex-1(6)-enyl)(4-methylphenyl)methyl]-5,5-dimethylcyclohexane-1,3-dione
1,4,5-triphenyl-4-imidazoline-2-thione
1-(2-naphthylmethyl)-2-(naphthylmethyl)benzimidazole
1-(2-naphthyl)-2-({2-[(2-(2-naphthyl)-2-oxoethyl)piperidyl]ethyl}piperidyl)ethan-1-one_bromide_bromide
1-(2-hydroxyphenyl)-2,6-dimethyl-5-phenylhydropyrimidin-4-one
1-[3-(3,3-dimethyl-2-oxobutylidene)(1,4-diazaperhydroin-2-ylidene)]-3,3-dimethylbutan-2-one
4-(1,3-dioxobenzo[c]azolidin-2-yl)-N-methyl-N-(1,2,2,6,6-pentamethyl(4-piperidyl))butanamide
It is very likely that the list will contain the same string more than once. I either need to clean that up or have the script allow for instances where the string is not found.
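As far as I can tell, grep -v simply passes every line through when a string is not found, so a repeated string should be harmless. If the list should be cleaned up anyway, a minimal sketch would be something like the following. The three-item list is a shortened stand-in for the real one, UNIQUE_LIST is a name made up for this example, and mapfile assumes bash 4 or later.

```shell
#!/bin/bash
# Shortened stand-in for the real list; the first and last entries are the same.
REMOVE_LIST=(
'1-[3-(3,3-dimethyl-2-oxobutylidene)(1,4-diazaperhydroin-2-ylidene)]-3,3-dimethylbutan-2-one'
'1,4,5-triphenyl-4-imidazoline-2-thione'
'1-[3-(3,3-dimethyl-2-oxobutylidene)(1,4-diazaperhydroin-2-ylidene)]-3,3-dimethylbutan-2-one'
)

# Print one string per line, let awk keep only the first occurrence of each
# line (order is preserved), and read the result back into a new array.
# UNIQUE_LIST is a hypothetical name; mapfile needs bash 4 or later.
mapfile -t UNIQUE_LIST < <(printf '%s\n' "${REMOVE_LIST[@]}" | awk '!seen[$0]++')

printf '%s\n' "${UNIQUE_LIST[@]}"
```

The strings never pass through the shell unquoted here, so the brackets and parentheses should not need any escaping.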
The other complexity is that the line numbering doesn't start until the 15th line of the file.
I was thinking of something like,
#!/bin/bash
REMOVE_LIST=(
'1-[3-(3,3-dimethyl-2-oxobutylidene)(1,4-diazaperhydroin-2-ylidene)]-3,3-dimethylbutan-2-one' \
'N-{(1E)-2-[4-(methylethyl)phenyl]-1-azaprop-1-enyl}-2-[(2-methylphenyl)amino]acetamide' \
'1-acetyl-3-(5,6-dimethylisoindolin-2-yl)benzene' \
'2-[(6-hydroxy-4,4-dimethyl-2-oxocyclohex-1(6)-enyl)(4-methylphenyl)methyl]-5,5-dimethylcyclohexane-1,3-dione' \
'1,4,5-triphenyl-4-imidazoline-2-thione' \
'1-(2-naphthylmethyl)-2-(naphthylmethyl)benzimidazole' \
'1-(2-naphthyl)-2-({2-[(2-(2-naphthyl)-2-oxoethyl)piperidyl]ethyl}piperidyl)ethan-1-one_bromide_bromide' \
'1-(2-hydroxyphenyl)-2,6-dimethyl-5-phenylhydropyrimidin-4-one' \
'1-[3-(3,3-dimethyl-2-oxobutylidene)(1,4-diazaperhydroin-2-ylidene)]-3,3-dimethylbutan-2-one' \
'4-(1,3-dioxobenzo[c]azolidin-2-yl)-N-methyl-N-(1,2,2,6,6-pentamethyl(4-piperidyl))butanamide'
)
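# collect list of files; glob directly instead of parsing ls output
FILE_LIST=( ./*out.txt )
# loop on files
for FILE in "${FILE_LIST[@]}"
do
echo "$FILE"
# work on a copy so the original is untouched until this is working
cp "$FILE" "${FILE}_tmp"
# loop on strings to remove; quoting keeps each string as one word
for REMOVE_STRING in "${REMOVE_LIST[@]}"
do
echo "$REMOVE_STRING"
# -F matches the string literally, so [ ] ( ) are not read as regex syntax
# filter the copy on each pass so that earlier removals are kept
grep -vF -- "$REMOVE_STRING" "${FILE}_tmp" > TEMP && mv TEMP "${FILE}_tmp"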
done
done
This code works for the line removal but is rather inefficient, since it makes a separate call to grep for each item in the remove list and repeats that for every file. This does not have to be particularly fast, but I would prefer something less clumsy.
As for the line renumbering starting with the 15th line, I have no idea.
Suggestions would be appreciated.
---------- Post updated at 07:17 PM ---------- Previous update was at 06:22 PM ----------
This is part of one of the files. You can see that the numbering starts on the line following the repeated forder header row (f0order here). If it helps, the numbers start on the first line that begins with a number. The forder field can have a value from f0 to f9. The number of columns and rows in the files varies. This example shows the first 8 columns and 10 data rows.
f0order CVorder Name f0 RI_7 E99 E199 E299
NA NA NA NA R_r2 0.796 0.831 0.848
NA NA NA NA R_MeAE 88.54 80.06 76.27
NA NA NA NA R_MdAE 72.24 63.66 61.66
NA NA NA NA R_SE 104.44 96.49 92.37
NA NA NA NA T_r2 0.794 0.821 0.827
NA NA NA NA T_MeAE 108.38 105.79 99.11
NA NA NA NA T_MdAE 88.95 91.94 86.61
NA NA NA NA T_SE 107.44 105.46 104.84
NA NA NA NA V_r2 0.83 0.847 0.857
NA NA NA NA V_MeAE 108.36 103.86 97.23
NA NA NA NA V_MdAE 96.69 90.04 79.31
NA NA NA NA V_SE 102.58 103.24 102.13
f0order CVorder Name f0 RI_7 E99 E199 E299
1 2 2-ethylpyridine R 519 683 653 638
2 3 3-ethylpyridine R 535 675 646 631
3 4 2,6-lutidine R 506 632 614 608
4 5 2,5-lutidine R 517 620 605 598
5 6 2,3-lutidine R 518 612 598 592
6 7 3,4-lutidine R 528 600 589 583
7 8 3,5-lutidine R 532 569 560 559
8 9 2,4,6-collidine R 544 585 586 590
9 10 4-(methylamino)pyridine R 511 450 429 417
10 12 4-dimethylaminopyridine R 533 500 487 481
The only thing I can think of at the moment would be to copy the first 14 lines to a temp file and then delete them. Then I would renumber the rest of the file and then cat the file back together.
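A possible alternative to splitting the file, since the numbered rows are the only ones that begin with a digit, is to renumber in a single pass with awk. This is a rough sketch under the assumption that only the first column needs renumbering. The sample file is a trimmed, made-up version of the data above with a gap left by a removed row, and WORK_DIR and the _renum suffix are names invented for the example.

```shell
#!/bin/bash
# Scratch copy of a trimmed sample; row 2 has already been removed, leaving a gap.
WORK_DIR=$(mktemp -d)
FILE="$WORK_DIR/sample_out.txt"
cat > "$FILE" <<'EOF'
f0order CVorder Name f0 RI_7
NA NA NA NA R_r2
f0order CVorder Name f0 RI_7
1 2 2-ethylpyridine R 519
3 4 2,6-lutidine R 506
4 5 2,5-lutidine R 517
EOF

# For every line that starts with a digit, replace the leading number with a
# running counter. sub() edits the line in place, so the original column
# separators are left as they are. All other lines are printed unchanged.
awk '/^[0-9]/ { sub(/^[0-9]+/, ++n) } { print }' "$FILE" > "${FILE}_renum"

cat "${FILE}_renum"
```

The header and NA rows are passed through untouched because they do not start with a digit, so there should be no need to count off the first 14 lines.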
LMHmedchem