Hello,
I have some tab-delimited text data:
file: final_temp1
aname val
NAME;r'(1,) 3.28584
r'(2,)<tab>
NAME;r'(3,) 6.13003
NAME;r'(4,) 4.18037
r'(5,)<tab>
You can see that the data is incomplete in some rows. Each incomplete row has a trailing tab after the first column; I have added the <tab> notation above to make that clear.
I also have a list of the incomplete cases.
file: incomplete_case_list
r'(2,)
r'(5,)
What I need to do is work through the list of incomplete cases, find the matching row in my file, and alter it. I need to add "NAME;" as a prefix to the first column value, followed by a tab, followed by the word "failed". The result should look like this:
aname val
NAME;r'(1,) 3.28584
NAME;r'(2,) failed
NAME;r'(3,) 6.13003
NAME;r'(4,) 4.18037
NAME;r'(5,) failed
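For reference, the transformation described above could be sketched as a single awk pass over both files. This is only a sketch under assumptions: the scratch directory and the recreated sample files are illustrative, the list file is assumed to be non-empty, and names are compared as plain strings on the first tab-separated field rather than as patterns.

```shell
# Sketch only: recreate the small sample from this post in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'aname\tval\n'            >  final_temp1
printf "NAME;r'(1,)\t3.28584\n"  >> final_temp1
printf "r'(2,)\t\n"              >> final_temp1
printf "NAME;r'(3,)\t6.13003\n"  >> final_temp1
printf "NAME;r'(4,)\t4.18037\n"  >> final_temp1
printf "r'(5,)\t\n"              >> final_temp1
printf "r'(2,)\nr'(5,)\n"        >  incomplete_case_list

# First file (NR == FNR): store each incomplete name as an array key.
# Second file: if column 1 is a stored name, rewrite the row; otherwise print it unchanged.
# The "in" test is a plain string lookup, so brackets and quotes in names are not special.
# Assumption: incomplete_case_list is not empty (otherwise NR == FNR also holds for final_temp1).
awk -F'\t' -v OFS='\t' '
    NR == FNR        { incomplete[$1]; next }
    $1 in incomplete { print "NAME;" $1, "failed"; next }
                     { print }
' incomplete_case_list final_temp1 > final_temp2

cat final_temp2
```

Because both files are read once, there is no per-entry rewrite of final_temp1 in this sketch.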
I thought I could just loop through the incomplete file list and make sed substitutions,
# loop through incomplete file list
while read line; do
    # remove tab from end of line
    clean_line=$(echo $line | sed "s/\t//1")
    # create new line
    new_line='NAME;'$clean_line'\t''failed'
    # find original line and replace with modified version
    sed "s/$line/$new_line/1" final_temp1 > final_temp2
    # overwrite original file with modified file to propagate changes forward
    mv final_temp2 final_temp1
done < incomplete_case_list
I am getting a sed error,
sed: -e expression #1, char 160: Invalid range end
sed: -e expression #1, char 168: Invalid range end
sed: -e expression #1, char 134: Invalid range end
I don't think this is from the first sed command (substituting the tab), but the error is not very clear to me. In my real files, the values in the name column can contain characters such as commas, unmatched single quotes, parentheses, square brackets, and curly braces. I am wondering if sed is rejecting some of these characters. I tried putting double quotes around $line and $new_line in the second sed command, but that doesn't help.
I tried replacing the sed line with awk,
awk -v var1="$line" -v var2="$new_line" '{gsub(var1, var2, $0); print}' final_temp1 > final_temp2
This gives me the error,
awk: cmd. line:1: (FILENAME=final_temp1 FNR=1) fatal: Invalid range end: /1-[10-(4-amino-2-methylquinolyl)decyl]-2-methyl-4-quinolylamine_4Np.mol/
This is one of the messy names from the actual data. Is there something in this string that needs to be handled differently? I frequently use both sed and awk with data like this and I have not seen this error before.
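A likely reading of the error: both sed's s/// and awk's gsub() treat the name as a regular expression, so "[10-(4-amino-2-methylquinolyl)decyl]" is parsed as a bracket expression, and "0-(" inside it is a range whose end sorts before its start. The sketch below compares the same name as a fixed string instead. The variable names are illustrative, and it assumes the name contains no backslashes, since awk -v interprets escape sequences in the assigned value.

```shell
# The messy name from the awk error message, used here as sample data.
name='1-[10-(4-amino-2-methylquinolyl)decyl]-2-methyl-4-quinolylamine_4Np.mol'

# grep -F treats the pattern as a fixed string, so [ ] ( ) and - are ordinary characters.
literal_hits=$(printf '%s\t\n' "$name" | grep -c -F -- "$name")

# In awk, == between a field and a string variable is a plain string comparison, not a regex match.
awk_hits=$(printf '%s\t\n' "$name" |
    awk -F'\t' -v n="$name" '$1 == n { c++ } END { print c + 0 }')

echo "grep -F matches: $literal_hits, awk == matches: $awk_hits"
```

The same idea applies to the replacement text: building the new row with print in awk avoids having to escape characters that are special on the right-hand side of a sed substitution.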
I am not sure if sed will find the pattern, because the line terminates with a tab and I am not sure that the tab is being read into "line" during the while loop. I also don't know if there is still an end-of-line character there or not. I suppose I could strip out all trailing whitespace characters first.
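The question of what ends up in "line" can be checked directly. With the default IFS, read strips leading and trailing spaces and tabs, and it always drops the terminating newline; setting IFS to empty for the read keeps the tab. A minimal sketch, where the scratch file and variable names are illustrative:

```shell
# A scratch file holding one name followed by a trailing tab, like the incomplete rows.
sample=$(mktemp)
printf "r'(2,)\t\n" > "$sample"

# Default IFS: the trailing tab is stripped by read.
read -r plain < "$sample"

# Empty IFS: the line is kept as-is, minus the newline.
IFS= read -r exact < "$sample"

printf 'plain=[%s] length=%d\n' "$plain" "${#plain}"
printf 'exact=[%s] length=%d\n' "$exact" "${#exact}"

# Remove one trailing tab explicitly, without calling sed.
tab=$(printf '\t')
stripped=${exact%"$tab"}
printf 'stripped=[%s] length=%d\n' "$stripped" "${#stripped}"

rm -f "$sample"
```

Using read -r also keeps any backslashes in the names from being interpreted.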
The repeated overwriting of the files is also expensive, but it is unlikely that there will ever be very many entries in the incomplete_case_list.
Are there any comments on what I am doing wrong here, or a better method altogether?
Thanks,
LMHmedchem