After reformatting your code so we can see the structure, getting rid of the subshell issue Scrutinizer mentioned, adding missing <dollar-sign> characters, changing <single-quote> characters to <double-quote> characters, and adding missing <double-quote> characters to get around syntax errors:
OUT_FILE=out
FLAG=0
while read CUR_LINE
do
    if [[ $FLAG -ne 0 ]]
    then
        if [[ `echo ${CUR_LINE} | awk -F "�" '{print NF -1}'` -le 0 ]]
        then
            PREV_LINE="${PREV_LINE} ${CUR_LINE}"
            NEW_LINE=`echo ${PREV_LINE} | tr -d '\n' | tr -d '^M'`
            PREV_LINE="${NEW_LINE}"
        else
            echo ${PREV_LINE} >> ${OUT_FILE}
            PREV_LINE="${CUR_LINE}"
        fi
    else
        PREV_LINE="${CUR_LINE}"
        FLAG=1
    fi
done < filename
echo ${PREV_LINE} >> ${OUT_FILE}
we can see that this is grossly inefficient code. Having a while loop is not your problem; executing awk once for each of your 1.7 million input lines (except the first), and tr twice for each empty line and each line with only one field (one of those two tr invocations is always a no-op), is going to be extremely slow.
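As a sketch of what a single-process version might look like, the function below applies the same joining rule as the loop above (a line containing no field separator is appended, after a <space>, to the previous record) but starts awk only once for the whole file. The separator character in your post did not come through, so SEP='|' is only a placeholder, and stripping a trailing <carriage-return> from every line is an assumption. It deliberately does not squeeze blanks or delete ^ and M characters.

```sh
#!/bin/sh
# Sketch: one awk process does the whole job instead of one per line.
# ASSUMPTIONS (hypothetical, not from the original post):
#   - SEP is the field separator; '|' stands in for the real one.
#   - A trailing <carriage-return> should be removed from every line.
#   - Blanks are NOT squeezed and ^ and M are NOT deleted.
SEP='|'

join_records() {
    awk -v sep="$SEP" '
    {
        sub(/\r$/, "")                  # drop a trailing CR, if any
        if (NR > 1 && index($0, sep) == 0) {
            prev = prev " " $0          # no separator: continuation line
            next
        }
        if (NR > 1) print prev          # separator found: flush record
        prev = $0
    }
    END { if (NR > 0) print prev }      # flush the last record
    '
}

# Usage: join_records < filename > out
```

This starts one awk process in total, instead of one awk process per input line plus the tr processes.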
Your code seems to be trying to remove <carriage-return> characters from your input (which you never mentioned were present before). But we can't tell whether you're trying to remove <carriage-return> characters or circumflex and upper-case M characters. (The above code removes all circumflex and upper-case M characters from every record that has at least one line joined onto it, because tr treats the two characters in '^M' as two separate characters to delete.)
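To see the difference, this small demonstration (the sample text is made up) runs both forms of tr on text ending in a <carriage-return>: '^M' deletes the circumflex and the upper-case M and leaves the <carriage-return> alone, while '\r' deletes only the <carriage-return>.

```sh
#!/bin/sh
# Sample: "Mary^had" followed by a <carriage-return> (no newline).
sample=$(printf 'Mary^had\r')

# '^M' in a tr operand is TWO characters: circumflex and upper-case M.
# Both are deleted; the <carriage-return> survives.
caret_m=$(printf '%s' "$sample" | tr -d '^M')

# '\r' is tr's escape for <carriage-return>; only that is deleted.
no_cr=$(printf '%s' "$sample" | tr -d '\r')

# od -c makes the <carriage-return> (shown as \r) visible.
printf '%s' "$caret_m" | od -c
printf '%s' "$no_cr" | od -c
```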
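Your code also converts every sequence of one or more adjacent <space> and <tab> characters to a single <space> character and removes leading and trailing <space> and <tab> characters (which again was not mentioned as a requirement until now), because ${PREV_LINE} is not quoted in your echo commands. Is this intentional, or an accident? Or does your input contain no <tab> characters, no leading or trailing <space> characters, and no occurrences of multiple adjacent <space> characters?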
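A quick way to see this (the sample line is made up for the demo) is to compare an unquoted expansion with a quoted one, and a plain read with IFS= read -r:

```sh
#!/bin/sh
# A line with leading/trailing blanks, runs of spaces, and one <tab>.
line=$(printf '  a   b\tc  ')

# Unquoted expansion: word splitting squeezes every run of blanks
# to one <space> and drops leading/trailing blanks.
unquoted=$(echo $line)

# Quoted expansion passed to printf keeps the line byte-for-byte.
quoted=$(printf '%s' "$line")

# Plain read strips leading/trailing blanks (interior ones survive);
# IFS= read -r keeps the whole line unchanged.
plain=$(printf '%s\n' "$line" | { read v; printf '%s' "$v"; })
exact=$(printf '%s\n' "$line" | { IFS= read -r v; printf '%s' "$v"; })

printf '[%s]\n' "$unquoted" "$quoted" "$plain" "$exact"
```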
Because read is used without its -r option, your code also removes <backslash> characters at the ends of input lines and joins each line that ends with a <backslash> character to the line that follows it, no matter how many fields are on the joined lines. Is this intentional, or an accident? Or are you sure that none of your input lines end with a <backslash> character just before a <newline> character?
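The read behavior can be checked directly with a couple of made-up lines; the only difference between the two pairs of commands is the -r option:

```sh
#!/bin/sh
# Two physical lines: "one\" and "two".
cont=$(printf 'one\\\ntwo')
# One line with a <backslash> in the middle: "x\y".
inner='x\y'

# Without -r, read treats <backslash><newline> as a line continuation,
# so the two lines come back as one record with the backslash removed...
joined=$(printf '%s\n' "$cont" | { read v; printf '%s' "$v"; })
# ...and a <backslash> inside a line just escapes the next character.
stripped=$(printf '%s\n' "$inner" | { read v; printf '%s' "$v"; })

# With -r, <backslash> is an ordinary character.
first=$(printf '%s\n' "$cont" | { read -r v; printf '%s' "$v"; })
kept=$(printf '%s\n' "$inner" | { read -r v; printf '%s' "$v"; })

printf '[%s]\n' "$joined" "$stripped" "$first" "$kept"
```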
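And, depending on what shell and operating system you're using, any other <backslash> characters in your input could be deleted or converted to other characters by your uses of echo. (Even before echo sees them, read without -r treats each <backslash> as quoting the character after it and removes the <backslash>.)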
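If you need to write a line back out unchanged, printf '%s\n' is the portable choice. A minimal comparison with a made-up string:

```sh
#!/bin/sh
text='a\tb\nc'   # literal backslash-t and backslash-n, no tab or newline

# echo is not portable here: bash's builtin echo prints the text as-is
# by default, but dash's echo turns \t and \n into a <tab> and a <newline>.
echo "$text"

# printf '%s\n' never interprets backslashes in its argument.
safe=$(printf '%s\n' "$text")
printf '%s\n' "$safe"
```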
Please show us the code you are really using. Please also upload a SMALL sample input file (not more than 50 lines) that contains examples of all of the transformations that need to take place while removing characters, joining lines, and squeezing blanks, AND upload the desired output corresponding to that input. I explicitly say "upload" because we need to be sure that we will be able to see the difference between <space> and <tab> characters in your input and desired output, and to see the <carriage-return> characters in your input.