In a CSV file, say that a given column has been extracted. In that column, information is missing (i.e. blank lines appear). I would like to replace the blank lines by the last valid line (not blank) previously read.
For example, consider the extract below:
123
234
543
111
234
198
I would like this extract to be replaced by:
123
234
234
543
543
543
111
234
234
234
198
198
I wrote a simple script to do the job that reads myfile.txt and writes the result in out.txt:
MEMVAL=""
LINENUM=0
while read LINE
do
LINENUM=$(($LINENUM+1));
# Test if the line is blank
CURLINE=`echo $LINE | grep -E '^[:space]*$'`
TEST="$?"
if [ $TEST == 0 ]; then
if [ $LINENUM == 1 ]; then # The first line cannot be blank!!
exit 1
else
# Replace current blank line by memorised info in the output data file
echo "$MEMVAL" >> out.txt
fi
else
MEMVAL="$LINE"
# At first init output data file
if [ $LINENUM == 1 ]; then
echo "$LINE" > out.txt
fi
fi
done < myfile.txt
This script does the job, but it takes ages.
Also I guess there are far more efficient ways to do this by using either sed of awk (maybe csvfix in my case).
Could you please let us know if this is a typo or it is real, your all Output sample looks fine apart from last digit 198 which comes into a space which is before it? All other digits are filling spaces which are coming after them, so could you please confirm on this?
Also if this is not a typo then kindly do let us know what all are conditions where you need this kind of output please.
I acknowledge the fact that this point was not clear in my question. Indeed, it is not a typo in my current example (I considered having a blank like after the final: "198").
Cheers,
Bagvian
---------- Post updated at 05:53 PM ---------- Previous update was at 05:29 PM ----------
Dear All,
I have only one word following you answers: "smashing!"
Many thanks to all of you. I tried all the suggested solutions and they all perfectly solve the problem.
(I am not even mentioning the time your methods take compared to my original script. While on a more complex example my script would take a minute, your solutions take a fraction of a second!)
May I make it more complicated?
As I mentioned in the original post, the example I gave comes from a multi-column CSV file using ";" as a separator. The example I gave is the extraction of a given column.
To process this file my idea was to extract columns one by one, apply your suggested solutions to fill up the blank lines and glue the newly generated columns with, for instance: a "paste".
Again, starting from your suggestions, is there a more efficient way to to it?
As an illustration of a CSV file, the idea would be to start for example from: