In a file, replace blank line by the last line not blank above

bagvian · February 24, 2017, 2:33pm

Dear All,

In a CSV file, say that a given column has been extracted. In that column, information is missing (i.e. blank lines appear). I would like to replace the blank lines by the last valid line (not blank) previously read.

For example, consider the extract below:

I would like this extract to be replaced by:

I wrote a simple script to do the job that reads myfile.txt and writes the result in out.txt:

MEMVAL=""
LINENUM=0
while read LINE
do
  LINENUM=$(($LINENUM+1));

  # Test if the line is blank
  CURLINE=`echo $LINE | grep -E '^[:space]*$'`
  TEST="$?"
  if [ $TEST == 0 ]; then 
    if [ $LINENUM == 1 ]; then # The first line cannot be blank!!
      exit 1
    else
      # Replace current blank line by memorised info in the output data file
      echo "$MEMVAL" >> out.txt
    fi
  else
    MEMVAL="$LINE"

    # At first init output data file
    if [ $LINENUM == 1 ]; then 
      echo "$LINE" > out.txt
    fi
  fi
done < myfile.txt

This script does the job, but it takes ages.
Also I guess there are far more efficient ways to do this by using either sed of awk (maybe csvfix in my case).

Would anybody have a suggestion?
Cheers, Bagvian

RudiC · February 24, 2017, 2:59pm

How about

awk '/^ *$/ {$0 = PRV} {PRV = $0} 1' file
123
234
234
543
543
543
111
234
234
234
198

RavinderSingh13 · February 24, 2017, 3:43pm

bagvian:

Dear All,
In a CSV file, say that a given column has been extracted. In that column, information is missing (i.e. blank lines appear). I would like to replace the blank lines by the last valid line (not blank) previously read.
For example, consider the extract below:
123
234

543


111
234


198
I would like this extract to be replaced by:
123
234
234
543
543
543
111
234
234
234
198
198
I wrote a simple script to do the job that reads myfile.txt and Would anybody have a suggestion?
Cheers, Bagvian

Hello bagvian,

Could you please let us know if this is a typo or it is real, your all Output sample looks fine apart from last digit 198 which comes into a space which is before it? All other digits are filling spaces which are coming after them, so could you please confirm on this?
Also if this is not a typo then kindly do let us know what all are conditions where you need this kind of output please.

Thanks,
R. Singh

Aia · February 24, 2017, 8:03pm

perl -pe '/^\s+?$/ and $_=$p; $p=$_' myfile.txt > out.txt

MadeInGermany · February 25, 2017, 3:52am

Certainly the *last* blank line is missing in the given input file.
Another solution

awk 'NF {s=$0} {print s}' file

sed '/[^ ]/h; g' file

bagvian · February 26, 2017, 11:53am

Hi R. Singh,

I acknowledge the fact that this point was not clear in my question. Indeed, it is not a typo in my current example (I considered having a blank like after the final: "198").
Cheers,
Bagvian

---------- Post updated at 05:53 PM ---------- Previous update was at 05:29 PM ----------

Dear All,

I have only one word following you answers: "smashing!"
Many thanks to all of you. I tried all the suggested solutions and they all perfectly solve the problem.
(I am not even mentioning the time your methods take compared to my original script. While on a more complex example my script would take a minute, your solutions take a fraction of a second!)

May I make it more complicated?
As I mentioned in the original post, the example I gave comes from a multi-column CSV file using ";" as a separator. The example I gave is the extraction of a given column.

To process this file my idea was to extract columns one by one, apply your suggested solutions to fill up the blank lines and glue the newly generated columns with, for instance: a "paste".

Again, starting from your suggestions, is there a more efficient way to to it?

As an illustration of a CSV file, the idea would be to start for example from:

123;AZE;
212;;
;;
;ZER;
345;;
;;
;;

and after processing obtain:

123;AZE;
212;AZE;
212;AZE;
212;ZER;
345;ZER;
345;ZER;
345;ZER;

Any suggestion would be welcome.
Cheers,
Bagvian

RudiC · February 26, 2017, 12:09pm

Any chance you came up with an idea of your own based on what you learned in this thread?

EDIT:
Howsoever, try

awk '$1 {PRV1 = $1}  $2 {PRV2 = $2} {print PRV1, PRV2, _}' FS=";" OFS=";" file
123;AZE;
212;AZE;
212;AZE;
212;ZER;
345;ZER;
345;ZER;
345;ZER;

Scrutinizer · February 26, 2017, 12:45pm

General solution:

awk '{for(i=1; i<=NF; i++) if($i=="") $i=p; else p=$i}1' FS=\; OFS=\;  file

--
Note: one needs to test if the field is empty, otherwise it will fail if the one of the fields is the number "0"