remove blank lines and merge lines in shell

dvah · March 8, 2011, 12:45pm

Hi,

I'm not an expert in shell programming, so I've come here to get help from you gurus.

I'm trying to tailor a CSV file that I received so that it works with the LOAD FROM command.

I have a data table in CSV form, in the format below:

--in file format

xx,xx,xx   ,xx , ,  , ,  ,,xx,
xxxx,, ,, xxx,
 
yyy,yy,yy, , , ,yyy,yy,
yyy,yy,,,
 
....

--out file format

xx,xx,xx   ,xx , ,  , ,  ,,xx,xxxx,, ,, xxx,
yyy,yy,yy, , , ,yyy,yy,yyy,yy,,,

As you can see above, each record is separated by a blank line, and a single record is split across multiple lines. I have to convert it to the out format shown above so that I can insert the rows into my Informix DB.

So I'm trying to achieve two things here:

Remove blank lines
Merge a data row that is split across multiple lines into one (using the blank line as the end of a record)

Many Thanks!

yinyuemi · March 8, 2011, 12:56pm

awk '{printf NF?$0:"\n"}'

Franklin52 · March 8, 2011, 12:56pm

Try this:

awk '$1=$1' RS=  infile > outfile

dvah · March 8, 2011, 1:06pm

This works really well, but I couldn't quite understand the logic behind it. I'm aware NF is the number of fields in each line and \n is a newline.

But I couldn't follow the whole logic. If you could explain it, that would really help me make changes to this in future.

Grateful for your comments.

Many Thanks!

---------- Post updated at 11:36 PM ---------- Previous update was at 11:31 PM ----------

Thanks for all your comments. It works well.

I just need one more bit of help here.

I need to add an extra delimiter at the end of each line. I'll go with the same example that I posted above.

-in format
xx,xx,xx ,xx , , , , ,,xx,xxxx,
xxxx,, ,, xxx,,

yyy,yy,yy, , , ,yyy,yy,,
yyy,yy,,,yyyyy,

....

Corona688 · March 8, 2011, 1:16pm

How about this:

awk  'BEGIN { RS="" ; FS="\n" ; OFS="," } { print $1, $2 }' < csvrec.csv > out.csv

That works by

Splitting records based on blank lines (RS="")
Using the newline as the field separator, so $1 is line 1 and $2 is line 2
Using "," as the output field separator, so it adds a "," between the two when you print $1, $2

---------- Post updated at 12:16 PM ---------- Previous update was at 12:07 PM ----------

If you need an extra "," on the end you can just do print $1, $2, ""
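To see the settings above in action, here is a minimal sketch run on made-up sample data. The helper name join_two and the sample rows are assumptions for illustration, not part of the original post. It also assumes every record is exactly two lines and that the separating lines are truly empty.

```shell
# Sketch only: join_two is a hypothetical helper name and the data is made up.
join_two() {
  # RS=""   : records are separated by empty lines (paragraph mode)
  # FS="\n" : each line of a record becomes one field
  # OFS="," : print puts a comma between its arguments
  # The trailing "" argument yields the extra comma at the end of the row.
  awk 'BEGIN { RS = ""; FS = "\n"; OFS = "," } { print $1, $2, "" }'
}

sample='a,b
c,d

e,f
g,h'

printf '%s\n' "$sample" | join_two
```

With that sample this should print a,b,c,d, and e,f,g,h, on two separate lines.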

dvah · March 8, 2011, 1:22pm

It works partially.

I have to run this script on hundreds of CSVs, so I can't be sure of the number of lines per record in these files. print $1, $2 works well for records of two lines followed by a blank line, but this has to be dynamic.
It also doesn't seem to add an extra delimiter to the end of each record, i.e. before each blank line.

Thanks!

Corona688 · March 8, 2011, 1:26pm

Good to know. How about awk 'BEGIN { RS="" ; FS="\n" } { for(N=1; N<=NF; N++) printf("%s,", $N); printf("\n"); }' < csvrec.csv then?

dvah · March 8, 2011, 2:16pm

This works perfectly. Thanks a lot :)

---------- Post updated at 12:46 AM ---------- Previous update was at 12:02 AM ----------

I'm getting an error for a few files.

Here is the error that I got:

awk: Input line xxxxx cannot be longer than 3,000 bytes.
The source line number is xxxxx.

Do you have any idea what the problem might be?

Thanks!

Corona688 · March 8, 2011, 3:12pm

Please don't bump this up with a new thread if we don't respond fast enough.

The error means exactly what it says: the input data is too long. Some implementations of awk have this annoying limit. If your system has nawk or gawk that should work better.

freegnu · March 8, 2011, 5:53pm

printf prints without a trailing newline.
NF?$0:"\n" is a conditional expression of the form:
test_condition ? true_result : false_result
NF returns 0 for blank lines. Zero is considered false.
$0 returns the whole line/record being considered.
"\n" adds a newline to the output when NF is zero.

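A quick way to check the NF behaviour described above is to print NF for a few lines. This is only a sketch: classify is a hypothetical helper name and the input is made up. Note that a line containing only spaces also has NF equal to 0, so the conditional treats it as blank.

```shell
# Sketch only: classify is a hypothetical helper name.
classify() {
  # Print the field count, then the branch the ternary would pick for that line.
  awk '{ print NF, (NF ? "data" : "blank") }'
}

# Four input lines: two fields, empty, spaces only, one field.
printf 'a b\n\n   \nc\n' | classify
```
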
---------- Post updated at 05:53 PM ---------- Previous update was at 04:46 PM ----------

I wrote my own long-winded versions of the concise one-liner:

awk '{printf NF?$0:"\n"}'

This one is just a verbose version of the one above:

awk 'NF > 0{printf $0};NF == 0{printf "\n"}'

This one actually breaks the fields down properly in case you need to process them but outputs field and record separators manually:

awk 'BEGIN{RS="";FS=",[\n]*";OFS="";ORS=""}{for(i=1;i<NF;i++) print $i ",";print $NF "\n"}'

dvah · March 9, 2011, 5:55am

My system doesn't seem to have gawk. I tried the same command with nawk and got the same error. Is there any other way to handle this situation? Please help me with this. Once I'm through with this, my whole script will be ready to deploy.

Thanks!

---------- Post updated at 04:25 PM ---------- Previous update was at 03:21 PM ----------

I'm getting an error with a few files:

awk: There are not enough parameters in printf statement

I tried to print the whole file (without any tailoring) with awk '{printf $0}' < xxx and got the same error, and I found that the line reported in the error has a '%' character in it.

So I guess printf treats the % symbol in the file as a special character, but I'm not sure whether that is the problem.

Thanks

alister · March 9, 2011, 8:45am

The correct way to print an arbitrary string with printf is printf "%s", $0
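
Putting that together with the line-by-line approach from earlier in the thread, a possible sketch follows. merge_records is a hypothetical name and the sample input is made up. It appends a comma after every input line, like the loop version above, and passes each line through "%s" so that a % in the data is not treated as a format. Because it reads one line at a time instead of whole blank-line-separated records, it may also avoid the record length limit mentioned earlier, but that is an untested assumption.

```shell
# Sketch only: merge_records is a hypothetical helper name.
merge_records() {
  awk '
    NF    { printf "%s,", $0; inrec = 1; next }  # data line: print it plus a comma, stay on the same output row
    inrec { printf "\n"; inrec = 0 }             # first blank line after data: end the output row
    END   { if (inrec) printf "\n" }             # end the last row if the file has no trailing blank line
  '
}

# The first line contains a literal % to show that it is printed as-is.
printf '10%%,a\nb,c\n\nd,e\n' | merge_records
```

With that sample this should print 10%,a,b,c, and d,e, on two separate lines.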