Removing blank columns from a file

sumirmehta · August 18, 2008, 4:21pm

Hi,

I have a sample file as shown below --

id parent name dba
----------------------------------- ----------------------------------- ---------------------------------------- ----------------------------------------
0367169 BANK OF LITHUANIA
0108949 BRATTON LTD
0214359 HALYARD RE BV
0216856 VIKING CABLE AS
0400088 DEUTSCHE BANK AG
0212256 AXA SA
0020503 O/S COUPON SETTLEMENTS
0167528 TT UK LTD

The column under the headers parent and dba are fully blank. I want to convert this file into a csv. tried using sed s/" "*" "/,/g but that removes the whole blank columns as well... any ideas ??

sudhamacs · August 18, 2008, 4:40pm

Is it a fixed length format file? If so what is the length of the fields? You want to convert it into csv file, including the blank columns right?
If not any sample input and output?

sumirmehta · August 18, 2008, 4:47pm

the file is basically the dump of a query/stored proc , assuming it is fixed format (each column width being n characters, say 10), how could it be done then....

another example of what is required

input

a b c
---------- ---------- ----------
1234 xyz
3456 ABC
5325 DEF

(it is not visible somehow over here, but there are blank spaces between a-b and b-c and similarly between values in other columns,
thus for row1, a=1234, b= , c=xyz , and similarly for the rest of the file)

and output would be something like this --
a,b,c
----------,----------,----------
1234,,xyz
3456,,ABC
5325,,DEF

The first two rows are not necessary and could be ignored in the final output (if that helps in any way)

gnsxhj · August 18, 2008, 5:08pm

well, as u said, if the format of the number is fixed ,that should be easy :

sed 's/\([0-9]\{4\}\) /\1,,/' filename

sumirmehta · August 18, 2008, 5:15pm

there are certain variations to the file mentioned above

the width of a column is fixed (say 10) but that includes the numbers and padded spaces, so it could be "1234 " and "123456 "
there are other columns as well (3 shown here for simplicity), and any of the columns in between could be blank.

The real problem here is that as there are spaces in the columns which have values (padded spaces) as well as there are columns which are totally blank, it is difficult to identify the diff between them..

era · August 18, 2008, 5:16pm

You can use cut -c to specify the character columns to keep, but cut can't add commas without help. A regular expression matching the columns you want to keep might also be an option.

You can use code tags to post snippets with their whitespace preserved.

sumirmehta · August 18, 2008, 10:48pm

the cut -c thing does work... but is a little bit crude ... any other ideas?

era · August 20, 2008, 9:58pm

perl -nle 'print join (",", substr($_, 0, 4), "", substr ($_, 15,42))' file

You'll have to find out the correct offsets to use in the substr calls; without a proper sample, we can't know what they are.

If you want to trim trailing blanks, then perhaps something like

perl -nle '$a = substr($_, 0, 4); $a =~ s/\s+$//;
  $c = substr ($_, 15, 42); $c =~ s/\s+$//;
  print join (",", $a, "", $c)' file

If your data doesn't contain any field-internal blanks, you could simply substitute runs of blanks with a comma, then extract the fields you want.

tr -s ' ' , < file | cut -f1,3

But your original example did contain blanks, so I guess this is out of the question.