help with file formatting

prvnrk · September 17, 2009, 2:37pm

Hi,

I have a file with the contents below:

AAA
pqr,jkl,mnop,abcd

BBB
abc,pqrs,xyz,uvw,
efgh,uvw,
rpk

CCC
123,456,789

I need the output file to look like this:

AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789

Please advise.

TIA
Prvn

durden_tyler · September 17, 2009, 3:23pm

What have you tried? Post your attempts here.

tyler_durden

vidyadhar85 · September 17, 2009, 3:56pm

A simple awk one-liner can do this:

awk 'ORS=(NF)?",":"\n"' filename

prvnrk · September 17, 2009, 4:03pm

I didn't try anything, as all I know is that this can be done using "awk".

It's actually the CSV output (spooled) from a few SQL statements, and there is a lot of data beyond 80 columns, which gets wrapped onto the next line (and so on). Now I need to keep the entire data of a record on one line.

Prvn

---------- Post updated at 03:03 PM ---------- Previous update was at 02:57 PM ----------

Thank you vidyadhar85,

It worked great.

summer_cherry · September 18, 2009, 3:10am

sed -n '/^$/!{
      $!{H;}
      ${H;x;s/\n/,/2;s/\n//g;p;}
      }
    /^$/{x;s/\n/,/2;s/\n//g;p;}'

rdcwayx · September 18, 2009, 3:40am

Not really, some unexpected commas were added:

$ awk 'ORS=(NF)?",":"\n"' filename
AAA,pqr,jkl,mnop,abcd,
BBB,abc,pqrs,xyz,uvw,,efgh,uvw,,rpk,
CCC,123,456,789,

radoulov · September 18, 2009, 6:31am

Yes,
but the OP seems satisfied. With awk it should be something like this:

awk -F, 'END { print r }
NR > 1 && /[A-Z]/ { 
  print r; r = "" 
  }
{ r = r ? r $0 : $0 FS }
' infile

On Solaris, use gawk, nawk or /usr/xpg4/bin/awk.

vgersh99 · September 18, 2009, 6:49am

 nawk 'BEGIN{FS=RS="";OFS=","} $1=$1' myFile

Franklin52 · September 18, 2009, 8:20am

The solution of vidyadhar85 gives:

$ awk 'ORS=(NF)?",":"\n"' file
AAA,pqr,jkl,mnop,abcd,
BBB,abc,pqrs,xyz,uvw,,efgh,uvw,,rpk,
CCC,123,456,789,$

The solution of summer_cherry gives:

$ sed -n '/^$/!{
      $!{H;}
      ${H;x;s/\n/,/2;s/\n//g;p;}
      }
    /^$/{x;s/\n/,/2;s/\n//g;p;}' file
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789
$

The solution of radoulov gives:

$ awk -F, 'END { print r }
NR > 1 && /[A-Z]/ {
  print r; r = ""
  }
{ r = r ? r $0 : $0 FS }
' file
AAA,
pqr,jkl,mnop,abcd,
BBB,
abc,pqrs,xyz,uvw,,
efgh,uvw,,
rpk,
CCC,123,456,789
$

The solution of vgersh99 gives:

$ nawk 'BEGIN{FS=RS="";OFS=","} $1=$1' file
A,A,A,
,p,q,r,,,j,k,l,,,m,n,o,p,,,a,b,c,d
B,B,B,
,a,b,c,,,p,q,r,s,,,x,y,z,,,u,v,w,,,
,e,f,g,h,,,u,v,w,,,
,r,p,k
C,C,C,
,1,2,3,,,4,5,6,,,7,8,9
$

The expected output is:

AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789

Only the solution of summer_cherry gives me the right output. Am I missing something?

My approach:

awk '{$1=$1;gsub(",,",",")}1' OFS="," RS="\n\n" file

Output:

$ awk '{$1=$1;gsub(",,",",")}1' OFS="," RS="\n\n" file
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789
$

Regards

vgersh99 · September 18, 2009, 8:45am

I get the following output:

$ nawk 'BEGIN{FS=RS="";OFS=","} $1=$1' pr.txt
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,,efgh,uvw,,rpk
CCC,123,456,789

Sure, not exactly what the OP wanted. Here's another version:

nawk 'BEGIN{FS=RS="";OFS=","} {for(i=1;i<=NF;i++) sub(",$", "", $i)}$1=$1' pr.txt
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789

Franklin52, the potential issue with your code is that it assumes there are no embedded empty fields (',,') in any of the lines, because the gsub would collapse them. For example:

BBB
abc,pqrs,,xyz,uvw,
efgh,,uvw,
rpk

radoulov · September 18, 2009, 9:09am

Franklin52's post is important. Which awk version produces the output you mentioned when running my code?

% nawk --version
awk version 20070501
% nawk -F, 'END { print r }
NR > 1 && /[A-Z]/ {
  print r; r = ""
  }
{ r = r ? r $0 : $0 FS }
' infile        
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789
% gawk --version |head -1
GNU Awk 3.1.7
% gawk -F, 'END { print r }
NR > 1 && /[A-Z]/ {
  print r; r = ""
  }
{ r = r ? r $0 : $0 FS }
' infile        
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789

On Solaris:

$ nawk -F, 'END { print r }
> NR > 1 && /[A-Z]/ {
>   print r; r = ""
>   }
> { r = r ? r $0 : $0 FS }
> ' infile
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789
$ /usr/xpg4/bin/awk -F, 'END { print r }
NR > 1 && /[A-Z]/ {
  print r; r = ""
  }
{ r = r ? r $0 : $0 FS }
' infile
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789

Input file used:

AAA
pqr,jkl,mnop,abcd

BBB
abc,pqrs,xyz,uvw,
efgh,uvw,
rpk

CCC
123,456,789

---------- Post updated at 02:56 PM ---------- Previous update was at 02:53 PM ----------

Franklin52's code assumes an awk implementation that supports a multi-character record separator (RS), so I suppose it will work only with GNU awk or tawk (?).

---------- Post updated at 02:59 PM ---------- Previous update was at 02:56 PM ----------

Franklin52 , could you please post the sample data used in your examples? Is it different from the OP example?

---------- Post updated at 03:09 PM ---------- Previous update was at 02:59 PM ----------

vgersh99's solutions are specific to old nawk, because setting FS to an empty string has a different meaning in other awk implementations (not sure about mawk and tawk, though).

Franklin52 · September 18, 2009, 9:19am

Strange, I am using GNU Awk 3.1.5 with the same input file as the OP.

Regards

radoulov · September 18, 2009, 9:30am

Thank you very much for the info. It's because of the locale (you're using UTF-8 or similar).
The correct command should be:

LANG=C awk -F, 'END { print r }
NR > 1 && /[A-Z]/ { 
  print r; r = "" 
  }
{ r = r ? r $0 : $0 FS }
' infile

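The problematic part is [A-Z]: in such a locale the range follows the collation order and can match lowercase letters too, so the data lines are treated as headers as well.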

---------- Post updated at 03:30 PM ---------- Previous update was at 03:27 PM ----------

Another workaround is to use [[:upper:]] (if supported) instead of [A-Z].
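
For reference, a minimal sketch of that [[:upper:]] variant. The function name join_by_header is made up for illustration, and the anchored pattern assumes that every record header is a line made only of uppercase letters, as in the sample data. An awk built without POSIX character classes (some older mawk releases, for instance) will not accept it.

```shell
# join_by_header: hypothetical helper name, not from the thread.
join_by_header() {
  awk '
    # A line made only of uppercase letters starts a new record (assumption):
    # print the record collected so far, then start over.
    NR > 1 && /^[[:upper:]]+$/ { print r; r = "" }
    # The header gets a comma appended; wrapped data lines already end in one.
    { r = r ? r $0 : $0 "," }
    END { print r }
  ' "$@"
}

printf 'AAA\npqr,jkl\n\nBBB\nabc,\nrpk\n' | join_by_header
# prints:
# AAA,pqr,jkl
# BBB,abc,rpk
```

Since the character class does not depend on collation order, this should behave the same with or without LANG=C.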

danmero · September 18, 2009, 9:32am

Only summer_cherry's and radoulov's solutions work as expected on my OS.

# /usr/bin/awk --version
awk version 20070501 (FreeBSD)

Franklin52 · September 18, 2009, 9:41am

radoulov,

You're right, I have LANG=en_ZA.UTF-8 on a Debian system. With "LANG=C" your solution works for me, but I don't get the right output with the solution of vgersh99.

Regards

radoulov · September 18, 2009, 9:44am

To make it work you need to leave FS at its default (don't set it to an empty string):

awk 'BEGIN{RS="";OFS=","} {for(i=1;i<=NF;i++) sub(",$", "", $i)}$1=$1' infile

Franklin52 · September 18, 2009, 9:50am

radoulov, you're right, thanks!

I think danmero has a similar problem.

Regards

danmero · September 18, 2009, 1:29pm

I just changed vgersh99's solution to:

awk 'BEGIN{RS="";OFS=","}{$1=$1;gsub(",,",",")}1' file

vgersh99 · September 18, 2009, 1:41pm

file:

AAA
pqr,jkl,mnop,abcd

BBB
abc,,pqrs,xyz,uvw,
efgh,uvw,
rpk

CCC
123,456,,,789

$ nawk 'BEGIN{RS="";OFS=","}{$1=$1;gsub(",,",",")}1' pr.txt
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,,789
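
The output above shows gsub(",,",",") eating a real empty field (123,456,,,789 became 123,456,,789). One possible way around it, sketched under a few assumptions: records are separated by blank lines, every wrapped line ends with exactly one separator comma, and a line is never wrapped in the middle of a value. The function name join_paragraphs is hypothetical. Only a trailing comma is removed from each line, so commas inside a line are left alone.

```shell
# join_paragraphs: hypothetical helper name, not from the thread.
join_paragraphs() {
  awk '
    # RS="" reads blank-line-separated blocks; FS="\n" makes each line a field.
    BEGIN { RS = ""; FS = "\n" }
    {
      out = ""
      for (i = 1; i <= NF; i++) {
        line = $i
        sub(/,$/, "", line)   # drop only the comma left at a wrap point
        out = (i == 1) ? line : out "," line
      }
      print out
    }
  ' "$@"
}

printf 'CCC\n123,456,,,789\n' | join_paragraphs
# prints: CCC,123,456,,,789
```

If the last field before a wrap point is itself empty, the input is ambiguous and this sketch will drop it.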