Help with file formatting

Hi,

I have a file with the following contents:

AAA
pqr,jkl,mnop,abcd

BBB
abc,pqrs,xyz,uvw,
efgh,uvw,
rpk

CCC
123,456,789

I need the output file to look like this:

AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789

Please advise.

Thanks in advance,
Prvn

What have you tried? Post your attempts here.

tyler_durden

A simple awk one-liner can do this:

awk 'ORS=(NF)?",":"\n"' filename
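For anyone wondering how this works: the assignment ORS=(NF)?",":"\n" is used as the pattern. Its value is either "," or "\n", both non-empty strings, so the pattern is always true and awk prints every line followed by the separator it just chose. Spelled out as an ordinary action (ors_join is just a hypothetical wrapper name), it is easier to see that a wrapped line which already ends in a comma gets a second one, and that the last line of the file ends with a comma instead of a newline:

```shell
#!/bin/sh
# ors_join is a hypothetical wrapper around the one-liner above.
# For each line, ORS becomes "," if the line is non-blank (NF > 0) and
# "\n" if it is blank; print then appends that ORS to the line.
ors_join() {
  awk '{ ORS = (NF ? "," : "\n"); print }' "$@"
}

# A wrapped line ending in "," picks up a second comma, and the final
# line is followed by "," rather than a newline.
printf 'AAA\npqr,jkl\n\nBBB\nabc,\nrpk\n' | ors_join
```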

I didn't try anything; all I know is that this can be done using awk.

It's actually the CSV output (spooled) from a few SQL statements, and there is a lot of data beyond 80 columns that gets wrapped to the next line (and so on). Now I need to keep the entire data of each record on one line.
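Since each record is a header line followed by one or more wrapped data lines, with a blank line between records, one option is awk's paragraph mode (RS = "", supported by POSIX awks): each blank-line-separated block becomes one record, which is then rebuilt as the header, one comma, and the data lines glued together exactly as they appear. That is the same rule the sed script in this thread applies. Because nothing is stripped or collapsed, embedded empty fields survive, and a value that the 80-column wrap splits in the middle is rejoined too, assuming the spool adds no characters at the wrap point. A minimal sketch, assuming every block starts with its header line (join_blocks is a hypothetical name):

```shell
#!/bin/sh
# join_blocks is a hypothetical name. Assumes blank lines separate the
# records and the first line of each block is its header (AAA, BBB, ...).
join_blocks() {
  awk '
    BEGIN { RS = "" }                 # paragraph mode: one record per block
    {
      n = split($0, line, "\n")       # the physical lines of the block
      out = line[1]                   # the header line
      for (i = 2; i <= n; i++)        # data lines are glued on verbatim,
        out = out (i == 2 ? "," : "") line[i]   # with one comma after the header
      print out
    }
  ' "$@"
}

join_blocks <<'EOF'
AAA
pqr,jkl,mnop,abcd

BBB
abc,pqrs,xyz,uvw,
efgh,uvw,
rpk

CCC
123,456,789
EOF
```

With the sample input this prints the three expected lines.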

Prvn

---------- Post updated at 03:03 PM ---------- Previous update was at 02:57 PM ----------

Thank you, vidyadhar85,

It worked great.

sed -n '/^$/!{
      $!{H;}
      ${H;x;s/\n/,/2;s/\n//g;p;}
      }
    /^$/{x;s/\n/,/2;s/\n//g;p;}'
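The sed script relies on the hold space. Here it is again, wrapped in a function (join_blocks_sed is a hypothetical name) with each step explained in shell comments; the sed commands themselves are as posted, only a file argument was added:

```shell
#!/bin/sh
# join_blocks_sed is a hypothetical name; the sed commands are as posted.
# -n: print only where a p command says so.
# Non-blank line: H appends "\n" plus the line to the hold space, so after
#   AAA and pqr,jkl,mnop,abcd the hold space is "\nAAA\npqr,jkl,mnop,abcd".
#   On the last line ($) no blank line follows, so flush right away.
# Blank line: x swaps the collected block into the pattern space (leaving
#   the empty line in the hold space for the next block) and flushes it.
# Flush: the 2nd newline sits between the header and the first data line;
#   s/\n/,/2 turns it into a comma and s/\n//g removes the rest, so the
#   wrapped data lines are glued together exactly as they are.
join_blocks_sed() {
  sed -n '/^$/!{
    $!{H;}
    ${H;x;s/\n/,/2;s/\n//g;p;}
  }
  /^$/{x;s/\n/,/2;s/\n//g;p;}' "$@"
}

join_blocks_sed <<'EOF'
AAA
pqr,jkl,mnop,abcd

BBB
abc,pqrs,xyz,uvw,
efgh,uvw,
rpk

CCC
123,456,789
EOF
```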

Not really; some unexpected commas were added:

$ awk 'ORS=(NF)?",":"\n"' filename
AAA,pqr,jkl,mnop,abcd,
BBB,abc,pqrs,xyz,uvw,,efgh,uvw,,rpk,
CCC,123,456,789,

Yes, but the OP seems satisfied. With awk, it should be something like this:

awk -F, 'END { print r }
NR > 1 && /[A-Z]/ { 
  print r; r = "" 
  }
{ r = r ? r $0 : $0 FS }
' infile

On Solaris, use gawk, nawk or /usr/xpg4/bin/awk.

 nawk 'BEGIN{FS=RS="";OFS=","} $1=$1' myFile

---------- Post updated at 06:49 AM ---------- Previous update was at 06:48 AM ----------

To keep the forums high quality for all users, please take the time to format your posts correctly.

First of all, use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type the code tags by hand.)

Second, avoid adding color or different fonts and font size to your posts. Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.

Third, be careful when you cut and paste: edit any odd characters and make sure all links are working properly.

Thank You.

The UNIX and Linux Forums

The solution of vidyadhar85 gives:

$ awk 'ORS=(NF)?",":"\n"' file
AAA,pqr,jkl,mnop,abcd,
BBB,abc,pqrs,xyz,uvw,,efgh,uvw,,rpk,
CCC,123,456,789,$

The solution of summer_cherry gives:

$ sed -n '/^$/!{
      $!{H;}
      ${H;x;s/\n/,/2;s/\n//g;p;}
      }
    /^$/{x;s/\n/,/2;s/\n//g;p;}' file
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789
$

The solution of radoulov gives:

$ awk -F, 'END { print r }
NR > 1 && /[A-Z]/ {
  print r; r = ""
  }
{ r = r ? r $0 : $0 FS }
' file
AAA,
pqr,jkl,mnop,abcd,
BBB,
abc,pqrs,xyz,uvw,,
efgh,uvw,,
rpk,
CCC,123,456,789
$

The solution of vgersh99 gives:

$ nawk 'BEGIN{FS=RS="";OFS=","} $1=$1' file
A,A,A,
,p,q,r,,,j,k,l,,,m,n,o,p,,,a,b,c,d
B,B,B,
,a,b,c,,,p,q,r,s,,,x,y,z,,,u,v,w,,,
,e,f,g,h,,,u,v,w,,,
,r,p,k
C,C,C,
,1,2,3,,,4,5,6,,,7,8,9
$

The expected output is:

AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789

Only the solution of summer_cherry gives me the right output. Am I missing something?

My approach:

awk '{$1=$1;gsub(",,",",")}1' OFS="," RS="\n\n" file

Output:

$ awk '{$1=$1;gsub(",,",",")}1' OFS="," RS="\n\n" file
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789
$

Regards

I get the following output:

$ nawk 'BEGIN{FS=RS="";OFS=","} $1=$1' pr.txt
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,,efgh,uvw,,rpk
CCC,123,456,789

Sure, not exactly what the OP wanted. Here's another version:

nawk 'BEGIN{FS=RS="";OFS=","} {for(i=1;i<=NF;i++) sub(",$", "", $i)}$1=$1' pr.txt
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789

Franklin52, the potential issue with your code is that it assumes there are no embedded empty fields (',,') in any of the lines, for example:

BBB
abc,pqrs,,xyz,uvw,
efgh,,uvw,
rpk
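To see the effect, here is Franklin52's approach applied to that sample. It uses RS = "" instead of RS = "\n\n" so that it also runs on awks without multi-character RS support (an adaptation, not the exact command posted); collapse_join is a hypothetical name:

```shell
#!/bin/sh
# collapse_join is a hypothetical name. RS = "" replaces RS = "\n\n" here;
# the $1 = $1 and gsub(",,", ",") steps are the ones Franklin52 posted.
collapse_join() {
  awk 'BEGIN { RS = ""; OFS = "," } { $1 = $1; gsub(",,", ","); print }' "$@"
}

# Input with two genuinely empty fields (pqrs,,xyz and efgh,,uvw).
# Prints BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk: both empty fields are lost.
printf 'BBB\nabc,pqrs,,xyz,uvw,\nefgh,,uvw,\nrpk\n' | collapse_join
```

Keeping the empty fields would give BBB,abc,pqrs,,xyz,uvw,efgh,,uvw,rpk instead.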

Franklin52's post raises an important point. Which version of awk produced the output you mentioned with my code?

% nawk --version
awk version 20070501
% nawk -F, 'END { print r }
NR > 1 && /[A-Z]/ {
  print r; r = ""
  }
{ r = r ? r $0 : $0 FS }
' infile        
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789
% gawk --version |head -1
GNU Awk 3.1.7
% gawk -F, 'END { print r }
NR > 1 && /[A-Z]/ {
  print r; r = ""
  }
{ r = r ? r $0 : $0 FS }
' infile        
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789

On Solaris:

$ nawk -F, 'END { print r }
> NR > 1 && /[A-Z]/ {
>   print r; r = ""
>   }
> { r = r ? r $0 : $0 FS }
> ' infile
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789
$ /usr/xpg4/bin/awk -F, 'END { print r }
NR > 1 && /[A-Z]/ {
  print r; r = ""
  }
{ r = r ? r $0 : $0 FS }
' infile
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,789

Input file used:

AAA
pqr,jkl,mnop,abcd

BBB
abc,pqrs,xyz,uvw,
efgh,uvw,
rpk

CCC
123,456,789

---------- Post updated at 02:56 PM ---------- Previous update was at 02:53 PM ----------

Franklin52's code assumes an awk implementation that supports a multi-character record separator (RS), so I suppose it will work only with GNU awk or tawk (?).

---------- Post updated at 02:59 PM ---------- Previous update was at 02:56 PM ----------

Franklin52, could you please post the sample data used in your examples? Is it different from the OP's example?

---------- Post updated at 03:09 PM ---------- Previous update was at 02:59 PM ----------

vgersh99's solutions are specific to old nawk, because setting FS to an empty string has a different meaning in other awk implementations (not sure about mawk and tawk, though).
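As an illustration, gawk and mawk both document that an empty FS splits the record into individual characters (POSIX leaves the empty-FS case unspecified), which matches the A,A,A output posted above. A quick check, with show_empty_fs as a hypothetical name:

```shell
#!/bin/sh
# show_empty_fs is a hypothetical name. On gawk and mawk an empty FS splits
# the record into single characters; POSIX leaves FS = "" unspecified, so
# other awks may behave differently.
show_empty_fs() {
  awk 'BEGIN { FS = ""; OFS = "," } { $1 = $1; print }' "$@"
}

printf 'AAA\n' | show_empty_fs   # gawk/mawk print A,A,A
```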

Strange, I am using GNU Awk 3.1.5 with the same input file as the OP... :eek:

Regards

Thank you very much for the info. It's because of the locale (you're using UTF-8 or similar) :slight_smile:
The correct command should be:

LANG=C awk -F, 'END { print r }
NR > 1 && /[A-Z]/ { 
  print r; r = "" 
  }
{ r = r ? r $0 : $0 FS }
' infile

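The problematic part is [A-Z]: in a UTF-8 locale, some awk builds (such as the GNU Awk 3.1.5 you are using) interpret the range through the locale's collation order, so it can also match lowercase letters, and data lines containing lowercase letters are then treated as new headers.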
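To check whether a particular awk and locale are affected, ask whether [A-Z] matches a lowercase letter. A small sketch (range_matches_lowercase is a hypothetical name); under LC_ALL=C the range is plain ASCII A to Z, so the check should come out negative there:

```shell
#!/bin/sh
# range_matches_lowercase is a hypothetical name. Exit status 0 means the
# awk regex [A-Z] matched the lowercase letter b under the current locale.
# b is used rather than a because in an aAbB...zZ collation order a sorts
# before A and so falls outside the range even when b does not.
range_matches_lowercase() {
  printf 'b\n' | awk '/[A-Z]/ { found = 1 } END { exit !found }'
}

if range_matches_lowercase; then
  echo "affected: [A-Z] matches lowercase letters here"
else
  echo "not affected"
fi
```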

---------- Post updated at 03:30 PM ---------- Previous update was at 03:27 PM ----------

Another workaround is to use [[:upper:]] (if supported) instead of [A-Z].
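A third option keeps radoulov's header-based logic but pins the locale with LC_ALL=C. LC_ALL takes precedence over LANG and the individual LC_* variables, so this still works if one of those is set in the environment. The sketch below also anchors the header test to whole lines, assuming a header line consists only of uppercase letters (AAA, BBB, ...), so a data line that happens to contain an uppercase letter is not mistaken for a header. join_by_header is a hypothetical name:

```shell
#!/bin/sh
# join_by_header is a hypothetical name. Same accumulate-and-flush logic as
# radoulov's version, run under LC_ALL=C so [A-Z] is plain ASCII A to Z.
# Assumption: a header line holds only uppercase letters (AAA, BBB, ...).
join_by_header() {
  LC_ALL=C awk -F, '
    NR > 1 && /^[A-Z][A-Z]*$/ { print r; r = "" }   # new header: print the previous record
    { r = r ? r $0 : $0 FS }                        # header gets a comma, data is glued on
    END { print r }                                 # print the last record
  ' "$@"
}

join_by_header <<'EOF'
AAA
pqr,jkl,mnop,abcd

BBB
abc,pqrs,xyz,uvw,
efgh,uvw,
rpk

CCC
123,456,789
EOF
```

Unlike the RS = "" solutions in this thread, this one does not need blank lines between records.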

Only summer_cherry's and radoulov's solutions work as expected on my OS.

# /usr/bin/awk --version
awk version 20070501 (FreeBSD)

radoulov,

You're right; I have LANG=en_ZA.UTF-8 on a Debian system. With "LANG=C" your solution works for me, but I don't get the right output with vgersh99's solution.

Regards

To make it work, you need to leave out the FS assignment:

awk 'BEGIN{RS="";OFS=","} {for(i=1;i<=NF;i++) sub(",$", "", $i)}$1=$1' infile

radoulov, you're right, thanks!

I think danmero has a similar problem.

Regards

I just changed vgersh99's solution to:

awk 'BEGIN{RS="";OFS=","}{$1=$1;gsub(",,",",")}1' file

file:

AAA
pqr,jkl,mnop,abcd

BBB
abc,,pqrs,xyz,uvw,
efgh,uvw,
rpk

CCC
123,456,,,789
$ nawk 'BEGIN{RS="";OFS=","}{$1=$1;gsub(",,",",")}1' pr.txt
AAA,pqr,jkl,mnop,abcd
BBB,abc,pqrs,xyz,uvw,efgh,uvw,rpk
CCC,123,456,,789