Removing end of line to merge multiple lines

tink · October 14, 2008, 10:44am

I'm sure this will be an easy question for you experts out there, but I have been searching the forum and working on this for a couple of hours now and can't get it right.

I have a very messy data file that I am trying to tidy up. One of the issues is that some records are split across multiple lines:

999999000 "Name" "this is text for line one
line two
line three"

And I've been trying all sorts of versions of sed to get it to look like this:
999999000 "Name" "this is text for line one line two line three"

And yes, I have tried things like sed 's/$/ /' file1 > file2. The problem is that not every line has an issue, so I'm trying to figure out how to remove line feeds only for the problematic lines, not for all lines.

The problem lines (the continuations) begin with alphabetic characters, not digits, so I've been trying to do something with that, but to no avail.

thanks

ShawnMilo · October 14, 2008, 11:16am

perl -pe 's/\n/ /' temp.txt

joeyg · October 14, 2008, 11:19am

> cat file31
999999000 "Name" "this is text for line one
line two
line three"
888888000 "Yep" "All on one line"
777777111 "Yes" "Another good text"
555555999 "Name" "this is other text for line one
line two
line three"

> cat calc_file31
rm -f file32
while read -r line
  do
  # a line that ends in a double quote closes a record, so mark it with ~
  if printf '%s\n' "$line" | grep -q '"$'
   then
    echo "$line""~" >>file32
   else
    echo "$line" >>file32
  fi
done <file31

cat file32 | tr "\n" " " | tr "~" "\n"

> calc_file31
999999000 "Name" "this is text for line one line two line three"
 888888000 "Yep" "All on one line"
 777777111 "Yes" "Another good text"
 555555999 "Name" "this is other text for line one line two line three"
>

tink · October 14, 2008, 12:27pm

bloody marvelous joeyg - thanks!

This also worked for me in the end:
sed 's/"$/"|/g' file1 > file2

Because a double quote at the end of a line is only valid for the last column, I replace the double quote at the line end with a double quote and a pipe (the pipe taking the place of joeyg's ~ marker).
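
Put together, the marker idea might look roughly like the pipeline below. This is a sketch, not necessarily the exact commands that were run. It assumes every complete record ends with a double quote and that the pipe character never appears in the data. The name join_by_marker is made up for illustration.

```shell
# Hypothetical helper: stdin is the messy file, stdout is one record per line.
join_by_marker() {
  sed 's/"$/"|/' |   # mark the end of each complete record with a pipe
    tr '\n' ' ' |    # flatten everything onto one line
    tr '|' '\n' |    # turn each marker back into a line break
    sed 's/^ //'     # drop the space left at the start of each record
}

# Small demo using the sample data from the question.
join_by_marker <<'EOF'
999999000 "Name" "this is text for line one
line two
line three"
888888000 "Yep" "All on one line"
EOF
```

If the data could contain a pipe, some other character that never occurs in the file would have to be used as the marker instead.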

Thanks again

freelong · October 14, 2008, 2:28pm

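awk '/^[0-9]/ { if (NR > 1) print ""; printf "%s", $0; next }   # new record: end the previous one first
     { printf " %s", $0 }                                       # continuation: append after a space
     END { if (NR) print "" }' filename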
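
For anyone who would rather stay in sed, as in the original question, the same rule (a line that does not start with a digit belongs to the previous line) can be sketched with the N/P/D commands. This assumes there are no blank lines in the file. The function name join_with_sed is only a placeholder.

```shell
# Hypothetical helper: stdin is the messy file, stdout is one record per line.
join_with_sed() {
  # :a     label for the loop
  # $!N    unless on the last line, append the next line to the pattern space
  # s///   if the appended line starts with a non-digit, replace the newline with a space
  # ta     if that substitution happened, loop and try the following line too
  # P; D   otherwise print the finished record and restart with the remainder
  sed -e ':a' -e '$!N' -e 's/\n\([^0-9]\)/ \1/' -e 'ta' -e 'P' -e 'D'
}

# Small demo using the sample data from the question.
join_with_sed <<'EOF'
999999000 "Name" "this is text for line one
line two
line three"
888888000 "Yep" "All on one line"
EOF
```

Unlike the marker approach, this does not depend on the closing double quote, only on the leading digit of each new record.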