awk - need to remove unwanted newlines on match

Context:
I need to remove unwanted newlines from a data file listing books and associated data. Here is a sample listing ( line numbers included ):

1 360762| Skip-beat! 14 /| 9781421517544| nb        | 2008.| Nakamura, Yoshiki.| NAKAMUR | Kyoko Mogami followed 
2 her true love Sho to Tokyo to support him while he made it big as an idol. But he's casting her out now that he's famous. 
Kyoko won't suffer in silence--she's going to get her sweet revenge by beating Sho in show biz.
3 361018| Angel numbers 101 : the meaning of 111, 123, 444, and other number sequences /| 1401920012| b         | 2008.| 
Virtue, Doreen, 1958-| 133.3359 VIRTUE | 

I am using the following, found in these forums, for removing unwanted newlines:

awk 'NR==1{s=$0;next} /^[a-zA-Z]|^;/{s=s$0;next} {print s;s=$0} END{if(s)print s}' $RAW_DATA > $UNSPLIT

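However, it is inexact: it only joins lines that begin with a letter or a semicolon, so continuation lines that begin with other punctuation or with digits (such as dates) are left unjoined.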

It needs to:
Find lines in which the first field DOES NOT contain precisely 6 digits and append them to the line above.

Thanks ~

Bub

When you say "( line numbers included ):", do you mean you added them for readability? If you just post what the output should look like, it will be easier.

# awk -F\| '{if(NR==1){printf}else{if($1*1){printf "\n%s",$0}else{printf " %s",$0}}}' file

Similar problem : to get two almost identical rows into one - The UNIX and Linux Forums

I was using gvim and the line numbers didn't copy over, so I added them by hand. I mentioned that to let people know they weren't part of the data.

Sorry for the confusion.

---------- Post updated at 10:16 AM ---------- Previous update was at 09:59 AM ----------

Thanks, Danmero, but I get this error:

awk: (FILENAME=All_Items.out FNR=1) fatal: printf: no arguments
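
For anyone hitting the same error: gawk stops at the bare printf in the NR==1 branch because it was given no format string. A minimal sketch of the same idea with explicit formats follows. The sample variable is made-up data standing in for All_Items.out, and the END rule is an addition so that the last record ends with a newline.

```shell
# Hypothetical sample standing in for All_Items.out (not the real data).
sample='360762| Skip-beat! 14 /| 2008.| Kyoko Mogami followed
her true love Sho to Tokyo.
361018| Angel numbers 101 /| 2008.|
Virtue, Doreen, 1958-| 133.3359 VIRTUE |'

# Same logic as the one-liner above, but every printf gets a format string.
# $1*1 is non-zero only when the first field starts with a non-zero number.
joined=$(printf '%s\n' "$sample" | awk -F'|' '
NR == 1 { printf "%s", $0; next }      # first line: print without a newline
$1 * 1  { printf "\n%s", $0; next }    # numeric first field: start a new record
        { printf " %s", $0 }           # anything else: append to current record
END     { print "" }                   # terminate the last record
')
printf '%s\n' "$joined"
```

Note that $1*1 treats any line whose first field begins with a non-zero number as a new record, so a continuation line that starts with a year would not be joined.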

---------- Post updated at 10:58 AM ---------- Previous update was at 10:16 AM ----------

Thanks for the link to the other post, Danmero. That actually turned out to be what I was looking for. I adjusted it to my situation as follows:

awk -F\| --posix '{if(/^[0-9]{6}/){if(NR>1){printf "\n%s",$0}else{printf "%s",$0}}else{printf " %s",$0}} END{print ""}' All_Items.out > tester

I'm not sure I understand how your example on this thread was supposed to work though.

As a bit of an aside:
Is there a better way to write the regex above, i.e. without the --posix option?

Bub
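
On the aside about --posix: one option that should work in any awk is to spell out the six digits as six bracket expressions and anchor the match to the whole first field. The sketch below is only an illustration of the stated requirement (a first field of exactly six digits starts a record, everything else is appended to the line above). The sample data and the names rec and unsplit are made up, and joining with a single space is an assumption.

```shell
# Hypothetical sample; the real input would be All_Items.out.
sample='360762| Skip-beat! 14 /| 2008.| Kyoko Mogami followed
her true love Sho to Tokyo.
2008 was the year she began.
361018| Angel numbers 101 /| 2008.|
Virtue, Doreen, 1958-| 133.3359 VIRTUE |'

unsplit=$(printf '%s\n' "$sample" | awk -F'|' '
# A record starts when the first field is exactly six digits.
# Six bracket expressions replace {6}, so no --posix or --re-interval is needed.
$1 ~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/ {
    if (rec != "") print rec    # flush the previous record
    rec = $0
    next
}
{ rec = rec " " $0 }            # continuation line: append to the current record
END { if (rec != "") print rec }
')
printf '%s\n' "$unsplit"
```

Because the test is anchored at both ends of the first field, a continuation line that merely starts with digits (the "2008 was..." line here) is still appended rather than treated as a new record.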

  1. Use GNU awk (gawk), New awk (nawk) or POSIX awk (/usr/xpg4/bin/awk) on Solaris.
  2. Works for me

Use:

if ($1 >= 100000 && $1 < 1000000)

instead of:

if(/^[0-9]{6}/)

Regards
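
One difference between the two tests is worth knowing: the numeric comparison looks at the value of the first field, while a regex looks at its characters. A small sketch with made-up first fields (none taken from the real file), assuming the regex form is written without interval expressions so that it runs without --posix:

```shell
# Made-up first fields to compare the two tests; none come from the real file.
result=$(printf '%s\n' '360762|a' '012345|b' '1401920012|c' | awk -F'|' '
{
    # Range test: compares the numeric value of the first field.
    numeric = ($1 >= 100000 && $1 < 1000000) ? "yes" : "no"
    # Regex test: requires the first field to be exactly six digit characters.
    regex   = ($1 ~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/) ? "yes" : "no"
    print $1 ": range=" numeric " regex=" regex
}')
printf '%s\n' "$result"
```

The two agree on an ID like 360762 and on a ten-digit ISBN, but differ on an ID with a leading zero such as 012345, which has six digits yet a numeric value below 100000. Which behaviour is wanted depends on whether such IDs can occur in the data.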

Thanks!

Bub