Context:
I need to remove unwanted newlines from a data file that lists books and their associated data. Here is a sample listing (line numbers included):
1 360762| Skip-beat! 14 /| 9781421517544| nb | 2008.| Nakamura, Yoshiki.| NAKAMUR | Kyoko Mogami followed
2 her true love Sho to Tokyo to support him while he made it big as an idol. But he's casting her out now that he's famous.
Kyoko won't suffer in silence--she's going to get her sweet revenge by beating Sho in show biz.
3 361018| Angel numbers 101 : the meaning of 111, 123, 444, and other number sequences /| 1401920012| b | 2008.|
Virtue, Doreen, 1958-| 133.3359 VIRTUE |
I am using the following, found on these forums, to remove the unwanted newlines:
awk 'NR==1{s=$0;next} /^[a-zA-Z]|^;/{s=s$0;next} {print s;s=$0} END{if(s)print s}' "$RAW_DATA" > "$UNSPLIT"
However, it is inexact: it only joins lines that begin with a letter or a semicolon, so continuation lines that begin with punctuation or a date are left unresolved.
It needs to find lines in which the first field DOES NOT contain precisely 6 digits and append them to the line above.
Thanks ~
Bub
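A minimal sketch of that rule in portable awk, assuming the line numbers in the sample are not in the real file, the record ID is the entire first |-separated field with no surrounding spaces, and continuation lines should be joined with a single space (the one-liner above joins them with no separator). The six-digit test spells out [0-9] six times instead of using {6}, so it does not depend on interval-expression support. join_records is a hypothetical wrapper name; it reads the files named as arguments, or standard input.

```shell
#!/bin/sh
# join_records (hypothetical name): print each record on one line.
# A line whose first |-separated field is exactly six digits starts a
# new record; every other line is appended to the record above it.
join_records() {
  awk -F'|' '
    $1 ~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/ {
      if (rec != "") print rec   # flush the previous record
      rec = $0                   # start a new one
      next
    }
    {
      # Continuation line: join with a single space. A stray line
      # before the first record becomes the start of a record of its own.
      rec = (rec == "") ? $0 : rec " " $0
    }
    END { if (rec != "") print rec }   # flush the last record
  ' "$@"
}

# Example: join_records All_Items.out > tester
```

Because the decision is made on the first field rather than on the first character of the line, continuation lines that start with digits, dates, or punctuation are joined like any other.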
When you say "(line numbers included)", do you mean you added them for readability? If you just post what the output should look like, it will be easier.
# awk -F\| '{if(NR==1){printf}else{if($1*1){printf "\n%s",$0}else{printf " %s",$0}}}' file
Similar problem : to get two almost identical rows into one - The UNIX and Linux Forums
I was using gvim and the line numbers didn't copy over, so I added them by hand. I mentioned that to let people know they weren't part of the data.
Sorry for the confusion.
---------- Post updated at 10:16 AM ---------- Previous update was at 09:59 AM ----------
Thanks, Danmero, but I get this error:
awk: (FILENAME=All_Items.out FNR=1) fatal: printf: no arguments
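The error comes from the bare printf in the NR==1 branch: gawk needs at least a format string, so a printf with no arguments is fatal. Below is a sketch of the same one-liner with that branch given an explicit "%s" format, plus an END rule so the last record ends with a newline; danmero_join is a hypothetical wrapper name. The logic is otherwise unchanged: a line whose first field is non-zero when converted to a number starts a new output line, and any other line is appended after a space.

```shell
#!/bin/sh
# danmero_join (hypothetical name): the one-liner above with the bare
# printf given an explicit "%s" format and a final newline in END.
danmero_join() {
  awk -F'|' '
    NR == 1 { printf "%s", $0; next }    # first line: no leading newline
    $1 * 1  { printf "\n%s", $0; next }  # non-zero numeric field 1: new record
            { printf " %s", $0 }          # anything else: append after a space
    END     { print "" }                  # terminate the last line
  ' "$@"
}
```

One caveat with the $1*1 test: awk converts a string to a number from its leading numeric prefix, so a continuation line beginning with a date such as "2008. ..." evaluates to 2008 and is still treated as a new record. Checking for exactly six digits in the first field avoids that.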
---------- Post updated at 10:58 AM ---------- Previous update was at 10:16 AM ----------
Thanks for the link to the other post, Danmero. That actually turned out to be what I was looking for. I adjusted it to my situation as follows:
awk -F\| --posix '{if(/^[0-9]{6}/){if(NR>1){printf "%s\n",$0}else{printf}}}' All_Items.out > tester
I'm not sure I understand how your example in this thread was supposed to work, though.
As a bit of an aside: is there a better way to write the regex above, i.e. one that doesn't need the --posix option?
Bub
Use:
if ($1 >= 100000 && $1 < 1000000)
instead of:
if(/^[0-9]{6}/)
Regards
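Both conditions work on the sample, but neither is exactly "the first field is six digits". /^[0-9]{6}/ tests the whole line and only requires that it begin with six digits, so 1234567|... or a continuation line starting with six digits would also match; the numeric range test looks at $1 but would reject an ID with leading zeros such as 012345. Below is a sketch that checks the field itself and needs neither {6} nor --posix, assuming |-separated fields; classify is a hypothetical helper that prints both results side by side for comparison.

```shell
#!/bin/sh
# classify (hypothetical name): for each input line print two flags,
# "six rng", comparing a field-anchored six-digit regex with the
# numeric range test.
classify() {
  awk -F'|' '
    {
      # Portable form of /^[0-9]{6}$/: no interval expression needed,
      # so it works without --posix or --re-interval.
      six = ($1 ~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/)
      # The numeric range test, for comparison.
      rng = ($1 >= 100000 && $1 < 1000000)
      print six, rng
    }
  ' "$@"
}
```

On gawk 4.0 and later, interval expressions are enabled by default, so $1 ~ /^[0-9]{6}$/ also works there without --posix; older gawk releases needed --re-interval or --posix.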