The file is pipe-delimited with 17 fields. Any field may contain one or more \n characters, in a single field or in several fields of the same record.
I need to replace each \n inside the data with a space, but keep the true \n that acts as the line separator.
I tried the awk command below, but it did not work as expected.
Hmmm - how (by which algorithm / rule) has the EEEE in the first input line turned into Surwald in the expected output? And how did the third line's 44444 turn into 99456?
Try also
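awk -F'|' '
# keep appending physical lines until the record has 17 fields;
# stop at end of file so an incomplete last record cannot loop forever
{while (NF<17 && (getline X) > 0)
$0 = $0 " " X
}
1
' file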
1) If you tell us how to tell separator pipes from in-field-pipes, then someone could come up with some smart algorithm to handle that.
2) That little command keeps reading and appending new lines until the field count reaches 17; then the pattern "1" (= TRUE) triggers the default action, which is print.
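To make the read-and-append loop easier to follow, here is a minimal sketch on made-up data. It assumes 4 fields per record instead of 17 (passed in the hypothetical variable n) purely to keep the sample short, and it stops at end of file instead of looping.

```shell
# Hypothetical sample: 4 fields per record; the second record is split
# across two physical lines by a newline inside its second field.
out=$(printf '%s\n' '1|a|b|c' '2|x' 'y|z|w' '3|p|q|r' |
awk -F'|' -v n=4 '
# append following lines until the field count reaches n (or input ends)
{ while (NF < n && (getline X) > 0) $0 = $0 " " X }
1
')
printf '%s\n' "$out"
```

Assigning to $0 makes awk split the record again, so NF is recomputed after every appended line.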
If your field delimiter is sometimes a field delimiter and sometimes data, you need to be able to very clearly identify each occurrence of that character as either data or delimiter. If you can't specify a clear rule that unambiguously determines whether a given character is a delimiter or data, there is no way to identify field boundaries.
And when you have field delimiters that are sometimes data AND record delimiters that are sometimes data, you have a real mess.
Your best choice would be to choose a different field delimiter that cannot ever appear as data.
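Before settling on a repair strategy, it may help to see how far the file deviates from the expected layout. The sketch below only reports physical lines whose field count differs from the expected one; it cannot tell whether a deviation comes from an embedded newline or an embedded pipe. The sample data and the 4-field count (variable n) are assumptions for illustration; the real file would use 17.

```shell
# Hypothetical sample: 4 fields expected; lines 2 and 3 are one broken
# record, line 4 carries an extra pipe.
out=$(printf '%s\n' '1|a|b|c' '2|x' 'y|z|w' '3|p|q|r|s' |
awk -F'|' -v n=4 '
# report every physical line whose field count is not the expected one
NF != n { printf "line %d: %d fields\n", FNR, NF }
')
printf '%s\n' "$out"
```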
awk -F'|' '$1 ~ /^[0-9]+$/ { if(T) print T; T=$0; next; }
{ T = T " " $0; }
END { if(T) print T; }' allnum.txt
...but cannot be 100% reliable as Don Cragun says. It relies on the first field being all numbers, and if the broken line ever manages to imitate that, it will be fooled. And if | ever appears in a record nothing good will happen.
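The weakness described above can be reproduced on a small made-up sample. In this sketch the records are assumed to have 3 fields; the continuation line gh|ij is joined correctly, while the continuation line 40|mn happens to start with an all-numeric first field and is therefore mistaken for a new record.

```shell
# Hypothetical sample: "20|ef" + "gh|ij" is one broken record,
# and "30|kl" + "40|mn" is another one.
out=$(printf '%s\n' '10|ab|cd' '20|ef' 'gh|ij' '30|kl' '40|mn' |
awk -F'|' '
# an all-numeric first field is taken as the start of a new record
$1 ~ /^[0-9]+$/ { if (T) print T; T = $0; next }
# anything else is glued onto the record collected so far
{ T = T " " $0 }
END { if (T) print T }
')
printf '%s\n' "$out"
```

The last two output lines stay separate, although they were meant to be a single record.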
Thanks Rudi C,
My file has 17 fields in total, so 16 pipes are expected per row.
Your command also works fine when a row has extra pipes, i.e. more than 16. Can you please explain how it works when the data contains extra pipes?
Please find below the input and the output after applying your command.
I would be very thankful!
Input:
The rows below have more pipes than expected:
1st row (19 pipes), 2nd row (18 pipes, \n in data), 3rd row (19 pipes)
The rows below have no extra pipes, i.e. 16 pipes as expected:
4th row (\n in data), 5th row (16 pipes, no extra pipes)
Not sure I understand your question. Additional lines will be read and appended to $0 until there are 17 fields in $0. No distinction is made between pipe field separators and "extra pipes". Should your input have many "extra pipes" in early fields, that method may fail and still leave you with truncated lines.
Should that become a problem, see posts #5 and #7.
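The failure mode mentioned above can be shown on a small made-up sample. This sketch again assumes 4 fields per record (variable n) for brevity. The first record is meant to be 1, a|x, b c, d: it has an extra pipe in its second field and a newline in its third. Because the first physical line already reaches 4 fields, it is printed truncated, and the leftover c|d swallows the next record.

```shell
# Hypothetical sample: an extra pipe early in the record hides the
# fact that the record continues on the next physical line.
out=$(printf '%s\n' '1|a|x|b' 'c|d' '2|p|q|r' |
awk -F'|' -v n=4 '
{ while (NF < n && (getline X) > 0) $0 = $0 " " X }
1
')
printf '%s\n' "$out"
```

Rows that only carry extra pipes and no newline pass through unchanged, because their field count is already at or above the threshold and the loop never runs.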