Carriage return within a record

Hi all,

I need your help replacing carriage returns within a record.

Input:

col1|col2|col3|col4|col5|col6|col7|col8|col9|col10
1|aa|bb|cc|dd|eee
eee|ff|ggggg|hh
hhh|iii
2|zz|yy|xx|ww|vv|uu|tt|ss|rr

Output:

col1|col2|col3|col4|col5|col6|col7|col8|col9|col10
1|aa|bb|cc|dd|eee:::eee|ff|ggggg|hh:::hhh|iii
2|zz|yy|xx|ww|vv|uu|tt|ss|rr

Currently I run the script below to get rid of the carriage returns, but it concatenates every record from line 2 onward into one single record:

grep -v '^$' Input.txt | nawk -F\| 'FNR==1{n=NF}NF<n{for (i=1;i<=n;i++) if (i<n) {l=$0;getline;$0=l":::"$0}}1'

col1|col2|col3|col4|col5|col6|col7|col8|col9|col10
1|aa|bb|cc|dd|eee:::eee|ff|ggggg|hh:::hhh|iii:::2|zz|yy|xx|ww|vv|uu|tt|ss|rr

Your help is much appreciated.
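Note on the script above: inside the NF<n action, the for loop calls getline on every pass without re-checking the field count, so it keeps pulling in lines that belong to the next record. A minimal sketch of one alternative is shown here, assuming the header line defines the expected number of fields and that blank lines have already been removed: keep gluing fragments together until the pending record has as many fields as the header. The variable names (n, rec, parts, out) are made up for illustration.

```shell
out=$(awk -F'|' '
NR == 1 { n = NF; print; next }            # header sets the expected field count
{
    rec = (rec == "") ? $0 : rec ":::" $0  # glue this fragment onto the pending record
    if (split(rec, parts, "[|]") >= n) {   # enough fields collected: record is complete
        print rec
        rec = ""
    }
}
END { if (rec != "") print rec }           # emit a trailing incomplete record, if any
' <<'EOF'
col1|col2|col3|col4|col5|col6|col7|col8|col9|col10
1|aa|bb|cc|dd|eee
eee|ff|ggggg|hh
hhh|iii
2|zz|yy|xx|ww|vv|uu|tt|ss|rr
EOF
)
printf '%s\n' "$out"
```

Because the check is on the accumulated field count rather than on how a line starts, the same idea should also apply to files whose records do not begin with a digit.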

You already have a lot of posts, so you should be familiar with using code tags.

awk -F\| 'NF<10 {l=l? l OFS $0: l $0; next} l {print l} 1 ' OFS=":::" infile
col1|col2|col3|col4|col5|col6|col7|col8|col9|col10
1|aa|bb|cc|dd|eee:::eee|ff|ggggg|hh:::hhh|iii
2|zz|yy|xx|ww|vv|uu|tt|ss|rr

I think I have seen a similar request somewhere else before.

awk '{printf /^col1/||/^[0-9]/?RS $0:":::" $0}' infile

Yes, rdcwayx, I have been posting about this carriage-return topic because I have yet to resolve the issue :frowning:

awk '{printf /^col1/||/^[0-9]/?RS $0:":::" $0}' infile

I ran the above code and it works fine with the given sample file.

The actual source file is too long, so I did not post it here.

When I applied the code above to my actual source file, it did not work :frowning:

The actual source file has 16 columns in total and the first column name is VALUE_ID, so I modified the code to:

/usr/xpg4/bin/awk '{printf /^VALUE_ID/||/^[0-15]/?RS $0:":::" $0}' KP_IU_KP_FT_KP_VALUE_new.txt 

It fails with this error:
/usr/xpg4/bin/awk: line 0 (NR=5): insufficient arguments to printf or sprintf

Thanks in advance for your help.
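Note on the error above: the one-liner passes the record itself to printf as the format string, so any % in the data (the real file contains values such as 9% and 8.2%) is read as a format specifier, which is the most likely cause of "insufficient arguments". Separately, [0-15] is a bracket expression that matches a single character 0, 1 or 5, not the numbers 0 to 15. A minimal sketch of a safer variant follows. It assumes, as suggested elsewhere in this thread, that every data record starts with HPB_; the input is an excerpt of the file posted in this thread, and the variable name out is made up for illustration.

```shell
out=$(awk '
NR == 1 { printf "%s", $0; next }        # header: print without a leading newline
/^HPB_/ { printf "%s%s", RS, $0; next }  # a new record starts on a fresh line
        { printf ":::%s", $0 }           # continuation: append with the ::: marker
END     { print "" }                     # terminate the last line
' <<'EOF'
VALUE_ID|HPB_DIV_DEPT|KPI_ID|KPI_DESC|YEAR|VERSION|KPI_VALUE|KPI_VALUE_NUM|KPI_PERFORM|VALUE_STATUS|INACTIVE_DATE|UPDATED_DATE|UPDATED_BY|REMARKS|CREATED_DATE|MODIFIED_DATE
HPB_001_V_01|HPB|HPB_001|Prevalence of
diabetes mellitus aged 18 - 69 years |1998|B|9%|9.00||A||20101201000000|HSID|||
HPB_001_V_04|HPB|HPB_001|Prevalence of diabetes mellitus aged 18 - 69 years |2010|R|11.3%|11.30|U|A||20101201000000|HSID|||
EOF
)
printf '%s\n' "$out"
```

The data is always passed as an argument to an explicit "%s" format, so percent signs in the records are printed as-is.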

Did you try out my example?

Yes, zaxxon, I have tested your code as well.
It works with the sample file, but not with my actual file.

I modified it to:

/usr/xpg4/bin/awk -F\| 'NF<16 {l=l? l OFS $0: l $0; next} l {print l} 1 ' OFS=":::" KP_IU_KP_FT_KP_VALUE.txt

It dropped all the records that contain a carriage return, leaving only the header and the one data row that has no carriage return.

Could you please provide a longer and more accurate snippet of your input? Use code tags when doing so, thanks.

The actual input file:

VALUE_ID|HPB_DIV_DEPT|KPI_ID|KPI_DESC|YEAR|VERSION|KPI_VALUE|KPI_VALUE_NUM|KPI_PERFORM|VALUE_STATUS|INACTIVE_DATE|UPDATED_DATE|UPDATED_BY|REMARKS|CREATED_DATE|MODIFIED_DATE
HPB_001_V_01|HPB|HPB_001|Prevalence of
diabetes mellitus aged 18 - 69 years |1998|B|9%|9.00||A||20101201000000|HSID|||
HPB_001_V_02|HPB|HPB_001|Prevalence of diabetes mellitus aged 18 - 69 years |1998|R|9%|9.00||A||20101201000000|HSID|Testing
Remarks||
HPB_001_V_03|HPB|HPB_001|Prevalence of diabetes mellitus
aged 18 - 69 years |2004|R|8.2%|8.20||A||20101201000000|HSID|Test Test
Test||
HPB_001_V_04|HPB|HPB_001|Prevalence of diabetes mellitus aged 18 - 69 years |2010|R|11.3%|11.30|U|A||20101201000000|HSID|||

OK, and I assume that every line except the first should be joined so that all resulting lines start with "HPB"?

$> awk -F\| 'NR>1 && /^HPB/ {if(l){print l};l=$0; next} l {l=l $0; next}1' infile
VALUE_ID|HPB_DIV_DEPT|KPI_ID|KPI_DESC|YEAR|VERSION|KPI_VALUE|KPI_VALUE_NUM|KPI_PERFORM|VALUE_STATUS|INACTIVE_DATE|UPDATED_DATE|UPDATED_BY|REMARKS|CREATED_DATE|MODIFIED_DATE
HPB_001_V_01|HPB|HPB_001|Prevalence ofdiabetes mellitus aged 18 - 69 years |1998|B|9%|9.00||A||20101201000000|HSID|||
HPB_001_V_02|HPB|HPB_001|Prevalence of diabetes mellitus aged 18 - 69 years |1998|R|9%|9.00||A||20101201000000|HSID|TestingRemarks||
HPB_001_V_03|HPB|HPB_001|Prevalence of diabetes mellitusaged 18 - 69 years |2004|R|8.2%|8.20||A||20101201000000|HSID|Test TestTest||

If you want ":::" as the delimiter between the joined parts, use:

$> awk -F\| 'NR>1 && /^HPB/ {if(l){print l};l=$0; next} l {l=l ":::" $0; next}1' infile
VALUE_ID|HPB_DIV_DEPT|KPI_ID|KPI_DESC|YEAR|VERSION|KPI_VALUE|KPI_VALUE_NUM|KPI_PERFORM|VALUE_STATUS|INACTIVE_DATE|UPDATED_DATE|UPDATED_BY|REMARKS|CREATED_DATE|MODIFIED_DATE
HPB_001_V_01|HPB|HPB_001|Prevalence of:::diabetes mellitus aged 18 - 69 years |1998|B|9%|9.00||A||20101201000000|HSID|||
HPB_001_V_02|HPB|HPB_001|Prevalence of diabetes mellitus aged 18 - 69 years |1998|R|9%|9.00||A||20101201000000|HSID|Testing:::Remarks||
HPB_001_V_03|HPB|HPB_001|Prevalence of diabetes mellitus:::aged 18 - 69 years |2004|R|8.2%|8.20||A||20101201000000|HSID|Test Test:::Test||
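Note on the two commands above: each record is only printed when the next line starting with HPB arrives, and there is no END block, so the final record (HPB_001_V_04) never appears in the output shown. A minimal sketch of the same approach with the last record flushed at end of input, run against the sample file posted in this thread (the variable name out is made up for illustration):

```shell
out=$(awk '
NR == 1 { print; next }                         # header passes through unchanged
/^HPB/  { if (l != "") print l; l = $0; next }  # new record: flush the previous one
        { l = l ":::" $0 }                      # continuation line: append it
END     { if (l != "") print l }                # flush the final pending record
' <<'EOF'
VALUE_ID|HPB_DIV_DEPT|KPI_ID|KPI_DESC|YEAR|VERSION|KPI_VALUE|KPI_VALUE_NUM|KPI_PERFORM|VALUE_STATUS|INACTIVE_DATE|UPDATED_DATE|UPDATED_BY|REMARKS|CREATED_DATE|MODIFIED_DATE
HPB_001_V_01|HPB|HPB_001|Prevalence of
diabetes mellitus aged 18 - 69 years |1998|B|9%|9.00||A||20101201000000|HSID|||
HPB_001_V_02|HPB|HPB_001|Prevalence of diabetes mellitus aged 18 - 69 years |1998|R|9%|9.00||A||20101201000000|HSID|Testing
Remarks||
HPB_001_V_03|HPB|HPB_001|Prevalence of diabetes mellitus
aged 18 - 69 years |2004|R|8.2%|8.20||A||20101201000000|HSID|Test Test
Test||
HPB_001_V_04|HPB|HPB_001|Prevalence of diabetes mellitus aged 18 - 69 years |2010|R|11.3%|11.30|U|A||20101201000000|HSID|||
EOF
)
printf '%s\n' "$out"
```

This still relies on the assumption that a continuation line never begins with HPB.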

Thanks for your reply, zaxxon. Sad to say, it is not working...

How is it not working? Post any error messages and/or describe how the output deviates from your expectations. Also, you posted a sample input file in post #8 but not the corresponding desired output. You're making it very difficult to provide effective assistance.

Regards,
Alister

It is not working: the output looks as if the first chunk (before the carriage return) has been swapped with the second chunk (after it).

input sample:

a|b|c|d
e|e|f|g|h

Output:

e|e|f|g|h
a|b|c|d

I have already posted the actual input file, for which zaxxon provided the solution as

awk -F\| 'NR>1 && /^HPB/ {if(l){print l};l=$0; next} l {l=l $0; next}1' infile

or

awk -F\| 'NR>1 && /^HPB/ {if(l){print l};l=$0; next} l {l=l ":::" $0; next}1' infile
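One possible explanation for the "swapped" output, not confirmed anywhere in this thread, is that the file contains literal CR bytes (for example DOS line endings). Once lines are joined, a terminal moves the cursor back to the start of the line at each CR, so the later chunk is drawn over the earlier one. The sketch below uses a made-up two-line sample with CR LF endings (an assumption, not the real file) to show how to count the affected lines and strip the CR bytes before running awk. The variable names are hypothetical.

```shell
# Hypothetical two-line sample with DOS (CR LF) line endings; not the thread's real file.
cr=$(printf '\r')
sample=$(printf 'a|b|c|d\r\ne|e|f|g|h\r\n')

# Count the lines that contain a CR byte (0 would rule this explanation out).
cr_lines=$(printf '%s\n' "$sample" | grep -c "$cr")

# Remove every CR so that only the real line feeds remain.
cleaned=$(printf '%s\n' "$sample" | tr -d '\r')

printf 'lines with CR: %s\n' "$cr_lines"
printf '%s\n' "$cleaned"
```

On the real file the same tr -d '\r' filter could be placed in front of the awk command, for example by piping the file through it first.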