2 carriage returns within a record

agathaeleanor · August 18, 2011, 12:25am

Hi all,

I need your help replacing the carriage returns within a record.

Input:

col1|col2|col3|col4|col5|col6|col7|col8|col9|col10
1|aa|bb|cc|dd|eee
eee|ff|ggggg|hh
hhh|iii
2|zz|yy|xx|ww|vv|uu|tt|ss|rr

Output:

col1|col2|col3|col4|col5|col6|col7|col8|col9|col10
1|aa|bb|cc|dd|eee:::eee|ff|ggggg|hh:::hhh|iii
2|zz|yy|xx|ww|vv|uu|tt|ss|rr

Currently I run the following script to get rid of the carriage returns, but it concatenates every record from line 2 onward into one single record:

grep -v '^$' Input.txt | nawk -F\| 'FNR==1{n=NF}NF<n{for (i=1;i<=n;i++) if (i<n) {l=$0;getline;$0=l":::"$0}}1'

col1|col2|col3|col4|col5|col6|col7|col8|col9|col10
1|aa|bb|cc|dd|eee:::eee|ff|ggggg|hh:::hhh|iii:::2|zz|yy|xx|ww|vv|uu|tt|ss|rr

Your help is much appreciated.
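
For reference, the loop in the script above runs getline n-1 times (9 here) as soon as it meets a short line, whether or not the record is already complete, so it keeps reading past "hhh|iii" into record 2. Below is a minimal sketch of the same field-count idea that stops once the record is complete: append each physical line to a buffer with ":::", count the fields of the joined text, and print it when the count reaches the header's NF. It assumes the header line itself is complete and that no field contains a literal "|"; the function name join_by_field_count is hypothetical.

```shell
# join_by_field_count: hypothetical helper name, not from the thread.
# Rejoins pipe-delimited records that were split across physical lines,
# gluing the pieces with ":::". The header (line 1) sets the expected field count.
join_by_field_count() {
  awk -F'|' '
    NR == 1 { n = NF; print; next }            # header: remember the column count
    {
      buf = (buf == "") ? $0 : buf ":::" $0    # append this physical line to the pending record
      if (split(buf, parts, "|") >= n) {       # joined text now has all n fields
        print buf
        buf = ""
      }
    }
    END { if (buf != "") print buf }           # flush an incomplete record at end of input
  ' "$@"
}
```

Usage: join_by_field_count Input.txt. Because it counts the fields of the rejoined text rather than of each piece, the same function also handles the 16-column sample posted later in the thread, where every split record adds up to exactly 16 fields once rejoined.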

zaxxon · August 18, 2011, 1:14am

You already have a lot of posts, so you should be familiar with using code tags.

awk -F\| 'NF<10 {l=l? l OFS $0: l $0; next} l {print l} 1 ' OFS=":::" infile
col1|col2|col3|col4|col5|col6|col7|col8|col9|col10
1|aa|bb|cc|dd|eee:::eee|ff|ggggg|hh:::hhh|iii
2|zz|yy|xx|ww|vv|uu|tt|ss|rr
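
One thing to watch with this one-liner: l is printed but never cleared, so if a second broken record appears further down, the old buffer is printed again before every later complete line, and a broken record at the very end of the file is never printed at all. Here is a sketch of the same approach with the buffer reset and an END flush (the function name is hypothetical). It still assumes every fragment has fewer than 10 fields and that a complete line separates consecutive broken records.

```shell
# join_short_lines: hypothetical wrapper around zaxxon's one-liner.
# Lines with fewer than 10 fields are buffered in l and joined with OFS (":::").
join_short_lines() {
  awk -F'|' -v OFS=':::' '
    NF < 10  { l = (l == "") ? $0 : l OFS $0; next }  # fragment: append to the pending record
    l != ""  { print l; l = "" }                      # complete line: emit and clear the pending record
    1                                                 # then print the complete line itself
    END      { if (l != "") print l }                 # pending record at end of input
  ' "$@"
}
```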

rdcwayx · August 18, 2011, 1:15am

I think I have seen a similar request somewhere else before.

awk '{printf /^col1/||/^[0-9]/?RS $0:":::" $0}' infile

agathaeleanor · August 18, 2011, 2:21am

Yes, rdcwayx, I have been posting about this carriage-return issue because I have not resolved it yet.

awk '{printf /^col1/||/^[0-9]/?RS $0:":::" $0}' infile

I executed the above code, and it runs fine with the given sample file.

As the actual source file is too long, I have not posted it here.

When I apply the code above to my actual source file, it doesn't work.

The actual source file has 16 columns in total and the first column name is VALUE_ID, so I modified the code to:

/usr/xpg4/bin/awk '{printf /^VALUE_ID/||/^[0-15]/?RS $0:":::" $0}' KP_IU_KP_FT_KP_VALUE_new.txt

It fails with this error:
/usr/xpg4/bin/awk: line 0 (NR=5): insufficient arguments to printf or sprintf

Thanks in advance for your help.
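
Two things are most likely behind this. First, printf treats its first argument as the format, and here that argument is the data line itself, so a "%" in the data (such as "9%" in the KPI_VALUE column of the sample posted later) is read as a conversion specification with no matching argument, which is what "insufficient arguments to printf" complains about. Second, "[0-15]" is a bracket expression matching one character, 0, 1 or 5; it does not mean the numbers 0 to 15 and has nothing to do with the column count (in the small sample, the digit test was simply picking out record IDs such as "1|" and "2|"). A sketch that passes the data through "%s" and starts a new line at the header or at an "HPB_" record ID; the prefix is an assumption based on the sample posted later in the thread, and the function name is hypothetical.

```shell
# join_by_record_start: hypothetical helper name.
# A new output line starts at the header (VALUE_ID) or at a line beginning with
# "HPB_" (assumed record-ID prefix); any other line is a continuation and is
# appended after ":::".
join_by_record_start() {
  awk '
    /^VALUE_ID/ || /^HPB_/ { if (NR > 1) printf "\n"; printf "%s", $0; next }
    { printf ":::%s", $0 }                      # data goes in as an argument, never as the format
    END { if (NR > 0) printf "\n" }             # terminate the last output line
  ' "$@"
}
```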

zaxxon · August 18, 2011, 2:23am

Did you try out my example?

agathaeleanor · August 18, 2011, 2:41am

Yes, zaxxon, I have tested your code too.
It works with the sample file, but not with my actual file.

I modified it to:

/usr/xpg4/bin/awk -F\| 'NF<16 {l=l? l OFS $0: l $0; next} l {print l} 1 ' OFS=":::" KP_IU_KP_FT_KP_VALUE.txt

It chopped off all the records containing carriage returns, leaving only the header and the one data row that has no carriage return.
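
For what it's worth, the NF<16 test treats every physical line with fewer than 16 fields as a fragment. In the sample posted further down, both halves of each split record are short on their own (HPB_001_V_01, for example, is split into pieces of 4 and 13 fields), so the fragments of consecutive broken records all land in the same buffer and come out as one long line just before the next complete record. A quick way to see what awk is counting on each physical line (the helper name is hypothetical):

```shell
# field_counts: hypothetical helper; prints the line number, the number of
# "|"-separated fields, and the line itself, to spot the short fragments.
field_counts() {
  awk -F'|' '{ printf "%d: %d fields: %s\n", NR, NF, $0 }' "$@"
}
```

Usage: field_counts KP_IU_KP_FT_KP_VALUE.txt | head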

zaxxon · August 18, 2011, 3:38am

Could you please provide a longer and more accurate snippet of your input? Use code tags when doing so, thanks.

agathaeleanor · August 18, 2011, 4:55am

The actual input file:

VALUE_ID|HPB_DIV_DEPT|KPI_ID|KPI_DESC|YEAR|VERSION|KPI_VALUE|KPI_VALUE_NUM|KPI_PERFORM|VALUE_STATUS|INACTIVE_DATE|UPDATED_DATE|UPDATED_BY|REMARKS|CREATED_DATE|MODIFIED_DATE
HPB_001_V_01|HPB|HPB_001|Prevalence of
diabetes mellitus aged 18 - 69 years |1998|B|9%|9.00||A||20101201000000|HSID|||
HPB_001_V_02|HPB|HPB_001|Prevalence of diabetes mellitus aged 18 - 69 years |1998|R|9%|9.00||A||20101201000000|HSID|Testing
Remarks||
HPB_001_V_03|HPB|HPB_001|Prevalence of diabetes mellitus
aged 18 - 69 years |2004|R|8.2%|8.20||A||20101201000000|HSID|Test Test
Test||
HPB_001_V_04|HPB|HPB_001|Prevalence of diabetes mellitus aged 18 - 69 years |2010|R|11.3%|11.30|U|A||20101201000000|HSID|||

zaxxon · August 18, 2011, 7:11am

OK, so I assume the continuation lines should be appended to the line before them, so that every line except the 1st starts with "HPB"?

zaxxon · August 18, 2011, 7:40am

$> awk -F\| 'NR>1 && /^HPB/ {if(l){print l};l=$0; next} l {l=l $0; next}1' infile
VALUE_ID|HPB_DIV_DEPT|KPI_ID|KPI_DESC|YEAR|VERSION|KPI_VALUE|KPI_VALUE_NUM|KPI_PERFORM|VALUE_STATUS|INACTIVE_DATE|UPDATED_DATE|UPDATED_BY|REMARKS|CREATED_DATE|MODIFIED_DATE
HPB_001_V_01|HPB|HPB_001|Prevalence ofdiabetes mellitus aged 18 - 69 years |1998|B|9%|9.00||A||20101201000000|HSID|||
HPB_001_V_02|HPB|HPB_001|Prevalence of diabetes mellitus aged 18 - 69 years |1998|R|9%|9.00||A||20101201000000|HSID|TestingRemarks||
HPB_001_V_03|HPB|HPB_001|Prevalence of diabetes mellitusaged 18 - 69 years |2004|R|8.2%|8.20||A||20101201000000|HSID|Test TestTest||

If you want ":::" as the delimiter for the pasted parts, use:

$> awk -F\| 'NR>1 && /^HPB/ {if(l){print l};l=$0; next} l {l=l ":::" $0; next}1' infile
VALUE_ID|HPB_DIV_DEPT|KPI_ID|KPI_DESC|YEAR|VERSION|KPI_VALUE|KPI_VALUE_NUM|KPI_PERFORM|VALUE_STATUS|INACTIVE_DATE|UPDATED_DATE|UPDATED_BY|REMARKS|CREATED_DATE|MODIFIED_DATE
HPB_001_V_01|HPB|HPB_001|Prevalence of:::diabetes mellitus aged 18 - 69 years |1998|B|9%|9.00||A||20101201000000|HSID|||
HPB_001_V_02|HPB|HPB_001|Prevalence of diabetes mellitus aged 18 - 69 years |1998|R|9%|9.00||A||20101201000000|HSID|Testing:::Remarks||
HPB_001_V_03|HPB|HPB_001|Prevalence of diabetes mellitus:::aged 18 - 69 years |2004|R|8.2%|8.20||A||20101201000000|HSID|Test Test:::Test||
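
Note that this script keeps the current record in l and only prints it when the next "HPB" line arrives, so the last record is never printed: HPB_001_V_04 is missing from both outputs above. A sketch with an END block added and the buffer test made explicit (the function name is hypothetical; it still assumes every record starts with "HPB"):

```shell
# join_hpb_records: hypothetical wrapper around zaxxon's one-liner with an END block.
join_hpb_records() {
  awk '
    NR > 1 && /^HPB/ { if (l != "") print l; l = $0; next }  # new record: flush the previous one
    l != ""          { l = l ":::" $0; next }                 # continuation: append with ":::"
    1                                                         # header (or any line before the first record)
    END              { if (l != "") print l }                 # the last record is still held in l
  ' "$@"
}
```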

agathaeleanor · August 19, 2011, 4:03am

Thanks for your reply, zaxxon, but sadly it is not working...

alister · August 19, 2011, 4:53am

How is it not working? Post any error messages and/or describe how the output deviates from your expectations. Also, you posted a sample input file in post #8 but not the corresponding desired output. You're making it very difficult to provide effective assistance.

Regards,
Alister

agathaeleanor · August 19, 2011, 5:49am

It is not working: the output looks as if the first chunk before the carriage return had been swapped with the second chunk after it.

Input sample:

a|b|c|d
e|e|f|g|h

Output:

e|e|f|g|h
a|b|c|d

I have posted the actual input file, and zaxxon provided the solution as

awk -F\| 'NR>1 && /^HPB/ {if(l){print l};l=$0; next} l {l=l $0; next}1' infile

or

awk -F\| 'NR>1 && /^HPB/ {if(l){print l};l=$0; next} l {l=l ":::" $0; next}1' infile
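
One possible explanation for the swapped-looking output, not confirmed anywhere in the thread: if the file came from Windows, each line ends in CR LF, so every physical line carries a trailing carriage return. When a joined line with an embedded carriage return is printed to a terminal, the cursor jumps back to the start of the line and the later chunk overwrites the earlier one, which can make chunks look lost or out of order. Running cat -v on the file (it shows a carriage return as ^M) would confirm or rule this out. If it is the cause, stripping the CR before joining is a small change; a sketch, assuming CRLF line endings and records that start with "HPB" (function name hypothetical):

```shell
# join_hpb_records_crlf: hypothetical variant that first strips a trailing
# carriage return from each line, assuming DOS/Windows CRLF line endings.
join_hpb_records_crlf() {
  awk '
    { sub(/\r$/, "") }                                        # drop the CR left before each LF
    NR > 1 && /^HPB/ { if (l != "") print l; l = $0; next }   # new record: flush the previous one
    l != ""          { l = l ":::" $0; next }                  # continuation: append with ":::"
    1                                                          # header line
    END              { if (l != "") print l }                  # flush the last record
  ' "$@"
}
```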