Hi all,
I need your help replacing the carriage returns (embedded line breaks) inside a record.
Input:
col1|col2|col3|col4|col5|col6|col7|col8|col9|col10
1|aa|bb|cc|dd|eee
eee|ff|ggggg|hh
hhh|iii
2|zz|yy|xx|ww|vv|uu|tt|ss|rr
Output:
col1|col2|col3|col4|col5|col6|col7|col8|col9|col10
1|aa|bb|cc|dd|eee:::eee|ff|ggggg|hh:::hhh|iii
2|zz|yy|xx|ww|vv|uu|tt|ss|rr
Currently I run the script below to get rid of the carriage returns, but it concatenates every record from line 2 onward into one long record:
grep -v '^$' Input.txt | nawk -F\| 'FNR==1{n=NF}NF<n{for (i=1;i<=n;i++) if (i<n) {l=$0;getline;$0=l":::"$0}}1'
col1|col2|col3|col4|col5|col6|col7|col8|col9|col10
1|aa|bb|cc|dd|eee:::eee|ff|ggggg|hh:::hhh|iii:::2|zz|yy|xx|ww|vv|uu|tt|ss|rr
Your help is much appreciated.
zaxxon
August 18, 2011, 1:14am
2
You already have a lot of posts, so you should be familiar with using code tags.
awk -F\| 'NF<10 {l=l? l OFS $0: l $0; next} l {print l} 1 ' OFS=":::" infile
col1|col2|col3|col4|col5|col6|col7|col8|col9|col10
1|aa|bb|cc|dd|eee:::eee|ff|ggggg|hh:::hhh|iii
2|zz|yy|xx|ww|vv|uu|tt|ss|rr
I think I saw a similar request somewhere else before.
awk '{printf /^col1/||/^[0-9]/?RS $0:":::" $0}' infile
Yes, rdcwayx, I have posted about this carriage-return issue before, as I have not yet resolved it.
awk '{printf /^col1/||/^[0-9]/?RS $0:":::" $0}' infile
I ran the code above and it works fine with the given sample file.
The actual source file is too long to post here.
When I applied the code above to my actual source file, it did not work.
The actual source file has 16 columns in total and the first column name is VALUE_ID, so I modified the code to:
/usr/xpg4/bin/awk '{printf /^VALUE_ID/||/^[0-15]/?RS $0:":::" $0}' KP_IU_KP_FT_KP_VALUE_new.txt
It fails with this error:
/usr/xpg4/bin/awk: line 0 (NR=5): insufficient arguments to printf or sprintf
Thanks in advance for your help.
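The error message is consistent with how printf is being used: the whole input line is passed as the format string, so any % in the data (for example a value such as 9%) is read as a conversion that has no argument. Separately, [0-15] is a bracket expression that matches a single character (0, 1 or 5), not the numbers 0 through 15. Below is a minimal sketch that avoids both problems, assuming every new record can be recognised by a known prefix. The function name join_by_prefix and its pattern argument are made up for illustration, and plain awk here stands for a POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris).

```shell
# join_by_prefix: hypothetical helper, not from the thread.
# $1 = ERE that matches the start of a new record; input is read from stdin.
join_by_prefix() {
    awk -v start="$1" '
        # A line that starts a new record: end the previous one first.
        $0 ~ start { if (NR > 1) printf "\n"; printf "%s", $0; next }
        # Any other line is a continuation; "%s" keeps % in the data literal.
        { printf ":::%s", $0 }
        # Terminate the last record with a newline.
        END { if (NR > 0) printf "\n" }
    '
}
```

It could be called as join_by_prefix '^(VALUE_ID|HPB_)' < KP_IU_KP_FT_KP_VALUE_new.txt, where the HPB_ prefix is an assumption about how the data rows begin.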
zaxxon
August 18, 2011, 2:23am
5
Did you try out my example?
Yes, zaxxon, I have tested your code as well.
It works with the sample file, but not with my actual file.
I modified it to:
/usr/xpg4/bin/awk -F\| 'NF<16 {l=l? l OFS $0: l $0; next} l {print l} 1 ' OFS=":::" KP_IU_KP_FT_KP_VALUE.txt
It dropped all the records containing carriage returns, leaving only the header and the one data row that has no carriage return.
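A likely reason the NF<16 test struggles is that every physical piece of a broken record has fewer than 16 fields, and l is never reset after it is printed, so pieces of different records run together. One alternative, sketched here under the assumption that each complete record has exactly n pipe-separated fields and that a line break never falls inside the last field, is to keep appending lines until the accumulated record reaches n fields. The name join_by_field_count is hypothetical.

```shell
# join_by_field_count: hypothetical helper, not from the thread.
# $1 = expected number of |-separated fields per record; input on stdin.
join_by_field_count() {
    awk -v n="$1" '
        {
            # Append this physical line to the pending record.
            buf = (buf == "") ? $0 : buf ":::" $0
            # Emit once the pending record has at least n fields.
            if (split(buf, parts, "|") >= n) { print buf; buf = "" }
        }
        # Flush an incomplete trailing record instead of dropping it.
        END { if (buf != "") print buf }
    '
}
```

For the 16-column file this would be run as join_by_field_count 16 < KP_IU_KP_FT_KP_VALUE.txt. Because ":::" contains no pipe, joining two pieces merges the broken field and the field count adds up as expected.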
zaxxon
August 18, 2011, 3:38am
7
Could you please provide a longer and more accurate snippet of your input? Use code tags when doing so, thanks.
The actual input file:
VALUE_ID|HPB_DIV_DEPT|KPI_ID|KPI_DESC|YEAR|VERSION|KPI_VALUE|KPI_VALUE_NUM|KPI_PERFORM|VALUE_STATUS|INACTIVE_DATE|UPDATED_DATE|UPDATED_BY|REMARKS|CREATED_DATE|MODIFIED_DATE
HPB_001_V_01|HPB|HPB_001|Prevalence of
diabetes mellitus aged 18 - 69 years |1998|B|9%|9.00||A||20101201000000|HSID|||
HPB_001_V_02|HPB|HPB_001|Prevalence of diabetes mellitus aged 18 - 69 years |1998|R|9%|9.00||A||20101201000000|HSID|Testing
Remarks||
HPB_001_V_03|HPB|HPB_001|Prevalence of diabetes mellitus
aged 18 - 69 years |2004|R|8.2%|8.20||A||20101201000000|HSID|Test Test
Test||
HPB_001_V_04|HPB|HPB_001|Prevalence of diabetes mellitus aged 18 - 69 years |2010|R|11.3%|11.30|U|A||20101201000000|HSID|||
zaxxon
August 18, 2011, 7:11am
9
Ok, and I assume that every line but the 1st should be appended so that all lines start with "HPB"?
zaxxon
August 18, 2011, 7:40am
10
$> awk -F\| 'NR>1 && /^HPB/ {if(l){print l};l=$0; next} l {l=l $0; next}1' infile
VALUE_ID|HPB_DIV_DEPT|KPI_ID|KPI_DESC|YEAR|VERSION|KPI_VALUE|KPI_VALUE_NUM|KPI_PERFORM|VALUE_STATUS|INACTIVE_DATE|UPDATED_DATE|UPDATED_BY|REMARKS|CREATED_DATE|MODIFIED_DATE
HPB_001_V_01|HPB|HPB_001|Prevalence ofdiabetes mellitus aged 18 - 69 years |1998|B|9%|9.00||A||20101201000000|HSID|||
HPB_001_V_02|HPB|HPB_001|Prevalence of diabetes mellitus aged 18 - 69 years |1998|R|9%|9.00||A||20101201000000|HSID|TestingRemarks||
HPB_001_V_03|HPB|HPB_001|Prevalence of diabetes mellitusaged 18 - 69 years |2004|R|8.2%|8.20||A||20101201000000|HSID|Test TestTest||
If you want ":::" as the delimiter between the pasted parts, use:
$> awk -F\| 'NR>1 && /^HPB/ {if(l){print l};l=$0; next} l {l=l ":::" $0; next}1' infile
VALUE_ID|HPB_DIV_DEPT|KPI_ID|KPI_DESC|YEAR|VERSION|KPI_VALUE|KPI_VALUE_NUM|KPI_PERFORM|VALUE_STATUS|INACTIVE_DATE|UPDATED_DATE|UPDATED_BY|REMARKS|CREATED_DATE|MODIFIED_DATE
HPB_001_V_01|HPB|HPB_001|Prevalence of:::diabetes mellitus aged 18 - 69 years |1998|B|9%|9.00||A||20101201000000|HSID|||
HPB_001_V_02|HPB|HPB_001|Prevalence of diabetes mellitus aged 18 - 69 years |1998|R|9%|9.00||A||20101201000000|HSID|Testing:::Remarks||
HPB_001_V_03|HPB|HPB_001|Prevalence of diabetes mellitus:::aged 18 - 69 years |2004|R|8.2%|8.20||A||20101201000000|HSID|Test Test:::Test||
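Note that both listings stop at HPB_001_V_03. The pending record in l is only printed when the next line starting with HPB arrives, so the final record (HPB_001_V_04 in the sample) is never flushed. A minimal sketch of the same approach with an END block added follows; the wrapper name join_hpb is hypothetical, and it assumes as above that every data record starts with HPB.

```shell
# join_hpb: hypothetical wrapper around the one-liner above.
# Reads the named files, or stdin when no file is given.
join_hpb() {
    awk '
        # A new record begins: flush the previous one, start collecting.
        NR > 1 && /^HPB/ { if (l != "") print l; l = $0; next }
        # Continuation line: glue it onto the pending record.
        l != "" { l = l ":::" $0; next }
        # Anything else (the header) is printed unchanged.
        { print }
        # Flush the last pending record, which is otherwise lost.
        END { if (l != "") print l }
    ' "$@"
}
```

Comparing l against the empty string rather than testing it as a boolean also keeps a record whose text happens to evaluate to zero from being skipped.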
Thanks for your reply, zaxxon, but sad to say it is not working...
How is it not working? Post any error messages and/or describe how the output deviates from your expectations. Also, you posted a sample input file in post #8 but not the corresponding desired output. You're making it very difficult to provide effective assistance.
Regards,
Alister
It is not working: the output looks as if the chunk before the carriage return has been swapped with the chunk after it.
Input sample:
a|b|c|d
e|e|f|g|h
Output:
e|e|f|g|h
a|b|c|d
I have already provided the actual input file, for which zaxxon provided the solution as:
awk -F\| 'NR>1 && /^HPB/ {if(l){print l};l=$0; next} l {l=l $0; next}1' infile
or
awk -F\| 'NR>1 && /^HPB/ {if(l){print l};l=$0; next} l {l=l ":::" $0; next}1' infile
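One possible cause of the swapped-looking output, which is an assumption and not confirmed anywhere in this thread, is that the file contains real carriage-return characters (\r, as in DOS-style CRLF line endings). After the lines are joined, a \r left in the middle of a record sends the terminal cursor back to the start of the line, so the later chunk is drawn over or in front of the earlier one. A minimal sketch that removes those characters before the awk step, with strip_cr as a hypothetical helper name:

```shell
# strip_cr: hypothetical helper, not from the thread.
# Copies stdin to stdout with every carriage-return byte removed.
strip_cr() {
    # Delete \r only; line feeds are left alone.
    tr -d '\r'
}
```

It would sit in front of either command above, for example strip_cr < infile | awk -F\| '...'. Running od -c on a few lines of the file is a quick way to check whether any \r bytes are present at all.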