Help with data formatting

Hi,

I have data coming in like the sample below. Not all of the data looks like this; these are the problem records that are causing the ETL load to fail. Can you please help me combine these broken records?

001800018000000guyMMAAY~acct name~acct type~~"address part 1
address part2"~city~STATE~ZIP~COUNTRY~(123) 123-1234~~~~~~~~~~~~~~~~~^M
0018000000gwQ63AAE~acct name~acct type~~"address part 1
address part2"~city~state~zip~country~(123) 123-1234~(123) 123-1234~~~~~~~~~~~~~~~~^M

Appreciate your time and effort!

Thanks,

What do you mean by broken records? Can you please clarify the problem by giving the desired output?

The desired output is:

001800018000000guyMMAAY~acct name~acct type~~"address part 1 address part2"~city~STATE~ZIP~COUNTRY~(123) 123-1234~~~~~~~~~~~~~~~~~^M
0018000000gwQ63AAE~acct name~acct type~~"address part 1 address part2"~city~state~zip~country~(123) 123-1234~(123) 123-1234~~~~~~~~~~~~~~~~^M

Assuming that the area code (xxx) always comes after the 'break point', and is always present, this might work:

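awk '
    # A line containing an area code such as (123) is taken to end a record.
    /\([0-9][0-9][0-9]\)/ {
        if( buffer )
            printf( "%s%s\n", buffer, $0 );   # join the held line to this one
        else
            print;
        buffer = "";
        next;
    }
    # Any other line is held until a line with an area code arrives.
    { buffer = $0; }
' input-file >fixed-file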

Hi,

Appreciate the quick response.

It mostly worked on the problem records, except that it also combined multiple non-problem records into one, like this:

459484~~~~~~~~~~~~~~^M001C000000wO3Y8IAK~

Thanks,
varman

The problem is that my script assumed there'd be an area code in each complete record.

Assuming that the tilde characters are field separators, is there a fixed number of fields per good record? That would be the best way to determine whether a record is broken.
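
If you are not sure of the field count, a quick tally of how many lines have how many fields can help. This is only a sketch: the function name field_histogram is made up for illustration, and it assumes '~' is the only field separator.

```shell
# field_histogram: hypothetical helper, not part of any existing tool.
# Prints how many lines have each field count, smallest count first.
field_histogram() {
    awk -F '~' '
        { count[NF]++ }                      # tally lines by number of fields
        END {
            for( n in count )
                printf( "%d fields: %d lines\n", n, count[n] );
        }
    ' "$@" | sort -n                         # awk array order is unspecified
}
```

Run it as "field_histogram input-file". The most common count is probably the field count of a good record, and the rarer, smaller counts are likely the broken halves.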

The number of fields is 12.

Thanks,
Varman

Hi,

Can someone please help me get out of this issue?

Thanks,
Varman

This might get you going:

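awk -F '~' '
    # Fewer than 12 fields: this line is one half of a broken record.
    NF < 12 {
        if( !buffer )
            buffer = $0;        # first half: hold it
        else
        {
            # second half: join with a space, as in the desired output
            printf( "%s %s\n", buffer, $0 );
            buffer = "";
        }
        next;
    }
    { print; }                  # complete record: pass through unchanged
' input-file >fixed-file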

I do note that your example broken records have more than 12 fields, so the threshold of 12 in the NF test may need adjusting to match your real data.
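
Since the second half of a broken record can still carry a lot of fields, a field-count test may miss it. Here is an alternative sketch that does not count fields at all. It assumes (and this is only a guess from your sample) that a break only ever happens inside a double-quoted field and that quotes are otherwise balanced. The function name join_broken is made up for illustration.

```shell
# join_broken: hypothetical helper. Joins lines until the double quotes
# in the accumulated record are balanced (an even number of them).
join_broken() {
    awk '
        {
            # Append this line to any held partial record, with a space.
            record = ( record == "" ) ? $0 : record " " $0;

            # gsub returns the number of quotes it removed from the copy.
            tmp = record;
            if( gsub( /"/, "", tmp ) % 2 == 0 )
            {
                print record;                # quotes balanced: complete
                record = "";
            }
        }
        END { if( record != "" ) print record; }   # flush a dangling half
    ' "$@"
}
```

Run it as "join_broken input-file >fixed-file". Lines with no quotes at all pass through unchanged, so good records should not get merged together.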