help with awk delimited by |~| in a record

knijjar · April 6, 2008, 3:24am

I have a file which contains 1 million records of the following format. Each field is delimited by a pipe-tilde-pipe sequence, "|~|", as shown below. I would like to print only one column, i.e. the CARDDESC value.

For example, here it says CARDDESC=A8T1.
So anything after CARDDESC= and before |~|CARDTYPE is what I would like to get from this record, ignoring everything else.

In other words the value between CARDDESC= and |~|CARDTYPE which in the following example is A8T1 is what I need to know.

Any help is really appreciated. If my question is not clear, please let me know and I will try to phrase it in a better way.

Each field is delimited by |~|, that is for sure throughout the file. So even if I can get CARDDESC=A8T1, I am happy.

RECORD=NEW|~|NAME=SCO|~|MODEL=8220|~|WORK=ATM|~|SUBNET=ATM|~|SITE=RKS|~|REGION=NORTH A|~|COUNTRY=US|~|SWITCH=FAX2|~|ETHERNET=1.4.1.1|~|LOOPB=N/A|~|SHELF=N/A|~|SLOT=12|~|SUBSLOT=N/A|~|STSCHAN=N/A|~|PORT=8|~|DS1SLOT=0|~|LINE=1|~|LPORTID=0|~|CARDDESC=A8T1|~|CARDTYPE=0|~|ENCAPSULATION=N/A|~|BUNDLEID=0|~|PORTUSE=N/A|~|STATUS=T|~|MANAGED=YES|~|BILLINGID=N/A|~|CKTID=W86|~|CKTIP=1.0.0.17|~|PORTSERVICE=T1|~|SPEED=6|~|CHNLS=N/A|~|NX_Y=83|

era · April 6, 2008, 3:33am

Whoever devised that particular file format should be taken out and shot.

Is the field always in the same column?

 awk -F '\|~\|' '{ print $20 }'

matrixmadhan · April 6, 2008, 3:44am

awk -F"CARDDESC=" '{ split($2, arr, "|"); print arr[1] }' t1

the same could be done with multiple field delimiters

matrixmadhan · April 6, 2008, 3:45am

I believe this will not give what the OP had asked for

knijjar · April 6, 2008, 3:46am

I couldn't agree with you more about the format. I hate it when people do that. And yes, the field is always going to be in the same column.

which awk
/usr/bin/awk
awk -F '\|~\|' '{ print $20 }' RECORD
awk: syntax error near line 1
awk: bailing out near line 1

/usr/local/bin/awk -F '\|~\|' '{ print $20 }' RECORD
|CARDDESC=A8T1|

/usr/local/bin/awk -F "|~|" '{print $20}' CIRCUITRECORD
|CARDDESC=A8T1|

same result

The output contains a "pipe" symbol on each side, which is OK, but can that be taken out of the output? I am happy with the results but just wondering.

Thanks ERA in advance for your help
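
A note on the stray pipes: a bare | is the alternation operator in an awk regular expression, so a separator written as "|~|" can end up splitting on the ~ alone, which would be consistent with the leftover | on each side of the output. How backslashes inside -F are handled also seems to vary between awk implementations. A minimal sketch that avoids both issues puts each pipe in a bracket expression so it is matched literally. The sample record below is shortened and made up for illustration, so CARDDESC is field 2 there; with the full record it would be $20.

```shell
# Shortened sample record (hypothetical); CARDDESC is the 2nd field here.
record='LPORTID=0|~|CARDDESC=A8T1|~|CARDTYPE=0'

# [|] is a bracket expression, so each pipe is taken literally and the
# whole three-character sequence |~| acts as the field separator.
field=$(printf '%s\n' "$record" | awk -F '[|]~[|]' '{ print $2 }')
echo "$field"

# Strip the "CARDDESC=" prefix to keep only the value.
value=$(printf '%s\n' "$record" |
  awk -F '[|]~[|]' '{ sub(/^CARDDESC=/, "", $2); print $2 }')
echo "$value"
```

This assumes each record sits on a single line and that the column position never changes.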

matrixmadhan · April 6, 2008, 3:47am

Oh! Really?

Then I recommend you take a look at the library formats (bibliography formats); I am sure you would come back and say this one is much better.

era · April 6, 2008, 3:51am

Works for me on Linux (I guess gawk, or actually mawk). matrixmadhan's solution is closer to the real McCoy so use that if it works for you.

Or even, how about this.

sed -n 's/.*|~|CARDDESC=//; T; s/|~|.*//p'

matrixmadhan: agreed, I've seen much worse than this. I can also swear a lot louder than you have seen so far. (My colleagues can tell when I'm forced to use Windows.)
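
On the sed one-liner above: the T command is, as far as I know, a GNU sed extension, so it may not exist in other seds. A sketch of a single-substitution variant that should not need it is below. It assumes the value itself never contains a pipe, and the sample record is shortened and made up for illustration.

```shell
# Shortened sample record (hypothetical).
record='LPORTID=0|~|CARDDESC=A8T1|~|CARDTYPE=0'

# -n suppresses the default output; the p flag prints a line only when
# the substitution matched, so records without CARDDESC print nothing.
# [^|]* captures everything up to the next pipe.
value=$(printf '%s\n' "$record" |
  sed -n 's/.*|~|CARDDESC=\([^|]*\).*/\1/p')
echo "$value"

# A record without the field produces no output at all.
nomatch=$(printf '%s\n' 'FOO=1|~|BAR=2' |
  sed -n 's/.*|~|CARDDESC=\([^|]*\).*/\1/p')
```

Note that this pattern expects a |~| in front of CARDDESC=, so it would miss a record where CARDDESC is the very first field.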

knijjar · April 6, 2008, 4:13am

Thanks ERA and MATRIXMADHAN.

/usr/local/bin/awk -F"CARDDESC=" '{ split($2, arr, "|"); print arr[1] }' t1
A8T1

Now would you be kind enough to explain what it really did?

I have gone with what matrixmadhan recommended

Peace !!!

ghostdog74 · April 6, 2008, 6:33am

Sometimes delimiters need to be unique so that they won't get mixed up with actual data in the fields. Therefore, you shouldn't jump to conclusions like this.

ghostdog74 · April 6, 2008, 6:40am

just another way

awk -F'[|~=]' '
{
 for ( i =1; i<=NF;i++ ) {
   if ( $i ~ /CARDDESC/ ) {
    print $(i+1)
   }
 }
}' file
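
The loop above matches any field that merely contains CARDDESC. A slightly stricter sketch of the same idea splits on the full |~| sequence and compares the key exactly, so it does not depend on the column position. The sample record is shortened and made up for illustration.

```shell
# Shortened sample record (hypothetical).
record='RECORD=NEW|~|LPORTID=0|~|CARDDESC=A8T1|~|CARDTYPE=0'

value=$(printf '%s\n' "$record" | awk -F '[|]~[|]' '
{
  for (i = 1; i <= NF; i++) {
    # Split each KEY=VALUE field at its first "=".
    eq = index($i, "=")
    if (eq > 0 && substr($i, 1, eq - 1) == "CARDDESC") {
      print substr($i, eq + 1)
    }
  }
}')
echo "$value"
```

Splitting at the first "=" only means a value that itself contains "=" would still come out whole.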

Franklin52 · April 6, 2008, 8:09am

Another possibility with sed:

sed 's/.*ARDDESC=\([^|]*\).*/\1/'

Regards

matrixmadhan · April 6, 2008, 1:03pm

Here the field delimiter is CARDDESC=, so $2 is everything in the record after it.
split() then breaks $2 on the delimiter "|" and stores the pieces in the array arr.

The value you want is the first element of the array, arr[1].
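
To see those steps one at a time, here is a small sketch that prints each intermediate value. The sample record is shortened and made up for illustration, and the labels (before, after, pieces, first) are only there for the trace.

```shell
# Shortened sample record (hypothetical).
record='LPORTID=0|~|CARDDESC=A8T1|~|CARDTYPE=0'

trace=$(printf '%s\n' "$record" | awk -F 'CARDDESC=' '{
  print "before=" $1          # text in front of CARDDESC=
  print "after=" $2           # text following CARDDESC=
  n = split($2, arr, "|")     # cut $2 at every pipe
  print "pieces=" n
  print "first=" arr[1]       # the CARDDESC value
}')
echo "$trace"
```

With this sample, $2 is cut into A8T1, ~ and CARDTYPE=0, and only the first piece is printed by the original command.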