Grab from file with sed

ldiaz2106 · January 18, 2013, 7:18am

Hello All

I have a file with this type of records:

=LDR  01157nas a22003011a 4500
=001  vtls000000013
=003  VRT
=005  20111020150800.0
=008  100128c19699999sp\a|||||\||||0\\\||spa|
=037  \\$a1327$i090$j090$k03
=039  \9$a201110201508$bstaff$c201001280942$dstaff$c200910281236$dstaff$c200906301240$dstaff$y2000092208370000$zload
=040  \\$aES-Ba-GIE$bcat$cEs-Ba-GIE
=025  \\$a14
=590  \\$aEn Curs
=027  \\$aAnuari

What I would like to do is select only the records where the =590 field is equal to \\$aEn Curs, as in the example, and for those records write some other fields to a CSV file. For example, I need the =001 field and the =025 field.

=001 vtls000000013

So the csv should be:

vtls000000013;14

Is it possible to do this with sed? I tried the first filter with this, but I get errors:

sed -i '/590$a En curs/ p/' bib.txt

Could someone help? Cheers
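For reference, several things go wrong in the attempt above. sed rejects the extra `/` after the `p` command. `-i` would rewrite bib.txt in place. Without `-n`, every line is printed anyway. The pattern would also not match the data: `curs` vs `Curs`, and in the file `590` is followed by two spaces and `\\` before `$a`, with no space after `$a`. Below is a minimal sketch of a working filter. It assumes the lines look exactly like the sample, and the names bib_sample.txt and filter_590 are made up:

```shell
# Made-up copy of the sample record from the question.
cat > bib_sample.txt <<'EOF'
=LDR  01157nas a22003011a 4500
=001  vtls000000013
=003  VRT
=005  20111020150800.0
=008  100128c19699999sp\a|||||\||||0\\\||spa|
=037  \\$a1327$i090$j090$k03
=039  \9$a201110201508$bstaff$c201001280942$dstaff$c200910281236$dstaff$c200906301240$dstaff$y2000092208370000$zload
=040  \\$aES-Ba-GIE$bcat$cEs-Ba-GIE
=025  \\$a14
=590  \\$aEn Curs
=027  \\$aAnuari
EOF

# -n: print nothing by default; p: print only the matching lines.
# \$ is a literal dollar sign in the regular expression; .* skips the spaces and \\.
filter_590() {
  sed -n '/^=590.*\$aEn Curs/p' "$1"
}

filter_590 bib_sample.txt
```

On the real file this would be `sed -n '/^=590.*\$aEn Curs/p' bib.txt`, which leaves bib.txt untouched.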

Yoda · January 18, 2013, 11:16am

Here is a solution using awk:

awk '/^=590/{ L=$0; gsub(/=590  /,"",L); print L; }' filename
\\$aEn Curs

Modify as per your requirement.

ldiaz2106 · January 18, 2013, 11:26am

Hello,
Thanks for your reply.
Look at the error I get. I'm trying your script on Windows (I know).
Later I'll test it on my Linux machine:

C:\>awk '/^=590/{ L=$0; gsub(/=590  /,"",L); print L; }' bib.txt
awk: '/=590/{
awk: ^ invalid char ''' in expression

And I have a question: I don't see where in the code I should put the second field to join... In my example it was =025.
Thanks

Scrutinizer · January 18, 2013, 11:28am

sed? Try:

sed -n '/^=001/{s/.* //;h;}; /^=025/{s/.*\$a//; H;}; /^=590.*\$aEn Curs/{g;s/\n/;/g;p;}' infile

awk:

awk '/^=001/{p=$2} /^=025/{sub(/.*\$a/,";",$2); p=p $2} /^=590.*\$aEn Curs/{print p}' infile

I believe that on Windows you have to use double quotes instead of single ones for awk, but I am not sure; you'd have to search the forums.
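The Windows command prompt and a Unix shell handle quotes differently. One way to avoid the problem is to put the awk program in its own file and run it with `awk -f`, so the shell never has to parse the program text. This is a sketch and has not been tested on Windows here. It is the awk one-liner above with the same logic, split into commented rules. The names extract.awk, bib_sample.txt and run_awk are made up:

```shell
# The awk program lives in its own file, so no shell quoting is involved.
cat > extract.awk <<'EOF'
# =001: remember the record ID (second whitespace-separated field).
/^=001/ { p = $2 }
# =025: replace everything up to and including "$a" with ";" and append it.
/^=025/ { sub(/.*\$a/, ";", $2); p = p $2 }
# =590 containing "$aEn Curs": print what has been collected so far.
/^=590.*\$aEn Curs/ { print p }
EOF

# Made-up sample: the first lines of the record from the question.
cat > bib_sample.txt <<'EOF'
=LDR  01157nas a22003011a 4500
=001  vtls000000013
=003  VRT
=037  \\$a1327$i090$j090$k03
=025  \\$a14
=590  \\$aEn Curs
=027  \\$aAnuari
EOF

run_awk() {
  awk -f extract.awk "$1"
}

run_awk bib_sample.txt
```

On Windows, the same extract.awk file could then be run as `awk -f extract.awk bib.txt`. Only the `cat <<'EOF'` lines used here to create the files are Unix-specific.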

ldiaz2106 · January 18, 2013, 11:38am

Whoa, yes, it works.
Many thanks! But how does it work?
I see three parts in your script.
The first two are the fields I want to capture and join, OK, and the last one is the filter condition. But I can't see more than that.

But it's OK, I can use this script. If I want to capture other fields, I guess I just have to change the code. I'll try.

Thanks and have a nice weekend

---------- Post updated at 11:38 AM ---------- Previous update was at 11:35 AM ----------

Hi,
I replaced the single quotes with double quotes and the script works:

awk "/^=590/{ L=$0; gsub(/=590  /,\"\",L); print L; }" bib.txt

And now I also have the working sed script!!
Thanks a lot

Scrutinizer · January 18, 2013, 11:49am

Regarding the sed script: the value from the first field (=001) is copied into the hold space (h), and the value from the second field (=025) is appended to the hold space (H), which puts a newline in front of it. If the =590 field matches, the hold space is copied back into the pattern space (g), the newline (\n) is replaced by a semicolon, and the result is printed.

I hope this helps..
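The steps described above can also be written as a commented sed script file and run with `sed -n -f`. This is a sketch, and the names extract.sed, bib_sample.txt and run_sed are made up. Here the =025 substitution removes everything up to and including `$a`, so the output is the `vtls000000013;14` asked for in the question:

```shell
cat > extract.sed <<'EOF'
# =001: delete everything up to the last space, then copy the ID to the hold space (h).
/^=001/{
s/.* //
h
}
# =025: delete everything up to and including "$a", then append to the hold space (H).
# H adds a newline before the appended text.
/^=025/{
s/.*\$a//
H
}
# =590 with "$aEn Curs": copy the hold space back (g), turn newlines into ";", print.
/^=590.*\$aEn Curs/{
g
s/\n/;/g
p
}
EOF

# Made-up sample: the relevant lines of the record from the question.
cat > bib_sample.txt <<'EOF'
=LDR  01157nas a22003011a 4500
=001  vtls000000013
=003  VRT
=025  \\$a14
=590  \\$aEn Curs
=027  \\$aAnuari
EOF

run_sed() {
  sed -n -f extract.sed "$1"
}

run_sed bib_sample.txt
```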

ldiaz2106 · January 19, 2013, 5:20am

Hello
Yes, it helps... Thanks.
But now I have another doubt: imagine I would like to add one or more fields, for example these:

=003  VRT
=005  20111020150800.0
=008  100128c19699999sp\a|||||\||||0\\\||spa|

So the =003, =005 and =008 fields.
Can I just add them to the sed command?
Like this, with =003?

sed -n '/^=001/{s/.* //;h;}; /^=003/{s/.*\$//; /^=025/{s/.*\$//; H;}; /^=590.*\$aEn Curs/{g;s/\n/;/g;p;}' bib.mrk

Thanks

Scrutinizer · January 19, 2013, 5:57am

Try:

sed -n '/^=001/{s/.* //;h;}; /^=003/{s/.* //;H;}; /^=025/{s/.*\$a//;H;}; /^=590.*\$aEn Curs/{g;s/\n/;/g;p;}' infile
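Following the same pattern, =005 and =008 can be added as well. Keep in mind that sed appends each value to the hold space in the order the lines appear in the file, not in the order the commands are written. It also prints only when it reaches the =590 line, so any field that comes after =590 in a record is left out. Below is a sketch on the sample record. The names extract_fields and bib_sample.txt are made up, and the =025 value is trimmed after `$a`:

```shell
# Made-up copy of the sample record from the question.
cat > bib_sample.txt <<'EOF'
=LDR  01157nas a22003011a 4500
=001  vtls000000013
=003  VRT
=005  20111020150800.0
=008  100128c19699999sp\a|||||\||||0\\\||spa|
=037  \\$a1327$i090$j090$k03
=025  \\$a14
=590  \\$aEn Curs
=027  \\$aAnuari
EOF

# One rule per wanted field: the first one starts the hold space (h),
# the others append to it (H). Output order = order of the lines in the file.
extract_fields() {
  sed -n '
/^=001/{s/.* //;h;}
/^=003/{s/.* //;H;}
/^=005/{s/.* //;H;}
/^=008/{s/.* //;H;}
/^=025/{s/.*\$a//;H;}
/^=590.*\$aEn Curs/{g;s/\n/;/g;p;}
' "$1"
}

extract_fields bib_sample.txt
```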

ldiaz2106 · January 19, 2013, 9:34am

OK, great!!!
Thanks a lot, now I see the structure better.
I'll look for some tutorials on YouTube to understand it better.

Have a nice weekend!!

PS: If you need some help with Oracle databases, just ask!
Cheers

---------- Post updated at 09:34 AM ---------- Previous update was at 06:00 AM ----------

Hello

Looking at the result more carefully, I see that field 310 never appears in the output file. Look at this example:

Script executed:

sed -n '/^=001/{s/.* //;h;}; /^=310/{s/.* //;H;}; /^=037/{s/.* //;H;}; /^=050/{s/.* //;H;}; /^=099/{s/.*\$//; H;}; /^=590.*\$aEn Curs/{g;s/\n/;/g;p;}' bib.mrk >seriadas.csv

result:

vtls000000101;\\$a1570$i090$j090$k03;\\$a1570$i090$j090$k03;\\$a951$i090$j090$k03

source file:

=001  vtls000000101
=037  \\$a1570$i090$j090$k03
=037  \\$a1570$i090$j090$k03
=037  \\$a951$i090$j090$k03
=590  \\$aEn Curs
=027  \\$aAnuari
=050  \\$aA-0189
=099  \\$aSALA-17.01(I)
=310  \\$aAnual

As you can see in the result, only the 037 fields have been passed through, not the others...
The result should be:

vtls000000101;\\$aAnual;\\$a1570$i090$j090$k03-\\$a1570$i090$j090$k03-\\$a951$i090$j090$k03;\\$aA-0189;\\$aSALA-17.01(I)

Maybe there's one little detail to adapt?
Thanks !

Scrutinizer · January 19, 2013, 10:43am

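That has to do with the different order of the fields in this sample. The sed script prints what it has collected as soon as it reaches the =590 line, so fields that come after =590 in a record (here =050, =099 and =310) are never included. This gets a bit too complicated for sed; perhaps awk is a better choice: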

awk '
  {
    i=$1
    sub(i " *",x)
    A[i]=A[i](A[i]?"-":x) $0
  } 
  i=="=310"{
    if (A["=590"]~/\$aEn Curs/) print A["=001"],A["=310"],A["=037"],A["=050"],A["=099"]
    for(i in A) delete A[i]
  }
' OFS=\; file

ldiaz2106 · January 19, 2013, 11:35am

Ok,
I tried it, and this is the result:

Good:

vtls000000013;\\$aAnual;\\$a1327$i090$j090$k03;\\$aG-0644;\\$aSALA-14.04(E)
vtls000000017;\\$aAnual;\\$a1465$i090$j090$k03-\\$a46$i090$j090$k03;\\$aG-0022;\\$aSALA-11.00(E)
vtls000000021;\\$aAnual;\\$a1196$i090$j090$k03-\\$a1196$i090$j090$k03;\\$aG-0541;\\$aDIP�SIT
vtls000000028;\\$aAnual;\\$a1156$i090$j090$k03;\\$aG-0949;\\$aDIP�SIT

And later you get this, which is not good:

vtls000000167-vtls000000169-vtls000000171-vtls000000174-vtls000026748;\\$aAnual;\\$a1898$i090$j090$k03;\\$aB-0258-\\$aG-0781-\\$aGI-0144-\\$aGI-0160-\\$aA-0791;\\$aDIP�SIT-\\$aDIP�SIT-\\$aDIP�SIT-\\$aDIP�SIT-\\$aDIP�SIT
vtls000000176-vtls000000179-vtls000021436-vtls000000202;\\$aAnual;\\$a832$i090$j090$k03-\\$a832$i090$j090$k03-\\$a832$i090$j090$k03;\\$aGI-0180-\\$aGI-0246-\\$aG-0180;\\$aDIP�SIT-\\$aDIP�SIT-\\$aSALA-17.01(E)

And then the file becomes good again.
Strange...
Maybe it's because the source file has some strange characters.

Thanks if you can take a look.
I can post the complete source file compressed if you want.

Scrutinizer · January 19, 2013, 11:58am

Your file does not appear to be standard UTF-8. There is a byte order mark at the start, and there are some unusual characters, for example in some of the =099 lines.
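If the byte order mark is what breaks the processing, it can be removed before running the scripts. Below is a minimal POSIX shell sketch, where strip_bom, to_utf8 and bom_sample.txt are made-up names. The to_utf8 helper assumes that the non-UTF-8 bytes are Windows-1252. That is only a guess based on the garbled =099 values and should be checked before converting anything:

```shell
# Remove a UTF-8 byte order mark (bytes EF BB BF) from the start of the first line.
# printf with octal escapes builds the three BOM bytes portably;
# LC_ALL=C makes sed treat the input as plain bytes.
strip_bom() {
  bom=$(printf '\357\273\277')
  LC_ALL=C sed "1s/^$bom//" "$1"
}

# Hypothetical helper: convert a Windows-1252 file to UTF-8.
# Only correct if the source encoding really is Windows-1252.
to_utf8() {
  iconv -f CP1252 -t UTF-8 "$1"
}

# Demo: a one-line file that starts with a BOM.
printf '\357\273\277=LDR  01157nas a22003011a 4500\n' > bom_sample.txt
strip_bom bom_sample.txt
```

Something like `strip_bom bib.mrk > bib_clean.mrk` would then produce a copy without the BOM.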

ldiaz2106 · January 20, 2013, 5:02am

Hmm,
so if I open the file with Notepad++ and use "conversion => UTF-8", will that be enough?
I'll try!

---------- Post updated at 12:22 PM ---------- Previous update was at 12:16 PM ----------

I tried it this way:

I removed the last ,A["=099"] from your awk script so that this field is not evaluated,
but I still get the same result:

vtls000000094;\\$aAnual;\\$a951$i090$j090$k03;\\$aA-0245
vtls000000167-vtls000000169-vtls000000171-vtls000000174-vtls000026748;\\$aAnual;\\$a1898$i090$j090$k03;\\$aB-0258-\\$aG-0781-\\$aGI-0144-\\$aGI-0160-\\$aA-0791

Is this test valid?

---------- Post updated 01-20-13 at 05:02 AM ---------- Previous update was 01-19-13 at 12:22 PM ----------

Hello

OK, I found the problem, but I don't know how to fix it in the awk script.
The problem happens when one of the fields in the list does NOT exist in a record of the bib.mrk file,
for example =310.
If I add =310 manually to that record in bib.mrk, the record is written properly to the CSV output file.
Can you modify the awk script to handle this case?
Some fields in the list may NOT be present.

Thanks

Scrutinizer · January 20, 2013, 5:23am

Something like this?

awk '
  function pr(){                                                                         # define the print array elements function
    if (A["=590"]~/\$aEn Curs/) print A["=001"],A["=310"],A["=037"],A["=050"],A["=099"]  # Print the array elements
    for(i in A) delete A[i]                                                           # Delete the array elements
  }
  {
    i=$1                                                                                 # i becomes the index in field $1
    sub(i " *",x)                                                                        # delete the index and spaces following it from the line
    A[i]=A[i](A[i]?"-":x) $0                                                    # add the line to the array element with index "i" and insert a "-" when there is already an entry present
  } 
  !NF{                                                                                   # if there is an empty line then
    pr()                                                                                 # print array elements
  }
  END{                                                                                   # if there are no more records
    pr()                                                                                 # print array elements  
  }
' OFS=\; bib.mrk                                                                         # set the Output Field Separator to ";"
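The blank-line test depends on the records being separated by truly empty lines. If they are not, one variation is to start a new record whenever an =LDR line appears. This assumes, as in the sample, that every record begins with =LDR. The sketch below also drops a trailing carriage return in case the file has Windows line endings. The names records.awk, records_sample.txt and extract_records are made up, and the two sample records are assembled from lines shown in this thread:

```shell
cat > records.awk <<'EOF'
# Print the wanted fields of the current record if =590 matches, then forget it.
function pr(   k) {
  if (A["=590"] ~ /\$aEn Curs/)
    print A["=001"], A["=310"], A["=037"], A["=050"], A["=099"]
  for (k in A) delete A[k]
}
# Drop a trailing carriage return (files saved with Windows line endings).
{ sub(/\r$/, "") }
# A new =LDR line starts a new record: flush the previous one first.
/^=LDR/ { pr() }
# Store the line under its tag; repeated tags are joined with "-".
NF {
  tag = $1
  sub(tag " *", "")
  A[tag] = A[tag] (A[tag] ? "-" : "") $0
}
# Flush the last record.
END { pr() }
EOF

# Made-up sample: two records with no blank line between them; the first lacks =310.
cat > records_sample.txt <<'EOF'
=LDR  01157nas a22003011a 4500
=001  vtls000000013
=037  \\$a1327$i090$j090$k03
=590  \\$aEn Curs
=050  \\$aG-0644
=LDR  01157nas a22003011a 4500
=001  vtls000000101
=037  \\$a1570$i090$j090$k03
=037  \\$a951$i090$j090$k03
=590  \\$aEn Curs
=099  \\$aSALA-17.01(I)
=310  \\$aAnual
EOF

extract_records() {
  awk -f records.awk OFS=';' "$1"
}

extract_records records_sample.txt
```

The first record has no =310 field, so its second column is simply empty instead of being merged into the next record.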

ldiaz2106 · January 20, 2013, 6:31am

Yes,
this time it's 100% OK.
Wow, thanks again for your help!
Feel free to ask me anything about Oracle DB.
Cheers