Text manipulation with sed - Advanced technique

Hello everybody,

I have the following input file:

START ANALYSIS 1
DATA LINE
DATA LINE
DATA LINE
DATA LINE
  Libray   /home/me/myLibrary
  Source   library_name_AAAAA
DATA LINE
DATA LINE
DATA LINE
  BEGIN SOURCE ANALYSIS
  Function A
  Function B
  Function C
  Function D
  END 4 Functions Founded
DATA LINE
DATA LINE
START ANALYSIS 2
DATA LINE
DATA LINE
DATA LINE
DATA LINE
  Libray   /home/me/myLibrary
  Source   library_name_BBBBB
DATA LINE
DATA LINE
DATA LINE
  BEGIN SOURCE ANALYSIS
  Function E
  Function F
  Function G
  Function H
  END 4 Functions Founded
DATA LINE
DATA LINE

This is the expected output file:

/home/me/myLibrary library_name_AAAAA Function A
/home/me/myLibrary library_name_AAAAA Function B
/home/me/myLibrary library_name_AAAAA Function C
/home/me/myLibrary library_name_AAAAA Function D
/home/me/myLibrary library_name_BBBBB Function E
/home/me/myLibrary library_name_BBBBB Function F
/home/me/myLibrary library_name_BBBBB Function G
/home/me/myLibrary library_name_BBBBB Function H

Follow-up question: is there a way to add two counters, one that restarts for each analysis block and one that runs across the whole file?

/home/me/myLibrary library_name_AAAAA Function A   001  00001
/home/me/myLibrary library_name_AAAAA Function B   002  00002
/home/me/myLibrary library_name_AAAAA Function C   003  00003
/home/me/myLibrary library_name_AAAAA Function D   004  00004
/home/me/myLibrary library_name_BBBBB Function E   001  00005
/home/me/myLibrary library_name_BBBBB Function F   002  00006
/home/me/myLibrary library_name_BBBBB Function G   003  00007
/home/me/myLibrary library_name_BBBBB Function H   004  00008

I am not looking for an algorithm to implement in a program (Python, Java or other, although that would be a good idea).

I am looking for a solution (a bash script) based on commands such as sed, sort, uniq, awk, etc.

With sed, I know how to:

  • select lines between two marker patterns
  • capture groups with '(' and ')'

I have also heard about the h, H, g and G commands, but I have never used them.

But I have no idea how to combine all of this.

Can you please help me?

Your sample data has DOS line terminators (<CR>, 0x0D, \r, ^M) that need to be removed first. How about:

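awk '
                        {sub (/\r$/, "")                # strip the trailing DOS <CR>
                        }
/^ *Libray/             {LIB = $2                       # remember the library path
                        }
/^ *Source/             {SRC = $2                       # remember the source name
                        }
/^ *END/                {SA = 0                         # leave the function block
                        }
SA                      {CNT1++                         # counter within this block
                         CNT2++                         # counter over the whole file
                         sub (/^ */, "")                # drop the leading indentation
                         printf "%s %s %s   %03d  %05d\n",  LIB, SRC, $0, CNT1, CNT2
                        }
/BEGIN SOURCE ANALYSIS/ {SA   = 1                       # enter the function block
                         CNT1 = 0                       # restart the block counter
                        }
' file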
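
If it is unclear whether a file really carries carriage returns, a quick check before running the script can help. The following is only a sketch on a hypothetical two-line sample (the temporary files and variable names are made up for illustration): it counts the lines ending in <CR>, strips them with tr, and counts again. Without that step, $2 on the Source line would carry a trailing <CR> into the output.

```shell
# Hypothetical two-line sample with DOS (CRLF) line endings.
sample=$(mktemp)
printf '  Libray   /home/me/myLibrary\r\n  Source   library_name_AAAAA\r\n' > "$sample"

# A literal carriage return, so the grep pattern stays portable.
cr=$(printf '\r')

# Count the lines that end in a carriage return.
crlf_lines=$(grep -c "${cr}\$" "$sample" || true)
echo "lines ending in CR: $crlf_lines"

# Strip every carriage return, then count again.
clean=$(mktemp)
tr -d '\r' < "$sample" > "$clean"
remaining=$(grep -c "${cr}\$" "$clean" || true)
echo "lines ending in CR after tr: $remaining"

rm -f "$sample" "$clean"
```

Piping the real file through tr -d '\r' first would make the sub() call in the awk script unnecessary, at the cost of one extra process.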

RudiC

It's really a very good solution.
That's exactly what I needed.
Thank you very much for this answer and for your quick response.
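
Since the question was originally about sed and its hold space, here is a rough sed-only sketch of the basic task (without the counters, which sed is poorly suited for). It is not part of the answer above, just one possible way to combine the h, H, x and G commands. It assumes the fields are separated by spaces rather than tabs, and the sample file below is a shortened, hypothetical copy of the input.

```shell
# Shortened hypothetical sample with the same structure as the question.
input=$(mktemp)
cat > "$input" <<'EOF'
START ANALYSIS 1
DATA LINE
  Libray   /home/me/myLibrary
  Source   library_name_AAAAA
DATA LINE
  BEGIN SOURCE ANALYSIS
  Function A
  Function B
  END 2 Functions Founded
DATA LINE
START ANALYSIS 2
  Libray   /home/me/myLibrary
  Source   library_name_BBBBB
  BEGIN SOURCE ANALYSIS
  Function E
  END 1 Functions Founded
EOF

# tr removes any DOS carriage returns; sed -n prints only what p asks for.
result=$(tr -d '\r' < "$input" | sed -n '
# Libray line: keep only the path and copy it to the hold space (h).
/^ *Libray  */{
  s/^ *Libray  *//
  h
}
# Source line: append the name to the hold space (H), then swap (x),
# join the two parts with a space and swap back.
/^ *Source  */{
  s/^ *Source  *//
  H
  x
  s/\n/ /
  x
}
# Between the markers: skip the marker lines, append the hold space
# to each function line (G) and move it to the front.
/BEGIN SOURCE ANALYSIS/,/^ *END/{
  /BEGIN SOURCE ANALYSIS/d
  /^ *END/d
  s/^ *//
  G
  s/\(.*\)\n\(.*\)/\2 \1/
  p
}
')
printf '%s\n' "$result"
rm -f "$input"
```

The hold space ends up containing "path name" for the current analysis block, so every function line only needs a G and one substitution. For the counters, piping this output into a small awk script is probably simpler than trying to count in sed.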

Is it really Libray?
Perhaps one should prepare for a bug fix with /^ *Librar?y/ :)

Well, I went for Library first, and changed it only when that failed...
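
For what it is worth, the tolerant pattern can be checked quickly against both spellings. This is only a small sketch with two made-up input lines; the second "r" is optional in the pattern, so both should be counted.

```shell
# Feed both spellings through the tolerant pattern and count the matches.
matches=$(printf '  Libray   /home/me/myLibrary\n  Library   /home/me/myLibrary\n' |
  awk '/^ *Librar?y/ { n++ } END { print n + 0 }')
echo "matching lines: $matches"
```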