How to split a file based on a string?

Hi,

The scenario is this:

I have large text files (up to 5 MB each, about 5000 files per day). Almost every line of such a file contains the tag 3100.2.22.1, which represents the Call_Type. I need to generate one file per distinct Call_Type value (the value that follows 3100.2.22.1), plus one more file that collects all lines without the 3100.2.22.1 tag.

The question is: how can I split such a file using bash/sed/awk?

Sample content of the file hd_auto_22700123_0021 (there are a lot of Call_Type values):
Code:

! HISTORICAL DATA ! ONE FILE DECODING REPORT ! SERVICE : ce20 ! FILE : /osp/spm/svc/ !
! TICKET NBR : 1 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.137.4 665004551 ! 3100.2.22.8 Browsing !
! TICKET NBR : 2 ! GSI : 102 ! 3100.2.137.4 665017728 !3100.2.22.2 7 ! 3100.2.70.8 1050 ! 3100.2.22.1 189 ! 3100.2.70.11 016c6f63000431333000 !
! TICKET NBR : 3 ! GSI : 102 ! 3100.2.137.4 665017728 ! 3100.2.97.1 192.168.0.12 ! 3100.2.19.2 665017728 ! 3100.2.22.2 7 ! 3100.2.70.11 016c6f63000431333000 !
! TICKET NBR : 4 ! GSI : 102 ! 3100.2.137.4 665002105 ! 3100.2.97.1 192.168.0.12 ! 3100.2.19.2 665002105 ! 3100.2.22.1 410 ! 3100.2.70.11 016c6f63000431333000 !
! TICKET NBR : 5 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.137.4 665009058 ! 3100.2.97.1 192.168.0.12 ! 3100.2.22.1 164 ! 3100.2.70.11 016c6f63000431333000 !
! TICKET NBR : 6 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.137.4 665012633 ! 3100.2.97.1 192.168.0.12 ! 3100.2.18.1 0 ! 3100.2.22.1 189 ! 3100.2.70.11 016c6f63000431333000 !
! TICKET NBR : 7 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.137.4 665019277 ! 3100.2.97.1 192.168.0.12 ! 3100.2.22.1 164 ! 3100.2.70.11 016c6f63000431333000 !   
! TICKET NBR : 8 ! GSI : 102 ! 3100.2.112.1 15/08/2013 10:42:43 ! 3100.2.22.8 Free_Traffic ! 3100.2.97.1 192.168.0.12  ! 3100.2.22.11 2 !
.
.
.
! RESULT = successfull 1657 tickets treated !

The result of the split should look like this:

hd_auto_22700123_0021_without_tag  (without 3100.2.22.1 tag)
! TICKET NBR : 1 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.137.4 665004551 ! 3100.2.22.8 Browsing !
! TICKET NBR : 3 ! GSI : 102 ! 3100.2.137.4 665017728 ! 3100.2.97.1 192.168.0.12 ! 3100.2.19.2 665017728 ! 3100.2.22.2 7 ! 3100.2.70.11 016c6f63000431333000 !
! TICKET NBR : 8 ! GSI : 102 ! 3100.2.112.1 15/08/2013 10:42:43 ! 3100.2.22.8 Free_Traffic ! 3100.2.97.1 192.168.0.12  ! 3100.2.22.11 2 !
! RESULT = successfull 3 tickets treated !
hd_auto_22700123_0021_189 (with tag 3100.2.22.1 189)
! TICKET NBR : 2 ! GSI : 102 ! 3100.2.137.4 665017728 !3100.2.22.2 7 ! 3100.2.70.8 1050 ! 3100.2.22.1 189 ! 3100.2.70.11 016c6f63000431333000 !
! TICKET NBR : 6 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.137.4 665012633 ! 3100.2.97.1 192.168.0.12 ! 3100.2.18.1 0 ! 3100.2.22.1 189 ! 3100.2.70.11 016c6f63000431333000 !
! RESULT = successfull 2 tickets treated !
hd_auto_22700123_0021_410 (with tag 3100.2.22.1 410)
! TICKET NBR : 4 ! GSI : 102 ! 3100.2.137.4 665002105 ! 3100.2.97.1 192.168.0.12 ! 3100.2.19.2 665002105 ! 3100.2.22.1 410 ! 3100.2.70.11 016c6f63000431333000 !
! RESULT = successfull 1 tickets treated !
hd_auto_22700123_0021_164 (with tag 3100.2.22.1 164)
! TICKET NBR : 5 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.137.4 665009058 ! 3100.2.97.1 192.168.0.12 ! 3100.2.22.1 164 ! 3100.2.70.11 016c6f63000431333000 !
! TICKET NBR : 7 ! GSI : 102 ! 3100.2.22.3 0 ! 3100.2.137.4 665019277 ! 3100.2.97.1 192.168.0.12 ! 3100.2.22.1 164 ! 3100.2.70.11 016c6f63000431333000 !
! RESULT = successfull 2 tickets treated !

Try

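awk -F! '# a line carrying the tag goes to "<input file> <tag and value>"
match ($0, /3100\.2\.22\.1 [^!]*/) {print > (FILENAME " " substr ($0, RSTART, RLENGTH)); next}
# every other line goes to "<input file> without_tag"
                                   {print > (FILENAME " without_tag")}
        ' hd_auto_*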

Wow, thank you.
Can you please explain how this script does its magic?

It searches each record for your 3100.2.22.1 tag followed by the call type, both described by a single regex. If the match succeeds, RSTART and RLENGTH (see man awk) give the position and length of the matched string, so substr can extract it for use in the file name, and the entire record is then printed to that file. If there is no match, the record is printed to the "without_tag" file.
I see now that the -F! is not needed at all...
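
For file names and trailer lines closer to the requested layout, the same match/RSTART/RLENGTH idea could be extended roughly as below. This is only a sketch under some assumptions that are not confirmed in the thread: the function name split_by_call_type is made up, only lines starting with "! TICKET NBR" are treated as tickets (the header and the original RESULT line are dropped), the Call_Type value is a simple token that is safe in a file name, and a fresh RESULT line with the per-file ticket count is appended to every output file.

```shell
# Hypothetical helper: split each input file into <file>_<call type> and <file>_without_tag.
split_by_call_type() {
    awk '
    # Append a RESULT trailer to every output of the input file just read, then close it.
    function finish(   name) {
        for (name in count) {
            printf "! RESULT = successfull %d tickets treated !\n", count[name] > name
            close(name)          # keeps the number of open files low across many inputs
        }
        split("", count)         # portable way to empty the array
    }
    FNR == 1 { finish() }        # a new input file starts: flush the previous one
    !/^! TICKET NBR/ { next }    # assumption: header and RESULT lines are not copied
    {
        # The space after the tag keeps 3100.2.22.11 from being mistaken for 3100.2.22.1
        if (match($0, /3100\.2\.22\.1 [^!]*/)) {
            value = substr($0, RSTART, RLENGTH)
            sub(/^3100\.2\.22\.1 +/, "", value)   # drop the tag itself
            sub(/ +$/, "", value)                 # drop trailing blanks
            out = FILENAME "_" value
        } else {
            out = FILENAME "_without_tag"
        }
        print > out
        count[out]++
    }
    END { finish() }
    ' "$@"
}
```

It would be called as split_by_call_type hd_auto_*. Note that the output names also match hd_auto_*, so a second run with the same glob would pick up the earlier results as input; writing to a separate directory or using a narrower glob would avoid that.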