Building hierarchy with the list

Hi All,

Sorry for more question today. I am having a text file . Like below

704925680_TOTAL->MANUAL->TT IOR GSB
775116444_TOTAL->POO TO->TT
-572275295_TOTAL->MANUAL->MTO
-611408278_TOTAL->PRIE LEL
456690129_TOTAL->BTT TOO
475919266_TOTAL->MANUAL->COM
-172680236_TOTAL->BTT TOO->MTO
481202389_TOTAL->MANUAL->TT COM
655296953_TOTAL->POO TO->COLE
-434977163_TOTAL->POO TO->SPI
-766208436_TOTAL->MANUAL->TT 3W GSB
-1903630442_TOTAL->COLE
-541283452_TOTAL->MANUAL->CRT

These are the hierarchical data. But not on the hierarchy order. If you see all fall on total and after total the next data come in the file in the example above it is MANUAL . But the MANUAL data are not group and scattered. I am trying to bring these on the group. The data to be grouped always falls after -> . The example I have given is 3 level we may have 4 or level. The above files needs to be grouped as below

-766208436_TOTAL->MANUAL->TT 3W GSB
704925680_TOTAL->MANUAL->TT IOR GSB
-572275295_TOTAL->MANUAL->MTO
-541283452_TOTAL->MANUAL->CRT
481202389_TOTAL->MANUAL->TT COM
475919266_TOTAL->MANUAL->COM
-611408278_TOTAL->PRIE LEL
456690129_TOTAL->BTT TOO
-172680236_TOTAL->BTT TOO->MTO
775116444_TOTAL->POO TO->TT
655296953_TOTAL->POO TO->COLE
-434977163_TOTAL->POO TO->SPI
-1903630442_TOTAL->COLE

Is there any easy way to do in Perl or awk. I am thinking to do C code. Any help will be great.

Try

sort -t">" -k2 file
456690129_TOTAL->BTT TOO
-172680236_TOTAL->BTT TOO->MTO
-1903630442_TOTAL->COLE
475919266_TOTAL->MANUAL->COM
-541283452_TOTAL->MANUAL->CRT
-572275295_TOTAL->MANUAL->MTO
-766208436_TOTAL->MANUAL->TT 3W GSB
481202389_TOTAL->MANUAL->TT COM
704925680_TOTAL->MANUAL->TT IOR GSB
655296953_TOTAL->POO TO->COLE
-434977163_TOTAL->POO TO->SPI
775116444_TOTAL->POO TO->TT
-611408278_TOTAL->PRIE LEL

If this doesn't satisfy your needs, pls post why and how M(ANUAL) sorts before B(TT), and C(OLE) after P(OO).

1 Like

Thanks that worked. I have the ID's too with delimited by ~. Tried with cut and sort it is sorting only the description. I need the whole record grouped by tag ( Manual,POO TO etc )

Also one more question I have multiple level after > I can sort by each column and pipe it to next K2 right ?

179341~1704489~1704480~704925680_TOTAL->MANUAL->TT IOR GSB
179341~1704497~1704480~775116444_TOTAL->POO TO->TT
179341~1704505~1704480~-572275295_TOTAL->MANUAL->MTO
179341~1704507~1704480~-611408278_TOTAL->PRIE LEL
179341~1704509~1704480~456690129_TOTAL->BTT TOO
179341~1704481~1704480~475919266_TOTAL->MANUAL->COM

Expected output

179341~1704489~1704480~704925680_TOTAL->MANUAL->TT IOR GSB
179341~1704505~1704480~-572275295_TOTAL->MANUAL->MTO
179341~1704481~1704480~475919266_TOTAL->MANUAL->COM
179341~1704497~1704480~775116444_TOTAL->POO TO->TT
179341~1704507~1704480~-611408278_TOTAL->PRIE LEL
179341~1704509~1704480~456690129_TOTAL->BTT TOO

What do you mean by I can sort by each column and pipe it to next K2 ?[/FONT][/COLOR]

Please ignore the statement.

I was asking for below where it worked using sort -K4

TOT-COL-TOOT-SOF-KID
TOT-GEK-TOOT-SOF-MAN

Now the issue I am having is with the ID's as mentioned

Please be aware that a sort key by default stops at line end; to confine it to a single field, the stop position needs to be specified as well, e.g. -k4,4 . Also note that it's a lower case k .

1 Like

Thanks. You mean to say I can make the below worked with -k4,4

Original

179341~1704489~1704480~704925680_TOTAL->MANUAL->TT IOR GSB
179341~1704497~1704480~775116444_TOTAL->POO TO->TT
179341~1704505~1704480~-572275295_TOTAL->MANUAL->MTO
179341~1704507~1704480~-611408278_TOTAL->PRIE LEL
179341~1704509~1704480~456690129_TOTAL->BTT TOO
179341~1704481~1704480~475919266_TOTAL->MANUAL->COM

Expected output

Code:

179341~1704489~1704480~704925680_TOTAL->MANUAL->TT IOR GSB
179341~1704505~1704480~-572275295_TOTAL->MANUAL->MTO
179341~1704481~1704480~475919266_TOTAL->MANUAL->COM
179341~1704497~1704480~775116444_TOTAL->POO TO->TT
179341~1704507~1704480~-611408278_TOTAL->PRIE LEL
179341~1704509~1704480~456690129_TOTAL->BTT TOO

Output after execution. Not sure how can traverse through the ~ delimiter and sort the whole record

sort -t">" -k4,4  ttt
179341~1704481~1704480~475919266_TOTAL->MANUAL->COM
179341~1704497~1704480~775116444_TOTAL->POO TO->TT
179341~1704505~1704480~-572275295_TOTAL->MANUAL->MTO
179341~1704507~1704480~-611408278_TOTAL->PRIE LEL
179341~1704509~1704480~456690129_TOTAL->BTT TOO

With a ">" separator, there's no 4th field to serve as a sort key in above data. I'm sorry, I don't get what you're after.

sorry If I am not clear. Basically my input is like below with ID's. I want to sort 4th columns with the hierarchy sort as MANUAL,POO TO etc . I have added expected output below

Original

179341~1704489~1704480~704925680_TOTAL->MANUAL->TT IOR GSB
179341~1704497~1704480~775116444_TOTAL->POO TO->TT
179341~1704505~1704480~-572275295_TOTAL->MANUAL->MTO
179341~1704507~1704480~-611408278_TOTAL->PRIE LEL
179341~1704509~1704480~456690129_TOTAL->BTT TOO
179341~1704481~1704480~475919266_TOTAL->MANUAL->COM

Expected Output

179341~1704489~1704480~704925680_TOTAL->MANUAL->TT IOR GSB
179341~1704505~1704480~-572275295_TOTAL->MANUAL->MTO
179341~1704481~1704480~475919266_TOTAL->MANUAL->COM
179341~1704497~1704480~775116444_TOTAL->POO TO->TT
179341~1704507~1704480~-611408278_TOTAL->PRIE LEL
179341~1704509~1704480~456690129_TOTAL->BTT TOO

Applying copious amounts of phantasy, (or wildly guessing, whatever you prefer) let me try to paraphrase your request, which, btw, dramatically changed during the course of the thread:

A file of records consisting of fields separated by "~" needs to be rearranged by a sub field of the 4th field which in turn is separated by "->". I use the term "rearranged" as I can't dream up an algorithm to sort character B after P after M. The algorithm with which to rearrange is not defined.

1 Like

From a row delimited by ~ take the 4th field and from that delimited by > take the last record and group record based on the entire row . I don't need to sort I need to group basically it is ending in sort. As you mentioned I am not worried about where B comes. I need to group MANUAL and others If any it can be of any order.

The desired output can be in a form below.

179341~1704489~1704480~704925680_TOTAL->MANUAL->TT IOR GSB
179341~1704505~1704480~-572275295_TOTAL->MANUAL->MTO
179341~1704481~1704480~475919266_TOTAL->MANUAL->COM
179341~1704497~1704480~775116444_TOTAL->POO TO->TT
179341~1704507~1704480~-611408278_TOTAL->PRIE LEL
179341~1704509~1704480~456690129_TOTAL->BTT TOO

OR

179341~1704497~1704480~775116444_TOTAL->POO TO->TT
179341~1704507~1704480~-611408278_TOTAL->PRIE LEL
179341~1704509~1704480~456690129_TOTAL->BTT TOO
179341~1704489~1704480~704925680_TOTAL->MANUAL->TT IOR GSB
179341~1704505~1704480~-572275295_TOTAL->MANUAL->MTO
179341~1704481~1704480~475919266_TOTAL->MANUAL->COM

OR

179341~1704509~1704480~456690129_TOTAL->BTT TOO
179341~1704489~1704480~704925680_TOTAL->MANUAL->TT IOR GSB
179341~1704505~1704480~-572275295_TOTAL->MANUAL->MTO
179341~1704481~1704480~475919266_TOTAL->MANUAL->COM
179341~1704497~1704480~775116444_TOTAL->POO TO->TT
179341~1704507~1704480~-611408278_TOTAL->PRIE LEL

"the last record" - even if I translate "record" to "field", "MANUAL" would NOT be the last.

"group record based on the entire row" - what does this mean? Is that bespoke subfield the key, or the entire row?

oops it is not last record it is second last . My bad.

When I mean entire row is the the record group needs to with ID as shown in example. I need to ID to be preserved with the grouping. I the example below the ID associated with the records needs be stick to it where the record moved based on grouping

179341~1704489~1704480~704925680_TOTAL->MANUAL->TT IOR GSB

Try

sed 's/[~>]/&#/g' file | sort -t"#" -k5,5 | sed 's/#//g'
179341~1704509~1704480~456690129_TOTAL->BTT TOO
179341~1704481~1704480~475919266_TOTAL->MANUAL->COM
179341~1704489~1704480~704925680_TOTAL->MANUAL->TT IOR GSB
179341~1704505~1704480~-572275295_TOTAL->MANUAL->MTO
179341~1704497~1704480~775116444_TOTAL->POO TO->TT
179341~1704507~1704480~-611408278_TOTAL->PRIE LEL
1 Like

Works with the example. Not sure why it is not doing on the below file

~  179341~1704503~1704480~1818623254_TOTAL ->MANUAL ->COL 
~ 179341~1704504~1704480 ~1883140245_TOTAL ->MANUAL ->PRIV
~ 179341~1704505~1704480 ~1978359756_TOTAL   
~ 179341~1704506~1704480 ~2058772066_TOTAL ->MANUAL ->ORAL-B PRO HEALTH GS B  
~ 179341~1704507~1704480 ~2138579186_TOTAL ->PPNNH
~ 179341~1704508~1704480 ~257669199_TOTAL ->POWER TOOTHBRUSH->PPNNH   
~ 179341~1704509~1704480 ~359869040_TOTAL ->RRRLOP
~ 179341~1704510~1704480 ~361375824_TOTAL ->POWER TOOTHBRUSH->AO BRANDS  
~ 179341~1704511~1704480 ~372808786_TOTAL ->MANUAL ->REACH 
~ 179341~1704512~1704480 ~456690129_TOTAL ->BATTERY TOOTHBRUSH   
~ 179341~1704513~1704480 ~475919266_TOTAL ->MANUAL ->COL KIDS GSB  
~ 179341~1704514~1704480 ~481202389_TOTAL ->MANUAL ->IRRR COMPLETE GSB   
~ 179341~1704515~1704480 ~655296953_TOTAL ->POWER TOOTHBRUSH->COL
~ 179341~1704516~1704480 ~704925680_TOTAL ->MANUAL ->IRRR INDICATOR GSB  
~ 179341~1704517~1704480 ~775116444_TOTAL ->POWER TOOTHBRUSH->IRRR 
~ 179341~1704518~1704480 ~939829720_TOTAL ->REFILL HEADS 
~ 179341~1704480~  0 ~TOTAL CCCC 
~ 179341~1704481~1704480 ~-1235133484_TOTAL ->MANUAL ->IRRR PULSAR GSB   
~ 179341~1704482~1704480 ~-1308801667_TOTAL ->MANUAL ->IRRR  
~ 179341~1704483~1704480 ~-1366207571_TOTAL ->POWER TOOTHBRUSH->PRIV
~ 179341~1704484~1704480 ~-1477892976_TOTAL ->POWER TOOTHBRUSH->IOPP SONICARE 
~ 179341~1704485~1704480 ~-1565345926_TOTAL ->MANUAL   
~ 179341~1704486~1704480 ~-1600570243_TOTAL ->MANUAL ->AO BRANDS   
~ 179341~1704487~1704480 ~-172680236_TOTAL ->BATTERY TOOTHBRUSH->PPNNH
~ 179341~1704488~1704480 ~-1838093819_TOTAL ->MANUAL ->FIREFLY 
~ 179341~1704489~1704480 ~-1903630442_TOTAL ->COL
~ 179341~1704490~1704480 ~-2042984783_TOTAL ->POWER TOOTHBRUSH   
~ 179341~1704491~1704480 ~-2137023982_TOTAL ->MANUAL ->P&G KIDS GSB
~ 179341~1704492~1704480 ~-258140860_TOTAL ->MANUAL ->COL  
~ 179341~1704493~1704480 ~-408734766_TOTAL ->REFILL HEADS->PPNNH  
~ 179341~1704494~1704480 ~-434977163_TOTAL ->POWER TOOTHBRUSH->ORPRR
~ 179341~1704495~1704480 ~-541283452_TOTAL ->MANUAL ->CREST
~ 179341~1704496~1704480 ~-572275295_TOTAL ->MANUAL ->PPNNH 
~ 179341~1704497~1704480 ~-611408278_TOTAL ->PRIV   
~ 179341~1704498~1704480 ~-766208436_TOTAL ->MANUAL ->IRRR 3DW GSB   
~ 179341~1704499~1704480 ~1321224044_TOTAL ->RECHARGEABLE TOOTHBRUSH 
~ 179341~1704500~1704480 ~1452856872_TOTAL ->RECHARGEABLE TOOTHBRUSH->PPNNH  
~ 179341~1704501~1704480 ~1684494481_TOTAL ->MANUAL ->COL MAX GSB  
~ 179341~1704502~1704480 ~1733685778_TOTAL ->MANUAL ->COL 360 GSB  

---------- Post updated at 07:06 PM ---------- Previous update was at 06:06 PM ----------

Again my mistake the ~ at start caused the issue. It worked..Thanks a million

For the future - PLEASE

  • provide representative samples - AND STICK TO THEM!
  • provide a detailed and specific description of your problem - DON'T reference ID not explaining what that is, explain what you need (sorting / grouping / whatever) EVEN IF IT TAKES A FEW MORE LINES OF TEXT.
  • Don't let helpers guess what you want.

I recognize people in here are struggling with the "language barrier" - I do so myself - but that shouldn't keep them from telling necessary details from gaping info holes from blabber.