How to target a certain delimiter to split a text file?

huiyee1 · June 24, 2015, 2:43am

Hi, all.

I have an input file. I would like to generate 3 types of output files.

Input:

LG10_PM_map_19_LEnd_1000560
LG10_PM_map_6-1_27101856
LG10_PM_map_71_REnd_20597718
LG12_PM_map_5_chr_118419232
LG13_PM_map_121_24341052
LG14_PM_1a_456799
LG1_MM_scf_5a_opt_abc_9029993

Output_file_1 (replace the last occurrence of the delimiter "_" with a tab):

LG10_PM_map_19_LEnd	1000560
LG10_PM_map_6-1	27101856
LG10_PM_map_71_REnd	20597718
LG12_PM_map_5_chr	118419232
LG13_PM_map_121	24341052
LG14_PM_1a	456799
LG1_MM_scf_5a_opt_abc	9029993

Output_file_2 (replace the first occurrence of the delimiter with a tab):

LG10	PM_map_19_LEnd_1000560
LG10	PM_map_6-1_27101856
LG10	PM_map_71_REnd_20597718
LG12	PM_map_5_chr_118419232
LG13	PM_map_121_24341052
LG14	PM_1a_456799
LG1	MM_scf_5a_opt_abc_9029993

Output_file_3 (replace the second occurrence of the delimiter with a tab):

LG10_PM	map_19_LEnd_1000560
LG10_PM	map_6-1_27101856
LG10_PM	map_71_REnd_20597718
LG12_PM	map_5_chr_118419232
LG13_PM	map_121_24341052
LG14_PM	1a_456799
LG1_MM	scf_5a_opt_abc_9029993

Thanks in advance.

Don_Cragun · June 24, 2015, 4:02am

Is this a homework assignment?

bakunin · June 24, 2015, 4:32am

And, if I may be so bold as to add, what have you tried so far?

bakunin

huiyee1 · June 24, 2015, 5:48am

I have tried a few commands, but they involve several separate steps.

To generate the first output file:

cat input | rev | cut -d"_" -f1 | rev > last_field  # this generates a file containing the last field
cat input | rev | cut -d"_" -f2- | rev > without_last_field  # this generates a file containing all fields except the last one
paste -d"\t" without_last_field last_field > output_1

To generate the second output file:

cat input | cut -d"_" -f1 > first_field  # this generates a file containing the first field
cat input | cut -d"_" -f2- > without_first_field  # this generates a file containing all fields except the first one
paste -d"\t" first_field without_first_field > output_2

To generate the third output file:

cat input | cut -d"_" -f1,2 > first_second_field  # this generates a file containing the first and second fields
cat input | cut -d"_" -f3- > without_first_second_field  # this generates a file containing all fields except the first and second ones
paste -d"\t" first_second_field without_first_second_field > output_3

Are there any improved one-liners to generate the above output files?

Thanks.

---------- Post updated at 04:48 AM ---------- Previous update was at 04:46 AM ----------

This is not an assignment. I am learning Linux by myself, and I thought I might face a similar situation in the future. I have come up with a few solutions, but they are rather complicated.

bakunin · June 24, 2015, 9:37am

That is OK. We want to help people help themselves. This is why we ask what they have done (even if it didn't work): so that we can show them where they went wrong.

Furthermore, we have a special "Homework and Coursework" forum, because we do help students too. The difference is that special rules apply there and we (try to) help in a different way, so that the student gets the most educational value out of our help. That was the background of Don Cragun's question and mine.

Note that you usually do not need "cat" to generate a stream. If you look at the man page of "rev", you will notice the following (this is taken from an AIX man page; yours might look slightly different):

rev Command

Purpose

       Reverses characters in each line of a file.

Syntax

       rev [ File ... ]

This means the following two lines do the same thing, but the second one uses one command ("cat") fewer, which is why it is preferable:

cat /path/to/file | rev
rev /path/to/file
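Applied to the first of huiyee1's attempts, the output_1 steps can be run without "cat" as follows. This is a sketch: the file names (input, last_field, without_last_field, output_1) follow the earlier post, the three sample lines are taken from the question, and "rev" is assumed to be installed (it ships with util-linux on most Linux systems).

```shell
#!/bin/sh
# Three sample lines from the question, written to a file named "input"
printf '%s\n' LG10_PM_map_19_LEnd_1000560 LG10_PM_map_6-1_27101856 LG14_PM_1a_456799 > input

# rev reads the file itself, so no "cat" is needed at the start of the pipeline
rev input | cut -d"_" -f1 | rev > last_field           # last field only
rev input | cut -d"_" -f2- | rev > without_last_field  # everything before the last "_"

# paste joins the two files line by line, separated by a tab
paste without_last_field last_field > output_1
```

Since paste uses a tab as its default delimiter, the -d"\t" from the original post is not strictly needed here.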

If you search the internet for "useless use of cat", you will find many more examples of the same mistake; it is so common that it has become part of "UNIX culture".

As for your question about better one-liners: as a matter of fact, there are some. You might want to learn a bit of sed (see "man sed" for help) and look around here in the forum. Here is a link to an introductory article:

Regular expression introduction

sed ("stream editor") is a non-interactive text editor or, looking at it differently, a programmable text manipulation program. The most basic procedure for this is to look out for some pattern in a text and then manipulate it (delete or add parts, etc.).

Here is a simple sed program:

sed 's/abc/def/' /path/to/input > /path/to/output

It reads the file "/path/to/input", executes the program "s/abc/def/" on it and writes the result to the file "/path/to/output". The program itself does a "substitution" ("s") of the fixed string "abc" by the fixed string "def". On every line, only the first occurrence of "abc" is replaced. To replace every occurrence instead, add a "g" (global) to the end of the command:

sed 's/abc/def/g' /path/to/input > /path/to/output
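A quick way to see the difference the "g" flag makes is to run both commands on a line that contains "abc" twice (a minimal sketch; the one-line sample file is made up for illustration):

```shell
#!/bin/sh
# One input line that contains "abc" twice
printf 'abc_abc\n' > input

# Without "g": only the first "abc" on each line is replaced
sed 's/abc/def/' input > output_first     # contains: def_abc

# With "g": every "abc" on each line is replaced
sed 's/abc/def/g' input > output_global   # contains: def_def
```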

It should be easy to see how you could do the text manipulation you have in mind with such a substitution, provided you craft the search and replacement patterns correctly. Since your intention is to learn UNIX, I won't tell you outright what the solution is; you might want to try it yourself. If you have further questions, feel free to ask.
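For readers who want to check their own attempt afterwards, here is one possible set of substitutions for the three output files (a sketch, not the only way; the sample lines come from the question). It relies on two standard sed features: a greedy ".*" in the search pattern, and a numeric flag such as "2" that selects which match to replace. The literal tab is kept in a shell variable named "tab" (a name chosen here for illustration) because "\t" in the replacement is a GNU sed extension.

```shell
#!/bin/sh
# Two sample lines from the question
printf '%s\n' LG10_PM_map_19_LEnd_1000560 LG1_MM_scf_5a_opt_abc_9029993 > input

# A literal tab character, so the scripts do not depend on GNU sed's "\t"
tab=$(printf '\t')

# Output 1: ".*" is greedy, so "\(.*\)_" matches up to the LAST "_";
# \1 puts back everything before it, followed by a tab
sed "s/\(.*\)_/\1$tab/" input > output_1

# Output 2: without a flag, s replaces only the FIRST match on each line
sed "s/_/$tab/" input > output_2

# Output 3: the numeric flag 2 replaces only the SECOND match on each line
sed "s/_/$tab/2" input > output_3
```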

I hope this helps.

bakunin

RudiC · June 24, 2015, 10:14am

On top of what bakunin said, you could use the shell's parameter expansion (e.g. "remove matching prefix/suffix pattern") to achieve these goals.
And, yes, there is a sed one-liner to produce all three output files (at least with GNU sed).
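The parameter-expansion route could look roughly like this. It is a minimal sketch, assuming a POSIX shell and that every line contains at least two "_" characters (lines with fewer would not be split as intended); the variable names line and rest are chosen for illustration only.

```shell
#!/bin/sh
# Two sample lines from the question
printf '%s\n' LG10_PM_map_6-1_27101856 LG14_PM_1a_456799 > input

# IFS= keeps leading/trailing whitespace, -r keeps backslashes literal
while IFS= read -r line; do
    # Output 1: ${line%_*} drops the shortest suffix matching "_*" (the last field),
    #           ${line##*_} drops the longest prefix matching "*_" (all but the last field)
    printf '%s\t%s\n' "${line%_*}" "${line##*_}" >&3

    # Output 2: split at the first "_"
    printf '%s\t%s\n' "${line%%_*}" "${line#*_}" >&4

    # Output 3: split at the second "_"
    rest=${line#*_}    # everything after the first "_"
    printf '%s_%s\t%s\n' "${line%%_*}" "${rest%%_*}" "${rest#*_}" >&5
done < input 3> output_1 4> output_2 5> output_3
# File descriptors 3, 4 and 5 are opened once for the whole loop,
# so each output file is written in a single pass over the input.
```

A shell read loop like this is fine for small files; for large inputs, a single sed or awk invocation is usually much faster.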