A command to split a file into two based on a string

Hello

What command can I use to split a tab-delimited txt file into two files based on the occurrence of a string?

my file name is EDIT.txt

The content of file is below

XX 1234 PROCEDURES
XY 1634 PROCEDURES
XM 1245 CODES
XZ 1256 CODES

It has more than a million records.
If a row contains PROCEDURES, I want to output it to PROCEDURES.txt; otherwise to CODES.txt.

How would I use an awk or split command?

Thanks for your help

Using awk:

awk '/PROCEDURES/ { print > "PROCEDURES.TXT"; next} { print > "CODES.TXT" }' infile

Hi madrazzii,
Check this out:

awk '{if($0~/PROCEDURES/) print >"PROCEDURES.TXT" ; if ($0~/CODES/) print >"CODES.txt" }' EDIT.txt
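
A minimal sketch for trying the commands above on a throwaway copy of the sample rows. The split_demo directory and the four-line input are assumptions made for illustration; they are not the real EDIT.txt.

```shell
# Scratch directory (hypothetical name) so no real files are overwritten.
mkdir -p split_demo

# Recreate the four tab-delimited sample rows from the question.
printf 'XX\t1234\tPROCEDURES\nXY\t1634\tPROCEDURES\nXM\t1245\tCODES\nXZ\t1256\tCODES\n' > split_demo/EDIT.txt

# Rows containing PROCEDURES go to one file; every other row goes to the other.
awk '/PROCEDURES/ { print > "split_demo/PROCEDURES.txt"; next }
     { print > "split_demo/CODES.txt" }' split_demo/EDIT.txt

# Show how many rows landed in each file.
wc -l split_demo/PROCEDURES.txt split_demo/CODES.txt
```

Because awk reads the input only once, this should scale to the million-record file better than running two separate passes.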

Thank you both.

I was using two separate commands to do this:

sed '/PROCEDURES/d' EDIT.txt > codes.txt

grep "PROCEDURES" EDIT.txt > Procedures.txt

But I will use the awk one now.

Thanks

J

If you want to stick with the two-command approach, you would be better off with:

grep "PROCEDURES" EDIT.txt > procedures.txt
grep -v "PROCEDURES" EDIT.txt > codes.txt

Robin


Another question I had: I have a file without a file extension that can be opened in Notepad. The file is 300 MB in size. It basically has multiple data sets in one file, and I want to extract each data set into a txt file.

The remarks that identify each data set are T0, P1, P2, P3, P4, P5, P6 and T9, and they appear at the END of each record. I want this file split into 8 different files, where file 1 has only the records with T0, file 2 those with P1, and so on. There might be T0, P1 etc. remarks in the middle of a line, but the criterion for extraction should be that the remark is at the end of the row/line/record in the source file.

The source file name is RAW. Is there a grep command, or any other command, where I could use an IF-THEN-ELSE or a CASE statement?

awk '{print > ($3 ".txt")}' EDIT.txt
awk '{print > ($NF ".txt")}' RAW

If you know you only have a few different record types, you could try something like this:

awk '{ print > ("file." substr($0,length-1)) }' RAW

Otherwise try:

awk '
/T0$/ { print > "file1" ; next }
/P1$/ { print > "file2" ; next }
/P2$/ { print > "file3" ; next }
/P3$/ { print > "file4" ; next }
{ print > "file.UNKNOWN" }' RAW
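
Since the question lists eight markers, the explicit rules above can also be collapsed into a single pattern. This is only a sketch, assuming the marker is always the last two characters of the line; the marker_demo directory and the sample records are hypothetical. The parentheses around the file name keep the concatenation unambiguous across awk implementations.

```shell
# Scratch directory (hypothetical name) for the demo files.
mkdir -p marker_demo

# Hypothetical sample: markers at end of line, one stray P1 mid-line, one unknown record.
printf 'alpha P1 beta T0\ngamma P1\ndelta T9\nepsilon XX\n' > marker_demo/RAW

# Match a known marker only at the end of the line; route anything else to file.UNKNOWN.
awk '
/(T0|P[1-6]|T9)$/ { print > ("marker_demo/file" substr($0, length($0) - 1) ".txt"); next }
{ print > "marker_demo/file.UNKNOWN" }' marker_demo/RAW

# List the files that were created.
ls marker_demo
```

The first sample record contains P1 in the middle but ends in T0, so it should land in fileT0.txt rather than fileP1.txt.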

Thank you. I used the first method and changed it a little:

awk '{print > "file.txt" substr($0,length-2) }' RAW

I used length-2 to get the T0, P1 etc. appended to the output file names, and also added .txt hoping I would get fileT0.txt, fileP1.txt, but I get file.txtT0, file.txtP1.

Is there a way to have it saved with a .txt extension when it outputs?

Thanks

awk '{print > ("file" $NF ".txt")}' RAW

Works perfectly. Thanks again.

---------- Post updated at 04:54 PM ---------- Previous update was at 04:43 PM ----------

Sorry to bother you again, but when I use the ls command, the file is listed as fileT0?.txt. It opens on my Ubuntu machine, but when I try to copy it to a Windows box it doesn't open, because the file name is displayed as fileT0 .txt (there is a space). I am not able to rename it or copy it.

Any help?

I suspect your RAW file has \r\n at the end of each line (typical of txt files created in MS Windows Notepad). You have to remove the \r from the original file. Try (not tested):

tr '\r\n' '\n' RAW > RAW.1

and retry the awk script with RAW.1

The file was received from a client, and they said it came from a z/OS system. I tried the code, but it says 'tr - extra operand' and the output file is 0 bytes.

---------- Post updated at 05:21 PM ---------- Previous update was at 05:15 PM ----------

I opened the raw file in Notepad++ and it has [CR][LF] at the end of each line, like:

..........................T0[CR][LF]
..........................P1[CR][LF]
..........................P2[CR][LF]
..........................P2[CR][LF]
..........................P2[CR][LF]

tr does not work that way. It translates individual characters, not strings, so '\r\n' '\n' would not turn the two-character sequence into a single newline. It also reads only from standard input, which is why passing RAW as an argument gives the 'extra operand' error.

This should work:

tr -d '\r' < input > output
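
To see the effect of the tr command above, here is a small sketch on a hypothetical two-line input. The crlf_demo directory and its contents are made up for illustration.

```shell
# Scratch directory (hypothetical name) for the demo files.
mkdir -p crlf_demo

# Hypothetical two-line input with Windows-style CR LF line endings.
printf 'row one T0\r\nrow two P1\r\n' > crlf_demo/input

# tr reads standard input only, so the file is supplied with a redirect.
tr -d '\r' < crlf_demo/input > crlf_demo/output

# Inspect the bytes: the output should show \n but no \r.
od -c crlf_demo/output
```

Running od -c on the original file first is also a quick way to confirm that the stray character really is a carriage return.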

You could also just have awk ignore the extra char on the end like this:

awk '{print > "file.txt" substr($0,length-2,2) }' RAW

Thanks Corona688 for your input !
Here is another solution using AWK:

~/unix.com$ awk '{gsub("\r","");print > "file"$NF".txt"}' RAW

Thank you all for your replies. I will try them and let you know. Thanks again.

---------- Post updated at 11:58 AM ---------- Previous update was at 10:09 AM ----------

I got it without the space in the file name, but since the "\r" (carriage return) is removed, my rows are all jumbled up. When I view the file in Notepad++ I see only [LF] at the end of each line instead of [CR][LF]. This causes multiple rows to appear on one line. I was going to import the file into SQL Server, but that would cause an error without a correct line terminator like [CR][LF].

Is there a way to add to the code so that the carriage return is kept?
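
One possible way to keep the [CR][LF] endings while still getting clean file names is to strip the carriage return from a copy of the last field only. This is a sketch under assumptions: the keepcr_demo directory and sample records are hypothetical, and the variable name key is made up for illustration.

```shell
# Scratch directory (hypothetical name) for the demo files.
mkdir -p keepcr_demo

# Hypothetical CR LF input standing in for RAW.
printf 'first record T0\r\nsecond record P1\r\nthird record T0\r\n' > keepcr_demo/RAW

# Copy the last field, strip the trailing CR from the copy only, and use the copy in the file name.
# $0 is left untouched, so each output line still ends in CR LF.
awk '{ key = $NF; sub(/\r$/, "", key); print > ("keepcr_demo/file" key ".txt") }' keepcr_demo/RAW

# List the output files and check the line ending of one of them.
ls keepcr_demo
od -c keepcr_demo/fileP1.txt
```

With this approach the output files come out named fileT0.txt, fileP1.txt and so on directly, so no renaming step should be needed afterwards.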

---------- Post updated at 12:18 PM ---------- Previous update was at 11:58 AM ----------

Note: I used Chuber's code and renamed the file extensions. It takes some time, but I got what I need now. Thanks.

Code:

awk '{print > "file.txt" substr($0,length-2,2) }' RAW