How to extract a subset file from a dataset?

sajmar · September 4, 2013, 10:50am

Hello,
I have a data set that looks like this:

progeny   sire   dam   gender
12        1      3     M
13        2      4     F
14        2      5     F
15        6      5     M

I need to split this data by gender (M and F) into two files.
I want something like this:
File 1 output:

progeny   sire   dam   gender
13        2      4     F
14        2      5     F

File 2 output:

progeny   sire   dam   gender
12        1      3     M
15        6      5     M

Thanks

Corona688 · September 4, 2013, 10:59am

awk 'NR==1 { print > "M" ; print > "F"; next }
{ print > $4 }' inputfile

sajmar · September 4, 2013, 12:28pm

@ Corona
Thanks for your suggestion. However, this command does not solve my problem.

Corona688 · September 4, 2013, 12:30pm

In what way did it not solve your problem? Be specific or I won't know what problem to fix.

sajmar · September 4, 2013, 12:37pm

@ Corona
To clarify my problem, I have a data set:

progeny   sire   dam   gender
12        1      3     M
13        2      4     F
14        2      5     F
15        6      5     M

I want a subset of the data selected by gender, which looks like this:

progeny   sire   dam   gender
13        2      4     F
14        2      5     F

Corona688 · September 4, 2013, 12:49pm

That is what my suggestion does, yes.

In what way does it not work for you? Be specific. What exactly did you do, and what precisely happened?

sajmar · September 4, 2013, 1:19pm

@ Corona:
When I run the program, it gives me an empty file.

awk 'NR==1 { print > "M" ; print > "F"; next }{ print > $4 }' aa > bb

Corona688 · September 4, 2013, 1:29pm

I did not include an output file in my instructions because it would be empty: the command never writes to standard output.

Check for the files 'M' and 'F' in the same directory; they will not be empty.
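
As a minimal sketch of the whole run, assuming the sample data from the first post is saved as `aa` (the name used above) and that a throwaway scratch directory is acceptable, the command can be tried like this:

```shell
# Work in a scratch directory so the output files M and F do not clobber anything.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample input copied from the first post.
cat > aa <<'EOF'
progeny sire dam gender
12 1 3 M
13 2 4 F
14 2 5 F
15 6 5 M
EOF

# The header line goes to both files; every other line goes to the file
# named by its fourth column. Nothing is written to standard output.
awk 'NR==1 { print > "M" ; print > "F"; next }
{ print > $4 }' aa

cat F
# prints:
# progeny sire dam gender
# 13 2 4 F
# 14 2 5 F
```

Because awk opens the files itself, adding `> bb` to the end of the command only creates an empty `bb`.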

sajmar · September 4, 2013, 1:38pm

When I ran the program I got the M and F files, but there is only one line in them.
My data set has more lines than the example: 2600 lines, each with gender M or F. What I want is to split the data set into two files, one for gender M and one for gender F.

Corona688 · September 4, 2013, 1:59pm

That is what my example does, yes. It writes to different file names depending on what the value of the fourth column is.

If the fourth column isn't what you showed it to be in your example data, it won't do what I expect. Check the contents of your folder with 'ls'; it may have created files with odd names.
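
One rough way to check what the columns really contain, sketched on a hypothetical five-column file named `inputfile` (the real data is not reproduced in this thread, so the rows below are made up for illustration):

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical input: an extra leading column pushes gender into field 5.
cat > inputfile <<'EOF'
1 12 1 3 M
2 13 2 4 F
3 14 2 5 F
4 15 6 5 M
EOF

# How many fields does each line have?
awk '{ print NF }' inputfile | sort | uniq -c

# Which values appear in the fourth field, and which in the last field?
awk '{ print $4 }' inputfile | sort | uniq -c
awk '{ print $NF }' inputfile | sort | uniq -c
```

If the fourth field turns out to hold numbers rather than M or F, `print > $4` will have written files named after those numbers.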

Could you show a more complete example of your input data please?

sajmar · September 4, 2013, 2:04pm

I have posted my data set, which I want to split by gender (M and F) into two separate files.

Corona688 · September 4, 2013, 2:09pm

The data you posted clearly shows M/F in the fifth column, not the fourth.

Also, the data you posted has no header row, which your original data did. I can simplify my code a lot knowing it's not there.

awk '{ print > $5 }' inputfile
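
A slightly more defensive variant, offered only as a sketch: it assumes the fifth field is meant to be exactly M or F and skips anything else instead of creating a file with an unexpected name. The sample rows, including the deliberately bad last one, are hypothetical.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical five-column input without a header row.
cat > inputfile <<'EOF'
1 12 1 3 M
2 13 2 4 F
3 14 2 5 F
4 15 6 5 M
5 16 6 5 ?
EOF

# Write a line only when its fifth field is exactly M or F;
# report every other line on standard error.
awk '$5 == "M" || $5 == "F" { print > $5; next }
     { print "skipped line " NR ": " $0 > "/dev/stderr" }' inputfile
```

With this guard, only the files `M` and `F` can be created, whatever the fifth column holds.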

briandanielz · September 8, 2013, 5:13am

This is crude, but it seems to work.
It assumes that M or F appears only once on each line
and is separated by whitespace.

while IFS= read -r line
do
    # Append each line to the file named after the letter it contains.
    if [[ $line == *M* ]]; then
        echo "$line" >> M
    fi
    if [[ $line == *F* ]]; then
        echo "$line" >> F
    fi
done < file

w020637 · September 10, 2013, 11:57am

The solution works

grep M aa.txt > M
grep F aa.txt > F

This will get you what you need.
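
One caveat: a plain `grep M` matches the letter anywhere on the line. A possible tightening, assuming gender is the last whitespace-separated token on each line, is to anchor the pattern to the end of the line. The rows below are hypothetical and use IDs that contain letters to show the difference.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical rows: the first column also contains M or F.
cat > aa.txt <<'EOF'
M101 1 3 M
F102 2 4 F
M103 2 5 F
EOF

# Match the letter only when it is the last token on the line
# (optionally followed by trailing whitespace).
grep -E '[[:space:]]M[[:space:]]*$' aa.txt > M
grep -E '[[:space:]]F[[:space:]]*$' aa.txt > F
```

Here the unanchored `grep M aa.txt` would also pick up the `M103` row, which belongs in `F`.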