I have a huge CSV file with data in the following format:
[HEADER]
Num SNPs, 549997
Total SNPs,555352
Num Samples, 157
[Data]
SNP, SampleID, Allele1, Allele2
A001,AB1,A,A
A002,AB1,A,A
A003,AB1,A,A
...
...
...
I would like to write out a list of the unique SNPs (column 1). Could you let me know how to do this with a UNIX command? Do I need to convert the CSV file to a text file first?
I get the correct number of unique "SampleID" values, but not "SNP". I wonder why it didn't work for "SNP" (column 1).
I used
$ cat abc.csv | cut -f1 -d , | uniq
to get the list of unique "SNP" values, and
$ cat abc.csv | cut -f2 -d , | uniq
to get the list of unique "SampleID" values.
I have a total of 86,349,539 rows in the CSV file. It is supposed to have 549,997 unique SNPs, but the count turned out to be 86,349,539, which is the same as the total number of rows in the file.
Again, I get the correct number of unique SampleIDs, which is 167.
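For reference, the behaviour can be reproduced on a tiny made-up file. This is only a sketch under assumptions: the rows are grouped by SampleID as in the excerpt above, the header block is exactly six lines long, and the temporary file and its contents are hypothetical. `uniq` collapses only adjacent duplicate lines, so a column whose repeats are not next to each other has to be sorted first (for example with `sort -u`).

```shell
# Hypothetical miniature of the file: 3 SNPs x 2 samples, grouped by SampleID.
f=$(mktemp)
cat > "$f" <<'EOF'
[HEADER]
Num SNPs, 3
Total SNPs,3
Num Samples, 2
[Data]
SNP, SampleID, Allele1, Allele2
A001,AB1,A,A
A002,AB1,A,A
A003,AB1,A,A
A001,AB2,A,A
A002,AB2,A,A
A003,AB2,A,A
EOF

# Skip the six header lines (assumed), take column 1, then apply uniq alone.
# Repeated SNPs are never adjacent here, so nothing is collapsed.
naive=$(tail -n +7 "$f" | cut -d, -f1 | uniq | wc -l | tr -d ' ')

# Sorting first puts equal values next to each other; sort -u then keeps one of each.
unique=$(tail -n +7 "$f" | cut -d, -f1 | sort -u | wc -l | tr -d ' ')

echo "uniq alone: $naive, sort -u: $unique"
rm -f "$f"
```

The same pipeline with `-f2` counts the SampleID column; `uniq` alone happens to work there only because identical SampleIDs sit on consecutive rows. No conversion of the CSV file is needed, since it is already plain text.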