Hello,
I have two files.
File 1 is a list of IDs of interest:
Ex1
Ex2
Ex3
File 2 is the original file, with over 8000 columns and 20 million rows. It is gzip-compressed (.gz):
Ex1 xx xx xx xx ....
Ex2 xx xx xx xx ....
Ex2 xx xx xx xx ....
Now I need to extract the rows of File 2 for all the IDs of interest listed in File 1. I have a script that should do that:
# Written for Python 2.7.
import argparse
import gzip

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--file', action='store', dest='file', help="FILE2")
    parser.add_argument('--IDs', action='store', dest='ids', help='FILE1')
    parser.add_argument('--header', action='store_true', dest='header', help='TRUE or FALSE')
    args = parser.parse_args()

    file = gzip.open(args.file, 'rb')

    # Load the IDs of interest into a set for fast membership tests.
    idfile = open(args.ids, 'r')
    if args.header:
        next(idfile)  # skip the header line of the ID file
    ids = set(s.rstrip() for s in idfile)
    idfile.close()

    oname = args.file[:-7] + 'result.txt'
    o = open(oname, 'w')
    o.write(next(file))  # copy the header line of FILE2
    for l in file:
        # Only the first tab-separated field (the ID) is needed.
        if l.split('\t', 1)[0].rstrip() in ids:
            o.write(l)
    o.close()
    file.close()
But I get an error that I don't understand, since this script was used on the same file before and it worked. I'm not sure what is going on here. Can anyone help?
File "extract.py", line 24, in <module>
for l in file:
File "/usr/lib64/python2.7/gzip.py", line 450, in readline
c = self.read(readsize)
File "/usr/lib64/python2.7/gzip.py", line 256, in read
self._read(readsize)
File "/usr/lib64/python2.7/gzip.py", line 307, in _read
uncompress = self.decompress.decompress(buf)
zlib.error: Error -3 while decompressing: invalid block type
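The zlib error above is raised while decompressing, not by the ID matching, so the compressed file itself may have been damaged or truncated since the last successful run. A minimal sketch for checking that, assuming the file can simply be read end to end: the function name gzip_is_readable and the chunk size are made up for illustration and are not part of the original script.

```python
import gzip
import zlib


def gzip_is_readable(path, chunk_size=1024 * 1024):
    """Try to decompress the whole file in chunks.

    Returns (ok, bytes_read). ok is False if decompression fails
    at any point; bytes_read is how much was decompressed before that.
    """
    total = 0
    try:
        with gzip.open(path, 'rb') as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (IOError, EOFError, zlib.error):
        # IOError: not a gzip file or bad CRC; EOFError: truncated file;
        # zlib.error: corrupt compressed data (as in the traceback above).
        return False, total
    return True, total
```

Calling gzip_is_readable on File 2 and getting False would point to a corrupt or incomplete copy of the file rather than a bug in the extraction loop. Running gzip -t on the file from the shell is a comparable check.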