Deleting specific rows in large files with more than 100,000 rows

manish2009 · December 11, 2009, 2:42am

Hi Guys,

I need help modifying a large text file containing more than 1-2 lakh (100,000-200,000) rows of data using Unix commands. I am quite new to Unix.

The text file contains data in a pipe-delimited format:

sdfsdfs
sdfsdfsd
START_ROW
sdfsd|sdfsdfsd|sdfsdfasdf|sdfsadf|sdfasdf
sdfsd|sdfsdfsd|sdfsdfasdf||sdfasdf
sdfsd||sdfsdfasdf|sdfsadf|sdfasdf
END_ROW
sdfsd
sdfsfsdf

I want to remove the header and the footer (including the START_ROW and END_ROW lines), so the final file would look like this:

sdfsd|sdfsdfsd|sdfsdfasdf|sdfsadf|sdfasdf
sdfsd|sdfsdfsd|sdfsdfasdf||sdfasdf
sdfsd||sdfsdfasdf|sdfsadf|sdfasdf

I tried various VB methods to do it. However, when I use them on large files, the program hangs and closes.

Thanks very much.

dennis.jacob · December 11, 2009, 2:46am

Try this with grep:

grep \| < infile >outfile

Yogesh_Sawant · December 11, 2009, 2:50am

If your data file contains only one START_ROW/END_ROW block:

sed -n -e '/START_ROW/,/END_ROW/ {p} ; /END_ROW/ q' file.txt > newfile.txt

manish2009 · December 11, 2009, 2:50am

Can I use the grep command to print lines that are longer than a specific length, say lines with length > 25?

dennis.jacob · December 11, 2009, 2:59am

Yes, but in this case performance will not be good, as the output of grep would have to be piped to another command that selects records longer than 25 characters.

In this context, sed would be faster:

sed -n '/|/ { /.\{26\}/p; }' < infile > outfile
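
The same length filter can also be sketched in awk, where the "longer than 25" condition is written as a plain comparison instead of a repetition count. This is only an illustration: the sample rows and the temporary file names are made up, and the 25-character line is included to show that the boundary case is excluded.

```shell
# Hypothetical sample data written to a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'header line without a pipe, longer than 25 chars' \
  'a|b|c' \
  'aaaaaaaaaaaa|bbbbbbbbbbbb' \
  'sdfsd|sdfsdfsd|sdfsdfasdf|sdfsadf|sdfasdf' \
  > "$tmpdir/infile"

# Keep lines that contain a pipe AND are strictly longer than 25 characters.
# index($0, "|") is non-zero when the line contains a literal pipe.
awk 'index($0, "|") && length($0) > 25' "$tmpdir/infile" > "$tmpdir/outfile"

cat "$tmpdir/outfile"
```

Like the sed version, this reads the file once in a single process, so no second command is needed in a pipeline.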

Scrutinizer · December 11, 2009, 3:01am

sed '1,/START_ROW/d;/END_ROW/,$d' infile

manish2009 · December 11, 2009, 3:20am

With the command below I am getting "sed: garbage after command", and a blank file is generated:

sed -n -e '/START_ROW/,/END_ROW/ {p} ; /END_ROW/ q' file.txt > newfile.txt

thanks
Manish
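
A likely cause of "garbage after command" is that some older sed implementations do not accept a closing } on the same line as the command before it, as in {p}. A minimal sketch that avoids this puts each command in the block on its own line. It also drops the START_ROW and END_ROW lines themselves, which the one-liner above would print. The sample data and file names here are hypothetical.

```shell
# Hypothetical sample file with a header, one data block, and a footer.
tmpdir=$(mktemp -d)
printf '%s\n' 'sdfsdfs' 'START_ROW' 'a|b|c' 'd||f' 'END_ROW' 'sdfsd' \
  > "$tmpdir/file.txt"

# -n suppresses automatic printing; inside the START_ROW..END_ROW range:
#   skip the START_ROW line, stop reading at END_ROW, print everything else.
sed -n '/START_ROW/,/END_ROW/{
/START_ROW/d
/END_ROW/q
p
}' "$tmpdir/file.txt" > "$tmpdir/newfile.txt"

cat "$tmpdir/newfile.txt"
```

Quitting at END_ROW means sed does not read the rest of the file, which helps when the footer is long.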

ghostdog74 · December 11, 2009, 3:46am

awk '/END/{f=0}/START/{f=1;next}f' file
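
The awk one-liner above works with a flag that is switched on after a start marker and off at an end marker, so it also copes with files containing several blocks. A spelled-out sketch of the same idea follows. It compares whole lines to the markers rather than using /START/ and /END/, on the assumption that the markers always stand alone on a line; the variable name inside and the sample data are illustrative only.

```shell
# Hypothetical sample file with two START_ROW/END_ROW blocks.
tmpdir=$(mktemp -d)
printf '%s\n' 'junk' 'START_ROW' 'a|b' 'END_ROW' 'junk' 'START_ROW' 'c|d' 'END_ROW' \
  > "$tmpdir/file"

awk '
  $0 == "END_ROW"   { inside = 0 }        # leaving a block: stop printing
  $0 == "START_ROW" { inside = 1; next }  # entering a block: skip the marker
  inside                                  # print lines while the flag is set
' "$tmpdir/file" > "$tmpdir/blocks.txt"

cat "$tmpdir/blocks.txt"
```

Matching the whole line avoids accidentally toggling the flag when a data row merely contains the text START or END.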

manish2009 · December 11, 2009, 5:48am

The code below is working fine.

sed '1,/START-OF-DATA/d;/END-OF-DATA/,$d' corp_pfd_asia.out > corp_pfd_asiaout.txt

However, there is a slight problem. The header of the data ends with "# PRODUCT=Corp/Pfd" after START-OF-DATA. When I use "# PRODUCT=Corp/Pfd" instead of "START-OF-DATA", it gives an error, I think because of the "/":

sed '1,/# PRODUCT=Corp/Pfd/d;/END-OF-DATA/,$d' corp_pfd_asia.out > corp_pfd_asiaout.txt

Thank you very much
Manish

Scrutinizer · December 11, 2009, 12:17pm

Hi manish2009,
You will have to escape the forward slash with a backslash, like this: \/

sed '1,/# PRODUCT=Corp\/Pfd/d;/END-OF-DATA/,$d' corp_pfd_asia.out > corp_pfd_asiaout.txt
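
An alternative to escaping each slash is to pick a different delimiter for the address: in sed, \%regex% treats % as the delimiter, so / needs no escaping inside the pattern. The sketch below assumes % never occurs in the pattern; the sample file and its contents are invented for illustration.

```shell
# Hypothetical sample file mimicking the layout described in the thread.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'some header' \
  'START-OF-DATA' \
  '# PRODUCT=Corp/Pfd' \
  'a|b|c' \
  'd||f' \
  'END-OF-DATA' \
  'some trailer' \
  > "$tmpdir/corp.out"

# Delete from line 1 through the PRODUCT line, and from END-OF-DATA to the end.
# \%...% uses % as the regex delimiter, so the / in Corp/Pfd stays unescaped.
sed '1,\%# PRODUCT=Corp/Pfd%d;/END-OF-DATA/,$d' "$tmpdir/corp.out" \
  > "$tmpdir/corp_out.txt"

cat "$tmpdir/corp_out.txt"
```

This is handy when the pattern holds several slashes, such as a file path, where backslash escapes become hard to read.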