How to write a shell script to extract the lines we want

Hi,
I have a very large file. It contains lines in the format below:

seed url, html url
....
...
seed url, html url

I have already sorted it.
[sample lines unreadable: the pasted non-ASCII text was garbled when posted]

Now I want to get 3 html urls for each seed url.
Any tips will be appreciated.

---------- Post updated at 11:15 AM ---------- Previous update was at 11:11 AM ----------

Hi,
I have a very large file. It contains lines in the format below:

seedurl1, htmlurl1
seedurl1, htmlurl2
....
seedurl1,htmlurln
.....
seedurlm,htmlurl1
seedurlm,htmlurl2
.....
seedurlm,htmlurln
......

Now I want to get 3 htmlurls for each seedurl.
Any tips will be appreciated.

Your first post can't be read.

For your second post, do you want to get the output as below?

seedurl1, htmlurl1, htmlurl2, htmlurl3 (the first 3 urls for each seedurl)?
...
seedurlm, htmlurl1, htmlurl2, htmlurl3

For the second post, I want to get the output as:
seedurl1,htmlurl1
seedurl1,htmlurl2
seedurl1,htmlurl3
.....
seedurlm,htmlurl1
seedurlm,htmlurl2
seedurlm,htmlurl3
....

Thanks. Any ideas?

Your first post can't be read; all of the lines were converted to http links automatically. I guess you need to wrap CODE tags around the input file.

You can treat the first post and the second post as the same question, and ignore the first.
Just give tips about the second.
Thanks.

There is probably a better solution, but this one works:

$ cat urfile
seedurl1,htmlurl1
seedurl1,htmlurl2
seedurl1,htmlurl3
seedurl1,htmlurl4
seedurl1,htmlurl5
seedurl1,htmlurl6
seedurlm,htmlurl1
seedurlm,htmlurl2
seedurlm,htmlurl3
seedurlm,htmlurl4
seedurlm,htmlurl5

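$ awk -F , '{a[$1]=a[$1] FS $2}   # per seed url, build the string ",url1,url2,..."
            # split a[i], not a: b[1] is empty because the string starts with a comma
            END {for (i in a) {split(a[i],b,","); printf "%s,%s\n%s,%s\n%s,%s\n",i,b[2],i,b[3],i,b[4]}} ' urfile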
seedurlm,htmlurl1
seedurlm,htmlurl2
seedurlm,htmlurl3
seedurl1,htmlurl1
seedurl1,htmlurl2
seedurl1,htmlurl3

With your real data:

$ cat urfile1
http://2010.sina.com.cn,http://2010.sina.com.cn/2010-06-09/01528724.shtml
http://2010.sina.com.cn,http://2010.sina.com.cn/2010-06-09/03238769.shtml
http://2010.sina.com.cn,http://2010.sina.com.cn/2010-06-09/04448785.shtml
http://2010.sina.com.cn,http://2010.sina.com.cn/2010-06-09/05328842.shtml
http://2010.sina.com.cn,http://2010.sina.com.cn/2010-06-09/13359200.shtml
http://zzhz.zjol.com.cn,http://zzhz.zjol.com.cn/05zzhz/system/2010/06/10/016678515.shtml
http://zzhz.zjol.com.cn,http://zzhz.zjol.com.cn/05zzhz/system/2010/06/11/016678967.shtml
http://zzhz.zjol.com.cn,http://zzhz.zjol.com.cn/05zzhz/system/2010/06/11/016679056.shtml
http://zzhz.zjol.com.cn,http://zzhz.zjol.com.cn/05zzhz/system/2010/06/11/016679169.shtml
http://zzhz.zjol.com.cn,http://zzhz.zjol.com.cn/05zzhz/system/2010/06/11/016679553.shtml
http://zzhz.zjol.com.cn,http://zzhz.zjol.com.cn/05zzhz/system/2010/06/11/016679707.shtml

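$ awk -F , '{a[$1]=a[$1] FS $2}   # per seed url, build the string ",url1,url2,..."
            # split a[i], not a: b[1] is empty because the string starts with a comma
            END {for (i in a) {split(a[i],b,","); printf "%s,%s\n%s,%s\n%s,%s\n",i,b[2],i,b[3],i,b[4]}} ' urfile1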
http://zzhz.zjol.com.cn,http://zzhz.zjol.com.cn/05zzhz/system/2010/06/10/016678515.shtml
http://zzhz.zjol.com.cn,http://zzhz.zjol.com.cn/05zzhz/system/2010/06/11/016678967.shtml
http://zzhz.zjol.com.cn,http://zzhz.zjol.com.cn/05zzhz/system/2010/06/11/016679056.shtml
http://2010.sina.com.cn,http://2010.sina.com.cn/2010-06-09/01528724.shtml
http://2010.sina.com.cn,http://2010.sina.com.cn/2010-06-09/03238769.shtml
http://2010.sina.com.cn,http://2010.sina.com.cn/2010-06-09/04448785.shtml
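
Since the file is very large and already sorted, a streaming variant may be a better fit. It keeps only a counter and the previous seed url instead of holding every line in memory until END. This is only a sketch: the function name first_n_per_seed is made up for illustration, and it assumes all lines for one seed url are adjacent and that the seed url itself contains no comma.

```shell
# first_n_per_seed N [FILE...]
# Print the first N lines for each distinct first comma-separated field.
# Assumes the input is already grouped (for example sorted) by that field.
first_n_per_seed() {
    n=$1
    shift
    awk -F, -v n="$n" '
        $1 != prev { prev = $1; seen = 0 }   # new seed url: reset the counter
        ++seen <= n                          # print the line while within the limit
    ' "$@"
}

# Usage with sample lines like the ones in this thread, read from standard input:
printf '%s\n' \
    seedurl1,htmlurl1 seedurl1,htmlurl2 seedurl1,htmlurl3 seedurl1,htmlurl4 \
    seedurlm,htmlurl1 seedurlm,htmlurl2 seedurlm,htmlurl3 seedurlm,htmlurl4 |
    first_n_per_seed 3
```

This keeps the input order and prints fewer than 3 lines for a seed url that has fewer than 3 html urls, without padding the output with empty fields. If the file were not sorted, awk -F, '++count[$1] <= 3' urfile should give the same result, at the cost of one counter per seed url.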

thanks