PERL: Extract random records that have 4 lines each

phoeberunner · October 13, 2009, 12:46pm

Hi,

I have a data file with millions of records (N). Each record is stored on 4 lines, so the data file has N×4 lines in total.

For Example:

Host1
a
b
c
d
Host2
e
f
g
h
Host3
i
j
k
l

I would like to write a Perl script to extract 1000 random records without replacement (no record repeated), so the output file will have 4000 lines in total.

Could you help me with this?

Thanks,
Phoebe

sweetblood · October 13, 2009, 1:18pm

What have you done so far toward writing a Perl script for this? What problems are you having?

phoeberunner · October 13, 2009, 1:35pm

I have the code below, which randomly selects a number of records from a file (1 line per record only).
I'm thinking of modifying it so that, if the selected random number is 6, meaning record 6 is picked, it retrieves lines (5*4)+1 (which is 21) through 24.
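
The arithmetic generalizes: for record number k (counting from 1) with 4 lines per record, the first line is (k-1)*4+1 and the last is k*4. A minimal sketch of that mapping, where `record_line_range` is a hypothetical helper name used only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: map a 1-based record number to its first and last
# 1-based line numbers, assuming a fixed number of lines per record.
sub record_line_range {
    my ($record, $lines_per_record) = @_;
    my $first = ($record - 1) * $lines_per_record + 1;
    my $last  = $record * $lines_per_record;
    return ($first, $last);
}

my ($first, $last) = record_line_range(6, 4);
print "record 6 spans lines $first to $last\n";   # record 6 spans lines 21 to 24
```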

This is my first time writing a Perl script. Please help.

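#!/usr/bin/perl

die "Usage: $0 <N>, where N is the number of lines to pick\n"
    if @ARGV < 1;
$N = shift @ARGV;

@pick = ();
while (<>) {
    if (@pick < $N) {
        # Fill the reservoir with the first N lines, shuffling as we go.
        push @pick, $_;
        ($r1, $r2) = (rand(@pick), rand(@pick));
        ($pick[$r1], $pick[$r2]) = ($pick[$r2], $pick[$r1]);
    } else {
        # Keep line number $. with probability N/$., replacing a random pick.
        rand($.) <= $N and $pick[rand(@pick)] = $_;
    }
}

print @pick;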
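
One way the line-based script could be adapted to whole records is to read the file 4 lines at a time and run the same reservoir idea over records rather than lines. The following is only a sketch under stated assumptions: every record is exactly 4 lines long, `sample_records` is a hypothetical helper name, and the data file name is passed as a second argument. Because each record is read once and stored at most once, no record can appear twice in the output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: reservoir sampling over whole records instead of single lines.
# Assumes every record is exactly $lines_per_record lines long.
# sample_records() is a hypothetical helper, not from the original script.
sub sample_records {
    my ($fh, $n, $lines_per_record) = @_;
    my @pick;        # the reservoir: each element holds one whole record
    my $seen = 0;    # number of complete records read so far

    while (1) {
        # Read one record as a block of consecutive lines.
        my $record = '';
        my $got    = 0;
        while ($got < $lines_per_record) {
            my $line = <$fh>;
            last unless defined $line;
            $record .= $line;
            $got++;
        }
        last if $got < $lines_per_record;   # end of file or trailing partial record

        $seen++;
        if (@pick < $n) {
            push @pick, $record;            # fill the reservoir first
        } else {
            # Keep the new record with probability n/seen, overwriting
            # a uniformly chosen slot in the reservoir.
            my $j = int(rand($seen));
            $pick[$j] = $record if $j < $n;
        }
    }
    return @pick;
}

# Run as a script only when arguments are given: perl sample.pl N datafile
if (@ARGV) {
    die "Usage: $0 <N> <datafile>\n" if @ARGV != 2;
    my ($n, $file) = @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    print sample_records($fh, $n, 4);
    close $fh;
}
```

The file is read in a single pass and only N records are held in memory, which matters with millions of records. The selected records are not printed in file order. If records turn out to have a different fixed length, the third argument to `sample_records` would need to change accordingly.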