UNIX script for making random numbers without repetition

sajmar · September 2, 2013, 7:53pm

Hello,
I have a column containing 7200 numbers and I want to pick 1440 of them at random without any repetition. Could anyone tell me which Unix script would work for my case?

Regards
Sajjad

Chubler_XL · September 2, 2013, 9:58pm

Here is a solution using awk.

You didn't say which column separator your file uses, so I'm assuming it's a comma.

You can change the -F parameter to match your actual separator. Also note that some POSIX awk implementations may fall over if your number of columns is too large:

awk -F',' 'BEGIN { srand() }      # seed the generator once
{ for(i=1;i<=1440;) {
        v=$(int(1+rand()*NF))     # pick a random field on this line
        if (!(v in N)) {          # skip values already printed
           N[v]
           printf "%s%s", v, (i++<1440?FS:RS)
        }
  }
}' random.csv

sajmar · September 2, 2013, 10:05pm

@Chubler_XL

Thanks for the suggested command. Regarding your guess about the comma, I should mention that I have just one column with 7200 lines, and there are no commas. Do I just need to remove the comma from the awk command?

jim_mcnamara · September 2, 2013, 10:06pm

FWIW - "random" means any number in the range of valid values can occur at any time. "Random with no duplicates" is not the same thing. Do not use this for encryption.

sajmar · September 2, 2013, 10:21pm

Thanks jim
Do you have a command to suggest?

Chubler_XL · September 2, 2013, 10:23pm

Yes, if you just have spaces (or tabs) between the values then remove the -F',' part.

sajmar · September 2, 2013, 10:33pm

I think my question was not very clear. I have a file with a single column of 7200 lines, and I want to select 1440 lines (20% of the lines) at random, with no duplicates among the 1440 numbers.
The command did not give me what I want.

Chubler_XL · September 2, 2013, 10:44pm

OK my mistake, try this (it assumes the file holds at least 1440 distinct values, otherwise the loop never ends):

awk '
{ V[NR]=$1 }                      # store the value from each line
END {
  srand()
  for(i=1;i<=1440;) {
    v=V[int(1+rand()*NR)]         # pick a random line
    if (!(v in N)) {              # skip values already printed
       N[v]
       print v
       i++
    }
  }
}' random.txt

or even:

sort random.txt | uniq | shuf | head -n 1440

RudiC · September 3, 2013, 4:14pm

Maybe this one:

 awk ' {T[NR]=$1} END {srand(); for (i=1; i<=1440; i++) print T[int(1+rand()*NR)]}' file

Chubler_XL · September 3, 2013, 4:20pm

Nice RudiC, but the requirement was to pick 1440 numbers randomly without any repetition.
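
To see why independent draws do not meet that requirement, here is a rough check, not part of the original posts. It repeats the same kind of draw and counts how often a value comes up that was already drawn. The function name count_repeats is made up for this sketch, and generated numbers 1..7200 stand in for the real file.

```shell
#!/bin/sh
# count_repeats [FILE]: make 1440 independent draws from the input lines
# and print how many of them repeat an earlier value.
# (Hypothetical helper, for illustration only.)
count_repeats() {
    awk '{ T[NR] = $1 }
         END {
             srand()
             for (i = 1; i <= 1440; i++) {
                 v = T[int(1 + rand() * NR)]   # independent draw, may repeat
                 if (v in seen) dup++          # this value was drawn before
                 seen[v]
             }
             print dup + 0
         }' "$@"
}

# Stand-in data: the numbers 1..7200, one per line.
awk 'BEGIN { for (i = 1; i <= 7200; i++) print i }' | count_repeats
```

With 1440 draws from 7200 distinct values, the count should come out at roughly 135 on average, so repeats are the norm rather than the exception.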

RudiC · September 3, 2013, 4:30pm

Rats! Should have read the entire thread. Sorry for that...

Corona688 · September 3, 2013, 5:23pm

If you just want 1440 random lines, try

shuf < inputfile | head -n 1440

Chubler_XL · September 3, 2013, 5:38pm

Good point Corona688. I'd assumed the random file could contain duplicate entries (that being the nature of random data) and that these were to be removed, hence my sort and uniq code in post #8.

Your solution is optimal if sajmar doesn't require detection/avoidance of any duplicate values that may occur in the inputfile.
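
If duplicate values in the input do need to be avoided and shuf is not available, one possible awk-only sketch is to keep the first occurrence of each value and then draw from what is left without replacement. The function name sample_unique is invented for this example, and it silently returns fewer than N lines when the input has fewer than N distinct values.

```shell
#!/bin/sh
# sample_unique N [FILE]: print up to N distinct values in random order.
# (Hypothetical helper, for illustration only.)
sample_unique() {
    n=$1; shift
    awk -v n="$n" '
        BEGIN { srand() }
        !($0 in seen) { seen[$0]; A[E++] = $0 }   # keep first occurrence only
        END {
            if (n > E) n = E          # fewer distinct values than requested
            while (n-- > 0) {
                i = int(rand() * E)   # random index among remaining values
                print A[i]
                A[i] = A[--E]         # fill the gap with the last remaining value
            }
        }' "$@"
}

# Small demonstration: five lines, three distinct values.
printf '5\n5\n7\n7\n9\n' | sample_unique 2
```

For the case in this thread it would be called as sample_unique 1440 random.txt.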

sajmar · September 14, 2013, 12:48pm

The shuf command will not work for me. Could anyone give me an awk command to randomly select 1440 out of 7200 numbers that are in one column?

Scott · September 14, 2013, 1:10pm

jot -s" " 7200 1 7200 | ./rand.php

$ cat rand.php
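#!/usr/bin/php
<?php
  // Read one line of space-separated numbers from standard input.
  $file = fopen( 'php://stdin', 'r' );
  $numarr = explode( " ", trim( fgets( $file ) ) );
  srand();
  // array_rand() returns 1440 distinct random keys, not the values,
  // so each key is mapped back to its number before printing.
  $keys = array_rand( $numarr, 1440 );
  $nums = array();
  foreach ( $keys as $k ) { $nums[] = $numarr[$k]; }
  echo implode( " ", $nums ), "\n";
  fclose( $file );
?>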

$ jot -s" " 7200 1 7200 | ./rand.php | wc -w
    1440

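You can use PHP to generate the array, or seq if you don't have jot, but you said you already had the numbers, so just pipe them in: <my numbers> | ./rand.php. Note that the script reads a single line of space-separated values, so a one-per-line column has to be joined onto one line first.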

Corona688 · September 19, 2013, 11:56am

In what way will it "not work"? Does it not work because you don't have it, or does it not work because it's not the result you want? Important difference.

Chubler_XL · September 19, 2013, 4:03pm

Not POSIX, but some sort commands have a --random-sort parameter, so:

sort --random-sort inputfile | head -n 1440

If you really need an awk solution, stick to post #8.

Corona688 · September 19, 2013, 4:53pm

An awk-based replacement for shuf:

awk 'BEGIN { srand() } { A[NR-1]=$0; E++ }
        END { while(E>0) { print A[N=int(rand()*E)]; A[N]=A[--E] } }' input > output
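
The shuffle above prints every line of the input. A variant that stops after a given number of lines would replace the "| head -n 1440" step as well. The following is only a sketch along those lines: the function name sample_lines is made up here, and it picks by line position, so duplicate values in the input can still show up more than once.

```shell
#!/bin/sh
# sample_lines N [FILE]: print N lines chosen at random, each input line
# used at most once. (Hypothetical helper, for illustration only.)
sample_lines() {
    n=$1; shift
    awk -v n="$n" '
        BEGIN { srand() }
        { A[NR-1] = $0 }              # store every line, 0-based
        END {
            E = NR                    # number of lines still available
            if (n > E) n = E          # cannot pick more lines than exist
            while (n-- > 0) {
                i = int(rand() * E)   # random index among remaining lines
                print A[i]
                A[i] = A[--E]         # move the last remaining line into the gap
            }
        }' "$@"
}

# Stand-in data: the numbers 1..7200; count the distinct lines returned.
awk 'BEGIN { for (i = 1; i <= 7200; i++) print i }' | sample_lines 1440 | sort -u | wc -l
```

For a real file this would be sample_lines 1440 input > output. Since srand() without an argument seeds from the time of day, two runs within the same second may return the same sample in many awk implementations.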