Sunday, March 13, 2011

Random Sampling in AMPL

This came up on the AMPL user group; I figured I'd "double dip" and post an edited version of my solution here.  The question was how to randomly sample without replacement in AMPL.  The code below shows one way.

#
# randomly sample items from a set, with or without replacement
#
set UNIVERSE ordered;  # set of items to sample -- must be ordered
param ssize integer > 0;  # sample size desired
param sample {1..ssize} symbolic;  # will hold generated sample

data; # data below is just for demonstration purposes
set UNIVERSE := 'cat', 'dog', 'bird', 'fish', 'turtle', 'snake', 'frog', 'rock';
param ssize := 4;
model;  # back to the code

let ssize := min(ssize, card(UNIVERSE));  # omit if sampling with replacement
param item symbolic;  # dummy parameter to hold each selection
for {i in 1..ssize} {
  let item := member(ceil(Uniform(0, card(UNIVERSE))), UNIVERSE);
  let sample[i] := item;
  let UNIVERSE := UNIVERSE diff {item};  # omit if sampling with replacement
}

display sample; 

I might mention a few things about the code:
  • I used symbolic data in the example. If you are sampling from a set of numbers, you can omit the two instances of the keyword symbolic.
  • If you want to sample with replacement, there are two lines (marked by comments) that you need to omit.
  • The code puts the sample in a (vector) parameter.  If you are sampling without replacement, an alternative is to put the code into a set.  Define the set with a default value of {}, and use the union operator with second argument {item} to add the item to the sample set inside the loop.

3 comments:

  1. Hey Rubin,

    do you have an idea how to adjust your code, if the set members have different probabilities?

    Thank you in advance,

    Marc

    ReplyDelete
    Replies
    1. I can outline one approach. (I'm not sure it's the most efficient, but if the sample is not very large, it should be good enough.)

      1. Create a vector of partial sums of the probabilities, in the same order that the entities appear in the UNIVERSE set. If the probabilities are $p_1, \dots, p_n$, the new vector will be $s = (p_1, p_1 + p_2, \dots, p_1 + \dots + p_{n-1}, 1)$.
      2. For each observation in the sample:
      2a. Generate a pseudorandom number $u\in (0,1)$.
      2b. Find the smallest index $i$ such that $u \le s_i$.
      2c. Select member($i$, UNIVERSE) as the sampled item.
      2d. If sampling without replacement:
      2d1: Remove item $i$ from UNIVERSE.
      2d2: Recalculate the probabilities of the remaining items, changing $p_j$ to $p_j / (1-p_i)$ for $j \neq i$.
      2d3: Regenerate the partial sum vector $s$ (now shorter by one entry).

      Delete
    2. I like your approach.

      Thank you very much,

      Marc

      Delete

Due to intermittent spamming, comments are being moderated. If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on Operations Research Stack Exchange.