
python - Sampling from a set according to unnormalized log-probabilities in NumPy

I have a 1-D np.ndarray filled with unnormalized log-probabilities that define a categorical distribution. I would like to sample an integer index from this distribution. Since many of the probabilities are small, exponentiating and normalizing the log-probabilities introduces significant numerical error, so I cannot simply use np.random.choice. Effectively, I am looking for a NumPy equivalent to TensorFlow's tf.random.categorical, which works on unnormalized log-probabilities.

If NumPy has no function that achieves this directly, what is an efficient way to implement such sampling?
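
For reference, the straightforward approach being avoided here looks roughly like this (a sketch; logp is a hypothetical array of unnormalized log-probabilities, not data from the question):

import numpy as np

logp = np.array([-1.2, -1050.0, -0.3, -700.0])  # hypothetical log-probabilities
p = np.exp(logp - logp.max())  # shift by the max before exponentiating
p /= p.sum()                   # normalize to get probabilities
index = np.random.choice(len(logp), p=p)

Even with the maximum subtracted, very small probabilities can underflow to zero, which is the kind of numerical issue described above.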

Question from: https://stackoverflow.com/questions/65867476/sampling-from-a-set-according-to-unnormalized-log-probabilities-in-numpy


1 Answer


In general, there are many ways to choose an integer with a custom distribution, but most of them take weights that are proportional to the given probabilities. If the weights are log probabilities instead, then a slightly different approach is needed. Perhaps the simplest algorithm for this is rejection sampling, described below and implemented in Python. In the following algorithm, the maximum log-probability is max, and there are k integers to choose from.

  1. Choose a uniform random integer i in [0, k).
  2. Get the log-weight corresponding to i, then generate an exponential(1) random number, call it ex.
  3. If max minus ex is less than the log-weight, return i. Otherwise, go to step 1.

Each iteration of rejection sampling does a constant amount of work, but the expected number of iterations per sample depends on the shape of the distribution: a single iteration accepts with probability (1/k) * sum(exp(c[i] - max)) over all i, so the expected number of iterations is k * exp(max) / sum(exp(c[i])). This is close to 1 when the log-weights are roughly equal and approaches k when a single weight dominates; setting max to the true maximum log-weight keeps the acceptance rate as high as possible. See also Keith Schwarz's discussion of the "Fair Die/Biased Coin Loaded Die" algorithm.
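
A rough way to gauge that cost from the log-weights themselves (a sketch; c is a hypothetical array of log-weights, not from the question):

import numpy as np

c = np.array([-1.0, -2.5, -0.3, -4.0])                # hypothetical log-weights
expected_iters = len(c) / np.exp(c - c.max()).sum()   # k * exp(max) / sum(exp(c))
print(expected_iters)                                 # roughly 2.5 for these values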

Now, Python code for this algorithm follows.

import random
import math

def categ(c):
    # Do a weighted choice of an item with the
    # given log-probabilities.
    cm = max(c)  # Find the maximum log-probability
    while True:
        # Choose an item at random
        x = random.randint(0, len(c) - 1)
        # Choose it with probability proportional to exp(c[x])
        y = cm - random.expovariate(1)
        # Alternatively: y = math.log(random.random()) + cm
        if y < c[x]:
            return x
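
For example, the function can be called like this (the log-weights below are hypothetical, not from the question):

logw = [math.log(0.5), math.log(0.2), math.log(0.3)]
counts = [0, 0, 0]
for _ in range(10000):
    counts[categ(logw)] += 1
print(counts)  # counts should be roughly proportional to 0.5, 0.2, 0.3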

The code above generates one variate at a time and uses only Python's base modules, rather than NumPy. Another answer shows how rejection sampling can be implemented in NumPy in blocks of random variates at a time (demonstrated on a different random sampling task, though).
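
A vectorized variant along those lines might look like the following sketch (this is not the code from that other answer; the name categ_numpy, the block size, and the use of numpy.random.Generator are choices made here for illustration):

import numpy as np

def categ_numpy(c, rng=None, block=1024):
    # Rejection-sample one index from unnormalized log-probabilities c,
    # drawing candidates in blocks instead of one at a time.
    rng = np.random.default_rng() if rng is None else rng
    c = np.asarray(c, dtype=float)
    cm = c.max()
    while True:
        x = rng.integers(0, len(c), size=block)    # candidate indices
        y = cm - rng.exponential(1.0, size=block)  # acceptance thresholds
        accepted = np.nonzero(y < c[x])[0]
        if accepted.size > 0:
            return int(x[accepted[0]])

Drawing candidates in blocks trades a little wasted work for far fewer Python-level loop iterations, which is usually the right trade-off when working with NumPy.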

