DISQUS

20bits: Random Weighted Elements in PHP

  • Chris · 1 year ago
    I'm glad you're still blogging. It's been a while since your previous post. I've enjoyed reading your posts on programming (especially the one on MySQL improvements). Thanks for writing these useful posts.
  • Jesse · 1 year ago
    Chris,

    Yeah, I'm trying to get back into it. Ever since I sold Adonomics I've been working more than full-time on it. Between that and a desire for an active social life I don't have much time for blogging.

    I'm going to try to write one article at least once every other week, though.
  • Mike · 1 year ago
    Great post. Now I'm trying to remember what I learned in my courses on probability in college... seems like there is a probability distribution for this, or something like that... what's the difference between a PDF and a CDF again? In any case, I think your method is essentially the same thing a stats class would have you do. :P
  • Jesse · 1 year ago
    Mike,

    X = f(W) is the random variable in question, if you want to make it even more formal. The specific properties of X, e.g., its CDF, are determined entirely by W.
  • Jeremy · 1 year ago
    Brilliant, and superbly clean code and explanation!
  • Venkat Koduru · 1 year ago
    Hmmm. I like your function, but here's what I'm looking to do...

    I want to generate a random number from 0 to 6000. I want their to be the highest probability (40% chance) of getting between 100 and 2000, inclusive. I want the probability of getting a number from 99 to 0, to gradually decrease, so that it would be almost impossible to get 0. Just to be clear the probability of getting 98 should be less than that of 99, the probability of getting 97 should be less than the probability of getting 98, etc. Similarly, I want the probability of getting a number from 2001 to 6000 (again including 2001 and 6000) to gradually decrease, so that it's almost impossible to get 6000.

    Anyone know how I could do this?
  • Kareem · 1 year ago
    Thank you very much for this awesome trick, i was working on a small application, and i was using the array method, but i didn't like it, as you said it takes lots of resources if there was a big set of data, which is the case for me.
    Wish you the best.
    Regards.
  • pj · 1 year ago
    I'm going to implement this for my python program, thanks!
  • Ishaan · 11 months ago
    I've been looking for something which does exactly this for a very long time now, and this is by far more elegant than anything I've been able to make so far. Thanks!
  • Porkins · 10 months ago
    This is great - a nice, tight function that does exactly what I was looking for - with one possible exception. Does this work well if you are drawing from an array with, say, 200 elements? Does adjusting the weight to the thousandths instead of the hundredths result with any degree of accuracy?
  • Matt · 10 months ago
    Thank you! Exactly what I was looking for.
  • blueskiwi · 9 months ago
    awesome, you rock, this is just what I needed, thanks!!
  • blueskiwi · 9 months ago
    for anyone wondering... you need to adjust the random range and the 1000 multiplier according to your weights

    ie the min and max rand value should (I think) correspond to the smallest possible weight value and the sum of all your weight values respectively (?)

    and if you use a multiplier that would be...

    min rand value = smallest possible weight * multiplier
    max rand value = sum of all weight values in set * multiplier
  • niabi · 8 months ago
    This is great, looking for something similar, the only thing that can't quite work on my side, is what if i have about 1,000 elements and I want certain ones to have more weight than others?

    I would have to write
    A = [1,2-999,1000]

    but what about the weight? it would be kind of hard I guess to divide 100 / 1,000 and just give weight to certain items...

    any solution for this kind of problem?

    something like: in my sql i have 1,000 items, in the table i have 3 rows, ID, NAME, WEIGHT then the items that have a higher number (1,2,3,4 ...) have more chance of showing up, 2 has double the chance of 1 and 3 double the chance of 2 but triple the chance of 1 and so on...

    any idea?
  • Richard Lynch · 7 months ago
    As far as I can tell, the choice of 1000 within the function is fairly arbitrary.

    I replaced it with 10, 100, 10000, and 42 and got seemingly equally random results (just eyeballing them).

    I was careful, however, to provide an array with weights between 0 and 1, and whose sum was 1.
    I.e., $ws = array(array(0.1,1),array(0.2,2),array(0.3,3),array(0.4,4));

    But I think you'll be in trouble no matter what if your array input does not give a full 100% values.

    I'd have to test more to be sure, but it's easier to just validate the input. :-)
  • Konrad · 5 months ago
    There are some improvements I noticed when I converted this to Python.
    First instead of adding onto the extra variable offset you could just substract from r.
    Also replacing 1000 by the sum of all weights makes things more flexible cause you can easily go by just having counts instead of weights also you don't have to make the sum up to 1. And even leave out the multiplication in line 5.
    This is what I have implemented in Python

    def wchoice(lst):
    wtotal = sum(lst)
    n = random.uniform(0, wtotal)

    for i, weight in enumerate(lst):
    if n < weight:
    break
    n = n - weight
    return i
  • leeoniya · 1 month ago
    Konrad, your implementation is not accurate. see for yourself. accumulate 100,000 cycles of a 50/50 weight. you will always get A around 49.5 and B around 50.5.
  • crhatfield · 1 month ago
    Add sorting to further increase accuracy, but at a cost of speed. (not alot though!)

    arsort($weights,SORT_NUMERIC);
  • Andy · 4 weeks ago
    Thanks for this.
    A few thoughts, in response to this and comments:
    1. Yes, the 1000 factor is arbitrary and should be replaced if not approp to your context.
    2. if the same weights are to be used repeatedly in a session, it may be worth setting up array of factored cumulative vals from the original weighted. This makes the loop a straight lookup rather than doing the same cumulative calcs each time.
    3. I keep thinking about the inefficiency of the 'bottom to top' comparison order. I wonder how much difference this makes with a large number of elements?
  • Andy · 4 weeks ago
    Oh! just realised:
    4. it should be $r = mt_rand(0,1000) or mt_rand(0,$factor) [NOT mt_rand(1,1000)] as the whole affair is merely a scaling of mt_rand(0,1)