Dimitri DeFigueiredo Ph.D.
Computing, finance, security and collaborative decisions
http://dimitri.xyz/
Tue, 02 May 2017 14:31:54 +0000
Generating random integers from random bytes
<p>Here is a problem that appears simpler than it is. Programmers often have a source of random bits at their disposal but need to generate uniformly distributed random integers in a given range. The obvious algorithms for doing this, using remainders or scaling, lead to biased distributions. A cursory review of open source software shows that these algorithms are in common use. They even show up in cryptographic libraries, usually with minor effects. If you want to learn how to do this “conversion” properly, read on.</p>
<p>Imagine you are running a simple lottery that works like a raffle. All the tickets are sold at the same price. People buy tickets throughout the week and a single winning ticket is announced at the end of the week. There is a winner every week; the total prize does not accumulate week-to-week.</p>
<p>A basic requirement for this lottery is that every ticket must have the same chance of winning as every other ticket. In fact, this may be a legal requirement. It is not fair for a ticket to have a higher chance of winning than any other ticket as they all cost the same.</p>
<p>The number of people buying tickets changes from week to week; but no matter how many tickets are sold, the lottery must always be fair. Each week, the winning ticket must be drawn from a uniform distribution over all the tickets sold. You don’t want to break the law.</p>
<p>To aid you in executing the lottery you are given a <em>perfect</em> source of random bytes. Imagine that by using some oscillators, heat and quantum wizardry Intel has just released a wonderful new chip. This chip will give you as many random bits as you want. Each bit is drawn from a Bernoulli distribution with probability \(1/2\). In the JavaScript world, this is equivalent to a perfect version of <code class="highlighter-rouge">getRandomValues()</code> or Node.js’ <code class="highlighter-rouge">randomBytes()</code>.</p>
<p>Armed with your perfect source of random bits you set out to code the software that will draw the winning lottery ticket. Here are a few ways <em>not</em> to do it.</p>
<h3 id="bias-from-remainders">Bias from Remainders</h3>
<p>Assume that in a given week \(N=10\) tickets were sold. Using our source of randomness we use the next power of 2, that is 16, and pick a random number between 0 and 15 by using 4 perfectly random bits. We then proceed to calculate the winning ticket using simple modular arithmetic.</p>
<div class="language-javascript highlighter-rouge"><pre class="highlight"><code><span class="s2">"use strict"</span><span class="p">;</span>
<span class="kr">const</span> <span class="nx">crypto</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'crypto'</span><span class="p">)</span>
<span class="cm">/* use just 4 random bits - a value from 0 to 15 */</span>
<span class="kr">const</span> <span class="nx">randomBits</span> <span class="o">=</span> <span class="nx">crypto</span><span class="p">.</span><span class="nx">randomBytes</span><span class="p">(</span><span class="mi">1</span><span class="p">).</span><span class="nx">readUInt8</span><span class="p">()</span> <span class="o">&</span> <span class="mh">0x0F</span>
<span class="kr">const</span> <span class="nx">N</span> <span class="o">=</span> <span class="mi">10</span>
<span class="kd">let</span> <span class="nx">winner</span> <span class="o">=</span> <span class="nx">randomBits</span> <span class="o">%</span> <span class="nx">N</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="err">`</span><span class="nx">And</span> <span class="nx">the</span> <span class="nx">winner</span> <span class="nx">is</span> <span class="nx">ticket</span> <span class="nx">number</span> <span class="nx">$</span><span class="p">{</span><span class="nx">winner</span><span class="p">}</span> <span class="o">!</span><span class="err">`</span><span class="p">)</span>
</code></pre>
</div>
<p>Unfortunately, this generates a biased distribution because \(N\) is not a power of 2. We will be grouping the 16 values that can be returned by <code class="highlighter-rouge">randomBytes</code> into 10 “equivalence classes”. Some classes will collect more values than others, as the next diagram shows.</p>
<p><img src="http://dimitri.xyz/assets/mod-16-diagram.png" alt="Using mod 16 bits" /></p>
<p>This is really bad for the lottery as tickets 0 through 5 are twice as likely to win as the other tickets. We can mitigate this problem by using more random bits. If we use 6 random bits to pick a number between 0 and 63 and then apply the modulus (i.e. <code class="highlighter-rouge">%</code>) operation, we have the following situation:</p>
<p><img src="http://dimitri.xyz/assets/mod-64-diagram.png" alt="Using mod 64 bits" /></p>
<p>Two points to note:</p>
<ol>
<li>Because \(N = 10\) is not a power of 2, there will <em>always</em> be some favored tickets that are more likely to win.</li>
<li>The discrepancy between the most likely and the least likely tickets is a function of the number of random bits used. If we use \(m\) random bits (we used \(m = 6\) in the last diagram), the difference in probability between the most likely and the least likely tickets is \(1/2^m\).</li>
</ol>
<p>The second point above means that we can make the unfairness imperceptibly small by using lots of random bits. This is good. However, the difference in probability never actually goes away. We will never generate a truly uniform distribution this way.</p>
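<p>The two points above can be checked by brute force. The following is a minimal sketch (the helper name <code class="highlighter-rouge">remainderCounts</code> is made up for this illustration) that enumerates every value that \(m\) random bits can take and counts how many of them land on each ticket after the <code class="highlighter-rouge">%</code> operation.</p>

```javascript
"use strict";
// Hypothetical helper for illustration: counts how many of the 2^m equally
// likely inputs map to each ticket when we compute `value % N`.
function remainderCounts(m, N) {
  const counts = new Array(N).fill(0)
  for (let value = 0; value < 2 ** m; value++) {
    counts[value % N]++
  }
  return counts
}

const N = 10
for (const m of [4, 6, 16]) {
  const counts = remainderCounts(m, N)
  // difference in probability between the most and least likely tickets
  const gap = (Math.max(...counts) - Math.min(...counts)) / 2 ** m
  console.log(`m = ${m}: counts = [${counts}], probability gap = ${gap}`)
}
```

<p>For \(m = 4\) the counts show tickets 0 through 5 receiving two inputs each and the remaining tickets only one, matching the first diagram. The gap shrinks as \(m\) grows but stays above zero.</p>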
<h3 id="bias-from-scaling">Bias from Scaling</h3>
<p>Another attempt at generating a uniform distribution goes like this. Imagine we have a random number \(r\) in the interval \([0,1)\). We can multiply \(r\) by the number of tickets sold \(N = 10 \) to obtain a random number \(r N\) in the interval \([0,10)\) and then just take the integer part.</p>
<p>This mistaken algorithm <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random#Getting_a_random_integer_between_two_values">is used a lot</a> in JavaScript because the output of <code class="highlighter-rouge">Math.random</code> looks a lot like the random number \(r\). Here’s what it would look like in Node.js code:</p>
<div class="language-javascript highlighter-rouge"><pre class="highlight"><code><span class="s2">"use strict"</span><span class="p">;</span>
<span class="cm">/* get random number in [0,1) */</span>
<span class="kr">const</span> <span class="nx">r</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">random</span><span class="p">()</span>
<span class="kr">const</span> <span class="nx">N</span> <span class="o">=</span> <span class="mi">10</span>
<span class="kd">let</span> <span class="nx">winner</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">floor</span><span class="p">(</span><span class="nx">r</span> <span class="o">*</span> <span class="nx">N</span><span class="p">)</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="err">`</span><span class="nx">And</span> <span class="nx">the</span> <span class="nx">winner</span> <span class="nx">is</span> <span class="nx">ticket</span> <span class="nx">number</span> <span class="nx">$</span><span class="p">{</span><span class="nx">winner</span><span class="p">}</span> <span class="o">!</span><span class="err">`</span><span class="p">)</span>
</code></pre>
</div>
<p>Unfortunately, this idea also generates a biased distribution. This is because we don’t actually have a random number \(r\) in the interval \([0,1)\). All we have are random bits. We would need an <em>infinite</em> number of random bits to be able to generate a random number in \([0,1)\). Because we do not have an infinite number of bits, the best we can do is to approximate \(r\) and this approximation leads to the bias. To see why, first consider using 4 bits to generate a random number in \([0,1)\). This can be done by sticking a “0.” in front of the bits generated and reading them as a binary number. This is equivalent to dividing the output of 4 bits by \(16=2^4\) as shown below.</p>
<p><img src="/assets/scaling-4-bits-diagram.png" alt="random 4 bit fraction" style="width: 400px;" /></p>
<p>These are the possible values that we can generate inside the \([0,1)\) interval with only 4 random bits. Now consider what happens in the expression <code class="highlighter-rouge">Math.floor(r * N)</code> for \(N = 10 \). Any value of \(r\) smaller than 0.1 will be floored to zero. Similarly, values of \(r\) in the range \([0.1 , 0.2)\) will be output as 1, values in the range \([0.2 , 0.3)\) will be output as 2, and so on. In short, this function partitions the \([0,1)\) interval into 10 distinct regions. Unfortunately, a different number of possible inputs falls in each region, as the next diagram shows.</p>
<p><img src="/assets/scaling-16-line.png" alt="regions of unit interval" style="width: 250px;" /></p>
<p>This asymmetry makes some outputs more likely than others. For example, zero, which corresponds to values in the region \([0 , 0.1)\), is twice as likely as 2, which corresponds to the region \([0.2 , 0.3)\). Again, this happens because some regions or “equivalence classes” contain more possible input values. Each equivalence class corresponds to a lottery ticket, so some lottery tickets are more likely to win.</p>
<p>Unless \(N\) is a power of 2, this will always happen. Just like in the case of remainder bias, we can mitigate the problem by using a large number of bits. If we use 53 bits (the precision of a double-precision floating-point number), the difference in probability between the most likely and the least likely tickets will be \(2^{-53}\approx \frac{1}{10^{16}}\). That is virtually undetectable and good enough for most (non-cryptographic) applications, but this is not a truly uniform distribution. We can do better.</p>
<p><strong>Warning:</strong> The trick of using more random bits to obtain a smaller bias only works if the number of equivalence classes \(N\) stays fixed. The difference of \(1/2^m\) has to remain small compared with the probability \(1/N\) of each ticket, so if \(N\) grows along with the number of bits \(m\), we might be back where we started.</p>
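<p>The counting argument for scaling can be checked the same way. This is a small sketch, not part of the original code: the helper name <code class="highlighter-rouge">scalingCounts</code> is hypothetical, and the fraction \(r\) is built from \(m\) bits exactly as described above, by dividing by \(2^m\).</p>

```javascript
"use strict";
// Hypothetical helper for illustration: counts how many of the 2^m possible
// m-bit fractions r = k / 2^m end up on each ticket under Math.floor(r * N).
function scalingCounts(m, N) {
  const counts = new Array(N).fill(0)
  for (let k = 0; k < 2 ** m; k++) {
    const r = k / 2 ** m            // the bits read as a binary fraction "0.bbbb"
    counts[Math.floor(r * N)]++
  }
  return counts
}

const N = 10
for (const m of [4, 6, 16]) {
  console.log(`m = ${m}: counts = [${scalingCounts(m, N)}]`)
}
```

<p>With 4 bits, ticket 0 collects two of the sixteen inputs while ticket 2 collects only one, which is the asymmetry shown in the diagram. Unlike the remainder method, the favored tickets are spread across the whole range rather than clustered at the low end.</p>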
<h3 id="doing-it-right--rejection-sampling">Doing it right — Rejection Sampling</h3>
<p>There is, in fact, a simple way to generate our winning ticket from a perfectly uniform (i.e. 100% fair) distribution. We sample our source of random bits and then simply reject samples we don’t like.</p>
<p>For example, we use our perfect source of random bits to generate an integer and simply reject samples outside our range. We can generate an integer between 0 and 15 using 4 bits of our perfect source. If the output is smaller than 10 then it is accepted; otherwise, it is rejected and we try again. Here’s what it looks like in code:</p>
<div class="language-javascript highlighter-rouge"><pre class="highlight"><code><span class="s2">"use strict"</span><span class="p">;</span>
<span class="kr">const</span> <span class="nx">crypto</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'crypto'</span><span class="p">)</span>
<span class="kd">function</span> <span class="nx">sample</span><span class="p">(){</span><span class="k">return</span> <span class="nx">crypto</span><span class="p">.</span><span class="nx">randomBytes</span><span class="p">(</span><span class="mi">1</span><span class="p">).</span><span class="nx">readUInt8</span><span class="p">()</span> <span class="o">&</span> <span class="mh">0x0F</span><span class="p">}</span>
<span class="kr">const</span> <span class="nx">N</span> <span class="o">=</span> <span class="mi">10</span>
<span class="cm">/* Rejection Sampling */</span>
<span class="kd">var</span> <span class="nx">s</span>
<span class="k">do</span> <span class="p">{</span>
<span class="nx">s</span> <span class="o">=</span> <span class="nx">sample</span><span class="p">()</span> <span class="c1">// s is a value from 0 to 15</span>
<span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="nx">s</span> <span class="o">>=</span> <span class="nx">N</span><span class="p">)</span> <span class="c1">// reject if outside our desired range</span>
<span class="kd">let</span> <span class="nx">winner</span> <span class="o">=</span> <span class="nx">s</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="err">`</span><span class="nx">And</span> <span class="nx">the</span> <span class="nx">winner</span> <span class="nx">is</span> <span class="nx">ticket</span> <span class="nx">number</span> <span class="nx">$</span><span class="p">{</span><span class="nx">winner</span><span class="p">}</span> <span class="o">!</span><span class="err">`</span><span class="p">)</span>
</code></pre>
</div>
<p>Our source of randomness is perfect, so rejecting out-of-bounds samples will <em>not</em> bias the samples that are allowed through. However, there is something slightly troubling about this algorithm. It may never terminate! The random source may continually output random bits whose value is larger than 9, forcing us to sample forever. This is highly unlikely as our source is random, but we cannot set a hard worst-case bound on how long this algorithm will run.</p>
<p>This is a trade-off we face. In the previous biased-distribution examples we could guarantee that our algorithms would terminate, but had to use more and more bits to make the bias in the generated probability distributions small. With rejection sampling, we guarantee that there is no bias in the generated probability distribution, but we may have to use more and more bits to ensure our algorithm terminates.</p>
<h3 id="termination">Termination</h3>
<p>When one does have a good source of randomness, it is easy to ensure “timely” termination. It is extremely unlikely that the rejection sampling algorithm will have to sample many times before it produces a valid sample. More precisely, the probability that more than \(k \) samples are needed decreases exponentially as \(k \) increases and is at most \(2^{-k}\), provided the probability that any single sample is accepted is at least \(1/2\).</p>
<p>Consider our running example. There is a \(10/16 = 5/8\) chance that one sample will fall in our desired range. In other words, there is a \(3/8\) chance that it will be rejected. What is the likelihood that we have to try 10 or more samples to get one in the desired range?</p>
<p>We only need 10 or more samples if each of the first 9 attempts produced a value larger than 9. As the samples are independent, that happens with probability \( (\frac{3}{8})^9 \approx 0.00015 \), or about once every 6,819 runs. The probability that we will need more than 50 samples is a minuscule \( (\frac{3}{8})^{50} \approx 5 \times 10^{-22} \).</p>
<p>In the previous example, we only used 4 bits of randomness to generate an integer from 0 to 15. If we had used 32 bits from our random source to obtain a number between 0 and \( 2^{32} - 1 = 4294967295 \) and rejected all samples larger than 9, we would have a problem because we would be rejecting most of the samples. In this case, the probability that we will have to sample at least 10 times becomes \( ( \frac{2^{32}-10}{2^{32}})^9 \approx 0.99999998 \), and the probability that we will need more than 50 samples becomes 0.99999988. In other words, we will most likely be sampling more than 50 times every time we run our algorithm. This is definitely not what we want.</p>
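<p>These figures follow from one formula: at least \(k\) samples are needed exactly when the first \(k - 1\) samples are all rejected. Here is a short sketch of that calculation (the function name <code class="highlighter-rouge">probAtLeast</code> is made up for this illustration).</p>

```javascript
"use strict";
// Probability that at least k samples are needed, i.e. that the first k - 1
// independent samples are all rejected. acceptProb is the chance that a
// single sample is accepted.
function probAtLeast(k, acceptProb) {
  return Math.pow(1 - acceptProb, k - 1)
}

// 4 random bits, N = 10: a sample is accepted with probability 10/16
console.log(probAtLeast(10, 10 / 16))       // at least 10 samples
console.log(probAtLeast(51, 10 / 16))       // more than 50 samples

// 32 random bits, N = 10: a sample is accepted with probability 10/2^32
console.log(probAtLeast(10, 10 / 2 ** 32))  // at least 10 samples
console.log(probAtLeast(51, 10 / 2 ** 32))  // more than 50 samples
```

<p>Note that “more than 50 samples” means the first 50 were rejected, which is why the second and fourth calls pass 51.</p>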
<h3 id="extending-our-range">Extending our range</h3>
<p>The problem with using 32 bits as above is that we rejected too many samples. The probability of finding an acceptable sample was too low. Again, we have to ensure that the probability that any single sample is accepted is at least \(1/2\).</p>
<p>Consider the simpler example of using 8 bits, that is, a number between 0 and 255, to pick a lottery winner among \(N = 10\) tickets. Instead of rejecting all samples equal to 10 or above, we calculate the largest multiple of 10 that is not greater than \(2^8\). That number is 250. We then only reject values at or above this number. Obviously, if we do this we will get a number between 0 and 249, which is not what we want. But because 250 is an exact multiple of 10, we can now safely apply the remainder to get a number in the desired range without biasing our perfectly uniform distribution.</p>
<p>Here’s the code:</p>
<div class="language-javascript highlighter-rouge"><pre class="highlight"><code><span class="s2">"use strict"</span><span class="p">;</span>
<span class="kr">const</span> <span class="nx">crypto</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'crypto'</span><span class="p">)</span>
<span class="kd">function</span> <span class="nx">sample</span><span class="p">(){</span><span class="k">return</span> <span class="nx">crypto</span><span class="p">.</span><span class="nx">randomBytes</span><span class="p">(</span><span class="mi">1</span><span class="p">).</span><span class="nx">readUInt8</span><span class="p">()}</span>
<span class="kr">const</span> <span class="nx">maxRange</span> <span class="o">=</span> <span class="mi">256</span>
<span class="kr">const</span> <span class="nx">N</span> <span class="o">=</span> <span class="mi">10</span>
<span class="cm">/* extended range rejection sampling */</span>
<span class="kr">const</span> <span class="nx">q</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">floor</span><span class="p">(</span> <span class="nx">maxRange</span> <span class="o">/</span> <span class="nx">N</span> <span class="p">)</span>
<span class="kr">const</span> <span class="nx">multiple_of_N</span> <span class="o">=</span> <span class="nx">q</span> <span class="o">*</span> <span class="nx">N</span>
<span class="kd">var</span> <span class="nx">s</span>
<span class="k">do</span> <span class="p">{</span>
<span class="nx">s</span> <span class="o">=</span> <span class="nx">sample</span><span class="p">()</span> <span class="c1">// 0 to 255</span>
<span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="nx">s</span> <span class="o">>=</span> <span class="nx">multiple_of_N</span><span class="p">)</span> <span class="c1">// extended acceptance range</span>
<span class="kd">let</span> <span class="nx">winner</span> <span class="o">=</span> <span class="nx">s</span> <span class="o">%</span> <span class="nx">N</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="err">`</span><span class="nx">And</span> <span class="nx">the</span> <span class="nx">winner</span> <span class="nx">is</span> <span class="nx">ticket</span> <span class="nx">number</span> <span class="nx">$</span><span class="p">{</span><span class="nx">winner</span><span class="p">}</span> <span class="o">!</span><span class="err">`</span><span class="p">)</span>
</code></pre>
</div>
<p>Furthermore, the probability each sample is accepted is \( \frac{250}{256} \) and that is larger than \( \frac{1}{2} \) as we wanted.</p>
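<p>We also need to make sure that our random source covers at least as many values as the range of integers we need, that is, \(N\) must not exceed \(2^m\) when we draw \(m\) random bits. As long as that holds, the largest multiple of \(N\) that fits is more than half of \(2^m\), so each sample is accepted with probability greater than \( \frac{1}{2} \).</p>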
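<p>Because a byte has only 256 possible values, the claim that the extended range introduces no bias can be verified exhaustively. The sketch below uses a hypothetical helper, <code class="highlighter-rouge">extendedRangeCounts</code>, that walks over every accepted value and tallies the ticket it maps to.</p>

```javascript
"use strict";
// Hypothetical helper for illustration: for a source producing integers in
// [0, maxRange), tally which ticket each accepted value maps to.
function extendedRangeCounts(maxRange, N) {
  const limit = Math.floor(maxRange / N) * N  // largest multiple of N not exceeding maxRange
  const counts = new Array(N).fill(0)
  for (let s = 0; s < limit; s++) {           // values at or above limit are rejected
    counts[s % N]++
  }
  return { limit, counts, acceptProb: limit / maxRange }
}

const { limit, counts, acceptProb } = extendedRangeCounts(256, 10)
console.log(`accepting values below ${limit}`)
console.log(`inputs per ticket: [${counts}]`)
console.log(`probability a sample is accepted: ${acceptProb}`)
```

<p>Every ticket receives the same number of accepted inputs, so the remainder step is fair, and only 6 of the 256 possible byte values are ever rejected.</p>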
<h3 id="polishing-up">Polishing up</h3>
<p>We have assumed, so far, that we have a perfect source of random bits. This gave us assurance that our algorithm will (with high probability) eventually be done. However, if we have a buggy or biased source of randomness, our rejection sampling algorithm can go into an infinite loop by rejecting all samples. This should never happen with a truly random source. Actually, never is a strong word, let’s just say it should not happen in the next billion years. So, we want to signal there is an error when this happens, rather than go into an infinite loop.</p>
<p>The final version of our algorithm sets an upper bound on how many times sampling is attempted before a valid sample is found. We will set this upper bound to be 100 attempts. The probability that we need more than 100 random samples is less than \( 2^{-100} \), so it is more likely we have a bug in our source of randomness than that this long sequence of rejected samples actually happened by chance.</p>
<p>We also provide a simple utility function <code class="highlighter-rouge">getRandIntInclusive</code> to generate integers in an arbitrary range that <em>includes</em> the upper and lower bounds. It also works sensibly when those bounds are fractional numbers, as long as there is an integer in the range. For example:</p>
<ul>
<li><code class="highlighter-rouge">getRandIntInclusive(2, 3)</code> returns 2 or 3;</li>
<li><code class="highlighter-rouge">getRandIntInclusive(2.1, 2.9)</code> fails; and</li>
<li><code class="highlighter-rouge">getRandIntInclusive(2.1, 3.9)</code> always returns 3.</li>
</ul>
<p>Here’s the final version:</p>
<div class="language-javascript highlighter-rouge"><pre class="highlight"><code><span class="s2">"use strict"</span><span class="p">;</span>
<span class="kr">const</span> <span class="nx">crypto</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'crypto'</span><span class="p">)</span>
<span class="c1">// 32 bit maximum</span>
<span class="kr">const</span> <span class="nx">maxRange</span> <span class="o">=</span> <span class="mi">4294967296</span> <span class="c1">// 2^32</span>
<span class="kd">function</span> <span class="nx">getRandSample</span><span class="p">(){</span><span class="k">return</span> <span class="nx">crypto</span><span class="p">.</span><span class="nx">randomBytes</span><span class="p">(</span><span class="mi">4</span><span class="p">).</span><span class="nx">readUInt32LE</span><span class="p">()}</span> <span class="c1">//Node.js, change for Web API</span>
<span class="kd">function</span> <span class="nx">unsafeCoerce</span><span class="p">(</span><span class="nx">sample</span><span class="p">,</span> <span class="nx">range</span><span class="p">){</span><span class="k">return</span> <span class="nx">sample</span> <span class="o">%</span> <span class="nx">range</span><span class="p">}</span>
<span class="kd">function</span> <span class="nx">inExtendedRange</span><span class="p">(</span><span class="nx">sample</span><span class="p">,</span> <span class="nx">range</span><span class="p">){</span><span class="k">return</span> <span class="nx">sample</span> <span class="o"><</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">floor</span><span class="p">(</span><span class="nx">maxRange</span> <span class="o">/</span> <span class="nx">range</span><span class="p">)</span> <span class="o">*</span> <span class="nx">range</span><span class="p">}</span>
<span class="cm">/* extended range rejection sampling */</span>
<span class="kr">const</span> <span class="nx">maxIter</span> <span class="o">=</span> <span class="mi">100</span>
<span class="kd">function</span> <span class="nx">rejectionSampling</span><span class="p">(</span><span class="nx">range</span><span class="p">,</span> <span class="nx">inRange</span><span class="p">,</span> <span class="nx">coerce</span><span class="p">){</span>
<span class="kd">var</span> <span class="nx">sample</span>
<span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">do</span><span class="p">{</span>
<span class="nx">sample</span> <span class="o">=</span> <span class="nx">getRandSample</span><span class="p">()</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">i</span> <span class="o">>=</span> <span class="nx">maxIter</span><span class="p">){</span>
<span class="c1">// do some error reporting.</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">"Too many iterations. Check your source of randomness."</span><span class="p">)</span>
<span class="k">break</span> <span class="cm">/* just returns biased sample using remainder */</span><span class="p">}</span>
<span class="nx">i</span><span class="o">++</span>
<span class="p">}</span> <span class="k">while</span> <span class="p">(</span> <span class="o">!</span><span class="nx">inRange</span><span class="p">(</span><span class="nx">sample</span><span class="p">,</span> <span class="nx">range</span><span class="p">)</span> <span class="p">)</span>
<span class="k">return</span> <span class="nx">coerce</span><span class="p">(</span><span class="nx">sample</span><span class="p">,</span> <span class="nx">range</span><span class="p">)</span>
<span class="p">}</span>
<span class="c1">// returns random value in interval [0,range) -- excludes the upper bound</span>
<span class="kd">function</span> <span class="nx">getRandIntLessThan</span><span class="p">(</span><span class="nx">range</span><span class="p">){</span>
<span class="k">return</span> <span class="nx">rejectionSampling</span><span class="p">(</span><span class="nb">Math</span><span class="p">.</span><span class="nx">ceil</span><span class="p">(</span><span class="nx">range</span><span class="p">),</span> <span class="nx">inExtendedRange</span><span class="p">,</span> <span class="nx">unsafeCoerce</span><span class="p">)}</span>
<span class="c1">// returned value is in interval [low, high] -- upper bound is included</span>
<span class="kd">function</span> <span class="nx">getRandIntInclusive</span><span class="p">(</span><span class="nx">low</span><span class="p">,</span> <span class="nx">hi</span><span class="p">){</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">low</span> <span class="o"><=</span> <span class="nx">hi</span><span class="p">)</span> <span class="p">{</span>
<span class="kr">const</span> <span class="nx">l</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">ceil</span><span class="p">(</span><span class="nx">low</span><span class="p">)</span> <span class="c1">//make also work for fractional arguments</span>
<span class="kr">const</span> <span class="nx">h</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">floor</span><span class="p">(</span><span class="nx">hi</span><span class="p">)</span> <span class="c1">//there must be an integer in the interval</span>
<span class="k">return</span> <span class="p">(</span><span class="nx">l</span> <span class="o">+</span> <span class="nx">getRandIntLessThan</span><span class="p">(</span> <span class="nx">h</span> <span class="o">-</span> <span class="nx">l</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span> <span class="p">}}</span>
<span class="kd">var</span> <span class="nx">winner</span> <span class="o">=</span> <span class="nx">getRandIntInclusive</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">9</span><span class="p">)</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="err">`</span><span class="nx">And</span> <span class="nx">the</span> <span class="nx">winner</span> <span class="nx">is</span> <span class="nx">You</span><span class="o">!</span> <span class="nx">Ticket</span> <span class="nx">number</span> <span class="nx">$</span><span class="p">{</span><span class="nx">winner</span><span class="p">}</span> <span class="o">!</span><span class="err">`</span><span class="p">)</span>
</code></pre>
</div>
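<p>The comment on <code class="highlighter-rouge">getRandSample</code> says to change it for the Web API. One possible replacement is sketched here, assuming the standard <code class="highlighter-rouge">crypto.getRandomValues()</code> is available. The names <code class="highlighter-rouge">webCrypto</code> and <code class="highlighter-rouge">getRandSampleWeb</code> are ours, and the fallback to Node's <code class="highlighter-rouge">webcrypto</code> export is only there so the same snippet also runs outside a browser.</p>

```javascript
"use strict";
// Use the global Web Crypto object where it exists (browsers, recent Node.js);
// otherwise fall back to Node's built-in implementation of the same API.
const webCrypto = globalThis.crypto ?? require('crypto').webcrypto

// Returns an integer in [0, 2^32), the same range that
// crypto.randomBytes(4).readUInt32LE() yields in the Node.js version.
function getRandSampleWeb() {
  const buffer = new Uint32Array(1)
  webCrypto.getRandomValues(buffer)  // fills the typed array in place
  return buffer[0]
}

console.log(getRandSampleWeb())
```

<p>Since the sample is still a 32-bit unsigned integer, <code class="highlighter-rouge">maxRange</code> and the rest of the rejection sampling code can stay unchanged.</p>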
<p>(Here’s a <a href="https://gist.github.com/dimitri-xyz/ba6f6d81a9db39d2a918fb8ecece9a76">haskell version</a> of the same code.)</p>
<p>That’s all folks! Now, you know how to properly generate uniformly distributed random integers from random bytes. Tricky, isn’t it!?</p>
<p><em>Acknowledgments: I would like to thank <a href="http://jelv.is/">Tikhon Jelvis</a> for some great suggestions to improve this blog post.</em></p>
Fri, 21 Apr 2017 00:00:00 +0000
http://dimitri.xyz/random-ints-from-random-bits/
Flushing Mortgage Payments Down the Toilet
<p>Nobody likes to pay rent. You never see that money again. From the point of view of the renter, paying rent is just like flushing your money down the toilet. The money is gone.</p>
<p>Many proud home owners live under the impression that they do not flush their money down the toilet each month. Unfortunately, that is not entirely true. Most home owners have to make monthly mortgage payments. Part of these payments does pay for the house, but another part pays for interest on the debt. Paying interest on debt is very similar to paying for rent. The money is gone, you will never see it again. To find out how much of your mortgage payments will be flushed down the toilet each month, we need to do some calculations.</p>
<p>Buying a house is a highly leveraged and, therefore, risky bet. Buying a property with a 20% down payment is a bet with 5× leverage. An 8% decrease in the value of the home means that the home owner just lost 5×8 = 40% of his down payment! The calculations that follow should be considered in conjunction with changes in home value or when home prices are expected to remain stable over the long run.</p>
<h3 id="how-much-interest-are-you-paying">How much interest are you paying?</h3>
<p>As you pay your mortgage, how much is owed to the bank decreases with time. The home owner only pays interest on the remaining balance, not the whole amount borrowed.</p>
<p>To illustrate this point consider a simple mortgage payment scheme with decreasing payments. Assume you borrow $500,000 from your bank at an interest rate of 5% per year (or 0.407% per month) for 25 years (300 months). One way to pay down the mortgage debt is to pay, each month, a fixed part of the amount borrowed plus all interest accrued during that month. With 300 monthly payments, each month you pay 1/300th of the initial balance of $500,000 and all the interest accrued during that same month. This way, any interest owed never compounds for more than a month. The amount of principal paid each month is</p>
<p>$500,000/(300 payments) = $1,666.67 per month.</p>
<p>The interest accrued during the <em>first</em> period is</p>
<p>$500,000 × 0.00407 = $2,037.06 (computed with the unrounded monthly rate, about 0.0040741).</p>
<p>So, the first month’s payment is</p>
<p>$1,666.67 + $2,037.06 = $3,703.73</p>
<p>You owe an initial balance \(B_{0}\) of $500,000 when you take out the money. After you make your first payment, at the end of your first month, you will owe</p>
<p>\(B_{1}\) = 299/300 × 500,000 = $498,333.33</p>
<p>At the end of the second month, you will again pay $1,666.67 corresponding to 1/300th of the principal, but this time your remaining balance has decreased to $498,333 and so you will pay a little less interest. The total payment at the end of the second month is given by</p>
<p>1,666.67 + (299/300 × 500,000) × 0.00407 = $3,696.94</p>
<p>At the end of the second month, the debt is reduced to 298/300ths of its original value.</p>
<p>\(B_{2}\) = 298/300 × 500,000 = $496,667</p>
<p>Fast forward almost 25 years and you will only owe 1/300th of $500,000</p>
<p>\(B_{299}\) = 1/300 × 500,000 = $1,666.67</p>
<p>during the last month of your mortgage. You will pay a correspondingly very small amount of interest on your debt then, only 0.00407 × $1,666.67 = $6.79. Your final payment will be</p>
<p>1,666.67 + 6.79 = $1,673.46</p>
<p>Notice that the amount of interest paid on the first month is 300 times larger than the amount of interest paid on the last month! This large difference makes this scheme impractical for many people who can easily afford the final payments but not the initial ones.</p>
<p>In this decreasing payments scheme, the monthly payments decrease linearly by \( 0.00407 × 1,666.67 = $6.79 \) per month. Simply adding up all the payments shows that you will end up paying a total of about $806,578 on $500,000 of debt. In other words, 1.613 times the original amount.</p>
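<p>The arithmetic above can be checked with a short script. This is only a sketch: the function and variable names are made up for illustration, and it assumes the monthly rate is the compounding equivalent of 5% per year, which is what the 0.407% figure suggests.</p>

```python
# Sketch of the decreasing-payments scheme described above.
# All names are illustrative; they do not come from the original post.

def decreasing_schedule(principal, yearly_rate, months):
    """Return a list of (principal_part, interest_part), one entry per month."""
    # Assumption: the monthly rate compounds to the yearly rate over 12 months.
    monthly_rate = (1 + yearly_rate) ** (1 / 12) - 1
    principal_part = principal / months
    schedule = []
    balance = principal
    for _ in range(months):
        interest = balance * monthly_rate  # interest accrued during this month
        schedule.append((principal_part, interest))
        balance -= principal_part          # only the principal part reduces the debt
    return schedule

schedule = decreasing_schedule(500_000, 0.05, 300)
first_payment = sum(schedule[0])
last_payment = sum(schedule[-1])
total_paid = sum(p + i for p, i in schedule)

print(f"first payment: {first_payment:,.2f}")
print(f"last payment:  {last_payment:,.2f}")
print(f"total paid:    {total_paid:,.2f}")
print(f"ratio to loan: {total_paid / 500_000:.3f}")
```

<p>The first and last payments printed should match the $3,703.73 and $1,673.46 worked out by hand, and the ratio should come out near 1.613.</p>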
<h3 id="interest-paid-through-fixed-payments">Interest paid through Fixed Payments</h3>
<p>Most mortgages are paid through fixed (rather than decreasing) monthly payments.</p>
<p>In a fixed-payments mortgage, the home owner borrows an initial amount \( B_0 \) from the bank and makes a total of \(n\) fixed monthly payments of \(P\) dollars to pay off his debt. As the home owner makes his payments, the remaining balance decreases and a progressively smaller fraction of each payment goes towards paying interest. Consequently, more and more money goes towards paying the principal. This is shown in the figure below.</p>
<p>The red areas depict how much interest is being paid each month and the \(a_k\) represent the corresponding amount of principal being paid on the k-th month. The remaining balance after the k-th payment is \(B_k\). After the home owner makes his last payment this balance is zero.</p>
<p><img src="http://dimitri.xyz/assets/fixed-payments1.png" alt="Fixed Payments" /></p>
<p>Let us calculate what the monthly payments should be. Instead of using the interest rate \(i\), we will use the <em>gross</em> interest rate given by \(r=1+i\) to simplify the calculation. For the monthly rate in our example above, \(i=0.00407\) and \(r=1.00407\).</p>
<p>The initial balance is \(B_0\). After the first month, the remaining balance will be</p>
<script type="math/tex; mode=display">B_1 = r B_0 - P</script>
<p>The balance at the end of the second month is</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align}
B_2 & = r B_1 - P \\
& = r ( r B_0 - P) - P \\
& = r^2 B_0 - r P - P
\end{align} %]]></script>
<p>At the end of the third month</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align}
B_3 & = r B_2 - P \\
& = r ( r B_1 - P) - P \\
& = r ( r ( r B_0 - P ) - P ) - P \\
& = r^3 B_0 - r^2 P - r P - P
\end{align} %]]></script>
<p>In short, after each month the remaining balance is being multiplied by the gross interest rate \(r\) and then a payment of \(P\) is subtracted. At the end of \(n\) months the home owner will have paid his mortgage and we will have</p>
<p>\( 0 = B_n = r^n B_0 -r^{n-1} P - r^{n-2} P - \ldots - r P - P \)</p>
<p>\( 0 = r^n B_0 - P ( r^{n-1} + r^{n-2} + \ldots + r + 1) \)</p>
<p><a href="http://dimitri.xyz/topic-pages/gp-sum">Calculating the sum</a> for the standard geometric progression on the right, we obtain</p>
<p>\[ 0 = r^n B_0 - P \frac{r^n - 1}{r - 1} \]</p>
<p>Finally, rearranging the formula for \(P\)</p>
<p>\[ P = B_0 \frac{r^n(r - 1)}{r^n - 1}\]</p>
<p>This formula is all we need to calculate how much your monthly payments should be. For our $500,000 loan with 5% yearly interest we obtain \(P\) = $2,890.69. Simply adding up all 300 payments shows that you will pay a total of $867,207, which is 1.73 times the borrowed amount. Put another way, each payment is 73% larger, an extra 2,890.69 - 1,666.67 = $1,224.02, than it would be for a zero interest rate loan.</p>
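<p>As a sanity check, here is a minimal sketch that evaluates the formula for \(P\) and then replays the recurrence \(B_k = r B_{k-1} - P\) to confirm the balance reaches zero after the last payment. The function name is illustrative, and the sketch assumes \(r &gt; 1\) (a zero interest rate would make the formula divide by zero).</p>

```python
# Sketch: fixed monthly payment from the closed-form formula,
# checked against the month-by-month recurrence B_k = r * B_(k-1) - P.
# Names are illustrative; they do not come from the original post.

def fixed_payment(b0, r, n):
    """Monthly payment P for initial balance b0, gross monthly rate r, n payments.

    Assumes r > 1; with r == 1 the denominator would be zero.
    """
    return b0 * r**n * (r - 1) / (r**n - 1)

b0 = 500_000
n = 300
r = 1.05 ** (1 / 12)  # gross monthly rate equivalent to 5% per year

payment = fixed_payment(b0, r, n)

# Replay the recurrence: each month the balance grows by r, then P is paid.
balance = b0
for _ in range(n):
    balance = r * balance - payment

print(f"monthly payment: {payment:,.2f}")
print(f"total paid:      {n * payment:,.0f}")
print(f"final balance:   {balance:.6f}")
```

<p>The final balance should be zero up to floating-point rounding, which is exactly the condition \(B_n = 0\) used in the derivation.</p>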
<p>We left out the very important tax breaks. The U.S. government gives a tax break for interest paid on mortgages. If we assume that the home owner is in a 33% tax bracket and can benefit from both federal and state tax incentives, the amount effectively paid in interest is reduced by one third. In our example, this means that only \(\frac{2}{3}\) × 1,224.02 = $816.01 is flushed down the toilet each month. Add to this amount the extra costs of owning a home and compare the total to your rent payments. If you are still paying more than the total in rent and you don’t expect home prices to fall, buying a home may be a good move for you.</p>
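<p>The comparison described above can be written down in a few lines. This is a rough sketch: the function names are invented here, and the figures for other ownership costs and rent are placeholders chosen only to show how the comparison works, not numbers from this post.</p>

```python
# Illustrative rent-versus-buy comparison. Function names are hypothetical,
# and the inputs marked "placeholder" are not taken from the post.

def after_tax_interest(monthly_interest, tax_fraction):
    """Interest cost left over after the mortgage-interest tax break."""
    return monthly_interest * (1 - tax_fraction)

def buying_beats_renting(monthly_interest, tax_fraction, other_owning_costs, rent):
    """True when rent exceeds after-tax interest plus other ownership costs."""
    owning_cost = after_tax_interest(monthly_interest, tax_fraction) + other_owning_costs
    return rent > owning_cost

extra_interest = 2890.69 - 1666.67  # extra paid each month in the fixed-payment example
net_interest = after_tax_interest(extra_interest, 1 / 3)
print(f"after-tax interest per month: {net_interest:,.2f}")

# Placeholder figures, for illustration only.
print(buying_beats_renting(extra_interest, 1 / 3, other_owning_costs=700, rent=1800))
```

<p>Plug in your own rent and ownership costs; the answer depends entirely on those inputs.</p>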
<h3 id="conclusion">Conclusion</h3>
<p>Don’t fool yourself into thinking buying a home is always a good investment because you will not be paying rent. Calculate how much interest you will be paying, what tax breaks you will get, and the extra costs of owning a property (recurring property tax, insurance, maintenance and depreciation, and the one-time costs of closing a deal), add the opportunity cost of not having savings, and compare all that to your rent payments before making a decision. The $816 above only takes into account the interest itself.</p>
<p><em>This entry has been updated to include the opportunity cost and make explicit that the calculation above does not include all costs. The layout has also been changed.</em></p>
Sat, 15 May 2010 16:41:01 +0000
http://dimitri.xyz/2010/05/15/flushing-mortgage-payments-down-the-toilet/
The Liar’s Paradox<blockquote>
<p><strong>“This sentence is false.”</strong></p>
</blockquote>
<p>If this statement is true, then what it states must be true. It states that it is false. So, if we assume the statement is true we conclude it must be false. If the statement is false, then what it states should be false, but it correctly states that it is false. So, it is true. Thus, if we assume the statement is false, we conclude that it must be true. The statement appears to be neither true nor false and yet it must be either true or false. That is a paradox.</p>
<p>Now consider another statement: “This statement is true.” Is this true or false? Could it be both? Be careful with self-referential statements; they are tricky and can prove anything. The following statement can never be false: “I’m the hottest Brazilian in the world or this statement is false.” Think about it. I have just proved I’m the hottest Brazilian alive!</p>
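<p>One way to see what is going on is to try every truth assignment and keep only the ones where a statement’s truth value agrees with what it asserts. The sketch below does this by brute force; the helper name and the encoding of each statement as a small function are choices made here for illustration.</p>

```python
from itertools import product

# Sketch: brute-force search for consistent truth assignments.
# A statement is modeled as a function of its own truth value `s`
# (plus any extra facts) that returns what the statement asserts.

def consistent_assignments(statement, n_extra=0):
    """Return every assignment (s, *extras) where s equals what the statement asserts."""
    result = []
    for values in product([True, False], repeat=1 + n_extra):
        s = values[0]
        if statement(*values) == s:
            result.append(values)
    return result

def liar(s):
    return not s            # "This sentence is false."

def truth_teller(s):
    return s                # "This statement is true."

def hottest(s, h):
    return h or (not s)     # "I'm the hottest Brazilian (h) or this statement is false."

print(consistent_assignments(liar))          # []
print(consistent_assignments(truth_teller))  # [(True,), (False,)]
print(consistent_assignments(hottest, 1))    # [(True, True)]
```

<p>The liar sentence has no consistent assignment, the second statement has two, and the last one is only consistent when it is true and its author really is the hottest Brazilian.</p>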
Mon, 08 Feb 2010 23:04:43 +0000
http://dimitri.xyz/2010/02/08/the-liars-paradox/
counterintuitive