# Wakatta!

https://blog.wakatta.jp/ (Octopress)

# Concrete Mathematics Chapter 2 Exam Exercises

2012-05-05 https://blog.wakatta.jp/blog/2012/05/05/concrete-mathematics-chapter-2-exam-exercises

The exam exercises for this chapter were much trickier than the homework exercises. With the right approach, they could have been solved quickly; too bad I never found that right approach…

## Exam Exercises

### $\sum_{k=1}^n(-1)^k\frac{k}{4k^2-1}$

I managed to solve this exercise using sum by parts; the book uses partial fractions, but as far as I can tell I never learned about these. They seem to be part of the high school curriculum in America, but perhaps not in Belgium (or I was sleeping that day).

Sum by parts worked for me because I eventually asked myself what kind of function of $k$ would produce a finite difference $\frac{1}{4k^2-1} = \frac{1}{(2k+1)(2k-1)}$.

By looking at $\Delta (2k)^{\underline m}$, I realised that the solution was fairly simple. I first experimented with $\Delta (2k)^{\underline{-1}}$, then found the right expression:

\begin{aligned} \Delta (2k-2)^{\underline{-1}} & = \frac{1}{2k+1}-\frac{1}{2k-1}\\ & = \frac{2k-1 - 2k - 1}{(2k+1)(2k-1)}\\ & = \frac{-2}{4k^2-1}\\ \end{aligned}

For the sum by parts, I can therefore try to use:

\begin{aligned} \Delta v & = \frac{1}{4k^2-1}\\ v & = -\frac{1}{2}(2k-2)^{\underline{-1}}\\ Ev & = -\frac{1}{2}(2k)^{\underline{-1}}\\ u & = (-1)^kk\\ \Delta u & = (-1)^k +(k+1)(-2 (-1)^k)\\ & = (-1)^k - 2(k+1)(-1)^k\\ & = (-1)^k(1-2k-2)\\ & = -(-1)^k(2k+1)\\ \end{aligned}

The last expression was computed using the product rule for finite differences.

When I put everything into the sum by parts formula, the various blocks fell into place with satisfying “clicks”:

\begin{aligned} \sum (-1)^x\frac{x}{4x^2-1}\delta x = & -\frac{1}{2}(-1)^x x(2x-2)^{\underline{-1}} \\ & - \frac{1}{2} \sum (-1)^x (2x+1)(2x)^{\underline{-1}}\delta x\\ = & -\frac{(-1)^x x}{2(2x-1)} - \frac{1}{2}\sum (-1)^x \delta x &&(2x+1)(2x)^{\underline{-1}}=1\\ = & -\frac{(-1)^x x}{2(2x-1)} + \frac{(-1)^x}{4} + c\\ = & (-1)^x \left(\frac{1}{4} - \frac{x}{2(2x-1)} \right) + c\\ = & (-1)^x \left(\frac{4x - 2 - 4x}{4\cdot 2\cdot(2x-1)} \right) + c\\ = & -\frac{(-1)^x}{8x-4} + c\\ \end{aligned}

The answer as a function of $n$ is

\begin{aligned} \sum_{k=1}^{n} (-1)^k\frac{k}{4k^2-1} = & \sum_1^{n+1} (-1)^x\frac{x}{4x^2-1}\delta x\\ = & \left. -\frac{(-1)^x}{8x-4}\right|_1^{n+1}\\ = & -\frac{(-1)^{n+1}}{8n+8-4} - \frac{1}{4}\\ = & \frac{(-1)^n}{8n+4} - \frac{1}{4}\\ \end{aligned}
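The closed form is easy to check numerically. Here is a quick script (my own, not from the book) comparing the direct sum against the closed form using exact rational arithmetic:

```python
from fractions import Fraction

def lhs(n):
    # direct evaluation of the sum
    return sum(Fraction((-1) ** k * k, 4 * k * k - 1) for k in range(1, n + 1))

def rhs(n):
    # closed form (-1)^n/(8n+4) - 1/4
    return Fraction((-1) ** n, 8 * n + 4) - Fraction(1, 4)

assert all(lhs(n) == rhs(n) for n in range(1, 50))
```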

### 1050

I could not do this exercise; I even failed to see the basic sum that sat (quite prominently) at the centre of the question.

Even the book solution took me a while to figure out.

The book solution asks how many pairs $a$, $b$ there are such that $\sum_{a\le k \lt b}k = 1050$.

Rewriting this in terms of finite calculus is simple enough:

\begin{aligned} \sum_{a\le k \lt b}k & = \sum_a^b x\delta x\\ & = \frac{1}{2}\left(b^{\underline 2} - a^{\underline 2}\right)\\ & = \frac{1}{2}\left(b(b-1) - a(a-1)\right)\\ & = \frac{1}{2}\left(b^2-b - a^2 + a\right)\\ & = \frac{1}{2}\left(b^2-b + ab - a^2 + a - ab\right)\\ & = \frac{1}{2}\left(b(b-1+a) - a(a-1+b)\right)\\ & = \frac{1}{2}(b-a)(b+a-1)\\ \end{aligned}

Now, one thing to notice is that if a sum of two integers is even, so is their difference, and vice-versa. Therefore the product above is the product of one even and one odd integer.

So we are now looking for ways to express

\begin{aligned} (b-a)(b+a-1)=xy=2\cdot 1050=2100=2^2\cdot 3\cdot 5^2\cdot 7\\ \end{aligned}

as a product with $x$ even and $y$ odd.

To compute how many divisors a number with known prime factorisation has, it is enough to see that, for each prime factor $p$ with multiplicity $n_p$, this prime can be left out, or included up to $n_p$ times in the divisor; so for each prime $p$ there are $n_p+1$ possibilities. Multiplying these over the prime factors gives the number of divisors of the original number.

In the present case, the odd factor $y$ must divide the odd part $3\cdot 5^2\cdot 7$, so this number is $2\cdot 3\cdot 2 = 12$.

Now that we have the number of possible pairs of $x$ and $y$, we have to go back to $a$ and $b$. There might be a principle involved here, but I don’t really see it, so I’ll just try to rebuild the solution from the ground up.

We already know that either $b-a=x$ and $b+a-1=y$ or $b-a=y$ and $b+a-1=x$. So we already see that whatever expression we need, both $a$ and $b$ will have a $\frac{1}{2}$ added so that their difference cancels out, and their sum produces a $1$ that will cancel the $-1$.

Looking at sums and differences, we have $(x+y)-(x-y) = 2y$ and $(x+y)+(x-y)=2x$, so there will be a sum and a difference involved. We also need $a$ to be smaller than $b$, so it is a candidate for $\frac{1}{2}(x-y)+\frac{1}{2}$. However, we also need it to be positive, so we add an absolute value.

So the candidate solutions are $a=\frac{1}{2}|x-y|+\frac{1}{2}$ and $b=\frac{1}{2}(x+y)+\frac{1}{2}$. Let’s check them:

\begin{aligned} x \gt y: & (\frac{1}{2}(x+y)+\frac{1}{2} - \frac{1}{2}(x-y)-\frac{1}{2}) (\frac{1}{2}(x+y)+\frac{1}{2} + \frac{1}{2}(x-y)+\frac{1}{2} - 1)\\ = & (\frac{1}{2}(2y))(\frac{1}{2}(2x))\\ = & yx\\ x \lt y: & (\frac{1}{2}(x+y)+\frac{1}{2} - \frac{1}{2}(y-x)-\frac{1}{2}) (\frac{1}{2}(x+y)+\frac{1}{2} + \frac{1}{2}(y-x)+\frac{1}{2} - 1)\\ = & (\frac{1}{2}(2x))(\frac{1}{2}(2y))\\ = & xy\\ \end{aligned}
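A brute-force enumeration (my own sketch; the function name is mine) confirms that 1050 can be written as $\sum_{a\le k\lt b}k$ with $1\le a\lt b$ in exactly 12 ways:

```python
def consecutive_runs(total):
    # all pairs (a, b) with 1 <= a < b and a + (a+1) + ... + (b-1) == total
    runs = []
    for a in range(1, total + 1):
        s, b = 0, a
        while s < total:
            s += b
            b += 1
        if s == total:
            runs.append((a, b))
    return runs

runs = consecutive_runs(1050)
assert len(runs) == 12
```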

### Riemann’s zeta function

This exercise was more in line with the content of this chapter, and easy enough. Essentially, it is just a matter of changing the order of summation.

#### $\sum_{k\ge 2}(\zeta(k)-1) = 1$

\begin{aligned} \sum_{k\ge 2}(\zeta(k)-1) & = \sum_{k\ge 2}(\sum_{j\ge 1}\frac{1}{j^k}-1)\\ & = \sum_{k\ge 2}(\sum_{j\ge 2}\frac{1}{j^k})\\ & = \sum_{j\ge 2}\sum_{k\ge 2}\frac{1}{j^k}\\ & = \sum_{j\ge 2}\sum_{k\ge 2}(\frac{1}{j})^k\\ \end{aligned}

The inner sum is a geometric progression:

\begin{aligned} \sum_{k\ge 2}(\frac{1}{j})^k & = \lim_{n\rightarrow \infty} \frac{\frac{1}{j^2}-\frac{1}{j^n}}{1-\frac{1}{j}}\\ & = \frac{\frac{1}{j^2}-0}{1-\frac{1}{j}}\\ & = \frac{1}{j(j-1)} \end{aligned}

So we need to solve $\sum_{j\ge 2}\frac{1}{j(j-1)}=\sum_{j\ge 0}\frac{1}{(j+1)(j+2)}$, which we already saw as well, and the value is indeed 1.
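A quick numerical check of this identity (my own script; the truncation limits are arbitrary but give errors well below the tolerance):

```python
def zeta(s, terms=10000):
    # truncated Riemann zeta: sum of 1/j^s for j = 1..terms
    return sum((1.0 / j) ** s for j in range(1, terms + 1))

# terms beyond k = 60 are smaller than about 2^-60 and can be ignored
total = sum(zeta(k) - 1 for k in range(2, 61))
assert abs(total - 1.0) < 1e-3
```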

#### $\sum_{k\ge 1}(\zeta(2k)-1)$

Using the same approach:

\begin{aligned} \sum_{k\ge 1}(\zeta(2k)-1) & = \sum_{k\ge 1} (\sum_{j\ge 1}\frac{1}{j^{2k}}-1)\\ & = \sum_{k\ge 1}\sum_{j\ge 2}\frac{1}{j^{2k}}\\ & = \sum_{j\ge 2}\sum_{k\ge 1}\frac{1}{j^{2k}}\\ & = \sum_{j\ge 2}\sum_{k\ge 1}(\frac{1}{j^2})^k\\ \end{aligned}

Once again, we have a geometric progression, just as easy to solve as the previous one:

\begin{aligned} \sum_{k\ge 1}(\frac{1}{j^2})^k & = \lim_{n\rightarrow \infty} \frac{\frac{1}{j^2}-\frac{1}{j^{2n}}}{1-\frac{1}{j^2}}\\ & = \frac{\frac{1}{j^2}-0}{1-\frac{1}{j^2}}\\ & = \frac{1}{j^2-1}\\ & = \frac{1}{(j-1)(j+1)}\\ & = \frac{j}{(j-1)j(j+1)}\\ & = j (j-2)^{\underline{-3}}\\ \end{aligned}

Now we have $\sum_{j\ge 2} j(j-2)^{\underline{-3}}$, which can be summed by parts. I chose:

\begin{aligned} \Delta v & = (x-2)^{\underline{-3}}\\ v & = -\frac{(x-2)^{\underline{-2}}}{2}\\ Ev & = -\frac{(x-1)^{\underline{-2}}}{2}\\ u & = x\\ \Delta u & = 1\\ \end{aligned}

and now the sum by part:

\begin{aligned} \sum_2^{\infty} x(x-2)^{\underline{-3}}\delta x & = -\frac{x}{2(x-1)x} + \frac{1}{2}\sum_2^\infty (x-1)^{\underline{-2}}\delta x\\ & = \left. -\frac{1}{2(x-1)}-\frac{1}{2x}\right|_2^\infty\\ & = \frac{3}{4}\\ \end{aligned}
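The value $\frac{3}{4}$ can be checked numerically the same way as the previous sum (my own script, with arbitrary truncation limits):

```python
def zeta(s, terms=10000):
    # truncated Riemann zeta: sum of 1/j^s for j = 1..terms
    return sum((1.0 / j) ** s for j in range(1, terms + 1))

# terms beyond k = 40 are smaller than about 2^-80 and can be ignored
total = sum(zeta(2 * k) - 1 for k in range(1, 41))
assert abs(total - 0.75) < 1e-3
```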

### $\sum_{k\ge 0}\min(k, x \dot{-} k)$

I am not really sure of my solution here, even though the outcome is identical to the book’s; my method is somewhat different from the book’s, and uses some concepts from Chapter 3.

To prove that the two sums have the same value, I just evaluate each.

First, a basic observation on the $\dot{-}$ operator: $a\dot{-}b$ equals $a-b$ when $a\ge b$, and zero otherwise.

So if $a\le b$ (in particular, if $a\le 0$ and $b\ge 0$), $a\dot{-}b=0$.

I will now assume that $x\ge 0$; otherwise both sums are zero.

#### $\sum_{k\ge 0}\min(k,x\dot{-}k)$

First, I replace the infinite sum by a finite one: as seen above, if $k\gt x$, $\min(k,x\dot{-}k)=0$, so

\begin{aligned} \sum_{k\ge 0}\min(k,x\dot{-}k) & = \sum_{0\le k \le x}\min(k,x-k)\\ \end{aligned}

I then try to remove the $\min$ operator:

\begin{aligned} k & \le x - k\\ 2k & \le x\\ k & \le \frac{x}{2}\\ \end{aligned}

so that

\begin{aligned} \sum_{0\le k \le x}\min(k,x-k) & = \sum_{0\le k \le \frac{x}{2}} k + \sum_{\frac{x}{2}\lt k \le x} (x - k)\\ \end{aligned}

The general idea here is to find a way to eliminate the $k$ terms, by shifting each term in the second sum by an equal amount.

At this point, the number of terms in each sum is important: the total number of terms is the number of $k$ such that $0\le k\le x$; clearly this is $\lfloor x \rfloor + 1$.

The number of terms in the first sum is similarly $\lfloor\frac{x}{2}\rfloor + 1$.

The number of terms in the second sum is therefore $\lfloor x \rfloor - \lfloor \frac{x}{2} \rfloor$. To find an expression for this, it helps to look at integral $x$ first.

If $x=4$, $\lfloor 4 \rfloor - \lfloor \frac{4}{2} \rfloor = 2$.

If $x=5$, $\lfloor 5 \rfloor - \lfloor \frac{5}{2} \rfloor = 5-2 = 3$.

More generally, if $2n\le x\lt 2n+1$, there will be $n+1$ terms in the first sum, and $n$ in the second. And if $2n-1\le x\lt 2n$, there will be $n$ terms in each sum.

A first attempt for the number of terms in the second sum is $\lfloor \frac{x}{2} \rfloor$, but this only works for $x$ such that $2n\le x\lt 2n+1$. But it is easy to see that $\lfloor \frac{x+1}{2} \rfloor$ will always work: if $2n\le x\lt 2n+1$, $\lfloor \frac{x+1}{2} \rfloor = \lfloor\frac{2n+1}{2}\rfloor = n$, and if $2n-1\le x\lt 2n$, $\lfloor \frac{x+1}{2}\rfloor = \lfloor\frac{2n}{2}\rfloor = n$.

Using this value to “shift” the terms of the second sum:

\begin{aligned} \sum_{0\le k \le \frac{x}{2}} k + \sum_{\frac{x}{2}\lt k \le x} x - k & = \sum_{0\le k \le \frac{x}{2}} k + \sum_{\frac{x}{2}-\lfloor \frac{x+1}{2}\rfloor\lt k\le x- \lfloor \frac{x+1}{2}\rfloor} (x-\left\lfloor \frac{x+1}{2}\right\rfloor)-k\\ & = \sum_{0\le k \le \lfloor \frac{x}{2} \rfloor} k + \sum_{\frac{x}{2}-\lfloor \frac{x+1}{2}\rfloor\lt k\le \lfloor \frac{x}{2}\rfloor} (x-\left\lfloor \frac{x+1}{2}\right\rfloor)-k\\ \end{aligned}

Now the question is whether the new $k$ terms in the second sum cancel the $k$ terms in the first sum. Once again, let’s check the cases:

- if $2n\le x\lt 2n+1$, $\frac{x}{2} \ge \lfloor\frac{x+1}{2}\rfloor$, so $k\ge 1$: this means the $\lfloor \frac{x}{2}\rfloor$ non-zero $k$ terms of the first sum are cancelled by the $\lfloor \frac{x+1}{2}\rfloor = \lfloor \frac{x}{2}\rfloor$ non-zero $k$ terms of the second sum; the zero term can safely be ignored.
- if $2n-1\le x\lt 2n$, $\frac{x}{2} \lt \lfloor\frac{x+1}{2}\rfloor$, so $k\ge 0$: this means the $\lfloor \frac{x}{2}\rfloor+1$ $k$ terms (including 0) are all cancelled by the $\lfloor \frac{x+1}{2}\rfloor = \lfloor \frac{x}{2}\rfloor+1$ $k$ terms of the second sum.

So we can safely rewrite the sum as

\begin{aligned} \sum_{\frac{x}{2} - \lfloor \frac{x+1}{2}\rfloor \lt k \le \lfloor\frac{x}{2}\rfloor}(x-\left\lfloor \frac{x+1}{2}\right\rfloor)\\ \end{aligned}

And, as we already know the number of terms is $\left\lfloor \frac{x+1}{2}\right\rfloor$, the sum value is

\begin{aligned} \left\lfloor \frac{x+1}{2}\right\rfloor \left(x- \left\lfloor \frac{x+1}{2}\right\rfloor\right) \end{aligned}

#### $\sum_{k\ge 0}(x\dot{-}(2k+1))$

This sum is much easier than the previous one. First I remove the $\dot{-}$ operator. I need $2k+1\le x$:

\begin{aligned} 2k+1 & \le x\\ k & \le \frac{x-1}{2}\\ \end{aligned}

This gives me $\lfloor \frac{x+1}{2}\rfloor$ terms: $k$ runs from $0$ to $\lfloor\frac{x-1}{2}\rfloor$, and $\lfloor\frac{x-1}{2}\rfloor + 1 = \lfloor\frac{x+1}{2}\rfloor$.

So I can extract $x$ and work only on $k$

\begin{aligned} \left\lfloor \frac{x+1}{2}\right\rfloor(x-1) - 2 \sum_{0\le k\lt \lfloor \frac{x+1}{2}\rfloor}k & = \left\lfloor \frac{x+1}{2}\right\rfloor(x-1) - \left\lfloor\frac{x-1}{2}\right\rfloor\left\lfloor\frac{x+1}{2}\right\rfloor\\ & = \left\lfloor \frac{x+1}{2}\right\rfloor (x - 1 - \left\lfloor \frac{x-1}{2}\right\rfloor)\\ & = \left\lfloor \frac{x+1}{2}\right\rfloor (x - \left\lfloor \frac{x+1}{2}\right\rfloor)\\ \end{aligned}

So both expressions have the same value.
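Both sums, and the common closed form, can be checked for small integer $x$ (my own sketch; `monus` implements my reading of the $\dot{-}$ operator as $\max(a-b,0)$):

```python
def monus(a, b):
    # my reading of the dot-minus operator: max(a - b, 0)
    return max(a - b, 0)

def lhs(x):
    return sum(min(k, monus(x, k)) for k in range(0, x + 1))

def rhs(x):
    # terms with 2k+1 > x vanish, so k <= x covers everything
    return sum(monus(x, 2 * k + 1) for k in range(0, x + 1))

def closed(x):
    m = (x + 1) // 2          # floor((x+1)/2)
    return m * (x - m)

assert all(lhs(x) == rhs(x) == closed(x) for x in range(0, 200))
```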

## Bonus Questions

### $\vee$ Laws

I find it easier to work from the basic $\min$ operator, and scale that up to $\vee$ (the formulas can always be derived mechanically from the underlying algebra).

$\min$ is an associative and commutative operator, addition distributes over it, and its neutral element is $\infty$.

I will not repeat the full list of formulas; the book has them already.

### Undefined infinite sums

An undefined sum, according to (2.59), is one in which both the sum of the positive terms and the sum of the negative terms are unbounded.

I define $K^+$ as $\{k\in K| a_k \gt 0\}$ and $K^-$ as $\{k\in K| a_k\lt 0\}$.

The point about unbounded sums is that, even if I drop a large number of terms, there are always enough remaining terms to add up to an arbitrary amount.

For instance, given $n$ even, $F_0=\emptyset$, and $E_n=K^+\setminus F_{n-1}$, it is always true that I can find $E'_n\subset E_n$ such that

\begin{aligned} \sum_{k\in E'_n}a_k \ge A^+ - \sum_{k\in F_{n-1}}a_k\\ \end{aligned}

So if I define $F_n = F_{n-1} \cup E'_n$, $\sum_{k \in F_n}a_k\ge A^+$.

And when $n$ is odd, with $O_n=K^-\setminus F_{n-1}$, I can always find a subset $O'_n\subset O_n$ such that

\begin{aligned} \sum_{k\in O'_n}a_k \le A^- - \sum_{k\in F_{n-1}}a_k\\ \end{aligned}

(with $K^-$, the $a_k$ are smaller than zero, so the sum can be arbitrarily small).

If I define $F_n = F_{n-1} \cup O'_n$, $\sum_{k\in F_n}a_k \le A^-$.

As I could not do the other bonus questions, this completes Chapter 2.

# Now I'm Blushing...

2012-05-02 https://blog.wakatta.jp/blog/2012/05/02/now-im-blushing-dot-dot-dot

Eric Redmond, one of the authors of Seven Databases in Seven Weeks, apparently found my blog, read it, and posted a nice comment on his blog.

Now I have no choice but to urge all my three readers (that includes you, Mom) to go and buy this great book. Even if you already own a copy.

# Concrete Mathematics Chapter 2 Homework Exercises

2012-05-02 https://blog.wakatta.jp/blog/2012/05/02/concrete-mathematics-chapter-2-homework-exercises

It has been a long time since I wrote about this book; I had worked out the solutions more than a month ago, but then life happened, and I could not find the time (or, perhaps more accurately, the courage) to typeset my notes…

Anyway, I do have time now and am eager to go on with Chapter 3; but first let’s finish Chapter 2. Today the homework exercises, and very soon the exam and bonus exercises (at least the ones I could do).

## Homework

### $2T_n = nT_{n-1}+3\cdot n!$

This exercise is not tricky in any way; just follow the method and the result is guaranteed.

The recurrence equations are

\begin{aligned} T_0 & = 5\\ 2T_n & = nT_{n-1}+3\cdot n!\\ \end{aligned}

The $a_n$, $b_n$ and $c_n$ series are:

\begin{aligned} a_n & = 2\\ b_n & = n\\ c_n & = 3\cdot n!\\ \end{aligned}

The summation factor

\begin{aligned} s_n & = \frac{a_{n-1}\dots a_1}{b_n\dots b_2}s_1\\ & = \frac{2^{n-1}}{n!}s_1\\ \end{aligned}

After experimenting a bit, I found that $s_1 = 2$ is slightly easier to work with, so the summation factor is $s_n = \frac{2^n}{n!}$.

With $S_n = \frac{2^{n+1}}{n!} T_n$, the recurrence equation becomes

\begin{aligned} S_n & = S_{n-1} + 3\cdot 2^n\\ & = S_0 + 3\sum_{k=1}^n 2^k\\ \end{aligned}

The sum is well-known, with $\sum_{k=0}^n 2^k = 2^{n+1}-1$, so $\sum_{k=1}^n 2^k = 2^{n+1}-2$.

Going back to $T_n$, we have

\begin{aligned} T_n & = \frac{n!(10+3(2^{n+1}-2))}{2^{n+1}}\\ & = \frac{n!(5+3(2^n-1))}{2^n}\\ & = \frac{n!(5 + 3\cdot 2^n - 3)}{2^n}\\ & = \frac{n!(3\cdot 2^n + 2)}{2^n}\\ & = 3\cdot n! + \frac{n!}{2^{n-1}}\\ \end{aligned}
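A quick check of the closed form against the recurrence, using exact rationals (my own script):

```python
from fractions import Fraction
from math import factorial

def T(n):
    # unfold the recurrence: T_0 = 5, 2 T_n = n T_{n-1} + 3 n!
    t = Fraction(5)
    for i in range(1, n + 1):
        t = (i * t + 3 * factorial(i)) / 2
    return t

def closed(n):
    # closed form: 3 n! + n!/2^(n-1)
    return 3 * factorial(n) + Fraction(2 * factorial(n), 2 ** n)

assert all(T(n) == closed(n) for n in range(0, 15))
```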

### $\sum_{k=0}^n kH_k$

Using the perturbation method:

\begin{aligned} S_{n+1} = S_n + (n+1) H_{n+1} & = 0 + \sum_{k=1}^{n+1} k H_k\\ & = \sum_{k+1=1}^{n+1}(k+1)H_{k+1}&&k\leftarrow k+1\\ & = \sum_{k=0}^n (k+1) (H_k + \frac{1}{k+1})\\ & = \sum_{k=0}^n k H_k + \sum_{k=0}^n \frac{k}{k+1} + \sum_{k=0}^n H_k + \sum_{k=0}^n \frac{1}{k+1}\\ & = S_n + \sum_{k=0}^n\frac{k+1}{k+1} + \sum_{k=0}^n H_k\\ & = S_n + n+1 + \sum_{k=0}^n H_k\\ \end{aligned}

so $\sum_{k=0}^n H_k$ is

\begin{aligned} \sum_{k=0}^n H_k & = (n+1)H_{n+1} - (n + 1)\\ & = (n+1)H_n + (n+1)\frac{1}{n+1} - n - 1\\ & = (n+1)H_n + 1 - n - 1\\ & = (n+1)H_n - n\\ \end{aligned}

### More perturbation method

This exercise is only tricky in the very first step (working out the exact meaning of $S_{n+1}$), as the sign of the terms changes depending on whether $n$ is odd or even.

This means that instead of the book equation (2.24) $S_{n+1} = S_n + a_{n+1}$, we find something like $S_{n+1} = a_{n+1} - S_n$.

#### $S_n = \sum_{k=0}^n (-1)^{n-k}$

First, the left hand part of the equation:

\begin{aligned} S_{n+1} & = \sum_{k=0}^n (-1)^{n+1-k} + (-1)^{n+1-n-1}\\ & = -\sum_{k=0}^n (-1)^{n-k} + 1\\ & = 1 - S_n\\ \end{aligned}

Then, the right hand part:

\begin{aligned} S_{n+1} & = (-1)^{n+1} + \sum_{k=1}^{n+1} (-1)^{n+1-k}\\ & = (-1)^{n+1}+\sum_{k+1=1}^{n+1}(-1)^{n+1-k-1}&&k\leftarrow k+1\\ & = (-1)^{n+1} + \sum_{k=0}^n(-1)^{n-k}\\ & = (-1)^{n+1} + S_n\\ \end{aligned}

Putting both together, $S_n = \frac{1-(-1)^{n+1}}{2}$, or, as the book states, $S_n = [n \text{ is even}]$.

#### $T_n = \sum_{k=0}^n (-1)^{n-k}k$

Using the same approach as above:

\begin{aligned} T_{n+1} & = \sum_{k=0}^{n+1}(-1)^{n+1-k}k\\ & = -\sum_{k=0}^n(-1)^{n-k}k + (-1)^{n+1-n-1}(n+1)\\ & = n+1-T_n\\ \end{aligned}

and

\begin{aligned} T_{n+1} & = \sum_{k=0}^{n+1}(-1)^{n+1-k}k\\ & = (-1)^{n+1}0 + \sum_{k=1}^{n+1}(-1)^{n+1-k}k\\ & = 0 + \sum_{k+1=1}^{n+1}(-1)^{n+1-k-1}(k+1)&&k\leftarrow k+1\\ & = \sum_{k=0}^n(-1)^{n-k}k + \sum_{k=0}^n(-1)^{n-k}\\ & = T_n + S_n\\ \end{aligned}

Together:

\begin{aligned} T_n & = \frac{n+1-S_n}{2}\\ & = \frac{1}{2}\left(n+[n \text{ is odd}] \right)\\ & = \left\lceil \frac{n}{2} \right\rceil\\ \end{aligned}

The last version uses the ceiling operator from Chapter 3.

#### $U_n = \sum_{k=0}^n (-1)^{n-k}k^2$

It will probably not be a surprise to find $U_n$ expressed in terms of $S_n$ and $T_n$.

\begin{aligned} U_{n+1} & = \sum_{k=0}^{n+1}(-1)^{n+1-k}k^2\\ & = \sum_{k=0}^n(-1)^{n+1-k}k^2 + (-1)^{n+1-n-1}(n+1)^2\\ & = -\sum_{k=0}^n(-1)^{n-k}k^2 + (n+1)^2\\ & = (n+1)^2 - U_n\\ \end{aligned}

and

\begin{aligned} U_{n+1} & = \sum_{k=0}^{n+1}(-1)^{n+1-k}k^2\\ & = (-1)^{n+1}0 + \sum_{k=1}^{n+1}(-1)^{n+1-k}k^2\\ & = 0 + \sum_{k+1=1}^{n+1}(-1)^{n+1-k-1}(k+1)^2&& k\leftarrow k+1\\ & = \sum_{k=0}^n(-1)^{n-k}(k^2+2k+1)\\ & = U_n + 2T_n + S_n\\ \end{aligned}

With $2T_n = n+1-S_n$, this produces $U_{n+1} = U_n + n + 1$, which gives the answer away, but let’s just continue with the current method.

Putting both sides together:

\begin{aligned} U_n & = \frac{(n+1)^2 - (n+1)}{2}\\ & = \frac{(n+1)(n+1) - (n+1)}{2}\\ & = \frac{n(n+1)}{2}\\ \end{aligned}
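The three closed forms can be verified directly against the defining sums (my own script):

```python
def S(n):
    return sum((-1) ** (n - k) for k in range(n + 1))

def T(n):
    return sum((-1) ** (n - k) * k for k in range(n + 1))

def U(n):
    return sum((-1) ** (n - k) * k * k for k in range(n + 1))

for n in range(100):
    assert S(n) == (1 if n % 2 == 0 else 0)   # [n is even]
    assert T(n) == (n + 1) // 2               # ceil(n/2) for integer n
    assert U(n) == n * (n + 1) // 2
```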

### Lagrange’s Identity

First, I look for a usable double sum. I use the fact that for any $j, k$, $j < k$, $(a_jb_k - a_kb_j) = -(a_kb_j - a_jb_k)$ and $(A_jB_k - A_kB_j)= - (A_kB_j - A_jB_k)$. This means that, with $s_{j,k} = (a_jb_k - a_kb_j)(A_jB_k - A_kB_j)$, $s_{j,k} = s_{k,j}$.

There is also the fact that $s_{j,j} = 0$, so now I can complete the sum to the whole rectangle:

\begin{aligned} \sum_{1\le j,k\le n}s_{j,k} & = \sum_{1\le j\lt k\le n}s_{j,k} + \sum_{1\le j = k \le n} s_{j,k} + \sum_{1\le k \lt j \le n}s_{k,j}\\ & = \sum_{1\le j \lt k \le n}s_{j,k} + 0 + \sum_{1\le j \lt k \le n}s_{j,k}\\ & = 2\sum_{1\le j \lt k \le n}s_{j,k}\\ \end{aligned}

The expansion of $s_{j,k}$ is $a_jA_jb_kB_k - a_jB_jA_kb_k - A_jb_ja_kB_k + b_jB_ja_kA_k$. Showing the summation just for the first one (the other three are identical):

\begin{aligned} \sum_{1\le j, k \le n} a_jA_jb_kB_k & = \sum_{j=1}^n\sum_{k=1}^n a_jA_jb_kB_k\\ & = \sum_{j=1}^n a_jA_j \left(\sum_{k=1}^n b_kB_k \right)\\ & = \left(\sum_{j=1}^n a_jA_j\right)\left(\sum_{k=1}^n b_kB_k \right)\\ & = \left(\sum_{k=1}^n a_kA_k\right)\left(\sum_{k=1}^n b_kB_k \right)\\ \end{aligned}

Putting it all together:

\begin{aligned} \sum_{1\le j\lt k\le n}(a_jb_k - a_kb_j)(A_jB_k - A_kB_j) & = \left(\sum_{k=1}^n a_kA_k\right)\left(\sum_{k=1}^n b_kB_k\right) - \left(\sum_{k=1}^n a_kB_k\right)\left(\sum_{k=1}^n A_kb_k\right)\\ \end{aligned}

In particular, with $a_k = A_k$ and $b_k = B_k$, the sum is $\left(\sum_{k=1}^n a_k^2 \right)\left(\sum_{k=1}^n b_k^2 \right) - \left(\sum_{k=1}^n a_kb_k \right)^2$. As the left-hand side is a sum of squares, and therefore non-negative, this also proves the Cauchy–Schwarz inequality.

### $\sum_{k=1}^n \frac{2k+1}{k(k+1)}$

#### Partial fractions

\begin{aligned} \sum_{k=1}^n\frac{2k+1}{k(k+1)} & = \sum_{k=1}^n(k+(k+1))\left(\frac{1}{k}-\frac{1}{k+1} \right)\\ & = \sum_{k=1}^n\left(\frac{k}{k} + \frac{k+1}{k} - \frac{k}{k+1} - \frac{k+1}{k+1} \right)\\ & = \sum_{k=1}^n \frac{k+1}{k} - \sum_{k=1}^n \frac{k}{k+1}\\ & = n + H_n - \sum_{k=1}^n \frac{k}{k+1} - \sum_{k=1}^n\frac{1}{k+1} + \sum_{k=1}^n\frac{1}{k+1}\\ & = n + H_n - \sum_{k=1}^n \frac{k+1}{k+1} + \sum_{k=1}^n \frac{1}{k+1}\\ & = H_n + \sum_{k-1=1}^n \frac{1}{k}&&k\leftarrow k-1\\ & = H_n + \sum_{k=2}^{n+1} \frac{1}{k}\\ & = H_n + H_{n+1} - 1\\ & = H_n + H_n + \frac{1}{n+1} - \frac{n+1}{n+1}\\ & = 2H_n - \frac{n}{n+1}\\ \end{aligned}

#### Sum by parts

Using

\begin{aligned} \Delta v & = \frac{1}{k(k+1)} = (k-1)^{\underline{-2}}\\ v & = -(k-1)^{\underline{-1}}\\ Ev & = -k^{\underline{-1}}\\ u & = 2k+1\\ \Delta u & = 2\\ \end{aligned}

First the sum by part

\begin{aligned} \sum \frac{2x+1}{x(x+1)}\delta x & = -(2x+1)(x-1)^{\underline{-1}} + 2 \sum x^{\underline{-1}} \delta x + c\\ & = -\frac{2x+1}{x} + 2 H_x + c\\ \end{aligned}

Then the evaluation

\begin{aligned} \sum_{k=1}^n\frac{2k+1}{k(k+1)} & = \left. -\frac{2x+1}{x}+2H_x\right|_1^{n+1}\\ & = -\frac{2(n+1)+1}{n+1} + 2 H_{n+1} + 2 + 1 - 2\\ & = 2 H_{n+1} + 1 - 2 - \frac{1}{n+1}\\ & = H_{n+1} + H_n - 1\\ & = 2H_n - \frac{n}{n+1}\\ \end{aligned}
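Both derivations give $2H_n - \frac{n}{n+1}$; a quick exact-arithmetic check (my own script):

```python
from fractions import Fraction

def H(n):
    # harmonic number H_n, computed exactly
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def lhs(n):
    return sum(Fraction(2 * k + 1, k * (k + 1)) for k in range(1, n + 1))

assert all(lhs(n) == 2 * H(n) - Fraction(n, n + 1) for n in range(1, 40))
```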

### $\sum_{0\le k \lt n}\frac{H_k}{(k+1)(k+2)}$

For the sum by part, I use

\begin{aligned} \Delta v & = x^{\underline{-2}}\\ v & = -x^{\underline{-1}}\\ Ev & = -(x+1)^{\underline{-1}}\\ u & = H_x\\ \Delta u & = x^{\underline{-1}}\\ \end{aligned}

The sum by part

\begin{aligned} \sum H_x x^{\underline{-2}} \delta x & = -H_x x^{\underline{-1}} + \sum (x+1)^{\underline{-1}}x^{\underline{-1}}\delta x + c\\ & = -H_x x^{\underline{-1}} + \sum x^{\underline{-2}} \delta x + c\\ & = -H_x x^{\underline{-1}} - x^{\underline{-1}} + c\\ & = -(H_x + 1) x^{\underline{-1}} + c\\ \end{aligned}

The evaluation is

\begin{aligned} \sum_{0\le k \lt n} \frac{H_k}{(k+1)(k+2)} & = \left. -\frac{H_x + 1}{x+1} \right|_0^n\\ & = 1 - \frac{H_n + 1}{n+1}\\ \end{aligned}
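A quick exact check of this result (my own script):

```python
from fractions import Fraction

def H(n):
    # harmonic number H_n, computed exactly
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def lhs(n):
    return sum((H(k) / ((k + 1) * (k + 2)) for k in range(0, n)), Fraction(0))

assert all(lhs(n) == 1 - (H(n) + 1) / (n + 1) for n in range(1, 30))
```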

### Product laws

I don’t think I listed all the laws for this exercise; the only complete list of the sum laws in the book is in the answer to this very exercise.

I will not repeat it here; suffice to say that when we replace sum by product, the laws can be updated by replacing product by exponentiation, and sum by product.

### $\prod_{1\le j \le k \le n}a_ja_k$

While it took me a few false starts, I eventually found that the triangular completion used for (2.32) works here as well.

\begin{aligned} \left(\prod_{1\le j\le k \le n} a_ja_k \right)^2 & = \left(\prod_{1\le j,k \le n}a_ja_k \right) \left(\prod_{1\le j=k \le n}a_ja_k\right)\\ & = \left(\prod_{1\le j,k \le n} a_j\right) \left(\prod_{1\le j,k \le n} a_k\right) \left(\prod_{1\le k \le n}a_k^2\right)\\ & = \left(\prod_{1\le k \le n} a_k^n\right) \left(\prod_{1\le j \le n} a_j^n\right) \left(\prod_{1\le k \le n}a_k^2\right)\\ & = \prod_{1\le k\le n} a_k^{2n+2}\\ \end{aligned}

So $\prod_{1\le j \le k \le n}a_ja_k = \left(\prod_{1\le k\le n} a_k\right)^{n+1}$.

### $\sum_{k=1}^n \frac{(-2)^{\underline k}}{k}$

As suggested, I worked out $\Delta c^{\underline x}$:

\begin{aligned} \Delta c^{\underline x} & = c^{\underline{x+1}} - c^{\underline x}\\ & = c(c-1)\cdots (c-x+1)(c-x) - c(c-1)\cdots (c-x+1)\\ & = c^{\underline x}(c-x-1)\\ \end{aligned}

I did not immediately see the relation between this and the original sum. First, I rewrote the original sum to remove the division:

\begin{aligned} \sum_{k=1}^n \frac{(-2)^{\underline k}}{k} & = \sum_{k=1}^n \frac{(-2)^{\underline{k-1}}(-2-k+1)}{k}\\ & = -\sum_{k=1}^n \frac{(-2)^{\underline{k-2}}(k+1)(-2-k+2)}{k}\\ & = \sum_{k=1}^n \frac{(-2)^{\underline{k-2}}(k+1)(k)}{k}\\ & = \sum_{k=1}^n (-2)^{\underline{k-2}}(k+1)\\ \end{aligned}

Now the relation is visible. So we have

\begin{aligned} \sum_1^{n+1}\frac{(-2)^{\underline x}}{x}\delta x & = \sum_1^{n+1}(-2)^{\underline{x-2}}(x+1)\delta x\\ & = \left. - (-2)^{\underline{x-2}}\right|_1^{n+1}\\ & = (-2)^{\underline{-1}} - (-2)^{\underline{n-1}}\\ & = -1 - (-2)(-3) \cdots (-n)\\ & = (-1)^n n! - 1\\ \end{aligned}
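The result $(-1)^n n! - 1$ checks out numerically (my own script; the `falling` helper computes falling factorial powers for non-negative exponents):

```python
from fractions import Fraction
from math import factorial

def falling(c, k):
    # falling factorial power c^{underline k} for k >= 0
    p = Fraction(1)
    for i in range(k):
        p *= c - i
    return p

def lhs(n):
    return sum(falling(-2, k) / k for k in range(1, n + 1))

assert all(lhs(n) == (-1) ** n * factorial(n) - 1 for n in range(1, 12))
```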

### Incorrect derivation

As stated in the book, the infinite sums do not converge, so the third step is invalid.

And that’s all for today.

# Machine Learning in Action - Naïve Bayes

2012-04-09 https://blog.wakatta.jp/blog/2012/04/09/machine-learning-in-action-naive-bayes

I am currently reading Machine Learning in Action, as I need something light between sessions with Concrete Mathematics. This book introduces a number of important machine learning algorithms, each time with a complete implementation and one or more test data sets; it also explains the underlying mathematics, and provides information about additional reference material (mostly heavier and more expensive books).

However, in Chapter 4 about Naïve Bayes classifiers, I didn’t see how the implementation derived from the maths. Eventually, I confirmed that it did not, and tried to correct it.

It is of course possible that the implementation is correct after all, and derives from more advanced theoretical concepts or practical concerns, but the book mentions neither; on the other hand, I found papers (here or here) that seem to confirm my corrections.

Everything that follows assumes the book’s implementation was wrong. Humble and groveling apologies to the author if it was not.

## What exactly is the model

The book introduces the concept of conditional probability using balls in buckets. This makes the explanation clearer, but this is just one possible model; each model (or distribution) uses dedicated formulas.

The problem is that the book then uses sets of words or bags of words as if these were the same underlying model, which they are not.

### Set of words

If we are only interested in whether a given word is present in a message or not, then the correct model is that of a biased coin where tails indicate the absence of the word, and heads its presence.

This is also known as a Bernoulli trial, and the estimator for the probability of presence is the mean presence: the number of documents in which the word is present, divided by the total number of documents.

The book algorithm does not implement this model correctly, as its numerator is the count of documents in which the word is present (correct), but the denominator is the total number of words (incorrect).

### Bag of words

If we want to consider the number of times a word is present in messages, then the balls in buckets model is correct (it is also known as a categorical distribution), and the code in the book adequately implements it.

## There is a word for it: Additive Smoothing

The book then improves the algorithm in two different ways. One is the use of logarithms to prevent underflow. The other is to always use one as the basic count for words, whether they are present or not.

This is in fact not so much a trick as a concept called Additive smoothing, where a basic estimator $\theta_i = \frac{w_i}{N}$ is replaced by $\hat{\theta}_i = \frac{w_i + \alpha}{N + \alpha d}$

$\alpha$ is a so-called smoothing parameter, and $d$ is the total number of words.

If the model is Bernoulli trial, $w_i$ is the number of documents where word $i$ is present, and $N$ is the total number of documents.

If the model is a categorical distribution, $w_i$ is the total count of word $i$ in the documents and $N$ is the total count of words in the documents.

As we are interested in $P(w_i|C_j)$ (with $C_0, C_1$ the two classes we are building a classifier for), $N$ above is restricted to documents in the relevant class; $\alpha$ and $d$ are independent of classes.

So the correct formula becomes

\begin{aligned} \hat{\theta}_{i,j} = \frac{w_{i,j}+\alpha}{N_j+\alpha d}\\ \end{aligned}

With $\alpha=1$ as a smoothing parameter, the book should have used numWords instead of 2.0 as an initial value for both p0Denom and p1Denom.

## Putting it together

The differences with the code from the book are minor: first, I introduce a flag to indicate whether I’m using sets of words (Bernoulli trials) or bags of words (categorical distribution) as a model. Then I initialise p0Denom and p1Denom with numWords as explained above; finally, I check the bag flag to know what to add to either denominator.
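For reference, here is a small self-contained sketch of the corrected training step. This is my own code, not the book’s: the function name, the argument layout, and the Bernoulli denominator $N_j + 2\alpha$ (two outcomes per word) are my assumptions; only the `numWords`-based correction of the bag-model denominator comes from the discussion above:

```python
from math import log

def train_nb(matrix, labels, bag=True, alpha=1.0):
    # matrix: one word-count vector per document; labels: 0/1 per document
    num_words = len(matrix[0])
    if not bag:
        # set-of-words (Bernoulli) model: only presence/absence matters
        matrix = [[1 if c > 0 else 0 for c in row] for row in matrix]
    log_probs = []
    for cls in (0, 1):
        rows = [row for row, lab in zip(matrix, labels) if lab == cls]
        counts = [sum(col) + alpha for col in zip(*rows)]   # w_i + alpha
        if bag:
            # categorical model: N_j is the total word count in the class,
            # and d is the vocabulary size (numWords in the book's code)
            denom = sum(sum(row) for row in rows) + alpha * num_words
        else:
            # Bernoulli model: N_j is the document count in the class,
            # with d = 2 outcomes (present/absent) per word
            denom = len(rows) + alpha * 2
        log_probs.append([log(c / denom) for c in counts])
    return log_probs
```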

## Evaluation

For the Spam test, the book version has an average error of 6%; the rewritten version has an error between 3% and 4%. The Spam test uses messages as sets, for which my version differs the most.

For the New-York/San Francisco messages classification, I did not measure any difference in error rates; this test uses messages as bags, for which the book version was mostly correct (the only difference was in the denominators).

## So what?

OK, well, but the book algorithm still works, at least on the original data.

But how well exactly would it work with other data? As the algorithm does not seem to implement any kind of sound model, is there any way to quantify the error we can expect? By building on theoretical foundations, at least we can quantify the outcome, and rely on the work of all the brilliant minds who improved that theory.

Theories (the scientific kind, not the hunch kind) provide well studied abstractions. There are always cases where they do not apply, and other cases where they do, but only partially or imperfectly. This should be expected as abstractions ignore part of the real world problem to make it tractable.

Using a specific theory to address a problem is very much similar to looking for lost keys under a lamppost: maybe the keys are not there, but that’s where the light is brightest, so there is little chance to find them anywhere else anyway.

So far, this was the only chapter where I had anything bad to say about the book. And even then, it was not that bad.

The rest of the book is very good; the underlying concepts are well explained (indeed, that’s how I found the problem in the first place), there is always data to play with, and the choice of language and libraries (Python, Numpy and matplotlib) is very well suited to the kind of exploratory programming that makes learning much easier.

So I would recommend this book as an introduction to this subject, and I’m certainly glad I bought it.

# Seven Databases in Seven Weeks Wrapping Up

2012-03-15 https://blog.wakatta.jp/blog/2012/03/15/seven-databases-in-seven-weeks-wrapping-up

This has lasted a little bit longer than seven weeks (the release schedule of the beta versions did not help; my day job did not help either), but I finally finished the book.

### Pros

I liked that the book started with PostgreSQL. All too often, I am put off by the amazingly uninformed criticisms of the NoSQL crowd about relational databases; this left me with the general impression that a younger generation of engineers was just too ignorant to figure SQL out, so they built something new (without the benefits of decades of experience…).

By taking a balanced approach, the book cleared up this misconception (Hadoop: The Definitive Guide also has balanced coverage in its introduction).

Each database’s strengths and weaknesses are correctly (as far as I can tell) reported, along with its position in the CAP triangle, and intended or ideal usage.

A recapitulative (but already partially incorrect, at least in the 5.0 beta version) overview of all the databases’ properties in Appendix A is also very useful.

### Cons

Well, this is not exactly a problem of the book itself, but rather of the tools it covers: the rapid and sometimes radical changes in some of the databases meant that the technical information in the book was already obsolete.

The book’s intention is not to be a detailed tutorial; for instance, they skip installations (really, most technical books should skip installation and go straight to setup and use; think of the number of trees that would save), but the search for corrections was heavily taxing my already sparse free time.

All this will eventually improve, as the tools and documentation mature; right now using them is a bit too involved for the broad but shallow approach this book follows.

Compared to Seven Languages in Seven Weeks, I found this book more challenging. But this is perhaps a consequence of my prior exposure to a variety of languages and programming concepts; I suspect many people may find this book much easier.

### Recommendation

Of all the books I have read recently, this is the one that changed and enlarged my views the most.

If you are, like me, a traditional software engineer with years of experience in relational databases but little exposure to newer kinds of storage, you will benefit from this presentation of many databases and solution designs.

If, however, you already come from the NoSQL world and have experience with a few of the covered tools, this book might not be the ideal one to convince you of the strengths of PostgreSQL. The problem with relational databases is that, having been the de facto standard storage solution for decades, nobody remembers why they became popular in the first place (they actually replaced databases that looked pretty much like document or graph databases, only much more primitive).

Still, given its price, as a broad introduction to many different data tools and techniques, this book is hard to beat. I certainly am glad for having read it, and I think you would be too.

]]>
<![CDATA[Seven Databases in Seven Weeks Redis Day 3]]> 2012-03-14T15:48:00+09:00 https://blog.wakatta.jp/blog/2012/03/14/seven-databases-in-seven-weeks-redis-day-3 Wow, almost two months since I wrote Day 2, and more than one since the last post in this series… Time to bring it to an end.

Today is less about Redis (indeed, it is hardly used at all), and more about a concept: Polyglot Persistence, and about an implementation that showcases the concept.

In fact, I spent most of my time browsing the documentation of Node.js, the library/framework the authors used to build the demo application.

## Polyglot Persistence

Polyglot Persistence, the use of several kinds of storage systems in a project, makes even more sense than Polyglot Programming (the use of several languages in a project).

While languages are, by and large, equivalent in expressive power, and mostly a matter of choice, culture, or comparative advantage (some languages favour small teams, others large ones), storage systems are sufficiently different that they are not interchangeable.

Once the idea of eventual consistency takes root, it is only a simple extension to view the data as services available from a number of sources, each optimised for its intended use (instead of a single, default source that only partially meets the more specialised needs), and with its own update cycles.

The problem, of course, is that it introduces several levels of complexity: development, deployment, monitoring, and a dizzying range of potential errors, failures, …

## Polyglot Persistent Service

The implementation described in the book is small enough to fit in less than 15 pages, yet rich enough to show what is possible.

The databases are (with the versions I used):

• Redis 2.4.8
• CouchDB 1.1.1
• Neo4j Community 1.6.1

and the glue language is Node.js.

### Redis

Redis is used first as a staging area for the initial data load. It is then used to track the transfer of data between CouchDB and the other databases, and finally to support auto-completion of band names.

### CouchDB

CouchDB is intended as the System of Record (i.e. the master database) for the system. Data is meant to be loaded into CouchDB first, then propagated to the other databases.

Beside that, it is not used much, and after the exercises, not used at all…

### Neo4j

Neo4j keeps a graph of bands, members, and instruments (or roles), and their relationships.

### Node.js

Node.js is a framework/library for JavaScript based on the concept of event-based programming (similar to, but perhaps more radical than, Erlang). All I/O is done in continuation-passing style, which means that whenever an I/O operation is initiated, one of the arguments is a function to handle whatever the operation produces (or deal with the errors).

This is good from a performance point of view, but it is of course more complex to design and code with. Still, it looks like a fun tool to glue various servers together.

### Book Code Fixes

I had to fix some of the code from the authors (nothing serious, and all reported in the errata):

• populate_couch.js: the trackLineCount has an off-by-one error. The check for completion should be totalBands <= processedBands
• bands.js: the initialisation of membersQuery in the function for the /band route has a syntax error. It should be

### Updating the Code

The book uses a now dated version of Neo4j, so the queries do not work. The shortcut to access a node by index does not work anymore, and the uniqueObject step has been replaced by dedup.

Here are the updated relevant portions:

and

## Exercises

I’m not sure what the second homework exercise was supposed to be about: Neo4j already contains information about members and memberships. Perhaps it dates from an early draft, before this chapter’s code evolved into what it is now. In any case, the first exercise had enough Neo4j anyway.

### Adding Band Member’s start and end dates

The start and end dates for memberships in bands are sometimes provided; the purpose of this exercise is to use this information.

#### Pre-populate

I load the start and end dates into their own keys in Redis. The key formats are from:bandName:artistName and to:bandName:artistName.
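The actual scripts are in Node.js; as a sketch of the key scheme alone (the function name is hypothetical), the keys can be built like this in Python:

```python
def membership_keys(band, artist):
    # One key per date, namespaced by band and artist, following the
    # from:bandName:artistName / to:bandName:artistName layout above.
    return ("from:%s:%s" % (band, artist), "to:%s:%s" % (band, artist))
```

In the script itself, each key is only written to Redis when the corresponding column is non-empty.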

First I take the data from the relevant columns:

Then, if they’re not empty, I create the keys in Redis:

#### CouchDB

Adding the information to CouchDB is not hard; the main difficulty is to figure out how to modify the populate_couch.js script (continuation-passing style is hard).

Eventually, I just reused the roleBatch (therefore renamed artistInfoBatch) to retrieve the roles, from and to information.

Putting it in CouchDB is then trivial:

#### Neo4j

Neo4j was the hardest piece of the puzzle: I didn’t know, and could not find any definitive documentation on, how to set relationship properties at creation time. Eventually I found that adding them to the data attribute passed at creation time did the trick (although it still took me more time to understand how to use them).

The problem is that the neo4j_caching_client.js library does not support adding properties to relationships, but it was easy enough to modify this library to add this feature.

then the relevant properties can be passed to the function above in the graph_sync.js script:

#### Using the new data

To make use of the new data, I tried to differentiate between current and old members of a band. I simply define a current member as one whose to property is null.

Figuring out how to write a Gremlin query that extracted the information I needed was challenging: the documentation is often sparse, and many concepts are barely explained.

I found that I could collect nodes or relationships along a path by naming them (with the step as), and then gather all of them in a single row of a Table. I used this to get both the from and to properties and the artist name property in a single query. However, I spent some time tracking down a bug in my filters where, apparently, members with a null to were not returned as current members. I finally realised that when a given node or relationship is given two different names, these names appear in reverse order in the Table.

So in my case, the query:

I gave the names from and to to the relationship, but had to use them in reverse order in the Table closures. Is this the intended behaviour or a bug? Does anybody know?

It seems like a common problem with some NoSQL databases: the query language feels very much ad hoc, and not entirely sound or fully thought through. Despite its many defects, SQL was at least based (if sometimes remotely) on the relational calculus, which gave a precise meaning to queries. It was further specified in different standards, so that even its defects were fully clarified (XPath/XQuery is another pretty well specified query language). When playing with NoSQL databases that pretend to have a query language, I often find it difficult to go beyond the simpler examples, precisely because of this linguistic fuzziness.

But I solved it for this case, so now I have my Table. It is an object with two properties: columns is an array of column names, and data is an array of arrays (each one being a row). To convert them to an array of objects, I use the following code:
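The conversion itself is written in JavaScript; the transformation it performs amounts to the following (a Python sketch, not the original code):

```python
def table_to_objects(table):
    # table has "columns" (a list of names) and "data" (a list of rows);
    # pair each row's values with the column names to get one object per row.
    return [dict(zip(table["columns"], row)) for row in table["data"]]
```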

The rest of the code is just the nested Node.js event functions, and the formatting using mustache (which was pretty cool and easy to use).

#### Full Code

The book (in beta 5.0) suggested using Riak’s Luwak, but this component has recently been removed, and there seems to be no replacement at this time. So I went with MongoDB’s GridFS instead. This is a little more complex than a simple replacement of the client libraries: MongoDB does not have an HTTP ReST API for GridFS, so I need to stream the content of the file through the server.

#### Overview

To keep things simple, I load only one sample per band; the file name must be the same as the CouchDB key, followed by ‘.mp3’.

To access MongoDB from Node.js, I use node-mongodb-native, which can be installed with npm. It has all the expected features of a client, including GridFS support (with one caveat, see below).

To stream the file from the server, I use a dedicated port, for no better reason than that Brick.js, which the authors used to build the service, was giving me trouble, while the standard http module did not.

When displaying the band information, I check whether a file exists with the same name as the band’s key: if it does, I add a link to the dedicated streaming port, passing the key as parameter:

Then, I create a new http server to send the music:

The only problem I had (but it took me a while to figure it out) was that the stream support in the MongoDB client for GridFS content is (as far as I can tell) defective: it will close the stream after just one or two chunks’ worth of data (issue on GitHub).

So instead I have to load the whole file in memory then write it in the response… Clearly not the best approach, but hey, it works!

## Wrapping Up

Well, that was a long day. I should have enjoyed it, but the lack of maturity in some of the tools (Neo4j’s always evolving query language and the GridFS streaming bug) caused hours of frustration. The main cause, however, was missing knowledge: faced with an unexpected behaviour, I had no idea whether it was a bug (find a workaround) or an incorrect invocation (rework the query to correct it).

The exposition of polyglot persistence through the music information service was pretty good, given the space constraints. Of course it skipped the really ugly and tedious parts (how to incrementally keep the databases in sync when the main records are updated, not merely created); given the variation in data models, data manipulation (or lack thereof) and query languages between the different databases, this can easily become a nightmare (especially if incremental updates are not part of the initial design).

Another upcoming book, Big Data, takes a very different approach (no updates, only appends). I look forward to reading it.

]]>
<![CDATA[Concrete Mathematics Chapter 2 Basics]]> 2012-03-10T11:10:00+09:00 https://blog.wakatta.jp/blog/2012/03/10/concrete-mathematics-chapter-2-basics This second batch of exercises builds on the previous one. Once again, there are no complex manipulations, and very often the solution just follows from the definitions.

## Basics

### $\sum_{0\le k\lt n}(a_{k+1}-a_k)b_k$

To show that

\begin{aligned} \sum_{0\le k\lt n}(a_{k+1}-a_k)b_k & = a_n b_n - a_0 b_0 - \sum_{0 \le k \lt n} a_{k+1}(b_{k+1} - b_k)&&n\ge 0\\ \end{aligned}

I start by rewriting the sum in the right side of the equation:

\begin{aligned} \sum_{0 \le k \lt n} a_{k+1}(b_{k+1} - b_k) & = \sum_{0 \le k \lt n} (a_{k+1}b_{k+1} - a_{k+1} b_k)\\ & = \sum_{0 \le k \lt n} a_{k+1}b_{k+1} - \sum_{0 \le k \lt n} a_{k+1} b_k&&\text{associative law}\\ & = \sum_{0 \le k-1 \lt n} a_k b_k - \sum_{0 \le k \lt n} a_{k+1} b_k&&k\leftarrow k-1\\ & = \sum_{1 \le k \le n} a_k b_k - \sum_{0 \le k \lt n} a_{k+1} b_k\\ \end{aligned}

This latest value can now be put back into the right side of the original equation:

\begin{aligned} a_n b_n - a_0 b_0 - \sum_{1 \le k \le n} a_k b_k + \sum_{0 \le k \lt n} a_{k+1} b_k & = \sum_{0\le k \lt n} a_{k+1} b_k - (a_0 b_0 + \sum_{1 \le k \le n} a_k b_k - a_n b_n)\\ & = \sum_{0\le k \lt n} a_{k+1} b_k - \sum_{0\le k \lt n} a_k b_k\\ & = \sum_{0\le k \lt n} (a_{k+1} b_k - a_k b_k)\\ & = \sum_{0\le k \lt n} (a_{k+1} - a_k) b_k\\ \end{aligned}

which is indeed the left side of the equation (the next-to-last step is permitted under the associative law, but that didn’t fit in the margin).
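The identity holds for arbitrary sequences, so it is easy to confirm numerically:

```python
def check_sum_by_parts(a, b):
    # Verify sum_{0<=k<n} (a_{k+1}-a_k) b_k
    #      = a_n b_n - a_0 b_0 - sum_{0<=k<n} a_{k+1} (b_{k+1}-b_k)
    n = len(a) - 1  # sequences a_0..a_n, b_0..b_n
    lhs = sum((a[k + 1] - a[k]) * b[k] for k in range(n))
    rhs = a[n] * b[n] - a[0] * b[0] \
        - sum(a[k + 1] * (b[k + 1] - b[k]) for k in range(n))
    return lhs == rhs

assert check_sum_by_parts([3, 1, 4, 1, 5, 9], [2, 7, 1, 8, 2, 8])
```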

### $p(k) = k + (-1)^k c$

It is clear that there is a single $p(k)$ for every possible (integer) $k$. So I need to show that for every $m$, there is a single $k$ such that $p(k)=m$, defining $p^{-1}$.

The book’s method is smart, mine clearly less so but, as far as I can tell, still correct: for $m$, I consider $m-c$ and $m+c$. The difference is $2c$, so they’re either both even, or both odd.

If they’re both even, then $m-c+(-1)^{m-c}c=m$, so $k=m-c$. If they’re both odd, then $m+c+(-1)^{m+c}c=m$, so $k=m+c$. So $k$ is always well defined for every $m$, and $p$ is indeed a permutation.
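This argument is easy to check mechanically; a small sketch of $p$ and of the inverse just described:

```python
def p(k, c):
    sign = 1 if k % 2 == 0 else -1  # (-1)^k, avoiding float powers for negative k
    return k + sign * c

def p_inverse(m, c):
    # If m - c is even, k = m - c; otherwise m + c is odd too, and k = m + c.
    return m - c if (m - c) % 2 == 0 else m + c

c = 3
assert all(p(p_inverse(m, c), c) == m for m in range(-20, 21))
```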

### $\sum_{k=0}^n (-1)^k k^2$

While I found the closed formula for the sum, I could not do it with the repertoire method.

Solving the sum is not really difficult (although a little bit more than the repertoire method, if you know how to do the latter); one way is to solve the positive and negative sums separately (they can be broken down to already solved sums); another one is to compute the sum of an even number of terms (one positive and one negative), then to compute sums of odd number of terms (by adding a term to the previous solution), and finally combining both to find the closed formula.

In both attempts above, I tried to remove the $(-1)^k$ factor from the terms; when using the repertoire method I tried to do the same, which is why I failed.

The repertoire method relies on good intuition: one must have a sense of the general shape of the parametric functions. In retrospect, it seems obvious, but I just couldn’t see it, blinded as I was by $(-1)^k$.

Expressing the sum as a recurrence is easy:

\begin{aligned} R_0 & = 0\\ R_n & = R_{n-1} + (-1)^n n^2\\ \end{aligned}

Also, looking at the first few terms of the sum, $-1, 3, -6, 10, -15, \dots$, it is natural to consider solutions of the form $(-1)^n F(n)$; it is a little bit trickier to see where a good generalisation of the recurrence above should put the additional terms:

\begin{aligned} R_0 & = \alpha\\ R_n & = R_{n-1} + (-1)^n \left(\beta + \gamma n + \delta n^2 \right)\\ \end{aligned}

With such a form, plugging in solutions $(-1)^nF(n)$ will simplify to $F(n) = \beta + \gamma n + \delta n^2 - F(n-1)$.

At this stage, it becomes very easy to find the $A(n)$, $B(n)$, $C(n)$ and $D(n)$ functions (the latter being the solution we are looking for). In fact, if all you care about is $D(n)$, then it is enough to use $R_n = (-1)^n n$ and $R_n = (-1)^n n^2$:

#### $R_n = (-1)^n n$

\begin{aligned} R_0 & = 0&&\alpha = 0\\ n & = \beta + \gamma n + \delta n^2 - n + 1\\ 2n - 1 & = \beta + \gamma n&&\beta = -1, \gamma = 2\\ \end{aligned}

which gives $-B(n)+2C(n) = (-1)^n n$.

#### $R_n = (-1)^n n^2$

\begin{aligned} R_0 & = 0&&\alpha = 0\\ n^2 & = \beta + \gamma n + \delta n^2 - (n-1) ^2\\ 2 n^2 - 2n + 1 & = \beta + \gamma n + \delta n^2&&\beta = 1, \gamma = -2, \delta = 2\\ \end{aligned}

which gives $B(n)-2C(n)+2D(n) = (-1)^n n^2$. Combining with the previous answer, we have $2D(n) = (-1)^n (n^2+n)$, or $D(n) = (-1)^n \frac{n^2+n}{2}$.
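A quick numeric check of the closed form $D(n) = (-1)^n\frac{n^2+n}{2}$ against the direct sum:

```python
def D(n):
    # Closed form of sum_{k=0}^{n} (-1)^k k^2; n^2 + n is always even
    return (-1) ** n * (n * n + n) // 2

assert all(D(n) == sum((-1) ** k * k * k for k in range(n + 1))
           for n in range(100))
```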

#### Wrapping up this exercise

In hindsight, these steps could have helped me solve this exercise as intended:

• compute the first few terms to see if there is something obvious about their shape; in this case, the $(-1)^n$ factor
• at first, write the recurrence equations as simply as possible, with all the “inconvenient” parts; comparing them to the “shapes” identified in the previous step might give some insight about the general solutions, and possibly remove these difficult parts
• only then, consider how to generalise the recurrence equations. The base case is always $R_0 = \alpha$; the recurrent case should add parameters to each term, and additional terms (with their own parameters) to complete some basic classes of problems (for instance, if there is any polynomial, there should be a term for each power smaller than the largest power of the original problem; another basic class is the generalised radix-based Josephus problem)
• each class of problems can be solved independently; this makes it easier to find potential solutions and to combine them.

### $\sum_{k=1}^n k2^k$

Not overly complicated; at least the introduction of $j$ is not a mystery (unlike the next exercise).

\begin{aligned} \sum_{1\le k\le n}k 2^k & = \sum_{1\le k\le n} 2^k \sum_{1\le j\le k}1\\ & = \sum_{1\le k\le n} \sum_{1\le j\le k} 2^k\\ & = \sum_{1\le j\le k \le n} 2^k\\ & = \sum_{1\le j\le n} \sum_{j\le k\le n}2^k\\ \end{aligned}

The inner sum can be rewritten as

\begin{aligned} \sum_{j\le k\le n}2^k & = \sum_{1\le k\le n}2^k - \sum_{1\le k\lt j}2^k\\ & = 2^{n+1} - 2 - 2^j + 2\\ & = 2^{n+1} - 2^j\\ \end{aligned}

Here I use the already known sum $\sum 2^k$. Putting this last result in the original sum

\begin{aligned} \sum_{1\le j\le n} 2^{n+1} - 2^j & = n2^{n+1} - (2^{n+1} -2)\\ \end{aligned}
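A quick numeric check of this closed form:

```python
# sum_{k=1}^{n} k 2^k = n 2^{n+1} - (2^{n+1} - 2)
for n in range(1, 30):
    assert sum(k * 2 ** k for k in range(1, n + 1)) \
        == n * 2 ** (n + 1) - (2 ** (n + 1) - 2)
```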

### $\sum_{k=1}^n k^3$

It took me some time to convince myself that the original rewrite was legitimate; eventually I did it by induction (the book version is much shorter, and once you see it, much easier). Clearly it works for $n=1$, so assuming it does for $n-1$, we have

\begin{aligned} 2\sum_{1\le j\le k\le n} jk & = 2\sum_{1\le j\le k\le n-1} jk + 2\sum_{1\le j\le k=n} jk\\ & = \sum_{1\le k\lt n}(k^3+k^2) + 2n\sum_{1\le j\le n} j\\ & = \sum_{1\le k\lt n}(k^3+k^2) + n^2(n+1)\\ & = \sum_{1\le k\lt n}(k^3+k^2) + n^3+n^2\\ \end{aligned}

So the rewrite is correct. At this stage, (2.33) pretty much finishes it:

\begin{aligned} \sum_{1\le k\le n}(k^3+k^2) & = \left(\sum_{1\le k\le n}k\right)^2+\sum_{1\le k\le n}k^2\\ \end{aligned}

so $\sum_{1\le k\le n}k^3=\frac{n^2(n+1)^2}{4}$.
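Both the rewrite and the final closed form can be confirmed numerically:

```python
for n in range(1, 25):
    # The rewrite: 2 * sum_{1<=j<=k<=n} jk = sum_{1<=k<=n} (k^3 + k^2)
    assert 2 * sum(j * k for k in range(1, n + 1) for j in range(1, k + 1)) \
        == sum(k ** 3 + k ** 2 for k in range(1, n + 1))
    # The closed form: sum_{1<=k<=n} k^3 = n^2 (n+1)^2 / 4
    assert sum(k ** 3 for k in range(1, n + 1)) == n * n * (n + 1) ** 2 // 4
```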

### $\frac{x^{\underline m}}{(x-n)^{\underline m}} = \frac{x^{\underline n}}{(x-m)^{\underline n}}$

This follows directly from $\frac{a}{b} = \frac{c}{d} \implies ad = bc$, and the use of equation (2.52).

### Rising and Falling Factorial Powers Conversions

I’ll just do the conversion from rising factorial power to falling factorial power; the other conversion is just the same.

$x^{\overline m} = \frac{1}{(x-1)^{\underline{-m}}}$ follows from (2.51) and (2.52).

For the other equalities, by induction on $m$, and using (2.52) and its rising factorial powers equivalent:

\begin{aligned} x^{\underline m} & = x^{\underline{m-1}}(x-m+1)\\ & = x^{\underline 1}(x-1)^{\underline{m-1}}\\ & = x(x-1)^{\underline{m-1}}\\ x^{\overline m} & = x^{\overline{m-1}}(x+m-1)\\ & = x^{\overline 1}(x+1)^{\overline{m-1}}\\ & = x(x+1)^{\overline{m-1}}\\ \end{aligned}

#### Base case $m=0$

\begin{aligned} x^{\overline 0} & = 1\\ (-1)^0 (-x)^{\underline 0} & = 1\\ (x+0-1)^{\underline 0} & = 1\\ \end{aligned}

#### Other positive $m$

Assuming the relations hold for all $k, 0\le k\lt m$:

\begin{aligned} (-1)^m(-x)^{\underline m} & = -\left((-1)^{m-1}(-x)^{\underline{m-1}}(-x-m+1)\right)\\ & = (x^{\overline{m-1}})(x+m-1)\\ (x+m-1)^{\underline m} & = (x+m-1)^{\underline{m-1}}x\\ & = (x+1+(m-1)-1)^{\underline{m-1}}x\\ & = (x+1)^{\overline{m-1}}x\\ \end{aligned}

#### Negative $m$

Using the recurrence relations derived from (2.52) and its rising factorial power equivalent:

\begin{aligned} x^{\underline m} & = x^{\underline{(m+1)+(-1)}}\\ & = x^{\underline{-1}}(x+1)^{\underline{m+1}}\\ & = \frac{(x+1)^{\underline{m+1}}}{x+1}\\ & = x^{\underline{m+1}}(x-m-1)^{\underline{-1}}\\ & = \frac{x^{\underline{m+1}}}{x-m}\\ x^{\overline m} & = x^{\overline{(m+1)+(-1)}}\\ & = x^{\overline{-1}}(x-1)^{\overline{m+1}}\\ & = \frac{(x-1)^{\overline{m+1}}}{x-1}\\ & = x^{\overline{m+1}}(x+m+1)^{\overline{-1}}\\ & = \frac{x^{\overline{m+1}}}{x+m}\\ \end{aligned}

Assuming the relations hold for all $k, m\lt k\le 0$:

\begin{aligned} (-1)^m(-x)^{\underline m} & = -\frac{(-1)^{m+1}(-x)^{\underline{m+1}}}{-x-m}\\ & = \frac{x^{\overline{m+1}}}{x+m}\\ (x+m-1)^{\underline m} & = \frac{(x+m-1)^{\underline{m+1}}}{x+m-1-m}\\ & = \frac{(x-1)^{\overline{m+1}}}{x-1}\\ \end{aligned}

So the main difficulties are deriving two equalities from (2.52) (four if we count the negative cases as well), and identifying the recurrence equation in the induction step (especially for $(x+m-1)^{\underline{m\pm 1}}$).
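The conversions can also be confirmed numerically, for positive and negative $m$ alike; a sketch with exact rational arithmetic, evaluated at a non-integer $x$ so that no factor is ever zero:

```python
from fractions import Fraction

def falling(x, m):
    # x^{underline m}: x(x-1)...(x-m+1) for m >= 0; 1/((x+1)...(x-m)) for m < 0
    p = Fraction(1)
    if m >= 0:
        for i in range(m):
            p *= x - i
    else:
        for i in range(1, -m + 1):
            p /= x + i
    return p

def rising(x, m):
    # x^{overline m}: x(x+1)...(x+m-1) for m >= 0; 1/((x-1)...(x+m)) for m < 0
    p = Fraction(1)
    if m >= 0:
        for i in range(m):
            p *= x + i
    else:
        for i in range(1, -m + 1):
            p /= x - i
    return p

x = Fraction(7, 2)
for m in range(-4, 5):
    sign = 1 if m % 2 == 0 else -1  # (-1)^m
    assert rising(x, m) == sign * falling(-x, m) == falling(x + m - 1, m)
```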

### Absolute Convergence of Complex Sums

I suppose I could say it follows directly from the equivalence of the metric functions (if my memory of metric space terminology is correct).

More basically, the equivalence of the propositions follows from the relationships based on the hypotenuse formula: $\sqrt{(\Re z)^2+(\Im z)^2}\le |\Re z| + |\Im z|$, so the absolute convergence of the real and imaginary parts implies the absolute convergence of the absolute value. Conversely, $|\Re z|,|\Im z|\le\sqrt{(\Re z)^2+(\Im z)^2}$, so the absolute convergence of the absolute value also implies the absolute convergence of both the real and imaginary parts.

### Wrapping up

This time, I found a solution to all the exercises, which is progress of some sort. I still have trouble with the repertoire method, or perhaps not with the method itself but with identifying suitable generalisations and candidate solutions. This is something that can only be developed with practice, so I just have to be patient and keep trying (I hope I’ll get there eventually).

]]>
<![CDATA[Concrete Mathematics Chapter 2 Warmups]]> 2012-02-28T19:18:00+09:00 https://blog.wakatta.jp/blog/2012/02/28/concrete-mathematics-chapter-2-warmups This first batch of exercises is meant to develop familiarity with the various concepts and notations introduced in this chapter. There is no complex manipulation, but the trick is to be aware of the often unmentioned assumptions about the precise meaning of the expressions.

## Warmups

### $\sum_{k=4}^0 q_k$

The meaning of such an expression is not clear, so there is no real way to fail this exercise.

A first interpretation, maybe the common one, is that the sum is zero because the range is empty. In other words, the sum is $\sum_{4\le k\le 0} q_k$.

A second interpretation, perhaps natural for those used to programming languages with very flexible loops, is that the sum is $q_4 + q_3 + q_2 + q_1 + q_0$.

I toyed briefly with a negative sum, similar to integrals with reversed bounds, but I did not come up with the nice book solution of $\sum_{k=m}^n = \sum_{k\le n} - \sum_{k\lt m}$, which is consistent with and extends the first interpretation.

### Simplify $x([x\gt 0] - [x\lt 0])$

It is easy to see that the expression has the same value as $|x|$:

\begin{aligned} x([x\gt 0] - [x\lt 0]) & = x (1-0)&&\text{when } x\gt 0\\ & = x\\ x([x\gt 0] - [x\lt 0]) & = x (0-1)&&\text{when } x\lt 0\\ & = -x\\ x([x\gt 0] - [x\lt 0]) & = 0&&\text{when } x = 0\\ \end{aligned}
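A brute-force check over a range of integers, with the Iverson bracket spelled out:

```python
def iv(p):
    # Iverson bracket: 1 if the proposition holds, 0 otherwise
    return 1 if p else 0

assert all(x * (iv(x > 0) - iv(x < 0)) == abs(x) for x in range(-10, 11))
```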

### Writing out sums

The first one is easy:

\begin{aligned} \sum_{0\le k\le 5}a_k = a_0+a_1+a_2+a_3+a_4+a_5\\ \end{aligned}

The second one is tricky, in more than one way. One problem is that $k$ is not explicitly defined, and I had assumed it was a natural number, when the authors thought of it as an integer; the latter is in line with the book’s conventions, so I was wrong and had missing terms. The right answer is:

\begin{aligned} \sum_{0\le k^2 \le 5}a_{k^2} = a_4 + a_1 + a_0 + a_1 + a_4\\ \end{aligned}

### Triple Sum

Here it is important to restrict the bounds as much as possible (but no more); otherwise there is a risk of introducing spurious terms.

\begin{aligned} \sum_{1\le i \lt j \lt k \le n}a_{ijk} & = \sum_{i=1}^2 \sum_{j=i+1}^3 \sum_{k=j+1}^4 a_{ijk}\\ & = \left((a_{123} + a_{124}) + a_{134} \right) + a_{234}\\ & = \sum_{k=3}^4 \sum_{j=2}^{k-1} \sum_{i=1}^{j-1} a_{ijk}\\ & = a_{123}+\left(a_{124} + (a_{134} + a_{234})\right)\\ \end{aligned}

The terms appear in the same order, but are grouped in sums differently.

### Incorrect derivation

The problem is the step

\begin{aligned} \sum_{j=1}^n \sum_{k=1}^n \frac{a_j}{a_k} = \sum_{k=1}^n \sum_{k=1}^n \frac{a_k}{a_k}\\ \end{aligned}

$k$ is already bound in the inner sum, so it is invalid to replace $j$ by $k$ in the outer.

### $\sum_k [1\le j\le k\le n]$

This can be worked explicitly:

\begin{aligned} \sum_k [1 \le j \le k \le n] & = \sum_k [1 \le j \le n] [j \le k \le n]\\ & = \sum_{j\le k \le n} [1 \le j \le n]\\ & = [1 \le j \le n] \sum_{j\le k \le n} 1\\ & = [1 \le j \le n] (n-j+1)\\ \end{aligned}
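This can be checked by brute force, summing over a $k$ range wide enough to cover all non-zero terms:

```python
def iv(p):
    return 1 if p else 0  # Iverson bracket

n = 10
for j in range(-3, 15):
    direct = sum(iv(1 <= j <= k <= n) for k in range(-5, 20))
    assert direct == iv(1 <= j <= n) * (n - j + 1)
```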

### $\bigtriangledown f(x)$

The result is not surprising:

\begin{aligned} \bigtriangledown x^{\overline{m}} & = x^{\overline{m}} - (x-1)^{\overline{m}}\\ & = x(x+1)\cdots(x+m-1) - (x-1)x\cdots(x+m-2)\\ & = x(x+1)\cdots(x+m-2)(x+m-1-(x-1))\\ & = m x^{\overline{m-1}}\\ \end{aligned}

So $\bigtriangledown f(x)$ is the difference operator to use with rising factorials.
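A quick check of $\bigtriangledown x^{\overline m} = m\,x^{\overline{m-1}}$ for positive integer $m$:

```python
def rising(x, m):
    # x^{overline m} = x (x+1) ... (x+m-1), for m >= 0
    p = 1
    for i in range(m):
        p *= x + i
    return p

for x in range(-5, 6):
    for m in range(1, 6):
        assert rising(x, m) - rising(x - 1, m) == m * rising(x, m - 1)
```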

### $0^{\underline{m}}$

Clearly, when $m\gt 0$, $0^{\underline{m}} = 0$; when $m = 0$, $0^{\underline{m}} = 1$ (to make the expression $x^{\underline{1+0}}=x^{\underline 1}(x-1)^{\underline 0}$ work when $x=1$); I had forgotten about $m\lt 0$, which was perhaps the easiest case, as $\frac{1}{(-m)!}$ (it follows directly from the definition of falling factorials with negative powers).

### Law of exponents for rising factorials

It is easy to see that $x^{\overline{m+n}} = x^{\overline m}(x+m)^{\overline n}$:

\begin{aligned} x^{\overline{m+n}} & = x\cdots(x+m-1)(x+m)\cdots(x+m+n-1)\\ & = \left( x\cdots(x+m-1) \right) \left( (x+m)\cdots(x+m+n-1) \right)\\ & = x^{\overline m}(x+m)^{\overline n}\\ \end{aligned}

From there, the value of rising factorials for negative powers follows quickly:

\begin{aligned} 1 = x^{\overline{-n+n}} & = x^{\overline{-n}} (x-n)^{\overline{n}}\\ x^{\overline{-n}} & = \frac{1}{(x-n)^{\overline{n}}}\\ & = \frac{1}{(x-n)\cdots(x-1)}\\ & = \frac{1}{(x-1)^{\underline{n}}}\\ \end{aligned}
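The law of exponents holds for negative powers as well, which is easy to confirm with rational arithmetic (the non-integer $x$ keeps every factor away from zero):

```python
from fractions import Fraction

def rising(x, m):
    # x^{overline m}; for m < 0, x^{overline m} = 1/((x+m)...(x-1))
    p = Fraction(1)
    if m >= 0:
        for i in range(m):
            p *= x + i
    else:
        for i in range(1, -m + 1):
            p /= x - i
    return p

x = Fraction(7, 2)
for m in range(-3, 4):
    for n in range(-3, 4):
        # x^{overline{m+n}} = x^{overline m} (x+m)^{overline n}
        assert rising(x, m + n) == rising(x, m) * rising(x + m, n)
```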

### Symmetric difference of a product

To start, I quickly looked up the proof of the original derivative product rule on Wikipedia; the geometric nature of the proof was illuminating (I believe I was taught the so-called Brief Proof both in high school and at university).

This geometric proof can be used for both the infinite and the finite calculus, and its symmetric nature (there are two ways to compute the area of the big rectangle: $f(x)g(x)+(f(w)-f(x))g(w) + f(x)(g(w)-g(x))$ and $f(x)g(x)+f(w)(g(w)-g(x)) + (f(w)-f(x))g(x)$) can be used in the finite case. The symmetry (and equality) is restored because in the infinite calculus, $\lim_{w\rightarrow x}f(w) = f(x)$ and $\lim_{w\rightarrow x}g(w) = g(x)$, a restoration that is not possible in the finite calculus.

However, the equivalent finite calculus formulas, $\bigtriangleup(uv) = u\bigtriangleup v + Ev\bigtriangleup u$ and $\bigtriangleup(uv) = Eu\bigtriangleup v + v\bigtriangleup u$, have together the symmetry they lack on their own.

### Wrapping up

OK, that was not entirely bad (two small mistakes, both caused by blindness to negative numbers). Next step, the basic exercises.

]]>
<![CDATA[Concrete Mathematics Chapter 2 Notes]]> 2012-02-27T10:54:00+09:00 https://blog.wakatta.jp/blog/2012/02/27/concrete-mathematics-chapter-2-notes After a long but busy silence, I have now a few notes on the second chapter, Sums. As with Chapter 1, these are nothing revolutionary; just some clarifications of the points that were not obvious to me, as well as other, random observations.

Overall, this chapter felt less overwhelming than the first, despite being much longer and introducing very powerful techniques. I have yet to do the exercises, though, so I may still revise this judgement.

### Notation

The authors mention that the Sigma-notation is “… impressive to family and friends”. I can confirm that assessment.

The remark on keeping bounds simple actually goes beyond resisting “premature optimisation”, that is, removing terms just because they are equal to zero. Sometimes, it is worth adding a zero term if it simplifies the bounds. Such a trick is used in solving $\sum_{1\le j\lt k\le n} \frac{1}{k-j}$, and I’ll get back to this point when I go over this solution.

The Iverson notation (or Iversonian) is a very useful tool, as is the general Sigma-notation. About the latter, it already simplifies variable changes a lot, but I found it useful (and less error prone) to always write the variable change in the right margin (for instance as $k \leftarrow k+1$) and to keep that change as the only one in a given line of the rewrite; otherwise, no matter how trivial the change, any error I make at that time will be hard to locate (I know; I tried).

### Sums and Recurrence

First we see how easy it is to use the repertoire method to build solutions to common (or slightly generalised) sums. The only problem with the repertoire method is that it requires a well-furnished repertoire of solutions to basic recurrences; I’m sure I would never have come up with the radix-change solution to the generalised Josephus problem. And given that there is an infinite number of functions one could try, a more directed method is sometimes necessary.

This section also shows how to turn some recurrence equations (such as the Tower of Hanoi one) into a sum; this method involves a choice ($s_1$ can be any non-zero value), which could either simplify or complicate the solution. I haven’t done the exercises yet, so I don’t know to what extent the choice is obvious or tricky.

Finally it shows how to turn a recurrence expressed as a sum of all the previous values into a simpler recurrence by computing the difference between two successive values. This is one instance of a more general simplification using a linear combination of a few successive values.

### Manipulation of Sums

Unsurprisingly, sums have the same basic properties as common additions: the distributive, associative and commutative laws. Only the latter is really tricky, as it involves a change to the index variable. As mentioned above, I found it useful to make such changes really clear and isolated in any reasoning.

With these laws confirmed, it is possible to build the first method for solving sums: the perturbation method. It is very simple, and while it does not always work, when it does it is very quick.

### Multiple Sums

This is perhaps the first section where I had to slow down; basically multiple sums are not different from simple sums, and manipulations are defined by the distributive law, but index variable changes (especially the rocky road variety) require special attention. This, combined with “obvious” simplifications (obvious to the authors, and sometimes in retrospect to the reader as well), gave me some difficulties.

For instance, consider the solution to

\begin{aligned} \sum_{1\le j\lt k\le n} \frac{1}{k-j} \end{aligned}

The index variable change $k \leftarrow k+j$ is explained as a specific instance of the simplification of $k+f(j)$; more perplexing are the ranges for $j$ and $k$ when the sum is replaced by a sum of sums:

\begin{aligned} \sum_{1\le k\le n} \sum_{1\le j \le n-k} \frac{1}{k} \end{aligned}

The range for $j$ is built from $1\le j$ and $k+j\le n$, so there is nothing really strange here.

The range for $k$, however, looks like a typo: certainly the authors meant $1\le k\lt n$. A margin graffiti confirms the range, but it does not really explain it.

The fact is, it is safe to let $k\le n$ here, because the sum over $j$ is zero when $k=n$: not only is the expression $\sum_{1\le j \le n-k = 0} \frac{1}{k}$ zero because no $j$ can satisfy the range predicate, but the closed form of this sum, $\frac{n-k}{k}$, is also zero when $k=n$.

With the closed form checked, it is safe to add extra terms to simplify the range of $k$.

What happens if you don’t see this possible simplification? As expected, the answer remains the same:

\begin{aligned} \sum_{1\le k\lt n} \sum_{1\le j \le n-k} \frac{1}{k} & = \sum_{1\le k\lt n} \frac{n-k}{k}\\ & = \sum_{1\le k\lt n} \frac{n}{k} - \sum_{1\le k\lt n} \frac{k}{k}\\ & = \sum_{1\le k\lt n} \frac{n}{k} - (n-1)\\ & = \sum_{1\le k\lt n} \frac{n}{k} + \frac{n}{n} - n\\ & = \sum_{1\le k\le n} \frac{n}{k} - n\\ & = nH_n - n\\ \end{aligned}

So to expand on the original advice of keeping the bounds as simple as possible: sometimes it is possible to extend the bounds (in order to simplify them), as long as the extra terms evaluate to zero in closed form. If the extra terms are still defined as sums, just checking that the range is empty might not be enough.

### General Methods

A cool and fun section on the various ways to solve a given sum.

Method 0 is to look it up. This book, written before the rise of the Internet (I remember the Internet in the early 1990s; most of it was still indexed manually, on the CERN index pages…), suggests a few books as resources.

Fortunately, some of them have migrated to the Web, which is a more suitable medium than books for such knowledge; the combination of search and instant updates is hard to beat (a book remains best for content that is mostly linear and somewhat independent of time, such as a novel or a textbook; references are better on the Internet, free if possible, by subscription otherwise).

Method 1 is guessing then proving; in fact, proving should complement all the other methods (except perhaps Method 0). Having two independent proofs is always good.

Method 2 is the perturbation method. In this section’s example, we see how an apparent failure can still be exploited by being imaginative.

Method 3 is the repertoire method. In this chapter it is usually much simpler than in the first.

Method 4 uses calculus to get a first approximation, then uses other methods to solve the equations for the error function.

Method 5 is a clever rewriting of the problem into a sum of sums; like the repertoire method but unlike the others, it requires some intuition to find a solution (perhaps more than the repertoire method); I have bad memories of trying such a method to solve problems at university, always somehow ending up right where I started. I guess I will try other methods if I can.

Method 6 is the topic of the next section; method 7 is for another chapter.

### Finite and Infinite Calculus

This section was surprising and exciting, but not really that complex. It really is a matter of adapting regular calculus reflexes to the finite version. I have to see how it works in practice.

One thing that is causing me some trouble is the falling-power version of the law of exponents:

\begin{aligned} x^{\underline{m+n}} & = x^{\underline m}(x-m)^{\underline n}\\ \end{aligned}
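For instance, with $m=2$ and $n=1$ (a check of my own):

\begin{aligned} x^{\underline 3} & = x(x-1)(x-2)\\ & = x^{\underline 2}\,(x-2)^{\underline 1}\\ \end{aligned}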

While the rule is easy to prove and to remember, it is less easy than the general one to recognise in practice; I failed to see it when it came up in the solution to

\begin{aligned} \Sigma xH_x\delta x\\ \end{aligned}

Worse, even with the explanation in the book, I had to write it down and play with it before seeing it.

So I’m thinking about a notation that would bring out the rule more clearly, an extension of the shift operator $E$:

\begin{aligned} E_k f(x) & = f(x-k)\\ \end{aligned}

This would turn the exponent law into

\begin{aligned} x^{\underline{m+n}} & = x^{\underline m} E_m x^{\underline n}\\ \end{aligned}

Whether this is useful, or whether I’ll get used to the original notation anyway, we’ll see in the exercises…

### Infinite Sums

The last section is about infinite sums. The authors quite sensibly restrict the scope to absolutely convergent sums, which have the advantage that the three basic laws and the manipulations they allow are still valid.

Once again, this was not overly difficult; the only point I had trouble understanding was the existence of the subsets $F_j$ such that $\sum_{k\in F_j} a_{j,k} \gt (A/A')A_j$ when $\sum_{j\in G} A_j = A' \gt A$. But this last equation means that $A/A' \lt 1$, so $(A/A')A_j \lt A_j$. The first equation is therefore just a consequence of the fact that $A_j$ is a least upper bound.

Next post, the warmups.

]]>
<![CDATA[Psychic Modeling (Fast Version)]]> 2012-02-13T20:23:00+09:00 https://blog.wakatta.jp/blog/2012/02/13/psychic-modeling-fast-version In Psychic Modeling, I described a reasonably understandable implementation of a ticket generator for the Psychic Modeling Problem. While this version is not overly slow, it is not amazingly fast either.

As I’m refreshing my C skills, I thought it would be interesting to try and implement a version as fast as possible.

### Design

I represent a subset as a bit pattern in a 32-bit integer. This means I am limited to 32 different values (in other words, $n$ must be no larger than 32). The upside is that I have extremely fast intersection (&) and union (|) operations, among others.

### Memory Management

I use a work memory allocated at the beginning of the search; additional memory is allocated on the stack (using C99 features), and the selected tickets are just printed to avoid having to remember them.

The work memory is large enough to store $1 + \beta$ copies of a block that can hold the complete set of $j$-subsets. The first block keeps the remaining $j$-subsets, and there is an extra block for each random ticket: each time a random ticket is generated (for a total of $\beta$), the $j$-subsets it does not yet cover are computed into its block; after I have generated $\beta$ tickets, I copy the work block of the best one over the first one.

I could have used just 3 blocks (a reference, the best so far, and one for the current random ticket), and copied from the current to the best each time the current ticket is better. There would be more copy operations, but perhaps less traffic between the cache and memory. The current design requires less than 2 MB, and only one copy operation per random ticket.

### Non portable features

I am using a few GCC built-in bit-level operations (number of bits, index of least significant 1 bit, and count of trailing zeroes); Bit Twiddling Hacks and Hacker’s Delight have portable alternatives.

I also use /dev/random as a source of random numbers; replacing dev_random by random would restore portability (but the output would then always be the same, as the random state is reset each time the program starts).

### Performance

So, is it fast?

The program found 71 tickets covering all 7-subsets with at least 6 numbers in less than a second. Even when the conditions are not that good, it remains fast:

Here it generated 1077 tickets using the smaller ticket size from the Younas and Skiena paper; the paper had a 1080-ticket solution, so my version is effective.

Of course, it would be useless and unfair to compare the speed of this version against the numbers from the paper; more relevant is the difference with the Haskell version: while the latter was not meant to be fast, it is hundreds of times slower. I suppose it would be interesting to try and make it faster, but I suspect it would be just as ugly or uglier than the C version. And I like to keep using Haskell as a design and exploratory tool.

### Overview of the code

#### solve

The main function, solve, is more complex than in the Haskell version. It allocates the work memory, and fills it with init. A first ticket is used in init to filter out $j$-subsets.

Then the loop for the other tickets starts. It of course stops when there are no remaining $j$-subsets.

The subset of remaining numbers is computed with funion (fold union), and the digits array prepared to be used in sample. It consists of the individual bits of the number representing the remaining numbers subset. It is computed by repeatedly isolating the rightmost 1 bit (with d & -d), then clearing this bit (with d &= d -1).

A first ticket is randomly generated and its uncovered set computed. It is also set as the best new ticket (and indeed is the best so far). Then for the remaining $\beta-1$ new tickets, the uncovered set is computed as well, and if the new set is smaller than the best’s, the new ticket becomes the best as well.

The best ticket is printed, the main work memory is updated with the best uncovered set, and if there are any remaining $j$-subsets to find, we loop.

#### init

init’s purpose is to avoid wasting a loop over the $j$-subsets by merging the generation of $j$-subsets with the coverage of a first permutation (defined as [1..k] in solve). The returned value is the size of the not-yet-covered set of $j$-subsets.

If all tickets had to be generated randomly, 0 could be passed instead of a ticket to keep all $j$-subsets.

#### check_cover

check_cover has a similar design to init, but reads the $j$-subsets from the work memory instead of generating them.

#### sample

sample is very similar to the Haskell version (indeed they are both based on the same algorithm); here the digits array plays the role that ds played in the Haskell version.

#### next_perm

The next_perm function is taken from Bit Twiddling Hacks, where it is also explained.

#### Compiling and running

Using gcc, the necessary option is -std=c99 to activate C99 support; -O3 gives much better performance (really), while -Wall is in general a good idea:

To run it, just pass the $n$, $k$, $j$ and $l$ parameters on the command line. There are no checks, so avoid mistakes. The program outputs the generated tickets:

### Wrapping up

After I completed the Haskell version, I found it not overly difficult to implement the C one. I was lucky to have discovered Bit Twiddling Hacks the week before; the code fragments there were very helpful in writing efficient set oriented functions over words.

Surprisingly, I had just one bug to track down (I was using a variable both as a parameter and as temporary storage in one of the functions); that was lucky, as I’m not sure I could have debugged such code.

### Complete Code

]]>
<![CDATA[Psychic Modeling]]> 2012-02-10T12:26:00+09:00 https://blog.wakatta.jp/blog/2012/02/10/psychic-modeling In the Algorithm Design Manual, Steven Skiena entertains, edifies and educates his readers with so-called “War Stories”, that is, interesting implementation challenges from his own experience.

The first War Story is Psychic Modeling, an attempt to exploit “precognition” to improve the chances of winning the lottery.

This war story is also the subject of one of the first implementation projects, in Chapter 1. A few years ago, when I bought the book, I had easily solved the previous exercises, but then I reached this implementation project, and I got stuck. I could not even sketch at a high level what a solution would look like.

Certainly, if I was unable to solve an exercise from the first chapter of this book, the rest was hopelessly beyond my reach…

Still, I had the ambition of one day resuming my reading, and I would from time to time give this problem another attempt.

Recently, it felt like all the pieces finally fell into place, and after a few hours of coding I had a (naive) implementation. Yet I still have doubts, as the only reference I have to compare my solution with, Skiena’s own paper (Randomized Algorithms for Identifying Minimal Lottery Ticket Sets), apparently is worse (in terms of necessary tickets) than my solution…

Note on this paper: unfortunately it is in Word format, and I found that some characters are not properly displayed on non MS Word text processing tools (such as Open Office). So you might have to open it with MS Word or MS Word Viewer.

### The problem

I will use the notation from the book rather than the paper. The problem is defined as this:

• a lottery ticket has $k$ numbers
• a win requires $l$ numbers from the winning ticket
• the psychic visualises $n$ numbers
• of which $j$ are “guaranteed” to be on the winning ticket.

### Defining “sufficient coverage”

A first difference between the paper’s approach and mine is that I’m using the notion of coverage size rather than distance: I measure how similar two subsets are by defining their cover as the size of their intersection; in their paper the authors use a notion of distance defined as the size of the difference of the two subsets (perhaps to help with the design of heuristics in the backtracking version of their algorithm).

Now, clearly the two approaches are equivalent; it is less clear that the formulas derived from either are indeed the same.

For a given $j$-subset, how many $j$-subsets have a coverage of at least $l$ with the first one? The covered $j$-subsets must have at least $l$ numbers (between $l$ and $j$, to be precise) in common with the first one, and the rest taken from the $n-j$ other numbers. This gives

\begin{aligned} \sum_{l \le i \le j} \binom{j}{i} \binom{n-j}{j-i} \end{aligned}

For a given $j$-subset, how many $j$-subsets are within $j-l$ distance of the first one? We can choose at most $j-l$ numbers out of the other $n-j$ numbers, and complete with numbers from the first subset. This gives

\begin{aligned} \sum_{0 \le i \le j-l} \binom{n-j}{i} \binom{j}{j-i} = \sum_{0 \le i \le j-l} \binom{n-j}{i} \binom{j}{i} \end{aligned}

It took me a while to confirm it, but the formulas are indeed the same:

\begin{aligned} \sum_{0 \le i \le j-l} \binom{n-j}{i} \binom{j}{i} & = \sum_{0 \le i \le j-l} \binom{n-j}{i} \binom{j}{j-i}\\ & = \sum_{l-j \le i \le 0} \binom{n-j}{-i} \binom{j}{j+i}&&\text{changing the sign of i}\\ & = \sum_{l \le j+i \le j} \binom{n-j}{-i} \binom{j}{j+i}\\ & = \sum_{l \le i \le j} \binom{n-j}{j-i} \binom{j}{i}&&\text{replacing j+i by i}\\ \end{aligned}

### Size of a ticket

Note that I do not use the $k$ size of a ticket. In fact, in my original design, I used it but ignored $j$; reading the paper I realised that $j$ was indeed critical: one of the $j$-subsets will be on the winning ticket, so they are the ones we need to cover. However, I could not understand why the paper did not use the potentially larger size of a ticket to cover more $j$-subsets.

Restated with a complete ticket, the coverage formula becomes

\begin{aligned} \sum_{l \le i \le j} \binom{k}{i} \binom{n-k}{j-i} \end{aligned}

This apparently small change actually reduces the lower bound on the number of necessary tickets significantly. For $n=15$, $k=6$, $j=5$, $l=4$, for instance, while the paper offers a lower bound of $58$, the formula above gives $22$.

So the question is: is it valid to use the possibly larger value $k$ when generating tickets? I could not think of any reason not to, and if I’m right, this gives each ticket a much larger cover, and therefore a lower number of necessary tickets.

## Implementation

For a first effort, I chose to code in Haskell, and favoured simplicity over speed. The code is indeed both simple, and wasteful, but Moore’s Law says that computers have become about 1000 times faster since the time the paper was written, so I have some margin.

To keep things simple, sets and subsets are just lists.

### Support functions

Such functions ought to belong to a dedicated library (and perhaps they do); I include them to keep the implementation mostly self-contained.

fact is just the factorial; combi computes the binomial coefficient, and remainingNumbers is just the union of all the passed $j$-subsets.

genCombi k s generates the $k$-subsets of $s$.

### Lower Bound Estimate

These are simple implementations of the formula above.

ticketCover just implements the coverage estimate I defined above (the one that uses $k$); lowerBound computes the lower bound for a single win.

### Coverage

As stated above, I define the cover between two subsets as the size of their intersection, and define sufficient coverage as the cover being larger than $l$.

cover implements the cover definition; coveredP and notCoveredP are predicates that check for (or against) sufficient coverage.

notCovered and notCoveredBatch computes the subsets that are not covered by a single ticket or a set of tickets, respectively; they are used to compute what is left to cover after selecting a ticket, and to check solutions.

Finally coverageScore computes the size of the set of subsets covered by a ticket. This function is used to compare potential tickets and select the one with the best (i.e. largest) coverage.

checkFormula computes the size of the coverage of a single ticket; it can be used to confirm the value of ticketCover above (and as far as I can tell from my checks, it does).

### Solution Loop

The solution loop takes the parameters and a ticket candidate generating function; it then gets one ticket at a time, computes the $j$-subsets not covered yet, and repeat until the remaining $j$-subsets set becomes empty.

The solve function expects the candidate generation function to be a monad; this is to make it possible to use random number generators.

### Naive Ticket Selection

I do not really know how to navigate subsets, so I won’t try to implement a backtracking solution as described in the paper. Instead, I have what is really the simplest greedy algorithm: when a new ticket is needed, get the one that has the best coverage among all the possible tickets:

So for each $j$-subsets set, generate all the $k$-subsets, and compare their coverage.

Needless to say, this function does not return anything anytime soon for even slightly large values of $n$.

### Randomised Ticket Selection

To improve the performance (well, to get a result in my lifetime), I am using what I understand to be the same approach as in the paper: generate $\beta$ tickets, compare their coverage of the remaining subsets, and keep the best one.

The difference with the paper, as mentioned before, is that my tickets are $k$-subsets rather than $j$-subsets themselves.

I first need a function to generate a random combination. I’m using a method derived from Knuth (no reference as I don’t have Volume 4 just yet).

The generating function is very similar to the naive one

The only difference is the tickets candidate set: the naive function generates them all; the randomised one selects $\beta$ randomly.

### Compatibility with the paper version

By using solve n j j l instead of solve n k j l, my implementation should compute subset coverage the same way the paper’s implementation does.

### Testing and Results

I will not compare speed, as this would be meaningless. But I can check whether different values for ticket size can indeed help reduce the size of the covering set.

Let’s start with a very simple problem, where $n=5$, $k=3$, $j=3$ and $l=2$.

I don’t really need to generate the $j$-subsets, but if I do I can check the solution.

The solution itself is computed by passing a ticket generating function; I could have used getCandidate, but here I’m passing getCandidateRandom with a $\beta=100$.

The notCovered set is empty, so the solution is at least a covering one.

The solution has two tickets, and the lower bound confirms it is pretty good.

Next test, with $n=15$, $k=5$, $j=5$ and $l=4$. The paper reports that they found a solution with $137$ tickets. As $k=j$, my algorithm cannot really beat that (and indeed finds a solution of the same size, if I try a couple of times):

For the next test, I should have a better solution than the paper, as $k$ is larger than $j$: $n=15$, $k=6$, $j=5$, $l=4$.

The paper has a lower bound of $58$, and a solution of size $138$, but my lower bound is $22$, and my solution has size $57$.

When the difference between $k$ and $j$ becomes large, the solution improves significantly: with $n=18$, $k=10$, $j=7$, $l=6$, the paper has a lower bound of $408$, mine is $18$. The paper’s solution has size $1080$, but mine is just $73$.

### Wrapping up

Even if my approach is ultimately wrong, I can say I must be close to an actual solution. I could (and probably will, given time) try to rewrite my solution in C, and focus on performance.

So I declare this problem conquered, I will resume my reading.

### Complete code

]]>
<![CDATA[Concrete Mathematics Chapter 1 Exam Problems]]> 2012-02-05T12:27:00+09:00 https://blog.wakatta.jp/blog/2012/02/05/concrete-mathematics-chapter-1-exam-problems It took me longer than I thought, and the outcome is slightly disappointing: I failed to solve two of the problems, and I solved the remaining ones way too slowly, so under real exam conditions I probably would have solved just one or two…

## Exam Problems

### 4 Pegs Tower of Hanoi

First, it helps to see that the indices of the recurrence are actually $S_n$:

\begin{aligned} W_{n(n+1)/2}&= W_{S_n}\\ W_{n(n-1)/2}&= W_{S_{n-1}} \end{aligned}

And of course, $S_n = S_{n-1} + n$.

Setting $m=S_{n-1}$, we try to show:

\begin{aligned} W_{m+n} & \le 2W_{m} + T_n\\ \end{aligned}

Now, obviously, if we have $m+n$ discs, we can move the $m$ top ones from $A$ to $C$ using $B$ and $D$ as transfer pegs, then move the bottom $n$ ones from $A$ to $B$ using $D$ as transfer peg, and finally move the top $m$ ones from $C$ to $B$.

The first step takes $W_m$ moves, the second one is the classic Tower of Hanoi problem (as we can no longer use peg $C$, we only have three pegs), so it takes $T_n$ moves, and the last step takes $W_m$ moves again.

This is only one possible solution; the optimal one must be equal or better, so we have

\begin{aligned} W_{m+n} & \le 2W_m + T_n\\ \end{aligned}

This is true for any $m+n$ discs, and in particular for $S_n = S_{n-1} + n$ ones.

### Specific Zigs

I could not solve this problem. I had found that the half-lines did intersect, but then I failed to show that their intersections were all distinct.

Even with the solution from the book, it took me a while before I finally had a complete understanding.

One problem I had was that lines in a graph are basic college level mathematics, but college was a long, long time ago. I pretty much had to work from first principles.

Following the book in writing the positions as $(x_j, 0)$ and $(x_j - a_j, 1)$, I need to find $\alpha$ and $\beta$ such that $y=\alpha x + \beta$ is true for both points above.

\begin{aligned} 0 & = \alpha x_j + \beta \\ \beta & = - \alpha x_j\\ 1 & = \alpha (x_j - a_j) - \alpha x_j\\ & = \alpha x_j - \alpha a_j - \alpha x_j\\ & = - \alpha a_j\\ \alpha & = \frac{-1}{a_j}\\ y & = \frac{x_j - x}{a_j}\\ \end{aligned}

With this given, I can try to find the intersection of lines from different zigs, $j$ and $k$:

\begin{aligned} \frac{x_j - x}{a_j} & = \frac{x_k - x}{a_k}\\ a_k (x_j - x) & = a_j (x_k - x)\\ a_k x_j - a_k x & = a_j x_k - a_j x\\ a_k x_j - a_j x_k & = (a_k - a_j) x\\ \end{aligned}

Now, still following the book, I replace $x$ by $t$ with $x=x_j - t a_j$:

\begin{aligned} a_k x_j - a_j x_k & = (a_k - a_j) (x_j - t a_j)\\ a_k x_j - a_j x_k & = a_k x_j - a_j x_j - t a_j a_k + t a_j^2\\ - a_j x_k & = t a_j^ 2 - a_j x_j - t a_j a_k\\ - x_k & = t a_j - x_j -t a_k&&\text{dividing by } a_j\\ x_j - x_k & = t (a_j - a_k)\\ t & = \frac{x_j - x_k}{a_j - a_k}\\ \end{aligned}

Somehow, I have a faint memory of such a result; I need to check a college math book.

To complete, I need to show that $y = t$:

\begin{aligned} y & = \frac{x_j - x}{a_j}\\ & = \frac{x_j - x_j + t a_j}{a_j}\\ & = \frac{t a_j}{a_j}\\ & = t\\ \end{aligned}

So the intersection of any two half-lines from different zigs is $(x_j - t a_j, t)$. Note that $t$ has the same value whether $j \gt k$ or $k \gt j$. To simplify further computations, I set $j \gt k$.

There are two remaining steps: show that $t$ is different for different pairs of $j$, $k$ (with $j \ne k$); and then show that the four intersections for a pair $j$, $k$ are also distinct.

$a_j$ can be of two forms: $n^j$ and $n^j + n^{-n}$. So $a_j - a_k$ can be one of

\begin{aligned} & n^j - n^k\\ & n^j + n^{-n} - n^k\\ & n^j - n^k - n^{-n}\\ n^j + n^{-n} - n^k - n^{-n} = & n^j - n^k\\ \end{aligned}

So there are three different forms for $a_j - a_k$, which I will simply write $n^j - n^k + \epsilon$ where $|\epsilon| \lt 1$.

\begin{aligned} t & = \frac{n^{2j} - n^{2k}}{n^j - n^k + \epsilon}\\ & = \frac{(n^j - n^k)(n^j + n^k)}{n^j - n^k + \epsilon}\\ \end{aligned}

Let’s show that $n^j+n^k - 1 \lt t \lt n^j+n^k + 1$: multiply the whole inequality by $n^j - n^k + \epsilon$. As

\begin{aligned} n^j - n^k & \ge n\\ & \ge 2\\ & \gt |\epsilon|\\ \end{aligned}

so $n^j - n^k + \epsilon \gt 0$. Defining

\begin{aligned} N_{jk} & = n^j + n^k\\ N'_{jk} & = n^j - n^k\\ \end{aligned}

the left and right inequalities become

\begin{aligned} (N_{jk} - 1) (N'_{jk} + \epsilon) & = N_{jk}N'_{jk} - N'_{jk} + \epsilon N_{jk} - \epsilon\\ (N_{jk} + 1) (N'_{jk} + \epsilon) & = N_{jk}N'_{jk} + N'_{jk} + \epsilon N_{jk} + \epsilon\\ \end{aligned}

Subtracting $N_{jk}N'_{jk} = (n^j-n^k)(n^j+n^k)$ from the original inequality:

\begin{aligned} -N'_{jk}+\epsilon N_{jk} - \epsilon \lt 0 \lt N'_{jk} + \epsilon N_{jk} + \epsilon\\ \end{aligned}

I need to prove the following inequality

\begin{aligned} (n^j - n^k) & \gt |\epsilon| + |\epsilon| (n^j - n^k)\\ \end{aligned}

We already know $|\epsilon| \lt 1$, so looking at the second term (and assuming $\epsilon \ne 0$, as this case is trivial)

\begin{aligned} |\epsilon| (n^j-n^k) & = n^{-n} (n^j - n^k)\\ & = n^{j-n} - n^{k-n}\\ &\lt 1\\ \end{aligned}

and we have

\begin{aligned} n^j - n^k \ge 2 \gt |\epsilon| + |\epsilon| (n^j - n^k)\\ \end{aligned}

So the inequalities are established. $N_{jk}$ can be seen as a number in base $n$ whose digits are all zeroes except the $j^{th}$ and $k^{th}$ ones, so $N_{jk} = N_{j'k'} \implies j=j', k=k'$, and therefore $t$ uniquely determines $j$ and $k$; in other words, two different pairs of zigs must have different $t$.

I still need to show that for a given pair, when $t$ is the same, the intersections are different. There are only three different values of $t$ for the four pairs of half-lines, so two intersection points have the same height. This happens for

\begin{aligned} t & = \frac{n^{2j} - n^{2k}}{n^j - n^k}\\ \end{aligned}

which happens when $a_j = n^j$, $a_k = n^k$ and $a_j = n^j + n^{-n}$, $a_k = n^k + n^{-n}$. But the $x = x_j - t a_j$ value for intersections is different: $t n^j$ and $t (n^j + n^{-n})$, so there are indeed four distinct intersection points.

### 30 degrees Zigs

I could not solve this problem. Once again, my lack of intuition with geometry was to blame.

But if we have two zigs with half-lines angles $\phi$, $\phi + 30^{\circ}$ and $\theta$, $\theta + 30^{\circ}$, then for any two pairs of half-lines from the two zigs to intersect, their angles must be between $0^{\circ}$ and $180^{\circ}$. Taken together, these constraints give $30^{\circ} \lt |\phi - \theta| \lt 150^{\circ}$.

Update: The original version of this post had a lower bound of $0$. Thanks to Tailshot for pointing out the error

This means there cannot be more than $5$ such pairs (and to be honest, I would have said 4, but the book says it’s indeed 5).

### Recurrence Equations

Using the repertoire method, solve the recurrence equations

\begin{aligned} h(1) & = \alpha\\ h(2n+j) & = 4h(n) + \gamma_j n + \beta_j\\ \end{aligned}

The general form of $h(n)$ is

\begin{aligned} h(n) & = \alpha A(n) + \beta_0 B_0(n) + \beta_1 B_1(n) + \gamma_0 C_0(n) + \gamma_1 C_1(n)\\ \end{aligned}

We get three of these functions directly by solving

\begin{aligned} h(1) & = \alpha\\ h(2n+j) & = 4h(n) + \beta_j\\ h((1\,b_{m-1}\cdots b_0)_2) & = (\alpha\,\beta_{b_{m-1}}\cdots\beta_{b_0})_4\\ \end{aligned}

So we have a solution for $A(n)$, $B_0(n)$ and $B_1(n)$.

Setting $h(n) = n$

\begin{aligned} \alpha & = 1\\ 2n+j & = 4n + \gamma_j n + \beta_j\\ \beta_j & = j\\ \gamma_j & = -2\\ \end{aligned}

which gives the equation $n = A(n) + B_1(n) -2(C_0(n) + C_1(n))$.

Setting $h(n) = n^2$

\begin{aligned} \alpha & = 1\\ 4n^2 + 4jn + j^2 & = 4n^2 + \gamma_j n + \beta_j\\ \beta_j & = j^2 = j && (j \in \{0, 1\})\\ \gamma_j & = 4j\\ \end{aligned}

which gives the equation $n^2 = A(n) + B_1(n) + 4C_1(n)$

The latter gives us $C_1(n) = (n^2 - A(n) - B_1(n))/4$. To solve for $C_0$, one can either substitute the value of $C_1$ into the equation for $h(n) = n$ above, or, equivalently, add twice that equation to the one for $h(n) = n^2$, which eliminates $C_1(n)$:

\begin{aligned} 2n + n^2 & = 3A(n) + 3B_1(n) -4C_0(n)\\ C_0(n) & = \frac{3A(n) + 3B_1(n) - n^2 - 2n}{4}\\ \end{aligned}

### Good and Bad Persons in Josephus Problem

It took me a while, as I was trying to find a recurrence equation of some sort which would help me with this problem and the bonus one (where Josephus’ position is fixed but he can pick $m$). Eventually I found one, which did not help me with the bonus problem, but led me to a solution for this problem.

Obviously, if we have $k$ persons and want to remove the last one in the first round, we can choose $m=k$ and that will work. Actually, any multiple $m=ak$ works as well.

This shows that at each round, if we have $k$ persons left, and we start counting on the first one, when $m=ak$ we will remove the $k^{th}$ person then start counting from the first one again.

Back to the original problem: there are $2n$ persons, and we want to get rid of persons $n+1, \cdots, 2n$ first. If we take $m=\operatorname{lcm}(n+1,\cdots, 2n)$, then for each of the first $n$ rounds the last (bad) person will be removed, leaving only the good ones at the end.

When first solving the problem, I picked $m=\prod_{i=1}^n (n+i)$, which has the same property as the least common multiple, but is larger. Perhaps a smaller number is better for the nerves of the participants.

### Bonus Problems

I tried to solve the bonus questions, but after repeatedly failing, I had a glimpse at the solutions: they obviously require either knowledge of later chapters, or other concepts I know nothing about, so I will get back to these bonus problems after I finish the book.

I am now working through Chapter 2. It is a much larger chapter than the first, so it will take me some time.

]]>
<![CDATA[Seven Databases in Seven Weeks CouchDB Day 3]]> 2012-02-01T18:06:00+09:00 https://blog.wakatta.jp/blog/2012/02/01/seven-databases-in-seven-weeks-couchdb-day-3 Today is a bit juicier than the previous days (together). On the menu, advanced views (full MapReduce), replication, conflict management, and change monitoring.

Advanced views in CouchDB are, as noted yesterday, materialized output of MapReduce computations.

This has a cost: such computations are saved, so they take more time than with other implementations, the first time at least.

Updating the views, on the other hand, is fairly fast (CouchDB recomputes only what is necessary). Views have to be planned, but once there they are fairly cheap. For exploratory queries, other databases might be more appropriate.

CouchDB’s reduce functions distinguish between the first invocation and the following ones (on values that have already gone through the reduce function). This makes it possible to implement a _count function which counts the number of values (the first invocation transforms values into counts, and the following ones add the counts up).

### Replication

Replication is the one-way process of replicating the changes of one database on another. Replication can be between any two databases, whether on the same server or on different ones. It can be one time, or continuous. The documents to replicate can be filtered, or selected by _id.

Replication is a lower-level mechanism than what MongoDB, for instance, proposes (where there is a strict hierarchy of masters and slaves), and closer to the flexible approach of Riak.

Of course, when concurrent writes are permitted, conflicts can occur, and CouchDB handles them.

### Conflicts

Concurrent updates can cause conflicts, and CouchDB detects them so they can be dealt with.

First, conflicts cannot happen on a single server: updates to a document must refer to the latest revision, otherwise the update fails. So clients are directly aware that they need to resubmit the (merged) document.

When replication is enabled, conflicts result from concurrent updates in two replicated databases. At the next replication, one version will be selected as winning, and replicated to other databases. The other versions are still accessible from the _conflicts attribute (initially, only in the losing databases).

If two-way replications are in place, all databases will eventually have the _conflicts attribute populated (with all the losing revisions, if there is more than one).

This makes it possible to implement remedial actions: it is possible to have views containing only documents in conflict, or to filter changes for conflicts, and to implement merging actions in monitoring scripts.

### Changes

Changes are dedicated views that contain a list of updates for a specific database. The parameters support starting at a given sequence number (a database-wide sequence, not a document revision), filtering documents, and keeping the stream open in several ways.

This makes it possible (easy, even) to monitor (interesting or relevant) changes, to synchronize with other systems, or to automatically resolve conflicts, for instance.

When using long polling, I found that on very large datasets the JSON.parse invocation could take a long time; I would suggest always using a limit parameter on the query, to cut the dataset down to manageable chunks.

## Exercises

### Built-in Reduce Functions

There are three, documented on the Wiki.

They are implemented directly in Erlang, so they perform better than JavaScript functions.

#### _sum

This function behaves just like the reduce function from the book; it sums the values by key. It is useful when the map function uses emit(key, 1); (or some other numeric value).

#### _count

It is similar to _sum, but it counts the number of values rather than merely summing them. It is useful when the value is not a number.

#### _stats

This is an extension of _sum which computes additional statistics (minimum, maximum, …) on the numeric values.

### Filtering _changes output

Filters are nicely described in CouchDB The Definitive Guide.

To create a new filter, I first create a design document to store the function:

The by_country function retrieves a country parameter from the request, and compares it against the record country attribute; only the matching records are returned.
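A minimal sketch of such a filter function (the country attribute and query parameter follow the description above; the exact design-document layout from the book is not reproduced here):

```javascript
// CouchDB filter functions receive the changed document and the request
// object; returning true lets the change through the _changes feed.
function byCountry(doc, req) {
  return doc.country === req.query.country;
}
```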

To monitor only updates to bands from Spain, for instance, I can use

To monitor for conflicts, I have the following design document:
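A sketch of what that design document's filter function can look like, assuming it simply checks for the presence of the _conflicts attribute:

```javascript
// Let a change through only when the document carries conflicting revisions.
function conflictsOnly(doc, req) {
  return !!doc._conflicts;
}
```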

With that, I can then listen for changes, keeping only the conflicts:

CouchDB only sets the _conflicts attribute in the losing database; the winning database (the one in which the winning revision was initially created) does not know about the conflict. This means I must check against music-repl instead of music.

### Replication HTTP API

The API is documented here.

To use it, simply pass the source and target databases to the _replicate URL:
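The request body is a JSON document naming the two databases; a minimal sketch using the database names from this chapter (add "continuous": true for a continuous job):

```json
{
  "source": "music",
  "target": "music-repl"
}
```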

### _replicator database

The _replicator database is an alternative to the use of the _replicate URL above: documents inserted in the _replicator database will, if properly formed, cause a replication job to be started (either one-off, or continuous).
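A sketch of such a document (the _id is my choice; source, target and continuous are the standard fields):

```json
{
  "_id": "music-to-music-repl",
  "source": "music",
  "target": "music-repl",
  "continuous": true
}
```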

Deleting the document will cancel the replication job.

Documents describing replications are updated to reflect the progress of their jobs.

The command below triggers a replication from music to music-repl:

Using the watch_changes_longpolling_impl.js script on the _replicator database, it is possible to monitor the replication job:

The first change is when the document is created; the second when the job starts, and the third when it successfully completes.

Unlike the _replicate based API, continuous jobs stored in _replicator will resume when the database is restarted.

### Continuous watcher skeleton

The approach is to keep input in a buffer, then extract as many lines from the buffer as possible (if the last line is incomplete, it is put back into the buffer), and parse each line as a JSON object.
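That buffering logic can be sketched as follows (function names are mine, not the book's):

```javascript
// Accumulate chunks, emit every complete line as a parsed JSON object,
// and keep the trailing incomplete line in the buffer for the next chunk.
function makeLineParser(onChange) {
  var buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    var lines = buffer.split('\n');
    buffer = lines.pop(); // last element is incomplete (possibly ''), keep it
    lines.forEach(function (line) {
      if (line.trim().length > 0) onChange(JSON.parse(line));
    });
  };
}
```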

The format of each parsed object is different: each change is in its own object, so there is no results attribute any more.

### Continuous watcher implementation

I just inserted the code block above in the original watch_changes_skeleton.js; no other modifications were required.

With the code block above, both the long polling and the continuous outputs are identical.

### Conflicts view

As I said above, conflicts are only created in the losing database, so to test this I must use the music-repl database.

Otherwise, the code is simple: iterate on the _conflicts attribute, and for each revision it contains, emit that revision mapped to the document _id:
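A sketch of that map function (emit is provided by CouchDB's view server):

```javascript
// For each conflicting revision of a document, emit the revision as the
// key and the document _id as the value.
function mapConflicts(doc) {
  if (doc._conflicts) {
    doc._conflicts.forEach(function (rev) {
      emit(rev, doc._id);
    });
  }
}
```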

Testing it:

And this completes Day 3 and this overview of CouchDB.

]]>
<![CDATA[Seven Databases in Seven Weeks CouchDB Day 2]]> 2012-01-30T18:58:00+09:00 https://blog.wakatta.jp/blog/2012/01/30/seven-databases-in-seven-weeks-couchdb-day-2 Day 2 is about Views in CouchDB, which serve as an introduction to the more general MapReduce support.

It is another fairly short day, as much of this section is actually about the complexities of XML parsing…

Like Riak and MongoDB, CouchDB is scripted with JavaScript, so today has a feeling of déjà vu.

### View concept

A View is just a mapping of a key to a value. Keys and values are extracted from documents; there can be more than one key for each document, as in MongoDB.

Once the view has been built and updated for the documents it applies to, it can be accessed by key using optimized methods (all based on some form of lexicographical order).

### View performance

A View in CouchDB is essentially the equivalent of a materialized view in relational databases.

Access to the view causes it to be updated (i.e. recomputed) if necessary, which can be a painfully slow experience. I had imported the whole content of the music database (26990 records), and each time I tested a Temporary View or saved a Permanent one, I had to wait for CouchDB to finish the refresh (fortunately not too long on this dataset).

It is interesting to note that while relational databases require the schema to be designed ahead of time but support arbitrary queries, CouchDB lets you ignore the schema but needs the queries to be designed ahead of time.

## Exercises

### emit function

The key can be any JSON object, although I would say that only strings and arrays of strings have sensible semantics.

Arrays can be used with reduce functions to provide query time custom grouping, as explained here.

For instance, to compute the number of records by date, I used the releasedate of each album to create a key array [year, month, date], and a value of 1 (1 for each album):

As I intend to use grouping, I also need a reduce function:
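A sketch of this map/reduce pair (releasedate is assumed to be a string parseable by Date; I use the UTC accessors to keep the result deterministic, and the month is 0-based as noted below):

```javascript
// Map each album to a [year, month, day] key with a value of 1.
function mapByDate(doc) {
  if (doc.releasedate) {
    var d = new Date(doc.releasedate);
    emit([d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()], 1);
  }
}

// Reduce by summing; this works for both the first pass and rereduces,
// since partial sums are themselves numbers.
function reduceSum(keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}
```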

Each document in the view is now a date as an array key, with a single number for each record released on that date (there are as many identical keys as there were records for a given day).

When querying, by default, the reduce function will be called on identical keys to get a single value:

(month is 0 based…)

With the group_level parameter, I can control whether I want to group by day (group=true or group_level=3, as above), by month (group_level=2), or year (group_level=1):
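The effect of group_level can be illustrated in plain JavaScript: grouping truncates each key array to the given length before reducing (a sketch of the idea; CouchDB of course does this server-side):

```javascript
// rows: [{key: [year, month, day], value: 1}, ...] as emitted by the map.
// level: how many leading key components to group by.
function groupSum(rows, level) {
  var acc = {};
  rows.forEach(function (row) {
    var k = JSON.stringify(row.key.slice(0, level));
    acc[k] = (acc[k] || 0) + row.value;
  });
  return acc;
}
```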

### View request parameters

There are quite a few of them listed here.

### Random artist script

The code is essentially the same as the one mapping names to ids, but here it associates a random number with each name.

### Random artist URL

The URL below returns the first artist whose random number is greater than the random one generated by Ruby.

As expected, if given a value too large (for instance, 1), the query returns nothing:

### Random everything

The code of each script is similar, in a way Russian Dolls are similar: each one is an extension of the previous, digging deeper into the nested structure of the original document.

Testing:

Testing:

#### Random Tag

Testing:

And that’s it for Day 2.

]]>
<![CDATA[Seven Databases in Seven Weeks CouchDB Day 1]]> 2012-01-30T13:57:00+09:00 https://blog.wakatta.jp/blog/2012/01/30/seven-databases-in-seven-weeks-couchdb-day-1 Another beta version of the book, finally with the chapter on CouchDB. I was going through the Redis chapter, but the third day uses other databases, in particular CouchDB. So I’ll get back to Redis after I’m done with CouchDB.

Today is just a short introduction: CouchDB is (yet another) key-value store; it has a ReST API, stores JSON data, and, like Riak, only supports full updates. Unlike Riak, however, it does not support concurrent updates; instead it requires the client to only update from the latest version of the data.

I thought at first that the data was versioned, like in HBase, but this is not the case: the version id (_rev) is there to ensure that updates occur sequentially, not concurrently. CouchDB can keep previous versions of documents, but the retention is unreliable as explained here.

Besides the HTTP based ReST API, CouchDB also provides a web interface; among other tools, there is a complete test suite, which is always nice to check the installation.

## Exercises

### CouchDB HTTP Document API documentation

The documentation is here; there is also a reference

### HTTP commands

Besides the basic CRUD commands (POST, GET, PUT, and DELETE), there is also HEAD (for basic information on a document):

When using cURL, the command HEAD must be used with the flag -I, otherwise cURL will wait (endlessly) for data after the headers.

Finally, there is a COPY command, which as expected copies a document (without having to retrieve it first):

### PUT a new document with a specific _id

It is just a matter of specifying an id when creating the document:

### Document with a text attachment

To create an attachment, it is necessary to know the version of the document, as it is considered an update. The URL for the attachment is just the URL for its document, with any suffix (the suffix naming the attachment). The _rev is specified by passing a rev parameter.

Using the document with _id ‘beatles’ created above, the attachment is uploaded with:

The document now has a new _rev.

To retrieve the attachment, just use its URL:

(the line breaks have been lost…)

Onward to Day 2!

]]>
<![CDATA[Seven Databases in Seven Weeks Redis Day 2]]> 2012-01-21T13:30:00+09:00 https://blog.wakatta.jp/blog/2012/01/21/seven-databases-in-seven-weeks-redis-day-2 Performance tuning with Redis can be achieved in different ways, as we see today. First there are basic changes in the client side (such as pipelines), then configurations options (frequency of saves, …), and finally distribution of load.

### Pipeline

Redis’s low-level protocol supports the notion of pipelines: sending commands in batches and collecting all the results at the end, instead of waiting for a result after each command. This saves a round-trip delay per command, so there can be huge performance boosts for specific usages, as the informal benchmarks below show.

### Distributed Redis

Redis servers can be distributed for performance or memory concerns, but much of the work falls on the client side.

#### Slaves

Slaves in Redis are just the opposite of MongoDB’s. Whereas MongoDB’s slaves are meant to be written to, so that updates are automatically pushed to the master, Redis slaves are, or should be, read-only. Updates are only propagated from master to slaves.

There is no integrated support for failover; it has to be implemented in client code.

So slaves are mainly a mechanism to distribute reads; combined with monitoring client code, they can also be used for data replication and failover.

Note that each slave needs as much memory as the master, as it contains the same data.

#### Sharding

By itself, Redis does not support sharding, and relies on the client library to spread accesses over several instances. There is ongoing development to have real Redis clusters, but for the time being sharding has to be simulated.

One issue not mentioned in the book is that sharding breaks transactions and pipelines: there are no guarantees that the relevant keys are all in the same instance, so the Redis Ruby client, for instance, will raise an exception when invoking MULTI.

The Java client, Jedis, has a mechanism to “tag” a key such that keys with the same tag are guaranteed to be on the same Redis server. This makes the distribution of keys predictable, and allows the use of transactions (provided all the involved keys have the same tag).
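The idea behind key tags can be sketched in a few lines (the {tag} syntax and the hash function here are assumptions for illustration, not Jedis's actual implementation):

```javascript
// Extract the tag between braces if present, otherwise use the whole key.
function tagOf(key) {
  var m = /\{(.+?)\}/.exec(key);
  return m ? m[1] : key;
}

// Pick a shard from the tag, so keys sharing a tag land on the same shard
// and can take part in the same transaction.
function shardFor(key, shardCount) {
  var t = tagOf(key), h = 0;
  for (var i = 0; i < t.length; i++) h = (h * 31 + t.charCodeAt(i)) >>> 0;
  return h % shardCount;
}
```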

This shows that not only is this a client-side feature, but the actual extent of the feature may vary widely. And of course, there is no reason to expect different clients to shard keys the same way.

Properly setup, sharding will distribute the data over each node, reducing the memory load of each node.

## Exercises

### Performance tests

I first tried to rewrite the code in Java, to measure the cost of Ruby’s convenience. The code in Java is clumsier than in Ruby, but it ran a bit faster (105 seconds instead of 155 seconds for the Ruby version using hiredis).

Using pipelines, the difference was 11 seconds against 26 seconds (again, the Ruby version is using hiredis).

Disabling snapshots and append only file did not improve the time significantly compared to the default (snapshots but no append only file).

Enabling the append only file and setting it to always was almost 3 times as slow for the pipelined Java version (27 seconds). For the original Ruby version (with hiredis), it was even worse (1101 seconds). This means the overhead of writing to file can be mitigated with pipelines.

To recap: disabling snapshots did not improve performance measurably, but enabling append only file always degrades the performance significantly; using pipelines makes it a bit better, but it is still much slower.

### URL Shortening Service

The exact setup to implement is not described, so what I did is to distribute data between two shards of one master and two slaves.

There is no direct support for such a layout in Jedis (nor, as far as I can tell, in the Ruby library), so I had to write some of it myself.

As always with Redis, the writes are restricted to the masters, and the reads are distributed over the slaves (and the masters as well, if needed).

#### Distribution over slaves

Jedis does not support slaves directly. What the documentation proposes is to have a dedicated client to the master to write on, and a sharded pool to the slaves. However, such an approach would be difficult, as I need to shard the writes to the masters as well (I would have to use a different sharding algorithm, and manage the routing of commands through the tree of Redis instances).

Fortunately, Redis user Ingvar Bogdahn had posted an implementation of a Round Robin pool of slaves. This implementation manages a connection pool to a master, and another connection pool to a set of slaves. The commands are properly distributed: all the write commands are sent to the master, and the reads commands are distributed over the slaves.

I had to fix the code in some places: a command implementation was missing, another was incorrect, and finally the password was never sent to the master, causing authentication errors. But the bulk of the code is Ingvar’s, and I was glad to use it.

The classes are

#### Sharding

Sharding is directly supported by Jedis, but as organized, the code is restricted to sharding over a set of clients connected to specific instances.

There are basic, generic classes (Sharded, ShardInfo, …) that can be used to implement sharding of arbitrary clients (such as the Round Robin pool above), but it requires a lot of tedious code to map each command to a method on the right shard. Worse, such code would be the same for every kind of shard.

So I first wrote generic classes that implement sharding in terms of generic Jedis client; the actual implementation is then much simpler (just the constructors, and the few commands that cannot be sharded, such as disconnect or flushAll).

#### Service

The code for the service itself is now fairly small. JedisClient is the class that builds the tree of sharded master/slaves pools. It is loaded and initialized as a Spring bean. The web services are JSR 311 services, running over Jersey, and loaded and initialized by Spring.

Admin lets the user define a keyword for a specific URL, and Client extracts a keyword from the request URL, retrieves the URL for this keyword, and returns a request to redirect to this URL.

Once deployed (on Apache Tomcat), it can be used in a browser or on the command line:

and for clients:

The code for the whole project can be found on Github.

And this completes Day 2.

]]>
<![CDATA[Seven Databases in Seven Weeks Redis Day 1]]> 2012-01-20T16:52:00+09:00 https://blog.wakatta.jp/blog/2012/01/20/seven-databases-in-seven-weeks-redis-day-1 After a long winter hiatus, the elves at Pragmatic Bookshelf delivered a late but welcome present: the third beta of Seven Databases in Seven Weeks. The book is not complete yet (the chapter on CouchDB is still missing), but it now covers Redis.

Redis is basically a key-value store, like Riak, but while Riak is agnostic about the values, Redis values can be data structures (lists, queues, dictionaries, …, or even messaging queues). This allows Redis to act as a synchronized shared memory for cooperating applications.

### Complex Datatypes

Redis values can have structure, and specific commands manipulate these values in appropriate ways. Redis supports strings, which can also behave as numbers if they have the right format; lists, which can also be seen as queues and support blocking reads; sets; hashes (that is, dictionaries); and sorted sets.

### Transactions

All Redis commands are atomic, and it is possible to group a sequence of commands into a transaction for an all or nothing execution with the command MULTI. But a Redis transaction is not similar to a transaction in relational databases: it just queues all the commands and executes them when it receives the EXEC command. This means it is not possible to read any data while in a transaction.

### Expiry

Perhaps nothing labels Redis as a datastore for transient data more than expiry: keys can be marked for expiration (either relative from the current time, or absolute).

### Messaging

Redis also supports messaging but this is a topic for Day 2.

This post has a more detailed but still balanced coverage of Redis.

## Exercises

### Redis command documentation

The documentation is well done and easy to navigate. Of all the databases I have seen so far, this is probably the best (PostgreSQL being a strong second).

### Create a Redis client

I’m using Java and the Jedis client library.

The code is simple enough:

The pom.xml file:

### Create a pair of Redis clients

This one is simple as well, but having a reader and a writer allowed me to try one writer and two readers.

First the writer program:

The pom.xml is a bit more complex, as it creates a self-contained jar with MANIFEST.MF (so I can run it from the command line easily):

with its pom.xml:

The blpop command can block on several lists, so when it receives something it is always at least a pair: the list key, and the value.

Now, I can open three terminals to test the code: two with readers:

and one with the writer (which must be started last):

The writer will simply state

One of the readers will get the message:

but the other one will just keep waiting:

So Redis blocking queues can only serve one blocking reader at a time (as it should).

The reader programs can be stopped with Ctrl-c, or by pushing finish into msg:queue from a Redis client (twice, once for each client):

And that’s all for today.

]]>
<![CDATA[Concrete Mathematics Repertoire Method]]> 2012-01-14T13:33:00+09:00 https://blog.wakatta.jp/blog/2012/01/14/concrete-mathematics-repertoire-method The repertoire method is never really explained in the book, or anywhere else I could find on the Internet. There are a couple of posts on this subject, so I though I should add mine.

The repertoire method is really a tool to help with the intuitive step of figuring out a closed formula for a recurrence equation. It does so by breaking the original problem into smaller parts, with the hope they might be easier to solve.

### Why it works

Let’s assume we have a system of recurrence equations with parameters, so that the unknown function can be expressed as a linear combination of other (unknown) functions where the coefficients are the parameters:

\begin{aligned} g(1) & = b(0, \alpha_1, \cdots, \alpha_m)\\ g(n) & = r_n(g_1, \cdots, g_{n-1}, \alpha_1, \cdots, \alpha_m)\\ & = \sum_{i=1}^m A_i(n)\alpha_i, \end{aligned}

We can consider $g$ as a specific point in a $m$-dimensional function space (determined by both the recurrence equations, and the parameters), and because $g$ is a linear combination, we can try to find $m$ base functions (hopefully known or easy to compute) $f_k(n) = \sum_{i=1}^m A_i(n)\alpha_{i_k}$ with $1 \le k \le m$, expressed in terms of $m$ linearly independent vectors $(\alpha_{1_k},\cdots,\alpha_{m_k})$.

In other words, if we can find $m$ linearly independent parameter vectors such that, for each, we have a known solution $f_k(n)$, then we can express the function $g$ as a linear combination of $f_k(n)$ for any parameters (because the $m$ $f_k(n)$ form a base for the $m$-dimensional function space defined by the recurrence equations).

### How it works

First, we need to check that the recurrence equations accept a solution expressed as

\begin{aligned} g(n) & = \sum_{i=1}^m A_i(n)\alpha_i \end{aligned}

It is enough to plug this definition into the recurrence equations, and make sure the different parameters always remain in different terms.

Then we can either solve $f(n) = \sum_{i=1}^m A_i(n)\alpha_i$ for known $f(n)$, or for known $\alpha_i$ parameters, as long as we end up with $m$ linearly independent parameter vectors (or, as it is equivalent, $m$ linearly independent known functions for specific parameters).

It is important to keep in mind that a solution can be searched from both direction: either set a function and try to solve for the parameters, or set the parameters and solve for the function.

### Homework exercise

Given

\begin{aligned} g(1) & = \alpha\\ g(2n+j) & = 3g(n) + \gamma n + \beta_j&&\text{for } j=0,1 \text{ and } n \ge 1\\ \end{aligned}

We need to check that $g$ can be written as

\begin{aligned} g(n) & = \alpha A(n) + \beta_0 B_0(n) + \beta_1 B_1(n) + \gamma C(n)\\ \end{aligned}

The base case is trivial. The recurrence case is

\begin{aligned} g(2n) & = 3g(n) + \gamma n + \beta_0\\ & = 3(\alpha A(n) + \beta_0 B_0(n) + \beta_1 B_1(n) + \gamma C(n)) + \gamma n + \beta_0\\ & = \alpha 3A(n) + \beta_0 (3 B_0(n) + 1) + \beta_1 3B_1(n) + \gamma (3C(n) + n)\\ g(2n+1) & = 3g(n) + \gamma n + \beta_1\\ & = 3(\alpha A(n) + \beta_0 B_0(n) + \beta_1 B_1(n) + \gamma C(n)) + \gamma n + \beta_1\\ & = \alpha 3A(n) + \beta_0 3 B_0(n)+ \beta_1 (3B_1(n) + 1) + \gamma (3C(n) + n)\\ \end{aligned}

so $g$ can be expressed as a linear combination of other functions, with the parameters as the coefficients.

Now, when I tried to solve this problem, I didn’t know I could set the parameters to values that would lead to an easy solution ($\gamma = 0$ turns the problem into an easy to solve generalised radix-based Josephus problem); instead I wasted a lot of time trying to find known functions and solve for the parameters, which is why I have four steps below instead of just two as in the book.

#### $g(n) = n$

As the book suggests, I tried to solve for $g(n) = n$:

\begin{aligned} 1 = g(1) & = \alpha&&\alpha = 1\\ 2n = g(2n) & = 3g(n) + \gamma n + \beta_0\\ & = 3n + \gamma n + \beta_0&&\gamma = -1, \beta_0 = 0\\ 2n+1 = g(2n+1) & = 3g(n) + \gamma n + \beta_1\\ & = 3n - n + \beta_1&& \beta_1 = 1\\ \end{aligned}

#### $g(2^m+l) = 3^m$

As the recurrence equation looks like the generalised radix-based Josephus equation, I tried to solve for $g(2^m+l) = 3^m$:

\begin{aligned} 1 = g(1) & = \alpha&&\alpha = 1\\ 3^m = g(2^m+2l) & = 3g(2^{m-1}+l) + \gamma (2^{m-1} + l) + \beta_0\\ & = 3\cdot 3^{m-1} + \gamma (2^{m-1} + l) + \beta_0&& \beta_0, \gamma = 0\\ 3^m = g(2^m+2l+1) & = 3g(2^{m-1}+l) + \gamma (2^{m-1} + l) + \beta_1\\ & = 3\cdot 3^{m-1} + \beta_1&&\beta_1 = 0\\ \end{aligned}

#### $g(n) = 1$

I tried to solve for $g(n) = 1$, as it seemed useful to solve for a constant (no linear combination of linearly independent non-constant functions can produce a constant function).

\begin{aligned} 1 = g(1) & = \alpha&& \alpha = 1\\ 1 = g(2n+j) & = 3g(n) + \gamma n + \beta_j\\ & = 3 + \gamma n + \beta_j&& \gamma = 0, \beta_j = -2\\ \end{aligned}

#### $\alpha, \beta_1 = 1, \beta_0, \gamma = 0$

This is the step that took me the longest, and when I finally understood I could fix the parameters, I was able to use the radix-based Josephus solution.

The recurrence equations

\begin{aligned} g(1) & = 1\\ g(2n) & = 3g(n)\\ g(2n+1) & = 3g(n) + 1\\ \end{aligned}

have as solution $g(2^m + (b_m\cdots b_0)) = 3^m + (b_m\cdots b_0)_3$.

#### Solving for $g(n)$

We have the equations

\begin{aligned} A(n) - C(n) & = n\\ A(2^m + l) & = 3^m\\ A(n) -2(B_0(n) + B_1(n)) & = 1\\ B_1(2^m+l) & = h_3(l)&&\text{where } h_3(b_m\cdots b_0) = (b_m\cdots b_0)_3\\ \end{aligned}

We have two functions already defined ($A(n)$ and $B_1(n)$), and the other two equations give us the remaining two functions.

Now we can solve for $g(n)$:

\begin{aligned} g(2^m+l) = \alpha 3^m & + \beta_0 (\frac{3^m - 1}{2} - h_3(l))\\ & + \beta_1 h_3(l) \\ &+ \gamma (3^m + h_3(l) - 2^m - l) \end{aligned}

The $\gamma$ term is really $h_3(n) - n$.

The $\beta_0$ term is the same as $h_3(2^m-1-l)$, as can be seen by observing that in base $3$, $3^m$ is $1$ followed by $m$ zeroes, so $3^m-1$ is $m$ twos, and $\frac{3^m-1}{2}$ is $m$ ones, in other words the same representation as the binary representation of $2^m-1$.

Now, the binary representation of $l$ is the same as the representation in base $3$ of $h_3(l)$ (by definition of $h_3$), so the binary representation of $2^m-1-l$ is the same as the representation in base $3$ of $\frac{3^m-1}{2} - h_3(l)$.

With these two observations, it is possible to rewrite $g$ as

\begin{aligned} g(1b_m\cdots b_0) & = (\alpha\beta_{b_m}\cdots\beta_{b_0})_3 + \gamma ((1b_m\cdots b_0)_3 - (1b_m\cdots b_0)_2) \end{aligned}

which is the book solution.
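As a sanity check, the closed form can be compared against the recurrence numerically (a sketch; h3 reinterprets the binary digits of $l$ as a base-3 number, as in the derivation above):

```javascript
// h3(l): read the binary digits of l as a base-3 number.
function h3(l) {
  var r = 0, p = 1;
  while (l > 0) { r += (l & 1) * p; p *= 3; l >>= 1; }
  return r;
}

// The recurrence: g(1) = a, g(2n+j) = 3 g(n) + c*n + b_j.
function g(n, a, b0, b1, c) {
  if (n === 1) return a;
  var j = n % 2, m = (n - j) / 2;
  return 3 * g(m, a, b0, b1, c) + c * m + (j === 0 ? b0 : b1);
}

// The closed form for n = 2^m + l with 0 <= l < 2^m.
function closed(n, a, b0, b1, c) {
  var m = Math.floor(Math.log2(n)), l = n - Math.pow(2, m), p3 = Math.pow(3, m);
  return a * p3 + b0 * ((p3 - 1) / 2 - h3(l)) + b1 * h3(l)
       + c * (p3 + h3(l) - Math.pow(2, m) - l);
}
```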

### Faster solution

It is enough to solve for $\alpha, \beta_0, \beta_1 \ne 0, \gamma = 0$, and to find the parameters for $g(n) = n$. The first gives $A$, $B_0$ and $B_1$ directly by the generalised radix-based Josephus solution, and the second one adds a constraint to solve for $C$ as well.

### Wrapping up

As can be seen above, approaching the problem from both directions (solving for known functions and solving for known parameters) can result in time saved, and simplified expression of the solution.

]]>
<![CDATA[Concrete Mathematics Chapter 1 Homework Exercises Part 2]]> 2012-01-14T12:14:00+09:00 https://blog.wakatta.jp/blog/2012/01/14/concrete-mathematics-chapter-1-homework-exercises-part-2 I finally finished the homework exercises.

## Homework Exercises Part 2

### Generalized Tower of Hanoi

To solve this, I first observed that for $n=1$, we need $m_1$ moves, and for $n \gt 1$, we need $A(m_1, \cdots, m_{n-1}) + m_n + A(m_1, \cdots, m_{n-1})$ or $2A(m_1, \cdots, m_{n-1}) + m_n$ moves.

\begin{aligned} A(m_1, \cdots, m_n) &= \sum_{i=1}^n m_i 2^{n-i}\\ \end{aligned}

which is trivially shown by induction. The base case:

\begin{aligned} A(m_1) & = \sum_{i=1}^1 m_i 2^{1-i}\\ & = m_1 2^0\\ & = m_1 \end{aligned}

And for larger $n$, assuming $A(m_1, \cdots, m_n) = \sum_{i=1}^n m_i 2^{n-i}$,

\begin{aligned} A(m_1, \cdots, m_{n+1}) & = 2A(m_1, \cdots, m_n) + m_{n+1}&&\text{by definition}\\ & = 2\sum_{i=1}^n m_i 2^{n-i} + m_{n+1}&&\text{induction hypothesis}\\ & = \sum_{i=1}^{n} m_i 2^{n+1-i} + m_{n+1} 2^{0}\\ & = \sum_{i=1}^{n+1} m_i 2^{n+1-i}\\ \end{aligned}
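Both the recurrence and the closed form are easy to check numerically (a sketch):

```javascript
// Recurrence: A(m1) = m1, A(m1..mn) = 2 A(m1..m(n-1)) + mn.
function movesRec(ms) {
  if (ms.length === 1) return ms[0];
  return 2 * movesRec(ms.slice(0, -1)) + ms[ms.length - 1];
}

// Closed form: sum of mi * 2^(n-i), with i 1-based as in the formula above.
function movesClosed(ms) {
  var n = ms.length;
  return ms.reduce(function (acc, m, i) {
    return acc + m * Math.pow(2, n - 1 - i); // i is 0-based here
  }, 0);
}
```

With all $m_i = 1$ this degenerates to the classic Tower of Hanoi count $2^n - 1$.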

### Zig-zag lines

A geometric problem, but very similar to the previous intersecting lines. A zig-zag is made of 3 segments, so a pair of zig-zag lines can intersect at 9 different points. The first zig-zag line defines two regions; each new zig-zag adds a new region, plus one more for each intersection point.

This gives the following recurrence equations:

\begin{aligned} ZZ_1 & = 2\\ ZZ_n & = ZZ_{n-1} + 9(n-1) + 1\\ \end{aligned}

Using the linearity of the recurrence equation, it is easy to see that

\begin{aligned} ZZ_n & = ZZ_1 + 9S_{n-1} + (n-1) \end{aligned}

Here I used the linearity to compute solutions to both $ZZ_n = ZZ_{n-1} + 9(n-1)$ and $ZZ_n = ZZ_{n-1} + 1$, which are equally trivial. Then I combined the solutions into one.

I use (again) induction to confirm the solution. The base case is $ZZ_1 = ZZ_1 + 9S_0 + 0$. And for other $n$, assuming $ZZ_n = ZZ_1 + 9S_{n-1} + (n-1)$

\begin{aligned} ZZ_{n+1} & = ZZ_{n} + 9n + 1&&\text{by definition}\\ & = ZZ_1 + 9S_{n-1} + (n-1) + 9n + 1&&\text{induction hypothesis}\\ & = ZZ_1 + 9(S_{n-1} + n) + (n-1+1)\\ & = ZZ_1 + 9S_n + n \end{aligned}

The formula can also be written as

\begin{aligned} ZZ_n & = \frac{9n^2-7n+2}{2} \end{aligned}

### Planes cutting cheese

Again, a geometric problem. This one gave me more trouble. It took me a while to finally see that a new plane’s intersection with the previous ones is a set of intersecting lines, which defines the regions the new plane will divide in two.

The number of regions formed by intersecting lines was solved in the book, and defined as $L_n = S_n + 1$

So a plane cutting $n$ existing planes adds $L_n$ new regions, giving $P_{n+1} = P_n + L_n$. This recurrence gives $P_5 = 26$ regions.
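Unrolling the recurrence with $L_n = S_n + 1 = \frac{n(n+1)}{2}+1$ and $P_1 = 2$:

\begin{aligned} P_2 & = P_1 + L_1 = 2 + 2 = 4\\ P_3 & = P_2 + L_2 = 4 + 4 = 8\\ P_4 & = P_3 + L_3 = 8 + 7 = 15\\ P_5 & = P_4 + L_4 = 15 + 11 = 26\\ \end{aligned}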

The book did not expect a closed formula for this exercise, as the necessary techniques are only covered in chapter 5.

### Josephus co-conspirator

The recurrence equation for $I(n)$ follows the structure of $J(n)$, but with different base cases:

\begin{aligned} I(2) & = 2&&\text{I(1) is not defined}\\ I(2n) & = 2I(n) - 1\\ I(2n+1) & = 2I(n) + 1 \end{aligned}

Here I generated the first few values to get inspired. I noticed that $I(n)$ had increasing odd values in batches that were longer than for $J(n)$: $3, 6, 12, 24, \cdots$.

These numbers are from the series $3\cdot 2^m$, so using the same “intuitive” step as in the book, I tried to show that $I(3\cdot 2^m + l) = 2l + 1$ with $0 \le l \lt 3\cdot 2^m$ (the formula does not work for $I(2)$, which has to be defined separately).

By induction on $m$: the base case is $I(3) = I(3\cdot 2^0 + 0) = 1$.

Assuming $I(3\cdot 2^m + l) = 2l+1$, we have

\begin{aligned} I(3\cdot2^{m+1} + 2l) & = 2I(3\cdot 2^m + l) -1&&\text{by definition}\\ &= 2(2l+1) -1&&\text{induction hypothesis}\\ &= 4l+2-1\\ &= 2(2l)+1\\ I(3\cdot 2^{m+1} + (2l+ 1)) & = 2I(3\cdot 2^m + l) + 1&&\text{by definition}\\ & = 2(2l+1) + 1&&\text{induction hypothesis}\\ \end{aligned}

The book solution is defined in terms of $2^m+2^{m-1}+k$, which is the same:

\begin{aligned} 2^m+2^{m-1}+k & = 2\cdot 2^{m-1} + 2^{m-1} + k\\ & = 3\cdot 2^{m-1} + k \end{aligned}

with $1 \le m$, while I have $0 \le m$.
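The closed form can also be checked mechanically against the recurrence; a Python sketch, taking $I(3) = 1$ as an extra base case (the recurrence would otherwise need the undefined $I(1)$):

```python
# I(n) from the recurrence, checked against I(3*2^m + l) = 2l + 1.
def I(n):
    if n == 2:
        return 2
    if n == 3:
        return 1  # cannot come from the recurrence: I(1) is undefined
    return 2 * I(n // 2) - 1 if n % 2 == 0 else 2 * I(n // 2) + 1

for m in range(6):
    for l in range(3 * 2 ** m):
        assert I(3 * 2 ** m + l) == 2 * l + 1
```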

### Repertoire method

I put the repertoire method in its own post as it was both the most difficult exercise and the one where I learned the most.

]]>
<![CDATA[ANTLR3 Maven Plugin - Eclipse Setup]]> 2012-01-14T11:25:00+09:00 https://blog.wakatta.jp/blog/2012/01/14/antlr3-maven-plugin-eclipse-setup Setting up Eclipse with Maven is getting easier, but some cases still require extra research and work. As I was experimenting with the ANTLR Maven plugin, I found the default behaviour pretty much useless: Eclipse knew nothing about the grammar files or the generated classes, so the rest of the code would not compile; even after adding the relevant source folders, I still had to run explicit Maven commands after modifying the grammar files, then refresh the workspace…

I eventually found a better way, which I document here.

There is an antlr3-maven-archetype, which I started from. However, for the purpose of clarity, I will start from scratch here.

### Installing m2e

The Maven plugin for Eclipse is called m2e (m2eclipse is an obsolete predecessor), and is available in the default Eclipse Marketplace. However, the current version (1.0 at the time of writing) does not handle the build lifecycle of some common Maven plugins very well. In particular, it does not know where the generation of classes from grammar files fits into the Eclipse build lifecycle.

The 1.1 milestone handles this much better, so I suggest installing it. The update site is http://download.eclipse.org/technology/m2e/milestones/1.1, which can be used with the “Install New Software” function.

### Creating a project with ANTLR

Create a new Maven Project, and skip the archetype selection (i.e. use a simple project). As I said above, I could have used the ANTLR v3 archetype, but chose not to.

#### Optional: set the target option

By default Maven uses compiler source and target version 1.5. On Mac OS X Lion, there is no JDK 1.5 (only 1.6), so I always update pom.xml to set the source and target configuration options to something meaningful:
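Something along these lines, using the standard compiler-plugin properties (1.6 being what is available on Lion; adjust to your JDK):

```xml
<properties>
  <maven.compiler.source>1.6</maven.compiler.source>
  <maven.compiler.target>1.6</maven.compiler.target>
</properties>
```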

I create a property for the ANTLR version, as I will need it for both the ANTLR plugin and the jar:
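For example (the property name is my own choice; 3.4 is the ANTLR version used throughout this post):

```xml
<properties>
  <antlr.version>3.4</antlr.version>
</properties>
```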

Then I add the plugin declaration:
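A minimal declaration binding the antlr goal, which generates the parser sources, might look like:

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.antlr</groupId>
      <artifactId>antlr3-maven-plugin</artifactId>
      <version>${antlr.version}</version>
      <executions>
        <execution>
          <goals>
            <goal>antlr</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```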

Finally I add the dependency to the ANTLR runtime:
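The runtime is a regular dependency, reusing the version property:

```xml
<dependencies>
  <dependency>
    <groupId>org.antlr</groupId>
    <artifactId>antlr-runtime</artifactId>
    <version>${antlr.version}</version>
  </dependency>
</dependencies>
```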

At this stage, Eclipse is upset because the lifecycle configuration org.antlr:antlr3-maven-plugin:3.4:antlr is not covered. But as we’re using m2e 1.1, we can look for the appropriate connector in the m2e Marketplace. There should be only one: antlr by Sonatype, which should be installed.

#### Packaging the ANTLR runtime with the code

This is something that the original ANTLR v3 Maven archetype suggests: including the ANTLR runtime in the generated jar.

Using the Maven Assembly Plugin, it is possible to declare what goes into the generated jar. As it is self-contained, it is also possible to declare a main class (not done below as I did not have a main class yet):
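A sketch using the predefined jar-with-dependencies descriptor (the commented-out mainClass is a placeholder):

```xml
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
    <!-- once there is a main class, it can be declared here:
    <archive>
      <manifest>
        <mainClass>org.something.Main</mainClass>
      </manifest>
    </archive>
    -->
  </configuration>
</plugin>
```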

#### Tuning the Eclipse project

Now, the ANTLR plugin processes code under src/main/antlr3, so we can create this folder and add it as a source folder in the Eclipse project properties. Creating or updating a grammar file in Eclipse will then create or update the corresponding generated classes.

The ANTLR connector also added the target/generated-sources/antlr3 directory as another source folder, but it will disappear when executing the Maven/Update Project Configuration action, so it is best to add it manually. You can then change the properties of this folder to check ‘Locked’ (to avoid accidental editing) and ‘Derived’ (to hide the content from the “Open Resource” command).

Note that the plugin is unable to follow the @header directive properly (that is, it will copy the directory structure of the grammar file, instead of following the directory structure implied by the @header directive), so the grammar files must use the same directory structure as the Java package intended for the generated classes. In other words, if you want your generated classes to have the package org.something, you both need to put the grammar files under src/main/antlr3/org/something, and use the @header package directive to set the package of the generated classes.
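For example, with a hypothetical grammar NameList meant to generate classes in package org.something, the file location and the @header directives must agree:

```
// src/main/antlr3/org/something/NameList.g
grammar NameList;

@header {
package org.something;
}

@lexer::header {
package org.something;
}
```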

It is also unable to handle grammar files directly under src/main/antlr3. If you try, it will generate this error: “error(7): cannot find or open file: null/NestedNameList.g” when running the process-sources goal. Running this goal is also the only way to get the error message if something is wrong with the grammar file (unless you install an ANTLR Eclipse plugin, which I didn’t try).

Small gotcha: I found that with the current versions of the plugins, connectors and so on, Eclipse does not detect changes to generated classes directly: it is always one change behind, especially when there are errors.

If you make a mistake in the grammar file that causes the generated classes to stop compiling, you will have to change the grammar file twice for the error markers to go away: the first time, Eclipse will correctly report that the errors in the classes are gone, but the project error markers will stay; after a second change (even if you change nothing: just add a character, delete it, and save), the error markers will finally disappear.

This is more of an annoyance than a serious problem, and in any case the files are always properly generated, so if there is no error, all files are kept up to date.

##### Automating the above steps

If you include the build-helper-maven-plugin plugin in your pom.xml, then it is possible to automatically add the relevant source folders to Eclipse:
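A sketch of the relevant execution, with the paths used above (the exact phase binding may vary):

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>add-source</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>add-source</goal>
      </goals>
      <configuration>
        <sources>
          <source>src/main/antlr3</source>
          <source>target/generated-sources/antlr3</source>
        </sources>
      </configuration>
    </execution>
  </executions>
</plugin>
```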

To use it, another connector is necessary, but it is found directly in the m2e Marketplace.

Once in the pom.xml, just importing the project into Eclipse will create the relevant source folders automatically. However the ‘Locked’ and ‘Derived’ flags on the target/generated-sources/antlr3 folder are stored in the workspace .metadata, so these flags have to be set manually for each workspace.

### The easier way

If all the above seems tedious, it is because it is. The antlr3-maven-archetype will generate much of it, but not, for instance, the additional source folders.

I have the kind of laziness that causes me to spend hours trying to save a few minutes later on, so I created my own archetype, a trivial little thing whose only purpose is to get the basic setup in place quickly.

It does not really do much, and is perhaps best seen as a template, which is why the best use is to download it, adjust it to your own needs, then install it locally.

Hope this helps.

]]>