Wakatta!

Like Eureka!, only cooler

Psychic Modeling (Fast Version)

In Psychic Modeling, I described a reasonably understandable implementation of a ticket generator for the Psychic Modeling Problem. While this version is not overly slow, it is not amazingly fast either.

As I’m refreshing my C skills, I thought it would be interesting to try and implement a version as fast as possible.

Design

I represent a subset as bit patterns in a 32-bits integer. This means I am limited to 32 different values (in other words, $n$ must be no larger than 32). The upside is that I have extremely fast intersection (&) and union (|) operations, among others.

Memory Management

I use a work memory allocated at the beginning of the search; additional memory is allocated on the stack (using C99 features), and the selected tickets are just printed to avoid having to remember them.

The work memory is large enough to store $1 + \beta$ times a block large that can hold the complete set of $j$-subsets. The first block keeps the remaining $j$-subsets, and there’s an extra block for each random ticket: each time (for a total of $\beta$) a random ticket is generated, the $j$-subsets that are not covered yet is computed for this ticket; after I have generated $\beta$ tickets, I copy the work block of the best one over the first one.

I could have use just 3 blocks, a reference, the best so far, and one for the current random ticket, and copy from the current to the best each time the current ticket is better. There would be more copy operations, but perhaps less movement between the cache and the memory. The current design requires less than 2M, and only one copy operation per random ticket.

Non portable features

I am using a few GCC built-in bit-level operations (number of bits, index of least significant 1 bit, and count of trailing zeroes); Bit Twiddling Hacks and Hacker’s Delight have portable alternatives.

I also use /dev/random as a source of random numbers; replacing dev_random by random would restore portability (but the output would always be the same, and the random state is reset when the program starts).

Performance

So, is it fast?

1
2
3
4
5
$ time ./psychic 18 10 7 6 > output.txt 

real    0m0.289s
user    0m0.238s
sys     0m0.050s

The program found 71 tickets covering all 7-subsets with at least 6 numbers in less than a second. Even when the conditions are not that good, it remains fast:

1
2
3
4
5
$ time ./psychic 18 7 7 6 > output.txt 

real    0m5.445s
user    0m5.007s
sys     0m0.430s

Here it generated 1077 tickets using the smaller ticket size from Younas and Skiena paper; the paper had a 1080 tickets solution, so my version is effective.

Of course, it would be useless and unfair to compare the speed of this version against the numbers from the paper; more relevant is the difference with the Haskell version: while the latter was not meant to be fast, it is hundreds of times slower. I suppose it would be interesting to try and make it faster, but I suspect it would be just as ugly or uglier than the C version. And I like to keep using Haskell as a design and exploratory tool.

Overview of the code

solve

The main function, solve, is more complex than in the Haskell version. It allocates the work memory, and fills it with init. A first ticket is used in init to filter out $j$-subsets.

Then the loop for the other tickets starts. It of course stops when there are no remaining $j$-subsets.

The subset of remaining numbers is computed with funion (fold union), and the digits array prepared to be used in sample. It consists of the individual bits of the number representing the remaining numbers subset. It is computed by repeatedly isolating the rightmost 1 bit (with d & -d), then clearing this bit (with d &= d -1).

A first ticket is randomly generated and its uncovered set computed. It is also set as the best new ticket (and indeed is the best so far). Then for the remaining $\beta-1$ new tickets, the uncovered set is computed as well, and if the new set is smaller than the best’s, the new ticket becomes the best as well.

The best ticket is printed, the main work memory is updated with the best uncovered set, and if there are any remaining $j$-subsets to find, we loop.

solve
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
void solve(UINT n, UINT k, UINT j, UINT l)
{
    UINT r = combi(n, j);
    UINT *work = malloc(sizeof(UINT) * (BETA+1) * r);

    UINT t = first_perm(k);

    show_set(t);

    r = init(r, j, l, t, work);

    while (r) {
        UINT d = funion(r, work);
        int bc = bits_count(d);

        UINT digits[bc];

        int i = 0;

        while (d) {
            digits[i++] = d & -d;
            d &= d - 1;
        }

        UINT best_ticket, best_remaining, best_pos = 0;

        best_ticket = sample(k, bc, digits);
        best_remaining = check_cover(r, l, best_ticket, work, work+r);

        for (UINT i = 1; i < BETA; ++i) {
            UINT new_ticket = sample(k, bc, digits);
            UINT new_remaining = check_cover(r, l, new_ticket,
                                             work, work+((1+i)*r));

            if (new_remaining < best_remaining) {
                best_ticket = new_ticket;
                best_remaining = new_remaining;
                best_pos = i;
            }
        }

        show_set(best_ticket);
        if (best_remaining)
            memcpy(work, work+(best_pos+1)*r, sizeof(UINT) * best_remaining);
        r = best_remaining;
    }

    free(work);
}

init

init’s purpose it to avoid wasting a loop over the $j$-subsets by merging the generation of $j$-subsets with the coverage of a first permutation (defined as [1..k] in solve). The returned value is not size of the not yet covered set of $j$-subsets.

If all tickets had to be generated randomly, 0 could be passed instead of a ticket to keep all $j$-subsets.

init
1
2
3
4
5
6
7
8
9
10
11
12
13
UINT init(UINT c, UINT n, UINT l, UINT k, UINT w[])
{
    UINT i = 0;
    UINT v = first_perm(n);

    while (c--) {
        if (bits_count(v & k) < l)
            w[i++] = v;
        v = next_perm(v);
    }

    return i;
}

check_cover

check_cover has a similar design as init, but reads the $j$-subsets from the work memory from instead of generating them.

check_cover
1
2
3
4
5
6
7
8
9
10
UINT check_cover(UINT r, UINT l, UINT t, UINT from[], UINT to[])
{
    UINT i = 0;

    for (UINT j = 0; j < r; ++j)
        if (bits_count(from[j] & t) < l)
            to[i++] = from[j];

    return i;
}

sample

sample is very similar to the Hashell version (indeed they are both based on the same algorithm); here the digits array plays the role that ds played in the Haskell version.

sample
1
2
3
4
5
6
7
8
9
10
11
12
13
UINT sample(UINT k, UINT n, UINT ds[]) {
    if (k == 0)
        return 0;
    if (n == 0)
        return 0;

    UINT s = sample(k-1, n-1, ds);
    UINT p = ds[randomR(n)];
    if (p & s)
        return ds[n-1] | s;
    else
        return p | s;
}

next_perm

The next_perm is from Bit Twiddling Hacks, and explained here.

next_perm
1
2
3
4
5
UINT next_perm(UINT v)
{
    UINT t = v | (v - 1);
    return (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
}

Compiling and running

Using gcc, the necessary option is -std=c99 to activate C99 support; -O3 gives much (really) better performance, while -Wall is in general a good idea:

1
$ gcc-4.6.2 -Wall -O3 -std=c99 psychic.c -o psychic

To run it, just pass the $n$, $k$, $j$ and $l$ parameters on the command line. There is no checks, so avoid mistakes. The program outputs the generated tickets:

1
2
3
$ ./psychic 5 3 3 2
1, 2, 3
1, 4, 5

Wrapping up

After I completed the Haskell version, I found it not overly difficult to implement the C one. I was lucky to have discovered Bit Twiddling Hacks the week before; the code fragments there were very helpful in writing efficient set oriented functions over words.

Surprisingly, I had just one bug to track (I was using a variable both as parameter and temporary storage in one of the function); that was lucky as I’m not sure I could have debugged such code.

Complete Code

Psychic Modeling Fast Version (psychic.c) download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int UINT;
#define BETA 100

#define first_perm(n) ((1 << n) - 1);

/* Mac OS X/Linux specific */
FILE *drand;

UINT def_random() {
    UINT r;
    fread(&r, sizeof(UINT), 1, drand);
    return r;
}

#define _random() def_random()

/* GCC specific definitions */
#define bits_count(v) __builtin_popcount(v)
#define least_one(v) __builtin_ffs(v)

/* http://www-graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation */
/* return next lexicographic permutation */
UINT next_perm(UINT v)
{
    UINT t = v | (v - 1);
    return (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
}

/* following
   http://mikeash.com/pyblog/friday-qa-2011-03-18-random-numbers.html */
/*
 * return a random UINT 0 <= r < n
 */
UINT randomR(UINT n)
{
    UINT two31 = 1U << 31;
    UINT max = (two31 / n) * n;

    while (true) {
        UINT r = _random();
        if (r < max)
            return r % n;
    }
}

/* http://rosettacode.org/wiki/Evaluate_binomial_coefficients#C */
/*
 * binomial coefficient
 */
UINT combi(UINT n, UINT k)
{
    UINT r = 1;
    UINT d = n - k;

    if (d > k) {
        k = d;
        d = n - k;
    }

    while (n > k) {
        r *= n--;
        while (d && !(r % d))
            r /= d--;
    }

    return r;
}

/*
 * select k digits from ds. n is length of ds
 */
UINT sample(UINT k, UINT n, UINT ds[]) {
    if (k == 0)
        return 0;
    if (n == 0)
        return 0;

    UINT s = sample(k-1, n-1, ds);
    UINT p = ds[randomR(n)];
    if (p & s)
        return ds[n-1] | s;
    else
        return p | s;
}

/*
 * displays subset expressed as bit positions
 */
void show_set(UINT v)
{
    bool f = true;
    while (v) {
        if (!f)
            printf(", ");
        printf("%d", least_one(v));
        f = false;
        v &= v - 1;
    }

    printf("\n");
}

/*
 * union of all subsets
 */
UINT funion(UINT c, UINT w[])
{
    UINT r = 0;

    for (UINT j = 0; j < c; ++j)
        r |= w[j];

    return r;
}

/*
 * init work memory with subsets that are far from
 * initial ticket (defined as first_perm)
 */
UINT init(UINT c, UINT n, UINT l, UINT k, UINT w[])
{
    UINT i = 0;
    UINT v = first_perm(n);

    while (c--) {
        if (bits_count(v & k) < l)
            w[i++] = v;
        v = next_perm(v);
    }

    return i;
}

/*
 * copies subsets that are far from passed ticket
 */
UINT check_cover(UINT r, UINT l, UINT t, UINT from[], UINT to[])
{
    UINT i = 0;

    for (UINT j = 0; j < r; ++j)
        if (bits_count(from[j] & t) < l)
            to[i++] = from[j];

    return i;
}

void solve(UINT n, UINT k, UINT j, UINT l)
{
    UINT r = combi(n, j);
    UINT *work = malloc(sizeof(UINT) * (BETA+1) * r);

    UINT t = first_perm(k);

    show_set(t);

    r = init(r, j, l, t, work);

    while (r) {
        UINT d = funion(r, work);
        int bc = bits_count(d);

        UINT digits[bc];

        int i = 0;

        while (d) {
            digits[i++] = d & -d;
            d &= d - 1;
        }

        UINT best_ticket, best_remaining, best_pos = 0;

        best_ticket = sample(k, bc, digits);
        best_remaining = check_cover(r, l, best_ticket, work, work+r);

        for (UINT i = 1; i < BETA; ++i) {
            UINT new_ticket = sample(k, bc, digits);
            UINT new_remaining = check_cover(r, l, new_ticket,
                                             work, work+((1+i)*r));

            if (new_remaining < best_remaining) {
                best_ticket = new_ticket;
                best_remaining = new_remaining;
                best_pos = i;
            }
        }

        show_set(best_ticket);
        if (best_remaining)
            memcpy(work, work+(best_pos+1)*r, sizeof(UINT) * best_remaining);
        r = best_remaining;
    }

    free(work);
}

int main(int argc, char *argv[])
{
    UINT n, k, j, l;

    sscanf(argv[1], "%u", &n);
    sscanf(argv[2], "%u", &k);
    sscanf(argv[3], "%u", &j);
    sscanf(argv[4], "%u", &l);

    drand = fopen("/dev/random", "r");

    solve(n, k, j, l);

    fclose(drand);

    return 0;
}

Comments