/**
* Hmmm, not a very interesting problem for me, but I think I tried my best. First of all,
* it's clear that the first subproblem we need to solve is to find the min of two numbers.
* Comparing two ints involves finding the most significant bit in which they differ, and
* pick the num with a 1 on that bit. A bit difficult (without control statements), right?
*
* I may have overcomplicated stuff, but here's how I solved it. Two-complement is the way
* most modern computers store negative numbers. If we have b bits, we can store
* [-2^{b-1}, 2^{b-1}). In memory, non-negative numbers are just stored in their binary
* format. -1, however, is 11...11. Why? Can't it just be 100...001 (using the first bit
* to repr negative)? Indeed we can. But two-complement allows us to do sth really cool!
* 11...11 is chosen to be -1 because it IS -1 modulo 2^b. This means addition involving
* negative numbers can be carried out in the same way as non-negative numbers (where
* overflow is omitted, i.e. addition under modulo), if we know how to find the negation
* of a value! I believe this is the most important understanding of two-complement! The
* remaining stuff can be easily found online.
*
* So, when we want to compare two int, we'll just move them into two separate registers.
* Under two-complement, it's well-known that -x = ~x + 1, so we can do subtraction. Note
* that our values are in [0, 2^b). After subtraction, it should be (-2^b, 2^b), but in
* memory, it's [0, 2^{b+1}). As usual, we can see if the (b+1)-th bit is a 1 to see if a
* num is negative modulo 2^{b+1}. Do a bit shift to filter out that bit and we will have
* compared two integers.
*
* For simplicity and extensibility, I chose to implement swap() for both s=0 and s=1.
* swap() swaps two num iff the num in front is greater than the num at the back. Most
* importantly, the part of comparing and swapping of different pairs can be carried out
* simultaneously. Therefore, we can do divide-and-conquer to finish [S1-4]. In practice,
* there're some more things to consider. First, we can impl a "multiply by 0/1" operation:
* by taking advantage of -1 -> 11...11 and -0 -> 00...0, we can do a AND operation with
* a num and a 0/1's negation. Finally, in order to swap (x,y), I first compute their xor,
* and multiply this by the 0/1 representing whether the element is front is greater, and
* change both x,y with the xor. This technique of using xor is also well-known:
* swap(x,y): x ^= y --> y ^= x --> x ^= y
*
* [After some hints]
* Ahhh, bubble sort can be parallelized in a few ways. For example, we can do these swap
* pairs first: (0,1)(2,3)(4,5).... Then, in another phase, do (1,2)(3,4)(5,6)... Repeat
* for n times.
*
* Find min: 21 * ceil(log_2(n)) + 3 (tight for me, failed [S2])
* Sort: 21 * n (not tight)
* Implementation 1 (Full solution except [S2]; bit operations, maths, two-complement)
*/
#include <bits/stdc++.h>
#include "registers.h"
typedef std::vector<int> vec;
const int b = 2000;
// Do a min-swap with 21 steps on all the following pairs:
// {first_idx[0], first_idx[0] + diff}, {first_idx[1], first_idx[1] + diff}, ...
void swap(const vec& first_idx, int diff, int k) {
// Setup the masks
std::vector<bool> mask(b, 0), mask2(b, 0), ones(b, 0);
for (int f : first_idx) {
ones[f * k] = 1;
for (int i = f * k; i < (f + 1) * k; i++)
mask[i] = mask2[i] = 1;
mask2[(f + 1) * k] = 1;
}
append_store(3, mask);
append_store(4, ones);
append_store(5, mask2);
// Move the first elements to register 1, the second elements to register 2.
append_move(1, 0);
append_right(2, 0, diff * k);
append_and(1, 1, 3);
append_and(2, 2, 3);
// append_print(1);
// append_print(2);
if (k >= 2) {
// Subtraction with two-complement
append_not(6, 1); // flip
append_and(6, 6, 5);
append_add(6, 6, 2);
append_add(6, 6, 4); // +1
// append_print(6);
// Extract sign
append_right(7, 6, k);
append_and(7, 7, 4);
append_not(7, 7);
append_and(7, 7, 5);
append_add(7, 7, 4);
// append_print(7);
} else { // k=1 is a special case
append_not(7, 2);
append_and(7, 7, 1);
}
// Compute xor if a swap is needed
append_xor(8, 1, 2);
append_and(8, 8, 7);
// append_print(8);
// Swap by xor
append_left(9, 8, diff * k);
append_or(9, 9, 8);
append_xor(0, 0, 9);
}
void construct_instructions(int s, int n, int k, int q) {
if (s == 0) {
// Extend the array with INF s.t. its len is a power of 2
int two_pow = 1;
while (two_pow < n)
two_pow *= 2;
if (two_pow > n) {
append_not(1, 1); // register 1 is initially 0
append_left(1, 1, n * k);
append_or(0, 0, 1);
}
n = two_pow;
// Divide-and-conquer to find min
for (int step = 2; step <= n; step *= 2) {
vec indices;
for (int a = 0; a < n; a += step)
indices.push_back(a);
swap(indices, step / 2, k);
}
} else {
for (int r = 0; r < n; r++) {
vec indices;
for (int i = 0; i < n - 1; i++) {
if (i % 2 == r % 2)
indices.push_back(i);
}
swap(indices, 1, k);
}
}
}
# |
Verdict |
Execution time |
Memory |
Grader output |
1 |
Correct |
0 ms |
212 KB |
Output is correct |
# |
Verdict |
Execution time |
Memory |
Grader output |
1 |
Incorrect |
0 ms |
212 KB |
Wrong answer detected in grader |
2 |
Halted |
0 ms |
0 KB |
- |
# |
Verdict |
Execution time |
Memory |
Grader output |
1 |
Correct |
0 ms |
212 KB |
Output is correct |
2 |
Correct |
0 ms |
340 KB |
Output is correct |
3 |
Correct |
1 ms |
340 KB |
Output is correct |
4 |
Correct |
1 ms |
340 KB |
Output is correct |
# |
Verdict |
Execution time |
Memory |
Grader output |
1 |
Correct |
1 ms |
340 KB |
Output is correct |
2 |
Correct |
1 ms |
340 KB |
Output is correct |
3 |
Correct |
1 ms |
340 KB |
Output is correct |
4 |
Correct |
1 ms |
340 KB |
Output is correct |
5 |
Correct |
1 ms |
340 KB |
Output is correct |
6 |
Correct |
0 ms |
340 KB |
Output is correct |
# |
Verdict |
Execution time |
Memory |
Grader output |
1 |
Correct |
1 ms |
340 KB |
Output is correct |
2 |
Correct |
1 ms |
340 KB |
Output is correct |
# |
Verdict |
Execution time |
Memory |
Grader output |
1 |
Correct |
1 ms |
340 KB |
Output is correct |
2 |
Correct |
1 ms |
340 KB |
Output is correct |
3 |
Correct |
3 ms |
1220 KB |
Output is correct |
4 |
Correct |
4 ms |
980 KB |
Output is correct |
5 |
Correct |
2 ms |
852 KB |
Output is correct |
6 |
Correct |
2 ms |
852 KB |
Output is correct |
7 |
Correct |
2 ms |
852 KB |
Output is correct |