Concept question since it's a slow night here.
Quick aside for those not familiar with any numbering system besides base 10, base 2 means every digit in a number can be either 1 or 0, giving two possibilities. Base 4 would allow 3, 2, 1 or 0.
Changing it into base 10 is the sum of number*base^(position-1) where position starts at 1 at the right side of the number and increases as you go to the left, 11 in base 3 would be 1 * 3^0 or 1 + 1 * 3^1 or 3, for a total of 4.
There's not a whole lot of lexicon for what I'm trying to ask, so for some examples, given a base of 4, you can subdivide it into 2 'bits' of base 2, and the max value of that (11 in base 2) is equal to the max value of one digit of base 4 (3)
For terms, I'm going to describe the number of digits in the lower base as the 'count'. For the reduced base, in the above case base 2, I'm going to call the 'subbase' since it's a subdivide of the original base 4.
So for 4, the only subbase available is 2, with a count of 2 bits.
However, if we raise that number, say to a base of 16, we have another interesting development:
You can still use a subbase of 2, and with a count of 4, we end up with the same as the max single digit value of base 16, but we can also use a base of 4, and with a count of 2 (33) we end up again with the max single digit value of base 16.
There's another layer to that as well, a base of (2^2) can be split into a subbase of 2 with two digits, and a base of (2^(2^2)) can be split up into a subbase of 2 with a count of 4, or a subbase of 4 with a count of two. This applies up to 256 with subbase:count of 2:8, 4:4, and 16:2.
And just for good measure, this applies to other bases as well, so 9, base 3^2 can be split into a subbase of 3 with a count of 2.
So what do you do when you have have an almost prime (94 splits into 47 (prime) and 2) and you're trying to find the best ways to split it up into subbases? What can I subbase with to keep most of the potential of base 94, while still subdividing it into more values per digit of the original base.
And no, subbasing with 94 doesn't count.
For why I'm doing this, data stores accept 94 characters without padding (Making them larger than one character), so I'm trying to get the most mileage I can of an effective base 94 save file of 260k bytes. If I can split it into two values, I can save 520k values so long as they do not exceed the new subbase's max value.
The fundamental problem is converting a number from one base to another base (base 94 ? base 2)
A simple representation of a number in an arbitrary base might look like this:
-- represents four thousand three hundred twenty one {base = 10, 1, 2, 3, 4} -- represents fivein base 2 {base = 2, 1, 0, 1}
I'll call numbers of this form a long number. To illustrate, here's a function that turns a "long number" into a regular Lua number:
local function asNumber(d) local sum = 0 for i = #d, 1, -1 do sum = sum * d.base sum = sum + d[i] end return sum end
Unfortunately, if you're planning on using base 94, you probably shouldn't turn the message into an actual Lua number. That's because you'll only get about 53 bits of precision (about 8 base-94 digits), and after than lose all your data.
So we need to be able to convert directly from long number base to another.
The above function gives us a hint how to do it. We need to be able to add a small Lua number to a long number, and we need to be able to scale a long number by a small Lua number. Both aren't very tricky.
local function overflow(num) local i = 1 while num[i] do local overflow = math.floor(num[i] / num.base) if overflow ~= 0 then num[i] = num[i] % num.base num[i + 1] = (num[i + 1] or 0) + overflow end i = i + 1 end end local function multiply(num, constant) -- First, we multiply each digit separately. -- This will calculate the correct number, but -- it will usually need digits that are outside of -- the base. for i = 1, #num do num[i] = num[i] * constant end -- Now, we need to "overflow" the digits that are -- too big. overflow(num) end
The numbers involved here won't get too big (since overflow is divided by base each time, it doesn't build-up in any substantial way. The numbers are only on the order of the scale
)
Addition looks extremely similar, but is overall simpler:
local function add(num, constant) num[1] = num[1] + constant overflow(num) end
Now we just implement asNumber
in terms of these new functions, and call it convertBase
:
local function convertBase(source, toBase) local sum = {base = toBase, 0} for i = #source, 1, -1 do multiply(sum, source.base) add(sum, source[i]) end return sum end
So now if you have a bit-string, you can convert it to base 94 and back. Do be careful that you'll lose "most significant zeros", since we're just treating this as a number. You may want to pre-fix your encoded data with its length in bits, and pad the decoded result with the necessary number of zeroes.