Generate short UUID

2020-01-29

We all love UUIDs.

They are great as:

BUT they are pretty LONG. For example, if you want to include an ID in a URL, it's just TOO long:

http://devtoolsdaily.com/examples/graphviz_examples/6572a1e2-6b2a-4588-a97a-0d685bb01d5f

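  • UUIDv4 is the most common version, and it is randomly generated.
  • A UUID is a 128-bit number (in UUIDv4, 122 of those bits are random; the other 6 encode the version and variant).
  • A UUID is usually written as 123e4567-e89b-12d3-a456-426614174000 (32 hexadecimal digits + 4 hyphens).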
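As a quick check of these points, here is a small sketch using Python's standard uuid module on the example ID from the URL above. The variable name u is just an illustrative choice.

```python
from uuid import UUID, RFC_4122

# The example ID from the URL above.
u = UUID('6572a1e2-6b2a-4588-a97a-0d685bb01d5f')

print(u.version)              # 4, i.e. a randomly generated (v4) UUID
print(u.variant == RFC_4122)  # True: the standard variant
print(u.int.bit_length())     # 127 for this ID; never more than 128
```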

So the UUID we usually generate is 36 characters long, and there is plenty of room to compact it if we want to use it in a URL.

We'll use Python to play with UUIDs.

>>> from uuid import uuid4
>>> id = uuid4()
>>> id
UUID('6572a1e2-6b2a-4588-a97a-0d685bb01d5f')

The string representation is 36 characters long.

>>> len(str(id))
36
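The hyphens are only formatting. As a small sketch (the name u is illustrative), the standard library's hex attribute gives the same 32 hexadecimal digits without them, which already saves 4 characters:

```python
from uuid import UUID

u = UUID('6572a1e2-6b2a-4588-a97a-0d685bb01d5f')

print(u.hex)       # 6572a1e26b2a4588a97a0d685bb01d5f
print(len(u.hex))  # 32

# The hyphen-free form parses back to the same UUID.
print(UUID(u.hex) == u)  # True
```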

If the same value is written as a decimal integer, it is even longer.

>>> id.int
134847232822826388955179878208578264415
>>> len(str(id.int))
39

Why is it longer?

Because the traditional representation uses hexadecimal (base 16), while the integer above is written in base 10, so it takes more digits to represent the same number.

So to decrease the length, we need to increase the base. Let's try representing this number in different bases.

Python has no built-in function for converting an integer to an arbitrary base, so here is a snippet from StackOverflow.

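def numberToBase(n, b):
    """Return the digits of non-negative integer n in base b, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n:
        digits.append(int(n % b))  # peel off the least significant digit
        n //= b
    return digits[::-1]  # reverse so the most significant digit comes first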

Let's see how many digits we need with different bases:

>>> len(numberToBase(id.int, 16))
32
>>> len(numberToBase(id.int, 32))
26
>>> len(numberToBase(id.int, 52))
23
>>> len(numberToBase(id.int, 62))
22
>>> len(numberToBase(id.int, 64))
22
>>> len(numberToBase(id.int, 66))
21
>>> len(numberToBase(id.int, 128))
19
>>> len(numberToBase(id.int, 256))
16
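The lengths above are for this particular ID. Since any UUID fits in 128 bits, the worst case per base can be estimated by counting the digits of the largest 128-bit value. The sketch below does that with plain integer arithmetic; digits_needed and MAX_UUID are hypothetical names, not part of any library.

```python
def digits_needed(max_value, base):
    """Count the digits required to write max_value in the given base."""
    count = 1
    while max_value >= base:
        max_value //= base
        count += 1
    return count

MAX_UUID = 2**128 - 1  # largest value a 128-bit UUID can hold

for base in (10, 16, 32, 62, 64, 66, 256):
    print(base, digits_needed(MAX_UUID, base))
```

Under this estimate, base 66 needs up to 22 characters in the worst case, the same as base 64, so the 21 characters measured for this ID depend on its specific value.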

But if we want to use the generated string in a URL, we can only use a subset of characters.

Here is a list of allowed characters from StackOverflow or https://perishablepress.com/stop-using-unsafe-characters-in-urls/

These are a-z, A-Z, 0-9 and _, -, ~, . (66 total); RFC 3986 calls them unreserved characters.

~ and . are less common in IDs, so you can exclude them if you prefer.
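If you do drop ~ and ., the remaining 64 characters are the same set used by the URL-safe Base64 alphabet, so one option is to let the standard library do the encoding. This is a sketch of that alternative, not the approach used in the rest of this article; the function names uuid_to_b64 and b64_to_uuid are hypothetical.

```python
import base64
from uuid import UUID

def uuid_to_b64(u):
    """Encode the 16 raw bytes of a UUID as URL-safe Base64 without padding."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b'=').decode('ascii')

def b64_to_uuid(s):
    """Restore the two stripped padding characters and decode back to a UUID."""
    return UUID(bytes=base64.urlsafe_b64decode(s + '=='))

u = UUID('6572a1e2-6b2a-4588-a97a-0d685bb01d5f')
short = uuid_to_b64(u)
print(len(short))                # 22
print(b64_to_uuid(short) == u)   # True
```

Because the input is always 16 bytes, the result is always exactly 22 characters, which makes the IDs fixed-width.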

With 66 characters, let's define our alphabet and translate our number into it:

>>> urlsafe_66_alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_-.~'
>>> ''.join(urlsafe_66_alphabet[x] for x in numberToBase(id.int, 66))
'ssLIVEdkjwNEjRxFatJ7j'

Now the URL can be a little shorter:

http://devtoolsdaily.com/examples/graphviz_examples/ssLIVEdkjwNEjRxFatJ7j
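To look up the original record from such a URL, the short string has to be turned back into a UUID. Here is a minimal sketch of the round trip, assuming the same 66-character alphabet as above; encode, decode, ALPHABET and BASE are illustrative names.

```python
from uuid import UUID

ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_-.~'
BASE = len(ALPHABET)  # 66

def encode(u):
    """Write a UUID's integer value using the 66-character alphabet."""
    n = u.int
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, r = divmod(n, BASE)
        chars.append(ALPHABET[r])
    return ''.join(reversed(chars))

def decode(s):
    """Rebuild the UUID by accumulating digits, most significant first."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return UUID(int=n)

u = UUID('6572a1e2-6b2a-4588-a97a-0d685bb01d5f')
short = encode(u)
print(short, len(short))
print(decode(short) == u)  # True
```

In this sketch, ALPHABET.index raises ValueError for any character outside the alphabet, and UUID(int=...) rejects values that do not fit in 128 bits, so malformed input fails loudly instead of producing a wrong ID.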


This article was originally published on the DevToolsDaily blog.