- by Brian Wilson, 8/30/2021. Contact BrianW if you have comments or suggestions.
Goals of This Document:
Specify the UI and some thoughts around implementation of how to support Vanity URLs (also called "custom domains", also called "CNAME" support) in B2.
What is the Feature?
The ability for a B2 customer to have a URL of type #4 below serve the same content as all the other URLs:
1. Friendly URL: https://f003.backblazeb2.com/file/catsanddogs/cutedogs/cute_puppy.jpg
2. S3 URL: https://catsanddogs.s3.eu-central-003.backblazeb2.com/cutedogs/cute_puppy.jpg
3. Native URL (the B2 native download URL, not shown here)
4. Vanity URL: https://sharefiles.kookbeach.com/cutedogs/cute_puppy.jpg
A "vanity URL" is one term for a custom domain. B2 already has functionality to fetch any of the URLs #1 - #3 above, this feature is implementing URL #4.
Isn't this possible already?
Not by using Backblaze B2 alone. It is possible to configure 3rd party systems on top of B2, like putting a CDN such as Cloudflare in front of a B2 bucket, to kind of achieve the same END USER result, but it is difficult, error prone, very few people have ever pulled it off, and it isn't an option for regular people. Configuring a CDN is something only professional IT people do, and then you have to deal with two separate companies, including the bills from two companies.
Is this building a CDN?
No. A CDN is something completely different. In the past we have
used a CDN as a total hack to achieve this for certain large customers, and this
has brain damaged the way people think. This
is just another URL to access the same one file. We already had 3 URLs to
access a file, we want a 4th URL to access a file. It doesn't change the
performance like a CDN does, it doesn't increase or decrease uptime like a CDN,
it doesn't do "edge caching" (geography based caching) like a CDN does, it
literally has nothing at all to do with a CDN. Nothing.
Is this building a Hosting Feature?
No. Hosting is something completely different. This is just another
URL to access the same one file. This has zero to do with hosting.
This is not hosting any more than the S3 URL is "hosting" or the friendly URL is
"hosting" or the Native URL is "hosting". Why would you think serving up
another URL is "hosting"? I feel you aren't thinking clearly.
Why Do This Feature?
There are a couple of reasons to do this feature:
Make more money.
Customers have requested it.
Example A:
https://www.reddit.com/r/backblaze/comments/p5eeue/update_on_vanity_subdomain/
Example B:
https://www.reddit.com/r/backblaze/comments/s80dx5/presigned_urls_not_working_with_cloudflare_cname/
Example C: For reputation purposes: when f001.backblazeb2.com
is on a blacklist.
Search in Backblaze Slack for matthew@alderongames.com or B2 account id 7445d3b1f7c9; ZenDesk: https://backblaze.zendesk.com/agent/tickets/744630
Example D: in a note to BrianW on reddit on 1/23/2022, a
customer writes:
"I found this post here
https://www.reddit.com/r/backblaze/comments/p5eeue/update_on_vanity_subdomain/
& also read your proposal you wrote here
https://www.ski-epic.com/2021_supporting_vanity_urls_in_b2/index.html
& I was wondering if there are any updates on this? Currently trying to
map
the S3 domain to a own subdomain
but with no luck." -- reddit identity ending in "rfive".
Make it really easy to do this for customers.
Customers don't like hard-to-use products. Customers like easy-to-use products.
Support CloudFlare customers better and more easily.
Evidence from customer comment here:
https://www.reddit.com/r/backblaze/comments/p5eeue/update_on_vanity_subdomain/hb4g468/
"I would also note that this feature would make CDNs much easier to use. For
example with Cloudflare, instead of all the frankly weird page rule stuff
discussed at
https://help.backblaze.com/hc/en-us/articles/217666928-Using-Backblaze-B2-with-the-Cloudflare-CDN
, you simply point photos.example.com to the B2 bucket hostname, and it just
works (plus your bucket name is properly hidden behind Cloudflare, instead
of leaked as part of the semi-vanity URL)."
What is This Feature NOT DOING?
This feature WILL NOT implement hosting, and WILL NOT implement a CDN.
That is not a goal. This is just one more URL to a file, that's all.
How A Customer With Their Own DNS Configures This:
BrianW uses "hover" as the domain registrar for kookbeach.com (kookbeach.com is a vanity domain, for a vanity URL). So here are the steps after logging into hover.com:
PLEASE NOTE: the name of the B2 bucket I'm configuring with a Vanity URL is "catsanddogs", and the vanity URL is https://sharefiles.kookbeach.com
Click "My Account" in upper right, and select "Control
Panel" -> a list of "domains" will be shown
Click on "kookbeach.com" on the left.
You should be on the "Overview" tab. Click "Edit" by
Nameservers and make sure they are pointed at the hover DNS servers
ns1.hover.com and ns2.hover.com
Click the "DNS" tab.
MAYBE NOT NECESSARY -> Add an "A Record" pointing at a random IP address like 13.225.51.54 (that's where ski-epic.com resolves to)
Add a "CNAME Record" to point sharefiles.kookbeach.com to
catsanddogs.s3.eu-central-003.backblazeb2.com
(NOTE: this may work fine, but to generate the certificate it may be
required that this is an A record. This needs to be tested.)
Now the DNS tab should look like this:
Ok, so at this point you are all finished configuring your DNS. Try "ping sharefiles.kookbeach.com" (it might take a few hours for this to work because DNS propagation is slow) and it should say:
% ping sharefiles.kookbeach.com    <-- type this in a command prompt on Windows
Pinging catsanddogs.s3.eu-central-003.backblazeb2.com [45.11.37.254] with 32 bytes of data:
Reply from 45.11.37.254: bytes=32 time=122ms TTL=45
Reply from 45.11.37.254: bytes=32 time=122ms TTL=45
Reply from 45.11.37.254: bytes=32 time=122ms TTL=45
And also, if you try "ping catsanddogs.s3.eu-central-003.backblazeb2.com" it
says the same IP address:
% ping catsanddogs.s3.eu-central-003.backblazeb2.com    <-- type this in a command prompt on Windows
Pinging s3.eu-central-003.backblazeb2.com [45.11.37.254] with 32 bytes of data:
Reply from 45.11.37.254: bytes=32 time=384ms TTL=45
Reply from 45.11.37.254: bytes=32 time=348ms TTL=45
Reply from 45.11.37.254: bytes=32 time=421ms TTL=45
That's it! Those are all the changes required OUTSIDE of Backblaze.
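If you'd rather script that check than eyeball ping output, here is a tiny sketch (just a convenience, not part of the feature) that resolves both example hostnames with the standard JDK; once DNS has propagated, both should print the same address(es):

import java.net.InetAddress;
import java.util.Arrays;

public class CheckVanityDns {
    public static void main(String[] args) throws Exception {
        // If the CNAME is configured (and propagated), both names should
        // resolve to the same set of IP addresses.
        for (String host : new String[] {
                "sharefiles.kookbeach.com",
                "catsanddogs.s3.eu-central-003.backblazeb2.com" }) {
            System.out.println(host + " -> "
                    + Arrays.toString(InetAddress.getAllByName(host)));
        }
    }
}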
GUI Changes to Backblaze Web Interface:
We need the ability to specify that this bucket accepts a particular CNAME.
Here is what that would look like:
Now, when the customer pops up the "Details" information for any file in that bucket, there is one additional URL added. See below:
That is it for GUI changes.
Store the mapping of "acceptCNAME" to "S3_Domain" in cluster specific Cassandra in a table called "VanityUrlMappingTable":
We need to store the acceptCNAME mapping in two places, and we need to keep them
in sync:
In the BucketInfo. This code is already completely finished; buckets can store and retrieve their BucketInfo structure, which is JSON.
We need to maintain a mapping of ALL of the vanity URLs to bucket S3 URLs and bucketIds in one table in the cluster specific Cassandra. The table might look like this:
VanityUrlMappingTable: (found in cluster specific Cassandra, so these are all in the "eu-central")
acceptCNAME                    | S3_Domain                                     | bucketId
sharefiles.kookbeach.com       | catsanddogs.s3.eu-central-003.backblazeb2.com | 5c5a30101287102070b00013
downloads.larryphotography.com | larrybucket.s3.eu-central-003.backblazeb2.com | 2070b000135c5a3010128710
tom-jones-meatpackers.com      | porkloin.s3.eu-central-003.backblazeb2.com    | 0002070b135c5a3028710101
The idea is that various code on the system (like the API servers) need to look up one of the "acceptCNAME" and quickly get returned the "S3_Domain" that is associated with it. THIS TABLE WILL NOT BE VERY LARGE, it is only expected to be 100 vanity URLs (100 rows in the table called VanityUrlMappingTable) per cluster in the first year.
To keep #1 and #2 in sync, whenever any customer changes their BucketInfo, and that BucketInfo contains an "acceptCNAME", it is also set in this VanityUrlMappingTable. The change to BucketInfo is rejected if it fails to be set in VanityUrlMappingTable for whatever reason.
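A minimal sketch of that sync rule, assuming the DataStax Java driver for the Cassandra write; updateBucketInfo() is a hypothetical stand-in for the existing (already finished) BucketInfo code:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class VanityUrlSync {
    private final CqlSession session; // cluster specific Cassandra

    public VanityUrlSync(CqlSession session) { this.session = session; }

    // Writes the mapping row FIRST, and rejects the whole BucketInfo change
    // (returns false) if that write fails for whatever reason.
    public boolean setAcceptCname(String acceptCname, String s3Domain,
                                  String bucketId, String newBucketInfoJson) {
        try {
            session.execute(SimpleStatement.newInstance(
                "INSERT INTO VanityUrlMappingTable (acceptCNAME, S3_Domain, bucketId)"
                    + " VALUES (?, ?, ?)",
                acceptCname, s3Domain, bucketId));
        } catch (RuntimeException e) {
            return false; // mapping table write failed: reject the change
        }
        return updateBucketInfo(bucketId, newBucketInfoJson);
    }

    // Hypothetical stand-in; the real BucketInfo JSON storage already exists.
    private boolean updateBucketInfo(String bucketId, String json) {
        return true;
    }
}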
IMPLEMENTATION CHOICE: one idea is that we do NOT create a new "custom" table inside of Cassandra like the above. Instead, like we sometimes do for the server side implementation of B1 products, we could simply create a brand new private B2 bucket in our reserved namespace called "bzvanityurlmapingtable". Inside that bucket would be files with the following names, representing the above table:
https://f003.backblazeb2.com/file/bzvanityurlmapingtable/mappings/sharefiles.kookbeach.com/myS3domain.txt
(this file would contain the string
"catsanddogs.s3.eu-central-003.backblazeb2.com")
https://f003.backblazeb2.com/file/bzvanityurlmapingtable/mappings/sharefiles.kookbeach.com/myBucketId.txt
(this file would contain the string "5c5a30101287102070b00013")
https://f003.backblazeb2.com/file/bzvanityurlmapingtable/mappings/downloads.larryphotography.com/myS3domain.txt
(this file would contain the string
"larrybucket.s3.eu-central-003.backblazeb2.com")
https://f003.backblazeb2.com/file/bzvanityurlmapingtable/mappings/downloads.larryphotography.com/myBucketId.txt
(this file would contain the string "2070b000135c5a3010128710")
https://f003.backblazeb2.com/file/bzvanityurlmapingtable/mappings/tom-jones-meatpackers.com/myS3domain.txt
(this file would contain the string
"porkloin.s3.eu-central-003.backblazeb2.com")
https://f003.backblazeb2.com/file/bzvanityurlmapingtable/mappings/tom-jones-meatpackers.com/myBucketId.txt
(this file would contain the string "0002070b135c5a3028710101")
In this fashion, we would have the fast lookups we need, without a "custom" Cassandra table, stored INSIDE of B2.
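With that choice, a lookup is just an HTTP GET of a small file. A minimal sketch with the JDK's built-in HttpClient (Java 11+); note the real bucket would be private, so the real code would also attach an authorization token, which this sketch omits:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class VanityUrlLookup {
    // Fetches the S3 domain for a vanity hostname, or null if not mapped.
    static String lookupS3Domain(String vanityHost) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(URI.create(
            "https://f003.backblazeb2.com/file/bzvanityurlmapingtable/mappings/"
            + vanityHost + "/myS3domain.txt")).build();
        HttpResponse<String> resp =
            client.send(req, HttpResponse.BodyHandlers.ofString());
        return resp.statusCode() == 200 ? resp.body().trim() : null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lookupS3Domain("sharefiles.kookbeach.com"));
        // expected: catsanddogs.s3.eu-central-003.backblazeb2.com
    }
}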
What The Underlying Functionality Does:
There are three things that have to be implemented.
1. The API/Download Servers need to accept requests to https://sharefiles.kookbeach.com, and this functionality is called the "Host header" (defined in RFC 7230, section 5.4). Right now the API/Download servers at https://f003.backblazeb2.com are expecting a Host header of "Host=catsanddogs.s3.eu-central-003.backblazeb2.com:443" and reject anything that doesn't look like that. This code needs to be modified to accept requests if the "acceptCNAME" is set on the bucket being accessed (a rough sketch of the new check appears after this list).
It might also be interesting to look into "Server Name Indication" (SNI) but it may not be necessary. Here are some places in our current Java code to look:
- bzmono/www/java/src/com/backblaze/modules/s3Compat/b2_api/guts/S3HostnameParts.java - object "S3HostnameParts"
- bzmono/www/java/src/com/backblaze/modules/s3Compat/b2_api/servlet/S3CompatDispatcher.java - rejects HTTPS request if "Host Header" is not correct.
2. When a request comes in for https://sharefiles.kookbeach.com/cutedogs/cute_puppy.jpg then the API/Download server needs to serve up the same thing it serves when it gets the request for https://catsanddogs.s3.eu-central-003.backblazeb2.com/cutedogs/cute_puppy.jpg
3. Backblaze needs an HTTPS certificate for https://sharefiles.kookbeach.com through the "HTTP-01 ACME challenge".
Of these three steps, I believe #1 and #2 are fairly
straightforward. I also think we can build #1 and #2 to get that part
working, then tackle #3 once everything else is working correctly. So the
section below here is dedicated to how to do #3...
How to implement #3 above:
There is a beauty to getting a LetsEncrypt certificate from the "HTTP-01
ACME challenge". Here is a quote from
https://letsencrypt.org/how-it-works/:
"The objective of Let’s Encrypt and the ACME protocol is to make it possible to set up an HTTPS server and have it automatically obtain a browser-trusted certificate, without any human intervention. This is accomplished by running a certificate management agent on the web server."
Quick explanation of the HTTP-01 ACME challenge: to prove the Backblaze API/Download servers have the "rights" to serve up content over SSL/HTTPS, the API/Download servers contact the LetsEncrypt servers with a request for an HTTPS certificate for sharefiles.kookbeach.com, and the LetsEncrypt servers respond with the name of a file to write out to a certain location, with certain signed contents:
After the Backblaze Download/API servers create that file in the user's bucket and are willing to serve it up on HTTP (notice it is not HTTPS), LetsEncrypt fetches the contents of http://sharefiles.kookbeach.com/.well-known/acme-challenge/8303 (again, not HTTPS yet) and verifies the challenge was completed correctly. That proves the Backblaze API/Download servers are authorized to do certificate management for sharefiles.kookbeach.com, and that's the concept of the HTTP-01 ACME challenge.
So to implement this, here are some things it will involve.
First, we have to turn on HTTP access (no SSL) to the API/Download servers, because there is a chicken-and-egg problem: how do you get an SSL cert if you don't have one to communicate with? When requests come into port 80 (HTTP), the Java code should be carefully written to only accept requests that appear to be LetsEncrypt requests for vanity domain certs. So for example, if "https://sharefiles.kookbeach.com" is the only vanity domain enabled in cluster 003, then the Java code should respond on port 80 only for that vanity domain, and furthermore only this URL should be allowed: http://sharefiles.kookbeach.com/.well-known/acme-challenge/<TOKEN> on HTTP on port 80. Furthermore, this is ONLY on the API/Download servers in the cluster that this bucket is in.
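A minimal sketch of that port-80 gate, written as a hypothetical servlet filter mapped only onto the HTTP (port 80) connector; isKnownVanityDomain() stands in for the VanityUrlMappingTable lookup:

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Rejects everything on plain HTTP except ACME HTTP-01 challenge fetches for
// hostnames that are registered vanity domains in THIS cluster.
public class AcmeOnlyHttpFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest r = (HttpServletRequest) req;
        boolean allowed = !r.isSecure()                    // plain HTTP only
                && isKnownVanityDomain(r.getServerName())  // e.g. sharefiles.kookbeach.com
                && r.getRequestURI().startsWith("/.well-known/acme-challenge/");
        if (allowed) {
            chain.doFilter(req, resp);
        } else {
            ((HttpServletResponse) resp).sendError(HttpServletResponse.SC_NOT_FOUND);
        }
    }

    // Stand-in for the VanityUrlMappingTable lookup.
    private boolean isKnownVanityDomain(String host) {
        return "sharefiles.kookbeach.com".equals(host);
    }
}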
The API/Download servers all need to have the program "certbot" installed. To install certbot on Debian, an admin types these commands:
# apt-get install software-properties-common
# add-apt-repository ppa:certbot/certbot
# apt-get update
# apt-get install certbot
This should be translated into our Backblaze system of however we get software installed on API/Download systems.
Ok, so we ALSO need an "internal-only accessible Java servlet" written. There are examples of this in the way the cluster authority interacts with Yoda, for example, or the way the cluster authority interacts with the restore servers. These are *NOT* publicly accessible from the internet. Below is the pseudo-code for the Java servlet that fires when the URL https://localhost/write_certbot_verify_into_bucket is fetched:
public static boolean write_certbot_verify_into_bucket(String certbot_token,
        String certbot_validation, String certbot_domain) throws WebApiError {

    // Step 1: Look up the hguid (the bucketId) and S3_Domain from
    // VanityUrlMappingTable, using the passed-in "certbot_domain" as the
    // row to look up. The table is found in the cluster specific Cassandra
    // (see description elsewhere in this document).
    String hguid;
    String theS3domain;
    // ... do the Cassandra lookup here; the values discovered would be:
    //     hguid       = "5c5a30101287102070b00013"
    //     theS3domain = "catsanddogs.s3.eu-central-003.backblazeb2.com"

    // Step 2: Store the certbot_validation string in B2, in a file named
    // after certbot_token, so that it appears at BOTH of these locations
    // (notice the second one is "http" with no SSL):
    //   https://catsanddogs.s3.eu-central-003.backblazeb2.com/.well-known/acme-challenge/<value of certbot_token>
    //   http://sharefiles.kookbeach.com/.well-known/acme-challenge/<value of certbot_token>

    return true;
} /* end of write_certbot_verify_into_bucket */
We also need this shell script (really one bzhelper line plus comments) to exist on all API/Download servers somewhere. This is a "hook" run by certbot; for documentation see: https://eff-certbot.readthedocs.io/en/stable/using.html#third-party-plugins
#!/bin/bash
# Put this script in file /path/to/http/authenticator.sh
#
# This next line is commented out, but it is an example of how to use the
# values normally:
# echo $CERTBOT_VALIDATION > /var/tmp/.well-known/acme-challenge/$CERTBOT_TOKEN
#
# We need to get from this shell script into the Java code on this server,
# so we trigger a URL on our local API server with some arguments to go
# do things for us, like this (quoted so the "&" characters survive):
bzhelper -fetchurltofile "http://localhost/write_certbot_verify_into_bucket?certfile=$CERTBOT_TOKEN&certdomain=$CERTBOT_DOMAIN&certvalidation=$CERTBOT_VALIDATION"
#
# In Java, the "write_certbot_verify_into_bucket" servlet runs on the API
# server, and the API server writes the ACME challenge into the actual
# bucket so that requests to:
#   http://sharefiles.kookbeach.com/.well-known/acme-challenge/$CERTBOT_TOKEN
# from ANY API/Download server can retrieve that file! This is because
# API/Download servers are load balanced and we cannot be sure which one
# will get the verification fetch.
After that is done and the API/Download servers have certbot available to them, when a customer configures the bucket with the property { "acceptCNAME": "sharefiles.kookbeach.com" }, the Java code runs this command to get a valid cert for sharefiles.kookbeach.com:
# certbot certonly --manual --manual-auth-hook /path/to/http/authenticator.sh -d sharefiles.kookbeach.com

ALTERNATIVELY FOR MULTIPLE DOMAINS:
# certbot certonly -d kookbeach.com -d sharefiles.kookbeach.com --manual --preferred-challenges=http
SOURCE: https://ddreset.github.io/softwaredev/2019/07/26/Make-SSL-Certificate-for-AWS-S3-Website-with-Let's-Encrypt-and-Auto-Renew-It.html
That is it, the valid SSL/HTTPS cert will appear in this folder:
# ls /etc/letsencrypt/live/sharefiles.kookbeach.com/
cert.pem  chain.pem  fullchain.pem  privkey.pem  README
Finally, the files "cert.pem", "chain.pem", and "privkey.pem" need to be copied to CATALINA_BASE/conf and the permissions set correctly:
# cd /etc/letsencrypt/live/sharefiles.kookbeach.com
# cp cert.pem /opt/tomcat/conf
# cp chain.pem /opt/tomcat/conf
# cp privkey.pem /opt/tomcat/conf
#
# chown tomcat:tomcat /opt/tomcat/conf/*.pem
Now, because the API/Download servers are a load balanced set of servers, this cert needs to be copied/sent to the other API/Download servers. Chris Bergeron says he would prefer we collect them in a central location for him to deploy. I'm hoping that is a straightforward task the Java programmers at Backblaze can figure out. One idea is to store them in a private B2 bucket somewhere. If we did the "B2 IMPLEMENTATION CHOICE" for the VanityUrlMappingTable then we could store them in the following locations, which WOULD make them available to BOTH Chris Bergeron AND ALSO the other API/Download servers:
https://f003.backblazeb2.com/file/bzvanityurlmapingtable/mappings/sharefiles.kookbeach.com/certs/cert.pem
https://f003.backblazeb2.com/file/bzvanityurlmapingtable/mappings/sharefiles.kookbeach.com/certs/chain.pem
https://f003.backblazeb2.com/file/bzvanityurlmapingtable/mappings/sharefiles.kookbeach.com/certs/privkey.pem
https://f003.backblazeb2.com/file/bzvanityurlmapingtable/mappings/downloads.larryphotography.com/certs/cert.pem
https://f003.backblazeb2.com/file/bzvanityurlmapingtable/mappings/downloads.larryphotography.com/certs/chain.pem
https://f003.backblazeb2.com/file/bzvanityurlmapingtable/mappings/downloads.larryphotography.com/certs/privkey.pem
https://f003.backblazeb2.com/file/bzvanityurlmapingtable/mappings/tom-jones-meatpackers.com/certs/cert.pem
https://f003.backblazeb2.com/file/bzvanityurlmapingtable/mappings/tom-jones-meatpackers.com/certs/chain.pem
https://f003.backblazeb2.com/file/bzvanityurlmapingtable/mappings/tom-jones-meatpackers.com/certs/privkey.pem
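A minimal sketch of each API/Download server pulling those three files down into CATALINA_BASE/conf (again the JDK HttpClient, again omitting the authorization token a private bucket would require; the /opt/tomcat/conf path is the one from the install steps above):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class PullVanityCerts {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String base = "https://f003.backblazeb2.com/file/bzvanityurlmapingtable/"
                + "mappings/sharefiles.kookbeach.com/certs/";
        for (String name : new String[] { "cert.pem", "chain.pem", "privkey.pem" }) {
            HttpRequest req = HttpRequest.newBuilder(URI.create(base + name)).build();
            // Download straight into CATALINA_BASE/conf.
            client.send(req, HttpResponse.BodyHandlers.ofFile(
                    Path.of("/opt/tomcat/conf/" + name)));
        }
    }
}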
If this isn't clear, just ask BrianW to explain. BrianW can also describe several other different mechanisms that would work.
You should also make sure the Tomcat server.xml is set correctly; this is the part of server.xml which looks like this (note: if one Tomcat instance serves several vanity domains, each one probably needs its own SSLHostConfig with a hostName attribute so SNI can pick the right certificate, which needs to be verified):
<Connector port="443" protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="150" SSLEnabled="true">
    <SSLHostConfig>
        <Certificate certificateFile="conf/cert.pem"
                     certificateKeyFile="conf/privkey.pem"
                     certificateChainFile="conf/chain.pem" />
    </SSLHostConfig>
</Connector>
That's it! The next time Tomcat is restarted then this file can be fetched with SSL/HTTPS:
https://sharefiles.kookbeach.com/cutedogs/cute_puppy.jpg
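As a smoke test (a hypothetical convenience, not part of the feature), this sketch fetches that URL and prints the status; if the cert and the Host header handling both work, the TLS handshake succeeds and it prints 200:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class VerifyVanityHttps {
    public static void main(String[] args) throws Exception {
        // The TLS handshake itself exercises the new LetsEncrypt cert.
        HttpResponse<byte[]> resp = HttpClient.newHttpClient().send(
            HttpRequest.newBuilder(URI.create(
                "https://sharefiles.kookbeach.com/cutedogs/cute_puppy.jpg")).build(),
            HttpResponse.BodyHandlers.ofByteArray());
        System.out.println("HTTP status: " + resp.statusCode()
                + ", bytes: " + resp.body().length);
    }
}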
Now, to start with we can wait until the Thursday push to
restart Tomcat, customers can just wait for it. Alternatively, we
could restart Tomcat on the API/Download servers of that particular cluster
once every two hours in a rotating fashion (restart the first API/Download
server's Tomcat, make sure it works and came back, and then restart the next
API/Download server, etc) IF AND ONLY IF one customer on that cluster has
added a vanity domain. This would pick up all the vanity domains that
had been added in that last two hours. I really don't expect customers
to be adding many of these, we're talking about maybe 100 vanity URLs in the
first year at most.
Refreshing the LetsEncrypt Certificates:
The LetsEncrypt certificates expire after 90 days, so they need to be refreshed maybe once every 30 days. To refresh, you just run the same set of commands as the first time (or possibly just "certbot renew", which re-runs the process for any installed certificate that is close to expiring), and then redistribute the resulting .pem files the same way as before.