2020 Backblaze Client
Architecture
(written 8/16/2020) by Brian Wilson
Backblaze Personal
Backup (internally to Backblaze it has the nickname "B1") is software that
runs on a customer's laptop or desktop and pushes a copy of all their files to
the Backblaze datacenter in Sacramento, California or the Backblaze datacenter
in Amsterdam, Netherlands. Backblaze charges a fixed price of $6/month per
laptop for this service. The flat fee includes any external drives as long
as they are physically connected to the laptop or desktop. This document
describes in detail the architecture of the Backblaze client that runs on the
customer's laptop or desktop computer. This includes what executables
exist on disk and what they do, the data structures Backblaze uses, and the flow
of how the backups work.
Terminology:
- "Laptop" - this document uses the term "laptop" to mean "customer
laptop or desktop", because it is clumsy to mention "customer laptop or
desktop" in every sentence, and more than half the computers that Backblaze
backs up are laptops nowadays. A "laptop" is always a customer's
computer. Using the term "laptop" is also helpful to clarify that it
is a customer's computer, because "datacenters" (see below) run by Backblaze
never have any laptops in them.
- "Datacenter" or "Backblaze datacenter" - the physical location
and the "servers" (see below) that Backblaze runs to store the backups for
retrieval later. These datacenters are in Sacramento, California; Phoenix,
Arizona; and Amsterdam, Netherlands.
- "Region" or "Backblaze region" - Backblaze supports
storing your backup in a selected region. When this document was
written, there were two regions: "US West" (Sacramento) and "EU Central"
(Amsterdam). Notice that there might be more than one "datacenter" in
a single region - that is because when Backblaze (the company) runs out of
space in one datacenter, Backblaze (the company) has to go find more space
in an additional datacenter that is "very near the other datacenters in that
region".
- "Server" or "Backblaze Servers" - the computers that run in
the Backblaze datacenters. The "client" (see below) communicates to the
servers.
- "Client" or "Backblaze Client" - the collection of executable
programs that make up what runs on a customer laptop. This collection of
executables includes a GUI (graphical interface) in one executable, and separate
from that are a collection of executables that send the files found on the
laptop to the Backblaze servers over the HTTPS protocol.
- "SSD" - Solid State Drive - for all intents and purposes you can
substitute "Hard Drive" wherever you see "SSD" if appropriate for your
computer. Most modern laptops only use SSDs; hard drives are
disappearing from the world other than in datacenters. But if you own
a laptop that is 10 years old it might not contain an SSD; it might have an
old-fashioned, very slow hard drive instead.
- "Account" or "Backblaze account" or "Backblaze Web Account"
- customers create an account on the Backblaze website, and their account is
where their data can be retrieved later, and where the customers specify payment
for the service with a Credit Card. Each Backblaze account has a unique 12
digit hexadecimal id such as "fd123d6faf4a" that is assigned at account creation
time and never changes. However, every account has exactly 1 customer email
address associated with it, and that email address is the "username" that the
customer uses to sign in, so we often say "Backblaze accounts are defined by an
email address". Email addresses for accounts are absolutely globally
unique, we never allow a second customer to create an account with an email
address that is currently "in use" on any other Backblaze account anywhere in
the world. Customers sign into their account at:
https://secure.backblaze.com/user_signin.htm
- "Host" or "Computer" - unfortunately these terms are used
interchangeably for the customer's laptop and/or the backup of that laptop.
The regrettable term "host" was a term we used early in Backblaze's history
for the customer's laptop, so it is hard to get rid of from our terminology.
- "HGUID" or "hguid" - stands for "Host Globally Unique
Identifier" - this is a 24 digit hexadecimal number that describes one
"laptop backup". An example is "bf654d292a61ca0e193e908f".
Hguids are globally unique, no two backups will ever have the same hguid.
One customer account (one email address) can have multiple backups inside
it. For example, if one customer with email address "joe@corp.com"
owns a Macintosh laptop running Backblaze and a Windows laptop running
Backblaze, that customer's one Backblaze account will have two separate
hguids inside of it - they define two different backups. When
preparing a restore, the customer would sign into their one Backblaze
account, and choose which of the two backups to restore from.
- "volume" or "attached volume" or "partition" - A
Macintosh customer laptop's SSDs or hard drives are organized by what are
called "volumes" on the Macintosh. Usually there is 1 volume per SSD
or hard drive, but technically advanced customers sometimes create two or
more "volumes" on one physical device (one SSD or hard drive) to make it
appear as if two separate SSDs exist to the computer instead of just 1 SSD.
On Windows and Linux it is often called a "partition" but it is the
IDENTICAL concept. In this document I use the word "volume" or
"partition" interchangeably, and you have to substitute the appropriate word
for the platform you are working with at the time. Also, it is so
common that there is 1 volume per SSD, that when I write "SSD" it means a
logical volume. As far as the Operating System is concerned, two
volumes on one SSD are simply two different SSDs. On Windows each
partition gets a separate drive letter, so if a customer has three
partitions it might appear as if they have 3 physical SSDs attached to the
computer named "C:\", "E:\" and "G:\" or something like that, when in
reality it is one SSD. Backblaze would treat that situation as if
there were 3 entirely separate physical SSDs with one partition each.
Backblaze only cares about volumes (partitions), in reality Backblaze does
not care about the number of different physical SSDs attached, Backblaze
deals at the "logical level" of volumes and partitions.
- "Volume Guid" or "vguid" or "BzVolumeGuid" - a 28
character globally unique identifier of one customer's laptop's volumes, and
it ALWAYS starts with the letter "v". Here is an example:
v000c0101f6fb58de90a713a0e19 The laptop's "boot volume" (also
known as the "system volume") ALWAYS starts with the characters "v000", and
then subsequent volumes start with "v001" and "v002", etc to make it easy to
get a little information quickly (like which is the boot volume).
- "Java Time" or "Milliseconds Since 1970" - The Backblaze
client has to communicate with the Backblaze datacenters, and most of the
code in the Backblaze datacenters is written in the programming language
"Java". Java has a pretty standard measure of time which is the number
of milliseconds since 1970. The Backblaze client uses this measure of
time for everything (on both Macintosh and Windows) such as the "time a file
was last modified". This usually looks like a string of 16
hexadecimal digits such as "00000173f855ec2b". You can find web pages
all over the internet which will convert that to a human readable date and
time for you by copying and pasting the "Java Time" into the web page.
- "Utf8" or "Utf-8" or "Unicode" - Filenames on all
modern computers (Windows, Macintosh, Linux, iOS, Android) are encoded in
"Unicode", and all modern web browsers use Unicode. All that means is
you can type a single string with letters from different languages, like
this: "Hello, 여보세요, こんにちは, 你好". The document you are reading is in
Unicode. The Backblaze client always uses "Utf-8" encoding for
everything, which is one of the most standard forms of Unicode. For
most English speakers this just looks like (and acts like) regular old text
and most people reading this document don't need to worry about what it
means to be in "Unicode" or "Utf-8", so if this doesn't make sense to you,
just ignore it. This is a technical point for people who speak other
languages, and their filenames are in other languages (which Backblaze
profoundly supports).
- "bz_done" or "bz_done files" - These are a record stored
on the customer's laptop of what has been backed up (what has been "done"
already). These files are the most important data structure on the
customer's laptop and must never be edited or deleted by the customer, or
the backup will be hopelessly corrupted. On Windows you can find these
files in C:\ProgramData\Backblaze\bzdata\bzbackup\bzdatacenter\ and on
Macintosh in /Library/Backblaze.bzpkg/bzdata/bzbackup/bzdatacenter/ and you
can think of it as one bz_done file is started every 3 days, with names like
this: "bz_done_20200813_0.dat". The date in the filename is when it
was started, and it rolls to the next "date" after 3 or 4 days.
AGAIN, DO NOT CHANGE ONE SINGLE CHARACTER IN ANY
bz_done FILE, IT WILL CORRUPT THE BACKUP. You can
safely make a copy of one of these files into some other folder like
C:\temp\ and then open the copy safely in WordPad on Windows, or TextEdit on
the Mac. The "bz_done" files are an APPEND ONLY format, by definition
they can only grow, because you append new information to the end of them.
They are a complete record of everything that has been sent to the Backblaze
datacenter. The "bz_done" files are periodically encrypted and then
sent to Backblaze's datacenter, and when a customer signs into the website
and goes to "View/Restore" files, the "tree of files" that is seen on the
"View/Restore" web page is literally created by reading through the bz_done
file that was sent earlier. You can watch a 57 minute tutorial on how
to understand the internals of bz_done files here:
https://www.youtube.com/watch?v=MOlz36nLbwA You can view a slide
used in that presentation here that documents what every column does by
clicking this
link.
- "bz_comb" or "bzcomb files" - When the Backblaze client
sends the bz_done files to the Backblaze datacenter, it first reads them
from the local laptop SSD into laptop RAM, then compresses the bz_done file,
then encrypts it the same way the Backblaze client encrypts all user files
before transmitting them, and sends the encrypted blob of data up through
HTTPS (double encryption) to Backblaze. These encrypted blobs that
contain bz_done files are named "bz_comb" files inside the Backblaze
datacenter. Customers will never hear this phrase; the term is mostly
used internally at Backblaze, and it's just important to know it is
identical in every way to a bz_done file, but compressed and encrypted.
The name "bz_comb" is very unfortunate; it means "combined file", and is a
historical term. Inside of the bz_comb format, the file NAMES of
customer files are encrypted for absolute privacy, but a small amount of the
other information contained in bz_done files, which is used for various
cleanup processes by the Backblaze datacenter, is not. The two separate
files are "combined" into one file by appending them end on end, simply to
increase upload speed over transmitting them separately as two separate
HTTPS requests.
- "Inherit Backup State" - The "backup state" is primarily the "bz_done"
files on the customer's laptop, plus 3 or 4 other settings files. It
is a list of what has already been backed up. "Inherit Backup State"
is a feature in the Backblaze client where a customer can purchase a new
laptop, and avoid re-uploading all of their data files from scratch to the
Backblaze datacenter. The customer avoids the re-upload by installing
a Backblaze trial, then the "Inherit" feature downloads the copy of the
bz_done files that are stored in the Backblaze datacenter to their current
laptop. After that, if the customer made any local changes to their
directory structure or added new files, the normal backup processes detect
those changes within a few hours and happily continue forward doing
incremental backups to incorporate those new changes.
- "log files" or "customer logs" - Friendly, easy to read
log files created on the customer laptop that are safe to browse (or even
change); they are absolutely not used by the backup program in any way other
than informational. These files are found in this folder on the
customer laptop: C:\ProgramData\Backblaze\bzdata\bzlogs\ on Windows,
and on Macintosh in /Library/Backblaze.bzpkg/bzdata/bzlogs/ Inside that folder
is a subfolder named after the executable that created the log file, like
"bztransmit" (see below this terminology section for a list of executables),
and inside that subfolder there is one log file for each day the client
performs a backup. For instance, on Windows
there might be a log file named C:\ProgramData\Backblaze\bzdata\bzlogs\bztransmit\bztransmit16.log
which contains all the logs bztransmit created on the 16th day of the month.
The log files are compressed after 2 days to save customer disk space, and
the log files are deleted after 26 days to not grow forever on the customer
laptop. You can open the log files with WordPad on Windows, and
TextEdit on the Mac - make the window very very wide and turn off line
wrapping to make the logs format better and be more readable. If a
customer has an issue, I usually start debugging that issue by opening the
bztransmit log files and searching for the word "ERROR" all in capitals.
Just because a line says "ERROR" doesn't mean it is the customer's issue - for
example if in the middle of an HTTPS transmit of one customer data file to
the Backblaze datacenter the customer WiFi is turned off, this will say
"ERROR" in the log file. But if there are HUNDREDS of the same ERROR
in the log files it usually points to the problem.
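The debugging habit described in the "log files" entry above (search for "ERROR", then look for the same error repeated many times) could be sketched roughly as follows. This is only an illustration: the function name count_repeated_errors and the sample log lines are invented here, and the real log line layout is not documented in this section, so the sketch assumes nothing more than "a line contains the word ERROR followed by a message".

```python
from collections import Counter
from typing import Iterable


def count_repeated_errors(lines: Iterable[str], marker: str = "ERROR") -> Counter:
    """Count how often each distinct ERROR message appears.

    Assumption: whatever comes before the marker (timestamps and so on)
    differs from line to line, so only the text from the marker onward
    is compared.
    """
    counts: Counter = Counter()
    for line in lines:
        position = line.find(marker)
        if position >= 0:
            counts[line[position:].strip()] += 1
    return counts


# Hypothetical log lines, invented for illustration only.
sample = [
    "20200816173407 - ERROR network connection lost",
    "20200816173409 - uploaded one file",
    "20200816173512 - ERROR network connection lost",
    "20200816173600 - ERROR could not open file",
]

# Most frequent message first: a message that repeats far more often
# than the others is the one worth investigating.
for message, count in count_repeated_errors(sample).most_common():
    print(count, message)
```

To run it against a real log, pass an open file object (opened from a COPY of the log, if you want to be careful) instead of the sample list.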
Overview of Executables that Make Up the Backblaze Client:
The "Backblaze client" is actually a collection of the 8 executables listed
below. Each executable runs at different times and for entirely different
reasons.
- bzserv - the core Backblaze service that MUST run all the
time for any backup to occur. It launches other executables (see
below) to perform the actual backup. bzserv has no UI components, and
is therefore largely cross platform between Windows and Macintosh.
Location on Disk Windows:
C:\Program Files (x86)\Backblaze\bzserv.exe
Location on Disk Macintosh:
/Library/Backblaze.bzpkg/bzserv
Purpose of bzserv: The process "bzserv" runs all the time as a
"service" on the laptop, in order to launch the OTHER executables at the
correct times. The other executables (see below) perform the actual
backup; bzserv can be thought of as "a scheduler" whose primary job is to
NOT take any CPU and NOT take any RAM and NOT take any SSD
performance, but to keep running at all costs. bzserv is required to
run as a service all the time the laptop is running (not shut down), even
when the customer is logged out of their laptop, even if the customer is
using the backup schedule "Only When I Click <Backup Now>" or "Once Per
Day". Any customer who shuts down bzserv is provably insane, doesn't
know what they are doing, and is not qualified to operate a computer because
bzserv takes no computer resources, period. The only valid way of
stopping bzserv from running is to entirely uninstall the Backblaze client.
If bzserv is not running, that is what is causing ALL of the customer's
problems, and it is the number one problem to solve, the first problem to
solve, and the ONLY problem to solve. bzserv runs as the user "SYSTEM"
on Windows, and as the user "root" on Macintosh.
Resource Load bzserv puts on customer laptop: 0.00000001% of one core of
CPU (bzserv is one single thread), and 0.000001% extra load on the SSD, and
about 3 MBytes of RAM (0.0375% of an 8 GByte RAM computer - far far less
than 1% of the customer RAM - 3.7 hundredths of 1% of the customer RAM).
Click here for a deeper description and analysis of
what bzserv does.
- bzfilelist - walks the entire file system on the SSD on the
customer's laptop looking for new and changed files to backup.
bzfilelist is always launched by bzserv. bzfilelist creates lists of
files, but absolutely does not transmit them anywhere. bzfilelist
completely lacks the ability to do network HTTPS communication, it
profoundly cannot do anything but create lists of files for other
executables to consume. bzfilelist has no UI components, and is
therefore largely cross platform between Windows and Macintosh.
bzfilelist runs as the user "SYSTEM" on Windows, and as the user "root" on
Macintosh because it is always launched by parent process "bzserv" (see
above).
Location on Disk Windows: C:\Program Files
(x86)\Backblaze\bzfilelist.exe
Location on Disk Macintosh: /Library/Backblaze.bzpkg/bzfilelist
Purpose of bzfilelist: The primary purpose of "bzfilelist" is to
create the complete list of filenames with associated modification dates for
each attached SSD or hard drive. Each SSD's (each "volume's") list of files is stored in
a separate file. These lists of filenames with their last modification
date are found at C:\ProgramData\Backblaze\bzdata\bzfilelists\ on Windows,
and /Library/Backblaze.bzpkg/bzdata/bzfilelists/ on the Macintosh. The
name of the list of files starts with the BzVolumeGuid. For the
primary boot (system) volume that BzVolumeGuid begins with letters "v000",
then subsequent drives start with "v001" and then "v002" and so on.
Here is an example of the list's filename from Windows: C:\ProgramData\Backblaze\bzdata\bzfilelists\v000c0101f6fb58de90a713a0e19_c____filelist.dat
and you can open that with WordPad on Windows, or TextEdit on the Macintosh.
The name "v000c0101f6fb58de90a713a0e19_c____filelist.dat" always starts with
the "volume guid", then has an underbar, then a friendly description of the
drive (in my example above this is "_c____" to indicate this is the "C:\"
Windows drive; on Macintosh the system boot drive would have the string
"_root_"), then always ends with "filelist.dat". These "per drive
lists of files" are produced approximately once per hour, but it might be
once every two hours, or even longer for customers with extremely large
volumes. There is a guarantee that the list of files with the name
above is ALWAYS VALID and ALWAYS PRESENT for other programs to read and use,
but it might be 1 or 2 hours "out of date" waiting for the next list of
files to be produced. If a new list of files is being produced by
bzfilelist the new INCOMPLETE list of files has the same name, but at the
end of it is appended "_future".
Inside of one of these lists named
things like "v000c0101f6fb58de90a713a0e19_c____filelist.dat" the very first
line inside that file is when that list of files was created, it looks like:
# GmtMillisThisListWasStarted: 00000173f855ec2b, GmtDateTime: 20200816173407
Those are actually the identical date and time, the first one is the number
of milliseconds since 1970, and the second one is human readable and says it
is year "2020", month "08", day "16", then hours, minutes, and seconds.
After that first line, the rest of the contents are pretty self explanatory.
The first letter on each line is an "f" for a file, a <tab> character, then
the last modified timestamp (in milliseconds since 1970), then another <tab>
character, then the number of bytes contained in the file, then another
<tab> character, then the filename in completely pure (non-encoded) Utf8.
When the character '\n' (end of line) is encountered, that marks the end of
that one filename. Because this Utf-8 is not encoded in any way, this
is extremely fast, there is no encode or decode step.
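The file layout described above can be read with a few lines of code. The following is a minimal sketch, not the client's actual parser: the function and class names are made up for illustration, the sample entry line is invented, and because this section does not say whether the per-file timestamp and byte count are written in hexadecimal or decimal, those two fields are kept as plain strings. The header line is copied from the example above, and the sketch checks that its two timestamps really are the same moment.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import NamedTuple, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
HEADER = re.compile(
    r"GmtMillisThisListWasStarted:\s*([0-9a-fA-F]+),\s*GmtDateTime:\s*(\d{14})"
)


def java_time_to_utc(hex_millis: str) -> datetime:
    """Convert hexadecimal milliseconds-since-1970 into a UTC datetime."""
    return EPOCH + timedelta(milliseconds=int(hex_millis, 16))


def parse_header(line: str) -> Tuple[str, str]:
    """Return (hex milliseconds, human readable stamp) from the first line."""
    match = HEADER.search(line)
    if match is None:
        raise ValueError("not a filelist header line")
    return match.group(1), match.group(2)


class FileEntry(NamedTuple):
    kind: str            # "f" for a file
    modified_field: str  # last modified time, kept as raw text
    size_field: str      # number of bytes, kept as raw text
    name: str            # filename in plain Utf-8


def parse_entry(line: str) -> FileEntry:
    """Split one entry on its first three tabs; the rest is the filename."""
    kind, modified_field, size_field, name = line.rstrip("\n").split("\t", 3)
    return FileEntry(kind, modified_field, size_field, name)


header = "# GmtMillisThisListWasStarted: 00000173f855ec2b, GmtDateTime: 20200816173407"
hex_millis, stamp = parse_header(header)
print(java_time_to_utc(hex_millis).strftime("%Y%m%d%H%M%S") == stamp)

# Hypothetical entry line, invented for illustration only.
print(parse_entry("f\t00000173f855ec2b\t1024\tC:\\Users\\joe\\notes.txt\n"))
```

Splitting on only the first three tabs means a filename that itself contains a tab character still comes through intact.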
Resource Load bzfilelist puts on customer laptop: bzfilelist only runs for
maybe 10 minutes once an hour on most customers' laptops. It is designed
to use less than 1% of one core of
CPU (bzfilelist is one single thread), and less than 1% extra load on the
SSD, and while it is running bzfilelist might use about 20 MBytes of RAM or
less (0.25% of an 8 GByte RAM computer - one fourth of 1% of the customer
RAM).
Click here for a deeper description and analysis
of what bzfilelist does.
- bztransmit - sends copies of new and changed files through HTTPS
to the Backblaze datacenter. bztransmit is always launched by bzserv.
This is the main work horse of the Backblaze client, and where most of the
logic surrounding the backups occurs. bztransmit has no UI components,
and is therefore largely cross platform between Windows and Macintosh.
bztransmit runs as the user "SYSTEM" on Windows, and as the user "root" on
Macintosh because it is always launched by parent process "bzserv" (see
above).
Location on Disk Windows: C:\Program Files
(x86)\Backblaze\bztransmit.exe (and a 64 bit version at C:\Program Files
(x86)\Backblaze\x64\bztransmit64.exe)
Location on Disk Macintosh: /Library/Backblaze.bzpkg/bztransmit
(the Macintosh version is ONLY 64 bit, Apple has not shipped a 32 bit laptop
in over a decade)
Purpose of bztransmit: The primary purpose of "bztransmit" is
to make the logical decision of which files to backup (based on the lists
prepared by bzfilelist), read those customer files into RAM, compress them
(lossless compression), then encrypt the compressed file (still held in
RAM), then send this encrypted file over HTTPS (double encryption) to the
Backblaze datacenter as a backup. The bztransmit executable also
updates the local laptop's record of what files with what "last modification
date" have already been sent to Backblaze so that it doesn't have to back up
a file twice. The records of what files have already been backed up is
called the "bz_done" files.
The encryption bztransmit uses is symmetric AES-128 to encrypt each file,
with a new AES key and Initialization Vector for every customer data file.
The AES keys themselves are then encrypted with 2048 bit public/private key
encryption using the customer's public key which is stored on the client in
C:\Program Files (x86)\Backblaze\userPub.pem on Windows, and
/Library/Backblaze.bzpkg/userPub.pem on Macintosh. The "private key" is
stored on the Backblaze servers, and the private key is ITSELF encrypted
with a totally standard OpenSSL "passphrase". By default this
is a closely guarded secret passphrase only known by Backblaze and rotated
(changed) regularly for security reasons. However, the customer can
optionally choose to setup a "Private Encryption Key" in the Backblaze
client which changes the passphrase to something only the customer knows.
This passphrase is required to restore any files from the backup, and is not
recoverable in any way, so the customers who set this up need to remember it
or their backup is useless.
What Order does bztransmit upload files? Which files are backed up
first? In general, Backblaze backs up in file size order, small
files first. Backblaze DOES NOT backup folders, or folders of files,
or group the backup into folders. All files are individual files.
So the client will backup 1 small file from one folder, then backup another
small file from a different folder next, then return to the first folder to
backup any larger files. This often confuses customers who are closely
watching their backup, and they think Backblaze has "skipped over" one of
their files in a folder, when in reality Backblaze will loyally return to
that folder when it is backing up larger files. There are exceptions
to the rule of "small files first" as follows: 1) Backblaze really wants to
backup at least one file from every different volume FIRST, so the smallest
file from each of the attached SSDs or Hard Drives is put to the start of
the queue - this is to give the customer immediate feedback that Backblaze
saw and acknowledged that each volume is part of the backup and has been
refreshed recently, and 2) If the backup was paused or the laptop shut down
in the middle of transmitting a large file, Backblaze attempts to complete
the transmission of that large file before going back to start at the small
files. This is to avoid wasting all the effort that went into backing
up half of a very large file every time the laptop goes to sleep.
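The ordering rules above can be expressed as a short sort. This is a rough sketch under stated assumptions, not bztransmit's real code: the names Candidate and upload_order are hypothetical, and the text does not say which of the two exceptions wins when both apply, so the sketch assumes the interrupted large file goes first, then the smallest file from each volume, then everything else smallest first.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    volume: str  # which volume the file lives on, e.g. a volume guid
    path: str
    size: int    # bytes


def upload_order(files: List[Candidate],
                 interrupted: Optional[Candidate] = None) -> List[Candidate]:
    """Return files in the order they would be transmitted."""
    by_size = sorted(files, key=lambda f: f.size)
    ordered: List[Candidate] = []

    # Exception 2: finish a large file that was interrupted mid-transmit.
    if interrupted is not None:
        ordered.append(interrupted)

    # Exception 1: the smallest file from every volume goes near the front.
    seen_volumes = set()
    for f in by_size:
        if f.volume not in seen_volumes and f != interrupted:
            seen_volumes.add(f.volume)
            ordered.append(f)

    # General rule: everything that is left, small files first.
    already_queued = set(ordered)
    ordered.extend(f for f in by_size if f not in already_queued)
    return ordered


# Hypothetical files on two volumes, invented for illustration only.
files = [
    Candidate("v000", "C:/a/big.iso", 500),
    Candidate("v000", "C:/a/tiny.txt", 1),
    Candidate("v000", "C:/b/small.txt", 2),
    Candidate("v001", "E:/photo.jpg", 50),
    Candidate("v001", "E:/movie.mov", 900),
]
for f in upload_order(files):
    print(f.size, f.path)
```

Notice that "C:/a/tiny.txt" is sent long before "C:/a/big.iso" even though they share a folder, which is exactly the behavior that makes customers think a file was skipped.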
How does bztransmit handle really small files? For any files less
than 15 MBytes in one file, bztransmit "batches" up to 999 of these files
into one packed data structure for more efficient transmitting. The
HTTPS setup and teardown murder performance for small files, so doing an
individual HTTPS POST for each 1 byte or 2 byte file is painfully slow.
So bztransmit fully prepares the files in all the standard ways (read the
file, compress the file, encrypt the file) then appends the finished package
end-on-end, and transmits it to the Backblaze datacenter as one HTTPS POST
operation. It still adheres to the "small files first" rule, so initially
this can be 999 files of 1, 2, or 3 bytes each, so the
finished HTTPS POST operation is still relatively small at 1 - 3 KBytes in
total length. But this is approximately 1,000 times faster than doing
each file individually, so it is worth it. Later on, as the "batch of
files" is being assembled, bztransmit stops appending more compressed,
encrypted small files when the size of any one "batch" file gets larger than
30 MBytes in size. So there might only be 2 files in a single "batch"
HTTPS POST when bztransmit is transmitting 14 MByte files. Once each
individual file is larger than 15 MBytes the setup and teardown of HTTPS
shrinks to be less than 1% of the transmission time, so bztransmit does one
HTTPS POST for each file 15 MBytes or larger. The cutoff of 15 MBytes
was originally decided so that a single HTTPS POST would not time out even
on the very slowest customer connections.
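The batching rules above can be sketched as a simple grouping loop. Several things here are assumptions made for illustration: the name plan_posts is hypothetical, 1 MByte is taken to be 1,000,000 bytes, raw file sizes stand in for the compressed and encrypted sizes the real client would measure, and a batch is closed when adding the next file would push it past 30 MBytes (chosen so that 14 MByte files end up two per batch, as in the text).

```python
from typing import Iterable, List

MB = 1_000_000                 # assumption: decimal megabytes
SMALL_FILE_LIMIT = 15 * MB     # files below this size are batched
BATCH_SIZE_LIMIT = 30 * MB     # a batch stops growing around this size
BATCH_COUNT_LIMIT = 999        # maximum number of files in one batch


def plan_posts(sizes: Iterable[int]) -> List[List[int]]:
    """Group file sizes (in bytes) into HTTPS POSTs, one inner list per POST."""
    posts: List[List[int]] = []
    batch: List[int] = []
    batch_bytes = 0

    for size in sorted(sizes):          # small files first
        if size >= SMALL_FILE_LIMIT:
            # Larger files are never batched: flush any open batch,
            # then give this file an HTTPS POST of its own.
            if batch:
                posts.append(batch)
                batch, batch_bytes = [], 0
            posts.append([size])
            continue

        batch_is_full = (len(batch) >= BATCH_COUNT_LIMIT
                         or batch_bytes + size > BATCH_SIZE_LIMIT)
        if batch and batch_is_full:
            posts.append(batch)
            batch, batch_bytes = [], 0
        batch.append(size)
        batch_bytes += size

    if batch:
        posts.append(batch)
    return posts


# 2,000 one-byte files need only 3 POSTs instead of 2,000.
print([len(post) for post in plan_posts([1] * 2000)])
# Five 14 MByte files are sent two per POST.
print([len(post) for post in plan_posts([14 * MB] * 5)])
```

The count limit dominates for tiny files and the size limit dominates as files approach 15 MBytes, which is why the number of files per POST shrinks as the backup works its way up to larger files.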
How bztransmit handles large files: For any files between 15 MBytes and 100 MBytes, bztransmit reads the file from disk, compresses the file, encrypts the file,
and transmits it to the Backblaze datacenter as one unit. For large
files this becomes impractical, so for files larger than 100 MBytes
bztransmit FIRST makes an entire copy of the file broken down into 10 MByte
"chunks" that are found in C:\ProgramData\Backblaze\bzdata\bzbackup\bzdatacenter\bzcurrentlargefile\
on Windows, and /Library/Backblaze.bzpkg/bzdata/bzbackup/bzdatacenter/bzcurrentlargefile/
on Macintosh. That folder can be changed to an external drive if the
customer changes the "Temporary data drive" in the "Settings..." client
panel. The original "cutoff" for self contained files was 30 MBytes
(not the current 100 MBytes) for two reasons: 99% of customer files were
smaller than 30 MBytes each, and that was small enough where the HTTPS POST
of a single file did not time out on even the slowest customer connections.
I raised this to 100 MBytes in 2018 because both reasons had changed. The
very slowest upload connection any customer had was now at least 3x faster,
making it possible, and a lot of large images and some music files were no
longer fitting inside 30 MBytes in a single file, but 99% of individual
files still fit inside of 100 MBytes. So that is the modern cutoff
point.
Threading and Bandwidth Utilization in bztransmit: bztransmit has two
modes: threaded and non-threaded. If a customer sets the number of
threads to "1" in the Backblaze "Settings..." (see the "Performance" tab)
then the customer can control the amount of bandwidth the client uses
(change the "Throttle" slider) to be as low as 128 Kbits/sec upload rate, up
to about 10 Mbits/sec which is the approximate maximum upload speed of 1
thread (unthrottled). However, the maximum upload speed for 1 thread
varies depending on how far the customer's laptop is from the Backblaze
datacenter due to latency issues (for example, the maximum upload speed from
New Zealand might be only 1 Mbit/sec), and other things can affect the
maximum upload speed like the size of files (small files murder upload
performance due to the setup and tear down overhead of HTTPS). Setting
the client to use 1 upload thread is also the most SSD efficient setting,
the very minimum number of copies of any file are made this way, usually
this is "zero copies" - bztransmit reads the file from SSD into RAM,
compresses it in RAM, encrypts it in RAM, and transmits it from RAM through
HTTPS to the Backblaze datacenter. Ok, so the OTHER mode of bztransmit
occurs when the customer sets the number of threads to "2" or more, and it
can go up to 30 threads. Each thread runs at maximum speed, so with 30
threads at 10 Mbits/sec it is possible for the customer to use up to 300
Mbits/sec of upload capacity (if they are close enough to the Backblaze
datacenter, and if they have a fast enough SSD that can keep up). When
using 2 - 30 threads, Backblaze makes a copy of each file before handing the
copy off to a unique thread, so the threaded mode of operation requires 1
more temporary copy of each file be made on the SSD.
A note about thread names: bztransmit uses "full memory protected processes"
to implement threading. Just so the names of the threads are unique, the
Backblaze installer makes IDENTICAL (down to the last byte) copies of the
bztransmit executable named unique things like this on Windows:
C:\Program Files (x86)\Backblaze\x64\bztrans_thread00.exe
C:\Program Files (x86)\Backblaze\x64\bztrans_thread01.exe
C:\Program Files (x86)\Backblaze\x64\bztrans_thread02.exe
C:\Program Files (x86)\Backblaze\x64\bztrans_thread03.exe
C:\Program Files (x86)\Backblaze\x64\bztrans_thread04.exe
.... etc ....
C:\Program Files (x86)\Backblaze\x64\bztrans_thread18.exe
C:\Program Files (x86)\Backblaze\x64\bztrans_thread19.exe
It is the same on the Macintosh, but found in the folder /Library/Backblaze.bzpkg/
with the same names as above. By assigning uniquely named executables
to do each task, customers can watch the (now "named") threads come and go
in "Activity Monitor" on the Macintosh, and "Task Manager" on Windows.
Now, you might notice there are only 20 of these executable names numbered
"00" - "19" and you can use up to 30 threads. At some point this
system is silly and wastes customer disk space, so when bztransmit is using
21 - 30 threads it assigns a task that is supposed to be done by
"thread25.exe" to a unique thread of course, but it uses the executable
named "bztrans_thread15.exe" to do the task. In other words, if the
thread is numbered #20-29 then subtract 10 from the actual thread number to
know which executable name was used.
Resource Load bztransmit puts on customer laptop:
bztransmit does all the encryption, and all the network communication for
Backblaze's client, and depending on the customer settings and the size
of the customer data it can use as little as
100 MBytes of RAM and a very small CPU load (5% of one core), or it can cause quite a bit of
load and RAM use, and use all 16 cores of CPU at the same time. The
worst case situation is this: the customer is using 30 threads and they have
a lot of 100 MByte files, this means each bztransmit process could be
holding 100 MBytes EACH, for a total of 3 GBytes RAM use just for the data
in memory, and it might actually come close to using 4 GBytes of RAM use
when you include the extra data structures to figure out what to backup.
Now that is up to the customer, and a customer who wants to backup all night
long and has a modern laptop with 16 GBytes of RAM won't even notice it.
But a customer with only a slightly old 8 GByte RAM laptop trying to use
their laptop during the middle of the day might want to set Backblaze to
only use 10 threads to keep their RAM use way down low at less than 1 GByte.
And any laptop not in the massive initial upload state can EASILY keep up
with only 4 or 5 threads backing up in the middle of the day and the load
will be quite minimal.
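The worst-case figures quoted above are simple multiplication, which the following sketch makes explicit. The function names are hypothetical, and the defaults (100 MBytes held per process, about 10 Mbits/sec per thread) are just the round numbers given in the text, not measured values. The extra memory for the client's own data structures is deliberately left out.

```python
def worst_case_payload_mbytes(threads: int, largest_file_mbytes: int = 100) -> int:
    """MBytes of file data held in RAM if every thread holds one large file."""
    return threads * largest_file_mbytes


def peak_upload_mbits(threads: int, per_thread_mbits: int = 10) -> int:
    """Approximate upload rate in Mbits/sec if every thread runs at full speed."""
    return threads * per_thread_mbits


for threads in (1, 5, 10, 30):
    print(threads, "threads:",
          worst_case_payload_mbytes(threads), "MBytes of file data in RAM,",
          peak_upload_mbits(threads), "Mbits/sec upload at most")
```

With 30 threads this gives the 3 GBytes and 300 Mbits/sec mentioned above, and with 10 threads the file data held in memory drops to roughly 1 GByte.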
Click here for a deeper description and analysis
of what bztransmit does.
- bzbui (called "bzbmenu" on the Macintosh Activity Monitor)
- this is the client's local laptop GUI (Graphical User Interface).
Because bzbui is all UI components, it is written in different languages between Windows
(C++) and Macintosh (Objective C), and shares very little code. For a
customer to bring up the bzbui GUI, in Windows they click on a "Backblaze
red flame" icon in the system tray. On the Macintosh, they pull down
the "black flame" icon along the very top right of their monitor, or go to
the Macintosh System Preferences and click on the "Backblaze" system pref.
bzbui (bzbmenu on the Macintosh) runs as the current user logged in, so that
it has permissions to access the keyboard and mouse for input. bzbui (bzbmenu
on the Macintosh) does not run AT ALL unless the user is currently fully
logged into their laptop with their local laptop's username and password
(completely different than the Backblaze account username which is an email
address and Backblaze account password).
Location on Disk Windows: C:\Program Files (x86)\Backblaze\bzbui.exe
Location on Disk Macintosh: /Library/Backblaze.bzpkg/bzbmenu.app
(and a Macintosh System Pref Panel)
Purpose of bzbui: The primary purpose of "bzbui" is to present
the customer with a local interface to the Backblaze client and local
controls for things running on their local laptop. The most essential
thing that bzbui does is edit the file "bzinfo.xml" which is found at C:\ProgramData\Backblaze\bzdata\bzinfo.xml
on Windows, and /Library/Backblaze.bzpkg/bzdata/bzinfo.xml on the Macintosh.
The file "bzinfo.xml" is the configuration and instructions for how all the
OTHER (background) client executables behave. For example, if a
customer adds a folder to exclude using bzbui, that excluded folder path is
added to bzinfo.xml, and so on. Most everything that occurs in the GUI
presented by bzbui simply edits the file bzinfo.xml on the local laptop's
SSD.
The executable bzbui (bzbmenu on the Macintosh) runs as the
current user logged in, so that it has access to the GUI. It is
COMPLETELY unnecessary for this to run for the backup to continue, as proven
by logging out of the local laptop's account: the backup will continue
just fine (better, even) compared to when the user is signed in and bzbui/bzbmenu is
running. It is silly to disable/kill this process as it is so
ridiculously lightweight, but the process is completely optional and
killing it will not affect the backup's progress at all. Sometimes
customers are confused by this: they feel like if they kill this process the
backup should stop, but it has literally nothing to do with the backup
progress other than writing out configuration files.
One of the other things bzbui (bzbmenu on the Macintosh) does is that it can
"Pause" a running backup (by clicking the GUI button <Pause Backup>) and it
can unpause (start the backup again) later if you click the <Backup Now>
button.
Another responsibility of bzbui (bzbmenu on the
Macintosh) is to pop up warning and error dialogs if something is wrong,
like if the backup is not progressing for some reason. For example, if
the customer's credit card is totally maxed out at the limit, and the
payment to Backblaze fails, then Backblaze will both send emails (from the
datacenter), and also pop up dialogs on the client to explain the customer
needs to fix the billing problem. In general the customer has 45 days
to fix a billing problem, but if they refuse to pay Backblaze for more than
45 days their backup will be deleted from the Backblaze servers to free up
space for other (paying) customers. Another thing bzbui/bzbmenu will
pop up a warning dialog about is if the customer has gone too long without
plugging in one of their external drives that is "selected for backup" and
runs some danger of losing the backup of that one drive. Another
important aspect of bzbui/bzbmenu is to monitor that bzserv is running.
The way it does this is bzserv writes out a "heartbeat" file once every 10
minutes as a kind of "dead man's switch" to prove it is running properly.
If the heartbeat file is missing (not updated) for more than 30 minutes,
bzbui/bzbmenu pops up an error dialog explaining there is a VERY PROFOUND
problem that must be fixed or the backup cannot continue - since bzserv is
required to be resident and running so that it can launch the other backup
processes.
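The heartbeat check described above can be sketched in a few lines. This is an illustration under assumptions, not the real client code: the function names are hypothetical, the file location is left to the caller, and using the file's modification time as the "last updated" signal is a guess at one reasonable way to do it.

```python
import os
import time
from typing import Optional

STALE_AFTER_SECONDS = 30 * 60  # the GUI complains after 30 minutes of silence


def write_heartbeat(path: str) -> None:
    """Writer side (the service): overwrite the file with the current time.

    Per the text above, this would be called once every 10 minutes.
    """
    with open(path, "w", encoding="utf-8") as f:
        f.write(str(int(time.time())))


def heartbeat_is_stale(path: str, now: Optional[float] = None) -> bool:
    """Monitor side (the GUI): True if the file is missing or too old."""
    if now is None:
        now = time.time()
    try:
        last_update = os.path.getmtime(path)
    except OSError:
        return True  # a missing file is treated the same as a dead service
    return (now - last_update) > STALE_AFTER_SECONDS
```

The gap between the 10 minute write interval and the 30 minute limit means the writer can miss two beats (a sleeping laptop, a busy disk) before the monitor raises an error.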
The bzbui/bzbmenu process has a few other
miscellaneous tasks available in its small pull down menu such as "Inherit
Backup State" and displaying an "About..." dialog with the version of the
client that is currently installed.
Resource
Load bzbui / bzbmenu puts on customer
laptop: bzbui / bzbmenu is extremely small and efficient, ESPECIALLY when the
interface is not up on the screen (which is how most customers run Backblaze
99.9999% of the time when not changing any configurations). It is designed to use less than
0.001% of one core of
CPU (bzbui is one single thread), and less than 0.001% extra load on the SSD,
and it uses at most about 30 MBytes - 40 MBytes of RAM
(0.5% of an 8 GByte RAM computer - one half of 1% of the customer
RAM). It should be one of the smallest RAM uses of any process on a
customer's laptop.
Click here for a deeper description and analysis
of what bzbui / bzbmenu does.
- Honorable Mention: bzfclean - This process is never run for any
reason normally. It is only run as the very very final step of
"Uninstall" of the entire Backblaze client.
Location on Disk Windows: C:\Program Files (x86)\Backblaze\bzfclean.exe
Location on Disk Macintosh: /Library/Backblaze.bzpkg/bzfclean
Purpose of bzfclean: This absolutely tiny (4 KBytes) program has no
GUI, and it is run as the very final step when a customer uninstalls the
entire Backblaze client from their local laptop. Backblaze prides
itself on a completely clean uninstall - no registry entries left behind,
and zero files or folders left behind on the customer's laptop. On
Windows computers, it is difficult to uninstall the very last executable
running the uninstaller, because running an executable on Windows means you
cannot also delete it. An executable cannot delete itself. To
work around this issue, Backblaze copies bzfclean to a temporary folder that
Windows will clean up automatically at a later date and RUNS IT FROM THAT
LOCATION. When the bzfclean executable is run as the final step in the
uninstaller, the uninstaller runs bzfclean and then IMMEDIATELY exits itself
(unlocking its own executable). So when bzfclean runs, it first wakes
up as a running process, and then it very consciously "pauses" itself for 2
or 3 seconds to let the uninstaller exit and quit running, then this tiny
little executable reaches back and deletes the uninstaller, leaving no trace
behind.
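The "pause, then reach back and delete" step can be sketched as follows. This is a hedged illustration in Python rather than the real (tiny, native) bzfclean: the function name and parameters are hypothetical, and the retry loop is an added assumption that the text above does not describe.

```python
import os
import time


def delete_when_unlocked(path: str, initial_wait: float = 3.0,
                         retries: int = 5, retry_wait: float = 1.0) -> bool:
    """Wait for the launching process to exit, then delete its executable.

    Returns True once the file is gone, False if it stayed locked.
    """
    time.sleep(initial_wait)  # give the uninstaller a few seconds to exit
    for _ in range(retries):
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            return True  # already gone, nothing left to clean up
        except PermissionError:
            time.sleep(retry_wait)  # still locked by a running process
    return False
```

The helper itself has to run from somewhere other than the folder being removed, which is why the copy in a temporary folder matters.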
- Honorable Mention: bzdoinstall - The Backblaze client installer
is a self contained executable that also includes all the files and
executables to be installed inside of itself. This is called a "Self
Extracting Archive" in old computer science terms. Backblaze does not
use an "off the shelf installer" like "InstallShield" or "Wise Installer";
the installer is written and maintained in house by the client software
engineers. When the Backblaze client installer runs, the self extracting
program (internally in the Backblaze build tree this is called "bzserlfextractor")
unpacks all of its internally contained files to install into a TEMPORARY
folder first, including "bzdoinstall". Then the self extracting
program is all done with its primary task, and the final step is to launch
"bzdoinstall" which has a GUI to present to the customer so the customer can
enter their customer email address and Backblaze password to complete the
install. The executable "bzdoinstall" authenticates with the Backblaze
website, copies the executables to their correct final locations, and
finally presents a progress dialog to the customer as the laptop's SSD is
scanned for the very first time for the initial list of files to upload.
- Honorable Mention: bzdownloader - This is technically not part of
the Backblaze client in that it has nothing at all to do with backing up the
computer. bzdownloader does not run EVER as part of the backup
process. What bzdownloader does is help customers download their free
ZIP file restores they have prepared on the Backblaze website. Right
after we finished the Backblaze Personal Backup client and the web restore
process 13 years ago we thought we finally had a complete product. In
product development terms we would have called this a "minimum viable
product" - a product that doesn't satisfy everybody and has some rough
edges, but you could sell it for money and some customers would find it
useful. However, IMMEDIATELY a profound problem appeared: AT THAT
TIME no web browser could download any file larger than 2 GBytes.
Period. This was before the HTTP Range Header
had been implemented by any web browsers (the "Range" header specifies a sub
range of a large file, and was finally adopted in 2014 - 7 years after the
Backblaze client was launched). So for any customer to get back more
than 2 GBytes of restored files (which is comically small) the Backblaze
client team of 1 Windows programmer and 1 Macintosh programmer had to
furiously create "bzdownloader" while ignoring any backup client bug fixes.
The bzdownloader can use up to 30 threads to download different parts (what
would later be called "different ranges") of a file at the same time.
All of the Backblaze restore servers have 10 Gbit/sec ethernet on them, and
the vast majority of USA customers have 1 Gbit/sec download capacity now, so
bzdownloader is designed around 40 MByte blocks where it can download 30 of
them at the same time, to reach speeds of something like 1 Gbit/sec OR
HIGHER to download restores. The bzdownloader is offered up as an
executable program that doesn't even have an installer of any kind when a
customer goes to download their web based ZIP restore. Finally, on
Windows the bzdownloader has one additional responsibility - to unzip the
ZIP restore once it has finished downloading it. The reason for this
is that up until Windows 10, the "unzip" functionality built into Windows
was atrocious. Up until Windows 10, the built in Windows Explorer had
"Unzip", but if you ever clicked "Unzip" on any ZIP file larger than 2
GBytes, it would run for 30 minutes then crash. ACTUALLY CRASH.
Not one of the 20,000 programmers at Microsoft could be bothered to check
the ZIP file size with 2 lines of code and pop up an error dialog with
ANOTHER 4 lines of code that said "Microsoft Is Unable to Unzip files larger
than 2 GBytes - go Install WinZip or something." So the bzdownloader
on Windows bundles a free program called "7-zip"
that does a pretty good job, and the bzdownloader uses the command line
version of 7-zip to unzip the newly downloaded ZIP files. This
additional functionality is only needed on Windows; the Macintosh Finder
does a pretty good job of handling unzipping without additional software.
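The block-parallel download idea described above (40 MByte blocks, up to 30 at a time) can be sketched like this. It is not bzdownloader's code: `block_ranges`, `download_all`, and the caller-supplied `fetch_block` are hypothetical names, the document does not say what wire protocol bzdownloader uses, and joining blocks in memory is a simplification (a real tool would write each block to its offset on disk).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

BLOCK_SIZE = 40 * 1000 * 1000  # 40 MByte blocks (decimal MBytes assumed)
MAX_THREADS = 30               # up to 30 blocks in flight at once


def block_ranges(total_size: int,
                 block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Split total_size bytes into inclusive (first, last) byte ranges."""
    return [(start, min(start + block_size, total_size) - 1)
            for start in range(0, total_size, block_size)]


def download_all(total_size: int,
                 fetch_block: Callable[[int, int], bytes],
                 block_size: int = BLOCK_SIZE) -> bytes:
    """Fetch every block with up to MAX_THREADS workers, joined in order.

    With plain HTTP today, fetch_block could send "Range: bytes=first-last".
    """
    ranges = block_ranges(total_size, block_size)
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
        # map() yields results in submission order, so blocks stay in sequence.
        parts = pool.map(lambda r: fetch_block(r[0], r[1]), ranges)
        return b"".join(parts)
```

Keeping `fetch_block` as a parameter makes the splitting logic testable without a network, for example by slicing an in-memory byte string.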
- Honorable Mention: the bztrans_thread (0 - 19) executables -
Every one of these executables is an identical copy to the others, and is a
copy of the original "bztransmit" executable. If you see these
processes in "Activity Monitor" on the Macintosh, or "Task Manager" on
Windows, they are explicitly for transmitting files to the Backblaze
datacenter when a customer is running with more than one thread. See
the notes in the "bztransmit" section above.
Future Idea for notes around the Backblaze Personal Backup Client:
Look through brianwski's reddit posts, copy and paste contents into here, or
link directly to them.
Some Notes on International Strings in the Backblaze Client:
<fill out some info here> - Maybe possibly link to internationalization
video (might need edits): <redacted>
Documentation around bz_done files on the Backblaze Client:
You can watch a 57 minute tutorial on how to understand the internals of bz_done files here:
https://www.youtube.com/watch?v=MOlz36nLbwA You can view a slide
used in that presentation here that documents what every column does by
clicking this
link.
All done.