2020 bztransmit executable (written 8/16/2020) by Brian Wilson
bztransmit - sends copies of new and changed files through HTTPS to the Backblaze datacenter. See this parent 2020 Backblaze Personal Backup architecture page for terminology, and some context for what this VERY SPECIFIC web page is about.
NOTE: this page is currently a repeat of the content on that above page. THIS PAGE IS A PLACE HOLDER that BrianW needs to fill out even more.
bztransmit - sends copies of new and changed files through HTTPS
to the Backblaze datacenter. bztransmit is always launched by bzserv.
This is the main work horse of the Backblaze client, and where most of the
logic surrounding the backups occurs. bztransmit has no UI components,
and is therefore largely cross platform between Windows and Macintosh.
bztransmit runs as the user "SYSTEM" on Windows, and as the user "root" on
Macintosh because it is always launched by parent process "bzserv" (see
above).
Location on Disk Windows: C:\Program Files
(x86)\Backblaze\bztransmit.exe (and a 64 bit version at C:\Program Files
(x86)\Backblaze\x64\bztransmit64.exe)
Location on Disk Macintosh: /Library/Backblaze.bzpkg/bztransmit
(the Macintosh version is ONLY 64 bit; Apple has not shipped a 32 bit laptop
in over a decade)
Purpose of bztransmit: The primary purpose of "bztransmit" is
to make the logical decision of which files to backup (based on the lists
prepared by bzfilelist), read those customer files into RAM, compress them
(lossless compression), then encrypt the compressed file (still held in
RAM), then send this encrypted file over HTTPS (double encryption) to the
Backblaze datacenter as a backup. The bztransmit executable also
updates the local laptop's record of what files with what "last modification
date" have already been sent to Backblaze so that it doesn't have to back up
a file twice. The records of what files have already been backed up are
called the "bz_done" files.
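The "don't back up a file twice" bookkeeping can be pictured with a small sketch. This is only an illustration, not the real "bz_done" file format: the dictionary, the function names, and the idea of keying on (path, last modification time) are assumptions made for the example.

```python
# Hypothetical stand-in for the "bz_done" records: a dict mapping a file
# path to the last-modification time that was already sent to the datacenter.

def needs_backup(path, mtime, done_records):
    """True if the file was never sent, or was sent with a different mtime."""
    return done_records.get(path) != mtime

def mark_done(path, mtime, done_records):
    """Record that this version of the file has been transmitted."""
    done_records[path] = mtime

def files_to_send(file_list, done_records):
    """Filter a list of (path, mtime) pairs down to the ones still to upload."""
    return [(p, m) for (p, m) in file_list if needs_backup(p, m, done_records)]
```

A file that is edited after being backed up gets a new modification time, so it no longer matches its record and is picked up again on the next pass.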
The encryption bztransmit uses is symmetric AES-128 to encrypt each file,
with a new AES key and Initialization Vector for every customer data file.
The AES keys themselves are then encrypted with 2048 bit public/private key
encryption using the customer's public key which is stored on the client in
C:\Program Files (x86)\Backblaze\userPub.pem on Windows, and
/Library/Backblaze.bzpkg/userPub.pem on Macintosh. The "private key" is
stored on the Backblaze servers, and is itself encrypted with a totally
standard OpenSSL "passphrase". By default this
is a closely guarded secret passphrase only known by Backblaze and rotated
(changed) regularly for security reasons. However, the customer can
optionally choose to setup a "Private Encryption Key" in the Backblaze
client which changes the passphrase to something only the customer knows.
This passphrase is required to restore any files from the backup, and is not
recoverable in any way, so the customers who set this up need to remember it
or their backup is useless.
What Order does bztransmit upload files? Which files are backed up
first? In general, Backblaze backs up in file size order, small
files first. Backblaze DOES NOT backup folders, or folders of files,
or group the backup into folders. All files are individual files.
So the client will backup 1 small file from one folder, then backup another
small file from a different folder next, then return to the first folder to
backup any larger files. This often confuses customers who are closely
watching their backup, and they think Backblaze has "skipped over" one of
their files in a folder, when in reality Backblaze will loyally return to
that folder when it is backing up larger files. There are exceptions
to the rule of "small files first" as follows: 1) Backblaze really wants to
backup at least one file from every different volume FIRST, so the smallest
file from each of the attached SSDs or Hard Drives is put to the start of
the queue - this is to give the customer immediate feedback that Backblaze
saw and acknowledged that each volume is part of the backup and has been
refreshed recently, and 2) If the backup was paused or the laptop shut down
in the middle of transmitting a large file, Backblaze attempts to complete
the transmission of that large file before going back to start at the small
files. This is to avoid wasting all the effort that went into backing
up half of a very large file every time the laptop goes to sleep.
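The ordering rules above can be sketched roughly as follows. The PendingFile type and upload_order function are hypothetical names for this example, and the relative order of the two exceptions (interrupted large file first, then one file per volume) is an assumption; the text above does not say which of the two wins.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PendingFile:
    path: str
    volume: str   # e.g. "C:" or "/Volumes/Photos"
    size: int     # bytes

def upload_order(files, interrupted_large_file=None):
    """Return files in a 'small files first' order with the two exceptions."""
    remaining = sorted(files, key=lambda f: f.size)
    queue = []

    # Exception 2: finish a large file that was interrupted mid-transmission.
    if interrupted_large_file is not None and interrupted_large_file in remaining:
        remaining.remove(interrupted_large_file)
        queue.append(interrupted_large_file)

    # Exception 1: the smallest file from each volume goes to the front.
    seen_volumes = set()
    first_per_volume = []
    for f in remaining:  # already sorted, so the first hit per volume is smallest
        if f.volume not in seen_volumes:
            seen_volumes.add(f.volume)
            first_per_volume.append(f)
    for f in first_per_volume:
        remaining.remove(f)

    # Everything else: plain size order, regardless of folder.
    return queue + first_per_volume + remaining
```

Note that folders never appear in the sort key, which is why two files from the same folder can end up far apart in the queue.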
How does bztransmit handle really small files? For any files less
than 15 MBytes in one file, bztransmit "batches" up to 999 of these files
into one packed data structure for more efficient transmitting. HTTPS
setup and teardown murder performance for small files, so doing an
individual HTTPS POST for each 1 byte or 2 byte file is painfully slow.
So bztransmit fully prepares the files in all the standard ways (read the
file, compress the file, encrypt the file) then appends the finished package
end-on-end, and transmits it to the Backblaze datacenter as one HTTPS POST
operation. It still adheres to the "small files first", so initially
this can be 999 files each of which are 1, 2, or 3 bytes each, so the
finished HTTPS POST operation is still relatively small at 1 - 3 KBytes in
total length. But this is approximately 1,000 times faster than doing
each file individually, so it is worth it. Later on, as the "batch of
files" is being assembled, bztransmit stops appending more compressed,
encrypted small files when the size of any one "batch" file gets larger than
30 MBytes in size. So there might only be 2 files in a single "batch"
HTTPS POST when bztransmit is transmitting 14 MByte files. Once each
individual file is larger than 15 MBytes the setup and teardown of HTTPS
shrinks to be less than 1% of the transmission time, so bztransmit does one
HTTPS POST for each file 15 MBytes or larger. The cutoff of 15 MBytes
was originally decided so that a single HTTPS POST would not time out even
on the very slowest customer connections.
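A rough sketch of the batching rule, under stated assumptions: plan_posts and the constant names are made up for this example, 1 MByte is taken as 1024 * 1024 bytes, and a single size per file is used for both the 15 MByte test and the 30 MByte batch limit (in reality the batch holds compressed, encrypted data). The 30 MByte limit is read here as "close the batch before it would exceed 30 MBytes", which matches the "2 files of 14 MBytes" example above.

```python
MB = 1024 * 1024              # assumption: binary megabytes
SMALL_FILE_LIMIT = 15 * MB    # files below this are batched
MAX_FILES_PER_BATCH = 999
MAX_BATCH_BYTES = 30 * MB

def plan_posts(sizes):
    """Group file sizes (sorted small first) into lists, one list per HTTPS POST."""
    posts, batch, batch_bytes = [], [], 0
    for size in sizes:
        if size >= SMALL_FILE_LIMIT:
            # Not a "small" file: flush any open batch, then send it alone.
            if batch:
                posts.append(batch)
                batch, batch_bytes = [], 0
            posts.append([size])
            continue
        batch_full = len(batch) >= MAX_FILES_PER_BATCH
        too_big = batch_bytes + size > MAX_BATCH_BYTES
        if batch and (batch_full or too_big):
            posts.append(batch)
            batch, batch_bytes = [], 0
        batch.append(size)
        batch_bytes += size
    if batch:
        posts.append(batch)
    return posts
```

With 2,000 one-byte files this yields three POSTs instead of 2,000, and with 14 MByte files each POST carries two files.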
How bztransmit handles large files: For any files between 15 MBytes - 100 MBytes, bztransmit reads the file from disk, compresses the file, encrypts the file,
and transmits it to the Backblaze datacenter as one unit. For large
files this becomes impractical, so for files larger than 100 MBytes
bztransmit FIRST makes an entire copy of the file broken down into 10 MByte
"chunks" that are found in C:\ProgramData\Backblaze\bzdata\bzbackup\bzdatacenter\bzcurrentlargefile\
on Windows, and /Library/Backblaze.bzpkg/bzdata/bzbackup/bzdatacenter/bzcurrentlargefile/
on Macintosh. That folder can be changed to an external drive if the
customer changes the "Temporary data drive" in the "Settings..." client
panel. The original "cutoff" for self contained files was 30 MBytes
(not the current 100 MBytes) for two reasons: 99% of customer files were
smaller than 30 MBytes each, and that was small enough where the HTTPS POST
of a single file did not time out on even the slowest customer connections.
I raised this to 100 MBytes in 2018 because both reasons had changed. The
very slowest upload connections any customer had was now at least 3x faster
making it possible, and a lot of large images and some music files were no
longer fitting inside 30 MBytes in a single file, but 99% of individual
files still fit inside of 100 MBytes. So that is the modern cutoff
point.
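The split into 10 MByte "chunks" can be sketched as a list of byte ranges. The function name chunk_ranges is hypothetical, 1 MByte is assumed to be 1024 * 1024 bytes, and the sketch only computes offsets; it does not write the temporary chunk copies described above.

```python
MB = 1024 * 1024                  # assumption: binary megabytes
LARGE_FILE_THRESHOLD = 100 * MB   # larger files are chunked
CHUNK_SIZE = 10 * MB

def chunk_ranges(file_size):
    """Return (offset, length) pairs describing how a file would be sent.

    Files at or below the threshold come back as a single range; larger
    files are cut into CHUNK_SIZE pieces, with a shorter final piece.
    """
    if file_size <= LARGE_FILE_THRESHOLD:
        return [(0, file_size)]
    return [
        (offset, min(CHUNK_SIZE, file_size - offset))
        for offset in range(0, file_size, CHUNK_SIZE)
    ]
```

For example, a 105 MByte file would produce eleven ranges: ten full 10 MByte chunks and one 5 MByte tail.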
Threading and Bandwidth Utilization in bztransmit: bztransmit has two
modes: threaded and non-threaded. If a customer sets the number of
threads to "1" in the Backblaze "Settings..." (see the "Performance" tab)
then the customer can control the amount of bandwidth the client uses
(change the "Throttle" slider) to be as low as 128 Kbits/sec upload rate, up
to about 10 Mbits/sec which is the approximate maximum upload speed of 1
thread (unthrottled). However, the maximum upload speed for 1 thread
varies depending on how far the customer's laptop is from the Backblaze
datacenter due to latency issues (for example, the maximum upload speed from
New Zealand might be only 1 Mbit/sec), and other things can affect the
maximum upload speed like the size of files (small files murder upload
performance due to the setup and tear down overhead of HTTPS). Setting
the client to use 1 upload thread is also the most SSD efficient setting:
the very minimum number of copies of any file are made this way, and usually
this is "zero copies" - bztransmit reads the file from SSD into RAM,
compresses it in RAM, encrypts it in RAM, and transmits it from RAM through
HTTPS to the Backblaze datacenter. Ok, so the OTHER mode of bztransmit
occurs when the customer sets the number of threads to "2" or more, and it
can go up to 30 threads. Each thread runs at maximum speed, so with 30
threads at 10 Mbits/sec it is possible for the customer to use up to 300
Mbits/sec of upload capacity (if they are close enough to the Backblaze
datacenter, and if they have a fast enough SSD that can keep up). When
using 2 - 30 threads, Backblaze makes a copy of each file before handing the
copy off to a unique thread, so the threaded mode of operation requires 1
more temporary copy of each file be made on the SSD.
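The thread arithmetic above can be written down as a back-of-the-envelope estimate. The function names are made up for this example, the 10 Mbits/sec per-thread figure is the approximate unthrottled maximum quoted above (it can be much lower far from the datacenter), and 1 GByte is taken as 8,000 Mbits.

```python
MAX_THREADS = 30
PER_THREAD_MBITS = 10.0   # approximate unthrottled speed of one thread

def upload_ceiling_mbits(threads, per_thread_mbits=PER_THREAD_MBITS):
    """Best-case aggregate upload rate in Mbits/sec for a thread count."""
    if not 1 <= threads <= MAX_THREADS:
        raise ValueError("thread count must be between 1 and 30")
    return threads * per_thread_mbits

def hours_to_upload(gbytes, threads, per_thread_mbits=PER_THREAD_MBITS):
    """Best-case hours to upload a backup of the given size."""
    megabits = gbytes * 8 * 1000   # assumption: 1 GByte = 1000 MBytes
    seconds = megabits / upload_ceiling_mbits(threads, per_thread_mbits)
    return seconds / 3600
```

Under these assumptions a 450 GByte initial backup at 10 threads takes about 10 hours at best; real uploads will be slower when the files are small or the connection cannot sustain the ceiling.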
A note about thread names: bztransmit uses "full memory protected processes"
to implement threading. Just so the names of the threads are unique, the
Backblaze installer makes IDENTICAL (down to the last byte) copies of the
bztransmit executable with unique names like this on Windows:
C:\Program Files (x86)\Backblaze\x64\bztrans_thread00.exe
C:\Program Files (x86)\Backblaze\x64\bztrans_thread01.exe
C:\Program Files (x86)\Backblaze\x64\bztrans_thread02.exe
C:\Program Files (x86)\Backblaze\x64\bztrans_thread03.exe
C:\Program Files (x86)\Backblaze\x64\bztrans_thread04.exe
.... etc ....
C:\Program Files (x86)\Backblaze\x64\bztrans_thread18.exe
C:\Program Files (x86)\Backblaze\x64\bztrans_thread19.exe
It is the same on the Macintosh, but found in the folder /Library/Backblaze.bzpkg/
with the same names as above. By assigning uniquely named executables
to do each task, customers can watch the (now "named") threads come and go
in "Activity Monitor" on the Macintosh, and "Task Manager" on Windows.
Now, you might notice there are only 20 of these executable names numbered
"00" - "19" and you can use up to 30 threads. At some point this
system is silly and wastes customer disk space, so when bztransmit is using
21 - 30 threads it assigns a task that is supposed to be done by
"thread25.exe" to a unique thread of course, but it uses the executable
named "bztrans_thread05.exe" to do the task. In other words, if the
thread is numbered #20-29 then subtract 10 from the actual thread number to
know which executable name was used.
Resource Load bztransmit puts on customer laptop: bztransmit does all the
encryption, and all the network communication for Backblaze's client, and
depending on the customer settings and the size of the customer data it can
use as little as 100 MBytes of RAM and a very small CPU load (5% of one
core), or it can cause quite a bit of load and RAM use, and use all 16 cores
of CPU at the same time. The
worst case situation is this: the customer is using 30 threads and they have
a lot of 100 MByte files, this means each bztransmit process could be
holding 100 MBytes EACH, for a total of 3 GBytes RAM use just for the data
in memory, and it might actually come close to using 4 GBytes of RAM use
when you include the extra data structures to figure out what to backup.
Now that is up to the customer, and a customer who wants to backup all night
long and has a modern laptop with 16 GBytes of RAM won't even notice it.
But a customer with only a slightly old 8 GByte RAM laptop trying to use
their laptop during the middle of the day might want to set Backblaze to
only use 10 threads to keep their RAM use way down low at less than 1 GByte.
And any laptop not in the massive initial upload state can EASILY keep up
with only 4 or 5 threads backing up in the middle of the day and the load
will be quite minimal.
All done.