2020 Backblaze Client Architecture
(written 8/16/2020) by Brian Wilson

Backblaze Personal 
Backup (internally to Backblaze it has the nickname "B1") is software that 
runs on a customer's laptop or desktop and pushes a copy of all their files to 
the Backblaze datacenter in Sacramento, California or the Backblaze datacenter 
in Amsterdam, Netherlands.  Backblaze charges a fixed price of $6/month per 
laptop for this service.  The flat fee includes any external drives as long 
as they are physically connected to the laptop or desktop.  This document 
describes in detail the architecture of the Backblaze client that runs on the 
customer's laptop or desktop computer.  This includes what executables 
exist on disk and what they do, the data structures Backblaze uses, and the flow 
of how the backups work.
Terminology:
	- "Laptop" - this document uses the term "laptop" to mean "customer 
	laptop or desktop", because it is clumsy to mention "customer laptop or 
	desktop" in every sentence, and more than half the computers that Backblaze 
	backs up are laptops nowadays.  A "laptop" is always a customer's 
	computer.  Using the term "laptop" is also helpful to clarify that it 
	is a customer's computer, because "datacenters" (see below) run by Backblaze 
	never have any laptops in them.
   
- "Datacenter" or "Backblaze datacenter" - the physical location 
and the "servers" (see below) that Backblaze runs to store the backups for 
retrieval later.  These datacenters are in Sacramento, California and in 
Phoenix, Arizona, and in Amsterdam, Netherlands.  
  
	- "Region" or "Backblaze region" - Backblaze supports 
	storing your backup in a selected region.  When this document was 
	written, there were two regions: "US West" (Sacramento), and "EU Central" 
	(Amsterdam).  Notice that there might be more than one "datacenter" in 
	a single region - that is because when Backblaze (the company) runs out of 
	space in one datacenter, Backblaze (the company) has to go find more space 
	in an additional datacenter that is "very near the other datacenters in that 
	region".
  
- "Server" or "Backblaze Servers" - the computers that run in 
the Backblaze datacenters.  The "client" (see below) communicates to the 
servers.
  
- "Client" or "Backblaze Client" - the collection of executable 
programs that make up what runs on a customer laptop.  This collection of 
executables includes a GUI (graphical interface) in one executable, and separate 
from that are a collection of executables that send the files found on the 
laptop to the Backblaze servers over the HTTPS protocol.
  
	- "SSD" - Solid State Drive - for all intents and purposes you can 
	substitute "Hard Drive" wherever you see "SSD" if appropriate for your 
	computer.  Most modern laptops only use SSDs, and hard drives are 
	disappearing from the world other than in datacenters.  But if you own 
	a laptop that is 10 years old it might not contain an SSD; it might have an 
	old-fashioned, very slow hard drive instead.  
  
- "Account" or "Backblaze account" or "Backblaze Web Account" 
- customers create an account on the Backblaze website, and their account is 
where their data can be retrieved later, and where the customers specify payment 
for the service with a Credit Card.  Backblaze accounts each have a 12 
hexadecimal digit unique id such as: "fd123d6faf4a" assigned at account creation 
time that never changes, however all accounts have exactly 1 customer email 
address associated with them, and that email address is the "username" that the 
customers use to sign in, so we often say "Backblaze accounts are defined by an 
email address".  Email addresses for accounts are absolutely globally 
unique, we never allow a second customer to create an account with an email 
address that is currently "in use" on any other Backblaze account anywhere in 
the world.  Customers sign into their account at:
https://secure.backblaze.com/user_signin.htm 
  
	- "Host" or "Computer" - unfortunately these terms are used 
	interchangeably for the customer's laptop and/or the backup of that laptop.  
	The regrettable term "host" was a term we used early in Backblaze's history 
	for the customer's laptop, so it is hard to get rid of from our terminology.
  
	- "HGUID" or "hguid" - stands for "Host Globally Unique 
	Identifier" - this is a 24 digit hexadecimal number that describes one 
	"laptop backup".  An example is "bf654d292a61ca0e193e908f".  
	Hguids are globally unique, no two backups will ever have the same hguid.  
	One customer account (one email address) can have multiple backups inside 
	it.  For example, if one customer with email address "joe@corp.com" 
	owns a Macintosh laptop running Backblaze and a Windows laptop running 
	Backblaze, that customer's one Backblaze account will have two separate 
	hguids inside of it - they define two different backups.  When 
	preparing a restore, the customer would sign into their one Backblaze 
	account, and choose which of the two backups to restore from.
  
	- "volume" or "attached volume" or "partition" - A 
	Macintosh customer laptop's SSDs or hard drives are organized by what are 
	called "volumes" on the Macintosh.  Usually there is 1 volume per SSD 
	or hard drive, but technically advanced customers sometimes create two or 
	more "volumes" on one physical device (one SSD or hard drive) to make it 
	appear as if two separate SSDs exist to the computer instead of just 1 SSD.  
	On Windows and Linux it is often called a "partition" but it is the 
	IDENTICAL concept.  In this document I  use the word "volume" or 
	"partition" interchangeably, and you have to substitute the appropriate word 
	for the platform you are working with at the time.  Also, it is so 
	common that there is 1 volume per SSD, that when I write "SSD" it means a 
	logical volume.  As far as the Operating System is concerned, two 
	volumes on one SSD are simply two different SSDs.  On Windows each 
	partition gets a separate drive letter, so if a customer has three 
	partitions it might appear as if they have 3 physical SSDs attached to the 
	computer named "C:\", "E:\" and "G:\" or something like that, when in 
	reality it is one SSD.  Backblaze would treat that situation as if 
	there were 3 entirely separate physical SSDs with one partition each.  
	Backblaze only cares about volumes (partitions), in reality Backblaze does 
	not care about the number of different physical SSDs attached, Backblaze 
	deals at the "logical level" of volumes and partitions.
  
	- "Volume Guid" or "vguid" or "BzVolumeGuid" - a 28 
	character globally unique identifier of one customer's laptop's volumes, and 
	it ALWAYS starts with the letter "v".  Here is an example: 
	v000c0101f6fb58de90a713a0e19   The laptop's "boot volume" (also 
	known as the "system volume") ALWAYS starts with the characters "v000", and 
	then subsequent volumes start with "v001" and "v002", etc to make it easy to 
	get a little information quickly (like which is the boot volume).
  
	- "Java Time" or "Milliseconds Since 1970" - The Backblaze 
	client has to communicate with the Backblaze datacenters, and most of the 
	code in the Backblaze datacenters is written in the programming language 
	"Java".  Java has a pretty standard measure of time which is the number 
	of milliseconds since 1970.  The Backblaze client uses this measure of 
	time for everything (on both Macintosh and Windows) such as the "time a file 
	was last modified".  This usually looks like a string of 16 
	hexadecimal digits such as "00000173f855ec2b".  You can find web pages 
	all over the internet which will convert that to a human readable date and 
	time for you by copying and pasting the "Java Time" into the web page.
  
	- "Utf8" or "Utf-8" or "Unicode" - Filenames on all 
	modern computers (Windows, Macintosh, Linux, iOS, Android) are encoded in 
	"Unicode", and all modern web browsers use Unicode.  All that means is 
	you can type a single string with letters from different languages, like 
	this: "Hello,여보세요, こんにちは, 你好".  The document you are reading is in 
	Unicode.  The Backblaze client always uses "Utf-8" encoding for 
	everything, which is one of the most standard forms of Unicode.  For 
	most English speakers this just looks like (and acts like) regular old text 
	and most people reading this document don't need to worry about what it 
	means to be in "Unicode" or "Utf-8", so if this doesn't make sense to you, 
	just ignore it.  This is a technical point for people who speak other 
	languages, and their filenames are in other languages (which Backblaze 
	profoundly supports).
  
	- "bz_done" or "bz_done files" - These are a record stored 
	on the customer's laptop of what has been backed up (what has been "done" 
	already).  These files are the most important data structure on the 
	customer's laptop and cannot ever be edited or deleted by the customer, or 
	the backup is hopelessly corrupted.  On Windows you can find these 
	files in C:\ProgramData\Backblaze\bzdata\bzbackup\bzdatacenter\ and on 
	Macintosh in /Library/Backblaze.bzpkg/bzdata/bzbackup/bzdatacenter/ and you 
	can think of it as one bz_done file is started every 3 days, with names like 
	this: "bz_done_20200813_0.dat".  The date in the filename is when it 
	was started, and it rolls to the next "date" after 3 or 4 days. 
	AGAIN, DO NOT CHANGE ONE SINGLE CHARACTER IN ANY 
	bz_done FILE, IT WILL CORRUPT THE  BACKUP.  You can 
	safely make a copy of one of these files into some other folder like 
	C:\temp\ and then open the copy safely in WordPad on Windows, or TextEdit on 
	the Mac.  The "bz_done" files are an APPEND ONLY format, by definition 
	they can only grow, because you append new information to the end of them.  
	They are a complete record of everything that has been sent to the Backblaze 
	datacenter.  The "bz_done" files are periodically encrypted and then 
	sent to Backblaze's datacenter, and when a customer signs into the website 
	and goes to "View/Restore" files, the "tree of files" that is seen on the 
	"View/Restore" web page is literally created by reading through the bz_done 
	file that was sent earlier.  You can watch a 57 minute tutorial on how 
	to understand the internals of bz_done files here: 
	https://www.youtube.com/watch?v=MOlz36nLbwA  You can also view a slide 
	used in that presentation that documents what every column does.
  
	- "bz_comb" or "bzcomb files" - When the Backblaze client 
	sends the bz_done files to the Backblaze datacenter, it first reads them 
	from the local laptop SSD into laptop RAM, then compresses the bz_done file, 
	then encrypts it the same way the Backblaze client encrypts all user files 
	before transmitting them, and sends the encrypted blob of data up through 
	HTTPS (double encryption) to Backblaze.  These encrypted blobs that 
	contain bz_done files are named "bz_comb" files inside the Backblaze 
	datacenter.  Customers will never hear this phrase; the term is mostly 
	used internally at Backblaze, and it's just important to know it is 
	identical in every way to a bz_done file, but compressed and encrypted.  
	The name "bz_comb" is very unfortunate, it means "combined file", and is a 
	historical term.  Inside of the bz_comb format, the file NAMES of 
	customer files are encrypted for absolute privacy, but a small amount of the 
	other information contained in bz_done files that is used for various 
	cleanup processes by the Backblaze datacenter are not, and the two separate 
	files are "combined" into one file by appending them end on end - simply to 
	increase upload speed over transmitting them separately as two separate 
	HTTPS requests.
  
	- "Inherit Backup State" - The "backup state" is primarily the "bz_done" 
	files on the customer's laptop, plus 3 or 4 other settings files.  It 
	is a list of what has already been backed up.  "Inherit Backup State" 
	is a feature in the Backblaze client where a customer can purchase a new 
	laptop, and avoid re-uploading all of their data files from scratch to the 
	Backblaze datacenter.  The customer avoids the re-upload by installing 
	a Backblaze trial, then the "Inherit" feature downloads the copy of the 
	bz_done files that are stored in the Backblaze datacenter to their current 
	laptop.  After that, if the customer made any local changes to their 
	directory structure or added new files, the normal backup processes detect 
	those changes within a few hours and happily continue forward doing 
	incremental backups to incorporate those new changes.
  
	- "log files" or "customer logs" - Friendly, easy to read 
	log files created on the customer laptop that are safe to browse (or even 
	change); they are absolutely not used by the backup program in any way other 
	than informational.  These files are found in this folder on the 
	customer laptop:  C:\ProgramData\Backblaze\bzdata\bzlogs\ on Windows, 
	and on Macintosh in /Library/Backblaze.bzpkg/bzdata/bzlogs/  Inside that 
	folder is a subfolder with the name of the executable that created the log 
	files, like "bztransmit" (see below this terminology section for a list of 
	executables), and inside that subfolder there is one log file 
	for each day the client performs a backup.  For instance, on Windows 
	there might be a log file named C:\ProgramData\Backblaze\bzdata\bzlogs\bztransmit\bztransmit16.log 
	which contains all the logs bztransmit created on the 16th day of the month.  
	The log files are compressed after 2 days to save customer disk space, and 
	the log files are deleted after 26 days so they do not grow forever on the 
	customer laptop.  You can open the log files with WordPad on Windows, and 
	TextEdit on the Mac - make the window very wide and turn off line 
	wrapping so the logs format better and are more readable.  If a 
	customer has an issue, I usually start debugging that issue by opening the 
	bztransmit log files and searching for the word "ERROR" all in capitals.  
	Just because a line says "ERROR" doesn't mean it is the customer's issue - for 
	example if in the middle of an HTTPS transmit of one customer data file to 
	the Backblaze datacenter the customer WiFi is turned off, this will say 
	"ERROR" in the log file.  But if there are HUNDREDS of the same ERROR 
	in the log files it usually points to the problem.
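The "search for ERROR, then look for hundreds of the same one" habit described in the "log files" entry above can be sketched in a few lines of Python.  This is only an illustration: the document does not specify the layout of a log line, so the sketch assumes (hypothetically) that everything from the word "ERROR" to the end of the line identifies the problem, and the sample lines are made up.

```python
from collections import Counter


def summarize_errors(lines):
    """Count repeated ERROR messages, most frequent first.

    Assumption (not from the Backblaze docs): the text from "ERROR" to the
    end of the line is a good enough key for grouping identical problems.
    """
    counts = Counter()
    for line in lines:
        pos = line.find("ERROR")  # all capitals, as described above
        if pos != -1:
            counts[line[pos:].strip()] += 1
    return counts.most_common()


# Made-up sample lines, only to show the shape of the result.
sample = [
    "17:34:07 ERROR connection lost",
    "17:34:08 uploaded one file",
    "17:35:10 ERROR connection lost",
    "17:36:00 ERROR disk full",
]
for message, count in summarize_errors(sample):
    print(count, message)
```

A single occurrence (like WiFi dropping mid-upload) is usually noise; a message at the top of this list with a very large count is the one worth investigating first.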
  
 
Overview of Executables that Make Up the Backblaze Client:
The "Backblaze client" is actually a collection of the 8 executables listed 
below.  Each executable runs at different times and for entirely different 
reasons.
	- bzserv - the core Backblaze service that MUST run all the 
	time for any backup to occur.  It launches other executables (see 
	below) to perform the actual backup.  bzserv has no UI components, and 
	is therefore largely cross platform between Windows and Macintosh.
 
Location on Disk Windows: 
	C:\Program Files (x86)\Backblaze\bzserv.exe
Location on Disk Macintosh: 
	/Library/Backblaze.bzpkg/bzserv
 
	Purpose of bzserv: The process "bzserv" runs all the time as a 
	"service" on the laptop, in order to launch the OTHER executables at the 
	correct times.  The other executables (see below) perform the actual 
	backup; bzserv can be thought of as "a scheduler" whose primary job is to 
	NOT take any CPU and NOT take any RAM and NOT take any SSD 
	performance, but to keep running at all costs.  bzserv is required to 
	run as a service all the time the laptop is running (not shut down), even 
	when the customer is logged out of their laptop, even if the customer is 
	using the backup schedule "Only When I Click <Backup Now>" or "Once Per 
	Day".  Any customer who shuts down bzserv is provably insane, doesn't 
	know what they are doing, and is not qualified to operate a computer because 
	bzserv takes no computer resources, period.  The only valid way of 
	stopping bzserv from running is to entirely uninstall the Backblaze client. 
	If bzserv is not running, that is what is causing ALL of the customer's 
	problems, and it is the number one problem to solve, the first problem to 
	solve, and the ONLY problem to solve.  bzserv runs as the user "SYSTEM" 
	on Windows, and as the user "root" on Macintosh.
Resource Load bzserv puts on customer laptop: 0.00000001% of one core of CPU (bzserv 
	is one single thread), and 0.000001% extra load on the SSD, and about 3 
	MBytes of RAM (0.0375% of an 8 GByte RAM computer - far far less than 1% of 
	the customer RAM - 3.7 hundredths of 1% of the customer RAM).
 
	Click here for a deeper description and analysis of 
	what bzserv does.
  
	- bzfilelist - walks the entire file system on the SSD on the 
	customer's laptop looking for new and changed files to backup.  
	bzfilelist is always launched by bzserv.  bzfilelist creates lists of 
	files, but absolutely does not transmit them anywhere.  bzfilelist 
	completely lacks the ability to do network HTTPS communication, it 
	profoundly cannot do anything but create lists of files for other 
	executables to consume.  bzfilelist has no UI components, and is 
	therefore largely cross platform between Windows and Macintosh.  
	bzfilelist runs as the user "SYSTEM" on Windows, and as the user "root" on 
	Macintosh because it is always launched by parent process "bzserv" (see 
	above).
 
Location on Disk Windows: 
	C:\Program Files (x86)\Backblaze\bzfilelist.exe
Location on Disk Macintosh: 
	/Library/Backblaze.bzpkg/bzfilelist
	 
	Purpose of bzfilelist: The primary purpose of "bzfilelist" is to 
	create the complete list of filenames with associated modification dates for 
	each attached SSD or hard drive.  Each SSD's (each "volume's") list of files is stored in 
	a separate file.  These lists of filenames with their last modification 
	date are found at C:\ProgramData\Backblaze\bzdata\bzfilelists\ on Windows, 
	and /Library/Backblaze.bzpkg/bzdata/bzfilelists/ on the Macintosh.  The 
	name of the list of files starts with the BzVolumeGuid.  For the 
	primary boot (system) volume that BzVolumeGuid begins with letters "v000", 
	then subsequent drives start with "v001" and then "v002" and so on.  
	Here is an example of the list's filename from Windows: C:\ProgramData\Backblaze\bzdata\bzfilelists\v000c0101f6fb58de90a713a0e19_c____filelist.dat 
	and you can open that with WordPad on Windows, or TextEdit on the Macintosh.  
	The name "v000c0101f6fb58de90a713a0e19_c____filelist.dat" always starts with 
	the "volume guid" then has an underbar, then a friendly description of the 
	drive (in my example above this is "_c____" to indicate this is the "C:\" 
	Windows drive; on Macintosh the system boot drive would have the string 
	"_root_"), then always ends with "filelist.dat".  These "per drive 
	lists of files" are produced approximately once per hour, but it might be 
	once every two hours, or even longer for customers with extremely large 
	volumes.  There is a guarantee that the list of files with the name 
	above is ALWAYS VALID and ALWAYS PRESENT for other programs to read and use, 
	but it might be 1 or 2 hours "out of date" waiting for the next list of 
	files to be produced.  If a new list of files is being produced by 
	bzfilelist the new INCOMPLETE list of files has the same name, but at the 
	end of it is appended "_future".
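As a rough illustration of the naming convention just described, here is a minimal Python sketch that splits one of these filenames into its parts.  The function name and the returned field names are hypothetical, and the sketch assumes that "_future" is appended after the ".dat" ending and that the volume guid is exactly the first 28 characters.

```python
VGUID_LENGTH = 28          # a volume guid is 28 characters and starts with "v"
LIST_SUFFIX = "filelist.dat"
FUTURE_SUFFIX = "_future"  # assumption: appended after ".dat" while a list is being built


def parse_filelist_name(name):
    """Split a bzfilelist output filename into volume guid, drive label, flags."""
    in_progress = name.endswith(FUTURE_SUFFIX)
    if in_progress:
        name = name[: -len(FUTURE_SUFFIX)]
    if (not name.startswith("v") or not name.endswith(LIST_SUFFIX)
            or len(name) < VGUID_LENGTH + len(LIST_SUFFIX)):
        raise ValueError("not a filelist name: " + name)
    vguid = name[:VGUID_LENGTH]
    return {
        "vguid": vguid,
        "label": name[VGUID_LENGTH:-len(LIST_SUFFIX)],  # e.g. "_c____"
        "is_boot_volume": vguid.startswith("v000"),
        "in_progress": in_progress,
    }


print(parse_filelist_name("v000c0101f6fb58de90a713a0e19_c____filelist.dat"))
```

A reader of these lists would normally skip any name flagged as in progress and use the complete list instead, since only the complete list is guaranteed to be valid.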
Inside of one of these lists named 
	things like "v000c0101f6fb58de90a713a0e19_c____filelist.dat" the very first 
	line inside that file is when that list of files was created, it looks like:
	# GmtMillisThisListWasStarted: 00000173f855ec2b, GmtDateTime: 20200816173407
	Those are actually the identical date and time, the first one is the number 
	of milliseconds since 1970, and the second one is human readable and says it 
	is year "2020", month "08", day "16", then hours, minutes, and seconds.
	After that first line, the rest of the contents are pretty self explanatory.  
	The first letter on each line is an "f" for a file, a <tab> character, then 
	the last modified timestamp (in milliseconds since 1970), then another <tab> 
	character, then the number of bytes contained in the file, then another 
	<tab> character, then the filename in completely pure (non-encoded) Utf8.  
	When the character '\n' (end of line) is encountered, that marks the end of 
	that one filename.  Because this Utf-8 is not encoded in any way, this 
	is extremely fast, there is no encode or decode step.
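To make the file format above concrete, here is a small Python sketch that reads the header line and one entry line.  It is a sketch under assumptions, not Backblaze's parser: the per-file timestamp is assumed to be 16 hexadecimal digits like the header value, the byte count is assumed to be written in decimal (the document does not say), and the function names and the sample entry line are made up.

```python
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
HEADER_RE = re.compile(
    r"# GmtMillisThisListWasStarted: ([0-9a-f]{16}), GmtDateTime: (\d{14})"
)


def java_millis_to_utc(hex_millis):
    """Convert hexadecimal milliseconds-since-1970 ("Java Time") to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=int(hex_millis, 16))


def parse_header(line):
    """Return (start time as datetime, human readable stamp) from the first line."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("not a filelist header line")
    return java_millis_to_utc(match.group(1)), match.group(2)


def parse_entry(line):
    """Parse one 'f <tab> mtime <tab> size <tab> filename' line."""
    kind, mtime_hex, size_text, name = line.rstrip("\n").split("\t", 3)
    if kind != "f":
        raise ValueError("unexpected entry type: " + kind)
    return {
        "modified": java_millis_to_utc(mtime_hex),  # assumption: hex, as in the header
        "size": int(size_text),                     # assumption: decimal byte count
        "name": name,                               # plain Utf-8, no decoding step
    }


header = "# GmtMillisThisListWasStarted: 00000173f855ec2b, GmtDateTime: 20200816173407"
started, stamp = parse_header(header)
print(started.strftime("%Y%m%d%H%M%S") == stamp)  # True: both fields are the same instant
```

Splitting on at most three tabs means a filename that itself contains a tab character still comes back in one piece.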
Resource Load bzfilelist puts on customer laptop: bzfilelist only runs for 
	maybe 10 minutes once an hour on most 
	customers' laptops.  It is designed to use less than 1% of one core of 
	CPU (bzfilelist is one single thread), and less than 1% extra load on the 
	SSD, and while it is running bzfilelist might use about 20 MBytes of RAM or 
	less (0.25% of an 8 GByte RAM computer - one fourth of 1% of the customer 
	RAM).
 
	Click here for a deeper description and analysis 
	of what bzfilelist does.
  
	- bztransmit - sends copies of new and changed files through HTTPS 
	to the Backblaze datacenter.  bztransmit is always launched by bzserv.  
	This is the main work horse of the Backblaze client, and where most of the 
	logic surrounding the backups occurs.  bztransmit has no UI components, 
	and is therefore largely cross platform between Windows and Macintosh.  
	bztransmit runs as the user "SYSTEM" on Windows, and as the user "root" on 
	Macintosh because it is always launched by parent process "bzserv" (see 
	above).
 
Location on Disk Windows: 
	C:\Program Files (x86)\Backblaze\bztransmit.exe (and a 64 bit version at 
	C:\Program Files (x86)\Backblaze\x64\bztransmit64.exe)
Location on Disk Macintosh: 
	/Library/Backblaze.bzpkg/bztransmit 
	(the Macintosh version is ONLY 64 bit, Apple has not shipped a 32 bit laptop 
	in over a decade)
	 
	Purpose of bztransmit: The primary purpose of "bztransmit" is 
	to make the logical decision of which files to backup (based on the lists 
	prepared by bzfilelist), read those customer files into RAM, compress them 
	(lossless compression), then encrypt the compressed file (still held in 
	RAM), then send this encrypted file over HTTPS (double encryption) to the 
	Backblaze datacenter as a backup.  The bztransmit executable also 
	updates the local laptop's record of what files with what "last modification 
	date" have already been sent to Backblaze so that it doesn't have to back up 
	a file twice.  The records of what files have already been backed up are 
	called the "bz_done" files.
The encryption bztransmit uses is symmetric AES-128 to encrypt each file, with 
	a new AES key and Initialization Vector generated for every customer data 
	file.  The AES keys themselves are then encrypted with 2048 bit 
	public/private key encryption using the customer's public key, which is 
	stored on the client in C:\Program Files (x86)\Backblaze\userPub.pem on 
	Windows, and /Library/Backblaze.bzpkg/userPub.pem on Macintosh.  The 
	"private key" is stored on the Backblaze servers, and the private key ITSELF 
	is encrypted with a totally standard OpenSSL "passphrase".  By default this 
	is a closely guarded secret passphrase only known by Backblaze and rotated 
	(changed) regularly for security reasons.  However, the customer can 
	optionally choose to set up a "Private Encryption Key" in the Backblaze 
	client which changes the passphrase to something only the customer knows.  
	This passphrase is required to restore any files from the backup, and is not 
	recoverable in any way, so the customers who set this up need to remember it 
	or their backup is useless.
	 
	What Order does bztransmit upload files?  Which files are backed up 
	first?  In general, Backblaze backs up in file size order, small 
	files first.  Backblaze DOES NOT backup folders, or folders of files, 
	or group the backup into folders.  All files are individual files.  
	So the client will backup 1 small file from one folder, then backup another 
	small file from a different folder next, then return to the first folder to 
	backup any larger files.  This often confuses customers who are closely 
	watching their backup, and they think Backblaze has "skipped over" one of 
	their files in a folder, when in reality Backblaze will loyally return to 
	that folder when it is backing up larger files.  There are exceptions 
	to the rule of "small files first" as follows: 1) Backblaze really wants to 
	backup at least one file from every different volume FIRST, so the smallest 
	file from each of the attached SSDs or Hard Drives is put to the start of 
	the queue - this is to give the customer immediate feedback that Backblaze 
	saw and acknowledged that each volume is part of the backup and has been 
	refreshed recently, and 2) If the backup was paused or the laptop shut down 
	in the middle of transmitting a large file, Backblaze attempts to complete 
	the transmission of that large file before going back to start at the small 
	files.  This is to avoid wasting all the effort that went into backing 
	up half of a very large file every time the laptop goes to sleep.
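The ordering rules above can be sketched roughly as follows.  This is a simplified illustration, not bztransmit's actual queue: the (volume, path, size) tuples and the function name are hypothetical, the sample files are made up, and the second exception (finishing an interrupted large file first) is left out because the document does not say how it interleaves with the first exception.

```python
def upload_order(files):
    """Order files for upload: smallest file of each volume first, then small files first.

    `files` is a list of (volume, path, size_in_bytes) tuples (hypothetical shape).
    """
    by_size = sorted(files, key=lambda f: f[2])

    # Exception 1: the smallest file from every volume goes to the front of the queue.
    smallest_per_volume = {}
    for entry in by_size:
        smallest_per_volume.setdefault(entry[0], entry)
    head = list(smallest_per_volume.values())

    # Everything else follows in plain size order, regardless of folder.
    rest = [f for f in by_size if smallest_per_volume[f[0]] is not f]
    return head + rest


files = [
    ("C", "docs/a.txt", 5),
    ("C", "docs/b.bin", 500),
    ("E", "video/big.mov", 9000),
    ("E", "photos/mid.jpg", 700),
    ("C", "music/c.txt", 10),
]
for volume, path, size in upload_order(files):
    print(volume, path, size)
```

Note how "docs/b.bin" is uploaded well after "docs/a.txt" even though they share a folder, which is exactly the behavior that makes customers think a file was skipped.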
	 
	How does bztransmit handle really small files? For any file smaller 
	than 15 MBytes, bztransmit "batches" up to 999 of these files 
	into one packed data structure for more efficient transmitting.  The 
	HTTPS setup and teardown murder performance for small files, so doing an 
	individual HTTPS POST for each 1 byte or 2 byte file is painfully slow.  
	So bztransmit fully prepares each file in all the standard ways (read the 
	file, compress the file, encrypt the file), then appends the finished packages 
	end-on-end, and transmits them to the Backblaze datacenter as one HTTPS POST 
	operation.  It still adheres to "small files first", so initially 
	this can be 999 files of 1, 2, or 3 bytes each, so the 
	finished HTTPS POST operation is still relatively small at 1 - 3 KBytes in 
	total length.  But this is approximately 1,000 times faster than doing 
	each file individually, so it is worth it.  Later on, as the "batch of 
	files" is being assembled, bztransmit stops appending more compressed, 
	encrypted small files when the size of any one "batch" file gets larger than 
	30 MBytes in size.  So there might only be 2 files in a single "batch" 
	HTTPS POST when bztransmit is transmitting 14 MByte files.  Once each 
	individual file is larger than 15 MBytes the setup and teardown of HTTPS 
	shrinks to be less than 1% of the transmission time, so bztransmit does one 
	HTTPS POST for each file 15 MBytes or larger.  The cutoff of 15 MBytes 
	was originally decided so that a single HTTPS POST would not time out even 
	on the very slowest customer connections.
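The batching rule can be sketched in Python like this.  It is a simplified model under stated assumptions: it works on raw file sizes rather than the compressed, encrypted sizes bztransmit would actually append, it treats 1 MByte as 1,000,000 bytes, and it interprets the 30 MByte limit as "do not add a file that would push the batch past 30 MBytes", which reproduces the "only 2 files per batch at 14 MBytes" example above.  All names are hypothetical.

```python
MB = 1_000_000                    # assumption: decimal megabytes
SMALL_FILE_LIMIT = 15 * MB        # files below this size are batched
MAX_FILES_PER_BATCH = 999
MAX_BATCH_BYTES = 30 * MB


def plan_posts(sizes):
    """Group file sizes into HTTPS POSTs, small files first.

    Returns a list of batches; each batch is a list of file sizes sent in one POST.
    """
    posts, current, current_bytes = [], [], 0
    for size in sorted(sizes):
        if size >= SMALL_FILE_LIMIT:
            # Files of 15 MBytes or more are not batched with others.
            if current:
                posts.append(current)
                current, current_bytes = [], 0
            posts.append([size])
            continue
        batch_full = (len(current) >= MAX_FILES_PER_BATCH
                      or current_bytes + size > MAX_BATCH_BYTES)
        if current and batch_full:
            posts.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        posts.append(current)
    return posts


print([len(batch) for batch in plan_posts([1] * 1000)])      # [999, 1]
print([len(batch) for batch in plan_posts([14 * MB] * 4)])   # [2, 2]
```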
	 
	How bztransmit handles large files: For any files between 15 MBytes - 100 MBytes, bztransmit reads the file from disk, compresses the file, encrypts the file, 
	and transmits it to the Backblaze datacenter as one unit.  For large 
	files this becomes impractical, so for files larger than 100 MBytes 
	bztransmit FIRST makes an entire copy of the file broken down into 10 MByte 
	"chunks" that are found in C:\ProgramData\Backblaze\bzdata\bzbackup\bzdatacenter\bzcurrentlargefile\ 
	on Windows, and /Library/Backblaze.bzpkg/bzdata/bzbackup/bzdatacenter/bzcurrentlargefile/ 
	on Macintosh.  That folder can be changed to an external drive if the 
	customer changes the "Temporary data drive" in the "Settings..." client 
	panel.  The original "cutoff" for self contained files was 30 MBytes 
	(not the current 100 MBytes) for two reasons: 99% of customer files were 
	smaller than 30 MBytes each, and that was small enough where the HTTPS POST 
	of a single file did not time out on even the slowest customer connections.  
	I raised this to 100 MBytes in 2018 because both reasons had changed.  The 
	very slowest upload connections any customer had was now at least 3x faster 
	making it possible, and a lot of large images and some music files were no 
	longer fitting inside 30 MBytes in a single file, but 99% of individual 
	files still fit inside of 100 MBytes.  So that is the modern cutoff 
	point.
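Putting the size thresholds from this section side by side, a rough Python sketch of the decision might look like the following.  The function name and return shape are hypothetical, 1 MByte is assumed to mean 1,000,000 bytes, and the exact handling of a file of precisely 100 MBytes is an assumption since the text only says "between 15 - 100" and "larger than 100".

```python
MB = 1_000_000           # assumption: decimal megabytes
BATCH_LIMIT = 15 * MB    # below this, files are batched together
LARGE_LIMIT = 100 * MB   # above this, files are split into chunks first
CHUNK_SIZE = 10 * MB


def transmit_plan(size_bytes):
    """Return (strategy, number_of_pieces) for one file of the given size."""
    if size_bytes < BATCH_LIMIT:
        return ("batched", 1)
    if size_bytes <= LARGE_LIMIT:
        return ("single", 1)
    chunks = -(-size_bytes // CHUNK_SIZE)  # ceiling division, integers only
    return ("chunked", chunks)


for size in (3 * MB, 40 * MB, 250 * MB):
    print(size, transmit_plan(size))
```

For example, a 250 MByte file comes back as 25 chunks of 10 MBytes each, which is why the temporary folder needs about as much free space as the largest file being backed up.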
	 
	Threading and Bandwidth Utilization in bztransmit: bztransmit has two 
	modes: threaded and non-threaded.  If a customer sets the number of 
	threads to "1" in the Backblaze "Settings..." (see the "Performance" tab) 
	then the customer can control the amount of bandwidth the client uses 
	(change the "Throttle" slider) to be as low as 128 Kbits/sec upload rate, up 
	to about 10 Mbits/sec which is the approximate maximum upload speed of 1 
	thread (unthrottled).  However, the maximum upload speed for 1 thread 
	varies depending on how far the customer's laptop is from the Backblaze 
	datacenter due to latency issues (for example, the maximum upload speed from 
	New Zealand might be only 1 Mbit/sec), and other things can affect the 
	maximum upload speed like the size of files (small files murder upload 
	performance due to the setup and teardown overhead of HTTPS).  Setting 
	the client to use 1 upload thread is also the most SSD efficient setting, 
	the very minimum number of copies of any file are made this way, usually 
	this is "zero copies" - bztransmit reads the file from SSD into RAM, 
	compresses it in RAM, encrypts it in RAM, and transmits it from RAM through 
	HTTPS to the Backblaze datacenter.  Ok, so the OTHER mode of bztransmit 
	occurs when the customer sets the number of threads to "2" or more, and it 
	can go up to 30 threads.  Each thread runs at maximum speed, so with 30 
	threads at 10 Mbits/sec it is possible for the customer to use up to 300 
	Mbits/sec of upload capacity (if they are close enough to the Backblaze 
	datacenter, and if they have a fast enough SSD that can keep up).  When 
	using 2 - 30 threads, Backblaze makes a copy of each file before handing the 
	copy off to a unique thread, so the threaded mode of operation requires 1 
	more temporary copy of each file be made on the SSD.
A note about thread names: bztransmit uses "full memory protected processes" 
	to implement threading.  So that the names of the threads are unique, the 
	Backblaze installer makes IDENTICAL (down to the last byte) copies of the 
	bztransmit executable with unique names like this on Windows:
     C:\Program Files (x86)\Backblaze\x64\bztrans_thread00.exe
	     C:\Program Files (x86)\Backblaze\x64\bztrans_thread01.exe
	     C:\Program Files (x86)\Backblaze\x64\bztrans_thread02.exe
	     C:\Program Files (x86)\Backblaze\x64\bztrans_thread03.exe
	     C:\Program Files (x86)\Backblaze\x64\bztrans_thread04.exe
	       .... etc ....
	     C:\Program Files (x86)\Backblaze\x64\bztrans_thread18.exe
	     C:\Program Files (x86)\Backblaze\x64\bztrans_thread19.exe
 
	It is the same on the Macintosh, but found in the folder /Library/Backblaze.bzpkg/ 
	with the same names as above.  By assigning uniquely named executables 
	to do each task, customers can watch the (now "named") threads come and go 
	in "Activity Monitor" on the Macintosh, and "Task Manager" on Windows.   
	Now, you might notice there are only 20 of these executable names numbered 
	"00" - "19" and you can use up to 30 threads.  At some point this 
	system is silly and wastes customer disk space, so when bztransmit is using 
	21 - 30 threads it assigns a task that is supposed to be done by 
	"thread25.exe" to a unique thread of course, but it uses the executable 
	named "bztrans_thread05.exe" to do the task.  In other words, if the 
	thread is numbered #20-29 then subtract 10 from the actual thread number to 
	know which executable name was used.
	Resource Load bztransmit puts on customer laptop: 
	bztransmit does all the encryption, and all the network communication for 
	Backblaze's client, and depending on the customer settings and the size 
	of the customer data it can use as little as 
	100 MBytes of RAM and a very small CPU load (5% of one core), or it can cause quite a bit of 
	load and RAM use, and use all 16 cores of CPU at the same time.  The 
	worst case situation is this: the customer is using 30 threads and they have 
	a lot of 100 MByte files, which means each of the 30 bztransmit processes 
	could be holding 100 MBytes, for a total of 3 GBytes of RAM just for the data 
	in memory, and it might actually come close to using 4 GBytes of RAM 
	when you include the extra data structures to figure out what to backup.  
	Now that is up to the customer, and a customer who wants to backup all night 
	long and has a modern laptop with 16 GBytes of RAM won't even notice it.  
	But a customer with only a slightly old 8 GByte RAM laptop trying to use 
	their laptop during the middle of the day might want to set Backblaze to 
	only use 10 threads to keep their RAM use way down low at less than 1 GByte.  
	And any laptop not in the massive initial upload state can EASILY keep up 
	with only 4 or 5 threads backing up in the middle of the day and the load 
	will be quite minimal.
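
	The worst case arithmetic above can be written out as a rough estimate. This is a back-of-the-envelope sketch, not Backblaze code: the function name is made up for illustration, and it only counts the file data held in memory (100 MBytes per thread, the worst case from the text), not the extra data structures that push the 30 thread case closer to 4 GBytes.

```python
# Worst case from the text: every thread is holding one 100 MByte file in RAM.
MBYTES_PER_THREAD = 100
MAX_THREADS = 30


def worst_case_buffer_mbytes(threads: int) -> int:
    """Upper bound on RAM (in MBytes) used just for file data in flight.

    Hypothetical helper for illustration only; it ignores the bookkeeping
    data structures mentioned in the text.
    """
    if not 1 <= threads <= MAX_THREADS:
        raise ValueError("threads must be between 1 and 30")
    return threads * MBYTES_PER_THREAD


for threads in (5, 10, 30):
    mbytes = worst_case_buffer_mbytes(threads)
    print(f"{threads} threads: up to {mbytes} MBytes ({mbytes / 1000:.1f} GBytes)")
```

	With these assumptions, 30 threads gives the 3 GBytes figure quoted above, and dropping to 10 threads caps the file data at roughly 1 GByte.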
 
	Click here for a deeper description and analysis 
	of what bztransmit does.
  
	- bzbui (called "bzbmenu" on the Macintosh Activity Monitor) 
	- this is the client's local laptop GUI (Graphical User Interface).  
	Because bzbui is all UI components, it is written in different languages on Windows 
	(C++) and Macintosh (Objective C), and shares very little code.  For a 
	customer to bring up the bzbui GUI, in Windows they click on a "Backblaze 
	red flame" icon in the system tray.  On the Macintosh, they pull down 
	the "black flame" icon along the very top right of their monitor, or go to 
	the Macintosh System Preferences and click on the "Backblaze" system pref.  
	bzbui (bzbmenu on the Macintosh) runs as the current user logged in, so that 
	it has permissions to access the keyboard and mouse for input.  bzbui (bzbmenu 
	on the Macintosh) does not run AT ALL unless the user is currently fully 
	logged into their laptop with their local laptop's username and password 
	(completely different than the Backblaze account username which is an email 
	address and Backblaze account password).
 
Location on Disk Windows: C:\Program Files (x86)\Backblaze\bzbui.exe
Location on Disk Macintosh: /Library/Backblaze.bzpkg/bzbmenu.app 
	(and a Macintosh System Pref Panel)
	 
	Purpose of bzbui: The primary purpose of "bzbui" is to present 
	the customer with a local interface to the Backblaze client and local 
	controls for things running on their local laptop.  The most essential 
	thing that bzbui does is edit the file "bzinfo.xml" which is found at C:\ProgramData\Backblaze\bzdata\bzinfo.xml 
	on Windows, and /Library/Backblaze.bzpkg/bzdata/bzinfo.xml on the Macintosh.  
	The file "bzinfo.xml" is the configuration and instructions for how all the 
	OTHER (background) client executables behave.  For example, if a 
	customer adds a folder to exclude using bzbui, that excluded folder path is 
	added to bzinfo.xml, and so on.  Most everything that occurs in the GUI 
	presented by bzbui simply edits the file bzinfo.xml on the local laptop's 
	SSD.
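
	To make the "GUI just edits a config file" idea concrete, here is a minimal sketch of adding an excluded folder to an XML configuration. The real layout of bzinfo.xml is NOT documented here, so the element and attribute names below ("bzinfo", "excluded_folders", "folder", "path") are hypothetical placeholders, and the sketch works on an in-memory string rather than the real file.

```python
import xml.etree.ElementTree as ET


def add_excluded_folder(xml_text: str, folder: str) -> str:
    """Return a copy of the config XML with `folder` added to the exclude list.

    All element and attribute names are hypothetical; they are not the
    real bzinfo.xml schema.
    """
    root = ET.fromstring(xml_text)
    excludes = root.find("excluded_folders")
    if excludes is None:
        excludes = ET.SubElement(root, "excluded_folders")
    # Do not add the same folder twice.
    already_listed = {entry.get("path") for entry in excludes.findall("folder")}
    if folder not in already_listed:
        ET.SubElement(excludes, "folder", {"path": folder})
    return ET.tostring(root, encoding="unicode")


sample = "<bzinfo><excluded_folders/></bzinfo>"  # made-up stand-in for the config
updated = add_excluded_folder(sample, r"C:\Users\example\Scratch")
again = add_excluded_folder(updated, r"C:\Users\example\Scratch")
print(updated)
```

	The point of the design is that the GUI and the background executables never have to talk to each other directly: the GUI writes the file, and the background processes read it.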
The executable bzbui (bzbmenu on the Macintosh) runs as the 
	current user logged in, so that it has access to the GUI.  It is 
	COMPLETELY unnecessary for this to run for the backup to continue.  You can 
	prove this by logging out of the local laptop's account: the backup will 
	continue just fine (better, even) compared to when the user is signed in and 
	bzbui/bzbmenu is running.  It is silly to disable/kill this process as it is so 
	ridiculously light weight, but the process is completely optional and 
	killing it will not affect the backup's progress at all.  Sometimes 
	customers are confused by this: they feel like if they kill this process the 
	backup should stop, but it has literally nothing to do with the backup 
	progress other than writing out configuration files.
	One of the other things bzbui (bzbmenu on the Macintosh) does is that it can 
	"Pause" a running backup (by clicking the GUI button <Pause Backup>) and it 
	can unpause (start the backup again) later if you click the <Backup Now> 
	button.  
  
Another responsibility of bzbui (bzbmenu on the 
	Macintosh) is to pop up warning and error dialogs if something is wrong, 
	like if the backup is not progressing for some reason.  For example, if 
	the customer's credit card is totally maxed out at the limit, and the 
	payment to Backblaze fails, then Backblaze will both send emails (from the 
	datacenter), and also pop up dialogs on the client to explain the customer 
	needs to fix the billing problem.  In general the customer has 45 days 
	to fix a billing problem, but if they refuse to pay Backblaze for more than 
	45 days their backup will be deleted from the Backblaze servers to free up 
	space for other (paying) customers.  Another thing bzbui/bzbmenu will 
	pop up a warning dialog about is if the customer has gone too long without 
	plugging in one of their external drives that is "selected for backup" and 
	runs some danger of losing the backup of that one drive.  Another 
	important aspect of bzbui/bzbmenu is to monitor that bzserv is running.  
	The way it does this is bzserv writes out a "heartbeat" file once every 10 
	minutes as a kind of "dead man's switch" to prove it is running properly.  
	If the heartbeat file is missing (not updated) for more than 30 minutes, 
	bzbui/bzbmenu pops up an error dialog explaining there is a VERY PROFOUND 
	problem that must be fixed or the backup cannot continue - since bzserv is 
	required to be resident and running so that it can launch the other backup 
	processes.
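
	The heartbeat check described above can be sketched in a few lines. This is only an illustration of the "dead man's switch" idea: the file name is a made-up placeholder, and using the file's modification time as the "last heartbeat" is an assumption, not a description of how bzbui actually reads the heartbeat.

```python
import os
import tempfile
import time

STALE_AFTER_SECONDS = 30 * 60  # threshold from the text: more than 30 minutes


def heartbeat_is_stale(path, now=None):
    """Return True if the heartbeat file is missing or has not been updated
    within the last 30 minutes (assumes mtime marks the last heartbeat)."""
    if now is None:
        now = time.time()
    try:
        last_beat = os.path.getmtime(path)
    except OSError:
        return True  # a missing file counts as a missed heartbeat
    return (now - last_beat) > STALE_AFTER_SECONDS


# Demo with a throwaway file; "heartbeat_example.txt" is a hypothetical name.
with tempfile.TemporaryDirectory() as folder:
    heartbeat = os.path.join(folder, "heartbeat_example.txt")
    with open(heartbeat, "w") as f:
        f.write("alive\n")
    written = os.path.getmtime(heartbeat)
    fresh = heartbeat_is_stale(heartbeat, now=written + 10 * 60)
    stale = heartbeat_is_stale(heartbeat, now=written + 31 * 60)
    missing = heartbeat_is_stale(os.path.join(folder, "no_such_file"))

print(fresh, stale, missing)  # prints: False True True
```

	Because the writer beats every 10 minutes and the reader waits 30, two heartbeats in a row can be late or lost before the error dialog appears.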
 
The bzbui/bzbmenu process has a few other 
	miscellaneous tasks available in its small pull down menu such as "Inherit 
	Backup State" and displaying an "About..." dialog with the version of the 
	client that is currently installed.
Resource 
	Load bzbui / bzbmenu puts on customer 
	laptop: bzbui / bzbmenu is extremely small and efficient, ESPECIALLY when the 
	interface is not up on the screen (which is how most customers run Backblaze 
	99.9999% of the time when not changing any configurations).  It is designed to use less than 
	0.001% of one core of 
	CPU (bzbui is one single thread), and less than 0.001% extra load on the SSD, 
	and it might use at most about 30 MBytes - 40 MBytes of RAM 
	(0.5% of an 8 GByte RAM computer - one half of 1% of the customer 
	RAM).  It should be one of the smallest RAM uses of any process on a 
	customer's laptop.
 
	Click here for a deeper description and analysis 
	of what bzbui / bzbmenu does. 
  
	- Honorable Mention: bzfclean - This process is never run for any 
	reason normally.  It is only run as the very very final step of 
	"Uninstall" of the entire Backblaze client.
 
Location on Disk Windows: C:\Program Files (x86)\Backblaze\bzfclean.exe
Location on Disk Macintosh: /Library/Backblaze.bzpkg/bzfclean 
	 
	Purpose of bzfclean: This absolutely tiny (4 KBytes) program has no 
	GUI, and it is run as the very final step when a customer uninstalls the 
	entire Backblaze client from their local laptop.  Backblaze prides 
	itself on a completely clean uninstall - no registry entries left behind, 
	and zero files or folders left behind on the customer's laptop.  On 
	Windows computers, it is difficult to uninstall the very last executable 
	running the uninstaller, because running an executable on Windows means you 
	cannot also delete it.  An executable cannot delete itself.  To 
	work around this issue, Backblaze copies bzfclean to a temporary folder that 
	Windows will clean up automatically at a later date and RUNS IT FROM THAT 
	LOCATION.  When the bzfclean executable is run as the final step in the 
	uninstaller, the uninstaller runs bzfclean and then IMMEDIATELY exits itself 
	(unlocking its own executable).  So when bzfclean runs, it first wakes 
	up as a running process, and then it very consciously "pauses" itself for 2 
	or 3 seconds to let the uninstaller exit and quit running, then this tiny 
	little executable reaches back and deletes the uninstaller, leaving no trace 
	behind.
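
	The "wait, then delete the thing that launched you" trick can be sketched like this. It is an illustration in Python rather than the actual bzfclean source: the function name is made up, and the retry loop on PermissionError is an added assumption (the text only describes the initial pause of 2 or 3 seconds).

```python
import os
import tempfile
import time


def delete_when_unlocked(path, initial_wait=2.0, retries=5, retry_wait=0.5):
    """Pause so the launching process can exit, then delete its file.

    Returns True once the file is gone. The retries are an assumption added
    for robustness; the text only mentions the initial pause.
    """
    time.sleep(initial_wait)  # give the uninstaller time to exit and unlock itself
    for _ in range(retries):
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            return True  # already gone, nothing left to clean up
        except PermissionError:
            time.sleep(retry_wait)  # still locked (typical on Windows), try again
    return False


# Demo on a throwaway file standing in for the uninstaller executable.
fd, victim = tempfile.mkstemp()
os.close(fd)
removed = delete_when_unlocked(victim, initial_wait=0.1)
print(removed, os.path.exists(victim))  # prints: True False
```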
  
	- Honorable Mention: bzdoinstall - The Backblaze client installer 
	is a self contained executable that also includes all the files and 
	executables to be installed inside of itself.  This is called a "Self 
	Extracting Archive" in old computer science terms.  Backblaze does not 
	use an "off the shelf installer" like "InstallShield" 
	or "Wise Installer", 
	the installer is written and maintained in house by the client software 
	engineers. When the Backblaze client installer runs, the self extracting 
	program (internally in the Backblaze build tree this is called "bzserlfextractor") 
	unpacks all of its internally contained files to install into a TEMPORARY 
	folder first, including "bzdoinstall".  Then the self extracting 
	program is all done with its primary task, and the final step is to launch 
	"bzdoinstall" which has a GUI to present to the customer so the customer can 
	enter their customer email address and Backblaze password to complete the 
	install.  The executable "bzdoinstall" authenticates with the Backblaze 
	website, copies the executables to their correct final locations, and 
	finally presents a progress dialog to the customer as the laptop's SSD is 
	scanned for the very first time for the initial list of files to upload.
  
	- Honorable Mention: bzdownloader - This is technically not part of 
	the Backblaze client in that it has nothing at all to do with backing up the 
	computer.  bzdownloader does not run EVER as part of the backup 
	process.  What bzdownloader does is help customers download their free 
	ZIP file restores they have prepared on the Backblaze website.  Right 
	after we finished the Backblaze Personal Backup client and the web restore 
	process 13 years ago we thought we finally had a complete product.  In 
	product development terms we would have called this a "minimum viable 
	product" - a product that doesn't satisfy everybody and has some rough 
	edges, but you could sell it for money and some customers would find it 
	useful.  However, IMMEDIATELY a profound problem appeared: AT THAT 
	TIME no web browser could download any file larger than 2 GBytes.  
	Period.  This was before the
	HTTP Range Header 
	had been implemented by any web browsers (the "Range" header specifies a sub 
	range of a large file; it was defined in the HTTP/1.1 specification in 1999 and 
	re-specified as RFC 7233 in 2014 - 7 years after the 
	Backblaze client was launched).  So for any customer to get back more 
	than 2 GBytes of restored files (which is comically small) the Backblaze 
	client team of 1 Windows programmer and 1 Macintosh programmer had to 
	furiously create "bzdownloader" while ignoring any backup client bug fixes.  
	The bzdownloader can use up to 30 threads to download different parts (what 
	would later be called "different ranges") of a file at the same time.  
	All of the Backblaze restore servers have 10 Gbit/sec ethernet on them, and 
	the vast majority of USA customers have 1 Gbit/sec download capacity now, so 
	bzdownloader is designed around 40 MByte blocks where it can download 30 of 
	them at the same time, to reach speeds of something like 1 Gbit/sec OR 
	HIGHER to download restores.  The bzdownloader is offered up as an 
	executable program that doesn't even have an installer of any kind when a 
	customer goes to download their web based ZIP restore.  Finally, on 
	Windows the bzdownloader has one additional responsibility - to unzip the 
	ZIP restore once it has finished downloading it.  The reason for this 
	is that up until Windows 10, the "unzip" functionality built into Windows 
	was atrocious.  Up until Windows 10, the built in Windows Explorer had 
	"Unzip", but if you ever clicked "Unzip" on any ZIP file larger than 2 
	GBytes, it would run for 30 minutes then crash.  ACTUALLY CRASH.  
	Not one of the 20,000 programmers at Microsoft could be bothered to check 
	the ZIP file size with 2 lines of code and pop up an error dialog with 
	ANOTHER 4 lines of code that said "Microsoft Is Unable to Unzip files larger 
	than 2 GBytes - go Install WinZip or something."  So the bzdownloader 
	on Windows bundles a free program called "7-zip" 
	that does a pretty good job, and the bzdownloader uses the command line 
	version of 7-zip to unzip the newly downloaded ZIP files.  This 
	additional functionality is only needed on Windows, the Macintosh Finder 
	does a pretty good job of handling unzipping without additional software.
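
	The block-based download that bzdownloader uses (40 MByte blocks, up to 30 at a time) can be sketched as below. This is not the bzdownloader source: the function names are made up, "40 MBytes" is assumed to mean 40,000,000 bytes, and the `fetch` callback stands in for whatever network call really retrieves one byte range. The demo uses an in-memory byte string and tiny blocks so it runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 40 * 1000 * 1000  # "40 MByte blocks" from the text (decimal MBytes assumed)
MAX_THREADS = 30               # "up to 30 threads" from the text


def block_ranges(file_size, block_size=BLOCK_SIZE):
    """Split a file into inclusive (start, end) byte ranges, one per block."""
    return [(start, min(start + block_size, file_size) - 1)
            for start in range(0, file_size, block_size)]


def range_header(start, end):
    """Value of an HTTP Range header asking for bytes start..end inclusive."""
    return f"bytes={start}-{end}"


def download_in_blocks(fetch, file_size, block_size=BLOCK_SIZE, threads=MAX_THREADS):
    """Fetch every block in parallel and reassemble them in order.

    `fetch(start, end)` is a hypothetical callback returning those bytes.
    """
    ranges = block_ranges(file_size, block_size)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in the same order as the input ranges.
        parts = pool.map(lambda r: fetch(*r), ranges)
        return b"".join(parts)


# Demo: a fake "server" that serves slices of an in-memory file.
data = bytes(range(256)) * 4
result = download_in_blocks(lambda s, e: data[s:e + 1], len(data),
                            block_size=100, threads=4)
print(result == data, block_ranges(250, 100), range_header(0, 99))
```

	Each block is independent of the others, which is what lets many of them be in flight at once and lets a failed block be retried without restarting the whole restore.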
  
	- Honorable Mention: the bztrans_thread (0 - 19) executables - 
	Every one of these executables is an identical copy to the others, and is a 
	copy of the original "bztransmit" executable.  If you see these 
	processes in "Activity Monitor" on the Macintosh, or "Task Manager" on 
	Windows, they are explicitly for transmitting files to the Backblaze 
	datacenter when a customer is running with more than one thread.  See 
	the notes in the "bztransmit" section above.
  
Future Idea for notes around the Backblaze Personal Backup Client:
Look through brianwski's reddit posts, copy and paste contents into here, or 
link directly to them.  
Some Notes on International Strings in the Backblaze Client:
<fill out some info here> - Maybe possibly link to internationalization 
video (might need edits): <redacted>
Documentation around bz_done files on the Backblaze Client:
You can watch a 57 minute tutorial on how to understand the internals of bz_done files here:
	
	https://www.youtube.com/watch?v=MOlz36nLbwA  You can view a slide 
	used in that presentation here that documents what every column does by
	clicking this 
	link.
 
All done.