TAFT, the Transport Agnostic File Transfer protocol, provides easy programmatic access to remote files. It is intended as a substitute for the FTP protocol, which is highly complex and has many well-known problems. TAFT is a stateless, sessionless protocol that has simplicity as its main design goal.
While TAFT is not a network protocol, it is anticipated that it will be implemented atop a secure network protocol, which is almost certainly HTTPS. The HTTPS protocol already knows how to upload and download files, so the role of TAFT is merely to specify exactly how this is done.
Files are provided by a TAFT server and accessed by a TAFT client. The server makes available a collection of files and directories arranged in a hierarchy; the client downloads the files and lists the directories. Optionally, the server may allow the client to upload, replace, rename, move, and delete both files and directories.
The client and the server communicate by exchanging messages. The syntax and semantics of the messages are defined by the protocol, but TAFT does not define a mechanism for transmitting the messages, and this is the sense in which it is transport agnostic.
Each version of the TAFT protocol is identified by a positive integer that increases by one with each revision of the protocol. To date there has only been one version of the TAFT protocol and it is version 1.
The client and the server must agree on the version of the protocol that they will use. Communication is always initiated by the client, which must tell the server which version of the protocol it wants to use. If the server does not support the version requested by the client, the server will respond with a message that indicates an error.
A TAFT server has the option of providing wide open public access or restricted private access, or both. To access the public files provided by a server, the client does not have to do anything special. To access the private files of a server, the client must supply an access key.
The access key is the only means by which the server knows the identity of the client. The access key is a shared secret of the client and the server and must be guarded as carefully as would a password or a private encryption key. If the client supplies an access key, and if the access key is known to the server and is considered by the server to be valid, the server allows the client to access the private files. The view of the file hierarchy, and the actions that the client is allowed to perform within it, are determined by the server based on the access key.
As far as the protocol is concerned, the access key is simply a non-empty string; the protocol places no restrictions on the length or the content of the string. It is recommended that the access key be generated from a secure source of random data and that its length be at least 16 characters.
The protocol does not specify how access keys are created, managed, or stored, what their lifetimes are, or how they are bound to user identities or permissions. All of these details are left up to the implementation.
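As one possible way to follow the recommendation above, the sketch below draws a key from the operating system's secure random source using Python's standard `secrets` module. The function name `generate_access_key` and the default of 24 random bytes are illustrative assumptions; the protocol itself does not prescribe how keys are produced.

```python
import secrets


def generate_access_key(num_bytes: int = 24) -> str:
    """Return a random URL-safe string from a cryptographically secure source."""
    # token_urlsafe encodes num_bytes random bytes as Base64 text, giving
    # roughly 1.3 characters per byte (24 bytes -> 32 characters), which is
    # comfortably above the recommended minimum of 16 characters.
    return secrets.token_urlsafe(num_bytes)


key = generate_access_key()
```

Any other secure generator would serve equally well, since the protocol treats the key as an opaque non-empty string.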
The client and the server communicate by exchanging messages. A message that is sent from the client to the server is called a request, while a message that is sent from the server to the client is called a response.
Message exchange is always initiated by the client. For each request that it receives from a client, the server sends a single response back to the client. In other words, there is a one-to-one correspondence between requests and responses, and the server never sends a response without first receiving a request from the client.
Each message has two parts: the head, which is mandatory, and the body, which is optional. The head consists of a single JSON object in plain text, while the body, if present, consists of file content encoded in Base64.
In a request, the head describes what the client is asking the server to do. If the request includes a command to upload a file, then the body is present and supplies the content of the uploaded file. In a response, the head indicates the success or failure of the request, and includes any metadata asked for by the request. If the request includes a command to download a file, then the body is present and supplies the content of the downloaded file.
The head of a request includes a mandatory property named command and will usually include a property named version.
Here is an example of a typical request head:
{ "version" : 1, "command" : "download", "path" : "/data/2022/final.csv" }
The value of the version property is a positive integer stating the version of the protocol that the client wants to use.
The value of the command property is a string containing the name of the command that the client is asking the server to perform.
Each command has a list of zero or more additional properties that serve as arguments to the command, some mandatory and some optional. For example, the download command shown above requires the path property, but also has some additional optional properties.
Here is an example of a request that has both a head and a body:
{ "version" : 1, "command" : "upload", "path" : "/test/jabberwocky.txt" } 4oCZVHdhcyBicmlsbGlnLCBhbmQgdGhlIHNsaXRoeSB0b3ZlcwogICAgICBEaWQgZ3lyZSBhbmQg Z2ltYmxlIGluIHRoZSB3YWJlOgpBbGwgbWltc3kgd2VyZSB0aGUgYm9yb2dvdmVzLAogICAgICBB bmQgdGhlIG1vbWUgcmF0aHMgb3V0Z3JhYmUuCgrigJxCZXdhcmUgdGhlIEphYmJlcndvY2ssIG15 IHNvbiEKICAgICAgVGhlIGphd3MgdGhhdCBiaXRlLCB0aGUgY2xhd3MgdGhhdCBjYXRjaCEKQmV3 YXJlIHRoZSBKdWJqdWIgYmlyZCwgYW5kIHNodW4KICAgICAgVGhlIGZydW1pb3VzIEJhbmRlcnNu YXRjaCHigJ0KCkhlIHRvb2sgaGlzIHZvcnBhbCBzd29yZCBpbiBoYW5kOwogICAgICBMb25nIHRp bWUgdGhlIG1hbnhvbWUgZm9lIGhlIHNvdWdodOKAlApTbyByZXN0ZWQgaGUgYnkgdGhlIFR1bXR1 bSB0cmVlCiAgICAgIEFuZCBzdG9vZCBhd2hpbGUgaW4gdGhvdWdodC4KCkFuZCwgYXMgaW4gdWZm aXNoIHRob3VnaHQgaGUgc3Rvb2QsCiAgICAgIFRoZSBKYWJiZXJ3b2NrLCB3aXRoIGV5ZXMgb2Yg ZmxhbWUsCkNhbWUgd2hpZmZsaW5nIHRocm91Z2ggdGhlIHR1bGdleSB3b29kLAogICAgICBBbmQg YnVyYmxlZCBhcyBpdCBjYW1lIQo=
The body is a block of Base64-encoded data. The body follows the head, with optional white space between the two. In this example, the body is the first four verses of the Lewis Carroll poem Jabberwocky, which the upload command is asking the server to save in a file named by the path property.
The message formatting shown here is illustrative only. The JSON object can be formatted in any style, and the Base64 block can have any amount of leading, trailing, or interior white space.
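The following is a minimal sketch of how a receiver might separate the head from the body, assuming the whole message is available as a single text string. The function name `split_message` is hypothetical, and treating an empty Base64 block as "no body" is an assumption of this sketch rather than something the protocol states.

```python
import base64
import json


def split_message(text: str):
    """Split a message into (head, body); body is None when absent."""
    text = text.lstrip()
    # raw_decode parses one JSON value and reports where it ended, so
    # everything after that position can be treated as the Base64 body.
    head, end = json.JSONDecoder().raw_decode(text)
    if not isinstance(head, dict):
        raise ValueError("the head must be a JSON object")
    # Discard leading, trailing, and interior white space before decoding.
    encoded = "".join(text[end:].split())
    body = base64.b64decode(encoded, validate=True) if encoded else None
    return head, body


message = '{ "version": 1, "command": "upload", "path": "/t.txt" }\n  aGVs\n  bG8=\n'
head, body = split_message(message)
```

Malformed JSON raises `json.JSONDecodeError`, which a server could map to the appropriate error status.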
The commands are listed later, and each one is explained in detail.
As with a request, a response consists of a mandatory head and an optional body. The head is a plaintext JSON object that has a mandatory property named status. The value of this property is a string that is "Success" after a successful request or a short error message after a failed request.
The name of a file or directory is called a path. A path always appears as the value of a JSON property and is always a non-empty string. Here are some examples of paths:
"/" "/README.txt" "/rabbit.jpg" "/data" "/data/annual" "/data/annual/2024" "/data/annual/2024/Final Summary.csv"
A path begins with a slash and is followed by zero or more components separated by slashes. Each component of a path is a string of one or more characters, with all characters allowed except for the slash and the NUL character. White space before and after a path is ignored, but white space within a path is taken verbatim as part of the path. Letter case is significant.
All paths are absolute. Relative paths are not supported. There is no concept of a current directory or a working directory. Wildcards are not supported.
All path components are treated literally as the names of files or directories. Path components that have special meanings in other contexts, such as "~", "." and "..", do not receive special treatment. The only way to traverse the file hierarchy is to descend by one level for each component of the path.
This table provides an exhaustive list of the commands provided by version 1 of the protocol. Each command is explained in detail in a section of its own below.
Command | Purpose
---|---
hello | Causes the server to describe itself.
list | Lists the contents of a directory or describes a single file.
download |
The protocol defines four levels, with each level expanding upon the capabilities of the level beneath it.
Level zero. The client or server provides no meaningful services at all, just the following command:
hello
Level one. The client or server is capable of listing directories and downloading files. To be level-one compliant, the client or server must implement at least the following commands:
hello
list
download
Level two. The client or server also supports file upload. To be level-two compliant, the client or server must implement all of the commands from level one plus at least the following command:
upload
Level three. The client or server also supports the management of files and directories. To be level-three compliant, the client or server must implement all of the commands from levels one and two plus all of the following commands:
rename
move
delete
mkdir
rmdir
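The level rules above can be expressed as a small lookup. This is only an illustrative sketch: the names `LEVEL_COMMANDS` and `compliance_level` are hypothetical, and returning `None` for an implementation that lacks even hello is an assumption.

```python
# Commands that each level adds on top of the level beneath it.
LEVEL_COMMANDS = [
    {"hello"},                                       # level zero
    {"list", "download"},                            # added by level one
    {"upload"},                                      # added by level two
    {"rename", "move", "delete", "mkdir", "rmdir"},  # added by level three
]


def compliance_level(implemented):
    """Return the highest level whose commands are all implemented, or None."""
    implemented = set(implemented)
    required = set()
    level = None
    for number, added in enumerate(LEVEL_COMMANDS):
        required |= added
        if not required <= implemented:
            break  # a missing command caps the level at the previous one
        level = number
    return level
```

Because each level builds on the one beneath it, an implementation that offers upload without list and download still rates only level zero.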
When asked to list files, the server includes a timestamp as part of the description of each file or directory that is listed. Timestamps are in UTC. The server acts as the sole timekeeper. The client is never asked to supply a timestamp. This means that timestamps are never converted from local times to UTC, and there is no need to store zone information with the timestamps. The client can convert UTC times to local times for local display purposes, using the local timezone offset that was in effect at the time represented by the timestamp.
The timestamp of a regular file is the date and time at which the content of the file was created or was most recently modified. The timestamp does not change if the file is renamed, or if the file is moved to another position in the hierarchy.
The timestamp of a directory is the date and time at which a file or subdirectory directly beneath it was most recently created, deleted, or renamed.
Note that files and directories can be manipulated by the server operator without necessarily going through the protocol, and that these manipulations can cause timestamps to change.
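A client-side sketch of the conversion described above, assuming timestamps are written with one-second precision and a trailing "Z", as in "2024-11-03T18:04:33Z". The function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(text: str) -> datetime:
    """Parse a UTC timestamp into a timezone-aware datetime."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def to_local(moment: datetime) -> datetime:
    """Convert to the system's local zone for display purposes."""
    # astimezone() with no argument applies the local offset that was in
    # effect at the given moment, so daylight saving changes are respected.
    return moment.astimezone()


stamp = parse_timestamp("2024-11-03T18:04:33Z")
```

The converted value denotes the same instant; only its displayed offset differs.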
The first step in processing a request is to read the head, which is a JSON object in plain text. The JSON is parsed, and if there are any lexical or syntactic errors in the JSON object, the server must respond with this status:
Malformed request head
If the JSON object is parsed successfully, then analysis proceeds to check the structure and content of the head object.
Every request must include in its head a property named command that specifies the action that the client wants the server to perform. The server processes this property as follows:
1. If the head has no property named command then the server must stop processing the request and respond with this status:
Missing command
2. If the value of the command property is not a string then the server must stop processing the request and respond with this status:
Malformed command
3. The value of the command property is allowed to arrive with leading and/or trailing white space. If any is present then remove it now.
4. The value of the command property must be one of the following strings:
"hello" "list" "download" "upload" "rename" "move" "delete" "mkdir" "rmdir"
If the value is not one of these strings then the server must stop processing the request and respond with the following status:
No such command
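The command checks above might be sketched as follows, assuming the head has already been parsed into a Python dict. The function name `check_command` and the convention of returning a `(value, status)` pair are hypothetical.

```python
COMMANDS = {"hello", "list", "download", "upload", "rename",
            "move", "delete", "mkdir", "rmdir"}


def check_command(head: dict):
    """Return (command, None) on success or (None, status) on failure."""
    if "command" not in head:
        return None, "Missing command"
    value = head["command"]
    if not isinstance(value, str):
        return None, "Malformed command"
    value = value.strip()  # leading and trailing white space is tolerated
    if value not in COMMANDS:
        return None, "No such command"
    return value, None
```

Command names are compared exactly, so "Hello" is reported as "No such command".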
A request may include in its head an optional property named version to specify the protocol version that the client wishes to use. Most requests include this property, as it is required by every command other than hello. For a command that requires this property, the server processes the property as follows:
1. If the head has no property named version then the server must stop processing the request and respond with this status:
Missing protocol version
2. If the value of the version property is not a number, or if the value is a number that has a non-zero fraction, or if the value is an integer that is less than 1, then the server must stop processing the request and respond with this status:
Malformed protocol version
3. If the server does not support the version named by the version property, including the case that the version number is higher than any existing version of the protocol, then the server must stop processing the request and respond with this status:
Unsupported protocol version
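A sketch of the version checks above, assuming the head was parsed with Python's `json` module, which yields `int` or `float` for JSON numbers and `bool` for JSON true and false. The function name `check_version` and the `supported` parameter are hypothetical.

```python
def check_version(head: dict, supported=frozenset({1})):
    """Return (version, None) on success or (None, status) on failure."""
    if "version" not in head:
        return None, "Missing protocol version"
    value = head["version"]
    # bool is a subclass of int in Python, so exclude it explicitly:
    # JSON true and false are not numbers.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None, "Malformed protocol version"
    if isinstance(value, float) and not value.is_integer():
        return None, "Malformed protocol version"  # non-zero fraction
    value = int(value)
    if value < 1:
        return None, "Malformed protocol version"
    if value not in supported:
        return None, "Unsupported protocol version"
    return value, None
```

A value such as 1.0 is accepted because its fraction is zero.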
A request may include in its head an optional property named accessKey. The presence of this property indicates that the client is attempting to access the private files, while the absence of this property indicates that the client is attempting to access the public files. The server processes this property as follows:
1. Determine whether the accessKey property is present. Neither the presence nor the absence of the accessKey property is necessarily an error, so this step is merely one of discovery.
2. If the accessKey property is absent but the server does not allow public access then the server must stop processing the request and respond with this status:
No public access
3. If the accessKey property is absent and the server allows public access then no further processing of this property is required and the remaining steps below are not performed.
4. At this point the accessKey property is present. If the server does not allow private access then the server must stop processing the request and respond with this status:
No private access
5. If the value of the accessKey property is not a string then the server must stop processing the request and respond with this status:
Malformed access key
6. The value of the accessKey property retains all white space verbatim.
7. If the value of the accessKey property is an empty string then the server must stop processing the request and respond with this status:
Malformed access key
8. If the value of the accessKey property is not known to the server then the server must stop processing the request and respond with this status:
Access key unknown
9. If the value of the accessKey property is known to the server but is not considered by the server to be valid then the server must stop processing the request and respond with this status:
Access key rejected
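The access key checks above could be sketched as shown here. The function name, the two boolean flags, and the `keys` mapping are all hypothetical. In particular, treating "Access key rejected" as the status for a key that is known but not currently considered valid is an assumption of this sketch.

```python
def check_access_key(head: dict, allows_public: bool,
                     allows_private: bool, keys: dict):
    """Return (key, None) on success or (None, status) on failure.

    `keys` maps each known access key to True if the server currently
    considers it valid. On success, key is None for public access.
    """
    if "accessKey" not in head:
        if not allows_public:
            return None, "No public access"
        return None, None  # public access; nothing more to check
    if not allows_private:
        return None, "No private access"
    value = head["accessKey"]
    # White space is kept verbatim, so the value is never stripped.
    if not isinstance(value, str) or value == "":
        return None, "Malformed access key"
    if value not in keys:
        return None, "Access key unknown"
    if not keys[value]:
        return None, "Access key rejected"  # assumed meaning: known but invalid
    return value, None
```

A production server would likely compare keys in constant time and store them hashed, but those choices are outside the protocol.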
A request may include in its head the path property to specify the location of a file or directory if required by the command. While each command may apply specific additional processing to the path property, all of them begin processing the property as follows:
1. If the path property is not present then the server must stop processing the request and respond with this status:
Missing path
2. If the value of the path property is not a string then the server must stop processing the request and respond with this status:
Malformed path
3. The value of the path property is allowed to arrive with leading and/or trailing white space. If any is present then remove it now.
4. If the value of the path property is an empty string, or if it does not begin with a slash, or if it contains two or more adjacent slashes, or if it is longer than one character and ends with a slash, then the server must stop processing the request and respond with this status:
Malformed path
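A sketch of the common path checks above. The function name `check_path` is hypothetical. The sketch accepts the root path "/" even though it ends with a slash; that is an assumption, made because "/" appears among the example paths.

```python
def check_path(head: dict):
    """Return (path, None) on success or (None, status) on failure."""
    if "path" not in head:
        return None, "Missing path"
    value = head["path"]
    if not isinstance(value, str):
        return None, "Malformed path"
    value = value.strip()  # white space around the path is ignored
    if value == "/":
        return value, None  # assumption: the root directory is a valid path
    if (value == "" or not value.startswith("/")
            or "//" in value or value.endswith("/")):
        return None, "Malformed path"
    return value, None
```

Components such as ".." pass these checks unchanged, because the protocol treats every component literally as a file or directory name.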
Every server must implement at least the hello command. This command is versionless, as it must be possible to query the capabilities of the server regardless of which versions of the protocol are supported by the client or the server. Therefore, the version property, if supplied in the request head, is ignored.
The head for this command is normally just this:
{ "command": "hello" }
All other properties included in the head are ignored.
Here is a typical response to the hello command:
{ "status": "Success", "operator": "National Bureau of Climate Studies", "description": "This is the public server for climate data sets provided by the National Bureau of Climate Studies. Access is free for all users. See file /README.txt for more information.", "public" : 1, "private" : 0, "versions" : [1, 2, 3] }
The status of the response is always "Success", as the hello command should never fail. All of the properties shown here are always present, but the server can produce additional properties if it wants to.
The operator and description properties are self-explanatory; their values are strings, and either or both of them can be an empty string or null if the server does not wish to describe itself.
The public property tells the client whether or not the server provides public access, and, if it does, at what level of the protocol. This server provides public access at level 1, meaning that it is restricted to listing and downloading files, which is typical of public servers. The private property has the same purpose but for private access; this server offers no private access, which is the meaning of level 0.
The versions property is a non-empty array of positive integers specifying the protocol versions that the server is able to speak. When requesting any command other than hello, the client must specify one of these numbers in the version property of the request head. The versions are always listed in ascending order.
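A client might use the versions array as sketched below. Preferring the highest version that both sides support is an assumption of this sketch; the protocol only requires that the client pick one of the listed numbers. The function name `choose_version` is hypothetical.

```python
def choose_version(hello_head: dict, client_versions):
    """Return the highest version both sides speak, or None if none match."""
    common = set(hello_head["versions"]) & set(client_versions)
    return max(common) if common else None
```

A result of `None` means the client cannot usefully send anything to this server other than hello.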
The list command is used to list the contents of a directory and to describe single files and directories. For example:
{ "version" : 1, "command" : "list", "path" : "/assets/images" }
Supposing that the directory contains two files and one directory, the response might be:
{ "status": "Success", "list": [ { "type": "file", "name": "bunny.jpg", "size": 40439, "time": "2024-11-03T18:04:33Z" }, { "type": "file", "name": "logo-64x64.png", "size": 8304, "time": "2023-06-29T06:22:58Z" }, { "type": "directory", "name": "old", "size": 23, "time": "2024-11-03T18:04:33Z" } ] }
The list property is an array containing one element for each item in the directory. Each element of the array is an object that describes a file or a directory. Each of these objects has the same four properties:
Property | Value
---|---
type | The type of the item, which is "file" for a regular file or "directory" for a directory.
name | The name of the file or directory. Only the last component of the path is given, to avoid needless repetition.
size | The size of the file in bytes, or, in the case of a directory, the number of items in the directory.
time | The time at which the file or directory was created or most recently modified, in UTC, in ISO 8601 format.
To obtain a description of a single file, or to obtain a description of a directory itself without listing its contents, include the property self with value true.
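As an illustration of working with the list command from a client, the sketch below builds a request head and totals the sizes of the regular files in a response. The function names are hypothetical, and the transport that carries the messages is deliberately left out.

```python
def list_request(path, self_only=False, access_key=None, version=1):
    """Build the head of a list request as a dict ready for JSON encoding."""
    head = {"version": version, "command": "list", "path": path}
    if self_only:
        head["self"] = True  # describe the item itself, not its contents
    if access_key is not None:
        head["accessKey"] = access_key
    return head


def total_file_bytes(response_head: dict) -> int:
    """Sum the sizes of regular files in a list response."""
    # A directory's size is an item count, not a byte count, so skip it.
    return sum(entry["size"] for entry in response_head["list"]
               if entry["type"] == "file")
```

Applied to the example response shown earlier, `total_file_bytes` adds 40439 and 8304 and ignores the directory named old.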
The most basic function of the server is to download a file to the client. The command to do this is named download and it accepts a file path as an argument:
{ "version": 1, "command": "download", "path": "/assets/images/bunny.jpg" }
Assuming that the file exists, the answer is:
{ "status": "Success", "time": "2024-11-03T18:04:33Z", "size": 40439, "hash": "6d4c3bdf350ef06c60369425890a8c03da0c0e11beb8519a50206d1216116b1f" } /9j/4AAQSkZJRgABAAEBLAEsAAD//gAfTEVBRCBUZWNobm9sb2dpZXMgSW5jLiBW MS4wMQD/2wCEAAkGBwgHBgkIBwgKCgkLDhgPDg0NDh0VFhEYIh4kJCIeISEmKzcu Jig0KSEhMEEwNDk6PT49JS5DSEM8SDc8PTsBCgoKDgwOHA8PHDsnISc7Ozs7Ozs7 Ozs7Ozs7Ozs7Ozs7Ozs7Ozs7Ozs7Ozs7Ozs7Ozs7Ozs7Ozs7Ozs7Ozs7O//EAaIA AAEFAQEBAQEBAAAAAAAAAAABAgMEBQYHCAkKCwEAAwEBAQEBAQEBAQAAAAAAAAEC AwQFBgcICQoLEAACAQMDAgQDBQUEBAAAAX0BAgMABBEFEiExQQYTUWEHInEUMoGR oQgjQrHBFVLR8CQzYnKCCQoWFxgZGiUmJygpKjQ1Njc4OTpDREVGR0hJSlNUVVZX WFlaY2RlZmdoaWpzdHV2d3h5eoOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0 tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4eLj5OXm5+jp6vHy8/T19vf4+foRAAIB AgQEAwQHBQQEAAECdwABAgMRBAUhMQYSQVEHYXETIjKBCBRCkaGxwQkjM1LwFWJy 0QoWJDThJfEXGBkaJicoKSo1Njc4OTpDREVGR0hJSlNUVVZXWFlaY2RlZmdoaWpz dHV2d3h5eoKDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXG x8jJytLT1NXW19jZ2uLj5OXm5+jp6vLz9PX29/j5+v/AABEIAYMB9AMBEQACEQED EQH/2gAMAwEAAhEDEQA/AOSslH2dWbqe9cUt7m6JUTAarg7gx7R4K4/ClFdQZKse 0hG64zxWbd3oaJDGck7B2rRLXUzGAEtzwKHYZcjjQLgVmOxZkJjiGPTFNO7JZWa2 AiBximpe8UtB2wmNIwelPm1E0PKlBgnOKce4upIMG2yDzXPe8insQIMxk+tdCegh YwFwAMVMii0kQLBzycVjK7Vh2AKLmJscelC90TI3gMaoqn7w5ojK+5d9CFUMcorb Sxmx6cuFI4JpTKiSrCsZKqPes73Vi0QsSXOOp7VcdEQ3qPCkSIn40PUOpO6lSvbn mlHYGPRcSDHehMlodgRlvQnNTNFIVEV+TUlFiPaygDpTYo6ERjUSkY6VS0C+gGMl xjqTzS0uGxeljVFAHAxUK9wuVol3Ng076Da1I2hAAPfdinFg3oMnjIAx1qUirksa Fk4Harv3ItYibO/Z3olsOC1JIwGjKfnRF6CnZiF/KVY1HApczuPl0I5FbdmtN0Qt ⋮
The head contains the timestamp, the size of the file in bytes, and the SHA-256 digest of the file in hexadecimal. The content of the file follows in Base64.
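A client can use the size and hash properties to check what it received. The sketch below assumes a whole-file download, where size and hash describe the complete content; the function name `verify_download` is hypothetical.

```python
import base64
import hashlib


def verify_download(head: dict, body_text: str) -> bytes:
    """Decode a download body and check it against the response head."""
    # The Base64 block may contain white space anywhere, so remove it first.
    content = base64.b64decode("".join(body_text.split()), validate=True)
    if len(content) != head["size"]:
        raise ValueError("size mismatch")
    if hashlib.sha256(content).hexdigest() != head["hash"].lower():
        raise ValueError("hash mismatch")
    return content
```

Both checks raise `ValueError`, as does invalid Base64, so a caller can handle every corruption case in one place.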
If you are downloading a large file, you may prefer to obtain it in chunks, making one request for each chunk. This can be done for any file by including the offset and length properties in the request head.
For example, suppose we have a media file that is 5,307,294,188 bytes in size, or around 5.3 GB, and that the file is to be downloaded in chunks of 1 MiB. The first request is:
{ "version" : 1, "command" : "download", "path" : "/movies/plan-9-from-outer-space.mpg", "offset" : 0, "length" : 1048576 }
The offset property is 0 to indicate the start of the file, and the length property is 1048576 to indicate that the chunk should be 1 MiB in size. For the second chunk, add the length to the offset, and you have:
"offset": 1048576, "length": 1048576
About halfway through the movie there will be a request like this:
"offset": 2462056448, "length": 1048576
The request for the final chunk would be:
"offset": 5306843136, "length": 1048576
Each response would give you another 1 MiB of the file, until the final chunk, which will likely be undersized, and could have a size of zero:
"size": 451052,
If the size comes back smaller than the requested length, the end of the file has been reached. If it happened that the size of the file was an exact multiple of 1 MiB, then the final request would have returned a zero size.
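The chunking loop described above might look like the following sketch. The `send` parameter is a hypothetical transport function that takes a request head (a dict) and returns a pair of the response head and the decoded body bytes; how it delivers the message is outside the protocol.

```python
def download_in_chunks(send, path, chunk_length=1048576, version=1):
    """Yield the content of a file one chunk at a time."""
    offset = 0
    while True:
        head, body = send({"version": version, "command": "download",
                           "path": path, "offset": offset,
                           "length": chunk_length})
        if head["status"] != "Success":
            raise RuntimeError(head["status"])
        if body:
            yield body
        if head["size"] < chunk_length:
            return  # a short (possibly empty) chunk marks the end of the file
        offset += chunk_length
```

When the file size is an exact multiple of the chunk length, the loop issues one extra request that returns a size of zero, matching the behavior described above.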
TBD.

The list Command

PURPOSE
Lists the contents of a directory, returning the names, types, sizes, and timestamps of the files and subdirectories located within the directory whose path is specified.

PROTOCOL LEVEL
Level one.

HEAD PROPERTIES
Property | Description
---|---
version | Number. Required.
command | String. Required. Must be "list".
accessKey | String. Optional.
path | String. Required.
self | Boolean. Optional. Defaults to false.

BEHAVIOR
1. See section 1.12, Parsing the head.
2. See section 1.13, Processing the version property.
3. See section 1.14, Processing the command property.
4. See section 1.15, Processing the accessKey property.
5. See section 1.20, Processing the path property for an existing directory.
6. See section 1.31, Processing the self property.
7. Determine which files and directories to include in the list. If the path property identifies a regular file, then only the named file is listed. If the path property identifies a directory and the self property is present and has value true, then only the named directory is listed (and not its contents). If the path property identifies a directory and the self property is absent or is present and has value false, then each item within the named directory is listed.
8. Obtain the file and directory metadata.