The BlobStore API is a portable means of managing key-value storage providers such as Microsoft Azure Blob Service, Amazon S3, or OpenStack Object Storage. It offers a synchronous API to your data.
Our APIs are dramatically simplified compared to the providers' own, yet still offer enough sophistication to perform most work in a portable manner.
Like other components in jclouds, you always have a means to gain access to the provider-specific interface if you need functionality that is not available in our abstraction.
Our location API helps you to portably identify a container within context, such as "Americas" or "Europe".
We use the same location model in our Compute API (see the Compute guide), which helps you collocate processing and data.
Using our BlobRequestSigner, you can portably generate HTTP requests that can be passed to external systems for execution or processing. Use cases include JavaScript client-side loading and curl-based processing at the bash prompt. Be creative!
Our in-memory provider allows you to test your storage code without credentials or a credit card!
Our filesystem provider allows you to use the same API when persisting to disk, memory, or a remote BlobStore.
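For example, here is a minimal sketch of a test against the in-memory provider (its provider id is "transient"); no credentials are required:
import org.jclouds.ContextBuilder;
import org.jclouds.blobstore.BlobStore;
import org.jclouds.blobstore.BlobStoreContext;

// the "transient" provider keeps all blobs in memory, ideal for unit tests
BlobStoreContext context = ContextBuilder.newBuilder("transient")
        .buildView(BlobStoreContext.class);
BlobStore blobStore = context.getBlobStore();
blobStore.createContainerInLocation(null, "test-container");
// ... exercise your storage code against blobStore here ...
context.close();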
jclouds supports a wide range of blobstore providers, all of which can be used interchangeably through the BlobStore API.
Please refer to the Supported BlobStore Providers page for more information.
A blobstore is a key-value storage service, such as Amazon S3, where your account exists, and where you can create containers and blobs. A container is a namespace for your data, and you can have many of them. Inside your container, you store data as a blob referenced by a name. In all blobstores, the combination of your account, container, and blob relates directly to an HTTP URL.
Here are some key points about blobstores:
A container is a namespace for your objects.
Depending on the service, the scope of a container can be global, region, account, or sub-account. For example, in Amazon S3, containers are called buckets, and their names must be unique across all S3 users worldwide.
Everything in a BlobStore is stored in a container, which is an HTTP-accessible location (similar to a website) referenced by a URL.
For example, using Amazon S3, a container named jclouds would be referenced as https://jclouds.s3.amazonaws.com. Storing a photo with the key mymug.jpg would make it accessible through https://jclouds.s3.amazonaws.com/mymug.jpg.
In other blobstores, the naming convention of the container is less strict. All blobstores allow you to list your containers and also the contents within them, as shown in the sketch below. These contents can be blobs, folders, or virtual paths.
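As a sketch, once you have a BlobStore handle (obtaining one is shown later in this guide; the container name here is hypothetical), listing looks like this:
// list the containers in your account
for (StorageMetadata container : blobStore.list()) {
    System.out.println(container.getName());
}
// list the contents of one container
for (StorageMetadata item : blobStore.list("mycontainer")) {
    System.out.println(item.getName() + " (" + item.getType() + ")");
}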
A blob is unstructured data that is stored in a container.
Some blobstores refer to them as objects, blobs, or files. You access a blob in a container by a text key, which often relates directly to the HTTP URL used to manipulate it. Blobs can be zero length or larger, with some providers limiting blobs to a maximum size and others not restricting size at all.
Finally, blobs can have metadata in the form of text key-value pairs you can store alongside the data. When a blob is contained in a folder, its name is either relative to that folder or its full path.
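For example, here is a sketch of attaching user metadata while building a blob (the keys and values are hypothetical; blobStore is obtained as shown later in this guide):
import com.google.common.collect.ImmutableMap;

Blob blob = blobStore.blobBuilder("mymug.jpg")
        // arbitrary text key-value pairs stored alongside the data
        .userMetadata(ImmutableMap.of("camera", "nikon", "taken-in", "copenhagen"))
        .payload(ByteSource.wrap("image-bytes".getBytes(Charsets.UTF_8)))
        .build();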
A folder is a subcontainer and can contain blobs or other folders.
The names of items in a folder are basenames. Blob names incorporate folders via the path separator "/", similar to accessing a file in a typical filesystem.
A virtual path can either be a marker file or a prefix.
In either case, it is purely used to give the appearance of a hierarchical structure in a flat blobstore. When you perform a list at a virtual path, the blob names returned are absolute paths.
By default, every item you put into a container is private; if you are interested in giving access to others, you will have to configure that explicitly. Exposing public containers is provider-specific.
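On providers that support public containers, one portable route is to pass options at creation time; here is a sketch using CreateContainerOptions (the container name is hypothetical):
import static org.jclouds.blobstore.options.CreateContainerOptions.Builder.publicRead;

// create a container whose blobs are publicly readable, where the provider supports it
blobStore.createContainerInLocation(null, "public-container", publicRead());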
Each blobstore has its own limitations. Please see the provider guides for blobstore-specific limitations and tips.
A connection to a BlobStore in jclouds is called a BlobStoreContext. It is thread-safe and should be reused for multiple requests to the service.
A BlobStoreContext associates an identity for a provider with a set of network connections.
At a minimum, you need to specify an identity and credential when creating a BlobStoreContext. In the case of Amazon S3, your identity is the Access Key ID and your credential is the Secret Access Key.
Once you have this information, connecting to your BlobStore service is easy:
BlobStoreContext context = ContextBuilder.newBuilder("aws-s3")
.credentials(identity, credential)
.buildView(BlobStoreContext.class);
This will give you a connection to the blobstore, and if it is remote, the connection will use SSL unless the provider does not support it. Everything you access from this context will use the same credentials.
When you are finished with a BlobStoreContext, you should close it accordingly:
context.close();
There are many options available for creating a Context. Please see the ContextBuilder Javadocs for a detailed description.
Here is an example of the synchronous BlobStore interface:
// Initialize the BlobStoreContext
BlobStoreContext context = ContextBuilder.newBuilder("aws-s3")
        .credentials(accesskeyid, secretaccesskey)
        .buildView(BlobStoreContext.class);

// Access the BlobStore
BlobStore blobStore = context.getBlobStore();

// Create a Container
blobStore.createContainerInLocation(null, "mycontainer");

// Create a Blob
ByteSource payload = ByteSource.wrap("blob-content".getBytes(Charsets.UTF_8));
Blob blob = blobStore.blobBuilder("test") // you can use folders via blobBuilder(folderName + "/sushi.jpg")
        .payload(payload)
        .contentLength(payload.size())
        .build();

// Upload the Blob
blobStore.putBlob("mycontainer", blob);

// Don't forget to close the context when you're done!
context.close();
If you don't already have a container, you will need to create one.
First, get a BlobStore from your context:
BlobStore blobStore = context.getBlobStore();
A Location is a region, provider, or other scope in which a container can be created to ensure data locality. If you don't have a location concern, pass null to accept the default.
boolean created = blobStore.createContainerInLocation(null, container);
if (created) {
// the container didn't exist, but does now
} else {
// the container already existed
}
Providers may implement multipart upload for large or very large files. Here's an example of multipart upload using the aws-s3 provider, which allows uploading files as large as 5 TB.
import static org.jclouds.blobstore.options.PutOptions.Builder.multipart;

// Initialize the BlobStoreContext
BlobStoreContext context = ContextBuilder.newBuilder("aws-s3")
        .credentials(accesskeyid, secretaccesskey)
        .buildView(BlobStoreContext.class);

// Access the BlobStore
BlobStore blobStore = context.getBlobStore();

// Create a Container
blobStore.createContainerInLocation(null, "mycontainer");

// Create a Blob from a local file (fileName and objectName are your own values)
ByteSource payload = Files.asByteSource(new File(fileName));
Blob blob = blobStore.blobBuilder(objectName)
        .payload(payload)
        .contentDisposition(objectName)
        .contentLength(payload.size())
        .contentType(MediaType.OCTET_STREAM.toString())
        .build();

// Upload the Blob using multipart upload
String eTag = blobStore.putBlob("mycontainer", blob, multipart());

// Don't forget to close the context when you're done!
context.close();
Please refer to the logging page for more information on how to configure logging in jclouds.
The above examples show how to use the BlobStore API in Java. The same API can be used from Clojure!
lein new mygroup/myproject
In the myproject directory, edit project.clj to include the following:
(defproject mygroup/myproject "1.0.0"
:description "FIXME: write description"
:dependencies [[org.clojure/clojure "1.3.0"]
[org.clojure/core.incubator "0.1.0"]
[org.clojure/tools.logging "0.2.3"]
[org.apache.jclouds/jclouds-allcompute "1.7.1"]]
:repositories {"apache-snapshots" "https://repository.apache.org/content/repositories/snapshots"})
Execute lein deps to download the specified dependencies.
Execute lein repl to get a REPL, then paste the following or write your own code. You will need to substitute your own account and keys below.
(use 'org.jclouds.blobstore2)
(def *blobstore* (blobstore "azureblob" account encodedkey))
(create-container *blobstore* "mycontainer")
(put-blob *blobstore* "mycontainer" (blob "test" :payload "testdata"))
This section covers advanced topics typically needed by developers of cloud applications.
For example, the BlobRequestSigner mentioned above can generate a pre-signed GET request for a blob, which another system can execute without your credentials:
HttpRequest request = context.getSigner().signGetBlob("adriansmovies", "sushi.avi");
The same is available from Clojure:
(let [request (sign-blob-request "adriansmovies" "sushi.avi" {:method :get})])
There are two multipart upload strategies that jclouds employs for uploading objects to a BlobStore service. Amazon S3 and OpenStack Swift both support these strategies.
By default, jclouds uses a parallel upload strategy that will split an object up into individual parts and upload them in parallel to the BlobStore. There are two configurable properties for this strategy (see the sketch after the next paragraph):
jclouds.mpu.parallel.degree: the number of threads (default is 4)
jclouds.mpu.parts.size: the size of a part (default is 32MB)
Similar to the parallel strategy, the sequential strategy will split an object up into parts and upload them to the BlobStore sequentially.
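As a sketch, both properties can be tuned through ContextBuilder overrides (the values shown are hypothetical):
import java.util.Properties;

Properties overrides = new Properties();
overrides.setProperty("jclouds.mpu.parallel.degree", "8");
overrides.setProperty("jclouds.mpu.parts.size", String.valueOf(64L * 1024 * 1024)); // 64MB parts
BlobStoreContext context = ContextBuilder.newBuilder("aws-s3")
        .credentials(accesskeyid, secretaccesskey)
        .overrides(overrides)
        .buildView(BlobStoreContext.class);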
A listing is a set of metadata about items in a container. It is normally associated with a single GET request against your container.
Large lists are those that exceed the default or maximum list size of the blob store. In S3, Azure, and Swift, this is 1000, 5000, and 10000 keys, respectively. Upon hitting this threshold, you need to continue the list in another HTTP request.
For continued iteration of large lists, the BlobStore list() API returns a PageSet that gives you access to the next marker identifier. The getNextMarker() method will either return the next marker, or null if the page size is less than the maximum.
The marker object can then be used as input to afterMarker in the ListContainerOptions class.
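Here is a minimal paging sketch built from those pieces (the container name is hypothetical):
import org.jclouds.blobstore.domain.PageSet;
import org.jclouds.blobstore.domain.StorageMetadata;
import org.jclouds.blobstore.options.ListContainerOptions;

PageSet<? extends StorageMetadata> page = blobStore.list("mycontainer");
while (true) {
    for (StorageMetadata item : page) {
        System.out.println(item.getName());
    }
    String marker = page.getNextMarker();
    if (marker == null) {
        break; // this was the last page
    }
    // continue the listing where the previous page left off
    page = blobStore.list("mycontainer", ListContainerOptions.Builder.afterMarker(marker));
}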
Marker files allow you to establish the presence of directories in a flat key-value store. Azure, S3, and OpenStack Swift all use pseudo-directories, but in different ways. For example, some tools look for a content type of application/directory, while others look for naming patterns such as a trailing slash / or the suffix _$folder$.
In jclouds, we attempt to detect whether a blob is pretending to be a directory, and if so, type it as StorageType.RELATIVE_PATH. Then, in a list() command, it will appear as a normal directory. The two strategies responsible for this are IfDirectoryReturnNameStrategy and MkdirStrategy.
The challenge with this approach is that there are multiple ways to suggest the presence of a directory. For example, it is entirely possible that both the trailing slash / and _$folder$ suffixes exist for the same directory. A simple remove, or rmdir, will therefore not work, as there may be multiple tokens relating to the same directory.
For this reason, we have a DeleteDirectoryStrategy strategy. The default version, used for flat trees, removes all known types of directory markers.
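These strategies back the portable directory calls on BlobStore; here is a minimal sketch (the container and directory names are hypothetical):
// creates a directory marker via MkdirStrategy
blobStore.createDirectory("mycontainer", "albums");
// removes all known marker variants via DeleteDirectoryStrategy
blobStore.deleteDirectory("mycontainer", "albums");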
You may be using jclouds to upload some photos to the cloud, show thumbnails of them to the user via a website, and allow the user to download the original image.
When the user clicks on the thumbnail, a download dialog appears. To control the name of the file in the "Save As" dialog, you must set the Content-Disposition header. Here's how you can do it with the BlobStore API:
ByteSource payload = Files.asByteSource(new File("sushi.jpg"));
Blob blob = context.getBlobStore().blobBuilder("sushi.jpg")
.payload(payload) // or InputStream
.contentDisposition("attachment; filename=sushi.jpg")
.contentMD5(payload.hash(Hashing.md5()).asBytes())
.contentLength(payload.size())
.contentType(MediaType.JPEG.toString())
.build();
All APIs, whether provider-specific or part of the abstraction, must return null when an object is requested but not found. Throwing exceptions is only appropriate when there is a state problem. For example, requesting an object from a container that does not exist is a state problem, and should throw an exception.
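A sketch of this contract (assuming the container "mycontainer" exists and the blob "missing-blob" does not; both names are hypothetical):
// returns null: the container exists, but the blob does not
Blob blob = blobStore.getBlob("mycontainer", "missing-blob");
// throws ContainerNotFoundException: the container itself is missing, a state problem
Blob other = blobStore.getBlob("no-such-container", "some-blob");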
As long as you use either ByteSource or File as the payload for your blob, you should be fine. Note that in S3, you must calculate the length ahead of time, since it doesn't support chunked encoding.
It is usually better to use a repeatable payload like ByteSource instead of InputStream, since this allows parallel uploads and retrying on errors.
Our integration tests ensure that we don't rebuffer in memory on upload: testUploadBigFile.
This is verified against all of our HTTP clients, although it isn't going to help limited environments such as Google App Engine.
A blob you've downloaded via blobstore.getBlob() can be accessed via blob.getPayload().openStream(). Since this is streaming, you shouldn't have a problem with memory unless you rebuffer the payload.
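For instance, here is a minimal sketch that streams a blob to a local file (the names are hypothetical) without rebuffering it, using Guava's ByteStreams:
import com.google.common.io.ByteStreams;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

Blob blob = blobStore.getBlob("mycontainer", "sushi.avi");
try (InputStream in = blob.getPayload().openStream();
     OutputStream out = new FileOutputStream("sushi.avi")) {
    ByteStreams.copy(in, out); // copies in small chunks, never holding the whole payload
}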