Using Python, Django, and Boto3 with Scaleway Object Storage

This post is an excerpt from my own journey in making NewShots, a not-so-simple news outlet screenshot capture site.

More specifically, this excerpt exists to help you understand how to use the popular boto3 library to work with Scaleway's Object Storage, which aims to offer an Amazon S3-compatible file/object storage system.

Personal Opinion Warning

In my personal opinion (I am not paid or compensated in any way by Scaleway), it works well and as expected for my simple use cases of CRUD (create/retrieve/update/delete) on objects. The security model is simple and straightforward. I find it much easier to work with than AWS, but the tradeoff is probably in security and other big enterprisey features.

My original use case

I need to save images that come in over webhooks from the screenshot service.

  1. The screenshot service fires a webhook to the backend.
  2. I unwrap the JSON payload and look at the file location of the image.
  3. I pull that image down and immediately re-upload it into my own S3-compatible bucket.
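Step 2 above can be sketched roughly as follows. This is only an illustration: the field name file_url is hypothetical, since the real key depends on what the screenshot service sends, and the function name is mine as well.

```python
import json


def extract_image_url(raw_body, field="file_url"):
    """Return the image location from a webhook body.

    `field` is a hypothetical key name; replace it with whatever the
    screenshot service actually puts in its payload.
    """
    # json.loads raises json.JSONDecodeError (a ValueError) on invalid JSON.
    payload = json.loads(raw_body)
    if not isinstance(payload, dict):
        raise ValueError("webhook payload is not a JSON object")

    image_url = payload.get(field)
    if not isinstance(image_url, str) or not image_url.startswith(("http://", "https://")):
        raise ValueError(f"webhook payload has no usable {field!r}")
    return image_url
```

In a Django view this would typically be called with request.body, and a ValueError would be turned into a 400 response.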

The initial MVP saved them to disk, but my little VM ran out of space after only a few thousand hi-res screenshots started piling up. This was the first scaling issue I hit – disk space.

Let's get started!

  1. Assemble your configuration options
  2. Create the session object
  3. Ready the content to upload/update
  4. Perform your operations
  5. Cleanup

Assemble your config options

I prefer to put access keys and config options into either settings.py or django-solo. The latter is out of scope for this document, but I like it.

This is a little over-simplified because I prefer to configure settings like this via environment variables (again, out of scope for this post).

settings.py

AWS_ACCESS_KEY_ID = 'myaccessid'
AWS_SECRET_ACCESS_KEY = 'mysecretkey'
AWS_STORAGE_BUCKET_NAME = 'mybucket-2020'
AWS_DEFAULT_ACL = 'public-read'
AWS_S3_REGION_NAME = 'nl-ams'
AWS_S3_ENDPOINT_URL = 'https://s3.nl-ams.scw.cloud'

Let's break this down a little bit.

  • AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY can be obtained from the credential control panel under API TOKENS.
  • AWS_STORAGE_BUCKET_NAME is the name of the bucket you create on the objects administration page.
  • AWS_DEFAULT_ACL is set to public-read so that the objects can be pulled from a URL without any access keys or time-limited signatures.
  • AWS_S3_REGION_NAME and AWS_S3_ENDPOINT_URL should be configured so that boto3 knows to point to Scaleway's resources. (We are not actually using AWS, after all.)

All of this is referenced in the Scaleway docs on Object Storage.
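For anyone who would rather not hard-code keys, here is a minimal sketch of reading the same settings from environment variables. The SCW_* variable names and the function name are my own invention, and the default endpoint simply follows the https://s3.<region>.scw.cloud pattern of the value shown above, so treat both as assumptions.

```python
import os


def object_storage_settings(env=os.environ):
    """Build the AWS_* settings from environment variables.

    The SCW_* names are hypothetical; pick whatever naming your deployment uses.
    Missing credentials raise KeyError right away instead of failing later.
    """
    region = env.get("SCW_REGION", "nl-ams")
    return {
        "AWS_ACCESS_KEY_ID": env["SCW_ACCESS_KEY"],
        "AWS_SECRET_ACCESS_KEY": env["SCW_SECRET_KEY"],
        "AWS_STORAGE_BUCKET_NAME": env["SCW_BUCKET"],
        "AWS_DEFAULT_ACL": env.get("SCW_DEFAULT_ACL", "public-read"),
        "AWS_S3_REGION_NAME": region,
        # Assumed endpoint pattern, derived from the region unless overridden.
        "AWS_S3_ENDPOINT_URL": env.get("SCW_ENDPOINT_URL", f"https://s3.{region}.scw.cloud"),
    }
```

In settings.py, something like globals().update(object_storage_settings()) would then define the same names as the hard-coded version.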

Creating a session object

OK, now that we have our credentials and settings done, we are ready to create the session object (a boto3 client) that makes all the operations possible. In the following code we simply import boto3 and the settings module, then instantiate the client.

import boto3
from django.conf import settings

# Point boto3 at Scaleway instead of AWS via region_name and endpoint_url
s3 = boto3.client('s3',
                  region_name=settings.AWS_S3_REGION_NAME,
                  endpoint_url=settings.AWS_S3_ENDPOINT_URL,
                  aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                  aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY
                  )

Before we dig deeper...

This is really where the tricky part ends. The rest is standard use of the Boto3 library, and I think you should look at its official documentation.

Create/Upload an object

OK, we have the session object, now we can do something. Let's start with uploading.

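    import requests

    # image_url is the file location taken from the webhook payload
    response = requests.get(image_url, timeout=30)
    response.raise_for_status()  # do not upload an HTTP error page as an image
    file_bytes = response.content
    s3_object_name = 'myobjectname.jpg'

    s3.put_object(
        Body=file_bytes,
        Bucket=settings.AWS_STORAGE_BUCKET_NAME,
        Key=s3_object_name,
        ACL='public-read',
        CacheControl='max-age=31556926',  # 1 year
    )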

I cheat here a bit as we simply use the popular requests library to download an image from somewhere. The raw bytes are put into the file_bytes variable.

Now the good part – using the session object s3, we put the object and name it using the s3_object_name variable. It is that easy.

Note – I am lazy here because there is no error handling. The operation can fail and leave our app in an unknown state (most probably a crash with an unhandled exception).

Retrieve (Download) an Object

I prefer to grab the object directly using HTTPS, but you can do the same with the session object.

Or, you can refer to the Boto3 documentation.
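Both routes can be sketched in a few lines. The URL helper assumes the path-style layout (endpoint, then bucket, then key), which is the usual S3 convention but is worth verifying against your own bucket. Both function names are hypothetical, and s3_client stands for the client created earlier.

```python
from urllib.parse import quote


def public_object_url(endpoint_url, bucket, key):
    """Build a path-style URL for a public-read object (assumed URL layout)."""
    # quote() escapes spaces and similar characters but keeps "/" in the key.
    return f"{endpoint_url.rstrip('/')}/{bucket}/{quote(key)}"


def download_object(s3_client, bucket, key):
    """Fetch the object's bytes through the client instead of plain HTTPS."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # "Body" is a file-like stream; read() pulls the whole object into memory.
    return response["Body"].read()
```

The first helper only works for objects uploaded with a public ACL; the second works for private objects too, since it signs the request with the access keys.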

Delete an object

Deleting is pretty straightforward. Using the same session object we can delete an object by passing in the Bucket and Key names.

s3.delete_object(Bucket=settings.AWS_STORAGE_BUCKET_NAME, Key=s3_object_name)
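When many screenshots have to go at once, the S3 API also has a batch call, delete_objects, which takes up to 1000 keys per request. Below is a rough sketch of chunking a longer list. The helper name delete_many is hypothetical, and I am assuming here that Scaleway handles this call the same way AWS does.

```python
def delete_many(s3_client, bucket, keys, chunk_size=1000):
    """Delete keys in batches.

    Returns the number of keys the service reports as deleted.
    The S3 API accepts at most 1000 keys per delete_objects request.
    """
    keys = list(keys)
    deleted = 0
    for start in range(0, len(keys), chunk_size):
        chunk = keys[start:start + chunk_size]
        response = s3_client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": key} for key in chunk], "Quiet": False},
        )
        # Failed keys, if any, come back under "Errors" rather than "Deleted".
        deleted += len(response.get("Deleted", []))
    return deleted
```

Comparing the returned count with the number of keys passed in is a cheap way to notice partial failures.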

More Django-centric options

Wow, so you made it this far. Thanks. I should mention that using boto3 directly with Django works well. I have no complaints, but if you are looking for tighter integration with Django, then you might want to consider django-storages. It offers convenient tie-ins with the way Django saves files and works with models. One nice thing I like is that it will automatically delete objects when you delete the model.

If I make a Part II about my S3 journey with NewShots, it will be about how I moved from boto3 to django-storages.
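As a taste of that route, here is roughly what the extra configuration could look like. It is a sketch that assumes Django 4.2 or newer (for the STORAGES setting) with django-storages installed. The AWS_* names used in settings.py earlier are the ones django-storages reads, so they should carry over unchanged.

```python
# settings.py (sketch): send Django's default file storage through django-storages.
# Assumes Django 4.2+ and that the django-storages package is installed.
STORAGES = {
    # FileField/ImageField uploads go to the S3-compatible bucket.
    "default": {"BACKEND": "storages.backends.s3boto3.S3Boto3Storage"},
    # Static files keep Django's stock backend in this sketch.
    "staticfiles": {"BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage"},
}
```

With this in place, a model's ImageField saves straight into the bucket without any explicit put_object call.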

jschneier/django-storages
https://django-storages.readthedocs.io/