Hosting PostgreSQL on a t4g Graviton2 ARM Instance on AWS EC2

January 12, 2021

Why EC2 and not RDS?

The first thing you might be wondering is why you would want your database on an EC2 instance when there is a managed service (RDS) where you don't have to bother with maintaining or updating your server, and you can enable automated daily backups with one click. The main reason is COST: this option costs roughly half the RDS price. I'm also using one of the new ARM instances, based on technology similar to the new Apple ARM CPUs but designed for servers. So if you're running a side project and only need a simple single-host setup, I think this approach is a really good solution: a t4g.micro EC2 instance costs about $6 per month.

1. Launch the EC2 instance

Log in to the AWS EC2 console and provision a Linux server to host the DB; make sure to select the ARM architecture. I'll be using Amazon Linux 2 ARM.

Choose AMI

For the instance type, I'll be using t4g.micro, which is free to use until the end of March 2021 as part of AWS's t4g free-trial promotion.

Select instance type

Next, I'll configure the security group to allow SSH access and PostgreSQL access on port 5432. Whether you need to open this port depends on whether you plan to host your application code on this same server or on a different one. If you're going to have another EC2 instance with your application code, I'd suggest opening your DB port (5432) only to your “Web Security Group”. To connect to the DB server from my laptop when I need to, I use SSH; TablePlus makes it easy to connect to your DB over SSH.

Select security group
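If you prefer the command line, an equivalent ingress rule can be added with the AWS CLI. The security group IDs below are placeholders: the first is your DB security group, the second your “Web Security Group”:

```
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 5432 \
    --source-group sg-0fedcba9876543210
```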

Once your setup is ready, launch your instance and wait until it reaches the running state so we can connect via SSH. Make sure to move the .pem file used for SSH authentication to ~/.ssh and assign the right permissions by running the following command in your terminal:

$ chmod 0400 ~/.ssh/your-server-identity.pem

Instance summary

Click on the “Connect” button to get the SSH command details so we can connect to the instance and set the database up.

Connect to the instance via SSH
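The command you get typically looks like the following; the hostname here is a placeholder for your instance's public DNS name:

```
$ ssh -i ~/.ssh/your-server-identity.pem ec2-user@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
```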

2. Install PostgreSQL

Now that our server is set up and running, connect using your SSH key.

SSH connection

After a successful connection, run these commands on the server to install and configure the PostgreSQL database.

# Enable the repository to install PostgreSQL
$ sudo yum update
$ sudo amazon-linux-extras enable postgresql11

# Install PostgreSQL server and initialize the database 
# cluster for this server
$ sudo yum install postgresql-server postgresql-devel
$ sudo /usr/bin/postgresql-setup --initdb

# Start the db service
$ sudo systemctl enable postgresql
$ sudo systemctl start postgresql

# To check if the postgres service is running or not
$ sudo systemctl status postgresql

To be able to connect to PostgreSQL, we still have to configure authentication. Client authentication is controlled by a configuration file, traditionally named pg_hba.conf, stored in the database cluster's data directory. On this Linux distribution, the file we need to edit is /var/lib/pgsql/data/pg_hba.conf. I have edited mine as you can see in the screenshot; I'm not using custom usernames/passwords, but you can tweak yours as needed. Bear in mind that, by default, PostgreSQL only accepts connections from localhost.

Authentication
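As a rough sketch of what that edit can look like, these lines match my passwordless local setup (the `trust` method); use `md5` instead if you want password authentication:

```
# TYPE  DATABASE  USER  ADDRESS       METHOD
local   all       all                 trust
host    all       all   127.0.0.1/32  trust
host    all       all   ::1/128       trust
```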

As I mentioned, at the moment we can only connect to our DB server from localhost, so if we want to run our application code on a different server, we need to configure PostgreSQL to listen for outside TCP connections. On this server, we need to edit /var/lib/pgsql/data/postgresql.conf. In the screenshot above, you can see that I added a specific line allowing authentication from another EC2 instance in AWS; for security reasons, it's not a good idea to leave access open to the public internet.

Configuration
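For reference, the two edits might look like this; the private IP below is a placeholder for your application server's address in your VPC:

```
# /var/lib/pgsql/data/postgresql.conf
listen_addresses = '*'    # accept TCP connections on all interfaces

# /var/lib/pgsql/data/pg_hba.conf - allow only the app server's IP
host    all    all    172.31.5.10/32    md5
```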

After saving the changes to those two files, restart the PostgreSQL server so the new configuration takes effect, and verify that it's running:

$ sudo systemctl restart postgresql
$ sudo systemctl status postgresql

Start server

You can also manage the DB server from the command line:

$ psql -h localhost -U postgres

SQL command line

As I mentioned, I use TablePlus as a UI for my databases; I connect to the EC2 instance over SSH and then use localhost to reach the server. But you can use any other UI of your preference as well.

TablePlus Connection Details
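If your client of choice doesn't support SSH connections natively, you can open the tunnel manually and point the client at localhost:5432; again, the hostname is a placeholder:

```
$ ssh -i ~/.ssh/your-server-identity.pem -N -L 5432:localhost:5432 ec2-user@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
```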

After you connect, you can select the DB that you need, create new ones, etc.

TablePlus DBs

3. Backup strategy for our instance

Now that we have our DB server up and running, it’s a good idea to have a backup in place in case something goes wrong and we need to restore it.

3.1 Backup the whole EBS Volume - EC2 Instance Hard Drive

I'd like to mention that this approach involves a few seconds of downtime; depending on the type of system you run, this might be acceptable. Otherwise, I'd go with the option I explain in section 3.2.

To set up the backup, go to EC2 > Elastic Block Store > Lifecycle Manager, and click on “Create a Lifecycle Policy”.

Backup step 1

Give the policy a name; I'm using a tag to select the instance that I want to back up with this policy.

Backup step 2

Define the schedule and retention for your backups.

Backup step 3

Once you save the policy your backups will be created on the schedule that you’ve defined. It’s really easy!

Backup step 4

Keep in mind that with this method you're backing up the whole server, so if you need to restore, you just launch a new instance from one of your backups.

3.2 Run a backup script as CRON and save the output to S3

We can also create a script and run it on a schedule with cron, following these steps:

  1. Database backup with the pg_dump command.
  2. Compression with bzip2.
  3. Encrypt the file.
  4. Upload it to S3.

#!/bin/bash
# Expects POSTGRES_DB, POSTGRES_USER, POSTGRES_HOST, POSTGRES_PASSWORD,
# DB_BACKUP_PASSWORD and S3_BACKUP_PATH to be set in the environment.
DUMP_FILE=/tmp/backup_$(date +%Y%m%d_%H%M%S).pgdump
PGPASSWORD=$POSTGRES_PASSWORD pg_dump -d "$POSTGRES_DB" -U "$POSTGRES_USER" -h "$POSTGRES_HOST" -f "$DUMP_FILE"
bzip2 "$DUMP_FILE"
mcrypt "${DUMP_FILE}.bz2" -k "$DB_BACKUP_PASSWORD"
aws s3 cp "${DUMP_FILE}.bz2.nc" "$S3_BACKUP_PATH"
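To sanity-check the timestamped filename and the compression step locally, without touching pg_dump, mcrypt, or S3, you can run a stripped-down version of the first two steps:

```shell
# Create a dummy dump file with the same naming pattern,
# compress it, and confirm the .bz2 artifact exists.
DUMP_FILE=/tmp/backup_$(date +%Y%m%d_%H%M%S).pgdump
echo "dummy data" > "$DUMP_FILE"
bzip2 "$DUMP_FILE"
ls "${DUMP_FILE}.bz2"
```

Note that bzip2 replaces the original file with the compressed one, which is why the script references ${DUMP_FILE}.bz2 from that point on.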

Bear in mind that mcrypt is not installed by default, but you can install it with:

$ sudo amazon-linux-extras install epel
$ sudo yum install mcrypt

Once you've set up the script and installed the dependencies, you can execute it manually to verify that it works, and then set up the cron job to run as often as you need.
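For example, assuming you saved the script as /home/ec2-user/backup.sh and made it executable, a crontab entry like this (added via crontab -e) would run it every night at 3 AM; the path and schedule are just examples:

```
0 3 * * * /home/ec2-user/backup.sh >> /home/ec2-user/backup.log 2>&1
```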

4. Conclusion

As you can see, this method is not as convenient as a managed RDS database server, but it's cheaper, and it's also a good learning experience. In my next post I'll show you how to automate this setup with Terraform, so it's no longer a manual process. I would recommend this method for a production system only if you have a single database host and don't need failover or a more complicated architecture; once you do, using RDS becomes very beneficial, as all the management is done by AWS. Subscribe to my newsletter if you want to be informed when my new content is available. Don't hesitate to leave your comments or suggestions.

