Three Tricks to Understanding your DynamoDB Table
Published over 4 years ago
- Your DynamoDB table contains one type of item: relationships between pairs of objects.
- Your objects don't "have" metadata. They are related to the concept of metadata, and that relationship has metadata.
- In DynamoDB, PK stands for "partition key", not "primary key". PK identifies a database partition, not an individual item in your database.
That's it! Those are the tricks.
Need more context? Sounds good, I got you.
Your app's data is a graph
When I say "graph", I'm not talking about line charts and bar charts—I'm referring to the concept from mathematics.
Wikipedia's definition, paraphrased here, is a good one: "A graph is a set of nodes, and a set of relationships between nodes."
Let's unpack that a little more by defining a few key terms:
- A set is a list of unique items. No duplicates allowed!
- A node, at least in the context of a web app that would use DynamoDB as its database, is a JavaScript object.
- A relationship is a pair of nodes that are connected in some way inside your app.
Applying graph theory to your web app lends us an interesting perspective on our data.
Each person that uses your app is a node. Each object that they build in your app is also a node. For example:
- If you're building a to-do app, each to-do is a node.
- If you're building a content management system where people can write and tag their blog posts, each post is a node, and each tag is a node.
- If you're building something even more complex, like a Figma-esque design tool, your nodes include frames, shapes, points inside a shape, control points that adjust line curvature, and tons of other things.
Relationships exist between people and the things they create in your app. Also, relationships often exist between two things that a person creates. For example:
- In your to-do app, a user has a relationship with each to-do they have created
- In your content management system, a user has a relationship with each blog post they have created. Each blog post has a relationship with each tag that was added to it.
- In your design tool, frames have a relationship with each shape inside them, shapes have a relationship with each point that defines them, and points have a relationship with each control point that adjusts the curvature of any lines extending from that point.
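To make the definitions above concrete, here's a minimal sketch of the to-do app's graph in TypeScript. The node IDs, the `Relationship` type, and the `neighbors` helper are all made up for illustration; they aren't part of any library.

```typescript
// Hypothetical node IDs for a tiny to-do app (illustrative only).
type NodeId = string;
type Relationship = readonly [NodeId, NodeId];

// A set of nodes: no duplicates allowed.
const nodes = new Set<NodeId>(["user-1", "user-2", "todo-1", "todo-2", "todo-3"]);

// A set of relationships: each one is a pair of connected nodes.
const relationships: Relationship[] = [
  ["user-1", "todo-1"],
  ["user-1", "todo-2"],
  ["user-2", "todo-3"],
];

// Sanity check: every relationship connects two known nodes.
const allEndpointsKnown = relationships.every(
  ([a, b]) => nodes.has(a) && nodes.has(b)
);

// Return every node connected to the given node, in either direction.
function neighbors(id: NodeId, edges: readonly Relationship[]): NodeId[] {
  const result: NodeId[] = [];
  for (const [a, b] of edges) {
    if (a === id) result.push(b);
    else if (b === id) result.push(a);
  }
  return result;
}

const user1Todos = neighbors("user-1", relationships); // ["todo-1", "todo-2"]
```

Notice that the users and to-dos never hold references to each other. Everything you need to know about who created what lives in the list of relationships.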
You can visualize a graph by drawing a circle for each node and a line for each relationship:
Your DynamoDB table is a list of relationships in your app's graph
People coming from SQL backgrounds are accustomed to building one table for each type of node in their app's graph. When they need to define a relationship between two nodes, they add a new column (a foreign key) to one node's table, and fill it in with the unique ID of another object in another table.
DynamoDB, on the other hand, lets you create as many tables as you like, but the widely used single-table design has you put all of your app's data in just one. Alex DeBrie, when describing DynamoDB's single-table concept, said that this constraint is "the biggest hurdle for people to overcome."
For me, that was absolutely true. How can you store users in the same table as to-dos? How can you store blog posts in the same table as tags, and users?! It's insane! It doesn't make sense!
And finally, I realized my error: coming from a SQL background, I was taught—albeit in different words—that tables are lists of nodes in your app's graph. Nodes are separated into different tables based on their type.
Got a bunch of users? Stick them in a "users" table. Got to-dos? "To-dos" table. Blog posts, tags, shapes, vector points? Table, table, table, table.
In DynamoDB, you don't have a table for each type of node. You have one table, listing all the relationships between nodes.
In other words, the "one item type per table" concept does not change when you migrate from SQL to DynamoDB's flavor of NoSQL. What changes is that your single table's item type is now "Relationships between pairs of objects", rather than "Users", "Blog Posts" or any other type of object.
I'm not just spitballing here, either. DynamoDB expert Rick Houlihan uses the same analogy when teaching DynamoDB 🎯
A simplified version of DynamoDB's relationship table for a to-do app looks like this:
Each relationship is actually an object with at least two keys, most commonly named "pk" and "sk" in DynamoDB.
The pk key's value is the unique identifier of the node on one end of the relationship, and the sk key's value is the unique identifier of the node on the other end of the relationship.
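Here's a rough sketch of what those relationship objects could look like for a to-do app, written as a plain TypeScript array rather than a real DynamoDB table. The ID format (`USER#alex`, `TODO#1`), the `todoTable` name, and the `relatedNodes` helper are assumptions made for this example.

```typescript
// One relationship between a pair of nodes.
interface RelationshipItem {
  pk: string; // ID of the node on one end of the relationship
  sk: string; // ID of the node on the other end of the relationship
}

// Hypothetical IDs: which user owns which to-do is invented for this sketch.
const todoTable: RelationshipItem[] = [
  { pk: "USER#alex", sk: "TODO#1" },
  { pk: "USER#alex", sk: "TODO#2" },
  { pk: "USER#barack", sk: "TODO#3" },
];

// Find the IDs of every node related to the node stored in pk.
// This in-memory filter only stands in for a real database query.
function relatedNodes(pk: string, items: readonly RelationshipItem[]): string[] {
  return items.filter((item) => item.pk === pk).map((item) => item.sk);
}

const alexTodos = relatedNodes("USER#alex", todoTable); // ["TODO#1", "TODO#2"]
```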
Note that pk and sk do have other purposes within DynamoDB—the choices you make when assigning node IDs to each key will have side effects on the efficiency of your database queries.
That's a huge topic, and outside the scope of this article, but for more info, I recommend two additional resources from Alex DeBrie:
- Conference talk: Data modeling with Amazon DynamoDB
- The DynamoDB Book
Also note that to-do IDs are unique among all to-dos, not just the to-dos for a particular user. This may not be necessary for apps with less complex data relationships, but is definitely recommended for anything beyond the complexity of a to-do app where each to-do can only be accessed by its creator.
So...where do I put my user's profile data?
Ahh, the elephant in the room.
In SQL, your user's profile data goes in the user table, along with that user's primary key. In DynamoDB, you don't have a user table; you only have a relationship table.
The solution here is three-fold:
- Think of your user, or any other object with a profile or metadata, as just their ID, and nothing more. Your objects are just unique IDs. They don't "have" metadata.
- Think of the abstract concept of "metadata" as a distinct node in your app's graph.
- Connect objects to the "metadata" node, and store your metadata in that relationship.
Okay, things are starting to get weird, so let's go back to the table:
(Table: each object's relationship with METADATA, holding { name: 'Alex' }, { name: 'Barack' }, { title: 'Write DynamoDB article' }, and { title: 'Tweet storm' }.)
In graph terms, this technique explicitly defines METADATA as a separate node in our app, and stores the actual metadata we care about in the relationship between any given object and that METADATA node.
By thinking of METADATA as a distinct node in our app's graph, we get to retain the SQL-friendly mental model of "This table is just a list of relationships", and we can still easily and effectively create storage spaces for all of our objects' metadata.
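As a sketch of the METADATA technique, here's how those relationships might look as plain TypeScript objects. The `data` attribute name, the ID format, the `metadataFor` helper, and the pairing of users with to-dos are all assumptions for illustration; only the names and titles come from the table above.

```typescript
type Metadata = Record<string, string>;

interface MetadataTableItem {
  pk: string;
  sk: string;
  data?: Metadata; // hypothetical attribute name for the relationship's payload
}

const METADATA = "METADATA";

const metadataTable: MetadataTableItem[] = [
  // Object <-> METADATA relationships hold the profile or metadata.
  { pk: "USER#alex", sk: METADATA, data: { name: "Alex" } },
  { pk: "USER#barack", sk: METADATA, data: { name: "Barack" } },
  { pk: "TODO#1", sk: METADATA, data: { title: "Write DynamoDB article" } },
  { pk: "TODO#2", sk: METADATA, data: { title: "Tweet storm" } },
  // User <-> to-do relationships (ownership here is invented for the example).
  { pk: "USER#alex", sk: "TODO#1" },
  { pk: "USER#barack", sk: "TODO#2" },
];

// Look up an object's metadata by following its relationship to METADATA.
function metadataFor(
  id: string,
  items: readonly MetadataTableItem[]
): Metadata | undefined {
  return items.find((item) => item.pk === id && item.sk === METADATA)?.data;
}

const alexProfile = metadataFor("USER#alex", metadataTable); // { name: "Alex" }
```

The user object itself never shows up anywhere as a blob of properties. It's just an ID, and its profile lives on one of its relationships.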
PK stands for "partition key", not "primary key"
In SQL, the acronym "PK" pretty much always stands for "Primary key", and the PK key of an object holds that object's unique identifier.
In DynamoDB, "PK" is an acronym for "partition key". I won't get into the weeds of what a partition key is and what purpose it serves—I recommend checking out Alex DeBrie's conference talk, linked above, for more info on that.
The actual primary key/unique identifier for each relationship in your table is the combination of the partition key and the sk, which stands for "sort key". DynamoDB calls this pair a composite primary key.
Putting all of these concepts together, it starts to make sense: in a list of relationships, where only one relationship can exist between any two objects, how do you identify a unique relationship? Well, you just combine the unique IDs of the two related objects. That specific combination of IDs is guaranteed to be unique.
In DynamoDB, the combined pk and sk for each relationship is the "Primary key" that you're accustomed to seeing in SQL databases.
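To see why the pk and sk pair behaves like a primary key, here's a small in-memory stand-in for a table. It is not how DynamoDB is implemented; the `InMemoryTable` class and its methods are invented for this sketch. It does mirror one real behavior: putting an item whose pk and sk match an existing item replaces that item.

```typescript
interface KeyedItem {
  pk: string;
  sk: string;
  data?: Record<string, string>;
}

// Hypothetical in-memory table that allows one item per (pk, sk) pair.
class InMemoryTable {
  private readonly items = new Map<string, KeyedItem>();

  // Encode the pair as JSON so ("a", "b#c") and ("a#b", "c") never collide.
  private static keyOf(pk: string, sk: string): string {
    return JSON.stringify([pk, sk]);
  }

  // Insert the item, replacing any existing item with the same pk and sk.
  put(item: KeyedItem): void {
    this.items.set(InMemoryTable.keyOf(item.pk, item.sk), item);
  }

  get(pk: string, sk: string): KeyedItem | undefined {
    return this.items.get(InMemoryTable.keyOf(pk, sk));
  }

  get size(): number {
    return this.items.size;
  }
}

const keyedTable = new InMemoryTable();
keyedTable.put({ pk: "USER#alex", sk: "METADATA", data: { name: "Alex" } });
keyedTable.put({ pk: "USER#alex", sk: "TODO#1" });
// Same pk and sk as the first item, so this replaces it instead of adding a row.
keyedTable.put({ pk: "USER#alex", sk: "METADATA", data: { name: "Alexander" } });
```

After those three puts, the table holds two items, because two of them shared the same pk and sk.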
It's unfortunate that "primary" and "partition" start with the same letter, since it screws up your "PK = unique identifier" heuristic, making it a little more difficult to get comfortable in DynamoDB.
The upside is that the actual name of pk in your database is fully customizable, so you're free to avoid confusion by, for example, thinking of partitions as "Groups" of data, and renaming pk to gk.
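For example, here's a sketch of table-creation parameters that use gk as the partition key's name. The `CreateTableParams` type is declared locally and only mirrors the subset of DynamoDB's CreateTable request fields used here; the table name is made up, and nothing in this snippet talks to AWS.

```typescript
// Local type mirroring a subset of the CreateTable request shape.
interface CreateTableParams {
  TableName: string;
  AttributeDefinitions: { AttributeName: string; AttributeType: "S" | "N" | "B" }[];
  KeySchema: { AttributeName: string; KeyType: "HASH" | "RANGE" }[];
  BillingMode: "PAY_PER_REQUEST" | "PROVISIONED";
}

const createTableParams: CreateTableParams = {
  TableName: "app-relationships", // hypothetical table name
  AttributeDefinitions: [
    { AttributeName: "gk", AttributeType: "S" }, // "group key", a string
    { AttributeName: "sk", AttributeType: "S" }, // sort key, a string
  ],
  KeySchema: [
    { AttributeName: "gk", KeyType: "HASH" }, // HASH marks the partition key
    { AttributeName: "sk", KeyType: "RANGE" }, // RANGE marks the sort key
  ],
  BillingMode: "PAY_PER_REQUEST",
};
```

DynamoDB only cares which attribute is marked HASH and which is marked RANGE. The names are entirely up to you.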
DynamoDB ❤️ relational data
To close out this article, I just want to drive home that NoSQL, including DynamoDB's flavor of it, is excellent for highly relational data.
I'd even argue that the term "relational database", used almost exclusively to describe multi-table SQL databases, is much better suited for DynamoDB, which is literally a database of relationships!
Rick Houlihan says it best in his 2019 talk about DynamoDB:
"[Non-relational data] doesn't exist. It's a marketing term that was invented by marketing people who tried to describe a technology they don't understand to other people who don't understand it, and they came up with 'It's not relational, it must be non-relational.' Don't use that term; don't fall in that trap. Data is all relational, or it doesn't matter."
Damn, I love that quote!