How to design data models in Python
Published: January 17, 2023
This article will show you how to write data models in Python 3.7+. A data model is a class or enum that defines the structure of your application's data. For example, when an application requires a sign-up, it's going to store data like first name, last name, email, and password. As a software engineer, you would probably choose to create a User model with such fields.
Let's take a look at how a simple User model might look in Python.
Simple User Model
class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''
As you can see, it's a simple class that encapsulates a bunch of properties. In a professional setting though, we expect our models to be responsible for the following:
- Defining the structure of the data
- Validating data, e.g., checking whether the email is valid
- Parsing and producing JSON to communicate with other parts of our application or with third-party applications
- Storing data (if needed)
Typing Module
We use the typing module throughout this article. It lets us annotate the data type of each property, function parameter, and return value. Python does not enforce these annotations at runtime, but they make our code better defined and more predictable for readers, editors, and static type checkers.
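As a small illustration of what the typing module adds on top of built-in annotations such as str, here is a minimal sketch using List and Optional. The Team class and its fields are hypothetical and exist only for this example.

```python
from typing import List, Optional


class Team:
    # Hypothetical model, used only to illustrate type annotations
    name: str = ''
    members: List[str]           # a list of member names
    motto: Optional[str] = None  # either a str or None

    def __init__(self, name: str, members: Optional[List[str]] = None) -> None:
        self.name = name
        # Build the list per instance so that instances never share it
        self.members = members if members is not None else []

    def member_count(self) -> int:
        return len(self.members)


team = Team(name="Backend", members=["John", "Jane"])
print(team.member_count())  # 2
```

The annotations document intent; a static type checker or an editor can use them to flag a call such as Team(name=42) before the code runs.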
Initialization
While our User model looks sleek, it's a bit hard to work with. Why don't we add a constructor?
class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''

    def __init__(self, **kwargs) -> None:
        # Fall back to an empty string for any field that was not passed in
        self.first_name = kwargs.get('first_name', '')
        self.last_name = kwargs.get('last_name', '')
        self.email = kwargs.get('email', '')
        self.password = kwargs.get('password', '')
Passing kwargs is a common Python technique that gives us a lot of flexibility.
user = User(email="john@doe.com", password="unsafepass")
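One trade-off of accepting kwargs is worth seeing in action: omitted fields fall back to their defaults, but misspelled keywords are silently ignored as well. The sketch below repeats the constructor so that it runs on its own; the `emial` typo is deliberate.

```python
class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''

    def __init__(self, **kwargs) -> None:
        self.first_name = kwargs.get('first_name', '')
        self.last_name = kwargs.get('last_name', '')
        self.email = kwargs.get('email', '')
        self.password = kwargs.get('password', '')


user = User(email="john@doe.com", password="unsafepass")
print(repr(user.first_name))  # '' because first_name was not passed

# Deliberate typo: 'emial' is not a known field, so it is dropped without an error
typo_user = User(emial="john@doe.com")
print(repr(typo_user.email))  # ''
```

If that silence is a concern in your project, one option is to raise an error for unknown keys inside the constructor.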
JSON Parsing
What if we want to send the model data outside the app? Imagine the app we're writing is the backend, and our colleague who works on the mobile app needs to exchange User data with us.
One of the most popular formats for communicating over HTTPS is JSON. The serialization from/to JSON should be implemented in the model itself. We'll use the json module, which converts a Python dictionary to a JSON string and vice versa.
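Before wiring JSON into the model, here is a minimal sketch of the two json functions involved: json.dumps turns a dictionary into a JSON string and json.loads turns the string back into a dictionary. The payload values are made up for illustration.

```python
import json

# A plain dictionary with illustrative values
payload = {"first_name": "John", "email": "john@doe.com"}

encoded = json.dumps(payload)  # dict -> JSON string
decoded = json.loads(encoded)  # JSON string -> dict

print(encoded)             # {"first_name": "John", "email": "john@doe.com"}
print(decoded == payload)  # True
```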
Convert an object to a dictionary
Every instance has a special __dict__ attribute that holds the object's instance attributes as a dict. Knowing that, we can just implement a simple to_dict method.
class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''

    def __init__(self, **kwargs) -> None:
        self.first_name = kwargs.get('first_name', '')
        self.last_name = kwargs.get('last_name', '')
        self.email = kwargs.get('email', '')
        self.password = kwargs.get('password', '')

    def to_dict(self) -> dict:
        return self.__dict__
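To see the method in use, the sketch below serializes a User through to_dict and json.dumps. It repeats the class so that it runs on its own, and the sample values are illustrative.

```python
import json


class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''

    def __init__(self, **kwargs) -> None:
        self.first_name = kwargs.get('first_name', '')
        self.last_name = kwargs.get('last_name', '')
        self.email = kwargs.get('email', '')
        self.password = kwargs.get('password', '')

    def to_dict(self) -> dict:
        # Returns the instance's own attribute dictionary (not a copy)
        return self.__dict__


user = User(email="john@doe.com", password="unsafepass")
user_json = json.dumps(user.to_dict())
print(user_json)
# {"first_name": "", "last_name": "", "email": "john@doe.com", "password": "unsafepass"}
```

Keep in mind that to_dict hands back the live attribute dictionary, so changing the result also changes the object. Returning dict(self.__dict__) instead would give callers a shallow copy.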
When can we not use __dict__?
Did you think it was too good to be true? You were right! There's a catch. __dict__ won't work correctly if one of your attributes is another custom object. Let's add a Dog class and reference it in the User model.
import json
from typing import Optional

class Dog:
    name: str = ''
    age: int = 0
    # ... (skipping the init and to_dict functions)

class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''
    dog: Optional[Dog] = None  # Our new property of the 'Dog' type
    # ... (skipping the init and to_dict functions)

# Let's create an instance of User now
u = User(first_name="John", dog=Dog(name="Milo", age=4))
user_as_dict = u.to_dict()
print(user_as_dict)  # print it out to see the invalid dictionary structure
json.dumps(user_as_dict)  # raises TypeError: the dog value is still a Dog instance, not a dictionary
How do we fix it?
We simply have to write out the conversion code line by line. Let's take a look at the to_dict method for both Dog and User.
import json
from typing import Optional

class Dog:
    name: str = ''
    age: int = 0
    # ... (skipping the init function)

    def to_dict(self) -> dict:
        result: dict = {}
        result['name'] = self.name
        result['age'] = self.age
        return result

class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''
    dog: Optional[Dog] = None
    # ... (skipping the init function)

    def to_dict(self) -> dict:
        result: dict = {}
        result['first_name'] = self.first_name
        result['last_name'] = self.last_name
        result['email'] = self.email
        result['password'] = self.password
        # Notice we're calling Dog.to_dict() here; a user without a dog serializes as None
        result['dog'] = self.dog.to_dict() if self.dog else None
        return result

# Let's create an instance of User again
u = User(first_name="John", dog=Dog(name="Milo", age=4))
user_as_dict = u.to_dict()
json.dumps(user_as_dict)  # success!
print(user_as_dict)  # notice even the Dog object becomes a dictionary now
Convert a dictionary to an object
There are no shortcuts in the opposite direction. We'll have to write out the conversion from dict to a new instance of User by hand.
from typing import Optional

class Dog:
    name: str = ''
    age: int = 0
    # ... (skipping init and to_dict functions)

    @staticmethod
    def from_dict(obj: dict) -> 'Dog':
        model = Dog()
        # A missing or empty value falls back to the default
        model.name = obj.get('name') or ''
        model.age = obj.get('age') or 0
        return model

class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''
    dog: Optional[Dog] = None
    # ... (skipping init and to_dict functions)

    @staticmethod
    def from_dict(obj: dict) -> 'User':
        model = User()
        model.first_name = obj.get('first_name') or ''
        model.last_name = obj.get('last_name') or ''
        model.email = obj.get('email') or ''
        model.password = obj.get('password') or ''
        model.dog = Dog.from_dict(obj['dog']) if obj.get('dog') else None  # Notice we call Dog.from_dict() here
        return model
We'd like to highlight a couple of things in the from_dict method:
- It's a static method, so it can be called on the class without an existing instance.
- We check whether each key is present in the dictionary and fall back to a default when it is missing or empty. This strategy makes parsing all properties optional. You could change the method's return type to Optional['User'] and return None when parsing fails, or you could raise an exception in such a case.
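The stricter option mentioned in the last point could look like the sketch below. It assumes, purely for illustration, that email is the one required field; from_dict_strict is a hypothetical name, and the classes are trimmed to the fields the example needs.

```python
import json
from typing import Optional


class Dog:
    name: str = ''
    age: int = 0

    @staticmethod
    def from_dict(obj: dict) -> 'Dog':
        model = Dog()
        model.name = obj.get('name') or ''
        model.age = obj.get('age') or 0
        return model


class User:
    email: str = ''
    dog: Optional[Dog] = None

    @staticmethod
    def from_dict_strict(obj: dict) -> 'User':
        # Hypothetical strict variant: 'email' is treated as required (an assumption)
        if not obj.get('email'):
            raise ValueError("Missing required field: email")
        model = User()
        model.email = obj['email']
        model.dog = Dog.from_dict(obj['dog']) if obj.get('dog') else None
        return model


raw = '{"email": "john@doe.com", "dog": {"name": "Milo", "age": 4}}'
user = User.from_dict_strict(json.loads(raw))
print(user.dog.name)  # Milo

try:
    User.from_dict_strict({"first_name": "John"})
except ValueError as error:
    print(error)  # Missing required field: email
```

Raising an exception keeps invalid payloads from turning into half-empty models, at the cost of forcing every caller to handle the failure.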
Validation
Reusing our User model, what would we want to validate there? We should actually validate every property, but let's start with first_name.
We like to tackle the validation using two methods for each property:
- is_valid: this method returns a bool based on the validation result.
- validate: this method returns an error message if the validation fails.
So let's validate the first_name property.
import re
from typing import Optional

class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''
    dog: Optional[Dog] = None
    # ... (skipping methods defined above)

    def is_first_name_valid(self) -> bool:
        # fullmatch requires the whole string to match, not just its beginning
        return bool(self.first_name and re.fullmatch(r"[a-zA-Z]{1,30}", self.first_name))

    def validate_first_name(self) -> Optional[str]:
        # None means the value is valid; otherwise return the error message
        return None if self.is_first_name_valid() else "First name is not valid"
We decided to use a simple regular expression (far too restrictive for production) that allows only names made of 1 to 30 ASCII letters.
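To make the behavior of this pattern concrete, here is a short sketch that runs it against a few sample inputs. It also shows a detail that is easy to miss: re.match only checks the beginning of the string, while re.fullmatch requires the entire string to match. The NAME_PATTERN constant is a name introduced for this example.

```python
import re

# The pattern from the article, stored in a constant for this example
NAME_PATTERN = r"[a-zA-Z]{1,30}"

# re.match succeeds as soon as a prefix matches, so trailing digits slip through
print(bool(re.match(NAME_PATTERN, "John123")))      # True
# re.fullmatch requires the whole string to consist of 1 to 30 letters
print(bool(re.fullmatch(NAME_PATTERN, "John123")))  # False
print(bool(re.fullmatch(NAME_PATTERN, "John")))     # True
# Real-world names with apostrophes are rejected, which is why this is too strict
print(bool(re.fullmatch(NAME_PATTERN, "O'Neil")))   # False
# 31 letters exceeds the upper bound
print(bool(re.fullmatch(NAME_PATTERN, "a" * 31)))   # False
```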
Static validation methods
Depending on your project requirements, you might want to consider putting the validation logic into a static method and letting the instance methods piggyback on it. The advantage would be that you could validate data without creating a User instance, e.g., User.is_first_name_valid("John").
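One way this could be arranged is sketched below. Because a static method and an instance method cannot share a name within one class, the instance-level check is given the hypothetical name has_valid_first_name; the class is trimmed to the single field under validation.

```python
import re
from typing import Optional


class User:
    first_name: str = ''

    def __init__(self, **kwargs) -> None:
        self.first_name = kwargs.get('first_name', '')

    @staticmethod
    def is_first_name_valid(first_name: str) -> bool:
        # Pure check on a raw value; no User instance is required
        return bool(first_name and re.fullmatch(r"[a-zA-Z]{1,30}", first_name))

    def has_valid_first_name(self) -> bool:
        # Hypothetical instance method that piggybacks on the static check
        return User.is_first_name_valid(self.first_name)

    def validate_first_name(self) -> Optional[str]:
        return None if self.has_valid_first_name() else "First name is not valid"


print(User.is_first_name_valid("John"))                # True
print(User.is_first_name_valid("J0hn"))                # False
print(User(first_name="John").validate_first_name())   # None
print(User(first_name="").validate_first_name())       # First name is not valid
```

With this split, a sign-up form handler could reject bad input through the static method before any model object is constructed.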