OneGen Blog

How to design data models in Python

Published: January 17, 2023

This article shows you how to write data models in Python 3.7+. A data model is a class (or enum) that defines the structure of your application's data. For example, when an application requires a sign-up, it stores data such as first name, last name, email, and password. As a software engineer, you would probably create a User model with those fields. Let's take a look at how a simple User model might look in Python.

Simple User Model

class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''

As you can see, it's a simple class that groups a handful of attributes. In a professional setting, though, we expect our models to be responsible for the following:

  • Defining the structure of the data
  • Validating the data, e.g., checking whether the email is valid
  • Parsing and serializing JSON to communicate with other parts of our application or with third-party applications
  • Storing the data (if needed)

Typing Module

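We use the typing module throughout this article. Together with Python's annotation syntax, it lets us declare the type of each attribute, function parameter, and return value. Python does not enforce these hints at runtime, but they make the code better defined and more predictable for readers, editors, and static type checkers.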
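
As a small illustration of what such annotations look like, here is a minimal sketch. The full_name function is a hypothetical helper made up for this example; it is not part of the User model.

```python
from typing import List, Optional


# Hypothetical helper, used only to illustrate parameter and return annotations
def full_name(first_name: str, last_name: Optional[str] = None) -> str:
    """Join the name parts that are present, separated by a space."""
    parts: List[str] = [first_name]
    if last_name:
        parts.append(last_name)
    return " ".join(parts)


print(full_name("John", "Doe"))  # John Doe
print(full_name("John"))         # John
```

Optional[str] tells the reader (and a type checker) that last_name may be a string or None.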

Initialization

While our User model looks sleek, it's a bit hard to work with. Why don't we add a constructor?

class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''

    def __init__(self, **kwargs) -> None:
        # Fall back to an empty string for any field that was not passed in
        self.first_name = kwargs.get('first_name', '')
        self.last_name = kwargs.get('last_name', '')
        self.email = kwargs.get('email', '')
        self.password = kwargs.get('password', '')

Accepting **kwargs is a common Python technique that gives us a lot of flexibility: every field is optional and can be passed by name, in any order.

user = User(email="john@doe.com", password="unsafepass")
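
The flexibility has a cost that is worth seeing once. The sketch below repeats the User class so it runs on its own, then shows that omitted fields fall back to their defaults and that a misspelled keyword (the deliberate typo "emial") is accepted without any error.

```python
class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''

    def __init__(self, **kwargs) -> None:
        self.first_name = kwargs.get('first_name', '')
        self.last_name = kwargs.get('last_name', '')
        self.email = kwargs.get('email', '')
        self.password = kwargs.get('password', '')


user = User(email="john@doe.com", password="unsafepass")
print(user.email)             # john@doe.com
print(repr(user.first_name))  # '' (omitted, so the default is used)

# A misspelled keyword is silently ignored, so the email stays empty.
typo = User(emial="john@doe.com")
print(repr(typo.email))       # ''
```

If silent typos are a concern in your project, one option is to name the parameters explicitly in the constructor signature so that Python rejects unknown keywords.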

JSON Parsing

What if we want to send the model data outside the app? Imagine the app we're writing is the backend, and a colleague who works on the mobile app needs to exchange User data with us.

One of the most popular formats for exchanging data over HTTPS is JSON. The serialization to and from JSON should be implemented in the model itself. We'll use the json module, which converts a Python dictionary to a JSON string and vice versa.

Convert an object to a dictionary

Every regular Python object has a special __dict__ attribute: a dictionary that holds the object's instance attributes. Knowing that, we can implement a simple to_dict method.

class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''

    def __init__(self, **kwargs) -> None:
        self.first_name = kwargs.get('first_name', '')
        self.last_name = kwargs.get('last_name', '')
        self.email = kwargs.get('email', '')
        self.password = kwargs.get('password', '')

    def to_dict(self) -> dict:
        # Return a copy so callers can't modify the object's attributes through the result
        return dict(self.__dict__)
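
With to_dict in place, producing a JSON string takes a single call to json.dumps. The sketch below repeats a trimmed User class so it runs on its own; to_json is a hypothetical convenience method added for this example, not something the model needs.

```python
import json


class User:
    def __init__(self, **kwargs) -> None:
        self.first_name = kwargs.get('first_name', '')
        self.last_name = kwargs.get('last_name', '')
        self.email = kwargs.get('email', '')
        self.password = kwargs.get('password', '')

    def to_dict(self) -> dict:
        # Copy of the instance attributes
        return dict(self.__dict__)

    def to_json(self) -> str:
        # Hypothetical helper: serialize the dictionary form to a JSON string
        return json.dumps(self.to_dict())


user = User(first_name="John", email="john@doe.com")
payload = user.to_json()
print(payload)
# {"first_name": "John", "last_name": "", "email": "john@doe.com", "password": ""}
```

In a real application you would likely leave the password out of anything sent to a client.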

When can we not use __dict__?

Did you think it was too good to be true? You were right! There's a catch: __dict__ is shallow. If one of your attributes is another custom object, it stays an object inside the dictionary, and json.dumps cannot serialize it. Let's add a Dog class and reference it in the User model.

import json
from typing import Optional


class Dog:
    name: str = ''
    age: int = 0

    # ... (skipping the init and to_dict functions)


class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''
    dog: Optional[Dog] = None  # Our new property of the 'Dog' type

    # ... (skipping the init and to_dict functions)


# Let's create an instance of User now
u = User(first_name="John", dog=Dog(name="Milo", age=4))
user_as_dict = u.to_dict()
print(user_as_dict)  # print it out to see the invalid structure: 'dog' is still a Dog object
try:
    json.dumps(user_as_dict)
except TypeError as error:
    print(error)  # fails, because the Dog object wasn't converted to a dictionary
    
How do we fix it?

We simply have to write out the conversion code line by line. Let's take a look at the to_dict method for both Dog and User.

import json
from typing import Optional


class Dog:
    name: str = ''
    age: int = 0

    # ... (skipping the init function)

    def to_dict(self) -> dict:
        result: dict = {}
        result['name'] = self.name
        result['age'] = self.age
        return result


class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''
    dog: Optional[Dog] = None

    # ... (skipping the init function)

    def to_dict(self) -> dict:
        result: dict = {}
        result['first_name'] = self.first_name
        result['last_name'] = self.last_name
        result['email'] = self.email
        result['password'] = self.password
        # Notice we're calling Dog.to_dict() here; a user without a dog serializes to None (null in JSON)
        result['dog'] = self.dog.to_dict() if self.dog else None
        return result


# Let's create an instance of User again
u = User(first_name="John", dog=Dog(name="Milo", age=4))
user_as_dict = u.to_dict()
json.dumps(user_as_dict)  # success!
print(user_as_dict)  # notice even the Dog object becomes a dictionary now

Convert a dictionary to an object

There are no shortcuts in the opposite direction. We'll have to write out the conversion from dict to a new instance of User by hand.

from typing import Optional


class Dog:
    name: str = ''
    age: int = 0

    # ... (skipping init and to_dict functions)

    @staticmethod
    def from_dict(obj: dict) -> 'Dog':
        model = Dog()
        # A missing or empty value falls back to the default
        model.name = obj.get('name') or ''
        model.age = obj.get('age') or 0
        return model


class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''
    dog: Optional[Dog] = None

    # ... (skipping init and to_dict functions)

    @staticmethod
    def from_dict(obj: dict) -> 'User':
        model = User()
        model.first_name = obj.get('first_name') or ''
        model.last_name = obj.get('last_name') or ''
        model.email = obj.get('email') or ''
        model.password = obj.get('password') or ''
        model.dog = Dog.from_dict(obj['dog']) if obj.get('dog') else None  # Notice we call Dog.from_dict() here
        return model

We'd like to highlight a couple of things in the from_dict method:

  • It's a static method
  • We check whether each key is present (and non-empty) in the dictionary and fall back to a default otherwise. This strategy makes every property optional when parsing. If some fields are required, you could change the method's return type to Optional['User'] and return None when parsing fails, or you could raise an exception in such a case.
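
To go from a JSON string to a model, json.loads and from_dict can be combined. The sketch below uses only the Dog class to stay short. The from_json method is a hypothetical helper, and returning None on bad input is just one of the two options mentioned above.

```python
import json
from typing import Optional


class Dog:
    def __init__(self, **kwargs) -> None:
        self.name = kwargs.get('name', '')
        self.age = kwargs.get('age', 0)

    @staticmethod
    def from_dict(obj: dict) -> 'Dog':
        # Missing or empty values fall back to the defaults
        return Dog(name=obj.get('name') or '', age=obj.get('age') or 0)

    @staticmethod
    def from_json(payload: str) -> Optional['Dog']:
        """Hypothetical helper: parse a JSON string, or return None on bad input."""
        try:
            obj = json.loads(payload)
        except json.JSONDecodeError:
            return None
        if not isinstance(obj, dict):
            return None  # valid JSON, but not an object
        return Dog.from_dict(obj)


dog = Dog.from_json('{"name": "Milo", "age": 4}')
print(dog.name, dog.age)          # Milo 4
print(Dog.from_json('not json'))  # None
print(Dog.from_json('[1, 2]'))    # None
```

Raising an exception instead of returning None would make failures harder to ignore; which one fits better depends on how the caller is expected to react.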

Validation

Reusing our User model, what would we want to validate there? We should actually validate every property, but let's start with the first_name.

We like to tackle the validation using two methods for each property.

  • is_valid - this method returns a bool with the validation result.
  • validate - this method returns an error message if the validation fails, and None otherwise.

So let's validate the first_name property.

import re
from typing import Optional


class User:
    first_name: str = ''
    last_name: str = ''
    email: str = ''
    password: str = ''
    dog: Optional[Dog] = None

    # ... (skipping methods defined above)

    def is_first_name_valid(self) -> bool:
        # fullmatch requires the whole string to match; re.match would only check the beginning
        return bool(self.first_name and re.fullmatch(r"[a-zA-Z]{1,30}", self.first_name))

    def validate_first_name(self) -> Optional[str]:
        # Return None when the name is valid, otherwise an error message
        return None if self.is_first_name_valid() else "First name is not valid"

We decided to use a simple regular expression (far too restrictive for production) that allows only ASCII letters, between 1 and 30 characters long.

Static validation methods

Depending on your project requirements, you might want to consider putting the validation logic into a static method and letting the instance methods piggyback on it. The advantage is that you can validate data without creating a User instance, e.g., User.is_first_name_valid("John").
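
A minimal sketch of that layout could look like the following. It assumes the same letters-only rule, applied to the whole string with fullmatch. Because a static method and an instance method cannot share a name, the instance methods here use the hypothetical names has_valid_first_name and first_name_error.

```python
import re
from typing import Optional


class User:
    # Assumed rule for this sketch: 1 to 30 ASCII letters
    _NAME_PATTERN = re.compile(r"[a-zA-Z]{1,30}")

    def __init__(self, **kwargs) -> None:
        self.first_name = kwargs.get('first_name', '')

    @staticmethod
    def is_first_name_valid(value: str) -> bool:
        # The whole value must match, not just its beginning
        return bool(value and User._NAME_PATTERN.fullmatch(value))

    @staticmethod
    def validate_first_name(value: str) -> Optional[str]:
        return None if User.is_first_name_valid(value) else "First name is not valid"

    # Instance methods (hypothetical names) piggyback on the static ones
    def has_valid_first_name(self) -> bool:
        return User.is_first_name_valid(self.first_name)

    def first_name_error(self) -> Optional[str]:
        return User.validate_first_name(self.first_name)


print(User.is_first_name_valid("John"))             # True
print(User.is_first_name_valid("John123"))          # False
print(User(first_name="J0hn").first_name_error())   # First name is not valid
```

The static methods can be called on raw input, for example while checking a sign-up form, before any User object exists.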
