In the age of layered architectures and microservices, many ways have emerged for the components of a system to communicate. In modern projects, the main purpose of a backend application is to expose an API. In the Python world, the easiest way is to use Django and Django REST Framework (DRF), the most popular tool for creating REST APIs with Python. In most cases it gets the job done well and fast. However, REST itself has its limitations:
- No way to auto-generate proper API documentation. There is an OpenAPI extension for DRF, but you still have to annotate non-standard cases yourself.
- Handling nested representations - this is the greatest problem encountered when working with REST. There is no recommended way to handle them properly. That is, of course, undesirable, as stated in The Zen of Python:
There should be one-- and preferably only one --obvious way to do it.
- Over-fetching and under-fetching - REST has a fixed response schema, which means a single endpoint always returns a response with the same shape. The client can’t specify exactly what data it needs. That causes over-fetching: the client gets more data than it actually needs. Under-fetching follows from the nested-representation issue mentioned above: depending on the implementation, the API may not include the nested data the client needs, which leads to multiple requests to different endpoints and is, of course, inefficient.
GraphQL as an alternative to REST
To address the above issues, a new alternative to REST has appeared in recent years: GraphQL. Its main advantage over REST is that the client specifies exactly what it wants from the server and gets exactly what it asked for - no more, no less. GraphQL was designed with nested relations in mind, so querying such data is simple. Moreover, querying and updating data are completely separated: the former is done by queries, the latter by mutations. GraphQL has recently become a very hot topic, and some say it is a replacement for REST APIs. Let’s check whether it is really worth the hype.
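The idea of a client-chosen response shape can be illustrated without any GraphQL library. The sketch below is only an analogy, not how a GraphQL server is implemented: `select`, the `cart` data and its field names are hypothetical, and a selection is written as a nested dict in which `None` marks a plain field.

```python
def select(data, selection):
    """Return only the fields named in `selection`, following nested selections."""
    if isinstance(data, list):
        # Apply the same selection to every element of a list field.
        return [select(item, selection) for item in data]
    result = {}
    for field, sub_selection in selection.items():
        value = data[field]
        # None means "plain field"; a dict means "descend into this object".
        result[field] = value if sub_selection is None else select(value, sub_selection)
    return result


# Hypothetical full representation, as a fixed-shape REST endpoint might return it.
cart = {
    "id": 1,
    "owner": {"id": 7, "username": "alice", "email": "alice@example.com"},
    "items": [
        {"quantity": 2, "product": {"id": 3, "name": "Mug", "price": "9.99"}},
    ],
}

# The client asks only for what it needs, including nested data.
wanted = {
    "owner": {"username": None},
    "items": {"quantity": None, "product": {"name": None}},
}

print(select(cart, wanted))
# {'owner': {'username': 'alice'}, 'items': [{'quantity': 2, 'product': {'name': 'Mug'}}]}
```

The nested data arrives in one response and nothing unrequested is included, which is the behaviour the over-fetching and under-fetching points above are missing in REST.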
Expectations of GraphQL
The main reason to dive into GraphQL was the search for something more convenient for frontend developers. GraphQL clearly promises that, but serious doubts arose about how that promise affects the implementation and how much of a burden it puts on the developers who have to deliver it.
REST simplifies the API by splitting models into corresponding endpoints, which gives better control over what can be queried (which models, which fields) and by whom. A GraphQL library is expected to provide proper ways of addressing these concerns.
GraphQL for Django - Graphene
Right now there are basically two frameworks to choose between: the fairly mature Graphene and the fairly new Ariadne. They follow different approaches: Ariadne is schema-first, while Graphene is code-first, which means that the schema is generated from code, specifically from Python classes. Ariadne seemed to be a much simpler approach, less suited to more sophisticated use cases, so it is not evaluated further.
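To make "schema generated from Python classes" more tangible, here is a minimal sketch of the code-first idea using only the standard library. It does not use Graphene and does not reflect Graphene's API; the `Product` fields, the `SCALARS` mapping and `to_sdl` are assumptions made for illustration.

```python
from dataclasses import dataclass, fields

# Assumed mapping from Python types to GraphQL scalar names.
SCALARS = {int: "Int", str: "String", float: "Float", bool: "Boolean"}


@dataclass
class Product:
    # Hypothetical fields; the article does not list the real model fields.
    id: int
    name: str
    price: float


def to_sdl(cls) -> str:
    """Render a dataclass as a GraphQL type definition (all fields non-null)."""
    lines = [f"type {cls.__name__} {{"]
    for field in fields(cls):
        lines.append(f"  {field.name}: {SCALARS[field.type]}!")
    lines.append("}")
    return "\n".join(lines)


print(to_sdl(Product))
```

In a schema-first tool the text of the type definition is written by hand and the code is attached to it; in a code-first tool the direction is reversed, as in this sketch.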
Defining data model
For testing purposes, a simple Django store application was created. The whole application consists of the standard Django User model, a Product model, a Cart model and an intermediate CartItem model.
In Graphene, defining the schema means implementing a Python class for every type, mutation, input, etc. Thanks to the code-first approach, code reuse is easy, as it is possible to extract common functionality or base classes. Moreover, Graphene integrates well with Django, which allows developers to create types directly from Django models. Another benefit is the possibility of reusing Django forms, or even DRF serializers, when implementing mutations. Graphene also supports Relay out of the box, which is widely used in production environments. These advantages make adding a GraphQL API to an existing Django project that already exposes a REST API a breeze. Below are some code snippets from the Graphene implementation.
This is where the schema is glued together. In the above code, the root query, resolvers and mutations are defined.
Feelings on the code with GraphQL
Coding with Graphene is quite similar to coding with DRF. However, it still lacks some features, such as generic mutations that perform CRUD operations (like ViewSets in DRF) or unified authentication and permission management. For now, all of these have to be implemented manually, while DRF provides convenient utilities for them.
In DRF, CRUD is built from Serializers and ViewSets mounted under the proper URLs. In GraphQL there is one URL and a schema that defines types. The schema defines which fields are available and describes their types, and those types are validated by GraphQL itself. But nothing more: it is a representation of the API. All the logic goes into resolvers. So a resolver can be compared to a ViewSet, but it also takes over the role of a Serializer in terms of extra validation or doing some work before serializing, deserializing or saving. Isn’t it great? Finally, the whole business logic is in one place.
Writing data is different because the philosophy of writing is different in GraphQL. All API components have to be specified explicitly.
What does it mean? In REST, when there is an /api/projects/ endpoint, one can think: oh, let’s try GET, PUT, PATCH, POST, DELETE, maybe something will work. In GraphQL, the schema has a Query type that contains the field project. Cool, it’s possible to query it. There is also a Mutation type with the field createProject. One can think: oh cool, I can create a project. But unless there is also a mutation called changeProject or deleteProject, it’s certain there’s no way to change or delete one.
Just remember that reading data should usually be very fast, while writing data doesn’t have to be fast but should be secure and comprehensive. That’s why it seems reasonable for GraphQL to force reads and writes to be explicit. There is no need to worry that some lifecycle method of a ViewSet or Serializer will be triggered while handling a read request and slow it down.
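The explicitness described in this section can be sketched as a plain lookup table: an operation exists only if a resolver was registered for it. This is an illustration in plain Python, not Graphene code; `SCHEMA`, `execute`, `UnknownField` and the in-memory `PROJECTS` store are hypothetical names.

```python
class UnknownField(Exception):
    """Raised when the schema does not define the requested field."""


PROJECTS = {}  # in-memory stand-in for a database table


def resolve_project(project_id):
    # Read path: only looks data up, no write-related logic can run here.
    return PROJECTS.get(project_id)


def resolve_create_project(project_id, name):
    # Write path: validation and saving live in the mutation resolver.
    if not name:
        raise ValueError("name must not be empty")
    PROJECTS[project_id] = {"id": project_id, "name": name}
    return PROJECTS[project_id]


# Everything the API can do is listed here; there is no deleteProject.
SCHEMA = {
    "query": {"project": resolve_project},
    "mutation": {"createProject": resolve_create_project},
}


def execute(operation, field, **arguments):
    """Dispatch to the resolver registered for `field`, or fail loudly."""
    try:
        resolver = SCHEMA[operation][field]
    except KeyError:
        raise UnknownField(f"{operation} has no field {field!r}") from None
    return resolver(**arguments)


execute("mutation", "createProject", project_id=1, name="Demo")
print(execute("query", "project", project_id=1))  # {'id': 1, 'name': 'Demo'}
```

Calling `execute("mutation", "deleteProject", project_id=1)` raises `UnknownField`, because nothing was registered under that name. There is no equivalent of trying another HTTP verb on the same endpoint.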
GraphQL disadvantages and tradeoffs
No tool or technology is perfect, and neither is GraphQL. The features that make GraphQL so attractive in some conditions turn out to be its flaws in others. In this section I want to point out some problems that could potentially appear.
Never trust user input. It’s a principle every programmer should know. Malicious user input can open up a number of potential security vulnerabilities. In REST, all input accepted from the user consists of the endpoint, query parameters and request body, which can be easily parsed and validated. The server decides what data should be returned or what action should be performed next. GraphQL allows users to specify exactly what data they need from the server. There is no out-of-the-box mechanism that prevents a user from sending a malicious query that could kill the server. Take a social app as an example. The User model has a many-to-many relationship to itself, call it followers. A potentially malicious query nests followers a few levels deep. The complexity of such a query grows exponentially with its depth. For testing purposes, let’s add a followers field to the standard Django User model, then create 10 users, each of them following the remaining 9. Then execute a query with nested followers.
The query is only 4 levels deep, yet it ran for about 30 seconds and produced more than 8000 SQL queries (graphene-django-optimizer reduced that to 6 queries, but because of their complexity they still took a long time to execute). Note that there are only 10 objects in the database; imagine what would happen if such queries were allowed in a production environment. The problem is that there is no simple way to protect against this vulnerability.
The naive approach is to limit the allowed query depth. But how to choose the allowed depth? The above query is 4 levels deep and is already exhausting for the server. On the other hand, limiting query depth to a small number such as 3 or 4 kills GraphQL’s flexibility and in some cases would lead to under-fetching. Another pattern is to compute the complexity of a query and block queries above some limit. This solution is the most effective one, but also the most complicated and not so easy to implement.
How does it look in practice? Let’s check how this issue is handled in big applications, for example GitHub with its GraphQL API. A quick look at the documentation shows that there are a few forms of protection against malicious queries, described in the section about resource limitations in the API docs: https://developer.github.com/v4/guides/resource-limitations/
Still, despite the limitations, it was possible to build a few queries that took a pretty long time to execute, and some of them even caused a 500 server error. There is probably some timeout implemented that drops the query. That leads to a question: if an application as enormous as GitHub has to apply a timeout to queries, is there any good way to examine a query before evaluating it?
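Both ideas mentioned earlier, a depth limit and a complexity estimate, can be sketched in a few lines. The sketch makes strong simplifying assumptions: the query is represented as a nested dict rather than a parsed GraphQL document, every nested list field is assumed to return `fan_out` objects per parent, and all names (`nested_followers`, `depth`, `estimated_objects`, `validate`, `QueryRejected`) are hypothetical. A real implementation would walk the parsed query and would need per-field cost estimates.

```python
class QueryRejected(Exception):
    """Raised when a query exceeds a configured limit."""


def nested_followers(levels):
    """Build the shape `users { username followers { username ... } }`."""
    selection = {"username": None}
    for _ in range(levels - 1):
        selection = {"username": None, "followers": selection}
    return {"users": selection}


def depth(selection):
    """Count nested object levels; scalar fields (None) do not add depth."""
    nested = [sub for sub in selection.values() if sub is not None]
    if not nested:
        return 0
    return 1 + max(depth(sub) for sub in nested)


def _objects_below(selection, parents, fan_out):
    # Each object field is assumed to yield `fan_out` children per parent.
    total = 0
    for sub in selection.values():
        if sub is not None:
            children = parents * fan_out
            total += children + _objects_below(sub, children, fan_out)
    return total


def estimated_objects(query, root_count, fan_out):
    """Estimate how many objects the server would have to resolve."""
    total = 0
    for sub in query.values():
        if sub is not None:
            total += root_count + _objects_below(sub, root_count, fan_out)
    return total


def validate(query, max_depth, max_objects, root_count, fan_out):
    """Reject a query before execution if it is too deep or too expensive."""
    if depth(query) > max_depth:
        raise QueryRejected("query is nested too deeply")
    if estimated_objects(query, root_count, fan_out) > max_objects:
        raise QueryRejected("query is estimated to be too expensive")


# 10 users, each with 9 followers, queried 4 levels deep.
query = nested_followers(4)
print(depth(query))                      # 4
print(estimated_objects(query, 10, 9))   # 8200 = 10 + 90 + 810 + 7290
```

Under these assumptions each extra level multiplies the work by the fan-out, so a depth limit alone says little about cost: a depth of 4 is cheap with a fan-out of 2 and expensive with a fan-out of 9, which is why the complexity estimate is the more effective, and more laborious, of the two checks.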
Not HTTP compatible
Unlike REST, GraphQL does not take advantage of HTTP features such as status codes or caching. A GraphQL server typically exposes a single endpoint, queries are usually sent as POST requests, and the response usually carries a 200 status code (or 500) regardless of whether the query succeeded. Although the lack of status codes is compensated for by built-in detailed error payloads, status codes are very useful from a developer’s perspective, e.g. during testing or when checking whether a request was successful. Caching is implemented by client libraries such as Apollo or Relay, but it requires globally unique IDs for each node to work, which adds complexity.
GraphQL APIs seem to be much harder to implement than REST ones. Maybe that’s because REST is familiar to every web developer and probably everything has already been done with it, while GraphQL is something completely new. Still, all the problems stated above add complexity to the implementation of a proper GraphQL API and expose the developer to potential traps and security issues. Another factor is the maturity of REST, which makes development much easier thanks to the loads of libraries and frameworks available.
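In practice this means a client cannot rely on the status code alone and has to inspect the response body. The sketch below assumes the usual JSON response shape with a `data` key and an optional `errors` list whose entries carry a `message`; `unwrap` and `GraphQLRequestError` are hypothetical names, and the status code and body are passed in as plain values so that no HTTP library is needed.

```python
import json


class GraphQLRequestError(Exception):
    """Raised when the transport or the GraphQL execution reports a failure."""


def unwrap(status_code, body):
    """Return the `data` part of a GraphQL response or raise on any error."""
    if status_code != 200:
        # Transport-level failure, e.g. a 500 from the server.
        raise GraphQLRequestError(f"HTTP {status_code}")
    payload = json.loads(body)
    errors = payload.get("errors")
    if errors:
        # A 200 response can still describe a failed query.
        messages = "; ".join(e.get("message", "unknown error") for e in errors)
        raise GraphQLRequestError(messages)
    return payload["data"]


ok_body = '{"data": {"project": {"name": "Demo"}}}'
failed_body = '{"data": null, "errors": [{"message": "Project not found"}]}'

print(unwrap(200, ok_body))  # {'project': {'name': 'Demo'}}
try:
    unwrap(200, failed_body)
except GraphQLRequestError as exc:
    print(exc)  # Project not found
```

A test that only asserts on the status code would pass for both responses above, which is the inconvenience this section describes.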
Did GraphQL meet the expectations?
Did GraphQL solve the problem?
Is it worth using GraphQL?
It depends. The concept of GraphQL is intriguing and maybe a bit addictive. The best way to describe GraphQL is to mention that it creates API documentation automatically from the schema.
So GraphQL simply forces the API to be readable and easy to understand. The code seems less complicated than with REST, but that will probably be reversed as the complexity of the project grows. Once there is a solid ecosystem for GraphQL, it may become worth switching to it from DRF for certain projects.
GraphQL and Graphene are good candidates to replace REST and DRF when preparing your next API, especially when it is focused on reads. Your frontend devs will love it, but you will have a somewhat harder time coding more sophisticated use cases because of the traps and tradeoffs mentioned in the previous sections.