You are here

Complex Queries With SQLAlchemy (Example#1)

There are lots of examples of how to use SQLAlchemy to provide your Python application with a first-rate ORM. But most of these examples tend to model very trivial queries; yet the real power of SQLAlchemy, unlike many ORM solutions, is that it doesn't hide / bury the power of the RDBMS - and if you aren't going to use that power why bother with an RDBMS at all [Aren't NoSQL solutions the IT fad of the year? You could be so hip!]. So in this post I'll provide a useful non-trivial query and how to perform the a complex query - only better - using SQLalchemy.

So first, a query:

    SELECT process.process_id, op1.value_int, op2.value_string, op3.value_string
    FROM process
      INNER JOIN route ON ( route.route_id = process.route_id )
      LEFT OUTER JOIN obj_property op1
        ON ( op1.obj_id = route.route_id AND
             op1.namespace_prefix = '' AND
             op1.value_key = 'expireDays' )
      LEFT OUTER JOIN obj_property op2
        ON ( op2.obj_id = route.route_id AND
             op2.namespace_prefix = '' AND
             op2.value_key = 'preserveAfterCompletion' )
      LEFT OUTER JOIN obj_property op3
        ON ( op3.obj_id = route.route_id AND
             op3.namespace_prefix = '' AND
             op3.value_key = 'archiveAfterExpiration' )
    WHERE process.db_status != 'archived'
      AND process.state IN ( 'C', 'F', 'Z' )
      AND process.status != 'archived';

This query returns the process_id value from the process table and then some values from multiple records from the table obj_property - if those records exist - correlated via an intermediate table route. Complex, but also fast!

Aside: With the correct DDL when creating the database PostgreSQL also enforces the integrity of the relations, maintains usage statistics, and provides multiple levels of atomicity. I certainly wouldn't want to have to do all that myself.

So how to model such a query in SQLAlchemy? First realize that the result is actually going to be better than what straight SQL / DBAPI would give us. Instead of some values the query will return real objects, the advantage of this is the amount of code eliminated by having to do things based on ids or primary keys. This example assumes that the ORM entities Process, Route, and ObjectProperty have already been described - there is lots of documentation about how to declare the relations between your tables and your objects using the declarative style.

    from sqlalchemy.orm   import  aliased
    from sqlalchemy       import and_, or_

    db = ctx.db_session()

    op1 = aliased(ObjectProperty)
    op2 = aliased(ObjectProperty)
    op3 = aliased(ObjectProperty)

    q = db.query( Process, op1, op2, op3 ).\
           join( Route, Route.object_id == Process.route_id ).\
           outerjoin( op1, and_( op1.parent_id == Route.object_id,
                       'expireDays' ), ).\
           outerjoin( op2, and_( op2.parent_id == Route.object_id,
                       'preserveAfterCompletion' ), ).\
           outerjoin( op3, and_( op3.parent_id == Route.object_id,
                       'archiveAfterExpiration' ), ).\
           filter( and_( Process.state.in_( [ 'C', 'F', 'Z' ] ),
                         Process.status != 'archived' ) )

The SQLAlchemy aliased method declares multiple references to ObjectProperty that can be used independently: op1, op2, and op3. The other advanced technique is to use the outerjoin method to relate the need for a LEFT OUTER join.

The results of this query will be tuples of four elements; the first being a Process object and the second, third, and fourth will either be ObjectProperty objects if the concomitant outer join identified a record or None if no record matched the join. The lovely upside of this is that the query results can be processed using a straight forward for-each construct:

    for process, expire_days, preserve_after, archive_after in q.all():
       if expire_days:

Personally I find the ORM code to be easier to visually parse than the native SQL. Especially if you need to build the query dynamically or modify it based on the applications needs - since q is a query object additional filter and join conditions can continue to be added. Imagine trying to do that with straight SQL?

Add one more filter expression to the queries WHERE clause and limit the query to 150 results:

    q = q.filter(Process.owner_id == 10100)
    q = q.limit(150)

Another advantage to this method is that SQLAlchemy can adapt it's dialect to the specific back-end if, for example, you are stuck using a database other that PostgreSQL. Without such an adaptive layer using anything other than the most trivial queries becomes daunting do to slight but important differences in how various engines express joins and nested queries.

Theme by Danetsoft and Danang Probo Sayekti inspired by Maksimer