In this blog post we will make several attempts at implementing a simple backend function, identifying some performance and concurrency issues along the way, and hopefully reaching a decent solution by the end.

We have a table called counters on PostgreSQL with two columns: id serial primary key, count int not null. We now want to implement a function that, given an id, increments that counter by one and returns its new value.

```lua
-- pseudocode to illustrate the desired behavior.
-- return the updated value (nil if not found)
function increment_counter(id)
  local count = Counter[id]
  if count == nil then return end
  local new_value = count + 1
  Counter[id] = new_value
  return new_value
end

-- and a batch version to increment multiple counters at once, returning a
-- map where the keys are the ids, and the values the new counter values:
function batch_increment_counters(ids)
  local res = {}
  for i, id in ipairs(ids) do
    res[id] = increment_counter(id)
  end
  return res
end
```
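For reference, the examples that follow assume a table along these lines (a minimal setup sketch; the seed rows are only there so we have something to increment):

```sql
-- the counters table described above, plus some illustrative seed data
CREATE TABLE counters (
    id    serial PRIMARY KEY,
    count int NOT NULL
);
INSERT INTO counters (count) SELECT 0 FROM generate_series(1, 1000);
```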
For some weird reason there are developers that prefer using unsafe, verbose and poorly performing libraries instead of writing type-safe and simple SQL (and removing a useless layer of abstraction in the process).

```lua
-- here the code follows the same imperative pattern but now we query for
-- a counter object instead. That counter object will magically know when
-- it is modified so that "updates" get sent when we do session:flush() or
-- session:commit():
function increment_counter(session, id)
  -- this is the same as doing: SELECT * FROM counters WHERE id=$1;
  local counter = session:query(Counter):filter_by{id=id}:first()
  if counter == nil then return end
  local new_value = counter.count + 1
  counter.count = new_value
  return new_value
end

-- same code as before, except for the db session:
function batch_increment_counters(session, ids)
  local res = {}
  for i, id in ipairs(ids) do
    res[id] = increment_counter(session, id)
  end
  return res
end
```

We could argue in favor of an ORM that it lets us keep all the query code in the host language instead of dealing with it as strings (as would happen if we were writing SQL by hand). Yet this is a false dichotomy: it's perfectly possible to build the queries from an AST, and for languages with static type checking there are tools that can even verify a certain degree of soundness of those ASTs at compile time, e.g. jOOQ.

However, by using an ORM you are distancing yourself from the real code that you'll be running, and unless you're an expert on the specific ORM being used (and even if you are) it's quite common to find implicit queries running at unexpected moments. And while in most cases your code will look perfectly ok, the generated SQL can end up querying several tables multiple times with tons of useless joins! You will never notice how awful those queries are unless you test the code thoroughly while carefully reading the mountains of generated SQL... or until your app goes to production and gets hit with a burst of real usage on real data. Then you'll need to do large refactors so that the ORM magically manages to generate what you were expecting in the first place.

The main fallacy of ORMs is the belief that you won't need to learn SQL if you use one. In practice you'll end up writing ORM calls assuming they generate the SQL you had in mind, and rewriting them once you realize they don't. And unless you have deep knowledge of SQL and how it works, you may end up with a disaster waiting to happen.

But the main problem is that ORMs encourage a model where the data is fetched into program memory for processing, while what we really need for our code to be performant and safe is logic written in a declarative fashion, so that it can be correctly adapted to the RDBMS. Pure functional or declarative code also gives us some side benefits, like being able to retry queries (with application code in the middle) in case of read conflicts; Haskell's STM monad is a beautiful example of how this can be done.

Enough ORM bashing (for now); let's go back to the example and see what really happens when you run that code trying to update ids={2,7,8}:

```sql
BEGIN;
SELECT 1; -- why not? ORMs love adding useless load
SELECT counters.id "counter_id", counters.count "counter_count" FROM counters WHERE id=2;
SELECT counters.id "counter_id", counters.count "counter_count" FROM counters WHERE id=7;
SELECT counters.id "counter_id", counters.count "counter_count" FROM counters WHERE id=8;
-- then the logic of your application will run until you do
-- session:commit() and you see (assuming there was no counter with id=7):
UPDATE counters SET count=37 WHERE id=2;
UPDATE counters SET count=891 WHERE id=8;
COMMIT;
```

You then do some unit testing (even against a real db instance): it works. Plus some manual testing for good measure: it works. Ready to deploy...? Not so fast, we are just getting started!

Let's talk about "lost updates". A lost update, as its name suggests, happens when a change is totally or partially overwritten by another user that wasn't aware of it. Say we have two users:

1. User A reads a variable foo, and remembers that its value is 42.
2. User B reads the same variable foo, also remembering its value: 42.
3. Then, after having lunch, user A overwrites foo with the value it remembered plus 1, i.e. 43.
4. Finally user B, who took a bit longer, does the same: overwrites foo with the value it remembered plus one, 43.

So in the end foo gets set to 43. However, if A and B had done the same thing on different days, the value would have ended up being 44 instead.

As you may now realize, the previous code has exactly this same issue. There are two approaches to dealing with it:

* either lock, so that when user B tries to read at step 2 the request gets blocked until user A finishes,
* or use the more performant alternative that combines both operations into a single statement: `UPDATE counters SET count=count+1 WHERE id=$1 RETURNING count;`
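To see why the single-statement version avoids the lost update, picture two concurrent sessions running it against the same row (a sketch at the default READ COMMITTED isolation level; the values are illustrative):

```sql
-- Session A:
BEGIN;
UPDATE counters SET count = count + 1 WHERE id = 2 RETURNING count;  -- suppose it returns 43

-- Session B (concurrently):
BEGIN;
UPDATE counters SET count = count + 1 WHERE id = 2 RETURNING count;  -- blocks on A's row lock

-- Session A:
COMMIT;  -- B unblocks, re-reads the committed row and returns 44

-- Session B:
COMMIT;  -- both increments survive: the counter ends at 44
```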
So let's use the UPDATE alternative, and observe how we can no longer keep the traditional imperative code we had before: we are now forced to write a bunch of calls that build a query in a language that bears some resemblance to SQL, but comes with its own weird behaviors. Achievement unlocked: ORM rendered useless, code completely rewritten!

```lua
function increment_counter(session, id)
  -- this is the same as doing:
  -- UPDATE counters SET count=count+1 WHERE id=$1 RETURNING count;
  local query = Counter.update()
    :returning(Counter.c.count)
    :filter_by{id=id}
    :values{count=Counter.c.count + 1}
  local new_value = session:execute(query):first()
  return new_value
end
```

And we also need to change the batch version; it changes so much that it's easier to write it from scratch following the new increment_counter:

```lua
function batch_increment_counters(session, ids)
  -- this is the same as doing:
  -- UPDATE counters SET count=count+1 WHERE id IN (...) RETURNING id, count;
  local query = Counter.update()
    :returning(Counter.c.id, Counter.c.count)
    :where(Counter.c.id:in_(ids))
    :values{count=Counter.c.count + 1}
  local res = {}
  for _, row in ipairs(session:execute(query):fetchall()) do
    res[row.id] = row.count
  end
  return res
end

-- and now it makes sense to rewrite the single version one more time:
function increment_counter(session, id)
  return batch_increment_counters(session, {id})[id]
end
```

Now we are no longer using the ORM as an ORM; it has become a mere translation layer. Get ready to learn two different languages for the same thing, and don't be surprised if these seemingly direct translations sometimes get magically impoverished by useless joins and weird side effects of the host language. Even here the ORM will do whatever it can to make our lives as painful as possible.

Ok, ok, now that we are not using the ORM as an ORM and we changed the code to do atomic updates instead... is it finally working? Once again: unit testing -> it works. Manual testing -> it works. We even run the same code in two processes in parallel, and it works.

Does it really work? ...well... not really... let's do some testing:

```console
fran=# create table counters(id int primary key, count int not null);
CREATE TABLE
fran=# insert into counters select n, 0 from generate_series(1, 1000000) n;
INSERT 0 1000000
```

Now, in two different ttys, let's run this same one-line script doing 100 updates, each randomly updating about 1% of the table:

```console
$ for i in `seq 1 100`; do echo '
    update counters set count = count+1 where (id*random())::int%100 = 0' | psql; done
UPDATE 9827
UPDATE 9893
UPDATE 9837
UPDATE 9847
UPDATE 10022
UPDATE 10078
UPDATE 9897
UPDATE 9751
UPDATE 9729
UPDATE 9938
UPDATE 9877
UPDATE 10014
UPDATE 9993
UPDATE 9981
UPDATE 9915
UPDATE 9869
UPDATE 10040
UPDATE 9764
UPDATE 9890
UPDATE 10029
UPDATE 10022
UPDATE 9912
UPDATE 9760
UPDATE 9864
UPDATE 9653
UPDATE 9986
UPDATE 10058
UPDATE 9956
UPDATE 9847
UPDATE 9957
ERROR:  deadlock detected
DETAIL:  Process 20052 waits for ShareLock on transaction 680; blocked by process 20065.
         Process 20065 waits for ShareLock on transaction 679; blocked by process 20052.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (6328,261) in relation "counters"
UPDATE 9832
```

(wat.gif)!?

TL;DR: PostgreSQL's UPDATE locks the rows it updates, and not necessarily in order.

PostgreSQL is an MVCC (multi-version concurrency control) database: each transaction sees a consistent view of the data, and either all of its changes become visible or none of them do. This is how it's done:

* All transactions are numbered with an increasing transaction id.
* All rows carry xmin and xmax version numbers: xmin is the id of the transaction that created that row version, and xmax the id of the transaction that deleted it.
* When there's an update, the existing row data doesn't change; what really happens is that its xmax gets set, and an updated copy of the row is written to the table.
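These version numbers are ordinary (hidden) system columns, so you can peek at them directly; a quick illustrative psql check (the actual values will of course differ):

```sql
-- xmin and xmax only show up when selected explicitly
SELECT txid_current();                                    -- id of the current transaction
SELECT xmin, xmax, id, count FROM counters WHERE id = 2;
```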
When querying, PostgreSQL compares the id of the current transaction (txid_current) against each row version's xmin, skipping versions created by uncommitted or later transactions (and, symmetrically, versions whose xmax says they were already deleted). When updating, it also checks xmax (along with other fields) to see whether the row it's trying to update has already been updated by another uncommitted transaction, in which case it blocks until that transaction is either committed or rolled back.

What can then happen is:

* user A updates row1,
* user B updates row2,
* user A tries to update row2, but is blocked by B,
* user B tries to update row1, but is blocked by A,

at which point B's transaction is rolled back with a "deadlock detected" error, to avoid blocking these two transactions to infinity and beyond.

So... ooof. How about doing one update at a time, in a strict order? That way we could avoid deadlocks, since every transaction would take its row locks in the same order (note this is a lock-ordering discipline, not the two-phase commit protocol: https://en.wikipedia.org/wiki/Two-phase_commit_protocol). However, doing the updates one at a time is really slow... completely unacceptable. So...

Thankfully, there's a better alternative: SELECT FOR UPDATE. What SELECT FOR UPDATE does is lock the selected rows (by setting their xmax), which lets us choose the order in which they get locked; and we can even place the SELECT FOR UPDATE as a subquery of the UPDATE for extra performance! This is how the previous one-line script looks with it:

```console
$ for i in `seq 1 100`; do echo '
    update counters set count = count+1 where id in (
      select id from counters where (id*random())::int % 100 = 0 order by id for update
    )' | psql; done
```

Q: If this can be fixed by locking in order, why doesn't PostgreSQL do it by itself?

A: Because we were updating random rows spread all around the table. If the server had to lock them in id order, it would first have to gather and sort the matching rows before setting their xmax all over the table, which is obviously slower.

Q: How much slower is it?

A: For the previous test on my computer, total time went from 19s to 36s (about twice as much).

Q: Would it be possible to try the update without the SELECT FOR UPDATE and, only if it fails, retry with it?

A: Yup, you could do that, but note that deadlocks are only avoided if all the transactions involved respect the same order. So if you just kept retrying in random order, there could be some pathological case where you'd end up deadlocking forever. I have a hunch that by using the ordered SELECT FOR UPDATE for the retries after a deadlock, the chance of hitting it again becomes exponentially smaller with the number of retries. So in the case of a failure you would see:

```sql
BEGIN;
... -- some queries here
SAVEPOINT xyz;
UPDATE table SET ... WHERE conditions;
-- deadlock detected! (with low probability ^_^)
ROLLBACK TO SAVEPOINT xyz;
UPDATE table SET ... WHERE ids IN (
  SELECT ids FROM table WHERE conditions ORDER BY ids FOR UPDATE);
RELEASE SAVEPOINT xyz;
... -- continue with the transaction normally
```

And (usually) when it works:

```sql
BEGIN;
... -- some queries here
SAVEPOINT xyz;
UPDATE table SET ... WHERE conditions;
RELEASE SAVEPOINT xyz;
... -- continue with the transaction normally
```

Note that the update with the SELECT FOR UPDATE subquery could deadlock as well.
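As a sketch of what that retry could look like in application code (hedged: `exec(sql)` below is a hypothetical stand-in for whatever statement-execution call your driver provides, and 40P01 is PostgreSQL's SQLSTATE for "deadlock detected"):

```lua
-- Hypothetical sketch: run the plain UPDATE and, if it deadlocks, roll back to the
-- savepoint and retry with the ordered SELECT FOR UPDATE variant.
-- `exec(sql)` is assumed to run one statement and raise a Lua error on failure.
local function update_with_deadlock_fallback(exec, plain_sql, ordered_sql)
  exec('SAVEPOINT deadlock_retry')
  local ok, err = pcall(exec, plain_sql)
  if ok then
    exec('RELEASE SAVEPOINT deadlock_retry')
    return
  end
  if not tostring(err):find('deadlock detected') and not tostring(err):find('40P01') then
    error(err) -- some other failure: propagate it
  end
  exec('ROLLBACK TO SAVEPOINT deadlock_retry')
  -- the ordered variant takes its locks in id order, so, among transactions
  -- that do the same, a second deadlock is unlikely
  exec(ordered_sql)
  exec('RELEASE SAVEPOINT deadlock_retry')
end
```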
This appendix is an answer to: "What would you use instead of an ORM?"

Either something like jOOQ, or, if I felt like writing something from scratch, how about a simple raw database library like https://github.com/fcr--/lua-pgdriver. With that in mind, let's start with a simple version:

```lua
function increment_counter(db, id)
  return db:mexec([[
    UPDATE counters SET count = count + 1 WHERE id = $1 RETURNING count
  ]], {{id}})[1][1].count
end

-- The code is just building the query and that's it:
function batch_increment_counters(db, ids)
  local params = {}
  for i = 1, #ids do params[i] = '$'..i end
  local res = {}
  for row in db:mquery([[
    UPDATE counters SET count = count + 1
    WHERE id IN (
      SELECT id FROM counters
      WHERE id IN (]] .. table.concat(params, ',') .. [[)
      ORDER BY id
      FOR UPDATE)
    RETURNING id, count;
  ]], {ids})() do
    res[tonumber(row.id)] = tonumber(row.count)
  end
  return res
end
```

One way to improve this is to have a small template language, which could be as simple as this basic function I wrote in a couple of hours: expand_template.

expand_template takes a template string and a table of options that is used according to the patterns mentioned inside the template:

* `$name`: simple mapping. It will be replaced by a `'$'..number` that is unique within the query: `expand_template 'SELECT $foo, $bar' == 'SELECT $1, $2'`.
* `${name=subtemplate}`: expands the inner subtemplate n times, where n comes from the `name .. '_count'` option and the copies are joined with the `name .. '_separator'` option (default ','). This is useful when you need to expand a variable number of items, as is usually the case in INSERT statements: `expand_template('INSERT INTO foo VALUES ${foos=($id, $name)}', {foos_count=2}) == 'INSERT INTO foo VALUES ($1, $2),($3, $4)'`.
* `${name?subtemplate}`: conditional expansion. If name is true then subtemplate is expanded in its place, otherwise it expands to an empty string: `expand_template('SELECT 1${x?+$n}', {x=true}) == 'SELECT 1+$1'`, while `expand_template('SELECT 1${x?+$n}', {x=false}) == 'SELECT 1'`.
* `$_`: single-argument replacement. This can be used when there's a single argument in a template or subtemplate: `expand_template 'foo $_' == 'foo $1'`.

And this is its source code; note that it's so simple that we didn't even need lpeg for it, just basic top-down recursive pattern matching:

```lua
local function expand_template(template, opts)
  local argcount = 0
  local function rec(template, mapping, mapping_submappers)
    -- we keep a dictionary with the mappings from parameter name to
    -- argument index:
    mapping = mapping or {}
    -- and (for the recursive formatters) a table from the name to an
    -- array of the submappers:
    mapping_submappers = mapping_submappers or {}
    template = template:gsub('%$(%b{})', function(parenscontent)
      -- support for conditional expansion:
      local name, subtemplate = parenscontent:match '{%s*([%a_][%w_]*)%s*%?%s*(.-)%s*}$'
      if name then
        if type(opts[name]) ~= 'boolean' then
          error(('conditional parameter %q must be passed as a boolean '..
            'option to expand_template'):format(name))
        end
        if not opts[name] then return '' end
        return (rec(subtemplate, mapping, mapping_submappers))
      end
      -- and recursive multi-parameter expansion:
      name, subtemplate = parenscontent:match '{%s*([%a_][%w_]*)%s*=%s*(.-)%s*}'
      if name then
        local submappers = {}
        mapping_submappers[name] = submappers
        local buffer = {}
        for i = 1, assert(opts[name .. '_count'], 'missing corresponding _count key') do
          buffer[i], submappers[i] = rec(subtemplate)
        end
        local separator = opts[name .. '_separator'] or ','
        return table.concat(buffer, separator)
      end
      error(('unsupported template format %q'):format(parenscontent))
    end)
    -- now replace the names at the current level:
    template = template:gsub('%$([%a_][%w_]*)', function(name)
      if not mapping[name] then
        argcount = argcount + 1
        mapping[name] = argcount
      end
      return '$' .. mapping[name]
    end)
    return template, function(args, row_arguments)
      row_arguments = row_arguments or {}
      if type(args) ~= 'table' then
        assert(mapping._, 'non table arguments can only be used with $_')
        row_arguments[mapping._] = args
        return row_arguments
      end
      for k, value in pairs(args) do
        if mapping[k] then
          row_arguments[mapping[k]] = value
        elseif mapping_submappers[k] then
          local submappers = mapping_submappers[k]
          for i, innervalue in pairs(value) do
            assert(type(i) == 'number' and i == math.floor(i) and 1 <= i and i <= #submappers)
            submappers[i](innervalue, row_arguments)
          end
        else
          error(("argument %q wasn't mentioned in the template"):format(k))
        end
      end
      return row_arguments
    end
  end
  return rec(template)
end
```

In order to explain how it works, here is an unrelated example where all the supported patterns are used, and where we know at all times the exact SQL being generated:

```lua
function get_employees(opts)
  -- here we generate the query, based on the parameters in opts.
  -- Note that we know exactly what's being generated, and in particular we
  -- have a strong separation between query generation and query execution;
  -- that's why we never pass the actual values when expanding the template.
  local sql, mapper = expand_template([[
    SELECT * FROM employees
    WHERE true
      ${has_name? AND (name LIKE $name)}
      ${has_min_age? AND (
        date_of_birth <= now() - '1 year'::interval * $min_age)}
      ${has_roles? AND role IN (${roles=$_})}
      ${has_limit? ORDER BY id LIMIT $limit}
  ]], {
    has_name = (opts.name ~= nil),
    has_min_age = (opts.min_age ~= nil),
    has_roles = (opts.roles ~= nil),
    roles_count = opts.roles and #opts.roles,
    has_limit = (opts.limit ~= nil),
  })
  -- The mapper is the function that converts the tree of named values,
  -- flattening it into the array of $1..$n arguments. The expansion of the
  -- prepared-statement arguments is done by the PostgreSQL server.
  -- (db is assumed to be a pgdriver connection available in this scope.)
  return db:mexec(sql, {mapper{
    name = opts.name,
    min_age = opts.min_age,
    roles = opts.roles,
    limit = opts.limit,
  }})[1]
end

-- example call:
local employees = get_employees {
  name = '%smith%',
  min_age = 42,
  roles = {'developer', 'sysadmin'},
}
```

To wrap up:

* Don't use an ORM.
* If you're going to do batch updates, the simplest safe way is to always update with the SELECT FOR UPDATE subquery (and don't forget the ORDER BY).
* Doing the SELECT FOR UPDATE does not come for free.
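As a closing sketch, here is how the batch counter update from the beginning of this appendix could look on top of expand_template (hypothetical code: it assumes `db:mexec` returns an array of row tables per statement, as in the examples above, and that ids is non-empty):

```lua
-- Hypothetical rewrite of batch_increment_counters using expand_template.
-- ${ids=$_} expands to $1,$2,...,$n and the mapper flattens the ids array
-- into the corresponding prepared-statement arguments.
local function batch_increment_counters(db, ids)
  local sql, mapper = expand_template([[
    UPDATE counters SET count = count + 1
    WHERE id IN (
      SELECT id FROM counters
      WHERE id IN (${ids=$_})
      ORDER BY id
      FOR UPDATE)
    RETURNING id, count
  ]], {ids_count = #ids})
  local res = {}
  for _, row in ipairs(db:mexec(sql, {mapper{ids = ids}})[1]) do
    res[tonumber(row.id)] = tonumber(row.count)
  end
  return res
end
```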