A Tale of MySQL and SQLAlchemy

Posted on August 13, 2016

I recently got myself into a little bit of trouble when I ran some buggy code against a database. It was for a personal project to develop my skills with the Flask framework so I was being quite lax with testing. To cut a long story short, functionality which was designed to make less than 10 changes to a database made several hundred. Whoops.

I could have recreated the database but figured as the point of the project was to learn, this provided a good opportunity. The following are a few MySQL and SQLAlchemy commands which I found useful and wanted to document as a reference.

MySQL LIKE functionality - add a % to match any number of characters, equivalent to * in other systems. Using _ matches a single character.

SELECT * FROM [table] WHERE [field] LIKE '[string]%';
UPDATE [table] SET [field] = [new_value] WHERE [field] LIKE '[string]%';

Another useful SQL ability is to use a select statement against one table to alter fields in another. Essentially, you can update one table based on entries in another table. For example, you may have the following two tables:

Posts table
+----------------+--------------+
| Field          | Type         |
+----------------+--------------+
| id             | int(11)      |
| post   	     | varchar(200) |
| date_published | datetime     |
+----------------+--------------+

PostTags association table
+----------+---------+
| Field    | Type    |
+----------+---------+
| post_id  | int(11) |
| tag_id   | int(11) |
+----------+---------+

Let’s say you wanted to delete all tag associations for posts which were published on 12/08/2016.

DELETE FROM PostTags WHERE post_id IN (SELECT id FROM Posts WHERE date_published LIKE '2016-08-12 %')

Should you need to delete rows after a specific row on a table with no id field, such as the association table above, it’s possible to create the id column and have MySQL fill in all the values on creation. These can then be used to form the query.

ALTER TABLE [table] ADD id INT UNIQUE AUTO_INCREMENT NOT NULL;
DELETE FROM [table] WHERE id > [x];
ALTER TABLE [table] DROP id;

The following select statement will return a count of rows, instead of the actual rows.

SELECT COUNT(*) FROM [table] WHERE [x = y];

As this is becoming a bit of a command reference, I would be amiss to leave out adding rows and updating rows.

    INSERT INTO [table] ([field1],[field2],[field3]) values ('[value1]','[value2]','[value3]');
    UPDATE [table] set [column] = '[value]' WHERE [x = y];

The following can be used to find duplicate rows, for example, posts in PostTags which have the same tag associated more than once.

    +----------+--------+
    | post_id  | tag_id |
    +----------+--------+
    |       45 |      6 |
    |       50 |     10 |    |       50 |     10 |    |       50 |      6 |
    |       51 |      8 |
    +----------+--------+

    
    SELECT * FROM [table] GROUP BY [field1],[field2] HAVING COUNT(*) > 1;

In this particular instance I needed to remove all duplicate entries, but leave a single row behind, otherwise the tag would no longer be associated with the post. For example, if a post had the same tag associated with it tree times, I needed to only remove two of those three rows. There may be a way to do this in MySQL, but I found it more convenient to use SQLAlchemy.

The first task is to collect a list of all rows which have duplicate entries.

result = db.engine.execute('SELECT * FROM [table] GROUP BY [field1],[field2] HAVING COUNT(*) > 1;').fetchall()

The key here is that the result list only contains a single instance of each duplicate row, in the form of an SQL Alchemy object.

We can then run a for loop which iterates through the result list and deletes each entry.

for r in result:
 db.engine.execute('DELETE FROM [table] WHERE ([field1],[field2]) IN ((' + str(int(r[0])) + ',' + str(int(r[1])) + '))')

I’m sure that is more complicated than it needs to be, but it got the job done.

The above code would have the following effect on our previous example table.

+----------+--------+
| post_id  | tag_id |
+----------+--------+
|       45 |      6 |
|       50 |      6 |
|       51 |      8 |
+----------+--------+

As we’ve just erased every row for pairs which were in the table multiple times, we need to re-add each entry once. Luckily our result list already has everything we need to accomplish this.

for r in result:
 db.engine.execute('INSERT INTO [table] (([field1],[field2]) VALUES (' + str(int(r[0])) + ',' + str(int(r[1])) + ')')

And the end result would be:

+----------+--------+
| post_id  | tag_id |
+----------+--------+
|       45 |      6 |
|       50 |      6 |
|       51 |      8 |
|       50 |     10 |
+----------+--------+

That’s all for now, it certainly was an interesting learning experience.