SQLite Performance Tuning In A Nutshell

SQLite is used in wide variance of use cases. As small application repositories, as a file store for data exchange or even as a full fledged database with huge amount of data and complex queries. And SQLite can handle that! Its performance is quite impressive due to using native I/O and a pretty cool SQL query planner.

To make good use of this query planner and get every query done in a few milliseconds one should follow some simple rules and review the SQL statements and indices on the database. But how does that work?

I will walk through a few simple examples of performance problems like missing indices, to show how to find what is wrong and how to fix. I will do that step by step.

1. Create yourself a test database

On a Mac everything you need is there. On Windows you have to download and install it. Make sure you have the SQLite directory in your path. To create a datebase then simply write

sqlite3 Performance.db

2. Create the tables for the test

We create two simple tables "Foo" and "Bar", each having 4 columns "ColumnA", "ColumnB", "ColumnC", "ColumnD". Both tables have the first column as their primary key. The table "Bar" has one additional column which refers to table "Foo", so we have a foreign key relationship to test joins. The columns is called "FooColumnA".

CREATE TABLE "Foo" (
    "ColumnA"    TEXT NOT NULL,
    "ColumnB"    TEXT NOT NULL,
    "ColumnC"    TEXT NOT NULL,
    "ColumnD"    TEXT,
    PRIMARY KEY("ColumnA")
);

CREATE TABLE "Bar" (
    "ColumnA"    TEXT NOT NULL,
    "FooColumnA"    TEXT NOT NULL,
    "ColumnB"    TEXT NOT NULL,
    "ColumnC"    TEXT NOT NULL,
    "ColumnD"    TEXT,
    PRIMARY KEY("ColumnA"),
    FOREIGN KEY("FooColumnA") REFERENCES "Foo"("ColumnA")
);

3. Fill the tables with test data

We fill the table with random integers as strings, just to have some volume and see the differences between fast and slow queries. We fill in the table "Foo" like 1000 rows, in the table "Bar" over one million rows. So we have like a small and a big table. Beware the creation of the data may take up 20 seconds!

INSERT INTO Foo (ColumnA, ColumnB, ColumnC, ColumnD)
    SELECT RANDOM(), RANDOM(), RANDOM(), RANDOM()
        FROM (SELECT * FROM (
            (SELECT 0 UNION ALL SELECT 1) t2, 
            (SELECT 0 UNION ALL SELECT 1) t4,
            (SELECT 0 UNION ALL SELECT 1) t8,
            (SELECT 0 UNION ALL SELECT 1) t16,
            (SELECT 0 UNION ALL SELECT 1) t32,
            (SELECT 0 UNION ALL SELECT 1) t64,
            (SELECT 0 UNION ALL SELECT 1) t128,
            (SELECT 0 UNION ALL SELECT 1) t256,
            (SELECT 0 UNION ALL SELECT 1) t512,
            (SELECT 0 UNION ALL SELECT 1) t1024
        )
    );

INSERT INTO Bar (ColumnA, FooColumnA, ColumnB, ColumnC, ColumnD)
    SELECT RANDOM(), FooKey, RANDOM(), RANDOM(), RANDOM()
        FROM (SELECT * FROM (
            (SELECT 0 UNION ALL SELECT 1) t2, 
            (SELECT 0 UNION ALL SELECT 1) t4,
            (SELECT 0 UNION ALL SELECT 1) t8,
            (SELECT 0 UNION ALL SELECT 1) t16,
            (SELECT 0 UNION ALL SELECT 1) t32,
            (SELECT 0 UNION ALL SELECT 1) t64,
            (SELECT 0 UNION ALL SELECT 1) t128,
            (SELECT 0 UNION ALL SELECT 1) t256,
            (SELECT 0 UNION ALL SELECT 1) t512,
            (SELECT 0 UNION ALL SELECT 1) t1024
        )
    )
    JOIN (SELECT ColumnA AS FooKey FROM Foo ORDER BY RANDOM() LIMIT 1024);

Thanks to [Joachim Isaksson] (https://stackoverflow.com/questions/17931320/how-to-insert-random-data-into-a-sqlite-table-using-only-queries)

4. A first look at the database

So we have a filled database, not even a small one. It should be over 200MB in size. So that is something to work with. What about indices? There are none in the statements above, but SQLite adds indices to primary keys automatically, so we have in each table one index on ColumnA. Lets see that:

EXPLAIN QUERY PLAN 
SELECT * FROM Foo
WHERE Foo.ColumnA > 0;

Executing that shows:

QUERY PLAN
`--SEARCH TABLE Foo USING INDEX sqlite_autoindex_Foo_1 (ColumnA>?)

So just with the keyword "EXPLAIN QUERY PLAN" before your SQL you see how SQLite is trying to achieve its task. Here we see several things

  • "SEARCH TABLE Foo": SQLite is using an index to work on the table. If that wouldn't be the case you would see "SCAN TABLE Foo". So that is good
  • "USING INDEX sqlite_autoindex_Foo_1": SQLite tells us which index is uses. This is the index automatically created with the primary key. Hence the name.
  • "(ColumnA>?)": This is the condition which is used against the index.

5. A query with a problem

So lets try something different. We query against a column with no index:

EXPLAIN QUERY PLAN 
SELECT * FROM Bar
WHERE Bar.ColumnB > 0;

That says

QUERY PLAN
`--SCAN TABLE Bar

Thats bad. That is a full table scan on a table with over a million rows. We need an index here. But that comes later.

6. A query with no problem, but why?

So there was no index on the column we asked for. Lets do that again but different. We now ask for several columns, but not all columns have indices. We ask for ColumnA and ColumnB, but only ColumnA has an index:

EXPLAIN QUERY PLAN 
SELECT * FROM Foo
WHERE Foo.ColumnA > 0 AND Foo.ColumnB > 0;

That says

QUERY PLAN
`--SEARCH TABLE Foo USING INDEX sqlite_autoindex_Foo_1 (ColumnA>?)

That was perhaps unexpected? SQLite did not do a table scan. It used the index of ColumnA - and the query goes pretty fast. Why is that? Because the query optimizer can use indices with the last column missing. So this is quite smart and a thing to keep in mind. Not every column we ask for needs an index, so always look what the optimizer needs. Additional indices also cost performance on inserting or modifying rows, so they are not for free.

7. A fix for the missing index

In 5 we saw an index was missing. So lets add that now:

CREATE INDEX "idx_bar_columnb" ON "Bar" ("ColumnB");

And voila: QUERY PLAN `--SEARCH TABLE Bar USING INDEX idx_bar_columnb (ColumnB>?)

8. A more complex query with a problem

The same mechanism works for more complex queries. Here is a join and it is not doing well:

EXPLAIN QUERY PLAN 
SELECT Foo.*, Bar.* FROM Foo
JOIN Bar ON Foo.ColumnA = Bar.FooColumnA
WHERE Foo.ColumnB > 0;

QUERY PLAN
|--SCAN TABLE Bar
`--SEARCH TABLE Foo USING INDEX sqlite_autoindex_Foo_1 (ColumnA=?)

There is a table scan again! The table "Bar" gets joined and for that SQLite scans each entry for the table. Also there is no index on "Foo.ColumnB". So lets fix that first:

CREATE INDEX "idx_foo_columnb" ON "Foo" ("ColumnB");

That brings us to:

QUERY PLAN
|--SCAN TABLE Bar
`--SEARCH TABLE Foo USING INDEX sqlite_autoindex_Foo_1 (ColumnA=?)

That changed nothing. For whatever reason. So lets add the missing index on "Bar.FooColumnA":

CREATE INDEX "idx_bar_foocolumna" ON "Bar" ("FooColumnA");

QUERY PLAN
|--SEARCH TABLE Foo USING INDEX idx_foo_columnb (ColumnB>?)
`--SEARCH TABLE Bar USING INDEX idx_bar_foocolumna (FooColumnA=?)

Now two things happened: The query planner starts using the index on "Foo.ColumnB". And then it also starts using the index on "Bar.FooColumnA". So for a complex query a missing index on one side may prevent the use of a good index on the other side.

9. Conclusion

The simple conclusion from above: Always look at the query plan of your queries. Even simple queries may have suprising results there, not to speak of complex ones.