Discussion:
QEP/RFC sqlite virtual tables
Matthias Kuhn
2014-10-22 09:21:33 UTC
Permalink
Moving this to a new thread (from 1:many relation...
Well, QgsExpression has been designed to mimic the SQL WHERE clause, so
yes I agree.
My point is that if you want to extend expression to support joins,
aggregates, window functions, etc. you end up recreating SQL, but
probably a less powerful one.
You can call it mimicking, or you can call it implementing its own
SQL (subset) dialect.
Yes, you end up recreating SQL. How powerful it will be is hard to say
right now, right?
The difference I can see between expressions and SQL is that functions
used in expressions may have no direct equivalent SQLite-side. But it
can be resolved by using user-defined functions in SQLite (that would
just call code from QgsExpression for instance), so that the SQL dialect
would be seen as an extension of the current QgsExpression, not
something different.
:)
I don't see anything that prevents virtual layer queries from being
optimized server-side. It has to be taken into account during the
implementation of virtual layers, of course, but it is hard to describe
precisely how it will work before having an almost complete
implementation.
I was referring to this discussion here.
https://github.com/qgis/QGIS-Enhancement-Proposals/pull/5#issuecomment-58148788
Sorry I did not respond again, but I don't think a complete optimization
is possible without deep access to virtual table functionality in
sqlite (or parsing the same SQL on our end and replacing parts of it,
which basically comes close to writing our own engine again).
But let's discuss this on the QEP PR instead of inside this ml thread.
Just to give you my feedback based on my implementation of the SQLite SQL dialect
in OGR (
https://github.com/OSGeo/gdal/blob/trunk/gdal/ogr/ogrsf_frmts/sqlite/ogrsqlitevirtualogr.cpp
), I also doubt that you could easily inform the server that a join should be
done. AFAIR I could forward simple filtering to the server/backend (things like
"column_A >= 5 AND column_A <= 10 AND column_B != 10") by implementing the
xFilter virtual method, but Virtual Tables are not informed of joins happening
(the sqlite virtual table API hardly exposes SQL at all). So that would indeed
require analyzing the SQL on QGIS side.
In fact, I've just verified that joins like "t1.col1 = t2.col2" are seen by the
virtual table implementation as successive filters "t1.col1 = val1", "t1.col1
= val2", where val1, val2 are values from t2.col2.
And regarding xFilter, even filters with OR, etc. are hidden from the virtual
table implementation and evaluated only by SQLite.
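To make Even's observation concrete, here is a small plain-Python model of the behaviour he describes. This is not the real SQLite C virtual table API (xBestIndex/xFilter), only a sketch of the contract under the assumption that SQLite forwards AND-connected comparisons as individual constraints and keeps OR expressions (and join logic) to itself.

```python
# Plain-Python model of the constraint pushdown described above.
# SQLite hands a virtual table AND-connected comparisons one by one;
# a join on t1.col1 = t2.col2 arrives as a sequence of equality
# filters, one per value taken from t2.col2.

OPS = {">=": lambda a, b: a >= b, "<=": lambda a, b: a <= b,
       "=": lambda a, b: a == b, "!=": lambda a, b: a != b}

class ModelVirtualTable:
    def __init__(self, rows):
        self.rows = rows  # list of dicts, one per feature

    def xFilter(self, constraints):
        """Return rows matching all pushed-down (ANDed) constraints."""
        return [r for r in self.rows
                if all(OPS[op](r[col], val) for col, op, val in constraints)]

t1 = ModelVirtualTable([{"col1": v} for v in (1, 5, 7, 10, 12)])
# "col1 >= 5 AND col1 <= 10" is forwarded as two separate constraints:
in_range = t1.xFilter([("col1", ">=", 5), ("col1", "<=", 10)])
# A join against t2.col2 values (5, 12) shows up as successive
# equality filters, never as a join the table could optimize:
joined = [t1.xFilter([("col1", "=", v)]) for v in (5, 12)]
print(in_range, joined)
```

Since the table only ever sees these per-row constraint calls, any join-level optimization indeed has to happen on the QGIS side, before SQLite is involved.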
Thanks Even for your feedback.
The idea behind virtual layers is that SQLite may be exposed to advanced
users if they want to use a complex query on some data sources.
From a code point of view, using SQLite to replace existing features is
very interesting because it simplifies the implementation.
But exposing a SQL query for already existing features such as joins may
not be desirable for standard users.
So for joins, since the information comes from QGIS ("I want to join
tables t1 and t2 on this column predicate"), the SQL query can be
deduced and some other metadata is also known, like the fact that t1 and
t2 may be from the very same database.
In that case the virtual layer may be created with some metadata saying
that this is in fact a join that must be optimized. And then the virtual
layer provider can ask the underlying provider (say postgis) if it can
process the whole join query.
I am not only referring to the current relations implementation here. I
don't think aggregate functions necessarily need to be based on creating
a relation definition in the project file (just like in a database you
also don't need a foreign key constraint to create a join).

But if I want to create a new layer or an ad-hoc iterator that joins
data or makes use of other advanced relational database features, it
would be nice if QGIS could forward such requests to the database. Say I
want to create a symbology based on the amount of waste within
countries, QGIS should be able to execute this request on the database.
In this scenario you could likely want to define this request inside an
expression (e.g. a virtual column).
So basically, I think we need functionality that can act as a database
abstraction layer: forward requests to the db wherever possible, and use
a (local) fallback implementation where that is not possible.
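The "forward where possible, fall back locally" pattern could look roughly like the following sketch. All class and method names here are invented for illustration; no such API exists in QGIS.

```python
# Hypothetical sketch of a database abstraction layer that pushes a
# request to the backend when the provider accepts it, and otherwise
# evaluates it with a local fallback implementation.

class PostgresProvider:
    def can_execute(self, sql):
        return True  # a real provider would inspect the query first

    def execute(self, sql):
        return "executed on the database: " + sql

class LocalFallback:
    def execute(self, sql):
        return "evaluated locally: " + sql

def run_request(provider, fallback, sql):
    """Forward to the backend if it accepts the query, else run locally."""
    if getattr(provider, "can_execute", lambda s: False)(sql):
        return provider.execute(sql)
    return fallback.execute(sql)

sql = ("SELECT country, sum(waste) FROM waste "
       "JOIN countries USING (code) GROUP BY country")
print(run_request(PostgresProvider(), LocalFallback(), sql))
```

A provider without a `can_execute` capability automatically falls through to the local path, which matches the idea of a well-tested fallback with optional backend fine-tuning.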

I am not opposed to using SQLite for this. There is no reason for us to
reinvent the wheel; the NIH syndrome should not be our driver. But I see
some limitations and I would prefer to have these properly
addressed. And I will be more than happy if you tell me that you
convinced the sqlite virtual table developers that they should extend
their API to expose the parsed SQL structure and allow tweaking it. Then
we have a fallback implementation which is well-tested but we still have
the possibility to fine-tune requests for big databases.

Regards,
Matthias
--
Help getting QGIS to the next level of quality before November 15!
http://blog.vitu.ch/10102014-1046/crowdfunding-initiative-automated-testing
Hugo Mercier
2014-10-22 12:11:37 UTC
Permalink
Post by Matthias Kuhn
Well, QgsExpression has been designed to mimic the SQL WHERE clause, so
yes I agree.
My point is that if you want to extend expression to support joins,
aggregates, window functions, etc. you end up recreating SQL, but
probably a less powerful one.
You can call it mimicking, or you can call it implementing its own
SQL (subset) dialect.
Sorry for my English, "mimicking" was not a negative term for me.
Post by Matthias Kuhn
Yes, you end up recreating SQL. How powerful it will be is hard to say
right now, right?
Sure. But if you want it to be as powerful as other SQL engines, lots of
work is needed (parsing, planning, optimizing). Work that has already
been done elsewhere, I think.
Post by Matthias Kuhn
So for joins, since the information comes from QGIS ("I want to join
tables t1 and t2 on this column predicate"), the SQL query can be
deduced and some other metadata is also known, like the fact that t1 and
t2 may be from the very same database.
In that case the virtual layer may be created with some metadata saying
that this is in fact a join that must be optimized. And then the virtual
layer provider can ask the underlying provider (say postgis) if it can
process the whole join query.
I am not only referring to the current relations implementation here. I
don't think aggregate functions necessarily need to be based on creating
a relation definition in the project file (just like in a database you
also don't need a foreign key constraint to create a join).
Not sure I understand. You mean there would be no need to create a
virtual layer just to execute an aggregate? Exactly.
But if the virtual layer concept is complete enough, then every single
layer can be seen as a special case of a (trivial) virtual layer, and
to compute an aggregate you won't need to create a new virtual layer.
Post by Matthias Kuhn
But if I want to create a new layer or an ad-hoc iterator that joins
data or makes use of other advanced relational database features, it
would be nice if QGIS could forward such requests to the database. Say I
want to create a symbology based on the amount of waste within
countries, QGIS should be able to execute this request on the database.
In this scenario you could likely want to define this request inside an
expression (e.g. a virtual column).
So basically, I think we need a functionality that is able to work as a
database abstraction layer. Forward wherever possible to the db, use a
(local) fallback implementation where not.
I agree, but can you be more specific about your example? How many
tables are involved? Which fields? What kind of joins?
Post by Matthias Kuhn
I am not opposed to using SQLite for this. There is no reason for us to
reinvent the wheel; the NIH syndrome should not be our driver. But I see
some limitations and I would prefer to have these properly
addressed. And I will be more than happy if you tell me that you
convinced the sqlite virtual table developers that they should extend
their API to expose the parsed SQL structure and allow tweaking it. Then
we have a fallback implementation which is well-tested but we still have
the possibility to fine-tune requests for big databases.
Splitting the original query into local and remote queries seems very
hard to me in the general case. It would require not only the parsed SQL
AST, but also some information about the execution plan. And the plan
may depend on statistics on the data. And even when you know the plan, I
think it is still hard to translate.

Generally speaking if the user tries to do a very complex query with
lots of different data sources, it is hard to guarantee good
performance, but I don't see it as a major problem. If you want
something complex AND fast, then use SQLite or PostGIS.

So, I would vote for optimizations that can be done for simple,
well-defined scenarios, like the "join" case given by the current join
or relation feature of QGIS: no nested queries, no group by, no order
by, no CTE, etc. That particular case could be detected when the join is
created, and optimizations can be applied, like creating a (temporary)
PostGIS layer with sql= set to the join query.
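As a sketch of that last idea, the following hypothetical helper builds the sql= query for such a temporary layer from a simple join definition (no nested queries, no GROUP BY/ORDER BY/CTE). All table and field names are invented for illustration; the real provider URI handling in QGIS is not shown.

```python
# Hypothetical sketch: derive the pushed-down SQL for a temporary
# PostGIS layer from a simple QGIS join definition between two tables
# living in the same database.

def join_to_sql(left, right, left_field, right_field):
    """Build the join query to hand to the backend as sql=..."""
    return ('SELECT {l}.*, {r}.* FROM {l} JOIN {r} '
            'ON {l}.{lf} = {r}.{rf}').format(
                l=left, r=right, lf=left_field, rf=right_field)

# Invented example tables and join fields:
sql = join_to_sql("countries", "waste_stats", "code", "country_code")
print(sql)
```

Because the join definition comes from QGIS itself, no SQL parsing is needed to detect this case; the query is constructed, not analyzed.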

I may be missing something. Are there other simple cases where the
optimization is well defined ?

To put it differently, I don't see the lack of knowledge about the
parsed SQL query as a blocker for the implementation of this virtual layer
concept. A general cross-database SQL optimizer would need lots of work
(in QGIS and/or SQLite) and may come later (if really needed).

The other option would be to say we don't need everything SQLite
provides, and we will never need it. So then we can define our own
subset of SQL and it would probably be easier to optimize. But I can't
see why we would want to limit the features :) it is usually the opposite.