I posted a message yesterday, but somehow it didn't get on the list. Trying again....
I have some questions/remarks about the (incredibly useful) geo_shape type and filters.
- Are there plans to support "Pre-Indexed-Shapes"
also in documents, i.e. specify a pre-indexed shape to be indexed with a
new document, instead of adding the geometry itself to that document? I
would expect that in many use-cases the same geometries will be
indexed with multiple docs. Just like with filters/queries the
performance could benefit quite a lot if the indexer could just copy the
hashes over from an already indexed geometry.
- Imho allowing a geometry to be serialised as
e.g. WKT would not only trim down the size of documents, but also
the work that elasticsearch needs to do for serializing/deserializing
geometries. Polygons quickly become really big when expressed in JSON...
Is this something that is considered and/or that will be accepted when
provided in a decent pull-request?
- A bit more documentation about how the combination of
distance_error_pct and tree_levels affects the precision/results of
filters would really be appreciated. From the docs and code I'm having a
hard time understanding the consequences of altering both values on
filters and indexes. What exactly does distance_error_pct do, and how does it affect e.g. an intersection filter?
- Quote from the docs: "Because of current limitations of the algorithm, very large
shapes are not deemed to intersect with very small filter shapes". Are
there any plans to fix this? Assuming the algorithmic problem is that large shapes are only hashed up to a maximum depth, there are a couple of ways to fix this. E.g. the indexer could add an extra field containing only the hashes from the deepest hash-level it uses for that geometry. The intersection filter could then use this field by extending (with a boolean OR) the current filter with a term-filter on that field for all parents of the hashes it currently uses for searching. That way larger shapes that intersect will be included, and smaller shapes that only happen to share a parent won't be.
- The algorithm for "within" (in TermQueryPrefixStrategy) could be improved (imho). It's currently inconsistent for geometries that are equal to or just a tiny bit smaller than the filter-geometry, and I think that could easily be fixed. I filed an issue about this yesterday, so I won't go further into it here, see https://github.com/elasticsearch/elasticsearch/issues/2552
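
To make the suggestion about large shapes and small filter shapes a bit more concrete, here is a rough Python sketch of the idea. It is a toy model, not the actual elasticsearch/Lucene code: cells are plain prefix strings (a parent is a prefix of its children), the field names "cells" and "leaf_cells" and all function names are made up for illustration, and I'm assuming the current filter is simply a terms match of the filter's hashes against everything indexed for the shape.

```python
def ancestors(cell):
    """All proper prefixes (parents) of a cell hash, e.g. 'ABC' -> ['A', 'AB']."""
    return [cell[:i] for i in range(1, len(cell))]


def index_shape(deepest_cells):
    """Build the terms stored for one shape (hypothetical field names).

    'cells'      : the deepest hashes plus all their parents (assumed current behaviour)
    'leaf_cells' : only the hashes from the deepest level used for this shape (proposed extra field)
    """
    cells = set()
    for cell in deepest_cells:
        cells.add(cell)
        cells.update(ancestors(cell))
    return {"cells": cells, "leaf_cells": set(deepest_cells)}


def intersects_current(doc, filter_cells):
    """Assumed current filter: any filter hash equals an indexed hash."""
    return any(cell in doc["cells"] for cell in filter_cells)


def intersects_proposed(doc, filter_cells):
    """Current filter OR a match of the filter hashes' parents on 'leaf_cells'."""
    if intersects_current(doc, filter_cells):
        return True
    parents = {p for cell in filter_cells for p in ancestors(cell)}
    return not parents.isdisjoint(doc["leaf_cells"])


# A very large shape, hashed only to depth 1.
large = index_shape({"A", "B"})
# A small shape that shares parents 'A' and 'AB' with the filter but does not touch it.
small_neighbour = index_shape({"ABD"})
# A very small filter shape, hashed to depth 3, lying inside cell 'A'.
filter_cells = {"ABC"}

print(intersects_current(large, filter_cells))             # large shape is missed
print(intersects_proposed(large, filter_cells))            # large shape is found
print(intersects_proposed(small_neighbour, filter_cells))  # neighbour is still excluded
```

The point of matching the parents against "leaf_cells" only, rather than against all indexed hashes, is that a small shape which merely shares a parent cell with the filter has no term for that parent in "leaf_cells", so it does not become a false positive.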