Implementing an open source spatio-temporal search platform for Spatial Data Infrastructures
A Spatial Data Infrastructure (SDI) is a framework of geospatial data, metadata, users and tools intended to provide an efficient and flexible way to use spatial information. One of the key software components of an SDI is the catalogue service, which is needed to discover, query and manage the metadata. Catalogue services in an SDI are typically based on the Open Geospatial Consortium (OGC) Catalogue Service for the Web (CSW) standard, which defines common interfaces for accessing metadata. A search engine is a software system that supports fast and reliable search, with features such as full-text search, natural language processing, weighted results, fuzzy matching, faceting and hit highlighting. In this paper we focus on Lucene, a powerful Java-based search library. The Center for Geographic Analysis (CGA) at Harvard University is working to bring the benefits of both worlds (OGC catalogues and search engines) into its public domain SDI, WorldMap (http://worldmap.harvard.edu). Harvard Hypermap (HHypermap) is a component, built on an open source stack, that will become part of WorldMap. The system implements an OGC catalogue based on pycsw to provide standard access to metadata, and uses a search engine based on Solr/Lucene to provide the advanced search features typically found in search engines.
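To make the two access paths concrete, the sketch below builds one request for each. The first is a CSW 2.0.2 GetRecords request in KVP encoding with a CQL full-text constraint, the kind of query a pycsw endpoint answers. The second is a Solr select request combining relevance-ranked text search with a bounding-box filter, a time-range filter, faceting and hit highlighting. The endpoint URLs and the Solr field names (`title`, `abstract`, `bbox`, `layer_date`, `layer_originator`) are hypothetical placeholders, not HHypermap's actual schema. The sketch assumes a Solr `BBoxField` named `bbox` and a date field named `layer_date`.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoints: substitute a real pycsw and Solr deployment.
CSW_ENDPOINT = "https://example.org/csw"
SOLR_SELECT = "https://example.org/solr/hypermap/select"


def csw_getrecords_url(text: str, max_records: int = 10) -> str:
    """Build a CSW 2.0.2 GetRecords request (KVP) with a CQL full-text filter.

    Note: `text` is not escaped; a single quote in it would break the CQL.
    """
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "brief",
        "resultType": "results",
        "maxRecords": str(max_records),
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"AnyText LIKE '%{text}%'",
    }
    return f"{CSW_ENDPOINT}?{urlencode(params)}"


def solr_select_url(text: str, bbox: tuple, start: str, end: str) -> str:
    """Build a Solr spatio-temporal search request.

    bbox is (min_x, min_y, max_x, max_y); start and end are ISO 8601 UTC
    timestamps. Field names are hypothetical placeholders.
    """
    min_x, min_y, max_x, max_y = bbox
    params = [
        ("q", text),
        ("defType", "edismax"),            # relevance-weighted text search
        ("qf", "title abstract"),          # fields searched (hypothetical)
        # Solr's ENVELOPE argument order is (minX, maxX, maxY, minY).
        ("fq", f"{{!field f=bbox}}Intersects(ENVELOPE({min_x}, {max_x}, {max_y}, {min_y}))"),
        ("fq", f"layer_date:[{start} TO {end}]"),  # temporal filter
        ("facet", "true"),
        ("facet.field", "layer_originator"),
        ("hl", "true"),                    # hit highlighting
        ("hl.fl", "abstract"),
        ("wt", "json"),
    ]
    # A list of pairs lets urlencode emit the repeated "fq" parameter twice.
    return f"{SOLR_SELECT}?{urlencode(params)}"


if __name__ == "__main__":
    print(csw_getrecords_url("roads"))
    print(solr_select_url("roads", (-10, 10, 20, 15),
                          "2000-01-01T00:00:00Z", "2010-12-31T23:59:59Z"))
```

The CSW request carries only what the standard defines, which is what makes the catalogue interoperable with other SDI clients. The Solr request layers ranking, faceting and highlighting on top, which is the search-engine half of the combination described above.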