Daten suchen

Wie finde ich die Daten, die ich brauche?

Um Ihnen die Suche nach Daten zu erleichtern, bietet opendata.swiss ein Suchfeld an. Und das hat es in sich: Sie können mittels sogenannter Querysyntax komplexe Suchanfragen auf die Daten von opendata.swiss absetzen. Hier erklären wir Ihnen, worum es sich dabei genau handelt und worauf Sie achten müssen. Bitte beachten Sie, dass das Dokument derzeit nur in Englisch verfügbar ist.

Technical background

opendata.swiss has a very powerful search engine, that can help you to find exactly the datasets you want. The search is provided by the open source component Apache Lucene/Solr. Every dataset is indexed by Solr when it gets updated, and if you perform a search on the portal, this index is queried to efficiently deliver results.

The search index is basically the “database” where all the information for the search is saved. It uses a custom schema with all the dataset fields that should be indexed. The schema is flat, i.e. nested elements like resources must be saved differently, in order for Solr to index them. The same applies to the multilingual fields, which are all stored with the language suffix, e.g. keywords_en contains the English keywords.

By default, all the fields that belong to a dataset are copied in one field (called “text”), so that the search process only has to check one field to find a match. So if a user submits a search with the query “weather”, Solr runs this query against the “text” field of all datasets.

Search Index

The search index contains the following fields:

URLs

url, ckan_url, download_url, res_url

Text-fields

extras_*, res_extras_*, urls, name, title, title_string, text, license, notes tags, groups, organization, res_name res_format, res_description, identifier, see_alsos maintainer, author, publishers, contact_points

Translated fields

title, keywords, groups organization, res_name, res_description

Find more detailed information about the Solr configuration in the official Solr documentation. The config and schema of opendata.swiss is available on GitHub:

The source of the referenced files in the solr.xml (e.g. italian_stop.txt, fr_elision.txt, etc.) can be found in the official CKAN-Repository of the current CKAN-Version on Github. All other files (e.g. stopwords.txt) are provided by Solr.

Query syntax

Solr has its own query syntax to write complex queries. Depending on the query, Solr uses a different query parser to determine what to do.

Search operators

  • Use +{field}:{value} to include a search term, e.g. +title_en:power to find all datasets, whose English title contains the word “power”

  • Use -{field}:{value} to exclude a search term, e.g. +title_en:power -title_en:hydraulic to find all datasets, whose English title contains the word “power”, but not “hydraulic”

  • Use AND to combine several search terms that all must match, e.g. keywords_en:(geology AND geophysics) to find all datasets that have both tags geology and geophysics

  • Use OR to combine several search terms, where one of them must match, e.g. organization:(kanton-thurgau OR stadt-zurich)

  • All of these options can be further combined together, e.g. organization:(kanton-thurgau OR stadt-zurich) karte

Searchterm suggestions

The search-field of opendata.swiss provides searchterm-suggestions when a user types into it. For each language a self-contained Solr index is built multiple times throughout the day. That means that changes to datasets or new data won’t be reflected in the suggestions immediately.

The index is based on the following fields:

  • dataset-title (translated)

  • keywords (translated)

  • groups (translated)

  • organization (translated)

  • distribution-name (translated)

  • author

  • maintainer

  • contact_points

  • publishers

  • identifier

  • distribution-format