Populated places maps
In my previous article, Theme maps of Germany, I showed how to create mashup maps whose data is embedded in the map as a feature list.
In another quiz I took this a bit further and created several maps of Germany. Each map contained a feature list of cities, towns and villages whose names have something in common, e.g. the same suffix or prefix. Like a surname, the name of a place encodes some of its heritage. The German suffix -stein, for example, refers to a settlement on top of or close to a rock; examples are Warstein and Königstein. Many suffixes also derive from an ancient language or dialect spoken in that area. If you look at the maps of Germany for the suffixes -rode, -witz or -ing, you will notice that these settlements are concentrated in certain regions.
Geonames
To create maps of this kind we need data about populated places and their geographic location. Such data is available at geonames.org. It covers places from all around the world, and not only populated places but also landmarks, historic sites etc. Geonames offers a database as well as exports by country that can be downloaded. I went to the export dump page, which contains a ZIP file for each country.
I downloaded the ZIP file for Germany. After unzipping it, there is a large tab-separated text file with a fixed set of columns and a readme file that describes the fields and their possible content.
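The readme lists the columns in a fixed order. As a minimal sketch, one line of the dump can be turned into an associative array keyed by the column names from the readme. The function name parseGeonamesLine and the sample line (place "Musterstadt") are made up for illustration.

```php
<?php
// Column names in the order given by
// http://download.geonames.org/export/dump/readme.txt
const GEONAMES_COLUMNS = [
    'geonameid', 'name', 'asciiname', 'alternatenames', 'latitude',
    'longitude', 'feature class', 'feature code', 'country code', 'cc2',
    'admin1 code', 'admin2 code', 'admin3 code', 'admin4 code', 'population',
    'elevation', 'dem', 'timezone', 'modification date',
];

// Hypothetical helper: split one line of the dump into named fields.
// Returns null for lines that do not have the expected number of columns.
function parseGeonamesLine(string $line): ?array
{
    $fields = explode("\t", rtrim($line, "\r\n"));
    if (count($fields) !== count(GEONAMES_COLUMNS)) {
        return null;
    }
    return array_combine(GEONAMES_COLUMNS, $fields);
}

// Made-up sample line with 19 tab-separated fields.
$line = "1\tMusterstadt\tMusterstadt\t\t50.1\t8.7\tP\tPPL\tDE\t\t05\t\t\t\t1234\t\t100\tEurope/Berlin\t2020-01-01\n";
$place = parseGeonamesLine($line);
echo $place['feature class'], ' ', $place['feature code'], "\n"; // prints "P PPL"
```

Note that field 7 of the readme is index 6 in PHP, which is why the extraction script below checks $row[6] for the feature class.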
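<?php
// Usage: php extract.php DE.txt de.json
// Reduces a Geonames country dump to populated places and writes them as JSON.
// Field description: http://download.geonames.org/export/dump/readme.txt
$inFile = $_SERVER['argv'][1] ?? null;
$outFile = $_SERVER['argv'][2] ?? null;
$usage = 'Usage: php ' . basename($_SERVER['argv'][0]) . ' infile.txt outfile.json';
if (empty($inFile)) {
    echo "No input file from Geonames provided.\n$usage\n";
    exit(1);
}
if (empty($outFile)) {
    echo "No JSON output file provided.\n$usage\n";
    exit(1);
}
$fp = is_readable($inFile) ? fopen($inFile, 'rb') : false;
if (!$fp) {
    echo "Could not open file $inFile.\n$usage\n";
    exit(1);
}
$data = [];
// The dump is plain TSV without quoting, and some lines (alternate names)
// are very long, so split each line on tabs rather than using fgetcsv().
while (($line = fgets($fp)) !== false) {
    $row = explode("\t", rtrim($line, "\r\n"));
    if (count($row) < 15) {
        continue; // skip empty or malformed lines
    }
    // Field 7 (index 6) is the feature class: we want populated places only.
    if ($row[6] !== 'P') {
        continue;
    }
    // Field 8 (index 7) is the feature code. Exclude sections of populated
    // places, destroyed, abandoned and historical places, and farm villages.
    // http://www.geonames.org/export/codes.html
    if (in_array($row[7], ['PPLX', 'PPLW', 'PPLQ', 'PPLH', 'PPLF'], true)) {
        continue;
    }
    $data[] = [
        $row[0],        // geonameid
        $row[1],        // name
        (float)$row[4], // latitude
        (float)$row[5], // longitude
        $row[7],        // feature code
        $row[10],       // admin1 code
        $row[11],       // admin2 code
        $row[12],       // admin3 code
        $row[13],       // admin4 code
        (int)$row[14],  // population
    ];
}
fclose($fp);
// Keep umlauts readable and the file smaller.
if (file_put_contents($outFile, json_encode($data, JSON_UNESCAPED_UNICODE)) === false) {
    echo "Could not write file $outFile.\n";
    exit(1);
}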
This PHP script shrinks the original TSV file by reducing the data to populated places only and converts the result to JSON. From the readme we know that field 7 holds the feature class. We are interested in populated places only, so this field must contain a P. Field 8 holds the feature code, which refines the feature class. By looking at this field we can filter out historical, destroyed or abandoned places that no longer exist, as well as sections of larger places (PPLX) and farm villages (PPLF). Each remaining row is stored as a compact array with the ID, name, coordinates, feature code, the four administrative codes and the population.
Web Application
For a live search I created a web application that queries the data and visualizes the results on a map. Its core is much the same as in the Theme maps app: we have a map and feature data as GeoJSON, and we use the d3.js library to draw the map with the locations.
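The extraction script writes compact arrays (ID, name, latitude, longitude, feature code, admin1 to admin4 codes, population) rather than GeoJSON. If you want to hand the places to d3.js as features, a sketch like the following converts them into a GeoJSON FeatureCollection. The function name toFeatureCollection and the property names are hypothetical. One pitfall worth showing: GeoJSON (RFC 7946) orders coordinates as longitude, latitude, the reverse of the Geonames columns.

```php
<?php
// Hypothetical helper: turn the compact records written by the extraction
// script into a GeoJSON FeatureCollection (RFC 7946).
function toFeatureCollection(array $records): array
{
    $features = [];
    foreach ($records as $record) {
        [$id, $name, $lat, $lon, $code, $admin1, $admin2, $admin3, $admin4, $population] = $record;
        $features[] = [
            'type' => 'Feature',
            'id' => $id,
            // GeoJSON expects [longitude, latitude], not [latitude, longitude].
            'geometry' => ['type' => 'Point', 'coordinates' => [$lon, $lat]],
            'properties' => [
                'name' => $name,
                'featureCode' => $code,
                'admin' => [$admin1, $admin2, $admin3, $admin4],
                'population' => $population,
            ],
        ];
    }
    return ['type' => 'FeatureCollection', 'features' => $features];
}

// Made-up sample record in the layout of the extraction script.
$records = [['1', 'Musterstadt', 50.1, 8.7, 'PPL', '05', '', '', '', 1234]];
echo json_encode(toFeatureCollection($records), JSON_UNESCAPED_UNICODE), "\n";
```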
The query runs on the fly: the user enters a search string and picks a search mode, and the Geonames data that we extracted into the JSON file is searched accordingly. This is done by an AJAX call to the following PHP script:
<?php
const DIR_LEFT = 1;   // Search for the query at the beginning of the name.
const DIR_RIGHT = 2;  // Search for the query at the end of the name.
const DIR_MIDDLE = 3; // Search for the query in the middle of the name.

$query = mb_strtolower($_REQUEST['q'] ?? '');
$qlen = mb_strlen($query);
$dir = (int)($_REQUEST['d'] ?? 0);
$iso = preg_match('/^[a-z]{2}$/', $_REQUEST['c'] ?? '') ? $_REQUEST['c'] : '';
$boundary = (int)($_REQUEST['b'] ?? 0);
// The file with the populated places extracted from geonames.org, e.g. de.json.
$jsonFile = $iso . '.json';
// The matching rows from the JSON file above.
$result = [];

// Send the JSON header right away, because in case of an error we send
// an empty array.
header('Content-Type: application/json');

// Nothing to query or no file to query.
if ($qlen === 0 || !file_exists($jsonFile)) {
    echo '[]';
    exit;
}
// Something might be wrong with the data.
$data = json_decode(file_get_contents($jsonFile), true);
if (!is_array($data)) {
    echo '[]';
    exit;
}
// Run over all entries and check the name of the settlement ($row[1]).
// The /u modifier makes \w and \b treat umlauts such as ö as word characters.
foreach ($data as $row) {
    $name = mb_strtolower($row[1]);
    if ($dir === DIR_LEFT) {
        if (mb_strpos($name, $query) === 0) {
            // With boundary on, the character after the prefix must not be
            // a word character.
            if ($boundary === 1 && mb_strlen($name) > $qlen
                && preg_match('/\w/u', mb_substr($name, $qlen, 1))) {
                continue;
            }
            $result[] = $row;
        }
        continue;
    }
    if ($dir === DIR_RIGHT) {
        if (mb_substr($name, -$qlen) === $query) {
            // With boundary on, the character before the suffix must not be
            // a word character.
            if ($boundary === 1 && mb_strlen($name) > $qlen
                && preg_match('/\w/u', mb_substr($name, -($qlen + 1), 1))) {
                continue;
            }
            $result[] = $row;
        }
        continue;
    }
    // Middle: look for an occurrence that neither starts at the first
    // character nor ends at the last one.
    $pos = mb_strlen($name) > $qlen + 1 ? mb_strpos($name, $query, 1) : false;
    if ($pos !== false && $pos + $qlen < mb_strlen($name)) {
        if ($boundary === 1) {
            // Only accept the row if the query is a complete word of the name.
            $words = preg_split('/\b/u', $name);
            if (!in_array($query, $words, true)) {
                continue;
            }
        }
        $result[] = $row;
    }
}
echo json_encode($result);
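The script reads four request parameters: q is the search string, d the search mode (1 = prefix, 2 = suffix, any other value = middle), c the two-letter country code that selects the JSON file (e.g. de for de.json), and b, which switches on the word-boundary check when set to 1. With b=1, for instance, the prefix "Bad" matches "Bad Homburg vor der Höhe" but not "Badenweiler". A small sketch of how a request URL could be built; the function name searchUrl and the script name search.php are placeholders.

```php
<?php
// Search modes as defined in the search script.
const DIR_LEFT = 1;
const DIR_RIGHT = 2;
const DIR_MIDDLE = 3;

// Hypothetical helper: build the AJAX request URL for the search script.
// "search.php" is a placeholder for wherever the script is deployed.
function searchUrl(string $query, int $dir, string $country, bool $boundary): string
{
    return 'search.php?' . http_build_query([
        'q' => $query,
        'd' => $dir,
        'c' => $country,
        'b' => $boundary ? 1 : 0,
    ]);
}

echo searchUrl('rode', DIR_RIGHT, 'de', false), "\n"; // search.php?q=rode&d=2&c=de&b=0
```

http_build_query() also takes care of URL-encoding umlauts in the search string.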
We simply load the JSON file that contains the list of places into memory, decode it into a large array and check for each element whether its name matches the query string, depending on whether we look for a prefix, a suffix or a substring in the middle. The search is not very sophisticated: when looking up the suffix -burg, names like Rothenburg ob der Tauber are not found, while a place called Schwarzach bei Nabburg is matched although it shouldn't be. The latter should rather match when the suffix -ach is queried. The search algorithm needs some improvement to be more exact.
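One possible improvement, sketched here as an assumption and not as what the app currently does, is to cut off a locating qualifier such as "ob der Tauber" or "bei Nabburg" (and a parenthesised addition like "(Saale)") before testing the suffix, so that only the place's own name is compared. The helper names coreName and hasSuffix are hypothetical, and the list of qualifier words is a guess that is certainly incomplete.

```php
<?php
// Hypothetical helper: the part of a place name before a parenthesised
// addition or a locating qualifier such as "ob der Tauber" or "bei Nabburg".
// The list of qualifier words is an assumption and surely incomplete.
function coreName(string $name): string
{
    // "Halle (Saale)" -> "Halle"
    $name = trim(preg_replace('/\s*\(.*\)\s*$/u', '', $name));
    // "Rothenburg ob der Tauber" -> "Rothenburg"
    $parts = preg_split('/\s+(?:ob|bei|an|am|im|in|vor|auf|unter)\s+/u', $name, 2);
    return $parts[0];
}

// True if the core name ends with the suffix and has at least one
// character in front of it.
function hasSuffix(string $name, string $suffix): bool
{
    $core = mb_strtolower(coreName($name));
    $suffix = mb_strtolower($suffix);
    $len = mb_strlen($suffix);
    return $len > 0 && mb_strlen($core) > $len && mb_substr($core, -$len) === $suffix;
}

foreach (['Rothenburg ob der Tauber', 'Schwarzach bei Nabburg', 'Bad Homburg vor der Höhe'] as $name) {
    printf("%s: -burg %s, -ach %s\n", $name,
        hasSuffix($name, 'burg') ? 'yes' : 'no',
        hasSuffix($name, 'ach') ? 'yes' : 'no');
}
```

With this rule Rothenburg ob der Tauber counts for -burg, and Schwarzach bei Nabburg counts for -ach instead of -burg. Names with abbreviated qualifiers (e.g. "a. Main") would still need extra handling.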
Also, the data might not be 100% accurate because the filter is not exact. While transferring the data from the Geonames file into the JSON file, I may have excluded some rows that belong there or included some that don't. In addition, the data at geonames.org itself could be incorrect. However, I don't expect this to have a major impact.
To make the result set verifiable, a list of the results is shown below the map, together with additional fields from geonames.org such as the administrative codes. The result can also be rendered on an OSM (OpenStreetMap) map, so that the user can check what OSM contains at the locations of the feature points. Overall, you get an idea of how names are distributed across the country and can spot some nice patterns.
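To put numbers on such patterns, e.g. that names ending in -rode cluster in certain regions, the matches can be grouped by the admin1 code (the federal state in the German data) stored at index 5 of each compact record. A minimal sketch, assuming the record layout written by the extraction script and a de.json file in the working directory; the function name countSuffixByAdmin1 is hypothetical.

```php
<?php
// Hypothetical helper: count places whose name ends with $suffix per
// admin1 code (index 5 of the compact records from the extraction script).
function countSuffixByAdmin1(array $records, string $suffix): array
{
    $suffix = mb_strtolower($suffix);
    $len = mb_strlen($suffix);
    $counts = [];
    foreach ($records as $record) {
        if ($len > 0 && mb_substr(mb_strtolower($record[1]), -$len) === $suffix) {
            $admin1 = $record[5];
            $counts[$admin1] = ($counts[$admin1] ?? 0) + 1;
        }
    }
    arsort($counts); // most frequent region first
    return $counts;
}

// Assumes de.json was produced by the extraction script; prints nothing if
// the file is missing.
$records = json_decode((string)@file_get_contents('de.json'), true) ?? [];
foreach (countSuffixByAdmin1($records, 'rode') as $admin1 => $count) {
    echo "$admin1: $count\n";
}
```

This counts plain suffix matches without the boundary check; it only gives a rough picture per state, not per dialect area.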