Reflexions, the University of Liège website that makes knowledge accessible


Mapping crime in cities

10/9/15

A young researcher at the University of Liège has recently developed an interactive digital model to map and analyse information in a crime-related data warehouse according to a multidimensional approach. This type of model is known as SOLAP ("Spatial OnLine Analytical Processing"): a classic tool in geomatics, a discipline that's a cross between geography and computer science. But this type of server is usually used to generate vector maps that can only show aggregated values in a discrete, or discontinuous, space. Thanks to the use of a raster method, which builds maps from pixels rather than vector entities (points, lines, polygons), this new model allows the integration of a continuous space on a map that is consequently more accurate and faithful to the user’s expectations. Faced with the onslaught of digital information, tools such as this are essential to detect and analyse what is hidden in ever-vaster databases. Moreover, this model isn’t limited to crime and could be used for other areas such as ecology, climatology and epidemiology.

Geomatics sits at the border between geography and computer science. It is a recent and still little-known discipline, yet a varied one. It is involved in the acquisition of spatial data, whether in the field (total station or 3D scanner) or by satellite imagery or aerial photography (photogrammetry and remote sensing). The data acquired feed spatial databases (or geographic information systems) and allow an in-depth analysis of the territory. Another more familiar aspect of geomatics is now a cherished part of our daily lives: GNSS (Global Navigation Satellite System). Thanks to a constellation of satellites, GNSS is the branch at the root of all the GPS technologies found in cars, drones, aircraft, and telephones, and enables the continuous improvement of tools as famous as Google Maps.

Jean-Paul Kasprzyk, a geomatician who recently defended his doctoral thesis (1), and a researcher in the University of Liège’s Geomatics Unit, has specialised in the development of tools linked to databases. "The world of databases is divided into two major branches", he summarises. "On the one hand there are transaction databases, aimed at the man in the street. These are tools that allow us to look for relatively precise information, such as a travel itinerary. And on the other hand, there is business intelligence, which takes into account various pieces of information and more complicated statistical calculations, permitting the aggregation of a multitude of data to help a restricted group of people take decisions. Business intelligence therefore develops tools that help public authorities, CEOs, etc., to embrace the realities and large amounts of complex data, and act accordingly."

The subject of Jean-Paul Kasprzyk’s thesis relates to the latter. "Whatever the profile of the decision-makers, whether they’re private or public, as soon as they have to take digital information into account, they may be faced with datasets so huge that they are impossible to interpret. To analyse them, they must be summarised and therefore aggregated." One means of aggregating these data is to use an OLAP server (OnLine Analytical Processing), a digital interface linked to a data warehouse that presents the data to users in a user-friendly and intelligible way. An OLAP server can cover all sorts of databases. However, when it integrates computer tools linked to the spatial aspect (GIS, or geographic information system), OLAP becomes SOLAP (Spatial OnLine Analytical Processing).

Organising large data warehouses

And a new SOLAP is exactly what Jean-Paul Kasprzyk was responsible for modelling. While the prototype currently seems to be able to adapt to a variety of demands, it was initially optimised for a very precise task: the management of a multidimensional data warehouse listing the crimes and offences committed in London in 2012. Its multidimensional nature refers to the possible distinction and comparison between different types of criteria, such as the type of offence (burglary, robbery, shoplifting, etc.), its location (street, neighbourhood, district, etc.), the time (month, for instance), or the profile of the perpetrators and the victims. As for choosing the capital of England, the reason is quite straightforward. "In the beginning, I was supposed to work on a Belgian database, in partnership with the federal police", the young researcher remembers. "And some of the people I was in contact with are still very enthusiastic about my work. But I was faced with structural problems and confidentiality issues. London, on the other hand, provides free access to this type of data, up to a certain level of detail. I was therefore able to easily make an inventory of a whole series of data - and there was already a huge amount - for 2012 alone. It was sufficient to start to develop the SOLAP." London isn’t the only city to provide this type of data. The police in Seattle also release a great deal of information, which the geomatics specialist was equally able to integrate in the warehouse.

The usefulness of such a programme soon becomes obvious. "Take, for instance, the commissioner of London's police force, who has to allocate the patrols at the beginning of February in an effort to reduce crime to a minimum. To assist him in his task, he uses a database that lists all the past offences. He can relatively quickly obtain a map of London that shows the spatial distribution of crime for the month of January and base himself on the assumption that the distribution will be similar in February." But there’s a slight problem for the head of Scotland Yard. In 2012, approximately 1.2 million offences were listed for London alone, which makes almost 100,000 for the month of January. "A simple spatial distribution of the crimes committed, where every offence is represented by a dot on a map, is illegible."

Hence, SOLAP allows you to gather and summarise these values according to what you want to know. "On a vectorial level, I reorganised these offences by aggregating them on an entity basis, which, in this case, represents the different police sectors in London. The colour of these polygons varies according to the number of dots they contain. This gives you a density of offences aggregated in a discretized space." But this vector map only serves to illustrate the use of SOLAP in general, because Jean-Paul Kasprzyk wasn’t interested in the vector mode, but sought to integrate a continuous space in this type of model, using the raster method.
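
The vector aggregation described above can be illustrated with a minimal Python sketch. It is not the researcher's implementation: the two rectangular sectors, the offence coordinates and the function names are all hypothetical, and a standard ray-casting test stands in for whatever spatial join a real GIS would use.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon,
    given as a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_sector(offences, sectors):
    """Aggregate offence points into a count per sector polygon."""
    counts = {name: 0 for name in sectors}
    for x, y in offences:
        for name, polygon in sectors.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # each offence is assigned to at most one sector
    return counts


# Hypothetical police sectors (two adjacent rectangles) and offence locations.
sectors = {
    "west": [(0, 0), (5, 0), (5, 10), (0, 10)],
    "east": [(5, 0), (10, 0), (10, 10), (5, 10)],
}
offences = [(1, 1), (2, 8), (4.5, 5), (6, 5), (9, 9)]

counts = count_by_sector(offences, sectors)
print(counts)
```

Each polygon would then be coloured according to its count. Note that the result depends entirely on where the sector borders were drawn, which is the limitation the raster approach sets out to remove.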

From vector to raster for spatial continuity

"The main problem with vectorial mapping techniques", the geomatics specialist points out, "is that they bias the values you’re trying to define. For instance, the purpose of this list of break-ins is to locate hotspots; i.e., places with a higher concentration of crime. This analysis will then allow the user to decide where best to deploy the police patrols, and therefore better prevent crime. However, vector maps have a geometrically frozen discrete space, following a random decision, in this case, the distinction of the police sectors." Therefore, the form of the hotspots shown on the map is influenced by these borders, whose outline is independent of the crimes. Consequently, areas with a low crime rate can be part of a sector with a high concentration of crime, and appear on the map as hotspots, and vice versa. Hence the aim to integrate a continuous space in the model.

Police services are already favouring this type of more accurate map. For this purpose, they use a particular algorithm, KDE (Kernel Density Estimation). "Initially, the offences are represented by a cloud of points. These points are discrete values. To integrate them into a continuous space, you have to smooth them, which is what this algorithm does. More precisely, it sweeps a territory and in every pixel of a raster, it generates a relative value that depends on the number of crimes over a given time, and their proximity in relation to the pixel." Ultimately, the algorithm gives each pixel a colorimetric variation according to the density of crime. The data are aggregated at pixel level and no longer depend on artificial borders, but on the actual location of the offences. The map can be more or less precise, according to the resolution of the pixels, but also the smoothing window. "The bigger this window, the smoother the surface will be over a large distance. Therefore, there will be few hotspots, which will be quite big. The information is less precise but this can be useful if you want to identify the main risky areas of the city (global analysis). On the other hand, the smaller the window, the more precise the resolution will be, and a lot of small hotspots will appear. The map will be very precise, but the data will be spatially less aggregated (local analysis)." Hence, there is a whole series of parameters that have an impact on the visual aspect of the map. The important thing here, as in many fields, is to find a happy medium to obtain a useful image.
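
To make the role of the smoothing window concrete, here is a minimal Python sketch of kernel density estimation on a raster. It is an illustration only: the article does not say which kernel the police or the prototype use, so a Gaussian kernel is assumed, and the function names, the sample points and the parameters `cell_size` and `bandwidth` are hypothetical.

```python
import math


def kde_raster(points, width, height, cell_size, bandwidth):
    """Return a grid (list of rows) holding one density value per pixel.

    Each pixel sums a Gaussian kernel contribution from every point;
    `bandwidth` plays the role of the smoothing window (assumption:
    Gaussian kernel, unnormalised by the number of points).
    """
    n_cols = int(width / cell_size)
    n_rows = int(height / cell_size)
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    grid = []
    for row in range(n_rows):
        cy = (row + 0.5) * cell_size  # y coordinate of the pixel centre
        values = []
        for col in range(n_cols):
            cx = (col + 0.5) * cell_size  # x coordinate of the pixel centre
            density = 0.0
            for px, py in points:
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                density += norm * math.exp(-d2 / (2.0 * bandwidth ** 2))
            values.append(density)
        grid.append(values)
    return grid


def peak(grid):
    """Highest density value found in the raster."""
    return max(max(row) for row in grid)


# Hypothetical offences: a small cluster plus one isolated event.
points = [(2.0, 2.0), (2.2, 2.1), (1.9, 2.3), (7.0, 7.0)]

narrow = kde_raster(points, 10, 10, cell_size=1.0, bandwidth=0.5)  # local analysis
wide = kde_raster(points, 10, 10, cell_size=1.0, bandwidth=3.0)    # global analysis

print("peak with narrow window:", round(peak(narrow), 3))
print("peak with wide window:", round(peak(wide), 3))
```

With the narrow window the cluster stands out as a sharp, high peak; with the wide window the same events are spread over many pixels and the surface flattens, which matches the trade-off between local and global analysis described above.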

When police techniques draw inspiration from geomatics

Using KDE, it is therefore possible to generate a continuous space by smoothing discrete phenomena to determine relative values on a map. It is this methodology used by the police that Jean-Paul Kasprzyk integrated into the multidimensional SOLAP. An innovative cross. "Existing SOLAP tools use the vector model, which allows spatial aggregation only through pre-defined discrete objects (police sectors, for instance), since it is the programmer’s responsibility to define every entity separately, which preserves the pixel as a spatial unit. That’s why the maps generated in vector mode don’t take into account the space in a continuous manner." SOLAP is a recent discipline, created in Canada in 1997. At the time, researchers conceived the structure of spatial data warehouses in vector mode, because the technology is lighter than raster, and the results were already more convincing. However, to maintain user-friendliness and an interest in using SOLAP, the fluidity and therefore the speed of calculation, which depends on the amount of information on the server, remains one of the main priorities. A clear advantage of vector over raster. "We’ve only recently realised the limits of the vector format for everything concerning the study of spatially continuous phenomena, such as pollution or the climate, or, within the framework of this research, the variation in the crime rate within a town. Furthermore, the continuous space modelled by raster offers the user more freedom when they want to include new geographic entities in their analysis. These entities no longer have to be defined in advance in the data warehouse since they can be reconstructed on the fly using a set of pixels stored in the system."
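
The idea of rebuilding a geographic entity on the fly from stored pixels can be sketched as follows. This is a hedged illustration in Python, not the prototype's code: the 4 x 4 raster of counts, the circular zone and the names `zonal_sum` and `within_circle` are hypothetical, and a pixel is simply taken to belong to the zone when its centre falls inside it.

```python
def zonal_sum(grid, cell_size, zone_contains):
    """Sum the values of all pixels whose centre lies inside the zone.

    `zone_contains` is any function (x, y) -> bool, so the zone does not
    need to exist in the warehouse beforehand.
    """
    total = 0.0
    for row, values in enumerate(grid):
        for col, value in enumerate(values):
            cx = (col + 0.5) * cell_size
            cy = (row + 0.5) * cell_size
            if zone_contains(cx, cy):
                total += value
    return total


def within_circle(x0, y0, radius):
    """Build a zone test for a circle centred on (x0, y0)."""
    return lambda x, y: (x - x0) ** 2 + (y - y0) ** 2 <= radius ** 2


# Hypothetical raster of offence counts, one value per pixel.
grid = [
    [0, 1, 0, 0],
    [2, 5, 1, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 4],
]

# A zone drawn by the user at query time, e.g. the surroundings of a station.
station_area = within_circle(1.5, 1.5, 1.1)
total = zonal_sum(grid, 1.0, station_area)
print(total)
```

Because the zone is only a test applied to pixel centres, any new shape can be queried without first declaring it as an entity in the warehouse.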

An interactive and multidimensional model

There is free access to the interface on the internet (http://nolap01.ulg.ac.be/rastercube). However, it is necessary to fill in a short registration form first. The user can then select a data set (certain types of crimes, certain months of the year, etc.) and then ask to generate a map that will aggregate them. They can also consult a series of graphs that provide information in the form of figures, such as the variation in crime over several months, etc.

Besides raster technology, a highly original aspect of the research is the multidimensional nature of the continuous analysis. "Usually", Jean-Paul Kasprzyk explains, "databases structure information in the form of tables storing lists of saved views. Here, the multidimensional character results from the fact that SOLAP functions with data hypercubes. It extracts data from the warehouse which it then arranges in several dimensions, several axes of analysis. This allows us to slice into the cube, to limit ourselves to one type of offence, for instance. We can also drill. Rather than aggregating information per month, it can be on a quarterly basis, etc. We can also play with the resolution of the map, with the number of pixels. Every operation will reveal different information, and will therefore depend on what the user is looking for."
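
The slicing and drilling operations mentioned in the quote can be sketched with a very small cube held in a Python dictionary. The counts, the three dimensions chosen and the function names are hypothetical, and a real OLAP server would rely on a dedicated engine rather than on plain dictionaries.

```python
from collections import defaultdict

# Hypothetical cube: (offence_type, month, sector) -> number of offences.
DIMENSIONS = ("offence_type", "month", "sector")
cube = {
    ("burglary", 1, "north"): 40,
    ("burglary", 2, "north"): 35,
    ("burglary", 4, "south"): 20,
    ("robbery", 1, "north"): 10,
    ("robbery", 3, "south"): 15,
    ("shoplifting", 2, "south"): 25,
}


def slice_cube(cube, dimension, value):
    """Keep only the cells where the given dimension equals `value`."""
    axis = DIMENSIONS.index(dimension)
    return {key: count for key, count in cube.items() if key[axis] == value}


def roll_up_to_quarter(cube):
    """Change the time granularity from month to quarter by summing cells."""
    result = defaultdict(int)
    for (offence, month, sector), count in cube.items():
        quarter = (month - 1) // 3 + 1
        result[(offence, quarter, sector)] += count
    return dict(result)


burglaries = slice_cube(cube, "offence_type", "burglary")
burglaries_by_quarter = roll_up_to_quarter(burglaries)
print(burglaries_by_quarter)
```

The slice restricts the analysis to one type of offence, and the change from months to quarters moves along the time axis to a coarser level, the kind of drilling described in the quote.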

For now, the prototype isn’t used by the federal police. But it has only just been presented, and its future is still to be written. And even though it might not have a direct application, it opens the way for new methods, a whole new way of thinking in the approach of these GIS, which discreetly help us every day. Moreover, besides crime in Seattle and London, it can already offer other datasets. Essentially to demonstrate its adaptability, it can also show variations in temperature on the surface of the moon. Given the complete absence of offences on this satellite, the operation simply proves that the model isn't limited to fighting crime!

(1) Integration of spatial continuity in the multidimensional structure of a data warehouse - raster SOLAP. University of Liège, doctoral thesis in science. http://hdl.handle.net/2268/182360


© Université de Liège - https://www.reflexions.uliege.be/cms/c_399766/en/mapping-crime-in-cities?printView=true - April 24, 2024