What is the KDE+ method?
The KDE+ method performs cluster analysis of traffic crashes (or other point events) within a network (road, railway, etc.). It extends kernel density estimation (KDE) with statistical significance testing and allows the resulting significant clusters to be ranked.
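
The idea of combining KDE with a significance test can be illustrated with a minimal sketch. It is not the KDE+ implementation: the Epanechnikov kernel, the 100 m bandwidth, the 95 % level, the number of simulations and all function names are assumptions chosen for illustration. The density of crash positions along one section is compared, location by location, with densities simulated under the null hypothesis of uniformly distributed crashes.

```python
import random

def epanechnikov(u):
    """Epanechnikov kernel; zero outside [-1, 1] (kernel choice is an assumption)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kde(points, grid, bandwidth):
    """Kernel density estimate of 1-D positions, evaluated at each grid location."""
    n = len(points)
    return [sum(epanechnikov((x - p) / bandwidth) for p in points) / (n * bandwidth)
            for x in grid]

def significance_threshold(n_points, length, grid, bandwidth,
                           n_sim=200, level=0.95, seed=0):
    """Pointwise quantile of densities simulated under uniformly placed events."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        pts = [rng.uniform(0.0, length) for _ in range(n_points)]
        sims.append(kde(pts, grid, bandwidth))
    idx = min(n_sim - 1, int(level * n_sim))
    # zip(*sims) groups the simulated densities by grid location
    return [sorted(col)[idx] for col in zip(*sims)]

def significant_intervals(grid, density, threshold):
    """Contiguous runs of grid locations where the density exceeds the threshold."""
    runs, start, prev = [], None, None
    for x, d, t in zip(grid, density, threshold):
        if d > t and start is None:
            start = x
        elif d <= t and start is not None:
            runs.append((start, prev))
            start = None
        prev = x
    if start is not None:
        runs.append((start, prev))
    return runs

# Hypothetical section of 1000 m with a tight group of crashes around 500 m
length_m = 1000.0
crashes_m = [100.0, 480.0, 490.0, 500.0, 505.0, 510.0, 520.0, 900.0]
grid = [i * 10.0 for i in range(101)]
bandwidth_m = 100.0  # assumed value

density = kde(crashes_m, grid, bandwidth_m)
threshold = significance_threshold(len(crashes_m), length_m, grid, bandwidth_m)
runs = significant_intervals(grid, density, threshold)
```

Each interval in `runs` is a candidate cluster. In this hypothetical data set the group of six crashes around 500 m exceeds the simulated threshold.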



What kind of inputs are needed?
The KDE+ analysis requires the following data:

  1. XY position or stationing of traffic crashes (or other point events) on the sections. Crashes (point events) that occur at intersections (junctions) should be excluded.
  2. The network consisting of the sections (traffic volume is assumed to be more or less constant in space within a single network section).
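
The two forms of the first input are related: an XY position can be converted to stationing by projecting the point onto its section. The sketch below assumes planar coordinates in metres and a section given as a list of vertices. The function name `stationing` is hypothetical and does not describe how the KDE+ software handles this step.

```python
import math

def stationing(point, polyline):
    """Distance along `polyline` to the location on it nearest to `point`."""
    px, py = point
    best_dist, best_station, walked = math.inf, 0.0, 0.0
    for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip duplicated vertices
        # Parameter of the orthogonal projection, clamped to the segment
        t = ((px - x1) * dx + (py - y1) * dy) / (seg_len ** 2)
        t = max(0.0, min(1.0, t))
        cx, cy = x1 + t * dx, y1 + t * dy
        dist = math.hypot(px - cx, py - cy)
        if dist < best_dist:
            best_dist, best_station = dist, walked + t * seg_len
        walked += seg_len
    return best_station

# Hypothetical L-shaped section, 200 m long
section = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
s_first = stationing((40.0, 5.0), section)    # crash next to the first leg
s_second = stationing((103.0, 50.0), section)  # crash next to the second leg
```

Here `s_first` is about 40 m and `s_second` about 150 m from the start of the section.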










Is road segmentation necessary?
Road segmentation is not needed to apply the KDE+ method. In fact, we discourage segmenting a road before applying the KDE+ method, because segmentation can distort the results; for example, it can split a hotspot. The road network only has to be divided into sections at intersections, where traffic volume changes.

Mitigation is only cost-effective at places where traffic crashes cluster due to a local factor. The other traffic crashes on a road segment represent noise with respect to the spatial pattern of traffic crashes, i.e. they result from global causes (Fig. a). The second example (Fig. b) shows a segmentation of a road section that leaves all three segments with the same number (4) of traffic crashes. The segmentation scheme itself can therefore influence the inputs to a model and thus the final results.
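
The effect of segmentation can be reproduced with a small numeric sketch. The crash positions and the 1000 m segment length are hypothetical and are not the data behind the figures. Four crashes lie within 70 m of each other, but a segment boundary at 1000 m splits them, so every segment ends up with the same count.

```python
from collections import Counter

# Hypothetical crash positions (m) on a 3000 m road section
crashes_m = [150, 600, 960, 985, 1010, 1030, 1500, 1800, 2100, 2400, 2700, 2950]
segment_length_m = 1000

# Crash count per fixed-length segment
counts = Counter(x // segment_length_m for x in crashes_m)

def max_in_window(points, window):
    """Largest number of points that fit into any interval of the given length."""
    pts = sorted(points)
    best = 0
    for i, start in enumerate(pts):
        j = i
        while j < len(pts) and pts[j] - start <= window:
            j += 1
        best = max(best, j - i)
    return best

densest_100m = max_in_window(crashes_m, 100)
```

All three segments contain 4 crashes, so the segment counts suggest no difference between them, while `densest_100m` shows that 4 of the 12 crashes fall within a single 100 m stretch.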















Which attributes of resulting clusters are important for me?
ID_clus     - ID of the cluster
ID_line     - ID of the line section on which the cluster is located
NPts_clus     - number of points within the cluster
NPts_line     - number of points on the line section on which the cluster is located
Strength     - a relative number measuring the degree to which the null hypothesis (uniform distribution of traffic crashes along the road section) is violated; cluster strength matters to individual drivers because it represents the individual risk
Clus_from     - relative position of the cluster start point on the section
Clus_to     - relative position of the cluster end point on the section
Len_clus     - length of the cluster
Len_line     - length of the line section
Dens_Point     - density of points within the cluster per 100 m
Str_Dens2     - collective risk, computed as Strength * Dens_Point^2; a measure of the collective importance of a cluster
GStr     - global strength; suitable for reducing the false alarm rate, at the cost of possibly missing less important clusters
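
The derived attributes follow directly from the definitions above. The sketch below assumes that Len_clus is given in metres; the helper names and the example values are hypothetical.

```python
def dens_point(npts_clus, len_clus_m):
    """Density of points within the cluster per 100 m (Len_clus assumed in metres)."""
    return 100.0 * npts_clus / len_clus_m

def str_dens2(strength, dens):
    """Collective risk: Strength * Dens_Point^2."""
    return strength * dens ** 2

# Hypothetical cluster: 6 crashes over 150 m with Strength 0.4
dens = dens_point(6, 150.0)
collective = str_dens2(0.4, dens)
```

For this hypothetical cluster, Dens_Point is 4 crashes per 100 m and Str_Dens2 is 0.4 * 4^2 = 6.4.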




Should I use strength or collective risk to order resulting clusters?
It is important to consider both individual risk (represented by the cluster strength) and collective risk. Kernel density estimation (the blue curve) highlights the places where a traffic crash is most likely to occur within a road. The number of traffic crashes within a road, on the other hand, reflects how dangerous the road is as a whole (how frequently traffic crashes occur) and is related to exposure (the number of opportunities for a traffic crash to occur). Thus, the collective risk of a cluster depends on the cluster strength and the number of traffic crashes per 100 m (Favilli et al., 2018).

Example

  1. Low cluster strength (relatively low number of traffic crashes within the cluster compared to the number of traffic crashes within the whole road segment) + low number of traffic crashes within the road segment.
  2. High cluster strength (relatively high number of traffic crashes within the cluster) + low number of traffic crashes within the road segment.
  3. Low cluster strength (relatively low number of traffic crashes within the cluster) + high number of traffic crashes within the road segment.
  4. High cluster strength (relatively high number of traffic crashes within the cluster) + high number of traffic crashes within the road segment.

Ordering according to the cluster strength: 4, 2, 3, 1 (2 has greater individual risk than 3, because 3 has greater exposure).
Ordering according to the collective risk of a cluster: 4, 3, 2, 1 (3 has greater collective risk than 2, because there are fewer traffic crashes on 2 than on 3).
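
The two orderings can be reproduced with hypothetical attribute values for the four example clusters. The numbers below are invented for illustration only; they are chosen so that clusters 2 and 4 have high strength and clusters 3 and 4 have a high crash density.

```python
# Hypothetical values of Strength and Dens_Point for the four example clusters
clusters = [
    {"ID_clus": 1, "Strength": 0.2, "Dens_Point": 1.0},
    {"ID_clus": 2, "Strength": 0.6, "Dens_Point": 1.0},
    {"ID_clus": 3, "Strength": 0.3, "Dens_Point": 3.0},
    {"ID_clus": 4, "Strength": 0.7, "Dens_Point": 3.0},
]

# Collective risk as defined in the attribute list: Strength * Dens_Point^2
for c in clusters:
    c["Str_Dens2"] = c["Strength"] * c["Dens_Point"] ** 2

def ranking(rows, key):
    """Cluster IDs sorted by the given attribute, highest value first."""
    return [r["ID_clus"] for r in sorted(rows, key=lambda r: r[key], reverse=True)]

by_strength = ranking(clusters, "Strength")
by_collective = ranking(clusters, "Str_Dens2")
```

With these values, `by_strength` gives the order 4, 2, 3, 1 and `by_collective` gives 4, 3, 2, 1, matching the two orderings above.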


Recommendations and restrictions
It is recommended to exclude the point events located at intersections when analyzing traffic crashes in general. The reason lies in the fact that intersections are typically dangerous places by definition. If they are not (for example in the case of animal-vehicle collisions), there is no need to exclude events located at intersections.

The original restriction on network sections shorter than 200 meters (mentioned in Bíl et al., 2013) no longer applies as of version 2.0. Short sections can therefore also be analyzed using KDE+.


Global test (global threshold)
-        applied to a road as a whole, not at every location on the road
-        resulting global threshold is always higher than the original (local) threshold
-        cannot precisely identify the extent of a hotspot
-        suitable for reducing the false alarm rate
-        possible miss of less important clusters
If we want to focus only on identifying the few most dangerous hotspots, we can proceed as follows. First, we identify and localize significant clusters according to the local threshold. Afterwards, we check the results of the global test. To filter false alarms out of the resulting clusters, keep only the clusters with “GStr” > 0.
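
Applied to an exported attribute table, the filtering step might look as follows. This is a sketch with hypothetical records; it assumes the table has already been loaded into a list of dictionaries and that clusters which fail the global test carry a GStr of 0.

```python
# Hypothetical clusters already identified with the local threshold
clusters = [
    {"ID_clus": 1, "Strength": 0.62, "GStr": 0.35},
    {"ID_clus": 2, "Strength": 0.21, "GStr": 0.0},
    {"ID_clus": 3, "Strength": 0.48, "GStr": 0.12},
]

def filter_by_global_test(rows):
    """Keep only the clusters that also pass the global test (GStr > 0)."""
    return [row for row in rows if row["GStr"] > 0]

kept_ids = [row["ID_clus"] for row in filter_by_global_test(clusters)]
```

In this example cluster 2 is treated as a possible false alarm and dropped, leaving clusters 1 and 3.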



Border bias




A detailed spatiotemporal analysis

Since hotspots (statistically significant traffic crash clusters) evolve over time, it is meaningful to perform a spatiotemporal analysis.

An approach based on the KDE+ method that can evaluate the spatiotemporal behavior of traffic crash hotspots in high detail was introduced in Bíl, M., Andrášik, R., Sedoník, J., 2019. A detailed spatiotemporal analysis of traffic crash hotspots. Applied Geography 107, 82-90. This approach can be used in research focusing on the spatiotemporal evolution of crash patterns within a road network. Practitioners can use it as a tool for retrospective analysis of the efficiency of safety measures.

A sketch of the three elementary forms of hotspots in relation to their temporal behavior is shown
(hotspot 1: disappearance, hotspot 2: stability, hotspot 3: emergence).