KDE+ research news

Global test (global threshold)
-        applied to a road as a whole, not at every location on the road
-        resulting global threshold is always higher than the original (local) threshold
-        cannot precisely identify the extent of a hotspot
-        suitable for reducing the false alarm rate
-        possible miss of less important clusters
If we want to focus only on identifying the few most dangerous hotspots, we can proceed as follows. First, we identify and localize significant clusters according to the local threshold. Afterwards, we check the results of the global test: to filter false alarms out of the resulting clusters, keep only the clusters with “GStr” > 0.
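The filtering step can be sketched as follows. The cluster records are hypothetical dictionaries that reuse the attribute names listed further below (ID_clus, Strength, GStr); how the records are actually exported from KDE+ is not assumed here.

```python
def passes_global_test(clusters):
    """Keep only the clusters whose global strength GStr is positive."""
    return [c for c in clusters if c["GStr"] > 0]

# Hypothetical clusters already identified with the local threshold
clusters = [
    {"ID_clus": 1, "Strength": 0.42, "GStr": 0.10},
    {"ID_clus": 2, "Strength": 0.15, "GStr": 0.0},
    {"ID_clus": 3, "Strength": 0.31, "GStr": 0.0},
]
kept = passes_global_test(clusters)
print([c["ID_clus"] for c in kept])  # [1]
```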


Border bias




What is the KDE+ method?
The KDE+ method performs cluster analysis of traffic crashes (or other point events) within a network (road, railway...). It extends the kernel density estimation (KDE) by statistical significance testing and allows for the ranking of the resulting significant clusters.
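To make the idea of "KDE plus significance testing" concrete, the sketch below uses one common Monte Carlo construction, assumed here and not necessarily the exact KDE+ implementation: the same number of crashes is repeatedly placed uniformly along the section (the null hypothesis), and the KDE of each simulation is recorded. The local threshold is the (1 − α) quantile at every location; the global threshold is the (1 − α) quantile of the simulated maxima, which is why it can never be lower than the local one. The Gaussian kernel, the 50 m bandwidth, 500 simulations and α = 0.05 are hypothetical choices.

```python
import numpy as np

def kde_1d(points, grid, bandwidth):
    """Gaussian kernel density of crash positions (metres) evaluated on a 1D grid."""
    z = (grid[:, None] - points[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernel.mean(axis=1)  # average kernel contribution over all crashes

def mc_thresholds(n_points, length, grid, bandwidth, n_sim=500, alpha=0.05, seed=0):
    """Local and global thresholds under the null of uniformly placed crashes."""
    rng = np.random.default_rng(seed)
    sims = np.empty((n_sim, grid.size))
    for i in range(n_sim):
        sims[i] = kde_1d(rng.uniform(0.0, length, n_points), grid, bandwidth)
    local = np.quantile(sims, 1.0 - alpha, axis=0)        # one value per location
    global_ = np.quantile(sims.max(axis=1), 1.0 - alpha)  # one value per section
    return local, global_

def clusters_above(density, threshold, grid):
    """(start, end) positions of contiguous stretches where density exceeds threshold."""
    above = density > threshold
    runs, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((float(grid[start]), float(grid[i - 1])))
            start = None
    if start is not None:
        runs.append((float(grid[start]), float(grid[-1])))
    return runs

# Hypothetical 2 km section: 20 background crashes plus 15 crashes around 1200 m
rng = np.random.default_rng(1)
length = 2000.0
crashes = np.clip(np.concatenate([rng.uniform(0.0, length, 20),
                                  rng.normal(1200.0, 15.0, 15)]), 0.0, length)
grid = np.linspace(0.0, length, 401)
density = kde_1d(crashes, grid, bandwidth=50.0)
local, global_ = mc_thresholds(crashes.size, length, grid, bandwidth=50.0)
print(clusters_above(density, local, grid))    # clusters under the local threshold
print(clusters_above(density, global_, grid))  # clusters surviving the global threshold
```

Because every simulated maximum is at least as large as the simulated value at any single location, the global threshold filters more strictly, which matches the trade-off described for the global test: fewer false alarms, but less important clusters may be missed.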


What kind of inputs are needed?
The KDE+ method needs only the following input data:

  1. The XY position or stationing of traffic crashes (or other point events) on the sections. Crashes (point events) that occur at intersections (junctions) should be excluded.
  2. The network consisting of these sections (traffic volume is assumed to be roughly constant along each network section).
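If crashes are available only as XY positions, they can be converted to stationing by projecting each point onto the section's polyline (standard linear referencing). The sketch below is a plain-Python illustration, not code from KDE+; it assumes planar coordinates in metres and a section given as a list of (x, y) vertices.

```python
import math

def stationing(point, polyline):
    """Return (station, offset) of an XY point projected onto a polyline.

    station: distance along the polyline to the closest foot point (metres)
    offset:  perpendicular distance from the point to the polyline (metres)
    """
    px, py = point
    best_offset, best_station = math.inf, 0.0
    travelled = 0.0
    for (ax, ay), (bx, by) in zip(polyline, polyline[1:]):
        dx, dy = bx - ax, by - ay
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip duplicate vertices
        # parameter of the orthogonal foot point, clamped to the segment
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len**2
        t = min(1.0, max(0.0, t))
        fx, fy = ax + t * dx, ay + t * dy
        offset = math.hypot(px - fx, py - fy)
        if offset < best_offset:
            best_offset, best_station = offset, travelled + t * seg_len
        travelled += seg_len
    return best_station, best_offset

line = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
print(stationing((103.0, 30.0), line))  # (130.0, 3.0)
```

The returned offset can also help to spot crashes that were snapped to the wrong section.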


Recommendations and restrictions
When analyzing traffic crashes in general, we recommend excluding point events located at intersections, because intersections are typically dangerous places by definition. If they are not (for example, in the case of animal-vehicle collisions), there is no need to exclude events located at intersections.
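A minimal way to perform this exclusion, assuming that every section starts and ends at an intersection (so that crashes close to either end count as intersection crashes), is sketched below. The 20 m buffer is a hypothetical value; a section that ends at a dead end or at the end of the road would need its own handling, and datasets that already flag intersection crashes can simply filter on that flag instead.

```python
def exclude_intersection_crashes(stations, section_length, buffer_m=20.0):
    """Keep crashes at least buffer_m metres away from both ends of the section.

    Assumes each section starts and ends at an intersection; buffer_m is a
    hypothetical distance, not a KDE+ parameter.
    """
    return [s for s in stations if buffer_m <= s <= section_length - buffer_m]

kept = exclude_intersection_crashes([5.0, 150.0, 470.0, 495.0], section_length=500.0)
print(kept)  # [150.0, 470.0]
```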

The original restriction excluding network sections shorter than 200 meters (mentioned in Bíl et al., 2013) no longer applies as of version 2.0, so short sections can also be analyzed using KDE+.
 

Is road segmentation necessary?
Road segmentation is not needed to apply the KDE+ method. In fact, we discourage segmenting a road before applying the KDE+ method, because it can distort the results; for example, a segment boundary can split a hotspot in two. The road network only has to be divided into sections running between intersections, where traffic volume changes.
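Dividing a road only at its intersections can be sketched as follows. The road length, the intersection stationings and the crash stationings are hypothetical inputs; each crash is re-expressed relative to the start of its section, and no further cutting is done inside a section.

```python
from bisect import bisect_right

def split_at_intersections(road_length, intersections, crashes):
    """Split a road at intersection stationings only.

    Returns one dict per section with its start, length and the crash
    stationings measured from the start of that section.
    """
    cuts = [0.0] + sorted(intersections) + [road_length]
    sections = [{"from": a, "length": b - a, "crashes": []}
                for a, b in zip(cuts[:-1], cuts[1:])]
    for s in crashes:
        # index of the section containing s (a crash at the road end goes to the last one)
        i = min(bisect_right(cuts, s) - 1, len(sections) - 1)
        sections[i]["crashes"].append(s - cuts[i])
    return sections

sections = split_at_intersections(3000, [1200, 2100], [100, 1150, 1250, 2500, 2999])
for sec in sections:
    print(sec["from"], sec["length"], sec["crashes"])
```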


Which attributes of resulting clusters are important for me?
ID_clus     - ID of the cluster
ID_line     - ID of the line section on which the cluster is located
NPts_clus     - number of points within the cluster
NPts_line     - number of points on the line section on which the cluster is located
Strength     - (individual strength) a relative number measuring the degree of violation of the null hypothesis (uniform distribution of traffic crashes along the road section); cluster strength matters to individual drivers because it represents the individual risk
Clus_from     - relative position of the cluster start point on the section
Clus_to     - relative position of the cluster end point on the section
Len_clus     - length of the cluster
Dens_Point     - density of points within the cluster per 100 m
Str_Dens2     - (collective strength) Strength*Dens_Point^2; a measure of the collective importance of a cluster, representing the collective risk
GStr     - (global strength) result of the global test; useful for reducing the false alarm rate, at the cost of possibly missing less important clusters
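The derived attributes can be reproduced from their definitions above (Dens_Point as points per 100 m, Str_Dens2 = Strength*Dens_Point^2). This sketch assumes the cluster start and end are given as absolute positions in metres; since Clus_from and Clus_to are reported as relative positions, they would first have to be converted using the section length.

```python
def derived_attributes(n_pts_clus, start_m, end_m, strength):
    """Compute Len_clus, Dens_Point and Str_Dens2 for one cluster.

    start_m and end_m are assumed absolute positions (metres) along the section.
    """
    len_clus = end_m - start_m
    dens_point = 100.0 * n_pts_clus / len_clus  # points per 100 m
    return {
        "Len_clus": len_clus,
        "Dens_Point": dens_point,
        "Str_Dens2": strength * dens_point ** 2,  # collective strength
    }

attrs = derived_attributes(n_pts_clus=6, start_m=400.0, end_m=550.0, strength=0.5)
print(attrs)  # {'Len_clus': 150.0, 'Dens_Point': 4.0, 'Str_Dens2': 8.0}
```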


Should I use strength or collective strength to order resulting clusters?
It is important to consider both the individual risk (represented by the cluster strength) and the collective risk. Kernel density estimation (the blue curve) highlights the places on a road where a traffic crash is most likely to occur. The number of traffic crashes on a road, on the other hand, reflects how dangerous the road is as a whole (how frequently traffic crashes occur there) and is related to exposure (the number of opportunities for a traffic crash to occur). The collective strength of a cluster therefore combines the cluster strength with the number of traffic crashes per 100 m.

Example

  1. Low cluster strength (relatively low number of traffic crashes within the cluster compared to the number of traffic crashes within the whole road segment) + low number of traffic crashes within the road segment.
  2. High cluster strength (relatively high number of traffic crashes within the cluster) + low number of traffic crashes within the road segment.
  3. Low cluster strength (relatively low number of traffic crashes within the cluster) + high number of traffic crashes within the road segment.
  4. High cluster strength (relatively high number of traffic crashes within the cluster) + high number of traffic crashes within the road segment.

Ordering by cluster strength: 4, 2, 3, 1 (2 has a greater individual risk than 3, because 3 has a greater exposure).
Ordering by collective strength: 4, 3, 2, 1 (3 has a greater collective risk than 2, because there are fewer traffic crashes on 2 than on 3).
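The two orderings can be reproduced with made-up numbers. The Strength and Dens_Point values below are hypothetical and were chosen only to mirror cases 1 to 4 (low or high strength, low or high crash density); collective strength is computed as Strength*Dens_Point^2.

```python
# Hypothetical values mirroring cases 1-4 of the example
clusters = {
    1: {"Strength": 0.2, "Dens_Point": 1.0},  # low strength, few crashes
    2: {"Strength": 0.8, "Dens_Point": 1.0},  # high strength, few crashes
    3: {"Strength": 0.3, "Dens_Point": 4.0},  # low strength, many crashes
    4: {"Strength": 0.9, "Dens_Point": 4.0},  # high strength, many crashes
}
for c in clusters.values():
    c["Str_Dens2"] = c["Strength"] * c["Dens_Point"] ** 2  # collective strength

by_strength = sorted(clusters, key=lambda k: clusters[k]["Strength"], reverse=True)
by_collective = sorted(clusters, key=lambda k: clusters[k]["Str_Dens2"], reverse=True)
print(by_strength)    # [4, 2, 3, 1]
print(by_collective)  # [4, 3, 2, 1]
```

Squaring the density gives the number of crashes a strong weight, which is why case 3 overtakes case 2 in the collective ranking.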