Tools and background on fusing GPX files to correct missing parts of a track. This is especially useful for mending a Strava activity when you stopped for a snack, forgot to restart your GPS tracking device, and were with someone who did not forget! This seems to happen to me all the time.
Note: this file may use the LaTeX support recently added by GitHub. Sometimes the equations do not render and you will see raw LaTeX (backslashes and other odd-looking bits). If this happens, reload the page in your browser.
For development on Windows, after installing Python 3.11, I used:
py -3.11 -m venv venv
.\venv\Scripts\Activate.ps1
python -m pip install -U pip setuptools wheel
python -m pip install -e .
Once installed, there are two scripts that can be run on Linux or OSX without invoking python explicitly. These scripts and their algorithms are described below. For example, they can be invoked from the command line in the repo home directory as:
patch_gpx_time data/Calero_Mayfair_ranch_trail.gpx data/Calero_big_ride_2.gpx test_patch_time.gpx
patch_gpx_spatial data/Calero_Mayfair_ranch_trail.gpx data/Calero_big_ride_2.gpx test_patch_spatial.gpx
Both scripts also respond usefully to the --help argument. On Windows, skip the shebang-decorated scripts above and run the Python files directly, after completing the installation step above:
python patch_gpx_time.py data/Calero_Mayfair_ranch_trail.gpx data/Calero_big_ride_2.gpx test_patch_time.gpx
python patch_gpx_spatial.py data/Calero_Mayfair_ranch_trail.gpx data/Calero_big_ride_2.gpx test_patch_spatial.gpx
The unit tests are provided with plotting (in particular, plt.show()) turned off by default. This ensures they run on OSX, but on Linux or Windows it is also useful to change some or all of the do_plots=False default test arguments to True, which generates plots on the fly and writes some of them to disk.
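The do_plots switch works because unittest calls each test method with no arguments, so a keyword default decides whether plotting runs. A hypothetical test in that style might look like the following; the class name, method name and toy data are illustrative and not taken from this repo's test files.

```python
import unittest


class TestPlotToggle(unittest.TestCase):
    """Hypothetical test showing the do_plots default-argument pattern."""

    def test_reference_is_denser(self, do_plots=False):
        # unittest calls test methods with no arguments, so this default
        # decides whether plotting happens; edit it to True locally.
        query = [0.0, 10.0, 20.0]
        reference = [0.0, 5.0, 10.0, 15.0, 20.0]
        self.assertGreater(len(reference), len(query))
        if do_plots:
            # Imported only when needed, so the default run never touches
            # matplotlib or opens a window with plt.show().
            import matplotlib.pyplot as plt
            plt.plot(query, "o-", label="query")
            plt.plot(reference, ".-", label="reference")
            plt.legend()
            plt.savefig("test_reference_is_denser.png")  # write the plot to disk
            plt.show()
```

It runs like the repo's own tests, e.g. with python -m unittest on the file containing it.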
Tests can be run from the command line with:
python -m unittest test_dtw.py
python -m unittest test_patch_gpx_spatial.py
python -m unittest test_patch_gpx_time.py
Scenario: a Strava activity shows that you have missed a section of your route. For example, you forgot to restart your GPS device after a break at a nice park bench, or forgot to start it at the beginning of the activity. However, you were with someone else, and chances are they are not absent-minded in exactly the same way you are. You need to patch the gap(s) in your activity using their data. To do so, use the ‘export GPX’ functionality of Strava to generate a GPX file of your activity, and ask your companion to export their activity as a GPX file too (to ensure complete data). Then use the tools provided here to generate a new, patched GPX file. Finally, upload the patched file to Strava and delete your original activity.
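A GPX export is plain XML, so its track points can be read with the Python standard library. The following is a minimal sketch, not this repo's parser: it assumes a GPX 1.1 file in the http://www.topografix.com/GPX/1/1 namespace, and the names GPX_NS and read_track_points are hypothetical. It keeps each point's extensions element intact, since that is where device data such as heart rate and temperature are stored.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}  # assumed GPX 1.1 namespace


def read_track_points(root):
    """Return one dict per <trkpt> under root (an ElementTree Element)."""
    points = []
    for trkpt in root.iterfind(".//gpx:trkpt", GPX_NS):
        ele = trkpt.find("gpx:ele", GPX_NS)
        time = trkpt.find("gpx:time", GPX_NS)
        points.append({
            "lat": float(trkpt.get("lat")),
            "lon": float(trkpt.get("lon")),
            "ele": float(ele.text) if ele is not None else None,
            # GPX timestamps are UTC with a trailing "Z"; swapping it for
            # "+00:00" lets datetime.fromisoformat parse them before Python 3.11.
            "time": (datetime.fromisoformat(time.text.replace("Z", "+00:00"))
                     if time is not None else None),
            # Keep the raw <extensions> element (heart rate, temperature, ...)
            # so it can be copied unchanged into a patched output file.
            "extensions": trkpt.find("gpx:extensions", GPX_NS),
        })
    return points
```

For a file on disk, pass ET.parse("data/Calero_big_ride_2.gpx").getroot() as root.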
For the purposes of repairing Strava activities, there are several design preferences:
The first algorithm, patch_gpx_spatial, uses only location data (latitude, longitude and elevation).
The results of a DTW (dynamic time warping) alignment, computed with the PyPI dtw-python package, are shown in Figure 1. DTW uses the matrix of Euclidean distances between all points in the reference and query trajectories to build two index sequences, which together form the alignment. Each pair of indices is plotted as a point, with the query index on the x axis and the reference index on the y axis of the main plot in Figure 1. The two subplots along the x and y axes show the local coordinates (mean corrected) in meters, corresponding to latitude (blue), longitude (orange) and elevation (green). Note that the reference index runs roughly 4X longer than the query index, because the reference GPS device samples much more often. The vertical jump in the indices at around 1700 in the query marks the offending section of the trajectory to be repaired.
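To make the alignment concrete, here is a from-scratch DTW sketch in pure Python; it is not the dtw-python implementation that patch_gpx_spatial uses. It assumes points are first converted to local (north, east, up) meters with an equirectangular approximation around one shared origin, so both trajectories live in the same frame. It uses the basic symmetric step pattern (dtw-python defaults to a different one, symmetric2) and returns index lists analogous to dtw-python's index1 (query) and index2 (reference). The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def to_local_meters(points, origin):
    """Convert (lat, lon, ele) tuples to (north, east, up) meters relative to
    origin=(lat0, lon0, ele0) using an equirectangular approximation, which
    is adequate over the extent of a single ride."""
    lat0, lon0, ele0 = origin
    cos_lat0 = math.cos(math.radians(lat0))
    return [
        (math.radians(lat - lat0) * EARTH_RADIUS_M,
         math.radians(lon - lon0) * EARTH_RADIUS_M * cos_lat0,
         ele - ele0)
        for lat, lon, ele in points
    ]


def dtw_alignment(query, reference):
    """O(N*M) dynamic time warping with Euclidean local cost and steps
    (i-1, j-1), (i-1, j), (i, j-1), all with weight 1.
    Returns (index1, index2, total_cost) for the optimal warping path."""
    n, m = len(query), len(reference)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(query[i - 1], reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the far corner to recover the warping path.
    i, j = n, m
    index1, index2 = [], []
    while i > 0 and j > 0:
        index1.append(i - 1)
        index2.append(j - 1)
        # Step to the predecessor with the lowest cumulative cost
        # (ties prefer the diagonal, then the query step, then the reference step).
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda ij: cost[ij[0]][ij[1]])
    index1.reverse()
    index2.reverse()
    return index1, index2, cost[n][m]
```

A run of path entries where index1 stays constant while index2 keeps increasing is the vertical jump visible in Figure 1: many reference points matched to a single query point.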
Some experimentation with real GPX data (see below) suggests that deciding where points are missing merely by looking at the alignment indices is problematic, and leads to fidgety identification of the missing regions of one trajectory. Part of the problem is that GPX trajectories can have very different spatial sampling (due to device settings), which smears out the beginnings and ends of missing regions. One solution is to choose a distance threshold and find the connected regions in the alignment where the distance between the two trajectories exceeds this threshold. Then, for each connected region, insert the reference points into the output only if the reference is sampled more densely than the query in that region. Finally, patch_gpx_spatial is not the tidiest solution, in that it does not include so-called extension data for any track points. This means that data like temperature and heart rate, even for the query, are not included in the output. This could be added fairly easily. The patch_gpx_time implementation below faithfully copies query extension data to the output.
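The threshold-and-connected-regions idea can be sketched as follows. This is an illustration under assumptions, not the patch_gpx_spatial code: it takes an alignment as two index lists (as any DTW implementation produces), treats "sampled more densely" as "more distinct reference indices than query indices in the region", and, as a simplification, inserts a region's reference points right after the region's first query point. The function names are hypothetical.

```python
import math


def pair_distances(query, reference, index1, index2):
    """Euclidean distance between each aligned (query, reference) pair."""
    return [math.dist(query[i], reference[j]) for i, j in zip(index1, index2)]


def exceedance_runs(distances, threshold):
    """Half-open (start, stop) ranges of consecutive alignment positions whose
    distance exceeds threshold, i.e. the connected regions of the alignment."""
    runs, start = [], None
    for k, d in enumerate(distances):
        if d > threshold and start is None:
            start = k
        elif d <= threshold and start is not None:
            runs.append((start, k))
            start = None
    if start is not None:
        runs.append((start, len(distances)))
    return runs


def patch_spatial(query, reference, index1, index2, threshold):
    """Return all query points plus the reference points of every connected
    region in which the reference has more distinct points than the query."""
    distances = pair_distances(query, reference, index1, index2)
    inserts = {}  # query index -> reference points to insert after it
    for start, stop in exceedance_runs(distances, threshold):
        q_idx = sorted(set(index1[start:stop]))
        r_idx = sorted(set(index2[start:stop]))
        if len(r_idx) > len(q_idx):
            # Simplification: attach after the region's first query point.
            inserts.setdefault(q_idx[0], []).extend(reference[j] for j in r_idx)
    patched = []
    for i, point in enumerate(query):
        patched.append(point)  # query points are always kept
        patched.extend(inserts.get(i, []))
    return patched
```

The threshold trades sensitivity against GPS noise: a value below the typical positional error of the devices would flag spurious regions.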
A second, much simpler algorithm is to find missing time intervals in the query trajectory and insert the reference trajectory data from those intervals into the corrected output. This is slightly more complex than it sounds because of possible data overhangs at the ends, where the reference starts earlier or finishes later than the query. This algorithm is implemented in patch_gpx_time.
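To make the time-interval idea concrete, here is a minimal sketch written for this explanation, not taken from the patch_gpx_time source. Samples are assumed to be time-sorted (timestamp, point) tuples; the 30-second max_gap default, the include_overhangs flag and the function names are all hypothetical choices.

```python
from datetime import timedelta


def find_time_gaps(times, max_gap):
    """(before, after) timestamp pairs where consecutive query samples are
    more than max_gap apart, i.e. the device was paused or not recording."""
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_gap]


def patch_by_time(query, reference, max_gap=timedelta(seconds=30),
                  include_overhangs=True):
    """Merge reference samples into the query wherever the query has no data.

    Every query sample is kept. A reference sample is added if it falls
    strictly inside a query gap or, when include_overhangs is True, before
    the first or after the last query timestamp.
    """
    if not query:
        return list(reference)
    q_times = [t for t, _ in query]
    gaps = find_time_gaps(q_times, max_gap)

    def missing_from_query(t):
        if include_overhangs and (t < q_times[0] or t > q_times[-1]):
            return True
        return any(a < t < b for a, b in gaps)

    merged = list(query) + [(t, p) for t, p in reference if missing_from_query(t)]
    # sort() is stable and keyed on time only, so point payloads are never compared.
    merged.sort(key=lambda sample: sample[0])
    return merged
```

Because query samples are never dropped, their extension data can travel with them into the output; the overhang handling is one plausible reading of the end-effects mentioned above.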
The typical usage model for the patch_gpx tools is mending GPX files obtained from Strava. When I use them, I download the GPX file of my own damaged activity, and then ask whomever I rode or ran with to send me the GPX file of their corresponding activity. Although Strava settings can be adjusted, I have found that typically only the owner of an activity on Strava can export a complete GPX file of it.
An example of the repair of an activity at Calero County Park in California is shown in the following links. I suggest opening them in another tab or window so that you can read along with the comments here; at the time of writing, it did not seem possible to get the rendered README.md page to display this interactive folium map. The first link uses the patch_gpx_spatial algorithm:
and the second link uses the patch_gpx_time algorithm:
In this example, three routes are plotted. My route, the query, is dashed red. My friend’s route (provided here with his permission) is dashed blue. The corrected route is solid green. You should be able to zoom in and out on this map in your browser. Almost everywhere, the red dashes and the green line coincide; recall that the query (my route) is preferred. The primary fix is in the section near Bald Peaks, where I forgot to restart my device at the snack stop at the top of the climb up Longwall Canyon Trail. In that section the fixed route follows my friend’s ride along Bald Peaks Trail.
Note that the two algorithms give very similar results for this data, but there is a peculiar difference between the trajectories at the beginning and end (at the Rancho San Vicente Entrance Parking Lot). This probably warrants further investigation.