The images below are from a wing session last year. Of course, the Motions had the benefit of being on my biceps, rather than wrists. The differences can be attributed to measurement errors (several possible causes) and arm movements
Did you have the watches on the same arm, or on different arms?
Can we respectfully request that we refrain from doing this sort of individual file analysis online in the forum please? It is better done slowly and carefully, with time for team consultation, and when not feeling under pressure to reply immediately.
Please just send the files to info@gpsteamchallenge.com.au and we will do our best to analyse and compile the results among the team thoroughly, and then reply/report on it as appropriate.
I disagree. I find this kind of analysis quite interesting, and would very much prefer to see it here on the forum. The analysis in a closely knit group of experts has some serious limitations (which are reduced by K888's detailed posting of results). And it completely excludes others who might be interested.
One example of these limitations is the handling of missing points. The vast majority of "repeated" points in Garmin are in fact missing points. They show up in lots of sessions, and sometimes at a very high frequency. Just a few years ago, a much lower occurrence of missing points was quoted as a reason to remove a device from the list of approved devices.
I have seen at least one instance where missing points increased the observed difference to Motion data for the top 2 second run by about 0.4 knots. Interpolating missing points pretty much eliminates this error increase in this case, and is expected to do so in many (but not all) cases.
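For anyone who wants to play with the idea, here is a minimal Python sketch of interpolating missing points. It assumes a 1 Hz track stored as (time, speed) pairs in which a skipped timestamp marks a missing point. The names `fill_missing` and `best_window_average` and the sample numbers are made up for illustration, and this is not meant to describe how Speedreader or any other program actually does it.

```python
def fill_missing(samples, step=1.0):
    """Linearly interpolate speeds wherever the timestamps skip a step.

    samples: list of (time_s, speed_knots) tuples, sorted by time.
    """
    if not samples:
        return []
    filled = [samples[0]]
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        gap = round((t1 - t0) / step)
        # A gap of n steps means n - 1 points are missing between the two samples.
        for k in range(1, gap):
            frac = k / gap
            filled.append((t0 + k * step, v0 + frac * (v1 - v0)))
        filled.append((t1, v1))
    return filled


def best_window_average(samples, n):
    """Highest mean speed over any n consecutive samples (ignores timestamps)."""
    speeds = [v for _, v in samples]
    return max(sum(speeds[i:i + n]) / n for i in range(len(speeds) - n + 1))


# Hypothetical track: the sample at t = 2 s is missing.
track = [(0, 20.0), (1, 24.0), (3, 28.0), (4, 24.0)]
fixed = fill_missing(track)

raw_best = best_window_average(track, 2)    # treats the t=1 and t=3 samples as neighbours
fixed_best = best_window_average(fixed, 2)  # uses the interpolated t=2 sample
```

The point of the second function is only to show that a "best n samples" result can change once the gap is filled, because the raw data treats two samples as adjacent when they are really two seconds apart.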
But, of course, you can just keep all your analysis "private", and then bless us with your final decisions. That, however, reduces my motivation to adapt Speedreader to just about zero. Maybe I should make Speedreader private again, too.
No rush from me. I sent the files through and was surprised to see such a quick response. I did post here with the KA72 results for each, but I was not trying to rush you guys. I just looked at it myself and found it interesting so posted it while they were on my screen. For me the end game is getting the 965 approved. Take your time with going through the individual files I sent.
It looks like currently the 965 is performing similarly to the 255 in speeds and position, but the 965 performs better in regards to the timing issue that causes repeated position points. The activity that I sent through seems to have picked up a couple of edge cases (alpha on the border of the circle limit) which may have confused the results. In another thread @tonyd said he had sent through some 965 comparison files some time ago. Have those files been investigated and were the results similar to mine?
Regarding the jitter: it seems much worse on my bike than in your example (up to around 1 knot). But I can't say that was not due to mounting on the bars causing vibration, so I will put it down to that. Does the Doppler speed include vertical components as well as horizontal? If it is picking up vertical speed, that would not be helpful for use in windsurfing but would average out if averaged over several samples... but would make the graphs look as I have observed with jitter. The Garmin graph I referred to (in red) shows no decelerations while accelerating down a hill. So perhaps Garmin has some filtering or somehow isolates only the horizontal component.
Agreed, both devices have unknown errors but the Motion has greater precision. By way of illustration, I've previously compared two Motions and two Garmins to see how the reported speeds differ throughout a session. The images below are from a wing session last year. Of course, the Motions had the benefit of being on my biceps, rather than wrists. The differences can be attributed to measurement errors (several possible causes) and arm movements.
The graphs you had posted seem to support your statement that "the Motion has greater precision". However, there are multiple problems with your post (even if we assume that the reader understands the difference between precision and accuracy). The primary issue with the data you show is that it's from a winging session, with the Motions being on your upper arms, and the Garmins being on your wrists. Anyone who has ever pumped or jibed while winging knows that arm movements are often independent when winging, and that the wrists move a lot more than the upper arm. While not explicitly stating it, your statement is leading towards the differences being from "measurement errors (multiple causes)" rather than arm movements.
I have repeated your experiment in a way that completely eliminates any arm movements - driving around. I compared two Garmin 255 watches (one Music, one not) and two ESP32 loggers, all placed right next to each other on the dash board. Here are the results, in a similar graph to the one you showed:
In this example, the curves look quite similar. The Garmin 255 curve has a slightly broader center (also seen in the 1st-3rd quartile range), while the ESP32 data show more outliers and thus a bigger range. Here's an example of a 2-second region with a difference of > 0.4 knots in speed over 2 seconds from the ESP32 units:
As is typical for regions where the two units deviate, the error is directional (all speeds are lower in one unit), meaning that a higher sampling rate would not reduce the actual error (but it would report a lower error estimate!).
When doing "all point" comparisons of wing data, one has to be very careful, since winging often includes rapid arm movements. For those who have not seen this in GPS tracks, here's an example:
This is from a Motion recording at 5 Hz. The spikes are from pumping to get on the foil. Most likely, the Motion was on the upper arm, and recordings from a wrist-mounted unit would show even larger spikes than the 2-3 knot drops in 0.2 seconds seen here.
There was a fair bit of discussion prior to me being offline yesterday. I will share some thoughts in separate posts, including some genuine windsurfing data from my last speed session in Portland Harbour.
Regarding the question about 2D speeds, everything is initially calculated in 3D (relative to the earth's centre of mass), then converted into 2D (lat + lon + speed) using an ellipsoid model (typically WGS-84) and a primitive geoid for elevation / altitude. Some activities on Garmin watches have a 3D distance + 3D speed option which also factors in changes in altitude. The GNSS chipset is sometimes capable of outputting vertical speeds, either via its binary protocol, or custom NMEA sentences. The 2D speed is referred to as speed over ground (SOG), which is tangential to the ellipsoid.
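To make the 3D to 2D step a little more concrete, here is a small Python sketch using the standard rotation from earth-centred (ECEF) velocity components into local east / north / up at a given geodetic latitude and longitude. SOG is then just the horizontal part. The function names are mine and the input numbers are arbitrary; I am not claiming any particular chipset does it exactly this way.

```python
import math


def ecef_velocity_to_enu(vx, vy, vz, lat_deg, lon_deg):
    """Rotate an ECEF velocity (m/s) into local east / north / up components."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    east = -math.sin(lon) * vx + math.cos(lon) * vy
    north = (-math.sin(lat) * math.cos(lon) * vx
             - math.sin(lat) * math.sin(lon) * vy
             + math.cos(lat) * vz)
    up = (math.cos(lat) * math.cos(lon) * vx
          + math.cos(lat) * math.sin(lon) * vy
          + math.sin(lat) * vz)
    return east, north, up


def speed_over_ground(east, north):
    """2D speed: horizontal components only, vertical speed is ignored."""
    return math.hypot(east, north)


def speed_3d(east, north, up):
    """3D speed: also includes the vertical component."""
    return math.sqrt(east ** 2 + north ** 2 + up ** 2)


# Arbitrary example on the equator at the prime meridian, where "up" is the ECEF x axis.
east, north, up = ecef_velocity_to_enu(1.0, 3.0, 4.0, lat_deg=0.0, lon_deg=0.0)
sog = speed_over_ground(east, north)
```

In this example the 1 m/s along the x axis is purely vertical, so it contributes to the 3D speed but not to SOG.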
Regarding filtering. Noisy sensors (including GNSS) are typically passed through a Kalman filter to smooth the output. There are lots of videos (and books) describing the topic but the general principle is to combine actual measurements with predictions. Given an understanding of the current state (time, position, velocity, acceleration, etc) make a prediction for the next epoch (e.g. 1 second into the future). When you get to that point in time, use the latest satellite observations to calculate the current position + velocity + time (PVT) and then combine them with the previous prediction to produce your best estimate. In principle the combined results are less noisy and more trustworthy than either the calculation from raw measurements, or the prior prediction alone. This is a massive over-simplification and I've not described dynamic models, underlying stats or maths, but hopefully it gives some idea as to what the Kalman filter is doing. It's a combination of measurements and predictions. This is all done natively within the GNSS chipset.
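As a toy illustration of the predict / update cycle described above, here is a scalar Kalman filter in Python. It assumes a single state (speed) that is predicted to stay constant between epochs, which is far simpler than the multi-state filters inside a real GNSS chipset. The function name, variances and starting values are all made up for the example.

```python
def kalman_1d(measurements, meas_var, process_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a 'speed stays the same' prediction model.

    meas_var:    assumed variance of each measurement
    process_var: assumed variance added per epoch (how much speed may change)
    """
    x, p = x0, p0  # current estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: the estimate carries over, but its uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement according to their variances.
        k = p / (p + meas_var)  # Kalman gain, between 0 and 1
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# Hypothetical noisy speeds in knots.
noisy = [25.3, 24.6, 25.4, 24.9, 25.2, 24.7]
smoothed = kalman_1d(noisy, meas_var=0.25, process_var=0.01, x0=25.0, p0=1.0)
```

A larger `process_var` makes the filter trust new measurements more (less smoothing, faster response), while a larger `meas_var` makes it lean on the prediction.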
On a related note, I once read a post by a GNSS engineer describing the internals of the SiRFstar III (used in GT-31) as updating the tracking of Doppler shift at 10 Hz internally. The Kalman filter could only output a PVT solution at 1 Hz, which is what we got on the GT-31. The SiRFstar IV chipset also updates the Doppler observable every 100 ms, but the Kalman filter can output every 200 ms (5 Hz) as we saw on the GW-52 and GW-60. The Airoha (and MediaTek ancestors) are able to output PVT solutions at 10 Hz. Based on all of the aliasing tests that I've performed, 1 Hz outputs appear to be using a split-second speed and not using interim Doppler measurements. Despite the Kalman filter you can still see aliasing artefacts because only one Doppler measurement is used per second. I tried to describe aliasing in simple terms at logiqx.github.io/gps-details/general/aliasing/
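Here is a small Python sketch of the aliasing effect. It assumes a made-up "true" speed of 25 knots with a 1 knot oscillation at 0.9 Hz (think of a rhythmic arm movement), then compares a 1 Hz output that takes one split-second reading per second with a 1 Hz output that averages ten internal readings per second. All names and numbers are assumptions for illustration, not a model of any specific chipset.

```python
import math


def true_speed(t, base=25.0, amp=1.0, freq_hz=0.9):
    """Hypothetical speed in knots with a periodic wobble."""
    return base + amp * math.sin(2 * math.pi * freq_hz * t)


def one_hz_instant(seconds):
    """One split-second reading per second."""
    return [true_speed(float(t)) for t in range(seconds)]


def one_hz_averaged(seconds, internal_hz=10):
    """Average of several internal readings spread across each second."""
    out = []
    for t in range(seconds):
        readings = [true_speed(t + k / internal_hz) for k in range(internal_hz)]
        out.append(sum(readings) / internal_hz)
    return out


def spread(values):
    return max(values) - min(values)


instant = one_hz_instant(20)
averaged = one_hz_averaged(20)
spread_instant = spread(instant)
spread_averaged = spread(averaged)
```

With single readings the 0.9 Hz wobble shows up as a slow 0.1 Hz swing of nearly full amplitude, which is the aliasing artefact. Averaging the interim readings leaves only a small residual.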
The likes of COROS and Garmin also apply their own filtering / smoothing they deem suitable for the sport / activity. I'm 99.9% sure the windsurfing / kitesurfing / other speeds being saved in Garmin FIT files are the Doppler-derived speeds produced by the GNSS chipset. There is however a Garmin filter which snaps speeds under 0.2 m/s (circa 0.4 knots) to zero.
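The snap-to-zero behaviour is easy to mimic when comparing files. This is a minimal Python sketch of such a threshold, assuming the speeds are in m/s and that a speed exactly at the threshold is kept (I don't know what Garmin does at the boundary). The function name is hypothetical.

```python
MS_TO_KNOTS = 3600 / 1852  # 1 knot is exactly 1852 m per hour


def snap_low_speeds(speeds_ms, threshold_ms=0.2):
    """Set any speed below the threshold to zero, leave the rest untouched."""
    return [0.0 if v < threshold_ms else v for v in speeds_ms]


snapped = snap_low_speeds([0.05, 0.19, 0.2, 1.5])
threshold_knots = 0.2 * MS_TO_KNOTS
```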
The post of winging data wasn't meant to diminish the significance of arm movement. The original analysis of that specific dataset was simply to give an insight into the real-world performance of the Garmin watches when winging, recognising that a large part of any differences will be due to arm movements.
This happens to a much lesser degree when windsurfing but it is still significant. I'll show some actual windsurfing data in the next post.
Whilst testing + evaluating the mini motions for Weymouth Speed Week, I always sailed with 4 mini motions - 2 on each bicep. I was very impressed with just how consistent they were with each other on the same arm and how they could pick up the arm differences when winging (see screenshot).
You can see how the motions on the same arm track each other very closely, yet the two arms differ in places. The differences between the two arms appear to be due to independent arm movements. These tests were from minis worn on the biceps, so a much smaller range of movement than the wrists.
Looking at some SUP data was also quite interesting. These charts were used as baselines for some aliasing investigations - wearing on the bicep vs wrist. 0815 (red) was on the board and 0815 (blue) was on my arm. Initially 0815 was on my bicep then moved to the wrist, just above my watch.
Whilst on the bicep the speed was comparable to the board speed, but with a slight difference in phase.
When the motion was on my wrist you can even spot the difference between paddling on different sides (upper hand vs lower hand).
All of these charts are simply to illustrate how sensitive our GNSS receivers are to movements. This is data that I have to hand and may be interesting to some people. The data in this post is not supposed to represent the dynamics of windsurfing, just share some general observations.
Now for some actual windsurfing data...
Regarding testing, I tend to do 3 types of testing where possible - static testing (leave devices in an open area for several hours), driving (non-challenging dynamic testing) and real-world testing (windsurf / windfoil / wing). Real-world testing uses the most ideal (but still practical) method of mounting / wearing the device. Secure on the bicep produces great results with the Motion, but the wrist is obviously most practical for a watch. I have no doubt that two watches would produce more consistent results if they were worn on the bicep, but that isn't very practical.
I always wear two motion minis as my reference devices. This makes it easy to spot any outliers and allows for an evaluation as to whether the Motion results can be trusted as the baseline. Visual inspection of the data usually shows bigger differences in my watches than the Motions. A visual inspection of the entire track is par for the course, along with a comparison of the top 5 results for all significant categories. Visual inspections will usually cause me to pick up any anomalies (and occasional spikes) which don't appear in top 5 results.
My previous reference to errors includes the standard GPS / GNSS error budgets and things like an upside down receiver. A receiver may be capable of measuring the true speed at any specific moment (including split-second jerky movements), but in an ideal world you really need multiple measurements per second if outputting PVT solutions at 1 Hz. Recall the previous post about what happens internally in the SiRF chipsets. That is very likely to be the reason we don't see aliasing effects on 1 Hz devices from Locosys.
Peter's driving tests show little difference between the ESP and Garmin devices that were tested, although there was a slight bias in the watches (not centered around zero). This is consistent with my own testing but sadly it's not representative of the real-world performance we see on the water. It provides useful insights into the innate precision of the watches, so I appreciate seeing Peter's data and think it provides useful insight to anyone interested in the nuances of GNSS performance.
I've taken my last windsurfing session (pre-Christmas) and created similar plots to my earlier winging example. This was a typical speed session in Portland Harbour. The charts aren't measuring the innate precision, but they show how the live speeds (smoothed using a 2s rolling average) differ between 2 equally good watches worn on the wrist, and 2 motions worn on the bicep.
The percentile figures say that during this session, whilst sailing > 25 kts:
- watches within 0.13 kts of each other for 50% of the time, 0.44 kts for 95% of the time and 0.81 kts for 99.7% of the time
- motions within 0.03 kts of each other for 50% of the time, 0.09 kts for 95% of the time and 0.18 kts for 99.7% of the time
Roughly speaking the differences between the motions are 4 to 5 times smaller than those between the watches across a range of speeds. These figures are not meant to represent "errors" or give an absolute measure of precision, they simply show how much two watches / motions vary throughout a typical windsurfing session when worn in the recommended manner.
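For anyone wanting to produce similar figures from their own files, here is a rough Python sketch of the calculation. It assumes the two speed series are already time-aligned at the same sample rate, smooths each with a rolling average, keeps only points above a speed threshold and reports nearest-rank percentiles of the absolute difference. The function names and the tiny data set are invented for the example, and the exact percentile method used for the charts may differ.

```python
import math


def rolling_mean(values, window):
    """Trailing rolling average over `window` samples."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def percentile(sorted_vals, pct):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def difference_percentiles(dev_a, dev_b, window, min_speed, pcts=(50, 95, 99.7)):
    """Percentiles of |a - b| for smoothed speeds where both exceed min_speed."""
    a = rolling_mean(dev_a, window)
    b = rolling_mean(dev_b, window)
    diffs = sorted(abs(x - y) for x, y in zip(a, b) if min(x, y) > min_speed)
    if not diffs:
        raise ValueError("no samples above the speed threshold")
    return {p: percentile(diffs, p) for p in pcts}


# Hypothetical 1 Hz speeds in knots, so a 2 s rolling average is a window of 2.
watch_a = [26.0, 26.0, 26.0, 26.0, 10.0]
watch_b = [26.5, 26.5, 26.5, 26.5, 10.0]
stats = difference_percentiles(watch_a, watch_b, window=2, min_speed=25.0)
```

At 5 Hz the same 2 s smoothing would need `window=10`. The last smoothed point in the example falls below 25 knots, so it is excluded from the statistics.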
These stats are consistent with what I observe when comparing top 5 results using Speedreader, or comparing 2s results as I finish a run, which I have done many hundreds of times. A lot of the differences are likely due to how the watches are being worn on the wrists, but that is unavoidable. Most runs show results within 0.1 or 0.2 kts of each other on the watches, but 0.5 to 0.7 kts is not a rare occurrence.
Accuracy and precision are different things from a scientific perspective, and this post does not even touch on the differences or attempt to quantify them for these devices. The ISO definition of accuracy, a combination of trueness and precision, works quite nicely for GNSS. One aspect of Peter's earlier post related to trueness, when discussing errors being high or low for sustained periods.
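To put the two terms into numbers, here is a minimal Python sketch. It assumes you have a trusted reference speed for each sample (which we rarely do on the water), takes the mean error as a stand-in for trueness (bias) and the standard deviation of the errors as a stand-in for precision. The function name and data are hypothetical.

```python
import statistics


def trueness_and_precision(measured, reference):
    """Return (bias, spread) of the errors relative to a reference.

    bias:   mean error, i.e. how far off the readings are on average
    spread: sample standard deviation of the errors, i.e. how much they scatter
    """
    errors = [m - r for m, r in zip(measured, reference)]
    return statistics.mean(errors), statistics.stdev(errors)


# Hypothetical speeds in knots: readings are consistently about 0.5 high.
reference = [30.0, 30.0, 30.0]
measured = [30.4, 30.5, 30.6]
bias, spread = trueness_and_precision(measured, reference)
```

A device can have a small spread (good precision) and still read high or low for sustained periods (poor trueness), which is the distinction being made above.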
I guess the main goal of this specific post is to show what can be achieved with modern Garmin watches in real-world conditions and when wearing the devices in the most optimal (but practical) way. It isn't giving a measure of accuracy in the ISO sense (combination of trueness + precision) but observing the consistency of data from two devices provides some useful insights imho.