This document was written by Paul Edwards and is released to the public domain.
It can be found at: http://freespace.virgin.net/paul.edwards3/gpsavg/peavg.txt

POST-S/A Addendum at end.


Averaging - just how useful is it?
----------------------------------

The answer to this question is not straightforward. It depends on the
following 3 factors:

A. What you are planning on using the waypoint for.
B. How much it costs you to average.
C. How many satellites you can see.


A - What you are planning on using the waypoint for
---------------------------------------------------

Ignoring 3D applications, where altitude is important and the benefits of
averaging are actually higher, there are 4 broad categories of use for a
waypoint in practical terms. Your practical use is more likely to be a
mixture of them than to fall 100% into one category. Accuracy is normally
quoted as an RMS figure, but RMS is a single measure that doesn't actually
represent ANY of the 4 practical uses; it attempts to give a single general
figure to encompass them all.

1. How far you are away from the true position "on average" (in metres).

2. The size of the search area "on average" (in square metres).

3. The worst-case scenario, as seen by the maximum distance you can get
   away from the true position (in metres).

4. The worst-case scenario, as seen by the maximum search area you may end
   up having to search (in square metres).

(A small sketch showing how these four measures can be computed from a
batch of fixes appears at the end of this section.)

Some examples of all 4 types of use are:

1. You are in a clearing and have line of sight to all objects, so as long
   as you are within reasonable distance of the object, you will spot it
   immediately. But it's actually dark, so you first have to drive to the
   location shown on your GPS, then get out and use a torch to scan 360
   degrees until you see the object. You are interested in average
   distance, because that is how far you have to walk, and that is the
   cost you are trying to reduce.

2. You are in a crowded marketplace, looking for a particular stall.
   Because there are stalls everywhere and people everywhere, you can only
   see one stall at a time. Once again you want to reduce the distance so
   that you find the desired stall sooner.

3. You are in the clearing again, dropped off by the taxi driver, and now
   you have to walk to the object. You can walk about 100 metres OK, but
   after that your arthritis kicks in and it is excruciatingly painful to
   walk further. You don't particularly care about reducing the distance
   from 90m to 80m, as your arthritis doesn't bother you at those
   distances. However, at 200m you'll probably have fainted trying to get
   there.

4. You're in the marketplace again. The next bus is in 10 minutes' time,
   and there is another 20 minutes for the one after that. After that, you
   have to catch a taxi, which is much more expensive. But you need to
   find the stall regardless. In 10 minutes you can search a circle with
   radius 100m. But a circle with radius 200m is 4 times the area, which
   means you will miss the last bus. You're keen to catch the first bus.
   There is no advantage in decreasing the search time below 10 minutes,
   because you have to wait for the bus anyway. You are not interested in
   the average, just in getting rid of the extremes, which DO cost you.
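Here is a minimal sketch, in C, of how these four measures might be
calculated from a batch of fixes. It is not taken from any of the programs
or spreadsheets mentioned in this document; it assumes the fixes have
already been converted to east/north offsets in metres from the true
position, and the six offsets given are made up purely so that it runs on
its own. The search areas are circles whose radii are the position errors,
which appears to be the convention behind the search area figures tabulated
later on (average search area = pi x RMS error squared; maximum search area
= pi x maximum error squared).

  #include <stdio.h>
  #include <math.h>

  #define PI 3.14159265358979

  /* East/north offsets (metres) of each fix from the true position.
     These six values are made up purely for illustration. */
  static const double east[]  = { 12.0, -35.0,  48.0,  -5.0, 20.0, -60.0 };
  static const double north[] = { -8.0,  22.0, -30.0,  41.0, -2.0,  15.0 };

  int main(void)
  {
      int i;
      int n = sizeof east / sizeof east[0];
      double sumdist = 0.0, sumsq = 0.0, maxdist = 0.0;

      for (i = 0; i < n; i++)
      {
          double dist = sqrt(east[i] * east[i] + north[i] * north[i]);

          sumdist += dist;        /* for the average distance (use 1)  */
          sumsq += dist * dist;   /* for RMS and average search area   */
          if (dist > maxdist)
              maxdist = dist;     /* worst case (uses 3 and 4)         */
      }

      printf("1. average error:       %.1f m\n", sumdist / n);
      printf("2. average search area: %.1f sq m\n", PI * sumsq / n);
      printf("3. maximum error:       %.1f m\n", maxdist);
      printf("4. maximum search area: %.1f sq m\n", PI * maxdist * maxdist);
      printf("   (RMS error:          %.1f m)\n", sqrt(sumsq / n));
      return 0;
  }

Link with the maths library (-lm) when compiling.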
B - How much it costs you to average
------------------------------------

Nothing comes for free, or does it? Perhaps you are taking the waypoint for
your employer, and he is having to pay you for every minute that you stand
around doing nothing. Or perhaps you're sightseeing and enjoying the view
from Mrs Macquarie's Point, and time spent averaging is totally free.

In many cases, GPS receivers allow you to average whilst entering the name
of the waypoint, which may take a minute. If you usually stand still to do
this (instead of bumping into things while entering the name on the move),
then this minute is often free. But usually any averaging beyond that one
minute will not be free, e.g. if you're in a foreign city, do you really
want to stand outside your hotel so that you can find it again easily, or
is your time better spent touring?

In some situations you actually NEED a particular accuracy, regardless of
the cost, so the cost essentially becomes irrelevant; you just have to look
up the required value and charge the employer accordingly.


C - How many satellites you can see
-----------------------------------

The S/A error introduced by the different satellites does not cause a
constant error, e.g. 50 metres to the North; otherwise it would be easy to
correct for. Instead, each satellite puts out its own error, and when you
average them all out the result is not as bad as if you had opened yourself
up to just one of them. This process is known as overdetermination.
Generally, the more satellites you have, the better you reduce the effects
of S/A. Because the effects of S/A have already been reduced, averaging has
less potential for gain. And the converse is true.

So how do you combine all these factors? Well, it is simple enough to graph
long-term averages of the pure forms of the 4 uses. John Galvin has
provided this data for the first 8 minutes of averaging, at various
intervals. I don't have a full set of data for any other level of
"satellite visibility", so you cannot see the results of that; you'll have
to judge for yourself.

To deal with the cost issue, we can graph the simplest case, where the cost
is directly proportional to time, say you are a wage-earner taking the
waypoint for your boss to use later. Let us also assume the "average search
area" (case 2) scenario, and that the search time (and cost) is directly
proportional to the search area. There is still a constant factor missing
that depends on your application and so needs a judgement call: the
time/cost taken to search each unit (i.e. each square metre). So the best
that can be done is to graph the reduction in search area against cost
(time). But that graph alone is not sufficient to make a judgement call,
as even if you're getting near-zero value for money at the moment, it might
still be cost-justified to continue if in a little while you'll be back in
the black. So we need 3 different graphs for each usage scenario, e.g.
average search area:

1. Average search area versus time, for applications that require a
   particular average search area, regardless of cost.

2. Gradient of (1), i.e. reduction in search area per minute (unit cost)
   versus time (cost), for cost-sensitive applications.

3. Reduction in search area to date per unit time (cost) versus time
   (cost).

Graph 3 will show you what length of time gives you the best value for
money, which prevents you stopping at a peak or trough of Graph 2. Graph 2
is useful if you want to stop immediately (meaning cost isn't really
proportional to time; it's more expensive for larger times) but want to
make a judgement call as to whether the gain currently being had is worth
it, i.e. if you're at the best rate you can get, you'll stay on for another
minute anyway. Graphs 2 and 3 are derived from Graph 1; they just show the
information needed to make those decisions clearer (see the sketch below
for one way to derive them).
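As an illustration of that derivation, here is a minimal sketch in C. It
hard-codes the average search area figures from John Galvin's table in the
next section and prints Graph 2 (reduction since the previous sample, in
square metres per minute) and Graph 3 (total reduction to date per minute).
It is not taken from any of the spreadsheets or programs mentioned in this
document; its Graph 2 column should reproduce the Scenario 2, Graph 2
figures quoted further down (131, 251, 427, ...).

  #include <stdio.h>

  /* Average search area (square metres) after a given averaging time
     (seconds), taken from John Galvin's table below. */
  struct sample { double secs; double area; };

  static const struct sample data[] = {
      {   0, 9762.452408 },
      {   1, 9760.270024 },
      {  15, 9701.816866 },
      {  30, 9595.167867 },
      {  60, 9281.674712 },
      { 120, 8433.794734 },
      { 240, 6818.709443 },
      { 360, 5700.897571 },
      { 480, 4931.960283 }
  };

  int main(void)
  {
      int i;
      int n = sizeof data / sizeof data[0];

      printf("secs  graph2 (sq m/min)  graph3 (sq m/min)\n");
      for (i = 1; i < n; i++)
      {
          /* Graph 2: rate of reduction since the previous sample */
          double rate = (data[i-1].area - data[i].area)
                        / (data[i].secs - data[i-1].secs) * 60.0;
          /* Graph 3: total reduction to date divided by total time */
          double value = (data[0].area - data[i].area)
                         / data[i].secs * 60.0;

          printf("%4.0f  %17.0f  %17.0f\n", data[i].secs, rate, value);
      }
      return 0;
  }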
So you can see we need a total of 12 graphs, 3 for each of the 4 usage
scenarios.

First, John Galvin's data:

avgtime    avgsrch        maxsrch
  0 sec    9762.452408    113339.7668
  1 sec    9760.270024    111823.6649
 15 sec    9701.816866     96078.86435
 30 sec    9595.167867     86487.03635
 60 sec    9281.674712     71202.35339
120 sec    8433.794734     50641.68168
240 sec    6818.709443     24735.88637
360 sec    5700.897571     17839.41602
480 sec    4931.960283     15908.32562

(all values in square metres)

avgtime    rmserror       avgerror       maxerror
  0 sec    55.744821      49.24206883    189.9399056
  1 sec    55.73858981    49.23444454    188.6652525
 15 sec    55.5714331     49.07659576    174.8795355
 30 sec    55.26514947    48.87212795    165.9206988
 60 sec    54.35484134    48.30882213    150.5470447
120 sec    51.71998073    46.61398298    126.9635684
240 sec    46.58822378    42.68752282     88.7337425
360 sec    42.59873273    38.84340466     75.35557312
480 sec    39.62185878    35.45462813     71.16022234

(all values in metres)

Before we go on, it is worth looking at one particular set of data, that at
the 4 minute mark. 4 minutes is the correlation period of S/A. Here are the
results.

Reduction caused by 4 minutes of averaging (one correlation period):

RMS:                                     16%
Usage Scenario 1 (average error):        13%
Usage Scenario 2 (average search area):  30%
Usage Scenario 3 (maximum error):        53%
Usage Scenario 4 (maximum search area):  78%

The full 12 graphs can be found at
http://freespace.virgin.net/paul.edwards3/peavg.123

Unfortunately I don't have the ability to translate the graphs into JPEGs
for universal display, so that file is only of use to people with Lotus
1-2-3. However, you can graph them yourself from the raw data, e.g. here is
the data for the average search area reduction rate, i.e. Scenario 2,
Graph 2 (square metres per minute):

avgtime (sec)   reduction rate (sq m/min)
  0               0
  1             131
 15             251
 30             427
 60             627
120             848
240             808
360             559
480             384

I think the average search area reduction is the most interesting one to
look at, because an average distance application doesn't come up too often.
If you can see the object line of sight, it doesn't normally make a
difference whether it's 100 or 50 metres away. Searching applications are
considerably more expensive. As you can see, we get into the law of
diminishing returns around the 6 minute mark; the most gain actually
happens quite early on, and in the first 4 minutes the search area has
already dropped by 30%, roughly halving by the 8 minute mark.

I have graphed another set of data, this one with a better RMS of about 36
metres. This data has been processed by me, with the raw data obtained
from:

http://www.cnde.iastate.edu/staff/swormley/gps/Data/99081201.asc

for location 41.501596, -81.607336. I have calculated results for every
single second from 1 second to 32 minutes. The C code to do this can be
found in ozpd at www.kerravon.w3.to, gpsdist.c. The resultant spreadsheet
is called peavg2.123.

Interesting points to note are the peaks of the position improvement rate,
occurring at (seconds):

rms      = 207
avgerror = 194
maxerror = 289
avgsrch  = 183
maxsrch  = 231

However, it is the best total reduction to date per unit time that gives
the best value for money, assuming a linear cost. These peaks occur at
(seconds):

rms      = 351
avgerror = 351
maxerror = 379
avgsrch  = 307
maxsrch  = 337

So for average search area applications, 5 minutes 7 seconds is the most
cost-justified averaging time, assuming cost is proportional to time. Or
put another way, if you want to earn something (reduced search space) for
your time spent, then spend just over 5 minutes there, and you'll get paid
the highest rate of any averaging time. I think this is the most important
number to remember.
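The sketch below shows the general shape of that per-second calculation. It
is not the gpsdist.c referred to above; it is just a self-contained
illustration of one way to do that kind of calculation, using a tiny
made-up series of one-per-second fixes expressed as east/north offsets in
metres from the true position. For each averaging length it slides a window
of that many consecutive fixes over the log, averages each window, and
accumulates the five measures of the averaged position's error.

  #include <stdio.h>
  #include <math.h>

  #define PI 3.14159265358979

  /* One fix per second, as east/north offsets (metres) from the true
     position.  These ten values are made up so the program runs on its
     own; in real use they would come from a logged data file such as the
     one mentioned above. */
  static const double east[]  = { 30, 28, 25, 20, 14,  9,  2, -4, -9, -12 };
  static const double north[] = { -5, -2,  3,  9, 15, 18, 22, 25, 27,  28 };
  #define NFIX (sizeof east / sizeof east[0])

  int main(void)
  {
      size_t len;

      printf("secs  avgerr    rms  maxerr   avgsrch   maxsrch\n");
      for (len = 1; len <= NFIX; len++)
      {
          size_t start, i, windows = 0;
          double sumdist = 0.0, sumsq = 0.0, maxdist = 0.0;

          /* slide a window of 'len' consecutive fixes over the whole log */
          for (start = 0; start + len <= NFIX; start++)
          {
              double e = 0.0, n = 0.0, dist;

              for (i = start; i < start + len; i++)
              {
                  e += east[i];
                  n += north[i];
              }
              e /= len;                 /* averaged position for window */
              n /= len;
              dist = sqrt(e * e + n * n);
              sumdist += dist;
              sumsq += dist * dist;
              if (dist > maxdist)
                  maxdist = dist;
              windows++;
          }

          printf("%4lu  %6.1f  %5.1f  %6.1f  %8.1f  %8.1f\n",
                 (unsigned long)len,
                 sumdist / windows,          /* average error        */
                 sqrt(sumsq / windows),      /* RMS error            */
                 maxdist,                    /* maximum error        */
                 PI * sumsq / windows,       /* average search area  */
                 PI * maxdist * maxdist);    /* maximum search area  */
      }
      return 0;
  }

With a real multi-hour log in place of the made-up series, the peaks of the
improvement rate and of the reduction to date per unit time can be located
by post-processing the table this prints.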
After 4 minutes of averaging this second data set, we see the following
reductions, not as good as John Galvin's data due to the lower starting
RMS:

rms      = 16%
avgerror = 15%
maxerror = 23%
avgsrch  = 29%
maxsrch  = 40%

For completeness, there is an earlier graph available, peavg.gif, which
uses some data points provided by David Wilson to determine the rate of
search area reduction as a function of time. I do not know the reason for
the dramatic changes early on, but they are not reproduced in the other
graphs, and are minor in the scheme of things (since they are short-lived),
so are probably best ignored.

One other point not covered is the practical searching itself. Most
real-life searching will involve arriving at the GPS location and then, if
you can see the object, going directly to it, or if you can't, having to
perform an expensive search. The real practical gain comes when you go from
an expensive search to line-of-sight. The difference could be very small,
e.g. just 2 metres closer may have given you line-of-sight, or perhaps it
would have made you more inclined to set off in the correct direction. If,
say, you need to be within 50 metres to see something, and you have a 50%
chance of being within 50m but a 52% chance of being within 52m, then
shaving 2 metres off the error gives you 2% fewer expensive searches to do.
With more than one user of a waypoint, and more than one waypoint, that's a
2% saving for lots of people, for the single cost of taking the original
waypoint, so the benefits may be enormous. As always, it depends on exact
usage.


POST-S/A Addendum, using RMS figures from David Wilson
------------------------------------------------------

time (mins)   rms error (m)   search area/const   (area reduction/const)/cost(time)
    0             5.62             31.58
    1             5.11             26.11                  (5.47)
    2             4.79             22.94                  (4.32)
    5             4.33             18.75                  (2.57)
   15             3.85             14.82                  (1.12)
   30             3.5              12.25                  (0.64)
   60             3                 9                     (0.38)
  120             2.45              6                     (0.21)

This is for applications where search area is the best model (which I
believe is the most accurate model for the majority of situations). Under
S/A the situation was very different: you actually got the BEST rate of
decrease at the 3 minute mark, with the best overall rate (value for money)
at the 5 minute mark. As we can see, the best value for money now is at the
1 minute mark. I suspect it is even less than that, and that the best point
approaches 0 minutes, i.e. you hit the law of diminishing returns
immediately. So I don't think I'll average for any time that isn't free.
Having said that, you do halve the search area in about 8 minutes, the same
as when S/A was on. But back then it was a much larger area that was being
eliminated. Still, it's good to know that you can spend just 8 minutes at a
site and do more reduction in search area than the next 50 years of
averaging would. Certainly worth trying to find a reason to stick around!
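For anyone who wants to reproduce the derived columns in the addendum
table, here is a minimal sketch. It assumes the search area scales as the
square of the RMS error (the "search area/const" column), and that the
bracketed value-for-money column is simply the total reduction in area to
date divided by the minutes spent, which appears to be how the table above
was produced.

  #include <stdio.h>

  /* Post-S/A RMS figures from David Wilson (metres), as tabulated above. */
  static const double mins[] = { 0, 1, 2, 5, 15, 30, 60, 120 };
  static const double rms[]  = { 5.62, 5.11, 4.79, 4.33, 3.85, 3.5, 3, 2.45 };

  int main(void)
  {
      int i;
      int n = sizeof rms / sizeof rms[0];

      for (i = 0; i < n; i++)
      {
          /* search area is proportional to rms squared ("search area/const") */
          double area = rms[i] * rms[i];

          printf("%4.0f min   rms %4.2f   area/const %5.2f",
                 mins[i], rms[i], area);
          if (i > 0)
              /* value for money: total reduction to date per minute spent */
              printf("   (%4.2f)", (rms[0] * rms[0] - area) / mins[i]);
          printf("\n");
      }
      return 0;
  }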