Author Topic: [SOLVED] Fitting a "good enough" normal to a list of data points  (Read 816 times)

MarkMLl

  • Hero Member
  • *****
  • Posts: 7528
[SOLVED] Fitting a "good enough" normal to a list of data points
« on: September 03, 2024, 05:35:55 pm »
Given a list of data values with one ordinate evenly spaced and the other normalised, where it would be reasonable to assume that the data should fit a normal distribution, is there a "quick and dirty" way of finding where the peak of the normal should be? Sample data below, followed by a more detailed explanation.

Code:
-1000: 0.2017 **********
 -950: 0.2406 ************
 -900: 0.2202 ***********
 -850: 0.5324 ***************************
 -800: 0.3633 ******************
 -750: 0.3980 ********************
 -700: 0.4054 ********************
 -650: 0.5328 ***************************
 -600: 0.2253 ***********
 -550: 0.3911 ********************
 -500: 0.3082 ***************
 -450: 0.3620 ******************
 -400: 0.4034 ********************
 -350: 0.5403 ***************************
 -300: 0.4002 ********************
 -250: 0.3891 *******************
 -200: 0.6688 *********************************
 -150: 0.6007 ******************************
 -100: 0.6864 **********************************
  -50: 0.6944 ***********************************
    0: 0.5733 *****************************
   50: 0.7204 ************************************
  100: 0.8316 ******************************************
  150: 0.8691 *******************************************
  200: 1.0000 **************************************************
  250: 0.7709 ***************************************
  300: 0.8776 ********************************************
  350: 0.6078 ******************************
  400: 0.5409 ***************************
  450: 0.4319 **********************
  500: 0.3770 *******************
  550: 0.3541 ******************
  600: 0.4373 **********************
  650: 0.4930 *************************
  700: 0.5059 *************************
  750: 0.3358 *****************
  800: 0.4647 ***********************
  850: 0.4693 ***********************
  900: 0.2902 ***************
  950: 0.3032 ***************
 1000: 0.2656 *************

Peak: 200KHz
Mean: 20KHz

ADS-B messages report aircraft altitude etc., and are 50-150 bits long transmitted at 1 Mbit/sec on a nominal frequency of 1090 MHz. Hence a receiver would normally have a bandwidth of somewhat more than 2MHz.

Doppler being insignificant, it would be reasonable to assume that the actual transmission frequencies were distributed normally, but did not depart more than 1MHz either side of the nominal.

The histogram above plots the normalised number of messages received as a receiver scans either side of the nominal 1090MHz in steps of 50KHz. The actual sampling order is +0KHz, -50, +50 and so on. It looks as though the "best" overall frequency for a single receiver is 20KHz above the nominal, or possibly slightly higher.
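As a rough cross-check of the "Peak" and "Mean" figures, the sketch below recomputes a peak and a count-weighted mean (centroid) from the table above. The function names are illustrative and not taken from the capture program. One thing to watch: dividing the weighted sum by the number of bins rather than by the total of the counts gives a noticeably different figure, so it matters which definition of "mean" is used.

```python
# Normalised message counts from the table above, for offsets of
# -1000 kHz to +1000 kHz relative to 1090 MHz in 50 kHz steps.
counts = [
    0.2017, 0.2406, 0.2202, 0.5324, 0.3633, 0.3980, 0.4054, 0.5328,
    0.2253, 0.3911, 0.3082, 0.3620, 0.4034, 0.5403, 0.4002, 0.3891,
    0.6688, 0.6007, 0.6864, 0.6944, 0.5733, 0.7204, 0.8316, 0.8691,
    1.0000, 0.7709, 0.8776, 0.6078, 0.5409, 0.4319, 0.3770, 0.3541,
    0.4373, 0.4930, 0.5059, 0.3358, 0.4647, 0.4693, 0.2902, 0.3032,
    0.2656,
]
offsets = [-1000 + 50 * i for i in range(len(counts))]


def peak_offset(xs, ys):
    """Return the x value of the tallest bin."""
    return xs[ys.index(max(ys))]


def centroid(xs, ys):
    """Return the count-weighted mean of x (sum of x*y over sum of y)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(ys)


def weighted_sum_per_bin(xs, ys):
    """Return sum of x*y divided by the number of bins (not by sum of y)."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)


print("peak (kHz):", peak_offset(offsets, counts))
print("centroid (kHz):", round(centroid(offsets, counts), 1))
print("weighted sum / bins (kHz):", round(weighted_sum_per_bin(offsets, counts), 1))
```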

The above took 40 minutes to collect. I'll set it going again tomorrow morning using 25KHz spacing and extend the "tails" to 1.5MHz, and collect four rounds which should reflect a fairly good sample of daytime traffic.

MarkMLl
« Last Edit: September 07, 2024, 06:05:35 pm by MarkMLl »
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

wp

  • Hero Member
  • *****
  • Posts: 12305
Re: Fitting a "good enough" normal to a list of data points
« Reply #1 on: September 03, 2024, 06:35:47 pm »
Sorry, I don't understand. What is the number at the left of the chart? -1000, -950, etc. And the number after the colon? The "normalized" message count? Normalized to what?

Basically you can find the position of the maximum by doing a nonlinear fit with a gaussian curve. You can use the LMath library to do this. Find an example at my github: https://github.com/wp-xyz/LazSamples/tree/master/math/lmath/gaussfit
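For readers without LMath to hand, here is a minimal sketch of the same idea in plain Python. It is not the LMath routine from the linked example: it simply tries every candidate centre and width on a grid and, for each pair, uses the closed-form least-squares amplitude. The function names and grid ranges are assumptions chosen for illustration, and the check runs on synthetic data rather than on the table in the first post.

```python
import math


def gauss(x, amp, mu, sigma):
    """Gaussian curve with amplitude amp, centre mu and width sigma."""
    return amp * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def fit_gaussian(xs, ys, mus, sigmas):
    """Brute-force least-squares fit over candidate centres and widths.

    For a fixed (mu, sigma) the best amplitude has a closed form, so only
    two parameters have to be searched. Returns (amp, mu, sigma).
    """
    best = None
    for mu in mus:
        for sigma in sigmas:
            g = [gauss(x, 1.0, mu, sigma) for x in xs]
            amp = sum(gi * yi for gi, yi in zip(g, ys)) / sum(gi * gi for gi in g)
            sse = sum((yi - amp * gi) ** 2 for gi, yi in zip(g, ys))
            if best is None or sse < best[0]:
                best = (sse, amp, mu, sigma)
    return best[1], best[2], best[3]


# Synthetic check: a noise-free Gaussian centred at +100 kHz.
xs = list(range(-1000, 1001, 50))
ys = [gauss(x, 0.9, 100.0, 400.0) for x in xs]
amp, mu, sigma = fit_gaussian(xs, ys, range(-500, 501, 10), range(100, 1001, 25))
print(amp, mu, sigma)
```

A grid search is slow compared with a proper nonlinear solver, but for a few dozen points it runs quickly and cannot diverge, which suits a "quick and dirty" estimate of the centre.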

MarkMLl

  • Hero Member
  • *****
  • Posts: 7528
Re: Fitting a "good enough" normal to a list of data points
« Reply #2 on: September 03, 2024, 06:44:48 pm »
Table needs to be scrolled, but for some reason the + marker isn't showing. Number on the left is the frequency either side of nominal, number on right is the number of messages received over a minute normalised so that the maximum number (of the order of 13,000) is shown as 1.0.

MarkMLl

wp

  • Hero Member
  • *****
  • Posts: 12305
Re: Fitting a "good enough" normal to a list of data points
« Reply #3 on: September 03, 2024, 08:08:04 pm »
The attached project performs a gaussian fit to your data. The best-fit curve is displayed in blue, the center frequency is 94 kHz. The fit is not very good, maybe the fitting equation needs to be extended by a background offset parameter.
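One way to write the extension suggested here is sketched below. The fitting equation used in the attached project is not shown in the thread, so the symbols are assumptions: A is the amplitude, \mu the centre, \sigma the width and c the constant background. For any trial \mu and \sigma the model is linear in A and c, so those two follow directly from the normal equations.

```latex
% Gaussian with a constant background offset c
y(x) = A \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) + c

% With g_i = \exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) and N data points,
% the least-squares A and c for fixed \mu, \sigma are
A = \frac{N \sum_i g_i y_i - \sum_i g_i \sum_i y_i}
         {N \sum_i g_i^2 - \left(\sum_i g_i\right)^2},
\qquad
c = \frac{\sum_i y_i - A \sum_i g_i}{N}
```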

MarkMLl

  • Hero Member
  • *****
  • Posts: 7528
Re: Fitting a "good enough" normal to a list of data points
« Reply #4 on: September 03, 2024, 08:57:19 pm »
Thanks very much for that, I'll take a very careful look at it. I'll also post another dataset late tomorrow in case anybody's interested.

This is a godawful "protocol" with absolutely no contention management, and the only way that the high-profile tracking sites work is because they've conned a lot of people into running a dispersed network of receivers on their behalf. Since my interest is local low-altitude field management I can't realistically do that, and depending on how long the "tails" turn out to be I might need to run two receivers tuned +-500KHz relative to the nominal... or maybe more.

MarkMLl

Curt Carpenter

  • Hero Member
  • *****
  • Posts: 505
Re: Fitting a "good enough" normal to a list of data points
« Reply #5 on: September 03, 2024, 10:16:26 pm »
The question that pops to mind is:  how stable are your receivers?

MarkMLl

  • Hero Member
  • *****
  • Posts: 7528
Re: Fitting a "good enough" normal to a list of data points
« Reply #6 on: September 04, 2024, 08:38:03 am »
That's a major part of what I'm trying to find out. Empirically, experience has shown that the one I'm using is fairly stable for analogue (i.e. AM or FM) modulation, but a UHF beacon suggested that there was a constant frequency offset of around -50KHz... at the beacon's specific frequency which I think was 1297MHz. My suspicion is that that offset is genuine, but that it varies across the spectrum, hence at 1090MHz it's somewhat positive e.g. +93KHz as on that graph; I don't know whether this is an SDR hardware or software issue.

Again empirically, once I'm monitoring a moving aircraft I continue to see it (subject to antenna placement and radiation pattern), so I'm reasonably confident that there is no gross short-term wandering. But my ability to receive specific aircraft varies a lot: I see a different population with a +ve offset from what I see with a -ve one.

The possibility of unexpected short-term drift or jitter is the reason why my scan pattern starts at the nominal frequency and then samples on alternating sides. The results suggest that for any run there is a roughly symmetrical drop-off either side of the peak. I've had very limited success trying to monitor this signal with a conventional receiver, I think I'd need to add a discriminator tap so I knew what the RF was doing.
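The alternating scan order described here can be sketched as a small generator. The function name and parameters are illustrative and not taken from the capture program.

```python
def scan_offsets(step_khz, span_khz):
    """Yield 0, -step, +step, -2*step, +2*step, ... out to +/-span_khz.

    Sampling outwards from the nominal frequency on alternating sides means
    that slow drift during a run affects both sides of the peak about equally.
    """
    yield 0
    k = step_khz
    while k <= span_khz:
        yield -k
        yield k
        k += step_khz


print(list(scan_offsets(50, 150)))
```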

Antenna design is a different matter, since I'm more interested than most in overhead (rather than close-to-horizon) traffic. I'm currently trying 3/2-lambda with groundplane, but I don't think it's directly relevant.

In combination, that leaves me reasonably confident that while I might have to compensate for a receiver offset I don't have to compensate for drift, and that I'm seeing problems inherent to the unmanaged contention implicit in ADS-B squitter messages. I don't know to what extent this is worse with poorly-maintained small-aircraft equipment than it is with commercial gear, and I've not even started to give serious consideration to Doppler effects.

MarkMLl

Curt Carpenter

  • Hero Member
  • *****
  • Posts: 505
Re: Fitting a "good enough" normal to a list of data points
« Reply #7 on: September 04, 2024, 06:48:31 pm »
It's been a few years since I experimented with SDRs (they were cheap USB dongles and some homebrewed under 6m though) and my memory is that they all showed a lot of drift.   Never really tried to explore it though.  My reaction to your data was that it looked pretty good given all the variables at play.  Does your beacon stay put for you over a few hours?

Sounds like fun!     

MarkMLl

  • Hero Member
  • *****
  • Posts: 7528
Re: Fitting a "good enough" normal to a list of data points
« Reply #8 on: September 04, 2024, 08:08:53 pm »
Yes, was fairly solid. I've captured quite a lot more scan data today but also spent some time tidying things up (reporting elapsed time, overhead etc.) so might post more tomorrow. What I can say though is that there's an enormous frequency range over which I can capture ADS-B messages plus possibly some aliasing effects, but what I /don't/ know is whether a single aircraft gets splurged over the entire band or if I really am seeing gross mis-tuning of transmitters. **

I think that an interesting experiment would be to select the address of a specific aircraft that I knew to be in the vicinity, and then to rapidly scan over say 4MHz to try to work out what its peak frequency is. Even without specific support for this sort of thing in the capture program, I'm now fairly confident that I can restart it with changed parameters in about 0.6 seconds, and hence can easily sweep a MHz per minute, which is much less time than it takes for a light aircraft to take off and move out of range.
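To see what "a MHz per minute" leaves for actual listening, the arithmetic can be sketched as follows. It assumes the 50KHz step from the first post and one program restart per step; both are assumptions for illustration, since the step size for this experiment is not stated.

```python
def dwell_per_step(step_khz, restart_s, seconds_per_mhz):
    """Return the listening time left per step after the restart overhead."""
    steps_per_mhz = 1000.0 / step_khz
    return seconds_per_mhz / steps_per_mhz - restart_s


# 50 kHz steps, 0.6 s restart, one minute per MHz swept.
print(dwell_per_step(50, 0.6, 60))
```

Halving the step size halves the time per step, so the fixed restart overhead takes a proportionally larger share of each step.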

** Hypothesis: some transponders free-run until they see 1030MHz nominal SSR, at which point they use a PLL to multiply that by 1.09/1.03 to respond. I've not seen anything written on this, but it might even be a "feature" to try to reduce contention.

MarkMLl

MarkMLl

  • Hero Member
  • *****
  • Posts: 7528
Re: Fitting a "good enough" normal to a list of data points
« Reply #9 on: September 05, 2024, 01:57:35 pm »
I think I see what's going on now.

Particularly if a relatively small amount of data is collected, it appears that there are three Gaussians involved centred on (roughly) 1090, 1089.5 and 1090.5 MHz.

Collecting more rounds of data, rather than lingering longer on each frequency, improves the curves: I speculate that this is because it looks at more aircraft each with their own tuning.

I've not been able to find published material that justifies this, but with a limited amount of searching I've found that (at least) Agilent makes 1030.5MHz SAW filters, and I speculate that there are others for 1029.5MHz. A 0.5MHz channel spacing at this sort of frequency is entirely plausible, and I suspect that different airfield SSRs are transmitting on different channels: not quite the four-colour map solution but still reasonable.
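Purely as an illustration, the PLL hypothesis from Reply #8 (reply frequency = interrogation frequency multiplied by 1.09/1.03) can be combined with the 0.5MHz channel speculation to see where the replies would land. Neither idea is confirmed anywhere in this thread, and the function name is made up for the sketch.

```python
def reply_mhz(interrogation_mhz):
    """Reply frequency if a transponder scaled the interrogation by 1090/1030."""
    return interrogation_mhz * 1090.0 / 1030.0


for f in (1029.5, 1030.0, 1030.5):
    print(f, "->", round(reply_mhz(f), 3))
```

Under those assumptions an interrogator 0.5MHz off 1030MHz would produce replies about 0.53MHz off 1090MHz, which is in the same region as the outer peaks described in this post.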

I haven't a clue how to analyse this numerically but for the moment am assuming that my receiver has a +100KHz offset. I've got a few more things I need to look at but as soon as I have time I'll add a "seek" facility that only reports on a single aircraft.

I've also ordered up another (cheap) receiver. If I have a receiver tuned to each of the outer peaks and find I collect a different population of aircraft on each (i.e. they're not harmonics etc.), I think it will go a long way to proving the above speculation.

Updated: note that the graph below shows the number of messages with good checksum vs frequency, rather than signal strength. Also that the centre frequency of the receiver was being swept over the entire range, so its passband rolloff is of no significance.

As a quick and dirty hack, I've added a median-finder to the original (text-mode) program based on splitting the area into two parts. I'll probably come back to this at some point and see if I can characterise the curves better. Experiments a few weeks ago showed that putting attenuators in the antenna feed tidied things up a bit, and it might be that that would accentuate the outlines.
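The program itself is not shown, so the following is only one possible reading of "splitting the area into two parts": treat each bin as a bar of equal width, walk along until half the total area has been passed, and interpolate inside the bar that crosses the halfway mark. The function name is hypothetical.

```python
def area_median(xs, ys):
    """Return the x value that splits the area under the histogram in half.

    xs must be evenly spaced and ascending; ys are non-negative counts.
    Each bin is treated as a bar centred on its x value.
    """
    total = sum(ys)
    if total <= 0:
        raise ValueError("no counts to split")
    step = xs[1] - xs[0]
    half = total / 2.0
    running = 0.0
    for x, y in zip(xs, ys):
        if running + y >= half:
            # Fraction of this bar needed to reach the halfway point.
            frac = (half - running) / y
            return x - step / 2.0 + frac * step
        running += y
    return xs[-1] + step / 2.0  # only reachable through rounding error


print(area_median([-50, 0, 50], [1, 2, 1]))
```

Unlike the tallest bin, this estimate uses every count, so a single noisy bin such as an isolated spike moves it much less.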

MarkMLl
« Last Edit: September 07, 2024, 06:10:18 pm by MarkMLl »

Thaddy

  • Hero Member
  • *****
  • Posts: 15556
  • Censorship about opinions does not belong here.
Re: Fitting a "good enough" normal to a list of data points
« Reply #10 on: September 05, 2024, 02:00:56 pm »
That is called a Gaussian distribution.  Use Randg from math.
That is, when you know the noise level and can sample it, Gaussian math will help you filter it with a little bit of help from Fourier. (Easiest is a DFT, for which examples are available; do not get confused by the first F in FFT, which does the same.)
This is pretty much standard.
(and you know how to handle it. This is not at all for beginners, don't even ask, I am not your math teacher)
« Last Edit: September 05, 2024, 02:16:18 pm by Thaddy »
If I smell bad code it usually is bad code and that includes my own code.

MarkMLl

  • Hero Member
  • *****
  • Posts: 7528
Re: Fitting a "good enough" normal to a list of data points
« Reply #11 on: September 05, 2024, 02:30:04 pm »
Thaddy, have you actually read the rest of the thread?

Just about every message refers to its being a Gaussian, with the arguable exception of the first where I describe it as a Normal distribution.

MarkMLl

Thaddy

  • Hero Member
  • *****
  • Posts: 15556
  • Censorship about opinions does not belong here.
Re: Fitting a "good enough" normal to a list of data points
« Reply #12 on: September 05, 2024, 02:33:41 pm »
I did miss that part about "normal". In that case you mean the question was already answered.
What is left for me is to point you to some dft code?
If I smell bad code it usually is bad code and that includes my own code.

wp

  • Hero Member
  • *****
  • Posts: 12305
Re: Fitting a "good enough" normal to a list of data points
« Reply #13 on: September 05, 2024, 02:35:43 pm »
Quote from: MarkMLl
Just about every message refers to its being a Gaussian, with the arguable exception of the first where I describe it as a Normal distribution.

Normal distribution = Gaussian curve

MarkMLl

  • Hero Member
  • *****
  • Posts: 7528
Re: Fitting a "good enough" normal to a list of data points
« Reply #14 on: September 05, 2024, 04:42:54 pm »
I agree that sticking it into a DFT might be interesting... to somebody who hasn't read the thread.

However, I thought I'd made it adequately clear that the x ordinate was already in the frequency domain, and converting it to anything else really makes no sense at all. I'm looking for the center frequency, which (as WP has demonstrated) can be found by curve fitting and nothing else.

MarkMLl
« Last Edit: September 05, 2024, 04:48:31 pm by MarkMLl »

 
