[aprssig] distributed findu possible ?
Matti Aarnio oh2mqk at sral.fi
Sun Aug 10 13:02:10 UTC 2008
- Previous message: [aprssig] distributed findu possible ?
- Next message: [aprssig] distributed findu possible ?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Sat, Aug 09, 2008 at 06:17:14PM -0400, Steve Dimse wrote:
> From: Steve Dimse <steve at dimse.com>
...
> See my last reply for some findU stats. Keep in mind what you are
> talking about here is cherry picking the easy stuff findU does.

Steve,

One of the reasons that people have no idea of what findU can do
is its "user interface". Indeed, you have supplied only the backend
of things, no frontend at all, and some details, like how long data
is retained, are not given anywhere that I can spot.

It is much, much easier to point aprs.fi's map at the general area
of interest, and then look at what happens around there.

For that matter, things like CWOP really are rather invisible, and
doing data accumulation and relay to NOAA is not something one would
want to do with distributed systems. (Not that I believe too strongly
in the extreme distribution idea at all...)

Knowing what a rich swamp of encoding formats the APRS packets are,
incoming packets must be pre-parsed for position, symbol, etc.
information before feeding all that data into the database, and then
there must be some _smart_ ways to index those parse results so that
one can find "all APRS positions within a 20 mile radius of position
X,Y", or "all APRS entities with symbol S", or whatever there may be.
Plus time ranges.. Plus application specifics, like WX and Telemetry.

I have had a small peek at what DWH is, and how things are handled
there. Raw data goes in, gets transformed in a number of ways, and is
viewable via "product tables". In the end the raw data may not live
in the system for very long, but those end-product views are
longer-term data. Like:

  http://aprs.fi/weather/OH2KXH/year
  http://aprs.fi/telemetry/OH2RDK-5/month

I don't know how long the data is truly kept in the aprs.fi system,
but raw data is purged a lot sooner than analysis products.

> This is what aprs.fi does, and to some extent aprsworld, but you
> can't call it a findU replacement unless it does the hard stuff.
> What are your plans for handling:
>
> http://www.findu.com/cgi-bin/wxpage.cgi?call=K4HG&date=20051023&last=30
>
> This is a three year old plot of the data at my house just before
> Hurricane Wilma hit and either the DSL line went down or the UPS gave
> out after power failure. You can get this for any weather station for
> any time in the last eight years that it sent data to the APRS IS. Or
>
> http://www.findu.com/cgi-bin/track.cgi?call=w7lus-14&geo=usa.geo&start=99999
>
> How are you going to show month+ long tracks?

That all means that:

- Data is kept in a persistent database (no RAM-only nodes)
- Its insertion must be cheap (as in "quick")
- Its retrieval must be cheap (which may make the insertion
  less cheap...)

Disk space keeps growing, yet the disks can still handle only so many
IO operations per second, because moving the IO heads along the disk
surface and spinning the disks themselves take roughly the same time
now as they took 10 years ago. Thus a single terabyte disk is no
_faster_ at doing IOs than a single 10 GB disk.

One needs to have multiple disks for data mirrors, so that a single
disk can fail without data loss or even service loss, _and_ for IO
parallelism.

> With your distributed system, how do you handle a guy that travels
> from the area covered by one server to another? There are lots of
> details you need to address...

The same data must be kept at multiple systems, both because of data
replication and because of the indexing needed to answer "what were
the stations near OH2MQK's position on date NN". The lookups could
go to "OH-databases" and "Eastern Canada -databases".

Pretty soon things degenerate to "have all data at all nodes", which
just comes down to load balancing across parallel servers.

However, if all nodes do not get all APRS packets, there can be
awkward holes in the views of the world. To ensure that all packets
make it to all nodes, the way is to connect each of the data
collectors to all APRS-IS core nodes to pull in all data... which is
a rather stupid thing to do because of the core load it causes when
done in a large-scale setup.

The alternative would be to query all partitioned database nodes for
relevant data, and then do a merge-unique before giving out the
presentation, but I do recall that the goal was to _reduce_ the
amount of network traffic in the system, and for a globally
distributed system things do get a bit sticky when backends have to
do global lookups..

> Steve K4HG

73 de Matti, OH2MQK

PS: Steve, do check DNS A records of findu.com and www.findu.com
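
To illustrate the "query all partitioned nodes, then merge-unique" alternative mentioned above, here is a minimal sketch in Python. It is not how findU or aprs.fi work. The record layout (unix time, callsign, raw packet) and the sample callsigns and packets are made up for illustration, and each node is assumed to return its results already sorted by time.

```python
import heapq

def merge_unique(*node_results):
    """Merge per-node result lists (each already sorted) and drop duplicates.

    Assumption: a packet seen by several nodes is stored as an identical
    tuple on each of them, so exact tuple equality identifies duplicates.
    """
    seen = set()
    merged = []
    # heapq.merge lazily interleaves the sorted inputs into one sorted stream
    for record in heapq.merge(*node_results):
        if record not in seen:
            seen.add(record)
            merged.append(record)
    return merged

# Hypothetical records: (unix_time, callsign, raw_packet)
oh_node = [
    (1218370000, "OH2MQK", "pos A"),
    (1218370060, "OH2KXH", "wx B"),
]
canada_node = [
    (1218370000, "OH2MQK", "pos A"),  # same packet, also stored on a second node
    (1218370030, "VE1XYZ", "pos C"),  # made-up callsign
]

result = merge_unique(oh_node, canada_node)
for record in result:
    print(record)
```

The merge itself is cheap. The cost the message worries about is the network round trip to every partition for each lookup, which this sketch does not model.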
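
The "all APRS positions within a 20 mile radius of position X,Y" lookup mentioned earlier in the message could be indexed along these lines. This is only a sketch under stated assumptions: positions are already parsed to decimal degrees, the index is a plain in-memory grid of 1-degree cells (a real system would more likely use a database spatial index), the station names and coordinates are invented, and wrap-around at the 180-degree meridian is not handled.

```python
import math
from collections import defaultdict

EARTH_RADIUS_MILES = 3958.8
MILES_PER_DEGREE_LAT = 69.0  # rounded slightly low, so search boxes err on the large side

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in statute miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

class GridIndex:
    """Hypothetical index: bucket stations by whole-degree (lat, lon) cell."""

    def __init__(self):
        self.cells = defaultdict(list)

    def insert(self, name, lat, lon):
        self.cells[(math.floor(lat), math.floor(lon))].append((name, lat, lon))

    def within(self, lat, lon, radius_miles):
        # Approximate bounding box in degrees; longitude degrees shrink with latitude,
        # so size the box using the latitude farthest from the equator.
        dlat = radius_miles / MILES_PER_DEGREE_LAT
        worst_lat = min(abs(lat) + dlat, 89.0)
        dlon = radius_miles / (MILES_PER_DEGREE_LAT * math.cos(math.radians(worst_lat)))
        hits = []
        for cy in range(math.floor(lat - dlat), math.floor(lat + dlat) + 1):
            for cx in range(math.floor(lon - dlon), math.floor(lon + dlon) + 1):
                # Only candidates from nearby cells get the exact distance check
                for name, plat, plon in self.cells.get((cy, cx), ()):
                    if haversine_miles(lat, lon, plat, plon) <= radius_miles:
                        hits.append(name)
        return sorted(hits)

# Invented sample stations (roughly Helsinki, Espoo, Tampere)
index = GridIndex()
index.insert("STN-A", 60.17, 24.94)
index.insert("STN-B", 60.21, 24.66)
index.insert("STN-C", 61.50, 23.76)

nearby = index.within(60.17, 24.94, 20)
print(nearby)
```

The grid only narrows the candidate set. The exact distance test decides membership, which is the usual split between a coarse index and a precise filter. Time ranges and symbol lookups would need their own indexes on top of this.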
