The Problem with Summer – two unexpected issues with a UKHASnet network

Over the last week I’ve noticed two strange errors in my Suffolk UKHASnet network. The network has been running for nearly a year without much intervention and comprises an Arduino/ESP8266 gateway node (AA), a solar Arduino node (AH2) and a solar supercap LPC812 node (AI1). Over the winter the network ran at the bare minimum, often in zombie mode overnight, but with the recent good weather and lengthening days the nodes have spent more time in repeater mode and have actually begun to malfunction.

AA – Afternoon Naps

I recently noticed that the AA gateway seems to stop working in the afternoons. This matters because it means no data from the network is uploaded to the servers. The gateway sits in a well-insulated attic, where a Moteino talks over serial to an ESP8266, using AT commands to upload packets. Every afternoon for the last week it has stopped working between approximately 1400 and 1900. On closer inspection, the RFM69 temperature reaches about 40C during this time and then the gateway cuts out.

[Graph: AA_Temp]

Every time it restarts in the evening it initially sends a very high temperature (e.g. 61C) and then returns to functioning as a gateway, uploading packets throughout the night until the next day.

Interestingly, the temperature graph shows that on 06 May the node didn’t rise above 40C (ignoring the noise in the measurement) and therefore continued to gateway data throughout the afternoon. The last week has been particularly warm, especially 07 May and 08 May, when AH2 (which is outside) reported 32C. This suggests that the gateway is getting too hot because of its environment (hot day plus insulated attic) and that something stops working at that temperature, only resuming once it cools off.
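As a rough illustration of how the afternoon gaps could be picked out of the upload data, here is a minimal Python sketch. The timestamps are made up, and the `find_outages` helper and its one-hour threshold are hypothetical choices for illustration, not part of any UKHASnet tooling.

```python
from datetime import datetime, timedelta

def find_outages(timestamps, min_gap=timedelta(hours=1)):
    """Return (start, end) pairs for every gap between uploads longer than min_gap."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > min_gap]

# Made-up upload times: hourly packets, then silence from 14:00 to 19:00.
uploads = [datetime(2016, 5, 8, h, 0) for h in (11, 12, 13, 14, 19, 20, 21)]
outages = find_outages(uploads)
for start, end in outages:
    print(f"no uploads between {start:%H:%M} and {end:%H:%M}")
```

Comparing the start of each gap against the temperature reported just before it would show whether the cut-out really does line up with the 40C mark.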

Thinking it through, I can see three main reasons why it’s failing:

  1. RFM69 failing at high temperatures – this is a strong possibility: just before the node fully resumes activity it appears to report a high temperature. The datasheet suggests that the operating temperature is -20C to +70C and the maximum temperature is +115C. Given the slight unreliability of the RFM radios themselves, I wouldn’t be surprised if something wasn’t working.
  2. RFM69 drifting – it has certainly been seen on high altitude balloon flights that the RFM69’s crystal drifts, which would result in the receiver frequency no longer matching the transmission frequencies of the other nodes. If it were drift, you would expect the rate of received packets to decrease as the radio drifted off frequency before stopping, rather than stopping suddenly, and looking closely at the data this didn’t appear to happen.
  3. ESP8266 not coping or drifting – the ESP8266 seems to be more robust than the RFM69: its quoted operating temperature range is wider (-40C to +125C) and a simple Google search hasn’t turned up any reports of issues at this temperature.
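The drift argument in point 2 can be turned into a simple check: did the received packet rate tail off before the outage, or stop dead? The sketch below is only a heuristic under assumed data. The hourly counts are invented, and `looks_like_drift` and its three-hour window are hypothetical names and values chosen for illustration.

```python
def looks_like_drift(packets_per_hour, window=3):
    """Heuristic: True if the counts fell in every one of the last `window`
    hourly steps before the outage, as a slow frequency drift might cause."""
    tail = packets_per_hour[-(window + 1):]
    if len(tail) < window + 1:
        return False  # not enough history to judge
    return all(later < earlier for earlier, later in zip(tail, tail[1:]))

# Invented hourly packet counts leading up to an outage.
sudden = [30, 31, 29, 30, 30]   # steady rate, then the gateway simply stops
gradual = [30, 30, 24, 15, 6]   # rate tails off before the gateway stops

print(looks_like_drift(sudden))
print(looks_like_drift(gradual))
```

A steady rate followed by an abrupt stop, as in the first list, is the pattern described above and points away from drift.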

[Graph: AA_Temp2]

Interestingly, while I was writing this entry the rain arrived and the weather was a lot cooler. AA managed to work through the afternoon and, looking at the temperature graphs, it definitely didn’t reach the high temperatures of the previous day. I suspect that it is an RFM69 issue rather than an ESP8266 one, and it might be that a single component is struggling at the higher temperatures. It will be interesting to see if this happens again as we move into summer. As it’s quite rare for the UK to reach 40C, I don’t think there is any point adjusting the AA node; if it becomes a persistent issue the solution would be to move the node out of the insulated attic.

AI1 – Alternating Days

AI1 is a supercap node. It comprises a solar panel, a diode, a large super capacitor and an EtnaNode (LPC812 and RFM69HW). The solar panel directly charges the super capacitor, which then acts as a reservoir for the node (there are no batteries). While the node doesn’t survive the night, it is able to power up when there is sunlight and then run into the evening as the cap discharges. During the winter it woke up every morning, and as the days have lengthened it has survived further and further into the night. In the last two weeks I’ve noticed that it isn’t booting every day, and on closer inspection it actually only boots on alternate days.

[Graph: AI1_Volt]

One of the problems I’ve encountered with UKHASnet nodes, especially solar powered ones, is that if the voltage supplied to the node increases gradually the microcontroller often gets stuck in a locked state where it can’t fully boot (I assume not all parts of the micro are able to start up, so it gets stuck). To overcome this we use the brownout function to reboot the microcontroller at a particular voltage, forcing it out of the locked state. This worked well during the winter, as the capacitor had enough time to discharge low enough to reset the micro, but with the lengthening days there now isn’t enough time. The micro remains in the locked state, and when the sun comes up the panel just continues to power it until the next night, when the cap discharges enough to trigger the reboot.

[Graph: AI1_Volt2]

It turns out that if the node makes it through to around 0200 it won’t boot the next day, whereas if it only works until approximately 0100 it will. Looking at the Grafana graph, on 5/5/16 and 7/5/16 the node boots and runs to 0200 and therefore doesn’t boot the next day. On 9/5/16, however, it only reaches 0100 and so it boots the next morning; the profile that day is a bit different as it was cloudy, but the node still manages to run until 0200 and then doesn’t work the following morning.
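The alternating pattern can be reproduced with a toy model. This is only a sketch of the observed behaviour, not of the capacitor physics: the 1.5 hour cutoff is an assumed value somewhere between the 0100 and 0200 observations, the rule that a locked day always ends in a reset is taken from the alternate-day pattern, and `boot_pattern` is a hypothetical helper.

```python
def boot_pattern(run_until_hours, cutoff_hour=1.5):
    """Return one boolean per day: True if the node boots that morning.

    run_until_hours[i] is how many hours past midnight the node would keep
    running if it boots on day i. cutoff_hour is an assumed threshold.
    """
    booted = []
    locked = False  # True when the previous night was too short for a brownout reset
    for run_until in run_until_hours:
        if locked:
            booted.append(False)
            locked = False  # assumed: after a locked day, the next night resets the micro
        else:
            booted.append(True)
            locked = run_until >= cutoff_hour  # ran too late, no reset before sunrise
    return booted

# Seven days starting 5/5/16; the fifth day (9/5/16) only runs until 0100.
pattern = boot_pattern([2.0, 2.0, 2.0, 2.0, 1.0, 2.0, 2.0])
print(pattern)  # [True, False, True, False, True, True, False]
```

With these inputs the model boots on 5/5, 7/5, 9/5 and 10/5 and stays down on the days in between and on 11/5, which matches what the graph shows.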

In an ideal situation we would build a node that can survive through the night, so there would be no need to boot up in the morning as it would still be running. It might also be possible to adjust the brownout settings further, but unfortunately the node is encased in an enormous blob of glue so there is no way to reprogram the micro. The node will therefore continue to boot on alternate days until either the lengthening days get to the point where they can power it throughout the night, or the days shorten again to the point where there is enough night to cause a reboot.