Inject RTCM 1008

Yet another bug?
After running nicely for 13:05:54 (h:mm:ss) I got this:

Mar 11 06:54:18 raspberrypi bash[476]: 2020/03/11 04:54:37 [CC---] 42069232 B 6999 bps (1) localhost/STALL
Mar 11 06:54:23 raspberrypi bash[476]: 2020/03/11 04:54:42 [CC---] 42073673 B 7206 bps (1) localhost/STALL
Mar 11 06:54:27 raspberrypi bash[476]: Traceback (most recent call last):
Mar 11 06:54:27 raspberrypi bash[476]: File "/usr/local/bin/rtcmadd1008.py", line 15, in
Mar 11 06:54:27 raspberrypi bash[476]: message_number = (packet_data[0] << 8) + packet_data[1]
Mar 11 06:54:27 raspberrypi bash[476]: IndexError: index out of range
Mar 11 06:54:27 raspberrypi bash[476]: 2020/03/11 06:54:27 socat[478] E write(1, 0xde2e68, 25): Broken pipe
Mar 11 06:54:28 raspberrypi bash[476]: 2020/03/11 04:54:47 [CC---] 42076984 B 6999 bps (1) localhost/STALL
Mar 11 06:54:33 raspberrypi bash[476]: 2020/03/11 04:54:52 [CC---] 42076984 B 0 bps (1) localhost/STALL
Mar 11 06:54:38 raspberrypi bash[476]: 2020/03/11 04:54:57 [CW---] 42076984 B 0 bps (1) timeout

And it never started working again.
(The previous bug caused a timeout, which did recover after some time…)

Seems like packet_data was quite short for some reason or another?
How on earth did we get this far this time anyway:

Was sys.stdin.buffer empty, so that

sys.stdin.buffer.read(length)
sys.stdin.buffer.read(3)

both returned nothing?
Or maybe the length was 0… I sure don’t know.

But as this thing should keep running 24/7 with high reliability, something has to be done.

So maybe just simply:

# Just to make sure that there really is enough data in packet_data
# (only indexes 0 and 1 are used, so 2 bytes are enough)
if len(packet_data) >= 2:
    message_number = (packet_data[0] << 8) + packet_data[1]
    message_number >>= 4

Hmm, ordinarily sys.stdin.buffer.read(x) should block until x bytes have arrived. The fact that it returned no bytes may indicate that the other side of the pipe has closed. In fact I did a quick little test just now: sys.stdin.buffer.read(x) will either block until it gets x bytes or, if the pipe closes, return whatever it was able to read.
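A quick test along those lines can be sketched as below. This is only an illustration, not the exact test that was run: it uses an anonymous pipe from os.pipe() as a stand-in for the real stdin pipe, and the two bytes written are arbitrary.

```python
import os

# Create an anonymous pipe and wrap both ends in buffered binary file
# objects, similar to how sys.stdin.buffer wraps the process's stdin.
read_fd, write_fd = os.pipe()
reader = os.fdopen(read_fd, "rb")
writer = os.fdopen(write_fd, "wb")

writer.write(b"\xd3\x00")  # only 2 bytes are ever written
writer.close()             # the "other side" of the pipe goes away

first = reader.read(5)     # asked for 5, gets the 2 that were available
second = reader.read(5)    # end-of-file: returns immediately with b""
third = reader.read(3)     # still end-of-file, still b""
reader.close()

print(first, second, third)
```

So a short or empty result from read() is the sign that the writer has gone away, not that the data is merely late.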

So while you can guard the read() call with an if statement ensuring the length is what is expected, if there’s just no data forthcoming on the other end, the code will likely just loop forever doing nothing. So it’s likely that adding such a thing will not solve the problem but make it harder to debug. I dunno!

My feeling is the problem lies in the code that feeds its stdout to rtcmadd1008.py. Can you post the script that is running rtcmadd1008.py?

OK

The pipe is started by systemd (systemctl).

A proxy is set up:

pi@raspberrypi:~/RpiNtripBase $ more baseProxy@.service
[Unit]
Description=Socat TCP-proxy for the serial port of the base
After=network.target
[Service]
Type=simple
Restart=always
RestartSec=10
ExecStart=/usr/bin/socat -d -d TCP-LISTEN:2102,fork,reuseaddr FILE:/dev/ttyACM0,b%i,raw
[Install]
WantedBy=multi-user.target

and the injection & str2str server to ntripcaster is started

pi@raspberrypi:~/RpiNtripBase $ more str2str-injectrtcm1008.service
[Unit]
Description=str2str with RTCM 1008 injected
After=network.target
After=ntripcaster.service
[Service]
Type=simple
Restart=always
ExecStart=/bin/bash -c "/usr/bin/socat -u TCP:localhost:2102 - | /usr/bin/python3 /usr/local/bin/rtcmadd1008.py | /usr/local/bin/str2str -out ntrips://:testi@localhost:2101/STALL"
[Install]
WantedBy=multi-user.target

But

if sys.stdin.buffer.read(n) does not return before it has n bytes,
then it must have been asked for only 0 or 1 bytes for packet_data?

Because the next line,
crc24_data = sys.stdin.buffer.read(3)
has also been executed and is not waiting,

and we have now gone down to
message_number = (packet_data[0] << 8) + packet_data[1]
message_number >>= 4
where we got the IndexError.

So what if there was a packet with:

1BYTE
Preamble

2BYTES
6 bit Reserved
10 Bit for length (this time 0 or 1)

3BYTES
CRC data

Maybe that is possible? Maybe not? But it would explain what happened.
Maybe it is best if I just wait a day or a week to see whether this IndexError shows up again, to find out how often it happens and hopefully get more details of the crime and the criminals…
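The frame layout listed above (1 preamble byte, 6 reserved bits plus a 10-bit length, then the payload and 3 CRC bytes) can be decoded roughly as follows. This is a minimal sketch; the function names payload_length and message_number are made up for illustration and are not taken from rtcmadd1008.py.

```python
def payload_length(length_data):
    """Return the 10-bit payload length from the 2 bytes after the 0xD3 preamble.

    The top 6 bits of the first byte are reserved, so they are masked off.
    """
    return ((length_data[0] & 0b00000011) << 8) | length_data[1]


def message_number(packet_data):
    """Return the 12-bit message number, or None if the payload is too short."""
    if len(packet_data) < 2:
        return None
    return ((packet_data[0] << 8) | packet_data[1]) >> 4


# A length field of 0x0014 means a 20-byte payload, whatever the reserved bits say.
print(payload_length(b"\x00\x14"), payload_length(b"\xfc\x14"))
# A zero-length payload has no message number at all.
print(message_number(b""))
```

With a length of 0 or 1 the payload cannot hold the two bytes that carry the message number, which is exactly the case that ends in an IndexError if it is not checked.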

From what I can tell, if the pipe connected to sys.stdin is closed, then you can call read() as many times as you want and it won’t block, but just return a zero-length byte array. Which is what you’re seeing here.
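One way to make that situation visible is a small wrapper that refuses to return a short read. This is a sketch only; read_exact is a hypothetical helper, not part of the original script, and io.BytesIO stands in for sys.stdin.buffer so the example runs on its own.

```python
import io


def read_exact(stream, n):
    """Read exactly n bytes from stream, or raise EOFError if it ends first."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"wanted {n} bytes, got {len(data)}")
    return data


# io.BytesIO behaves like a pipe whose writer has already closed:
# once the 3 bytes are consumed, every further read returns b"".
stream = io.BytesIO(b"\xd3\x00\x02")
header = read_exact(stream, 3)
try:
    read_exact(stream, 2)
except EOFError as exc:
    print("input ended:", exc)
```

In the real script the call would be read_exact(sys.stdin.buffer, n). The script then stops with a clear EOFError at the read that failed, instead of an IndexError a few lines later.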

It’s possible that str2str is erring out somehow, and then when the service recycles itself, the new service instance is trying to reconnect to port 2102, but since the old connection is still waiting to be shut down, the serial port is “locked” as it were and so you get no data. Do the logs show any restart on the str2str with RTCM 1008 injected service?

If that could be the problem, then you’ll have to add netcat to the pipeline. One thing I’ve discovered is that if you have socat listening on a tcp/ip port and then connecting to a serial port, only one connection can happen at a time. I have a situation where I’d like two processes to get the data from the serial port, but socat can’t do that alone. So I have to use netcat as well. For example I can do this:

socat /dev/ttyACM0,b57600,raw,echo=0 - | nc -l 2102 -k --send-only

This makes it so multiple connections to port 2102 work and they will all get copied on the data. Nothing written to port 2102 will go to the serial port which is just fine here.

Also why did you put things in two stages? Could you not pipe the serial port directly into rtcmadd1008.py instead of going through socat on port 2102? Especially since only one connection can be made to port 2102 at a time. If you do need multiple connections to port 2102, you’ll have to do the netcat trick above.

Not sure if any of this is helpful! This sort of thing is hard to track down. Any additional information from the systemd journal logs?


Why this way

Actually I was first doing it pretty much as you suggest,
running it on the console:

./str2str -in serial://ttyACM0 | ./addrtcm1008.py | ./str2str -out ntrips://:testi@raspberrypi:2101/WITH1008

but got the timeouts.
So I installed it this way: RpiNtripBase/README.md at master · eringerli/RpiNtripBase · GitHub

But the bug was in the snake’s digestive system, as we found out…

What next
I also looked it up in the blue book (ver. 10403.2):

[image: Clipboard01]

So a Data Message of 0 bytes is also possible,
and it has to be taken into consideration.

Maybe I’ll simply fix that right away, or just wait, out of pure curiosity, to see how frequent this is from an F9P.

And I did not see any service restarts. The stream killed itself when the snake got stuck on the IndexError.

The nc trick
This might be a most useful tool in many battles. It would then be more like my Trimble NetR5 receiver, which has a TCP server option alongside the NTRIP server/caster etc. options.

Yes zero byte messages are possible. But I’ve never seen one in actuality, and I would not expect to see one coming from an F9P.

I’ve had an F9P running continuously for the last 6 months (except a few power flickers) running with the arduino injector without any issues.

My bet is still on something to do with that socat that talks to the serial port. Might want to go back to the more direct method you were using before now that the length decoding issue is fixed and see what happens.

You can also add logging to the python script by writing to stderr (sys.stderr.write). In fact you can direct print() to stderr with the file= kwarg: print('Testing', file=sys.stderr). Systemd will log all stderr stuff to the journal.
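A minimal sketch of that kind of logging is below. The helper name log and the message text are made up for illustration. The point is that stdout must carry only RTCM bytes, so anything meant for a human goes to stderr.

```python
import sys


def log(message):
    """Write one diagnostic line to stderr, keeping stdout clean for RTCM bytes."""
    print(message, file=sys.stderr, flush=True)


log("rtcmadd1008: got a zero-length packet")
```

flush=True is there so the line shows up right away rather than whenever the buffer happens to be flushed.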

Also are you sure there are no other system processes that might try to touch the serial device? I had occasional problems last fall where my socat would just stop getting serial data, or if it did get data, it was scrambled somehow. Turned out that I was running the Cura slicer program for my 3D printer and by default it was scanning the serial ports from time to time, looking for a 3D printer. That killed my ntrip stuff. I know that ModemManager talks to the serial ports, but only usually once when it detects the port. Never had it conflict generally.


I was looking back at your Arduino code (with my simple reasoning).
I think it will survive zero-length messages and maybe also any other ugly stuff that might come:

It reads one byte at a time. It drops bytes on the floor until it finds the preamble, then starts counting. It finds out the message type and message length on the go,
reads and writes serial until it thinks it is all done,
then does the injection if it thinks it should,
and goes back to the beginning.

But even if something goes wrong, it just keeps on reading and writing byte by byte until it thinks it is done and goes back to the beginning. No problem.

The only accident if something goes wrong is that we might miss a message if the preamble has already been sent. No big deal. I would imagine radio RTK streams can lose messages all the time…

But it just keeps going for months and months.

    //Now pass through the rest of the message
    count = 0;
    while (count < length + 1) {
        //read in the message body, less the 2 type bytes, and then
        //the 3 CRC bytes, so length + 1.
        if (Serial.available()) {
            c = Serial.read();
            mySerial.write(c);
            count++;
        }
    }

To get this perfect we should first read in the whole message, check the CRC-24Q (Qualcomm definition) and send it on only if all is OK. Too much of a hassle and maybe also a waste of time and energy.
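For reference, such a check is not very long in Python. The sketch below assumes the usual CRC-24Q parameters (generator polynomial 0x1864CFB, initial value 0, no bit reflection, no final XOR) and a frame that has already been read in full. The names crc24q and frame_is_valid are made up for illustration.

```python
CRC24Q_POLY = 0x1864CFB  # CRC-24Q generator polynomial, including the top bit


def crc24q(data):
    """Bitwise CRC-24Q over data: initial value 0, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24Q_POLY
    return crc & 0xFFFFFF


def frame_is_valid(frame):
    """True if frame is preamble + 2 length bytes + payload + 3 matching CRC bytes."""
    if len(frame) < 6 or frame[0] != 0xD3:
        return False
    length = ((frame[1] & 0x03) << 8) | frame[2]
    if len(frame) != 3 + length + 3:
        return False
    return crc24q(frame[:-3]) == int.from_bytes(frame[-3:], "big")


# Build a frame around an arbitrary 3-byte payload and check it round-trips.
body = b"\xd3\x00\x03" + b"\x3e\xd0\x00"
frame = body + crc24q(body).to_bytes(3, "big")
print(frame_is_valid(frame))
```

The CRC covers the preamble and the length bytes as well as the payload. The same function could be used to double-check a hand-built 1008 frame before hard-coding it.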


Just an update on the testing stage:

This time it ran about 11 hours before hitting the IndexError…

Now I have set it up with:

    #!/usr/bin/python3

    import sys

    while True:
        # Skip bytes until the RTCM3 preamble (0xD3) shows up
        data = sys.stdin.buffer.read(1)
        while data != b'\xd3':
            data = sys.stdin.buffer.read(1)

        # 6 reserved bits (masked off) + 10-bit message length
        length_data = sys.stdin.buffer.read(2)
        length = ((length_data[0] & 0b00000011) << 8) + length_data[1]
        packet_data = sys.stdin.buffer.read(length)
        crc24_data = sys.stdin.buffer.read(3)

        # Reset for every packet so a short packet cannot reuse the previous number
        message_number = None
        if length >= 2:
            message_number = (packet_data[0] << 8) + packet_data[1]
            message_number >>= 4

        # Pass the original packet through unchanged
        sys.stdout.buffer.write(b'\xd3')
        sys.stdout.buffer.write(length_data)
        sys.stdout.buffer.write(packet_data)
        sys.stdout.buffer.write(crc24_data)
        sys.stdout.flush()

        if message_number == 1005:
            # 1008 message: ADVNULLANTENNA (station id = 0)
            sys.stdout.buffer.write(bytes([0xd3,0x00,0x14,0x3f,0x00,0x00,0x0e,0x41,0x44,0x56,0x4e,0x55,0x4c,0x4c,0x41,0x4e,0x54,0x45,0x4e,0x4e,0x41,0x00,0x00,0x79,0x06,0x89]))
            sys.stdout.flush()

So possible 0-length messages should no longer cause an IndexError.
I also changed the content of the 1008 message.

Anyway, it’s up and running again. And with great interest I’m waiting to see how long the modified USB stream from the F9P stays alive this time.

I just wonder if

    sys.stdout.buffer.write(packet_data)

will cause a problem if the packet is empty? If it does, then I suppose we might just drop the whole packet on the floor and start from the beginning again, or skip this line in that case…

The need for two-way serial in eringerli/RpiNtripBase comes from the idea that the base can be configured at startup with a configuration file.

But I suppose if one makes a permanent base, just listening to the base would be enough.


Yes write() is just fine with a zero-length byte array.

What was the traceback for the latest IndexError exception?

One strategy you could use would be to put a try/except block around everything after the preamble loop. I recommend only catching the IndexError exception, though, lest we hide other problems or bugs. After catching the IndexError exception just maybe make a note in the log with sys.stderr, and then let it loop back around to the preamble again. As you say all it would do is lose or corrupt a packet.
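That strategy could look roughly like the sketch below. It is an illustration only, not a drop-in replacement: the function name pass_through is made up, the MSG_1008 bytes are copied from the script posted earlier in the thread, and the function takes its streams as arguments so it can be tried with io.BytesIO instead of real pipes. It also returns at end of input, which is an addition of mine rather than something discussed above.

```python
import sys

# 1008 frame (ADVNULLANTENNA, station id 0), bytes as posted in the thread.
MSG_1008 = bytes([0xd3,0x00,0x14,0x3f,0x00,0x00,0x0e,0x41,0x44,0x56,0x4e,0x55,0x4c,
                  0x4c,0x41,0x4e,0x54,0x45,0x4e,0x4e,0x41,0x00,0x00,0x79,0x06,0x89])


def pass_through(inp, out, err=sys.stderr):
    """Copy RTCM frames from inp to out, appending MSG_1008 after each 1005."""
    while True:
        byte = inp.read(1)
        if byte == b"":        # end of input: stop instead of spinning
            return
        if byte != b"\xd3":    # not a preamble: drop it on the floor
            continue
        try:
            length_data = inp.read(2)
            length = ((length_data[0] & 0x03) << 8) + length_data[1]
            packet_data = inp.read(length)
            crc24_data = inp.read(3)
            message_number = ((packet_data[0] << 8) + packet_data[1]) >> 4
        except IndexError:
            # Short read or zero-length payload: note it and resync on the preamble.
            print("short or empty RTCM frame skipped", file=err)
            continue
        out.write(b"\xd3" + length_data + packet_data + crc24_data)
        if message_number == 1005:
            out.write(MSG_1008)
        out.flush()
```

In the service it would be called as pass_through(sys.stdin.buffer, sys.stdout.buffer). Note that in this version a zero-length frame is dropped rather than passed on, which matches the "lose a packet" trade-off described above.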

The addrtcm1008.py seems to work well now after the 2 bugs were fixed:

The first one made it read too much data if the 6 bits before the Message Length field were not 0.
The second caused an IndexError if the Message Length was 0.

Now it has been running about 100 hours on my Raspberry Pi without any errors. So it should be OK.


So, is the bug fix also available on eringerli’s GitHub?

The new RTKLIB b34 can insert 1008 + 1230 messages.
I created 2 different mountpoints, old RTCM 3.2 and new RTCM 3.3 MSM7, and it works with Trimble.


@arkeston do you use the F9P with a Windows PC, which Trimble devices have you tested with, and what is your mountpoint, if you want to share it? Thanks in advance.

@arkeston does this work with the official demo5 b33e release?
Or do we have to use the b34 version on GitHub (demo5 b34 dev)?

@pniels the RTKLIB programs with a graphical interface work only on Windows. But there are also versions of the apps without a GUI for Linux.
Here is my post with RTKLIB installed on a Raspberry Pi 4.

Maybe RTKBase will also be able to inject the 1008 message? @Stefal can you give some details?

I just installed RTKLIB 2.4.3 b34; it looks like it is running on mountpoint PNRTK, if anyone wants to check it with a Trimble for fun.

The next RTKBase release will use rtklib 2.4.3-b34 so my answer is yes!


I use a ZED-F9P (ArduSimple board) on a Windows PC with SNIP software (NTRIP server).
STRSVR gets the u-blox RTCM 3.3 stream (1005(1), 1077(1), 1078(1), 1097(1), 1127(1), 1230(1), 4072(1)) and feeds it to two mountpoints with different names on the SNIP server.
1st stream (RTCM 3.2): 1004(1), 1006(10), 1008(10), 1012(1), 1033(10), 1230(10), with the option "-opt TADJ=1.500".
2nd stream (RTCM 3.3 MSM7): 1005(10), 1008(10), 1077(1), 1087(1), 1097(1), 1127(1), 1230(10).
Because I’m a geodesist, I have tested it with my Trimble-based geodetic receivers: South S86t (based on the old Trimble BD960 board with GPS+GLO support) and South S660 (based on the Trimble BD970 with GPS+GLO+BDS support).
The old Trimble BD960-based device gets a FIX on the RTCM 3.2 stream.
The Trimble BD970-based board gets a FIX on the RTCM 3.3 MSM7 stream.
Both streams give a FIX with the Emlid devices Reach RS (M8P, L1 based) and Reach RS2 (ZED-F9P based).


I used b34: GitHub - tomojitakasu/RTKLIB_bin at rtklib_2.4.3
This version of RTKLIB can inject/create 1008 and 1230 messages.
No need to use the Python script.


@arkeston I send one of my mountpoints (PNRTK) directly from STRSVR to rtk2go.com.

Because I have a public ("white") IP address, I use SNIP Lite (free version) for my needs.