The main problem is the kinematic model used to calculate the steering angle (as pointed above) which cannot account for wheels slipping but is assuming a purely kinematic relation. So you end up giving a steer angle command based on a model that’s assuming the wheels are glued to the ground while they are not. The second is maybe how the fusion is working between the IMU and GPS, now it’s trying to find a midway between the two. There’s two options:
Trying to get the vehicle attitude as an output, this is basically telling where the nose is pointing. Not really necessary here, except if you go to some more refined control algorithms that can utilize the slip angle information acquired.
Second, like it’s working now, is to try to fuse the signals to reflect the true course of the vehicle. This gives you the true heading of the vehicle, which is basically what’s happening when you use the dual antenna as IMU. More importantly, you can resolve the yaw rate bias, i.e. the correction to the yaw rate measured by the gyro, which produces the correct global motion. So this way you get the “true” yaw rate of the vehicle, which you can use to control the vehicle even without a WAS as discussed here AOG without wheel angle sensor. Or you can use it to calculate how much more you need to turn the wheels to achieve the desired steering rate is you have a WAS, etc.
As the fusion needs to happen at a high rate, you need to do it on the receiver, IMU, or arduino, as you are updating the heading at say 8 Hz RTK rate, and utilizing maximum IMU update rate available to update with the IMU signal. This is what’s is roughly happening inside the commercial high-end RTK receivers that have a built-in IMU, you have position, velocity, attitude, angular velocity and accelerations as state variables, then you augment the state space with the biases for accelerations and angular velocities. For those you use a discrete time state transition model based on more or less Newton’s law and run the prediction step based on those. For the measurements you’d use for example GPS for position and yaw angle, IMU for acc/angular acc, something for velocity (in cars you typically use the ABS sensor speeds) etc. The performance of the filter is then much up to how well you choose the covariance matrices for the model and measurements. These basically define how much you trust the measurement and the model, these can and should be time varying. So if you get gargabe in from GPS for example, the variance for the yaw data measured goes up and the filter is trusting more on the model instead.
Here’s a good MSc thesis on the topic to get started on sensor fusion: https://aaltodoc.aalto.fi/bitstream/handle/123456789/47095/master_Sari_Onur_2020.pdf?sequence=1&isAllowed=y