Open Whisper Systems (OWS), the nonprofit group of cryptographers and software engineers behind the Signal end-to-end encrypted messenger, announced that it has overhauled the app’s calling infrastructure. The change brings video calls (in beta) to Signal, along with improved voice calling and a more unified backend.
New Calling Infrastructure
Before Signal, there were two separate applications, both made by OWS: TextSecure and RedPhone. TextSecure sent end-to-end encrypted messages, and RedPhone made end-to-end encrypted voice calls. The group eventually merged the two into Signal to simplify the experience for users. The strategy worked, and in the past couple of years, the application’s popularity has surged.
However, according to the team, the codebases for the two apps remained relatively separate even after the user interface was unified. The team is now further unifying the codebases with a new backend architecture.
In the early days of RedPhone, the Session Initiation Protocol (SIP) was used as the signaling mechanism that starts a voice conversation, whereas the voice streaming was done through the Secure Real-time Transport Protocol (SRTP).
However, according to OWS, SIP wasn’t well suited to encrypted voice calls on phones, because it maintains an open, long-lived session, which isn’t compatible with the mobile environment. In other words, the VoIP (Voice over IP) application would have to maintain a constant connection to the service’s servers so that users could be notified about incoming calls.
Therefore, the Signal team developed its own short-lived signaling protocol, which was coupled with push notifications to announce an incoming call to the user. Initially, the push notifications were delivered via SMS. At the time, there weren’t any platform-wide push notification systems on either iOS or Android. Later on, Signal switched to Apple's and Google's systems when those were developed.
As for the streaming part of the code, Signal had already started using WebRTC components. The new update completes the transition, with Signal switching fully to WebRTC. Signal is also switching to its own messaging channel for the signaling pathway and the call setup.
Just like end-to-end encrypted text messaging, voice calling requires user authentication if it is to be fully protected from a man-in-the-middle (MITM) attack. RedPhone, and later Signal, used the ZRTP protocol for this. The protocol was developed by Phil Zimmermann, who also created PGP (Pretty Good Privacy).
To authenticate each other, both users in a call would see two words, called a Short Authentication String (SAS), and read them aloud. If the words differed between the two ends, that would mean the call was being intercepted. This solution worked quite well, but according to OWS, it felt bolted on, because Signal already has a different authentication solution for text messages. The group also believes users shouldn’t have to verify one extra thing.
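The idea behind the SAS comparison can be sketched in a few lines of Python. This is only an illustration, not ZRTP's actual derivation: the tiny word list, the use of SHA-256, and the function name short_auth_string are all assumptions made for the example. The point is that both ends turn the key material they negotiated into two words, and an attacker sitting in the middle ends up sharing different key material with each party, so the words no longer match.

```python
import hashlib
from typing import Tuple

# Hypothetical 16-word list for illustration only; real SAS schemes
# use a much larger, standardized word list.
WORDS = [
    "apple", "brick", "cedar", "delta", "ember", "flint", "grove", "harbor",
    "ivory", "jewel", "kite", "lemon", "maple", "north", "olive", "pearl",
]


def short_auth_string(shared_secret: bytes) -> Tuple[str, str]:
    """Map key material to two words (illustrative, not the ZRTP algorithm)."""
    digest = hashlib.sha256(shared_secret).digest()
    # Use the first two digest bytes to pick one word each.
    return WORDS[digest[0] % len(WORDS)], WORDS[digest[1] % len(WORDS)]


# Without an attacker, both ends hold the same secret and see the same words.
alice_words = short_auth_string(b"secret negotiated by alice and bob")
bob_words = short_auth_string(b"secret negotiated by alice and bob")
print(alice_words, bob_words, alice_words == bob_words)
```

With only 16 words, two different secrets will occasionally map to the same pair, which is one reason real schemes draw from a far larger list.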
OWS said that the app doesn’t need ZRTP for authentication anymore, because the security of the call setup is now provided by the security of the Signal messaging channel. This means verifying an extra SAS is no longer necessary, which simplifies the calling experience.
The group also considered how the audio packets are encoded, because variable bitrate codecs can open the door to side-channel attacks: packet sizes vary with what is being said, and those sizes remain visible to an eavesdropper even after encryption. Therefore, the team updated the Signal audio codec from Speex to Opus. The Opus codec will be used with a constant bitrate (CBR) rather than a variable bitrate (VBR), which should minimize information leaks.
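To see why the bitrate mode matters, consider a toy model in Python. It is a sketch under stated assumptions, not how Speex or Opus actually allocate bits: the per-frame "complexity" numbers and both size functions are hypothetical. It only shows what an eavesdropper who can observe encrypted packet lengths would learn in each mode.

```python
from typing import List

# Hypothetical per-frame speech complexity: 0 is silence, higher is busier audio.
frames = [0, 0, 12, 15, 3, 0]


def vbr_packet_sizes(complexities: List[int]) -> List[int]:
    """Toy VBR encoder: busier frames produce larger packets."""
    return [10 + 2 * c for c in complexities]


def cbr_packet_sizes(complexities: List[int], size: int = 40) -> List[int]:
    """Toy CBR encoder: every frame produces a packet of the same size."""
    return [size for _ in complexities]


vbr = vbr_packet_sizes(frames)
cbr = cbr_packet_sizes(frames)

# Encryption hides the payload but not its length, so the observer
# sees exactly these size sequences on the wire.
print("VBR sizes:", vbr, "distinct:", len(set(vbr)))
print("CBR sizes:", cbr, "distinct:", len(set(cbr)))
```

In the VBR case the sequence of sizes mirrors the shape of the speech, while in the CBR case every packet looks the same regardless of what was said.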
One of the main features that Signal has lacked is the ability to make video calls. The feature has now arrived in beta, and you can enable it in the app by going to Settings > Advanced > Video calling beta. OWS is gradually deploying the feature, so it may take a few days to roll out to everyone.