SIP Conferencing – How does it work?

Voice and video conferencing solutions are very popular. The Session Initiation Protocol (SIP) is the key to vendor-independent and future-proof solutions. This blog post explains the concepts of SIP-based conference control and media exchange.


Introduction to SIP

The Session Initiation Protocol (SIP) is a signaling protocol for Voice over IP systems. It is specified in RFC 3261 by the Internet Engineering Task Force (IETF) and belongs to the TCP/IP protocol family of the Internet.

SIP enables users to establish, maintain, modify and release multimedia sessions.

A session may carry information in one direction only (simplex), as in paging systems, sound systems and video surveillance systems, or in both directions simultaneously (full duplex), as in phone calls with audio only or with audio and video.
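In SIP, the media of a session are usually described in an SDP body, and the direction of a stream is expressed with the attributes sendonly, recvonly and sendrecv. The following is a minimal sketch of that idea, not a complete SDP implementation; the host address, port and session name are illustrative assumptions.

```python
# Minimal sketch: SDP bodies for a one-way (simplex) and a two-way
# (full-duplex) audio session. Address, port and names are assumptions.

def build_sdp(host: str, port: int, direction: str) -> str:
    """Return an SDP description offering one PCMU audio stream."""
    if direction not in ("sendonly", "recvonly", "sendrecv", "inactive"):
        raise ValueError(f"unknown direction: {direction}")
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {host}",
        "s=example",
        f"c=IN IP4 {host}",
        "t=0 0",
        f"m=audio {port} RTP/AVP 0",  # payload type 0 = PCMU (G.711 mu-law)
        "a=rtpmap:0 PCMU/8000",
        f"a={direction}",             # direction of the media stream
    ]
    return "\r\n".join(lines) + "\r\n"


# Simplex, for example a paging announcement that is only sent.
paging_offer = build_sdp("192.0.2.10", 49170, "sendonly")

# Full duplex, for example an ordinary phone call.
phone_offer = build_sdp("192.0.2.10", 49170, "sendrecv")
```

The only difference between the two offers is the last attribute line, which is what tells the other side whether media flows one way or both ways.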

SIP supports not only point-to-point connections but also multipoint connections (“conferences”). This article describes how such conferences can be implemented using SIP.


SIP Conferencing

Tasks of SIP

SIP initiates a session or a conference.

It allows a terminal to join a conference or to leave it.

Terminals use SIP to invite other participants to a conference or to release participants from it.

Terminals use SIP to negotiate the type of media stream, the coding techniques and similar parameters. SIP also conveys the status of the conference to its participants.
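The tasks listed above can be related to the SIP requests that typically carry them. The table below is a rough sketch; the task names are illustrative labels chosen for this example, not protocol terms, and real deployments may combine the methods differently.

```python
# Sketch: which SIP request typically carries each conferencing task.
# The keys are illustrative labels (assumptions), the values are SIP methods.
TASK_TO_METHOD = {
    "create or join a conference": "INVITE",
    "leave a conference": "BYE",
    "ask someone else to join": "REFER",
    "change media or codecs": "INVITE",  # a re-INVITE inside the existing dialog
    "subscribe to conference status": "SUBSCRIBE",
    "deliver conference status": "NOTIFY",
}


def method_for(task: str) -> str:
    """Look up the SIP method usually used for a conferencing task."""
    return TASK_TO_METHOD[task]
```

For example, `method_for("leave a conference")` returns `"BYE"`.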


Conflicting Tasks of SIP

On the one hand, SIP is used for central signaling; on the other hand, it offers the advantage of a decentralized concept. This means that different parties can manipulate sessions via SIP.

The media stream in the IP network can either be transported via unicast sessions or be distributed over a multicast-enabled infrastructure.

An important task is mixing the signals from different sources so that all conference participants can “listen” simultaneously (provided the transmission is full duplex). The media are mixed either centrally (in a dedicated “box”) or locally in the individual devices.

A conference can be started ad hoc or scheduled in advance. Each case requires different functions (for example, calendar integration).

Participants should be able to dial into the conference from outside (dial-in) or be invited as external participants from within the conference (dial-out). Peer-to-peer conferencing is also conceivable.
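Dial-in and dial-out can be illustrated with two requests. In this hedged sketch, dial-in is an INVITE sent by the participant to the conference address, and dial-out is requested with a REFER that points the conference at the new participant. All URIs and the host name are hypothetical, the requests carry no SDP body, and nothing is sent over the network.

```python
import uuid


def build_request(method, request_uri, from_uri, to_uri, extra_headers=()):
    """Assemble a minimal, body-less SIP request as text (it is not sent)."""
    lines = [
        f"{method} {request_uri} SIP/2.0",
        "Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK"
        + uuid.uuid4().hex[:10],
        "Max-Forwards: 70",
        f"From: <{from_uri}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{to_uri}>",
        f"Call-ID: {uuid.uuid4().hex}",
        f"CSeq: 1 {method}",
        f"Contact: <{from_uri}>",  # simplification: a real Contact names the device
        *extra_headers,
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


CONF = "sip:conf-1234@conference.example.com"  # hypothetical conference URI
ALICE = "sip:alice@example.com"                # hypothetical participants
BOB = "sip:bob@example.net"

# Dial-in: Alice calls the conference address herself.
dial_in = build_request("INVITE", CONF, ALICE, CONF)

# Dial-out: Alice asks the conference to call Bob by referring it to his URI.
dial_out = build_request("REFER", CONF, ALICE, CONF, [f"Refer-To: <{BOB}>"])
```

A real INVITE would normally also carry a session description, which is left out here to keep the two cases easy to compare.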

Inviting conferees must be simple, and releasing them from the conference should be just as simple.

At the end of the conference, all participants must be clearly informed that it has ended.


Centralized Conference Control

A central conference bridge (Multipoint Control Unit, MCU) controls the SIP dialogs with all participants. Each SIP device (SIP User Agent) “sees” only a point-to-point SIP session with the bridge.

RFC 4353 calls this arrangement a “tightly coupled conference” because a central entity has full control over the conference.

Centralized conference control can also be carried out directly in a terminal, which then acts as the conference center. In the terminology of RFC 4353, the terminal then hosts the “focus” of the conference.


Decentralized Conference Control

In a mesh network, all stations are equal and communicate directly with each other via SIP. RFC 4353 refers to this as “fully distributed multiparty conferencing”: each participant maintains its own SIP signaling relationship with every other participant, and there is no central point of control.
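The practical difference between the two control models shows up when the signaling relationships are counted. The following back-of-the-envelope sketch assumes exactly one relationship per participant in the centralized case and one per pair of participants in the mesh; the function names are made up for this example.

```python
def star_dialogs(n: int) -> int:
    """Centralized control: one SIP dialog between the center and each participant."""
    return n


def mesh_dialogs(n: int) -> int:
    """Decentralized control: one signaling relationship per pair of participants."""
    return n * (n - 1) // 2


for n in (3, 5, 10):
    print(n, star_dialogs(n), mesh_dialogs(n))
# prints:
# 3 3 3
# 5 5 10
# 10 10 45
```

The centralized model grows linearly with the number of participants, while the mesh grows quadratically, which is one reason why full meshes are mostly practical for small groups.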


Centralized Exchange of the Media Stream

Each conference participant sends its media stream to the central conference bridge and receives from it the mixed signal of the other participants.

If the conference controller is a terminal, that terminal is also responsible for forwarding the media streams to all conference participants.
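Whether the mixing is done by a bridge or by a terminal, the principle is the same: every participant receives the sum of all other participants, without its own signal. A minimal sketch of this step for one frame of audio follows; it assumes 16-bit PCM samples, frames of equal length and at least one participant, and it ignores everything a real mixer needs around it (decoding, jitter buffering, gain control).

```python
def mix_minus(frames: dict[str, list[int]]) -> dict[str, list[int]]:
    """For each participant, sum the samples of all *other* participants.

    `frames` maps a participant name to one frame of 16-bit PCM samples.
    """
    def clip(x: int) -> int:
        # Keep the result inside the 16-bit signed sample range.
        return max(-32768, min(32767, x))

    length = len(next(iter(frames.values())))
    # Sum of everyone, computed once per sample position.
    total = [sum(f[i] for f in frames.values()) for i in range(length)]
    # Subtract each participant's own samples to get "everyone else".
    return {
        name: [clip(total[i] - own[i]) for i in range(length)]
        for name, own in frames.items()
    }


frames = {
    "alice": [1000, -2000, 30000],
    "bob": [500, 500, 10000],
    "carol": [-100, 0, 5000],
}
mixed = mix_minus(frames)
# mixed["alice"] == [400, 500, 15000]   (bob + carol only)
# mixed["bob"]   == [900, -2000, 32767] (last sample clipped)
```

Computing the total once and subtracting the participant's own contribution avoids summing the streams separately for every listener.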


Decentralized Exchange of the Media Stream

In a mesh, all stations are equal and exchange their media streams directly with every other conference participant (via point-to-point connections).

Another variant is to transmit the media stream over a multicast-enabled network infrastructure. Here, rendezvous points are established in the IP network; they manage the membership of a multicast group, that is, participants joining and leaving it. These points also distribute the signals from each subscriber to the other subscribers.
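From the point of view of a terminal, joining and leaving a multicast group is a matter of two socket options; the operating system then signals the membership to the network (IGMP in the case of IPv4). The sketch below uses Python's standard socket module. The group address and port a conference would use are assumptions to be filled in by the caller, and whether the join succeeds depends on the local network.

```python
import ipaddress
import socket
import struct


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack the ip_mreq structure used to join or leave an IPv4 multicast group."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def join_group(group: str, port: int) -> socket.socket:
    """Open a UDP socket and ask the network to deliver traffic for `group`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def leave_group(sock: socket.socket, group: str) -> None:
    """Leave the multicast group again and close the socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP,
                    membership_request(group))
    sock.close()
```

A receiver would call something like `join_group("239.1.2.3", 5004)`, read media packets with `recvfrom`, and call `leave_group` when it leaves the conference; both values here are placeholders.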


RFC 4353 SIP Conferencing Framework

RFC 4353 describes “A Framework for Conferencing with the Session Initiation Protocol (SIP)”. The central point of the conference is called the “focus”. The focus maintains a separate SIP dialog with each participant and contains all the functions for a conference. Users who subscribe to the “Conference Notification Service” receive status messages about the conference. A “Conference Policy Server” manages user rights.
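The interplay of these elements can be modeled in a few lines. The following toy model is only a sketch of the roles named above and speaks no real SIP: the class and method names are invented for this example, the policy server is reduced to a set of allowed addresses, and "sent" messages are merely written to a log.

```python
from dataclasses import dataclass, field


@dataclass
class Focus:
    """Toy model of a conference focus; no real SIP is spoken here."""
    conference_uri: str
    allowed: set[str]                                    # stands in for the policy server
    participants: set[str] = field(default_factory=set)  # one dialog per participant
    subscribers: set[str] = field(default_factory=set)   # notification service users
    log: list[str] = field(default_factory=list)         # messages that would be sent

    def subscribe(self, uri: str) -> None:
        self.subscribers.add(uri)

    def _notify(self, event: str) -> None:
        for uri in sorted(self.subscribers):
            self.log.append(f"NOTIFY {uri}: {event}")

    def join(self, uri: str) -> bool:
        if uri not in self.allowed:  # policy check: user rights
            return False
        self.participants.add(uri)
        self._notify(f"{uri} joined")
        return True

    def leave(self, uri: str) -> None:
        self.participants.discard(uri)
        self._notify(f"{uri} left")

    def end(self) -> None:
        # Release every participant, then tell subscribers the conference is over.
        for uri in sorted(self.participants):
            self.log.append(f"BYE {uri}")
        self.participants.clear()
        self._notify("conference ended")


focus = Focus("sip:conf-1234@conference.example.com",
              allowed={"sip:alice@example.com", "sip:bob@example.net"})
focus.subscribe("sip:alice@example.com")
focus.join("sip:alice@example.com")
focus.join("sip:bob@example.net")
rejected = focus.join("sip:mallory@example.org")  # not allowed by policy
focus.end()
```

After this run, the log contains two join notifications for Alice, one BYE per participant and a final "conference ended" notification, while the third join attempt is refused by the policy check.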


Availability of Solutions

A number of manufacturers worldwide offer SIP-enabled conferencing solutions for different application areas and sizes. One must distinguish between solutions for providers, large corporate clients, small and medium-sized enterprises and private users.

Every major Voice over IP and unified communications solution provider has offered conference systems in various forms for a long time. Whether these systems communicate via SIP or use proprietary protocols must be clarified case by case.

Cloud-based solutions have been booming for several years.


Books / Courseware:

SIP – The Key to VoIP
Part 1: ISBN/EAN13: 1483979172 / 9781483979175
Part 2: ISBN/EAN13: 1483985903 / 9781483985909
Additional books and courseware about business VoIP

Seminar “SIP Protocol – Details”

RFCs: 3261, 4353


About the author

Ronald Schlager is an independent trainer, consultant, book author and blogger on communications technologies and their applications.

