Please note: this was the first time I worked on a USB and ALSA driver. Everything I write here reflects my understanding after reading existing source code and documentation. Hence, some explanations might not be fully accurate, but they should give you a good understanding of how things work.

If you buy a USB sound card today, there is a good chance that the device will work out of the box. This is thanks to the USB device class definition for audio devices, which defines a standard interface for audio devices over USB and has been supported by the Linux kernel for a long time. However, if the device does not use the standard audio interface, like the Behringer BCD2000 in this article, it won't work without its own driver, and the only thing you will see in dmesg is something like this:

[69750.030141] usb 3-2: new full-speed USB device number 5 using xhci_hcd
[69750.202045] usb 3-2: New USB device found, idVendor=1397, idProduct=00bd
[69750.202047] usb 3-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[69750.202048] usb 3-2: Product: BCD2000     
[69750.202050] usb 3-2: Manufacturer: Behringer

You can investigate this a little bit further by executing lsusb -v -d 1397:00bd in a terminal:

Bus 003 Device 005: ID 1397:00bd BEHRINGER International GmbH 
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               1.00
  bDeviceClass            0
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0         8
  idVendor           0x1397 BEHRINGER International GmbH
  idProduct          0x00bd
  bcdDevice            0.00
  iManufacturer           1 Behringer        
  iProduct                2 BCD2000     
  iSerial                 0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           46
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0x80
      (Bus Powered)
    MaxPower              100mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           4
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      0
      bInterfaceProtocol      0
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x01  EP 1 OUT
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            1
          Transfer Type            Isochronous
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0180  1x 384 bytes
        bInterval               1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x83  EP 3 IN
        bmAttributes            1
          Transfer Type            Isochronous
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0180  1x 384 bytes
        bInterval               1

The interesting part is here: bInterfaceClass 255 Vendor Specific Class. This means that the device indeed uses a proprietary interface. As no other module in the kernel claims to support the device, we have to write our own module.
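To make the endpoint list in the lsusb output easier to read, here is a minimal Python sketch that decodes the bEndpointAddress and bmAttributes fields. The bit positions follow the USB specification. The function name decode_endpoint and the list BCD2000_ENDPOINTS are made up for this illustration, and the values are copied by hand from the output above.

```python
# Sketch: decode the endpoint fields printed by lsusb.
# decode_endpoint and BCD2000_ENDPOINTS are hypothetical names for this example.

# Transfer type is encoded in bits 0..1 of bmAttributes.
TRANSFER_TYPES = ("Control", "Isochronous", "Bulk", "Interrupt")


def decode_endpoint(b_endpoint_address, bm_attributes):
    """Return (endpoint number, direction, transfer type)."""
    number = b_endpoint_address & 0x0F                         # bits 0..3
    direction = "IN" if b_endpoint_address & 0x80 else "OUT"   # bit 7
    transfer = TRANSFER_TYPES[bm_attributes & 0x03]            # bits 0..1
    return number, direction, transfer


# (bEndpointAddress, bmAttributes) pairs copied from the lsusb output
BCD2000_ENDPOINTS = [(0x01, 3), (0x81, 3), (0x02, 1), (0x83, 1)]

for address, attributes in BCD2000_ENDPOINTS:
    print(hex(address), decode_endpoint(address, attributes))
```

The device therefore exposes one interrupt pair (endpoint 1, both directions) and two isochronous endpoints (2 OUT and 3 IN).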

Initialization

The first thing we need to do is to find out how we have to talk with the device. As there is no documentation regarding the protocol, we can only try to observe how the official Windows driver communicates with the device. Luckily, eavesdropping on a USB communication is quite simple today if you have Windows in a virtual machine on a Linux host. The usbmon kernel module enables a separate process to follow an arbitrary conversation with a USB device and the versatile network packet analyzer Wireshark is able to connect to usbmon and collect the packet traffic between the virtual machine and the device. Hence, we load usbmon with modprobe usbmon as root user and start Wireshark.

Next, we need to find the right USB bus in Wireshark. You can either check the output of dmesg or lsusb, or simply watch Wireshark's traffic monitor for some time while you cause traffic on the bus, e.g., by playing a sound in the virtual machine:

In this example, we see that, in the end, there is only traffic on bus number 3, which corresponds to usbmon3. If you double-click on usbmon3, Wireshark starts capturing the traffic.

The first thing we need for the new driver is the initialization sequence. We should see the corresponding packets if we unplug and plug the USB device. In this article, we use VirtualBox to execute Windows in a virtual machine (VM) on a Linux host. If the Windows VM is running, you can attach and detach a USB device in the VM menu under Devices -> USB devices without having to physically unplug the device. After detaching and attaching the device again, we usually see a lot of packets in Wireshark:

The first packets that are exchanged are independent of the specific protocol and just share some general information between the host and the device. The first packet that is of interest to us is marked in the image above. The source and destination columns show whether the host or the device sent the packet. The length shows how many bytes the packet contains and the info column shows a summary of the packet's content. In Linux USB terminology, such a packet is usually referred to as a USB request block (URB). URBs can have different types: control, interrupt, bulk and isochronous (ISO) URBs. Interrupt URBs are used to transfer messages on certain events. Bulk and isochronous URBs transfer larger amounts of data. The difference between them is that for isochronous URBs, bandwidth is reserved to guarantee timely transmission, e.g., for time-critical data like audio or video streams, while bulk URBs are transferred on a best-effort basis, e.g., to transfer files to an external hard disk. The disadvantage of isochronous transfers is that if the bandwidth cannot be guaranteed, sending a URB fails immediately. Hence, one has to consider carefully how much bandwidth is required.
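As a rough worked example of the bandwidth consideration, the following Python sketch estimates how much of a full-speed bus the two isochronous endpoints of this device could reserve. It assumes one packet of wMaxPacketSize bytes per 1 ms frame (which is how I read bInterval 1 on a full-speed isochronous endpoint) and ignores all protocol overhead, so the numbers are only an approximation. All names are made up for this illustration.

```python
# Back-of-the-envelope estimate, not an exact USB bandwidth calculation.
# Assumptions: full-speed bus (12 Mbit/s), one isochronous packet per 1 ms
# frame, protocol overhead ignored.

FULL_SPEED_BIT_RATE = 12_000_000   # bits per second
FRAMES_PER_SECOND = 1_000          # one full-speed frame every millisecond


def bytes_per_second(max_packet_size, packets_per_frame=1):
    """Upper bound on the data rate reserved for one endpoint."""
    return max_packet_size * packets_per_frame * FRAMES_PER_SECOND


# Raw capacity of one frame in bytes
RAW_BYTES_PER_FRAME = FULL_SPEED_BIT_RATE // 8 // FRAMES_PER_SECOND

ISO_MAX_PACKET = 0x0180            # 384 bytes, from the lsusb output

iso_rate = bytes_per_second(ISO_MAX_PACKET)
# Share of each frame taken by the "in" and the "out" endpoint together
iso_share = (2 * ISO_MAX_PACKET) / RAW_BYTES_PER_FRAME

print("per endpoint:", iso_rate, "bytes/s")
print("share of raw frame capacity:", iso_share)
```

Under these assumptions, the two isochronous endpoints together can claim roughly half of the raw frame capacity, which illustrates why a reservation can fail on a busy bus.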

The marked URB is sent from the host to the device, has a length of 116 bytes and is an interrupt URB. The "out" or "in" refers to the endpoint the URB belongs to. As we saw in the lsusb output above, this USB device has four endpoints. Each endpoint has a transfer type (interrupt or isochronous in this case), a direction ("in" or "out"), an address and a maximum packet size. The "interrupt out" endpoint is used to send interrupts from the host to the device and the "interrupt in" endpoint is used by the device to send data to the host. To make things more complicated, a URB can be sent in both directions for each "in" or "out" endpoint. The difference between the two endpoints is who is in charge of initiating a regular transfer. As we see in the above image, the host initially sends a URB to both endpoints. However, only the URB that belongs to the "out" endpoint contains actual initialization data, while the URB to the "in" endpoint contains uninitialized data. The logic behind this is that a URB is used like a token. The URB can be passed between host and device, and only the side that currently holds the URB (or token) is allowed to write data into it. Hence, before the device can send actual data over its "in" endpoint, the host has to pass the token to the device.

A first interesting observation here is that only interrupt URBs are exchanged and there are no isochronous transfers, which one would expect for audio data. To get a rough idea of what is happening here, a first attempt to gain additional information is to just detach and reattach the device several times and compare the data payloads of the URBs. This is a valid operation and should not confuse the internal state machines. If we instead just replayed observed packets or sent random data, we could, with some bad luck, trigger a firmware flashing routine, for example. After comparing the first packets, it looked like some bytes at certain positions always stay the same, some vary randomly and some switch between certain values. As I could not associate the fixed values with anything meaningful, I just took the data payload from one session and stored the values for later.

Controlling knobs, sliders and LEDs

The next thing I did was to press some random buttons and rotate some knobs. For every press, additional interrupt URBs appeared in the list. Hence, the device obviously uses interrupt URBs to indicate such events. In the following picture, we see the raw data of a URB with its payload highlighted:

The bytes before the payload belong to the actual URB structure, similar to a header in network packets. Wireshark also decodes these values for you if you unfold "USB URB" in the upper frame. In this example, we see that the payload is 60 bytes long, which is quite a lot of information for a single pressed button. After looking at a few URBs, I noticed that only some of the bytes change although I press different buttons. For example, if we only look at the first 8 bytes of a URB, we get the following byte sequences with one URB payload per row:

01 90 00 71 03 fa c2 98
02 1b 7f 71 03 fa c2 98
01 1b 7f 71 03 fa c2 98
01 00 7f 71 03 fa c2 98
01 90 7f 71 03 fa c2 98
02 07 7f 71 03 fa c2 98
01 07 7f 71 03 fa c2 98
01 00 7f 71 03 fa c2 98

As we can see, only the first three bytes change. The first byte can be either 1 or 2 and the second and third byte change only sometimes. Long story short, what you see are MIDI commands that are prefixed with a length byte that indicates how many actual data bytes follow. Hence, the third byte is only used in URBs that start with a 2 and all remaining bytes have undefined values. The above sequences translate into the following:

90 1b 7f 1b 00 90 07 7f 07 00

In turn, these bytes can be translated into: note-on event (90) button 1b set to value 7f, button 1b set to value 00, note-on event (90) button 07 set to value 7f, button 07 set to value 00. This means that button 1b was pressed and released before button 07 was pressed and released.
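The decoding described above can be sketched in a few lines of Python. The rows are the eight payload prefixes listed earlier. The function names are made up for this illustration, and parse_note_events is deliberately simplistic: it assumes every message carries exactly two data bytes and reuses the previous status byte when none is given, which is enough for this capture but not for MIDI in general.

```python
# Sketch: strip the length prefix from each URB payload and reassemble the
# MIDI byte stream. Function names are hypothetical.

ROWS = [
    "01 90 00 71 03 fa c2 98",
    "02 1b 7f 71 03 fa c2 98",
    "01 1b 7f 71 03 fa c2 98",
    "01 00 7f 71 03 fa c2 98",
    "01 90 7f 71 03 fa c2 98",
    "02 07 7f 71 03 fa c2 98",
    "01 07 7f 71 03 fa c2 98",
    "01 00 7f 71 03 fa c2 98",
]


def extract_midi(payload):
    """The first byte is the number of valid MIDI bytes that follow."""
    length = payload[0]
    return payload[1:1 + length]


def midi_stream(rows):
    """Concatenate the valid MIDI bytes of all payloads."""
    stream = bytearray()
    for row in rows:
        stream += extract_midi(bytes.fromhex(row))
    return bytes(stream)


def parse_note_events(stream):
    """Split the stream into (status, key, value) triples.

    Simplification: every message is assumed to have two data bytes, and a
    missing status byte means "same status as before".
    """
    events = []
    status = None
    i = 0
    while i < len(stream):
        if stream[i] & 0x80:        # status bytes have the top bit set
            status = stream[i]
            i += 1
        events.append((status, stream[i], stream[i + 1]))
        i += 2
    return events


for status, key, value in parse_note_events(midi_stream(ROWS)):
    print(f"status {status:02x} key {key:02x} value {value:02x}")
```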

After pressing a button with an LED, I noticed that the driver also sent a URB to the device and the LED was turned on. The URB started with the following bytes:

03 00 03 b0 0a 7f

Every other packet that was sent to the device also started with 03 00 followed by the length byte and, again, a MIDI command. Hence, 03 00 looks like a fixed prefix. I did not see other prefixes during normal operation, but there might be other undocumented commands the device would accept. After the prefix and the length byte, we see another MIDI command that is three bytes long: b0 0a 7f, which can be translated to a continuous controller (b0) command that sets LED 0a to 7f (on).
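Building such a packet could look like the following Python sketch. It only reproduces the framing observed in the capture (prefix 03 00, length byte, MIDI bytes). The function names are made up, and using the value 00 to switch an LED off is my assumption; only the "on" value 7f appears in the captured data.

```python
# Sketch: frame a MIDI command the way the captured host-to-device URBs do.
# build_out_packet and led_command are hypothetical names.

PREFIX = bytes([0x03, 0x00])   # fixed prefix seen in every captured packet


def build_out_packet(midi):
    """Return prefix + length byte + MIDI bytes."""
    midi = bytes(midi)
    if not 0 < len(midi) <= 0xFF:
        raise ValueError("MIDI data must fit the single length byte")
    return PREFIX + bytes([len(midi)]) + midi


def led_command(led, on):
    """Continuous controller message that switches one LED.

    0x7f for "on" is taken from the capture; 0x00 for "off" is an assumption.
    """
    return build_out_packet([0xB0, led, 0x7F if on else 0x00])


print(led_command(0x0A, True).hex())
```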

Audio capturing and playback

The next step after getting the knobs, sliders and LEDs to work was the audio part of the device. The BCD2000 offers two stereo inputs and two stereo outputs. There are three cinch (RCA) jacks on the back of the device. Two act as stereo inputs and one as the master stereo output. On the front, there is another jack for the headphones. If we start recording or playing in an application, the driver sends URBs to both isochronous endpoints. Afterwards, the device continuously sends URBs to the host over the "in" endpoint and the host sends URBs to the device over the "out" endpoint. An interesting observation is that the URBs have different sizes, and if we look at their payload in Wireshark, we see that the payload data always starts with a similar-looking byte sequence. This sequence is actually also part of the URB header and contains information about the isochronous frames. The payload of an isochronous URB is split into frames, each of which has, e.g., its own maximum payload length and its own count of bytes that hold actual data. The number of frames in a URB can vary as well but is limited by the size of the URB buffer. Unfortunately, the iso frame headers are not interpreted by Wireshark. Hence, we have to skip these bytes during our analysis of the payload.

To get an initial idea of the data structure in the payload, I only connected one audio source to an input jack. In the following picture, the resulting data in an URB is shown.

By comparing multiple URBs, I saw that only the data in the highlighted columns changes considerably, while the bytes in the other columns only change between 0x00 and 0xFF. I assume the latter is simply caused by noise in the analog-to-digital converter. After I connected only the other input jack, the highlighted columns showed only bytes with a value of 0x00 or 0xFF. Hence, we can assume that the data for the inputs is interleaved. To analyze the bytes inside these columns, I connected only one jack of an input, so that the device captured mono audio instead of stereo. As expected, only one half of the highlighted columns showed varying data. Therefore, we can assume that the channels of an audio source are interleaved as well and that every channel has a precision of 2 bytes or 16 bits. Consequently, the data consists of the following repeating structure:

I1C1U I1C1L I1C2U I1C2L I2C1U I2C1L I2C2U I2C2L

In each item IxCyz, x denotes the input jack, y the left (1) or right (2) channel, and z the upper (U) or lower (L) 8 bits.
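A minimal Python sketch of splitting such a payload into its four channels follows. It takes the byte order exactly as listed above (upper byte first) and assumes the samples are signed 16-bit values, which the capture alone does not prove. The function name and the dictionary layout are made up for this illustration.

```python
# Sketch: deinterleave the 8-byte repeating structure described above.
# Assumptions: upper byte first (as listed in the text), signed samples.
import struct

FRAME_SIZE = 8   # 2 inputs x 2 channels x 2 bytes


def deinterleave(payload):
    """Return {(input, channel): [samples]} for a raw audio payload."""
    # Ignore a trailing partial frame, if any
    usable = len(payload) - len(payload) % FRAME_SIZE
    samples = {(i, c): [] for i in (1, 2) for c in (1, 2)}
    # ">4h": four big-endian signed 16-bit values per frame
    for i1c1, i1c2, i2c1, i2c2 in struct.iter_unpack(">4h", payload[:usable]):
        samples[(1, 1)].append(i1c1)
        samples[(1, 2)].append(i1c2)
        samples[(2, 1)].append(i2c1)
        samples[(2, 2)].append(i2c2)
    return samples


example = bytes.fromhex("00 01 00 02 00 03 ff fe")
print(deinterleave(example))
```

If the device turned out to send the lower byte first, only the format string would change from ">4h" to "<4h".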

By varying the output and the left/right channel in an audio player, I verified that the same structure is also used for audio output.

Driver development

Most parts of the driver code are very similar to other modules and can be copied from them. Therefore, I will not cover all details of the module in this article and recommend reading the code of similar drivers. Other tutorials can easily be found on the net, e.g., [1, 4]. The central functionality basically follows the same scheme for the audio and the MIDI part. First, you have to allocate and initialize the URBs with the correct size, type and endpoint configuration. During URB initialization, you also register a completion handler for each URB that is called when a URB is passed from the device back to the host. In this handler, you either copy the data from the URB payload into an ALSA buffer or from an ALSA buffer into the URB and then pass the URB back.

While this is almost everything one has to customize for a MIDI driver, the additional code required for the audio part is a little bit more complicated. For an audio or a so-called PCM device, it is important to understand two structs: struct snd_pcm_ops and struct snd_pcm_hardware. snd_pcm_ops is initialized with pointers to functions that the ALSA core will call on certain events, e.g., an audio player is opening or closing the device. Some of the function pointers are initialized with standard functions provided by ALSA itself and some functions can be almost completely copied from other drivers. An interesting function is the so-called pointer callback a developer has to provide. The purpose of this function is to tell the ALSA core how much data we already sent to the device. This is important as the buffer is organized as a ring buffer and the ALSA core has to know how much new data it can write into the buffer.
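To illustrate the bookkeeping behind the pointer callback, here is a toy model in Python. It is not kernel code and does not use any ALSA API. The class and method names are made up, and all sizes are counted in audio frames. The idea it models: every time the driver copies frames from the ring buffer into a URB, it advances a position that wraps at the buffer size, and it notifies the core once a full period has been consumed.

```python
# Toy user-space model of the ring buffer position a playback driver tracks.
# PlaybackRing and its methods are hypothetical names, not ALSA functions.


class PlaybackRing:
    def __init__(self, buffer_frames, period_frames):
        self.buffer_frames = buffer_frames   # total ring buffer size
        self.period_frames = period_frames   # notification granularity
        self.position = 0                    # value the pointer callback reports
        self.since_period = 0                # frames consumed since last notice

    def consume(self, frames):
        """Account for frames copied into a URB.

        Returns True when at least one period has elapsed, i.e. when a real
        driver would notify the ALSA core that it can refill the buffer.
        """
        self.position = (self.position + frames) % self.buffer_frames
        self.since_period += frames
        if self.since_period >= self.period_frames:
            self.since_period %= self.period_frames
            return True
        return False

    def pointer(self):
        """What the pointer callback would return: 0 <= value < buffer size."""
        return self.position


ring = PlaybackRing(buffer_frames=1024, period_frames=256)
for _ in range(6):
    elapsed = ring.consume(48)
    print(ring.pointer(), elapsed)
```

In the actual driver, the same two values would live in the driver's private data, and the pointer callback would simply return the current position.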

The snd_pcm_hardware structure gives the ALSA core additional information about our driver, such as which audio formats the device accepts. A description of this structure can also be found in the documentation by the ALSA maintainer [5].

To test the module, I also created a virtual machine, as one can easily cause, e.g., a segmentation fault in the kernel and rebooting the main operating system every time becomes tedious. While testing the audio part, I could not manage to get undistorted playback. Initially, I thought that I might have chosen a wrong setting in the snd_pcm_hardware structure or somehow accessed the ring buffer in a wrong way. However, after several hours of trying different things without success, I just tested the module with the kernel of the host operating system and, indeed, the sound was perfect. Hence, my code somehow confuses VirtualBox or triggers a bug in its USB code. Unfortunately, I have not found the cause yet.

The resulting source code of the module can be found on GitHub: https://github.com/anyc/snd-bcd2000

The MIDI part of this module was included in the official Linux kernel sources starting with version 3.16. The audio part is taking a little more time: its source is in the "audio" branch on GitHub and I am still preparing it for inclusion.

References:

[1] http://www.linuxjournal.com/article/7353

[2] http://www.evinyatar.be/sphpblog/static.php?page=bcd2ktech

[3] http://gareus.org/wiki/digi003

[4] http://ben-collins.blogspot.de/2010/05/writing-alsa-driver-basics.html

[5] http://www.alsa-project.org/~tiwai/alsa-driver-api/index.html