Lockfree ringbuffer review

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

Lockfree ringbuffer review

Stefan Westerfeld
   Hi!

I've just completed writing a lock free ringbuffer to get a JACK driver
implemented. The only special thing is that it is frame based, because I
saw that the JACK driver code itself really got messy if you always have
to ensure that you don't by accident read or write half frames from/to
the ringbuffer.

So here you can create a ringbuffer with 1024 frames of float audio
data, where each frame contains two floats (for the channels), by
constructing a FrameRingBuffer<float> (1024, 2), and the ringbuffer
itself will ensure that you only deal with complete frames afterwards.

I would really like to get some review for the code. It works within the
JACK driver, but that of course doesn't say that its bug-free.

template<class T>
class FrameRingBuffer {
  //BIRNET_PRIVATE_COPY (FrameRingBuffer);
private:
  vector<T> buffer;
  typedef typename vector<T>::iterator BufferIterator;
  int read_frame_pos;
  int write_frame_pos;
  guint elements_per_frame;
public:
  FrameRingBuffer (guint n_frames = 0,
                   guint elements_per_frame = 1)
  {
    resize (n_frames, elements_per_frame);
  }
  /**
   * checks available read space in the ringbuffer
   *
   * @returns the number of frames that are available for reading
   */
  guint
  read_space()
  {
    int wpos = Atomic::int_get (&write_frame_pos);
    int rpos = Atomic::int_get (&read_frame_pos);
    int size = buffer.size() / elements_per_frame;

    if (wpos < rpos)    /* wpos == rpos -> empty ringbuffer */
      wpos += size;

    return wpos - rpos;
  }
  /**
   * reads data from the ringbuffer
   *
   * @returns the number of successfully read frames
   */
  guint
  read (guint n_frames, T* frames)
  {
    int rpos = Atomic::int_get (&read_frame_pos);
    guint size = buffer.size() / elements_per_frame;
    guint can_read = min (read_space(), n_frames);

    BufferIterator start = buffer.begin() + rpos * elements_per_frame;
    guint read1 = min (can_read, size - rpos) * elements_per_frame;
    copy (start, start + read1, frames);

    guint read2 = can_read * elements_per_frame - read1;
    copy (buffer.begin(), buffer.begin() + read2, frames + read1);

    Atomic::int_set (&read_frame_pos, (rpos + can_read) % size);
    return can_read;
  }
  /**
   * checks available write space in the ringbuffer
   *
   * @returns the number of frames that can be written
   */
  guint
  write_space()
  {
    int wpos = Atomic::int_get (&write_frame_pos);
    int rpos = Atomic::int_get (&read_frame_pos);
    int size = buffer.size() / elements_per_frame;

    if (rpos <= wpos)    /* wpos == rpos -> empty ringbuffer */
      rpos += size;

    /* the extra element allows us to see the difference between
     *  - empty ringbuffer
     *  - full ringbuffer
     */
    return rpos - wpos - 1;
  }
  /**
   * writes data to the ringbuffer
   *
   * @returns the number of successfully written frames
   */
  guint
  write (guint n_frames, const T* frames)
  {
    int wpos = Atomic::int_get (&write_frame_pos);
    guint size = buffer.size() / elements_per_frame;
    guint can_write = min (write_space(), n_frames);

    BufferIterator start = buffer.begin() + wpos * elements_per_frame;
    guint write1 = min (can_write, size - wpos) * elements_per_frame;
    copy (frames, frames + write1, start);

    guint write2 = can_write * elements_per_frame - write1;
    copy (frames + write1, frames + write1 + write2, buffer.begin());

    Atomic::int_set (&write_frame_pos, (wpos + can_write) % size);
    return can_write;
  }
  /**
   * returns the maximum number of frames that the ringbuffer can contain
   */
  guint
  size() const
  {
    return (buffer.size() - 1) / elements_per_frame;
  }
  /**
   * clears the ringbuffer
   * this function is not! threadsafe
   */
  void
  clear()
  {
    Atomic::int_set (&read_frame_pos, 0);
    Atomic::int_set (&write_frame_pos, 0);
  }
  /**
   * resizes and clears the ringbuffer
   * this function is not! threadsafe
   */
  void
  resize (guint n_frames,
          guint elements_per_frame = 1)
  {
    this->elements_per_frame = elements_per_frame;
    buffer.resize ((n_frames + 1) * elements_per_frame);
    clear();
  }
};

   Cu... Stefan
--
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast
Reply | Threaded
Open this post in threaded view
|

Re: Lockfree ringbuffer review

Tim Janik-2
On Tue, 30 May 2006, Stefan Westerfeld wrote:

>   Hi!

well, for the most part, i can't make too much sense of the code,
because i don't know how it's expected to be used which is also not
documented anywhere.
a blurb about what thread model and communication patterns you're
using seems to be neccessary, and it probably should go into the
code directly.
also, when doing that, you should add comments next to functions
explaining which thread/entity/etc. is supposed to call it.

so, i'm going to make mostly stylistic comments here. in particular
i can't say anything about the atomic ops. i just noted that you're
only using int_set()/int_get() instead of the usual int_get()/int_cas()
(note that int_set() is just provided for initialization purposes).
and that doesn't make too much sense to me, at least not without the
cmmunication model blurb i sked for above.

> template<class T>
> class FrameRingBuffer {
>  //BIRNET_PRIVATE_COPY (FrameRingBuffer);
> private:
>  vector<T> buffer;
>  typedef typename vector<T>::iterator BufferIterator;
>  int read_frame_pos;
>  int write_frame_pos;

these pointers should be volatile if you mean to access them atomically.

>  guint elements_per_frame;

members should be prefixed with "m_".


> public:
>  FrameRingBuffer (guint n_frames = 0,
>   guint elements_per_frame = 1)
>  {
>    resize (n_frames, elements_per_frame);

to avoid shadowing of members via local variables or function arguments
like here.

>  }
>  /**
>   * checks available read space in the ringbuffer
>   *
>   * @returns the number of frames that are available for reading

i think read_space() is a very irritating funciton name for this,
just like write_space() below. the "verb object" form indicates that
this function *does* something, i.e. read/write.
simply renaming to get_read_space() would fix that. but then, "space"
si allmost a dummy word in that it could mean nearly anything (e.g.
bytes, bits, pages, whatever). afaiu, you mean to calculate the
available number of frames here, so get_readable_frames() and
get_writable_frames() could be possible function names.


>   */
>  guint
>  read_space()
>  {
>    int wpos = Atomic::int_get (&write_frame_pos);
>    int rpos = Atomic::int_get (&read_frame_pos);
>    int size = buffer.size() / elements_per_frame;
>
>    if (wpos < rpos)    /* wpos == rpos -> empty ringbuffer */
>      wpos += size;

if just wpos == rpos means the ringbuffer is empty, how do you
distinguish that from a full ring buffer then?

>
>    return wpos - rpos;
>  }
>  /**
>   * reads data from the ringbuffer
>   *
>   * @returns the number of successfully read frames
>   */
>  guint
>  read (guint n_frames, T* frames)
>  {
>    int rpos = Atomic::int_get (&read_frame_pos);
>    guint size = buffer.size() / elements_per_frame;
>    guint can_read = min (read_space(), n_frames);
>
>    BufferIterator start = buffer.begin() + rpos * elements_per_frame;
>    guint read1 = min (can_read, size - rpos) * elements_per_frame;
>    copy (start, start + read1, frames);

you should use bseblockuitls.h functions to copy floats.

>
>    guint read2 = can_read * elements_per_frame - read1;
>    copy (buffer.begin(), buffer.begin() + read2, frames + read1);
>
>    Atomic::int_set (&read_frame_pos, (rpos + can_read) % size);
>    return can_read;
>  }
>  /**
>   * checks available write space in the ringbuffer
>   *
>   * @returns the number of frames that can be written
>   */
>  guint
>  write_space()
>  {
>    int wpos = Atomic::int_get (&write_frame_pos);
>    int rpos = Atomic::int_get (&read_frame_pos);
>    int size = buffer.size() / elements_per_frame;
>
>    if (rpos <= wpos)    /* wpos == rpos -> empty ringbuffer */
>      rpos += size;
>
>    /* the extra element allows us to see the difference between
>     *  - empty ringbuffer
>     *  - full ringbuffer
>     */
>    return rpos - wpos - 1;
>  }
>  /**
>   * writes data to the ringbuffer
>   *
>   * @returns the number of successfully written frames
>   */
>  guint
>  write (guint n_frames, const T* frames)
>  {
>    int wpos = Atomic::int_get (&write_frame_pos);
>    guint size = buffer.size() / elements_per_frame;
>    guint can_write = min (write_space(), n_frames);
>
>    BufferIterator start = buffer.begin() + wpos * elements_per_frame;
>    guint write1 = min (can_write, size - wpos) * elements_per_frame;
>    copy (frames, frames + write1, start);
>
>    guint write2 = can_write * elements_per_frame - write1;
>    copy (frames + write1, frames + write1 + write2, buffer.begin());
>
>    Atomic::int_set (&write_frame_pos, (wpos + can_write) % size);
>    return can_write;
>  }
>  /**
>   * returns the maximum number of frames that the ringbuffer can contain
>   */
>  guint
>  size() const
>  {
>    return (buffer.size() - 1) / elements_per_frame;
>  }
>  /**
>   * clears the ringbuffer
>   * this function is not! threadsafe

in general, if you mean to put docuemtnation comments next to your functions,
please use regular sentences and punctuation. emphasis can be added with
@emph{}, not an exclamation mark in the middle of a sentence.
also, functions that take arguments and return values need extra docs, like:
  * @param file_pattern  wildcard pattern for file names
  * @return              a singly linked list with newly allocated strings


>   */
>  void
>  clear()
>  {
>    Atomic::int_set (&read_frame_pos, 0);
>    Atomic::int_set (&write_frame_pos, 0);
>  }
>  /**
>   * resizes and clears the ringbuffer
>   * this function is not! threadsafe
>   */
>  void
>  resize (guint n_frames,
>          guint elements_per_frame = 1)
>  {
>    this->elements_per_frame = elements_per_frame;
>    buffer.resize ((n_frames + 1) * elements_per_frame);
>    clear();
>  }
> };
>
>   Cu... Stefan
> --
> Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan
>

---
ciaoTJ
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast
Reply | Threaded
Open this post in threaded view
|

Re: Lockfree ringbuffer review

Stefan Westerfeld
   Hi!

On Tue, May 30, 2006 at 05:13:10PM +0200, Tim Janik wrote:
> On Tue, 30 May 2006, Stefan Westerfeld wrote:
> well, for the most part, i can't make too much sense of the code,
> because i don't know how it's expected to be used which is also not
> documented anywhere.
> a blurb about what thread model and communication patterns you're
> using seems to be neccessary, and it probably should go into the
> code directly.
> also, when doing that, you should add comments next to functions
> explaining which thread/entity/etc. is supposed to call it.

Ok, I added quite a bit of text now. Let me know if it needs to be
extended / fixed.

> so, i'm going to make mostly stylistic comments here. in particular
> i can't say anything about the atomic ops. i just noted that you're
> only using int_set()/int_get() instead of the usual int_get()/int_cas()
> (note that int_set() is just provided for initialization purposes).
> and that doesn't make too much sense to me, at least not without the
> cmmunication model blurb i sked for above.

I designed the code with only two things that I need from the atomic ints
in mind: they should be atomic (i.e. no half integers), and setting them
should actually force the memory really to be written (i.e. they should
act as memory barriers).

In the old implementation, I've forgotten one situation where a memory
barrier is needed. You'll find it in the new code, explicitely marked as
write_memory_barrier.

> > int read_frame_pos;
> > int write_frame_pos;
>
> these pointers should be volatile if you mean to access them atomically.

Fixed.

> > guint elements_per_frame;
>
> members should be prefixed with "m_".

Fixed.

> > }
> > /**
> >  * checks available read space in the ringbuffer
> >  *
> >  * @returns the number of frames that are available for reading
>
> i think read_space() is a very irritating funciton name for this,
> just like write_space() below. the "verb object" form indicates that
> this function *does* something, i.e. read/write.
> simply renaming to get_read_space() would fix that. but then, "space"
> si allmost a dummy word in that it could mean nearly anything (e.g.
> bytes, bits, pages, whatever). afaiu, you mean to calculate the
> available number of frames here, so get_readable_frames() and
> get_writable_frames() could be possible function names.

Fixed. Although I found space in other ringbuffers. But your names are
not too bad, either.

> > guint
> > read_space()
> > {
> >   int wpos = Atomic::int_get (&write_frame_pos);
> >   int rpos = Atomic::int_get (&read_frame_pos);
> >   int size = buffer.size() / elements_per_frame;
> >
> >   if (wpos < rpos)    /* wpos == rpos -> empty ringbuffer */
> >     wpos += size;
>
> if just wpos == rpos means the ringbuffer is empty, how do you
> distinguish that from a full ring buffer then?

I use one extra frame, and the ringbuffer will never be completely
filled. So a 1024 element ringbuffer actually uses 1025 elements, while
at most 1024 of them can be filled in any given situation.

I spread the comment about it in the code now, so that everywhere where
a (x + 1) or (x - 1) computation can be found in the code due to the
extra element, you'll also find a comment about it.

> >   BufferIterator start = buffer.begin() + rpos * elements_per_frame;
> >   guint read1 = min (can_read, size - rpos) * elements_per_frame;
> >   copy (start, start + read1, frames);
>
> you should use bseblockuitls.h functions to copy floats.

Well, I thought of it, but copy can copy anything, be it bytes or
std::strings. The ringbuffer is a template (so you can instantiate it
with floats or bytes or anything else), so just plugging bseblockutils.h
won't work.

I am not sure in how far the current ringbuffer code is useful for
moving larger objects between threads (like smart pointers to big chunks
of memory that have been filled by a reader thread), especially because
when they are copied out from the ringbuffer, they are not explicitely
destroyed. But the copies and construction/destruction is at least done
properly (so you will never "forget" to execute a copy constructor like
you would with just using memcpy) eventually.

As a comment on efficiency, for floats copy calls memmove which in turn
calls memcpy, and bseblockutils.h uses memcpy. So there is no actual
efficiency loss right now. We could use template specialization to
enforce that if you're instantiating a float ringbuffer, then
bseblockutils will be used. But it wouldn't make things faster.

The benchmarks on my AMD64 suggest that replacing memcpy in
bseblockutils.h by SSE instructions will not have a positive effect on
performance. I don't know about the other Intel/AMD systems though.

Finally, if we were to optimize the JACK driver for maximum performance,
a more important factor would probably be that the ringbuffer even has
this API. It means that we need to deinterleave and reinterleave the
data JACK supplies on the stack, and then read/write it to the
ringbuffer. If the ringbuffer would offer an API to access its memory
directly, the extra copy would disappear. But it makes the code harder
to maintain and for now my first priority has been getting things
implemented correctly.

> > /**
> >  * clears the ringbuffer
> >  * this function is not! threadsafe
>
> in general, if you mean to put docuemtnation comments next to your
> functions,
> please use regular sentences and punctuation. emphasis can be added with
> @emph{}, not an exclamation mark in the middle of a sentence.
> also, functions that take arguments and return values need extra docs, like:
>  * @param file_pattern  wildcard pattern for file names
>  * @return              a singly linked list with newly allocated strings

I fixed some of it, although using special tags is hard to get right if
you don't have a web browser open which points to the same documentation
you're writing in that moment.

Anyway, finally, after all this discussion, here is the new code.


namespace {

/**
 * The FrameRingBuffer class implements a ringbuffer for the communication
 * between two threads. One thread - the producer thread - may only write
 * data to the ringbuffer. The other thread - the consumer thread - may
 * only read data from the ringbuffer.
 *
 * Given that these two threads only use the appropriate functions, no
 * other synchronization is required to ensure that the data gets safely
 * from the producer thread to the consumer thread. However, all operations
 * that are provided by the ringbuffer are non-blocking, so that you may
 * need a condition or other synchronization primitive if you want the
 * producer and/or consumer to block if the ringbuffer is full/empty.
 *
 * Implementation: the synchronization between the two threads is only
 * implemented by two index variables (read_frame_pos and write_frame_pos)
 * for which atomic integer reads and writes are required. Since the
 * producer thread only modifies the write_frame_pos and the consumer thread
 * only modifies the read_frame_pos, no compare-and-swap or similar
 * operations are needed to avoid concurrent writes.
 */
template<class T>
class FrameRingBuffer {
  //BIRNET_PRIVATE_COPY (FrameRingBuffer);
private:
  typedef typename vector<T>::iterator BufferIterator;

  vector<T>     m_buffer;
  volatile int  m_read_frame_pos;
  volatile int  m_write_frame_pos;
  guint         m_elements_per_frame;

  void
  write_memory_barrier()
  {
    static volatile int dummy = 0;

    /*
     * writing this dummy integer should ensure that all prior writes
     * are committed to memory
     */
    Atomic::int_set (&dummy, 0x12345678);
  }
public:
  FrameRingBuffer (guint n_frames = 0,
                   guint elements_per_frame = 1)
  {
    resize (n_frames, elements_per_frame);
  }
  /**
   * Checks available read space in the ringbuffer.
   * This function should be called from the consumer thread.
   *
   * @returns the number of frames that are available for reading
   */
  guint
  get_readable_frames()
  {
    int wpos = Atomic::int_get (&m_write_frame_pos);
    int rpos = Atomic::int_get (&m_read_frame_pos);
    int size = m_buffer.size() / m_elements_per_frame;

    if (wpos < rpos)    /* wpos == rpos -> empty ringbuffer */
      wpos += size;

    return wpos - rpos;
  }
  /**
   * Reads data from the ringbuffer; if there is not enough data
   * in the ringbuffer, the function will not block.
   * This function should be called from the consumer thread.
   *
   * @returns the number of successfully read frames
   */
  guint
  read (guint  n_frames,
        T     *frames)
  {
    int rpos = Atomic::int_get (&m_read_frame_pos);
    guint size = m_buffer.size() / m_elements_per_frame;
    guint can_read = min (get_readable_frames(), n_frames);

    BufferIterator start = m_buffer.begin() + rpos * m_elements_per_frame;
    guint read1 = min (can_read, size - rpos) * m_elements_per_frame;
    copy (start, start + read1, frames);

    guint read2 = can_read * m_elements_per_frame - read1;
    copy (m_buffer.begin(), m_buffer.begin() + read2, frames + read1);

    Atomic::int_set (&m_read_frame_pos, (rpos + can_read) % size);
    return can_read;
  }
  /**
   * Checks available write space in the ringbuffer.
   * This function should be called from the producer thread.
   *
   * @returns the number of frames that can be written
   */
  guint
  get_writable_frames()
  {
    int wpos = Atomic::int_get (&m_write_frame_pos);
    int rpos = Atomic::int_get (&m_read_frame_pos);
    guint size = m_buffer.size() / m_elements_per_frame;

    if (rpos <= wpos)    /* wpos == rpos -> empty ringbuffer */
      rpos += size;

    // the extra element allows us to see the difference between an empty/full ringbuffer
    return rpos - wpos - 1;
  }
  /**
   * Writes data to the ringbuffer; if there is not enough data
   * in the ringbuffer, the function will not block.
   * This function should be called from the producer thread.
   *
   * @returns the number of successfully written frames
   */
  guint
  write (guint    n_frames,
         const T *frames)
  {
    int wpos = Atomic::int_get (&m_write_frame_pos);
    guint size = m_buffer.size() / m_elements_per_frame;
    guint can_write = min (get_writable_frames(), n_frames);

    BufferIterator start = m_buffer.begin() + wpos * m_elements_per_frame;
    guint write1 = min (can_write, size - wpos) * m_elements_per_frame;
    copy (frames, frames + write1, start);

    guint write2 = can_write * m_elements_per_frame - write1;
    copy (frames + write1, frames + write1 + write2, m_buffer.begin());
 
    // It is important that the data from the previous writes get committed
    // to memory *before* the index variable is updated. Otherwise, the
    // consumer thread could be reading invalid data, if the index variable
    // got written before the rest of the data (when unordered writes are
    // performed).
    write_memory_barrier();

    Atomic::int_set (&m_write_frame_pos, (wpos + can_write) % size);
    return can_write;
  }
  /**
   * Returns the maximum number of frames that the ringbuffer can contain.
   *
   * This function can be called from any thread.
   */
  guint
  size() const
  {
    // the extra element allows us to see the difference between an empty/full ringbuffer
    return (m_buffer.size() - 1) / m_elements_per_frame;
  }
  /**
   * Clears the ringbuffer.
   *
   * This function is @emph{not} threadsafe, and can not be used while
   * either the producer thread or the consumer thread are modifying
   * the ringbuffer.
   */
  void
  clear()
  {
    Atomic::int_set (&m_read_frame_pos, 0);
    Atomic::int_set (&m_write_frame_pos, 0);
  }
  /**
   * Resizes and clears the ringbuffer.
   *
   * This function is @emph{not} threadsafe, and can not be used while
   * either the producer thread or the consumer thread are modifying
   * the ringbuffer.
   */
  void
  resize (guint n_frames,
          guint elements_per_frame = 1)
  {
    m_elements_per_frame = elements_per_frame;
    // the extra element allows us to see the difference between an empty/full ringbuffer
    m_buffer.resize ((n_frames + 1) * m_elements_per_frame);
    clear();
  }
};

   Cu... Stefan
--
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast
Reply | Threaded
Open this post in threaded view
|

Re: Lockfree ringbuffer review

Esben Stien
In reply to this post by Stefan Westerfeld
Stefan Westerfeld <[hidden email]> writes:

> a lock free ringbuffer to get a JACK driver

Really excellent;)

> I would really like to get some review for the code.

Why not post it to jackit-devel?

--
Esben Stien is b0ef@e     s      a            
         http://www. s     t    n m
          irc://irc.  b  -  i  .   e/%23contact
           sip:b0ef@   e     e
           jid:b0ef@    n     n
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast
Reply | Threaded
Open this post in threaded view
|

Re: Lockfree ringbuffer review

Tim Janik
In reply to this post by Stefan Westerfeld
On Wed, 31 May 2006, Stefan Westerfeld wrote:

> On Tue, May 30, 2006 at 05:13:10PM +0200, Tim Janik wrote:

>>>   BufferIterator start = buffer.begin() + rpos * elements_per_frame;
>>>   guint read1 = min (can_read, size - rpos) * elements_per_frame;
>>>   copy (start, start + read1, frames);
>>
>> you should use bseblockuitls.h functions to copy floats.
>
> Well, I thought of it, but copy can copy anything, be it bytes or
> std::strings. The ringbuffer is a template (so you can instantiate it
> with floats or bytes or anything else), so just plugging bseblockutils.h
> won't work.

i really think using a template here is overkill. it's highly unlikely
that we'll use this ringbuffer with ints or doubles. and using floats
allowes optimizations that currently are not possible.

> As a comment on efficiency, for floats copy calls memmove which in turn
> calls memcpy, and bseblockutils.h uses memcpy. So there is no actual
> efficiency loss right now.

if you judge bseblockutils.h by some default implementation of it,
you completely missed the point of having it. the functions it provides
are in place to make use of machine specific extensions to improve
performance. trading that for memcpy() will definitely be a loss.
and as an aside, the default implementation doesn't use memcpy() but
wmemcpy() which is faster due to alignment constraints.

> We could use template specialization to
> enforce that if you're instantiating a float ringbuffer, then
> bseblockutils will be used. But it wouldn't make things faster.

well, what's the use case for a non-float ringbuffer?

> The benchmarks on my AMD64 suggest that replacing memcpy in
> bseblockutils.h by SSE instructions will not have a positive effect on
> performance. I don't know about the other Intel/AMD systems though.

that is unrelated to the ring buffer. if SIMD copying was slower than
non-SIMD copying, this has to be fixed as part of an optimization of
the blockutils implementation, not by the ring buffer (or by the ring
buffer not using blockutils).

> Finally, if we were to optimize the JACK driver for maximum performance,
> a more important factor would probably be that the ringbuffer even has
> this API.

sorry? what API are you talking about?
void    bse_block_copy_float    (guint          n_values,
                                  gfloat        *values,
                                  const gfloat  *ivalues);
copying?

> It means that we need to deinterleave and reinterleave the
> data JACK supplies on the stack, and then read/write it to the
> ringbuffer. If the ringbuffer would offer an API to access its memory
> directly, the extra copy would disappear. But it makes the code harder
> to maintain and for now my first priority has been getting things
> implemented correctly.

i don't quite understand this point.
are you saying that replacing memcpy() by bse_block_copy_float() can't
be done due to interleaving? :-O

> namespace {
>
> /**
> * The FrameRingBuffer class implements a ringbuffer for the communication
> * between two threads. One thread - the producer thread - may only write
> * data to the ringbuffer. The other thread - the consumer thread - may
> * only read data from the ringbuffer.
> *
> * Given that these two threads only use the appropriate functions, no
> * other synchronization is required to ensure that the data gets safely
> * from the producer thread to the consumer thread. However, all operations
> * that are provided by the ringbuffer are non-blocking, so that you may
> * need a condition or other synchronization primitive if you want the
> * producer and/or consumer to block if the ringbuffer is full/empty.

s/if/on the event that/ ?

> *
> * Implementation: the synchronization between the two threads is only
> * implemented by two index variables (read_frame_pos and write_frame_pos)
> * for which atomic integer reads and writes are required. Since the
> * producer thread only modifies the write_frame_pos and the consumer thread
> * only modifies the read_frame_pos, no compare-and-swap or similar
> * operations are needed to avoid concurrent writes.
> */

good comment btw, i now have an idea on why/how things are supposed
to work and can scrutinize your implementaiton accordingly ;)

> template<class T>
> class FrameRingBuffer {
>  //BIRNET_PRIVATE_COPY (FrameRingBuffer);

what's this comment for?

> private:
>  typedef typename vector<T>::iterator BufferIterator;
>
>  vector<T>     m_buffer;
>  volatile int  m_read_frame_pos;
>  volatile int  m_write_frame_pos;
>  guint         m_elements_per_frame;
>
>  void
>  write_memory_barrier()
>  {
>    static volatile int dummy = 0;
>
>    /*
>     * writing this dummy integer should ensure that all prior writes
>     * are committed to memory
>     */
>    Atomic::int_set (&dummy, 0x12345678);
>  }
> public:
>  FrameRingBuffer (guint n_frames = 0,
>   guint elements_per_frame = 1)
>  {
>    resize (n_frames, elements_per_frame);
>  }
>  /**
>   * Checks available read space in the ringbuffer.

s/Checks/Check/

>   * This function should be called from the consumer thread.

s/should/may only/ since i suppose the consumer thread doesn't _have_
to call it ;)
(same for all other functions)

>   *
>   * @returns the number of frames that are available for reading

hm, conventionally we put argument/return value blurbs before the actual
funciton doc paragraph.

>   */
>  guint
>  get_readable_frames()
>  {
>    int wpos = Atomic::int_get (&m_write_frame_pos);
>    int rpos = Atomic::int_get (&m_read_frame_pos);
>    int size = m_buffer.size() / m_elements_per_frame;
>
>    if (wpos < rpos)    /* wpos == rpos -> empty ringbuffer */
>      wpos += size;
>
>    return wpos - rpos;
>  }
>  /**
>   * Reads data from the ringbuffer; if there is not enough data
>   * in the ringbuffer, the function will not block.
>   * This function should be called from the consumer thread.
>   *
>   * @returns the number of successfully read frames
>   */
>  guint
>  read (guint  n_frames,
>        T     *frames)
>  {
>    int rpos = Atomic::int_get (&m_read_frame_pos);
>    guint size = m_buffer.size() / m_elements_per_frame;
>    guint can_read = min (get_readable_frames(), n_frames);
>
>    BufferIterator start = m_buffer.begin() + rpos * m_elements_per_frame;
>    guint read1 = min (can_read, size - rpos) * m_elements_per_frame;
>    copy (start, start + read1, frames);
>
>    guint read2 = can_read * m_elements_per_frame - read1;
>    copy (m_buffer.begin(), m_buffer.begin() + read2, frames + read1);
>
>    Atomic::int_set (&m_read_frame_pos, (rpos + can_read) % size);
>    return can_read;
>  }
>  /**
>   * Checks available write space in the ringbuffer.
>   * This function should be called from the producer thread.
>   *
>   * @returns the number of frames that can be written
>   */
>  guint
>  get_writable_frames()
>  {
>    int wpos = Atomic::int_get (&m_write_frame_pos);
>    int rpos = Atomic::int_get (&m_read_frame_pos);
>    guint size = m_buffer.size() / m_elements_per_frame;
>
>    if (rpos <= wpos)    /* wpos == rpos -> empty ringbuffer */
>      rpos += size;
>
>    // the extra element allows us to see the difference between an empty/full ringbuffer

s/element/frame/ right?

>    return rpos - wpos - 1;
>  }
>  /**
>   * Writes data to the ringbuffer; if there is not enough data

s/enough data/enough free space/ if i understand correctly.

>   * in the ringbuffer, the function will not block.

hm, it be better to say what the function does rather than saying
"it will not block", i.e. what it not does.
afaiu, this should be:

    * Write data to the ringbuffer; if there is not enough free space
    * in the ringbuffer, the function will return with the amount of
    * frames consumed by a partial write.

>   * This function should be called from the producer thread.
>   *
>   * @returns the number of successfully written frames
>   */
>  guint
>  write (guint    n_frames,
>         const T *frames)
>  {
>    int wpos = Atomic::int_get (&m_write_frame_pos);
>    guint size = m_buffer.size() / m_elements_per_frame;
>    guint can_write = min (get_writable_frames(), n_frames);
>
>    BufferIterator start = m_buffer.begin() + wpos * m_elements_per_frame;
>    guint write1 = min (can_write, size - wpos) * m_elements_per_frame;
>    copy (frames, frames + write1, start);
>
>    guint write2 = can_write * m_elements_per_frame - write1;
>    copy (frames + write1, frames + write1 + write2, m_buffer.begin());
>
>    // It is important that the data from the previous writes get committed
>    // to memory *before* the index variable is updated. Otherwise, the
>    // consumer thread could be reading invalid data, if the index variable
>    // got written before the rest of the data (when unordered writes are
>    // performed).
>    write_memory_barrier();
>
>    Atomic::int_set (&m_write_frame_pos, (wpos + can_write) % size);
>    return can_write;
>  }
>  /**
>   * Returns the maximum number of frames that the ringbuffer can contain.
>   *
>   * This function can be called from any thread.
>   */
>  guint
>  size() const
>  {
>    // the extra element allows us to see the difference between an empty/full ringbuffer
>    return (m_buffer.size() - 1) / m_elements_per_frame;
>  }

ugh this is mighty confusing.
you should get rid of this function or at least clearly rename it.
seeing it, i had to go back and read your code all over to figure whether you
used size() in every place or m_buffer.size().
if you need a name, call it total_n_frames(), allthough it seems this function
isn't used/needed at all.

>  /**
>   * Clears the ringbuffer.

s/Clears/Clear/, applies to lots of other comments as well.

>   *
>   * This function is @emph{not} threadsafe, and can not be used while

well, depends on what you mean by "threadsafe". i'd simply leave that
out and just write:

    * Clear the ringbuffer.
    * This function may not be used while
    * either the producer thread or the consumer thread are modifying
    * the ringbuffer.

(also fixed "can" vs. "may")

>   * either the producer thread or the consumer thread are modifying
>   * the ringbuffer.
>   */
>  void
>  clear()
>  {
>    Atomic::int_set (&m_read_frame_pos, 0);
>    Atomic::int_set (&m_write_frame_pos, 0);
>  }

is this really needed outside of resize() though?

>  /**
>   * Resizes and clears the ringbuffer.
>   *
>   * This function is @emph{not} threadsafe, and can not be used while

comment from above applies as well.

>   * either the producer thread or the consumer thread are modifying
>   * the ringbuffer.
>   */
>  void
>  resize (guint n_frames,
>          guint elements_per_frame = 1)
>  {
>    m_elements_per_frame = elements_per_frame;
>    // the extra element allows us to see the difference between an empty/full ringbuffer

s/element/frame/ to stick to your terminology

>    m_buffer.resize ((n_frames + 1) * m_elements_per_frame);
>    clear();

looks to me like you could simply inline resetting of the
indices here.

>  }
> };

hm, don't you want review on the rest of the driver as well?
(that'd prolly also be the part that is more interesting for Paul)

>
>   Cu... Stefan

---
ciaoTJ
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast
Reply | Threaded
Open this post in threaded view
|

Beast Jack driver (Re: Lockfree ringbuffer review)

Stefan Westerfeld
   Hi!

On Thu, Jun 01, 2006 at 05:14:15PM +0200, Tim Janik wrote:

> On Wed, 31 May 2006, Stefan Westerfeld wrote:
> >On Tue, May 30, 2006 at 05:13:10PM +0200, Tim Janik wrote:
> >Well, I thought of it, but copy can copy anything, be it bytes or
> >std::strings. The ringbuffer is a template (so you can instantiate it
> >with floats or bytes or anything else), so just plugging bseblockutils.h
> >won't work.
>
> i really think using a template here is overkill. it's highly unlikely
> that we'll use this ringbuffer with ints or doubles. and using floats
> allowes optimizations that currently are not possible.
>
> [...]
>
> >We could use template specialization to
> >enforce that if you're instantiating a float ringbuffer, then
> >bseblockutils will be used. But it wouldn't make things faster.
>
> well, what's the use case for a non-float ringbuffer?

I've just read my way through the new jack midi API, and we'll need a
ring buffer to get the midi events from the realtime thread, where they
will be delivered by jack, and our midi thread. So yes, there is a use
case, which will probably have unsigned char as data type, or even
something like struct MidiEvent.

> >The benchmarks on my AMD64 suggest that replacing memcpy in
> >bseblockutils.h by SSE instructions will not have a positive effect on
> >performance. I don't know about the other Intel/AMD systems though.
>
> that is unrelated to the ring buffer. if SIMD copying was slower than
> non-SIMD copying, this has to be fixed as part of an optimization of
> the blockutils implementation, not by the ring buffer (or by the ring
> buffer not using blockutils).

Ok, ok... new version uses template specialization, so we have fast_copy
now, which will use Bse::Block::copy if the data type permits doing so.

> >Finally, if we were to optimize the JACK driver for maximum performance,
> >a more important factor would probably be that the ringbuffer even has
> >this API.
>
> sorry? what API are you talking about?
> void    bse_block_copy_float    (guint          n_values,
>                                  gfloat        *values,
>                                  const gfloat  *ivalues);
> copying?
>
> >It means that we need to deinterleave and reinterleave the
> >data JACK supplies on the stack, and then read/write it to the
> >ringbuffer. If the ringbuffer would offer an API to access its memory
> >directly, the extra copy would disappear. But it makes the code harder
> >to maintain and for now my first priority has been getting things
> >implemented correctly.
>
> i don't quite understand this point.
> are you saying that replacing memcpy() by bse_block_copy_float() can't
> be done due to interleaving? :-O

No. It can be replaced, and the new version has it. However, our data
flow looks like this:

  input

jack thread process() callback supplies N separate buffers
   |
   V
interleave on the stack (manually implemented, in process())       *jack-thread*
   |
   V
copy into the input ringbuffer (input ringbuffer write)            *jack-thread*
   |
   V
copy in bse engine supplied buffer (input ringbuffer read)         *engine-thread*
   |
   V
[ beast processing ]                                               *engine-thread*
   |
   V
copy from bse engine supplied buffer (output ringbuffer write)     *engine-thread*
   |
   V
copy onto stack buffer (output ringbuffer read)                    *jack-thread*
   |
   V
deinterleave into seperate buffers (manually)                      *jack-thread*
   |
   V
end of jack thread process() callback

  output


What I was pointing out to is that if the ringbuffer provided an API to
access the data which is stored in the ringbuffer directly, then we
could avoid some copies (if you wonder how such an API would look, read

http://www.portaudio.com/trac/browser/portaudio/branches/v19-devel/src/hostapi/coreaudio/ringbuffer.c

and look at RingBuffer_GetWriteRegions). However if we get everything we
want - with the number of copies we're making right now - I am perfectly
happy. Every sensible bse network will eat much more cycles than we can
possibly spend in those extra Bse::Block::copy() calls.

> >namespace {
> >
> >/**
> >* The FrameRingBuffer class implements a ringbuffer for the communication
> >* between two threads. One thread - the producer thread - may only write
> >* data to the ringbuffer. The other thread - the consumer thread - may
> >* only read data from the ringbuffer.
> >*
> >* Given that these two threads only use the appropriate functions, no
> >* other synchronization is required to ensure that the data gets safely
> >* from the producer thread to the consumer thread. However, all operations
> >* that are provided by the ringbuffer are non-blocking, so that you may
> >* need a condition or other synchronization primitive if you want the
> >* producer and/or consumer to block if the ringbuffer is full/empty.
>
> s/if/on the event that/ ?

I think not (but I am no native speaker). What I want to say is this:
Suppose the ringbuffer is empty. The consumer will, when calling the
read function, just read 0 frames. A common scenario is that the
consumer will want to wait until data is available. To implement this,
the consumer will need to use a synchonization mechanism (i.e.
condition).

> >*
> >* Implementation: the synchronization between the two threads is only
> >* implemented by two index variables (read_frame_pos and write_frame_pos)
> >* for which atomic integer reads and writes are required. Since the
> >* producer thread only modifies the write_frame_pos and the consumer thread
> >* only modifies the read_frame_pos, no compare-and-swap or similar
> >* operations are needed to avoid concurrent writes.
> >*/
>
> good comment btw, i now have an idea on why/how things are supposed
> to work and can scrutinize your implementaiton accordingly ;)
>
> >template<class T>
> >class FrameRingBuffer {
> > //BIRNET_PRIVATE_COPY (FrameRingBuffer);
>
> what's this comment for?

I wanted to call BIRNET_PRIVATE_COPY, but it didn't work, so I put //
marks there, so it compiled. So can I use BIRNET_PRIVATE_COPY or do I
need to put the code BIRNET_PRIVATE_COPY would put there if it worked
here manually?

> >   // the extra element allows us to see the difference between an
> >   empty/full ringbuffer
>
> s/element/frame/ right?

Right.

> > /**
> >  * Writes data to the ringbuffer; if there is not enough data
>
> s/enough data/enough free space/ if i understand correctly.
>
> >  * in the ringbuffer, the function will not block.
>
> hm, it be better to say what the function does rather than saying
> "it will not block", i.e. what it not does.
> afaiu, this should be:
>
>    * Write data to the ringbuffer; if there is not enough free space
>    * in the ringbuffer, the function will return with the amount of
>    * frames consumed by a partial write.

Well, it's similar now:

   * Write data to the ringbuffer; if there is not enough free space
   * in the ringbuffer, the function will return the amount of frames
   * consumed by a partial write (without blocking).

I wanted to keep the remark that it doesn't block there. I also adapted
the read description in a similar way:

   * Read data from the ringbuffer; if there is not enough data
   * in the ringbuffer, the function will return the number of frames
   * that could be read without blocking.

> > /**
> >  * Returns the maximum number of frames that the ringbuffer can contain.
> >  *
> >  * This function can be called from any thread.
> >  */
> > guint
> > size() const
> > {
> >   // the extra element allows us to see the difference between an
> >   empty/full ringbuffer
> >   return (m_buffer.size() - 1) / m_elements_per_frame;
> > }
>
> ugh this is mighty confusing.
> you should get rid of this function or at least clearly rename it.
> seeing it, i had to go back and read your code all over to figure whether
> you
> used size() in every place or m_buffer.size().
> if you need a name, call it total_n_frames(), allthough it seems this
> function
> isn't used/needed at all.

Well, its not used for internal purposes. But its provided for the code
which uses the ringbuffer. Basically if I am implementing C++ code for
some data structure that has a size (as the ringbuffer does), then I
should also provide an accessor that can be used to determine the size.

Just as std::vector has a resize() and size() function.

> > /**
> >  * Clears the ringbuffer.
>
> s/Clears/Clear/, applies to lots of other comments as well.
>
> >  *
> >  * This function is @emph{not} threadsafe, and can not be used while
>
> well, depends on what you mean by "threadsafe". i'd simply leave that
> out and just write:
>
>    * Clear the ringbuffer.
>    * This function may not be used while
>    * either the producer thread or the consumer thread are modifying
>    * the ringbuffer.
>
> (also fixed "can" vs. "may")

Oh yes, your comment sounds better. Definitely. Fixed.

> > void
> > clear()
> > {
> >   Atomic::int_set (&m_read_frame_pos, 0);
> >   Atomic::int_set (&m_write_frame_pos, 0);
> > }
>
> is this really needed outside of resize() though?

The jack driver needs it (to get into a sane state after a dropout).

> >   m_buffer.resize ((n_frames + 1) * m_elements_per_frame);
> >   clear();
>
> looks to me like you could simply inline resetting of the
> indices here.

If there is a function anyway, I might as well use it.

> hm, don't you want review on the rest of the driver as well?
> (that'd prolly also be the part that is more interesting for Paul)

Definitely. So here is an updated version of the ringbuffer and the
whole driver source. I think the driver already works pretty well, its
just that there are quite a few FIXMEs in there.

As I am only posting the C++ source to the list, if anyone is so brave
and wants to compile the driver (you need beast CVS for this), I've put
the whole driver to:

http://space.twc.de/~stefan/download/20060601-preview-bse-jack-0.7.0.tar.gz

(sorry that it extracts to the wrong directory, but I just used make
dist...).

Anyway, here is the C++ source (the interesting part, for the review):

/* BSE - Bedevilled Sound Engine
 * Copyright (C) 2004 Tim Janik
 * Copyright (C) 2006 Stefan Westerfeld
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2 of the License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General
 * Public License along with this program; if not, write to the
 * Free Software Foundation, Inc., 59 Temple Place, Suite 330,
 * Boston, MA 02111-1307, USA.
 */
#include "configure.h"
#include "bsepcmdevice-jack.hh"
#include <bse/bseblockutils.hh>
#include <bse/gsldatautils.h>
#include <jack/jack.h>
#include <string.h>
#include <errno.h>
#include <signal.h>
#include <set>
#include <string>
#include <map>
#include <vector>

using namespace Birnet;
using std::vector;
using std::set;
using std::map;
using std::string;
using std::min;
using std::max;
using std::copy;
using Bse::Block;

namespace {

/**
 * This function uses std::copy to copy the n_values of data from ivalues
 * to ovalues. If a specialized version is available in bseblockutils,
 * then this - usually faster - version will be used.
 *
 * The data in ivalues and ovalues may not overlap.
 */
template<class Data> void
fast_copy (guint       n_values,
           Data       *ovalues,
           const Data *ivalues)
{
  copy (ivalues, ivalues + n_values, ovalues);
}

template<> void
fast_copy (guint        n_values,
           float       *ovalues,
           const float *ivalues)
{
  Block::copy (n_values, ovalues, ivalues);
}

template<> void
fast_copy (guint  n_values,
           guint32       *ovalues,
           const guint32 *ivalues)
{
  Block::copy (n_values, ovalues, ivalues);
}

/**
 * The FrameRingBuffer class implements a ringbuffer for the communication
 * between two threads. One thread - the producer thread - may only write
 * data to the ringbuffer. The other thread - the consumer thread - may
 * only read data from the ringbuffer.
 *
 * Given that these two threads only use the appropriate functions, no
 * other synchronization is required to ensure that the data gets safely
 * from the producer thread to the consumer thread. However, all operations
 * that are provided by the ringbuffer are non-blocking, so that you may
 * need a condition or other synchronization primitive if you want the
 * producer and/or consumer to block if the ringbuffer is full/empty.
 *
 * Implementation: the synchronization between the two threads is only
 * implemented by two index variables (read_frame_pos and write_frame_pos)
 * for which atomic integer reads and writes are required. Since the
 * producer thread only modifies the write_frame_pos and the consumer thread
 * only modifies the read_frame_pos, no compare-and-swap or similar
 * operations are needed to avoid concurrent writes.
 */
template<class T>
class FrameRingBuffer {
  //BIRNET_PRIVATE_COPY (FrameRingBuffer);
private:
  vector<T>     m_buffer;
  volatile int  m_read_frame_pos;
  volatile int  m_write_frame_pos;
  guint         m_elements_per_frame;

  void
  write_memory_barrier()
  {
    static volatile int dummy = 0;

    /*
     * writing this dummy integer should ensure that all prior writes
     * are committed to memory
     */
    Atomic::int_set (&dummy, 0x12345678);
  }
public:
  FrameRingBuffer (guint n_frames = 0,
                   guint elements_per_frame = 1)
  {
    resize (n_frames, elements_per_frame);
  }
  /**
   * @returns the number of frames that are available for reading
   *
   * Check available read space in the ringbuffer.
   * This function may only be called from the consumer thread.
   */
  guint
  get_readable_frames()
  {
    int wpos = Atomic::int_get (&m_write_frame_pos);
    int rpos = Atomic::int_get (&m_read_frame_pos);
    int size = m_buffer.size() / m_elements_per_frame;

    if (wpos < rpos)    /* wpos == rpos -> empty ringbuffer */
      wpos += size;

    return wpos - rpos;
  }
  /**
   * @returns the number of successfully read frames
   *
   * Read data from the ringbuffer; if there is not enough data
   * in the ringbuffer, the function will return the number of frames
   * that could be read without blocking.
   *
   * This function should be called from the consumer thread.
   */
  guint
  read (guint  n_frames,
        T     *frames)
  {
    int rpos = Atomic::int_get (&m_read_frame_pos);
    guint size = m_buffer.size() / m_elements_per_frame;
    guint can_read = min (get_readable_frames(), n_frames);

    guint read1 = min (can_read, size - rpos) * m_elements_per_frame;
    fast_copy (read1, frames, &m_buffer[rpos * m_elements_per_frame]);

    guint read2 = can_read * m_elements_per_frame - read1;
    fast_copy (read2, frames + read1, &m_buffer[0]);

    Atomic::int_set (&m_read_frame_pos, (rpos + can_read) % size);
    return can_read;
  }
  /**
   * @returns the number of frames that can be written
   *
   * Check available write space in the ringbuffer.
   * This function should be called from the producer thread.
   */
  guint
  get_writable_frames()
  {
    int wpos = Atomic::int_get (&m_write_frame_pos);
    int rpos = Atomic::int_get (&m_read_frame_pos);
    guint size = m_buffer.size() / m_elements_per_frame;

    if (rpos <= wpos)    /* wpos == rpos -> empty ringbuffer */
      rpos += size;

    // the extra frame allows us to see the difference between an empty/full ringbuffer
    return rpos - wpos - 1;
  }
  /**
   * @returns the number of successfully written frames
   *
   * Write data to the ringbuffer; if there is not enough free space
   * in the ringbuffer, the function will return the amount of frames
   * consumed by a partial write (without blocking).
   *
   * This function may only be called from the producer thread.
   */
  guint
  write (guint    n_frames,
         const T *frames)
  {
    int wpos = Atomic::int_get (&m_write_frame_pos);
    guint size = m_buffer.size() / m_elements_per_frame;
    guint can_write = min (get_writable_frames(), n_frames);

    guint write1 = min (can_write, size - wpos) * m_elements_per_frame;
    fast_copy (write1, &m_buffer[wpos * m_elements_per_frame], frames);

    guint write2 = can_write * m_elements_per_frame - write1;
    fast_copy (write2, &m_buffer[0], frames + write1);
 
    // It is important that the data from the previous writes get committed
    // to memory *before* the index variable is updated. Otherwise, the
    // consumer thread could be reading invalid data, if the index variable
    // got written before the rest of the data (when unordered writes are
    // performed).
    write_memory_barrier();

    Atomic::int_set (&m_write_frame_pos, (wpos + can_write) % size);
    return can_write;
  }
  /**
   * @returns the maximum number of frames that the ringbuffer can contain
   *
   * This function can be called from any thread.
   */
  guint
  get_total_n_frames() const
  {
    // the extra frame allows us to see the difference between an empty/full ringbuffer
    return (m_buffer.size() - 1) / m_elements_per_frame;
  }
  /**
   * @returns the number of elements that are part of one frame
   *
   * This function can be called from any thread.
   */
  guint
  get_elements_per_frame() const
  {
    return m_elements_per_frame;
  }
  /**
   * Clear the ringbuffer.
   *
   * This function may not be used while either the producer thread or
   * the consumer thread are modifying the ringbuffer.
   */
  void
  clear()
  {
    Atomic::int_set (&m_read_frame_pos, 0);
    Atomic::int_set (&m_write_frame_pos, 0);
  }
  /**
   * Resize and clear the ringbuffer.
   *
   * This function may not be used while either the producer thread or
   * the consumer thread are modifying the ringbuffer.
   */
  void
  resize (guint n_frames,
          guint elements_per_frame = 1)
  {
    m_elements_per_frame = elements_per_frame;
    // the extra frame allows us to see the difference between an empty/full ringbuffer
    m_buffer.resize ((n_frames + 1) * m_elements_per_frame);
    clear();
  }
};


static BIRNET_MSG_TYPE_DEFINE (debug_pcm, "pcm", BIRNET_MSG_DEBUG, NULL);
#define DEBUG(...) sfi_debug (debug_pcm, __VA_ARGS__)

/* --- JACK PCM handle --- */
enum
{
  CALLBACK_STATE_INACTIVE = 0,
  CALLBACK_STATE_ACTIVE = 1,
  CALLBACK_STATE_PLEASE_TERMINATE = 2,
  CALLBACK_STATE_TERMINATED = 3
};

struct JackPcmHandle
{
  BsePcmHandle  handle;
  JackPcmHandle (jack_client_t *jack_client)
    : jack_client (jack_client)
  {
    memset (&handle, 0, sizeof (handle));
    callback_state = ixruns = pixruns = oxruns = poxruns = 0;
    is_down = false;
  }
  vector<jack_port_t *>  input_ports;
  vector<jack_port_t *>  output_ports;

  guint  buffer_frames; /* input/output ringbuffer size in frames */

  Cond  cond;
  Mutex  cond_mutex;
  jack_client_t *jack_client;
  FrameRingBuffer<float>  input_ringbuffer;
  FrameRingBuffer<float>  output_ringbuffer;

  int  callback_state;
  int  ixruns;
  int  oxruns;
  int  pixruns;
  int  poxruns;

  bool  is_down;
};

}

/* --- prototypes --- */
static void             bse_pcm_device_jack_class_init  (BsePcmDeviceJACKClass  *klass);
static void             bse_pcm_device_jack_init        (BsePcmDeviceJACK       *self);
static gsize            jack_device_read                (BsePcmHandle           *handle,
                                                         gfloat                 *values);
static void             jack_device_write               (BsePcmHandle           *handle,
                                                         const gfloat           *values);
static gboolean         jack_device_check_io            (BsePcmHandle           *handle,
                                                         glong                  *tiumeoutp);
static guint            jack_device_latency             (BsePcmHandle           *handle);
static void             jack_device_retrigger           (JackPcmHandle          *jack);

/* --- define object type and export to BSE --- */
static const char type_blurb[] = ("PCM driver implementation for JACK Audio Connection Kit "
                                  "(http://jackaudio.org)");
BSE_REGISTER_OBJECT (BsePcmDeviceJACK, BsePcmDevice, NULL, "", type_blurb, NULL, bse_pcm_device_jack_class_init, NULL, bse_pcm_device_jack_init);
BSE_DEFINE_EXPORTS (__FILE__);

/* --- variables --- */
static gpointer parent_class = NULL;

/* --- functions --- */
static void
bse_pcm_device_jack_init (BsePcmDeviceJACK *self)
{
  /* FIXME: move signal handling somewhere else */
  sigset_t signal_mask;
  sigemptyset (&signal_mask);
  sigaddset (&signal_mask, SIGPIPE);

  int rc = pthread_sigmask (SIG_BLOCK, &signal_mask, NULL);
  if (rc != 0)
    g_printerr ("jack driver: pthread_sigmask for SIGPIPE failed: %s\n", strerror (errno));

  jack_status_t status;

  self->jack_client = jack_client_open ("beast", JackNoStartServer, &status); /* FIXME: translations(?) */
  DEBUG ("attaching to JACK server returned status: %d\n", status);
  /* FIXME: check status for proper error reporting
   * FIXME: what about server name? (necessary if more than one jackd instance is running)
   * FIXME: get rid of device unloading, so that the jackd connection can be kept initialized
   *
   * maybe move this to a different function
   */
}

namespace {
struct DeviceDetails {
  guint ports, input_ports, output_ports, physical_ports, terminal_ports;
  vector<string> input_port_names;
  vector<string> output_port_names;
  DeviceDetails() : ports (0), input_ports (0), output_ports (0), physical_ports (0), terminal_ports (0)
  {
  }
};
}

static map<string, DeviceDetails>
query_jack_devices (BsePcmDeviceJACK *self)
{
  map<string, DeviceDetails> devices;

  /* FIXME: we need to make a difference here between
   *  - jackd is running, but no output ports can be found
   *  - jackd is not running
   */
  const char **jack_ports = self->jack_client ? jack_get_ports (self->jack_client, NULL, NULL, 0) : NULL;
  if (jack_ports)
    {
      for (guint i = 0; jack_ports[i]; i++)
        {
          const char *end = strchr (jack_ports[i], ':');
          if (end)
            {
              string device_name (jack_ports[i], end);
              DeviceDetails &details = devices[device_name];
              details.ports++;

              jack_port_t *jack_port = jack_port_by_name (self->jack_client, jack_ports[i]);
              if (jack_port)
                {
                  int flags = jack_port_flags (jack_port);
                  if (flags & JackPortIsInput)
                    {
                      details.input_ports++;
                      details.input_port_names.push_back (jack_ports[i]);
                    }
                  if (flags & JackPortIsOutput)
                    {
                      details.output_ports++;
                      details.output_port_names.push_back (jack_ports[i]);
                    }
                  if (flags & JackPortIsPhysical)
                    details.physical_ports++;
                  if (flags & JackPortIsTerminal)
                    details.terminal_ports++;
                }
            }
        }
      free (jack_ports);
    }

  return devices;
}

static SfiRing*
bse_pcm_device_jack_list_devices (BseDevice *device)
{
  BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (device);
  map<string, DeviceDetails> devices = query_jack_devices (self);

  SfiRing *ring = NULL;
  for (map<string, DeviceDetails>::iterator di = devices.begin(); di != devices.end(); di++)
    {
      const string &device_name = di->first;
      DeviceDetails &details = di->second;
      BseDeviceEntry *entry;
      entry = bse_device_group_entry_new (device, g_strdup (device_name.c_str()),
                                                  g_strdup ("JACK Devices"),
                                                  g_strdup_printf ("%s (%d*input %d*output)%s",
                                                                   device_name.c_str(),
                                                                   details.input_ports,
                                                                   details.output_ports,
                                                                   details.physical_ports == details.ports ?
                                                                     " [ hardware ]" : ""));

      /* ensure that alsa_pcm always is the first item listed (it is the default device) */
      if (device_name == "alsa_pcm")
        ring = sfi_ring_prepend (ring, entry);
      else
        ring = sfi_ring_append (ring, entry);
    }

  if (!ring)
    ring = sfi_ring_append (ring, bse_device_error_new (device, g_strdup_printf ("No devices found")));
  return ring;
}

static void
jack_shutdown_callback (void *jack_handle_ptr)
{
  JackPcmHandle *jack = static_cast <JackPcmHandle *> (jack_handle_ptr);

  AutoLocker cond_locker (jack->cond_mutex);
  jack->is_down = true;
  jack->cond.signal();
}

static int
jack_process_callback (jack_nframes_t n_frames,
                       void          *jack_handle_ptr)
{
  typedef jack_default_audio_sample_t sample_t;

  JackPcmHandle *jack = static_cast <JackPcmHandle *> (jack_handle_ptr);

  int callback_state = Atomic::int_get (&jack->callback_state);

  /* still waiting for initialization to complete */
  if (callback_state == CALLBACK_STATE_ACTIVE)
    {
      bool have_cond_lock = jack->cond_mutex.trylock();

      guint n_channels = jack->handle.n_channels;
      float buffer[n_frames * n_channels];

      if (jack->handle.readable)
        {
          /* interleave input data for processing in the engine thread */
          g_assert (jack->input_ports.size() == n_channels);

          for (guint ch = 0; ch < n_channels; ch++)
            {
              sample_t *values = (sample_t *) jack_port_get_buffer (jack->input_ports[ch], n_frames);
              float *p = &buffer[ch];
              for (guint i = 0; i < n_frames; i++)
                {
                  *p = *values++;
                  p += n_channels;
                }
            }

          guint frames_written = jack->input_ringbuffer.write (n_frames, buffer);
          if (frames_written != n_frames)
            Atomic::int_add (&jack->ixruns, 1);      /* input underrun detected */
        }
      if (jack->handle.writable)
        {
          guint read_frames = jack->output_ringbuffer.read (n_frames, buffer);
          if (read_frames != n_frames)
            {
              Atomic::int_add (&jack->oxruns, 1);     /* output underrun detected */
              Block::fill ((n_frames - read_frames) * n_channels, &buffer[read_frames * n_channels], 0.0);
            }

          /* deinterleave output data for processing in the engine thread */
          g_assert (jack->output_ports.size() == n_channels);

          for (guint ch = 0; ch < n_channels; ch++)
            {
              sample_t *values = (sample_t *) jack_port_get_buffer (jack->output_ports[ch], n_frames);
              sample_t *values_end = values + n_frames;
              float *p = &buffer[ch];
              while (values < values_end)
                {
                  *values++ = *p;
                  p += n_channels;
                }
            }
        }

      /*
       * as we can't (always) lock our data structures from the jack realtime
       * thread, using a condition introduces the possibility for a race:
       *
       * normal operation:
       *
       * [bse thread]      writes some data to output_ringbuffer
       * [bse thread]      checks remaining space, there is no room left
       * [bse thread]      sleeps on the condition
       * [jack thread]     reads some data from output_ringbuffer
       * [jack thread]     signals the condition
       * [bse thread]      continues to write some data to output_ringbuffer
       *
       * race condition:
       *
       * [bse thread]      writes some data to output_ringbuffer
       * [bse thread]      checks remaining space, there is no room left
       * [jack thread]     reads some data from output_ringbuffer
       * [jack thread]     signals the condition
       * [bse thread]      sleeps on the condition
       *
       * since the jack callback gets called periodically, the bse thread will
       * never starve, though, in the worst case it will just miss a frame
       *
       * so we absolutely exclude the possibility for priority inversion (by
       * using trylock instead of lock), by introducing extra latency in
       * some extremely rare cases
       */
      if (have_cond_lock)
        jack->cond_mutex.unlock();
    }
  else
    {
      for (guint ch = 0; ch < jack->input_ports.size(); ch++)
        {
          sample_t *values = (sample_t *) jack_port_get_buffer (jack->input_ports[ch], n_frames);
          Block::fill (n_frames, values, 0.0);
        }
      for (guint ch = 0; ch < jack->output_ports.size(); ch++)
        {
          sample_t *values = (sample_t *) jack_port_get_buffer (jack->output_ports[ch], n_frames);
          Block::fill (n_frames, values, 0.0);
        }
    }
  jack->cond.signal();

  if (callback_state == CALLBACK_STATE_PLEASE_TERMINATE)
    {
      Atomic::int_set (&jack->callback_state, CALLBACK_STATE_TERMINATED);
      return 1;
    }
  return 0;
}

static void
terminate_and_free_jack (JackPcmHandle *jack)
{
  g_return_if_fail (jack->jack_client != NULL);

  Atomic::int_set (&jack->callback_state, CALLBACK_STATE_PLEASE_TERMINATE);
  while (Atomic::int_get (&jack->callback_state) == CALLBACK_STATE_PLEASE_TERMINATE)
    {
      AutoLocker cond_locker (jack->cond_mutex);

      if (jack->is_down) /* if jack is already gone, then forget it */
        Atomic::int_set (&jack->callback_state, CALLBACK_STATE_TERMINATED);
      else
        jack->cond.wait_timed (jack->cond_mutex, 100000);
    }

  for (guint ch = 0; ch < jack->input_ports.size(); ch++)
    jack_port_disconnect (jack->jack_client, jack->input_ports[ch]);

  for (guint ch = 0; ch < jack->output_ports.size(); ch++)
    jack_port_disconnect (jack->jack_client, jack->output_ports[ch]);

  jack_deactivate (jack->jack_client);

  delete jack;
}

static BseErrorType
bse_pcm_device_jack_open (BseDevice     *device,
                          gboolean       require_readable,
                          gboolean       require_writable,
                          guint          n_args,
                          const gchar  **args)
{
  BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (device);
  if (!self->jack_client)
    return BSE_ERROR_FILE_OPEN_FAILED;

  JackPcmHandle *jack = new JackPcmHandle (self->jack_client);
  BsePcmHandle *handle = &jack->handle;

  handle->readable = require_readable;
  handle->writable = require_writable;
  handle->n_channels = BSE_PCM_DEVICE (device)->req_n_channels;

  /* try setup */
  BseErrorType error = BSE_ERROR_NONE;

  const char *dname = (n_args == 1) ? args[0] : "alsa_pcm";
  handle->mix_freq = jack_get_sample_rate (self->jack_client);

  Atomic::int_set (&jack->callback_state, CALLBACK_STATE_INACTIVE);

  for (guint i = 0; i < handle->n_channels; i++)
    {
      const int port_name_size = jack_port_name_size();
      char port_name[port_name_size];
      jack_port_t *port;

      snprintf (port_name, port_name_size, "in_%u", i);
      port = jack_port_register (self->jack_client, port_name, JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput, 0);
      if (port)
        jack->input_ports.push_back (port);
      else
        error = BSE_ERROR_FILE_OPEN_FAILED;

      snprintf (port_name, port_name_size, "out_%u", i);
      port = jack_port_register (self->jack_client, port_name, JACK_DEFAULT_AUDIO_TYPE, JackPortIsOutput, 0);
      if (port)
        jack->output_ports.push_back (port);
      else
        error = BSE_ERROR_FILE_OPEN_FAILED;
    }

  /* activate */

  jack_set_process_callback (self->jack_client, jack_process_callback, jack);
  jack_on_shutdown (self->jack_client, jack_shutdown_callback, jack);
  jack_activate (self->jack_client);

  /* connect ports */

  map<string, DeviceDetails> devices = query_jack_devices (self);
  map<string, DeviceDetails>::const_iterator di;

  di = devices.find (dname);
  if (di != devices.end())
    {
      const DeviceDetails &details = di->second;

      for (guint ch = 0; ch < handle->n_channels; ch++)
        {
          if (details.output_ports > ch)
            jack_connect (self->jack_client, details.output_port_names[ch].c_str(), jack_port_name (jack->input_ports[ch]));
          if (details.input_ports > ch)
            jack_connect (self->jack_client, jack_port_name (jack->output_ports[ch]), details.input_port_names[ch].c_str());
        }
    }

  /* setup buffer size */
 
  // keep at least two jack callback sizes for dropout free audio
  guint min_buffer_frames = jack_get_buffer_size (self->jack_client) * 2;

  // keep an extra engine buffer size (this compensates also for cases where the
  // engine buffer size is not a 2^N value, which would otherwise cause the
  // buffer never to be fully filled with 2 periods of data)
  min_buffer_frames += BSE_PCM_DEVICE (device)->req_block_length;

  // honor the user defined latency specification
  //
  // FIXME: should we try to tweak local buffering so that it corresponds
  // to the user defined latency, or global buffering (including the jack
  // buffer)? we do the former, since it is easier
  guint user_buffer_frames = BSE_PCM_DEVICE (device)->req_latency_ms * handle->mix_freq / 1000;
  guint buffer_frames = max (min_buffer_frames, user_buffer_frames);

  jack->input_ringbuffer.resize (buffer_frames, handle->n_channels);
  jack->output_ringbuffer.resize (buffer_frames, handle->n_channels);
  jack->buffer_frames  = jack->output_ringbuffer.get_writable_frames();

  // the ringbuffer should be exactly as big as requested
  g_assert (jack->buffer_frames == buffer_frames);
  DEBUG ("ringbuffer size = %.3fms", jack->buffer_frames / double (handle->mix_freq) * 1000);

  /* setup PCM handle or shutdown */
  if (!error)
    {
      bse_device_set_opened (device, dname, handle->readable, handle->writable);
      if (handle->readable)
        handle->read = jack_device_read;
      if (handle->writable)
        handle->write = jack_device_write;
      handle->check_io = jack_device_check_io;
      handle->latency = jack_device_latency;
      BSE_PCM_DEVICE (device)->handle = handle;

      /* will prevent dropouts at initialization, when no data is there at all */
      jack_device_retrigger (jack);
      jack_device_latency (handle);   // debugging only: print latency values
    }
  else
    {
      terminate_and_free_jack (jack);
    }
  DEBUG ("JACK: opening PCM \"%s\" readupble=%d writable=%d: %s", dname, require_readable, require_writable, bse_error_blurb (error));

  return error;
}

static void
bse_pcm_device_jack_close (BseDevice *device)
{
  JackPcmHandle *jack = (JackPcmHandle*) BSE_PCM_DEVICE (device)->handle;
  BSE_PCM_DEVICE (device)->handle = NULL;

  terminate_and_free_jack (jack);
}

static void
bse_pcm_device_jack_finalize (GObject *object)
{
  BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (object);
  if (self->jack_client)
    {
      jack_client_close (self->jack_client);
      self->jack_client = NULL;
    }

  /* chain parent class' handler */
  G_OBJECT_CLASS (parent_class)->finalize (object);
}

static void
jack_device_retrigger (JackPcmHandle *jack)
{
  g_return_if_fail (jack->jack_client != NULL);

  /*
   * we need to reset the active flag to false here, as we modify the
   * buffers in a non threadsafe way; this is why we also wait for
   * the condition, to ensure that the jack callback really isn't
   * active any more
   */
  Atomic::int_set (&jack->callback_state, CALLBACK_STATE_INACTIVE);

  /* usually should not timeout, but be notified by the jack callback */
  jack->cond_mutex.lock();
  if (!jack->is_down)
    jack->cond.wait_timed (jack->cond_mutex, 100000);
  jack->cond_mutex.unlock();

  /* jack_ringbuffer_reset is not threadsafe! */
  jack->input_ringbuffer.clear();
  jack->output_ringbuffer.clear();

  /* initialize output ringbuffer with silence */
  vector<float> silence (jack->buffer_frames * jack->handle.n_channels);
  guint frames_written = jack->output_ringbuffer.write (jack->buffer_frames, &silence[0]);
  g_assert (frames_written == jack->buffer_frames);
  DEBUG ("jack_device_retrigger: %d frames written", frames_written);
}

static gboolean
jack_device_check_io (BsePcmHandle *handle,
                      glong        *timeoutp)
{
  JackPcmHandle *jack = (JackPcmHandle*) handle;
  g_return_val_if_fail (jack->jack_client != NULL, false);

/*
  int ixruns = Atomic::int_get (&jack->ixruns);
  int oxruns = Atomic::int_get (&jack->oxruns);

  if (jack->poxruns != oxruns || jack->pixruns != ixruns)
    {
      printf ("beast caused jack input xruns %d, output xruns %d\n", ixruns, oxruns);
      jack->poxruns = oxruns;
      jack->pixruns = ixruns;

      jack_device_retrigger (jack);
    }
*/
 
  /* (FIXME?)
   *
   * we can not guarantee that beast will read the data from the jack ringbuffer,
   * since this depends on the presence of a module that actually reads data in
   * the synthesis graph : so we simply ignore the ixruns variable; I am not sure
   * if thats the right thing to do, though
   */
  int oxruns = Atomic::int_get (&jack->oxruns);

  if (jack->poxruns != oxruns)
    {
      g_printerr ("%d beast jack driver xruns\n", oxruns);
      jack->poxruns = oxruns;

      jack_device_retrigger (jack);
    }

  Atomic::int_set (&jack->callback_state, CALLBACK_STATE_ACTIVE);

  guint n_frames_avail = min (jack->output_ringbuffer.get_writable_frames(), jack->input_ringbuffer.get_readable_frames());

  /* check whether data can be processed */
  if (n_frames_avail >= handle->block_length)
    return TRUE;        /* need processing */

  /* calculate timeout until processing is possible or needed */
  guint diff_frames = handle->block_length - n_frames_avail;
  *timeoutp = diff_frames * 1000 / handle->mix_freq;
  return FALSE;
}

static guint
jack_device_latency (BsePcmHandle *handle)
{
  JackPcmHandle *jack = (JackPcmHandle*) handle;
  jack_nframes_t rlatency = 0, wlatency = 0;

  g_return_val_if_fail (jack->jack_client != NULL, 0);

  /* FIXME: the API of this function is broken, because you can't use its result
   * to sync for instance the play position pointer with the screen
   *
   * why? because it doesn't return you rlatency and wlatency seperately, but
   * only the sum of both
   *
   * when using jack, there is no guarantee that rlatency and wlatency are equal
   */
  for (guint i = 0; i < jack->input_ports.size(); i++)
    {
      jack_nframes_t latency = jack_port_get_total_latency (jack->jack_client, jack->input_ports[i]);
      rlatency = max (rlatency, latency);
    }

  for (guint i = 0; i < jack->output_ports.size(); i++)
    {
      jack_nframes_t latency = jack_port_get_total_latency (jack->jack_client, jack->output_ports[i]);
      wlatency = max (wlatency, latency);
    }
 
  guint total_latency = 2 * jack->buffer_frames + rlatency + wlatency;
  DEBUG ("rlatency=%.3f ms wlatency=%.3f ms ringbuffer=%.3f ms total_latency=%.3f ms",
         rlatency / double (handle->mix_freq) * 1000,
         wlatency / double (handle->mix_freq) * 1000,
         jack->buffer_frames / double (handle->mix_freq) * 1000,
         total_latency / double (handle->mix_freq) * 1000);
  return total_latency;
}

static gsize
jack_device_read (BsePcmHandle *handle,
                  gfloat       *values)
{
  JackPcmHandle *jack = (JackPcmHandle*) handle;
  g_return_val_if_fail (jack->jack_client != NULL, 0);
  g_return_val_if_fail (Atomic::int_get (&jack->callback_state) == CALLBACK_STATE_ACTIVE, 0);

  guint frames_left = handle->block_length;
  while (frames_left)
    {
      guint frames_read = jack->input_ringbuffer.read (frames_left, values);

      frames_left -= frames_read;
      values += frames_read * handle->n_channels;

      if (frames_left)
        {
          AutoLocker cond_locker (jack->cond_mutex);

          /* usually should not timeout, but be notified by the jack callback */
          if (!jack->input_ringbuffer.get_readable_frames())
            jack->cond.wait_timed (jack->cond_mutex, 100000);

          if (jack->is_down)
            {
              /*
               * FIXME: we need a way to indicate an error here; beast should provide
               * an adequate reaction in case the JACK server is down (it should stop
               * playing the file, and show a dialog, if JACK can not be reconnected)
               *
               * if we have a way to abort processing, this if can be moved above
               * the condition wait; however, right now moving it there means that
               * beast will render the output as fast as possible when jack dies, and
               * this will look like a machine lockup
               */
              Block::fill (frames_left * handle->n_channels, values, 0.0);   /* <- remove me once we can indicate errors */
              return handle->block_length * handle->n_channels;
            }
        }
    }
  return handle->block_length * handle->n_channels;
}

static void
jack_device_write (BsePcmHandle *handle,
                   const gfloat *values)
{
  JackPcmHandle *jack = (JackPcmHandle*) handle;
  g_return_if_fail (jack->jack_client != NULL);
  g_return_if_fail (Atomic::int_get (&jack->callback_state) == CALLBACK_STATE_ACTIVE);

  guint frames_left = handle->block_length;

  while (frames_left)
    {
      guint frames_written = jack->output_ringbuffer.write (frames_left, values);

      frames_left -= frames_written;
      values += frames_written * handle->n_channels;

      if (frames_left)
        {
          AutoLocker cond_locker (jack->cond_mutex);

          /* usually should not timeout, but be notified by the jack callback */
          if (!jack->output_ringbuffer.get_writable_frames())
            jack->cond.wait_timed (jack->cond_mutex, 100000);

          if (jack->is_down)
            {
              /*
               * FIXME: we need a way to indicate an error here; beast should provide
               * an adequate reaction in case the JACK server is down (it should stop
               * playing the file, and show a dialog, if JACK can not be reconnected)
               *
               * if we have a way to abort processing, this if can be moved above
               * the condition wait; however, right now moving it there means that
               * beast will render the output as fast as possible when jack dies, and
               * this will look like a machine lockup
               */
              return;
            }
        }
    }
}

static void
bse_pcm_device_jack_class_init (BsePcmDeviceJACKClass *klass)
{
  GObjectClass *gobject_class = G_OBJECT_CLASS (klass);
  BseDeviceClass *device_class = BSE_DEVICE_CLASS (klass);
 
  parent_class = g_type_class_peek_parent (klass);
 
  gobject_class->finalize = bse_pcm_device_jack_finalize;
 
  device_class->list_devices = bse_pcm_device_jack_list_devices;
  const gchar *name = "jack";
  const gchar *syntax = _("device");
  const gchar *info = g_intern_printf (/* TRANSLATORS: keep this text to 70 chars in width */
                                       _("JACK Audio Connection Kit PCM driver (http://jackaudio.org)\n"));

  /* Being BSE_RATING_PREFFERED + 1, should do the right thing: if jackd is running,
   * use it, if jackd isn't running (we'll get an error upon initialization,
   * then), use the next best driver.
   *
   * Right now we will never start the jackd process ourselves.
   */
  bse_device_class_setup (klass, BSE_RATING_PREFERRED + 1, name, syntax, info);
  device_class->open = bse_pcm_device_jack_open;
  device_class->close = bse_pcm_device_jack_close;
}

   Cu... Stefan
--
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast
Reply | Threaded
Open this post in threaded view
|

Re: Beast Jack driver (Re: Lockfree ringbuffer review)

Tim Janik
On Thu, 1 Jun 2006, Stefan Westerfeld wrote:

> On Thu, Jun 01, 2006 at 05:14:15PM +0200, Tim Janik wrote:

>> well, what's the use case for a non-float ringbuffer?
>
> I've just read my way through the new jack midi API, and we'll need a
> ring buffer to get the midi events from the realtime thread, where they
> will be delivered by jack, and our midi thread. So yes, there is a use
> case, which will probably have unsigned char as data type, or even
> something like struct MidiEvent.

ok, that's interesting indeed.

>>> Finally, if we were to optimize the JACK driver for maximum performance,
>>> a more important factor would probably be that the ringbuffer even has
>>> this API.
>>
>> sorry? what API are you talking about?
>> void    bse_block_copy_float    (guint          n_values,
>>                                  gfloat        *values,
>>                                  const gfloat  *ivalues);
>> copying?
>>
>>> It means that we need to deinterleave and reinterleave the
>>> data JACK supplies on the stack, and then read/write it to the
>>> ringbuffer. If the ringbuffer would offer an API to access its memory
>>> directly, the extra copy would disappear. But it makes the code harder
>>> to maintain and for now my first priority has been getting things
>>> implemented correctly.
>>
>> i don't quite understand this point.
>> are you saying that replacing memcpy() by bse_block_copy_float() can't
>> be done due to interleaving? :-O
>
> No. It can be replaced, and the new version has it. However, our data
> flow looks like this:
>
>  input
>
> jack thread process() callback supplies N separate buffers
>   |
>   V
> interleave on the stack (manually implemented, in process())       *jack-thread*
>   |
>   V
> copy into the input ringbuffer (input ringbuffer write)            *jack-thread*
>   |
>   V
> copy in bse engine supplied buffer (input ringbuffer read)         *engine-thread*
>   |
>   V

> [ beast processing ]                                               *engine-thread*

note that this is processing in BSE, not beast (the GUI).
and BSE does deinterleave its data, process it and then reinterleave
because standard audio drivers want that. for jack this doesn't
make sense though, so the data flow should instead be:
   n-jack-buffers -> ringbuffer-write -> ||
   || -> ringbuffer-read -> n-bse-channel-blocks

[...]
> and look at RingBuffer_GetWriteRegions). However if we get everything we
> want - with the number of copies we're making right now - I am perfectly
> happy. Every sensible bse network will eat much more cycles than we can
> possibly spend in those extra Bse::Block::copy() calls.

i'm not happy with this. with every unneccessary copy you're doing in the
audio driver, you reduce the number of cycles the engine can spend on
audio processing eventhough it needs so much, like you just stated.

>>> namespace {
>>>
>>> /**
>>> * The FrameRingBuffer class implements a ringbuffer for the communication
>>> * between two threads. One thread - the producer thread - may only write
>>> * data to the ringbuffer. The other thread - the consumer thread - may
>>> * only read data from the ringbuffer.
>>> *
>>> * Given that these two threads only use the appropriate functions, no
>>> * other synchronization is required to ensure that the data gets safely
>>> * from the producer thread to the consumer thread. However, all operations
>>> * that are provided by the ringbuffer are non-blocking, so that you may
>>> * need a condition or other synchronization primitive if you want the
>>> * producer and/or consumer to block if the ringbuffer is full/empty.
>>
>> s/if/on the event that/ ?
>
> I think not (but I am no native speaker). What I want to say is this:
> Suppose the ringbuffer is empty. The consumer will, when calling the
> read function, just read 0 frames. A common scenario is that the
> consumer will want to wait until data is available. To implement this,
> the consumer will need to use a synchonization mechanism (i.e.
> condition).

that would need the suggested phrase "on the event that" then.
hoever, the scenario you describe is absolutely *not* common in
our application of the ringbuffer. the BSE engine does precisely
*not* want to wait on audio. it'll just come back later and read or
write again. for that, it just needs to be able to figure from the
driver how many samples later that would ideally be.

>>> *
>>> * Implementation: the synchronization between the two threads is only
>>> * implemented by two index variables (read_frame_pos and write_frame_pos)
>>> * for which atomic integer reads and writes are required. Since the
>>> * producer thread only modifies the write_frame_pos and the consumer thread
>>> * only modifies the read_frame_pos, no compare-and-swap or similar
>>> * operations are needed to avoid concurrent writes.
>>> */
>>
>> good comment btw, i now have an idea on why/how things are supposed
>> to work and can scrutinize your implementaiton accordingly ;)
>>
>>> template<class T>
>>> class FrameRingBuffer {
>>> //BIRNET_PRIVATE_COPY (FrameRingBuffer);
>>
>> what's this comment for?
>
> I wanted to call BIRNET_PRIVATE_COPY, but it didn't work, so I put //
> marks there, so it compiled. So can I use BIRNET_PRIVATE_COPY or do I
> need to put the code BIRNET_PRIVATE_COPY would put there if it worked
> here manually?

i've not seen it "not working" yet. if that is indeed the case, you
need to investigate the cause. i can't say give you advice just on the
notion of "it didn't work" though... ;)

>>> /**
>>>  * Returns the maximum number of frames that the ringbuffer can contain.
>>>  *
>>>  * This function can be called from any thread.
>>>  */
>>> guint
>>> size() const
>>> {
>>>   // the extra element allows us to see the difference between an
>>>   empty/full ringbuffer
>>>   return (m_buffer.size() - 1) / m_elements_per_frame;
>>> }
>>
>> ugh this is mighty confusing.
>> you should get rid of this function or at least clearly rename it.
>> seeing it, i had to go back and read your code all over to figure whether
>> you
>> used size() in every place or m_buffer.size().
>> if you need a name, call it total_n_frames(), allthough it seems this
>> function
>> isn't used/needed at all.
>
> Well, its not used for internal purposes. But its provided for the code
> which uses the ringbuffer. Basically if I am implementing C++ code for
> some data structure that has a size (as the ringbuffer does), then I
> should also provide an accessor that can be used to determine the size.
>
> Just as std::vector has a resize() and size() function.

i think it's still bad because its very confusing in this context.
so yes, it should still be renamed.
(and no, i don't think the libc or std:: APIs picked good method names
in every place; consequently they may very well do you a disservice as
API naming reference)

>>> void
>>> clear()
>>> {
>>>   Atomic::int_set (&m_read_frame_pos, 0);
>>>   Atomic::int_set (&m_write_frame_pos, 0);
>>> }
>>
>> is this really needed outside of resize() though?
>
> The jack driver needs it (to get into a sane state after a dropout).

how are you avoiding concurrent access when clearing due to a drop
out then?

>> hm, don't you want review on the rest of the driver as well?
>> (that'd prolly also be the part that is more interesting for Paul)
>
> Definitely. So here is an updated version of the ringbuffer and the
> whole driver source. I think the driver already works pretty well, its
> just that there are quite a few FIXMEs in there.
>
> As I am only posting the C++ source to the list, if anyone is so brave
> and wants to compile the driver (you need beast CVS for this), I've put
> the whole driver to:
>
> http://space.twc.de/~stefan/download/20060601-preview-bse-jack-0.7.0.tar.gz
>
> (sorry that it extracts to the wrong directory, but I just used make
> dist...).
>
> Anyway, here is the C++ source (the interesting part, for the review):

well, i think this is better handled by another thread.
it doesn't help to make this email even longer ;)

>   Cu... Stefan

---
ciaoTJ
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast
Reply | Threaded
Open this post in threaded view
|

bsepcmdevice-jack.cc (Re: Beast Jack driver)

Tim Janik
In reply to this post by Stefan Westerfeld
On Thu, 1 Jun 2006, Stefan Westerfeld wrote:

> Anyway, here is the C++ source (the interesting part, for the review):
>
> /* BSE - Bedevilled Sound Engine
> * Copyright (C) 2004 Tim Janik
> * Copyright (C) 2006 Stefan Westerfeld
> *
> * This program is free software; you can redistribute it and/or
> * modify it under the terms of the GNU Lesser General Public
> * License as published by the Free Software Foundation; either
> * version 2 of the License, or (at your option) any later version.
> *
> * This program is distributed in the hope that it will be useful,
> * but WITHOUT ANY WARRANTY; without even the implied warranty of
> * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> * Lesser General Public License for more details.
> *
> * You should have received a copy of the GNU Lesser General
> * Public License along with this program; if not, write to the
> * Free Software Foundation, Inc., 59 Temple Place, Suite 330,
> * Boston, MA 02111-1307, USA.
> */
> #include "configure.h"
> #include "bsepcmdevice-jack.hh"
> #include <bse/bseblockutils.hh>
> #include <bse/gsldatautils.h>
> #include <jack/jack.h>
> #include <string.h>
> #include <errno.h>
> #include <signal.h>
> #include <set>
> #include <string>
> #include <map>
> #include <vector>
>
> using namespace Birnet;
> using std::vector;
> using std::set;
> using std::map;
> using std::string;
> using std::min;
> using std::max;
> using std::copy;
> using Bse::Block;
>
> namespace {
>
> /**
> * This function uses std::copy to copy the n_values of data from ivalues
> * to ovalues. If a specialized version is available in bseblockutils,
> * then this - usually faster - version will be used.

well, you could even say "suppsoed to be faster", because if the blockutils
version is indeed slower, that's worthy of a bug report.

> *
> * The data in ivalues and ovalues may not overlap.
> */
> template<class Data> void
> fast_copy (guint       n_values,
>           Data       *ovalues,
>   const Data *ivalues)
> {
>  copy (ivalues, ivalues + n_values, ovalues);
> }
>
> template<> void
> fast_copy (guint        n_values,
>           float       *ovalues,
>           const float *ivalues)
> {
>  Block::copy (n_values, ovalues, ivalues);
> }
>
> template<> void
> fast_copy (guint  n_values,
>           guint32       *ovalues,
>           const guint32 *ivalues)
> {
>  Block::copy (n_values, ovalues, ivalues);
> }
>
> /**
> * The FrameRingBuffer class implements a ringbuffer for the communication
[...]

[ring buffer code that has already been reviewed...]

the ring buffer code is probably best split off into an extra tcc file,
especially if chances are that you need it in the MIDI driver as well.

> static BIRNET_MSG_TYPE_DEFINE (debug_pcm, "pcm", BIRNET_MSG_DEBUG, NULL);
> #define DEBUG(...) sfi_debug (debug_pcm, __VA_ARGS__)
>
> /* --- JACK PCM handle --- */

there should be an explanatory comment in this file on the jack thread
and when it enters what states, etc.

> enum
> {
>  CALLBACK_STATE_INACTIVE = 0,
>  CALLBACK_STATE_ACTIVE = 1,
>  CALLBACK_STATE_PLEASE_TERMINATE = 2,
>  CALLBACK_STATE_TERMINATED = 3

minor cosmetic, we usually align '='-symbols in enums.

> };

you should provide a type name for this enum, which
can be used above to tell apart valid states of your
callback_state variable.

>
> struct JackPcmHandle
> {
>  BsePcmHandle  handle;
>  JackPcmHandle (jack_client_t *jack_client)
>    : jack_client (jack_client)

cosmetic, the ':' usually goes next to the ')', i.e.:
   JackPcmHandle (jack_client_t *jack_client) :
     jack_client (jack_client)
   {}

>  {
>    memset (&handle, 0, sizeof (handle));
>    callback_state = ixruns = pixruns = oxruns = poxruns = 0;
>    is_down = false;

hm, afaik, it's better style to intiialize all these
after JackPcmHandle() :

>  }
>  vector<jack_port_t *>  input_ports;
>  vector<jack_port_t *>  output_ports;
>
>  guint  buffer_frames; /* input/output ringbuffer size in frames */
>
>  Cond  cond;
>  Mutex  cond_mutex;

erk, what exactly is protected by this cond and this mutex?
something like this definitely needs to be outlined in code
(in comments or by naming prefixes, look for BirnetMutex
in bseengineutils.c to see examples for that)

>  jack_client_t *jack_client;
>  FrameRingBuffer<float>  input_ringbuffer;
>  FrameRingBuffer<float>  output_ringbuffer;
>
>  int  callback_state;
^^^^^^^^^^
proper typing is missing here, you may take a look at bsemidireceiver.cc and
search for "enum" to find examples for internally used enums.

>  int  ixruns;
>  int  oxruns;
>  int  pixruns;
>  int  poxruns;
>
>  bool  is_down;
> };
>
> }
>
> /* --- prototypes --- */
> static void             bse_pcm_device_jack_class_init  (BsePcmDeviceJACKClass  *klass);
> static void             bse_pcm_device_jack_init        (BsePcmDeviceJACK       *self);
> static gsize            jack_device_read                (BsePcmHandle           *handle,
>                                                         gfloat                 *values);
> static void             jack_device_write               (BsePcmHandle           *handle,
>                                                         const gfloat           *values);
> static gboolean         jack_device_check_io            (BsePcmHandle           *handle,
>                                                         glong                  *tiumeoutp);
> static guint            jack_device_latency             (BsePcmHandle           *handle);
> static void             jack_device_retrigger           (JackPcmHandle          *jack);
>
> /* --- define object type and export to BSE --- */
> static const char type_blurb[] = ("PCM driver implementation for JACK Audio Connection Kit "
>                                  "(http://jackaudio.org)");
> BSE_REGISTER_OBJECT (BsePcmDeviceJACK, BsePcmDevice, NULL, "", type_blurb, NULL, bse_pcm_device_jack_class_init, NULL, bse_pcm_device_jack_init);

hm, i guess this could be nicer as BsePcmDeviceJack, espcially for the class
name BsePcmDeviceJackClass. allthoguh JACK is often upper cased like the
abbreviation BSE, we also lower case it to Bse in namespaces for readability.
and the http://jackaudio.org/ also has a few examples of "Jack" ;)


> BSE_DEFINE_EXPORTS (__FILE__);
>
> /* --- variables --- */
> static gpointer parent_class = NULL;

hm, i should fix BSE_REGISTER_OBJECT() to act like G_DEFINE_TYPE and
provide a parent_class for you.

>
> /* --- functions --- */
> static void
> bse_pcm_device_jack_init (BsePcmDeviceJACK *self)
> {
>  /* FIXME: move signal handling somewhere else */

needs to be moved into birnetthread.c in particular.

>  sigset_t signal_mask;
>  sigemptyset (&signal_mask);
>  sigaddset (&signal_mask, SIGPIPE);
>
>  int rc = pthread_sigmask (SIG_BLOCK, &signal_mask, NULL);
>  if (rc != 0)
>    g_printerr ("jack driver: pthread_sigmask for SIGPIPE failed: %s\n", strerror (errno));

this is what you'd usually use g_warning() for, since it's unexpected.

>
>  jack_status_t status;
>
>  self->jack_client = jack_client_open ("beast", JackNoStartServer, &status); /* FIXME: translations(?) */

what's this "beast" being used for? (especially since you
quesiton translation)
e.g. if it's similar to window titles, it should be "BEAST",
if it's going to be used as a program name in continueous text,
it should prolly be _("Beast") and if its meant as a technical
identifier (e.g. for serialization), it should be "beast".

maybe Paul can shed light on this?

>  DEBUG ("attaching to JACK server returned status: %d\n", status);
>  /* FIXME: check status for proper error reporting

doesn't teh jack API have something like:
   const char* jack_status_to_string (jack_status_t status);
?

>   * FIXME: what about server name? (necessary if more than one jackd instance is running)

>   * FIXME: get rid of device unloading, so that the jackd connection can be kept initialized

altering the device unloading will have to be done differently,
the attempt to connect to the jack server will have to be made
much earlier in that case.
(in other words, i don't think you need this FIXME in this place,
altering the driver loading + startup logic will be an entirely
seperate undertaking)

>   *
>   * maybe move this to a different function

that can be done once needed.

>   */
> }
>
> namespace {
> struct DeviceDetails {
>  guint ports, input_ports, output_ports, physical_ports, terminal_ports;
>  vector<string> input_port_names;
>  vector<string> output_port_names;
>  DeviceDetails() : ports (0), input_ports (0), output_ports (0), physical_ports (0), terminal_ports (0)

please put a newline after the ':'

>  {
>  }
> };
> }
>
> static map<string, DeviceDetails>
> query_jack_devices (BsePcmDeviceJACK *self)
> {
>  map<string, DeviceDetails> devices;
>
>  /* FIXME: we need to make a difference here between
>   *  - jackd is running, but no output ports can be found

curious, why exactly could that be the case?

>   *  - jackd is not running

that should/will actually be determined by jack_client_open(), right?

>   */
>  const char **jack_ports = self->jack_client ? jack_get_ports (self->jack_client, NULL, NULL, 0) : NULL;
>  if (jack_ports)
>    {
>      for (guint i = 0; jack_ports[i]; i++)
> {
>  const char *end = strchr (jack_ports[i], ':');
>  if (end)
>    {
>      string device_name (jack_ports[i], end);

hm, please don't do "using std::string;" (further above)
but instead use std::string or just String which is provided
by using Birnet.

>      DeviceDetails &details = devices[device_name];
>      details.ports++;

why are you incrementing here already, if the port
could possibly not be looked up by name?

>
>      jack_port_t *jack_port = jack_port_by_name (self->jack_client, jack_ports[i]);
>      if (jack_port)
> {
>  int flags = jack_port_flags (jack_port);
>  if (flags & JackPortIsInput)
>    {
>      details.input_ports++;
>      details.input_port_names.push_back (jack_ports[i]);
>    }
>  if (flags & JackPortIsOutput)
>    {
>      details.output_ports++;
>      details.output_port_names.push_back (jack_ports[i]);
>    }
>  if (flags & JackPortIsPhysical)
>    details.physical_ports++;
>  if (flags & JackPortIsTerminal)
>    details.terminal_ports++;

why are you counting physical_ports and terminal_ports here,
regardless of whether the port in question is an input,
output or neither port?
and, if this is purely statistical/informative counting,
why are you only counting ports that can be looked up by name?

/me scratches head...

> }
>    }
> }
>      free (jack_ports);
>    }
>
>  return devices;

ugh, isn't this a full fledged recursive map+strings+DeviceDetails copy?

> }
>
> static SfiRing*
> bse_pcm_device_jack_list_devices (BseDevice *device)
> {
>  BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (device);
>  map<string, DeviceDetails> devices = query_jack_devices (self);
>
>  SfiRing *ring = NULL;
>  for (map<string, DeviceDetails>::iterator di = devices.begin(); di != devices.end(); di++)
>    {
>      const string &device_name = di->first;
>      DeviceDetails &details = di->second;
>      BseDeviceEntry *entry;
>      entry = bse_device_group_entry_new (device, g_strdup (device_name.c_str()),
>                                                  g_strdup ("JACK Devices"),
>                                                  g_strdup_printf ("%s (%d*input %d*output)%s",
>   device_name.c_str(),
>   details.input_ports,
>   details.output_ports,
>   details.physical_ports == details.ports ?
>     " [ hardware ]" : ""));

hm, can you please elaborate the
   details.physical_ports == details.ports ?  " [ hardware ]" : ""
condition for me? ;)

>
>      /* ensure that alsa_pcm always is the first item listed (it is the default device) */

hm, doesn't jack provide some kind of rating for the devices to pick?

>      if (device_name == "alsa_pcm")
> ring = sfi_ring_prepend (ring, entry);
>      else
> ring = sfi_ring_append (ring, entry);
>    }
>
>  if (!ring)
>    ring = sfi_ring_append (ring, bse_device_error_new (device, g_strdup_printf ("No devices found")));
>  return ring;
> }
>
> static void
> jack_shutdown_callback (void *jack_handle_ptr)
> {
>  JackPcmHandle *jack = static_cast <JackPcmHandle *> (jack_handle_ptr);
>
>  AutoLocker cond_locker (jack->cond_mutex);
>  jack->is_down = true;
>  jack->cond.signal();
> }
>
> static int

this misses a comment on the purpose of the return value.

> jack_process_callback (jack_nframes_t n_frames,

this, along with most other functions, should have a comment
that says which thread executes the function. e.g. like the
/* UserThread */ and /* EngineThread */ comments in bsepcmmodule.c
(or possibly even more elaborate)

>                       void          *jack_handle_ptr)
> {
>  typedef jack_default_audio_sample_t sample_t;

we use mixed case typenames, so this should be:
   typedef jack_default_audio_sample_t Sample;
or:
   typedef jack_default_audio_sample_t SampleType;

>
>  JackPcmHandle *jack = static_cast <JackPcmHandle *> (jack_handle_ptr);
>
>  int callback_state = Atomic::int_get (&jack->callback_state);

hm, callback_state is not declared as volatile above...
we had this in the ringbuffer already.

>
>  /* still waiting for initialization to complete */
>  if (callback_state == CALLBACK_STATE_ACTIVE)
>    {
>      bool have_cond_lock = jack->cond_mutex.trylock();
>
>      guint n_channels = jack->handle.n_channels;
>      float buffer[n_frames * n_channels];
>
>      if (jack->handle.readable)
> {
>  /* interleave input data for processing in the engine thread */
>  g_assert (jack->input_ports.size() == n_channels);
>
>  for (guint ch = 0; ch < n_channels; ch++)
>    {
>      sample_t *values = (sample_t *) jack_port_get_buffer (jack->input_ports[ch], n_frames);
>      float *p = &buffer[ch];
>      for (guint i = 0; i < n_frames; i++)
> {
>  *p = *values++;
>  p += n_channels;
> }
>    }
>
>  guint frames_written = jack->input_ringbuffer.write (n_frames, buffer);
>  if (frames_written != n_frames)
>    Atomic::int_add (&jack->ixruns, 1);      /* input underrun detected */
> }
>      if (jack->handle.writable)
> {
>  guint read_frames = jack->output_ringbuffer.read (n_frames, buffer);
>  if (read_frames != n_frames)
>    {
>      Atomic::int_add (&jack->oxruns, 1);     /* output underrun detected */

was declared non-volatile oxruns.

>      Block::fill ((n_frames - read_frames) * n_channels, &buffer[read_frames * n_channels], 0.0);
>    }
>
>  /* deinterleave output data for processing in the engine thread */
>  g_assert (jack->output_ports.size() == n_channels);
>
>  for (guint ch = 0; ch < n_channels; ch++)
>    {
>      sample_t *values = (sample_t *) jack_port_get_buffer (jack->output_ports[ch], n_frames);
>      sample_t *values_end = values + n_frames;
>      float *p = &buffer[ch];
>      while (values < values_end)
> {
>  *values++ = *p;
>  p += n_channels;
> }
>    }
> }
>
>      /*

commanets shouldn't start with an empty line unless its a comment
for the documentation system.

>       * as we can't (always) lock our data structures from the jack realtime
>       * thread, using a condition introduces the possibility for a race:

this type of comment should better be reworded in a general comment on how
things are supposed to work and communicate. then it can go before the
declaration of BsePcmDeviceJACK.

>       *
>       * normal operation:
>       *
>       * [bse thread]      writes some data to output_ringbuffer
>       * [bse thread]      checks remaining space, there is no room left
>       * [bse thread]      sleeps on the condition

BSE never sleeps, waiting for audio drivers. take a look at
alsa_device_write() and alsa_device_check_io() in bsepcmdevice-alsa.c.
i.e. allow full buffer writes if at all possible and provide
timeout information on how long it takes for enough buffer space
to be available.
that's the kind of semantic that needs to be implemented.

>       * [jack thread]     reads some data from output_ringbuffer
>       * [jack thread]     signals the condition
>       * [bse thread]      continues to write some data to output_ringbuffer

basically, the above can work if you s/sleeps on the condition/continues
with audio processing/. BSE can do that, once it has the required timeout
information.


>       *
>       * race condition:
>       *
>       * [bse thread]      writes some data to output_ringbuffer
>       * [bse thread]      checks remaining space, there is no room left
>       * [jack thread]     reads some data from output_ringbuffer
>       * [jack thread]     signals the condition
>       * [bse thread]      sleeps on the condition

such races are only possible when not using conditions properly.
you always have to follow the sequence: lock(); check(); wait(); unlock();
and on the pther side: lock(); update(); signal(); unlock();
this ensures that no update/signal goes by unnoticed by the check. see:
   http://developer.gnome.org/doc/API/2.0/glib/glib-Threads.html#GCond

>       *
>       * since the jack callback gets called periodically, the bse thread will
>       * never starve, though, in the worst case it will just miss a frame
>       *
>       * so we absolutely exclude the possibility for priority inversion (by
>       * using trylock instead of lock), by introducing extra latency in
>       * some extremely rare cases

apart from excessive newlines ;) this argumentation is faulty.
think of jack exiting after you hit your signal-lost period.
BSE will hang forever, unacceptable.

>       */
>      if (have_cond_lock)
> jack->cond_mutex.unlock();
>    }
>  else
>    {

when exactly is this branch tirggered? (lost track due to the excessive
inlined commenting above.)

isn't this the !trylock() case?
the purpose of having an atomic ringbuffer in the first place was to
not be forced into using lock/trylock/unlock in the jack thread.
so this if-branch shouldn't occour in the implementaiton at all.

as an aside, if we would be willing to pay lock/unlock in the jack thread,
we could as well pass around list nodes that maintain audio buffers.
you could get entirely rid fo the value copying in the ring buffer that way...


>      for (guint ch = 0; ch < jack->input_ports.size(); ch++)
> {
>  sample_t *values = (sample_t *) jack_port_get_buffer (jack->input_ports[ch], n_frames);
>  Block::fill (n_frames, values, 0.0);
> }
>      for (guint ch = 0; ch < jack->output_ports.size(); ch++)
> {
>  sample_t *values = (sample_t *) jack_port_get_buffer (jack->output_ports[ch], n_frames);
>  Block::fill (n_frames, values, 0.0);
> }


>    }
>  jack->cond.signal();

ah, here's the racy signalling *outside* of the mutex lock being held...

>
>  if (callback_state == CALLBACK_STATE_PLEASE_TERMINATE)
>    {
>      Atomic::int_set (&jack->callback_state, CALLBACK_STATE_TERMINATED);
>      return 1;
>    }
>  return 0;

this misses a comment on the purpose of the return value.

> }
>
> static void
> terminate_and_free_jack (JackPcmHandle *jack)
> {
>  g_return_if_fail (jack->jack_client != NULL);
>
>  Atomic::int_set (&jack->callback_state, CALLBACK_STATE_PLEASE_TERMINATE);
>  while (Atomic::int_get (&jack->callback_state) == CALLBACK_STATE_PLEASE_TERMINATE)
>    {
>      AutoLocker cond_locker (jack->cond_mutex);
>
>      if (jack->is_down) /* if jack is already gone, then forget it */
> Atomic::int_set (&jack->callback_state, CALLBACK_STATE_TERMINATED);
>      else
> jack->cond.wait_timed (jack->cond_mutex, 100000);
>    }

is the following for real?
you disconnect the ports *after* the connection is terminated?
or, is "terminated" wrongly worded here?

>
>  for (guint ch = 0; ch < jack->input_ports.size(); ch++)
>    jack_port_disconnect (jack->jack_client, jack->input_ports[ch]);
>
>  for (guint ch = 0; ch < jack->output_ports.size(); ch++)
>    jack_port_disconnect (jack->jack_client, jack->output_ports[ch]);
>
>  jack_deactivate (jack->jack_client);
>
>  delete jack;
> }
>
> static BseErrorType
> bse_pcm_device_jack_open (BseDevice     *device,
>                          gboolean       require_readable,
>                          gboolean       require_writable,
>                          guint          n_args,
>                          const gchar  **args)
> {
>  BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (device);
>  if (!self->jack_client)
>    return BSE_ERROR_FILE_OPEN_FAILED;
>
>  JackPcmHandle *jack = new JackPcmHandle (self->jack_client);
>  BsePcmHandle *handle = &jack->handle;
>
>  handle->readable = require_readable;
>  handle->writable = require_writable;
>  handle->n_channels = BSE_PCM_DEVICE (device)->req_n_channels;
>
>  /* try setup */
>  BseErrorType error = BSE_ERROR_NONE;
>
>  const char *dname = (n_args == 1) ? args[0] : "alsa_pcm";
>  handle->mix_freq = jack_get_sample_rate (self->jack_client);
>
>  Atomic::int_set (&jack->callback_state, CALLBACK_STATE_INACTIVE);
>
>  for (guint i = 0; i < handle->n_channels; i++)
>    {
>      const int port_name_size = jack_port_name_size();
>      char port_name[port_name_size];
>      jack_port_t *port;
>
>      snprintf (port_name, port_name_size, "in_%u", i);

please use g_snprintf() instead.

>      port = jack_port_register (self->jack_client, port_name, JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput, 0);

what is JACK_DEFAULT_AUDIO_TYPE? shouldn't we request floats here (i
dunno the jack API though).

>      if (port)
> jack->input_ports.push_back (port);
>      else
> error = BSE_ERROR_FILE_OPEN_FAILED;
>
>      snprintf (port_name, port_name_size, "out_%u", i);
>      port = jack_port_register (self->jack_client, port_name, JACK_DEFAULT_AUDIO_TYPE, JackPortIsOutput, 0);
>      if (port)
> jack->output_ports.push_back (port);
>      else
> error = BSE_ERROR_FILE_OPEN_FAILED;
>    }
>
>  /* activate */
>
>  jack_set_process_callback (self->jack_client, jack_process_callback, jack);
>  jack_on_shutdown (self->jack_client, jack_shutdown_callback, jack);
>  jack_activate (self->jack_client);

you're activating here without checking for if (error)?

>
>  /* connect ports */
>
>  map<string, DeviceDetails> devices = query_jack_devices (self);

querying the devices could have been done before registering
the ports above.

>  map<string, DeviceDetails>::const_iterator di;
>
>  di = devices.find (dname);
>  if (di != devices.end())
>    {
>      const DeviceDetails &details = di->second;
>
>      for (guint ch = 0; ch < handle->n_channels; ch++)
> {
>  if (details.output_ports > ch)
>    jack_connect (self->jack_client, details.output_port_names[ch].c_str(), jack_port_name (jack->input_ports[ch]));
>  if (details.input_ports > ch)
>    jack_connect (self->jack_client, jack_port_name (jack->output_ports[ch]), details.input_port_names[ch].c_str());
> }
>    }

you're not setting error in case you failed to find
the required devices to connect the channels.

>
>  /* setup buffer size */
>
>  // keep at least two jack callback sizes for dropout free audio
>  guint min_buffer_frames = jack_get_buffer_size (self->jack_client) * 2;
>
>  // keep an extra engine buffer size (this compensates also for cases where the
>  // engine buffer size is not a 2^N value, which would otherwise cause the

yeah, engine buffers are usually not 2^n sizes. see bse_engine_constrain()
in bseengine.c.

>  // buffer never to be fully filled with 2 periods of data)

"periods"? do you mean "elements" or "frames" here? (using your ringbuffer
terminology)
or do you mean buffer sizes here? jack buffers or bse buffers?

>  min_buffer_frames += BSE_PCM_DEVICE (device)->req_block_length;

couldn't you instead just align min_buffer_frames to 2 * bse buffer size,
to still fullfil the requirements you outlined, and yet yield a smaller
ring buffer (thus optimize latency)?

>
>  // honor the user defined latency specification
>  //
>  // FIXME: should we try to tweak local buffering so that it corresponds
>  // to the user defined latency, or global buffering (including the jack
>  // buffer)? we do the former, since it is easier

we should do the latter. that shouldn't be too hard, Paul said that you
could query (and inform) jack about the latency on a port.
that then just needs to be subtracted from BSE_PCM_DEVICE (device)->req_latency_ms.

>  guint user_buffer_frames = BSE_PCM_DEVICE (device)->req_latency_ms * handle->mix_freq / 1000;
>  guint buffer_frames = max (min_buffer_frames, user_buffer_frames);
>
>  jack->input_ringbuffer.resize (buffer_frames, handle->n_channels);
>  jack->output_ringbuffer.resize (buffer_frames, handle->n_channels);

the way this is written, you're doubling the latency because you apply
it to the input and the output buffer. you should just apply it to the
output buffer (and also subtract the input buffer latency before doing
that)

>  jack->buffer_frames  = jack->output_ringbuffer.get_writable_frames();

what is jack->buffer_frames good for? and why is that different from
total_n_frames() here?
>
>  // the ringbuffer should be exactly as big as requested
>  g_assert (jack->buffer_frames == buffer_frames);
>  DEBUG ("ringbuffer size = %.3fms", jack->buffer_frames / double (handle->mix_freq) * 1000);

hm, you should better use driver prefixing in DEBUG() statements like
the ALSA driver does it, i.e.:
    DEBUG ("JACK: ringbuffer size = %.3fms", jack->buffer_frames / double (handle->mix_freq) * 1000);
of course that applies to all your DEBUG() calls.

>
>  /* setup PCM handle or shutdown */
>  if (!error)
>    {
>      bse_device_set_opened (device, dname, handle->readable, handle->writable);
>      if (handle->readable)
>        handle->read = jack_device_read;
>      if (handle->writable)
>        handle->write = jack_device_write;
>      handle->check_io = jack_device_check_io;
>      handle->latency = jack_device_latency;
>      BSE_PCM_DEVICE (device)->handle = handle;
>
>      /* will prevent dropouts at initialization, when no data is there at all */

why would this "prevent dropouts"? the jack thread got started already,
so a dropout could possibly have occoured _already_.
granted, retriggering may be neccessary, but the comment doesn't make
sense to me here.

>      jack_device_retrigger (jack);
>      jack_device_latency (handle);   // debugging only: print latency values
>    }
>  else
>    {
>      terminate_and_free_jack (jack);
>    }
>  DEBUG ("JACK: opening PCM \"%s\" readupble=%d writable=%d: %s", dname, require_readable, require_writable, bse_error_blurb (error));
>
>  return error;
> }
>
> static void
> bse_pcm_device_jack_close (BseDevice *device)
> {
>  JackPcmHandle *jack = (JackPcmHandle*) BSE_PCM_DEVICE (device)->handle;
>  BSE_PCM_DEVICE (device)->handle = NULL;
>
>  terminate_and_free_jack (jack);
> }
>
> static void
> bse_pcm_device_jack_finalize (GObject *object)
> {
>  BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (object);
>  if (self->jack_client)
>    {
>      jack_client_close (self->jack_client);
>      self->jack_client = NULL;
>    }
>
>  /* chain parent class' handler */
>  G_OBJECT_CLASS (parent_class)->finalize (object);
> }
>
> static void
> jack_device_retrigger (JackPcmHandle *jack)
> {
>  g_return_if_fail (jack->jack_client != NULL);
>
>  /*

extraneous newline at coment start.

>   * we need to reset the active flag to false here, as we modify the
>   * buffers in a non threadsafe way; this is why we also wait for
>   * the condition, to ensure that the jack callback really isn't
>   * active any more
>   */
>  Atomic::int_set (&jack->callback_state, CALLBACK_STATE_INACTIVE);
>
>  /* usually should not timeout, but be notified by the jack callback */
>  jack->cond_mutex.lock();
>  if (!jack->is_down)
>    jack->cond.wait_timed (jack->cond_mutex, 100000);
>  jack->cond_mutex.unlock();

hm, is_down is set by the jack thread onkly volountarily, i.e. if
it exits prematurely, you get the stale BSE thread i mntioned above.

>
>  /* jack_ringbuffer_reset is not threadsafe! */

yeah, i didn't mention this in the ringbuffer review, but
i'd agree that "reset" would be a better name than "clear"
for the resetting of the pointers.

>  jack->input_ringbuffer.clear();
>  jack->output_ringbuffer.clear();
>
>  /* initialize output ringbuffer with silence */
>  vector<float> silence (jack->buffer_frames * jack->handle.n_channels);

this is expensive, you can simply use bse_engine_const_zeros() which
provides at least BSE_STREAM_MAX_VALUES many values in constant time.
and yes, this should be fixed in the ALSA driver as well.

>  guint frames_written = jack->output_ringbuffer.write (jack->buffer_frames, &silence[0]);
>  g_assert (frames_written == jack->buffer_frames);
>  DEBUG ("jack_device_retrigger: %d frames written", frames_written);

hm, at the end of retrigger, the jack thread should go ACTIVE.
just like we call snd_pcm_prepare() in  bsepcmdevice-alsa.c:alsa_device_retrigger().

> }
>
> static gboolean
> jack_device_check_io (BsePcmHandle *handle,
>                      glong        *timeoutp)
> {
>  JackPcmHandle *jack = (JackPcmHandle*) handle;
>  g_return_val_if_fail (jack->jack_client != NULL, false);
>
> /*
>  int ixruns = Atomic::int_get (&jack->ixruns);
>  int oxruns = Atomic::int_get (&jack->oxruns);
>
>  if (jack->poxruns != oxruns || jack->pixruns != ixruns)
>    {
>      printf ("beast caused jack input xruns %d, output xruns %d\n", ixruns, oxruns);

debugging prints (if you already failed to use DEBUG()) should go to
stderr, otherwise they cause major breakage in programs that generate
data on stdout (of which we have quite bunch in CVS).

>      jack->poxruns = oxruns;
>      jack->pixruns = ixruns;
>
>      jack_device_retrigger (jack);
>    }
> */
>
>  /* (FIXME?)
>   *
>   * we can not guarantee that beast will read the data from the jack ringbuffer,
>   * since this depends on the presence of a module that actually reads data in
>   * the synthesis graph : so we simply ignore the ixruns variable; I am not sure
>   * if thats the right thing to do, though

sounds reasonably at first glance.

>   */
>  int oxruns = Atomic::int_get (&jack->oxruns);
>
>  if (jack->poxruns != oxruns)
>    {
>      g_printerr ("%d beast jack driver xruns\n", oxruns);
>      jack->poxruns = oxruns;
>
>      jack_device_retrigger (jack);

you shouldn't need to retrigger after xruns have occoured.
it has already happened and extra padding was provided already.
if you do this, you just make sure that there really is a
loud and noticable click in the output (while a simply
1-block xrun often doesn't produce too bad artefacts)

>    }
>
>  Atomic::int_set (&jack->callback_state, CALLBACK_STATE_ACTIVE);

this should have gone into retrigger, see alsa_device_retrigger() and
alsa_device_check_io().

>
>  guint n_frames_avail = min (jack->output_ringbuffer.get_writable_frames(), jack->input_ringbuffer.get_readable_frames());
>
>  /* check whether data can be processed */
>  if (n_frames_avail >= handle->block_length)
>    return TRUE;        /* need processing */
>
>  /* calculate timeout until processing is possible or needed */
>  guint diff_frames = handle->block_length - n_frames_avail;
>  *timeoutp = diff_frames * 1000 / handle->mix_freq;
>  return FALSE;
> }
>
> static guint
> jack_device_latency (BsePcmHandle *handle)
> {
>  JackPcmHandle *jack = (JackPcmHandle*) handle;
>  jack_nframes_t rlatency = 0, wlatency = 0;
>
>  g_return_val_if_fail (jack->jack_client != NULL, 0);
>
>  /* FIXME: the API of this function is broken, because you can't use its result
>   * to sync for instance the play position pointer with the screen
>   *
>   * why? because it doesn't return you rlatency and wlatency seperately, but
>   * only the sum of both
>   *
>   * when using jack, there is no guarantee that rlatency and wlatency are equal

good point.
and more importantly, for song pointer syncs, we want to use just the
wlatency. so for a first fix, we can change oss_device_latency() and
alsa_device_latency() to just return the wlatency.
we can later improve the API to also support rlatency queries.

>   */
>  for (guint i = 0; i < jack->input_ports.size(); i++)
>    {
>      jack_nframes_t latency = jack_port_get_total_latency (jack->jack_client, jack->input_ports[i]);
>      rlatency = max (rlatency, latency);
>    }
>
>  for (guint i = 0; i < jack->output_ports.size(); i++)
>    {
>      jack_nframes_t latency = jack_port_get_total_latency (jack->jack_client, jack->output_ports[i]);
>      wlatency = max (wlatency, latency);
>    }
>
>  guint total_latency = 2 * jack->buffer_frames + rlatency + wlatency;

ah, this is where buffer_frames gets used?
can't you simply use jack->input_ringbuffer.n_total_frames() +
jack->output_ringbuffer.n_total_frames().
(also be more accurate if the buffers have different lengths,
e.g. the input buffer could possibly be shorter than the output
buffer).


>  DEBUG ("rlatency=%.3f ms wlatency=%.3f ms ringbuffer=%.3f ms total_latency=%.3f ms",
>         rlatency / double (handle->mix_freq) * 1000,
>         wlatency / double (handle->mix_freq) * 1000,
>         jack->buffer_frames / double (handle->mix_freq) * 1000,
> total_latency / double (handle->mix_freq) * 1000);
>  return total_latency;
> }
>
> static gsize
> jack_device_read (BsePcmHandle *handle,
>                  gfloat       *values)
> {
>  JackPcmHandle *jack = (JackPcmHandle*) handle;
>  g_return_val_if_fail (jack->jack_client != NULL, 0);
>  g_return_val_if_fail (Atomic::int_get (&jack->callback_state) == CALLBACK_STATE_PACTIVE, 0);
>
>  guint frames_left = handle->block_length;
>  while (frames_left)
>    {
>      guint frames_read = jack->input_ringbuffer.read (frames_left, values);
>
>      frames_left -= frames_read;
>      values += frames_read * handle->n_channels;
>
>      if (frames_left)
> {
>  AutoLocker cond_locker (jack->cond_mutex);
>
>  /* usually should not timeout, but be notified by the jack callback */
>  if (!jack->input_ringbuffer.get_readable_frames())
>    jack->cond.wait_timed (jack->cond_mutex, 100000);

we can't afford to wait on jack here. engine locks have to be
super-short (examples can be found in bseengineutils.c).
i said this above already, we can't actually afford locking at all,
which is why we have an atomic ringbuffer in the first place.

>
>  if (jack->is_down)
>    {
>      /*

extraneous newline at comment start.

>       * FIXME: we need a way to indicate an error here; beast should provide
>       * an adequate reaction in case the JACK server is down (it should stop
>       * playing the file, and show a dialog, if JACK can not be reconnected)

we don't need error indication here (see alsa_device_read()). it's
good enough if we return gracefully here.
error checking code (and returns) can be executed by check_io()
and/or retrigger().

>       *
>       * if we have a way to abort processing, this if can be moved above
>       * the condition wait; however, right now moving it there means that
>       * beast will render the output as fast as possible when jack dies, and
>       * this will look like a machine lockup

no, endless loops don't look like lockups ;)
in the first case, your CPU meter runs at 100% and toggling Num-lock/etc.
keys still work (usually also the X server etc.) while in the latter case
*nothing* goes anymore.

>       */
>      Block::fill (frames_left * handle->n_channels, values, 0.0);   /* <- remove me once we can indicate errors */

nope should stay (the same way alsa_device_read() copeswith errors).

>      return handle->block_length * handle->n_channels;
>    }
> }
>    }
>  return handle->block_length * handle->n_channels;
> }
>
> static void
> jack_device_write (BsePcmHandle *handle,
>                   const gfloat *values)
> {
>  JackPcmHandle *jack = (JackPcmHandle*) handle;
>  g_return_if_fail (jack->jack_client != NULL);
>  g_return_if_fail (Atomic::int_get (&jack->callback_state) == CALLBACK_STATE_ACTIVE);
>
>  guint frames_left = handle->block_length;
>
>  while (frames_left)
>    {
>      guint frames_written = jack->output_ringbuffer.write (frames_left, values);
>
>      frames_left -= frames_written;
>      values += frames_written * handle->n_channels;
>
>      if (frames_left)
> {
>  AutoLocker cond_locker (jack->cond_mutex);
>
>  /* usually should not timeout, but be notified by the jack callback */
>  if (!jack->output_ringbuffer.get_writable_frames())
>    jack->cond.wait_timed (jack->cond_mutex, 100000);
>
>  if (jack->is_down)
>    {
>      /*
>       * FIXME: we need a way to indicate an error here; beast should provide
>       * an adequate reaction in case the JACK server is down (it should stop
>       * playing the file, and show a dialog, if JACK can not be reconnected)
>       *
>       * if we have a way to abort processing, this if can be moved above
>       * the condition wait; however, right now moving it there means that
>       * beast will render the output as fast as possible when jack dies, and
>       * this will look like a machine lockup
>       */

hrm, /* FIXME: see jack_device_read() */ would have been easier here.
my comments there also apply here of course.

>      return;
>    }
> }
>    }
> }
>
> static void
> bse_pcm_device_jack_class_init (BsePcmDeviceJACKClass *klass)
> {
>  GObjectClass *gobject_class = G_OBJECT_CLASS (klass);
>  BseDeviceClass *device_class = BSE_DEVICE_CLASS (klass);
>
>  parent_class = g_type_class_peek_parent (klass);
>
>  gobject_class->finalize = bse_pcm_device_jack_finalize;
>
>  device_class->list_devices = bse_pcm_device_jack_list_devices;
>  const gchar *name = "jack";
>  const gchar *syntax = _("device");

"device" is too short of a syntax description for users.
i don't know jack devices, but i suppose you need to say something
like _("JACK_DEVICE_NAME") here. again, see how the alsa driver
tries to guide the user here.

>  const gchar *info = g_intern_printf (/* TRANSLATORS: keep this text to 70 chars in width */
>                                       _("JACK Audio Connection Kit PCM driver (http://jackaudio.org)\n"));

jack version printing? again, see what the alsa driver does here.

>
>  /* Being BSE_RATING_PREFFERED + 1, should do the right thing: if jackd is running,
>   * use it, if jackd isn't running (we'll get an error upon initialization,
>   * then), use the next best driver.
>   *
>   * Right now we will never start the jackd process ourselves.
>   */
>  bse_device_class_setup (klass, BSE_RATING_PREFERRED + 1, name, syntax, info);
>  device_class->open = bse_pcm_device_jack_open;
>  device_class->close = bse_pcm_device_jack_close;
> }
>
>   Cu... Stefan


for a few closing words:

first, thanks for starting out on the implementation of a jack
driver for BSE. the task in itself is definitely non-trivial.

on the first draft, it looks like the driver still needs a lot of
work to be basically usable.
some portions already look good, but in others...
it really was disappointing for me, how much of perfectly good
template code / logic provided by bsepcmdevice-alsa.c simply got
lost or ignored in the transition to a jack driver. re-doing these
just adds up to the list of TODOs.

the single most important issue that has to be fixed here is
the mutex and condition. neither the BSE nor the JACK thread
can afford acquiring it and wait for the other thread.
that's why we need a ringbuffer based only on atomic ops. we
got that now, so the next step is to modify the code to use
it up to its fullest potential.

basically, to get there, the jack thread needs to be modified
to act more similarl to e.g. the kernel ALSA driver wrg to
xruns. the kernel driver also has fixed-size ring buffers and
still deals with drop outs perfectly well. that kind of API
semantic has to be provided for the BSE thread in the bse-alsa
driver. then, the implementations of check_io(), retrigger(),
read() and write() can be much closer to the bse-alsa driver
implementation. also, this will eliminate the need for the
mutex and the condition.

---
ciaoTJ
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast
Reply | Threaded
Open this post in threaded view
|

Re: Beast Jack driver (Re: Lockfree ringbuffer review)

Stefan Westerfeld
In reply to this post by Tim Janik
   Hi!

I am still working on a new version of the code, where I merge your
comments, so I'll just reply to points I can immediately reply, and post
new code when done.

On Thu, Jun 01, 2006 at 10:53:31PM +0200, Tim Janik wrote:

> >>>Finally, if we were to optimize the JACK driver for maximum performance,
> >>>a more important factor would probably be that the ringbuffer even has
> >>>this API.
> >>
> >>sorry? what API are you talking about?
> >>void    bse_block_copy_float    (guint          n_values,
> >>                                 gfloat        *values,
> >>                                 const gfloat  *ivalues);
> >>copying?
> >>
> >>>It means that we need to deinterleave and reinterleave the
> >>>data JACK supplies on the stack, and then read/write it to the
> >>>ringbuffer. If the ringbuffer would offer an API to access its memory
> >>>directly, the extra copy would disappear. But it makes the code harder
> >>>to maintain and for now my first priority has been getting things
> >>>implemented correctly.
> >>
> >>i don't quite understand this point.
> >>are you saying that replacing memcpy() by bse_block_copy_float() can't
> >>be done due to interleaving? :-O
> >
> >No. It can be replaced, and the new version has it. However, our data
> >flow looks like this:
> >
> > input
> >
> >jack thread process() callback supplies N separate buffers
> >  |
> >  V
> >interleave on the stack (manually implemented, in process())      
> >*jack-thread*
> >  |
> >  V
> >copy into the input ringbuffer (input ringbuffer write)            
> >*jack-thread*
> >  |
> >  V
> >copy in bse engine supplied buffer (input ringbuffer read)        
> >*engine-thread*
> >  |
> >  V
>
> >[ beast processing ]                                              
> >*engine-thread*
>
> note that this is processing in BSE, not beast (the GUI).
> and BSE does deinterleave its data, process it and then reinterleave
> because standard audio drivers want that. for jack this doesn't
> make sense though, so the data flow should instead be:
>   n-jack-buffers -> ringbuffer-write -> ||
>   || -> ringbuffer-read -> n-bse-channel-blocks

Ok, the new version gets rid of the interleaving/deinterleaving. That
requires changes in the driver API of BSE, though.

> >>>void
> >>>clear()
> >>>{
> >>>  Atomic::int_set (&m_read_frame_pos, 0);
> >>>  Atomic::int_set (&m_write_frame_pos, 0);
> >>>}
> >>
> >>is this really needed outside of resize() though?
> >
> >The jack driver needs it (to get into a sane state after a dropout).
>
> how are you avoiding concurrent access when clearing due to a drop
> out then?

The process callback checks an atomic integer before accessing any data
structures (its called callback_state). Before resetting the ringbuffer,
this callback state will be set to "inactive" (in the engine thread),
and one condition wait will be done to ensure that if a process()
callback was still running (and thus accessing data structures), it
terminated.

Thus, when the actual clear() is executed, it is guaranteed that the
jack process callback can't interfere, and - on the other hand - the
engine thread can also not interfere, because it is the thread calling
the clear() method.

Afterwards, the atomic integer is set back to "active" to reactivate the
jack thread callback audio processing.

   Cu... Stefan
--
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast
Reply | Threaded
Open this post in threaded view
|

Re: bsepcmdevice-jack.cc (Re: Beast Jack driver)

Stefan Westerfeld
In reply to this post by Tim Janik
   Hi!

Same comment as for the other reply: I'll post fixed code later, so I'll
just answer to discussion stuff.

On Fri, Jun 02, 2006 at 01:45:01AM +0200, Tim Janik wrote:

> >enum
> >{
> > CALLBACK_STATE_INACTIVE = 0,
> > CALLBACK_STATE_ACTIVE = 1,
> > CALLBACK_STATE_PLEASE_TERMINATE = 2,
> > CALLBACK_STATE_TERMINATED = 3
>
> minor cosmetic, we usually align '='-symbols in enums.
>
> >};
>
> you should provide a type name for this enum, which
> can be used above to tell apart valid states of your
> callback_state variable.

The problem is that callback_state is an atomic integer, so I think it
needs to be volatile int, and I can't declare it of the enum type; or am
I wrong?

> > vector<jack_port_t *>  input_ports;
> > vector<jack_port_t *>  output_ports;
> >
> > guint  buffer_frames; /* input/output
> > ringbuffer size in frames */
> >
> > Cond  cond;
> > Mutex  cond_mutex;
>
> erk, what exactly is protected by this cond and this mutex?
> something like this definitely needs to be outlined in code
> (in comments or by naming prefixes, look for BirnetMutex
> in bseengineutils.c to see examples for that)

Well, I'll put a comment there, but if you want to know what it does
right now: the condition is signalled whenever the realtime callback has
been called and is thus be used to wait for events that depend on the
realtime callback (like waiting for termination or new data).

The mutex is mostly useless - its there because conditions require it
(you can't wait on a condition without one), but all code needs to work
without the mutex, because the realtime thread can not use lock() (only
trylock()) due to possible priority inversion.

> > jack_status_t status;
> >
> > self->jack_client = jack_client_open ("beast", JackNoStartServer,
> > &status); /* FIXME: translations(?) */
>
> what's this "beast" being used for? (especially since you
> quesiton translation)
> e.g. if it's similar to window titles, it should be "BEAST",
> if it's going to be used as a program name in continueous text,
> it should prolly be _("Beast") and if its meant as a technical
> identifier (e.g. for serialization), it should be "beast".
>
> maybe Paul can shed light on this?

Its the name of the client, which means that for the command line utils
for instance you'll use

beast:in_1

as port name. So this suggests that you may not want to translate it.
However, for qjackctl, its also the label that is displayed in the GUI,
in the widget where the clients (and ports) are listed.

So this suggests that you may want to translate it. However then, if you
store a session (somehow, I don't know the details), and try to restore
it in a different language, it will break.

So probably its just that jack has no model that strings might need to
be translated, and just treats the world as if there was only one
language, so the best we can do is to not translate the name.

> > DEBUG ("attaching to JACK server returned status: %d\n", status);
> > /* FIXME: check status for proper error reporting
>
> doesn't teh jack API have something like:
>   const char* jack_status_to_string (jack_status_t status);
> ?

No.

> >static map<string, DeviceDetails>
> >query_jack_devices (BsePcmDeviceJACK *self)
> >{
> > map<string, DeviceDetails> devices;
> >
> > /* FIXME: we need to make a difference here between
> >  *  - jackd is running, but no output ports can be found
>
> curious, why exactly could that be the case?

I am not sure it can happen, as you need to load some driver to be able
to start jack at all, and the driver will probably always register some
terminal ports. However, in theory it could be possible to start jack
only to connect two applications; for instance one that produces audio
(beast) and another one that sends it over the net somewhere. Then,
which application you start first is arbitary, and it might as well be
that beast gets started first.

But... well... I don't know if jack is ever going to support starting
without a driver (it would need to do something to get the timing,
sample rate and so on setup; normally probably the driver does this).

> >  *  - jackd is not running
>
> that should/will actually be determined by jack_client_open(), right?

Right. Then self->jack_client will be NULL. Or not, if jackd was once
running but crashed later...

> >      DeviceDetails &details = devices[device_name];
> >      details.ports++;
>
> why are you incrementing here already, if the port
> could possibly not be looked up by name?

Nothing of significance. If jack_port_by_name returns NULL, the port
doesn't exist any longer (or jackd crashed), and it will not be used
anyway, since the other flags don't say that it is usable either as
input or as output.

> >      jack_port_t *jack_port = jack_port_by_name (self->jack_client,
> >      jack_ports[i]);
> >      if (jack_port)
> > {
> >  int flags = jack_port_flags (jack_port);
> >  if (flags & JackPortIsInput)
> >    {
> >      details.input_ports++;
> >      details.input_port_names.push_back (jack_ports[i]);
> >    }
> >  if (flags & JackPortIsOutput)
> >    {
> >      details.output_ports++;
> >      details.output_port_names.push_back (jack_ports[i]);
> >    }
> >  if (flags & JackPortIsPhysical)
> >    details.physical_ports++;
> >  if (flags & JackPortIsTerminal)
> >    details.terminal_ports++;
>
> why are you counting physical_ports and terminal_ports here,
> regardless of whether the port in question is an input,
> output or neither port?
> and, if this is purely statistical/informative counting,
> why are you only counting ports that can be looked up by name?
>
> /me scratches head...

I can only gather statistics if I get the jack_port (because only then
jack_port_flags can be called). I gather statistics to display it later.
Intersting is the physical flag "hardware ports, like an alsa device",
and the input/output flag. I am just gathering terminal for the sake of
completeness.

> > }
> >    }
> > }
> >     free (jack_ports);
> >   }
> >
> > return devices;
>
> ugh, isn't this a full fledged recursive map+strings+DeviceDetails copy?

Yes. So? Its not in any time critical code, so I don't bother to
optimize it.

> >}
> >
> >static SfiRing*
> >bse_pcm_device_jack_list_devices (BseDevice *device)
> >{
> > BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (device);
> > map<string, DeviceDetails> devices = query_jack_devices (self);
> >
> > SfiRing *ring = NULL;
> > for (map<string, DeviceDetails>::iterator di = devices.begin(); di !=
> > devices.end(); di++)
> >   {
> >     const string &device_name = di->first;
> >     DeviceDetails &details = di->second;
> >     BseDeviceEntry *entry;
> >     entry = bse_device_group_entry_new (device, g_strdup
> >     (device_name.c_str()),
> >                                                 g_strdup ("JACK Devices"),
> >                                                 g_strdup_printf ("%s
> >                                                 (%d*input %d*output)%s",
> >   device_name.c_str(),
> >   details.input_ports,
> >   details.output_ports,
> >   details.physical_ports == details.ports ?
> >     " [
> >     hardware ]" : ""));
>
> hm, can you please elaborate the
>   details.physical_ports == details.ports ?  " [ hardware ]" : ""
> condition for me? ;)

Well... if all ports of a client are physical, then its got to be a
hardware device. For instance alsa_pcm has only physical ports here.

I am not sure what a client is that has *some* physical ports... I've
yet to see one.

> >
> >     /* ensure that alsa_pcm always is the first item listed (it is the
> >     default device) */
>
> hm, doesn't jack provide some kind of rating for the devices to pick?

No. People hardcode alsa_pcm as default all day long.

> >      *
> >      * normal operation:
> >      *
> >      * [bse thread]      writes some data to output_ringbuffer
> >      * [bse thread]      checks remaining space, there is no room left
> >      * [bse thread]      sleeps on the condition
>
> BSE never sleeps, waiting for audio drivers. take a look at
> alsa_device_write() and alsa_device_check_io() in bsepcmdevice-alsa.c.
> i.e. allow full buffer writes if at all possible and provide
> timeout information on how long it takes for enough buffer space
> to be available.
> that's the kind of semantic that needs to be implemented.

Yes, it does block in the ALSA driver. However, I think I'll come back
to it in a seperate mail, since you said it shouldn't or maybe should,
so I'll investigate.

> >      * [bse thread]      writes some data to output_ringbuffer
> >      * [bse thread]      checks remaining space, there is no room left
> >      * [jack thread]     reads some data from output_ringbuffer
> >      * [jack thread]     signals the condition
> >      * [bse thread]      sleeps on the condition
>
> such races are only possible when not using conditions properly.
> you always have to follow the sequence: lock(); check(); wait(); unlock();
> and on the pther side: lock(); update(); signal(); unlock();
> this ensures that no update/signal goes by unnoticed by the check. see:
>   http://developer.gnome.org/doc/API/2.0/glib/glib-Threads.html#GCond

Cool. Just that I can't lock() in the RT thread, because of priority
inversion.

I've read a little kernel code today, and fs/pipe.c has a
PIPE_MUTEX(...) which is being locked. So I am not sure I can use
write() to wakeup the master thread either.

However pthread_cond_signal in glibc also seems to have a kind of a
mutex (so its also a problem), although I am not sure if I interpret the
code correctly. But if I do, then I am running out of ideas how to get
this right at all.

Well, I think I have to think/read/ask others more about it until I know
whats the right thing to do, to get the synchronization between the
engine thread and the jack realtime callback right.

> >      * since the jack callback gets called periodically, the bse thread
> >      will
> >      * never starve, though, in the worst case it will just miss a frame
> >      *
> >      * so we absolutely exclude the possibility for priority inversion (by
> >      * using trylock instead of lock), by introducing extra latency in
> >      * some extremely rare cases
>
> apart from excessive newlines ;) this argumentation is faulty.
> think of jack exiting after you hit your signal-lost period.
> BSE will hang forever, unacceptable.

Won't because all conditions are timeout bound and there is a shutdown
handler that libjack will call if jack dies, so that eventually
jack->is_down gets set, upon which nothing will hang (look at the code
again to see for yourself).

>
> >      */
> >     if (have_cond_lock)
> > jack->cond_mutex.unlock();
> >   }
> > else
> >   {
>
> when exactly is this branch tirggered? (lost track due to the excessive
> inlined commenting above.)

There are different activation states. Thats the case where the callback
has marked to be inactive (shouldn't touch ringbuffers and such), so it
will just write out zero blocks.

> isn't this the !trylock() case?
> the purpose of having an atomic ringbuffer in the first place was to
> not be forced into using lock/trylock/unlock in the jack thread.
> so this if-branch shouldn't occour in the implementaiton at all.

Trylock is there to make some stuff (like updating the ringbuffers)
happen within a lock, so that in principle, things that wait on new
ringbuffer state will not fall in the condition trap you mentioned
above. However, since its only a *try*lock, it can't prevent the
condition trap entierly. Thus the lengthly comment.

> as an aside, if we would be willing to pay lock/unlock in the jack thread,
> we could as well pass around list nodes that maintain audio buffers.
> you could get entirely rid fo the value copying in the ring buffer that
> way...

No, because the engine block size and the jack block size are not
necessarily dividable, so you've got to do some copying to make get
map the "half engine blocks" onto jack blocks, and the "half jack
blocks" onto engine blocks. Also the jack buffers get invalid after the
process() callback, and the engine buffers get invalid out of the engine
process() so you've got to copy the data somewhere.

The only way to get rid of copying entierly would be to run the engine
process cycle in the jack thread, either directly via lock, or with two
blocking context switches back and forth. But that'd probably violate a
million restrictions that apply within the jack callback.


> >     for (guint ch = 0; ch < jack->input_ports.size(); ch++)
> > {
> >  sample_t *values = (sample_t *) jack_port_get_buffer
> >  (jack->input_ports[ch], n_frames);
> >  Block::fill (n_frames, values, 0.0);
> > }
> >     for (guint ch = 0; ch < jack->output_ports.size(); ch++)
> > {
> >  sample_t *values = (sample_t *) jack_port_get_buffer
> >  (jack->output_ports[ch], n_frames);
> >  Block::fill (n_frames, values, 0.0);
> > }
>
>
> >   }
> > jack->cond.signal();
>
> ah, here's the racy signalling *outside* of the mutex lock being held...
>
> >
> > if (callback_state == CALLBACK_STATE_PLEASE_TERMINATE)
> >   {
> >     Atomic::int_set (&jack->callback_state, CALLBACK_STATE_TERMINATED);
> >     return 1;
> >   }
> > return 0;
>
> this misses a comment on the purpose of the return value.

FYI:

0 = run on
1 = die now

But I'll add a comment.

> >static void
> >terminate_and_free_jack (JackPcmHandle *jack)
> >{
> > g_return_if_fail (jack->jack_client != NULL);
> >
> > Atomic::int_set (&jack->callback_state, CALLBACK_STATE_PLEASE_TERMINATE);
> > while (Atomic::int_get (&jack->callback_state) ==
> > CALLBACK_STATE_PLEASE_TERMINATE)
> >   {
> >     AutoLocker cond_locker (jack->cond_mutex);
> >
> >     if (jack->is_down) /* if jack is already gone, then forget it */
> > Atomic::int_set (&jack->callback_state, CALLBACK_STATE_TERMINATED);
> >     else
> > jack->cond.wait_timed (jack->cond_mutex, 100000);
> >   }
>
> is the following for real?
> you disconnect the ports *after* the connection is terminated?
> or, is "terminated" wrongly worded here?

Its not the connection that terminated. Its just that the process
callback will no longer called (i.e. the client has become inactive).

In principle, you could start new process callback cycles, by calling
jack_activate again. But can't happen with the driver model bse gives
me.

It can happen, that jackd has crashed inbetween, so I am disconnecting
ports that are no longer there at all. But it seems that libjack can
handle it (by seeing that writing the disconnect requests to the jack
pipe will fail), so in this case it will return an error which I'll
ignore.

> > for (guint ch = 0; ch < jack->input_ports.size(); ch++)
> >   jack_port_disconnect (jack->jack_client, jack->input_ports[ch]);
> >
> > for (guint ch = 0; ch < jack->output_ports.size(); ch++)
> >   jack_port_disconnect (jack->jack_client, jack->output_ports[ch]);
> >
> > jack_deactivate (jack->jack_client);
> >
> > delete jack;
> >}
> >

> > for (guint i = 0; i < handle->n_channels; i++)
> >   {
> >     const int port_name_size = jack_port_name_size();
> >     char port_name[port_name_size];
> >     jack_port_t *port;
> >
> >     snprintf (port_name, port_name_size, "in_%u", i);
>
> please use g_snprintf() instead.
>
> >     port = jack_port_register (self->jack_client, port_name,
> >     JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput, 0);
>
> what is JACK_DEFAULT_AUDIO_TYPE? shouldn't we request floats here (i
> dunno the jack API though).

JACK_DEFAULT_AUDIO_TYPE is float. There is no other audio type at all,
i.e. there is no negotaition or anything. It also corresponds to my
sample_t typedef above (is now called JackSample typedef to be
UpperLowerCaseNamed), which is also float. Otherwise, the whole thing
will break into pieces anyway, because I'll just copy things.

> > /* activate */
> >
> > jack_set_process_callback (self->jack_client, jack_process_callback,
> > jack);
> > jack_on_shutdown (self->jack_client, jack_shutdown_callback, jack);
> > jack_activate (self->jack_client);
>
> you're activating here without checking for if (error)?

Good point. Will fix.

> > /* connect ports */
> >
> > map<string, DeviceDetails> devices = query_jack_devices (self);
>
> querying the devices could have been done before registering
> the ports above.

So what? I can only connect the ports anyway.

> > map<string, DeviceDetails>::const_iterator di;
> >
> > di = devices.find (dname);
> > if (di != devices.end())
> >   {
> >     const DeviceDetails &details = di->second;
> >
> >     for (guint ch = 0; ch < handle->n_channels; ch++)
> > {
> >  if (details.output_ports > ch)
> >    jack_connect (self->jack_client,
> >    details.output_port_names[ch].c_str(), jack_port_name
> >    (jack->input_ports[ch]));
> >  if (details.input_ports > ch)
> >    jack_connect (self->jack_client, jack_port_name
> >    (jack->output_ports[ch]), details.input_port_names[ch].c_str());
> > }
> >   }
>
> you're not setting error in case you failed to find
> the required devices to connect the channels.

I don't want to. If you for instance start beast with

-pjack=jamin

to use jamin for mastering, and later the user chooses to terminate
jamin, then we can still run. Its just that then the ports will be
initially unconnected, which is not a problem, because it can be easily
fixed in qjackctl.

> > /* setup buffer size */
> >
> > // keep at least two jack callback sizes for dropout free audio
> > guint min_buffer_frames = jack_get_buffer_size (self->jack_client) * 2;
> >
> > // keep an extra engine buffer size (this compensates also for cases
> > where the
> > // engine buffer size is not a 2^N value, which would otherwise cause the
>
> yeah, engine buffers are usually not 2^n sizes. see bse_engine_constrain()
> in bseengine.c.
>
> > // buffer never to be fully filled with 2 periods of data)
>
> "periods"? do you mean "elements" or "frames" here? (using your ringbuffer
> terminology)
> or do you mean buffer sizes here? jack buffers or bse buffers?

Jack buffer sizes.

>
> > min_buffer_frames += BSE_PCM_DEVICE (device)->req_block_length;
>
> couldn't you instead just align min_buffer_frames to 2 * bse buffer size,
> to still fullfil the requirements you outlined, and yet yield a smaller
> ring buffer (thus optimize latency)?

Don't know. Trouble comes when we miss the - racy - condition. That
produces one period of extra latency if things go badly wrong. Thus a
stateful synchronization which can't miss something (such as a pipe())
would be superior. Then I suppose we could do as you suggested.

> > // honor the user defined latency specification
> > //
> > // FIXME: should we try to tweak local buffering so that it corresponds
> > // to the user defined latency, or global buffering (including the jack
> > // buffer)? we do the former, since it is easier
>
> we should do the latter. that shouldn't be too hard, Paul said that you
> could query (and inform) jack about the latency on a port.
> that then just needs to be subtracted from BSE_PCM_DEVICE
> (device)->req_latency_ms.

It can change. If the beast output is jamin, that'd be adding latency.
If the user reconnects jamin to a convolution which adds latency, then
the total output latency query would change while we're running. Thats
why I am a bit reluctant to do it, as its results are not well-defined.

It could even happen that we are not connected initially (latency = 0),
and only get connected afterwards by a patchbay, where the user is
loading a session. Thus in such a case we would always misbehave.

>
> > guint user_buffer_frames = BSE_PCM_DEVICE (device)->req_latency_ms *
> > handle->mix_freq / 1000;
> > guint buffer_frames = max (min_buffer_frames, user_buffer_frames);
> >
> > jack->input_ringbuffer.resize (buffer_frames, handle->n_channels);
> > jack->output_ringbuffer.resize (buffer_frames, handle->n_channels);
>
> the way this is written, you're doubling the latency because you apply
> it to the input and the output buffer. you should just apply it to the
> output buffer (and also subtract the input buffer latency before doing
> that)

The input buffer should not produce latency. Ideally, the filled frames
of the input buffer and the filled frames of the output buffer should
add up to exactly buffer_frames. Note that the input_buffer is *empty*
initially while the output_buffer is *full* initially, and the jack
callback methodically only adds as many frames to the input_buffer as it
takes from the output_buffer, whereas the engine process cycle does the
inverse.

> > jack->buffer_frames  = jack->output_ringbuffer.get_writable_frames();
>
> what is jack->buffer_frames good for? and why is that different from
> total_n_frames() here?

Its not different. Don't know. Maybe I'll get rid of it in the new
version.

> > // the ringbuffer should be exactly as big as requested
> > g_assert (jack->buffer_frames == buffer_frames);
> > DEBUG ("ringbuffer size = %.3fms", jack->buffer_frames / double
> > (handle->mix_freq) * 1000);
>
> hm, you should better use driver prefixing in DEBUG() statements like
> the ALSA driver does it, i.e.:
>    DEBUG ("JACK: ringbuffer size = %.3fms", jack->buffer_frames / double
>    (handle->mix_freq) * 1000);
> of course that applies to all your DEBUG() calls.

I think the DEBUG() automagically does this. At least I saw that the
output contained JACK already, so I didn't bother to add it.

> > /* setup PCM handle or shutdown */
> > if (!error)
> >   {
> >     bse_device_set_opened (device, dname, handle->readable,
> >     handle->writable);
> >     if (handle->readable)
> >       handle->read = jack_device_read;
> >     if (handle->writable)
> >       handle->write = jack_device_write;
> >     handle->check_io = jack_device_check_io;
> >     handle->latency = jack_device_latency;
> >     BSE_PCM_DEVICE (device)->handle = handle;
> >
> >     /* will prevent dropouts at initialization, when no data is there at
> >     all */
>
> why would this "prevent dropouts"? the jack thread got started already,
> so a dropout could possibly have occoured _already_.

No the jack callback is still operating in the "inactive" state, so it
will not touch our data yet. Yes. It will write zeros. And no. Normally
you will not hear this.

The point here is to fill the output ringbuffer fully with zeros (the
alsa driver does the same), so that beast has some time to produce some
audio that will be played. Otherwise it could be that there are dropouts
at song start because jack is consuming the data about as fast at beast
can produce it initially.

> granted, retriggering may be neccessary, but the comment doesn't make
> sense to me here.
>
> >     jack_device_retrigger (jack);
> >     jack_device_latency (handle);   // debugging only: print latency
> >     values
> >   }
> > else
> >   {
> >     terminate_and_free_jack (jack);
> >   }
> > DEBUG ("JACK: opening PCM \"%s\" readupble=%d writable=%d: %s", dname,
> > require_readable, require_writable, bse_error_blurb (error));
> >
> > return error;
> >}
> >
> >static void
> >bse_pcm_device_jack_close (BseDevice *device)
> >{
> > JackPcmHandle *jack = (JackPcmHandle*) BSE_PCM_DEVICE (device)->handle;
> > BSE_PCM_DEVICE (device)->handle = NULL;
> >
> > terminate_and_free_jack (jack);
> >}
> >
> >static void
> >bse_pcm_device_jack_finalize (GObject *object)
> >{
> > BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (object);
> > if (self->jack_client)
> >   {
> >     jack_client_close (self->jack_client);
> >     self->jack_client = NULL;
> >   }
> >
> > /* chain parent class' handler */
> > G_OBJECT_CLASS (parent_class)->finalize (object);
> >}
> >
> >static void
> >jack_device_retrigger (JackPcmHandle *jack)
> >{
> > g_return_if_fail (jack->jack_client != NULL);
> >
> > /*
>
> extraneous newline at coment start.
>
> >  * we need to reset the active flag to false here, as we modify the
> >  * buffers in a non threadsafe way; this is why we also wait for
> >  * the condition, to ensure that the jack callback really isn't
> >  * active any more
> >  */
> > Atomic::int_set (&jack->callback_state, CALLBACK_STATE_INACTIVE);
> >
> > /* usually should not timeout, but be notified by the jack callback */
> > jack->cond_mutex.lock();
> > if (!jack->is_down)
> >   jack->cond.wait_timed (jack->cond_mutex, 100000);
> > jack->cond_mutex.unlock();
>
> hm, is_down is set by the jack thread onkly volountarily, i.e. if
> it exits prematurely, you get the stale BSE thread i mntioned above.

Nope. Its set by the shutdown handler.

>
> >
> > /* jack_ringbuffer_reset is not threadsafe! */
>
> yeah, i didn't mention this in the ringbuffer review, but
> i'd agree that "reset" would be a better name than "clear"
> for the resetting of the pointers.

Although the comment refers back to the old days, where I used jacks
ringbuffer. Thus it also is called jack_ringbuffer_reset, and not
FrameRingBuffer<float>::clear() in the comment.

> >static gsize
> >jack_device_read (BsePcmHandle *handle,
> >                 gfloat       *values)
> >{
> [...]
>
> >
> >  if (jack->is_down)
> >    {
> >      /*
>
> extraneous newline at comment start.
>
> >       * FIXME: we need a way to indicate an error here; beast
> >       should provide
> >       * an adequate reaction in case the JACK server is down (it
> >       should stop
> >       * playing the file, and show a dialog, if JACK can not be
> >       reconnected)
>
> we don't need error indication here (see alsa_device_read()). it's
> good enough if we return gracefully here.
> error checking code (and returns) can be executed by check_io()
> and/or retrigger().

Yes, we do. If JACK dies, I want to know as user. Otherwise I'll just
see that nothing happens. Why? And why is beast continuing to play if
still nothing happens. Why should it...?

> >       *
> >       * if we have a way to abort processing, this if can be moved
> >       above
> >       * the condition wait; however, right now moving it there
> >       means that
> >       * beast will render the output as fast as possible when jack
> >       dies, and
> >       * this will look like a machine lockup
>
> no, endless loops don't look like lockups ;)
> in the first case, your CPU meter runs at 100% and toggling Num-lock/etc.
> keys still work (usually also the X server etc.) while in the latter case
> *nothing* goes anymore.

But it isn't a situation I would want to throw the user into either.

> >       */
> >      Block::fill (frames_left * handle->n_channels, values, 0.0);  
> >      /* <- remove me once we can indicate errors */
>
> nope should stay (the same way alsa_device_read() copeswith errors).

Its uncommon for a hardware device to go away while the application
runs. Its unfortunately common for the jack server to go away. That
might not be sloppy programming. The jack server might as well go away
intentionally because we missed a deadline. For instance because the
user was running jack and beast without rt prio enabled but had a
semi-low latency. Just stopping in such cases is rather uninformative,
and people may think its a beast bug or something.

We should be able to present a reason why this happened to the user
inside a nice dialog and then stop trying to play audio. Its pointless
after jackd terminated our connection or terminated itself or crashed or
something.

> > const gchar *info = g_intern_printf (/* TRANSLATORS: keep this text to 70
> > chars in width */
> >                                      _("JACK Audio Connection Kit PCM
> >                                      driver (http://jackaudio.org)\n"));
>
> jack version printing? again, see what the alsa driver does here.

There is no jack version I can ask for in the API, so I can't print one.
I could print the version of libjack I was compiled against (i.e.
something that pkg-config produced or so), but the server version might
be different, as long as the protocol version of the server and the
protocol version of libjack match.

> the single most important issue that has to be fixed here is
> the mutex and condition. neither the BSE nor the JACK thread
> can afford acquiring it and wait for the other thread.
> that's why we need a ringbuffer based only on atomic ops. we
> got that now, so the next step is to modify the code to use
> it up to its fullest potential.
>
> basically, to get there, the jack thread needs to be modified
> to act more similarl to e.g. the kernel ALSA driver wrg to
> xruns. the kernel driver also has fixed-size ring buffers and
> still deals with drop outs perfectly well. that kind of API
> semantic has to be provided for the BSE thread in the bse-alsa
> driver. then, the implementations of check_io(), retrigger(),
> read() and write() can be much closer to the bse-alsa driver
> implementation. also, this will eliminate the need for the
> mutex and the condition.

I think I don't know enough yet to give a qualified answer on what to
do: especially I want to know, if

 * write to pipe
 * pthread_condition_signal
 * (maybe semaphores)
 * (maybe something else I forgot)

are equally capable in their realtimeness properties, and suitable for
notifying the bse master thread from the jack realtime thread, and so
which is the one to choose to let the bse thread know when to compute
date.

Timeouts are a bad solution because they either occur too soon (in which
case the engine thread may wake up, but can't do the computation), or
too late, because there is always jitter. In any case you loose precious
time or waste precious CPU cycles or both.

I know that for practical purposes, there is no other solution the bse
framework is capable than write-to-pipe, but I have seen in kernel code
that the implentation is not lock-free. Can this lead to priority
inversion? Can conditions do the same? What is safe and what is not?
Right now, I don't know.

   Cu... Stefan
--
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast
Reply | Threaded
Open this post in threaded view
|

Re: bsepcmdevice-jack.cc (Re: Beast Jack driver)

Tim Janik
On Mon, 5 Jun 2006, Stefan Westerfeld wrote:

>   Hi!
>
> Same comment as for the other reply: I'll post fixed code later, so I'll
> just answer to discussion stuff.
>
> On Fri, Jun 02, 2006 at 01:45:01AM +0200, Tim Janik wrote:
>>> enum
>>> {
>>> CALLBACK_STATE_INACTIVE = 0,
>>> CALLBACK_STATE_ACTIVE = 1,
>>> CALLBACK_STATE_PLEASE_TERMINATE = 2,
>>> CALLBACK_STATE_TERMINATED = 3
>>
>> minor cosmetic, we usually align '='-symbols in enums.
>>
>>> };
>>
>> you should provide a type name for this enum, which
>> can be used above to tell apart valid states of your
>> callback_state variable.
>
> The problem is that callback_state is an atomic integer, so I think it
> needs to be volatile int, and I can't declare it of the enum type; or am
> I wrong?

nope, that's a good point, C++ doesn't guarantee that enums are stored
with int size, so you should indeed use a volatile int there to use the
atomic functions. however, the decl line should then also have a comment,
outlining the real type.

>>> vector<jack_port_t *>  input_ports;
>>> vector<jack_port_t *>  output_ports;
>>>
>>> guint  buffer_frames; /* input/output
>>> ringbuffer size in frames */
>>>
>>> Cond  cond;
>>> Mutex  cond_mutex;
>>
>> erk, what exactly is protected by this cond and this mutex?
>> something like this definitely needs to be outlined in code
>> (in comments or by naming prefixes, look for BirnetMutex
>> in bseengineutils.c to see examples for that)
>
> Well, I'll put a comment there, but if you want to know what it does
> right now: the condition is signalled whenever the realtime callback has
> been called and is thus be used to wait for events that depend on the
> realtime callback (like waiting for termination or new data).
>
> The mutex is mostly useless - its there because conditions require it
> (you can't wait on a condition without one), but all code needs to work
> without the mutex, because the realtime thread can not use lock() (only
> trylock()) due to possible priority inversion.

ok. note that beast doesn't 100% perfectly prevent priority inversion in
other scenarios either. this issue is slowly being worked on (the idea
is to move all communication to atomic ops guarded by BirnetGuard, but
that's an entire new thread ;)
until then, the jack RT thread should use a pipe write() to notify the
engine and sequencer threads, which both create their own pipe fds that
they also poll() on.

>>> jack_status_t status;
>>>
>>> self->jack_client = jack_client_open ("beast", JackNoStartServer,
>>> &status); /* FIXME: translations(?) */
>>
>> what's this "beast" being used for? (especially since you
>> quesiton translation)
>> e.g. if it's similar to window titles, it should be "BEAST",
>> if it's going to be used as a program name in continueous text,
>> it should prolly be _("Beast") and if its meant as a technical
>> identifier (e.g. for serialization), it should be "beast".
>>
>> maybe Paul can shed light on this?
>
> Its the name of the client, which means that for the command line utils
> for instance you'll use
>
> beast:in_1
>
> as port name. So this suggests that you may not want to translate it.
> However, for qjackctl, its also the label that is displayed in the GUI,
> in the widget where the clients (and ports) are listed.

ok, it has to be untranslated "beast" then.


>>> DEBUG ("attaching to JACK server returned status: %d\n", status);
>>> /* FIXME: check status for proper error reporting
>>
>> doesn't teh jack API have something like:
>>   const char* jack_status_to_string (jack_status_t status);
>> ?
>
> No.

fine, then we need to roll our own function that does this.
for existing example code that stringifies foreign errors,
you can take a look at gslvorbis-cutter.c:ov_error_blurb().

it might be agood idea to submit this to jack once done btw.

>>> static map<string, DeviceDetails>
>>> query_jack_devices (BsePcmDeviceJACK *self)
>>> {
>>> map<string, DeviceDetails> devices;
>>>
>>> /* FIXME: we need to make a difference here between
>>>  *  - jackd is running, but no output ports can be found
>>
>> curious, why exactly could that be the case?
>
> I am not sure it can happen, as you need to load some driver to be able
> to start jack at all, and the driver will probably always register some
> terminal ports. However, in theory it could be possible to start jack
> only to connect two applications; for instance one that produces audio
> (beast) and another one that sends it over the net somewhere. Then,
> which application you start first is arbitary, and it might as well be
> that beast gets started first.
>
> But... well... I don't know if jack is ever going to support starting
> without a driver (it would need to do something to get the timing,
> sample rate and so on setup; normally probably the driver does this).

ok, then the querying logic should simply treat a running jackd without
output ports as if no jackd was running.

>>>  *  - jackd is not running
>>
>> that should/will actually be determined by jack_client_open(), right?
>
> Right. Then self->jack_client will be NULL. Or not, if jackd was once
> running but crashed later...

that should be taken care of by the cleanup handler then (afaiu).

>>>    }
>>> }
>>>     free (jack_ports);
>>>   }
>>>
>>> return devices;
>>
>> ugh, isn't this a full fledged recursive map+strings+DeviceDetails copy?
>
> Yes. So? Its not in any time critical code, so I don't bother to
> optimize it.

well, it's not exactly "hard to optimize" here. just return a pointer.

>>> static SfiRing*
>>> bse_pcm_device_jack_list_devices (BseDevice *device)
>>> {
>>> BsePcmDeviceJACK *self = BSE_PCM_DEVICE_JACK (device);
>>> map<string, DeviceDetails> devices = query_jack_devices (self);
>>>
>>> SfiRing *ring = NULL;
>>> for (map<string, DeviceDetails>::iterator di = devices.begin(); di !=
>>> devices.end(); di++)
>>>   {
>>>     const string &device_name = di->first;
>>>     DeviceDetails &details = di->second;
>>>     BseDeviceEntry *entry;
>>>     entry = bse_device_group_entry_new (device, g_strdup
>>>     (device_name.c_str()),
>>>                                                 g_strdup ("JACK Devices"),
>>>                                                 g_strdup_printf ("%s
>>>                                                 (%d*input %d*output)%s",
>>>   device_name.c_str(),
>>>   details.input_ports,
>>>   details.output_ports,
>>>   details.physical_ports == details.ports ?
>>>     " [
>>>     hardware ]" : ""));
>>
>> hm, can you please elaborate the
>>   details.physical_ports == details.ports ?  " [ hardware ]" : ""
>> condition for me? ;)
>
> Well... if all ports of a client are physical, then its got to be a
> hardware device. For instance alsa_pcm has only physical ports here.
>
> I am not sure what a client is that has *some* physical ports... I've
> yet to see one.

a combined virtual alsa device? (combing a virtual and a hw device)

>>>      * normal operation:
>>>      *
>>>      * [bse thread]      writes some data to output_ringbuffer
>>>      * [bse thread]      checks remaining space, there is no room left
>>>      * [bse thread]      sleeps on the condition
>>
>> BSE never sleeps, waiting for audio drivers. take a look at
>> alsa_device_write() and alsa_device_check_io() in bsepcmdevice-alsa.c.
>> i.e. allow full buffer writes if at all possible and provide
>> timeout information on how long it takes for enough buffer space
>> to be available.
>> that's the kind of semantic that needs to be implemented.
>
> Yes, it does block in the ALSA driver. However, I think I'll come back
> to it in a seperate mail, since you said it shouldn't or maybe should,
> so I'll investigate.

the rationale provided here:
   http://mail.gnome.org/archives/beast/2006-June/msg00005.html
for why writes shouldn't block applies also to the bse-alsa driver.
the fact that bse-alsa could indeed block in its write function
since we set the device to nonoblock=0 is merely the guarding
i'm talking about at the end of the above email.

>>>      * [bse thread]      writes some data to output_ringbuffer
>>>      * [bse thread]      checks remaining space, there is no room left
>>>      * [jack thread]     reads some data from output_ringbuffer
>>>      * [jack thread]     signals the condition
>>>      * [bse thread]      sleeps on the condition
>>
>> such races are only possible when not using conditions properly.
>> you always have to follow the sequence: lock(); check(); wait(); unlock();
>> and on the pther side: lock(); update(); signal(); unlock();
>> this ensures that no update/signal goes by unnoticed by the check. see:
>>   http://developer.gnome.org/doc/API/2.0/glib/glib-Threads.html#GCond
>
> Cool. Just that I can't lock() in the RT thread, because of priority
> inversion.
>
> I've read a little kernel code today, and fs/pipe.c has a
> PIPE_MUTEX(...) which is being locked. So I am not sure I can use
> write() to wakeup the master thread either.

well, forget about priority inversion for the moment. it's very hard
to trigger anyway. a much stronger reason to use a pipe is that the
engine and synthesizer threads use poll() and not cond_wait.

> However pthread_cond_signal in glibc also seems to have a kind of a
> mutex (so its also a problem), although I am not sure if I interpret the
> code correctly. But if I do, then I am running out of ideas how to get
> this right at all.
>
> Well, I think I have to think/read/ask others more about it until I know
> whats the right thing to do, to get the synchronization between the
> engine thread and the jack realtime callback right.

just use a pipe wirte() and start to concentrate on other stuff ;)

>>>      */
>>>     if (have_cond_lock)
>>> jack->cond_mutex.unlock();
>>>   }
>>> else
>>>   {
>>
>> when exactly is this branch tirggered? (lost track due to the excessive
>> inlined commenting above.)
>
> There are different activation states. Thats the case where the callback
> has marked to be inactive (shouldn't touch ringbuffers and such), so it
> will just write out zero blocks.

this (and similar things) really need to go into a state machine description
comment at the start of the source file.

>>>     port = jack_port_register (self->jack_client, port_name,
>>>     JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput, 0);
>>
>> what is JACK_DEFAULT_AUDIO_TYPE? shouldn't we request floats here (i
>> dunno the jack API though).
>
> JACK_DEFAULT_AUDIO_TYPE is float. There is no other audio type at all,
> i.e. there is no negotaition or anything.

odd, should have a comment like:
   /* request the _only_ audio type jack has, which is float */
so rationale is preserved for future maintenance.

>>> /* connect ports */
>>>
>>> map<string, DeviceDetails> devices = query_jack_devices (self);
>>
>> querying the devices could have been done before registering
>> the ports above.
>
> So what? I can only connect the ports anyway.

"So what?" ? there's no point in connectiong ports if the
devices can't be found.

>>> map<string, DeviceDetails>::const_iterator di;
>>>
>>> di = devices.find (dname);
>>> if (di != devices.end())
>>>   {
>>>     const DeviceDetails &details = di->second;
>>>
>>>     for (guint ch = 0; ch < handle->n_channels; ch++)
>>> {
>>>  if (details.output_ports > ch)
>>>    jack_connect (self->jack_client,
>>>    details.output_port_names[ch].c_str(), jack_port_name
>>>    (jack->input_ports[ch]));
>>>  if (details.input_ports > ch)
>>>    jack_connect (self->jack_client, jack_port_name
>>>    (jack->output_ports[ch]), details.input_port_names[ch].c_str());
>>> }
>>>   }
>>
>> you're not setting error in case you failed to find
>> the required devices to connect the channels.
>
> I don't want to. If you for instance start beast with
>
> -pjack=jamin
>
> to use jamin for mastering, and later the user chooses to terminate
> jamin, then we can still run. Its just that then the ports will be
> initially unconnected, which is not a problem, because it can be easily
> fixed in qjackctl.

sorry, would you care to enlighten me here please?
what's -pjack=jamin supposed to mean?
should the beast output ports be auto-connected to input ports named "jamin"?
and, if input ports named "jamin" are not present, that'll silently be
ignored?
will the jack callback thread still properly run and request BSE engine data
if the beast output port is unconnected?

if that's the case, i think at least the behaviour of silently ignoring
unsuccessfull connections has to be changed. we also throw proper errors
in other drivers if unconectable devices are being adressed.
(if unsuccessful connections really are common usage with jack, we can
still add device flags, like we do with OSS, so you could say
-pjack=jamin,auto for automated connections)

>>> min_buffer_frames += BSE_PCM_DEVICE (device)->req_block_length;
>>
>> couldn't you instead just align min_buffer_frames to 2 * bse buffer size,
>> to still fullfil the requirements you outlined, and yet yield a smaller
>> ring buffer (thus optimize latency)?
>
> Don't know. Trouble comes when we miss the - racy - condition. That
> produces one period of extra latency if things go badly wrong. Thus a
> stateful synchronization which can't miss something (such as a pipe())
> would be superior.

sorry? what do you mean by "stateful synchronization"?

> Then I suppose we could do as you suggested.

fine.

>>> // honor the user defined latency specification
>>> //
>>> // FIXME: should we try to tweak local buffering so that it corresponds
>>> // to the user defined latency, or global buffering (including the jack
>>> // buffer)? we do the former, since it is easier
>>
>> we should do the latter. that shouldn't be too hard, Paul said that you
>> could query (and inform) jack about the latency on a port.
>> that then just needs to be subtracted from BSE_PCM_DEVICE
>> (device)->req_latency_ms.
>
> It can change. If the beast output is jamin, that'd be adding latency.
> If the user reconnects jamin to a convolution which adds latency, then
> the total output latency query would change while we're running. Thats
> why I am a bit reluctant to do it, as its results are not well-defined.

i see. at the very least the latency tooltip has to be adapted then
to reflect this.

> It could even happen that we are not connected initially (latency = 0),
> and only get connected afterwards by a patchbay, where the user is
> loading a session. Thus in such a case we would always misbehave.

what's that about being non-connected.
does jack still request audio from beast when unconnected?
and, you said jack clients would always auto connect to alsa_pcm
and that alsa_pcm should always be present. so why would we start
up disconnected?

>>> guint user_buffer_frames = BSE_PCM_DEVICE (device)->req_latency_ms *
>>> handle->mix_freq / 1000;
>>> guint buffer_frames = max (min_buffer_frames, user_buffer_frames);
>>>
>>> jack->input_ringbuffer.resize (buffer_frames, handle->n_channels);
>>> jack->output_ringbuffer.resize (buffer_frames, handle->n_channels);
>>
>> the way this is written, you're doubling the latency because you apply
>> it to the input and the output buffer. you should just apply it to the
>> output buffer (and also subtract the input buffer latency before doing
>> that)
>
> The input buffer should not produce latency. Ideally, the filled frames
> of the input buffer and the filled frames of the output buffer should
> add up to exactly buffer_frames. Note that the input_buffer is *empty*
> initially while the output_buffer is *full* initially, and the jack
> callback methodically only adds as many frames to the input_buffer as it
> takes from the output_buffer, whereas the engine process cycle does the
> inverse.

that means input latency increases with every output underrun, because
in that case you're faking extra output values. basically, in duplex
scenrios under latency constraints you have to properly resync input
and output and can't just pad output values.

>>> jack->buffer_frames  = jack->output_ringbuffer.get_writable_frames();
>>
>> what is jack->buffer_frames good for? and why is that different from
>> total_n_frames() here?
>
> Its not different. Don't know. Maybe I'll get rid of it in the new
> version.

"Don't know."? ;)
ok, if you need help here, take my word that if it's useless you should
get rid of it ;)


>>>       * FIXME: we need a way to indicate an error here; beast
>>>       should provide
>>>       * an adequate reaction in case the JACK server is down (it
>>>       should stop
>>>       * playing the file, and show a dialog, if JACK can not be
>>>       reconnected)
>>
>> we don't need error indication here (see alsa_device_read()). it's
>> good enough if we return gracefully here.
>> error checking code (and returns) can be executed by check_io()
>> and/or retrigger().
>
> Yes, we do. If JACK dies, I want to know as user. Otherwise I'll just
> see that nothing happens. Why? And why is beast continuing to play if
> still nothing happens. Why should it...?

i already said that. but i can say it again if that helps you.
you should gracefully return and ignore the error in write().
you can check for error conditions in check_io() and there we
can decide and undertake appropriate measures to deal with the
error. e.g. by resyncing the stream (that's at least interesting
for ALSA/OSS or by notifying the user, etc.)

>>>       * if we have a way to abort processing, this if can be moved
>>>       above
>>>       * the condition wait; however, right now moving it there
>>>       means that
>>>       * beast will render the output as fast as possible when jack
>>>       dies, and
>>>       * this will look like a machine lockup
>>
>> no, endless loops don't look like lockups ;)
>> in the first case, your CPU meter runs at 100% and toggling Num-lock/etc.
>> keys still work (usually also the X server etc.) while in the latter case
>> *nothing* goes anymore.
>
> But it isn't a situation I would want to throw the user into either.

right. simply fix check_io() then, beast won't busily write audio data
if check_io() doesn't tell it so.

>>>      Block::fill (frames_left * handle->n_channels, values, 0.0);
>>>      /* <- remove me once we can indicate errors */
>>
>> nope should stay (the same way alsa_device_read() copeswith errors).
>
> Its uncommon for a hardware device to go away while the application
> runs. Its unfortunately common for the jack server to go away. That
> might not be sloppy programming. The jack server might as well go away
> intentionally because we missed a deadline. For instance because the
> user was running jack and beast without rt prio enabled but had a
> semi-low latency. Just stopping in such cases is rather uninformative,
> and people may think its a beast bug or something.
>
> We should be able to present a reason why this happened to the user
> inside a nice dialog and then stop trying to play audio. Its pointless
> after jackd terminated our connection or terminated itself or crashed or
> something.

ok, then put an informative g_printerr() containing all the relevant
error scenario information into check_io() with a FIXME next to it.
we can then start discussing ways to treat processing errors in the engine
and possible recovery from such scenrios. (basically processing errors
are currently not supported by the engine design)

>>> const gchar *info = g_intern_printf (/* TRANSLATORS: keep this text to 70
>>> chars in width */
>>>                                      _("JACK Audio Connection Kit PCM
>>>                                      driver (http://jackaudio.org)\n"));
>>
>> jack version printing? again, see what the alsa driver does here.
>
> There is no jack version I can ask for in the API, so I can't print one.
> I could print the version of libjack I was compiled against (i.e.
> something that pkg-config produced or so), but the server version might
> be different, as long as the protocol version of the server and the
> protocol version of libjack match.

ok, if jack doesn't provide us more, we'll use the compiled-against
version then. after all, the runtime libjack should be compatible with
that (ELF ensures that) and also compatible with the running jackd
(libjack checks that).

>> the single most important issue that has to be fixed here is
>> the mutex and condition. neither the BSE nor the JACK thread
>> can afford acquiring it and wait for the other thread.
>> that's why we need a ringbuffer based only on atomic ops. we
>> got that now, so the next step is to modify the code to use
>> it up to its fullest potential.
>>
>> basically, to get there, the jack thread needs to be modified
>> to act more similarl to e.g. the kernel ALSA driver wrg to
>> xruns. the kernel driver also has fixed-size ring buffers and
>> still deals with drop outs perfectly well. that kind of API
>> semantic has to be provided for the BSE thread in the bse-alsa
>> driver. then, the implementations of check_io(), retrigger(),
>> read() and write() can be much closer to the bse-alsa driver
>> implementation. also, this will eliminate the need for the
>> mutex and the condition.
>
> I think I don't know enough yet to give a qualified answer on what to
> do: especially I want to know, if
>
> * write to pipe
> * pthread_condition_signal
> * (maybe semaphores)
> * (maybe something else I forgot)
>
> are equally capable in their realtimeness properties, and suitable for
> notifying the bse master thread from the jack realtime thread, and so
> which is the one to choose to let the bse thread know when to compute
> date.

well, let me simply tell you that the only viable way is to use pipe.
if not for any of the various reasons you've already heared from me on the
topic, just for the simple fact that the BSE engine thread and the
BSE sequencer thread *already* support them.
that is not the case for any of the "alternatives" you list.

> Timeouts are a bad solution because they either occur too soon (in which
> case the engine thread may wake up, but can't do the computation), or
> too late, because there is always jitter. In any case you loose precious
> time or waste precious CPU cycles or both.

note that the BSE audio driver model *and* the BSE engine model are based
on timeouts. that's what check_io() provides and what the engine needs.
you won't get any other programming model with BSE, regardless of whether
you like it or not ;)

> I know that for practical purposes, there is no other solution the bse
> framework is capable than write-to-pipe, but I have seen in kernel code
> that the implentation is not lock-free. Can this lead to priority
> inversion?

you can ask on lkml to figure if/what liunx prevents priority inversion
for pipe access.

> Can conditions do the same?

yeah, signaling via conditions also involves locks/syncronization
*somewhere* in the path between two threads.

> What is safe and what is not?
> Right now, I don't know.

simply stop worrying, like i recommended above ;)

>   Cu... Stefan

---
ciaoTJ
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast
Reply | Threaded
Open this post in threaded view
|

Re: bsepcmdevice-jack.cc (Re: Beast Jack driver)

Stefan Westerfeld
   Hi!

On Mon, Jun 05, 2006 at 04:31:23AM +0200, Tim Janik wrote:

> >>>vector<jack_port_t *>  input_ports;
> >>>vector<jack_port_t *>  output_ports;
> >>>
> >>>guint  buffer_frames; /* input/output
> >>>ringbuffer size in frames */
> >>>
> >>>Cond  cond;
> >>>Mutex  cond_mutex;
> >>
> >>erk, what exactly is protected by this cond and this mutex?
> >>something like this definitely needs to be outlined in code
> >>(in comments or by naming prefixes, look for BirnetMutex
> >>in bseengineutils.c to see examples for that)
> >
> >Well, I'll put a comment there, but if you want to know what it does
> >right now: the condition is signalled whenever the realtime callback has
> >been called and is thus be used to wait for events that depend on the
> >realtime callback (like waiting for termination or new data).
> >
> >The mutex is mostly useless - its there because conditions require it
> >(you can't wait on a condition without one), but all code needs to work
> >without the mutex, because the realtime thread can not use lock() (only
> >trylock()) due to possible priority inversion.
>
> ok. note that beast doesn't 100% perfectly prevent priority inversion in
> other scenarios either. this issue is slowly being worked on (the idea
> is to move all communication to atomic ops guarded by BirnetGuard, but
> that's an entire new thread ;)

However, these are two kinds of separate issues:
(1) priority inversion occuring within the jack realtime thread
(2) priority inversion occuring elsewhere in beast

When (1) is being triggered, the jackd server can get dropouts while
writing audio, and might decide that we are the client that is badly
implemented. So it might shutdown the connection we have, leaving the
other clients running.

When (2) is being triggered, we miss the opportunity to refill the
ringbuffer in time, so there will be a click in the output, but at least
the audio processing still continues; but we also may have still enough
data buffered in the rinbuffer, and nothing will happen; that will also
depend on how the user configured the beast latency.

So I am trying harder to avoid (1) than (2). If beast would not be
likely to produce more underruns in (2) than in (1), we might as well
run the beast processing within the jack RT thread. But we don't and
thats good as it is.

I still agree with your concluding remarks, though, which are:

> until then, the jack RT thread should use a pipe write() to notify the
> engine and sequencer threads, which both create their own pipe fds that
> they also poll() on.

After thinking some more about it, and also reading Paul's oppinion:

  http://permalink.gmane.org/gmane.linux.audio.devel/11725

I think that pipe write() is probably ok for a solution running under
linux. Its simply that linux is no realtime operating system, and so
whatever we do in our case, we can get no guarantee that it will work in
100% of the cases. So if this was an realtime airplane control software,
we should probably use some other operating systen.

However, for an audio software its probably enough if it works reliable
enough for daily music production, and it pipe writes should deliver
this degree of reliability.

> >>>DEBUG ("attaching to JACK server returned status: %d\n", status);
> >>>/* FIXME: check status for proper error reporting
> >>
> >>doesn't teh jack API have something like:
> >>  const char* jack_status_to_string (jack_status_t status);
> >>?
> >
> >No.
>
> fine, then we need to roll our own function that does this.
> for existing example code that stringifies foreign errors,
> you can take a look at gslvorbis-cutter.c:ov_error_blurb().
>
> it might be agood idea to submit this to jack once done btw.

Its not that writing our own function will help us a lot though. Its not
that every function returns a jack_status_t. Just the open function.

The usual way for getting stringified errors in other situations is to
set the error callback with jack_set_error_function.

> >>>    }
> >>> }
> >>>    free (jack_ports);
> >>>  }
> >>>
> >>>return devices;
> >>
> >>ugh, isn't this a full fledged recursive map+strings+DeviceDetails copy?
> >
> >Yes. So? Its not in any time critical code, so I don't bother to
> >optimize it.
>
> well, it's not exactly "hard to optimize" here. just return a pointer.

I am trying to avoid manual new/delete in all code I write, because its
a potential source of errors. So returning a pointer is not so good. It
also makes map accesses somewhat strange to read, i.e.

  (*devices)["foo"] = bar;

instead of

  devices["foo"] = bar;

Under some circumstances, the C++ compiler will also construct objects
that are returned from a function within the memory of the caller, so
that no copy will done at all. But I don't know if the device map case
is one of these cases, or not.

So if you insist that it must be "optimized", I would suggest changing
the API to

static void
query_jack_devices (BsePcmDeviceJack           *self,
                    map<string, DeviceDetails> &devices)

then I can still keep my automatic deletion feature and nice to read
map access code.

> >>>     * normal operation:
> >>>     *
> >>>     * [bse thread]      writes some data to output_ringbuffer
> >>>     * [bse thread]      checks remaining space, there is no room left
> >>>     * [bse thread]      sleeps on the condition
> >>
> >>BSE never sleeps, waiting for audio drivers. take a look at
> >>alsa_device_write() and alsa_device_check_io() in bsepcmdevice-alsa.c.
> >>i.e. allow full buffer writes if at all possible and provide
> >>timeout information on how long it takes for enough buffer space
> >>to be available.
> >>that's the kind of semantic that needs to be implemented.
> >
> >Yes, it does block in the ALSA driver. However, I think I'll come back
> >to it in a seperate mail, since you said it shouldn't or maybe should,
> >so I'll investigate.
>
> the rationale provided here:
>   http://mail.gnome.org/archives/beast/2006-June/msg00005.html
> for why writes shouldn't block applies also to the bse-alsa driver.
> the fact that bse-alsa could indeed block in its write function
> since we set the device to nonoblock=0 is merely the guarding
> i'm talking about at the end of the above email.

Ok, investigating the question pointed out to in the mail you linked
there is still on my todo.

> >Cool. Just that I can't lock() in the RT thread, because of priority
> >inversion.
> >
> >I've read a little kernel code today, and fs/pipe.c has a
> >PIPE_MUTEX(...) which is being locked. So I am not sure I can use
> >write() to wakeup the master thread either.
>
> well, forget about priority inversion for the moment. it's very hard
> to trigger anyway. a much stronger reason to use a pipe is that the
> engine and synthesizer threads use poll() and not cond_wait.

Yes, I changed my mind. I will use write().

> >>>/* connect ports */
> >>>
> >>>map<string, DeviceDetails> devices = query_jack_devices (self);
> >>
> >>querying the devices could have been done before registering
> >>the ports above.
> >
> >So what? I can only connect the ports anyway.
>
> "So what?" ? there's no point in connectiong ports if the
> devices can't be found.

I will not "connect" the points if the device can not be found, since
then the (di != devices.end()) condition (4 lines below from here) will
be false. However, I still need to register the beast output ports, so
that beast can write output, although then it will be unconnected when
writing output, which is perfectly valid in the jack world. It happens
all the time. The user can just connect it afterwards using a patchbay.

> >>>map<string, DeviceDetails>::const_iterator di;
> >>>
> >>>di = devices.find (dname);
> >>>if (di != devices.end())
> >>>  {
> >>>    const DeviceDetails &details = di->second;
> >>>
> >>>    for (guint ch = 0; ch < handle->n_channels; ch++)
> >>> {
> >>>  if (details.output_ports > ch)
> >>>    jack_connect (self->jack_client,
> >>>    details.output_port_names[ch].c_str(), jack_port_name
> >>>    (jack->input_ports[ch]));
> >>>  if (details.input_ports > ch)
> >>>    jack_connect (self->jack_client, jack_port_name
> >>>    (jack->output_ports[ch]), details.input_port_names[ch].c_str());
> >>> }
> >>>  }
> >>
> >>you're not setting error in case you failed to find
> >>the required devices to connect the channels.
> >
> >I don't want to. If you for instance start beast with
> >
> >-pjack=jamin
> >
> >to use jamin for mastering, and later the user chooses to terminate
> >jamin, then we can still run. Its just that then the ports will be
> >initially unconnected, which is not a problem, because it can be easily
> >fixed in qjackctl.
>
> sorry, would you care to enlighten me here please?
> what's -pjack=jamin supposed to mean?
> should the beast output ports be auto-connected to input ports named
> "jamin"?
Yes.

> and, if input ports named "jamin" are not present, that'll silently be
> ignored?
Yes.

> will the jack callback thread still properly run and request BSE engine data
> if the beast output port is unconnected?
Yes.

> if that's the case, i think at least the behaviour of silently ignoring
> unsuccessfull connections has to be changed. we also throw proper errors
> in other drivers if unconectable devices are being adressed.
> (if unsuccessful connections really are common usage with jack, we can
> still add device flags, like we do with OSS, so you could say
> -pjack=jamin,auto for automated connections)

I don't know from your example what you envision the syntax to be like.
There are four good cases as I see the situation.

(1) request beast to autoconnect, and request beast to find a suitable
    device to autoconnect to without asking

-> this is what -pjack does right now, and I think it should be kept
   this way

(2) request beast to connect to a specific client if possible, but
    continue if that doesn't work (we could have some [X] report errors
    on jack connection problems message, then the user could disable
    the error, or we could make this configurable)

-> this is what -pjack=jamin does right now

(3) request beast to connect to a specific client, and let it fail and
    report an error if it doesn't work

(4) request beast to start without connecting a client

-> this is what -pjack=unconnected does right now, as long as there is
   no jack client called "unconnected", in which case this behaves like
   (2)

What I wasn't too sure about when designing the syntax is whether the
position of the argument should play a role. Suppose we also decided
some day to add ro (readonly) and wo (writeonly) as options. Should then

-p jack=auto,ro,alsa_pcm

be the same like

-p jack=alsa_pcm,ro,auto

This rules out the possibility of connecting clients called "auto" or
"ro", which is maybe a bad decision. So maybe we should start with
something like

-p jack=connect:alsa_pcm                          
-p jack=connect:alsa_pcm,wo

or so? I am really not sure what would be a good and clean syntax here,
that handles all cases that are needed.

> >>>min_buffer_frames += BSE_PCM_DEVICE (device)->req_block_length;
> >>
> >>couldn't you instead just align min_buffer_frames to 2 * bse buffer size,
> >>to still fullfil the requirements you outlined, and yet yield a smaller
> >>ring buffer (thus optimize latency)?
> >
> >Don't know. Trouble comes when we miss the - racy - condition. That
> >produces one period of extra latency if things go badly wrong. Thus a
> >stateful synchronization which can't miss something (such as a pipe())
> >would be superior.
>
> sorry? what do you mean by "stateful synchronization"?

By stateful synchronization I mean that writing to a pipe will
definitely wakeup the engine thread later, whereas signalling a
condition will only wakeup the engine thread if it has been sleeping on
the condition. The former is better.

> >It could even happen that we are not connected initially (latency = 0),
> >and only get connected afterwards by a patchbay, where the user is
> >loading a session. Thus in such a case we would always misbehave.
>
> what's that about being non-connected.
> does jack still request audio from beast when unconnected?

Yes.

> and, you said jack clients would always auto connect to alsa_pcm
> and that alsa_pcm should always be present. so why would we start
> up disconnected?

Thats not what I wanted to say. I just pointed out that lots of people
hardcode alsa_pcm somewhere in their code. However, whether a jack
application should startup connected or unconnected, or how defaulting
or auto connecting should work for "normal" jack applications is an open
issue as far as I understand from reading the mailing list. There have
been some threads suggesting this or that (like api extensions, extra
default flags on ports and the like), and some people have been pointing
out that they like this or that behaviour better.

So I think until people have figured out what they want, I'd like to see
beast to support both: connect on open and open unconnected (so that
some external program can do the work).

> >>>guint user_buffer_frames = BSE_PCM_DEVICE (device)->req_latency_ms *
> >>>handle->mix_freq / 1000;
> >>>guint buffer_frames = max (min_buffer_frames, user_buffer_frames);
> >>>
> >>>jack->input_ringbuffer.resize (buffer_frames, handle->n_channels);
> >>>jack->output_ringbuffer.resize (buffer_frames, handle->n_channels);
> >>
> >>the way this is written, you're doubling the latency because you apply
> >>it to the input and the output buffer. you should just apply it to the
> >>output buffer (and also subtract the input buffer latency before doing
> >>that)
> >
> >The input buffer should not produce latency. Ideally, the filled frames
> >of the input buffer and the filled frames of the output buffer should
> >add up to exactly buffer_frames. Note that the input_buffer is *empty*
> >initially while the output_buffer is *full* initially, and the jack
> >callback methodically only adds as many frames to the input_buffer as it
> >takes from the output_buffer, whereas the engine process cycle does the
> >inverse.
>
> that means input latency increases with every output underrun, because
> in that case you're faking extra output values. basically, in duplex
> scenrios under latency constraints you have to properly resync input
> and output and can't just pad output values.

I don't think it can happen. After an output underrun, the
jack_device_retrigger code clears both ringbuffers (input and output),
and then refills the output buffer with zeros.

> >>>       * FIXME: we need a way to indicate an error here; beast
> >>>       should provide
> >>>       * an adequate reaction in case the JACK server is down (it
> >>>       should stop
> >>>       * playing the file, and show a dialog, if JACK can not be
> >>>       reconnected)
> >>
> >>we don't need error indication here (see alsa_device_read()). it's
> >>good enough if we return gracefully here.
> >>error checking code (and returns) can be executed by check_io()
> >>and/or retrigger().
> >
> >Yes, we do. If JACK dies, I want to know as user. Otherwise I'll just
> >see that nothing happens. Why? And why is beast continuing to play if
> >still nothing happens. Why should it...?
>
> i already said that. but i can say it again if that helps you.
> you should gracefully return and ignore the error in write().
> you can check for error conditions in check_io() and there we
> can decide and undertake appropriate measures to deal with the
> error. e.g. by resyncing the stream (that's at least interesting
> for ALSA/OSS or by notifying the user, etc.)

Yes, right. If I implement check_io properly, one of the problems goes
away.

> >Timeouts are a bad solution because they either occur too soon (in which
> >case the engine thread may wake up, but can't do the computation), or
> >too late, because there is always jitter. In any case you loose precious
> >time or waste precious CPU cycles or both.
>
> note that the BSE audio driver model *and* the BSE engine model are based
> on timeouts. that's what check_io() provides and what the engine needs.
> you won't get any other programming model with BSE, regardless of whether
> you like it or not ;)

Well, pipes are another programming model. I don't think we'll need
timeouts when we can use pipes. But right now I can only make
theoretical claims about it: I will have to work on the code to see
whether the code can be written in a way so that check_io will rely on
the notifications of pipes (and the lockless ringbuffers) only. Then it
will be able to return an arbitary timeout, but the pipe will still
ensure that everything goes well.

   Cu... Stefan
--
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast
Reply | Threaded
Open this post in threaded view
|

Re: bsepcmdevice-jack.cc (Re: Beast Jack driver)

Tim Janik
On Tue, 6 Jun 2006, Stefan Westerfeld wrote:

> On Mon, Jun 05, 2006 at 04:31:23AM +0200, Tim Janik wrote:

>> ok. note that beast doesn't 100% perfectly prevent priority inversion in
>> other scenarios either. this issue is slowly being worked on (the idea
>> is to move all communication to atomic ops guarded by BirnetGuard, but
>> that's an entire new thread ;)
>
> However, these are two kinds of separate issues:
> (1) priority inversion occuring within the jack realtime thread
> (2) priority inversion occuring elsewhere in beast
>
> When (1) is being triggered, the jackd server can get dropouts while
> writing audio, and might decide that we are the client that is badly
> implemented. So it might shutdown the connection we have, leaving the
> other clients running.
>
> When (2) is being triggered, we miss the opportunity to refill the
> ringbuffer in time, so there will be a click in the output, but at least
> the audio processing still continues; but we also may have still enough
> data buffered in the rinbuffer, and nothing will happen; that will also
> depend on how the user configured the beast latency.
>
> So I am trying harder to avoid (1) than (2).

these consideraitons shouldn't be made without taking into account the
probability for priority inversion scenrios which is very close to 0
in practice. so this is an academic concern for the most part.

> If beast would not be
> likely to produce more underruns in (2) than in (1), we might as well
> run the beast processing within the jack RT thread. But we don't and
> thats good as it is.
>
> I still agree with your concluding remarks, though, which are:

great!

>> until then, the jack RT thread should use a pipe write() to notify the
>> engine and sequencer threads, which both create their own pipe fds that
>> they also poll() on.
>
> After thinking some more about it, and also reading Paul's oppinion:
>
>  http://permalink.gmane.org/gmane.linux.audio.devel/11725
>
> I think that pipe write() is probably ok for a solution running under
> linux. Its simply that linux is no realtime operating system, and so
> whatever we do in our case, we can get no guarantee that it will work in
> 100% of the cases.

with the recent low latency in linux kernels, it's starting to become a
very good pseudo realtime operating system:
   http://lac.zkm.de/2006/papers/lac2006_lee_revell.pdf
   (Realtime Audio vs. Linux 2.6  paper by  Lee Revell)

> So if this was an realtime airplane control software,
> we should probably use some other operating systen.

exactly! ;)
i'm so glad you recognize we're _not_ coding realtime airplane control
software here which runs redundantly in 4 self monitoring processors ;)

> However, for an audio software its probably enough if it works reliable
> enough for daily music production, and it pipe writes should deliver
> this degree of reliability.

yes, i'd hope so. if not it's time for filing a kernel bug.

[...]

>   Cu... Stefan

---
ciaoTJ
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast
Reply | Threaded
Open this post in threaded view
|

Re: bsepcmdevice-jack.cc (Re: Beast Jack driver)

Tim Janik
In reply to this post by Stefan Westerfeld

i'm taking Paul off the Cc: list, since we're probably just pestering
with most stuff. Paul, FYI this is also all available through gmane:
   http://thread.gmane.org/gmane.comp.gnome.beast/171/focus=178
or of course our list archives:
   http://mail.gnome.org/archives/beast/

On Tue, 6 Jun 2006, Stefan Westerfeld wrote:

> On Mon, Jun 05, 2006 at 04:31:23AM +0200, Tim Janik wrote:

>>>>> DEBUG ("attaching to JACK server returned status: %d\n", status);
>>>>> /* FIXME: check status for proper error reporting
>>>>
>>>> doesn't teh jack API have something like:
>>>>  const char* jack_status_to_string (jack_status_t status);
>>>> ?
>>>
>>> No.
>>
>> fine, then we need to roll our own function that does this.
>> for existing example code that stringifies foreign errors,
>> you can take a look at gslvorbis-cutter.c:ov_error_blurb().
>>
>> it might be agood idea to submit this to jack once done btw.
>
> Its not that writing our own function will help us a lot though. Its not
> that every function returns a jack_status_t. Just the open function.

it will help to improve the debugging output. we've done that for all
other drivers, we should also do it here. and it prolly wouldn't hurt
JACK to export status stringification.

> The usual way for getting stringified errors in other situations is to
> set the error callback with jack_set_error_function.

dunno how that's applicable here...

>   Cu... Stefan

---
ciaoTJ
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast
Reply | Threaded
Open this post in threaded view
|

Re: bsepcmdevice-jack.cc (Re: Beast Jack driver)

Tim Janik
In reply to this post by Stefan Westerfeld
On Tue, 6 Jun 2006, Stefan Westerfeld wrote:

> On Mon, Jun 05, 2006 at 04:31:23AM +0200, Tim Janik wrote:

>>>>>    free (jack_ports);
>>>>>  }
>>>>>
>>>>> return devices;
>>>>
>>>> ugh, isn't this a full fledged recursive map+strings+DeviceDetails copy?
>>>
>>> Yes. So? Its not in any time critical code, so I don't bother to
>>> optimize it.
>>
>> well, it's not exactly "hard to optimize" here. just return a pointer.
>
> I am trying to avoid manual new/delete in all code I write, because its
> a potential source of errors. So returning a pointer is not so good. It
> also makes map accesses somewhat strange to read, i.e.
>
>  (*devices)["foo"] = bar;
>
> instead of
>
>  devices["foo"] = bar;
>
> Under some circumstances, the C++ compiler will also construct objects
> that are returned from a function within the memory of the caller, so
> that no copy will done at all. But I don't know if the device map case
> is one of these cases, or not.

well, it'd be good to find out about that then ;)
i.e. does std::map do the kind of internal data refcounting that
std::string does?

> So if you insist that it must be "optimized", I would suggest changing
> the API to
>
> static void
> query_jack_devices (BsePcmDeviceJack           *self,
>                    map<string, DeviceDetails> &devices)
>
> then I can still keep my automatic deletion feature and nice to read
> map access code.

sure. it'd still be interesting to know how expensive map copies are though ;)

>>>> you're not setting error in case you failed to find
>>>> the required devices to connect the channels.
>>>
>>> I don't want to. If you for instance start beast with
>>>
>>> -pjack=jamin
>>>
>>> to use jamin for mastering, and later the user chooses to terminate
>>> jamin, then we can still run. Its just that then the ports will be
>>> initially unconnected, which is not a problem, because it can be easily
>>> fixed in qjackctl.
>>
>> sorry, would you care to enlighten me here please?
>> what's -pjack=jamin supposed to mean?
>> should the beast output ports be auto-connected to input ports named
>> "jamin"?
> Yes.
>
>> and, if input ports named "jamin" are not present, that'll silently be
>> ignored?
> Yes.
>
>> will the jack callback thread still properly run and request BSE engine data
>> if the beast output port is unconnected?
> Yes.
>
>> if that's the case, i think at least the behaviour of silently ignoring
>> unsuccessfull connections has to be changed. we also throw proper errors
>> in other drivers if unconectable devices are being adressed.
>> (if unsuccessful connections really are common usage with jack, we can
>> still add device flags, like we do with OSS, so you could say
>> -pjack=jamin,auto for automated connections)
>
> I don't know from your example what you envision the syntax to be like.
> There are four good cases as I see the situation.
>
> (1) request beast to autoconnect, and request beast to find a suitable
>    device to autoconnect to without asking
>
> -> this is what -pjack does right now, and I think it should be kept
>   this way

you mean, the shorthand -p jack is equivalent to -p jack=alsa_pcm ?
i'd say that makes sense.

> (2) request beast to connect to a specific client if possible, but
>    continue if that doesn't work (we could have some [X] report errors
>    on jack connection problems message, then the user could disable
>    the error, or we could make this configurable)
>
> -> this is what -pjack=jamin does right now

i don't think continuation is a good idea here, as i already outlined.
we don't accept failing destinations in the other drivers either.
in general, we don't by default silently ignore warnings. and shouldn't
do that here either. about making the warning/error dialog optional,
that is always a good idea anyway. the point is about having the warning/
error in the first place though, and yes, that should be the case as it
is with other drivers.
however, you said it would make sense in some scenrios to let -p jack=port
effectively behave like -p jack=unconnected in case "port" can't be found.
that's what i suggested the "auto" flag for. i.e. -p jack=port,auto could
either succeed in connecting to "port" or behave like -p jack=unconnected
if "port" couldn#t be connected to.

> (3) request beast to connect to a specific client, and let it fail and
>    report an error if it doesn't work

that's exactly like (2), since reporting an error/warning will be a dialog
with a [x] deduping option.

> (4) request beast to start without connecting a client
>
> -> this is what -pjack=unconnected does right now, as long as there is
>   no jack client called "unconnected", in which case this behaves like
>   (2)

is "unconnected" some kind of 'standard' in the jack world?
what do other programs use as identifier for this mode?
i'm asking because i think "unconnected" is maybe a bit more tedious
than just "open". what do you think?

> What I wasn't too sure about when designing the syntax is whether the
> position of the argument should play a role.

definitely, take a peek at beast --bse-driver-list. note that i've not
seen anything about your "syntax" yet though, you just wrote "jack=PORT"
in the docs. in any case, you should basically pick up what the OSS driver
does, for jack that'd be (mockup):

   jack jack=PORT,MODE                 // or s/MODE/FLAGS/ if you prefer
     JACK sound daemon PCM driver:
       PORT - Jack port name (use "open" for unconnected ports)
       MODE - may contain "rw" or "wo" for read-write or write-only
              access; adding "hs" forces hard sync on underruns.
              Adding "auto" allows failing port connects and behaves
              as if "open" was specified as PORT.
     Devices:
    >        alsa_pcm (read-write)


> Suppose we also decided
> some day to add ro (readonly) and wo (writeonly) as options. Should then
>
> -p jack=auto,ro,alsa_pcm
>
> be the same like
>
> -p jack=alsa_pcm,ro,auto

nope, like OSS, you're supposed to use jack=RHS with:
   RHS          := DEVICEorPORT [ ',' FLAG ]*
   DEVICEorPORT := 'alsa_pcm' | 'open'
   FLAG         := 'rw' | 'wo' | 'auto' | 'hs'

> This rules out the possibility of connecting clients called "auto" or
> "ro", which is maybe a bad decision. So maybe we should start with
> something like
>
> -p jack=connect:alsa_pcm
> -p jack=connect:alsa_pcm,wo
>
> or so? I am really not sure what would be a good and clean syntax here,
> that handles all cases that are needed.

i don't think we have to do that just yet without any use cases.
using the port name as first arg is good and easy enough and allowes
for arbitrary (albeit not comma-comprised) port names.


>>> It could even happen that we are not connected initially (latency = 0),
>>> and only get connected afterwards by a patchbay, where the user is
>>> loading a session. Thus in such a case we would always misbehave.
>>
>> what's that about being non-connected.
>> does jack still request audio from beast when unconnected?
>
> Yes.

ok, that's where you'd then usually want to use -p jack=PATCHBAY-INPUT,auto
if i understand correctly. right?

>> and, you said jack clients would always auto connect to alsa_pcm
>> and that alsa_pcm should always be present. so why would we start
>> up disconnected?
>
> Thats not what I wanted to say. I just pointed out that lots of people
> hardcode alsa_pcm somewhere in their code. However, whether a jack
> application should startup connected or unconnected, or how defaulting
> or auto connecting should work for "normal" jack applications is an open
> issue as far as I understand from reading the mailing list. There have
> been some threads suggesting this or that (like api extensions, extra
> default flags on ports and the like), and some people have been pointing
> out that they like this or that behaviour better.
>
> So I think until people have figured out what they want, I'd like to see
> beast to support both: connect on open and open unconnected (so that
> some external program can do the work).

ok, i see.
well, ... and default to "alsa_pcm" i presume?

>>>>> guint user_buffer_frames = BSE_PCM_DEVICE (device)->req_latency_ms *
>>>>> handle->mix_freq / 1000;
>>>>> guint buffer_frames = max (min_buffer_frames, user_buffer_frames);
>>>>>
>>>>> jack->input_ringbuffer.resize (buffer_frames, handle->n_channels);
>>>>> jack->output_ringbuffer.resize (buffer_frames, handle->n_channels);
>>>>
>>>> the way this is written, you're doubling the latency because you apply
>>>> it to the input and the output buffer. you should just apply it to the
>>>> output buffer (and also subtract the input buffer latency before doing
>>>> that)
>>>
>>> The input buffer should not produce latency. Ideally, the filled frames
>>> of the input buffer and the filled frames of the output buffer should
>>> add up to exactly buffer_frames. Note that the input_buffer is *empty*
>>> initially while the output_buffer is *full* initially, and the jack
>>> callback methodically only adds as many frames to the input_buffer as it
>>> takes from the output_buffer, whereas the engine process cycle does the
>>> inverse.
>>
>> that means input latency increases with every output underrun, because
>> in that case you're faking extra output values. basically, in duplex
>> scenrios under latency constraints you have to properly resync input
>> and output and can't just pad output values.
>
> I don't think it can happen. After an output underrun, the
> jack_device_retrigger code clears both ringbuffers (input and output),
> and then refills the output buffer with zeros.

well, note that for the sake of making resyncronization/retriggering
less nasty when you encounter underruns, we might want to do less
than fully retriggerring the device.
e.g. we could "soft-sync" by writing an extra 0-buffer into jack and
by throwing away a single input buffer (to keep the input latency).
doing that will need the jack thread to write-access the read-pointer
of the ring buffer though, which you code currently doesn't account for.
or, it might also be possible to have an atomic throw-away-input-blocks
counter that the BSE driver trhead could decrement and ignore incoming
buffers.... that's a bit hackish in the presence of a limited ring buffer
size though, so adjusting the read-pointer from the jack thread would be
much better.
reducing underrun impact by using soft-sync vs. hard-sync is also
done by the OSS driver ("hs" flag).

>>> Timeouts are a bad solution because they either occur too soon (in which
>>> case the engine thread may wake up, but can't do the computation), or
>>> too late, because there is always jitter. In any case you loose precious
>>> time or waste precious CPU cycles or both.
>>
>> note that the BSE audio driver model *and* the BSE engine model are based
>> on timeouts. that's what check_io() provides and what the engine needs.
>> you won't get any other programming model with BSE, regardless of whether
>> you like it or not ;)
>
> Well, pipes are another programming model. I don't think we'll need
> timeouts when we can use pipes.

it's not a question of "need". we *have* timeout behaviour already.

> But right now I can only make
> theoretical claims about it: I will have to work on the code to see
> whether the code can be written in a way so that check_io will rely on
> the notifications of pipes (and the lockless ringbuffers) only.

fine. if we switched to pipe based notification, we should do that for
*all* drivers. since the changes related to this are not going to be
trivial, i really want to seperate this from the general jack driver
development. so please lets concentrate on finishing up bse-jack with
the current timing model and after that look into timing alternatives.

> Then it
> will be able to return an arbitary timeout, but the pipe will still
> ensure that everything goes well.
>
>   Cu... Stefan

---
ciaoTJ
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast
Reply | Threaded
Open this post in threaded view
|

Re: bsepcmdevice-jack.cc (Re: Beast Jack driver)

Esben Stien
In reply to this post by Tim Janik
Tim Janik <[hidden email]> writes:

> you said jack clients would always auto connect to alsa_pcm

This is wrong. The general consensus is that JACK applications should
never auto connect to anything. Connection management is handled
externally. There is talk of a new addition to the API, where the user
may set a flag to indicate if the application should auto connect to a
specific set of ports, but it's not there yet.

--
Esben Stien is b0ef@e     s      a            
         http://www. s     t    n m
          irc://irc.  b  -  i  .   e/%23contact
           sip:b0ef@   e     e
           jid:b0ef@    n     n
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast
Reply | Threaded
Open this post in threaded view
|

Re: bsepcmdevice-jack.cc (Re: Beast Jack driver)

Tim Janik
On Wed, 7 Jun 2006, Esben Stien wrote:

> Tim Janik <[hidden email]> writes:
>
>> you said jack clients would always auto connect to alsa_pcm
>
> This is wrong. The general consensus is that JACK applications should
> never auto connect to anything. Connection management is handled
> externally. There is talk of a new addition to the API, where the user
> may set a flag to indicate if the application should auto connect to a
> specific set of ports, but it's not there yet.

a flag for auto-connect settable by the user?
we could certainly add that to our preferences.

> --
> Esben Stien is b0ef@e     s      a
>         http://www. s     t    n m
>          irc://irc.  b  -  i  .   e/%23contact
>           sip:b0ef@   e     e
>           jid:b0ef@    n     n

---
ciaoTJ
_______________________________________________
beast mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/beast