For Linux kernel 2.6.30
Oprofile Buffer
- One cpu_buffer for each CPU
- NMI or IRQ handlers add records to the cpu_buffer
- A timer-based per-CPU work queue removes records from the cpu_buffer
- One reader and one writer, so mutual exclusion and synchronization are easy
- Fixed size; may overflow if written too fast or not drained quickly enough
- One event_buffer
- The timer-based per-CPU work queue moves records from each cpu_buffer into the event_buffer
- A user space program removes records from the event_buffer; it is woken up when a threshold is reached.
- One reader and multiple writers; a mutex is used on the write side, because the writers run in process context (work queue).
- Fixed size; may overflow if written too fast or not read out quickly enough
- Synchronization between cpu_buffer, event_buffer and the user space program
- Used for profiling, so the record flow bandwidth is predictable
- Maximum bandwidth for a cpu_buffer: number of CPUs * (cpu_buffer size / timer interval)
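The one-reader/one-writer discipline of each cpu_buffer is what makes mutual exclusion easy: with exactly one producer and one consumer, head and tail indices need no lock. A minimal user-space sketch of such a ring (names like `spsc_ring` are invented for illustration, not oprofile's actual code; the fixed size means a full buffer simply drops the record, matching the overflow behavior noted above):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8  /* fixed size: writes fail (overflow) when full */

/* Hypothetical one-reader/one-writer ring like cpu_buffer.
 * With a single producer and single consumer, no lock is needed:
 * the writer only advances head, the reader only advances tail. */
struct spsc_ring {
    unsigned long buf[RING_SIZE];
    atomic_size_t head;  /* next slot to write */
    atomic_size_t tail;  /* next slot to read */
};

static bool ring_put(struct spsc_ring *r, unsigned long sample)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false;  /* buffer full: the record is dropped */
    r->buf[head % RING_SIZE] = sample;
    /* release: publish the data before advancing head */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

static bool ring_get(struct spsc_ring *r, unsigned long *sample)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;  /* empty */
    *sample = r->buf[tail % RING_SIZE];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```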
Unified trace ring buffer
- One ring_buffer_per_cpu for each CPU
- In fact there are two levels of records: the first level is struct buffer_page, the second level is struct ring_buffer_event.
- Preempt disabling is used to provide mutual exclusion between process contexts (writers on the same CPU).
- Inside one struct buffer_page, an atomic operation (local_add_return) provides mutual exclusion between struct ring_buffer_event writers. This works because process context can only be interrupted by IRQ and NMI, and cannot continue until the IRQ or NMI finishes. The length of struct buffer_page is known, so an overshoot can be detected (local_add_return may return a value exceeding the buffer_page length).
- On the writer side, "write" indicates allocated (reserved) records, while "commit" indicates completed records.
- A spinlock (ring_buffer_per_cpu->lock) together with IRQ disabling provides mutual exclusion for struct buffer_page between reader and writer. In NMI context, if spin_trylock fails, the record is discarded; this is acceptable for tracing.
- At least 3 pages are needed for each CPU: 2 buffer_pages plus 1 reader page.
- ring_buffer_per_cpu->reader_lock provides mutual exclusion between multiple readers, while ring_buffer_per_cpu->lock provides mutual exclusion between reader and writer.
- Fixed size; may overflow if written too fast or not consumed quickly enough
- An iterator (iter) can be used on the reader side
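The page-local "write"/"commit" scheme above can be sketched with a fetch-and-add: each writer atomically reserves space in the current page, and the returned value may overshoot the page length, in which case the reservation is backed out (the kernel instead moves on to the next buffer_page). A hypothetical user-space sketch using C11 atomics in place of the kernel's local_add_return; the names and the `PAGE_DATA_SIZE` constant are illustrative:

```c
#include <stdatomic.h>
#include <stddef.h>

#define PAGE_DATA_SIZE 4096  /* usable bytes per buffer_page (illustrative) */

/* Sketch of per-page event reservation, as for struct ring_buffer_event.
 * "write" counts reserved bytes, "commit" counts completed bytes. */
struct buffer_page_sketch {
    atomic_size_t write;   /* bytes reserved so far ("write") */
    atomic_size_t commit;  /* bytes completed so far ("commit") */
    unsigned char data[PAGE_DATA_SIZE];
};

/* Reserve len bytes; returns the event's offset, or -1 if the page is full.
 * The fetch_add may exceed PAGE_DATA_SIZE, just as local_add_return may
 * exceed the buffer_page length; the overshoot is detected afterwards.
 * Safe against strictly nested interruption (IRQ/NMI), since each context
 * backs out only its own reservation. */
static long event_reserve(struct buffer_page_sketch *p, size_t len)
{
    size_t end = atomic_fetch_add(&p->write, len) + len;
    if (end > PAGE_DATA_SIZE) {
        atomic_fetch_sub(&p->write, len);  /* back out the reservation */
        return -1;
    }
    return (long)(end - len);  /* offset where the event's data begins */
}

static void event_commit(struct buffer_page_sketch *p, size_t len)
{
    atomic_fetch_add(&p->commit, len);  /* event now visible to readers */
}
```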
Mcelog buffer
- One global buffer with fixed record length and fixed total size; may overflow.
- The writer side is lock-less, implemented using cmpxchg() plus a "finished" flag. Records are added in NMI/timer context.
- Memory ordering must be considered explicitly because the writer path is lock-less
- The reader side is protected by a mutex, because normally there is only one reader. Records are removed in process context.
- Multiple writers, one reader; the throughput bottleneck lies on the reader side.
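The cmpxchg-plus-finished-flag pattern can be sketched as follows: a compare-and-swap loop claims a slot index, the record is filled in, and then a "finished" flag is published with release ordering so a reader never observes a half-written record. This is an illustrative user-space sketch, not the kernel's mcelog code; the struct and function names are invented:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MCE_LOG_LEN 32  /* fixed number of fixed-length records */

struct mce_record {
    unsigned long status;
    atomic_int finished;  /* 0 = being written, 1 = complete */
};

struct mce_log_sketch {
    atomic_int next;                     /* next free slot */
    struct mce_record entry[MCE_LOG_LEN];
};

/* Lock-less writer, usable from NMI context: no locks, only a bounded
 * cmpxchg retry loop to claim a slot against concurrent writers. */
static bool mce_log_add(struct mce_log_sketch *log, unsigned long status)
{
    int slot;

    for (;;) {
        slot = atomic_load_explicit(&log->next, memory_order_relaxed);
        if (slot >= MCE_LOG_LEN)
            return false;  /* buffer full: overflow, record is lost */
        /* cmpxchg: claim the slot; retry if another writer raced us */
        if (atomic_compare_exchange_weak(&log->next, &slot, slot + 1))
            break;
    }
    log->entry[slot].status = status;
    /* Release ordering: the "finished" store must not be reordered before
     * the record contents. Explicit memory ordering is needed precisely
     * because this path takes no lock. A reader checks "finished" with
     * acquire ordering before trusting the record. */
    atomic_store_explicit(&log->entry[slot].finished, 1,
                          memory_order_release);
    return true;
}
```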