Summary

The current design of the Read trait is nonoptimal as it requires that the buffer passed to its various methods be pre-initialized even though the contents will be immediately overwritten. This RFC proposes an interface to allow implementors and consumers of Read types to robustly and soundly work with uninitialized buffers.

Motivation

Background

The core of the Read trait looks like this:

    pub trait Read {
        /// Reads data into `buf`, returning the number of bytes written.
        fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
    }

Code working with a reader needs to create the buffer that will be passed to read; the simple approach is something like this:

    let mut buf = [0; 1024];
    let nread = reader.read(&mut buf)?;
    process_data(&buf[..nread]);

However, that approach isn't ideal since the work spent to zero the buffer is wasted. The reader should be overwriting the part of the buffer we're working with, after all. Ideally, we wouldn't have to perform any initialization at all:

    let mut buf: [u8; 1024] = unsafe { MaybeUninit::uninit().assume_init() };
    let nread = reader.read(&mut buf)?;
    process_data(&buf[..nread]);

However, whether it is allowed to call assume_init() on an array of uninitialized integers is still a subject of discussion. And either way, this is definitely unsound when working with an arbitrary reader. The Read trait is not unsafe, so the soundness of code working with an implementation can't depend on the "reasonableness" of that implementation. The implementation could read from the buffer, or return the wrong number of bytes read:

    struct BrokenReader;

    impl Read for BrokenReader {
        fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
            Ok(buf.len())
        }
    }

    struct BrokenReader2;

    impl Read for BrokenReader2 {
        fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
            if buf[0] == 0 {
                buf[0] = 1;
            } else {
                buf[0] = 2;
            }

            Ok(1)
        }
    }

In either case, the process_data call above would be working with uninitialized memory. Uninitialized memory is a dangerous (and often misunderstood) beast. Uninitialized memory does not have an arbitrary value; it actually has an undefined value. Undefined values can very quickly turn into undefined behavior. Check out Ralf's blog post for a more extensive discussion of uninitialized memory.

But how bad are undefined values really?

Are undefined values really that bad in practice? Consider a function that tries to use an uninitialized buffer with a reader:

    fn unsound_read_u32_be<R>(r: &mut R) -> io::Result<u32>
    where
        R: Read,
    {
        let mut buf: [u8; 4] = unsafe { MaybeUninit::uninit().assume_init() };
        r.read_exact(&mut buf)?;
        Ok(u32::from_be_bytes(buf))
    }

Now consider this function that tries to use unsound_read_u32_be:

    pub fn blammo() -> NonZeroU32 {
        let n = unsound_read_u32_be(&mut BrokenReader).unwrap();
        NonZeroU32::new(n).unwrap_or(NonZeroU32::new(1).unwrap())
    }

It should clearly only be able to return a nonzero value, but if we compile it using rustc 1.42.0 for the x86_64-unknown-linux-gnu target, the function compiles down to this:

    example::blammo:
            ret

That means that it will return whatever arbitrary number happened to be in the %rax register. That could very well happen to be 0, which violates the invariant of NonZeroU32, and any upstream callers of blammo will have a bad time. Because the value that unsound_read_u32_be returned was undefined, the compiler completely removed the check for 0!

We want to be able to take advantage of the improved performance of avoiding buffer initialization without triggering undefined behavior in safe code.
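
For contrast, here is a minimal sketch of a sound version of the two functions above. It pays exactly the zeroing cost this RFC wants to avoid. The names `read_u32_be` and `safe_blammo` are illustrative and are not part of the RFC; `BrokenReader` is the misbehaving reader shown earlier. Because the buffer has defined (zero) contents, the compiler cannot drop the zero check, and the fallback value is returned.

```rust
use std::io::{self, Read};
use std::num::NonZeroU32;

/// The misbehaving reader from above: it claims to have filled the buffer
/// without writing anything to it.
struct BrokenReader;

impl Read for BrokenReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        Ok(buf.len())
    }
}

/// Sound counterpart of `unsound_read_u32_be` (hypothetical name): the buffer
/// is zero-initialized before it is handed to the reader.
fn read_u32_be<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_be_bytes(buf))
}

/// Same shape as `blammo`, but built on the sound helper (hypothetical name).
fn safe_blammo() -> NonZeroU32 {
    let n = read_u32_be(&mut BrokenReader).unwrap();
    NonZeroU32::new(n).unwrap_or(NonZeroU32::new(1).unwrap())
}
```

With `BrokenReader`, the zeroed buffer is decoded as 0, `NonZeroU32::new(0)` yields `None`, and `safe_blammo` returns 1 as intended.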

Why not just initialize?

If working with uninitialized buffers carries these risks, why should we bother with it at all? Code dealing with IO in both the standard library and the ecosystem today already works with uninitialized buffers because there are concrete, nontrivial performance improvements from doing so.

Given that the ecosystem has already found that uninitialized buffer use is important enough to deal with, the standard library should provide a more robust framework to work with.

In addition, working with regular initialized buffers can be more complex than working with uninitialized buffers! Back in 2015, the standard library's implementation of Read::read_to_end was found to be wildly inefficient due to insufficiently careful management of buffer sizes because it was initializing them. The fix improved the performance of small reads by over 4,000x! If the buffer did not need to be initialized, the simpler implementation would have been fine.

Guide-level explanation

The ReadBuf type manages a progressively initialized buffer of bytes. It is primarily used to avoid buffer initialization overhead when working with types implementing the Read trait. It wraps a buffer of possibly-uninitialized bytes and tracks how much of the buffer has been initialized and how much of the buffer has been filled. Tracking the set of initialized bytes allows initialization costs to only be paid once, even if the buffer is used repeatedly in a loop.

Here's a small example of working with a reader using a ReadBuf:

    // The base level buffer uses the `MaybeUninit` type to avoid having to initialize the whole 8kb of memory up-front.
    let mut buf = [MaybeUninit::<u8>::uninit(); 8192];
    // We then wrap that in a `ReadBuf` to track the state of the buffer.
    let mut buf = ReadBuf::uninit(&mut buf);

    loop {
        // Read some data into the buffer.
        some_reader.read_buf(&mut buf)?;

        // If nothing was written into the buffer, we're at EOF.
        if buf.filled().is_empty() {
            break;
        }

        // Otherwise, process the data.
        process_data(buf.filled());

        // And then clear the buffer out so we can read into it again. This just resets the amount of filled data to 0,
        // but preserves the memory of how much of the buffer has been initialized.
        buf.clear();
    }

It is important that we created the ReadBuf outside of the loop. If we instead created it in each loop iteration, we would fail to preserve the knowledge of how much of it has been initialized.

When implementing Read, the author can choose between an entirely safe interface that exposes an initialized buffer, or an unsafe interface that allows the code to work directly with the uninitialized buffer for higher performance.

A safe Read implementation:

    impl Read for MyReader {
        fn read_buf(&mut self, buf: &mut ReadBuf<'_>) -> io::Result<()> {
            // Get access to the unwritten part of the buffer, making sure it has been fully initialized. Since `ReadBuf`
            // tracks the initialization state of the buffer, this is "free" after the first time it's called.
            let unfilled: &mut [u8] = buf.initialize_unfilled();

            // Fill the whole buffer with some nonsense.
            for (i, byte) in unfilled.iter_mut().enumerate() {
                *byte = i as u8;
            }

            // And indicate that we've written the whole thing.
            let len = unfilled.len();
            buf.add_filled(len);

            Ok(())
        }
    }

An unsafe Read implementation:

    impl Read for TcpStream {
        fn read_buf(&mut self, buf: &mut ReadBuf<'_>) -> io::Result<()> {
            unsafe {
                // Get access to the unfilled part of the buffer, without initializing it. This method is unsafe; we are
                // responsible for ensuring that we don't "de-initialize" portions of it that have previously been
                // initialized.
                let unfilled: &mut [MaybeUninit<u8>] = buf.unfilled_mut();

                // We're just delegating to the libc read function, which returns an `isize`. The return value indicates
                // an error if negative and the number of bytes read otherwise.
                let nread = libc::read(self.fd, unfilled.as_mut_ptr().cast::<libc::c_void>(), unfilled.len());

                if nread < 0 {
                    return Err(io::Error::last_os_error());
                }

                let nread = nread as usize;
                // If the read succeeded, tell the buffer that the read-to portion has been initialized. This method is
                // unsafe; we are responsible for ensuring that this portion of the buffer has actually been initialized.
                buf.assume_init(nread);
                // And indicate that we've written the bytes as well. Unlike `assume_init`, this method is safe,
                // and asserts that the written portion of the buffer does not advance beyond the initialized portion of
                // the buffer. If we didn't call `assume_init` above, this call could panic.
                buf.add_filled(nread);

                Ok(())
            }
        }
    }

Reference-level explanation

    /// A wrapper around a byte buffer that is incrementally filled and initialized.
    ///
    /// This type is a sort of "double cursor". It tracks three regions in the buffer: a region at the beginning of the
    /// buffer that has been logically filled with data, a region that has been initialized at some point but not yet
    /// logically filled, and a region at the end that is fully uninitialized. The filled region is guaranteed to be a
    /// subset of the initialized region.
    ///
    /// In summary, the contents of the buffer can be visualized as:
    /// ```not_rust
    /// [             capacity              ]
    /// [ filled |         unfilled         ]
    /// [    initialized    | uninitialized ]
    /// ```
    pub struct ReadBuf<'a> {
        buf: &'a mut [MaybeUninit<u8>],
        filled: usize,
        initialized: usize,
    }

    impl<'a> ReadBuf<'a> {
        /// Creates a new `ReadBuf` from a fully initialized buffer.
        #[inline]
        pub fn new(buf: &'a mut [u8]) -> ReadBuf<'a> { ... }

        /// Creates a new `ReadBuf` from a fully uninitialized buffer.
        ///
        /// Use `assume_init` if part of the buffer is known to be already initialized.
        #[inline]
        pub fn uninit(buf: &'a mut [MaybeUninit<u8>]) -> ReadBuf<'a> { ... }

        /// Returns the total capacity of the buffer.
        #[inline]
        pub fn capacity(&self) -> usize { ... }

        /// Returns a shared reference to the filled portion of the buffer.
        #[inline]
        pub fn filled(&self) -> &[u8] { ... }

        /// Returns a mutable reference to the filled portion of the buffer.
        #[inline]
        pub fn filled_mut(&mut self) -> &mut [u8] { ... }

        /// Returns a shared reference to the initialized portion of the buffer.
        ///
        /// This includes the filled portion.
        #[inline]
        pub fn initialized(&self) -> &[u8] { ... }

        /// Returns a mutable reference to the initialized portion of the buffer.
        ///
        /// This includes the filled portion.
        #[inline]
        pub fn initialized_mut(&mut self) -> &mut [u8] { ... }

        /// Returns a mutable reference to the unfilled part of the buffer without ensuring that it has been fully
        /// initialized.
        ///
        /// # Safety
        ///
        /// The caller must not de-initialize portions of the buffer that have already been initialized.
        #[inline]
        pub unsafe fn unfilled_mut(&mut self) -> &mut [MaybeUninit<u8>] { ... }

        /// Returns a mutable reference to the unfilled part of the buffer, ensuring it is fully initialized.
        ///
        /// Since `ReadBuf` tracks the region of the buffer that has been initialized, this is effectively "free" after
        /// the first use.
        #[inline]
        pub fn initialize_unfilled(&mut self) -> &mut [u8] { ... }

        /// Returns a mutable reference to the first `n` bytes of the unfilled part of the buffer, ensuring it is
        /// fully initialized.
        ///
        /// # Panics
        ///
        /// Panics if `self.remaining()` is less than `n`.
        #[inline]
        pub fn initialize_unfilled_to(&mut self, n: usize) -> &mut [u8] { ... }

        /// Returns the number of bytes at the end of the slice that have not yet been filled.
        #[inline]
        pub fn remaining(&self) -> usize { ... }

        /// Clears the buffer, resetting the filled region to empty.
        ///
        /// The number of initialized bytes is not changed, and the contents of the buffer are not modified.
        #[inline]
        pub fn clear(&mut self) { ... }

        /// Increases the size of the filled region of the buffer.
        ///
        /// The number of initialized bytes is not changed.
        ///
        /// # Panics
        ///
        /// Panics if the filled region of the buffer would become larger than the initialized region.
        #[inline]
        pub fn add_filled(&mut self, n: usize) { ... }

        /// Sets the size of the filled region of the buffer.
        ///
        /// The number of initialized bytes is not changed.
        ///
        /// Note that this can be used to *shrink* the filled region of the buffer in addition to growing it (for
        /// example, by a `Read` implementation that compresses data in-place).
        ///
        /// # Panics
        ///
        /// Panics if the filled region of the buffer would become larger than the initialized region.
        #[inline]
        pub fn set_filled(&mut self, n: usize) { ... }

        /// Asserts that the first `n` unfilled bytes of the buffer are initialized.
        ///
        /// `ReadBuf` assumes that bytes are never de-initialized, so this method does nothing when called with fewer
        /// bytes than are already known to be initialized.
        ///
        /// # Safety
        ///
        /// The caller must ensure that the first `n` unfilled bytes of the buffer have already been initialized.
        #[inline]
        pub unsafe fn assume_init(&mut self, n: usize) { ... }

        /// Appends data to the buffer, advancing the written position and possibly also the initialized position.
        ///
        /// # Panics
        ///
        /// Panics if `self.remaining()` is less than `buf.len()`.
        #[inline]
        pub fn append(&mut self, buf: &[u8]) { ... }
    }
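
To make the invariants concrete, the following is a minimal sketch of how a small subset of this API might be implemented. It is an illustration under assumptions, not the standard library's implementation: the method bodies are elided in the listing above, so the ones here are guesses that merely uphold `filled <= initialized <= capacity`, and zero is an arbitrary choice of fill value.

```rust
use std::mem::MaybeUninit;
use std::slice;

/// Sketch of the bookkeeping only. Invariant: `filled <= initialized <= buf.len()`.
pub struct ReadBuf<'a> {
    buf: &'a mut [MaybeUninit<u8>],
    filled: usize,
    initialized: usize,
}

impl<'a> ReadBuf<'a> {
    pub fn uninit(buf: &'a mut [MaybeUninit<u8>]) -> ReadBuf<'a> {
        ReadBuf { buf, filled: 0, initialized: 0 }
    }

    pub fn remaining(&self) -> usize {
        self.buf.len() - self.filled
    }

    pub fn filled(&self) -> &[u8] {
        let part = &self.buf[..self.filled];
        // SAFETY: `filled <= initialized`, so every byte in `part` is initialized,
        // and `MaybeUninit<u8>` has the same layout as `u8`.
        unsafe { slice::from_raw_parts(part.as_ptr().cast::<u8>(), part.len()) }
    }

    pub fn initialized(&self) -> &[u8] {
        let part = &self.buf[..self.initialized];
        // SAFETY: the first `initialized` bytes are initialized by the invariant.
        unsafe { slice::from_raw_parts(part.as_ptr().cast::<u8>(), part.len()) }
    }

    pub fn initialize_unfilled(&mut self) -> &mut [u8] {
        // Only the never-initialized tail is zeroed, so repeated calls do no extra work.
        for byte in &mut self.buf[self.initialized..] {
            *byte = MaybeUninit::new(0);
        }
        self.initialized = self.buf.len();
        let part = &mut self.buf[self.filled..];
        // SAFETY: the whole buffer is initialized at this point.
        unsafe { slice::from_raw_parts_mut(part.as_mut_ptr().cast::<u8>(), part.len()) }
    }

    pub fn add_filled(&mut self, n: usize) {
        let new_filled = self.filled.checked_add(n).expect("filled length overflowed");
        assert!(new_filled <= self.initialized, "filled region must not exceed initialized region");
        self.filled = new_filled;
    }

    pub fn clear(&mut self) {
        // Forget the data but keep the knowledge of what has been initialized.
        self.filled = 0;
    }
}
```

In this sketch, calling `clear` after a read leaves `initialized().len()` unchanged, which is what makes a later `initialize_unfilled` call cheap.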

The Read trait uses this type in some of its methods:

    pub trait Read {
        /// Pull some bytes from this source into the specified buffer.
        ///
        /// This is equivalent to the `read` method, except that it is passed a `ReadBuf` rather than `[u8]` to allow use
        /// with uninitialized buffers. The new data will be appended to any existing contents of `buf`.
        ///
        /// The default implementation delegates to `read`.
        fn read_buf(&mut self, buf: &mut ReadBuf<'_>) -> io::Result<()> {
            let n = self.read(buf.initialize_unfilled())?;
            buf.add_filled(n);
            Ok(())
        }

        ...
    }

The ReadBuf type wraps a buffer of maybe-initialized bytes and tracks how much of the buffer has already been initialized. This tracking is crucial because it avoids repeated initialization of already-initialized portions of the buffer. It additionally provides the guarantee that the initialized portion of the buffer is actually initialized! A subtle characteristic of MaybeUninit is that you can de-initialize values in addition to initializing them, and this API protects against that.

It additionally tracks the amount of data read into the buffer directly so that code working with Read implementations can be guaranteed that the region of the buffer that the reader claims was written to is at least initialized. Thinking back to the BrokenReader in the motivation section, the worst an implementation can now do (without writing unsound unsafe code) is to fail to actually write useful data into the buffer. Code using a BrokenReader may see bad data in the buffer, but the bad data at least has defined contents now!
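
The de-initialization hazard mentioned above can be seen with plain `MaybeUninit` on stable Rust. The following is a small sketch (the function name `reinitialize_slot` is made up for illustration): overwriting a slot with `MaybeUninit::uninit()` is safe code, yet it leaves the slot in a state where reading it would be undefined behavior. This is why `unfilled_mut` is an unsafe method.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: initializes a slot, de-initializes it, then
/// initializes it again, returning the values read while it was valid.
fn reinitialize_slot() -> (u8, u8) {
    let mut slot = MaybeUninit::<u8>::new(7);
    // SAFETY: `slot` was just initialized with 7.
    let before = unsafe { slot.assume_init() };

    // Safe code, but the slot is now de-initialized: calling `assume_init`
    // at this point would be undefined behavior.
    slot = MaybeUninit::uninit();

    // Writing a value makes the slot initialized again.
    slot.write(9);
    // SAFETY: `slot` was initialized by the `write` call above.
    let after = unsafe { slot.assume_init() };

    (before, after)
}
```

A `ReadBuf` never hands out `&mut [MaybeUninit<u8>]` through a safe method, so safe callers have no way to perform the middle step on bytes the buffer believes are initialized.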

Note that read is still a required method of the Read trait. It can be easily written to delegate to read_buf:

    impl Read for SomeReader {
        fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
            let mut buf = ReadBuf::new(buf);
            self.read_buf(&mut buf)?;
            Ok(buf.filled().len())
        }

        fn read_buf(&mut self, buf: &mut ReadBuf<'_>) -> io::Result<()> {
            ...
        }
    }

Some of Read's convenience methods will be modified to take advantage of read_buf, and some new convenience methods will be added:

    pub trait Read {
        /// Read the exact number of bytes required to fill `buf`.
        ///
        /// This is equivalent to the `read_exact` method, except that it is passed a `ReadBuf` rather than `[u8]` to
        /// allow use with uninitialized buffers.
        fn read_buf_exact(&mut self, buf: &mut ReadBuf<'_>) -> io::Result<()> {
            while buf.remaining() > 0 {
                let prev_filled = buf.filled().len();
                // `buf` is already a `&mut ReadBuf`, so it is passed through as-is.
                match self.read_buf(buf) {
                    Ok(()) => {}
                    Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                    Err(e) => return Err(e),
                }

                if buf.filled().len() == prev_filled {
                    return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "failed to fill buffer"));
                }
            }

            Ok(())
        }

        fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> {
            let initial_len = buf.len();

            let mut initialized = 0;
            loop {
                if buf.len() == buf.capacity() {
                    buf.reserve(32);
                }

                let mut read_buf = ReadBuf::uninit(buf.spare_capacity_mut());
                unsafe {
                    read_buf.assume_init(initialized);
                }

                match self.read_buf(&mut read_buf) {
                    Ok(()) => {}
                    Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                    Err(e) => return Err(e),
                }

                if read_buf.filled().is_empty() {
                    break;
                }

                initialized = read_buf.initialized().len() - read_buf.filled().len();
                let new_len = buf.len() + read_buf.filled().len();
                unsafe {
                    buf.set_len(new_len);
                }
            }

            Ok(buf.len() - initial_len)
        }
    }

    pub fn copy<R, W>(reader: &mut R, writer: &mut W) -> io::Result<u64>
    where
        R: Read,
        W: Write,
    {
        let mut buf = [MaybeUninit::uninit(); 4096];
        let mut buf = ReadBuf::uninit(&mut buf);
        let mut len = 0;

        loop {
            match reader.read_buf(&mut buf) {
                Ok(()) => {},
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            };

            if buf.filled().is_empty() {
                break;
            }

            len += buf.filled().len() as u64;
            writer.write_all(buf.filled())?;
            buf.clear();
        }

        Ok(len)
    }

The existing std::io::Initializer type and Read::initializer method will be removed.

Vectored reads use a similar API:

    /// A possibly-uninitialized version of `IoSliceMut`.
    ///
    /// It is guaranteed to have exactly the same layout and ABI as `IoSliceMut`.
    pub struct MaybeUninitIoSliceMut<'a> { ... }

    impl<'a> MaybeUninitIoSliceMut<'a> {
        /// Creates a new `MaybeUninitIoSliceMut` from a slice of maybe-uninitialized bytes.
        #[inline]
        pub fn new(buf: &'a mut [MaybeUninit<u8>]) -> MaybeUninitIoSliceMut<'a> { ... }
    }

    impl<'a> Deref for MaybeUninitIoSliceMut<'a> {
        type Target = [MaybeUninit<u8>];

        ...
    }

    impl<'a> DerefMut for MaybeUninitIoSliceMut<'a> { ... }

    /// A wrapper over a set of incrementally-initialized buffers.
    pub struct ReadBufs<'a> { ... }

    impl<'a> ReadBufs<'a> {
        /// Creates a new `ReadBufs` from a set of fully initialized buffers.
        #[inline]
        pub fn new(bufs: &'a mut [IoSliceMut<'a>]) -> ReadBufs<'a> { ... }

        /// Creates a new `ReadBufs` from a set of fully uninitialized buffers.
        ///
        /// Use `assume_init` if part of the buffers are known to be already initialized.
        #[inline]
        pub fn uninit(bufs: &'a mut [MaybeUninitIoSliceMut<'a>]) -> ReadBufs<'a> { ... }

        ...
    }

    pub trait Read {
        /// Pull some bytes from this source into the specified set of buffers.
        ///
        /// This is equivalent to the `read_vectored` method, except that it is passed a `ReadBufs` rather than
        /// `[IoSliceMut]` to allow use with uninitialized buffers. The new data will be appended to any existing contents
        /// of `bufs`.
        ///
        /// The default implementation delegates to `read_vectored`.
        fn read_buf_vectored(&mut self, bufs: &mut ReadBufs<'_>) -> io::Result<()> {
            ...
        }
    }

Drawbacks

This introduces a nontrivial amount of complexity to one of the standard library's core traits, and results in sets of almost-but-not-quite identical methods (read/read_buf, read_exact/read_buf_exact, etc). It's unfortunate that an implementor of Read based on read_buf needs to add a boilerplate read implementation.

Rationale and alternatives

Any solution to this problem needs to satisfy a set of constraints:

  1. It needs to be backwards compatible. Duh.
  2. It needs to be efficiently backwards compatible. Code that doesn't write unsafe should not be penalized by the new APIs. For example, code working with a reader written before these new APIs were introduced should not become slower once that code starts trying to use the new APIs.
  3. It must be compatible with dyn Read. Trait objects are used pervasively in IO code, so a solution can't depend on monomorphization or specialization.
  4. It needs to work with both normal and vectored IO (via read_vectored).
  5. It needs to be composable. Readers are very commonly nested (e.g. GzipReader<TlsStream<TcpStream>>), and wrapper readers should be able to opt-in to fast paths supported by their inner reader.
  6. A reader that does want to work directly with uninitialized memory does, at some reasonable point, need to write the word unsafe.

This RFC covers the proposed solution. For in-depth coverage of other options and the rationale for this particular approach over others, please refer to this Dropbox Paper writeup or my discussion with Niko Matsakis.

The proposal in the Dropbox Paper does differ from the proposal in this RFC in one significant way: its definition of read_buf returns an io::Result<usize> like read does, and the ReadBuf only tracks the initialized region and not the written-to region:

    pub trait Read {
        fn read_buf(&mut self, buf: &mut ReadBuf<'_>) -> io::Result<usize> { ... }
    }

This has a subtle but important drawback. From the perspective of code working with a Read implementation, the initialization state of the buffer can be trusted to be correct, but the number of bytes read cannot! This mix of trusted and untrusted information can be quite a footgun for unsafe code working with a reader. For example, read_to_end needs to remember to assert that the number of bytes read does not exceed the number of bytes initialized before calling set_len on the Vec<u8> that it's reading into. Moving that bit of state into ReadBuf avoids the issue by allowing ReadBuf to guarantee that these two values stay consistent.
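
The kind of check that an untrusted byte count forces on callers can be sketched with today's `read` method, whose return value is untrusted in the same way. The names `LyingReader` and `read_checked` are hypothetical and exist only for this illustration; the choice of `InvalidData` as the error kind is likewise an assumption.

```rust
use std::io::{self, Read};

/// A reader that reports one more byte than the buffer can hold (hypothetical).
struct LyingReader;

impl Read for LyingReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        Ok(buf.len() + 1)
    }
}

/// Reads once and validates the reported count before the caller may rely on it,
/// for example before passing it to `Vec::set_len` (hypothetical helper).
fn read_checked<R: Read>(r: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let n = r.read(buf)?;
    if n > buf.len() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "reader reported more bytes than the buffer holds",
        ));
    }
    Ok(n)
}
```

Every piece of unsafe code consuming the count would have to remember a check like this one. With the filled length stored inside ReadBuf, the equivalent check lives in `add_filled` and cannot be forgotten.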

The concept of ReadBuf is not inherently tied to working with u8 buffers; it could alternatively be parameterized over the value type and hypothetically used in other contexts. However, the API for such a type can be iterated on in an external crate.

Prior art

The standard library currently has the concept of a buffer "initializer". The Read trait has an (unstable) method which returns an Initializer object which can take a &mut [u8] of uninitialized memory and initialize it as needed for use with the associated reader. Then the buffer is just passed to read as normal.

The tokio::io::AsyncRead trait has a somewhat similar approach, with a prepare_uninitialized_buffer method which takes a &mut [MaybeUninit<u8>] slice and initializes it if necessary.

Refer to the links in the "Rationale and alternatives" section above for a discussion of the issues with these approaches.

Unresolved questions

Should read_buf return the number of bytes read like read does or should the ReadBuf track it instead? Some operations, like checking for EOF, are a bit simpler if read_buf returns the value, but the confusion around what is and is not trustworthy is worrisome for unsafe code working with Read implementations.

Future possibilities

Some of the complexity in the implementation of read_to_end above is due to having to manually track how much of the Vec<u8>'s spare capacity has already been initialized between iterations of the read loop. There is probably some kind of abstraction that could be defined to encapsulate that logic.

Users shouldn't be required to manually write a version of read that delegates to read_buf. We should be able to eventually add a default implementation of read, along with a requirement that one of read and read_buf must be overridden.