Summary

The purpose of this RFC is to provide a framework for SIMD to be used on stable Rust. It proposes stabilizing x86-specific vendor intrinsics, but includes the scaffolding for other platforms as well as a future portable SIMD design (to be fleshed out in another RFC).

Motivation

Stable Rust today does not typically expose all of the capabilities of the platform that you're running on. A notable gap in Rust's support includes SIMD (single instruction multiple data) support. For example on x86 you don't currently have explicit access to the 128, 256, and 512 bit registers on the CPU. LLVM is in general an excellent optimizing compiler and often attempts to make use of these registers (auto vectorizing code), but it unfortunately is still somewhat limited and doesn't express the full power of the various SIMD intrinsics.

The goal of this RFC is to enable using SIMD intrinsics on stable Rust, and in general provide a means to access the architecture-specific functionality of each vendor. For example the AES intrinsics on x86 would also be made available through this RFC, not only the SIMD-related AVX intrinsics.

Note that this is certainly not the first discussion to broach the topic of SIMD in Rust, but rather this has been an ongoing discussion for quite some time now! For example the simd crate started long ago, we've had rfcs, we've had a lot of discussions on internals, and the stdsimd crate has been implemented.

This RFC draws from much of the historical feedback and design that we've done around SIMD in Rust and is targeted at providing a path forward for using SIMD on stable Rust while allowing the compiler to change in the future and retain a stable interface.

Guide-level explanation

Let's say you've just heard about this fancy feature called "auto vectorization" in LLVM and you want to take advantage of it. For example you've got a function like this you'd like to make faster:

    pub fn foo(a: &[u8], b: &[u8], c: &mut [u8]) {
        for ((a, b), c) in a.iter().zip(b).zip(c) {
            *c = *a + *b;
        }
    }

When inspecting the assembly you notice that rustc is making use of the %xmmN registers which you've read is related to SSE on your CPU. You know, however, that your CPU supports up to AVX2 which has bigger registers, so you'd like to get access to them!

Your first solution to this problem is to compile with -C target-feature=+avx2, and after that you see the %ymmN registers being used, yay! Unfortunately, though, you're publishing this binary to run on CPUs which may not actually have AVX2 as a feature, so you don't want to enable AVX2 for the entire program. Instead what you can do is enable it for just this function:

    #[target_feature(enable = "avx2")]
    pub unsafe fn foo(a: &[u8], b: &[u8], c: &mut [u8]) {
        for ((a, b), c) in a.iter().zip(b).zip(c) {
            *c = *a + *b;
        }
    }

And sure enough you see the %ymmN registers getting used in this function! Note, however, that because you've explicitly enabled a feature you're required to declare the function as unsafe, as specified in RFC 2045 (although this requirement is likely to be relaxed in RFC 2212). This worked as a proof of concept, but what you still need to do is dispatch at runtime on whether the local CPU that you're running on supports AVX2 or not. Thankfully, though, libstd has a handy macro for this!

    pub fn foo(a: &[u8], b: &[u8], c: &mut [u8]) {
        // Note that this `unsafe` block is safe because we're testing
        // that the `avx2` feature is indeed available on our CPU.
        if is_target_feature_detected!("avx2") {
            unsafe { foo_avx2(a, b, c) }
        } else {
            foo_fallback(a, b, c)
        }
    }

    #[target_feature(enable = "avx2")]
    unsafe fn foo_avx2(a: &[u8], b: &[u8], c: &mut [u8]) {
        foo_fallback(a, b, c) // the function below is inlined here
    }

    fn foo_fallback(a: &[u8], b: &[u8], c: &mut [u8]) {
        for ((a, b), c) in a.iter().zip(b).zip(c) {
            *c = *a + *b;
        }
    }

And sure enough once again we see that foo is dispatching at runtime to the appropriate function, and only foo_avx2 is using our %ymmN registers!

Ok great! At this point we've seen how to enable CPU features for functions-at-a-time as well as how they could be used in a larger context to do runtime dispatch to the most appropriate implementation. As we saw in the motivation, however, we're just relying on LLVM to auto-vectorize here, which often isn't good enough or otherwise doesn't expose the functionality we want.

For explicit and guaranteed SIMD on stable Rust you'll be using a new module in the standard library, std::arch. The std::arch module is defined by vendors/architectures, not us actually! For example Intel publishes a list of intrinsics, as does ARM. These exact functions and their signatures will be available in std::arch with types translated to Rust (e.g. int32_t becomes i32). Vendor-specific types like __m128i on Intel will also live in std::arch.

For example let's say that we're writing a function that encodes a &[u8] in ASCII hex and we want to convert &[1, 2] to "0102". The stdsimd crate currently has this as an example, so let's take a look at a few snippets from that.

First up you'll see the dispatch routine like we wrote above:

    fn hex_encode<'a>(src: &[u8], dst: &'a mut [u8]) -> Result<&'a str, usize> {
        let len = src.len().checked_mul(2).unwrap();
        if dst.len() < len {
            return Err(len);
        }

        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        {
            if is_target_feature_detected!("avx2") {
                return unsafe { hex_encode_avx2(src, dst) };
            }
            if is_target_feature_detected!("sse4.1") {
                return unsafe { hex_encode_sse41(src, dst) };
            }
        }

        hex_encode_fallback(src, dst)
    }

Here we have some routine business about hex encoding in general, and then for x86/x86_64 platforms we have optimized versions specifically for avx2 and sse41. Using the is_target_feature_detected! macro in libstd we saw above, we'll dispatch to the correct one at runtime.

Taking a closer look at hex_encode_sse41 we see that it starts out with a bunch of weird looking function calls:

    let ascii_zero = _mm_set1_epi8(b'0' as i8);
    let nines = _mm_set1_epi8(9);
    let ascii_a = _mm_set1_epi8((b'a' - 9 - 1) as i8);
    let and4bits = _mm_set1_epi8(0xf);

As it turns out though, these are all Intel SIMD intrinsics! For example _mm_set1_epi8 is defined as creating an instance of __m128i, a 128-bit integer register. The intrinsic specifically sets all bytes to the first argument.
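
To make that behavior concrete, here's a small, hedged check of what was just described. It assumes an x86_64 target (where SSE2 is baseline) and that the intrinsics are importable from a per-architecture submodule such as std::arch::x86_64:

    #[cfg(target_arch = "x86_64")]
    fn set1_demo() {
        use std::arch::x86_64::{__m128i, _mm_set1_epi8};

        // `_mm_set1_epi8` broadcasts its argument into every byte of the register.
        let v: __m128i = unsafe { _mm_set1_epi8(b'0' as i8) };

        // View the 128-bit register as 16 bytes to check the broadcast.
        let bytes: [u8; 16] = unsafe { std::mem::transmute(v) };
        assert!(bytes.iter().all(|&b| b == b'0'));
    }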

These functions are all imported through std::arch::* at the top of the example (in this case stdsimd::vendor::*). We go on to use a bunch of these intrinsics throughout the hex_encode_sse41 function to actually do the hex encoding.

The example listed currently has some tests/benchmarks as well, and if we run the benchmarks we'll see:

    test benches::large_default    ... bench:      73,432 ns/iter (+/- 12,526) = 14279 MB/s
    test benches::large_fallback   ... bench:   1,711,030 ns/iter (+/- 286,642) = 612 MB/s
    test benches::small_default    ... bench:          30 ns/iter (+/- 18) = 3900 MB/s
    test benches::small_fallback   ... bench:         204 ns/iter (+/- 74) = 573 MB/s
    test benches::x86::large_avx2  ... bench:      69,742 ns/iter (+/- 9,157) = 15035 MB/s
    test benches::x86::large_sse41 ... bench:     108,463 ns/iter (+/- 70,250) = 9667 MB/s
    test benches::x86::small_avx2  ... bench:          25 ns/iter (+/- 8) = 4680 MB/s
    test benches::x86::small_sse41 ... bench:          25 ns/iter (+/- 14) = 4680 MB/s

Or in other words, our runtime dispatch implementation ("default") is 20 times faster than the fallback implementation (no explicit SIMD). Furthermore the AVX2 implementation is nearly 2x faster than the SSE4.1 implementation for large inputs, and the SSE4.1 implementation is over 10x faster than the default fallback as well.

With std::arch and is_target_feature_detected! we've now written a program that's 20x faster on supported hardware, yet it also continues to run on older hardware as well! Not bad for a few dozen lines on each function!


Note that this RFC is explicitly not attempting to stabilize/design a set of "portable simd operations". The contents of std::arch are platform-specific and provide no guarantees about portability. Efforts in the past, however, such as with simd.js and the simd crate, show that it's desirable and useful to have a set of types which are usable across platforms.

Furthermore LLVM does quite a good job with a portable u32x4 type, for example, in terms of platform support and speed on platforms that support it. This RFC is not going to go too much into the details about these types, but these guidelines will still hold:

  • The intrinsics will not take portable types as arguments. For example u32x4 and __m128i will be different types on x86. The two types, however, will be convertible between one another (either via transmutes or via explicit functions). This conversion will have zero run-time cost (a conversion sketch follows after this list).
  • The portable SIMD types will likely live in a module like std::simd rather than std::arch.
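
As a rough illustration of that zero-cost conversion, here is a minimal sketch. The U32x4 stand-in is hypothetical (the real portable type is left to the future std::simd RFC), and the example assumes an x86_64 target:

    #[cfg(target_arch = "x86_64")]
    mod convert {
        use std::arch::x86_64::__m128i;
        use std::mem::transmute;

        // Hypothetical stand-in for a portable 4-lane u32 vector.
        #[derive(Clone, Copy)]
        #[repr(C)]
        pub struct U32x4(pub [u32; 4]);

        // Both types are 128 bits wide, so these conversions have no runtime cost.
        pub fn to_vendor(v: U32x4) -> __m128i {
            unsafe { transmute(v) }
        }

        pub fn from_vendor(v: __m128i) -> U32x4 {
            unsafe { transmute(v) }
        }
    }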

The design around these portable types is still ongoing, however, so stay tuned for an RFC for the std::simd module!

Reference-level explanation

Stable SIMD in Rust ends up requiring a surprising number of both language and library features to be productive. Thankfully, though, there has been quite a bit of experimentation over time with SIMD in Rust and we've gotten a lot of good experience along the way! In this section we'll be going into the various features in detail.

The #[target_feature] Attribute

The #[target_feature] attribute was specified in RFC 2045 and remains unchanged from that specification. As a quick recap, it allows you to add this attribute to functions:

    #[target_feature(enable = "avx2")]

The only currently allowed key is enable (one day we may allow disable). The string values accepted by enable will be separately stabilized but are likely to be guided by vendor definitions. For example in Intel's intrinsic guide it lists functions under "AVX2", so we're likely to stabilize the name avx2 for Rust.

There's a good number of these features supported by the compiler today. It's expected that when stabilizing other pieces of this RFC the names of the following existing features for x86 will be stabilized:

  • aes
  • avx2
  • avx
  • bmi2
  • bmi - to be renamed to bmi1, the name Intel gives it
  • fma
  • fxsr
  • lzcnt
  • popcnt
  • rdrnd
  • rdseed
  • sse2
  • sse3
  • sse4.1
  • sse4.2
  • sse
  • ssse3
  • xsave
  • xsavec
  • xsaveopt
  • xsaves

Note that AVX-512 names are missing from this list, but that's because we haven't implemented any AVX-512 intrinsics yet. Those'll get stabilized on their own once implemented. Additionally note that mmx is missing from this list. For reasons discussed later, it's proposed that MMX types are omitted from the first pass of stabilization. AMD also has some specific features supported (sse4a, tbm), and so do ARM, MIPS, and PowerPC, but none of these feature names are proposed for becoming stable in the first pass.

The target_feature value in #[cfg]

In addition to enabling target features for a function the compiler will also allow statically testing whether a particular target feature is enabled. This corresponds to the cfg_target_feature feature today in rustc, and can be seen via:

    #[cfg(target_feature = "avx")]
    fn foo() {
        // implementation that can use `avx`
    }

    #[cfg(not(target_feature = "avx"))]
    fn foo() {
        // a fallback implementation
    }

Additionally this is also made available to cfg!:

    if cfg!(target_feature = "avx") {
        println!("this program was compiled with AVX support");
    }

The #[cfg] attribute and cfg! macro statically resolve and do not do runtime dispatch. Tweaking which features are enabled at compile time is currently done via the -C target-feature flag to the compiler. This flag accepts a similar set of strings to the ones specified above and is already "stable".

The is_target_feature_detected! Macro

One mode of operation with intrinsics is to compile part of a program with certain CPU features enabled but not the entire program. This way a portable program can be compiled which runs across a broad range of hardware which can still benefit from optimized implementations for particular hardware at runtime.

The crux of this support is a macro provided by libstd, is_target_feature_detected!. The macro will accept one argument, a string literal. The string can be any feature passed to #[target_feature(enable = ...)] for the platform you're compiling for. Finally, the macro will resolve to a bool result.

For example on x86 you could write:

    if is_target_feature_detected!("sse4.1") {
        println!("this cpu has sse4.1 features enabled!");
    }

It would, however, be an error to write this on x86 cpus:

    is_target_feature_detected!("neon"); //~ COMPILE ERROR: neon is an ARM feature, not x86
    is_target_feature_detected!("foo");  //~ COMPILE ERROR: unknown target feature for x86

The macro is intended to be implemented in the std crate (not core) and made available via the normal macro preludes. The implementation of this macro is expected to be what stdsimd does today, notably:

  • The first time the macro is invoked all the local CPU features will be detected.
  • The detected features will then be cached globally (when possible, currently in a bitset) for the rest of the execution of the program.
  • Further invocations of is_target_feature_detected! are expected to be cheap runtime dispatches (i.e. load a value and check whether a bit is set); a minimal sketch of this caching strategy is given after this list.
  • Exception: in some cases the result of the macro is statically known: for example, is_target_feature_detected!("sse2") when the binary is being compiled with "sse2" globally. In these cases none of the steps above are performed and the macro just expands to true.
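
The following is a minimal sketch of that caching strategy, not the actual stdsimd implementation; the bit layout and the detect_features helper are made up purely for illustration:

    use std::sync::atomic::{AtomicUsize, Ordering};

    // One global bitset caches the detection result for the whole program.
    static CACHE: AtomicUsize = AtomicUsize::new(0);

    const INITIALIZED: usize = 1 << 0; // bit 0: detection has already run
    const AVX2: usize = 1 << 1;        // bit 1: hypothetical "avx2 detected" bit

    fn detect_features() -> usize {
        // The real macro would run `cpuid` on x86 or consult getauxval / /proc
        // on Linux/ARM; here we just pretend nothing beyond the baseline exists.
        INITIALIZED
    }

    fn avx2_detected() -> bool {
        let mut bits = CACHE.load(Ordering::Relaxed);
        if bits & INITIALIZED == 0 {
            // First invocation: do the (relatively expensive) detection once.
            bits = detect_features();
            CACHE.store(bits, Ordering::Relaxed);
        }
        // Later invocations are just a load plus a bit test.
        bits & AVX2 != 0
    }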

The exact method of CPU feature detection varies by platform, OS, and architecture. For example on x86 we make heavy use of the cpuid instruction whereas on ARM the implementation currently uses getauxval and /proc-mounted information on Linux. It's expected that the detection will vary for each particular target, as necessary.

Note that the implementation details of the macro today prevent it from being located in libcore. If getauxval or /proc is used, that requires libc to be available, or File in one form or another. These concerns are currently std-only (not available in libcore). This is also a conservative route for x86 where it is possible to do CPU feature detection in libcore (as it's just the cpuid instruction), but for consistency across platforms the macro will only be available in libstd for now. This placement can of course be relaxed in the future if necessary.

The std::arch Module

This is where the real meat is. A new module will be added to the standard library, std::arch. This module will also be available in core::arch (and std will simply reexport it). The contents of this module provide no portability guarantees (like std::os and unlike the rest of std). APIs present on one platform may not be present on another.

The contents of the arch modules are defined by, well, architectures! For example Intel has an intrinsics guide which will serve as a guideline for all contents in the arch module itself. The standard library will not deviate in naming or type signature of any intrinsic defined by an architecture.

For example most Intel intrinsics start with _mm_ or _mm256_ for 128- and 256-bit registers. While perhaps unergonomic, we'll be sticking to what Intel says. Note that all intrinsics will also be unsafe, according to RFC 2045.

Function signatures defined by architectures are typically defined in terms of C types. In Rust, however, those aren't always available! Instead the intrinsics will be defined in terms of Rust-specific types. Some types are easily translated, such as int32_t, but otherwise a different mapping may be applied per architecture.

The current proposed mapping for x86 intrinsics is:

    What Intel says    Rust type
    ---------------    ---------
    void*              *mut u8
    char               i8
    short              i16
    int                i32
    long long          i64
    const int          i32 [0]

[0] required to be compile-time constants.

Other than these exceptions the x86 intrinsics will be defined exactly as Intel defines them. This will necessitate new types in the std::arch modules for SIMD registers! For example these new types will all be present in std::arch on x86 platforms:

  • __m128
  • __m128d
  • __m128i
  • __m256
  • __m256d
  • __m256i

(note that AVX-512 types will come in the future!)
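
Putting the mapping table and these types together, one example translation looks like the following: Intel documents the C prototype __m128i _mm_set1_epi8 (char a), and under the mapping above its Rust counterpart in std::arch would be declared roughly as:

    pub unsafe fn _mm_set1_epi8(a: i8) -> __m128i;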

Infrastructure-wise the contents of std::arch are expected to continue to be defined in the stdsimd crate/repository. Intrinsics defined here go through a rigorous test suite involving automatic verification against the upstream architecture definition, verification that the correct instruction is generated by LLVM, and at least one runtime test for each intrinsic to ensure it not only compiles but also produces correct results. It's expected that stabilized intrinsics will meet these criteria to the best of their ability.

Today on x86 and ARM platforms the stdsimd crate performs all these checks, but these checks are not yet implemented for PowerPC/MIPS/etc; that's always just some more work to do!

It's not expected that the contents of std::arch will remain static for all time. Rather intrinsics will continue to be implemented in stdsimd and make their way into the main Rust repository. For example there are not currently any implemented AVX-512 intrinsics, but that doesn't mean we won't implement them! Rather once implemented they'll be stabilized and included in libstd following the Rust release model.

The types in std::arch

It's worth paying close attention to the types in std::arch. Types like __m128i are intended to represent a 128-bit packed SIMD register on x86, but there's nothing stopping you from using types like Option<__m128i> in your program! Most generic containers and such probably aren't written with packed SIMD types in mind, and it'd be a bummer if everything stopped working once you used a packed SIMD type in one of them.

Instead it will be required that the types defined in std::arch do indeed work when used in "nonstandard" contexts. For example Option<__m128i> should never produce a compiler error or a codegen error when used (it may just be slower than you expect). This requires special care to be taken both in the representation of these arguments as well as their ABI.
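
A small, hedged example of what that guarantee means in practice (assuming an x86_64 target and a std::arch::x86_64 module layout): vendor SIMD types can be stored in ordinary generic containers without feature gates or codegen errors, even if doing so isn't especially fast:

    #[cfg(target_arch = "x86_64")]
    fn container_demo() {
        use std::arch::x86_64::{__m128i, _mm_setzero_si128};

        // Constructing a value requires an intrinsic (hence `unsafe`), but the
        // type itself can then be used like any other plain Rust type.
        let zero: __m128i = unsafe { _mm_setzero_si128() };

        let maybe: Option<__m128i> = Some(zero);    // fine
        let many: Vec<__m128i> = vec![zero; 4];     // also fine
        assert!(maybe.is_some() && many.len() == 4);
    }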

Implementation-wise these packed SIMD types are implemented in terms of a "portable" vector in LLVM. LLVM as a result gets most of this logic correct for us in terms of having these compile without errors in many contexts. The ABI, however, had to be special-cased as it was a location where LLVM didn't always help us.

The Rust ABI will currently be implemented such that all related packed-SIMD types are passed via memory instead of by value. This means that regardless of the target features enabled for a function everything should agree on how packed SIMD arguments are passed across boundaries and whatnot.

Again, note that this section is largely an implementation detail of SIMD in Rust today, though it's what enables usage without a lot of codegen errors popping up all over the place.

Intrinsics in std::arch and constant arguments

There are a number of intrinsics on x86 (and other) platforms that require their arguments to be constants rather than decided at runtime. For example _mm_insert_pi16 requires its third argument to be a constant value where only the lowest two bits are used. The Rust type system, however, does not currently have a stable way of expressing this information.

Eventually we will likely have some form of const arguments or const machinery to guarantee that these functions are called and monomorphized with constant arguments, but for now this RFC proposes taking a more conservative route forward. Instead we'll, for the time being, forbid the functions from being invoked with non-constant arguments. Prototyped in #48018, the stdsimd crate will have an unstable attribute where the compiler can help provide this guarantee. As an extra precaution, #48078 also implements disallowing taking a function pointer to these intrinsics, requiring a direct invocation.

It's hoped that this restriction will allow stdsimd to be forward-compatible with a future const-powered world of Rust, but in the meantime not otherwise block stabilization of these intrinsics.

Portable packed SIMD

So-called "portable" packed SIMD types are currently implemented

実装する
in both the stdsimd and simd crates. These types look like u8x16 and explicitly
明示的に
specify
特定する、指定する、規定する
how many lanes they have (16 in this case) and what type each line is (u8 in this case). These types are intended to unconditionally available (like the rest of libstd) and simply optimized much more aggressively on platforms that have native support for the various
さまざまな
operations.
演算、操作

For example u8x16::add may be implemented differently on i586 vs i686, and also entirely differently implemented on ARM. The idea with portable packed SIMD types is that they represent a broad intersection of fast behavior across a broad range of platforms.

It's intended that this RFC neither includes nor rules out the addition of portable packed-SIMD types in Rust. It's expected that an upcoming RFC will propose the addition of these types in a std::simd module. These types will be orthogonal to scalable-vector types which are expected to be proposed in another, also different, RFC. What this RFC does do, however, is explicitly specify that:

  • The portable SIMD types (both packed and scalable) will not be used in intrinsics.
  • The per-architecture SIMD types will be distinct types from the portable SIMD types.

Or, in other words, it's intended that portable SIMD types are entirely decoupled from intrinsics. If they both end up being implemented then there will be zero-cost interoperation between them, but neither will necessarily depend on the other.

Not stabilizing MMX in this RFC

This RFC notably proposes omitting the MMX intrinsics, in other words those related to __m64. The MMX type __m64 and its intrinsics have been somewhat problematic in a number of ways.

Due to these issues having an unclear conclusion as well as a seeming lack of desire to stabilize MMX intrinsics, the __m64 and all related intrinsics will not be stabilized via this RFC.

Drawbacks

This RFC represents a significant addition to the standard library, maybe one of the largest we've ever done! As a result alternate implementations of Rust will likely have a difficult time catching up to rustc/LLVM with all the SIMD intrinsics. Additionally the semantics of "packed SIMD types should work everywhere" may be overly difficult to implement in alternate implementations. It is worth noting that both Cranelift and GCC support packed SIMD types.

Due to the enormity of what's being added to the standard library it's also infeasible to carefully review each addition in isolation. While there are a number of automatic verifications in place we're likely to inevitably make a mistake when stabilizing something. Fixing a stabilization can often be quite difficult and costly.

Rationale and alternatives

Over the years quite a few iterations have happened for SIMD in Rust. This RFC draws from as many of those as it can and attempts to strike a balance between exposing functionality while still allowing us to implement everything in a stable fashion for years to come (and without blocking us from updating LLVM, for example). Despite this there are a few alternatives we could consider as well.

Portable types in architecture interfaces

It was initially attempted in the stdsimd crate that we would use the portable types on all of the intrinsics. For example instead of:

    pub unsafe fn _mm_set1_epi8(val: i8) -> __m128i;

we would instead define

    pub unsafe fn _mm_set1_epi8(val: i8) -> i8x16;

The latter definition here is much easier for a beginner to SIMD to read (or at least I gawked when I first saw __m128i).

The downside of this approach, however, is that Intel isn't telling us what to do. While that may sound simple, this RFC is proposing an addition of thousands of functions to the standard library in a stable manner. It's infeasible for any one person (or even the entire libs team) to scrutinize all functions and assess whether the correct signature is applied (was it i8x16 or i16x8?).

Furthermore not all intrinsics from Intel actually have an interpretation with one of the portable types. For example some intrinsics take an integer constant which when 0 interprets the input as u8x16 and when 1 interprets it as u16x8 (as an example). This effectively means that there isn't a correct choice in all situations for what portable type should be used.

Consequently it's proposed that instead of portable types the exact architecture types are used in all intrinsics. This provides us a much easier route to stabilization ("make sure it's what Intel says") along with no need to interpret what Intel does and attempt to find the most appropriate type.

There is interest by both current stdsimd maintainers and users to expose a "better-typed" SIMD API in crates.io that builds on top of the intrinsics proposed for stabilization here.

Stabilizing SIMD implementation details

Another alternative to the bulk of this RFC is allowing more raw access to the internals of LLVM. For example stabilizing #[repr(simd)] or the ability to write extern "platform-intrinsics" { ... } or #[link_llvm_intrinsic...]. This is certainly a much smaller surface area to stabilize (i.e. not thousands of intrinsics).

This avenue was decided against, however, for a few reasons:

  • Such raw interfaces may change over time as they simply represent LLVM at a current point in time rather than what LLVM wants to do in the future.
  • Alternate implementations of rustc or alternate rustc backends like Cranelift may not expose the same sort of functionality that LLVM provides, or implementing the interfaces may be much more difficult in alternate backends than in LLVM's.

As a result, it's intended that rather than exposing raw building blocks (and allowing stdsimd to live on crates.io) we'll instead pull stdsimd into the standard library and expose it as the stable interface to SIMD in Rust.

Unresolved questions

There are a number of unresolved questions around stabilizing SIMD today; they don't pose serious blockers and may be considered open bugs rather than blockers to stabilization:

Relying on unexported LLVM APIs

The static detection performed by cfg! and #[cfg] currently relies on a Rust-specific patch to LLVM. LLVM internally knows all about hierarchies of features and such. For example if you compile with -C target-feature=+avx2 then cfg!(target_feature = "sse2") also needs to resolve to true. Rustc, however, does not know about these features and relies on learning this information through LLVM.

Unfortunately though LLVM does not actually export this information for us to consume (as far as we know). As a result we have a local patch which exposes this information for us to read. The consequence of this implementation detail is that when compiled against the system LLVM the cfg! macro may not work correctly when used in conjunction with -C target-feature or -C target-cpu flags.

It appears that clang vendors and/or duplicates LLVM's functionality in this regard. It's an option for rustc to do the same but it may also be an option to expose the information in upstream LLVM. So far there appear to have been no attempts to upstream this patch into LLVM itself.

Packed SIMD types in extern functions are not sound

The packed SIMD types have particular care paid to them with respect to their ABI in Rust and how they're passed between functions, notably to ensure that they work properly throughout Rust programs. The "fix" to pass them in memory over function calls, however, was only applied to the "Rust" ABI and not any other function ABIs.

A consequence of this change is that if you instead label all your functions extern then the same bug will arise. It may be possible to implement a "lint" or a compiler error of sorts to forbid this situation in the short term. We could also possibly accept this as a known bug for the time being.
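
A hedged sketch of the problematic pattern described here (the function names are illustrative only): the "Rust" ABI passes __m128i indirectly, but a non-"Rust" ABI does not get that treatment, so how the value travels depends on the target features each side was compiled with:

    #[cfg(target_arch = "x86_64")]
    mod abi_example {
        use std::arch::x86_64::__m128i;

        // Passed via memory under the "Rust" ABI, so callers and callees always
        // agree regardless of their enabled target features.
        pub fn rust_abi(v: __m128i) -> __m128i {
            v
        }

        // Under a non-"Rust" ABI the in-memory fix described above does not
        // apply, so this is the shape of code where the soundness hole shows up.
        pub extern "C" fn c_abi(v: __m128i) -> __m128i {
            v
        }
    }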

What if we're wrong?

Despite the CI infrastructure of the stdsimd crate it seems inevitable that we'll get an intrinsic wrong at some point. What do we do in a situation like that? This situation is somewhat analogous to the libc crate, but there you can fix the problem downstream (just have a corrected type/definition); for vendor intrinsics it's not so easy.

Currently it seems that our only recourse would be to add a 2 suffix to the function name or otherwise indicate there's a corrected version, but that's not always the best...