- Feature Name: vendor_intrinsics
- Start Date: 2018-02-04
- RFC PR: rust-lang/rfcs#2325
- Rust Issue: rust-lang/rust#48556
# Summary

The purpose of this RFC is to provide a framework for SIMD to be used on stable Rust. It proposes stabilizing x86-specific vendor intrinsics, but includes the scaffolding for other platforms as well as a future portable SIMD design.
# Motivation

Stable Rust today does not typically provide access to SIMD: the intrinsics and attributes involved are unstable, so programs wanting SIMD must use nightly. The goal of this RFC is to enable using SIMD intrinsics on stable Rust, and in general to provide stable access to architecture-specific CPU functionality.
Note that this is certainly not the first discussion to broach the topic of SIMD in Rust; rather, this has been an ongoing discussion for quite some time now! For example the simd crate started long ago, we've had RFCs, we've had a lot of discussions on internals, and the stdsimd crate has been implemented. This RFC draws from much of that historical feedback and design work.
# Guide-level explanation

Let's say you've just heard about this fancy feature called SIMD and you'd like to see whether your program is taking advantage of it. When inspecting the assembly you notice that rustc is making use of the `%xmmN` registers, which you've read are related to SSE on your CPU. You know, however, that your CPU supports up to AVX2, which has bigger registers, so you'd like to get access to them!
Your first solution to this problem is to compile with `-C target-feature=+avx2`, and after that you see the `%ymmN` registers being used, yay! Unfortunately though you're publishing this binary for others to run, and not everyone's CPU supports AVX2; if the program runs on such a CPU it will crash with an illegal instruction. Instead, you can use the `#[target_feature(enable = "avx2")]` attribute to enable AVX2 for just one function. And sure enough you see the `%ymmN` registers getting used in this function! Note, however, that because you've explicitly enabled a CPU feature, the function must be declared `unsafe`, as specified in RFC 2045.
To keep the program running on older CPUs you also add a runtime check, dispatching to the AVX2 implementation only when the feature is detected. And sure enough, once again we see that `foo` is dispatching at runtime to the appropriate function, and only `foo_avx2` is using our `%ymmN` registers!
Ok great! At this point we've seen how to enable CPU features for a function at a time, as well as how they can be combined with runtime feature detection in a larger program.
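A compact sketch of this pattern, using the `foo`/`foo_avx2` names from the narrative (hypothetical bodies; note also that stable Rust ultimately shipped the detection macro under the name `is_x86_feature_detected!`, which is what the code below uses):

```rust
// Hypothetical sketch of per-function feature enabling plus runtime
// dispatch. `foo_avx2` is compiled with AVX2 enabled regardless of the
// global target features; `foo` picks an implementation at runtime.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn foo_avx2(data: &mut [u32]) {
    // Same scalar body as the fallback; the compiler is free to
    // auto-vectorize it with the %ymmN registers here.
    for x in data.iter_mut() {
        *x = x.wrapping_mul(3).wrapping_add(1);
    }
}

fn foo_fallback(data: &mut [u32]) {
    for x in data.iter_mut() {
        *x = x.wrapping_mul(3).wrapping_add(1);
    }
}

pub fn foo(data: &mut [u32]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe to call: we just verified AVX2 is available.
            return unsafe { foo_avx2(data) };
        }
    }
    foo_fallback(data)
}
```

The `unsafe` call is justified by the runtime check immediately before it, which is exactly the contract this RFC is designed around.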
For explicit and guaranteed use of SIMD intrinsics on stable Rust you'll be using a new module of the standard library, `std::arch`. The `std::arch` module is defined by vendors, not by us: the intrinsics and their signatures match the vendors' own definitions, with types translated to Rust (e.g. `int32_t` becomes `i32`). Vendor-specific types like `__m128i` on Intel will also live in `std::arch`.
For example, let's say that we're writing a function that encodes a `&[u8]` in ASCII hex: we want to convert `&[1, 2]` to `"0102"`. The stdsimd crate currently has this as an example, so let's take a look at it.
First up you'll see the dispatch routine like we wrote above:
Here we have some routine business about hex encoding in general, but the key point is that, using the `is_target_feature_detected!` macro in libstd we saw above, we dispatch to the correct implementation at runtime.
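As a hypothetical distillation (the real stdsimd example differs in its details), the scalar fallback that the dispatcher bottoms out in can look like this:

```rust
// Scalar fallback: emit two hex digits per input byte.
fn hex_encode_fallback(src: &[u8], dst: &mut String) {
    const HEX: &[u8; 16] = b"0123456789abcdef";
    for &byte in src {
        dst.push(HEX[(byte >> 4) as usize] as char); // high nibble
        dst.push(HEX[(byte & 0xf) as usize] as char); // low nibble
    }
}
```

With this in place, `hex_encode_fallback(&[1, 2], &mut out)` produces the `"0102"` result described above.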
Taking a closer look at `hex_encode_sse41`, we see that it starts out with a bunch of weird-looking function calls:
As it turns out though, these are all Intel SIMD intrinsics! For example `_mm_set1_epi8` is defined by Intel as creating an instance of `__m128i`, a 128-bit integer register, with all bytes set to the given value.
These functions are all imported through `std::arch::*` at the top of the example (in this case `stdsimd::vendor::*`). We go on to use a bunch of these intrinsics throughout the `hex_encode_sse41` function to actually do the hex encoding.
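To give a tiny taste of those intrinsics (a hypothetical distillation, not the exact stdsimd code): broadcasting a mask byte with `_mm_set1_epi8` and AND-ing it against the input extracts the low nibble of every byte at once. The runtime check below uses `is_x86_feature_detected!`, the name under which the detection macro was eventually stabilized:

```rust
// SIMD path: mask off the low 4 bits of 16 bytes in one instruction.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse2")]
unsafe fn low_nibbles_sse2(input: [u8; 16]) -> [u8; 16] {
    use std::arch::x86_64::*;
    let v = _mm_loadu_si128(input.as_ptr() as *const __m128i);
    let mask = _mm_set1_epi8(0x0f); // broadcast 0x0f into all 16 lanes
    let lo = _mm_and_si128(v, mask); // keep the low 4 bits of each byte
    let mut out = [0u8; 16];
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, lo);
    out
}

fn low_nibbles(input: [u8; 16]) -> [u8; 16] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse2") {
            return unsafe { low_nibbles_sse2(input) };
        }
    }
    // Scalar fallback with identical behavior.
    let mut out = input;
    for b in out.iter_mut() {
        *b &= 0x0f;
    }
    out
}
```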
The example in stdsimd also comes with benchmarks, which on one particular machine produce:

```
test benches::large_default ... bench: 73,432 ns/iter (+/- 12,526) = 14279 MB/s
test benches::large_fallback ... bench: 1,711,030 ns/iter (+/- 286,642) = 612 MB/s
test benches::small_default ... bench: 30 ns/iter (+/- 18) = 3900 MB/s
test benches::small_fallback ... bench: 204 ns/iter (+/- 74) = 573 MB/s
test benches::x86::large_avx2 ... bench: 69,742 ns/iter (+/- 9,157) = 15035 MB/s
test benches::x86::large_sse41 ... bench: 108,463 ns/iter (+/- 70,250) = 9667 MB/s
test benches::x86::small_avx2 ... bench: 25 ns/iter (+/- 8) = 4680 MB/s
test benches::x86::small_sse41 ... bench: 25 ns/iter (+/- 14) = 4680 MB/s
```
Or in other words, our runtime dispatch implementation is over 20x faster than the fallback on hardware with SIMD support. With `std::arch` and `is_target_feature_detected!` we've now written a program that's 20x faster on supported hardware, yet it also continues to run on older hardware as well! Not bad for a few dozen lines on each function!
Note that this RFC is explicitly not attempting to stabilize a set of portable SIMD operations: the contents of `std::arch` are platform-specific, and portable types are left to a future RFC. That said, LLVM already does quite a good job with a portable `u32x4` type, for example, in terms of platform support and code generation, and the interaction between portable types and the vendor intrinsics here is planned as follows:
- The intrinsics will not take portable types as arguments. For example `u32x4` and `__m128i` will be different types on x86. The two types, however, will be convertible between one another (either via transmutes or via explicit functions). This conversion will have zero run-time cost.
- The portable SIMD types will likely live in a module like `std::simd` rather than `std::arch`.
The design of these portable types is left for a future RFC introducing the `std::simd` module!
# Reference-level explanation

Stable SIMD in Rust ends up requiring a surprising number of both language and library features to be productive; each piece is described in turn below.
## The `#[target_feature]` attribute

The `#[target_feature]` attribute was specified in RFC 2045; this RFC proposes stabilizing a subset of it, namely `#[target_feature(enable = "...")]` on `unsafe` functions.
The only currently allowed key is `enable` (one day we may allow `disable`). The string values accepted by `enable` will be separately stabilized but are likely to be guided by vendor definitions; for example Intel calls a feature `avx2`, so we'll use `avx2` for Rust.

There's a good number of these features supported by the compiler today. It's expected that when stabilizing other pieces of this RFC, the names of the following features will be stabilized for use with `#[target_feature]` as well:
- `aes`
- `avx2`
- `avx`
- `bmi2`
- `bmi` (to be renamed to `bmi1`, the name Intel gives it)
- `fma`
- `fxsr`
- `lzcnt`
- `popcnt`
- `rdrnd`
- `rdseed`
- `sse2`
- `sse3`
- `sse4.1`
- `sse4.2`
- `sse`
- `ssse3`
- `xsave`
- `xsavec`
- `xsaveopt`
- `xsaves`
Note that AVX-512 names are missing from this list because AVX-512 intrinsics are not implemented yet, and `mmx` is missing from this list as well (see the dedicated section below). AMD has some additional features (`sse4a`, `tbm`), and so do ARM, MIPS, and PowerPC, but none of these feature names are proposed for becoming stable in the first pass.
## The `target_feature` value in `#[cfg]`

In addition to the `#[target_feature]` attribute, statically testing for a feature is supported through the unstable `cfg_target_feature` feature in rustc today, via attributes like `#[cfg(target_feature = "...")]` on items.
Additionally this is also made available to the `cfg!` macro:
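For instance (an illustrative snippet, not from the RFC itself):

```rust
// cfg! expands to a compile-time constant: it reflects the features the
// code was *compiled* with (-C target-feature / the target's defaults),
// performing no runtime detection whatsoever.
fn compiled_with_sse2() -> bool {
    cfg!(target_feature = "sse2")
}
```

On an `x86_64` target this returns `true` even without any `-C target-feature` flags, because SSE2 is part of that target's baseline.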
The `#[cfg]` attribute and `cfg!` macro statically resolve and perform no runtime feature detection; their result is controlled by the `-C target-feature` flag to the compiler. This flag accepts a similar set of strings to the attribute above and is already "stable".
## The `is_target_feature_detected!` macro

One of the most important modes of operation is detecting features at runtime: compiling multiple versions of a function and choosing between them based on what the CPU actually supports.
The crux of this support in libstd is a macro provided there, `is_target_feature_detected!`. The macro accepts one argument, a string literal, which can be any feature accepted by `#[target_feature(enable = ...)]` for the platform you're compiling for. Finally, the macro resolves to a `bool` result.
For example on x86 you could write:
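An illustrative snippet (using `is_x86_feature_detected!`, the name under which the macro was eventually stabilized):

```rust
// Runtime feature detection: the macro resolves to a bool.
fn simd_support() -> &'static str {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse4.1") {
            "this cpu has sse4.1 features enabled!"
        } else {
            "sse4.1 not detected at runtime"
        }
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        "not an x86 cpu"
    }
}
```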
It would, however, be an error to request a feature of a different architecture, such as ARM's `neon`, on x86 CPUs.
The macro is intended to be implemented in the `std` crate (not `core`) and made available via the normal macro preludes. The implementation is expected to mirror what `stdsimd` does today, notably:
- The first time the macro is invoked, all the local CPU features will be detected.
- The detected features will then be cached globally (when possible, currently in a bitset) for the rest of the execution of the program.
- Further invocations of `is_target_feature_detected!` are expected to be cheap runtime dispatches (i.e. load a value and check whether a bit is set).
- Exception: in some cases the result of the macro is statically known, for example `is_target_feature_detected!("sse2")` when the binary is being compiled with SSE2 enabled globally. In these cases none of the steps above are performed and the macro just expands to `true`.
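The caching strategy can be modeled with an atomic bitset along these lines (a simplified sketch; the real stdsimd implementation differs, and `detect` here is a stand-in for the actual `cpuid`/OS probing):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// One bit per feature, plus a bit recording "detection has run".
const INITIALIZED: usize = 1 << 0;
const AVX2: usize = 1 << 1;

static FEATURES: AtomicUsize = AtomicUsize::new(0);

// Stand-in for the real probing; always reports "no AVX2" here.
fn detect() -> usize {
    INITIALIZED
}

fn feature_enabled(bit: usize) -> bool {
    let mut cached = FEATURES.load(Ordering::Relaxed);
    if cached == 0 {
        // First query: run detection once and cache the bitset globally.
        cached = detect();
        FEATURES.store(cached, Ordering::Relaxed);
    }
    // Subsequent queries are just a load and a bit test.
    cached & bit != 0
}
```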
The exact method of CPU feature detection varies by platform: for example on x86 we make heavy use of the `cpuid` instruction, while on Linux other architectures may read `/proc`-mounted information. It's expected that the detection will vary for each particular target, as necessary.
Note that the implementation of detection may need OS services: when `/proc` is read, libc or `File` is required in one form or another. Technically the x86 implementation could live in libcore (it only requires the `cpuid` instruction), but for consistency across platforms the macro will only be available in libstd for now. This placement can of course be relaxed in the future if necessary.
## The `std::arch` module

This is where the real meat is. A new module will be added to the standard library, `std::arch`. This module will also be available in `core::arch` (and `std` will simply reexport it). The contents of this module provide no portability guarantees (like `std::os` and unlike the rest of `std`): APIs present on one platform may not be present on another.
The contents of the `arch` modules are defined by the architecture vendors themselves: for example Intel publishes a list of intrinsics, as does ARM, and these exact functions and their signatures will be available in the corresponding `arch` module. The standard library will not deviate in naming or type signature from what the vendors define.
For example most Intel intrinsics start with `_mm_` or `_mm256_` for 128- and 256-bit registers. While perhaps unergonomic, we'll be sticking to what Intel says. Note that all intrinsics will also be `unsafe`, according to RFC 2045.
Function signatures will be translated to use Rust's equivalents of the C types (e.g. `int32_t` becomes `i32`), but will otherwise match the vendor's definitions exactly.
The current proposed mapping for x86 intrinsics is:

| What Intel says | Rust Type |
|-----------------|-----------|
| `void*` | `*mut u8` |
| `char` | `i8` |
| `short` | `i16` |
| `int` | `i32` |
| `long long` | `i64` |
| `const int` | `i32` [0] |

[0] required to be compile-time constants.
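As a concrete illustration of this mapping: Intel defines `__m128i _mm_set1_epi8 (char a)`, so the Rust version takes an `i8`. The snippet below (illustrative, and safe to run on `x86_64` where SSE2 is a baseline feature) round-trips a broadcast byte back out with `_mm_cvtsi128_si32`:

```rust
#[cfg(target_arch = "x86_64")]
fn set1_demo() -> i32 {
    use std::arch::x86_64::*;
    // SSE2 is unconditionally available on x86_64, so calling these
    // unsafe intrinsics here cannot hit an unsupported instruction.
    unsafe {
        let v: __m128i = _mm_set1_epi8(3); // Intel's `char` became `i8`
        _mm_cvtsi128_si32(v) // low 32 bits: four copies of 0x03
    }
}
```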
Other than these exceptions, there will also be new types defined in the `std::arch` modules for SIMD registers! For example these new types will all be present in `std::arch` on x86 platforms:

- `__m128`
- `__m128d`
- `__m128i`
- `__m256`
- `__m256d`
- `__m256i`

(note that AVX-512 types will come in the future!)
Infrastructure-wise, the contents of `std::arch` are expected to continue to be defined in the external stdsimd crate/repository. Intrinsics defined there are checked against the vendors' own definitions: names, signatures, and, where possible, the instructions they generate. Currently on x86 and ARM platforms the stdsimd crate performs all these checks, but these checks are not yet implemented for all other platforms.
It's not expected that the contents of `std::arch` will remain static forever; rather, intrinsics will continue to be implemented in stdsimd and make their way into the main Rust repository over time. For example there are not currently any implemented AVX-512 intrinsics, but that doesn't mean there never will be.
## The types in `std::arch`

It's worth paying close attention to the types in `std::arch`. Types like `__m128i` are intended to represent the contents of a SIMD register, yet you can also place an `Option<__m128i>` in your program! Most generic containers and such probably aren't written with packed SIMD types in mind, and it'd be a bummer if everything stopped working once you used a packed SIMD type in one of them.
Instead, it will be required that the types defined in `std::arch` do indeed work when used in "nonstandard" contexts. For example `Option<__m128i>` should never produce a compile error or incorrect code.
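For instance, wrapping one of these types in `Option` behaves like wrapping any other Rust type (an illustrative snippet using `_mm_setzero_si128`):

```rust
#[cfg(target_arch = "x86_64")]
fn maybe_zero_vector(want: bool) -> Option<std::arch::x86_64::__m128i> {
    if want {
        // SSE2 is part of the x86_64 baseline, so this call is fine here.
        Some(unsafe { std::arch::x86_64::_mm_setzero_si128() })
    } else {
        // `None` must work just as it would for any other payload type.
        None
    }
}
```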
Implementation-wise, these packed SIMD types are implemented internally with the unstable `#[repr(simd)]` attribute. The Rust ABI will currently be implemented to pass these types through memory rather than by value, so that crossing a function boundary never requires particular CPU features to be enabled. Again though, note that this section describes implementation details which are not themselves proposed for stabilization.
## Intrinsics in `std::arch` and constant arguments

There are a number of intrinsics on x86 (and other) platforms that require some of their arguments to be constants rather than runtime values; for example `_mm_insert_pi16` requires its third argument to be a compile-time constant, as it is encoded as an immediate in the generated instruction.
Eventually we will likely have some form of `const` arguments or other `const` machinery to guarantee this, but in the meantime the stdsimd crate will have an unstable attribute through which the compiler can help provide this guarantee.
It's hoped that this restriction allows stdsimd to be forward compatible with a future const-powered world of Rust, but in the meantime does not otherwise block stabilization.
## Portable packed SIMD

So-called "portable" packed SIMD types are currently implemented in the stdsimd crate under names like `u8x16`, which explicitly encode the lane count and the lane type (`u8` in this case). These types are intended to be unconditionally available (like the rest of libstd) and simply optimized much more aggressively on platforms that have native support for the various operations.
For example `u8x16::add` may compile down to a single vector instruction on platforms with 128-bit SIMD support, while falling back to equivalent scalar code elsewhere.
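A hand-rolled model of what a portable `u8x16` addition means (illustrative only; the real portable types are left to a future RFC): lane-wise wrapping addition, which LLVM can typically lower to a single `paddb`-style instruction on SSE2 hardware.

```rust
// Sixteen u8 lanes; `add` is defined lane-by-lane with wrapping semantics.
#[derive(Clone, Copy, PartialEq, Debug)]
struct U8x16([u8; 16]);

impl U8x16 {
    fn add(self, other: U8x16) -> U8x16 {
        let mut out = [0u8; 16];
        for i in 0..16 {
            out[i] = self.0[i].wrapping_add(other.0[i]);
        }
        U8x16(out)
    }
}
```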
It's intended that this RFC neither includes nor rules out the addition of a portable `std::simd` module. These types will be orthogonal to scalable-vector types, which are expected to be proposed in another, also different, RFC. What this RFC does do, however, is explicitly specify that:

- The portable SIMD types (both packed and scalable) will not be used in intrinsics.
- The per-architecture SIMD types will be distinct types from the portable SIMD types.

Or, in other words, it's intended that portable SIMD types are entirely decoupled from intrinsics. If both end up being implemented, the two families of types will be convertible between one another.
## Not stabilizing MMX in this RFC

This RFC notably proposes omitting the MMX intrinsics: everything operating on the `__m64` type, in other words. The MMX type `__m64` and its intrinsics have been somewhat problematic in a number of ways. Known cases include:
- MMX intrinsics aren't always desirable to use in the first place.
- LLVM codegen errors occur when debuginfo is enabled together with MMX.
- LLVM codegen errors occur with MMX types on i586 targets.
Due to these issues having an unclear conclusion, as well as a seeming lack of desire to stabilize MMX intrinsics, the `__m64` type and all related intrinsics will not be stabilized via this RFC.
# Drawbacks

This RFC represents a very large addition to the standard library: thousands of vendor-defined intrinsics. Due to the enormity of what's being added, it's likely that at least some definitions will turn out to contain mistakes, and correcting a stabilized intrinsic after the fact is difficult.
# Rationale and alternatives

Over the years quite a few iterations of SIMD support in Rust have happened, and this design draws on the lessons learned from them. Some notable alternatives are discussed below.
## Portable types in architecture interfaces

It was initially attempted in the stdsimd crate that we would use the portable types in all of the intrinsics' signatures: instead of taking and returning the vendor type `__m128i`, an intrinsic would take and return a lane-typed portable type such as `i16x8`. The latter style is more self-documenting than an opaque bag of 128 bits (`__m128i`).
The downside of this approach, however, is that Intel isn't telling us what to do. While that may sound simple, this RFC is proposing the addition of thousands of intrinsics, and for each one we would have to make our own judgment call (should this take `i8x16` or `i16x8`?).

Furthermore, not all intrinsics from Intel actually have one natural interpretation: a single intrinsic may treat its input as `u8x16` in one mode and as `u16x8` in another (as an example). This effectively means that there isn't a correct choice in all situations for what portable type should be used.
Consequently, it's proposed that the vendor intrinsics match the vendors' own types exactly. There is interest by both current `stdsimd` maintainers and users to expose a "better-typed" SIMD API on crates.io that builds on top of the intrinsics proposed for stabilization here.
## Stabilizing SIMD implementation details

Another alternative to this RFC is to stabilize the underlying implementation details instead: `#[repr(simd)]`, or the ability to write `extern "platform-intrinsics" { ... }` blocks, or `#[link_llvm_intrinsic...]`. This is certainly a much smaller surface area to stabilize (i.e. not thousands of intrinsics).
This avenue was decided against, however, for a few reasons:
- Such raw interfaces may change over time, as they simply represent LLVM at a current point in time rather than what LLVM wants to do in the future.
- Alternate implementations of rustc, or alternate rustc backends like Cranelift, may not expose the same sort of functionality that LLVM provides, or implementing these interfaces may be much more difficult in alternate backends than in LLVM.
As a result, rather than stabilizing these raw interfaces (which would have allowed `stdsimd` to live on crates.io), we'll instead pull `stdsimd` into the standard library and expose it as the stable interface to SIMD in Rust.
# Unresolved questions

There are a number of unresolved questions around stabilizing SIMD today; none of them pose serious blockers, but they may wish to be considered before stabilization.
## Relying on unexported LLVM APIs

The static resolution of `cfg!(target_feature = ...)` and `#[cfg(target_feature = ...)]` currently relies on a Rust-specific patch to LLVM. LLVM internally knows all about hierarchies of features: for example if you compile with `-C target-feature=+avx2` then `cfg!(target_feature = "sse2")` also needs to resolve to `true`. Rustc, however, does not know about these feature hierarchies and relies on learning this information through LLVM.
Unfortunately though, LLVM does not actually export this information for us to consume (as far as we know). As a result, without the local patch the `cfg!` macro may not work correctly when used in conjunction with the `-C target-feature` or `-C target-cpu` flags.
It appears this will need to be resolved with LLVM upstream one way or another, but that work has not happened yet.
## Packed SIMD types in `extern` functions are not sound

The packed SIMD types have particular care paid to them with respect to their ABI in Rust and how they're passed between functions, notably to ensure that they are never passed in registers a function may not be allowed to use. A consequence, however, is that if these types are used with a non-Rust ABI via `extern` functions, then the same soundness bug can arise. It may be possible to implement a lint or an error for this situation, but that remains an open question.
## What if we're wrong?

Despite the CI infrastructure of the `stdsimd` crate, it seems inevitable that we'll get an intrinsic wrong at some point. What do we do in a situation like that? This situation is somewhat analogous to the `libc` crate, but there you can fix the problem downstream (just have a corrected type/definition); for vendor intrinsics in the standard library it's not so easy.
Currently it seems that our only recourse would be to add a `2` suffix to the function name or otherwise awkwardly rename the corrected intrinsic.