- Feature Name: repr_simd, platform_intrinsics, cfg_target_feature
- Start Date: 2015-06-02
- RFC PR: rust-lang/rfcs#1199
- Rust Issue: rust-lang/rust#27731
Summary
Lay the groundwork for building powerful SIMD functionality.
Motivation
SIMD (Single-Instruction Multiple-Data) is an important part of performant modern applications. Most CPUs used for that sort of task provide dedicated hardware and instructions
This RFC lays the groundwork for building nice SIMD functionality, but doesn't fill everything out. The goal here is to provide the raw
(An earlier variant of this RFC was discussed as a pre-RFC.)
Where does this code go? Aka. why not in std?
This RFC is focused on building stable, powerful SIMD functionality in external crates, not std.
This makes it much easier to support functionality only "occasionally" available with Rust's preexisting cfg system. There's no way for std to conditionally provide an API based on the target features in use, and building std in every configuration is certainly untenable. Hence, if it were to be in std, there would need to be some highly delayed cfg system to support that sort of conditional API.
With an external crate, we can leverage cargo's existing build infrastructure: compiling with some target features will rebuild with those features enabled.
Detailed design
The design comes in three parts:
- types (feature(repr_simd))
- operations (feature(platform_intrinsics))
- platform detection (feature(cfg_target_feature))
The general
There is definitely a common core of SIMD functionality shared across many platforms, but this RFC doesn't try to extract
Types
There is a new attribute: repr(simd).
The simd repr can be attached to a struct
The repr(simd) may not enforce that any trait bounds repr(simd) types. As such, it will be possible to get the code-generator to error out (ala the old transmute size errors), however, this shouldn't cause repr(simd) types would use some unsafe trait as a bound
Adding repr(simd) to a type may increase its minimum/preferred alignment, depending on the platform.
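The alignment point can be illustrated on stable Rust without repr(simd) itself. The sketch below is only a stand-in: it uses the stable repr(align(16)) attribute on a hypothetical AlignedF32x4 type to mimic the 16-byte alignment that a 128-bit vector commonly has on x86. The type names, the helper functions and the choice of 16 bytes are all assumptions made for illustration, not something this RFC fixes.

```rust
use std::mem::{align_of, size_of};

// Plain tuple struct: aligned like its widest field (f32).
#[allow(dead_code)]
struct PlainF32x4(f32, f32, f32, f32);

// Stand-in for a SIMD vector type. A real `repr(simd)` struct would take its
// alignment from the platform; `align(16)` here is an assumption chosen to
// mimic a 128-bit vector on x86.
#[allow(dead_code)]
#[repr(C, align(16))]
struct AlignedF32x4(f32, f32, f32, f32);

/// Returns (size, alignment) in bytes of the plain struct.
fn plain_layout() -> (usize, usize) {
    (size_of::<PlainF32x4>(), align_of::<PlainF32x4>())
}

/// Returns (size, alignment) in bytes of the over-aligned struct.
fn aligned_layout() -> (usize, usize) {
    (size_of::<AlignedF32x4>(), align_of::<AlignedF32x4>())
}
```

Both types occupy 16 bytes; only the alignment requirement differs, which matters for anything that embeds such a type in another struct or allocates it manually.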
Operations
CPU vendors usually offer "standard" C headers for their CPU specific operations, such as arm_neon.h and the ...mmintrin.h headers for x86(-64).
All of these would be exposed as compiler intrinsics with names very similar extern block with an appropriate ABI. This subset extern in stable code), and would not be exported by std.
Example:
These all use entirely concrete
NB. The structural typing is just for the declaration: if an intrinsic is declared with a type X, it must always be called with X, even if other types are structurally equal to X. Also, within a signature, add_... all refer
There would additionally be a small set
- shuffles and extracting/inserting elements
- comparisons
- arithmetic
- conversions
All of these intrinsics are imported via an extern directive similar to transmute, however, the SIMD operations use a dedicated ABI: platform-intrinsic. Use of this ABI (and hence the intrinsics) is initially feature-gated under the platform_intrinsics feature name. Why platform-intrinsic rather than say simd-intrinsic? There are non-SIMD platform-specific instructions too (e.g. the _addcarry_u32 intrinsic corresponding to the ADC instruction).
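To make the idea of vendor-named intrinsics concrete, here is a minimal sketch that compiles on stable Rust. It does not use the extern "platform-intrinsic" mechanism proposed here (that is feature-gated); instead it assumes the std::arch::x86_64 module of current stable Rust, which exposes intrinsics under Intel's header names. The wrapper add_i16x8 and the portable fallback are hypothetical helpers written for this illustration.

```rust
/// Adds two vectors of eight i16 lanes with wrapping arithmetic,
/// using the vendor intrinsic on x86-64.
#[cfg(target_arch = "x86_64")]
fn add_i16x8(a: [i16; 8], b: [i16; 8]) -> [i16; 8] {
    use std::arch::x86_64::{__m128i, _mm_add_epi16, _mm_loadu_si128, _mm_storeu_si128};
    let mut out = [0i16; 8];
    // SAFETY: SSE2 is part of the x86-64 baseline, and the unaligned
    // load/store intrinsics only touch the 16 bytes of each array.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        // `_mm_add_epi16` corresponds to the PADDW instruction.
        let sum = _mm_add_epi16(va, vb);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, sum);
    }
    out
}

/// Portable fallback with the same lane-wise wrapping behaviour.
#[cfg(not(target_arch = "x86_64"))]
fn add_i16x8(a: [i16; 8], b: [i16; 8]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}
```

The intrinsic keeps the name from the vendor header, so documentation written for C carries over directly, which is the property the naming scheme above is after.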
Shuffles & element operations
One of the most powerful features of SIMD is the ability to rearrange data within vectors, giving super-linear speed-ups sometimes. As such, shuffles are exposed generally: intrinsics that represent
This may violate the "one instruction
The raw T and U are SIMD vectors with the same element type, U has the appropriate length etc. Libraries can use traits to ensure
This approach has similar
These operations
The index array idx has to consist of compile-time constants.
Similarly,
The i0 indices do not have to be constant. These are equivalent to v[i0] = elem and v[i0] respectively.
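As a rough scalar model of these operations, the sketch below works on plain arrays in stable Rust. It assumes the conventional two-input shuffle semantics, where each index selects a lane from the concatenation of the two inputs; the function names (shuffle_model, insert_model, extract_model) are hypothetical, and unlike the real intrinsics the model takes its indices at run time and panics on out-of-bounds indices, a case this RFC leaves open.

```rust
/// Scalar model of a two-input shuffle: `idx` selects lanes from the
/// concatenation of `x` and `y` (indices 0..N pick from `x`, N..2N from `y`).
fn shuffle_model<T: Copy, const N: usize, const M: usize>(
    x: [T; N],
    y: [T; N],
    idx: [u32; M],
) -> [T; M] {
    std::array::from_fn(|lane| {
        let i = idx[lane] as usize;
        // Assumption of this model only: out-of-bounds indices panic.
        assert!(i < 2 * N, "shuffle index {i} out of bounds");
        if i < N { x[i] } else { y[i - N] }
    })
}

/// Model of element insertion: equivalent to `v[i0] = elem`.
fn insert_model<T: Copy, const N: usize>(mut v: [T; N], i0: u32, elem: T) -> [T; N] {
    v[i0 as usize] = elem;
    v
}

/// Model of element extraction: equivalent to `v[i0]`.
fn extract_model<T: Copy, const N: usize>(v: [T; N], i0: u32) -> T {
    v[i0 as usize]
}
```

For instance, shuffle_model([1, 2, 3, 4], [5, 6, 7, 8], [0, 4, 1, 5]) interleaves the low halves of both inputs, and the output length is set by the index array rather than by the inputs.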
Comparisons
Comparisons
These are type checked during code-generation similarly to the shuffles, ensuring that T and U have the same length, and that U is appropriately "boolean"-y. Libraries can use traits to ensure
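A scalar model may help pin down what a "boolean"-y result looks like. The sketch below assumes the common hardware convention (used by SSE2 compare instructions, for example) that a true lane has all bits set and a false lane is zero; this RFC text does not spell out the representation, so treat that as an assumption. The names eq_model and lt_model are hypothetical.

```rust
/// Lane-wise equality on four i32 lanes. Input and output have the same
/// length; a true lane is -1 (all bits set) and a false lane is 0.
fn eq_model(v: [i32; 4], w: [i32; 4]) -> [i32; 4] {
    std::array::from_fn(|i| if v[i] == w[i] { -1 } else { 0 })
}

/// Lane-wise "less than" on four f32 lanes. The result element type (i32)
/// differs from the input element type (f32), but the length is the same.
fn lt_model(v: [f32; 4], w: [f32; 4]) -> [i32; 4] {
    std::array::from_fn(|i| if v[i] < w[i] { -1 } else { 0 })
}
```

The second function shows why the result type is a separate parameter: a comparison of float vectors naturally yields an integer mask vector of the same length.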
Arithmetic
Intrinsics will be provided
These will have codegen time checks that the element type is correct:
- add, sub, mul: any float or integer type
- div: any float type
- and, or, xor, shl (shift left), shr (shift right): any integer type
(The integer types are i8, ..., i64, u8, ..., u64 and the float types are f32 and f64.)
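The lane-wise meaning of these operations can be modelled with plain arrays on stable Rust. In the sketch below the helper lanewise and the three wrappers are hypothetical names; integer addition is modelled as wrapping, which is an assumption (the alternatives section only discusses checking as an option), and the shift model requires shift amounts below the lane width.

```rust
/// Applies a binary operation lane by lane.
fn lanewise<T: Copy, const N: usize>(a: [T; N], b: [T; N], op: impl Fn(T, T) -> T) -> [T; N] {
    std::array::from_fn(|i| op(a[i], b[i]))
}

/// Integer `add`, modelled with wrapping arithmetic (an assumption).
fn add_u8x4(a: [u8; 4], b: [u8; 4]) -> [u8; 4] {
    lanewise(a, b, u8::wrapping_add)
}

/// Float `div`: only float element types qualify for division.
fn div_f32x4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    lanewise(a, b, |x, y| x / y)
}

/// Integer `shl`: each lane is shifted by the matching lane of `b`.
/// Shift amounts must be below 32 in this model.
fn shl_u32x4(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    lanewise(a, b, |x, s| x << s)
}
```

A wrapper crate could enforce the element-type restrictions listed above with traits, so that, say, a div on an integer vector is rejected by the type checker rather than at code generation.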
Why not inline asm?
One alternative
- assembly is generally a black-box to optimisers, inhibiting optimisations, like algebraic simplification/transformation,
- programmers would have to manually synthesise the right sequence of operations to achieve a given shuffle, while having a generic shuffle intrinsic lets the compiler do it (NB. the intention is that the programmer will still have access to the platform specific operations for when the compiler synthesis isn't quite right),
- inline assembly is not currently stable in Rust and there's not a strong push for it to be so in the immediate future (although this could change).
Benefits of manual asm! blocks that replace the intrinsics (they need to be designed
Platform Detection
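The availability of efficient SIMD functionality is fine-grained, and cfg(target_arch = "...") is not precise enough. This RFC proposes a target_feature cfg, that would be set according to the features enabled for the exact target, e.g.: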
- a default x86-64 compilation would essentially only set target_feature = "sse" and target_feature = "sse2"
- compiling with -C target-feature="+sse4.2" would set target_feature = "sse4.2", target_feature = "sse4.1", ..., target_feature = "sse".
- compiling with -C target-cpu=native on a modern CPU might set target_feature = "avx2", target_feature = "avx", ...
The possible values of target_feature will be a selected whitelist, not necessarily just everything LLVM understands. (There are other non-SIMD features that might have target_features set too, such as popcnt and rdrnd on x86/x86-64.)
With a cfg_if! macro that expands to the first cfg that is satisfied (ala @alexcrichton's cfg-if), code might look like:
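The sketch below shows one way to get the same "first match wins" selection using only the built-in cfg attribute, so it needs no external macro and compiles on stable Rust. Each arm spells out the negation of the earlier ones by hand, which is exactly the bookkeeping a cfg_if!-style macro would generate. The function name simd_backend and the returned labels are hypothetical; a real crate would select between actual implementations rather than strings.

```rust
/// Most specific feature first: chosen when AVX2 is enabled at compile time.
#[cfg(target_feature = "avx2")]
fn simd_backend() -> &'static str {
    "avx2"
}

/// Chosen when SSE2 is enabled but AVX2 is not.
#[cfg(all(target_feature = "sse2", not(target_feature = "avx2")))]
fn simd_backend() -> &'static str {
    "sse2"
}

/// Portable fallback when neither feature is enabled (e.g. on non-x86 targets).
#[cfg(not(any(target_feature = "sse2", target_feature = "avx2")))]
fn simd_backend() -> &'static str {
    "portable"
}
```

Exactly one definition survives for any set of target features, so callers can use simd_backend() unconditionally; rebuilding with, say, -C target-cpu=native may switch which one that is.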
Extensions
- scatter/gather operations allow (partially) operating on a SIMD vector of pointers. This would require allowing pointers(/references?) in repr(simd) types.
- allow (and ignore for everything but type checking) zero-sized types in repr(simd) structs, to allow tagging them with markers
- the shuffle intrinsics could be made more relaxed in their type checking (i.e. not require that they return their second type parameter), to allow more type safety when combined with generic simd types: #[repr(simd)] struct Simd2<T>(T, T); extern "platform-intrinsic" { fn simd_shuffle2<T, U>(x: T, y: T, idx: [u32; 2]) -> Simd2<U>; }
This should be a backwards-compatible generalisation.
Alternatives
- Intrinsics could instead be namespaced by ABI, extern "x86-intrinsic", extern "arm-intrinsic".
- There could be more syntactic support for shuffles, either with true syntax, or with a syntax extension. The latter might look like: shuffle![x, y, i0, i1, i2, i3, i4, ...]. However, this requires that shuffles are restricted to a single type only (i.e. Simd4<T> can be shuffled to Simd4<T> but nothing else), or some sort of type synthesis. The compiler has to somehow work out the return value: Presumably z should be Simd8<u32>, but it's not obvious how the compiler can know this. The repr(simd) approach means there may be more than one SIMD-vector type with the Simd8<u32> shape (or, in fact, there may be zero).
- With type-level integers, there could be one shuffle intrinsic: fn simd_shuffle<T, U, const N: usize>(x: T, y: T, idx: [u32; N]) -> U; NB. It is possible to add this as an additional intrinsic (possibly deprecating the simd_shuffleNNN forms) later.
- Type-level values can be applied more generally: since the shuffle indices have to be compile time constants, the shuffle could be fn simd_shuffle<T, U, const N: usize, const IDX: [u32; N]>(x: T, y: T) -> U;
- Instead of platform detection, there could be feature detection (e.g. "platform supports something equivalent to x86's DPPS"), but there probably aren't enough cross-platform commonalities for this to be worth it. (Each "feature" would essentially be a platform specific cfg anyway.)
- Check vector operators in debug mode just like the scalar versions.
- Make fixed length arrays repr(simd)-able (via just flattening), so that, say, #[repr(simd)] struct u32x4([u32; 4]); and #[repr(simd)] struct f64x8([f64; 4], [f64; 4]); etc. work. This will be most useful if/when we allow generic-lengths, #[repr(simd)] struct Simd<T, n>([T; n]);
- have 100% guaranteed type-safety for generic #[repr(simd)] types and the generic intrinsics. This would probably require a relatively complicated set of traits (with compiler integration).
Unresolved questions
- Should integer vectors get division automatically? Most CPUs don't support it for vectors.
- How should out-of-bounds shuffle and insert/extract indices be handled?