Intrinsics are by definition provided by the compiler. GCC doesn't use that word, but its description of its built-in functions indicates that the compiler provides them, and library developers are encouraged to use them by providing macros that redirect the standard names to the built-in names. However, the documentation seems to state that some standard library functions are routed to their built-in equivalents regardless of what the library says. This includes memcpy and memset. (I'm not sure how the compiler thinks it can implement malloc on its own, especially when it does not implement free.) In practice, though, it does not appear to do so. It does recognize a for-loop filling a uint8_t array as equivalent to memset, but it generates a function call to memset rather than inlining it. This is true for both GCC 6.3.0 on Windows and GCC 4.9.4 on Linux, so its reliance on a closed-source, foreign C library on Windows does not appear to be a factor.
This was odd. If I call memset explicitly, it gets inlined. It seems that some optimization steps run in the wrong order: for-loops appear to be replaced by memset calls only after the explicit memset calls have already been inlined.
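For anyone who wants to repeat the comparison, here is a minimal sketch of the two variants. The function names fill_loop and fill_memset are made up for illustration, and whether either one ends up as a call or as inline code will depend on the GCC version, target and optimization flags.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name. Byte-by-byte fill: the kind of loop the compiler
   may recognize as a memset and replace with a call. */
void fill_loop(uint8_t *buf, size_t n, uint8_t value)
{
    size_t i;

    for (i = 0; i < n; i++)
        buf[i] = value;
}

/* Hypothetical name. Explicit call: a candidate for inline expansion
   of the built-in memset. */
void fill_memset(uint8_t *buf, size_t n, uint8_t value)
{
    memset(buf, value, n);
}
```

Compiling this with something like `gcc -O2 -S` and comparing the two function bodies in the generated assembly is enough to see whether they are treated differently. No main is needed for that.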
The inlined memset does use rep stos, but it uses stosd (stosl in GAS syntax), which stores four bytes at a time, so unaligned memory access may be an issue. I haven't quite figured out how it deals with a byte count that is not divisible by four, but the only part I didn't understand in the brief time I devoted to it is the bit before the rep instruction, so the handling must be there.
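To make the idea concrete, here is a rough C sketch of one way a four-bytes-at-a-time fill can cope with a count that is not a multiple of four. This is not GCC's generated code, only my guess at the general shape: the leftover bytes are stored first, then the rest goes out as 32-bit words. The name fill_by_words is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, NOT what GCC emits: an assumed arrangement in which
   the n % 4 leftover bytes are written before the word-sized stores. */
void fill_by_words(uint8_t *dst, uint8_t value, size_t n)
{
    /* Replicate the byte into all four lanes of a 32-bit word. */
    uint32_t pattern = 0x01010101u * value;
    size_t rem = n % 4;    /* bytes that do not fit into a whole word */
    size_t words = n / 4;  /* number of 4-byte stores, like ecx for rep stosd */
    size_t i;

    /* Leftover bytes first (the assumed "bit before the rep-part"). */
    for (i = 0; i < rem; i++)
        *dst++ = value;

    /* Word stores, the rough equivalent of rep stosd. memcpy is used for
       each store so the C code stays valid even if dst is unaligned. */
    for (i = 0; i < words; i++) {
        memcpy(dst, &pattern, sizeof pattern);
        dst += sizeof pattern;
    }
}
```

With this arrangement the word stores start at whatever address the leftover bytes leave behind, so they are not necessarily aligned. A variant that first advances to a 4-byte boundary and puts the leftover bytes at the end would avoid that.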