As far as I can tell, the code for determining calculating resistance due to the gradient is in simvehicle.cc line 1963, vehicle_t::calc_drag_coefficient(). This only takes into account the slope of the current tile, although it seems to update to the new gradient on a hill over two steps, and then takes a few steps to return to the base level after climbing a hill. If curve_friction_factor is nonzero, the behaviour doesn't look sensible
It does not do any averaging over surrounding tiles, which I think is necessary to simulate gentle slopes, such as are needed on a railway. Such averaging could be done either as an attribute of the way, or within the vehicle or convoy. The former approach seems better, as this would involve less computation, and would be consistent between all vehicles using the way.
I think a reasonable approach would be to assign a 'microheight' to each edge between adjoining way tiles. If this tile is an intersection, then this microhieght is forced to be the displayed height, and if the tile has an obstacle (another way tile or building) within two levels above or below it, then this microheight is forced to be at most or at least the displayed height. Otherwise, for each slope tile, search up to 2N tiles along the way (where N is the maximum distance in one direction to average over), until reaching another slope, an intersection, or a obstacle above/below the tile (depending on whether we're averaging a downhill or uphill slope). The linearly interpolate the microheight between the two end points (which are either the middle of a slope tile, with a height of +/- half the height difference per level relative to the level section of way; or the edge of an intersection or tile with an obstacle, with a height equal to the displayed height). This gives a height at each edge, which can be used to assign a gradient to each direction on each way tile.
(Note that there are some inconsistencies in what I write here that should not be included in an implementation, but the basic idea is correct.)
I'm not sure what the best value for N would be - that's probably something to determine partly by experimentation. However, given that shallow gradients are (I believe) currently 4% (uphill in the physics engine; downhill is less for some reason), a value of N=4 would give a minimum gradient of about 0.5% (+/- some imprecision in my definition of N). Does that sound like a reasonable gradient for an early railway? Perhaps N should be different for rails and roads, although that would create inconsistencies for street-running trams.
This does mean that building sloped ways will come with some implicit earthworks (in open terrain, but not in busy terrain). This could be taken into account in the cost of building sloped ways, although it could equal be decided that the land should never have been perflectly flat up to the slope in the first place, and we are just countering Simutrans' lack of infinitesimal variations in height and gradient.