## Lecture 22 ## The IEEE floating point standard (notes)
## The IEEE floating point standard ## Published in 1985 (IEEE 754) ## Standardized representation formats: single (32 bits), double (64 bits), extended (more than 64 bits; in practice 80 bits) ## Correctly rounded arithmetic
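As a small illustration of the 64-bit double format, the sketch below splits a Java `double` into its sign, exponent, and fraction fields. The field widths (1, 11, and 52 bits) are those of the IEEE double format; the class name `DoubleBits` and its helper methods are made up for this example.

```java
public class DoubleBits {
    // Sign bit: 0 for positive values, 1 for negative values.
    static int sign(double v) {
        return (int) (Double.doubleToLongBits(v) >>> 63);
    }

    // 11-bit exponent field, stored with a bias of 1023.
    static int biasedExponent(double v) {
        return (int) ((Double.doubleToLongBits(v) >>> 52) & 0x7FF);
    }

    // Low 52 bits: the fraction (the leading 1 of a normal number is implicit).
    static long fraction(double v) {
        return Double.doubleToLongBits(v) & ((1L << 52) - 1);
    }

    public static void main(String[] args) {
        System.out.println(Float.SIZE);            // 32
        System.out.println(Double.SIZE);           // 64
        System.out.println(sign(-2.0));            // 1
        System.out.println(biasedExponent(1.0));   // 1023
        System.out.println(biasedExponent(-2.0));  // 1024
        System.out.println(fraction(1.0));         // 0
    }
}
```

Java exposes only the single and double formats (`float` and `double`), so the extended format cannot be inspected this way.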
## Correctly rounded arithmetic ## The arithmetic operations +, -, *, / on two floats x, y must return the float that is closest to the exact answer (in the default mode, a tie goes to the float whose last bit is even) ## The rounding mode can be changed at the hardware level so that answers round up or down instead of to nearest, but this is not normally used and is not easily accessible from Java
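One way to see correct rounding in action is to compare a floating point result with the exact result computed in `BigDecimal`. The sketch below checks that the error of `x + y` and `x * y` is at most half a unit in the last place (ulp), which is a necessary condition for the result to be the nearest double. The class `CorrectRounding` and its methods are hypothetical names chosen for the example.

```java
import java.math.BigDecimal;

public class CorrectRounding {
    // True if 'computed' is within half an ulp of 'exact'.
    // This is a necessary condition for 'computed' to be the nearest double.
    static boolean withinHalfUlp(double computed, BigDecimal exact) {
        // new BigDecimal(double) converts the binary value exactly.
        BigDecimal error = new BigDecimal(computed).subtract(exact).abs();
        BigDecimal halfUlp =
            new BigDecimal(Math.ulp(computed)).multiply(new BigDecimal("0.5"));
        return error.compareTo(halfUlp) <= 0;
    }

    static boolean sumIsCorrectlyRounded(double x, double y) {
        BigDecimal exact = new BigDecimal(x).add(new BigDecimal(y));
        return withinHalfUlp(x + y, exact);
    }

    static boolean productIsCorrectlyRounded(double x, double y) {
        BigDecimal exact = new BigDecimal(x).multiply(new BigDecimal(y));
        return withinHalfUlp(x * y, exact);
    }

    public static void main(String[] args) {
        // Neither 0.1 nor 0.2 is exactly representable, so the sum is not 0.3.
        System.out.println(0.1 + 0.2);                            // 0.30000000000000004
        System.out.println(sumIsCorrectlyRounded(0.1, 0.2));      // true
        System.out.println(productIsCorrectlyRounded(0.1, 0.3));  // true
    }
}
```

The surprising digits in `0.1 + 0.2` come from the inputs already being rounded, not from the addition: the sum of the two stored values is itself rounded correctly.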
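## Infinities and NaNs ## x/0.0 is inf when x>0, -inf when x<0, and NaN when x==0 ## inf/x is inf when x>0 and -inf when x<0, for finite x (when x is zero the sign of the zero decides; inf/inf is NaN) ## inf and -inf are very different ## there are different representations for 0 and -0, but they test equal ## any arithmetic operation with NaN gives NaN, and NaN compares unequal to everything, including itself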
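The rules above can be checked directly in Java, which follows IEEE semantics for `double` division. This is a minimal sketch; the class `SpecialValues` and the helper `quotient` are names invented for the example.

```java
public class SpecialValues {
    // Plain double division; no exception is thrown for a zero divisor.
    static double quotient(double x, double y) {
        return x / y;
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        double nan = Double.NaN;

        System.out.println(quotient(1.0, 0.0));    // Infinity
        System.out.println(quotient(-1.0, 0.0));   // -Infinity
        System.out.println(quotient(0.0, 0.0));    // NaN

        System.out.println(quotient(inf, 2.0));    // Infinity
        System.out.println(quotient(inf, -2.0));   // -Infinity
        System.out.println(quotient(inf, inf));    // NaN

        // The two zeros compare equal, but dividing by them tells them apart.
        System.out.println(0.0 == -0.0);           // true
        System.out.println(quotient(1.0, -0.0));   // -Infinity

        // NaN propagates through arithmetic and is not equal to itself.
        System.out.println(nan + 1.0);             // NaN
        System.out.println(nan == nan);            // false
        System.out.println(Double.isNaN(nan));     // true
    }
}
```

Because `nan == nan` is false, `Double.isNaN` is the reliable way to test for NaN.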
## All Pentium PCs have 80-bit floating point registers where the arithmetic operations are executed (whereas Sun SPARC, Apple PowerPC, etc. use 64-bit registers) ## On a PC, in the C language, "long double" typically uses the same 80-bit format (this is compiler-dependent) ## However, the Java language does not allow the use of the 80-bit format and, in Java 1.1, insisted that the operations be carried out as if they were done in a 64-bit register ## In Java 1.1, the result was that floating point was very slow on a Pentium, because the hardware results had to be corrected in software ## Since Java 1.2, the keyword *strictfp* is used to insist on identical results on all machines, but this makes floating point very slow on a Pentium
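The sketch below shows where `strictfp` goes and one case where strict 64-bit evaluation is visible: an intermediate product that overflows the double range. The class `StrictOverflow` and the method `scaleUpThenDown` are hypothetical names for this example. Note that from Java 17 onward all floating point evaluation is strict, so the modifier is redundant there and the compiler may print a warning.

```java
public strictfp class StrictOverflow {
    // Multiplies by 10 and divides by 10 again, as two separate operations.
    // Under strict evaluation the intermediate product must be a 64-bit
    // double, so it overflows to Infinity if 'a' is large enough.
    static double scaleUpThenDown(double a) {
        return a * 10.0 / 10.0;
    }

    public static void main(String[] args) {
        System.out.println(scaleUpThenDown(1.0));     // 1.0
        System.out.println(scaleUpThenDown(1.0e308)); // Infinity
    }
}
```

Without strict evaluation, an implementation that kept the intermediate product in a register with a wider exponent range could avoid the overflow and return a finite value instead, which is the kind of machine-dependent difference `strictfp` rules out.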