Is it possible to map discontinuous data on disk to an array with Python?
I want to map a big Fortran record (12 GB) on hard disk to a numpy array (mapping instead of loading, to save memory).
The data stored in the Fortran record is not contiguous: it is divided by record markers. The record structure is "marker, data, marker, data, ..., data, marker". The lengths of the data regions and of the markers are known.
The length of the data between markers is not a multiple of 4 bytes; otherwise I could map each data region to its own array.
The first marker can be skipped by setting the offset in memmap. Is it possible to skip the other markers as well and map the data to an array?
Apologies for any ambiguous expression, and thanks for any solution or suggestion.
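If the region lengths *were* multiples of the element size, the marker-skipping idea could be sketched as below: one `np.memmap` per data region, advancing a byte offset past each marker. This is an illustrative sketch, not from the original post; it assumes a single 4-byte marker between regions (the "marker, data, marker, ..." layout above) and that `map_records`, `region_lengths`, and `marker_bytes` are hypothetical names.

```python
import numpy as np

def map_records(path, region_lengths, dtype=np.float32, marker_bytes=4):
    """Map each data region of a marker-delimited file to its own
    read-only numpy array, skipping the markers via `offset`.
    Assumes each region length is a multiple of the dtype's itemsize."""
    itemsize = np.dtype(dtype).itemsize
    views, offset = [], 0
    for n_bytes in region_lengths:
        offset += marker_bytes          # skip the marker before this region
        views.append(np.memmap(path, dtype=dtype, mode='r',
                               offset=offset, shape=(n_bytes // itemsize,)))
        offset += n_bytes               # advance past the region itself
    return views
```

Each returned view is backed by the file, so nothing is loaded until the elements are actually touched.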
Edited May 15:
These are Fortran unformatted files. The data stored in the record is a (1024^3)*3 float32 array (12 GB).
The record layout of variable-length records greater than 2 gigabytes is as follows (for details see here -> section [Record Types] -> [Variable-Length Records]):
In this case, except for the last one, each subrecord has a length of 2147483639 bytes, and adjacent subrecords are separated by 8 bytes (the end marker of the previous subrecord plus the begin marker of the following one, 4 bytes each).
Since 2147483639 mod 4 = 3, the first subrecord ends with the first 3 bytes of a float number, and the second subrecord begins with the remaining 1 byte.
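Because float values straddle the subrecord boundaries, no per-subrecord memmap can yield aligned float32 views; the payload bytes have to be stitched back together, which means a copy. A minimal sketch of that stitching (the function name, `total_bytes`, and the 4-byte marker size are assumptions, not from the original post; for the real 12 GB file the output buffer would itself need to be a writable memmap rather than an in-memory array):

```python
import numpy as np

def read_split_record(path, total_bytes, max_sub=2147483639, marker_bytes=4):
    """Reassemble one Fortran variable-length record whose payload is
    split into subrecords of at most `max_sub` bytes.  The payload bytes
    are copied into one contiguous buffer, then viewed as float32."""
    raw = np.memmap(path, dtype=np.uint8, mode='r')
    out = np.empty(total_bytes, dtype=np.uint8)
    src = marker_bytes                    # skip the first begin marker
    dst = 0
    while dst < total_bytes:
        n = min(max_sub, total_bytes - dst)
        out[dst:dst + n] = raw[src:src + n]
        dst += n
        src += n + 2 * marker_bytes       # skip end marker + next begin marker
    return out.view(np.float32)
```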
I posted this answer because the example given for numpy.memmap
worked:

offset = 0
data1 = np.memmap('tmp', dtype='i', mode='r+', order='F', offset=0, shape=(size1,))
offset += size1*byte_size
data2 = np.memmap('tmp', dtype='i', mode='r+', order='F', offset=offset, shape=(size2,))
offset += size2*byte_size
data3 = np.memmap('tmp', dtype='i', mode='r+', order='F', offset=offset, shape=(size3,))
For int32, byte_size = 32/8 = 4; for int16, byte_size = 16/8 = 2; and so forth.
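Rather than hard-coding byte_size, numpy can report the per-element byte count directly:

```python
import numpy as np

# itemsize gives the element size in bytes for any dtype:
assert np.dtype('int32').itemsize == 4
assert np.dtype('int16').itemsize == 2
assert np.dtype('float32').itemsize == 4
```

Using `np.dtype(...).itemsize` keeps the offset arithmetic correct if the dtype ever changes.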
If the sizes are constant, you can load the data into a 2D array, like:

shape = (total_length//size, size)
data = np.memmap('tmp', dtype='i', mode='r+', order='F', shape=shape)
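A small self-contained version of the 2D mapping (the file, `size`, and `total_length` here are toy assumptions; note that with the default `order='C'` consecutive file bytes fill rows, whereas `order='F'` would fill columns):

```python
import numpy as np, tempfile

# Toy file of 12 int32 values, mapped as 3 rows of 4.
path = tempfile.mktemp()
np.arange(12, dtype='i').tofile(path)

size, total_length = 4, 12
data = np.memmap(path, dtype='i', mode='r',
                 shape=(total_length // size, size))
# data[1] is the second row, read lazily from disk.
```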
You can create as many memmap
objects as you want. It is even possible to make arrays sharing the same elements; in that case, changes made in one are automatically reflected in the other.
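That sharing can be demonstrated with two memmaps over the same file (a sketch with assumed toy sizes; on a shared mapping, a write through one view is visible through the other because both are backed by the same file pages):

```python
import numpy as np, tempfile

path = tempfile.mktemp()
np.zeros(8, dtype='i').tofile(path)

# Two independent memmaps over the same bytes of the same file.
view_a = np.memmap(path, dtype='i', mode='r+', shape=(8,))
view_b = np.memmap(path, dtype='i', mode='r+', shape=(8,))

view_a[3] = 42          # write through one view ...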