gretl native data write/read

The numbers below were generated using gretl CVS of 2014-01-07 on a simulated panel dataset with 60000 observations on 1200 variables. Around 70 percent of the rows contained nothing but missing values, simulating a seriously unbalanced panel. Below the table is one variant of the hansl script used.

Even the text write and read times shown below are much faster than those I posted earlier. That is primarily because of the faster mechanism for determining the optimal precision-preserving format, mentioned in a previous posting.

The results

          write(secs)   ratio  read(secs)   ratio  size(bytes)   ratio  ratio/max
Binary, with skip-padding
zip = 0          1.45   1.000        0.79   1.000    171228064   1.000      0.297
zip = 1          2.41   1.664        1.17   1.470     23394299   0.137      0.041
zip = 3          2.49   1.716        1.09   1.369     20941042   0.122      0.036
zip = 6          4.99   3.444        1.07   1.350     18934804   0.111      0.033
Binary, without skip-padding
zip = 0          0.54   1.000        0.17   1.000    576960000   1.000      1.000
zip = 1          3.67   6.828        1.45   8.513     58454875   0.101      0.101
zip = 3          4.15   7.725        1.15   6.770     46591059   0.081      0.081
zip = 6         11.24  20.925        1.10   6.493     39957550   0.069      0.069
Text, with skip-padding
zip = 0          7.99   1.000        2.07   1.000     75458165   1.000      0.131
zip = 1          8.75   1.095        2.37   1.146     26753320   0.355      0.046
zip = 3          9.77   1.223        2.36   1.141     24811108   0.329      0.043
zip = 6         12.46   1.560        2.35   1.138     22952497   0.304      0.040
Text, without skip-padding
zip = 0          8.73   1.000        3.73   1.000    228093263   1.000      0.395
zip = 1         10.06   1.153        4.18   1.123     29436342   0.129      0.051
zip = 3         10.82   1.240        4.20   1.126     27671605   0.121      0.048
zip = 6         13.57   1.555        4.19   1.126     24741012   0.108      0.043
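
As a reading aid for the ratio columns, here is how the figures appear to be computed. This is inferred from the numbers themselves, not stated in the text. Within each block, the three "ratio" columns are relative to that block's zip = 0 row, while ratio/max is the file size divided by the largest file in the table, the uncompressed binary file without skip-padding. That size, 576960000 bytes, equals 60000 x 1202 x 8, which would be consistent with 8-byte doubles for 1202 series (the 1200 generated ones plus the constant and skip; that breakdown is an assumption). A minimal hansl sketch of the arithmetic, using illustrative variable names:

<hansl>
# assumed layout: 60000 rows, 1202 series, 8 bytes per value
scalar n_obs = 60000
scalar n_series = 1202
scalar raw = n_obs * n_series * 8
printf "raw size: %d bytes\n", raw
# ratio/max for binary with skip-padding, zip = 0 and zip = 6
printf "ratio/max, zip = 0: %.3f\n", 171228064 / raw
printf "ratio/max, zip = 6: %.3f\n", 18934804 / raw
# within-block size ratio for zip = 6 (relative to the zip = 0 row)
printf "size ratio, zip = 6: %.3f\n", 18934804 / 171228064
</hansl>

The size ratios can be reproduced exactly from the table. The timing ratios can differ in the last digits when recomputed from the two-decimal timings shown (for example 4.99 / 1.45 gives 3.441 against the tabulated 3.444), presumably because they were computed from unrounded times.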

The script

<hansl>
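# empty dataset with 60000 observations
nulldata 60000
# panel structure: stacked time series, 20 periods per unit
setobs 20 1.1 --stacked-time-series
# fix the seed so the simulated data are reproducible
set seed 786553
# 100 continuous series
loop i=1..100 -q
  series x$i = normal()
endloop
# 500 binary (0/1) series
loop i=1..500 -q
  series d$i = uniform() > 0.5
endloop
# 600 further discrete series
loop i=1..600 -q
  series k$i = uniform() > 0.3 + uniform() > 0.6
endloop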
# make about 70% of rows into padding
series skip = uniform() > 0.3
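# highest series ID ($nvars counts the constant, ID 0)
scalar nv = $nvars - 1
loop i=1..nv -q
  string vname = varname(i)
  # set the series to missing on the rows flagged by skip
  @vname = skip ? NA : @vname
endloop
# time the write
set stopwatch
# set gdt parameters here
store big-panel.gdt --binary --gzipped=1
printf "store: %g secs\n", $stopwatch
# drop the dataset from memory, then time reading it back
clear
set stopwatch
open big-panel.gdt -q
printf "open: %g secs\n", $stopwatch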
</hansl>

Allin Cottrell
2014-01-07