Reading CSV Data

Introduction

A request has been made, by a combination of Ross, Curtis and Wendy Couch, to present CSV data (from the fieldsite) in an easier to read format, and preferably in a format that can be read in Matlab. The reminder of this page discusses:

Original format

The data was provided on a disk. Each measured variable has a value for every minute - except where there is missing data. Each file stores the data for one day and has two columns: time to the nearest minute and the measured value. The daily files are stored in the following directory structure: Chirag/<year>/<month>/<day>.

Conversion

I decided the best option was to store the data as in a netCDF file - which I've just called csc.nc. NetCDF data can be fairly easily retrieved in both IDL and Matlab with the required extensions (see notes on Matlab below for how). ASCII data would be significantly larger and we would have to carefull about how it was formatted.

Some of the variable names contain the left and right brackets, such as B_WD_Dirn (10min) True.csv, which causes all sort of problems trying to manipulate them in a unix environment. The only thing I was able to do with them is list them (ls), anything else, such as more, produced and input/output error.

One of the few options remaining was to read the data in with a Fortran program. This was done with the program /home/lem/data/csv/csv2netcdf.f. There was some problems with reading in the data from some individual files, but this went away when the disk was removed and put back into the computer.

Rather than storing each measurement in one large netCDF array, it is easier to identify when data was created if it is broken into time periods. After discussing this with Wendy, we decided to go with storing the arrays in the form (mins, hours, day of year, year). The dimension of day of year must run to 366 to allow for leap year, but the 366th element of this dimension will be empty for the other years.

Contents of file

Some of the measurements either had no values or all the values for zero and these variables were not including in the netCDF file. While it was clear what variables were needed now, it wasn't clear what variables might be needed in the future, so all the other variables have been included in csv.nc.

I didn't find a way in Matlab of listing the variables available, so probably easiest just to use ncdump from the command line, .i.e.

demo$ ncdump -h csv.nc
netcdf csv2 {
dimensions:
        MinsInHour = 60 ;
        HoursInDay = 24 ;
        DaysInLeapYear = 366 ;
        nYears = 6 ;
variables:
        int Month(nYears, DaysInLeapYear) ;
                Month:long_name = "Month of Year" ;
        int MinsSince2000(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
        float B_WD_Dirn_10min_True(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                B_WD_Dirn_10min_True:_FillValue = -999.9f ;
        float B_WD_Dirn_2min_raw(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                B_WD_Dirn_2min_raw:_FillValue = -999.9f ;
        float B_WD_Gust_10min(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                B_WD_Gust_10min:_FillValue = -999.9f ;
        float B_WD_Speed_10min(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                B_WD_Speed_10min:_FillValue = -999.9f ;
        float B_WD_Speed_2min(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                B_WD_Speed_2min:_FillValue = -999.9f ;
        float C_AT_Air_Temp_10min(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                C_AT_Air_Temp_10min:_FillValue = -999.9f ;
        float C_AT_Humidity_HumiCap(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                C_AT_Humidity_HumiCap:_FillValue = -999.9f ;
        float C_AT_QNH_10min(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                C_AT_QNH_10min:_FillValue = -999.9f ;
        float E_WV_Average_Height(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                E_WV_Average_Height:_FillValue = -999.9f ;
        float E_WV_Direction_Spread(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                E_WV_Direction_Spread:_FillValue = -999.9f ;
        float E_WV_H10(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                E_WV_H10:_FillValue = -999.9f ;
        float E_WV_Max_Waveheight(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                E_WV_Max_Waveheight:_FillValue = -999.9f ;
        float E_WV_Mean_Mag_Direction(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                E_WV_Mean_Mag_Direction:_FillValue = -999.9f ;
        float E_WV_Mean_Wave_Period(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                E_WV_Mean_Wave_Period:_FillValue = -999.9f ;
        float E_WV_Peak_Period(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                E_WV_Peak_Period:_FillValue = -999.9f ;
        float E_WV_Sig_Waveheight(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                E_WV_Sig_Waveheight:_FillValue = -999.9f ;
        float E_WV_Significant_Period(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                E_WV_Significant_Period:_FillValue = -999.9f ;
        float E_WV_T10(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                E_WV_T10:_FillValue = -999.9f ;
        float E_WV_Tp_5(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                E_WV_Tp_5:_FillValue = -999.9f ;
        float E_WV_Zero_Crossing(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                E_WV_Zero_Crossing:_FillValue = -999.9f ;
        float X_WV_AcqInt(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                X_WV_AcqInt:_FillValue = -999.9f ;
        float X_WV_ActDur(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                X_WV_ActDur:_FillValue = -999.9f ;
        float X_WV_ChargeI(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                X_WV_ChargeI:_FillValue = -999.9f ;
        float X_WV_HMo(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                X_WV_HMo:_FillValue = -999.9f ;
        float X_WV_MagVar(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                X_WV_MagVar:_FillValue = -999.9f ;
        float X_WV_MeanTrueDir(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                X_WV_MeanTrueDir:_FillValue = -999.9f ;
        float X_WV_MsgTimeUTC(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                X_WV_MsgTimeUTC:_FillValue = -999.9f ;
        float X_WV_SampRate(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                X_WV_SampRate:_FillValue = -999.9f ;
        float X_WV_Samples(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                X_WV_Samples:_FillValue = -999.9f ;
        float X_WV_SeaTemp(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                X_WV_SeaTemp:_FillValue = -999.9f ;
        float X_WV_SysVolts(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                X_WV_SysVolts:_FillValue = -999.9f ;
        float X_WV_TxInt(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                X_WV_TxInt:_FillValue = -999.9f ;
        float X_WV_TxPower(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                X_WV_TxPower:_FillValue = -999.9f ;
        float X_WV_Tz(nYears, DaysInLeapYear, HoursInDay, MinsInHour) ;
                X_WV_Tz:_FillValue = -999.9f ;

// global attributes:
                :Start_date = "01/08/2003" ;
                :End_date = "18/06/2008" ;
}
All the missing values for the measurements are -999.9, so these will have be filtered out before plotting.

The time is given as minutes since the start of 2000, given by the variable MinsSince2000. It does produce some large numbers, but provides a time axis that can be used for plots.

Also, I've added a Date array. Given the day of year and the year it provides the day of month, month and year. For example if you want to know the date for the 190th day in 2004, which is the 2nd year of the data, it should be given by Date(1, 190, 2) / Date(2, 190, 2) / Date(3, 190, 3) - which should be 8/7/2004.

Matlab

I've hardly used Matlab before (I use IDL), so these notes are sketchy.

Setting up Matlab for reading netCDF file

The instructions are provided on IT@Met matlab page, on require:

demo$ setup matlab
demo$ matlab
>> addpath /opt/graphics/matlab_tools/netcdf_toolbox
>> ncstartup

To load in all the variables:

>> ncload('csv.nc')

Plot data

As far as I can tell, before the data can be plotted it needs to be stored in a 1D array. So for example to plot B_WD_Dirn_10min_True against the first 30 days of the first full year (2nd year, in this case 2004):

>> x=squeeze(reshape(MinsSince2000(:,:,1:30,2),[24*60*30,1]));
>> y=squeeze(reshape(B_WD_Dirn_10min_True(:,:,1:30,2),[24*60*30,1]));
>> ValidValues=find(y > -999);
>> x=(x-min(x))/(24*60);
>> plot(x(ValidValues),y(ValidValues))
where
  • I've used squeeze and reshape to get the data into 1D arrays (there is probably a better way - but I don't know it).
  • I've created an array which references all the values that aren't the fill value of -999.9, i.e. the values greater than -999.
  • I've converted the x axis into days since the start of the year by removing the time at the start of year and dividing by the minutes in a day.

Plotting one variable against another

For example to plot wind speed against wind direction for 3rd year.

>> x=squeeze(reshape(B_WD_Dirn_10min_True(:,:,:,3),[60*24*366,1]));
>> y=squeeze(reshape(B_WD_Speed_10min(:,:,:,3),[60*24*366,1]));
>> index=find(y > -999);
>> plot(x(index),y(index))

Setting the x-range

It's possible to set the x-range with the command XLim. For example if you want to find out where B_WD_Speed_10min is greater than say 20, it's probably easiest to consider each year at a time. In the example below we analyse data for the 3rd year (2005) and home in on part of the 134th day of 2005:

>> mins=squeeze(reshape(MinsSince2000(:,:,:,3),[60*24*366,1]));
>> y=squeeze(reshape(B_WD_Dirn_10min_True(:,:,:,3),[60*24*366,1]));
>> index=find(mins > 0);
>> mins_min=min(mins(index))
>> days=(mins - mins_min)/(24 * 60)+1;
>> index=find(y > -999);
>> plot(days(index),y(index));
>> XLim([125 150])
>> XLim([132 135])
>> XLim([134.5 134.54])
And 0.5 to 0.54 is roughly between 12:00 and 13:00, which on day 134 is 60 measurements given by B_WD_Dirn_10min_True(:,13,134,3).

Things to do now

Contact

Page navigation