Statistics
Class FixedBinDataSource

java.lang.Object
  extended by Statistics.FixedBinDataSource
All Implemented Interfaces:
jas.hist.DataSource, jas.hist.HasStyle, jas.hist.Rebinnable1DHistogramData
Direct Known Subclasses:
RandomDataSource

public abstract class FixedBinDataSource
extends java.lang.Object
implements jas.hist.Rebinnable1DHistogramData, jas.hist.HasStyle

A non-rebinnable container for data implementing the interface jas.hist.Rebinnable1DHistogramData defined in the JAS library. The class reports progress on filling the bins to a progress bar displayed in a JFrame window (updated 200 times). Thus this process takes 200 times as long as it takes for the window to pop up.

This class provides the link to the histogram plotting capabilities of the JAS library. Instances are intended to be passed as parameters to the constructor of the class jas.hist.JASHist to be able to use the JAS plot widget to plot histograms of the data set. Simple histogram plotting is provided.

Binning data: Partition the interval [min,max) into nBins subintervals I_j (left closed, right open) of equal length (bin width). Data values are retrieved only by the constructor using the method nextSampleValue() (which you define either in the body of the constructor call or by subclassing).

If the data value x is observed, the index j with $x\in I_j$ is determined and the bin count binHeight[j] is increased by one. After all the data have been retrieved and the bin counts (and hence the empirical bin probabilities) established, further operations such as smoothing or normalization can be performed on the binHeights.
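
The binning step described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the class name BinIndexSketch and both method names are hypothetical, and the guard against floating-point rounding at the upper edge is an assumption.

```java
final class BinIndexSketch {

    /** Returns the index j with x in I_j, or -1 if x lies outside [min, max). */
    static int binIndex(double x, double min, double max, int nBins) {
        if (x < min || x >= max) {
            return -1; // outside the binned range: the value is disregarded
        }
        double binWidth = (max - min) / nBins;
        int j = (int) ((x - min) / binWidth);
        // Assumption: clamp in case rounding pushes j up to nBins.
        return Math.min(j, nBins - 1);
    }

    /** Counts how many sample values fall into each of the nBins subintervals. */
    static double[] binCounts(double[] sample, double min, double max, int nBins) {
        double[] binHeight = new double[nBins];
        for (double x : sample) {
            int j = binIndex(x, min, max, nBins);
            if (j >= 0) {
                binHeight[j] += 1.0;
            }
        }
        return binHeight;
    }

    public static void main(String[] args) {
        double[] sample = {0.05, 0.25, 0.26, 0.99, 1.0, -0.1};
        double[] heights = binCounts(sample, 0.0, 1.0, 4);
        System.out.println(java.util.Arrays.toString(heights));
    }
}
```

Because only the counts are kept, the sample values themselves never need to be stored when min and max are known in advance.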

The qualifier non-rebinnable means that the JAS plot widget cannot determine the number of bins displayed and must use the number nBins supplied to the constructor. Consequently, we do not have to store the data themselves, which saves memory.

Initialization: Binning the data is done by the initBinCounts methods and requires a definition of the method nextSampleValue. Occasionally a subclass must initialize fields that nextSampleValue depends on before the data can be binned. Such a subclass has to call the superclass constructor that does not bin the data, perform the necessary subclass initializations, and then call super.initBinCounts to bin the data.
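
The two-phase initialization pattern can be illustrated with a self-contained stand-in. The sketch below does not use FixedBinDataSource or the JAS library; TwoPhaseSourceSketch and ScaledUniformSourceSketch are hypothetical classes that only mimic the order of operations described above (non-binning constructor, subclass field initialization, then initBinCounts).

```java
abstract class TwoPhaseSourceSketch {
    protected final int nSamples;
    protected final double[] binHeights;
    protected double min; // left unset by the constructor, as in the "no data binning" case
    protected double max;

    /** Stand-in for the constructor that allocates the bins but does not bin the data. */
    protected TwoPhaseSourceSketch(int nSamples, int nBins) {
        this.nSamples = nSamples;
        this.binHeights = new double[nBins];
    }

    public abstract double nextSampleValue();

    /** Bins nSamples values; requires min, max and all subclass state to be set. */
    protected void initBinCounts() {
        double binWidth = (max - min) / binHeights.length;
        for (int i = 0; i < nSamples; i++) {
            double x = nextSampleValue();
            if (x >= min && x < max) {
                int j = Math.min((int) ((x - min) / binWidth), binHeights.length - 1);
                binHeights[j] += 1.0;
            }
        }
    }
}

final class ScaledUniformSourceSketch extends TwoPhaseSourceSketch {
    private final java.util.Random rng; // needed by nextSampleValue()
    private final double scale;

    ScaledUniformSourceSketch(int nSamples, int nBins, double scale, long seed) {
        super(nSamples, nBins);                // step 1: no binning yet
        this.rng = new java.util.Random(seed); // step 2: subclass initialization
        this.scale = scale;
        this.min = 0.0;
        this.max = scale;
        super.initBinCounts();                 // step 3: now it is safe to bin
    }

    @Override
    public double nextSampleValue() {
        return scale * rng.nextDouble();
    }

    double totalCount() {
        double total = 0.0;
        for (double h : binHeights) {
            total += h;
        }
        return total;
    }
}
```

If the superclass constructor binned the data itself, nextSampleValue would run while rng was still null, which is the situation this pattern avoids.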

Memory: If the parameters min, max are handed to the constructor, the data do not have to be stored at any time. If these parameters have to be computed from the data sample, the sample is stored in an array during construction, but the array can be garbage collected once the constructor call completes.


Field Summary
 
Fields inherited from interface jas.hist.DataSource
DATE, DELTATIME, DOUBLE, INTEGER, STRING
 
Constructor Summary
FixedBinDataSource(double full_Init, java.lang.String name, java.lang.String xAxisLabel, int nSamples, int nBins, boolean smoothBinHeights, boolean normalizeArea)
          Full initialization.
FixedBinDataSource(java.lang.String name, java.lang.String xAxisLabel, int nSamples, int nBins, boolean smoothBinHeights, boolean normalizeArea)
          No data binning.
FixedBinDataSource(java.lang.String name, java.lang.String xAxisLabel, int nSamples, int nBins, boolean smoothBinHeights, boolean normalizeArea, double min, double max)
          Full initialization.
FixedBinDataSource(java.lang.String name, java.lang.String xAxisLabel, int nSamples, int nBins, boolean smoothBinHeights, boolean normalizeArea, double p, double q, double dummy)
          Full initialization.
 
Method Summary
 void displayHistogram()
          Displays histogram of data set in a JFrame window.
 void displayHistogram(java.lang.String filename, int filetype)
          Displays histogram of data set in a JFrame window and saves the histogram.
 double[] get_data()
          The array containing the data in increasing order.
 boolean get_normalizeArea()
          Flag set in constructor indicating whether the area under the histogram will be normalized to one.
 int get_nSamples()
          Current size of data sample.
 boolean get_smoothBinHeights()
          Flag set in constructor indicating whether binHeights are smoothed before histogramming.
 java.lang.String[] getAxisLabels()
          Label displayed on histogram x-axis.
 int getAxisType()
           
 int getBins()
          Number of bins when plotting histograms.
 double getMax()
          Maximum of the data range to be binned.
 double getMin()
          Minimum of the data range to be binned.
 boolean getProgressIsReported()
          Flag set in constructor indicating whether a progress bar reports progress on filling the bins.
 jas.hist.JASHistStyle getStyle()
          Communicates the histogram style to the JAS plot widget.
 java.lang.String getTitle()
          Title displayed on histogram
 jas.hist.JASHist histogram()
          Returns a histogram of type jas.hist.JASHist of the data set.
protected  void initBinCounts()
          Bins a sample of size nSamples into nBins bins.
protected  void initBinCounts(boolean all)
          Bins a sample of size nSamples into nBins bins.
protected  void initBinCounts(double p, double q)
          Bins a sample of size nSamples into nBins bins.
 boolean isRebinnable()
          Returns false, which tells the JAS plot widget that the current number of bins must be used.
static void main(java.lang.String[] args)
          Test program.
abstract  double nextSampleValue()
          The next value from the data sample
 double[][] rebin(int bins, double min, double max, boolean wantErrors, boolean hurry)
          Returns the array of bin heights.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FixedBinDataSource

public FixedBinDataSource(java.lang.String name,
                          java.lang.String xAxisLabel,
                          int nSamples,
                          int nBins,
                          boolean smoothBinHeights,
                          boolean normalizeArea)

No data binning. Fields min, max not initialized, array binHeights allocated but not initialized.

Parameters:
name - title shown on histogram (descriptive of data set).
xAxisLabel - label shown on histogram x-axis.
nSamples - size of the data sample.
nBins - number of bins.
smoothBinHeights - see smoothing.
normalizeArea - see normalizing.

FixedBinDataSource

public FixedBinDataSource(double full_Init,
                          java.lang.String name,
                          java.lang.String xAxisLabel,
                          int nSamples,
                          int nBins,
                          boolean smoothBinHeights,
                          boolean normalizeArea)
Full initialization. Minimum/maximum of the data range binned are the actual minimum/maximum of the sample. This needs memory, since the sample must be stored during construction to compute them.

Parameters:
full_Init - dummy variable to differentiate parameter signature. Assign any value.
name - title shown on histogram (descriptive of data set).
xAxisLabel - label shown on histogram x-axis.
nSamples - size of the data sample.
nBins - number of bins.
smoothBinHeights - see smoothing.
normalizeArea - see normalizing.

FixedBinDataSource

public FixedBinDataSource(java.lang.String name,
                          java.lang.String xAxisLabel,
                          int nSamples,
                          int nBins,
                          boolean smoothBinHeights,
                          boolean normalizeArea,
                          double min,
                          double max)

Full initialization. Minimum/maximum of the data range to be binned are user defined. Data values outside [min,max] are disregarded. This needs very little memory.

Parameters:
name - title shown on histogram (descriptive of data set).
xAxisLabel - label shown on histogram x-axis.
nSamples - size of the data sample.
nBins - number of bins.
smoothBinHeights - see smoothing.
normalizeArea - see normalizing.
min - only data in range [min,max] binned.
max - only data in range [min,max] binned.

FixedBinDataSource

public FixedBinDataSource(java.lang.String name,
                          java.lang.String xAxisLabel,
                          int nSamples,
                          int nBins,
                          boolean smoothBinHeights,
                          boolean normalizeArea,
                          double p,
                          double q,
                          double dummy)

Full initialization. Minimum/maximum of the data range histogrammed are chosen so that the 100p% smallest and 100q% largest (0 < p, q < 1) data values are disregarded. This needs memory, since the sample must be stored during construction.

Note: the largest and smallest data values are always discarded.

Parameters:
name - title shown on histogram (descriptive of data set).
xAxisLabel - label shown on histogram x-axis.
nSamples - size of the data sample.
nBins - number of bins.
smoothBinHeights - see smoothing.
normalizeArea - see normalizing.
p - smallest 100p% (0 < p < 1) of values disregarded in histograms.
q - largest 100q% (0 < q < 1) of values disregarded in histograms.
dummy - parameter used to differentiate parameter signature. Assign any value.
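
One way the trimmed range could be computed is sketched below. The exact rule used by the class is not documented here, so the rounding (floor of p*n and q*n), the choice of the first and last retained values as min and max, and the names TrimmedRangeSketch and trimmedRange are all assumptions. The sketch does follow the note above in always dropping at least one value at each end.

```java
final class TrimmedRangeSketch {

    /**
     * Returns {min, max} after dropping roughly the 100p% smallest and
     * 100q% largest values. Assumption: at least one value is dropped
     * at each end, and min/max are the extreme values that remain.
     */
    static double[] trimmedRange(double[] sample, double p, double q) {
        if (!(p > 0 && p < 1 && q > 0 && q < 1 && p + q < 1)) {
            throw new IllegalArgumentException("need 0 < p, q < 1 and p + q < 1");
        }
        double[] sorted = sample.clone(); // this copy is why the variant needs memory
        java.util.Arrays.sort(sorted);
        int n = sorted.length;
        int dropLow = Math.max(1, (int) Math.floor(p * n));
        int dropHigh = Math.max(1, (int) Math.floor(q * n));
        if (dropLow + dropHigh >= n) {
            throw new IllegalArgumentException("sample too small for the requested trimming");
        }
        return new double[] {sorted[dropLow], sorted[n - 1 - dropHigh]};
    }

    public static void main(String[] args) {
        double[] sample = {8, 1, 7, 2, 6, 3, 5, 4};
        double[] range = trimmedRange(sample, 0.25, 0.125);
        System.out.println(range[0] + " .. " + range[1]);
    }
}
```

Trimming in this way keeps a few extreme outliers from stretching the histogram range and leaving most bins empty.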
Method Detail

get_nSamples

public int get_nSamples()
Current size of data sample.


get_data

public double[] get_data()
The array containing the data in increasing order.


get_smoothBinHeights

public boolean get_smoothBinHeights()

Flag set in constructor indicating whether binHeights are smoothed before histogramming.

Smoothing: Neighboring binHeights are averaged. Smoothing destroys information about point masses. Use only if the distribution is known to be absolutely continuous (i.e., to have a density). Smoothing can distort the shape of the histogram if the number of bins is too small. Therefore only histograms with at least 100 bins are smoothed. Smoothing has no effect on histograms with fewer bins.

Intended use: eliminate raggedness from the shape of the histogram.
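
A minimal sketch of such a smoothing step follows. The text only states that neighboring bin heights are averaged and that fewer than 100 bins are left untouched; the three-bin window, the treatment of the edge bins, and the names SmoothingSketch and smooth are assumptions made for illustration.

```java
final class SmoothingSketch {

    /** Threshold taken from the description: fewer bins are never smoothed. */
    static final int MIN_BINS_FOR_SMOOTHING = 100;

    /**
     * Returns a smoothed copy of binHeights. Assumption: each bin is replaced
     * by the average of itself and its immediate neighbors; edge bins average
     * over the neighbors that exist.
     */
    static double[] smooth(double[] binHeights) {
        int n = binHeights.length;
        if (n < MIN_BINS_FOR_SMOOTHING) {
            return binHeights.clone(); // no effect on small histograms
        }
        double[] smoothed = new double[n];
        for (int j = 0; j < n; j++) {
            int lo = Math.max(0, j - 1);
            int hi = Math.min(n - 1, j + 1);
            double sum = 0.0;
            for (int k = lo; k <= hi; k++) {
                sum += binHeights[k];
            }
            smoothed[j] = sum / (hi - lo + 1);
        }
        return smoothed;
    }
}
```

Applied to a histogram whose only nonzero bin has height 3, this spreads the mass over three adjacent bins of height 1, which shows how a point mass is blurred by smoothing.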


getProgressIsReported

public boolean getProgressIsReported()

Flag set in constructor indicating whether a progress bar reports progress on filling the bins.


get_normalizeArea

public boolean get_normalizeArea()

Flag set in constructor indicating whether the area under the histogram will be normalized to one.

Normalizing: Bin heights are adjusted to make the area under the histogram equal to one (mass of a probability distribution).
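
The normalization can be sketched as follows, assuming equal-width bins over [min,max) so that the area under the histogram is the sum of the bin heights times the bin width. The names NormalizeSketch and normalizeArea are hypothetical, and returning an unchanged copy for an empty histogram is an assumption.

```java
final class NormalizeSketch {

    /** Rescales binHeights so that sum(height * binWidth) equals one. */
    static double[] normalizeArea(double[] binHeights, double min, double max) {
        double binWidth = (max - min) / binHeights.length;
        double area = 0.0;
        for (double h : binHeights) {
            area += h * binWidth;
        }
        if (area == 0.0) {
            return binHeights.clone(); // assumption: nothing to normalize
        }
        double[] normalized = new double[binHeights.length];
        for (int j = 0; j < binHeights.length; j++) {
            normalized[j] = binHeights[j] / area;
        }
        return normalized;
    }

    public static void main(String[] args) {
        double[] heights = normalizeArea(new double[] {1.0, 2.0, 1.0}, 0.0, 3.0);
        System.out.println(java.util.Arrays.toString(heights));
    }
}
```

After normalization the bin heights approximate a probability density rather than raw counts, so histograms of samples of different sizes can be compared directly.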


initBinCounts

protected void initBinCounts(boolean all)

Bins a sample of size nSamples into nBins bins. Minimum and maximum of the data range to be binned are the actual sample minimum and maximum.

The fields nSamples, nBins must have been set and the array binHeights allocated; min, max are computed.

Parameters:
all - dummy variable to differentiate parameter signatures. Assign any value.

initBinCounts

protected void initBinCounts()

Bins a sample of size nSamples into nBins bins. Minimum and maximum of the data range to be binned are user defined.

The fields nSamples, nBins, min, max must have been set and the array binHeights allocated.


initBinCounts

protected void initBinCounts(double p,
                             double q)

Bins a sample of size nSamples into nBins bins. Minimum and maximum of the data range to be binned are chosen by dropping the 100p% smallest and 100q% largest values (0 < p, q < 1).

The fields nSamples, nBins must have been set and the array binHeights allocated; min, max are computed.

Parameters:
p - 100p% smallest values dropped (0 < p < 1).
q - 100q% largest values dropped (0 < q < 1).

nextSampleValue

public abstract double nextSampleValue()
The next value from the data sample.


isRebinnable

public boolean isRebinnable()

Returns false, which tells the JAS plot widget that the current number of bins must be used.

Specified by:
isRebinnable in interface jas.hist.Rebinnable1DHistogramData

getTitle

public java.lang.String getTitle()
Title displayed on histogram

Specified by:
getTitle in interface jas.hist.DataSource

getAxisLabels

public java.lang.String[] getAxisLabels()
Label displayed on histogram x-axis.

Specified by:
getAxisLabels in interface jas.hist.Rebinnable1DHistogramData

getAxisType

public int getAxisType()
Specified by:
getAxisType in interface jas.hist.Rebinnable1DHistogramData

getBins

public int getBins()
Number of bins when plotting histograms.

Specified by:
getBins in interface jas.hist.Rebinnable1DHistogramData

getMax

public double getMax()
Maximum of the data range to be binned.

Specified by:
getMax in interface jas.hist.Rebinnable1DHistogramData

getMin

public double getMin()
Minimum of the data range to be binned.

Specified by:
getMin in interface jas.hist.Rebinnable1DHistogramData

rebin

public double[][] rebin(int bins,
                        double min,
                        double max,
                        boolean wantErrors,
                        boolean hurry)
Returns the array of bin heights. The parameters, which in general specify the new bin dimensions, are ignored because this data source is not rebinnable (see isRebinnable()).

Specified by:
rebin in interface jas.hist.Rebinnable1DHistogramData

getStyle

public jas.hist.JASHistStyle getStyle()
Communicates the histogram style to the JAS plot widget. We use the default and turn off the error bars.

Specified by:
getStyle in interface jas.hist.HasStyle

histogram

public jas.hist.JASHist histogram()

Returns a histogram of type jas.hist.JASHist of the data set.


displayHistogram

public void displayHistogram()

Displays histogram of data set in a JFrame window.


displayHistogram

public void displayHistogram(java.lang.String filename,
                             int filetype)

Displays histogram of data set in a JFrame window and saves the histogram.

Parameters:
filename - filename path (all directories must exist already)
filetype - currently only Flag.EPS (the default).

main

public static void main(java.lang.String[] args)
Test program. Displays smoothed histogram of a standard normal sample of size 200,000 discarding the 0.01% largest and smallest values.