A Guide to EDV - The Exploratory Data Visualizer

Author: Graham Wills
Version: 1.0
Last Updated: May 28, 1997


1. Basics

EDV is a tool for visual exploration of data. It reads tables of data and creates views of any combination of table columns. In each of these views there is the concept of a selection - a subset of the data that has been singled out by the user as different, interesting, unusual or significant. By using a number of views to show different aspects of the data and interacting with these views, a better understanding of the data can be built.

Basic 1: Get the Data into the Program

Data files are text (ASCII) files consisting of one or more columns of data forming a matrix. The first two lines of the file are special. The first line contains names for each column, and the second describes the type of data in the column. Types can be abbreviated to their initial letter. The possible types are:

Continuous and string data are very different, and plots designed for one often cannot be used for the other; integer data, however, can be treated as either. It is worth taking some care to make sure variables are given the correct type.

White space (spaces, tabs, newlines) separates fields, with one exception: a file consisting of a single string variable uses only newlines as separators, allowing lines of text to be read in as a variable.

Here is a sample input file:

        Name       Age       Score
        string     int       c
        Graham_J   22        33.3
        Fred_S     88        11.0
        Joe_J      35        12
        ...        ..        ..
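To make the layout concrete, here is a minimal Python sketch of a reader for this format. It is an illustration, not EDV's own reader: the function name `read_edv_text` is hypothetical, only the string, integer and continuous types are handled (matched by their initial letter), and column names are assumed to be unique.

```python
def read_edv_text(text):
    """Parse EDV-style text into {column name: list of values}."""
    lines = text.splitlines()
    names = lines[0].split()
    types = [t[0].lower() for t in lines[1].split()]   # a type may be abbreviated to its initial letter
    if len(names) != len(types):
        raise ValueError("the names line and the types line differ in length")
    converters = {"s": str, "i": int, "c": float}
    if any(t not in converters for t in types):
        raise ValueError("this sketch handles only string, integer and continuous columns")
    if len(names) == 1 and types[0] == "s":
        values = lines[2:]                      # lone string column: newline is the only separator
    else:
        values = " ".join(lines[2:]).split()    # otherwise any white space separates fields
    if len(values) % len(names):
        raise ValueError("the number of values is not a multiple of the number of columns")
    columns = {name: [] for name in names}
    for i, token in enumerate(values):
        column = i % len(names)
        columns[names[column]].append(converters[types[column]](token))
    return columns
```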

Multiple tables may be read into EDV, and multiple files may be merged to form one (or more) tables. All input files that have the same number of rows are considered to be part of the same table; their columns are effectively pasted together side by side.

Command Line

The EDV command line is:

edv [-D dictionary] [-I startup] table1 [table2 ...]

EDV may be launched with no parameters, and data tables may be read in interactively. Alternatively, one or more input data tables may be specified on the command line.

The "-D" parameter gives the name of a file containing a description of the columns in the table(s). If a dictionary is given and the "Dictionary View" is open (see below), then selecting a variable in the "Panel" (see below) scrolls the dictionary view to that variable's description. The first line of the dictionary file should be a regular expression pattern containing a "%s" token; the variable name is substituted for that token, and the resulting pattern is used to search the dictionary file.
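A sketch of how such a dictionary lookup could work, in Python. The function `find_description` and the sample dictionary are hypothetical; escaping the variable name so that it is matched literally is an assumption, not documented EDV behavior.

```python
import re


def find_description(dictionary_text, variable):
    """Return the 0-based line number of the first line describing `variable`, or None."""
    lines = dictionary_text.splitlines()
    template = lines[0]                     # first line: a regex containing "%s"
    # Assumed: the name is escaped, so names containing regex metacharacters match literally.
    pattern = re.compile(template.replace("%s", re.escape(variable)))
    for number, line in enumerate(lines[1:], start=1):
        if pattern.search(line):
            return number                   # the line the dictionary view would scroll to
    return None
```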

The "-I" parameter specifies a file containing textual commands that will be read and executed on startup, before the user has interactive access to the program. The scripting language is described below.

Initial Window

The initial window that appears when EDV starts is the "Panel"; it shows all variables as rectangles. The analysis proceeds by selecting variables and applying transforms or views to the selected variables.

Basic 2: Selecting Variables

Use the left mouse button and <CTRL>-right mouse button (or middle mouse button under Unix) to select variables. A variable selected with the left button turns grey and is termed an X variable. One selected with <CTRL>-right mouse button turns yellow and is termed a Y variable.

Some variables may have a type of "E"; such variables contain no data and cannot be used in views or computation.

Basic 3: Making views

Select the variables of interest and make views of them. A view is created for the selected variables by going to the pull-down "View" menu, or by depressing the right mouse button to display a pop-up menu of available views. If you make an invalid choice, the program will tell you what variables are required for the given plot.

You can make multiple X and/or Y variable selections. Some views accept multiple variables and display them all within a single view. Most, however, use a fixed number of variables; when you create such a view, the extra variables are used to create multiple instances of the view at once, as a convenience. In either case, variables are used in the order you selected them: the first variable you selected is used first, the second second, and so on.

The actions for the mouse buttons are:

Basic 4: Selection

In EDV, the concept of selection state is important: it is the main mechanism used to tie views together. Each table has an implicit extra variable, the selection state. This state can be modified by using selection tools within any of the views, and also by some menu commands. There are three possible selection states: normal, deleted, or selected.

The selection state applies to a row in the data table. So if we are analyzing baseball data and we use a histogram of the salaries to select highly paid players, a bar chart of the players' positions will immediately update to show which positions the selected players play. The selection of rows made in the histogram affects what is selected in every other view.

The topmost figure to the left shows this situation. Looking at the display, however, we see that there are too many 'unusual' positions and decide to remove them from the display. In the second figure we select the most common positions and then choose the pop-up menu item Delete unselected to produce the third figure. The histogram of salaries (not shown) also updates to exclude the deleted players, and we can select the highly paid players from it to arrive at figure four.
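The following Python sketch models the implicit selection-state column and replays the baseball example with a few invented rows. The names (`State`, `select_where`, `delete_unselected`, `bar_counts`) are hypothetical, and selecting in a view is assumed to replace the previous selection while leaving deleted rows deleted.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    SELECTED = "selected"
    DELETED = "deleted"


def select_where(values, states, predicate):
    """Replace the selection: non-deleted rows satisfying predicate become SELECTED."""
    return [s if s is State.DELETED
            else (State.SELECTED if predicate(v) else State.NORMAL)
            for v, s in zip(values, states)]


def delete_unselected(states):
    """The "Delete unselected" command: every NORMAL row becomes DELETED."""
    return [State.DELETED if s is State.NORMAL else s for s in states]


def bar_counts(categories, states):
    """What a bar chart draws: {category: (bar height, selected sub-bar height)}."""
    counts = {}
    for category, s in zip(categories, states):
        if s is State.DELETED:
            continue                         # deleted rows vanish from every view
        total, selected = counts.get(category, (0, 0))
        counts[category] = (total + 1, selected + int(s is State.SELECTED))
    return counts


# Invented data: one entry per player (row).
salary = [900, 300, 1200, 250, 700]
position = ["P", "C", "P", "DH", "1B"]

states = [State.NORMAL] * len(salary)
states = select_where(salary, states, lambda s: s > 800)            # histogram selection
print(bar_counts(position, states))  # {'P': (2, 2), 'C': (1, 0), 'DH': (1, 0), '1B': (1, 0)}
states = select_where(position, states, lambda p: p in {"P", "C"})  # bar chart selection
states = delete_unselected(states)
states = select_where(salary, states, lambda s: s > 800)            # select again in the histogram
print(bar_counts(position, states))  # {'P': (2, 2), 'C': (1, 0)}
```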

The selection states are:

Some views (NicheWorks, Scatterplot, Values List) have the option not to show unselected items, i.e., to show only the selected items. Although this breaks the general paradigm, it is often necessary for large, visually complex data sets.


2. Working with the Panel

Icons representing variables can be selected and re-arranged. They can be dragged onto a slot in a component to modify that component. This will be explained in the section on components.

The variable icon gives information about that variable. Icons are colored to indicate which table they belong to. An attached type box shows the variable's type and indicates whether the variable is being used as a color code for its table (by filling the box with a rainbow shading of colors).

Variables selected using the left mouse button are indicated by a grey-filled box and are used as X variables for views. Those selected using the zoom mouse button are Y variables, and are shown as highlighted boxes (usually yellow). The right mouse button brings up a copy of the views menu for convenience.

Panel Menus

File menu

Read EDV File
Reads a specially formatted EDV data file from the disk. This file must be written with the "Write EDV file" menu item.
Write EDV file
Writes the current data set to disk as an EDV data file. The variables are written together with linking, selection, color and aggregation information in an efficiently-read binary format.
Import Text File
Reads a text data file from disk. This file must have the two identifying lines described in "Basic 1: Get the Data into the Program".
Export Text file
Writes only the selected variables to a file. The variables will be written with names and types so that the file is a valid text input file.
Tidy Icons
Repositions icons into regular rows and columns.
Sort Icons
Orders the icons in the panel by which table they are part of and by name.
Shorten Names
Makes names shorter according to a heuristic that tries to keep the name legible.
Show Types
Toggles display of the type code on the variable icons.
Quit
Terminates program. If you have added new variables, it will prompt you to save the file as an EDV data file before it quits.

Compute menu

Each of the transforms works on all applicable combinations of variables. If you X-select Years and Age and Y-select Score and Salary, then the menu choice 'Y/X' gives you four variables Score/Years, Score/Age, Salary/Years and Salary/Age.
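As a sketch of that combination rule, the Python function below (hypothetical name `pairwise_ratios`) forms Y/X for every Y-X pair in selection order. Returning NaN when X is zero is an assumption; the guide does not say how EDV handles division by zero.

```python
from itertools import product


def pairwise_ratios(x_vars, y_vars):
    """Y/X for every (Y, X) pair of selected variables, named like 'Score/Years'."""
    results = {}
    for (y_name, y_vals), (x_name, x_vals) in product(y_vars.items(), x_vars.items()):
        results[f"{y_name}/{x_name}"] = [
            y / x if x != 0 else float("nan")   # assumed handling of division by zero
            for y, x in zip(y_vals, x_vals)
        ]
    return results
```

With Years and Age X-selected and Score and Salary Y-selected (in that order), the result keys come out as Score/Years, Score/Age, Salary/Years and Salary/Age.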

Root(X), Ln(1+X), 1/X, X^2, e^X
Transforms each non-string X variable.
Rank(X)
Creates a rank ordering for the selected variables. Tied values share a rank.
Order(X)
Creates an order on X where ties are broken randomly.
Index(X)
Creates a 1, 2, 3, ... variable indexing the selected variable(s). This can be useful if variables represent an implicit series.
Date to Days
For selected X variable(s) that are integers or strings in one of the following formats: YYMMDD, MMDD, month/day/year, or month/day (where the separator '/' can be any non-numeric character), converts the values into an integer giving the number of days since January 1 of the first year found in the variable. The new variable is named with that year, if one was found.
Y+X, Y-X, Y*X, Y/X
Performs the selected simple arithmetic operation on each pair of non-string variables.
Catenate
Creates a string variable consisting of the selected variables' values pasted together. Useful for labeling.
Niche Variables
For all X-selected variables, this option uses a robust estimator of the correlation between them and produces variables that allow the NicheWorks view to be used: 'Variables' holds the node names, 'V1' and 'V2' the links, and 'Association' the link strengths. This command is useful for finding similarities between variables.
Aggregate
Gives access to the aggregation functionality described below.
Delete
Deletes the X-selected variables from the panel.
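The difference between Rank, Order and Index can be sketched in Python as follows. The guide only says that tied values share a rank; giving a tied group the lowest rank in that group is an assumption here, as are the function names.

```python
import random


def rank(values):
    """Rank from 1; tied values share the lowest rank of their group (assumed convention)."""
    first_position = {}
    for position, v in enumerate(sorted(values), start=1):
        first_position.setdefault(v, position)
    return [first_position[v] for v in values]


def order(values, rng=random):
    """Like rank, but ties are broken randomly so every row gets a distinct position."""
    rows = list(range(len(values)))
    rng.shuffle(rows)                      # random arrangement of tied rows...
    rows.sort(key=lambda i: values[i])     # ...survives the stable sort by value
    result = [0] * len(values)
    for position, i in enumerate(rows, start=1):
        result[i] = position
    return result


def index(n):
    """1, 2, 3, ... for a table of n rows."""
    return list(range(1, n + 1))


ages = [30, 10, 20, 10]
print(rank(ages))                       # [4, 1, 3, 1]
print(order(ages, random.Random(1)))    # the two 10s get 1 and 2 in random order
print(index(len(ages)))                 # [1, 2, 3, 4]
```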

Aggregate menu

Aggregate over
X-Select a non-numeric variable representing IDs. Optionally another numeric variable can be Y-selected to represent time. This menu option will aggregate over IDs and time to create a new table. See the section on Aggregation for more details.

For the following commands, select a time variable:
Calc Start Times
For each ID, calculate the first occurrence in time.
Calc End Times
For each ID, calculate the last occurrence in time.
Calc Counts
For each ID, calculate the number of times it was present.

For the following commands select numeric variable(s) where a time variable has been created on their table:
Calc Averages
For each ID, calculate the average value.
Calc Minimums
For each ID, calculate the minimum value.
Calc Maximums
For each ID, calculate the maximum value.
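A Python sketch of what "Aggregate over" followed by the Calc commands computes, with one output row per ID. It assumes each input row is one occurrence of its ID at the given time; the function name and output field names are hypothetical.

```python
from collections import defaultdict


def aggregate_over(ids, times, values):
    """One output row per ID: start/end time, count, and mean/min/max of `values`."""
    groups = defaultdict(list)
    for id_, t, v in zip(ids, times, values):
        groups[id_].append((t, v))
    table = {}
    for id_, rows in groups.items():
        ts = [t for t, _ in rows]
        vs = [v for _, v in rows]
        table[id_] = {
            "start": min(ts),           # Calc Start Times
            "end": max(ts),             # Calc End Times
            "count": len(rows),         # Calc Counts
            "mean": sum(vs) / len(vs),  # Calc Averages
            "min": min(vs),             # Calc Minimums
            "max": max(vs),             # Calc Maximums
        }
    return table
```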

Options menu

Use Variable for Color
The X-selected variable is used to color data points in each view in which they are visible as separate entities (e.g., scatterplots, part of parabox, NicheWorks). Coloring only affects views on data items in the same table as this variable.
Zoom Right Button
(Not shown; available on Windows only.) When selected, swaps the meaning of "zoom mouse button" and "right mouse button," so that <CTRL>-right button brings up a view-specific menu and the right button alone pans and zooms a view.
Continuous Indicate
If enabled, EDV will continually highlight the data item the pointer is over, as it moves across views. If disabled, the <SHIFT> key must be held down while moving the pointer to see the highlighting.
Selector
This submenu allows different selection tools to be chosen. These tools provide different ways of selecting sets of graphics: by enclosing a rectangular area, by surrounding with an irregular line, by encircling, or brushing with a rectangular or circular brush.
Selector Options
Creates a set of buttons for setting how selection works. These options control how new selections interact with previous selections: replace, add to, subtract from, intersect with, or invert them.
Basic Colors
This submenu allows different color schemes to be chosen. These schemes control the overall look of EDV: black background, light background, gray background, bright labels, or dim labels.
Overlay Color
This submenu allows the color of the overlay (indicator data displayed on top of a view) to be selected. Options are yellow, white, red, green, or black.
Color scale
Choose the color scale used by "Use Variable for Color". The scales available are a fully saturated RGB rainbow, pastel RGB rainbow, intensity-equalized RGB rainbow, gray scale, red-white thermal scale, and custom RGB or HSV scales. For the custom choices, the user should enter two color names, indicating the end points of the range to be generated.
Link Variables
By X-selecting two similar variables from different matrices and choosing this option, the user links the two matrices so that a selection in one matrix causes a selection in the other. See the later section on inter-matrix linking for details.
Kill Links
Destroys all inter-matrix links.
Use Links
Toggles inter-matrix linking on and off.
Join
Joins two tables on the two key variables that are X-selected. Joining adds all of the variables in the smaller table to the larger one without fully duplicating the storage. This is useful if a larger table contains keys and a second table contains details about those keys (e.g., an employees table contains "school" as a field, and a second table gives details about each school). Joining makes the details available in the larger table for analysis without physically duplicating the contents of the fields. Fields to be joined over must be categorical and of the same type. New joined fields appear at the end of the fields in the panel; if there is an intervening table, use "Panel->Sort Icons" to order the variables by table. Variables created by "Join" may be removed with "Compute->Delete".
Save Prefs
Saves the above settings for how EDV works, and the settings for any open views. For more information see "user preferences" in section 5 below.
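The storage-saving idea behind Join can be sketched as an index lookup: each row of the larger table records which row of the smaller table it matches, and joined columns read through that index instead of copying values. This is a hypothetical Python illustration of the idea, not EDV's internal representation; it assumes keys in the smaller table are unique and shows unmatched keys as None.

```python
class JoinedColumn:
    """A column of the smaller table seen through the larger table's keys (no copying)."""

    def __init__(self, source, row_index):
        self.source = source          # values, stored once in the smaller table
        self.row_index = row_index    # per larger-table row: index into source, or None

    def __len__(self):
        return len(self.row_index)

    def __getitem__(self, i):
        j = self.row_index[i]
        return None if j is None else self.source[j]


def join(large_keys, small_keys, small_columns):
    """Attach every column of the smaller table to the larger one via matching keys."""
    position = {key: j for j, key in enumerate(small_keys)}  # keys assumed unique
    row_index = [position.get(key) for key in large_keys]    # shared by all joined columns
    return {name: JoinedColumn(values, row_index) for name, values in small_columns.items()}
```

For the employees example, `join(employee_school, school_name, {"City": school_city})` would give each employee row a City value without storing one copy per employee.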

View menu

Bar Chart
Creates bar-charts of integer or string X variables.
Histogram
Creates a smoothed histogram of continuous or integer X variables.
Rose Diagram
Creates a radial histogram for X-selected time-periodic data, continuous or integer. It estimates the period of the data, trying units of minutes, hours, days, etc. It is a good idea to check the view and modify the period if necessary.
Scatterplot
Creates scatterplots of X variables against Y variables. It will index string variables and plot the index against the other variable.
Table
Creates a categorical table for each combination of X and Y non-continuous variables.
Triplot
Creates a scatterplot variant showing the contribution from three X-selected variables.
Scatter Matrix
Creates a scatterplot matrix for all X-selected variables, consisting of all pairs of variables.
Parabox
Creates a set of boxplots (bubble-plots for string variables) for all X-selected variables within one view on which parallel coordinates can be overlaid.
Values List
Creates a spreadsheet-like display for all X-selected variables.
NicheWorks and NicheWorks3D
Creates a NicheWorks graph view. Y-select a variable for the nodes. X-select the "from" and "to" link variables.
MapView and MapView3D
(Experimental.) Creates a map view, showing data on a geographic region.
ProfileView
(Experimental.)
Dictionary
Displays the variable description dictionary supplied with the "-D" command-line parameter. This option is only available if "-D" was given when EDV was started.
Counts
Shows summary statistics for selected attributes in a table.


3. The Views

Each view has two tasks:

Although there are exceptions, the general ways in which views display data are:

The second task a view must perform is to allow the user to alter the selection using the view. The mechanisms for achieving this are as follows:

In the following sections, the details of each view are explained.

Bar Chart

A bar plot of the unique values of the X-selected variable. The heights of the bars indicate the number of items in each category. If a population is selected in another view, the subset of each bar corresponding to that selection is shown as a highlighted sub-bar. If possible, color codes are used for the selected portions of a bar; if there are differently colored items in the same bar, the entire plot is drawn using the default highlight color.

Left mouse button is used for selection
Zoom mouse button not used
Right mouse button brings up the menu:

Horizontal
If checked, the bars are aligned along a horizontal axis as opposed to a vertical axis.
Spineplot
Toggles between the bar chart and the spineplot. In the spineplot, the bars' widths (rather than heights) represent the counts, and the height of the selected area represents the percentage of the bar that is selected.
Order
Gives the user a choice of which order the bars should be drawn in:
Animate
Starts or stops animation of the bar chart.

A bar chart is animated by successively selecting the data in each bar across the chart, running from left to right and then wrapping back to the left-most bar. The selection of each bar is reflected in any other active views, animating them along with the bar chart. Animation respects the current primary selector mode (see "Selection options" in section 5 below): "replace" causes single bars to be selected, "add" causes each selection to add to the already selected bars, "subtract" removes bars from the current selection, etc.
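The animation loop and the selector modes can be sketched in Python with selections as sets of row numbers. The function names are hypothetical, and treating "invert" as flipping the state of the rows in the new selection (a symmetric difference) is an assumption based on the mode names listed under Selector Options.

```python
def combine(current, new, mode):
    """Combine the existing selection with a new one according to the selector mode."""
    if mode == "replace":
        return set(new)
    if mode == "add":
        return current | new
    if mode == "subtract":
        return current - new
    if mode == "intersect":
        return current & new
    if mode == "invert":
        return current ^ new     # assumed: rows in `new` flip between selected and not
    raise ValueError(f"unknown mode {mode!r}")


def animate(bars, selection, mode, steps):
    """Yield the selection after each step, moving left to right and wrapping around."""
    for step in range(steps):
        selection = combine(selection, bars[step % len(bars)], mode)
        yield selection


bars = [{0, 1}, {2}, {3, 4}]             # row numbers in each bar, left to right
print(list(animate(bars, set(), "replace", 4)))
```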

The following actions can be performed on a bar chart animation:

Slower. (Press the < or + key.) Decrements the animation rate.
Faster. (Press the > or - key.) Increments the animation rate.
Normal. (Press the = or @ key.) Resets the animation rate to the startup value.
Step Forward. (Press the SPACE or TAB key.) Moves to the next bar in the animation; pauses the animation if it is currently running.
Step Backward. (Press the BACKSPACE key.) Moves to the previous bar in the animation; pauses the animation if it is currently running.
Toggle. (Press the t or x key.) Toggles the animation state between paused and running.
Pause. (Make any data selection with the mouse.) Stops the running animation.
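Returning to the Spineplot option above, its geometry is simple enough to sketch in a few lines of Python (hypothetical function name): each bar's width is its share of the total count, and the height of its selected area is the fraction of that bar that is selected.

```python
def spineplot(counts, selected, total_width=1.0):
    """Return (widths, selected heights) for the bars of a spineplot.

    counts[i] is the number of items in bar i and selected[i] how many are selected;
    heights are fractions of the full bar height.
    """
    n = sum(counts)
    widths = [total_width * c / n for c in counts]
    heights = [s / c if c else 0.0 for s, c in zip(selected, counts)]
    return widths, heights
```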

Histogram

A smoothed histogram of the variable is shown. The degree of smoothing can be controlled with the '+' and '-' keys for more or less smoothing. The 'home' key sets the display to show all the data. The smoothing is based on a kernel smooth using a standard kernel. The default smoothing is chosen via a simple heuristic.
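The guide does not name EDV's kernel or its default-smoothing heuristic. As a stand-in, this Python sketch uses a Gaussian kernel and the common normal-reference rule of thumb (1.06 times the standard deviation times n to the power -1/5); the function names are hypothetical.

```python
import math


def rule_of_thumb_bandwidth(data):
    """Normal-reference bandwidth 1.06 * sd * n**(-1/5), a stand-in for EDV's heuristic."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return 1.06 * sd * n ** (-1 / 5)


def smoothed_histogram(data, grid, bandwidth):
    """Gaussian kernel density estimate of `data`, evaluated at each point of `grid`."""
    scale = 1 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in data)
            for g in grid]
```

In this model, the '+' and '-' keys would correspond to multiplying or dividing the bandwidth by a fixed factor before recomputing the curve.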

Left mouse button is used for selection
Zoom mouse button not used
Right mouse button brings up the menu:

Linear, root, log
Shows the data on a transformed X axis.

Rose Diagram

Creates a histogram wrapped around a circle. The area of the bar is proportional to the number of counts in it. The data is assumed to have some periodicity, which can be chosen via a menu. The default periodicity is chosen via a simple heuristic. For example, a variable stating the start time of an occurrence measured in hours could be displayed with a period of 1, which would show hourly effects or with period of 24, which would show daily effects.

Left mouse button is used for selection
Zoom mouse button not used
Right mouse button brings up the menu:

Toggle labels
Turns labels on and off.
Toggle as percent
In 'as percent' mode, the height of the selected area shows the percentage of the category that is selected.
First bar at top, ... at Right
Whether to start the display from 12 o'clock or 3 o'clock position.
... inner radius
Sets the radius of the inner circle. With no inner circle the smaller values are visually more striking; with large inner radius, the heights of the bars are nearly proportional to their areas and so it looks more like a wrapped histogram.
Period is ...
Redraws the view with the periodicity of the data set to the value chosen.
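The wrapping and the inner-radius behavior can be sketched in Python as below. Values are binned by their phase within the period, and each bar's outer radius is chosen so that the area between the inner circle and the bar's edge is proportional to its count, with the fullest bar reaching radius 1. The function name, bin layout and scaling are assumptions for illustration.

```python
import math


def rose_bars(values, period, n_bins, inner_radius=0.0):
    """Return (start angle in radians, outer radius) for each bar of a rose diagram."""
    counts = [0] * n_bins
    for v in values:
        # Position within one cycle, scaled to a bin number in [0, n_bins).
        bin_index = int((v % period) * n_bins / period)
        counts[min(bin_index, n_bins - 1)] += 1      # guard against float round-up
    most = max(counts)
    # Sector area is proportional to (R**2 - inner_radius**2), so solve for R.
    k = (1 - inner_radius ** 2) / most if most else 0.0
    width = 2 * math.pi / n_bins
    return [(i * width, math.sqrt(inner_radius ** 2 + k * c)) for i, c in enumerate(counts)]
```

For example, start times in hours with `period=24` show daily effects, while `period=1` would fold every hour onto the same circle.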

Scatterplot

Scatterplot of X variable versus Y variable. If one or more of the variables is a string variable, the string variable is converted to an integer representing each of the distinct string values. Unselected items are drawn as open circles, selected values are drawn as filled circles of either the default highlight color or the color used to code the data.

Left mouse button is used for selection
Zoom mouse button is used for zooming. Click and drag to pan; shift-click and drag up/down to zoom
Right mouse button brings up a menu, either for an axis (if clicked near the axis) or for the whole plot. The available menu choices are:

Show unselected
Whether to hide points that are not selected.
Highlighted size..
Gives a choice as to how much larger selected points should be drawn.
Linear X/Y, Root X/Y, Log X/Y
How to transform X or Y axis.
Jitter X/Y, Unjitter X/Y
For data that is discretized, overplotting obscures points. Jittering the data allows distinct values to be seen. Jittering is done to the view scale, not the data scale, so that zooming way out and jittering will jitter the data more than zooming in and jittering.
Larger Points
Increases the points' radii.
Smaller Points
Decreases the points' radii.
No/Linear/Local Trend
Shows/hides trend lines for both the data and the selected subset. The linear trend is a regression line. The local smooth uses a local kernel smooth for trending.
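A minimal Python sketch of jittering on the view scale: the noise amplitude is a fraction of the currently visible axis range, so the same data jitters more when the view is zoomed out. The uniform noise and the default `fraction` of 1% are assumptions, not documented EDV settings.

```python
import random


def jitter(values, view_min, view_max, fraction=0.01, rng=random):
    """Add uniform noise scaled to the *visible* axis range, not the data range."""
    spread = fraction * (view_max - view_min)
    return [v + rng.uniform(-spread, spread) for v in values]
```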

TriPlot

This creates a variant of a scatterplot known as a triplot. It uses three X variables, with the underlying assumption that the three can be added together to form a meaningful quantity, which ideally should be a constant. The classic example is the percentages of votes in an election going to each of three parties, which sum to 100%. Another example is dollars spent on local, toll and international calls: the three sum to a total, which can then be divided into a percentage for each category.

The position of a point in the triplot shows what those percentages are. The closer a point is to a corner, the higher the percentage of that variable is.
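Mapping three values to a point in the triangle amounts to barycentric coordinates, which can be sketched in Python as follows. The corner placement and the function name are assumptions; any equilateral triangle would do.

```python
import math

# One corner per variable: first at bottom left, second at bottom right, third on top.
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]


def triplot_position(a, b, c):
    """Place a case by the shares of a, b and c in their total."""
    total = a + b + c
    if total == 0:
        raise ValueError("the three values sum to zero")
    shares = (a / total, b / total, c / total)     # the three percentages, as fractions
    x = sum(s * cx for s, (cx, _) in zip(shares, CORNERS))
    y = sum(s * cy for s, (_, cy) in zip(shares, CORNERS))
    return x, y
```

A case with all of its total in one variable sits on that variable's corner, and an even three-way split sits at the center of the triangle.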

The mouse functionality and menus are similar to the scatterplot's, with the following menu additions:

Auto Scale
When the variables are on different scales, use this option to normalize the values before calculating percentages.
Show Grid
Displays a helpful grid showing percentage contour lines.

Scatter Matrix

A scatterplot matrix is, as its name suggests, a set of scatterplots laid out in a square matrix. The rows and columns of the matrix each consist of the variables X-selected when creating the matrix. In each cell is a scatterplot of the row variable against the column variable. The cells on the diagonal are not used (as a scatterplot of a variable against itself would be boring).

Zooming and panning within one cell of the matrix causes zooming and panning in the other cells, so that individual data points remain aligned across the scatterplot matrix.

Left mouse button is used for selection. Note that the plot initially clicked on is the only one used for selection. Dragging into another plot will have no effect. This means that you don't have to worry about brushing other plots by accident.
Zoom mouse button is used for zooming. Click and drag to pan; shift-click and drag to zoom. Use the 'Home' key to get back to the original view.
Right mouse button brings up a menu. Note that clicking on the diagonal boxes gives two extra menu commands which can be used to re-order or edit the matrix rows and columns. Here are the available menu choices:

Show unselected
Whether to hide points that are not selected.
Highlighted size..
Gives a choice as to how much larger selected points should be drawn.
Larger Points
Increases the points' radii for all plots.
Smaller Points
Decreases the points' radii for all plots.
Remove 'XXX'
Drop the variable from the matrix.
Place 'XXX' after ..
Re-order the variables by placing the named variable after another one in both the row and column order.

Table

Creates a categorical table of X vs. Y. Each unique value of X defines a row in the table and each unique value of Y defines a column. Each data item is placed in the cell matching its X and Y values. The view then displays a representation of the number of items in each cell, together with how many of those are selected.
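The cell counts behind the table view can be sketched in Python with `collections.Counter`. The function name is hypothetical, and deleted rows are assumed to have been removed beforehand.

```python
from collections import Counter


def cross_tab(x_values, y_values, selected):
    """Map each (X value, Y value) cell to (items in the cell, selected items in the cell)."""
    cells = list(zip(x_values, y_values))
    totals = Counter(cells)
    chosen = Counter(cell for cell, is_selected in zip(cells, selected) if is_selected)
    return {cell: (totals[cell], chosen[cell]) for cell in totals}
```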

Touching a glyph with the mouse displays the indicated cell name and counts.
Left mouse button is used for selection
Zoom mouse button not used
Right mouse button brings up the menu:

Show Grid
Shows/Hides a background grid.
Display as
A submenu for choices on how to display cells:
Column Order, Row Order
Choices for column and row order:

Parabox

For each variable in this view, either a boxplot or a bubbleplot is shown. A boxplot summarizes integer and continuous data: the height of the plot corresponds to the max-min range of the data, with the max value mapped to the top of the view. The inner grey box shows where the middle 50% of the data lies (the top of the inner box is the upper quartile, or 75th percentile, and the bottom is the lower quartile, or 25th percentile), and the median is drawn across the box. The outer grey box shows where 95% of the data is expected to lie; anything outside it is deemed an outlier. Bubble-plots show categorical (string) data as bubbles, the area of which is proportional to the count of cases in that category.

Sub-boxplots and sub-bubbleplots can be shown for the selected data. Also outliers can be hidden.

Parallel coordinates can be superimposed on the view. For each selected case, a line is drawn joining that case's values on each pair of consecutive variables. By comparing the traces of selected cases, you can see the similarities and differences between groups of outliers.

Left mouse button is used for selection
Zoom mouse button not used
Right mouse button brings up a menu:

Parallel Axes
Shows parallel axes lines for selected data.
Boxplots
Shows boxplots for selected data.
Outliers
Shows outliers.
Same Scale
Uses the same data range for each variable.
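A Python sketch of the parallel-coordinate traces, including the effect of Same Scale: each axis maps values to the range 0 (bottom) to 1 (top), using either that variable's own minimum and maximum or the range shared by all variables. Consecutive heights in a trace would be joined by line segments. Only numeric variables are handled, and the function name is hypothetical.

```python
def parallel_coordinates(columns, rows, same_scale=False):
    """For each case in `rows`, its height on every axis, from 0 (bottom) to 1 (top)."""
    if same_scale:
        low = min(min(col) for col in columns)
        high = max(max(col) for col in columns)
        ranges = [(low, high)] * len(columns)
    else:
        ranges = [(min(col), max(col)) for col in columns]
    traces = []
    for r in rows:
        trace = []
        for col, (low, high) in zip(columns, ranges):
            # A constant axis has no spread; put its values mid-height.
            trace.append((col[r] - low) / (high - low) if high > low else 0.5)
        traces.append(trace)
    return traces
```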

Values list

Creates a spreadsheet-like view of the selected variables allowing individual values to be seen. The user can manipulate this list and change the order of items by sorting on variables in the list. Searches can also be performed on a selected column, with rows matching the search selected according to the current selection criteria.

Zooming in and out changes the font size. When the size gets too small to read, the cells turn into bars, the length of which encodes either the length of the text (in characters) or the value of the variable (from min to max as a percentage).

Left mouse button is used for selection.
Zoom mouse button for zoom/pan. Zooming changes the font size and line spacing; when the font gets too small, numbers are drawn as bars whose lengths encode their values, and strings as bars whose lengths encode the number of characters.
Right mouse button brings up a menu:

Sort by Column
Sorts rows using data in this column, ascending. The sort is stable, so sorting by one column and then by another gives a two-key sort, with the column sorted last as the primary key.
Reverse Sort by Column
Sorts rows using data in this column, descending.
Original Order
Puts rows back to original order.
Find In Column
Selects items matching a search criterion. This selection respects the current default selection operation.
Define command
(Available on Unix only.) Supplies a command used to invoke a text editor. A "%s" in the command is replaced with the file to be edited.
editor command
(Available on Unix only.) Invokes a text editor on the value list data.
Fit All
Zooms so that all items are displayed.
One Line/Pixel
Zooms so that each data item occupies one pixel height.
Gray unselected
If true, unselected items are shown in gray.
Indent text
If true, initial indentation is shown for text.
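Python's built-in `sorted` is stable in the same way (including with `reverse=True`), so the two-key behavior of Sort by Column can be demonstrated directly. The rows and the helper name below are invented for illustration.

```python
def sort_by_column(rows, column, reverse=False):
    """Stable sort of rows on one column; tied rows keep their current order."""
    return sorted(rows, key=lambda row: row[column], reverse=reverse)


rows = [("Fred", 88), ("Ann", 35), ("Joe", 35), ("Bea", 88)]
by_name = sort_by_column(rows, 0)                # first sort: name
by_age_then_name = sort_by_column(by_name, 1)    # second sort: age becomes the primary key
print(by_age_then_name)  # [('Ann', 35), ('Joe', 35), ('Bea', 88), ('Fred', 88)]
```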

The zoom bar on the right side of the window acts like a scroll bar, except that the size of its thumb can also be adjusted to zoom the values view. This zoom bar is a visible interface to the panning and zooming provided by the zoom mouse button.

Counts


This view displays a table of summary statistics for X-selected variables in a table. For each statistic, the result over the entire variable (excluding deleted items) and the result over the currently selected subset are both shown. Zero values are omitted. Statistics are:

Count
Number of items (rows) in the attribute. This is the same for all attributes in the table.
Deleted
Number of deleted items.
Sum
For numeric attributes, the sum of all values.
Mean
For numeric attributes, the arithmetic average.
Std Dev
For numeric attributes, standard deviation.
Unique
For string attributes, the number of unique strings.
Frequent
For string attributes, the most frequently occurring string.
Max
For numeric attributes, the maximum value.
Min
For numeric attributes, the minimum value.
Range
For numeric attributes, the range between the maximum and minimum values.
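A Python sketch of the Counts computation for one variable. Which standard-deviation formula EDV uses is not stated; the sample form (`statistics.stdev`) is assumed here, and the function names are hypothetical.

```python
import statistics
from collections import Counter


def numeric_summary(values):
    """Statistics shown for a numeric attribute."""
    return {
        "Count": len(values),
        "Sum": sum(values),
        "Mean": statistics.mean(values),
        "Std Dev": statistics.stdev(values) if len(values) > 1 else 0.0,  # sample form assumed
        "Max": max(values),
        "Min": min(values),
        "Range": max(values) - min(values),
    }


def string_summary(values):
    """Statistics shown for a string attribute."""
    counts = Counter(values)
    return {"Count": len(values), "Unique": len(counts),
            "Frequent": counts.most_common(1)[0][0]}


def counts_view(values, deleted, selected, summarize):
    """Summaries over all non-deleted rows and over the selected rows."""
    kept = [v for v, d in zip(values, deleted) if not d]
    chosen = [v for v, d, s in zip(values, deleted, selected) if s and not d]
    return {
        "Deleted": sum(deleted),
        "All": summarize(kept) if kept else {},
        "Selected": summarize(chosen) if chosen else {},
    }
```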


4. Components

Components are visualizations with more complicated data needs than the data views described in the previous section. They are generally large, complex data views. Creating a component creates both a view and a component panel in the control panel. By dragging variables to and from the component panel slots, component views are modified and updated. Deleting a view deletes its associated panel component.

Nicheworks

NicheWorks is a graph display consisting of nodes and links. The NicheWorks component positions this graph so as to bring connected nodes closer together, and allows the user to interact with the graph to find interesting sub-networks. The component panel contains slots for controlling the display and labeling of nodes and links, and for selecting the variable to be used for the link weights in the positioning algorithm. To create a NicheWorks display, use the zoom mouse button to Y-select a variable consisting of a list of nodes, and X-select two variables from another table that represent the links (the values of these variables must be a subset of the nodes' values).

The NicheWorks display can be interacted with using the mouse as follows:
Left mouse button by default is used to label/unlabel nodes, but can also be used for selection of nodes. Labelling respects the current selection mode.
Zoom mouse button for zoom/pan and rotate. Zooming and panning work as in all views; the view may be rotated by starting a drag in any corner of the window and dragging vertically or horizontally. Rotation follows the direction of the drag.
Right mouse button brings up the standard view selection menu

The '+' and '-' keys decrease and increase node and link sizes simultaneously.

The Nicheworks component has its own set of menus:

Actions

Read Positions..
Read node locations from file. The file should be a list of pairs of x and y coordinates, to any scale.
Write Positions..
Writes a file of the above form for the current positions.
Postscript..
Writes a postscript representation of the view.
Node Statistics
Allows the user to create descriptive statistics on the nodes based on the graph information inherent in the links. The choices in this menu are:
Quick/Thorough Tests
These are subtle statistical tools for advanced users. Their purpose is to test whether the selected nodes are statistically different from the unselected nodes. The test is based on connections to nodes of the other type, and is designed to test whether the selected subset and the unselected subset are connected more or less than you would expect if there were no *real* difference between them. The results can be:

For directed graphs only, the separation is also split up by direction. Thus you can also get statistics for:

The quick and thorough tests are identical except that the thorough test runs more trials. In each case the number (percentage) reported after the result is the statistical significance of the test.

The details for those really interested:

The program implements a Monte-Carlo style algorithm to test the interaction between the selected and unselected subsets. The null hypothesis is that there is no difference; the selected subset is a random labelling of the graph. The alternative hypothesis is that selected nodes are more or less likely to be connected to other labelled nodes than chance would suggest.

We test the hypothesis by generating a number of simulated random labellings and calculating the Monte Carlo statistic; we use the sum of link weights between similar (selected-selected and unselected-unselected) pairs as our statistic. In the case of directed graphs we also use selected->unselected and unselected->selected links, providing two other statistics to test. In the usual Monte Carlo fashion, we use the rank of the actual statistic in the set of simulated statistics to provide a confidence value for the test. Note that although the test is not exact, it is good for the thorough tests (399 simulations) at even the 1% level.
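The procedure just described can be sketched in Python for an undirected graph. The statistic is the one given above, the sum of link weights between nodes with the same selection state; computing the significance as (number of simulated statistics at least as extreme as the observed one, plus 1) divided by (number of simulations plus 1) is the usual Monte Carlo convention and an assumption about EDV's exact formula. The directed-graph statistics are not covered, and function names and the graph representation are hypothetical.

```python
import random


def similar_link_weight(links, selected):
    """Sum of weights of links whose two end nodes share a selection state."""
    return sum(w for a, b, w in links if selected[a] == selected[b])


def monte_carlo_test(links, selected, n_sims=399, rng=random):
    """Compare the observed statistic with n_sims random relabellings of the nodes.

    links: (node, node, weight) triples; selected: {node: bool}.
    Returns (observed, p_high, p_low): p_high is small when the selection is
    more tightly linked than chance, p_low when it is less.
    """
    nodes = list(selected)
    labels = [selected[n] for n in nodes]
    observed = similar_link_weight(links, selected)
    at_least = at_most = 0
    for _ in range(n_sims):
        rng.shuffle(labels)   # null hypothesis: the selection is a random labelling
        simulated = similar_link_weight(links, dict(zip(nodes, labels)))
        at_least += simulated >= observed
        at_most += simulated <= observed
    # The observed labelling counts as one member of the reference set.
    return observed, (at_least + 1) / (n_sims + 1), (at_most + 1) / (n_sims + 1)
```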

Tie Link Selection
When checked, this menu option links the node and link tables together, so that highlighting nodes highlights the links between those nodes, and highlighting links highlights the nodes at either end.
Show link if one node
Usually links are only seen (and highlighted) if both nodes are seen (highlighted). Under this option, if either end of a link is visible, the link will also be visible.
Mouse Select
The first button of the mouse is used for selection instead of labeling when this is checked.
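The two linking rules above can be sketched as plain set operations. This is an illustrative sketch only; it assumes links are (node, node) pairs identified by their index, and the function names are hypothetical.

```python
def links_for_nodes(links, nodes, either_end=False):
    """Indices of links to show or highlight for a set of nodes.

    By default both ends must be in `nodes`; either_end=True mimics
    "Show link if one node".
    """
    if either_end:
        return {i for i, (a, b) in enumerate(links) if a in nodes or b in nodes}
    return {i for i, (a, b) in enumerate(links) if a in nodes and b in nodes}

def nodes_for_links(links, link_ids):
    """Nodes at either end of the highlighted links (Tie Link Selection)."""
    return {end for i in link_ids for end in links[i]}

links = [("a", "b"), ("b", "c"), ("c", "d")]
print(sorted(links_for_nodes(links, {"a", "b", "c"})))         # [0, 1]
print(sorted(links_for_nodes(links, {"c"}, either_end=True)))  # [1, 2]
print(sorted(nodes_for_links(links, {2})))                     # ['c', 'd']
```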

Placement

The NicheWorks graph may be placed to show relationships between the nodes. Placement attempts to improve a potential function, which improves as strongly attracted nodes are placed closer together. This potential function uses the "placement weight" statistic in the NicheWorks control panel if one is specified.

Each placement action is performed simultaneously on each connected component of the graph, then each component is moved into an overall configuration with the largest component in the center.

Place on Circle
Each component's nodes are placed on the perimeter of a circle.
Place on Hex Grid
Each component's nodes are placed in a grid formation.
Place using Tree
Creates a maximal spanning tree within the graph and then places the nodes using a tree placement algorithm.
Random Swaps
Swaps nodes within a component at random.
Potential is
Choices for the potential function. Terms containing (1-dw) cause nodes that should be far apart to be placed far apart; terms containing (w-1/d) cause nodes that should be close to be placed close together. Squaring a term gives it more emphasis, and exponentiating it gives still more.
Fix Selection
The currently selected nodes are fixed and will not be moved by any positioning operation.
Unfix all
Make all nodes free to be positioned.
Swap for ..
Swaps nodes around if it will improve the potential of the overall fit, or if it may help escape a local minimum.
Move for ..
Uses a steepest descent algorithm to move nodes around to the best location.
Apart for ..
Ignores the potential and just moves close nodes a little further apart.
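As a rough illustration of how a swap pass and fixed nodes interact, here is a sketch assuming a potential built from one kind of term, (1 - d*w) squared, where d is the distance between two nodes and w their link weight. The potential, the names `potential` and `swap_for`, and the greedy acceptance rule are all hypothetical: EDV's "Swap for .." may also accept swaps that help escape a local minimum, which this version does not.

```python
import math
import random

def potential(pos, weights):
    """Hypothetical potential: sum over weighted pairs of (1 - d*w)**2,
    where d is the distance between the two nodes and w the link weight."""
    return sum((1 - math.dist(pos[a], pos[b]) * w) ** 2
               for (a, b), w in weights.items())

def swap_for(pos, weights, fixed=frozenset(), n_tries=1000, seed=0):
    """Greedy swap pass: exchange the positions of two free nodes and keep
    the swap only if the potential decreases. Fixed nodes never move."""
    rng = random.Random(seed)
    free = [n for n in pos if n not in fixed]
    best = potential(pos, weights)
    for _ in range(n_tries):
        a, b = rng.sample(free, 2)
        pos[a], pos[b] = pos[b], pos[a]
        p = potential(pos, weights)
        if p < best:
            best = p
        else:
            pos[a], pos[b] = pos[b], pos[a]  # undo the swap
    return best

# Four nodes on a line; A-D and B-C are attracted (weight 1).
pos = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0)}
weights = {("A", "D"): 1.0, ("B", "C"): 1.0}
print(potential(pos, weights))  # 4.0
print(swap_for(pos, weights))   # 0.0
```

Passing the selected nodes as `fixed` plays the role of "Fix Selection": those nodes keep their positions while the others are swapped around them.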

View

These operations let the user control the visual display of the graph and its selection state.

Select all
Selects all nodes and links.
Select one step
Selects all nodes one link away from the current subset, and all inter-connecting links.
Select component
Selects all nodes connected to the current subset of nodes.
Select one step outgoing/incoming
Like Select one step, but restricted to nodes reached by links leading out of (outgoing) or into (incoming) the current selection.
Directed Links
Links are shown as arrows to indicate direction.
Gray unselected
Unselected nodes are drawn in gray.
Label all/off/selected
Label all, none or just the selected nodes.
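The selection-growing operations above can be sketched as follows. This sketch assumes links are (source, destination) pairs and takes "inter-connecting links" to mean links with both ends in the enlarged node set; both the interpretation and the function names are assumptions, not EDV's implementation.

```python
from collections import deque

def select_one_step(links, selected, direction="both"):
    """Grow a node selection by one link.

    direction is "both", "outgoing" or "incoming". Returns the enlarged node
    set and the indices of links with both ends in it.
    """
    nodes = set(selected)
    for a, b in links:
        if direction in ("both", "outgoing") and a in selected:
            nodes.add(b)
        if direction in ("both", "incoming") and b in selected:
            nodes.add(a)
    chosen = {i for i, (a, b) in enumerate(links) if a in nodes and b in nodes}
    return nodes, chosen

def select_component(links, selected):
    """All nodes reachable from the selection, ignoring link direction (BFS)."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = set(selected)
    queue = deque(selected)
    while queue:
        node = queue.popleft()
        for nbr in adjacency.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen

links = [(1, 2), (2, 3), (4, 5)]
print(select_one_step(links, {2}, "outgoing"))  # ({2, 3}, {1})
print(select_component(links, {1}))             # {1, 2, 3}
```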

Nicheworks3D

This component is the same as the Nicheworks component except that it presents the graph in 3D instead of 2D. Placement and most interactions are identical, except that the control panel for the component has an additional slot for "Node Height". Any node statistic may be dropped into this slot to control the height of the nodes in the graph.

The graph begins with the viewer looking "down" on the graph, so it still appears to be the same as the Nicheworks graph. The graph may be rotated to show the 3D effect; motion helps the eye perceive the 3D rendering of the scene.
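Why the starting view matches the 2D graph can be seen with a simple orthographic projection. This is a hypothetical sketch, not EDV's renderer: it assumes each node is an (x, y, z) point with z taken from the "Node Height" statistic, tilts the scene about the X axis, and then drops Z.

```python
import math

def project(points, tilt_deg=0.0, scale=1.0):
    """Tilt (x, y, z) points about the X axis, scale, and drop Z.

    tilt_deg=0 looks straight down the Z axis, so the projection equals the
    2D layout; a non-zero tilt makes node heights visible on screen.
    """
    t = math.radians(tilt_deg)
    return [(scale * x, scale * (y * math.cos(t) - z * math.sin(t)))
            for x, y, z in points]

# Two nodes at the same 2D position but with different heights.
nodes = [(1.0, 2.0, 0.0), (1.0, 2.0, 5.0)]
print(project(nodes))               # [(1.0, 2.0), (1.0, 2.0)]
print(project(nodes, tilt_deg=60))  # the two nodes now project to different points
```

Looking straight down, the height difference is invisible; once the scene is rotated, it shows up as a displacement on screen, which is why rotation helps the eye perceive the 3D effect.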

The Nicheworks3D component adds one menu to those of the Nicheworks component. Titled "3D", it controls the 3D rendering and contains these options:

Extruded
The shape is drawn from its Z location "down" to the Z zero location, forming a "bar".
Shape
Select the shape to use for objects by default. These same shapes are selected in this order by the Style slot of the control panel for this component. The shapes are roughly in order of increasing rendering complexity; shapes lower in the list are slower to draw.
Filled
If on (the default), draw each glyph as a solid; otherwise it is rendered as a wire frame.
Home
Draw in the default location, with the observer looking "down" the Z axis toward the X/Y plane. The "Home" key is the equivalent of this menu item.
Standard View
Position the observer so that the scene is rotated and reduced from the Home view, providing a view of its three dimensional nature. The "End" key is equivalent to this menu item.

The scene may be zoomed, panned and rotated manually. These transforms are always done with the right mouse button; there is no popup menu, and the "Zoom Right Button" option currently has no effect on this mapping. Panning and zooming work as in all other views: drag with the mouse button depressed to pan, and drag up/down to zoom in/out. Dragging with CTRL depressed rotates the scene.

3D rendering works best in "True Color" display modes (providing 24 bit color), although it will work with color palettes as small as 256 colors. Colors in low resolution modes may be extremely distorted, however.


5. Miscellaneous

Aggregation

To use the aggregation facility, you need a table of repeated observations. One variable of the matrix should indicate which data items are to be aggregated. If you have a variable which indicates the order (time) of each observation, that can also be used. Using the "Aggregate Over" menu command creates a new data table that is an aggregated version of the original data.

This new table initially contains a new variable consisting of the unique IDs from the original table. The new table also contains a time variable: a variable that cannot be used as a normal data variable, but that records which parts of the original and new matrices are linked together. Finally, the two tables are linked together by their respective IDs, as described under "Inter-matrix linking" below.

Once an aggregation has been performed, the program keeps track of the original table and marks it as one for which there is a time variable, making each variable eligible for time series view creation and aggregation commands.

The user can use menu commands from the aggregation menu to create counts of how many data items are mapped to a given ID, the start and end times of such series, and also summary information by ID - such as the average value of a variable averaged over the series.

Here's an example:

        ID      Time    V1      V2
        A       1       2       2
        B       1       3       5
        B       3       4       6
        A       3       7       7
        B       2       3       6

Aggregating this by ID and using the Time variable as time, we get:

        ID      TimeLink
        A       1,4
        B       2,5,3

Here the TimeLink values are the row numbers of the original matrix. If no time variable had been used in the aggregation, the series for B would have read 2,3,5. If we calculated an average variable for V1 and V2, and counts for the TimeLink variable, the matrix would look like this:

        ID      TLink   Av(V1)  Av(V2)  Count
        A       1,4     4.5     4.5     2
        B       2,5,3   3.333   5.667   3
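The aggregation example can be reproduced with a few lines of code, which makes the ordering rule for TimeLink explicit. This is a minimal sketch of the computation, not EDV's implementation; `aggregate` is a hypothetical name, and it assumes ties in time are broken by row number.

```python
from collections import defaultdict

# The original matrix from the example: (ID, Time, V1, V2).
rows = [("A", 1, 2, 2), ("B", 1, 3, 5), ("B", 3, 4, 6),
        ("A", 3, 7, 7), ("B", 2, 3, 6)]

def aggregate(rows, use_time=True):
    """Group rows by ID; return ID -> (TimeLink, Av(V1), Av(V2), Count).

    TimeLink holds 1-based row numbers of the original matrix, ordered by
    the time variable when use_time is True, otherwise by row order.
    """
    groups = defaultdict(list)
    for rownum, (ident, time, _v1, _v2) in enumerate(rows, start=1):
        groups[ident].append((time, rownum))
    table = {}
    for ident, members in groups.items():
        if use_time:
            members = sorted(members)  # by time, then by row number
        link = [r for _, r in members]
        v1 = [rows[r - 1][2] for r in link]
        v2 = [rows[r - 1][3] for r in link]
        table[ident] = (link, sum(v1) / len(v1), sum(v2) / len(v2), len(link))
    return table

for ident, (link, av1, av2, count) in aggregate(rows).items():
    print(ident, ",".join(map(str, link)), round(av1, 3), round(av2, 3), count)
# A 1,4 4.5 4.5 2
# B 2,5,3 3.333 5.667 3
```

Calling `aggregate(rows, use_time=False)` gives the series 2,3,5 for B, matching the note about aggregating without a time variable.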

Inter-matrix linking

By selecting two variables, both containing sets of the same IDs, and choosing the "Link Variables" menu option, the user states that selection operations within one matrix are propagated through to the other. This means that when at least one occurrence of an ID in one matrix is highlighted, all occurrences in the other matrix will be highlighted. EDV shows you that two variables have been linked by drawing a line between them in the panel.

A typical scenario is that a selection is made in one matrix that highlights a set of data items. The IDs of those highlighted items are compared to the list in the other matrix, and any IDs matching this list are highlighted in that matrix.

In the aggregation example, selecting 'B' in the second matrix selects data rows 2, 3 and 5 in the first matrix. Conversely, selecting the cases in the first matrix with V1 > 5 selects row 4 (ID A), which in turn selects row 1 in the second matrix. Note that the selection is not allowed to propagate back to the original matrix.
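The one-way propagation in this example can be sketched as a lookup on the linked ID columns. This is an illustrative sketch with a hypothetical function name; rows are numbered from 1 as in the example.

```python
def propagate(source_ids, source_selected, target_ids):
    """Select every target row whose ID matches a selected source row.

    Row numbers are 1-based. Propagation runs one way only; the result is
    not fed back into the source matrix.
    """
    chosen_ids = {source_ids[r - 1] for r in source_selected}
    return {r for r, ident in enumerate(target_ids, start=1) if ident in chosen_ids}

original_ids = ["A", "B", "B", "A", "B"]  # ID column of the original matrix
aggregated_ids = ["A", "B"]               # ID column of the aggregated matrix

print(sorted(propagate(aggregated_ids, {2}, original_ids)))  # [2, 3, 5]
print(sorted(propagate(original_ids, {4}, aggregated_ids)))  # [1]
```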

Selection options

This window tells you how a selected subset is merged with the current selection. If you are not holding down the shift key, then the operation shown on a white background is performed; if you are holding the shift key down, the light gray operation is performed. Click or shift-click on the operations in the box to change the operations available. Here is a list of what the operations do:

Replace - Selected subset becomes the new selection
Toggle - The state of the selected subset is toggled
Add - Selected subset is added to the current selection
Subtract - Selected subset is removed from the current selection
Intersect - Only the subset that is in the current selection and in the selected subset remains selected.
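Each of these operations corresponds to a standard set operation, as the following sketch shows. It assumes selections are represented as sets of row indices; `merge_selection` is a hypothetical name.

```python
def merge_selection(current, subset, op):
    """Combine the newly selected subset with the current selection."""
    if op == "replace":
        return set(subset)
    if op == "toggle":
        return current ^ subset  # symmetric difference
    if op == "add":
        return current | subset
    if op == "subtract":
        return current - subset
    if op == "intersect":
        return current & subset
    raise ValueError(f"unknown operation: {op}")

current, subset = {1, 2, 3}, {3, 4}
for op in ("replace", "toggle", "add", "subtract", "intersect"):
    print(op, sorted(merge_selection(current, subset, op)))
# replace [3, 4]
# toggle [1, 2, 4]
# add [1, 2, 3, 4]
# subtract [1, 2]
# intersect [3]
```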

User preferences

Between sessions, EDV can save a user's preferences for how EDV works and how views are displayed. These preferences include such things as the setting of the global "Continuous Indicate" option and "gray unselected" or "show unselected" in individual view menus.

When first started, EDV searches for the following files:

If either of these exists, its preferences are read in from there. If neither exists, EDV attempts to create the file $HOME/.edvprefs. EDV's current preferences are written out to this file only when the "Save Prefs" menu item is specifically chosen.

The preferences file is user-readable and editable. An example looks like this:

        NicheWorks:Directional = TRUE
        NicheWorks:Mouse Selects = TRUE
        NicheWorks:One Node needed = FALSE
        NicheWorks:Show all = TRUE
        NicheWorks:Tie Link Selection = FALSE
        Panel:IconFont = 14
        Panel:IconHeight = 19
        Panel:IconWidth = 100
        Panel:continuousIndicate = 0
        Panel:labelled = 1
        Parabox:Boxplots = TRUE
        Parabox:Outliers = TRUE
        Parabox:Parallel axes = FALSE
        Parabox:Same Scale = FALSE
        Scatter:Highlighted Size = +2
        Scatter:Show Unselected = FALSE

The order of individual items in the file does not matter.
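Because each line is an independent "Component:Option = value" pair, the file can be read into a dictionary, which is also why the order of items does not matter. This reader is a sketch based only on the example above: the exact grammar EDV accepts is an assumption, values are kept as strings, and `parse_prefs` is a hypothetical name.

```python
def parse_prefs(text):
    """Parse 'Component:Option = value' lines into a dict keyed by
    (component, option). Blank or malformed lines are skipped."""
    prefs = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue
        component, colon, option = key.strip().partition(":")
        if not colon:
            continue
        prefs[(component, option)] = value.strip()
    return prefs

sample = """Scatter:Highlighted Size = +2
Panel:IconFont = 14
NicheWorks:Mouse Selects = TRUE"""
prefs = parse_prefs(sample)
print(prefs[("Scatter", "Highlighted Size")])  # +2
print(prefs[("Panel", "IconFont")])            # 14
```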

Scripting

EDV supports a textual scripting language that provides most of the capabilities of the interactive graphical user interface. This includes selecting attributes on the panel, creating views based on these selections, creating transforms, coloring by attributes, and selecting view options.

The scripting language mirrors the graphical interface. For example, to create a new view, you select attributes on the panel and then create the view. Any menu item in the user interface may be selected, including those for the panel and those for views.

Each statement in the scripting language is a line containing tokens, separated by white space (space or tab). Case is ignored for everything except variable names. Tokens containing spaces must be enclosed in double quotes (").
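The tokenizing rules just described can be sketched as follows. This is a hypothetical sketch: it assumes a comment is any line whose first non-blank character is "#", and the component name "Scatter" and slot label "X Axis" in the example are made up for illustration.

```python
def tokenize(line):
    """Split a script line on spaces/tabs, keeping double-quoted text together.
    Returns [] for blank lines and '#' comment lines."""
    line = line.rstrip("\r\n")
    if line.lstrip().startswith("#"):
        return []
    tokens, current, in_quotes, pending = [], [], False, False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes
            pending = True  # "" still yields an (empty) token
        elif ch in " \t" and not in_quotes:
            if pending:
                tokens.append("".join(current))
                current, pending = [], False
        else:
            current.append(ch)
            pending = True
    if pending:
        tokens.append("".join(current))
    return tokens

print(tokenize('drag Age Scatter "X Axis"'))  # ['drag', 'Age', 'Scatter', 'X Axis']
print(tokenize("# a comment"))                # []
```

Since case is ignored except for variable names, a dispatcher built on this would compare the first token case-insensitively (for example with `tokens[0].lower()`) while passing variable names through unchanged.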

The scripting language has these statements in it:

# string ... A comment; the line is ignored
menulabel [submenulabel] [submenulabel...] Send a menu selection (and/or a submenu label) to the panel. Any panel menu item may be so selected.
clear Clear all x/y selections of variables in the panel
close viewname [viewname...] Close the named view(s).
drag varname compname slotlabel "Drag" variable varname to the Component table for the component named compname and drop it in the slot labelled slotlabel
export filename Write the named text file. This allows the file name to be supplied; the menu option may also be selected using Panel "Export Text File", but this will prompt for the file. See also "import".
import filename Read the named text file. This allows the file name to be supplied; the menu option may also be selected using Panel "Import Text File", but this will prompt for the file. See also "export".
print filename string [string...] Append the string(s) to the named file.
readedv filename Read the named EDV format file. This allows the file name to be supplied; the menu option may also be selected using Panel "Read EDV File", but this will prompt for the file. See also "writeedv".
selections filename Append the selection state (selected/unselected/deleted) of all tables to the named file. This output is in a format suitable for restoring with the alternate form of this command, described below.
selections code { [state]count ... } Set the selection state of the table identified by code. (Code is the number of rows in the table.) Statements of this form are created by the previous form of this command; this is the best way to create them, since the state encoding is somewhat cryptic. The state is encoded as a series of specifiers, each composed of a state identifier followed by a count of the number of occurrences of that state. The state identifier must be an alphabetic character ("a", "b", or "c"), corresponding to deleted, visible, or selected. If the state character is omitted and there is just a count, then the state two back in the stream of state specifiers is used. (This default is used because states often alternate between two values.) If the specifications do not cover all rows in the table, the uncovered rows are set to deleted.
set label value Save the string "value" under the name "label". This label can be referenced as "$label", and the saved value will be substituted. This provides a simple name substitution facility.

The special label "$Result" is defined by the system, and yields the name of the last created variable. This is useful for referencing a variable created via a transform, where the created name may not be known.

system command Execute the command string using the system command processor (i.e., the shell).
vu viewname menulabel [submenulabel...] For the view named viewname, invoke the menu operation labelled menulabel (and possibly a submenu label as well)
vzout string ... Pop up a message in a dialog box that the user must dismiss. Multiple strings are concatenated.
winposition filename Append a line to the named file for each EDV view currently open. These lines are themselves "winposition" script commands, in the form described next, that restore the windows to their current positions when executed in a script. Thus, to position windows from a script, arrange the windows where you want them and execute this command from another script; this saves the window locations for you. Then copy these lines into your script.
winposition name x y dx dy Position the existing named view to the given location and size. Locations and sizes are real numbers between 0.0 and 1.0; the actual window position and size is determined by taking the screen size and scaling by these amounts. This allows window positions to be used on machines with different screen sizes, and the windows will be placed with proportionally the correct placement and si
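The run-length state encoding of the "selections code { ... }" statement and the proportional "winposition" coordinates can both be decoded in a few lines. This is a sketch under stated assumptions, not EDV's parser: it assumes each specifier is a separate token such as "b10" or "7", truncates any rows beyond the table size, and rounds window coordinates to whole pixels. `decode_selections` and `win_to_pixels` are hypothetical names.

```python
import re

STATE_NAMES = {"a": "deleted", "b": "visible", "c": "selected"}

def decode_selections(code, specs):
    """Expand specifiers such as ['b10', 'c3', '7'] into one state per row.

    `code` is the number of rows in the table. A bare count reuses the state
    of the specifier two back. Uncovered rows are set to deleted; rows beyond
    `code` are dropped (an assumption).
    """
    letters, rows = [], []
    for tok in specs:
        m = re.fullmatch(r"([abc]?)(\d+)", tok)
        if m is None:
            raise ValueError(f"bad specifier: {tok!r}")
        letter, count = m.group(1), int(m.group(2))
        if not letter:
            if len(letters) < 2:
                raise ValueError("a bare count needs two earlier specifiers")
            letter = letters[-2]
        letters.append(letter)
        rows.extend([STATE_NAMES[letter]] * count)
    rows = rows[:code]
    return rows + ["deleted"] * (code - len(rows))

def win_to_pixels(x, y, dx, dy, screen_w, screen_h):
    """Scale proportional winposition values (0.0 to 1.0) to screen pixels."""
    return (round(x * screen_w), round(y * screen_h),
            round(dx * screen_w), round(dy * screen_h))

states = decode_selections(25, ["b10", "c3", "7", "2"])
print(states.count("visible"), states.count("selected"), states.count("deleted"))  # 17 5 3
print(win_to_pixels(0.5, 0.25, 0.5, 0.5, 1280, 1024))  # (640, 256, 640, 512)
```

In the example, "7" takes its state from "b10" and "2" from "c3", so the rows alternate between visible and selected runs before the final uncovered rows default to deleted.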