DSCI 554 lecture 12

Study guide

Dr. Luciano Nocera

Final examination

Format:
  • MCQ similar to Quizzes
  • On content from lectures
  • 60 MCQ questions in 4 parts
  • When: FRIDAY, DECEMBER 10, 2-4 P.M.
Data and information visualization
  • Uses of data visualization
  • Infovis vs. scivis
  • Affordances and signifiers
  • Big data units (kilo-, mega-, giga-, tera-, peta-, exa-, zetta-, yotta- byte)
  • Designer encodes information, User decodes information
  • Information used by designers:
    • Forms adapted to nature of information
    • User familiarity with form
    • User knowledge of topic
    • User abilities
    • Display type and size
    • Context where the form is used
  • Guidelines for choosing visualization forms:
    • Form constrained by the goals of the visualization
    • Form follows function
  • Visualizations are means to achieve goals
  • Visualizations are devices that help an audience complete certain tasks
  • DIKW pyramid
  • Data: numerical, interval, ratio, categorical, nominal, ordinal, dichotomous
  • Named graphs and maps: scatterplot, scatterplot matrix (grid of scatter plots used to visualize bivariate relationships between combinations of variables), stripchart (1d scatterplot), bubble chart, bar chart, lollipop chart, coxcomb chart (stacked bar chart with radial layout), Marimekko chart (bar chart where the width encodes relative size, also called Mekko chart), waterfall chart, pie chart (stacked bar charts in polar coordinates, angle encodes proportion), donut chart, line chart, sparkline, slopegraph, parallel coordinates, radar chart (graphical method of displaying multivariate data in the form of a two-dimensional chart of three or more quantitative variables represented on axes starting from the same point, also called web, spider, star, cobweb, polar, or Kiviat chart), area chart, streamgraph (stacked area graph displaced around a central axis), dendrogram, Reingold–Tilford tree (hierarchical data as linked tree layout implemented using the Reingold–Tilford algorithm), treemap (hierarchical data as nested rectangles, area proportional to value), sunburst (hierarchical data as rings; center is root node; angles are equal or proportional to value), alluvial diagram (relations between multivariate data), Sankey diagram (magnitude of flow between nodes in a network), network graph (relationships as lines between entities as nodes), heat map (matrix values as colors), chord diagram (shows directed relationships among a group of entities in a matrix), word cloud, bubble cloud, circle packing (bubble cloud technique with hierarchical information as enclosing circles), time series plot (values ordered in time as a line chart), index chart (interactive line chart that shows percentage changes for a collection of time series based on a selected index point), Gantt chart (schedule with tasks laid out on time axis), timeline (events laid out on time axis), thematic maps (choropleth, proportional symbol map, dot map, cartogram, isopleth), choropleth (areas are shaded or patterned in proportion to variable), proportional symbol map (scaled symbols show data for areas/locations, also called graduated symbol map), dot map (can be used to locate each occurrence of a phenomenon, one-to-one or one-to-many), cartogram (area used to display value, distortion used to show continuous variables), isopleth or isarithmic (use contours to show continuous variables), topographic map (detailed quantitative representation of land relief using contour lines), nautical map (charts of maritime/coastal area), image based map (using satellite or aerial imagery), combo or combination chart (combination of multiple charts, can have multiple series and y-axes).
  • Statistical graphics: boxplot (shows quartiles, distribution skewness, tails, outliers in unimodal distributions, also named box-and-whisker plot), violin plot (mirrored probability density), dot plot (statistical chart consisting of data points plotted on a fairly simple scale, typically using filled-in circles), frequency distribution table (displays the frequency of various outcomes in a sample, also called frequency table), population pyramid or age-sex pyramid (graphical illustration of the distribution of a population by age groups and sex), stem-and-leaf plot (device for presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a distribution), histogram (approximate representation of the distribution of numerical data), frequency polygon (histogram as a line chart), Q–Q or quantile-quantile plot (graphical method for comparing two probability distributions by plotting their quantiles against each other), heatmap of Pearson’s correlation coefficients (graphical tool to assess correlations in multivariate data), scree plot (line plot of the eigenvalues of factors or principal components in an analysis), biplot (generalization of the simple two-variable scatterplot with variables displayed either as vectors, linear axes or nonlinear trajectories)
  • Graphical elements in charts: title, legend, axes, axes labels, labels, grid lines, tick marks, tick labels
  • Infographic: graphic visual representation of information, data or knowledge intended to present information quickly and clearly
  • Receiver operating characteristic (ROC) curve: diagnostic tool for binary classifiers with a decision threshold
  • Dashboard: graphical user interface which often provides at-a-glance views of key performance indicators (KPIs) relevant to a particular objective or business process.
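The ROC curve listed above is built by sweeping the decision threshold of a binary classifier and recording the true positive rate against the false positive rate. A minimal sketch follows; the scores, labels, thresholds and the function name `rocPoints` are made up for illustration and do not come from the lecture.

```javascript
// Hypothetical helper: one (FPR, TPR) point per decision threshold.
// A sample is predicted positive when its score is >= the threshold.
function rocPoints(scores, labels, thresholds) {
  const positives = labels.filter((y) => y === 1).length;
  const negatives = labels.length - positives;
  return thresholds.map((t) => {
    let tp = 0;
    let fp = 0;
    scores.forEach((s, i) => {
      if (s >= t) {
        if (labels[i] === 1) tp += 1;
        else fp += 1;
      }
    });
    return { threshold: t, tpr: tp / positives, fpr: fp / negatives };
  });
}

// Illustrative classifier scores and true labels (1 = positive class).
const scores = [0.9, 0.8, 0.6, 0.4, 0.2];
const labels = [1, 1, 0, 1, 0];
const roc = rocPoints(scores, labels, [1.0, 0.7, 0.5, 0.3, 0.0]);
console.log(roc);
```

Plotting `fpr` on the x-axis and `tpr` on the y-axis as a line chart gives the ROC curve; the strictest threshold lands at (0, 0) and the most permissive at (1, 1).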
Visual system information processing
  • Eye: fovea, retina cones (S, M, L), rods distribution, blind spot
  • Visible spectrum: 400-700nm
  • Highest visual resolution in the central 1-2 degrees of the visual field (fovea)
  • Visual system comprises the eyes, nerves and the visual cortex (V1-V5)
  • Bottom-up and top-down processing
  • Saccades, fixations
  • Selective attentional tuning: when presented with superposed layers, we can focus on one layer
  • Priming
  • Inattentional blindness: failure to detect an unexpected stimulus that is fully visible due to limited attention
  • Change blindness: failure to detect a change in the visual field when it coincides with a brief transitory event (e.g., when we blink); related to iconic memory
  • Pre-attentive processing: no attention, from a single glimpse, works on large displays
  • Pre-attentive tasks: target, boundary and region detection, counting and estimation
  • Pre-attentive feature hierarchy: how certain preattentive features are easier to detect than others
  • Target and distractors
  • Conjunction search: search involving a combination of non-unique features
  • Basic visual preattentive properties: color, orientation, size, motion, stereoscopic depth
  • Pop-out effect: universal property, independent of practice, familiarity with the features and number of distractors
  • The greater the distance between target and distractors the greater the pop-out effect
  • Visual pathways (two-stream hypothesis): works without visual input
  • Where / Dorsal visual pathway: relative object location for motor tasks
  • What / Ventral visual pathway: object identification and recognition
  • Pattern recognition: process that matches information from a stimulus with information retrieved from memory
  • Priming: Effect in which exposure to one stimulus influences a response to a later stimulus.
  • Apophenia: Perception of images or sounds in random stimuli, priming increases likelihood of seeing the pattern
  • Convergence: group of cells form a receptive field for a cell in the brain
  • Neuronal tuning: Simpler tuning in earlier visual areas (V1 & V2), Complex tuning in higher visual areas (V4 & IT)
  • Lower visual cortex: low information, high localization, universal experience, pop-out, developed early, e.g., V1 neurons may fire to any vertical stimulus.
  • Higher visual cortex: high information, low localization, individual experience, no pop-out effect, e.g., IT neurons may fire only to a specific face.
  • Some neurons of V1 are tuned to vertical lines, others to diagonal lines
  • Memory types: Iconic, VSTM, VLTM, from visual persistence to information persistence
  • Iconic memory: Unlimited capacity, Retention: ≤ 1s, high bandwidth, works unconsciously, provides temporal integration, ensures continuity during saccades
  • VSTM memory: Limited capacity, Retention: ≤ 30s, Buffer that stores temporary information, Constructs and manipulates visual images
  • VLTM memory: Large capacity, Retention: indefinite, Capacity increases over childhood, declines with old age, Encodes information semantically for long term storage, Subject to fading, recalls help preserve it
Color and perception
  • Chromatic aberration effect: red/blue perceived at different distances
  • Types of color scales: sequential, diverging, qualitative
  • Primary and secondary colors
  • After-images and predicting colors using additive RGB color model
  • Color properties distinguishable: Hue, Saturation, Brightness
  • Color vision theories working together: trichromatic (R, G, B cones) and opponent process (visual system responds to opponent channels R-G, B-Y, B-W)
  • Trichromatic theory problems: no R-G or Y-B, R-G overlap, small B response, afterimages
  • Subtractive color model: print, Primaries: CMY, secondary: RGB
  • Additive color model: computer screens, Primaries: RGB, secondary: CMY
  • Additive color displays: addition of illumination (projectors), partitive mixing (LCD), time mixing (OLED), binocular mixing (stereovision)
  • Color model: abstract mathematical model describing the way colors can be represented as tuples
  • RGB, HSV color models
  • Color blindness: about 9% of the population, R-G
  • False colors techniques: choropleth, density slicing (divides the image into few colors)
  • Pseudocolor, e.g., colored IR image
  • Simultaneous color contrast: colors of different objects affect each other
  • Color constancy: ensures colors remain relatively constant under varying illumination
  • Mach bands: illusion due to simultaneous contrast - can appear in color scales
  • Sharpening: More sensitive to dark than light differences, affected by background
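The additive RGB model, the HSV model and afterimage prediction from this section can be tied together in a few lines. This is a sketch under simplifying assumptions: colors are 8-bit RGB triples, additive mixing is a clamped channel sum, and the afterimage is approximated by the RGB complement. The function names are illustrative, not from the lecture.

```javascript
// Additive mixing: sum each channel and clamp to the 8-bit maximum.
function mixAdditive(a, b) {
  return a.map((v, i) => Math.min(255, v + b[i]));
}

// Complement in RGB: a rough prediction of the afterimage color (assumption).
function complement(rgb) {
  return rgb.map((v) => 255 - v);
}

// RGB (0-255) to HSV: hue in degrees [0, 360), saturation and value in [0, 1].
function rgbToHsv([r, g, b]) {
  const rn = r / 255;
  const gn = g / 255;
  const bn = b / 255;
  const max = Math.max(rn, gn, bn);
  const min = Math.min(rn, gn, bn);
  const delta = max - min;
  let h = 0;
  if (delta !== 0) {
    if (max === rn) h = 60 * (((gn - bn) / delta) % 6);
    else if (max === gn) h = 60 * ((bn - rn) / delta + 2);
    else h = 60 * ((rn - gn) / delta + 4);
  }
  if (h < 0) h += 360;
  const s = max === 0 ? 0 : delta / max;
  return [h, s, max];
}

const RED = [255, 0, 0];
const GREEN = [0, 255, 0];
const yellow = mixAdditive(RED, GREEN); // additive secondary of red and green
const afterRed = complement(RED);       // cyan, opposite hue to red
console.log(yellow, afterRed, rgbToHsv(afterRed));
```

Red and green primaries add up to the secondary yellow, and the complement of red is cyan, whose hue sits 180 degrees away from red in HSV.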
Depth and perception
  • Perceptual egocentric space: up, towards, sideways
  • Oculomotor depth cues: accommodation, convergence, myosis
  • Visual depth cues: binocular (stereopsis), monocular (static, motion-based)
  • Classic pictorial (static) depth cues: Occlusions, Linear perspective convergence, Relative/Familiar size, Texture gradient, Shadows, Shading, Defocus blur, Atmospheric perspective
  • Motion-based depth cues: Motion parallax, Occlusion in motion (deletion, accretion), Structure from motion
  • Simultaneous size contrast
  • Size constancy
  • Ponzo illusion (size constancy), Müller-Lyer illusion, Necker cube
Maps and GIS
  • Thematic maps: choropleth, proportional symbol map, cartogram, dot map, isopleth, dasymetric map
  • Geographic coordinates: latitude longitude in degrees
  • Geodetic datum: coordinate system and reference ellipsoid to locate places
  • Horizontal datum: defined by reference ellipsoid
  • Vertical datum: used to measure elevation, can be Geodetic, Tidal, Gravimetric
  • Geocentric datum: good for global applications
  • Local datum: good for regional applications
  • Properties preserved in maps: conformal (shape), Equal-area (area), Equidistant (point distance), Azimuthal (direction from a point)
  • Cannot have maps that are both conformal and equal-area
  • Developable surfaces used for map projections: Cylindrical, Conical, Azimuthal (plane)
  • Map projection can be: Tangent or Secant, Normal, Transverse or Oblique
  • Common projections: Albers conic, Lambert azimuthal equal-area, Lambert conformal conic, Mercator projection, Universal Transverse Mercator (UTM)
  • Composite projections: same projection optimized for different areas, e.g., Albers USA
  • Mercator projection: standard for Web mapping applications
  • Universal Transverse Mercator (UTM) projection: a projection over $60$ zones, each $6^\circ$ of longitude wide, in Cartesian coordinates; no trigonometry is needed to compute distances within a zone; units in meters as Easting and Northing
  • Maps in the Browser can be raster (images) or vector (SVG)
  • Tile Map Service (TMS): e.g., Google maps serve map tiles also called slippy maps
  • GeoJSON: JSON map format defining Geometry (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection), Feature and FeatureCollection
  • TopoJSON: GeoJSON extension that encodes topology (geometries are indexed with arcs), leading to smaller files
  • GeoJSON and TopoJSON coordinates can be in geographical or projected coordinates
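To connect the GeoJSON and UTM items above, the sketch below defines a GeoJSON Feature with a Point geometry and derives the UTM zone from its longitude. The coordinates are an illustrative point in Los Angeles, the helper names are hypothetical, and the zone formula is the standard $6^\circ$ rule that ignores the special zones around Norway and Svalbard.

```javascript
// GeoJSON positions are ordered [longitude, latitude] in degrees.
const feature = {
  type: "Feature",
  geometry: { type: "Point", coordinates: [-118.2851, 34.0224] },
  properties: { name: "Example point in Los Angeles" }, // illustrative
};

// UTM zone number (1-60) from longitude; zone 1 starts at 180 degrees west.
function utmZone(lon) {
  const zone = Math.floor((lon + 180) / 6) + 1;
  return Math.min(Math.max(zone, 1), 60); // clamp so that lon = 180 maps to 60
}

// Central meridian of a zone in degrees, the middle of its 6-degree strip.
function utmCentralMeridian(zone) {
  return zone * 6 - 183;
}

const [lon] = feature.geometry.coordinates;
const zone = utmZone(lon);
console.log(zone, utmCentralMeridian(zone));
```

Easting and Northing in meters are then measured inside that zone, which is why distances within one zone reduce to plain Cartesian arithmetic.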
Visual design
  • Visualization wheel dimensions (Abstraction-Figuration, Functionality-Decoration, Density-Lightness, Multidimensionality-Unidimensionality, Originality-Familiarity, Novelty-Redundancy)
  • Visualization wheel dimensions more intelligible and shallower vs. more complex and deeper
  • Cairo's design principles (clarify, seek depth, add boom effect)
  • Cairo's suggestion to deal with novel forms (use redundancy)
  • Minimalistic visualizations
  • Tufte principles: 1. Above all else show data, 2. Maximize the data-ink ratio, 3. Erase non-data-ink, 4. Erase redundant data-ink, 5. Revise and edit
  • Nigel Holmes design principle: Use humor to instill affection in readers for numbers and charts
  • Data-ink ratio (Data-ink / Total ink used to print the graphic, Proportion of a graphic’s ink devoted to the non-redundant display of data-information, 1.0−Proportion of a graphic that can be erased without loss of data-information)
  • Chartjunk
  • Visual query: a pattern cognitively specified, that if found in the display will contribute to the solution of a problem
  • Ware design principles: Carefully craft visualizations to optimize visual queries, leverage information that the brain processes efficiently, (e.g., pre-attentive features) to prioritize most important information
  • Avoid attentional overload and change blindness
  • Uses of colors (Tufte): Label (identify, highlight, group), measure, represent or imitate reality, enliven or decorate
  • Color design guidelines:
    • blue text hard to read (fewer blue cones)
    • achromatic (BW) channel is easier to read (use all 3 cones)
    • colors harder to read (achromatic ~ 3 x chromatic channel info)
    • W on B less strain than B on W
    • Be considerate of colorblind people
    • Respect well-established color sequences
    • Shape from shading is recognized from the luminance channel
    • Observe cultural conventions
    • Use consistent aesthetics
    • Most important visual queries should have most weight
  • Apprehendable chunk: unlearned pattern complexity that can be apprehended in one fixation
  • Apprehendable chunks consist of about three components
  • Sketches are less work to understand than images.
  • Gestalt principles: Emergence, Reification, Multi-stability, Invariance
  • Gestalt laws of grouping: Proximity, Similarity, Closure, Continuity, Common fate, Connection, Common region
  • Gestalt is useful in graphics and UI
  • Semiology: visual language is a sign language, sender encodes information in signs and receiver decodes information from signs.
  • Visual variables: marks (points, lines, areas...) and encodings/channels (position, size, value, texture, color, orientation, shape)
  • Accuracy of perceptual tasks, e.g., position is judged more accurately than area, and area more accurately than color
  • Isotype: International System Of TYpographic Picture Education (Neurath)
  • Miller's law: The Magical Number Seven, Plus or Minus Two (1-D information judgment task)
  • VSTM capacity 4-5 items with characters
  • VSTM capacity 3-4 items with basic visual features & interference task
  • Multiple graphs usually better than single graphs: Faceting, conditioning, latticing, trellising, small multiples
  • Ways to deal with overplotting: transparency, outline shape, jitter, summarize, add information (e.g., regression line), split (small multiples)
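Faceting (small multiples) amounts to splitting the rows of a table by a categorical variable and drawing one panel per group. A minimal sketch of the split step follows; the `facet` helper and the sample rows are hypothetical.

```javascript
// Group rows by the value of one categorical key: one array per panel.
function facet(rows, key) {
  const panels = new Map();
  for (const row of rows) {
    const k = row[key];
    if (!panels.has(k)) panels.set(k, []);
    panels.get(k).push(row);
  }
  return panels;
}

// Illustrative data: two groups that would overplot in a single chart.
const rows = [
  { group: "a", x: 1, y: 2 },
  { group: "b", x: 1, y: 2 },
  { group: "a", x: 2, y: 3 },
  { group: "b", x: 3, y: 1 },
  { group: "b", x: 4, y: 5 },
];

const panels = facet(rows, "group");
for (const [name, subset] of panels) {
  console.log(`panel ${name}: ${subset.length} points`);
}
```

Each panel would then be drawn with the same scales so the small multiples stay comparable.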
Statistics
  • Population and sample
  • Descriptive vs. inferential statistics
  • Independent and dependent variables vs. features, labels
  • Measures of order
  • Quantiles and quartiles definitions
  • Measures of central tendency
  • Distribution modes, skewness and tails
  • Frequency & Relative frequency
  • Graphic forms adapted to descriptive statistics for univariate and multivariate data
  • Boxplot: quartiles and whiskers $Q1 - 1.5 \times IQR$, $Q3 + 1.5 \times IQR$
  • Stem-and-leaf plots and how to build
  • Histogram and how to build
  • Visualizing model performance: confusion matrix, reporting accuracy, precision, recall, and F-1 scores as tables
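The boxplot quantities listed above can be computed directly. The sketch below assumes quartiles are obtained by linear interpolation between order statistics, which is only one of several common conventions, and reuses the ten values from the stem-and-leaf question in the Notes. Function names are illustrative.

```javascript
// Quantile by linear interpolation on a sorted array (one common convention).
function quantile(sorted, p) {
  const h = (sorted.length - 1) * p;
  const lo = Math.floor(h);
  const hi = Math.ceil(h);
  return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
}

// Quartiles, IQR, the 1.5 x IQR fences, whisker ends and outliers.
function boxplotStats(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const median = quantile(sorted, 0.5);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const lowerFence = q1 - 1.5 * iqr;
  const upperFence = q3 + 1.5 * iqr;
  const inside = sorted.filter((v) => v >= lowerFence && v <= upperFence);
  return {
    q1, median, q3, iqr, lowerFence, upperFence,
    whiskerLow: inside[0],                  // most extreme value inside the fences
    whiskerHigh: inside[inside.length - 1],
    outliers: sorted.filter((v) => v < lowerFence || v > upperFence),
  };
}

const stats = boxplotStats([73, 42, 67, 78, 99, 84, 91, 82, 86, 122]);
console.log(stats);
```

With this convention the fences fall at 51 and 113, so 42 and 122 are drawn as outliers and the whiskers stop at 67 and 99.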
Visualization software
  • Nomenclature: Chart Typologies, Visual Analysis Grammars, Visualization Grammars, Component Architectures, Graphics applications
  • Expressiveness vs. ease-of-use of tools (Heer14 table)
  • General knowledge of various tools: Google Sheets, Matplotlib, Seaborn, VizQL, Tableau, ggplot2, plotnine, Altair, Protovis, D3, Vega, Vega-Lite, Prefuse, Flare, Improvise, VTK, Processing, P5.js, WebGL, Three.js, OpenGL
  • Grammar of graphics: graphic defined as a "grammar" of "components"
  • HTML basic elements
  • HTML global (id, class, style) and element specific attributes
  • inline vs. block-level HTML elements
  • CSS classes, ids, selectors
  • CSS pseudo-classes for handling events, e.g., #bar:hover
  • CSS inheritance
  • Ways to include CSS and javascript in pages (inline, embedded, external)
  • Javascript (user interaction, asynchronous communications, control the browser, alter content)
  • Javascript features (oo, first class functions, dynamic typing, block-level scope)
  • Javascript basics and closures
  • SVG basic shapes and attributes (rect, circle, ellipse, line, polyline, polygon)
  • SVG attributes defaults (position and size is 0, fill is black, stroke is none)
  • SVG path and its uses in visualization
  • SVG transformations, e.g., translate, rotate
  • AJAX
  • DOM box model
  • DOM event models (DOM Level 0: inline and traditional, and DOM Level 2)
  • DOM level 2 event bubbling (default) and capturing phases
  • D3 features: javascript library, declarative syntax
  • D3 what it is and what it does:
    • Loads data in the browser (DOES NOT HIDE THE DATA!)
    • Binds data to document elements
    • Transforms elements by interpreting each element’s bound datum and setting its visual properties
    • Transitions elements between states in response to user input
  • Basic d3 operations (implemented with function chaining):
    • Select elements
    • Add new elements to selected elements
    • Delete selected elements
    • Modify selected elements to position and style
  • D3 default data join matches data to elements by index: the first datum goes to the first selected element, and so on
  • D3 data join with keys syntax: updates can occur anywhere in the data array, depending on the overlap between the old and new values.
  • D3 general update pattern basic order: data join, update, enter, enter + update, exit
  • D3 margin convention and how to use
  • D3 select/append mechanism
  • D3 axes
  • D3 scales domain range
  • D3 scales: Continuous and Ordinal
  • D3 scales useful for colors: Linear, Sequential, Diverging, Quantize, Quantile, Threshold, Ordinal
  • Computing and recognizing basic scales (scaleLinear, scalePoint, scaleBand, scaleTime...)
  • D3 data format (js array [] of objects)
  • D3 loading json vs. csv, tsv
  • D3 layouts and generators general knowledge
  • D3 maps in GeoJSON or TopoJSON, data join by Feature and d3.geoPath() used to transform Feature to <path>
  • D3 event listeners implement DOM Level 2 event model .on(), calling a listener function
  • D3 transition: only one transition at a time per element
  • D3 transition events: start, end, interrupt
  • D3 array methods: d3.min, d3.max, d3.ascending, d3.descending
  • js library vs. framework: you call library code in your code, framework code calls your code
  • Dataframes: table where columns are variables, rows are observations, string variables can be stored as factors
  • Recognize dataframes proper form
  • Basics Jupyter & Pandas: ?, help('list'), import, read_csv, describe
  • Layered Grammar of Graphics derived from Wilkinson's Grammar of graphics
  • Layered Grammar of Graphics components: Defaults (Data and Mapping), Layer (Data, Mapping, Geom, Stat, Position, Scale), Coord, Facet
  • ggplot2: implementation of the Layered Grammar of Graphics
  • Minimal ggplot2 plot: Data, Aesthetic Mapping, Geom
  • ggplot2 minimalistic plot:
  • ggplot2 aes() is used to reference variables in data (dataframe)
  • ggplot2 basic named plots: geom_point(), geom_text(), geom_bar(), geom_line(), geom_area(), geom_dotplot(), geom_histogram(), geom_freqpoly(), geom_boxplot(), geom_violin()
  • ggplot2 faceting, e.g., facet_grid(rows = vars(var1), cols = vars(var2)), facet_grid(var1 ~ var2), facet_wrap(vars(var1))
  • Tableau: built on VizQL an implementation of Grammar of graphics where mappings are specified interactively and visually
  • Dimensions (categorical) and Measures (numerical) variables in Tableau
  • SVG vs. canvas elements
  • canvas API: used with 2D context (getContext('2d')), shape, text and image rendering functions
  • canvas with WebGL: used with 3D context (getContext('webgl')) primitives: GL_POINTS, GL_LINES, GL_LINE_STRIP, GL_LINE_LOOP, GL_TRIANGLES, GL_TRIANGLE_STRIP, GL_TRIANGLE_FAN
  • In WebGL and related toolkits data is passed as indexed arrays
  • Graphic pipeline: process of rendering a 3D scene in WebGL, includes the vertex shader, triangle assembly, rasterization, fragment shader, and testing and blending steps
  • Model and view matrices: used in WebGL to place the object to render and the camera in a world coordinate system
  • Projection matrix: used to define how 3D is projected in the camera, can be defined using the clipping planes
  • Three.js: High-level access to WebGL and graphical utilities, e.g., scene, camera, model loaders, lights and materials
  • Processing: simplified Java API for drawing and graphics
  • Processing.js: JS API to use Processing code
  • P5.js: HTML5 processing implementation
  • Deck.gl and Mapbox GL JS 3d rendering capabilities
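For the item on computing basic scales by hand, the sketch below re-implements the arithmetic behind a linear scale and a band scale without padding. These are hand-written stand-ins with hypothetical names (`linearScale`, `bandScale`), not the D3 API. Each returned function is also a closure over its domain and range.

```javascript
// Linear scale: maps the domain [d0, d1] proportionally onto the range [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (x) => r0 + ((x - d0) / (d1 - d0)) * (r1 - r0);
}

// Band scale without padding: the range is split into equal bands,
// and each category maps to the start of its band.
function bandScale(domain, [r0, r1]) {
  const bandwidth = (r1 - r0) / domain.length;
  const scale = (key) => {
    const i = domain.indexOf(key);
    return i < 0 ? undefined : r0 + i * bandwidth;
  };
  scale.bandwidth = () => bandwidth;
  return scale;
}

const x = linearScale([0, 100], [0, 500]);
const y = linearScale([0, 10], [300, 0]); // inverted range, as for an SVG y-axis
const band = bandScale(["A", "B", "C"], [0, 300]);

console.log(x(50), y(2.5), band("B"), band.bandwidth());
```

The inverted range for `y` reflects that SVG y coordinates grow downward, so larger data values land closer to the top of the chart.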
Notes
What type of chart is this?
  1. Stripline
  2. Slopegraph
  3. Line chart
  4. Area chart $\leftarrow$
What choice corresponds to the CSS rule:

					div, p a.foo {
					  color: red;
					  font-weight: bold;
					}
				
  1. div elements, hyperlinks with class foo and paragraphs
  2. div elements, hyperlinks with class foo in paragraphs $\leftarrow$
  3. div elements, paragraphs in hyperlinks with class foo
  4. div elements in paragraphs or in hyperlinks with class foo
What choice corresponds to the SVG code:

					<svg style="height: 25px; background-color: lightpink;">
					  <rect x="20" y="5" width="100" height="10" fill="blue"></rect>
					  <text x="0" y="20" style="font-size: 0.65em">5</text>
					</svg>
				
  1. 5
  2. 5 $\leftarrow$
  3. 5
  4. 5
What will appear on the page?

					<!-- assume data.csv contains:
					Color
					Red
					Green
					Blue
					-->

					<p>Orange</p>

					<script>
					d3.csv("data.csv").then(function(data) {
					  d3.select("body")
					  	.selectAll("p")
					  	.data(data)
					  	.enter()
					  	.append("p")
					  	.text(function(d) {return d.Color; })
					  });
					</script>
				
  1. Orange, Color, Red, Green, Blue on separate lines
  2. Orange, Green, Blue on separate lines $\leftarrow$
  3. Color, Red, Green, Blue on separate lines
  4. Red, Green, Blue on separate lines
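The answer to the question above follows from the default index-based join: the first datum is bound to the paragraph that already exists, so only the remaining data end up in the enter selection. The sketch below simulates that bookkeeping in plain JavaScript; `joinByIndex` is a hypothetical helper, not D3 code.

```javascript
// Simulated index-based join: pair data with existing elements in order.
function joinByIndex(elements, data) {
  const n = Math.min(elements.length, data.length);
  return {
    update: data.slice(0, n).map((d, i) => ({ element: elements[i], datum: d })),
    enter: data.slice(n),     // data without a matching element
    exit: elements.slice(n),  // elements without a matching datum
  };
}

const existing = ["Orange"]; // text of the <p> already on the page
const data = [{ Color: "Red" }, { Color: "Green" }, { Color: "Blue" }];
const { update, enter, exit } = joinByIndex(existing, data);

// The quiz code sets text only on appended (enter) paragraphs,
// so the existing paragraph keeps its text even though "Red" is bound to it.
const page = [...existing, ...enter.map((d) => d.Color)];
console.log(page.join("\n"));
```

"Red" is bound to the existing paragraph but never rendered, because `.text()` is chained after `.enter().append("p")` and therefore applies only to the new paragraphs.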
Which statements about the pop-out effect are correct?
  1. Basic features that pop-out correspond to the basic visual patterns processed by the brain
  2. Pop-out is an individual property
  3. In feature space, the smaller the distance between target and distractors the smaller the pop-out effect
  4. The pop-out effect depends on practice, familiarity and number of distractors
Answer: 1 and 3
Which statements about data join with keys are correct?
  1. By adding a key to the data-join, updates can occur anywhere in the data array.
  2. If a key is not specified, then the first datum in values is assigned to the first element in the selection and so on.
  3. Is usually implemented by formatting the data as an array of objects with some key-value property for the keys.
  4. Is realized by specifying the key function as the second argument in selection.data(values[, key])
Answer: 1, 2, 3 and 4
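A keyed join matches old and new data by a key instead of by position, which is why updates can land anywhere in the array. The following plain-JavaScript sketch mimics the resulting update, enter and exit sets; `joinByKey` and the sample objects are hypothetical and only illustrate the matching logic.

```javascript
// Simulated keyed join: compare the keys of the old and new data arrays.
function joinByKey(oldData, newData, key) {
  const oldKeys = new Set(oldData.map((d) => key(d)));
  const newKeys = new Set(newData.map((d) => key(d)));
  return {
    update: newData.filter((d) => oldKeys.has(key(d))),  // key in both arrays
    enter: newData.filter((d) => !oldKeys.has(key(d))),  // key only in new data
    exit: oldData.filter((d) => !newKeys.has(key(d))),   // key only in old data
  };
}

// Data formatted as an array of objects with an id property used as the key.
const oldData = [{ id: "a", v: 1 }, { id: "b", v: 2 }, { id: "c", v: 3 }];
const newData = [{ id: "b", v: 20 }, { id: "d", v: 4 }, { id: "a", v: 10 }];

const joined = joinByKey(oldData, newData, (d) => d.id);
console.log(joined);
```

Even though "a" moved from the first to the last position, it is still treated as an update because its key is unchanged.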
Which plots are valid for the following data?
73 42 67 78 99 84 91 82 86 122

					A. The decimal point is 1 digit(s) to the right of the |    C. The decimal point is 1 digit(s) to the right of the |
					 4 | 2                                                       4 | 2
					 6 | 738                                                     6 | 837
					 8 | 24619                                                   8 | 91642
					10 |                                                        10 |
					12 | 2                                                      12 | 2

					B. The decimal point is 1 digit(s) to the right of the |    D. The decimal point is 2 digit(s) to the right of the |
					 4 | 2                                                       0 | 4
					 5 |                                                         0 | 7788899
					 6 | 7                                                       1 | 02
					 7 | 38
					 8 | 246
					 9 | 19
					10 |
					11 |
					12 | 2
				
Sorted values: 42 67 73 78 82 84 86 91 99 122
Answer: A, B and D
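A stem-and-leaf plot with one stem per ten can be built mechanically: sort the values, use the tens as the stem and the units as the leaf, and keep empty stems so the shape of the distribution is preserved. The sketch below assumes non-negative integers and reproduces plot B; the function name is illustrative.

```javascript
// Build a stem-and-leaf plot with stem = tens and leaf = units digit.
// Assumes non-negative integers; empty stems are kept.
function stemAndLeaf(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const minStem = Math.floor(sorted[0] / 10);
  const maxStem = Math.floor(sorted[sorted.length - 1] / 10);
  const stems = new Map();
  for (let s = minStem; s <= maxStem; s += 1) stems.set(s, "");
  for (const v of sorted) {
    const s = Math.floor(v / 10);
    stems.set(s, stems.get(s) + String(v % 10));
  }
  return [...stems].map(([s, leaves]) =>
    `${String(s).padStart(2)} | ${leaves}`.trimEnd()
  );
}

const lines = stemAndLeaf([73, 42, 67, 78, 99, 84, 91, 82, 86, 122]);
console.log(lines.join("\n"));
```

Because the values are sorted before the leaves are appended, leaves within a stem are always in increasing order, which is the property that plot C violates.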
Which plots are valid for the following data?
10 1 0.1 7.5 8.8 2 3.5 6.7

A. The decimal point is at the |               C. The decimal point is at the |
 0 | 10                                         0 | 1
 2 | 05                                         1 | 0
 4 |                                            2 | 0
 6 | 75                                         3 | 5
 8 | 8                                          4 |
10 | 0                                          5 |
                                                6 | 7
                                                7 | 5
                                                8 | 8
                                                9 |
                                               10 | 0

B. The decimal point is at the |               D. The decimal point is 1 digit(s) to the right of the |
 0 | 01                                         0 | 0124
 2 | 50                                         0 | 789
 4 |                                            1 | 0
 6 | 57
 8 | 8
10 | 0
						
Sorted values: 0.1 1.0 2.0 3.5 6.7 7.5 8.8 10.0
Answer: A, C and D