Workshop: Graphing with R & ggplot2

Description

[Image: ggplot2 example by Hadley Wickham]

Introduce yourself to R and the powerful graphing library based on the Grammar of Graphics–ggplot2. Attendees will work in small teams to learn how to generate basic and advanced plots in ggplot2 to solve a variety of problems. The workshop will also review the fundamentals of data visualization to increase the readability and clarity of plots.

The workshop is open to all types of users, including those who are unfamiliar with R. We will mix some demonstration with small group-based projects. Basic principles of data visualization will also be emphasized alongside ggplot2 demonstrations to put the program into a larger context.

Audience

The workshop is targeted to individuals who are not familiar with ggplot2, including beginners who are new to the R software. Attendees will need to bring their own computer, on which we will install R and ggplot2. Don’t worry, both are open source and free.

Time, Location, & Signup

The workshop begins October 8 in the IMSA classroom at 1871, located on the 12th floor of The Merchandise Mart (222 W. Merchandise Mart Plaza). It includes four sessions (outline below) meeting on consecutive Mondays at 6pm. The IMSA room isn’t available on October 22, so, depending on the number of attendees, we will either meet in a smaller conference room or push the schedule back a week.

There are only 30 seats available for this workshop due to the size limitations of the IMSA classroom. Interested attendees need to go to the CDVG meetup site to sign up for each of the four sessions.

Workshop Leaders

The workshop will be led by CDVG member Tom Schenk. Tom is a Senior Research Data Analyst at Northwestern University, Department of Medical Social Sciences. You can read more about Tom on his website. He also curates Data Nouveau–a collection of interesting data visualizations on the web.

Tom will be assisted by CDVG member Josh Doyle (who is relatively new to R & ggplot2 and will ask the dumb questions so others won’t have to). We also expect to have some other experienced folks in the room to help out.

Workshop Outline

Introduction to R (October 8)

We will familiarize ourselves with the R environment through a gentle introduction to the basic functions. After installing R, we will import and inspect data sets while becoming familiar with R terminology. By the end of the class, we will produce basic descriptions and plots of the data.

  • Learn how to import data into R.
  • Understand the structure of data sets and their components.
  • Learn how to describe data.
  • Download and install new packages from CRAN.
  • Plot data using basic R functions.
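
As a rough preview of this first session, the steps above might look something like the sketch below. It uses the mtcars data set that ships with R so that it runs without any files; the CSV file name in the comment is hypothetical, and the exact functions covered in class may differ.

```r
# Load a built-in data set (mtcars ships with R, so no file is needed).
data(mtcars)

# Inspect the structure: one row per car, one column per variable.
str(mtcars)
head(mtcars)

# Describe a single variable: minimum, quartiles, mean, and maximum.
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)

# Importing your own data would look like this
# ("my_data.csv" is a hypothetical file name):
# my_data <- read.csv("my_data.csv")

# Installing a package from CRAN (requires an internet connection):
# install.packages("ggplot2")

# A basic scatterplot using base R graphics.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")
```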

Introduction to ggplot2 (October 15)

We will begin to use the ggplot2 package to create basic, but handsome, univariate, bivariate, and time-series graphs. We will introduce the functions and terminology used in ggplot2. We will also explain the fundamentals of proper data visualization techniques and how they relate to the ggplot2 defaults.

  • Install the ggplot2 package.
  • Use geometric shapes to display data.
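
For a sense of what the second session covers, here is a minimal sketch of the three graph types mentioned above, one geom per plot. It assumes ggplot2 is already installed and uses the mtcars data set from R and the economics data set bundled with ggplot2; the choice of variables is ours, purely for illustration.

```r
library(ggplot2)

# Univariate: a histogram of fuel economy.
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

# Bivariate: a scatterplot of weight against fuel economy.
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Time series: unemployment over time (economics ships with ggplot2).
p_line <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()

# Printing a ggplot object draws it on the current graphics device.
print(p_hist)
print(p_scatter)
print(p_line)
```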

Grammar of Graphics (October 22 or 29)

We will continue to show more advanced features of ggplot2, including how it relates to Leland Wilkinson’s Grammar of Graphics. We will show how to plot more than 2 variables in a single graph using colors, shapes, and sizes. We will also discuss how human ability to perceive different shapes and colors should drive the choices we make in data visualization.

  • Use scales to add information.
  • Use coordinates to aid interpretation.
  • Easily create small multiple graphs.
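
A possible sketch of these ideas, assuming ggplot2 is installed: color and size carry a third and fourth variable, a scale controls how the colors are chosen, a coordinate system zooms in without dropping data, and faceting produces small multiples. The variables and the palette are illustrative choices, not the workshop's actual examples.

```r
library(ggplot2)

# Map two extra variables onto color and size.
p <- ggplot(mtcars, aes(x = wt, y = mpg,
                        colour = factor(cyl), size = hp)) +
  geom_point() +
  # A scale controls how data values become colors.
  scale_colour_brewer(palette = "Dark2", name = "Cylinders") +
  labs(size = "Horsepower")

# Coordinates: zoom in on part of the x axis without removing points.
p_zoom <- p + coord_cartesian(xlim = c(2, 4))

# Small multiples: one panel per number of gears.
p_facets <- p + facet_wrap(~ gear)

print(p_zoom)
print(p_facets)
```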

Plots for Publications (October 29 or November 5)

After learning how to make plots, we will learn how to customize graphs with custom colors, labels, and themes. We will emphasize how to create a customized look to be included in publications, including adding labels in diagrams to help readers.

  • Save graphs from R into publication-friendly formats.
  • Use custom colors for plots.
  • Use your own fonts.
  • Customize ggplot2 graphs with the new themes feature.
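
To illustrate the final session, the sketch below applies custom colors, labels, a theme, and a font family, then saves the result as a PDF. It assumes ggplot2 is installed; the colors, the generic "serif" font family, and the output file name are our own placeholder choices, and the file is written to a temporary directory so nothing is overwritten.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(am))) +
  geom_point(size = 2) +
  # Custom colors, keyed by the values of the am variable
  # (0 = automatic, 1 = manual in the mtcars documentation).
  scale_colour_manual(values = c("0" = "#1b9e77", "1" = "#d95f02"),
                      labels = c("Automatic", "Manual"),
                      name = "Transmission") +
  labs(title = "Fuel economy by weight",
       x = "Weight (1000 lbs)",
       y = "Miles per gallon") +
  # A built-in theme with a base font size and a generic font family.
  theme_bw(base_size = 12, base_family = "serif") +
  theme(legend.position = "bottom")

# Save to a vector format suitable for publication (size in inches).
out_file <- file.path(tempdir(), "mpg_by_weight.pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```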

The challenges of visualizing the US Electric Grid

Matej Mavricek will be presenting Visualization Data on the US Electric Grid at our next CDVG meeting on Aug. 29. Matej is a Senior Analyst with Power Switch, an energy think tank in Chicago focusing on effective research of the US energy sector. He has a particular interest in creating visualizations of the electric power grid. He will discuss his objectives for a visualization and present the data that Power Switch has gathered. Anyone interested in creating visualizations of this data will be able to post them on the CDVG website and solicit feedback from other CDVG members. Matej will also offer feedback on the visualizations created, and Power Switch may choose to use the visualizations in their materials. This is a great opportunity to get practice and exposure!

Profile: Datascope Analytics

Datascope Analytics is a data analytics and visualization agency in Chicago. It was established in 2009 by Mike Stringer and Dean Malmgren, two PhD students in the lab of Luis Amaral, professor of chemical and biological engineering at Northwestern University. Mike and Dean were investigating large communication networks and scientific databases for information to support the lab’s research. Their realization that they actually enjoyed the data analysis, coupled with the growing demand for these skills, eventually led them to start Datascope Analytics.

I sat down with Dean Malmgren and discussed Datascope Analytics and the Chicago data visualization community in preparation for his presentation at the CDVG meeting on August 15. The notes from our conversation are posted here. He told me he is still a teacher at heart, which is evident in his passion for discussing data visualization and his work at Datascope Analytics. After reading this post and hearing him speak at our meeting, I hope you are encouraged to reach out and speak with him. It may turn into an opportunity for you, as Datascope Analytics is growing and has some exciting projects starting soon.

Your presentation for the August 15 meeting of the CDVG is titled “Data-driven: at the intersection of design and analytics”. Can you give me a little preview of what you will be speaking about and why you have chosen this topic?

I will walk through a project or two from start to finish to give a sense of how we approach problems and to emphasize the importance of designing compelling visuals to achieve our results.

Describe Datascope Analytics

We are a data-driven consulting and design firm. Instead of specializing solely in design, consulting or analytics, we operate in the space where these three functions intersect. With this broader perspective we are able to provide solutions customized for the data and challenges unique to our clients. We believe that this comprehensive approach has differentiated us in the analytics market.

What are some of your signature clients? Can you discuss the projects you did for them?

Procter & Gamble contracted us because they needed their employees to adopt a new work process throughout a multinational and multi-functional organization. We conducted a social network analysis and created an influence network model. With this we identified the thought leaders and change agents who are simultaneously well-respected by their peers and optimally positioned in the influence network to foster a movement. The result was a team of ambassadors who will spearhead change in the organization.

P&G Influence Network created by Datascope Analytics

We’ve also worked with companies like Research Corporation for Science Advancement, a global information services provider, and a global e-discovery services provider.

Can you describe your design process?

We have developed a four-stage process that has been successful for us.

    1. Clarifying our client’s need. This is a collaborative exercise with the client to brainstorm ideas. We then select a few options that are the best fit for the problem and create prototypes. Our ability to create prototypes is one of our strengths.
    2. Identifying the data that can be combined or created to provide insight for our clients. This may be data from within the client’s organization or from external sources. If the data is incomplete, we fill in the gaps with custom tools and surveys. All of this data is combined to provide a reusable asset for the client.
    3. Designing the analysis. We know that our clients, and their data, are unique. Consequently, we don’t use the same analytical tools for every project. We are a custom shop because we believe we deliver greater insight into a client’s data than we can with a vended software package.
    4. Communicating the results with a LivingReport™. This is our unique solution that is more effective than just text or a table of numbers. It is a visual representation of the data that shows the patterns that can reveal valuable insight about the client’s business.

What technology does Datascope Analytics use to create their visualizations?

We use open source development tools to develop custom solutions for our clients instead of using vendor software. We feel the vendor packages have considerable functionality but are ultimately more limited than our custom solutions. For analysis, Datascope Analytics uses Python, R and Hadoop. Raphael.js and D3.js are used for visualization. We have also created our Lens™ library: a set of analysis tools that let clients see into their data with more clarity. It is built in Python and is the glue that sticks everything together coherently.

What is it like to run a data visualization startup?

My days are divided into three activities: whiteboarding solutions, coding, and interacting with clients. These aren’t eight-hour days, however, so I get to spend a considerable amount of time on each of these. And that’s okay because I enjoy them all.

How would you describe the Chicago data visualization community?

I would like to see the Chicago data visualization community mirror the diversity of businesses that exist in our market. Unlike the financial focus of New York or the tech focus in San Francisco, Chicago has a very rich set of industries that could all benefit from data visualization excellence and cross-fertilization of ideas. This meetup is one of several great ways to start the process of sharing ideas and bringing together the diverse community interested in data visualization.

What help in starting Datascope Analytics did you get from the Chicago community?

Northwestern University’s Farley Center was instrumental in getting us off the ground. They provided accounting services, space, and mentoring. We were also fortunate enough to receive deeply discounted legal assistance from the Loyola Law Clinic.

We have also benefitted from collaborations with other start-up companies in the Chicago area like Syndio Social.

Who are some of your favorite data visualization designers? What are some of your favorite data visualizations?

Moritz Stefaner and Stefanie Posavec are two of my favorites. I like how each of them thinks outside the box to come up with interesting ways of using different graphic elements to visualize data. Naming a favorite is difficult, but I particularly like Stefanie Posavec’s “11 x” series which, despite the simplicity of the underlying visualization, is a fun way to explore the emergent patterns in long multiplication.

What advice do you have for those interested in getting started in data visualization?

Get started playing with data any way you know how. Start with a pencil and paper, make a static image, and, if it is useful to do so, create something interactive. The only way to learn what works and what doesn’t is to try and iterate, not read and regurgitate.

InfoActive (a Chicago infographic startup) wants your help!

InfoActive is a Chicago startup providing an online platform for creating interactive, mobile-friendly infographics. It was one of the startups selected to participate in the Chicago Lean Startup Challenge. I’ve met with Trina Chiasson (founder and CDVG member). She says InfoActive will help people who have “an interest in visualizing data but don’t know where to start, or those who think it’s difficult to create beautiful, data-driven stories.” If this sounds like you, then Trina wants your feedback. Please go to InfoActive.us/News and fill out the interest form. Trina will contact you, and your contributions will help another Chicago startup take the next step. I’m pleased that this one is in the data visualization space. Hopefully we will see more of InfoActive in the future. We will check in with her later in the year to talk about the data viz startup experience.