5 Experiences from the Tableau Conference

The annual Tableau Conference is, in my opinion, the biggest congregation of data-loving nerds. This was my third Tableau Conference, and it feels like it gets bigger and better every year. Here is my experience from #TC18.

1. Venue

The conference was held at the Ernest N. Morial Convention Center in New Orleans. The venue’s Wikipedia page explains that ‘as of 2006, it has about 1.1 million square feet (102,000 m²) of exhibit space, covering almost 11 blocks, and over 3 million square feet (280,000 m²) of total space. The front of the main building is 1 kilometer long’. I did not know this before going to the conference, but if you were there, you know how far apart some things were. In fact, Mark Edwards even measured the time it took him to walk between a few locations. He admits he’s not a slow walker and that the corridor was empty, so I would definitely add a few minutes to his times to get a more realistic value.

Personally, I met all the fitness goals my phone generally tracks: Move Minutes, Heart Points and the like. I had a burst of movement throughout the week, but mostly on the 23rd and the 24th of October. I walked more than 20,000 steps on these two days and exceeded my goal of 5,000 steps per day (I didn’t say it was a goal I was proud of).

 

 

2. Data-driven

Seeing how Tableau helps individuals and organizations become more data-driven, there’s no doubt that they give a lot of importance to numbers. There were 17,000 attendees this year, which was more than last year (sorry, I don’t have the exact count for last year). There are also 10,000+ organizations that use Tableau Online. This was good to know, as mine is one such organization. However, the numbers that got me the most excited were those around Tableau Public: 550,000 authors, 1.5M+ vizzes and 1.7B+ views. These may not be the biggest numbers you have seen, yet I can’t fathom them. What this also means is that an average Tableau Public author has (1,500,000/550,000) 2.73, or roughly 3, visualizations on Tableau Public. However, if you are a regular Tableau Public author, you know that’s not the full picture. There are plenty of authors who have a far greater number of vizzes. For example, MakeoverMonday hosts and Zen Masters Andy Kriebel and Eva Murray have 697 and 129 vizzes respectively. Fellow Phoenix author and Zen Master Ann Jackson has 212, and I have 112. So, really, a small percentage of authors account for a mid to high percentage of the vizzes on Tableau Public (I am creating a mental Pareto chart as I write this).
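To make the arithmetic concrete, here is a minimal Python sketch. It uses only the totals and the four author counts quoted above; the variable names are my own, and the "multiple of the average" comparison is just one simple way to show how skewed the distribution is.

```python
# Figures quoted in the post (Tableau Public totals shared at #TC18).
total_vizzes = 1_500_000
total_authors = 550_000

average_per_author = total_vizzes / total_authors

# Viz counts for the four authors named in the post.
named_authors = {
    "Andy Kriebel": 697,
    "Eva Murray": 129,
    "Ann Jackson": 212,
    "Suraj": 112,
}

# How many times the average each named author has published.
multiples = {
    name: count / average_per_author for name, count in named_authors.items()
}

print(f"Average vizzes per author: {average_per_author:.2f}")
for name, multiple in sorted(multiples.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {multiple:.0f}x the average")
```

Even the smallest count in this list is dozens of times the average, which is the Pareto-style skew described above.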

 

3. #MakeoverMonday

To say that #MakeoverMonday has changed the lives of many (in a good way) is putting it mildly. It has given many people (including me) a platform to learn, improve, and showcase their data analysis and visualization skills. I am an absolute fan and beneficiary of the project, and if you met me at the conference and spoke about data visualization, I am sure I mentioned Makeover Monday at least once.

Monday, October 22, 2018 was the first day of the conference, and the first session I attended was the Makeover Monday Live session. I was absolutely looking forward to it for all the reasons I mentioned earlier. However, it unexpectedly turned out to be a very special session for me. I had been to one Makeover Monday Live session at #data17 in Las Vegas, so I knew that Andy and Eva ask attendees to volunteer and present their makeover after the vizzing time is up. However, I wasn’t expecting that to happen to me because I have rarely been able to complete one in under an hour. To add to that, there were internet and data related issues that prevented me from getting started until 15 minutes of the session had already passed. However, when I finally got my hands on the data (about beer prices at MLB) and started looking for an insight, it didn’t take me long to find one. I found that the New York Mets had about an 82.6% increase in the price of beer in 2018 as compared to 2013. This made me wonder whether the size of the beer can/bottle had also increased, and it indeed had. At this point, I realised I could make this the narrative of my viz. So I decided to name my sheets exactly as I was thinking about them, as if I were having a conversation with myself as I went through this data discovery. I quickly finished the viz and posted it on Tableau Public in time.

As soon as the time was up, Andy and Eva started looking for volunteers to present. Only then did it dawn on me that I could actually go up there and present, since I had finished my viz. Now, raising my hand in a room full of people to volunteer and present is not at all me. I am in fact the opposite. I would sit there and see others take the lead. But everything happened so fast, I didn’t get a chance to second guess my action. And that’s when Eva called me over to line up beside the stage to volunteer. I have learnt this about myself: if I ever second guess a decision, I always end up taking the easier option. But when I go with my gut, I end up learning more and evolving as a person. I am glad I didn’t get a chance to second guess that day.

Photo credit to @AnnUJackson and @ktfontnowagner

4. Inspiring Sessions

There’s no doubt that inspiration is in the air at the Tableau Conference. But there are a few sessions that stay with you long after they are over. I was fortunate to attend a few of these in person.

50 Charts in 50 Minutes

Last year, Andy Kriebel and Jeffrey Shaffer did a session together, ‘50 tips in 50 minutes’, in which they demoed more than 50 tips in Tableau. I learnt more from that one session than from several others combined. So when I found out they were doing a similar session this year, I signed up in a heartbeat. This time they demoed ‘50 charts in 50 minutes’, and just like last time, it was an amazing session. They did not use ‘Show Me’ (pill pusher, as they call it) to build any of the charts. It was a pleasure to watch these two Zen Masters build a variety of charts (definitely more than 50) in such a short time. They know their art so well that they would throw in a joke here and there. It was great to see how comfortable they were on the stage doing a live demo in front of a big audience. Since this was a fast-paced session, they recommended not taking any notes (and rightfully so), as one can go back and watch the recording. Additionally, it was fun to see Andy build the packed bubbles chart because I know how much he dislikes them.

IMG_20181023_162423.jpg

But yes, he definitely asked everyone who attended to never make one.

You are an Artist

Mike Cisneros is an amazing data visualizer. He tends to construct amazing stories through his thorough analyses. However, what I did not know was how good a storyteller he is in general. I attended his session, ‘You are an Artist’. In the session he explained how and why to start visualizing data and how one can be a good data artist by focusing on Design, Analysis, and Storytelling. His style of presentation keeps one captivated by what he has to say. If you did not attend the session, I would highly recommend watching the recording. ‘Seek excellence, not approval’. I was fortunate to get a chance to speak with Mike the next day, when we had a detailed discussion about everything from attribution to use of images, and Tableau Public to Makeover Monday.

Freak-alytics | The hidden data behind everyday questions

On the last day of the conference I got a chance to attend a session by Chris Love and Rob Radburn. Why does it always rain on me? Why do all the malls have the same stores? Do bands play the same gigs night after night? These were the questions they tried to answer with data. These seem like very simple questions, but they are never easy to answer. And this is exactly what I learnt from the session. The one with the band gigs was my favorite. What they did here was download (scrape) data from a website that crowdsources this information. Basically, one can go on to the website and submit the songs a band played at a concert one attended, along with the sequence in which they were played. With this data, Chris and Rob were able to analyse which bands played the same gig night after night and which ones mixed it up.

IMG_20181025_124338.jpg

From the image above (which I believe is of the songs the Rolling Stones played at their concerts), you can see they pretty much kept their order the same across concerts. I would highly recommend watching the recording, as it has a lot more analyses on these questions and they are easier to understand the way Chris and Rob explain them. I would also recommend following their blog Data Beats.

There were plenty of other sessions which I wanted to attend but couldn’t. I marked those on the app and took screenshots so I can go back and watch the recordings.

 

5. Braindates

Braindates were a new thing Tableau introduced to the conference this time. Basically, you create a topic you would like to talk about on the app, and anyone else interested in the same topic can send an invite to meet. If you do not want to create a topic, you can join conversations that someone else has created. These are either one-on-one conversations or group ones (which do not include more than 5 participants).

When I first saw this option in the app, I wasn’t too excited to get started. I browsed for a few topics I was interested in, but most were already full. One of the main goals I have during the conference is to attend sessions that are HR specific and to find people who work in HR to understand how they use Tableau. There have never been too many HR sessions at the Tableau Conference. I applied to speak at 2 of the last 3 conferences I attended, but I never got selected. Again this year, there were a very limited number of HR sessions. So I decided to create a Braindate topic instead.

Screenshot_20181028-170143.png

The topic was ‘Using Tableau to solve problems in HR’. The response I received was unlike anything I had expected. So many people sent me invites to discuss HR. One of the common things we realised was that we were in the minority. There were not too many sessions about HR that we could attend. I got a chance to meet people working in different companies, and even in different countries. But what I found was that most of them had challenges similar to the ones we face at my organization. We used the same systems, pulled the data the same way, and built dashboards around very similar metrics. This was all valuable information because it was not only a validation of the work I do at my organization, but it also meant that we were on the right track.

In all I attended 11 braindates over the course of 4 days. 7 of these were about HR. 2 of the remaining 4 were based on the other topic I created, ‘Building a Tableau Public Profile and committing to it’. My intention here was to help someone get started with Tableau Public and make some recommendations on how to participate and build vizzes. Both people I met were very new to Tableau Public and had limited experience building vizzes. However, they were both excited to learn and get started, as they believed it was the right way to get some practice and get comfortable with the tool. This, in the end, was going to help them build dashboards at work. They had both heard of #MakeoverMonday before but did not know how to get started. So I walked them through the process and showed them where they could download the data sets and how they could participate. I hope I was able to encourage them enough to start participating. But if nothing else, I hope I was able to inspire them to start building visualizations in Tableau Public.

IMG_20181028_172047.jpg

I also got a chance to learn about web-editing with Mark Edwards. He’s doing a fantastic job at his organization and I was able to get a ton of ideas to start using it at my work. I also got a chance to discuss IronViz with Curtis Harris, who shared his experience from the time he won in 2016, and his efforts since then. It was great to hear his thoughts about the process and the competition and get some ideas around what one can do to hopefully win a feeder.

Overall, the braindates provided me a ton of opportunities to network. They worked especially well for me as an introvert who is not too comfortable talking in a crowd. The one-on-one aspect really helped me learn from others’ experience and gave me a lot of ideas for myself and my work. I hope Braindates come back at the next conference so I can do it all over again.

Finishing thoughts

Although I chose to skip sharing my thoughts about the Keynotes, IronViz etc., these are the sessions that are truly amazing. Devs on Stage is one of my favorites. I can’t wait for the new Parameter and Set Actions features coming down the pipeline. IronViz is my favorite event of the conference. It is a dream to be on that stage some day. Three awesome vizzes were created this time by three amazing vizzers. In my opinion, being on the stage is being a winner.

There were plenty of other things to enjoy at the conference in between sessions. The ‘Aggre-gator’ scavenger hunt was fun. I was able to find 5 aggre-gators to win the special button. The lady who made the ‘data’ mural did a fabulous job. Seeing some great public vizzes featured at the viz gallery was super cool. And Cafe Du Monde beignets almost made me want to settle in New Orleans. I always look forward to the Tableau Conference, but it’s things like these that make it even more memorable. However, nothing beats meeting the #datafam you know only through Twitter.

Hope to see you next year in Vegas.

Suraj.


#TC18 Tweets

Monday, October 22, 2018 is the first day of Tableau Conference 2018. Tens of thousands of data nerds will descend on New Orleans to celebrate their love of data visualization. Last year I made a dashboard looking at all the tweets posted using the conference’s official hashtag, #data17. I collected them with an applet I created through IFTTT (If This Then That). It looks like the hashtag is still being used, with the last tweet posted as late as October 11, 2018.

#Data17 Tweet Counter

Back then, I had only started collecting the data about three months prior to the conference, on July 03, 2017. Last year, the most tweets were posted on the second day of the conference, October 10, 2017, with 3,171 tweets. I believe this is because it is usually the first actual day of the conference, when breakout sessions and hands-on trainings start, so most attendees are expected to be at the conference by Tuesday. Mondays have historically included all-day trainings, certification exams, and very few meetups and breakout sessions, and so do not see as many tweets as Tuesdays.

However, I wanted to do something different this year. I created another applet a day into the conference last year to start tracking #TC18 tweets from then on. I figured attendees would start tweeting using the next year’s official hashtag on the last day of the conference, to say their goodbyes and promise to meet again the following year. And even if that didn’t happen, at least I would have a year’s worth of data collected on the hashtag. But my guess was right.

Looking at the screenshot below of my current dashboard, you can easily figure out that the last day of #data17 (October 12, 2017) saw an unusually high number of tweets using #TC18. 67 tweets to be precise. 139 if retweets are included. So far, the maximum tweets have come on October 04, 2018.

#TC18 Tweets.png

From a dashboard perspective, I wanted to show a leaderboard for the top 5 users who have tweeted using the hashtag. Not only that, I also wanted to differentiate between a tweet and a retweet so the numbers don’t appear skewed towards users who mostly retweet. Finally, I also wanted to allow you, the user, to find all the tweets you posted. If you know you tweeted about #TC18 but can’t find yourself on there, please reach out so I can make sure you are added. I have had to remove plenty of tweets and users that were not related to the Tableau Conference. I am sure some are still there in the data, but far fewer than there were initially.
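For anyone curious how the tweet/retweet split behind such a leaderboard might be computed outside Tableau, here is a small Python sketch. The sample rows are made up, and the "RT @" prefix check is an assumption about how retweets appear in the collected text; the actual IFTTT export may mark them differently.

```python
from collections import Counter

# Hypothetical sample rows: (username, tweet text). Not real data.
rows = [
    ("alice", "Excited for #TC18!"),
    ("bob", "RT @alice: Excited for #TC18!"),
    ("alice", "Devs on Stage was great #TC18"),
    ("carol", "RT @alice: Devs on Stage was great #TC18"),
    ("bob", "See you in New Orleans #TC18"),
]


def is_retweet(text: str) -> bool:
    # Assumption: retweets are stored with the classic "RT @user:" prefix.
    return text.startswith("RT @")


# Count original tweets and retweets separately per user.
tweets = Counter(user for user, text in rows if not is_retweet(text))
retweets = Counter(user for user, text in rows if is_retweet(text))

# Top 5 users ranked by original tweets only, so heavy retweeters don't dominate.
leaderboard = tweets.most_common(5)
print(leaderboard)
```

Keeping the two counters separate makes it possible to show both numbers side by side while ranking on original tweets.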

Can we beat last year’s record for the most tweets in a day? If you are attending the conference this year, don’t forget to tweet using #TC18. If you see me, please come say hi.

-Suraj

#MakeoverMonday 2018 W40: Avocado Price as compared to US National Average

For this week’s #MakeoverMonday, the data was provided by the Hass Avocado Board. There’s a lot of data available in the CSV, such as the Average Price and Total Volume, both categorised by the size of the avocado (Small, Medium, Large, XL) and broken out across regions of the United States.

I went through a few iterations to get to a point where I felt comfortable with my viz. Some of the iterations are below.

My final version is a small multiple chart based on one I had done previously as part of a #WorkoutWednesday, The State of U.S. Jobs. There, instead of showing two lines, one for the state and another for the national average, both calculations were combined into a single metric: the % difference. I decided to do the same here, showing the % difference in the average price of avocados in a region vs the national average.
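As a rough illustration of the single-metric idea, here is a minimal Python sketch of a % difference calculation. The prices are made up for illustration and do not come from the Hass Avocado Board data; the function name is my own.

```python
def pct_difference(regional_price: float, national_price: float) -> float:
    """Return how far a regional price sits above (+) or below (-) the national average."""
    return (regional_price - national_price) / national_price


# Hypothetical weekly average prices in USD; not taken from the data set.
national = [1.20, 1.25, 1.30]
region = [1.32, 1.25, 1.17]

# One value per week replaces the two separate lines.
diffs = [pct_difference(r, n) for r, n in zip(region, national)]

for week, diff in enumerate(diffs, start=1):
    print(f"Week {week}: {diff:+.1%}")
```

A positive value means the region is more expensive than the national average that week, a negative value means it is cheaper.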

Dashboard 1

Link

I did not know how to get the labels at the top of the small multiple charts. I thought of annotating, or floating text boxes, but that defeated the purpose of a single-sheet small multiple. However, after searching a little I landed on Andy Kriebel’s viz on National Parks, where he used a calculation trick to display the labels. It’s a simple trick: find the centermost value on the timeline and create a static value for that particular date. This creates some artificial padding as well as a mark over which a label can be shown. See below for an example.

IF ATTR([Date]) = DATE("07/03/2016") THEN
0.45
END
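The same idea can be sketched in Python to show what the calculation returns for each date. The weekly timeline below is hypothetical (I am assuming weekly dates starting in January 2016 purely for illustration), and 0.45 is the static value from the calculation above.

```python
from datetime import date, timedelta

# Hypothetical weekly timeline; the real data set's dates may differ.
start = date(2016, 1, 3)
timeline = [start + timedelta(weeks=i) for i in range(53)]

# The centermost date on the timeline is where the label mark will sit.
center = timeline[len(timeline) // 2]

LABEL_HEIGHT = 0.45  # static value from the calculation above

# One value per date: LABEL_HEIGHT on the center date, None (null) elsewhere.
label_marks = [LABEL_HEIGHT if d == center else None for d in timeline]

print(center, label_marks.count(LABEL_HEIGHT))
```

Only one mark is created per panel, so the label appears once, centered above the line.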

However, upon completing this, I wanted to go back a few steps and make this a little simpler to understand. So I decided to build another viz, which simply compares the two trends of the average prices: one for the region and another for the nation.

Dashboard 2

Link

Hope you enjoyed!

-Suraj.

Iterating through my #MakeoverMonday 2018 W33: Anthony’s Travels

This week the data set was about Anthony Bourdain. He was a celebrity chef famous for his television shows in which he travelled the world exploring local cuisines. He hosted 4 shows spanning 16 years, in which he travelled to 86 countries and 392 cities.

Iteration 1

When I first looked at the data set, I noticed there were a few key elements to base the visualization upon: Air date (which was really the date the episode aired, not when he actually travelled), Country, City, and the Show. I immediately thought of doing a calendar across all the years, where a date would be marked ‘x’ when he was travelling (again, really the date aired). I liked the idea because that’s what we as humans tend to do: mark important dates on a calendar. Something like my image below.

IMG_20180814_163147

So, I started on the journey to create a calendar. I have created a few calendars in the past, so I was familiar with how to make one (thanks to this blog by Andy Kriebel). However, this only creates one for a single month. This means I would have to create 12 different ones and apply the year filter across all these sheets to have them change as the year changes. This is not terrible, but if you read my last blog, you would know I am trying to avoid creating multiple sheets after that experience (I created 51 sheets). This also helps me learn new techniques. So I guess it’s a win either way.

To create a one-sheet-one-year calendar, I referred back to Andy’s blog, as I remembered him running one of those as a #WorkoutWednesday last year. After a couple of quick Google searches I landed on this blog where he explains how to create the same with ‘month labels’. This was cool and I watched the video, but at this point I didn’t even know how to create a 12-month calendar in the first place, so adding labels was not too useful to me. Thankfully, he had added a link to Kevin Taylor’s blog where he explains in detail how to create a similar view.

This was amazing. I was learning new things, all excited to see my calendar come to life. But it’s time for Lesson 1: “explore the data before diving too deep”.

Sheet 1

This is important so that you don’t make the mistake I did. When you look at the image above, the darker shades of blue are the ones where there is data. These are the dates episodes were aired. So in 2014, episodes of Anthony’s show were aired only in May (4, 11, 18), June (1, 8), October (19, 26), and November (2, 9, 16). I may be missing a few dates here if the shade of blue is too close to the one where there was no data, but the point is, the shows didn’t air on a regular schedule. This means that, depending on the year I pick, there would be only a few months and days across which his shows aired. I could still complete this view (with better color differentiation, obviously), but there would just be too many days with no data. I think the calendar would lose its point with that.

If only I had done something like below at the beginning, I would have known the problem and saved myself a lot of time. But hey, at least now I know how to make a one-sheet-one-year calendar.

Sheet 6.1.png
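A quick check like the following would have surfaced the sparsity before any calendar was built. This is a minimal Python sketch using only the 2014 air dates listed above; the variable names are my own.

```python
from collections import Counter
from datetime import date

# Air dates for 2014 as listed in the post.
air_dates = [
    date(2014, 5, 4), date(2014, 5, 11), date(2014, 5, 18),
    date(2014, 6, 1), date(2014, 6, 8),
    date(2014, 10, 19), date(2014, 10, 26),
    date(2014, 11, 2), date(2014, 11, 9), date(2014, 11, 16),
]

# Episodes per (year, month): shows which months have any data at all.
episodes_per_month = Counter((d.year, d.month) for d in air_dates)

months_with_data = len(episodes_per_month)
days_without_data = 365 - len(set(air_dates))  # 2014 is not a leap year

print(f"Months with episodes: {months_with_data} of 12")
print(f"Days with no episode: {days_without_data} of 365")
```

With only a handful of marked days in a 365-day grid, a full-year calendar would be almost entirely empty.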

Iteration 2

When I first looked at the dataset I wanted to stay away from a map because I felt that was obvious. However, since plan 1 failed, I decided to give a map a shot. I started, obviously, with showing all the locations he has been to on a map. I could also show this data filtered by year so the user can interact with it as needed.

But I wanted to do more. I did know there was a field called ‘Order’ in the data set which basically showed the sequence of events. I decided to use that and create paths across the map. Again, with the help of a couple blogs (one and two) I was able to achieve what I wanted to do.

Sheet 3 (2)
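Conceptually, drawing paths from an ordering field means sorting the stops by that field and connecting each stop to the next. Here is a minimal Python sketch of that step; the rows and cities are illustrative only, and I am assuming a simple integer ordering like the data set’s ‘Order’ field.

```python
# Hypothetical rows: (order, city). Illustrative only, not from the data set.
stops = [(3, "Tokyo"), (1, "Paris"), (2, "Reykjavik")]

# Sort by the order field, then keep just the city names.
ordered_cities = [city for _, city in sorted(stops)]

# Each path segment connects one stop to the next one in sequence.
segments = list(zip(ordered_cities, ordered_cities[1:]))

for origin, destination in segments:
    print(f"{origin} -> {destination}")
```

In Tableau, the ordering field plays this role when it is used to define the path of a line mark.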

This is extremely busy, so I decided to add YEAR(Air Date) to the Pages shelf so it would only show me one year at a time. I also tried the same with Season to see how it looks. I realised they were both equally important and showed different views, so I decided to create a parameter so one can switch between them. In the end I threw in some color and that’s it.

To make this even crazier, I created a sheet for the dashboard title and assigned it the same color as my chart, so the title also changes color with Year or Season. It is important to note that there were places Tableau wasn’t able to locate on the map, so those have been filtered out. Below is an image of my viz.

ANTHONY'S TRAVELS

 

Bonus

To take the crazy one step further, I tried to see how a step line would look for paths on a map. Let me say, it doesn’t look too terrible. The only problem is that one might misread some corners as points that are not really there. For example, the vertex over Canada is non-existent. It’s just how step lines are connected, so it appears as if there is a point there. Apart from that, I kind of like this view as well.

ANTHONY'S TRAVELS2

In the end, my viz this time is not intuitive or helpful and has too many paths criss-crossing, but I still decided to experiment. I learnt something new and had fun. That’s what #MakeoverMonday is all about after all: have fun, experiment and learn. Isn’t it?

 

 

#MakeoverMonday 2018 W31: Big Mac Index

For this #MakeoverMonday the dataset was provided by The Economist. It was about the Big Mac Index, a concept they invented in 1986 to make currency exchange rates easier to understand. Read more on their website.

I spent way more time on this week’s viz than I should have. I actually worked on parts of it every day of the week. I iterated several times: adding something, then deleting, then formatting etc. You’ll understand why that happened as you read more.

There are times when you see a visualization and love it so much that you just want to replicate it. That was exactly the case for me this time. Andy Kriebel made a stunning viz a couple of weeks ago on the NBA Salary data. That’s the one I wanted to replicate. I love the way it’s laid out. I am amazed by the amount of information it shows without being overwhelming. Not just that, you only need to understand one chart to understand all of them. I mentioned in a tweet that it would be very useful at work, where we were trying to do something similar with HR data. And it indeed was. I used the combination of bars and reference lines along with some donuts at work. I am sorry, I can’t share.

However, I wanted to replicate it in its entirety (or most of it) with a different data set. Fortunately, I found one soon enough in this week’s #MakeoverMonday. However, halfway through I realised maybe it wasn’t a good data set for this after all.

My Lessons

  1. The NBA Salary data that Andy used allows him to keep the axis uniform across sheets. This makes the dashboard look clean and allows the user to compare values across teams very easily. That wasn’t possible for me. The price of the Big Mac in local currency can vary vastly across countries because of the differences in currency rates. However, I decided to ignore that and continue, as I had already spent a lot of time on it.
  2. Since I could not keep the axis uniform across countries, I had to create individual sheets for every country I was showing. The number is 51. This was a terrible idea. I have done this once before in this viz when I was starting out in #MakeoverMonday last year. However, I didn’t think I would do it again. There were several occasions when I had to make a slight formatting change and then go back to all the sheets and edit them. This was a good lesson.

 

Metrics I chose

After spending considerable time on it I decided to see it all the way through. I changed the story a few times but eventually went with what made most sense to me. Through the trend lines I was trying to show how different the price of a Big Mac in a country is from the price in the United States. Another way to show this was to compare the United States price with the USD equivalent of the local price; however, I moved away from that later. I did this because I wanted to compare the implied, or Big Mac, exchange rate (found by dividing the local price by the US price) to the actual exchange rate. Based on these calculations you can find out whether a currency is over- or undervalued.
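To make the comparison concrete, here is a minimal Python sketch of the two calculations. The prices and exchange rate are made up for illustration, the function names are my own, and the valuation formula (implied rate relative to actual rate) is my reading of how over- or undervaluation is usually derived from the two rates.

```python
def implied_rate(local_price: float, us_price: float) -> float:
    """Big Mac exchange rate: local currency units per USD implied by burger prices."""
    return local_price / us_price


def valuation(implied: float, actual: float) -> float:
    """Positive means the local currency looks overvalued against the USD, negative undervalued."""
    return (implied - actual) / actual


# Hypothetical numbers, not from The Economist's data set.
local_price = 20.0   # Big Mac price in local currency
us_price = 5.0       # Big Mac price in USD
actual_rate = 5.0    # actual local currency units per USD

implied = implied_rate(local_price, us_price)
gap = valuation(implied, actual_rate)

print(f"Implied rate: {implied:.2f}, actual rate: {actual_rate:.2f}")
print(f"Valuation vs USD: {gap:+.0%}")
```

In this made-up case the burger implies 4 units per dollar while the market gives 5, so the local currency comes out as undervalued.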

In the end I enjoyed working on the viz and am glad it looks similar to Andy’s version, but I had important lessons to learn from my approach with this data. Check out the interactive version here.

 

Big Mac Index.png

Polio – Road to Eradication #IronViz

Polio, or poliomyelitis, is an infectious disease caused by the poliovirus. It is an incurable and potentially deadly disease that causes paralysis. The weakness most often involves the legs but can also involve the head, neck or the diaphragm. It can also cause difficulty in breathing.

The theme of this year’s second Tableau IronViz feeder is Health and Wellbeing. When the round was announced, two topics came to my mind. The first was Yoga. Yoga, which has its origins in ancient India, is a form of exercise that helps one stay fit and live a healthy life. I wanted to build a viz on the topic to show how it started and how widespread the practice has become. However, after spending considerable time researching the topic, I could not find enough information to build a visualization.

The second topic that came to mind was Polio. The polio eradication program was once a big campaign in India. I grew up watching television adverts starring a popular Indian actor supporting the cause and asking everyone with infant children to go to the nearest polio booth to receive “do boond zindagi ki” or “two drops for life”. I started with the intent of telling the story of India’s journey to eradication. But as I researched, I realised that although India was a big challenge to overcome, it was important to elaborate on the successes of polio eradication independent of India.

I spent a few days trying to find as much data as I could, and I found plenty. However, I had no idea how I was going to tell my story. I could do a dashboard highlighting how the cases have reduced, or how vaccination helped, but I couldn’t find ways to combine and connect everything in a way that told a story. I decided to take a second look at the data and articles I had gathered. It was then that I realised I had a very important entity at hand that would enable me to tell my story perfectly as well as connect the dots. This entity was time.

Every data point I found was associated with, at minimum, a ‘year’ when something happened. I decided to put this together in the form of a timeline. I started vertically. I have seen many community members do this quite nicely in the past. Here, and here are a couple of examples. But I wanted to do something different. That’s when my wife suggested I try something. The idea was to create the shape of a virus in Tableau. I immediately fell in love with the idea. I would plot points in time all along the shape and show events through time around the virus. Below is a snapshot of how it looked.

Dashboard 2.png

I spent considerable time building the shape with x,y coordinates in Excel and Tableau. It was challenging, but I was proud of it as I had not believed I could do it. There was a problem, though. As is visible from the image, since this was a closed shape, it would get congested too soon. I would run out of space to show anything in no time. And so I left the idea midway in search of something else.

I knew I still wanted to do a timeline. I decided to give a horizontal version a try. I imagined it passing through different phases of the eradication journey, which I could shade differently. I built a template and loved it.

I started by explaining briefly what Polio is and how widespread it was at the start of my timeline in 1905. It was endemic in every country on the globe. There were major outbreaks and epidemics all across the planet from the early 1910s onwards. In 1916, 72,000 cats were killed in New York as it was believed they spread the virus. From the 1940s to the 1950s, thousands of lives were lost in the United States and millions were affected by polio. Sewage and sanitation conditions, among the biggest carriers of the virus, were still improving.

It was then that a virologist in the United States named Jonas Salk developed a vaccine against polio. It was an inactivated polio vaccine (IPV) and had an 80-90% success rate. There was a sharp decline in polio cases in the United States following the launch, as seen in the image below.

1. Second Polio Outbreak in the US-1952

The IPV, or Salk vaccine, although successful, had some disadvantages. It had to be injected, so it required a health professional to administer it. It proved to be expensive and had limited reach. In 1960, Albert Sabin developed a new Oral Polio Vaccine. This was easy to administer and removed the need for a health professional.

These successes and a spike in vaccination led to a decline in the number of polio cases across the globe. However, the fight is not over yet. Wild Polio Virus is still prevalent in three countries today, namely Afghanistan, Pakistan, and Nigeria. Until every last child is vaccinated, the risk of spread is still significant. There is a polio outbreak in at least 5 countries today, the most recent case being from Papua New Guinea. All these countries have either imported a Wild Polio Virus (WPV) from an endemic country or have a circulating Vaccine Derived Polio Virus (cVDPV). For this reason, the Global Polio Eradication Initiative launched the Polio Eradication and Endgame Strategic Plan 2013-2018, one of whose four objectives is to withdraw the oral polio vaccine, which is the main cause of cVDPV.

If polio is eradicated, it would save many lives across the globe and no child would have to live in fear of contracting the disease again. Below is an image of my viz and here’s a link. If you like it, please favorite it in your Tableau Public profile.

POLIO ROAD TO ERADICATION.png

Suraj.

Iterating through my #MakeoverMonday 2018 W21: Premier League Results

This week’s #MakeoverMonday was about the Premier League results. The data was made available through The Guardian and just by looking at the dataset the first thought that crossed my mind was to do a slope graph. But I assumed that would be the case with several others too- and I was right as seen by Michal Mokwinski’s tweet below.

I decided to give it a try regardless because #MakeoverMonday is about practicing and improving your own data viz skills, not about creating the most unique chart.

Option 1: Slope Graph

The data was very simple and I was able to put together a slope graph very quickly. I added a purple color to Actual Finishes to make it Premier League themed. But I did not like the overall result too much. Plus I already had another chart brewing in my mind which I kind of liked better (at least in my brain).

Sheet 1

The chart is missing team labels but the point is that each slope is the Actual to Predicted finish of a team in the Premier League. I decided to keep it as an option should my second option not work out.

Option 2: Dumbbell Chart

I started picturing this data as a dumbbell while I was building the slope. Again, since the data was simple enough, I was able to put together a dumbbell fairly quickly too. Thanks to regular #MakeoverMonday practice, these charts that are not native to Tableau almost feel like they are, since I can build them in minutes.

Sheet 2.png

This looked a lot better to me for showing this kind of data.

Benefits

  1. You can see the rank of every team where purple is the Actual Finish and grey the Predicted.
  2. You can also see the gap between Actuals and Predicted, where Manchester City has the same value but Burnley, which was predicted to finish very low, ended up finishing higher.
  3. I decided to add a sorting option so users can sort the data set as they prefer.
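The gap and the sorting option from the list above come down to a subtraction and a choice of sort key. Here is a minimal sketch with made-up teams and positions, not the actual Guardian data. The names `finish_gap` and `sort_teams` are hypothetical helpers for illustration only.

```python
# Hypothetical rows: (team, predicted finish, actual finish). Not the real data.
teams = [
    ("Team A", 1, 1),
    ("Team B", 18, 7),
    ("Team C", 5, 9),
]


def finish_gap(predicted: int, actual: int) -> int:
    """Positive means the team finished higher (a better rank) than predicted."""
    return predicted - actual


def sort_teams(rows, by="actual"):
    """Sort rows by actual finish, predicted finish, or size of the gap."""
    keys = {
        "actual": lambda r: r[2],
        "predicted": lambda r: r[1],
        # Largest absolute gap first
        "gap": lambda r: -abs(finish_gap(r[1], r[2])),
    }
    return sorted(rows, key=keys[by])


for team, predicted, actual in sort_teams(teams, by="gap"):
    print(team, finish_gap(predicted, actual))
```

A gap of 0 corresponds to a team like Manchester City, where the two dots of the dumbbell sit on top of each other.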

 

I hope you enjoy the viz and seeing how I iterate through ideas. Check it out here.

Suraj.

Iterating through my #MakeoverMonday 2018 W20: Percent of Time Europeans spend Stuck in Traffic

The dataset this time was about time spent in congestion in European cities, made available through Euronews. The original dataset from Inrix has data for several cities across the globe.

As mentioned in my previous post, I almost always start a viz with a bar chart. I believe this is the most effective way to depict any data. If I do not do this for the actual data, I at least throw the Number of Records field in Rows or Columns to check the total record count of the dataset. I also use this as a validation against my original source (usually an Excel document) to make sure all rows have come through.

I did the same this time. I put the data in a bar chart and sorted it a few different ways to see what works best. The first iteration below is the default look sorted in alphabetical order.

Iteration 1.png

I sorted it in descending order of Country/City by adding a combined field from the two parts of the hierarchy, Country and City, and sorting that in descending order of Hours as well. This first arranged the countries in descending order, then sorted the cities within each country in descending order too.

Iteration 2

However, I still wasn’t satisfied. I could format it and make it look better but I wasn’t sure I wanted to go with a bar chart. It then hit me that I could do a Waffle Chart. I hadn’t done one in a while. In fact, I checked when I had last done one and found that it was my #MakeoverMonday submission for Week 20 of 2017. I certainly didn’t remember how to do one anymore, so I decided to find this dashboard and reverse engineer it. After spending some time on it I was still missing some parts when Andy Kriebel’s video tutorial came to my rescue. I remember seeing this the last time I did the waffle as well.

However, as I started working on it, I realized that with the data I had, it would not be possible to make a Waffle. This is because a waffle does something very similar to a pie or a donut chart: it shows percent of a total (or units of a total). In this case all I had was the number of hours an average person spent in congestion. I had the ‘unit’. I just didn’t know what the ‘whole’ was. So I went back to the original source to see if more information was available. I was expecting to see the total hours an average person spends on the road for every city. I could not find it. What I did find was what percent of the total these hours represented.

Inrix

This was easy. I could calculate the total hours per city based on these two data points and that’s exactly what I did.
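The arithmetic is just the percent formula turned around: if the hours in congestion are a known percent of some total, the total is the hours divided by that percent. Here is a minimal sketch. The function name `total_hours` and the numbers are made up for illustration and are not from the Inrix data.

```python
def total_hours(hours_in_congestion: float, percent_in_congestion: float) -> float:
    """Recover the 'whole' from the 'unit' (hours) and its share (percent)."""
    if percent_in_congestion <= 0:
        raise ValueError("percent_in_congestion must be greater than 0")
    return hours_in_congestion * 100 / percent_in_congestion


# Hypothetical values, not taken from the Inrix data:
# 40 hours in congestion at a 10% share implies 400 hours in total.
print(total_hours(40, 10))
```

In Tableau the same thing would be a one-line calculated field. The Python version is only there to spell out the reasoning.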

That’s all. Once I had the required information, I decided to go back to my waffle and update the viz. to show percent of time spent in traffic. I also added the number of hours out of total hours as a sub-label.
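For anyone who has not built one, a waffle is usually a 10 x 10 grid where the number of filled cells matches the percent. The sketch below shows that idea in plain Python, assuming a 10 x 10 grid filled row by row. It is not the Tableau technique from Andy Kriebel's tutorial, and `waffle_cells` and `render` are hypothetical names.

```python
def waffle_cells(percent: float, rows: int = 10, cols: int = 10):
    """Return one dict per grid cell, marking the first N cells as filled."""
    total = rows * cols
    filled = round(percent / 100 * total)
    return [
        {"row": i // cols, "col": i % cols, "filled": i < filled}
        for i in range(total)
    ]


def render(cells, cols: int = 10) -> str:
    """Draw the grid as text: '#' for a filled cell, '.' for an empty one."""
    lines = []
    for start in range(0, len(cells), cols):
        row = cells[start:start + cols]
        lines.append("".join("#" if c["filled"] else "." for c in row))
    return "\n".join(lines)


# Hypothetical value: 12% of time spent in traffic
print(render(waffle_cells(12)))
```

Each cell carries its own row and column, which is the same shape of data a waffle needs in Tableau: one record per cell plus a flag for whether it is filled.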

Berlin

After getting this first one right, I simply duplicated the sheet 9 times, one per city, and put them together in a dashboard. I tried to accomplish this in one sheet through small multiples but didn’t succeed. If you know a way to do that, please let me know in a comment or through Twitter (@surajshah212).

Check the interactive version here.

Percent of Time Europeans spend Stuck in Traffic.png

Thanks,

Suraj.

#IronViz: Book vs Movie: Which is Better?

Tableau announced its first feeder of IronViz in April. Based on a survey conducted by Tableau after the Tableau Conference 2017, it has made several changes to the format this year. To detail a few: one is to allow about 4 weeks’ time for every feeder. Another is to make the schedule of all 3 feeder rounds available at the same time. Third, and my favorite, is the selection of the ‘Crowd Favorite’ viz, where voters go to their Tableau Public profile and favorite Iron Viz submissions by hitting the star button. In addition, Tableau will also release detailed results for every participant. All these changes made me really excited to participate this time, not that I am ever unexcited to participate in an IronViz. Check out my last 6 submissions here, here, here, here, here, and here.

The theme this time was Books and Literature. Just like any other IronViz, I started off by writing a list of ideas. Some of these were Pulitzer Prize winners, ebooks vs physical books, the decline of physical bookstores due to the rise of e-commerce etc. I also thought about a few specific books I wanted to explore, like the Sherlock Holmes series. I also explored word counts in books and found this nice visualization. But I could not find anything sufficient that I could build a visualization on. I thought of another idea: to compare books that have been made into movies and see if I could find anything. I thought of adding book and movie ratings, but the problem was that they came from different sources (Goodreads and IMDb) with vastly different audiences. It wasn’t the right way to compare whether a book or its movie was better.

As part of this, I was exploring www.goodreads.com. This is a website (now owned by Amazon) where readers can rate, review and recommend books. They can add books to their own collections and mark them as read or add them for future reading. Here I bumped into Lists. A list is basically a collection of books on a common theme/topic. Some examples of a list are Best Books of the Decade: 1820; Books that everyone should read at least once; Best Historical Fiction etc. Two more lists proved to be useful to me: The Book was Better than the Movie and its opposite, The Movie was Better than the Book.

Upon going through the lists I immediately had a couple of ideas I wanted to test. I scraped the data and downloaded it into Excel. I cleaned it up as needed in Alteryx, ready for consumption in Tableau.

One of the first things I looked at was the top voted books in both the lists. On the first list, 4 out of the top 5 books were from the Harry Potter series. I am a Harry Potter fan myself and wasn’t surprised to see this. Fans who have read the books often feel the books go into more detail than the movies.

Top Voted

I then looked at the top 5 books on the other list, where people think the movies were better than the books. I was surprised to see The Lord of the Rings up there. Then again, I have only seen the movies, which I loved, and did not know readers disliked the books.

Next I looked at the disparity in the votes. I calculated the Percent of all Votes per book that went in the first list vs the second. This means, I could only look at books that were present in both the lists. I added a Sort By parameter to allow the user to sort the data as needed.

What I soon realised, though, was that calculating the data based on % of Total had its disadvantages. If a book only had a few votes in total, then the data would be skewed. For example, if a book only has 4 votes, 3 for the book and 1 for the movie, then it would mean that 75% of people who voted liked the book better. This is not wrong, but due to the limited number of votes, one might get the wrong impression. So, I decided to add a filter that would allow the user to look at the percents only above a minimum vote threshold. By default I set it to show every book that had a minimum of 1,000 votes.
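The percent calculation and the threshold filter can be sketched in a few lines. This is an illustration only: the rows are made up, `book_share` and `with_min_votes` are hypothetical names, and the threshold is assumed to apply to a book's combined votes across both lists.

```python
def book_share(book_votes: int, movie_votes: int) -> float:
    """Percent of a book's votes that came from the 'book was better' list."""
    return 100 * book_votes / (book_votes + movie_votes)


def with_min_votes(rows, min_votes: int = 1000):
    """Keep only books whose combined votes reach the threshold (assumed rule)."""
    return [r for r in rows if r["book_votes"] + r["movie_votes"] >= min_votes]


# Hypothetical rows, not the scraped Goodreads data
rows = [
    {"title": "Book A", "book_votes": 3, "movie_votes": 1},      # 75%, but only 4 votes
    {"title": "Book B", "book_votes": 900, "movie_votes": 600},  # 60% on 1,500 votes
]

for r in with_min_votes(rows):
    print(r["title"], book_share(r["book_votes"], r["movie_votes"]))
```

With the default threshold, Book A drops out even though its 75% looks more decisive than Book B's 60%, which is exactly the wrong impression the filter is meant to prevent.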

Tooltip

I also added a simple table viz in the tooltip to show not only the percent of total votes but also the number, so one can make an informed decision. Further down, I wanted to see how these books are rated, agnostic of the lists. And so I decided to add the Average Rating and the Number of Ratings a book has received.

While preparing my viz so far I started seeing a pattern amongst books that were part of a series. I noticed that books that were the first of a series had been rated by more users than books that launched after. So I decided to look at books within a series. I realised my hypothesis was true.

Sequels.PNG

Books that were first in a series had the most ratings. This makes sense because when one starts to read a series, one would start at the beginning. Depending on whether they like it or not, they move on to the next book. If they did not like it, they won’t read the next one. And so the number of ratings for the first book is usually higher. However, I was surprised to see the pattern break in the Hannibal series with The Silence of the Lambs. This book is the 2nd of the series and it has received the most ratings.

I wanted to check if this same hypothesis held true when looking at the average rating a book received. It did not. Not in every case at least. Among several series, the books that came later on had higher ratings- an example being Harry Potter and the Deathly Hallows (7th and the last in the series).

If you like my viz. don’t forget to favorite it through your Tableau Public profile.

Dashboard

Thanks,

Suraj.

Iterating through my #MakeoverMonday 2018 W18: Total Annual Bee Colony

This week #MakeoverMonday collaborated with #VizForSocialGood. The data was provided by Bee Informed and dealt with loss of bee population.

Whenever I start making a viz, I always go through several iterations. Only rarely am I able to make a viz that works right the first time. In this blog I am going to discuss the iterations for my latest viz.

1. Bar chart

This is by far my most used chart (at least for #MakeoverMonday last year) and also the one I always start my viz with. In my opinion every dataset can be depicted using a bar chart and it also aids in thinking. I start by sorting it in different ways (mostly descending then ascending) to identify top and bottom values. In this case I did the same and noticed that Iowa has seen the highest overall loss in Bees in the last 7 years. Although the data is depicted incorrectly below because it adds up percentage across all 7 years, it still gives an insight to work with should I go in that direction. I then broke it down by all the years to see if I find anything, but it did not lead me anywhere specific.
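To see why adding up percentages across years misleads, consider a small made-up example (these are not the Bee Informed numbers). A state with more years of data gets a bigger sum even when its typical yearly loss is the same. Taking the average per year is one simple alternative, though it still ignores how many colonies each state had.

```python
from statistics import mean

# Hypothetical yearly loss percentages per state, not the real data.
# State B is missing a year, as some states are in the actual dataset.
yearly_loss_pct = {
    "State A": [30.0, 40.0, 50.0],
    "State B": [60.0, 20.0],
}

summed = {state: sum(values) for state, values in yearly_loss_pct.items()}
averaged = {state: mean(values) for state, values in yearly_loss_pct.items()}

print(summed)    # State A looks far worse when percentages are added up
print(averaged)  # both states have the same average yearly loss
```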

Sheet 0

Sheet 0.1

2. Small Multiple Line

I have done Small Multiples a few times and I enjoy them. When I started, I manually created (or copied) every sheet (like in this viz). But recently I learnt a way to create these easily using the formulae in this blog by The Information Lab. Ever since, I have been able to create the same effects in a more dynamic and easier manner (like here and here).

For the same reason, I thought of giving it a try again this time. I did like what I saw and also noticed that there was data missing for certain years for certain states (look at the disconnects in cells 1-1, 4-0, 5-6 etc.). I decided to possibly use this if nothing else worked out.
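The general idea behind a small multiples grid is to turn each item's position in a list into a row and a column. The sketch below shows that idea in Python. It does not reproduce the exact Tableau formulae from the blog mentioned above. It also assumes the cell labels above are zero-based row-column pairs and that the grid is 7 columns wide, neither of which is confirmed.

```python
def grid_position(index: int, columns: int) -> tuple[int, int]:
    """Return (row, column) for the index-th panel, counting from 0."""
    return divmod(index, columns)


# Hypothetical panel names; any ordered list of states would work the same way
panels = [f"State {n}" for n in range(1, 11)]
COLUMNS = 7  # assumed grid width

for i, name in enumerate(panels):
    row, col = grid_position(i, COLUMNS)
    print(f"{name}: cell {row}-{col}")
```

In Tableau, the position would typically come from a table calculation rather than a Python index, but the division and remainder logic is the same.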

Sheet 1

3. Dumbbell

I like the dumbbell chart as it shows the gaps between multiple points along the same dimension (here, state). I noticed there is a huge variance in some states between the lowest and the highest points, but I lost the sequence of years needed to see the trend.

Sheet 2

4. Line Graph (with first and last years)

I thought of seeing the trend between the first and the last year. A decline would mean the loss of bees has decreased (a good sign) and a rise would mean the loss of bees has increased. I liked this outcome a lot because I was able to see which states had a decrease and which had an increase in bee loss. However, I could not easily point out the state with the largest fall or rise. But this gave me an idea.

Sheet 3 (4)
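The largest rise or fall between the first and last years is hard to read off a chart with this many lines, but it is easy to compute. Here is a minimal sketch with made-up values (not the Bee Informed data), where `first_last_change` is a hypothetical helper name.

```python
def first_last_change(loss_pct):
    """Map each state to (last year loss % minus first year loss %)."""
    return {state: last - first for state, (first, last) in loss_pct.items()}


# Hypothetical (first year, last year) loss percentages per state
loss_pct = {
    "State A": (30.0, 45.0),
    "State B": (50.0, 20.0),
    "State C": (25.0, 27.0),
}

changes = first_last_change(loss_pct)
largest_rise = max(changes, key=changes.get)  # loss increased the most
largest_fall = min(changes, key=changes.get)  # loss decreased the most

print(largest_rise, changes[largest_rise])
print(largest_fall, changes[largest_fall])
```

A positive change means bee loss went up, so the largest fall is the best news.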

5. Small multiple area

It was at the #MakeoverMonday live session at Tableau Conference 2017 that, as I remember, Ann Jackson made this great visualization on obesity in America. I had already set aside my small multiple line graph as an option, but I also only wanted to show the first and last years in the data. That’s when I started working on a viz similar to Ann’s. Below are a couple of iterations, along with one that evolved into a United States map combined with an area chart per state.

Sheet 3

Sheet 3 (3)

Sheet 3 (2)

But I eventually landed on the small multiple area. I sorted it to show the ones with an increase in Loss of Bee Colonies first (in alphabetical order), then the ones showing a decrease. The ones with only a single vertical line showing are the states where the data for the 2010/11 year was not available. Check out the complete viz below and click here for the interactive version.

Dashboard 1.png

Cheers,

Suraj.