I am trying to visualize the language tags in GitHub repository data. I have the names of 16k GitHub repositories and all the languages associated with each of them. Below is the chord diagram I came up with.

However, I find that the chord diagram is not a very good representation of the data because:

  1. It does not show one-to-many relationships. For example, most of the repositories with JavaScript also have both HTML and CSS. This is represented as two separate links, JavaScript-HTML and JavaScript-CSS, but never as all three together.

  2. The size of an arc does not represent the number of repos in the dataset. For example, about 6k repos in my dataset use JavaScript, but the JavaScript arc is about 17k. This is because each repo has multiple languages: if a repo has JavaScript, HTML, CSS and Python, it adds 3 to the JavaScript arc, one for each of the other languages.
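
To make both points concrete, here is a small Python sketch on made-up data. The three repos and the helper names (`repo_counts`, `arc_lengths`, `combination_counts`) are hypothetical and only for illustration. It assumes the chord diagram is built from pairwise co-occurrence, with each language pair in a repo adding 1 to both arcs, which is how I understand my diagram to work.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data (not from the real dataset): repo name -> languages.
repos = {
    "repo_a": {"JavaScript", "HTML", "CSS", "Python"},
    "repo_b": {"JavaScript", "HTML", "CSS"},
    "repo_c": {"Python"},
}


def repo_counts(repos):
    """Number of repos that use each language."""
    return Counter(lang for langs in repos.values() for lang in langs)


def arc_lengths(repos):
    """Arc size per language when every co-occurring pair adds 1 to both ends."""
    arcs = Counter()
    for langs in repos.values():
        for a, b in combinations(sorted(langs), 2):
            arcs[a] += 1
            arcs[b] += 1
    return arcs


def combination_counts(repos):
    """Number of repos per exact language combination (what the chords lose)."""
    return Counter(frozenset(langs) for langs in repos.values())


if __name__ == "__main__":
    print("repos per language:", dict(repo_counts(repos)))
    print("arc size per language:", dict(arc_lengths(repos)))
    for combo, n in combination_counts(repos).items():
        print(sorted(combo), n)
```

In this toy data JavaScript appears in 2 repos but gets an arc of size 5, and the single-language repo contributes nothing to any arc. The exact combinations are only recoverable from the third count, not from the pairwise links.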

Do you have any suggestions for a better visualization of this data? This particular one is a d3 chord diagram, but the visualization can be done with any package (Python, JavaScript, or R). Thanks.

[Image: GitHub repository languages chord diagram]

If you are interested in the dataset, you can get it by running this query on Google BigQuery:

SELECT
  languages.repo_name,
  languages.language  -- SELECT list was lost from the post; these columns are an assumption
FROM
  `bigquery-public-data.github_repos.languages` languages
JOIN
  `bigquery-public-data.github_repos.sample_repos` sample_repos
ON
  languages.repo_name = sample_repos.repo_name
WHERE sample_repos.repo_name IN (
  SELECT repo_name[OFFSET(0)]
  FROM `bigquery-public-data.github_repos.commits`)
ORDER BY sample_repos.watch_count DESC
LIMIT 16000

