I did make it to the Indiewebcamp/Homebrew meeting this evening after all, in Portland this time, since I happened to be passing through.

I was able to show off some of the work I've been doing on embedding data-driven graphs/charts in the Web versions of in-progress academic writing: d3.js generating SVG tables in the browser, but also saving SVG/PDF versions which are used as figures in the LaTeX/PDF version (which I still need for sharing the document in print and with most academics). I need to write a brief blog post describing my process for doing this, even though it's not finished. In fact, that's a theme; we all need to be publishing code and writing blog posts, especially for inchoate work.

Also, I've been thinking about pseudonymity in the context of personal websites. Is there anything we need to do to make it possible to maintain different identities / domain names without creating links between them? Also, it may be a real privacy advantage to split the reading and writing on the Web: if you don't have to create a separate list of friends/follows in each site with each pseudonym, then you can't as easily be re-identified by having the same friends. But I want to think carefully about the use case, because while I've become very comfortable with a domain name based on my real name and linking my professional, academic and personal web presences, I find that a lot of my friends are using pseudonyms, or intentionally subdividing

Finally, I learned about some cool projects.

  • Indiewebcamp IRC logs become more and more featureful, including an interactive chat client in the logs page itself
  • Google Web Starter Kit provides boilerplate and a basic build/task system for building static web sites
  • Gulp and Harp are two (more) JavaScript-based tools for preparing/processing/hosting static web sites

All in all, good fun. And then I went to the Powell's bookstore dedicated just to technical and scientific books, saw an old NeXT cube and bought an old book on software patterns.

Thanks for hosting us, @aaronpk!
— Nick

I'll try to make it tonight for Homebrew meeting. Maybe I can get "fragmentions" (ugh, terminology) or hypothes.is annotations on academic papers working beforehand.


P.S. While last time I RSVP'ed I worried that these irrelevant posts in my feed were needless, I ended up getting multiple emails with really valuable responses (about hiding certain types of posts and about the academic writing on the web project in general). So I'm persuaded not to worry urgently about hiding them from my index page or feed.

Subject: imperialist/sexist maps
From: nick@npdoty.name
Date: 7/27/2014 03:05:22 PM
To: all my friends with whom I've been discussing maps and imperialism
Bcc: http://bcc.npdoty.name/

I’ve been thinking about the qualities of maps — imperialist, humanitarian, democratizing — and the demographics of cartographers, neogeographers and Web mapping folks.1 Did you all see this article on the possibilities of sexism in street maps? I’m encouraged to see writing on the topic, but I see two important lessons to remember.

First, let’s use data, quantitative and qualitative, to investigate sexism and fairness in maps.

The FastCo article makes a specific claim that OpenStreetMap “may contain more strip clubs than day care centers”. Getting an exhaustive answer to that question would take some time, but it doesn’t take long to gather some initial data.2 Because OpenStreetMap is actually more an accessible database than it is a map, we can issue queries and get statistics. In fact, we have an awesome opportunity to ask and answer some of these questions of fairness in the map, such that it’s worth spending some time investigating.

Strip clubs are typically tagged as “amenity=stripclub”, a well-defined tag, of which there are 455 in OSM. Day care centers are a little more difficult to count (more on this later) because people use different tags for them: one might be “amenity=kindergarten”, which covers pre-school centers (what we often call “kindergarten” in the US) and services that look after young children but which aren’t educational. I count 124,197 of these in OSM, but I can’t quickly tell how many are pre-schools and how many are child care centers. A few years ago there was a proposal to formally start using amenity=childcare to refer to child care facilities, a proposal that was rejected by some voters who thought it overlapped too much with amenity=kindergarten. Nevertheless, OSM users can use whatever tags they want,3 and many of them are using amenity=childcare: there are 1,329 instances (roughly triple the amenity=stripclub count, although that tag is more formally approved). There are 504 instances of the more obscure syntax of social_facility:for=child and :for=juvenile, although I suspect many of those are covering group homes, orphanages, community centers and various social work facilities for children.
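As a rough sketch of what "issuing queries and getting statistics" can look like, the Python below builds Overpass QL count queries for the tags named above and compares the counts I reported. It is only an illustration, not necessarily how I gathered the numbers: the function names and the timeout value are made up for the example, nothing is sent over the network, and the counts are the ones quoted in this post, which will have drifted since.

```python
# Counts as quoted in the post; they will have changed in OSM since then.
REPORTED_COUNTS = {
    "amenity=stripclub": 455,
    "amenity=kindergarten": 124_197,
    "amenity=childcare": 1_329,
}


def overpass_count_query(key: str, value: str) -> str:
    """Build an Overpass QL query asking only for a count of the nodes,
    ways and relations carrying key=value. This just returns the query
    text; sending it to an Overpass server is left to the reader."""
    return f'[out:json][timeout:60];nwr["{key}"="{value}"];out count;'


def ratio(numerator_tag: str, denominator_tag: str,
          counts: dict = REPORTED_COUNTS) -> float:
    """How many times more often one tag appears than another."""
    return counts[numerator_tag] / counts[denominator_tag]


if __name__ == "__main__":
    for tag in REPORTED_COUNTS:
        key, value = tag.split("=", 1)
        print(overpass_count_query(key, value))
    # childcare vs. stripclub: a bit under 3x
    print(round(ratio("amenity=childcare", "amenity=stripclub"), 2))
```

A worldwide count query can be slow or time out on a public server, so in practice one would probably restrict it to a region or use a tag-statistics service instead.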

(We could also search OSM by name to try to count child care centers and strip clubs. There seem to be roughly 1,100 with “day care” in the name and 640 with “child care”, but those names are likely very English-specific and don’t provide for good comparisons with strip clubs, which I expect rarely include “strip club” in the venue name. I can find about 20 that use some variation of “gentlemen’s club”.)

But these numbers don’t discount the concern that the distribution of mapped venues may be skewed in a way that might be sexist in intent or impact. Rather, I suspect that statistics would actually bring the problem into greater relief. For example, if business license records show 100 or 1000 times as many child care centers as strip clubs in many jurisdictions and the OSM database only shows 5 times as many child care centers, that would be an important result. Rather than comparing only two numbers, we would do better to compare the proportions of different venues to some independent measure to see which are disproportionately present or missing. Side note: if you gather that data, give it to OSM volunteers so they can identify where or why that skew is happening.4 We would learn more by comparing other categories as well: while the stripclub/childcare example may be relevant (in particular because of the taxonomic question, see below), not all and not only women care about childcare services.
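To make the "compare against an independent measure" idea concrete, here is a small Python sketch. The numbers are invented purely for illustration (they mirror the hypothetical 100:1 versus 5:1 example above, not any real jurisdiction), and the function names are my own.

```python
def relative_skew(map_a: int, ref_a: int, map_b: int, ref_b: int) -> float:
    """How many times better covered category A is than category B.

    Coverage of a category is map_count / reference_count, where the
    reference is some independent source such as business licence records.
    The result is coverage(A) / coverage(B); 1.0 means no skew.
    """
    if ref_a <= 0 or ref_b <= 0:
        raise ValueError("reference counts must be positive")
    if map_b <= 0:
        raise ValueError("category B is absent from the map; skew is unbounded")
    # (map_a / ref_a) / (map_b / ref_b), rearranged to a single division
    return (map_a * ref_b) / (ref_a * map_b)


# Hypothetical jurisdiction: licence records list 1,000 child care centers
# and 10 strip clubs (100:1), while the map holds 40 child care centers and
# 8 strip clubs (5:1).
skew = relative_skew(map_a=8, ref_a=10, map_b=40, ref_b=1000)
print(f"strip clubs are {skew:.0f}x better covered than child care centers")
```

Here the map shows five times as many child care centers as strip clubs, which sounds reassuring on its own, yet relative to the reference data strip clubs are covered twenty times more completely.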

Beyond statistical counts of features, we can and should use qualitative methods to evaluate sexism in the map and in the community. For example, that rejected 2011 proposal for a recommended amenity=childcare tag revealed that the (mostly male) OSM editors may discount the need for a separate tag, to the detriment of users who would benefit from it. Because OSM conducts votes on these taxonomic questions, complete with explanations memorialized in wiki form, researchers and the public can review the debate, comment by comment. That proposal is also an interesting case because it reveals a linguistic difference. As I understand it, “kindergarten” is used differently in Germany (which many OSM editors call home), where a day care (Kindertagesstätte) might be more closely related to a kindergarten than it is in the US. None of this is my discovery, so don’t take my word for it:

  • Dr. Monica Stephens gave a nice (short, I just watched it, it’s awesome, go watch it!) talk on exactly this topic in 2012 — you can watch the video online — comparing the sparse selection of childcare tags to the large diversity of bar/nightclub/swingerclub tags. See also her paper from 2013.5
  • The discussion page on the OSM wiki has a good “post-mortem” discussion of the childcare proposal that’s worth reading.
  • This blog post from April does some numerical analyses after the childcare tag controversy, and also tries to analyze the presence of commercial venues that might tend to bias towards one gender or another, with mixed results.

The second lesson to remember is that maps always reflect the perspectives of their creators; there is no present or future “objective” map.

“But unlike Google Maps, which rigorously chronicles every address, gas station, and shop on the ground, OpenStreetMap’s perspective on the world is skewed by its contributors.”

I don’t dispute the latter clause: OSM is absolutely skewed by its contributors. However, I don’t see that maps that don’t rely on crowdsourced data (whether it’s Google Maps or the USGS or any other) are, in contrast, objective in a way that OpenStreetMap could never be.

All maps are skewed by the selections of the humans who make them or, increasingly, the technology that humans build in order to make them. In fact, one might define a map as exactly the process of selecting some geographic data and leaving out all others. This American Life illustrates this point beautifully.6

Google Maps may have a relatively exhaustive accounting of commercial venues. Even in that incredibly narrow category, though, consider last week’s article on mistakes in the Google business directory from malicious or mistaken reports. I say “narrow” because what about the parts of our physical world that aren’t commercial venues or roads for automobiles? Here’s a quick list of some of the interesting feature types in OpenStreetMap that aren’t as easy to find in Google Maps.7

  • Car-sharing locations: as a user of the CityCarShare nonprofit, I’m pleased that many CityCarShare locations are available in the OSM database and are typically rendered on the basemap. Searching in my neighborhood, I find that Google Maps actually does let you find car-sharing locations, but maybe only for Zipcar?

  • Benches: OSM has the locations of 400,000 benches! (Mostly in Europe and some in the US and I bet this isn’t nearly exhaustive enough, but I love that it’s there.)

  • Mailboxes: While the USPS can let you search online for locations of those blue mailboxes, Google Maps only directs you to FedEx or Mailboxes Etc venues. In Oakland there aren’t many of these marked in OSM yet, but when I was traveling in Brussels, I thought it was pretty awesome to be able to pull up the exact location of the nearest red postbox. (161,982 in OSM.)

  • Fire Hydrants: Almost a quarter million of these in OSM. Maybe they’d be useful for Adopt-a-Hydrant websites, without a city having to import all the data themselves.

  • Wheelchair Accessibility: This one is a real challenge. It would be awesome if streets, sidewalks, businesses and toilets could all have metadata about their wheelchair accessibility, so that, for example, your navigation software could tell you how to get from point A to point B without ever directing you to take stairs, or cross a road where there isn’t a curb ramp. OSM has 600,000 tags with wheelchair accessibility metadata, but even that surely isn’t nearly enough. (OSM Wiki has a page on wheelchair routing and there are also some Google Maps projects for crowdsourcing that kind of data.)

  • Trees: OSM has 3.4 million individual trees mapped. Ha, awesome. I really want to map all the trees in our neighborhood in order to make a more beautiful and detailed print map of the area. (See also, the Urban Forest Map of all 88,000 trees on San Francisco's streets.)

Personally, I like to use OpenStreetMap for its detailed data on hiking trails, including gates and fences along the route. Others use OSM for bicycle routing and Google Maps also has a different mode for viewing bike lanes. But it should be clear that no single map contains everything, and certainly not everything in an objective way that doesn’t involve the perspectives of both the designer of the map and the contributor of the data. Even the distribution of categories themselves is a pragmatic, rather than essentialist, exercise.

It can be tempting, and perhaps more so now with maps that are more databases than static cartographic projections, to believe that a map can contain everything, such that claims of sexism could simply be refuted. Maybe if we just had more and more data, all the data, then the perspective of the cartographer would disappear as it became more and more precise, until the map itself contained everything in the territory. Indeed, digital maps have made it possible to represent different maps in different situations; to show different ownership of the same territory based on who’s looking at the map, for example. But mapping, like any data collection and analysis project, will always have perspectives. We can do better or worse at being aware of these perspectives and adjusting our practices to address disparities in the design of maps, but we shouldn’t imagine that one day there will be such an authoritative source that we can stop asking whether the map is sexist or how to make it less so.

Please forgive my verbose enthusiasm. Yes, of course, I’m super into maps, but I can’t help but think that these same lessons will arise in every data project we pursue.

Thanks for reading my ramblings,

P.S. And thanks to Brendan, Geoff, Julie and Zeina for helping to clarify some of this before I posted it publicly.

  1. In short, is my vague impression correct that mapping technology meetings are more disproportionately white even than other technology-focused communities? Is there greater representation of Brits and Americans? And if so, why, and what are the implications?

  2. A commenter on the FastCo site has already pointed this out in brief, but I’ll share my data with some links anyway.

  3. This point is extremely important — a case where the users/implementers can behave in ways that contradict the attempt to standardize. It’s a good check on what I think was a real mistake in not approving the tag. I’m not sure, however, if this voting affects the common renderings of the map, like at openstreetmap.org.

  4. As one friend put it, and maybe this is the issue with many disparities in tech, the community is good-hearted but just naive.

  5. Monica Stephens, “Gender and the Geoweb: Divisions in the Production of User-Generated Cartographic Information,” GeoJournal 78, no. 6 (2013): 981–96, http://link.springer.com/article/10.1007/s10708-013-9492-z.

  6. The whole episode is great, but the first few minutes of the prologue are enough, and lovely.

  7. This might seem like I’m poking fun or diminishing Google’s awesome map, but I really don’t mean it that way at all. Different maps work for different uses, and while I think there’s often a healthy competition among Web mappers, these are just examples.

Sure, I'm in for tonight's Homebrew meeting. I don't have a ton of progress to report, but I've been working on academic writing that can be simultaneously posted to the Web (where it can be easily shared and annotated) and also formatted to PDF via LaTeX. Oh, and I'm excited to chat with people about OpenPGP for indieweb purposes.

P.S. While I like the idea of posting RSVPs via my website, it seems a little silly to include them in RSS feeds or the blog index page like any other blog entry. What are people doing to filter/distinguish different kinds of posts?

Thanks for writing. I’m inspired to write a couple of comments in response.

First, are academic, professional ethicists as irrelevant as you suggest? (Okay, that’s a bit of a strawman framing, but I hope the response is still useful.)

Floridi is an interesting example. I’m also a fan of his work (although I know him more for his philosophy of information work — I like to cite him on semantics/ontologies, for example (Floridi 2013) — rather than his ethics work), but he’s also in the news this week because he’s on Google’s panel of experts (their “Advisory Council”) for determining the right balance in processing right-to-be-forgotten requests.

Also, I think we see the influence of these ethical and other academic theories play out in practical terms, even if they’re not cited in a direct company response to a particular scandal. For example, you can see Nissenbaum’s contextual integrity theory of privacy (Nissenbaum 2004) throughout the Federal Trade Commission’s 2012 report on privacy (FTC 2012), even though she’s never explicitly cited. And, forgive me for rooting for the home team here, but I think Ken and Deirdre’s research of “on the ground” privacy (Bamberger and Mulligan 2011) played a pretty prominent role in the White House framework for consumer privacy (“Consumer Data Privacy in a Networked World: A Framework for Protecting Privacy and Promoting Innovation in the Global Digital Economy” 2012).

But second, I’m even more excited about your conclusion. Yes, decentralize!, despite the skepticism about it (Narayanan et al. 2012). But more important than just repeating that rallying cry (which I still think needs repeating – I’m trying to support #indieweb as my part of that) is the form of the problem.

I think a really cool project that everybody who cares about this should be working on is designing and executing on building that alternative to Facebook. That’s a huge project. But just think about how great it would be if we could figure out how to fund, design, build, and market that. These are the big questions for political praxis in the 21st century.

Politics in our century might be defined by engineering challenges, and if that’s true, then it emphasizes even more how coding is not just entangled with, but is itself a question of, policy and values. I think our institution could dedicate a group blog just to different takes on that.


Some references:

Bamberger, KA, and DK Mulligan. 2011. “Privacy on the Books and on the Ground.” Stanford Law Review. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1568385.

“Consumer Data Privacy in a Networked World: A Framework for Protecting Privacy and Promoting Innovation in the Global Digital Economy.” 2012. White House, Washington, DC. http://www.whitehouse.gov/the-press-office/2012/02/23/fact-sheet-plan-protect-privacy-internet-age-adopting-consumer-privacy-b.

Floridi, Luciano. 2013. “Semantic Conceptions of Information.” In Stanford Encyclopedia of Philosophy, edited by Edward N. Zalta, Spring 201. http://plato.stanford.edu/archives/spr2013/entries/information-semantic/.

FTC. 2012. “Protecting Consumer Privacy in an Era of Rapid Change: Recommendations for Businesses and Policymakers.” Technical report, March. Federal Trade Commission. http://ftc.gov/os/2012/03/120326privacyreport.pdf.

Narayanan, Arvind, Vincent Toubiana, Helen Nissenbaum, and Dan Boneh. 2012. “A Critical Look at Decentralized Personal Data Architectures.” http://arxiv.org/abs/1202.4503.

Nissenbaum, Helen. 2004. “Privacy as Contextual Integrity.” Washington Law Review 79 (1): 101–139. http://heinonlinebackup.com/hol-cgi-bin/get_pdf.cgi?handle=hein.journals/washlr79&section=16.