{"id":323,"date":"2009-07-13T10:19:43","date_gmt":"2009-07-13T15:19:43","guid":{"rendered":"http:\/\/dilettantes.code4lib.org\/?p=323"},"modified":"2009-07-13T10:19:43","modified_gmt":"2009-07-13T15:19:43","slug":"the-concept-of-where","status":"publish","type":"post","link":"https:\/\/rossfsinger.me\/blog\/2009\/07\/the-concept-of-where\/","title":{"rendered":"The Concept of Where"},"content":{"rendered":"<p>There were three main reasons that I took the old lcsh.info data that I had lying around and made <a href=\"http:\/\/lcsubjects.org\/\">http:\/\/lcsubjects.org<\/a>:<\/p>\n<ol>\n<li>There were projects (including internal Talis ones) that really wanted to use that data, and impatience was growing as to when the Library of Congress would launch id.loc.gov.<\/li>\n<li><a href=\"http:\/\/www.ldodds.com\/\" target=\"_blank\">Leigh Dodds<\/a> had just released <a href=\"http:\/\/pho.rubyforge.org\/rdoc\/index.html\" target=\"_blank\">Pho<\/a> and needed testers. I had also, to date, done virtually nothing interesting with the Platform and wanted a somewhat turnkey operation to get started with it.<\/li>\n<li>While it&#8217;s great that the Library of Congress has made this data available, what is <em>really<\/em> interesting is seeing how this stuff relates to other data sets. It&#8217;s unlikely that LoC will be too open to experimentation in this regard (these are, after all, authorities), so LCSubjects.org seemed a good place to provide both this experimentation and community-driven editing, which will, hopefully, make it more dynamic and interesting while still deferring &#8220;authority&#8221; to the Library of Congress. The editing will, hopefully, be coming soon: per an idea proposed by <a href=\"http:\/\/slashdotrobot.wordpress.com\/\" target=\"_blank\">Chris Clarke<\/a>, I would like to store user-added changes in their own named graphs, but that support needs to be added to the Platform.<\/li>\n<\/ol>\n<p>In the pursuit 
of number three, I had a handful of what I hoped were fairly &#8220;low hanging fruit&#8221; projects to help kickstart this process and actually make LCSubjects <em>linked<\/em> data instead of just <em>linkable<\/em> data (since that was fairly redundant with id.loc.gov\/authorities\/, anyway). I have rolled out the first of these, which was an attempt to provide some sense of geocoding to the geographic headings.<\/p>\n<p>There are just over 58,000 geographic subject headings in the current dump that LoC makes available. 11,362 of these have a \u2070 symbol in them (always in a non-machine-readable editorial note). I decided to take this subset and see how many I could identify as a single geographic &#8220;point&#8221; (i.e. a single, valid latitudinal and longitudinal coordinate pair), convert those from degree, minute, second format to decimal format, and then see how many of those had a direct match to points in <a href=\"http:\/\/www.geonames.org\/\" target=\"_blank\">Geonames<\/a>.<\/p>\n<p>Given that these are entered as prose notes, the matching was fairly spotty. I was able to identify 9,127 distinct &#8220;points&#8221;. 837 concepts had either too many coordinates (concepts like <a href=\"http:\/\/lcsubjects.org\/subjects\/sh86002144#concept\" target=\"_blank\">this one<\/a> or <a href=\"http:\/\/lcsubjects.org\/subjects\/sh85102309#concept\" target=\"_blank\">this one<\/a>, for example) or only one. It&#8217;s messy stuff. This also means there are about another 1,000 that missed my regex completely (\/[0-9]*\u2070[^NSEW]*[NSEW]\\b\/), but I haven&#8217;t had time to investigate what these might look like. Given that these are just text notes, though, I was pretty surprised at the number of actual positive matches I got. These are now available in the triples using the <a href=\"http:\/\/www.w3.org\/2003\/01\/geo\/\" 
target=\"_blank\">Basic Geo (WGS84 lat\/long) vocabulary<\/a>.<\/p>\n<p>Making the links to Geonames wasn&#8217;t nearly as successful. Only about 197 points matched. Some of those that did could be considered <a href=\"http:\/\/lcsubjects.org\/subjects\/sh85015863#concept\" target=\"_blank\">questionable<\/a> (click on the Geonames link to see what I mean). <a href=\"http:\/\/lcsubjects.org\/subjects\/sh85010942#concept\" target=\"_blank\">Others are pretty perfect<\/a>.<\/p>\n<p>All in all, a pretty successful experiment. I&#8217;d like to take another pass at it, see how many prefLabels or altLabels match the Geonames names, and add those as well. Also, just after I added the triples, there was an announcement for <a href=\"http:\/\/linkedgeodata.org\/About\" target=\"_blank\">LinkedGeoData.org<\/a>, which will probably provide much better wgs84:location coverage (I can do searches like http:\/\/linkedgeodata.org\/triplify\/near\/%latitude%,%longitude%\/1, which would find points of interest within 1 meter of my coordinate pair). So stay tuned for those links.<\/p>\n<p>Lastly, one of the cooler by-products of adding these coordinates is <a 
href=\"http:\/\/api.talis.com\/stores\/lcsh-info\/services\/sparql?query=PREFIX+wgs84%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2003%2F01%2Fgeo%2Fwgs84_pos%23%3E%0D%0APREFIX+xsd%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0D%0ASELECT+%3Fconcept%0D%0AWHERE{%0D%0A++%3Fconcept+a+%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23Concept%3E.%0D%0A++%3Fconcept+wgs84%3Alat+%3Flatitude.%0D%0A++%3Fconcept+wgs84%3Along+%3Flongitude.%0D%0A++FILTER%28%28%28xsd%3Afloat%28%3Flatitude%29+%3E%3D%2234.983333333333333%22^^xsd%3Afloat%29+%26%26+%28xsd%3Afloat%28%3Flatitude%29+%3C%3D+%2236.683333333333333%22^^xsd%3Afloat%29%29+%26%26%28%28xsd%3Afloat%28%3Flongitude%29+%3C%3D+%22-81.65%22^^xsd%3Afloat%29+%26%26+%28xsd%3Afloat%28%3Flongitude%29+%3E%3D+%22-90.316666666666667%22^^xsd%3Afloat%29%29%29%0D%0A}\" target=\"_blank\">functionality like this<\/a>, which gives you all of the LCSH with coordinates found roughly inside the geographic boundaries of Tennessee (TN is a parallelogram, so this box-style query isn&#8217;t perfect).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There were three main reasons that I took the old lcsh.info data that I had lying around and made http:\/\/lcsubjects.org: There were projects (including internal Talis ones) that really wanted to use that data and impatience was growing as to when the Library of Congress would launch id.loc.gov. 
Leigh Dodds had just released Pho and [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[65],"tags":[],"class_list":["post-323","post","type-post","status-publish","format-standard","hentry","category-linked-data"],"_links":{"self":[{"href":"https:\/\/rossfsinger.me\/blog\/wp-json\/wp\/v2\/posts\/323","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rossfsinger.me\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rossfsinger.me\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rossfsinger.me\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rossfsinger.me\/blog\/wp-json\/wp\/v2\/comments?post=323"}],"version-history":[{"count":1,"href":"https:\/\/rossfsinger.me\/blog\/wp-json\/wp\/v2\/posts\/323\/revisions"}],"predecessor-version":[{"id":324,"href":"https:\/\/rossfsinger.me\/blog\/wp-json\/wp\/v2\/posts\/323\/revisions\/324"}],"wp:attachment":[{"href":"https:\/\/rossfsinger.me\/blog\/wp-json\/wp\/v2\/media?parent=323"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rossfsinger.me\/blog\/wp-json\/wp\/v2\/categories?post=323"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rossfsinger.me\/blog\/wp-json\/wp\/v2\/tags?post=323"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}