ProgrammableWebFirst Data Enlists Apigee to Handle Electronic Payments

The Apple Pay platform for payments that went live this week has boosted interest in payment systems of all types. But while Apple is clearly poised to become a leader in the category, the impact of a potentially widely used payment system is just beginning to be felt.

Amazon Web ServicesMulti-AZ Support / Auto Failover for Amazon ElastiCache for Redis

Like every AWS offering, Amazon ElastiCache started out simple and then grew in breadth and depth over time. Here's a brief recap of the most important milestones:

  • August 2011 - Initial launch with support for the Memcached caching engine in one AWS Region.
  • December 2011 - Expansion to four additional Regions.
  • March 2012 - The first of several price reductions.
  • April 2012 - Introduction of Reserved Cluster Nodes.
  • November 2012 - Introduction of four additional types of Cache Nodes.
  • September 2013 - Initial support for the Redis caching engine including Replication Groups with replicas for increased read throughput.
  • March 2014 - Another price reduction.
  • April 2014 - Backup and restore of Redis Clusters.
  • July 2014 - Support for M3 and R3 Cache Nodes.
  • July 2014 - Node placement across more than one Availability Zone in a Region.
  • September 2014 - Support for T2 Cache Nodes.

When you start to use any of the AWS services, you should always anticipate a steady stream of enhancements. Some of them, as you can see from the list above, will give you additional flexibility with regard to architecture, scalability, or location. Others will improve your cost structure by reducing prices or adding opportunities to purchase Reserved Instances. Another class of enhancements simplifies the task of building applications that are resilient and fault-tolerant.

Multi-AZ Support for Redis
Today's launch is designed to help you to add additional resilience and fault tolerance to your Redis Cache Clusters. You can now create a Replication Group that spans multiple Availability Zones with automatic failure detection and failover.

After you have created a Multi-AZ Replication Group, ElastiCache will monitor the health and connectivity of the nodes. If the primary node fails, ElastiCache will select the read replica that has the lowest replication lag (in other words, the one that is the most current) and make it the primary node. It will then propagate a DNS change, create another read replica, and wire everything back together, with no administrative work on your side.

This new level of automated fault detection and recovery will enhance the overall availability of your Redis Cache Clusters. The following situations will initiate the failover process:

  1. Loss of availability in the primary's Availability Zone.
  2. Loss of network connectivity to the primary.
  3. Failure of the primary.

Creating a Multi-AZ Replication Group
You can create a Multi-AZ Cache Replication Group by checking the Multi-AZ checkbox after selecting Create Cache Cluster:

A diverse set of Availability Zones will be assigned by default. You can easily override them in order to better reflect the needs of your application:

Multi-AZ for Existing Cache Clusters
You can also modify your existing Cache Cluster to add Multi-AZ residency and automatic failover with a couple of clicks.
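
If you would rather script this change, the same setting is exposed through the ElastiCache API. Here's a rough sketch using the AWS SDK for Python (boto3); the replication group name is a placeholder:

import boto3

elasticache = boto3.client('elasticache', region_name='us-east-1')

# Enable automatic failover (Multi-AZ) on an existing Redis Replication Group.
response = elasticache.modify_replication_group(
    ReplicationGroupId='my-redis-group',   # placeholder name
    AutomaticFailoverEnabled=True,
    ApplyImmediately=True
)
print(response['ReplicationGroup']['AutomaticFailover'])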

Things to Know
The Multi-AZ support in ElastiCache for Redis currently makes use of the asynchronous replication that is built in to newer versions (2.8.6 and beyond) of the Redis engine. As such, it is subject to its strengths and weaknesses. In particular, when a read replica connects to a primary for the first time or when the primary changes, the replica will perform a full synchronization with the primary. This ensures that the cached information is as current as possible, but it will impose an additional load on the primary and the read replica(s).

The entire failover process, from detection to the resumption of normal caching behavior, will take several minutes. Your application's caching tier should have a strategy (and some code!) to deal with a cache that is momentarily unavailable.
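
As one illustration (not the only approach), a Python caching tier might degrade gracefully during a failover by falling back to the authoritative data store whenever Redis is unreachable. The endpoint and the load_from_database function below are placeholders:

import redis

# Placeholder endpoint: use your Replication Group's primary DNS name so the
# client follows the new primary after a failover-driven DNS change.
cache = redis.StrictRedis(host='my-group.example.cache.amazonaws.com',
                          port=6379, socket_timeout=2)

def load_from_database(key):
    # Placeholder for a lookup against your authoritative data store.
    return b'value-from-database'

def get_value(key):
    try:
        value = cache.get(key)
        if value is not None:
            return value
    except (redis.exceptions.ConnectionError, redis.exceptions.TimeoutError):
        pass  # Cache momentarily unavailable (e.g. during a failover)
    value = load_from_database(key)
    try:
        cache.setex(key, 300, value)  # best-effort repopulation, 5 minute TTL
    except (redis.exceptions.ConnectionError, redis.exceptions.TimeoutError):
        pass
    return value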

Available Now
This new feature is available now in all public AWS Regions and you can start using it today. The feature is offered at no extra charge to all ElastiCache users.

-- Jeff;

ProgrammableWebConcur Highlights Uber & Airbnb APIs at DevCon

Business travel API platform Concur is hosting the developer conference The Perfect Trip DevCon 2014 in San Francisco on Thursday. This is the second year that Concur — which was recently acquired by the enterprise provider SAP — has hosted an annual developer-focused event aimed at encouraging new apps to be built to service the $1.12 billion business travel industry.

ProgrammableWebToday in APIs: Qucit Offers Predictive Bikeshare Availability API

Qucit offers predictive bikeshare availability API for the town of Bordeaux. SAP HANA launches API and other cloud tools. Plus: LinguaSys launches natural language API portal for developers, and Code Chica to hold a hackathon for teen girls this weekend.

ProgrammableWeb: APIsSpreadSheetSpace

SpreadSheetSpace provides a REST API that allows users to link Excel sheets online. The service turns Microsoft Excel into a live data analysis tool by linking it to corporate data in a secure and controlled way. PKI encryption provides full privacy and selective sharing of Excel cells, and the API is served over HTTPS only; HTTP is not supported. SpreadSheetSpace is in beta.
Date Updated: 2014-10-24

ProgrammableWebZiftr Provides Cryptocurrency API Implementation Advice

Will bitcoin become a prominent part of online retail? A growing number of companies believe so and are building cryptocurrency-enabled products and APIs to support retail applications.

Amazon Web ServicesOpenID Connect Support for Amazon Cognito

This past summer, we launched Cognito to simplify the task of authenticating users and storing, managing, and syncing their data across multiple devices. Cognito already supports a variety of identities — public provider identities (Facebook, Google, and Amazon), guest user identities, and recently announced developer authenticated identities.

Today we are making Amazon Cognito even more flexible by enabling app developers to use identities from any provider that supports OpenID Connect (OIDC). For example, you can write AWS-powered apps that allow users to sign in using their user name and password from Salesforce or Ping Federate. OIDC is an open standard that enables developers to leverage additional identity providers for authentication, so they can focus on developing their app rather than dealing with user names and passwords.

Today's launch adds OIDC provider identities to the list. Cognito takes the ID token that you obtain from the OIDC identity provider and uses it to manufacture unique Cognito IDs for each person who uses your app. You can use this identifier to save and synchronize user data across devices and to retrieve temporary, limited-privilege AWS credentials through the AWS Security Token Service.
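
To make this concrete, here's a rough sketch of that exchange using the AWS SDK for Python (boto3). Your app first obtains an ID token from the OIDC provider; the identity pool ID, provider name, and token below are placeholders:

import boto3

cognito = boto3.client('cognito-identity', region_name='us-east-1')

# Placeholders: your identity pool and the OIDC provider registered with it.
IDENTITY_POOL_ID = 'us-east-1:00000000-0000-0000-0000-000000000000'
OIDC_PROVIDER = 'login.example.com'
id_token = '...'  # ID token returned by the OIDC provider

# Exchange the ID token for a unique Cognito identity...
identity = cognito.get_id(
    IdentityPoolId=IDENTITY_POOL_ID,
    Logins={OIDC_PROVIDER: id_token}
)

# ...and then for temporary, limited-privilege AWS credentials.
creds = cognito.get_credentials_for_identity(
    IdentityId=identity['IdentityId'],
    Logins={OIDC_PROVIDER: id_token}
)
print(creds['Credentials']['AccessKeyId'])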

Today's addition of OIDC support builds upon the support for SAML (Security Assertion Markup Language) that we launched last year, and we hope it demonstrates our commitment to open standards. To learn more and to see some sample code, see our new post, Building an App using Amazon Cognito and an OpenID Connect Identity Provider, on the AWS Security Blog. If you are planning to attend the Internet Identity Workshop next week, come meet the members of the team that added this support!

-- Jeff;

ProgrammableWebProgress Software Acquires Telerik, Gains Developer Community Depth

As part of an effort to shore up both the front-end and back-end of its application development platforms, Progress Software will acquire Telerik for $262.5 million.

Norman Walsh (Sun)The short-form week of 13–19 Oct 2014


The week in review, 140 characters at a time. This week, 22 messages in 21 conversations. (With 2 favorites.)

This document was created automatically from my archive of my Twitter stream. Due to limitations in the Twitter API and occasional glitches in my archiving system, it may not be 100% complete.

In a conversation that started on Saturday at 06:54pm

@sayveiga Looking awfully tasty!—@ndw
@ndw @sayveiga c u next Labor Day?—@sayveiga
@sayveiga If that's an invitation... :-)—@ndw

In a conversation that started on Monday at 10:21am

Not being able to alt+click in VirtualBox is a bit of a problem. Searching the web wasn't fruitful. Anyone else encountered and solved it?—@ndw
@ndw Run onscreen keyboard in guest to keep alt pressed & click as always or select another keyboard layout (say dvorak) in guest—@VSChawathe
@ndw Which platform? I can Alt+click fine in full screen mode, windows host, linux guest. Auto Capture Keyboard. What happens when you do?—@nsushkin
@nsushkin Linux host, windows guest. The cursor (in Photoshop) changes, but clicking has no effect.—@ndw
@ndw Maybe your Linux window manager captures Alt+click before VirtualBox can process it. Search for "linux disable alt click move window"—@nsushkin
@nsushkin That might be it. Alas, all attempts to change that behavior have failed. Compiz settings thing-a-ma-bob always reverts to default—@ndw

Monday at 10:51pm

RT @avernet: A 6 yo girl we didn't know tells us about her favorite show, and that we can catch it "on Netflix or Amazon Video". TV network…—@ndw

Monday at 10:58pm

RT @ealvarezgibson: The big glass obviously just worked really hard to become a self-made glass. http://t.co/CbukNHM0ud —@ndw

Tuesday at 06:23am

RT @mdubinko: The future won't be what it once was.—@ndw

Tuesday at 06:33am

RT @individeweal: Dear internet: use more actual words, fewer videos and pictures of words http://t.co/YMA22AkjOj —@ndw

In a conversation that started on Tuesday at 06:43am

@ndw Please tell me that I am not fibbing when I tell people that Docbook docs are maintained in Docbook.—@maltbyd
@maltbyd They'd be maintained in something else!? No, you're not fibbing.—@ndw
@ndw I just knew you'd eat your own dogfood. Sadly it seems other document content modelers don't. #S1000D —@maltbyd

Tuesday at 07:24am

So, in AngularJS, <html ng-app> works but <html ng-app="ng-app"> does not? It's some weird metasyntactic token not a shorttag minimization?—@ndw

Tuesday at 09:36am

"Sometimes the only choices you have are bad ones, but you still have to choose."—@ndw

Tuesday at 09:17pm

A French Tarot scoring web app, http://t.co/FYrkzbjjKz Because reasons.—@ndw

Tuesday at 10:02pm

FAV
The American Ebola problem may end up costing .01% of the defense budget to limit fatalities to .001% of the obesity epidemic. #dontpanic —@mdubinko

Wednesday at 07:12am

XML Stars, the journal is out! http://t.co/lNS6ggKpyZ Stories via @ndw @georgebina —@dominixml

In a conversation that started on Wednesday at 02:20pm

Vendor X writes to my MarkLogic email: "I saw that you may be working with NoSQL tech..." Dude. I work with the f'ing Ferrari of NoSQL.—@ndw
@ndw Interesting analogy. :) http://t.co/963xpLPI1y —@bsletten
@bsletten *snort* Not a problem.—@ndw

Wednesday at 02:22pm

I wonder if the Nexus 6 is a form factor that can replace both my Nexus 5 and my Nexus 7. And I wonder if I'll want the Nexus 9 anyway.—@ndw

Wednesday at 03:18pm

@edibleaustin Very nice.—@ndw

Wednesday at 05:27pm

FAV
So far every case of Ebola in this country got it by helping people. So relax, Republicans, you're in the clear.—@TinaDupuy

In a conversation that started on Friday at 03:35pm

The phototropism of an avocado sprout is quite impressive.—@ndw
@ndw for some reason, I'm seeing you in front of a mirror practicing saying that over and over as you prepare to go on stage. #breakaleg —@peteaven
@peteaven @ndw yes, Henry Higgins style.—@collwhit

Saturday at 10:23am

RT @HaroldItz: Unfortunately, Fox News can be spread through the air. #Ebola —@ndw

Saturday at 10:25am

RT @pourmecoffee: If you're trying to be the defender of democracy, families and public health, don't make it harder to vote, marry and get…—@ndw

Saturday at 10:45pm

@kendall Way too long since I had sushi. Must fix that.—@ndw

In a conversation that started on Sunday at 09:48am

@danbri @plusnet Revenue stream.—@ndw
.@ndw certainly glad to part with @plusnet - worst paid Internet connection in ~23 years online. Paying to leave is consistent w/ that :/—@danbri

Jeremy Keith (Adactio)Be progressive

Aaron wrote a great post a little while back called A Fundamental Disconnect. In it, he points to a worldview amongst many modern web developers, who see JavaScript as a universally-available technology in web browsers. They are, in effect, viewing a browser’s JavaScript engine as a runtime environment, and treating web development no different to any other kind of software development.

The one problem I’ve seen, however, is the fundamental disconnect many of these developers seem to have with the way deploying code on the Web works. In traditional software development, we have some say in the execution environment. On the Web, we don’t.

Treating JavaScript support in “the browser” as a known quantity is as much of a consensual hallucination as deciding that all viewports are 960 pixels wide. Even that phrasing—“the browser”—shows a framing that’s at odds with the reality of developing for the web; we don’t have to think about “the browser”, we have to think about browsers:

Lakoffian self-correction: if I’m about to talk about doing something “in the browser”, I try to catch myself and say “in browsers” instead.

While we might like to think that browsers have all reached a certain level of equilibrium, as Aaron puts it “the Web is messy”:

And, as much as we might like to control a user’s experience down to the very pixel, those of us who have been working on the Web for a while understand that it’s a fool’s errand and have adjusted our expectations accordingly. Unfortunately, this new crop of Web developers doesn’t seem to have gotten that memo.

Please don’t think that either Aaron or I are saying that you shouldn’t use JavaScript. Far from it! It’s simply a matter of how you wield the power of JavaScript. If you make your core tasks dependent on JavaScript, some of your potential users will inevitably be left out in the cold. But if you start by building on a classic server/client model, and then enhance with JavaScript, you can have your cake and eat it too. Modern browsers get a smooth, rich experience. Older browsers get a clunky experience with full page refreshes, but that’s still much, much better than giving them nothing at all.

Aaron makes the case that, while we cannot control which browsers people will use, we can control the server environment.

Stuart takes issue with that assertion in a post called Fundamentally Disconnected. In it, he points out that the server isn’t quite the controlled environment that Aaron claims:

Aaron sees requiring a specific browser/OS combination as an impractical impossibility and the wrong thing to do, whereas doing this on the server is positively virtuous. I believe that this is no virtue.

It’s true enough that the server isn’t some rock-solid never-changing environment. Anyone who’s ever had to install patches or update programming languages knows this. But at least it’s one single environment …whereas the web has an overwhelming multitude of environments; one for every browser/OS/device combination.

Stuart finishes on a stirring note:

The Web has trained its developers to attempt to build something that is fundamentally egalitarian, fundamentally available to everyone. That’s why the Web’s good. The old software model, of something which only works in one place, isn’t the baseline against which the Web should be judged; it’s something that’s been surpassed.

However he wraps up by saying that…

…the Web is the largest, most widely deployed, most popular and most ubiquitous computing platform the world has ever known. And its programming language is JavaScript.

In a post called Missed Connections, Aaron pushes back against that last point:

The fact is that you can’t build a robust Web experience that relies solely on client-side JavaScript.

While JavaScript may technically be available and consistently-implemented across most devices used to access our sites nowadays, we do not control how, when, or even if that JavaScript is ultimately executed.

Stuart responds in a post called Reconnecting (and, by the way, how great is it to see this kind of thoughtful blog-to-blog discussion going on?).

I am, in general and in total agreement with Aaron, opposed to the idea that without JavaScript a web app doesn’t work.

But here’s the problem with progressively enhancing from server functionality to a rich client:

A web app which does not require its client-side scripting, which works on the server and then is progressively enhanced, does not work in an offline environment.

Good point.

Now, at this juncture, I could point out that—by using progressive enhancement—you can still have the best of both worlds. Stuart has anticipated that:

It is in theory possible to write a web app which does processing on the server and is entirely robust against its client-side scripting being broken or missing, and which does processing on the client so that it works when the server’s unavailable or uncontactable or expensive or slow. But let’s be honest here. That’s not an app. That’s two apps.

Ah, there’s the rub!

When I’ve extolled the virtues of progressive enhancement in the past, the pushback I most often receive is on this point. Surely it’s wasteful to build something that works on the server and then reimplement much of it on the client?

Personally, I try not to completely reinvent all the business logic that I’ve already figured out on the server, and then rewrite it all in JavaScript. I prefer to use JavaScript—and specifically Ajax—as a dumb waiter, shuffling data back and forth between the client and server, where the real complexity lies.

I also think that building in this way will take longer …at first. But then on the next project, it takes less time. And on the project after that, it takes less time again. From that perspective, it’s similar to switching from tables for layout to using CSS, or switching from building fixed-width sites to responsive design: the initial learning curve is steep, but then it gets easier over time, until it simply becomes normal.

But fundamentally, Stuart is right. Developers don’t like to violate the DRY principle: Don’t Repeat Yourself. Writing code for the server environment, and then writing very similar code for the browser—I mean browsers—is a bad code smell.

Here’s the harsh truth: building websites with progressive enhancement is not convenient.

Building a client-side web thang that requires JavaScript to work is convenient, especially if you’re using a framework like Angular or Ember. In fact, that’s the main selling point of those frameworks: developer convenience.

The trade-off is that to get that level of developer convenience, you have to sacrifice the universal reach that the web provides, and limit your audience to the browsers that can run a pre-determined level of JavaScript. Many developers are quite willing to make that trade-off.

Developer convenience is a very powerful and important force. I wish that progressive enhancement could provide the same level of developer convenience offered by Angular and Ember, but right now, it doesn’t. Instead, its benefits are focused on the end user, often at the expense of the developer.

Personally, I’m willing to take that hit. I’ve always maintained that, given the choice between making something my problem, and making something the user’s problem, I’ll choose to make it my problem every time. But I absolutely understand the mindset of developers who choose otherwise.

But perhaps there’s a way to cut this Gordian knot. What if you didn’t need to write your code twice? What if you could write code for the server and then run the very same code on the client?

This is the promise of isomorphic JavaScript. It’s a terrible name for a great idea.

For me, this is the most exciting aspect of Node.js:

With Node.js, a fast, stable server-side JavaScript runtime, we can now make this dream a reality. By creating the appropriate abstractions, we can write our application logic such that it runs on both the server and the client — the definition of isomorphic JavaScript.

Some big players are looking into this idea. It’s the thinking behind AirBnB’s Rendr.

Interestingly, the reason why many large sites are investigating this approach isn’t about universal access—quite often they have separate siloed sites for different device classes. Instead it’s about performance. The problem with having all of your functionality wrapped up in JavaScript on the client is that, until all of that JavaScript has loaded, the user gets absolutely nothing. Compare that to rendering an HTML document sent from the server, and the perceived performance difference is very noticeable.

Here’s the ideal situation:

  1. A browser requests a URL.
  2. The server sends HTML, which renders quickly, along with some mustard-cutting JavaScript.
  3. If the browser doesn’t cut the mustard, or JavaScript fails, fall back to full page refreshes.
  4. If the browser does cut the mustard, keep all the interaction in the client, just like a single page app.

With Node.js on the server, and JavaScript in the client, steps 3 and 4 could theoretically use the same code.

So why aren’t we seeing more of these holy-grail apps that achieve progressive enhancement without code duplication?

Well, partly it’s back to that question of controlling the server environment.

This is something that Nicholas Zakas tackled a year ago when he wrote about Node.js and the new web front-end. He proposes a third layer that sits between the business logic and the rendered output. By applying the idea of isomorphic JavaScript, this interface layer could be run on the server (as Node.js) or on the client (as JavaScript), while still allowing you to have the rest of your server environment running whatever programming language works for you.

It’s still early days for this kind of thinking, and there are lots of stumbling blocks—trying to write JavaScript that can be executed on both the server and the client isn’t so easy. But I’m pretty excited about where this could lead. I love the idea of building in a way that provides the performance and universal access of progressive enhancement, while also providing the developer convenience of JavaScript frameworks.

In the meantime, building with progressive enhancement may have to involve a certain level of inconvenience and duplication of effort. It’s a price I’m willing to pay, but I wish I didn’t have to. And I totally understand that others aren’t willing to pay that price.

But while the mood might currently seem to be in favour of using monolithic JavaScript frameworks to build client-side apps that rely on JavaScript in browsers, I think that the tide might change if we started to see poster children for progressive enhancement.

Three years ago, when I was trying to convince clients and fellow developers that responsive design was the way to go, it was a hard sell. It reminded me of trying to sell the benefits of using web standards instead of using tables for layout. Then, just as Doug’s redesign of Wired and Mike’s redesign of ESPN helped sell the idea of CSS for layout, the Filament Group’s work on the Boston Globe made it a lot easier to sell the idea of responsive design. Then Paravel designed a responsive Microsoft homepage and the floodgates opened.

Now …who wants to do the same thing for progressive enhancement?

Amazon Web ServicesNow Open - AWS Germany (Frankfurt) Region - EC2, DynamoDB, S3, and Much More

It is time to expand the AWS footprint once again, this time with a new Region in Frankfurt, Germany. AWS customers in Europe can now use the new EU (Frankfurt) Region along with the existing EU (Ireland) Region for fast, low-latency access to the suite of AWS infrastructure services. You can now build multi-Region applications with the assurance that your content will stay within the EU.

New Region
The new Frankfurt Region supports Amazon Elastic Compute Cloud (EC2) and related services including Amazon Elastic Block Store (EBS), Amazon Virtual Private Cloud, Auto Scaling, and Elastic Load Balancing.

It also supports AWS Elastic Beanstalk, AWS CloudFormation, Amazon CloudFront, Amazon CloudSearch, AWS CloudTrail, Amazon CloudWatch, AWS Direct Connect, Amazon DynamoDB, Amazon Elastic MapReduce, AWS Storage Gateway, Amazon Glacier, AWS CloudHSM, AWS Identity and Access Management (IAM), Amazon Kinesis, AWS OpsWorks, Amazon Route 53, Amazon Relational Database Service (RDS), Amazon Redshift, Amazon Simple Storage Service (S3), Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), and Amazon Simple Workflow Service (SWF).

The Region supports all sizes of T2, M3, C3, R3, and I2 instances. All EC2 instances must be launched within a Virtual Private Cloud in this Region (see my blog post, Virtual Private Clouds for Everyone for more information).

There are also three edge locations in Frankfurt for Amazon Route 53 and Amazon CloudFront.

This is our eleventh Region (see the AWS Global Infrastructure map for more information). As usual, you can see the full list in the Region menu of the AWS Management Console:

Rigorous Compliance
Every AWS Region is designed and built to meet rigorous compliance standards, including ISO 27001, SOC 1, and PCI DSS Level 1, to name a few (see the AWS Compliance page for more info). AWS is fully compliant with all applicable EU Data Protection laws. For customers who wish to use AWS to store personal data, AWS provides a data processing agreement. More information on how customers can use AWS to meet EU Data Protection requirements can be found at AWS Data Protection.

Customers
Many organizations in Europe are already making use of AWS. Here's a very small sample:

mytaxi (Slideshare presentation) is a very popular (10 million users and 45,000 taxis) taxi booking application. They use AWS to help them to service their global customer base in real time. They plan to use the new Region to provide even better service to their customers in Germany.


Wunderlist (case study) was first attracted to AWS by, as they say, the "fantastic technology stack." Empowered by AWS, they have developed an agile deployment model that allows them to deploy new code several times per day. They can experiment more often (with very little risk) and can launch new products more quickly. They believe that the new AWS Region will benefit their customers in Germany and will also inspire the local startup scene.

AWS Partner Network
Members of the AWS Partner Network (APN) have been preparing for the launch of the new Region. Here's a sampling (send me email with launch day updates).

Software AG is using AWS as a global host for ARIS Cloud, a Business Process Analysis-as-a-Service (BPAaaS) product. AWS allows Software AG to focus on their core competency, the development of great software, and gives them the power to roll out new cloud products globally within days.

Trend Micro is bringing their security solutions to the new region. Trend Micro Deep Security helps customers secure their AWS deployments and instances against the latest threats, including Shellshock and Heartbleed.

Here are a few late-breaking (post-launch additions):

  1. BitNami - Support for the new Amazon Cloud Region in Germany.
  2. Appian - Appian Cloud Adds Local Hosting in Germany

Here are some of the latest and greatest third party operating system AMIs in the new Region:

  1. Canonical - Ubuntu Server 14.04 LTS
  2. SUSE - SUSE Linux Enterprise Server 11 SP3

For Developers - Signature Version 4 Support
This new Region supports only Signature Version 4. If you have built applications with the AWS SDKs or the AWS Command Line Interface (CLI) and your API calls are being rejected, you should update to the newest SDK and CLI. To learn more, visit Using the AWS SDKs and Explorers.
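
With an up-to-date SDK there's nothing special to configure for Signature Version 4; you simply point the client at the new Region. Here's a minimal sketch using the AWS SDK for Python (boto3), which signs requests with Signature Version 4 automatically (the bucket name is a placeholder):

import boto3

# eu-central-1 is the identifier for the EU (Frankfurt) Region.
s3 = boto3.client('s3', region_name='eu-central-1')

# Requests made by this client are signed with Signature Version 4.
s3.create_bucket(
    Bucket='my-frankfurt-bucket',   # placeholder bucket name
    CreateBucketConfiguration={'LocationConstraint': 'eu-central-1'}
)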

AWS Offices in Europe
In order to support enterprises, government agencies, academic institutions, small-to-mid size companies, startups, and developers, there are AWS offices in Germany (Berlin, Munich), the UK (London), Ireland (Dublin), France (Paris), Luxembourg (Luxembourg City), Spain (Madrid), Sweden (Stockholm), and Italy (Milan).

Use it Now
This new Region is open for business now and you can start using it today!

-- Jeff;

PS - Like our US West (Oregon) and AWS GovCloud (US) Regions, this region uses carbon-free power!

Amazon Web ServicesMLB.com Statcast Debuts at the World Series - Powered by AWS

Yesterday, the team at MLB Advanced Media (MLBAM) launched MLB.com Statcast for the 2014 World Series. This cool new video experience, powered by AWS, demonstrates for fans how high-resolution cameras and radar equipment precisely track the position of the ball and all of the players on the field during a baseball game. The equipment captures 20,000 position metrics for the ball every second. It also captures 30 position metrics for each player every second.

The data is used to create a newly introduced video overlay experience — MLB.com Statcast powered by AWS — to display the computed performance metrics that measure the performance of each player. This data, and the renderings that it creates, help to provide today's baseball fans with the detailed and engaging online content that they crave.

Here are a couple of examples that will show you more about the data collected and displayed through Statcast, using a diving catch from Game 6 of the ALCS. First, the pitch:

The reaction in center field:

And the catch:

Watch the complete video to see and hear the action!

-- Jeff;

ProgrammableWebCrowdfunder.co.uk API Simplifies Crowdfunding Integration

When it comes to financing a new business, project or exciting venture, getting large sums of money from investors isn't always an option. That's where crowdsourcing can become a viable option. Getting small amounts of money from a large number of people may be a more realistic solution in a lot of areas. This kind of fundraising is a great idea, but with it comes the need for a reliable platform that can make the collecting of cash a simple and user-friendly process.

ProgrammableWebToday in APIs: Deeplink Launches API for Better Mobile App Interoperability

Deeplink launches tools to make it easier to jump between mobile apps. Khronos publishes new specification for mapping vision apps for front end processing. Plus: Gigaom to hold event on IoT and automotives, and businesses plea for hackathon to solve U.S./Canada border delays.

ProgrammableWebCognitive Scale Simplifies Development of Cognitive Computing Applications

Cognitive computing is defined by the development of new classes of applications that leverage advanced artificial intelligence to make better decisions. The challenge developers now face is finding ways to practically build those applications in a way that delivers value to the business in a couple of weeks versus years of development effort.

Jeremy Keith (Adactio)A question of markup

Hi,

I’m really sorry it’s taken me so long to write back to you (over a month!)—I’m really crap at email.

I’m writing to you hoping you can help me make my colleagues take html5 “seriously”. They have read your book, they know it’s the “right” thing to do, but still they write !doctype HTML and then div, div, div, div, div…

Now, if you could provide me with some answers to their “why bother?” questions, it would be really appreciated.

I have to be honest, I don’t think it’s worth spending lots of time agonising over what’s the right element to use for marking up a particular piece of content.

That said, I also think it’s lazy to just use divs and spans for everything, if a more appropriate element is available.

Paragraphs, lists, figures …these are all pretty straightforward and require almost no thought.

Deciding whether something is a section or an article, though …that’s another story. It’s not so clear. And I’m not sure it’s worth the effort. Frankly, a div might be just fine in most cases.

For example, can one assume that in the future we will be pulling content directly from websites and therefore it would be smart to tell this technology which content is the article, what are the navigation and so on?

There are some third-party tools (like Readability) that pay attention to the semantics of the elements you use, but the most important use-case is assistive technology. For tools such as screen readers, there’s a massive benefit to marking up headings, lists, and other straightforward elements, as well as some of the newer additions like nav and main.

But for many situations, a div is just fine. If you’re just grouping some stuff together that doesn’t have a thematic relation (for instance, you might be grouping them together to apply a particular style), then div works perfectly well. And if you’re marking up a piece of inline text and you’re not emphasising it, or otherwise differentiating it semantically, then a span is the right element to use.

So for most situations, I don’t think it’s worth overthinking the choice of HTML elements. A moment or two should be enough to decide which element is right. Any longer than that, and you might as well just use a div or span, and move on to other decisions.

But there’s one area where I think it’s worth spending a bit longer to decide on the right element, and that’s with forms.

When you’re marking up forms, it’s really worth making sure that you’re using the right element. Never use a span or a div if you’re just going to add style and behaviour to make it look and act like a button: use an actual button instead (not only is it the correct element to use, it’s going to save you a lot of work).

Likewise, if a piece of text is labelling a form control, don’t just use a span; use the label element. Again, this is not only the most meaningful element, but it will provide plenty of practical benefit, not only to screen readers, but to all browsers.

So when it comes to forms, it’s worth sweating the details of the markup. I think it’s also worth making sure that the major chunks of your pages are correctly marked up: navigation, headings. But beyond that, don’t spend too much brain energy deciding questions like “Is this a definition list? Or a regular list?” or perhaps “Is this an aside? Or is it a footer?” Choose something that works well enough (even if that’s a div) and move on.

But if your entire document is nothing but divs and spans then you’re probably going to end up making more work for yourself when it comes to the CSS and JavaScript that you apply.

There’s a bit of a contradiction to what I’m saying here.

On the one hand, I’m saying you should usually choose the most appropriate element available because it will save you work. In other words, it’s the lazy option. Be lazy!

On the other hand, I’m saying that it’s worth taking a little time to choose the most appropriate element instead of always using a div or a span. Don’t be lazy!

I guess what I’m saying is: find a good balance. Don’t agonise over choosing appropriate HTML elements, but don’t just use divs and spans either.

Hope that helps.

Hmmm… you know, I think I might publish this response on my blog.

Cheers,

Jeremy

ProgrammableWebGiphy Launches Sticker API to Accommodate Transparent GIFs

Giphy, animated search engine for discovering and sharing GIF images, has expanded its API offering to included a Sticker API. The Sticker API extends the existing library of GIFs with images that use transparent backgrounds. After integration with the API, developers can insert the stickers into websites and apps.

Amazon Web ServicesAWS Ad Tech Conference - This Friday in San Francisco!

The advertising space is going through a rapid, technology-enabled, data-driven transformation!

Many of the companies driving this change are using AWS services like Amazon Elastic MapReduce, Amazon Redshift, Amazon DynamoDB, Amazon Kinesis, and Amazon CloudFront to serve, ingest, process, store, analyze, track, and optimize their online advertising campaigns.

If you work for an ad tech company in the San Francisco area you should consider attending a free one-day event for developers and architects this coming Friday (October 24th) in San Francisco.

Attend, Learn, Meet
If you attend the event you will get to learn AWS in a series of five technical deep dive sessions that are laser focused on the key AWS technologies that I mentioned above. You will also get to hear AWS customers such as Adroll (ad retargeting), Blinkx (video discovery and sharing), Bloomreach (big data marketing), Krux Digital (cross-screen data management), SetMedia (digital video classification), Tune and Viglink (automated monetization) share their real-life use cases, architectures, and the lessons they learned on their journey to the cloud. The day will end with a networking reception at 5:00 PM.

This event is designed for developers and architects who are already familiar with AWS and are looking to increase their knowledge of key ad tech enabling services and learn directly from their industry peers. This is not an introductory or business-level event.

Register Now
The event runs from 10:00 AM to 6:00 PM this coming Friday. It will be held in the AWS Pop-up Loft at 925 Market Street in San Francisco. Registration is mandatory, space is limited, and there's no charge to attend. To register:

  1. Go to the AWS Pop-up Loft site and click Register to attend the AWS Loft. If this is your first time registering for an event at the AWS Pop-Up Loft, you'll need to create a new account first; otherwise, just log in to the site.
  2. Under Evening Events/Sessions, go to Friday, 10/24/14, check the box for Advertising Technology Day, and continue through the registration process.

Agenda
Here is the agenda for the day:

Time Session
9:30 AM Arrive and Register
10:00 AM Customer Presentation (Viglink)
10:30 AM Customer Presentation (Krux)
11:00 AM Amazon EMR Best Practices
11:30 AM Customer Presentation (Bloomreach)
12:00 PM Lunch and Informal Q&A
12:30 PM Amazon Redshift Best Practices
1:00 PM Customer Presentation (Tune)
1:30 PM Amazon Kinesis Best Practices
2:00 PM Customer Presentation (SET Media)
2:30 PM Amazon CloudFront Best Practices
3:00 PM Customer Presentation (Blinkx)
3:30 PM Amazon DynamoDB Best Practices
4:00 PM Customer Presentation (AdRoll)
4:30 PM Q&A
5:00 PM Happy Hour Networking Reception

-- Jeff;

Amazon Web ServicesNew AWS Directory Service

Virtually every organization uses a directory service such as Active Directory to allow computers to join domains, list and authenticate users, and locate and connect to printers and other network services, including SQL Server databases. A centralized directory reduces the amount of administrative work that must be done when an employee joins the organization, changes roles, or leaves.

With the advent of cloud-based services, an interesting challenge has arisen. By design, the directory is intended to be a central source of truth with regard to user identity. Administrators should not have to maintain one directory service for on-premises users and services, and a separate, parallel one for the cloud. Ideally, on-premises and cloud-based services could share and make use of a single, unified directory service.

Perhaps you want to run Microsoft Windows on EC2 or centrally control access to AWS applications such as Amazon WorkSpaces or Amazon Zocalo. Setting up and then running a directory can be a fairly ambitious undertaking once you take into account the need to procure and run hardware, install, configure, and patch the operating system and the directory, and so forth. This might be overkill if you have a user base of modest size and just want to use the AWS applications and exercise centralized control over users and permissions.

The New AWS Directory Service
Today we are introducing the AWS Directory Service to address these challenges! This managed service provides two types of directories. You can connect to an existing on-premises directory or you can set up and run a new, Samba-based directory in the Cloud.

If your organization already has a directory, you can now make use of it from within the cloud using the AD Connector directory type. This is a gateway technology that serves as a cloud proxy to your existing directory, without the need for complex synchronization technology or federated sign-on. All communication between the AWS Cloud and your on-premises directory takes place over AWS Direct Connect or a secure VPN connection within an Amazon Virtual Private Cloud. The AD Connector is easy to set up (just a few parameters) and needs very little in the way of operational care and feeding. Once configured, your users can use their existing credentials (user name and password, with optional RADIUS authentication) to log in to WorkSpaces, Zocalo, EC2 instances running Microsoft Windows, and the AWS Management Console. The AD Connector is available in Small (up to 10,000 users, computers, groups, and other directory objects) and Large (up to 100,000 users, computers, groups, and other directory objects).

If you don't currently have a directory and don't want to be bothered with all of the care and feeding that's traditionally been required, you can quickly and easily provision and run a Samba-based directory in the cloud using the Simple AD directory type. This directory supports most of the common Active Directory features including joins to Windows domains, management of Group Policies, and single sign-on to directory-powered apps. EC2 instances that run Windows can join domains and can be administered en masse using Group Policies for consistency. Amazon WorkSpaces and Amazon Zocalo can make use of the directory. Developers and system administrators can use their directory credentials to sign in to the AWS Management Console in order to manage AWS resources such as EC2 instances or S3 buckets.

Getting Started
Regardless of the directory type that you choose, getting started is quick and easy. Keep in mind, of course, that you are setting up an important piece of infrastructure and choose your names and passwords accordingly. Let's walk through the process of setting up each type of directory.

I can create an AD Connector as a cloud-based proxy to an existing Active Directory running within my organization. I'll have to create a VPN connection from my Virtual Private Cloud to my on-premises network, making use of AWS Direct Connect if necessary. Then I will need to create an account with sufficient privileges to allow it to handle lookup, authentication, and domain join requests. I'll also need the DNS name of the existing directory. With that information in hand, creating the AD Connector is a simple matter of filling in a form:

I also have to provide it with information about my VPC, including the subnets where I'd like the directory servers to be hosted:

The AD Connector will be up & running and ready to use within minutes!

Creating a Simple AD in the cloud is also very simple and straightforward. Again, I need to choose one of my VPCs and then pick a pair of subnets within it for my directory servers:

Again, the Simple AD will be up, running, and ready for use within minutes.
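
If you would rather script the setup, the same operation is available through the Directory Service API. Here's a rough sketch using the AWS SDK for Python (boto3); the directory name, password, and VPC and subnet IDs are all placeholders:

import boto3

ds = boto3.client('ds', region_name='us-east-1')

# Create a Small Simple AD directory inside an existing VPC (placeholder IDs).
response = ds.create_directory(
    Name='corp.example.com',
    ShortName='CORP',
    Password='Sup3rS3cret!',        # directory administrator password
    Size='Small',
    VpcSettings={
        'VpcId': 'vpc-12345678',
        'SubnetIds': ['subnet-11111111', 'subnet-22222222']
    }
)
print(response['DirectoryId'])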

Managing Directories
Let's take a look at the management features that are available for the AD Connector and Simple AD. The Console shows me a list of all of my directories:

I can dive into the details with a click. As you can see at the bottom of this screen, I can also create a public endpoint for my directory. This will allow it to be used for sign-in to AWS applications such as Zocalo and WorkSpaces, and to the AWS Management Console:

I can also configure the AWS applications and the Console to use the directory:

I can also create, restore, and manage snapshot backups of my Simple AD (backups are done automatically every 24 hours; I can also initiate a manual backup at any desired time):

Get Started Today
Both types of directory are available now and you can start creating and using them today in the US East (Northern Virginia), US West (Oregon), Asia Pacific (Sydney), Asia Pacific (Tokyo), and Europe (Ireland) Regions. Prices start at $0.05 per hour for Small directories of either type and $0.15 per hour for Large directories of either type in the US East (Northern Virginia) Region. See the AWS Directory Service page for pricing information in the other AWS Regions.

-- Jeff;

Amazon Web ServicesCloudFront Update - Trends, Metrics, Charts, More Timely Logs

The Amazon CloudFront team has added a slew of analytics and reporting features this year. I would like to recap a pair of recent releases and then introduce you to the features that we are releasing today. As you probably know, CloudFront is a content delivery web service that integrates with the other parts of AWS for easy and efficient low-latency delivery of content to end users.

CloudFront Usage Charts
We launched a set of CloudFront Usage Charts back in March. The charts let you track trends in data transfer and requests (both HTTP and HTTPS) for each of your active CloudFront web distributions. Data is shown with daily or hourly granularity. These charts are available to you at no extra charge. You don't have to make any changes to your distribution in order to collect the data or to view the charts. Here is a month's worth of data for one of my distributions:

You can easily choose the distribution of interest, the desired time period and the reporting granularity:

You can also narrow down the reports by billing region:

Operational Metrics
Earlier this month CloudFront began to publish a set of Operational Metrics to Amazon CloudWatch. These metrics are published every minute and reflect activity that's just a few minutes old, giving you information that is almost real-time in nature. As is the case with any CloudWatch metric, you can display and alarm on any of the items. The following metrics are available for each of your distributions:

  • Requests - Number of requests for all HTTP methods and for both HTTP and HTTPS requests.
  • BytesDownloaded - Number of bytes downloaded by viewers for GET, HEAD, and OPTIONS requests.
  • BytesUploaded - Number of bytes uploaded to the origin with CloudFront using POST and PUT requests.
  • TotalErrorRate - Percentage of all requests for which the HTTP status code is 4xx or 5xx.
  • 4xxErrorRate - Percentage of all requests for which the HTTP status code is 4xx.
  • 5xxErrorRate - Percentage of all requests for which the HTTP status code is 5xx.

The first three metrics are absolute values and make the most sense when you view the Sum statistic. For example, here is the hourly request rate for my distribution:

The other three metrics are percentages and the Average statistic is appropriate. Here is the error rate for my distribution (I had no idea that it was so high and need to spend some time investigating):

Once I track this down (a task that will have to wait until after AWS re:Invent), I will set an Alarm as follows:

The metrics are always delivered to the US East (Northern Virginia) Region; you'll want to make sure that it is selected in the Console's drop-down menu. Metrics are not emitted for distributions that have no traffic, so a metric may not appear in CloudWatch until the distribution receives requests.
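
If you would like to pull these numbers programmatically, here's a rough sketch using the AWS SDK for Python (boto3). It fetches the hourly Sum of the Requests metric for a single distribution; the distribution ID is a placeholder, and note the us-east-1 region and the Region=Global dimension that CloudFront uses when publishing metrics:

import boto3
from datetime import datetime, timedelta

# CloudFront metrics are published to CloudWatch in US East (N. Virginia).
cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/CloudFront',
    MetricName='Requests',
    Dimensions=[
        {'Name': 'DistributionId', 'Value': 'E1234567890ABC'},  # placeholder
        {'Name': 'Region', 'Value': 'Global'},
    ],
    StartTime=datetime.utcnow() - timedelta(days=1),
    EndTime=datetime.utcnow(),
    Period=3600,           # hourly buckets
    Statistics=['Sum']     # use 'Average' for the error-rate metrics
)
for point in sorted(stats['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], int(point['Sum']))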

New - More Timely Logs
Today we are improving the timeliness of the CloudFront logs. There are two aspects to this change. First, we are increasing the frequency with which CloudFront delivers log files to your Amazon Simple Storage Service (S3) bucket. Second, we are reducing the delay between data collection and data delivery. With these changes, the newest log files in your bucket will reflect events that have happened as recently as an hour ago.

We have also improved the batching model as part of this release. As a result, many applications will see fewer files now than they did in the past, despite the increased delivery frequency.
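
If your log-processing code polls the bucket, the more frequent deliveries simply mean that new objects show up under the distribution's log prefix more often. Here's a rough sketch (bucket name and prefix are placeholders) that lists the ten most recently delivered log files using the AWS SDK for Python (boto3):

import boto3

s3 = boto3.client('s3')

# Placeholder bucket and log prefix configured on the distribution.
BUCKET = 'my-cloudfront-logs'
PREFIX = 'cdn-logs/E1234567890ABC.'

# List the most recently delivered (gzip-compressed) log files.
objects = s3.list_objects(Bucket=BUCKET, Prefix=PREFIX).get('Contents', [])
for obj in sorted(objects, key=lambda o: o['LastModified'])[-10:]:
    print(obj['LastModified'], obj['Key'])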

New - Cache Statistics & Popular Objects Report
We are also launching a set of new Cache Statistics reports today. These reports are based on the entries in your log files and are available on a per-distribution and all-distribution basis, with day-level granularity for any time frame within a 60-day period and hour-level granularity for any 14-day interval within the same 60-day period. These reports allow filtering by viewer location. You can, for example, filter by continent in order to gain a better understanding of traffic characteristics that are dependent on the geographic location of your viewers.

The following reports are available:

  • Total Requests - This report shows the total number of requests for all HTTP status codes and all methods.
  • Percentage of Viewer Requests by Result Type - This report shows cache hits, misses, and errors as percentages of total viewer requests.
  • Bytes Transferred to Viewers - This report shows the total number of bytes that CloudFront served to viewers in response to all requests for all HTTP methods. It also shows the number of bytes served to viewers for objects that were not in the edge cache (CloudFront node) at the time of the request. This is a good approximation for the number of bytes transferred from the origin.
  • HTTP Status Codes - This report shows the number of viewer requests by HTTP status code (2xx, 3xx, 4xx, and 5xx).
  • Unfinished GET Requests - This report shows the GET requests that didn't finish downloading the requested object, as a percentage of total requests.

Here are the reports:

The new Popular Objects report shows request count, cache hit and cache miss counts, as well as error rates for the 50 most popular objects during the specified period. This helps you understand which content is most popular among your viewers, or identify any issues (such as high error rates) with your most requested objects. Here's a sample report from one of my distributions:

Available Now
The new reports and the more timely logs are available now. Data is collected in all public AWS Regions.

-- Jeff;

If you want to learn even more about these cool new features, please join us at 10:00 AM (PT) on November 20th for our Introduction to CloudFront Reporting Features webinar.

ProgrammableWebToday in APIs: Breathometer Connects Drunks to Uber with API

Breathometer, the maker of a bluetooth connected breathalyzer, uses Uber's API to connect people who need a ride. Newsly, an overnight hack, provides tailored news using machine learning APIs. Plus: hackathons head to rural colleges, and Infected Flight is the timely winner at Disrupt Europe hackathon.

ProgrammableWeb6 Essential BaaS Features Every Mobile App Needs

Whether you’re building a new mobile app or updating an existing one, adding BaaS features will drive an increase in user engagement and retention, not to mention provide competitive edge over other apps.

Amazon Web ServicesCloudWatch Update - Enhanced Support for Windows Log Files

Earlier this year, we launched a log storage and monitoring feature for Amazon CloudWatch. As a quick recap, this feature allows you to upload log files from your Amazon Elastic Compute Cloud (EC2) instances to CloudWatch, where they are stored durably and easily monitored for specific symbols or messages.

The EC2Config service runs on Microsoft Windows instances on EC2 and takes on a number of important tasks. For example, it is responsible for uploading log files to CloudWatch. Today we are enhancing this service with support for Windows Performance Counter data and ETW (Event Tracing for Windows) logs. We are also adding support for custom log files.

In order to use this feature, you must enable CloudWatch logs integration and then tell it which files to upload. You can do this from the instance by running EC2Config and checking Enable CloudWatch Logs integration:

The file %PROGRAMFILES%\Amazon\Ec2ConfigService\Settings\AWS.EC2.Windows.CloudWatch.json specifies the files to be uploaded.
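
Once the instance begins shipping data, one quick way to confirm that events are arriving is to read them back from CloudWatch Logs. Here's a rough sketch using the AWS SDK for Python (boto3); the log group and stream names are placeholders and should match whatever you configured in the JSON file:

import boto3

logs = boto3.client('logs', region_name='us-east-1')

# Placeholder names; use the log group and stream configured for your instance.
GROUP = 'Windows/SystemEventLog'
STREAM = 'i-0123456789abcdef0'

events = logs.get_log_events(
    logGroupName=GROUP,
    logStreamName=STREAM,
    limit=10
)
for event in events['events']:
    print(event['timestamp'], event['message'])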

To learn more about how this feature works and how to configure it, head on over to the AWS Application Management Blog and read about Using CloudWatch Logs with Amazon EC2 Running Microsoft Windows Server.

-- Jeff;

Amazon Web ServicesSpeak to Amazon Kinesis in Python

My colleague Rahul Patil sent me a nice guest post. In the post Rahul shows you how to use the new Kinesis Client Library (KCL) for Python developers.

-- Jeff;


The Amazon Kinesis team is excited to release the Kinesis Client Library (KCL) for Python developers! Developers can use the KCL to build distributed applications that process streaming data reliably at scale. The KCL takes care of many of the complex tasks associated with distributed computing, such as load-balancing across multiple instances, responding to instance failures, checkpointing processed records, and reacting to changes in stream volume.

You can download the KCL for Python from GitHub or PyPI.

Getting Started
Once you are familiar with key concepts of Kinesis and KCL, you are ready to write your first application. Your code has the following duties:

  1. Set up application configuration parameters.
  2. Implement a record processor.

The application configuration parameters are specified by adding a properties file. For example:

# The python executable script 
executableName = sample_kclpy_app.py

# The name of an Amazon Kinesis stream to process.
streamName = words

# Unique KCL application name
applicationName = PythonKCLSample

# Read from the beginning of the stream
initialPositionInStream = TRIM_HORIZON

The above example configures KCL to process a Kinesis stream called "words" using the record processor supplied in sample_kclpy_app.py. The unique application name is used to coordinate amongst workers running on multiple instances.

Developers have to implement the following three methods in their record processor:

initialize(self, shard_id)
process_records(self, records, checkpointer)
shutdown(self, checkpointer, reason)

initialize() and shutdown() are self-explanatory; they are called once in the lifecycle of the record processor to initialize and clean up the record processor respectively. If the shutdown reason is TERMINATE (because the shard has ended due to split/merge operations), then you must also take care to checkpoint all of the processed records.
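
To see how those methods fit together, here's a rough skeleton modeled on the sample application that ships with the library (sample_kclpy_app.py); the class name and wiring below follow the sample's structure:

#!/usr/bin/env python
from amazon_kclpy import kcl

class RecordProcessor(kcl.RecordProcessorBase):

    def initialize(self, shard_id):
        # Called once, before any records are delivered for this shard.
        self.shard_id = shard_id

    def process_records(self, records, checkpointer):
        # Processing logic goes here (see the process_records snippet below).
        pass

    def shutdown(self, checkpointer, reason):
        if reason == 'TERMINATE':
            # The shard has ended (split/merge); checkpoint what was processed.
            checkpointer.checkpoint()

if __name__ == '__main__':
    kcl.KCLProcess(RecordProcessor()).run()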

You implement the record processing logic inside the process_records() method. The code should loop through the batch of records and checkpoint at the end of the call; once you checkpoint, the KCL assumes that all of the records delivered so far have been processed. In the event the worker fails, the checkpointing information is used by the KCL to restart the processing of the shard at the last checkpointed record.

    # Process records and checkpoint at the end of the batch
    def process_records(self, records, checkpointer):
        for record in records:
            # Record data is base64 encoded
            data = base64.b64decode(record.get('data'))

            # Insert your processing logic here

        # Checkpoint after you are done processing the batch
        checkpointer.checkpoint()

The KCL connects to the stream, enumerates shards, and instantiates a record processor for each shard. It pulls data records from the stream and pushes them into the corresponding record processor. The record processor is also responsible for checkpointing processed records.

Since each record processor is associated with a unique shard, multiple record processors can run in parallel. To take advantage of multiple CPUs on the machine, each Python record processor runs in a separate process. If you run the same KCL application on multiple machines, the record processors will be load-balanced across these machines. This way, KCL enables you to seamlessly change machine types or alter the size of the fleet.

Running the Sample
The release also comes with a sample word counting application. Navigate to the amazon_kclpy directory and install the package.

$ python setup.py download_jars
$ python setup.py install

A sample putter is provided to create a Kinesis stream called "words" and put random words into that stream. To start the sample putter, run:

$ sample_kinesis_wordputter.py --stream words -p 1 -w cat -w dog -w bird

You can now run the sample Python application that processes records from the stream we just created. The helper script below prints the complete command to launch it:

$ amazon_kclpy_helper.py --print_command --java <path-to-java> --properties samples/sample.properties

Before running the samples, you'll want to make sure that your environment is configured to allow the samples to use your AWS credentials via the default AWS Credentials Provider Chain.

Under the Hood - What You Should Know
KCL for Python uses the KCL for Java. We have implemented a Java-based daemon, called the MultiLangDaemon, that does all of the heavy lifting. Our approach has the daemon spawn a sub-process, which in turn runs the record processor, which can be written in any language. The MultiLangDaemon process and the record processor sub-process communicate with each other over STDIN and STDOUT using a defined protocol. There is a one-to-one correspondence among record processors, child processes, and shards. For Python developers specifically, we have abstracted these implementation details away and expose an interface that lets you focus on writing record processing logic in Python. This approach enables the KCL to be language agnostic, while providing identical features and a similar parallel processing model across all languages.
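Concretely, your Python script just hands a record processor instance to the KCL's process wrapper, which speaks the STDIN/STDOUT protocol with the MultiLangDaemon on your behalf. A minimal sketch, modeled on the bundled sample application:

from amazon_kclpy import kcl

if __name__ == "__main__":
    # RecordProcessor is your implementation of initialize(),
    # process_records(), and shutdown(), as shown above.
    kcl_process = kcl.KCLProcess(RecordProcessor())
    kcl_process.run()

The MultiLangDaemon, configured via the properties file shown earlier, launches one such sub-process per shard and exchanges protocol messages with it.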

Join the Kinesis Team
The Amazon Kinesis team is looking for talented Web Developers and Software Development Engineers to push the boundaries of stream data processing! Here are some of our open positions:

-- Rahul Patil

ProgrammableWebKPIs for APIs: Developer Experience Can Make or Break Your API

This is the second post of a three-part series covering key performance indicators for APIs, based on John Musser's presentation at the Business of APIs Conference.

ProgrammableWebMendix Moves to Simplify Mobile App Development

Mendix today moved to simplify the development of mobile applications with enhancements to its cloud platform that enable developers to tie components together to create mobile applications that can be instantly deployed on any number of mobile computing devices.

Amazon Web ServicesNext Generation Genomics With AWS

My colleague Matt Wood wrote a great guest post to announce new support for one of our genomics partners.

-- Jeff;


I am happy to announce that AWS will be supporting the work of our partner, Seven Bridges Genomics, which has been selected as one of the National Cancer Institute (NCI) Cancer Genomics Cloud Pilots. The cloud has become the new normal for genomics workloads, and AWS has been actively involved since the earliest days, from being the first cloud vendor to host the 1000 Genomes Project to newer projects like designing synthetic microbes and developing novel genomics algorithms that work at population scale. The NCI Cancer Genomics Cloud Pilots are focused on how the cloud has the potential to be a game changer in terms of scientific discovery and innovation in the diagnosis and treatment of cancer.

The NCI Cancer Genomics Cloud Pilots will help address a problem in cancer genomics that is all too familiar to the wider genomics community: data portability. Today's typical research workflow involves downloading large data sets (such as the previously mentioned 1000 Genomes Project or The Cancer Genome Atlas (TCGA)) to on-premises hardware and running the analysis locally. Genomic datasets are growing at an exponential rate and becoming more complex as phenotype-genotype discoveries are made, making the current workflow slow and cumbersome for researchers. This data is difficult to maintain locally and share between organizations. As a result, genomic research and collaborations have become limited by the available IT infrastructure at any given institution.

The NCI Cancer Genomics Cloud Pilots will take the natural next step to solve this problem, by bringing the computation to where the data is, rather than the other way around. The goal of the NCI Cancer Genomics Cloud Pilots is to create cloud-hosted repositories for cancer genome data that reside alongside the tools, algorithms, and data analysis pipelines needed to make use of the data. These Pilots will provide ways to provision computational resources within the cloud so that researchers can analyze the data in place. By colocating data in the cloud with the necessary interface, algorithms, and self-provisioned resources, these Pilots will remove barriers to entry, allowing researchers to more easily participate in cancer research and accelerating the pace of discovery. This means more life-saving discoveries, such as better ways to diagnose stomach cancer or the identification of novel mutations in lung cancer that allow for new drug targets.

The Pilots will also allow cancer researchers to provision compute clusters that change as their research needs change. They will have the necessary infrastructure to support their research when they need it, rather than make a guess at the resources that they will need in the future every time grant writing season starts. They will also be able to ask many more novel questions of the data, now that they are no longer constrained by a static set of computational resources.

Finally, the NCI Cancer Genomics Pilots will help researchers collaborate. When data sets are publicly shared, it becomes simple to exchange and share all the tools necessary to reproduce and expand upon another lab's work. Other researchers will then be able to leverage that software within the community, or perhaps even in an unrelated field of study, resulting in even more ideas being generated.

Since 2009, Seven Bridges Genomics has developed a platform that allows biomedical researchers to leverage AWS's cloud infrastructure and focus on their science rather than on managing computational resources for storage and execution. Additionally, Seven Bridges has developed security measures to ensure compliance with the Health Insurance Portability and Accountability Act (HIPAA) for all data stored in the cloud. For the NCI Cancer Genomics Cloud Pilots, the team will adapt the platform to meet the specific needs of the cancer research community as they develop over the course of the Pilots. If you are interested in following the work being done by Seven Bridges Genomics or giving feedback as their work on the NCI Cancer Genomics Cloud Pilots progresses, you can do so here.

We look forward to the journey ahead with Seven Bridges Genomics. You can learn more about AWS and Genomics here.

-- Matt Wood, General Manager, Data Science

Jeremy Keith (Adactio)Indie web building blocks

I was back in Nürnberg last week for the second border:none. Joschi tried an interesting format for this year’s event. The first day was a small conference-like gathering with an interesting mix of speakers, but the second day was much more collaborative, with people working together in “creator units”—part workshop, part round-table discussion.

I teamed up with Aaron to lead the session on all things indie web. It turned out to be a lot of fun. Throughout the day, we introduced the little building blocks, one by one. By the end of the day, it was amazing to see how much progress people made by taking this layered approach of small pieces, loosely stacked.

rel="me"

The first step is: do you have a domain name?

Okay, next step: are you linking from that domain to other profiles of you on the web? Twitter, Instagram, Github, Dribbble, whatever. If so, here’s the first bit of hands-on work: add rel="me" to those links.

<a rel="me" href="https://twitter.com/adactio">Twitter</a>
<a rel="me" href="https://github.com/adactio">Github</a>
<a rel="me" href="https://www.flickr.com/people/adactio">Flickr</a>

If you don’t have any profiles on other sites, you can still mark up your telephone number or email address with rel="me". You might want to do this in a link element in the head of your HTML.

<link rel="me" href="mailto:jeremy@adactio.com" />
<link rel="me" href="sms:+447792069292" />

IndieAuth

As soon as you’ve done that, you can make use of IndieAuth. This is a technique that demonstrates a recurring theme in indie web building blocks: take advantage of the strengths of existing third-party sites. In this case, IndieAuth piggybacks on top of the fact that many third-party sites have some kind of authentication mechanism, usually through OAuth. The fact that you’re “claiming” a profile on a third-party site using rel="me"—and the third-party profile in turn links back to your site—means that we can use all the smart work that went into their authentication flow.

You can see IndieAuth in action by logging into the Indie Web Camp wiki. It’s pretty nifty.

If you’ve used rel="me" to link to a profile on something like Twitter, Github, or Flickr, you can authenticate with their OAuth flow. If you’ve used rel="me" for your email address or phone number, you can authenticate by email or SMS.

h-entry

Next question: are you publishing stuff on your site? If so, mark it up using h-entry. This involves adding a few classes to your existing markup.

<article class="h-entry">
  <div class="e-content">
    <p>Having fun with @aaronpk, helping @border_none attendees mark up their sites with rel="me" links, h-entry classes, and webmention endpoints.</p>
  </div>
  <time class="dt-published" datetime="2014-10-18 08:42:37">8:42am</time>
</article>

Now, the reason for doing this isn’t for some theoretical benefit from search engines, or browsers, but simply to make the content you’re publishing machine-parsable (which will come in handy in the next steps).

Aaron published a note on his website, inviting everyone to leave a comment. The trick is though, to leave a comment on Aaron’s site, you need to publish it on your own site.

Webmention

Here’s my response to Aaron’s post. As well as being published on my own site, it also shows up on Aaron’s. That’s because I sent a webmention to Aaron.

Webmention is basically a reimplementation of pingback, but without any of the XML silliness; it’s just a POST request with two values—the URL of the origin post, and the URL of the response.
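To show just how light that is, here's a rough sketch of sending one by hand (Python for brevity; the URLs are placeholders, and the source/target parameter names are the ones the webmention spec defines):

import requests

# source: the URL of my response; target: the URL of the post I'm replying to.
requests.post("https://example.com/webmention-endpoint", data={
    "source": "https://example.com/my-reply",
    "target": "https://example.org/original-post",
})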

My site doesn’t automatically send webmentions to any links I reference in my posts—I should really fix that—but that’s okay; Aaron—like me—has a form under each of his posts where you can paste in the URL of your response.

This is where those h-entry classes come in. If your post is marked up with h-entry, then it can be parsed to figure out which bit of your post is the body, which bit is the author, and so on. If your response isn’t marked up as h-entry, Aaron just displays a link back to your post. But if it is marked up in h-entry, Aaron can show the whole post on his site.
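If you're curious what a parser sees in that markup, there are microformats2 parsers for most languages. A quick sketch using the Python mf2py library (the URL is a placeholder, and the exact property layout depends on your markup):

import mf2py

parsed = mf2py.parse(url="https://example.com/my-reply")   # placeholder URL
for item in parsed["items"]:
    if "h-entry" in item["type"]:
        properties = item["properties"]
        # e-content and dt-published show up here without their prefixes
        print(properties.get("content"))
        print(properties.get("published"))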

Okay. By this point, we’ve already come really far, and all people had to do was edit their HTML to add some rel attributes and class values.

For true site-to-site communication, you’ll need to have a webmention endpoint. That’s a bit trickier to add to your own site; it requires some programming. Here’s my minimum viable webmention that I wrote in PHP. But there are plenty of existing implementations you can use, like this webmention plug-in for WordPress.

Or you could request an account on webmention.io, which is basically webmention-as-a-service. Handy!

Once you have a webmention endpoint, you can point to it from the head of your HTML using a link element:

<link rel="mention" href="https://adactio.com/webmention" />

Now you can receive responses to your posts.

Here’s the really cool bit: if you sign up for Bridgy, you can start receiving responses from third-party sites like Twitter, Facebook, etc. Bridgy just needs to know who you are on those networks, looks at your website, and figures everything out from there. And it automatically turns the responses from those networks into h-entry. It feels like magic!

Here are responses from Twitter to my posts, as captured by Bridgy.

POSSE

That was mostly what Aaron and I covered in our one-day introduction to the indie web. I think that’s pretty good going.

The next step would be implementing the idea of POSSE: Publish on your Own Site, Syndicate Elsewhere.

You could do this using something as simple as If This, Then That, e.g. every time something crops up in your RSS feed, post it to Twitter, or Facebook, or both. If you don’t have an RSS feed, don’t worry: because you’re already marking your HTML up in h-entry, it can be converted to RSS easily.

I’m doing my own POSSEing to Twitter, which I’ve written about already. Since then, I’ve also started publishing photos here, which I sometimes POSSE to Twitter, and always POSSE to Flickr. Here’s my code for posting to Flickr.

I’d really like to POSSE my photos to Instagram, but that’s impossible. Instagram is a data roach-motel. The API provides no method for posting photos. The only way to post a picture to Instagram is with the Instagram app.

My only option is to do the opposite of POSSEing, which is PESOS: Publish Elsewhere, and Syndicate to your Own Site. To do that, I need to have an endpoint on my own site that can receive posts.

Micropub

Working side by side with Aaron at border:none inspired me to finally implement one more indie web building block I needed: micropub.

Having a micropub endpoint here on my own site means that I can publish from third-party sites …or even from native apps. The reason why I didn’t have one already was that I thought it would be really complicated to implement. But it turns out that, once again, the trick is to let other services do all the hard work.

First of all, I need to have something to manage authentication. Well, I already have that with IndieAuth. I got that for free just by adding rel="me" to my links to other profiles. So now I can declare indieauth.com as my authorization endpoint in the head of my HTML:

<link rel="authorization_endpoint" href="https://indieauth.com/auth" />

Now I need some way of creating and issuing authentication tokens. See what I mean about it sounding like hard work? Creating a token endpoint seems complicated.

But once again, someone else has done the hard work so I don’t have to. Tokens-as-a-service:

<link rel="token_endpoint" href="https://tokens.indieauth.com/token" />

The last piece of the puzzle is to point to my own micropub endpoint:

<link rel="micropub" href="https://adactio.com/micropub" />

That URL is where I will receive posts from third-party sites and apps (sent through a POST request with an access token in the header). It’s up to me to verify that the post is authenticated properly with a valid access token. Here’s the PHP code I’m using.

It wasn’t nearly as complicated as I thought it would be. By the time a post and a token hits the micropub endpoint, most of the hard work has already been done (authenticating, issuing a token, etc.). But there are still a few steps that I have to do:

  1. Make a GET request (I’m using cURL) back to the token endpoint I specified—sending the access token I’ve been sent in a header—verifying the token.
  2. Check that the “me” value that I get back corresponds to my identity, which is https://adactio.com
  3. Take the h-entry values that have been sent as POST variables and create a new post on my site.
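In rough Python (my real endpoint is PHP, but the shape is the same), the first two steps boil down to something like this, assuming the token endpoint replies with form-encoded values that include "me":

import requests
from urllib.parse import parse_qs

def token_is_valid(access_token):
    # Step 1: verify the token with the token endpoint declared above.
    response = requests.get("https://tokens.indieauth.com/token",
                            headers={"Authorization": "Bearer " + access_token})
    values = parse_qs(response.text)
    # Step 2: the "me" value must be my own identity.
    me = values.get("me", [""])[0]
    return me.rstrip("/") == "https://adactio.com"

If both checks pass, step 3 is just a matter of mapping the submitted h-entry values onto whatever storage my site already uses.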

I tested my micropub endpoint using Quill, a nice little posting interface that Aaron built. It comes with great documentation, including a guide to creating a micropub endpoint.

It worked.

Here’s another example: Ben Roberts has a posting interface that publishes to micropub, which means I can authenticate myself and post to my site from his interface.

Finally, there’s OwnYourGram, a service that monitors your Instagram account and posts to your micropub endpoint whenever there’s a new photo.

That worked too. And I can also hook up Bridgy to my Instagram account so that any activity on my Instagram photos also gets sent to my webmention endpoint.

Indie Web Camp

Each one of these building blocks you implement unlocks more and more powerful tools.

But it’s worth remembering that these are just implementation details. What really matters is that you’re publishing your stuff on your website. If you want to use different formats and protocols to do that, that’s absolutely fine. The whole point is that this is the independent web—you can do whatever you please on your own website.

Still, if you decide to start using these tools and technologies, you’ll get the benefit of all the other people who are working on this stuff. If you have the chance to attend an Indie Web Camp, you should definitely take it: I’m always amazed by how much is accomplished in one weekend.

Some people have started referring to the indie web movement. I understand where they’re coming from; it certainly looks like a “movement” from the outside, and if you attend an Indie Web Camp, there’s a great spirit of sharing. But my underlying motivations are entirely selfish. In the same way that I don’t really care about particular formats or protocols, I don’t really care about being part of any kind of “movement.” I care about my website.

As it happens, my selfish motivations align perfectly with the principles of an indie web.

ProgrammableWebCOWL Project Promises to Better Secure JavaScript Applications

Modern Web applications by definition are an amalgamation of JavaScript code typically mashed together to create something greater than the sum of its parts. The challenge is that every developer has to trust that the sensitive data won’t inadvertently leak out.

ProgrammableWebInfinigon Launches API for Access to ECHO Platform

Infinigon Group, a real-time social analytics solution provider, has launched an API for access to its ECHO platform. ECHO constitutes a real-time analytics tool that captures market data from Tweets and provides actionable data to traders. The platform processes millions of Tweets each day, and the new API provides programmatic access to the data.

ProgrammableWeb5 Ways To Increase Developer Onboarding

Having developers adopt an API can be a difficult task, but one that can be overcome by framing the API as a product, with developers being the number one customer. If an API provider is noticing a lag in general interest, it could be due to any one of 5 major problems.

ProgrammableWebFIWARE Open API Platform Makes 80 Million EUR Available to Startups

Open API platform FIWARE is inviting applications from businesses, small enterprises, and startups to participate in a range of accelerator programs aimed at creating a new wave of innovative tech for agriculture, smart cities, e-health, manufacturing and the Internet of Things. FIWARE is a European Commission-funded initiative under the Future Internet program.

ProgrammableWebPresentation at KeenCon on the "APIs of Things"

Hugo Fiennes, a designer of the Nest and co-founder and CEO of Electric Imp, speaks at KeenCon regarding the APIs of Things. Coming from both a hardware and a software background, he gives an informative half-hour overview of his experience designing and implementing IoT devices.

ProgrammableWebThe IoT Enters a 20 Year Prototyping Phase

"Things" plugging into the IoT realm are changing rapidly, not only in number but in scale and complexity. The IoT is part of an evolving consumer space determined largely by taste, style, and culture. One needs only to look at the varieties of multi-colored leather bands offered with Apple's Watch to see that IoT commodities are intermingled with fashion trends. This early into the game, one might go as far to call all IoT devices prototpyes.

ProgrammableWebLinguaSys launching GlobalNLP API for natural language processing in the cloud

Human language technology company LinguaSys is this week launching an API offering that allows developers to use its GlobalNLP natural language processing software in the cloud.

Paul Downey (British Telecom)One CSV, thirty stories: 7. Prices redux

This is day 7 of One CSV, 30 stories, a series of articles exploring price paid data from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from GitHub.

Continuing on from yesterday’s foray into prices, today sees more of the same with more or less the same gnuplot script.

The prices file from Day 2 contains almost 150,000 different prices:

$ wc -l price.tsv
141464
Count Price (£)
208199 250000
185912 125000
163323 120000
159519 60000
147645 110000
145214 150000
140833 115000
134731 135000
131334 175000
131223 85000
129597 130000
129336 105000
126161 165000
126004 95000
124379 145000
123968 75000
123893 140000
123451 160000
123340 90000
120306 100000
119776 80000

which, when plotted by rank using the gnuplot pseudo-column zero:

plot "/dev/stdin" using 0:1 with boxes lc rgb "black"

shows how the prices are distributed in quite a steep power-curve, a long-tail if you will:

Price rank

A quick awk script to collate prices, modulo 10:

cut -f1 < data/pp.tsv | awk '{ print $1 % 10 }' | sort | uniq -c | sort -rn

gives us the distribution of the last digit in the prices:

Count Price (£1)
18437019 0
715633 5
56195 9
21890 2
17549 6
17395 3
16889 1
16235 7
14888 8
11878 4

Last digit of the price

and can be tweaked to show the last two digits:

Count Price (£10)
16282411 0
2087949 50
636253 95
45710 99
22419 75
20194 25
11271 45
11121 60
9890 20
9425 80
9235 40
7677 90
6855 70
6532 10
6519 55
5924 30

Last two digits of the price

and the last three digits in the prices:

Count Price (£100)
3682320 0
3332503 5000
980975 8000
897786 2000
835579 7000
765799 3000
732587 9950
713121 6000
707063 4000
687129 9000
596687 7500
567882 2500
503076 1000
298398 8500
294878 4950
267618 9995

Last three digits of the price

A logarithmic scale can help see patterns in the lower values whilst showing the peaks on the same page; it’s a bit like squinting at the chart from a low angle:

Last 3 digits of the price on a log scale

I think tomorrow will be pretty average.

David MegginsonA different kind of data standard

This year, the UN gave me the chance to bring people together to work on data standards for humanitarian crises (like the Ebola or Syria crisis). We put together a working group from multiple humanitarian agencies and NGOs and got to work in February.  The result is the alpha version of the Humanitarian Exchange Language (HXL, pronounced /HEX-el/), a very different kind of data standard.

<section id="whats-wrong">

What’s wrong with data standards these days?

Unlike most data standards, HXL is cooperative rather than competitive. A competitive standard typically considers the way you currently work to be a problem, and starts by presenting you with a list of demands:

  • Switch to a different data format (and acquire and learn new software tools).
  • Change the information you share (and the way your organisation collects and uses that information).
  • Abandon what is valuable and unique about your organisation’s data (and conform to the common denominator).

For HXL, we reversed the process and started by asking humanitarian responders how they’re actually working right now, then worked out how we could build a cooperative standard to enhance what they already do.


Not JSON or XML

Given the conditions under which humanitarian responders work in the field (iffy connectivity, time pressure, lots to do besides putting together data reports), we realised that an XML-, JSON-, or RDF-based standard wasn’t going to work.

The one data tool people already know is the infamous spreadsheet program, so HXL would have to work with spreadsheets. But it also had to be able to accommodate local naming conventions (e.g. we couldn’t force everyone to use “ADM1” as a header for what is a province in Guinea, a departamento in Colombia, or a governorate in Syria). So in the end, we decided to add a row of hashtags below the human-readable headers to signal the common meaning of the columns and the start of the actual data. It looks a bit like this:

Location name    Location code    People affected
#loc             #loc_id          #aff_num
Town A           01000001         2000
Town B           01000002         750
Town C           01000003         1920
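Because the hashtag row both labels the columns and marks where the data starts, a consuming program can skip everything above it. Here's a minimal sketch of reading a CSV file laid out like the example above (the filename is made up; the tags are the ones shown):

import csv

with open("example-hxl.csv", newline="") as f:   # hypothetical file
    reader = csv.reader(f)
    for row in reader:
        if any(cell.startswith("#") for cell in row):
            tags = row               # ['#loc', '#loc_id', '#aff_num']
            break                    # rows above this are human-readable headers
    for row in reader:               # everything below the hashtag row is data
        record = dict(zip(tags, row))
        print(record["#loc"], record["#aff_num"])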

The tagging conventions are slightly more sophisticated than that, with special support for repeated fields, multiple languages, and compact disaggregated data (e.g. time-series data).


HXL in action

While HXL is still in the alpha stage, the Standby Task Force is already using it as part of the international Ebola response, and we’re running informal interoperability trials with the International Aid Transparency Initiative, with more formal trials planned with UNHCR and IOM.

We also have an interactive HXL showcase demo site, and a collection of public-domain HXL libraries available on GitHub. More news soon.


Credits

Thanks to the Humanitarian Data Exchange project (managed by Sarah Telford) at the UN’s Office for the Coordination of Humanitarian Affairs for giving me the opportunity and support to do this kind of work, to the Humanitarian Innovation Fund for backing it financially, to the HXL Working Group for coming together to figure this stuff out, and especially to CJ Hendrix and Carsten Keßler for their excellent work on an earlier incarnation of HXL and for their ongoing support.


ProgrammableWebToday in APIs: Facebook Doubles Bug Bounty for Advertising

Facebook ponies up even more for developers finding bugs. Apple squeezes out more performance with Metal. Plus: Stitch engineers win $1 million Salesforce hackathon, and tips on creating APIs from the creator of NetBeans.

ProgrammableWebVodafone India Searches for Best API Use via appStar 2014

Vodafone India, India's leading telecommunications service provider, has launched appStar 2014. appStar 2014 offers brands, e-tailers, app developers, and more the opportunity to show off their API skills through app development that utilizes Vodafone's network APIs.

Amazon Web ServicesAWS Week in Review - October 13, 2014

Let's take a quick look at what happened in AWS-land last week:

Monday, October 13
Tuesday, October 14
Wednesday, October 15
Thursday, October 16
Friday, October 17

Here are some of the events that we have on tap for the next week or two:

Stay tuned for next week! In the meantime, follow me on Twitter and subscribe to the RSS feed.

-- Jeff;

Amazon Web ServicesFast, Easy, Free Data Sync from RDS MySQL to Amazon Redshift

As you know, I'm a big fan of Amazon RDS. I love the fact that it allows you to focus on your applications and not on keeping your database up and running. I'm also excited by the disruptive price, performance, and ease of use of Amazon Redshift, our petabyte-scale, fully managed data warehouse service that lets you get started for $0.25 per hour and costs less than $1,000 per TB per year. Many customers agree, as you can see from recent posts by Pinterest, Monetate, and Upworthy.

Many AWS customers want to get their operational and transactional data from RDS into Redshift in order to run analytics. Until recently, it's been a somewhat complicated process. A few weeks ago, the RDS team simplified the process by enabling row-based binary logging, which in turn has allowed our AWS Partner Network (APN) partners to build products that continuously replicate data from RDS MySQL to Redshift.

Two APN data integration partners, FlyData and Attunity, currently leverage row-based binary logging to continuously replicate data from RDS MySQL to Redshift. Both offer free trials of their software in conjunction with Redshift's two month free trial. After a few simple configuration steps, these products will automatically copy schemas and data from RDS MySQL to Redshift and keep them in sync. This will allow you to run high performance reports and analytics on up-to-date data in Redshift without having to design a complex data loading process or put unnecessary load on your RDS database instances.

If you're using RDS MySQL 5.6, you can replicate directly from your database instance by enabling row-based logging, as shown below. If you're using RDS MySQL 5.5, you'll need to set up a MySQL 5.6 read replica and configure the replication tools to use the replica to sync your data to Redshift. To learn more about these two solutions, see FlyData's Free Trial Guide for RDS MySQL to Redshift as well as Attunity's Free Trial and the RDS MySQL to Redshift Guide. Attunity's trial is available through the AWS Marketplace, where you can find and immediately start using software with Redshift with just a few clicks.

Informatica and SnapLogic also enable data integration between RDS and Redshift, using a SQL-based mechanism that queries your database to identify data to transfer to your Amazon Redshift clusters. Informatica is offering a 60-day free trial and SnapLogic has a 30 day free trial.

All four data integration solutions discussed above can be used with all RDS database engines (MySQL, SQL Server, PostgreSQL, and Oracle). You can also use AWS Data Pipeline (which added some recent Redshift enhancements), to move data between your RDS database instances and Redshift clusters. If you have analytics workloads, now is a great time to take advantage of these tools and begin continuously loading and analyzing data in Redshift.

Enabling Amazon RDS MySQL 5.6 Row Based Logging
Here's how you enable row based logging for MySQL 5.6:

  1. Go to the Amazon RDS Console and click Parameter Groups in the left pane:
  2. Click on the Create DB Parameter Group button and create a new parameter group in the mysql5.6 family:
  3. Once in the detail view, click the Edit Parameters button. Then set the binlog_format parameter to ROW:
For more details please see Working with MySQL Database Log Files.
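If you would rather script these steps than click through the console, the same thing can be done with the AWS SDK. Here's a rough sketch using the AWS SDK for Python (boto3); the parameter group name is made up, and the parameter names mirror the console fields shown above:

import boto3

rds = boto3.client("rds")

# Steps 1 and 2: create a parameter group in the mysql5.6 family.
rds.create_db_parameter_group(
    DBParameterGroupName="mysql56-binlog-row",          # made-up name
    DBParameterGroupFamily="mysql5.6",
    Description="Row-based binary logging for replication to Redshift")

# Step 3: set binlog_format to ROW (a dynamic parameter, so it can apply immediately).
rds.modify_db_parameter_group(
    DBParameterGroupName="mysql56-binlog-row",
    Parameters=[{"ParameterName": "binlog_format",
                 "ParameterValue": "ROW",
                 "ApplyMethod": "immediate"}])

You will still need to associate the new parameter group with your DB instance for it to take effect.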

Free Trials for Continuous RDS to Redshift Replication from APN Partners
FlyData has published a step-by-step guide and a video demo that show you how to continuously and automatically sync your RDS MySQL 5.6 data to Redshift, and you can get started for free for 30 days. You will need to create a new parameter group with binlog_format set to ROW and binlog_checksum set to NONE, and adjust a few other parameters as described in the guide above.

AWS customers are already using FlyData for continuous replication to Redshift from RDS. For example, rideshare startup Sidecar seamlessly syncs tens of millions of records per day to Redshift from two RDS instances in order to analyze how customers utilize Sidecar's custom ride services. According to Sidecar, their analytics run 3x faster and the near-real-time access to data helps them to provide a great experience for riders and drivers. Here's the data flow when using FlyData:

Attunity CloudBeam has published a configuration guide that describes how you can enable continuous, incremental change data capture from RDS MySQL 5.6 to Redshift (you can get started for free for 5 days directly from the AWS Marketplace). You will need to create a new parameter group with binlog_format set to ROW and binlog_checksum set to NONE.

For additional information on configuring Attunity for use with Redshift please see this quick start guide.

Redshift Free Trial
If you are new to Amazon Redshift, you’re eligible for a free trial and can get 750 free hours for each of two months to try a dw2.large node (16 GB of RAM, 2 virtual cores, and 160 GB of compressed SSD storage). This gives you enough hours to continuously run a single node for two months. You can also build clusters with multiple dw2.large nodes to test larger data sets; this will consume your free hours more quickly. Each month's 750 free hours are shared across all running dw2.large nodes in all regions.

To start using Redshift for free, simply go to the Redshift Console, launch a cluster, and select dw2.large for the Node Type:

Big Data Webinar
If you want to learn more, do not miss the AWS Big Data Webinar showcasing how startup Couchsurfing used Attunity’s continuous CDC to reduce their ETL process from 3 months to 3 hours and cut costs by nearly $40K.

-- Jeff;

ProgrammableWeb: APIsDiffbot Discussion

Diffbot provides developers tools that can identify, analyze, and extract the main content and sections from any web page. The Diffbot Discussion API extracts discussions and posting information from web pages. It can return information about all identified objects on a submitted page and the Discussion API returns all post data in a single object. The Diffbot Discussion API is currently in Beta.
Date Updated: 2014-10-20

ProgrammableWeb: APIsDiffbot Image

Diffbot provides developers tools that can identify, analyze, and extract the main content and sections from any web page. The purpose of Diffbot’s Image API is to extract the main images from web pages. The Image API can analyze a web page and return full details on the extracted images.
Date Updated: 2014-10-20

ProgrammableWeb: APIsDiffbot Analyze

Diffbot provides developers tools that can identify, analyze, and extract the main content and sections from any web page. The Diffbot Analyze API can analyze a web page visually, take a URL, and identify what type of page it is. Diffbot’s Analyze API can then decide which Diffbot extraction API (article, discussion, image, or product) may be appropriate, and the results of that automatic extraction are returned in the Analyze API call.
Date Updated: 2014-10-20

Paul Downey (British Telecom)One CSV, thirty stories: 6. Prices

This is day 6 of One CSV, 30 stories, a series of articles exploring price paid data from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from GitHub.

I was confident today was going to be “Talk like a statistician day” but my laptop was tied up for most of it whilst Yosemite installed itself, meaning I didn’t have time to play with R after all. Instead let’s continue to dig into how property is priced.

We saw in yesterday’s scatter plots how prices clump around integer values, and then skip around where stamp duty kicks in, £60k in this section:

Zooming in on the prices scatterplot

I didn’t have much time, so grabbed gnuplot again to make another scatter plot, this time using the prices file we made on Day 2:

 #!/usr/bin/env gnuplot
set terminal png font "helvetica,14" size 1600,1200 transparent truecolor
set output "/dev/stdout"
set key off
set xlabel "Price paid (£)"
set xrange [0:1500000]
set format x "%.0s%c"
set ylabel "Number of transactions"
set yrange [0:150000]
set format y "%.0s%c"
set style circle radius 4500
plot "/dev/stdin" using 2:1 \
    with circles lc rgb "black" \
    fs transparent \
    solid 0.5 noborder
$ price.gpi < price.tsv > price.png

Transactions by price

Maybe the same plot with boxes will be clearer:

 plot "/dev/stdin" using 2:1 with boxes lc rgb "black"

Frequency of prices

So even more confirmation that people prefer whole numbers and multiples of 10 when pricing houses, and market them either just below a stamp duty band or some way beyond it. The interference lines at the lower prices look interesting. More on that tomorrow.

Paul Downey (British Telecom)One CSV, thirty stories: 5. Axes

This is day 5 of One CSV, 30 stories, a series of articles exploring price paid data from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from GitHub.

I’m falling behind on the schedule to write a post each day thanks to falling into a time sink hand-coding PostScript to generate axes. As fun as that was, it wasn’t helping us towards the goal of better understanding the data. I had literally lost the plot. Returning to the brief, the scatter plots from yesterday need axes so we can understand when the dips occurred and at what prices the horizontal bands sit.

So time to break out gnuplot, a great package for generating charts from scripts. I found gnuplotting.org extremely helpful when it came to remembering how to drive this venerable beast, and trying to fathom new features for transparency:

#!/usr/bin/env gnuplot
set terminal png \
    font "helvetica,14" \
    size 1600,1200 \
    transparent truecolor
set output "/dev/stdout"
set key off
set xlabel "Date"
set xdata time
set timefmt "%Y-%m-%d"
set xrange ["1994-10-01":"2015-01-01"]
set format x "%Y"
set ylabel "Price paid (£)"
set yrange [0:300000]
set format y "%.0s%c"
set style circle radius 100
plot "/dev/stdin" using 1:2 \
    with circles lc rgb "black" fs transparent solid 0.01 noborder

Ignoring the outliers, and digging into the lower popular prices:

Scatter plot of lower house prices

The axes help us confirm the dip of the recession in 2009, and reveal seasonal peaks in summer and strong vertical gaps each new year. Horizontal bands show how property prices bunch between round numbers. Prices below £50k start to disappear from 2004, and skip around stamp duty bands, particularly noticeably at £250k and at £60k, a band which was withdrawn in 2005, when the gap closes; a new gap opens up at £125k, a band introduced in 2006. Finally, there’s a prominent gap that correlates with the £175k band which ran between 2008 and 2010.

The seasonal trends are worth exploring further, but I think we first need to dig deeper into the horizontal banding, so I’m 82.3% confident tomorrow will be “Talk like a statistician day”.

Daniel Glazman (Disruptive Innovations)France and Ebola

I absolutely do not understand what is happening in French airports with this idiotic measure of screening passengers arriving from Guinea-Conakry, and only them. It is so easy to make a stopover, and it is so easy for a customs officer to know, when the passport is presented, exactly where the passenger is arriving from. Every passenger, on any flight whatsoever, coming from an at-risk country should be screened more thoroughly than is done today. The current measure is a band-aid on a wooden leg; it covers only a small part of the precautions that ought to be required. When you know a little about Ebola, and the absence of any symptoms during the relatively long incubation period, we should be doing better and more than this. I find it lamentable and dangerous.

ProgrammableWebToday in APIs: PowerClerk Interconnect Reduces Solar Install Cost, Integrates Via API

Clean Power Research launches PowerClerk Interconnect to dramatically reduce software costs associated with solar power installs. Google has announced the launch of a new Tag Manager API. Plus: Tech Crunch and Makers Academy search for rockstar developers at London hackathon. 

ProgrammableWebGoogle Announces Improvements to Google Tag Manager Including New API

Google has just announced improvements to the Google Tag Manager tool, which include a new, intuitive interface, more third-party templates, and the new Google Tag Manager API. Google Tag Manager is a free tool designed primarily for marketers that makes it easy to add and update website and mobile app tags, including conversion tracking, site analytics, and remarketing tags.

ProgrammableWebGoogle Launches Android 5.0 Lollipop SDK

Google today delivered an updated set of tools to Android app writers. It published the latest preview images of Android 5.0 Lollipop for the Nexus 5 smartphone and Nexus 7 tablets, as well as released the Android 5.0 SDK. With these tools, developers have all they need to get their apps up to speed.

Google made early versions of both Android 5.0 and the SDK available in June. Now, with Android 5.0's commercial release just several weeks away, it's time to get your apps in shape.

Paul Downey (British Telecom)One CSV, thirty stories: 4. Scattering

This is day 4 of One CSV, 30 stories, a series of articles exploring price paid data from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from GitHub.

I had some feedback after yesterday, mostly from people enjoying my low-tech approach, which was nice. Today I wanted to look at the price paid for property: all 19 million prices on a single page, in the hope of seeing any apparent trends or anomalies.

To do this we only need the date and the price columns, and we might as well sort them by date as I’m pretty sure that’ll be useful later:

awk -F'⋯' '{print $2 "⋯" $1}' < data/pp.tsv | sort > prices.tsv

Now to scatter the prices with time on the x-axis, and the price paid on the y-axis. We’ll use yet another awk script to do this:

cat prices.tsv | {
cat <<!
%!
%%Orientation: Landscape
%%Page: 1 1
0 0 0 setrgbcolor
/p {
    1 0 360 arc fill
} def
!
awk -F'	' -v max=15000000 '
    function epoch(s) {
        gsub(/[:-]/, " ", s);
        s = s " 00 00 00"
        return mktime(s);
    }
    NR == 1 {
        first = epoch($1);
        last = systime() - first;
    }
    {
        this = epoch($1) - first;
        x = 600 * this / last;
        y = 600 * $2 / max;
        printf "%d %d p\n", x, y;
    }'
echo showpage
}

which generates a rather large PostScript document:

%!
%%Orientation: Landscape
%%Page: 1 1
0 0 0 setrgbcolor
/p {
    1 0 360 arc fill
} def
0 4 p
0 0 p
   ... [19 million lines removed] ...
595 3 p
595 13 p
showpage

Back in the day the quickest way to see the output would be to attach a laser printer to the parallel port on the back of a server and cat prices.ps > /dev/lp but these days we have a raft of ways of executing PostScript. Most anything that can render a PDF can usually also run the older PostScript language — it’s a little bit weird how we bat executable programs back and forth when we’re exchanging text and images. Just to emphasise the capacity for mischief, the generated 1.5 Gig PostScript reliably crashes the Apple OS X preview application, so it’s best to use something more solid, such as the open source ImageMagick in this case to make a raster image:

scatterps.sh < data/prices.tsv | convert -density 300 - out.png

This image is intriguing, but we should be able to differentiate the density of points if we make them slightly transparent. PostScript is notoriously poor at rendering opacity, but luckily ImageMagick has its own drawing language (MVG) which makes PNG files directly, and it’s fairly straightforward to tweak the awk to generate MVG.

We can see from this a general, apparently slow trend in the bulk of house prices, with seasonal variation and a marked dip at what looks like 2009. There’s also a strange vertical gap in higher-priced properties towards the right which, along with the horizontal bands more apparent on the first plot, could be down to bunching around the stamp duty bands.

So there are a few stories to delve into. I completely mismanaged my time writing this post, so will leave adding axes to the graphs until tomorrow.

ProgrammableWebHow To Develop an Android Wear Application

Android Wear from Google is a platform that connects your Android phone to your wrist. Since its release earlier this year, Android Wear has garnered a lot of attention, both from a consumer angle and from developers, who want to make sure they understand the platform and have their applications ready to take advantage of a new way for users to interact with contextual information.

This article will give a brief introduction to Android Wear and then look at the platform from the developer's perspective.

ProgrammableWebKPIs for APIs: API Calls Are the New Web Hits

This is the first post of a three-part series covering key performance indicators (KPIs) for APIs, based on John Musser's presentation at the Business of APIs Conference. You can read the second part here.

ProgrammableWeb: APIsbx.in.th

bx.in.th is a Thailand-based Bitcoin and cryptocurrency exchange platform operated by Bitcoin Exchange Thailand (Bitcoin Co. Ltd.). Their API accessibility is divided into Public and Private. The bx.in.th Public API allows anyone to view market data from the exchange, including rates, orderbook, currency pairing for comparison, high and low trades, average Bitcoin pricing, and more. The Private API requires an API key for use. HTTP POST requests can be made to place orders and manage existing orders. Private account data may be returned, such as balances, order history, transaction history, and withdrawal requests. All requests made to the API will return JSON encoded data as a response.
Date Updated: 2014-10-17

ProgrammableWeb: APIsVIDAL Group

VIDAL Group is a French healthcare informatics group specializing in databasing and distributing healthcare data, pharmaceutical information, treatment specifications, and scientific publications for patients and healthcare practitioners in Europe and worldwide. VIDAL Group also supports a medical software application under the same name. VIDAL's database may be accessed by third-party developers to construct healthcare-related applications and websites. After acquiring an app ID and API key from VIDAL, users can query the VIDAL server to return data on drug scores, allergies, product information, ingredients, related documents, and more.
Date Updated: 2014-10-17

ProgrammableWeb: APIsCrowdfunder

Crowdfunder is a UK-based platform where people can crowdsource funding for unique projects. Crowdfunder projects typically involve social endeavors related to community, charity, environment, art, music, publishing, film, and theatre. The API is currently in an open beta; HTTP GET calls to the Crowdfunder API return JSON lists of all current campaigns filtered by project name and category. Using the API, developers get programmatic access to specific details on individual projects, including all project fields: biography, description, URL, current funding amount, last pledge amount, project video, image, category, and additional details. As the API is in beta, Crowdfunder is accepting any feedback users may have while implementing the API.
Date Updated: 2014-10-17

ProgrammableWeb: APIsCompany Check

The Company Check API provides direct access to a wealth of information on companies and directors. The API lets developers incorporate company, director, financial, credit, and many other data fields into software and business apps. By applying for an API key, developers can choose between different levels of account plans.
Date Updated: 2014-10-17

ProgrammableWeb: APIsGlobalNLP

Via RESTful connectivity, GlobalNLP handles a wide variety of natural language processing tasks. Currently, the API supports many NLP processes, including stemming, morphological synthesis, word sense disambiguation, entity extraction, and automatic translation. A full list of supported processes is given in the documentation, along with code samples in JavaScript, C#, PHP, Python, Ruby, and more. A free account guarantees 20 API calls a minute and 500 calls a month, with higher volumes available with a paid account.
Date Updated: 2014-10-17
