May 17, 2012

So Long EditLive! and Thanks For All The Fish (with rounded corners and drop shadows)

Since the very early days of my blogging, I’ve integrated a copy of EditLive! to make the editing pleasant and more powerful. For many, many years there was simply no way I could bring myself to use anything else. Lately though, Apple have been making Java applets less and less appealing while browsers have been continuously improving their content editable support and JavaScript editors have gotten better at working around the remaining quirks and smoothing off the rough edges on the editing experience. The final straw for me though was that the current early access release of EditLive! 8.0, which appears to include a lot of fixes for the latest OS X, breaks backwards compatibility with a number of APIs I use to integrate it into this blog.

So the time has finally come to switch to TinyMCE. I still have a custom plugin to fix some of the incredibly poor configuration choices made by default in WordPress (P tags are important, remember!) but it’s significantly less custom code and complexity. There are also a bunch of new features that have been built into WordPress which I’d never enabled in my EditLive! integration which are nice to have. The biggest win however is that I can now write blog posts on my iPad, so hopefully I can share a few more links and quick thoughts without having to drag out my laptop.

It is sad to retire EditLive! though. I worked to build it almost from scratch for nearly 10 years and for most if not all of that time it was the absolute cream of the crop for in-browser editing, with features and usability that were simply unmatched. I have some slight consolation that I’ve also been able to spend some time contributing to TinyMCE and I know the guys behind it are top notch so I’m happy to pass the baton of favourite editor over. I will miss the integrated image editor and its rounded corners and drop shadows…

by Adrian Sutton at May 17, 2012 10:20 AM

Why Open Source Your Secrets

Here's a video of my Open Conference session on the business benefits of open sourcing your software.  Given that the conference was at a weekend and had a very intimate feel, I think I was a teeny bit more honest than I usually am.  Enjoy.


by Trisha (noreply@blogger.com) at May 17, 2012 09:39 AM

May 16, 2012

Growing a Team By Investing in Tools

It’s widely recognised that adding people to a team often doesn’t result in increases in velocity and can even reduce the velocity of the team due to the extra communication overhead. Instead, teams look to increasing levels of automation and tools to improve their productivity without adding extra people.

Where many teams struggle however is in finding the right balance of building out the product versus “sharpening the saw” by improving their tools and automating processes. With the same people attempting to do both jobs that contention is unavoidable.

The solution then is to build a second team whose job is entirely focussed on building, maintaining, configuring and improving the tools that the product team uses. Since the majority of tools used by a project are fairly orthogonal to the project itself – and often shared with other product teams – the tool team can be much more independent, reducing the communication overhead. Not only do the product team now have more time to devote to improving the product, they are continuously made more productive because of the work done by the tools team.

The most common argument against having a separate team devoted to tools is that the people building the actual product are in the best position to know what tools will make them work better and exactly how those tools should work. While this is certainly true, agile development methods are explicitly designed to help one team efficiently deliver solutions to other people’s problems. The customer for the tools team is the product team – and they already have a common understanding of software development and speak roughly the same language.

This certainly isn’t a new idea – Fred Brooks suggests it in The Mythical Man Month – and it’s used in many large companies but many small teams have a hard time switching over to this level of thinking, unable to really believe that adding people to the product team may make it go slower and that the same number of people could do significantly more if they had better tools. The time to create a dedicated tools team is significantly earlier in a team’s growth than most people initially think.

by Adrian Sutton at May 16, 2012 12:27 AM

May 13, 2012

Disruptor 2.10 Release

The Disruptor version 2.10 has been released. It is available from the Google Code download page and has been submitted to Maven central.

Changes

  • Remove deprecated timeout methods.
  • Added OSGI metadata to jar file.
  • Removed PaddedAtomicLong and use Sequence in all places.
  • Fix various generics warnings.
  • Change Sequence implementation to work around IBM JDK bug and improve performance by ~10%.
  • Add a remainingCapacity() call to the Sequencer class.

by Michael Barker (noreply@blogger.com) at May 13, 2012 04:59 PM

May 12, 2012

Update on events

Just a quick note to say I was interviewed for another podcast, again to talk about all-female events.  It's only a short one and there's probably not much in there that I haven't said before, either on here or in person.

From the 21st May, I'm at GOTO, both Copenhagen and Amsterdam.  I'll be talking about code & the Disruptor, thank goodness, and will be trying not to rant about the subject of women in technology.  If you see me there, come and say hello!

On Friday 25th May, after all the GOTO craziness, I'm going to repeat the Disruptor presentation in Rotterdam at 010DEV, an event rather fantastically called "The Disruptor and the Perfect Programmer", which someone on Twitter correctly noted sounds like a fairy tale.

After all that, I'm hopefully going to take June off to play Diablo 3 and Prototype 2, and read the next Game of Thrones book.  All these joys I have been denying myself to make sure I get everything sorted in time for next week.

by Trisha (noreply@blogger.com) at May 12, 2012 01:14 PM

May 03, 2012

Featured on a BBC Podcast

This week's BBC Outriders podcast features yours truly venting about The Subject That Won't Go Away, Women in Technology.  I was interviewed at Sunday's Girl Geek conference, and got a chance to voice my opinions once again.  For those who can't be bothered to listen, they can probably be summarised as:

  • There are genuine problems that face people in our industry, let's talk about those that you have actually faced, not ones that you imagine exist.
  • In my opinion, now is a great time for women to make a name for themselves - conference organisers are crying out for you to attend and (if you want) speak, and our industry needs talented people of any type and isn't that fussy about who you are.
  • Please, please can we start talking about the good stuff that we see as women in IT?  We shouldn't only talk about the issues we face.  Yes, we need to highlight problems and address them, but I believe that this message is drowning out all the great things about what we do, and why we love our jobs.  We should be encouraging people (not just women) to join us, not putting them off.

by Trisha (noreply@blogger.com) at May 03, 2012 09:37 AM

May 02, 2012

Go Faster By Not Working

We all know by now that continuous integration is part of good software development – check in regularly and have a suite of automated tests run to confirm that everything is working as expected. If a test fails, jump on it quickly and get the build back to green. Simple right?

But what happens when something goes wrong in a way that can’t be fixed quickly? For example, the build server has a hardware fault or runs out of disk space. You can’t just roll back the faulty change; it’s going to take time to get the build back to green. As your CI system grows, it may take time just to understand what went wrong. If your team is small they may all be taken up fixing the problem, but if the team is larger a pair focusses on fixing the build as quickly as possible and the other developers carry on working. Now you have two problems.

You still have the build problem, but now you also have a process problem because you’re no longer doing continuous integration. When things are working well in continuous integration, you have a continuous stream of commits proceeding through the build pipeline. If a bug is introduced the build quickly picks it up and you can identify the problem change easily because it can only be one of a few commits.

Continuous integration working well - a stream of commits passing through the pipeline.

On the other hand, if developers keep working while the build is broken, they build up a large backlog of commits which makes it more difficult to identify which revision broke the build. It also makes it significantly harder to resolve the build problem because the code keeps changing and you can easily wind up with multiple build breakages starting to overlap and interact.

Broken continuous integration - a huge pile of commits building up.

To avoid this problem, many companies put up an embargo on commits or close the source tree to prevent any further changes from being committed. This controls change in the build environment and makes it easier to resolve the problem, but it doesn’t prevent the build-up of changes. The result is that when the embargo is lifted, there is a huge swarm of incoming changes all at once, introducing merging problems and making it difficult to identify the culprit if any of them introduce another problem. There could well be multiple problems introduced by that batch of changes with their effects overlapping and interacting making it even harder. Essentially, the longer an embargo is up the greater the chance that it will need to be put back up because of problems in the batch of changes developed during the embargo.

So what’s the answer? Simple – stop working. The team as a whole will go faster if developers simply stop writing code once they reach the point where they would normally commit but can’t because there’s an embargo. For short embargos, most developers won’t be affected at all, but as the embargo lasts longer more and more developers will have to stop work. This feels really bad, but it ensures we keep doing continuous integration and overall benefits the team’s productivity. For build problems that are hard to understand, it also means that gradually more and more developers are available to spitball ideas about what’s wrong and to pick up lines of investigation to help get the build working again.

Also, not coding doesn’t mean that developers can’t do anything at all. Maybe now is a good time to do those higher level design sessions and ensure everyone is pushing in the same direction, or to read up on technology that is either in use but not fully understood, or that could be of benefit if it was introduced. If there are spikes to be played, they can usually still be picked up and worked on. You could write a blog post (like, say, this one). Or even just take an early lunch.

The bottom line is that build breakages are always hugely expensive – pretending that everything is normal and you can continue work when the build system is broken doesn’t make them any less expensive, it just makes you look busier while creating the next problem.

by Adrian Sutton at May 02, 2012 12:37 AM

April 29, 2012

In which I defend the Male species at an all Female event

Google Campus is an awesome space
Today I was at the Girl Geek Meetup conference.  I didn't advertise it much because I've said in the past I don't really agree with women-only events, and actually I felt quite uncomfortable telling you guys I was going to be there, knowing the majority of my readers weren't allowed to attend.

It's probably worth explaining why I went, so a) I can give you guys an excuse but b) conference organisers can see what people like me are looking for in a conference.

Graduate Developer Community Meet a Mentor Programme
The primary reason I went is because the new Meet a Mentor programme I'm involved in does not have a lot of women mentors.  This is simply a numbers game - when you don't have all that many people signed up to be mentors yet, and you have the "normal" proportion of women in that group, you'll be lucky if you get one female mentor turning up at these events.  Since one of the things we want to showcase to undergraduates is diversity, it's pretty important we get all sorts of people involved, not just mining from the usual suspects in the London Java Community.  This was a good conference to target since a) the attendees come from different technology and industry backgrounds and b) yes, sorry, we want more women doing it.

Don't knock it till you've tried it
I've said before I don't think all women events are the way forward.  But I haven't been to one for a long time, and I firmly believe you shouldn't say bad things about something unless you've tried it out.  So I wanted to go to see what the advantages and disadvantages of an event like this were.  If I had a terrible time, at the very least I would have material for the blog.  I also feel like I have a bit of a responsibility to spy on these things, since my male colleagues cannot.

Networking
Similar to the Meet a Mentor goal above, I wanted to meet some different people.  I'm getting comfortable in my particular circles, and I'm starting to meet some of the same people at various events.  This event was based in the very start-up friendly Shoreditch, and didn't target a specific technology or business, so it gave me the opportunity to meet new people.

They gave me a platform to rant
I only really felt comfortable agreeing to go when I was invited to participate in a panel about whether all-women events were useful, or if they created a Girl Ghetto.  Knowing I could publicly voice my opinions on the subject at one of these events made me much happier to attend.



And actually... I had a very good time!  It was refreshing to meet new people, especially because they're mostly in different technology and business spaces.  I was really inspired to see the number of entrepreneurs, and people working for startups.  And I loved the range of ages, the diversity of people's backgrounds (ethnically, educationally, geographically...), and the fact that everyone bothered to come out on a very rainy Sunday to learn stuff and meet people.

It did feel different to "normal" conferences.  However.  I personally do not believe this was because it was an all-women event.

The LJC Open Conference last year had a very different feel to JAX London and JavaOne - it was more intimate, easier to wing-it in presentations, friendlier.  You could chat to pretty much anyone over the coffee. You could smile at random people.  These are all the same things I felt about GGM today.

I can't help but think this atmosphere of supportive learning and collaboration is at least partly a function of things like:

  • Size - smaller conferences are less intimidating, and you get to meet with people more than once, giving you the impression you're with friends rather than spectators.
  • Venue - being seated (or even standing) around a table or in a coffee-break area is more intimate.  It means the speaker can make better eye contact, feels more engaged with the audience, and there's less of a barrier for active participation.  If the presenter is less than 3 metres from you, it's much easier to ask a question.  This leads to a chattier, more tailored session - more of a dialogue than a speech.  As a speaker I really prefer these sessions, and I think as an attendee I get more out of this too.
  • Common Goal - when you give up a day in your weekend to do "work" stuff, you have to have a clear idea of why you're doing it and what you're getting out of it.  I would say (although I could be wrong) that our common goal was to network, with a side-effect of learning "stuff".  Events at the weekend are really only something you do for yourself, since you're not being paid for it.  That gives you common ground.
I wanted to get a feel for if these awesome women were only going to events like these, and weren't going to "normal" conferences.  I think it was a mix.  I don't think (and I could be wrong) that people were coming to this because there were no guys.  I think they were coming because they were invited.  

Someone mentioned that a conference organiser once told them they have to ask a man to speak at their conference at most two times.  They have to ask a woman up to seven times, to overcome their concerns and reassure them that they really do want them.  I wonder if women also need to be asked several times simply to attend conferences?  This would make a lot of sense - about 15-20% of the technical workforce are women, but around 5% of conference attendees are women.  It's possible that by the time women have been invited to, or told about, a conference enough times to make them want to go, it's totally sold out.  Do we need to very explicitly invite women to our events?  Show them we really want them?  Multiple times?  And this is just for attendees, not even speakers.

I'll come back to some of this.

The schedule
I presented on four occasions - yes, I was so greedy for fame that I actually had almost no time to talk to people or to see anyone else's sessions.

Why Open Source Your Secrets
A different spin on the Disruptor, when I answer the number one (well, number two actually) question: why did we open source it?

The slides are pretty simple and I'm not sure there's a lot of value in putting them online without the talking, but if anyone's interested I can either shrink it to a lightning talk at one of the LJC events, or attempt to expand it into a fully-fledged presentation.

GDC: Meet a Mentor
What it is, what's involved, what's in it for you.  This is all material that needs to live somewhere on the interwebs so I promise to link to it as soon as it's available.  If anyone is interested in a really lightweight process for mentoring university undergraduates, drop me a line and I'll try and tell you all about it.

Technology in Electronic Trading (co-presented with Annalisa Sarasini)
This is the session I was most nervous about, since with Annalisa in Nepal and me in Seville, we didn't actually get a chance to talk about what we were doing, let alone rehearse it.  However I think it went really well - at the end of it, the folks who saw it said they had a better understanding of the use of technology in the banking/finance world, and I think it's a great way to show how computer science can truly be applied.  We'd love to run this as a full talk somewhere.

Panel: Women-only events - are we creating a Girl Ghetto?
Oh my goodness, all these women!
In which I get to say that I think it's grossly unfair we do not allow men to this event.  In which I state that this behaviour is sexist and (in my mind) unhelpful.  But in which I also am told that women are still either a) experiencing alienation/sexist behaviour at conferences (thankfully it seemed like a limited amount) or b) feel like there are issues around boys club, macho posturing males, and thought if they presented at a conference they might get questions designed specifically to make them feel stupid.

I think this is a very important point - even if none of this is happening, even if these women have never experienced this themselves, they think it might be happening.  Also, please note, these issues do not affect just women.  These perceived problems are stopping less confident men from attending or speaking at these events.

What can be done about these things?  My answer is, it's up to us.  The women, the less confident, the non-posturing non-alpha males, to make the changes we want to see.  And I can't claim all the glory for that answer.  When, during my very short stay at ThoughtWorks, I complained to Martin Fowler that GOTO 2011 was full of white, male speakers, he said to me, why don't you speak then?  Of course I said "who, me??".  I didn't think I had anything to talk about.  After attending a few conferences I realised that doesn't stop people.  And I realised people have to start somewhere.  And you have to start by doing some poor presentations to practice - you can't instantly be awesome.  It takes time, and practice, and Barack Obama didn't come out of the womb knowing how to make an impact during a speech.  And guess what?  This year I'm speaking at GOTO Copenhagen and Amsterdam.

So my answer is, you want something?  Ask for it.  You want more women presenters?  Volunteer.  That panel is full of men?  Find out who organised it and what you need to do to be on it next year.  It's down to us.  Not to the conference organisers.  Not to some great beneficent them. The responsibility is ours.  But it's better than that - the control is ours.

Only you know what it is you want.  Only you can speak for you.  Not for all women; not for all white/black/purple people; not for all straight/gay/ambivalent folks; not for all geeks.  The question isn't "What do women want?" but "What do you want?".

I think I saw some lightbulbs go off in the audience.  I hope so.  These women were all awesome, they have a lot to share with the world.  We all do, even the tiniest experience, the smallest thing we've learnt is something someone else might not know yet.  Let's get out there and share it.

by Trisha (noreply@blogger.com) at April 29, 2012 08:23 PM

April 20, 2012

Overheard: Agile truths

After attending a number of conferences and events, and performing numerous interviews, I'm starting to hear the same things again and again.  Since Dan North challenged all my assumptions at QCon, I'm reluctant to outright ridicule them, but I will put forward my personal opinion.

Note: these are things I have heard from multiple sources, so with any luck I am not breaking the sanctity of the confessional interview.

I've never pair programmed, but I've frequently worked with a partner on critical production problems
I find this fascinating.  If there's one thing that needs to be fixed as fast, as correctly, as efficiently as possible, it's a production issue.  And when there is one, "everyone" knows that two heads are better than one, even The Business.

If this is the case, why is it so hard to sell pair programming as the default state of affairs?

Is it because creating new features is seen as just typing, where the bottleneck is access to the physical keyboard?  Is it because fixing defects when the pressure isn't on is suddenly easier for one person on their own without help?

This state of affairs is interesting to me as it implies that when proverbial hits the fan, the instinctive thing to do is to work collaboratively.  Why don't we do it more often?

We use Test Driven Development to get coverage
Seems weird to me to write your tests first to get coverage.  If unit test coverage is your most important metric (and other people have covered why this might not be the case), I'm not sure why you would write your tests first.  Seems to me that you'd get better coverage writing the tests after the code.  That way you can be sure you've covered every eventuality.

To me, the statement implies two assumptions which I would challenge:
a) The primary value of writing your tests first is to meet your coverage requirements
b) Coverage is a meaningful metric

TDD/BDD has a number of benefits (...and now I'm reluctant to list them here in case people repeat them back to me in an interview).  Good coverage will probably be a side effect of being forced to write your tests first, but I'm not convinced that's the best thing that will come out of using TDD.

I only test first when I know what I want to code
I've overheard people saying that they test first when they know what the code is going to look like.  So you dive straight into the code when you don't know what you're doing???

Of course there is a place for this - spikes, prototyping, getting a feel for a new library, so on and so forth.  But I feel that for most code that you write in your day job, you probably have a business requirement and possibly (probably?) a less firm idea of how you're going to code it.  To me, this translates into writing the test first (which documents what you want to deliver, which you already know) and then getting that to pass (which is writing the code, which is the bit you might not know).

If you know exactly what the code is going to look like a) I would question that statement and b) what's the point of the test?


What are the real answers?
At QCon I saw on Twitter a number of complaints because the presentations there gave opinions, guidelines, and, worst of all, a lot of "it depends".  But people seemed to want The Answers.

In my opinion, what developers get paid for is working out the "it depends" parameters and selecting an approach, technical or process-wise, that works for their situation.

So although I have strong opinions on all the above subjects, and although LMAX has specific approaches to both pair programming and automated testing, sadly I'm not going to go into lots of details about those.

Mostly because I'm still interviewing candidates, and I don't want to give away the correct answers....

by Trisha (noreply@blogger.com) at April 20, 2012 10:56 AM

April 16, 2012

World’s fastest break-in attempt

Actually, I’m pretty sure it isn’t, but still…

From spinning up a new EC2 instance today to getting the first e-mail from fail2ban took a little under 6 hours.

I don’t know if this makes me happy or sad. Happy because I have a Puppet-based bootstrap system which can bring a freshly minted box up to code in around 5 minutes (including iptables, fail2ban and a locked down SSH configuration), or sad because… well… have people really got nothing better to do?

In related news, when will fail2ban support IPv6? There seem to be lots of threads in lots of different issue tracking systems (most recently GitHub), many of which include patches, but no actual IPv6 action. Now that makes me sad. :-(

by Danny at April 16, 2012 07:14 PM

WordPress, nginx, W3TC and robots.txt

A quick note to try and save somebody else the hours of pain I just experienced…

Here’s the scenario: you’re being dead clever and ditching Apache in favour of Nginx to run your WordPress blog/site and pretty much have everything right. You’re NOT using a plugin to generate robots.txt for you – after all, WordPress does a good enough job through the Settings > Privacy page. You browse to http://domain.com/robots.txt and everything looks pretty sweet. Heck, you might even go and change the privacy settings and grab robots.txt again to make sure it’s all working the way you expect.

Then… you drop the W3 Total Cache bomb. Now, W3TC is pretty well regarded, but it hasn’t had any love for several months. In fact, it hasn’t even been updated to say it’s compatible with WordPress 3.3.0+ (which it appears to be, AFAICT, although some people have had issues with Minify). What it does have though, is Nginx support out of the box.

What does that mean? Well, if W3TC detects that it is running on Nginx, it will write out a snippet of Nginx configuration which deals with all the cleverness needed to get Nginx to serve W3TC page cache files statically off the disk without having to go through PHP. (This, my friends, is a large part of the secret sauce that makes an Nginx/PHP stack so much faster than Apache/PHP.) Theoretically, all you have to do is use the include directive to pull this snippet into your virtual host configuration file, and you’re good to go. (If you do this then don’t forget to nginx -s reload every time you tweak your W3TC settings.)

And then it hits you. robots.txt has stopped working.

Here’s my solution (in my virtual host file, if you care):

    location = /robots.txt {
        # Force robots.txt through PHP. This supersedes a match in the
        # generated W3TC rules which forced a static file lookup
        rewrite ^ /index.php;
    }

This is a pretty specific location (using = and not having a regexp), so it trumps anything in the W3TC generated config. Any request for robots.txt is rewritten to index.php which your regular Nginx rules should then hand off to PHP-FPM, which means WordPress will dynamically generate the content for you.
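For context, here is a minimal sketch of how the whole virtual host might fit together. It is an illustration under stated assumptions, not the actual configuration from this site: the document root, the path to the W3TC-generated snippet and the PHP-FPM address are all hypothetical placeholders. It relies on two documented Nginx behaviours: an exact-match (=) location is chosen ahead of any prefix or regexp location, and a rewrite inside a location that changes the URI makes Nginx search for a matching location again.

    server {
        listen 80;
        server_name domain.com;            # placeholder, as in the URL above
        root /var/www/wordpress;           # assumed document root
        index index.php;

        # Exact match: chosen ahead of any prefix or regexp location,
        # including those pulled in from the W3TC snippet below.
        location = /robots.txt {
            # The URI changes, so Nginx searches the locations again
            # and lands in the PHP location at the bottom.
            rewrite ^ /index.php;
        }

        # Assumed path to the snippet W3TC writes out.
        include /var/www/wordpress/nginx.conf;

        location / {
            try_files $uri $uri/ /index.php?$args;
        }

        # Hand PHP requests to PHP-FPM (listen address is an assumption).
        location ~ \.php$ {
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            fastcgi_pass 127.0.0.1:9000;
        }
    }

The stock fastcgi_params file passes the original request URI to PHP as REQUEST_URI, so WordPress should still see a request for /robots.txt even though index.php is the script that runs. After editing, nginx -t will check the syntax before you nginx -s reload.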

Wow. That took me, literally, 2-3 hours to figure out. Mostly because I didn’t notice it had stopped working when I added W3TC into the mix. Once I’d figured out W3TC (or rather the W3TC generated config) was the culprit, the actual fix was pretty quick.

I’ll be writing more about my Nginx config and the relative performance against Apache2 on an Amazon EC2 Micro instance soon. In the mean time, I hope I saved you some time!

by Danny at April 16, 2012 03:29 PM

April 05, 2012

April Update from LMAX Technology

Here at LMAX we've been working dead hard to deliver more features and more ways to access the platform, and we've neglected the poor blog.

There are a few things of note that are worth mentioning from the LMAX technology camp:
  • Mike Barker is featured talking about the Disruptor on the Distributed Podcast.
  • Mike, Trisha Gee, and Andy Stewart all gave presentations at QCon London.  Links to the presentations will be posted as soon as they are available.
And there are more events that we're going to be at too:

by Trisha (noreply@blogger.com) at April 05, 2012 11:21 AM

April 03, 2012

Interview on the Distributed Podcast

Last month the guys from the Distributed Podcast interviewed me about LMAX and the Disruptor. Many thanks to Jonathan and Rinat, I had a great time.

by Michael Barker (noreply@blogger.com) at April 03, 2012 08:17 PM

March 27, 2012

QCon London 2012

I'm late with my write-up of QCon, and what's worse, it will be partial - "sadly" I was in Lanzarote on a training week with the running club from the Thursday (8th) so I missed most of it.  A sacrifice I had to make for 7 days in the sunshine….

Firstly, me me me
I presented the talk I previewed at Skillsmatter the previous week, something I was calling the User's Guide to the Disruptor, but actually turned out to be how-can-Trish-fill-95-slides-with-pictures-and-finish-in-under-40-minutes.

The audience was different to the Skillsmatter event, not surprisingly.  What was surprising is that I expected people at the conference to be less aware of the Disruptor, and those who came to the Disruptor-only LJC event to have had more exposure to it.  It was a (pleasant) surprise to see how many of the standing-room-only audience had not only heard of the Disruptor but had read stuff about it (I always love it when people have read my blog), played with it and were even using it in anger.

Because of that, I think if anything the talk did not go into enough detail, or enough new stuff, to please everyone.  Tough crowd!  But it was gratifying to hear the audience correct me in some of my answers, and answer other people's questions - it's always nice to know people are listening.

Of course, I will post a link to the presentation when it's available.  For now only the slides are online.

I enjoyed QCon
QCon was noticeably different to the other conferences I've been to in the last six months.  For one, it's not a Java conference - sure, I was hosted on the Java track, but QCon is wider-ranging than just one technology platform.  I'm not sure if it's because of this, or because it was based in the notoriously impatient London, but I felt like there was a message of "Look, let's just get stuff done, OK?".  Ultimately we get paid to deliver stuff for the business, and since my favourite question is always "but what are we trying to achieve?" I like to hear ideas around how to actually deliver.  Don't get me wrong, I've loved the technology conferences - I like to hear about new stuff I had no idea about, and I really like the community vibe from them.  But it's a nice change to be shaken up into thinking exactly why we do all this.

The Data Panorama
Firstly, Rebecca Parsons and Martin Fowler from my old employer ThoughtWorks put Big Data into perspective.  Previously I hadn't really cared about it - we process lots of data at LMAX but we don't really have to dig into it, so Big Data is not top of my "oooh I really worry about that" problems.  There were quite a few interesting points I took out of this:

 - In the past, it was easy (and possibly even correct) to model the whole application based on the data you were collecting or manipulating (and probably storing in a relational database).  These days it's not just the data from your app you need to worry about (and that can get big enough), but also all the news, blogs, twitter, and Facebook stuff generated by you and about you.  In addition, your data might not even be located with your app - the cloud has made the physical question of location redundant.  All of this pushes you towards an architecture which has to separate data from the application, and forces you to ponder your design.  I heard good arguments for Domain Driven Design here, which is nice because we like that at LMAX.
 - Reporting and analytics on Big Data must be more fluid.  There's so much data about you, your users, your application, out there that you don't even know the questions to ask.  Instead you want to be able to spot patterns in data you didn't even know you had.  I thought this was dead interesting - I studied Computer Science and Artificial Intelligence at university, and we were told data mining using AI was going to be fundamental to companies who wanted to be on the bleeding edge.  Only now is it looking like people realise it's becoming that important.
 - Martin referred to Data Scientists and said he was suspicious of scientists.  I'd been reading The Black Swan on the train in that morning and couldn't help but wonder if he's read the same thing - that was talking about how you should treat someone with suspicion if they suggest you can apply science and logic to anything that is… well, actually to anything other than actual science (by which he meant physics I believe).  By saying it's a science you suggest it's predictable and follows rules.  And if it was predictable, it wouldn't be hard.  Big Data is anything other than predictable - the data could be corrupt (it's safest to assume some of it is); it's generated by people (and we all know how fickle they are); and if you're collecting lots of it practically randomly, then cause/effect/correlation are not guaranteed.  So Martin suggests the term Data Journalist.  There was a storm on Twitter which suggested a certain amount of disagreement with this term, but I like it.  But then, I like to pretend I'm a writer and not a programmer.
 - They gave some examples of using data to drive economic growth (e.g. Kenya) - what I thought was interesting about this is that it was a win-win situation - expose the data to grow technology skills, but get a lot of interesting/useful/socially responsible applications in return.
 - Something I liked was about the idea of embracing "bad" data, stuff that's not trustworthy for whatever reason.  You can assume that the good data overwhelms the bad but you can't be sure.  By looking for more fluffy patterns, vague correlations, in your data, you might expose something interesting even if the data's not "correct".  As humans, we can't expect that we won't make a mistake - it's better to assume we will and work out how to deal with it.
 - There was a call to not passively accept requirements, but to play an active role.  I like to think this ties in to my post about working with your customer.  But then I would.
 - I got a lot from this keynote, even if it was just a feeling of "I knew it!  I was right!". 

Highly available systems in Erlang
Joe Armstrong was a very interesting person to listen to, clearly someone who's been there and done that. Even without the presentation, the slides are interesting in their own right as they contain a lot of information and guidelines.
The points I took away are:
 - If part of the system fails, it's not up to that part to fix itself.  You need special help to deal with failures.  If you fell over with a heart attack, you wouldn't try and heal yourself, you'd get a medic.
 - Isolation between threads/programs will mean that those different things cannot interfere with each other (i.e. no shared state).
 - My favourite quote was "If you make things synchronous you'll bugger things up".  In theory, it's so much easier for us humans (programmers) to think synchronously.  But whenever you design your system to be asynchronous, you find that your system actually becomes simpler and not more complicated.

JVM performance optimizations at Twitter's scale
 - I had a terrible view in the fully packed room, so I was just picking up phrases. I heard Attila mention the Uncanny Valley, a phenomenon I was introduced to by my sister when she was studying her Cybernetics PhD (Google it, I found it fascinating).  
 - There was a lot of really useful information about how the Java GC works.  It seemed to back up the (rough) premise we work on - stuff that's very short-lived is fine, and stuff that lasts "forever" is fine - it's the stuff in between which  cripples your system when it keeps getting shoved around.

Decisions Decisions
Dan North was, as always, an excellent speaker.  He entertained us but got us all thinking.  Five years ago at QCon London (my very first conference ever, and the thing that motivated me to start a professional blog), I saw Dan speaking and I was inspired to think about my working practices.  He was talking about BDD at the time, which was a relatively new concept to me.  I came away from that QCon with clearer ideas of what awesome development practices should look like.  Never did I imagine that five years later, not only would I be working in an environment which follows a lot of those practices (and pioneers many more) but that I would actually be speaking at the same conference.

I've come a long way since then, and of course, Dan isn't talking about the same things either.  I've been to a lot of conferences this year, and I heard nothing really preaching to the "meta-agile" LMAX (still have a post pending about our agile practices...).  When it comes to agile, there seem to be very few people who we can learn off - don't get me wrong, there are lots of things we want to improve on, which is why we're looking for people to learn off.  But most people are still preaching TDD and we want to know "what next?".

Well, Dan took everything we thought we knew, and ripped it to pieces.  Taught us to challenge everything we think we know.

It was irritating actually because he had no answers.  But he did tell us that the answers we think we already have might not be the right ones….

Well worth watching his talk when it comes online, I really can't summarise it here.

Developers Have a Mental Disorder
I nearly missed this ending keynote.  I'm so glad I didn't, Greg Young is awesome at ranting.  He said developers have a disease - we overcomplicate things, when we try to simplify things.  We want to abstract stuff, we love to look for patterns and reuse when actually sometimes we just need to solve the problem.

In my notes I have "People come to conferences for answers, when they should be remembering to use their brains".  I assume that's a quote from him, and not a comment I thought of at the time, but I wholeheartedly agree with it - if a shiny new technology solved your problems, you'd be out of a job.  Your job is to take the problem and figure out how to solve it.  Not to drag and drop an answer into place.

Another talk that I can't do justice to, watch the video when it comes online.

…and finally….
At the end of the day, the Atlassian-sponsored community night was awesome.  I got to chat to (be prepared for gratuitous name dropping here) Martin Fowler; my old friend Simon Brown; my ubiquitous LJC colleagues Martijn and John; the Zero Turnaround guys; a couple of ex-colleagues from Evolution / Detica; a heap of LMAX and ex-LMAX guys; and, of course, bundles and bundles of new interesting people.

Summary (i.e. the short version for those who can't be bothered to read this whole post)
Maybe it's wishful thinking, but the messages I took from QCon were:
  • If you want to solve the problem your business has, you might want to model your system around their world.  Funny, that sounds suspiciously like Domain Driven Design.
  • Synchronous is bad, mmmkay? 
  • Hardcore understanding of what the computer is really doing seems to be coming back into fashion.  Hmmm, I wonder who started that…? (tongue firmly in cheek, we can't have been the only ones)
  • "You can't tune something you don't understand" - testing and monitoring is kinda important.
  • Our business is all about trade offs.  There is no perfect solution, they pay us because it's very very hard to work out something that's Good Enough.
  • Ultimately it feels like back to basics: understand the problem; model the domain, and have sympathy for the hardware that's running your solution.

My Corporate Bit
QCon overall turned out to be a bit of an LMAX fest in the end, with Mike & Martin and Andy Stewart (our Chief Lord Business Analyst), all giving presentations there as well as me.  It's nice to be on home turf, and it's very cool to see that we have such a range of things to talk about that so many of us are invited to speak.

PS
I just found out there's a QCon in New York.  My invitation seems to have got lost in the post.  Don't suppose anyone wants me to speak there…?

by Trisha (noreply@blogger.com) at March 27, 2012 08:26 AM

Ignoring Changes to Tracked Files in Git

I’m going to want to remember this one day, so here’s a pointer to Rob Wilkerson’s Ignore Changes to Tracked Files in Git.

I’m especially going to want to remember the bit about how to find which files I’ve ignored in this way.
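
As a memory aid, here is a minimal sketch of the "find what I've ignored" step. It assumes the technique in question is git update-index --assume-unchanged <path> (undone with --no-assume-unchanged), and it relies on git ls-files -v printing a lowercase status tag for files marked that way. The function names are my own invention, not part of git or the linked article.

```python
import subprocess


def assume_unchanged_paths(ls_files_output):
    """Return paths whose `git ls-files -v` status tag is lowercase.

    Each output line looks like "<tag> <path>"; git prints a lowercase tag
    (for example "h") for files marked assume-unchanged.
    """
    paths = []
    for line in ls_files_output.splitlines():
        tag, _, path = line.partition(" ")
        if path and tag.islower():
            paths.append(path)
    return paths


def list_ignored_tracked_files(repo_dir="."):
    """Hypothetical helper: run git in repo_dir and list the hidden files."""
    result = subprocess.run(
        ["git", "ls-files", "-v"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return assume_unchanged_paths(result.stdout)
```

The parsing is kept separate from the subprocess call so it can be checked against pasted output without needing a repository to hand.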

by Adrian Sutton at March 27, 2012 06:25 AM

March 22, 2012

Java Magazine: Intro to the Disruptor Part One

This month's Java Magazine features an article by yours truly, which is yet another intro to the Disruptor.  It's basically a summary of the stuff I've written in this blog, updated for version 2.7 - so the names of the classes should be up to date and the responsibilities follow the simplified pattern we use now.  If you were looking for a more recent version of my introduction blog posts, this article gives a reasonable overview.

This is intended as part one of a series, as it's a basic and high-level view with no code examples.  In fact, it probably could be used to document the C# version as well as the Java version, although I haven't taken a look at that for a while.  Next, I would like to give some more code examples of how you use it - as always, any suggestions welcome.

by Trisha (noreply@blogger.com) at March 22, 2012 10:13 AM

March 21, 2012

New Disruptor Presentation Unveiled to the LJC!

A few weeks ago, I presented my new "User's Guide to the Disruptor" talk to the London Java Community.  Since it was very kindly hosted at Skillsmatter, there is a video of the presentation available, and the slides are below.


The presentation is a little different to the ones we've done before.  Previously we've gone into how it works and why it's fast.  This time I wanted to step back a little from the internals and show how real developers might actually use it.  The example is somewhat contrived, but the idea is to give some hints on how to break your problem down into something that will work with a Disruptor at the heart of it.

I thought the event went really well.  It was a tiny bit completely intimidating, as there were no lightning talks and I was the only attraction.  Seeing over a hundred people turn up after work, before beer, to hear you talk is a humbling experience.  Fortunately, I think the audience was perfect for the presentation - they had heard of the Disruptor but hadn't seen anything very detailed about it, so my walkthrough of how you might use it seemed to go down well.  I certainly got a lot of very sensible questions (which hopefully I've remembered to repeat for the benefit of the recording), and people had some good ideas about how and when to use it.

I ran the same presentation to a different audience at QCon London a week later.  I'll post a link to that if/when it becomes available.

by Trisha (noreply@blogger.com) at March 21, 2012 09:28 AM

March 19, 2012

Resolving SVN Tree Conflicts

SVN tree conflicts can be insanely annoying – especially when they occur because you’ve deleted something locally and then an incoming change also deletes it. They can also be hard to resolve from most IDEs.

The trick is to resolve them from the command line.  Edit/create/delete the files to get things into the state they should be and then from the command line run:

svn resolve --accept working <path>

No more tree conflict.

by Adrian Sutton at March 19, 2012 11:19 PM

March 02, 2012

Alternating Table Row Colours Filling All Available Space

CSS3 makes it trivial to have alternating row colours for tables, but when the table is in a fixed height scroll panel, it’s much more difficult to have those alternating row colours extend beyond the bottom of the table content to fill the available space. Here’s the approach I use.

First let’s start with a simple table with alternating colours in a scroll panel:

<!DOCTYPE html>
<html>
<head>
<meta charset=UTF-8>
<title>Alternating Rows Example</title>
<style>
* { margin: 0; padding: 0; }
table {
  border-collapse: collapse;
  width: 100%;
  height: 100%;
}
.scroll {
  position: absolute;
  top: 0;
  bottom: 0;
  left: 0;
  right: 0;
  overflow-y: scroll;
  overflow-x: auto;
}
.scroll tr {
  height: 20px;
}
.striped tr:nth-child(odd) {
  background-color: #EAEAEA;
}
</style>
</head>
<body>
<div class="scroll">
<table class="striped">
<tbody>
<tr><td>Row 1</td></tr>
<tr><td>Row 2</td></tr>
<tr><td>Row 3</td></tr>
<tr><td>Row 4<br>With an extra line</td></tr>
<tr><td>Row 5</td></tr>
</tbody>
</table>
</div>
</body>
</html>

That gets us alternating row colours where there is table content. To get it to extend beyond the table content we need to add a filler row:

<div class="scroll">
<table class="striped">
<tbody>
<tr><td>Row 1</td></tr>
<tr><td>Row 2</td></tr>
<tr><td>Row 3</td></tr>
<tr><td>Row 4<br>With an extra line</td></tr>
<tr><td>Row 5</td></tr>
<tr class="filler"><td></td></tr>
</tbody>
</table>
</div>

This gives us an extra row in the table that we know will always start immediately after the last row of content in the table and line up. We can now set a prepared image as the background of that row that preserves the alternating colours and stretch it out to take up the remaining space:

.striped .filler {
  line-height: 0;
  background: url(stripes.png) repeat top left;
  height: 100%;
  padding: 0;
}

.striped .filler td {
 padding: 0;
}

.striped .filler:nth-child(even) {
  background-position: 0 20px;
}

Apart from the background image, we set the height to 100% so that our filler row fills the remaining space. If the table is empty, the filler row will get 100% height and fill all available space. If, however, the table has other rows in it, the table rendering algorithm will shrink our filler row down so that it can fit within the bounds of the table – preventing the filler row from overflowing the scroll container and activating the scroll bar.

Since our background image starts with an odd row colour, if the filler row is an even row, we use background-position to shift the background image down by the height of a stripe so it effectively starts with an even row colour.

We also add line-height: 0 and padding: 0 so that when the table content fills all available space (or starts scrolling), the filler row has no minimum height and disappears entirely.

Note that this approach will adjust perfectly well even if the content rows vary in height, because the filler row will always start immediately after the last content row and fill the available space. The height of the alternating colours in our image will just provide the default row height in the filler space.

If we’re only targeting modern browsers anyway, we can go ahead and replace the background image with a dynamically generated gradient:

.striped .filler {
  line-height: 0;
  height: 100%;
  padding: 0;
  background-image: linear-gradient(to top, rgb(255,255,255) 50%, rgb(234,234,234) 50%);
  background-image: -o-linear-gradient(bottom, rgb(255,255,255) 50%, rgb(234,234,234) 50%);
  background-image: -moz-linear-gradient(bottom, rgb(255,255,255) 50%, rgb(234,234,234) 50%);
  background-image: -webkit-linear-gradient(bottom, rgb(255,255,255) 50%, rgb(234,234,234) 50%);
  background-image: -ms-linear-gradient(bottom, rgb(255,255,255) 50%, rgb(234,234,234) 50%);

  background-image: -webkit-gradient(
linear,
left bottom,
left top,
color-stop(0.5, rgb(255,255,255)),
color-stop(0.5, rgb(234,234,234))
  );

  -webkit-background-size: 40px 40px;
  -moz-background-size: 40px 40px;
  -ms-background-size: 40px 40px;
  -o-background-size: 40px 40px;
  background-size: 40px 40px;
}

The final version of our page is:

<!DOCTYPE html>
<html>
<head>
<meta charset=UTF-8>
<title>Alternating Rows Example</title>
<style>
* { margin: 0; padding: 0; }
table {
  border-collapse: collapse;
  width: 100%;
  height: 100%;
}
.scroll {
  position: absolute;
  top: 0;
  bottom: 0;
  left: 0;
  right: 0;
  overflow-y: scroll;
  overflow-x: auto;
}
.scroll tr {
  height: 20px;
}
.striped tr:nth-child(odd) {
  background-color: #EAEAEA;
}
.striped .filler {
  line-height: 0;
  height: 100%;
  padding: 0;
  background-image: linear-gradient(to top, rgb(255,255,255) 50%, rgb(234,234,234) 50%);
  background-image: -o-linear-gradient(bottom, rgb(255,255,255) 50%, rgb(234,234,234) 50%);
  background-image: -moz-linear-gradient(bottom, rgb(255,255,255) 50%, rgb(234,234,234) 50%);
  background-image: -webkit-linear-gradient(bottom, rgb(255,255,255) 50%, rgb(234,234,234) 50%);
  background-image: -ms-linear-gradient(bottom, rgb(255,255,255) 50%, rgb(234,234,234) 50%);

  background-image: -webkit-gradient(
linear,
left bottom,
left top,
color-stop(0.5, rgb(255,255,255)),
color-stop(0.5, rgb(234,234,234))
  );

  -webkit-background-size: 40px 40px;
  -moz-background-size: 40px 40px;
  -ms-background-size: 40px 40px;
  -o-background-size: 40px 40px;
  background-size: 40px 40px;
}

.striped .filler td {
 padding: 0;
}

.striped .filler:nth-child(even) {
  background-position: 0 20px;
}
</style>
</head>
<body>
<div class="scroll">
<table class="striped">
<tbody>
<tr><td>Row 1</td></tr>
<tr><td>Row 2</td></tr>
<tr><td>Row 3</td></tr>
<tr><td>Row 4<br>With an extra line</td></tr>
<tr><td>Row 5</td></tr>
<tr class="filler"><td></td></tr>
</tbody>
</table>
</div>
</body>
</html>

Or try the demo in your browser.

by Adrian Sutton at March 02, 2012 03:19 AM

February 28, 2012

How To Round-Trip Data Via SSH

I keep forgetting this so for the record, you can use SSH to round trip data out to a remote server and back to your own box.  This is most useful for adding latency and bandwidth limitations to a connection when doing website testing.  The trick is to use both local and remote port forwards:

ssh -L 9090:localhost:9091 -R 9091:localhost:80 remote.server.com

The -L argument starts listening on port 9090 locally and sends all that traffic to port 9091 on the remote server (the domain name, localhost, is resolved by the remote server so it refers to the remote server, not your local machine).  Then the -R argument listens on port 9091 of the remote server and forwards all that traffic back to your machine’s port 80 (here localhost is resolved on your local machine).

You don’t have to use localhost as the domain name. For example, if the site you want to test is deployed on your intranet at testmachine.intranet which remote.server.com doesn’t have access to, you could use:

ssh -L 9090:localhost:9091 -R 9091:testmachine.intranet:80 remote.server.com

Or if the test site is publicly available you can do it all without the -R argument:

ssh -L 9090:testmachine.com:80 remote.server.com

In all these cases, you connect to localhost:9090 to utilise the tunnel.
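
Since I keep forgetting which port goes where, here is a small sketch that assembles the round-trip command from its parts. The function name and its default values are purely illustrative (they mirror the examples above); only the -L and -R options themselves come from ssh.

```python
def round_trip_ssh_args(remote_host, local_port=9090, relay_port=9091,
                        target_host="localhost", target_port=80):
    """Build the ssh command that bounces traffic off remote_host and back."""
    return [
        "ssh",
        # Listen on local_port here; the remote end then connects to its own
        # relay_port ("localhost" is resolved on the remote server).
        "-L", f"{local_port}:localhost:{relay_port}",
        # Listen on relay_port over there; the local end then connects to the
        # target (target_host is resolved on this machine).
        "-R", f"{relay_port}:{target_host}:{target_port}",
        remote_host,
    ]
```

Joining the list with spaces gives the command line to paste into a terminal, or the list can be handed to subprocess.run as-is.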

by Adrian Sutton at February 28, 2012 01:59 AM

February 15, 2012

Firefox 10.0.1 - MPROTECT strikes again!

It's been a while and Firefox has moved from version 5 to version 10.0.1, now that's a pace! ;) But the important bits are...enforcing MPROTECT has never been easier...well, almost. ;)

Thanks to this attachment in this bug, the latest version of Firefox compiles fine on hardened profiles (or simply on grsec kernels).

In order to enable MPROTECT restrictions, edit the ebuild and at the top add the pax_kernel flag to IUSE so it reads like this:

IUSE="bindist +crashreporter +ipc +minimal pgo selinux system-sqlite +webm pax_kernel"

also, add the following snippet in src_configure() before the # Finalize and report settings line:

if use pax_kernel; then
   mozconfig_annotate '' --disable-methodjit
   mozconfig_annotate '' --disable-tracejit
fi

...and get rid of the following lines in src_install():

# Pax mark xpcshell for hardened support, only used for startupcache creation.
pax-mark m "${S}/${obj_dir}"/dist/bin/xpcshell

and this:

# Required in order to use plugins and even run firefox on hardened.
pax-mark m "${ED}"${MOZILLA_FIVE_HOME}/{firefox,firefox-bin,plugin-container}

NOTE: You won't be able to run Java or Flash as they require RWX mappings, which will not be allowed when MPROTECT is enforced. If you need to use them, you can use a different browser for that, for instance Chromium.

Now digest your local ebuild:

# ebuild /usr/local/portage/www-client/firefox/firefox-10.0.1.ebuild digest
>>> Creating Manifest for /usr/local/portage/www-client/firefox

...and you're ready to emerge! ;] Once done, start Firefox. If you're starting it from the command line, you'll see the following (expected) error:

LLVM ERROR: Allocation failed when allocating new memory in the JIT
Can't allocate RWX Memory: Operation not permitted

which is exactly what we wanted :) ...and to verify that it works as expected:

$ for pid in $(ps -ef | grep [f]irefox | awk '{print $2}'); do cat /proc/$pid/status | grep PaX; done
PaX: PeMRs

Note the capital 'M' - you're mprotected! ;]
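
If you'd rather script that check, here's a rough sketch that does the same thing as the one-liner above. It assumes a PaX-enabled kernel that exposes a "PaX:" line in /proc/<pid>/status (as in the output shown); the function names are made up for illustration.

```python
def pax_flags(status_text):
    """Return the flags from the "PaX:" line of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("PaX:"):
            return line.split(":", 1)[1].strip()
    return None


def mprotect_enforced(status_text):
    """True when the PaX flags contain a capital 'M' (MPROTECT enabled)."""
    flags = pax_flags(status_text)
    return flags is not None and "M" in flags
```

Feed it the contents of /proc/<pid>/status for each Firefox process. On a kernel without PaX there is no such line, so pax_flags returns None and the check reports False.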

by radegand (noreply@blogger.com) at February 15, 2012 08:43 PM

February 14, 2012

Devoxx presentation on Continuous Delivery

At Devoxx 2011 I did a talk on Continuous Delivery, in which I describe the process and principles of CD, using our experience at LMAX as an example. This has now been published here.

February 14, 2012 03:34 AM

February 13, 2012

Slides From Recent Presentations

My slides from JAX London - Beginner's Guide to Hardcore Concurrency:

Video: LJC@Playfish, JAX London

My slides from Devoxx - Disruptor Tools In Action:

Video: Devoxx (Payment required)

by Michael Barker (noreply@blogger.com) at February 13, 2012 11:54 PM

February 12, 2012

Why the customer isn't always right

Last week I went to get my hair cut (yes, sorry, this is a story about hair).  I had thought long and hard about what I wanted.  I researched, checked styles online, and bought a magazine so I could show my hairdresser exactly what I was after and there would be no confusion.  I was determined I would not be spending that ridiculous amount of money on something I was not going to be happy with.  I was even bold enough to ask for some changes to it at the end, which I have never ever had the courage to do before.

He did an excellent job.  It was almost exactly what I had asked for, with some variations to account for my particular hair type.  It was a very cute hair style that suited me.  But I had a niggling doubt.

A few days later, that niggle was a certainty.  It wasn't what I wanted.

However, it was what I had asked for.

Being English, the thought of going back and telling him I wasn't happy with it was horrifying.  Especially since he had done a really good job of it.  It wasn't his fault that what I'd asked for wasn't what I had actually wanted.  But I knew what I wanted now, and I was prepared to pay for it (again).

This time I didn't show him a picture.  I didn't point to anything specific.  I said I wanted it much shorter after all and outlined the look I was going for.  We had a conversation about it and I left it to him to apply his skills to actually implement it.

This time, I was much happier with the results.  In fact, it was exactly what I wanted.

So what?

Of course this got me thinking about work.  It's not just a lame excuse to write about girl stuff on a supposedly technical blog.  It got me thinking about our customers.

Whoever our customers are, whether they're end users, internal business owners, external clients, or we work in a bank and have fifteen thousand layers of Business Analysts between us and Real People, whoever they are they want something from us.  And in many places, we, the techies, have trained them to ask for more and more specific things, so they can be sure they're going to get what they really want.  Remember that time Bob asked for that extra field on that report because it would make his job easier?  Remember when Sandra wanted the workflow to be altered in a specific way?  It's because they know that if they ask for something specific, the chances are better that it will make its way to the front of the work queue at some point, and when it comes out the other end it is more likely to be what they wanted.

Only it isn't, is it?

There's always another field to add, or another change to make.  And your customer isn't quite satisfied, even though you did exactly what they asked for.  And sometimes they don't tell you that they're not getting what they need from your system, because it's expensive or time consuming to get changes made.

The key to delivering something they really want, instead of something else that's merely adding to the weight of your code, is finding out what they're actually trying to achieve.

So Bob wants that extra field.  That's easy enough.  But when you ask him why he wants that extra field, turns out it's because the numbers in the reports he's getting at the moment aren't the ones he needs, or aren't quite correct.  He's probably created a monstrous Excel spreadsheet into which he manually types half of the numbers from the reports it took you ages to develop, which munges them into something quite different, in order to get the real thing he wants.  He's just missing this one small piece of information to get the final figure correct.  It would save your company a lot of time/money/mis-typing errors if you completely re-wrote the reports Bob uses to give him exactly what he needs.  Or if you gave him a tool to download CSV formatted raw data from the database.

It's not Bob's fault he can't ask for this, he doesn't know what we're capable of providing him.  It's not our fault for not delivering it the first time round, we don't know what he does day-to-day.  But having a conversation where we recognise where the knowledge lies (Bob knows what the output of his day job is presumably, and we know what's available and how we can provide it), and collaborating to come up with the least rubbish solution for everyone, is a step to providing that overused term, "business value".

We're not on different sides here, we all play for the same team - we want our jobs to be as painless as possible, which (should) ultimately provide better efficiency for our company.  After all, we want the company to make money so they can pay us, right?



Caveat: Mileage may vary.  Sometimes, with some customers, you really do need to tell them what they want.

by Trisha (noreply@blogger.com) at February 12, 2012 09:01 PM

February 01, 2012

Upcoming speaking events

In theory, I am busy writing material for my upcoming speaking events, rather than writing terribly illuminating posts on my blog (see what I did there?).  In actuality I am being lazy and have pretty much taken January off for a recharge.

In the spirit of doing something which ticks both the event-speaking and blogging boxes, this is a quick update on the conferences I'm confirmed for so far.  Put the following dates in your diary - these are my first international solo speaking events:

7th March - QCon London - Concurrent Programming Using The Disruptor (sadly I can't stay for the whole conference as it clashes with the only holiday I had booked for 2012).
23rd May - GOTO Copenhagen - Concurrent Programming Using The Disruptor & War Stories.
25-26th May - GOTO Amsterdam - Concurrent Programming Using The Disruptor.

The presentation will be more of a user's guide to the Disruptor than anything we've done before.  An hour isn't a lot of time to cover all the functionality everyone might want to see, so I'm still trying to work out the balance between giving an introduction/overview for those who haven't seen it before, and going into some of the cool features that have been added since I first started blogging about it.  If there's anything you would particularly like to see covered, let me know - I'll put the most frequently requested things in there.

Ideally I'd run a workshop session at some point, but that will require quite a lot more preparation, so I'll only do that if there is interest in it (if someone wants to fly me somewhere interesting to do that so much the better!!).

Maybe I'll see you at one of these events?

by Trisha (noreply@blogger.com) at February 01, 2012 07:43 AM

January 30, 2012

Adding Latency and Limiting Bandwidth

Some aspects of Linux have the reputation of being hard. Traffic control via queueing disciplines for bandwidth management for example. Even the title is enough to strike fear into the heart of a seasoned system admin.

Which is a pity really, as the things outlined in chapter 9 of the lartc are very useful in practice.  The problem is the documentation is very descriptive - which is good once you know roughly what you're doing - but which has quite a steep learning curve if you don't. In fact it's pretty vertical if you don't already know quite a lot about networking. A few more worked examples would help over and above those in the cookbook.

Instead, like most people in a rush, I have relied on attempting to bash together snippets of code that are on random blogs to make /sbin/tc do what I want it to do, without really understanding what is going on. 

This time, when presented with a problem for which this is the exact tool, I found I needed to dive deeper, and actually understand it, as none of the precanned recipes worked. It was a case of "if all else fails try the manual".

So now I think I've got a vague handle on what is going on, I'm documenting what I ended up doing because I'm sure I will need a worked example when I come back to this in the future. If it's useful to you too, so much the better.

The Problem

We have a need to test the loading speed of our web page and trading platform under a set of network conditions that approximate the following:

  1. Local LAN, unrestricted
  2. "Europe", 20ms round trip latency, limit of 512kbit/sec in and out
  3. "SE Asia", 330ms round trip latency, limit of 128kbit/sec in and out.

In practice that's quite generous, particularly in the case of the South East Asia profile. There was no way I was getting 128kbit on the wifi in Shakey's on Rizal Boulevard in Dumaguete earlier this month. Which was better than the hotel wifi.

The Solution

Background

We have Selenium to run the tests via webdriver/remotedriver to two Windows virtual machines, one running Chrome and one running IE. They run on a Linux host system, and can see a loadbalancer behind which lies one of our performance test environments. We need to add latency and bandwidth restrictions to their connections, effectively to put them into each of the traffic classes above depending on which test our CI system asks them to run.

The load balancer has been set up with three virtual servers, all listening on the same IP address but different ports.

  1. Local:  9090
  2. Europe: 9092
  3. SE Asia: 9091

Each virtual server has the same webserver pool behind it, so they're all the same from the point of view of the load balancer, but we'll use the different destination ports to switch the traffic between the different sets of network latency and bandwidth restriction we need to simulate the different customer locations.

The Linux virtual machine host has the guests' vnet network devices attached to a bridge. In turn the bridge is attached to the network via a bonded interface, in our case bond0.30.

To make this work for both machines, we'll apply the traffic management on the bond0.30 side of the bridge.

Ascii art diagram of that;

    IE Windows VM - vnet0                             eth0
                          \                         /
                           host bridge 30 - bond0.30
                          /                         \
Chrome Windows VM - vnet1                             eth1

 

Qdiscs and Classes

There are three creatures we're dealing with here;

  • qdisc - a Queueing Discipline. These are the active things we're going to use to control how the traffic is managed.  qdiscs can be classless or classful. We're going to use a classful qdisc called htb
  • classes - We'll use these to separate the traffic into its constituent flows and to apply different constraints on each flow.
  • filters  - Similarly to iptables, these allow us to specify which traffic ends up in which class.

Chapter 9 says that you can only shape transmitted traffic, which is not 100% accurate, as we can do things to inbound traffic too, however our options are very limited.

So, looking at the default qdiscs, classes and filters

[root@vm01 ~]# tc -s qdisc show dev bond0.30     
qdisc pfifo_fast 0: bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 47844819829 bytes 140593932 pkt (dropped 0, overlimits 0 requeues 22)
 rate 0bit 0pps backlog 0b 0p requeues 22
[root@vm01 ~]# tc -s class show dev bond0.30
[root@vm01 ~]# tc -s filter show dev bond0.30
[root@vm01 ~]#

The "-s" option shows the statistics. So, by default, we have a queue discipline called pfifo_fast, which just passes traffic.

Each device has a default root which we use to build upon. We can also attach handles to classes and qdiscs to allow us to relate each part to the others and build up chains to process the packet stream. "root" marks the top of the tree for outbound traffic; in the commands below, the qdisc attached there is given the handle 1:0.

One of the most useful pages I found is here; http://luxik.cdi.cz/~devik/qos/htb/manual/userg.htm

Things worth repeating from that link are;

  • tc tool (not only HTB) uses shortcuts to denote units of rate. kbps means kilobytes and kbit means kilobits
  • Note: In general (not just for HTB but for all qdiscs and classes in tc), handles are written x:y where x is an integer identifying a qdisc and y is an integer identifying a class belonging to that qdisc. The handle for a qdisc must have zero for its y value and the handle for a class must have a non-zero value for its y value. The "1:" above is treated as "1:0"

The whole page is worth reading carefully.
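
To make the kbps/kbit distinction concrete, here is a small sketch that converts the kbit limits used in this post into tc's kbps (kilobytes per second). The function name kbit_to_kbps is made up for illustration; it is not part of tc.

```shell
#!/bin/bash
# Hypothetical helper: convert a rate given in tc "kbit" (kilobits per second)
# into tc "kbps" (kilobytes per second). Eight bits to the byte, integer maths.
kbit_to_kbps() {
    echo $(( $1 / 8 ))
}

kbit_to_kbps 512   # the "Europe" limit
kbit_to_kbps 128   # the "SE Asia" limit
```

So "rate 512kbit" and "rate 64kbps" ask for the same thing, while typing "512kbps" by mistake would allow eight times the intended bandwidth.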

The Design

[Diagram: the htb root qdisc with its classes, netem qdiscs and filters]

The pentagons are filters, the circles represent qdiscs, and the rectangles are classes. One important point is that this diagram in no way implies flow. This is hard to grasp, and I had problems understanding the comments in section 9.5.2.1 "How filters are used to classify traffic" - particularly;

"You should *not* imagine the kernel to be at the apex of the tree and the network below, that is just not the case. Packets get enqueued and dequeued at the root qdisc, which is the only thing the kernel talks to."


 The way I squared it in the end was to think of it as an order of application for traffic flowing through the root qdisc.

So in the above we have the root qdisc, which is an instance of the HTB qdisc. From that depends each of the classes we set up to handle the three different classes of traffic. We use htb to limit the outbound bandwidth for each of the classes (1:10, 1:11, 1:12). When we define the root qdisc we specify that class 1:10 will be our default class for the bulk of the traffic we don't want to delay.

Setting up the root qdisc;

INTERFACE=bond0.30
tc qdisc add dev $INTERFACE root handle 1:0 htb default 10

 "root" attaches the qdisc at the top of the tree, and "handle 1:0" names it so that classes and filters can refer to it as their parent.  $INTERFACE is defined in the shell script to make the porting from machine to machine easier. This installs the htb qdisc on the root for our bond interface, and tells it that by default all traffic should be put in a class called 1:10.

Now we add classes for each of the types of traffic, along with the bandwidth limits we want to enforce on each of the traffic classes.

# default class
tc class add dev $INTERFACE parent 1:0 classid 1:10 htb rate 1024mbit

# "europe" traffic class - outbound bandwidth limit
tc class add dev $INTERFACE parent 1:0 classid 1:11 htb rate 512kbit

# "se asia" traffic class - outbound bandwidth limit
tc class add dev $INTERFACE parent 1:0 classid 1:12 htb rate 128kbit

 We now attach the network emulator qdisc, netem, which we will use to introduce latency into each of the classes;

# network emulation - add latency.
tc qdisc add dev $INTERFACE parent 1:11 handle 11:0 netem delay 20ms 5ms 25% \
 distribution normal
tc qdisc add dev $INTERFACE parent 1:12 handle 12:0 netem delay 330ms 10ms 25% \
distribution normal

This attaches the emulator instances to their parent classes, with handles that match the parent's y value, for ease of tracing. The netem parameters break down as follows.

  • delay 20ms - This is pretty self explanatory.
  • 5ms - this is a jitter on the latency to give a bit of variation
  • 25% - this indicates how much the variation in the latency of each packet will depend on its predecessor
  • distribution normal - how the variation is distributed. 

The netem module is described completely here:  http://www.linuxfoundation.org/collaborate/workgroups/networking/netem
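
Since each profile is just a class plus a netem qdisc hanging off it, the pairing can be sketched as a small generator. This is only an illustration: print_profile is a made-up name, and it prints the tc commands rather than running them, so it needs neither root nor a real interface.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) the tc commands for one traffic profile.
# Arguments: interface, class minor number, rate, delay, jitter.
print_profile() {
    local dev=$1 minor=$2 rate=$3 delay=$4 jitter=$5
    # The htb class that enforces the outbound bandwidth limit.
    echo "tc class add dev $dev parent 1:0 classid 1:$minor htb rate $rate"
    # The netem qdisc attached to that class; its handle reuses the class minor.
    echo "tc qdisc add dev $dev parent 1:$minor handle $minor:0 netem delay $delay $jitter 25% distribution normal"
}

print_profile bond0.30 11 512kbit 20ms 5ms     # "europe"
print_profile bond0.30 12 128kbit 330ms 10ms   # "se asia"
```

Adding a fourth profile then only needs another minor number and its own rate, delay and jitter.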

One thing that could be improved here is that we're adding all the latency on the outbound leg. Ideally we'd add 165ms on the way there and on the way back for the SE Asia traffic (and 10ms each way for the EU traffic). To do that means applying latency to the outbound interfaces in both directions. In our case that would mean applying 165ms of latency to both of the vnet interfaces as well as the bond0.30 interface. However that is tricky to do simply, as the virtual machine interface names may change as they get rebooted. Instead, this way we end up with the same round trip result for far less faffing about.

Now, all we need to do is add the filters that classify the packets into their classes

SEASIAIP=172.16.10.10
SEASIAPORT=9091
EUIP=172.16.10.10
EUPORT=9092

# filter packets into appropriate traffic classes.
tc filter add dev $INTERFACE protocol ip parent 1:0 prio 1 \
u32 match ip dst $SEASIAIP match ip dport $SEASIAPORT 0xffff flowid 1:12
tc filter add dev $INTERFACE protocol ip parent 1:0 prio 1 \
u32 match ip dst $EUIP match ip dport $EUPORT 0xffff flowid 1:11

The action is mainly in the second line of each command, where we match the target IP of the load balancer, and the ports we've set up. The flowid is the class handle for the appropriate classes. We don't need to set up a filter for the "normal" traffic, as it is covered by the "default 10" part of the original htb root qdisc declaration.

And that takes care of the outbound traffic shaping and latency.

We now need to handle inbound.

For this we use the special ingress qdisc. There's very little we can actually do with this qdisc. It has no classes, and all you can really do is to attach a filter to it. Usefully we can use the "police" keyword to restrict (by packet dropping) the inbound flow. It's not exact, but it's good enough for our purposes.

# inbound qdisc.
tc qdisc add dev $INTERFACE handle ffff: ingress

# attach a policer for "se asia" class.
tc filter add dev $INTERFACE protocol ip parent ffff: prio 1 \
u32 match ip src $SEASIAIP  match ip sport $SEASIAPORT 0xffff \
police rate 128kbit burst 10k drop flowid :1

# attach a policer for "europe" traffic class.
tc filter add dev $INTERFACE protocol ip parent ffff: prio 1 \
 u32 match ip src $EUIP match ip sport $EUPORT 0xffff \
police rate 512kbit burst 10k drop flowid :2

The handle ffff: is a synonym for the inbound traffic root. All you can do is attach ingress to it as shown. To be frank I've not dived into exactly how the burst keyword affects things. Essentially the above filter rule is the same as the one we used on the outbound side except we now match the source ports and IPs rather than the destination ports and IPs. Then rather than using the flowid argument we use police to instruct the kernel to drop packets from each of our loadbalancer ports if they exceed the stated rates.
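
As a rough guide to what burst does: the policer works as a token bucket and burst is the size of that bucket, so it bounds how much data can arrive back to back before packets start being dropped. The sketch below works out how long a full bucket lasts at the policed rate. It assumes tc reads 10k as 10 x 1024 bytes and kbit as 1000 bits per second; burst_ms is a made-up name.

```shell
#!/bin/bash
# Hypothetical helper: milliseconds of traffic at the policed rate that fit
# into the burst bucket.
# $1 = burst size in bytes, $2 = rate in bits per second.
burst_ms() {
    echo $(( $1 * 8 * 1000 / $2 ))
}

burst_ms $(( 10 * 1024 )) 128000   # "se asia": burst 10k at 128kbit
burst_ms $(( 10 * 1024 )) 512000   # "europe": burst 10k at 512kbit
```

This prints 640 and 160, so the same 10k bucket is worth roughly 0.64 seconds of traffic at 128kbit but only 0.16 seconds at 512kbit.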

Cleanup

To clean up after all of this, it's sufficient to just remove the root and ingress qdiscs. Removing the top of the tree removes all the other configuration.

# remove any existing ingress qdisc.
tc qdisc del dev $INTERFACE ingress
# remove any existing egress qdiscs
tc qdisc del dev $INTERFACE root

 Which cleans up all classes and filters.
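
If the cleanup runs when nothing has been set up yet (for example at the top of a start routine), the deletes will typically fail because there is nothing to remove. A minimal sketch of a tolerant version follows. The TC variable and its dry-run default are assumptions made for illustration and are not taken from the init script.

```shell
#!/bin/bash
# Dry run by default: the commands are printed, not executed.
# Set TC=/sbin/tc (as root) to apply them for real.
TC=${TC:-echo tc}

cleanup() {
    local dev=$1
    # Ignore failures: a delete errors out when the qdisc is not there.
    $TC qdisc del dev "$dev" ingress 2>/dev/null || true
    $TC qdisc del dev "$dev" root 2>/dev/null || true
}

cleanup bond0.30
```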

Conclusion

There's an init script that encapsulates all of the above which can be downloaded from here.

[root@vm01 ~]# chkconfig latency on
[root@vm01 ~]# /etc/init.d/latency      
Usage: /etc/init.d/latency {start|stop|restart|condrestart|status}
[root@vm01 ~]# /etc/init.d/latency start
[root@vm01 ~]# /etc/init.d/latency stop
[root@vm01 ~]# /etc/init.d/latency status
 Active Queue Disciplines for bond0.10

 Active Queueing Classes for bond0.10

 Active Traffic Control Filters for bond0.10
[root@vm01 ~]#

And that's it.

This mainly suits a static configuration, as is the case with our load balancer and continuous integration environment. However for web development use, this approach lacks flexibility, particularly if you don't have root access. For our developers, I looked at ipdelay but eventually settled on Charles, which was adequate for our purposes.

 HTH.

by atp at January 30, 2012 05:33 PM

January 28, 2012

Adventures with DD-WRT and IPv6 (with a dash of TomatoUSB)

A little under a year ago, I decided two things: first, that it was about time my ageing home network got GigE and 5GHz wireless-N (dual band, of course, to support devices that would only do 2.4GHz); and second, that I would separate the jobs of BEING my network from CONNECTING my network to the internet (since I couldn’t find a good router which would meet these requirements AND had an ADSL modem in it).

So I bought a Linksys/Cisco E3000, made it the backbone of my network and connected it to the internet via my ISP-supplied ADSL modem.

Then an unfortunate incident happened which involved the Linksys/Cisco setup CD, an unwanted but non-removable guest WiFi network, and me swearing a lot.

The time had come (after about 16 hours!) to put DD-WRT on my router. As this post describes, choosing a version of DD-WRT that won’t “brick” your router (as the developers like to describe it) is treacherous to say the least. I eventually settled on dd-wrt.v24-16758_NEWD-2_K2.6_mega (specifically, the nv60k version). Despite the trepidation caused by the dire warnings on the web site, the flashing went well, and I’ve been pleased with DD-WRT ever since. Until…

Last week, I had a 40Mbit/sec fibre broadband connection installed. Amongst other things, my new ISP provides me with a block of IPv6 addresses. Actually 2^80 of them. I seriously need to think about what I’m going to do with them all.

My excitement at having 1,208,925,819,614,629,174,706,176 IP addresses was somewhat dampened when, after a day or so of fiddling and researching, I discovered that DD-WRT’s supposed IPv6 support was limited to the various types of v6-over-v4 tunnels (e.g. Hurricane Electric). Specifically, the PPP daemon doesn’t support IPv6 – so this might just be an issue for PPPoE users. There was no way for me to use all that space natively.

It should be noted here that even if you do want to use a tunnel to reach the IPv6 internet, you will still need to write startup scripts for DD-WRT to load the kernel module (the “Enable IPv6” checkbox doesn’t actually do anything), start radvd (the “Enable radvd” checkbox doesn’t actually do anything), configure the tunnel interfaces and WAN IP addresses, etc. And even after all of this, you’ll find that the IPv6 user tools (ip6tables, ping6, traceroute6, etc.) aren’t installed, so you’ll have to locate them and hope you have room on your device somewhere.

So the time has come to make the move to TomatoUSB. To some extent, this suffers from the same issues as DD-WRT when it comes to variants, etc., but the information is more logically presented, and there do seem to be fewer choices and fewer potential traps. After looking at the comparison of “mods” on Wikipedia, I chose Toastman’s mod. It seems to have all the features I wanted and he seems to do frequent builds with all the latest updates and patches – in fact, the latest build (1.28.7494.3) was made only 6 days ago. This compares well with DD-WRT which doesn’t appear to have had any real active work/releases for a year or so now.

My first impressions of TomatoUSB are positive. The GUI feels snappy, and has most of the same features as DD-WRT. The real-time bandwidth monitor is definitely prettier than DD-WRT’s. And, most importantly, the IPv6 support works out of the box.

TomatoUSB IPv6 configuration screen

Out-of-the-box, ip6tables is configured to allow ICMP packets of every type (so I can ping all my machines from various online ping sites), but disallow all other inbound traffic. So, Linux ip6tables bugs aside, I’m secure by default, which is nice. There doesn’t seem to be a GUI interface to set up firewall rules for IPv6, so I guess if I ever want to let anything in, I’ll have to ssh to the router and do it by hand – but why would I ever want that?

And that’s that. I took under 2 hours to flash TomatoUSB, reproduce all my configuration on it, and get IPv6 working. Nice. I can now browse ipv6.google.com, www.v6.facebook.com/, and I get a dancing turtle when I visit www.kame.net. Also, this:

Results from test-ipv6.com

One last thing: don’t forget to enable IPv6 privacy extensions on all of your hosts!

by Danny at January 28, 2012 09:59 PM

January 25, 2012

The Business of Standards

Recently I’ve been getting spam from the “standards organisation” OASIS inviting a company that I don’t work for1 to join a new standards initiative. There’s no pretense that I’m being invited because of my clearly superior knowledge of the area involved, merely that the company could get great advertising exposure by participating – including being listed on a press release! Naturally we’d need to become OASIS members and pay the appropriate fee, and to be in the press release you’d have to be at minimum a “sponsor level member”.

On the one hand it’s nice that they aren’t pretending to be anything other than a for-profit marketing company, but isn’t it a bit sad that we think standards are just a marketing tool?

1 – yes, they’ve done so much research into how much I have to contribute that they’ve got the company wrong

by Adrian Sutton at January 25, 2012 09:25 PM

January 19, 2012

Diagnosing a Drupal 7 cache generation problem

My company launched their new website recently. When we launched before Christmas we encountered a reoccurring problem that was more difficult than most to diagnose. The problem itself is very specific to our site so I doubt the exact details will help many people, but maybe the troubleshooting steps involved will prove interesting to someone. I'm not particularly proud of the time it took to track down nor our exact thought process (hardly blowing my own horn with this post) but here we go anyway.

The website platform was built for us by a third party, the technology is mostly Drupal 7 with some custom modules written for functionality we required. We wrote our own "Drupal deployment interface" that mirrors the contents of one Drupal site (our Dev server) onto our UAT or Live platform. The Live platform is a simple Apache / Drupal / Varnish stack with a load balancer in front of several web servers, the back-end is several MySQL servers.

When we deployed our final site to launch, we ran into a problem where a specific image on our front page was not displaying. Looking at the HTML source when the image is broken, we see that the image source rather bizarrely contains the hostname 127.0.0.1:

<img alt="" class="media-image" typeof="foaf:Image"
src="http://127.0.0.1/sites/default/files/LMAX-intro-video.jpg" />
Not being Drupal 7 experts we can't code dive into its PHP with much confidence, so on comes the black box testing and some facts we discovered:
  1. We confirm this section of HTML is dynamically generated - it's not a hard coded link to 127.0.0.1 someone's typed into the Drupal interface.
  2. No other images are broken, just this one.
  3. Looking at other images, the source of the image should be starting with "http://www.lmax.com/...".
  4. If we request the correct image link directly it loads fine, so the image file is not missing nor does there appear to be a problem with Apache serving the file.
  5. We restart Varnish on all web servers to see if this is problem between Varnish and Drupal but it does not fix the problem.
  6. We dump all databases and grep for the offending string and pin the problem to one table.
The bad HTML is being stored in Drupal 7's cache_filter table. We delete the entry from the cache_filter table, refresh our site in a browser and the problem is solved, but unfortunately not for good.
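
The dump-and-grep step can be sketched as follows. It assumes a plain SQL dump in which every row insert starts with INSERT INTO followed by the backquoted table name, which is how mysqldump writes them. The function name and the dump file name in the usage comment are made up.

```shell
#!/bin/bash
# Hypothetical helper: list the tables whose INSERT statements contain a string.
# $1 = fixed string to look for, $2 = SQL dump file.
tables_mentioning() {
    grep -F -- "$1" "$2" \
        | sed -n 's/^INSERT INTO `\([^`]*\)`.*/\1/p' \
        | sort -u
}

# Usage, against a dump taken beforehand:
# tables_mentioning '127.0.0.1' all-databases.sql
```

grep -F treats the search string literally, so the dots in 127.0.0.1 are not interpreted as wildcards.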

The next day the problem re-occurred - our web developer says that he deployed a new copy of the site onto production and the image is missing again. We investigate again and find the same bad HTML on the database servers. We delete the entry from the cache_filter table again then check our website - the image is still broken. Looking back in the cache_filter table, we see that the same bad HTML has been regenerated, despite us deleting that row. Just to be sure, we truncate the entire cache_filter table on both databases and refresh - it still contains 127.0.0.1. What we thought resolved our problem yesterday has not worked a second time and we now have no quick way of fixing it.

We convey to the business that we can't fix this in five minutes and settle down for some more serious investigation. We now know:
  1. The tail end of the problem is the Drupal generated HTML stored in the cache_filter table in our database(s).
  2. The problem appears to occur after a deployment of new content from our Dev server to our Live servers.
  3. We specifically don't restore any cache table content when doing a deployment to avoid any "stale" cache from Dev reaching Live - so after deployment, the cache tables are empty.
  4. Something is continually repopulating the cache with bad HTML.
We have a UAT environment to test deployment specific problems, built for exactly these kinds of problems. We only managed to reproduce the problem once in several test deployments of the same content to our UAT environment - the issue is very intermittent on UAT and practically constant on Live. UAT does not have any load balancing normally, we add some but still cannot reproduce.

We search through the core Drupal 7 PHP code, our custom modules and contributed modules for mention of the host 127.0.0.1. It appears a few times but leads nowhere relevant. We also spend time playing with Varnish on UAT: we know that each Varnish server's Apache backend is configured over the loopback interface and it's written in the Varnish configuration file with '127.0.0.1'. Our work proves unhelpful there as well.

Trying a different approach, we turn on full query logging on the UAT database, deploy to UAT and browse around our website, looking for "insert into cache_filter" lines. Our thinking is to trace back through the queries for an idea of what occurs before the cache_filter insert and thus hint at what's populating this table. The UAT query log does not help much: we find the insert query but the problem has not occurred after the deployment so the cache_filter contents is correct. Other than witnessing a lot of queries against the domain table, the UAT query log is not very helpful.

We decide to turn on the full query log for one of the production MySQL servers, as we were not happy that our efforts in UAT had exhausted this avenue. We finally have our eureka moment: within seconds of turning on the log we see queries against the domain table, but these are ever so slightly different:
SELECT domain_id, subdomain, sitename FROM domain WHERE subdomain = '127.0.0.1'
We had a general idea of the domain module and that it works by what hostname someone puts in the browser, so these queries said to us that someone or something was hitting localhost with URL requests and they are getting far enough into our web stack for Drupal to query for it. We immediately revisit Varnish but can't prove it is the cause on UAT yet again. We compare the Apache logs with the Varnish logs, we think on how UAT (unfortunately) differs to production and finally the sack of pennies drops.

The answer was in front of us the entire time - the load balancers use HTTP health checks of "GET /" against the web servers. The load balancer health checks run continuously, almost every second, and so when a deployment occurs against Live, the load balancers will almost always make the first request to the front page of the website. Since we effectively truncate the cache tables when we deploy, the load balancer health check triggers Drupal to repopulate its cache. Something about the load balancer's request is causing Drupal to search for a '127.0.0.1' domain, perhaps incomplete HTTP headers, or maybe a REMOTE_HOST header of 127.0.0.1. Since we don't have a domain of 127.0.0.1 the request falls back to our default domain (a feature of the domain module) but somehow content for the front page is being generated incorrectly with details from the original request and cached.

To confirm what was only a theory at this point we changed the load balancer health check to just test the TCP connection rather than a HTTP test, waited for a request to come through and checked the site - the generated content from our request was correct. Rather than keep the TCP health check we found an example Drupal PHP script that does a minimal Drupal bootstrap to check the database health and return HTTP status codes appropriately.

The clarity of hindsight:
  • When we had the issue the very first time, after deleting the bad cache_filter entry I must have refreshed the website faster than the load balancers check, hiding the problem until the next day.
  • When we added the load balancer to UAT, we mustn't have set up a health check (or if so, only a TCP connection check), as we were unable to reproduce the problem in UAT. Lesson: if trying to mirror production, mirror production.
The vast majority of the problem is now worked around, but it is not solved - the issue still recurs once every couple of weeks, in Live and UAT now as well. There are still several questions that I would like to answer:
  1. What is the exact part of the load balancer request that caused Drupal to generate its cache incorrectly? Is it the REMOTE_HOST header?
  2. Is it just the load balancer or was Varnish also a catalyst? If we take Varnish out of the mix and just have the load balancer point to Apache directly, do we still have a problem?
  3. What's causing the very infrequent re-occurrences of the problem now? Could it be the Varnish cache expiring and requesting a new copy of the object?
Like all Systems Administration problems though, it will get attention when it annoys someone enough to justify spending the time to fix it permanently. If only computers didn't exist, our lives would be so much simpler...

by Luke Bigum (noreply@blogger.com) at January 19, 2012 07:29 PM

removing empty cgroups and other problems

We had to clean up some left over cgroups after another set of experiments with LXC. The guys doing it encountered problems, as the logic is the opposite of what you expect from your experience on a normal unix filesystem.

Specifically the problem happens when you have a nested cgroup - for example /cgroup/foo/bar/

The problem happens because this is a virtual filesystem with its own rules. 

If you create a cgroup (in this case we'll just mount the net_cls cgroup because it's smaller):

# mount -t cgroup /dev/cgroup /cgroup -o net_cls 
# cd /cgroup
# ls
cgroup.procs net_cls.classid notify_on_release release_agent tasks
# mkdir foo
# ls
cgroup.procs foo net_cls.classid notify_on_release release_agent tasks
# ls foo
cgroup.procs net_cls.classid notify_on_release tasks

There is a set of files created to manipulate the cgroup. If the tasks entry is empty, then the cgroup is unused.

However the files can't be removed. 

# rm -f *
rm: cannot remove `cgroup.procs': Operation not permitted
rm: cannot remove `foo': Is a directory
rm: cannot remove `net_cls.classid': Operation not permitted
rm: cannot remove `notify_on_release': Operation not permitted
rm: cannot remove `release_agent': Operation not permitted
rm: cannot remove `tasks': Operation not permitted

So here's where the confusion arose. To remove a directory hierarchy on a normal file system you should first remove all files from each subdirectory and then remove the parent directories. 

This is what "rm -fr  directory" will usually do for you. 

However, as the files cannot be removed, that approach will fail in this virtual filesystem. "rm -rf" is the wrong approach.  

Instead, you need to remove just the cgroup directories. If you want to do this recursively here's the magic incantation (we first make a sub-cgroup to give it more than one level of depth);

# mkdir -p foo/bar
# ls foo/
bar/ cgroup.procs net_cls.classid notify_on_release tasks
# find foo -depth -type d -print -exec rmdir {} \;
foo/bar
foo
# ls
cgroup.procs net_cls.classid notify_on_release release_agent tasks

 As the files aren't real, they don't need to be removed - we only need to remove the cgroup directories. The "-depth" argument tells find to do a depth first recursion, so we remove sub cgroups first. 

If that fails, then you'll most likely have a task occupying one of the cgroups somewhere.

 You can find it like this;

# mkdir -p foo/bar
# echo $$ > foo/bar/tasks
# find foo -depth -type d -print -exec cat {}/tasks \;
foo/bar
21413
21956
21957
foo

And to remove those tasks from the subcgroup add them into the top level. 

# echo $$ > tasks
# find foo -depth -type d -print -exec cat {}/tasks \;
foo/bar
foo
#
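
Before removing a hierarchy it can help to list which cgroups are still occupied. The sketch below does that with the same find traversal; occupied_cgroups is a made-up name. It reads each tasks file rather than testing its size, on the assumption that files in the cgroup filesystem report a size of zero even when they have content.

```shell
#!/bin/bash
# Hypothetical helper: print each cgroup directory under $1 that still has
# tasks in it, children before parents (the order rmdir needs).
occupied_cgroups() {
    find "$1" -depth -type d | while read -r dir; do
        # Read the file instead of checking its size.
        if [ -n "$(cat "$dir/tasks" 2>/dev/null)" ]; then
            echo "$dir"
        fi
    done
}

# Usage, from /cgroup, against the foo hierarchy used above:
# occupied_cgroups foo
```

An empty result means the rmdir pass should go through cleanly.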

The last gotcha with cgroups relates to mounting subsets of functionality.

 If you have created a cgroup, and then attempt to mount or remount /dev/cgroup with different cgroup subsystems active, the mount will fail with a rather unhelpful error message "mount: /dev/cgroup already mounted or /cgroup busy" 

 The answer is to remove the cgroups you have created, so that there are no cgroups on the system, and then you can mount with different subsystems active.  

 As in the following example;

# umount /cgroup
# mount -t cgroup /dev/cgroup /cgroup -o net_cls,cpuacct
mount: /dev/cgroup already mounted or /cgroup busy
# mount -t cgroup /dev/cgroup /cgroup -o net_cls
# find /cgroup/foo -depth -type d -print -exec rmdir {} \;
/cgroup/foo/bar
/cgroup/foo
# umount /cgroup
# mount -t cgroup /dev/cgroup /cgroup -o net_cls,cpuacct
# ls /cgroup
cgroup.procs cpuacct.usage net_cls.classid release_agent
cpuacct.stat cpuacct.usage_percpu notify_on_release tasks

Which is probably why Fedora mounts them all separately. Even though that looks messy.

by atp at January 19, 2012 03:57 PM

January 18, 2012

Interview on High performance Java

I recently spoke, with my ex-colleague Martin Thompson, at the GOTO conference in Aarhus. While we were there we were interviewed by Michael Hunger. We discussed various topics centered around the design of high performance systems in Java, the evolution … Continue reading

January 18, 2012 04:47 AM

January 17, 2012

Christmas decorations teach me a lesson about troubleshooting

And now, after an absence of several weeks, you get to see how long it takes me to write some of these posts.


I was putting up the Christmas decorations one Saturday when my worst fear was realised1 - one of my three strings of lights was not working.

The first two went up fine.  The third lit up when I plugged it in, and in less than a second went out.  Curses.  This is not what I wanted, this was supposed to be a short exercise in making my tiny little flat look festive.

So I set about the tedious task of starting from the end closest to the plug and replacing every bulb, one by one, with a spare one to see if it magically lit up again.  When it doesn't, you take the spare back out and replace it with the original bulb.  I remember my parents going through this ritual every Christmas, the tediousness of this activity is more memorable than the fleeting joy of shinies.

While I was doing this, my mind was back on the job I'd been doing at work the previous week - battling an Internet Explorer 7 performance problem.  We have automated performance tests which give us an indication of the load time for our application in Chrome and IE, and some time in the previous couple of weeks our IE performance had significantly degraded in the development code.  Due to a number of too-boring-to-explain-here circumstances, the last known good revision was four days and nearly 250 revisions earlier than the first revision that showed the performance problem.

Since we couldn't see anything to indicate it was an environmental problem, the logical next step was to pinpoint the revision which caused the problem, so we could either fix it or get performance gains from somewhere else in the system.


The most obvious way to do this, given there were no obvious suspects, is with a binary search of the revisions.  Our last known good revision was 081, our first poor performing one was 240.  So the thing to do is to check revision 160, see if it falls on the poor or good performance side.


If 160 proves to be a poor performer, check revision 120....


...if 160 is fine, test revision 200...

...and keep splitting the revisions by half until you find the suspect.

So of course that's what I want to do with my stupid Christmas lights.  I do not want to sequentially check each light bulb, that has a worst-case number-of-bulbs-tried = n, where n is the number of bulbs (probably a couple of hundred, although it felt like several thousand).  So, in computer speak, O(n).  The binary search algorithm is O(log n).  At university, this Big Oh had no context for me.  But when you've taken 10 minutes to get a quarter of the way through your Christmas lights, and you diagnosed your IE performance problems... well, actually it took days.  But the point is, a binary search for the missing bulb would definitely have been a Good Thing.

I know you're dying to know if I tracked down the problem in Internet Explorer - I did.  What's the worst case when you're doing a binary search?  It's when the thing you're looking for is veeeery close to either your start point or your end point.

The revision number I was after was number 237.  Sigh.
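
The search itself fits in a few lines of shell. This is only a sketch: is_slow stands in for running the IE performance test against a revision, and is hard-coded to the answer purely so the example runs. Both function names are made up.

```shell
#!/bin/bash
# Stand-in for "run the IE performance test against revision $1".
# Hypothetical: every revision from 237 onwards counts as slow.
is_slow() {
    [ "$1" -ge 237 ]
}

# Print the first slow revision, given a known good and a known slow revision.
# Note 81 rather than 081: bash arithmetic would read a leading zero as octal.
first_slow_revision() {
    local good=$1 slow=$2 mid
    while [ $(( slow - good )) -gt 1 ]; do
        mid=$(( (good + slow) / 2 ))
        if is_slow "$mid"; then
            slow=$mid    # culprit is at or before mid
        else
            good=$mid    # culprit is after mid
        fi
    done
    echo "$slow"
}

first_slow_revision 81 240
```

With these bounds it tests revisions 160, 200, 220, 230, 235, 237 and 236, then prints 237.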


And my Christmas tree lights?  Well, through the boredom I remembered that modern lights are in sections, so they have a sort of built-in binary search - well, limited segments will go dark if a single bulb is out - which allows you to narrow down your problem area.  Since the whole string was out, I figured something else was probably wrong.

Turned out the plug had come out of the socket.

So:
Lesson 1: Theoretical computer science does have a place when you care about how long something takes.  When it's you feeling the pain, you'll do anything to make it stop.

Lesson 2: When diagnosing a problem you will always be biased towards what you think it is, in the face of actual evidence.  I was afraid I would have to search the whole set of lights for a blown bulb, so that's the problem I was looking for when the lights failed.  In actual fact it was a problem with a much simpler solution.



1 - OK, "worst fear" in this very limited context only - it's not like I lie awake at night in July afraid that one of the bulbs on my Christmas lights has blown.

by Trisha (noreply@blogger.com) at January 17, 2012 06:08 PM

January 14, 2012

Building a CPU Topology on MacOS X

Within a project I've been working on I've had the need to simulate the capabilities of Linux's /proc/cpuinfo on Mac OS. Specifically I needed to build a topology of the CPUs on a given system. I.e. I need to map the operating system's processors to hardware threads, then build a picture of which cores and sockets those threads reside on. For example my Mac looks something like:


CPU0 (Thread 0) ---+
                   |---> Core 0 ---+
CPU1 (Thread 1) ---+               |
                                   | ----> Socket 0
CPU2 (Thread 2) ---+               |
                   |---> Core 1 ---+
CPU3 (Thread 3) ---+

While this sounds very simple, it's actually fraught with a number of little niggles. Not only did it require getting down and dirty with a bit of C++ and X86 Assembly, it also required writing a MacOS kernel extension.

The first step was to understand what information was available from the CPU. Intel exposes an instruction called CPUID. This is the primary mechanism for getting information about the CPU. There is a raft of information available, from a listing of the CPU features available (e.g. hyperthreading) to the sizes of the various levels of cache and the associated cache lines. To access the CPUID instruction we need a little bit of inline assembler.

The code shows how to get the vendor string from the CPU. On my Mac I get the following:

// Output:
Vendor String: GenuineIntel

For those unfamiliar with Intel inline assembly, the Intel CPU defines a number of registers. The ones used for the CPUID instruction are EAX, EBX, ECX, and EDX (referenced as RAX, RBX, etc. if using 64 bit instructions via the REX extension). These are used for both input and output. An inline asm segment consists of 3 parts. The first part is the instruction to be executed, in this case the "cpuid" instruction. The second part defines the output parameters. The snippet "=a" (data[0]) means store the result from the EAX register in the variable data[0]. The "a" refers to the 2nd letter of the register designation. The 3rd and final part is the input parameters. The CPUID instruction takes 2 parameters, one in EAX and one in ECX.

The particular CPUID reference that provides the information needed to build the topology is 0xB (11) - the extended topology enumeration leaf. The data returned from this instruction is:


      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
EAX  |  Shift  |                      Reserved                       |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
EBX  |   No. Process at this level   |           Reserved            |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
ECX  |   Level No.   |  Level type   |           Reserved            |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
EDX  |                           x2APIC ID                           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The extended topology enumeration leaf is one of the CPUID indexes that also makes use of ECX as an input parameter. This indicates the level of the CPU that you wish to work at. I started at level 0 and worked my way up. The 'Level type' describes whether the current level is a CPU thread (1) or a physical core (2) or invalid (0). All values greater than 2 are reserved, presumably for later use. The 2 other useful values are the x2APIC ID and Shift. The x2APIC ID is a unique identifier for the smallest logical unit in the CPU. The Shift value is used to group x2APIC ID values into units at the next logical level. This is done by shifting the x2APIC ID right by the value specified by Shift. For example, using the following code on my workstation (2 sockets, 8 cores, 16 threads):

Outputs the following:


Shift: 1, Count: 2, Level Id: 0, Level Type: 1, x2APIC: 33, Group: 16
Shift: 5, Count: 8, Level Id: 1, Level Type: 2, x2APIC: 33, Group: 1

This indicates that the hardware thread indicated by APIC 33 has the core id of 16 and socket id of 1. The socket and core ids aren't really references, but values indicating that threads which have the same id share that unit. This gets me most of the way there, I can group all of my threads into a hierarchy of cores and sockets. However there is one small niggle remaining. I need this code to run on all of the CPUs on my system. On Linux this is reasonably simple, you count the number of CPUs then iterate through all of them and use pthread_setaffinity_np(...) to specify which CPU to run the CPUID instructions on.
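The grouping arithmetic is just a right shift, which can be sketched as below. The class and method names are made up for illustration, and the ids 32 to 35 are hypothetical neighbours of the APIC id 33 shown in the output above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TopologyIds {
    /** Shifts an x2APIC ID right by a level's Shift value to get the id of the enclosing unit. */
    static int groupId(int x2ApicId, int shift) {
        return x2ApicId >>> shift;
    }

    /** Groups x2APIC IDs by the unit they share at the level described by shift. */
    static Map<Integer, List<Integer>> groupBy(int[] x2ApicIds, int shift) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int id : x2ApicIds) {
            groups.computeIfAbsent(groupId(id, shift), key -> new ArrayList<>()).add(id);
        }
        return groups;
    }
}
```

With a Shift of 1, APIC id 33 lands in core group 16, and with a Shift of 5 it lands in socket group 1, matching the two output lines.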

However, on Mac OS X actual thread affinity is not supported; there is just a mechanism for logically grouping or separating threads, powerful in its own right, but not what I'm after here. This is where the Kernel module comes in. The XNU Kernel defines a funky little method called mp_rendezvous(...). This method takes in a number of function pointers. The key one (action_func) is run once on each of the CPUs. So to get the topology information for all of the CPUs we can use it like so:

Because the method mp_rendezvous() is defined in the kernel the code above needs to be packaged as a kernel extension. Even then, getting access to the method is interesting. It's not defined in a header that can be easily included, however it is available at link time when compiling a kernel module. Therefore in order to compile correctly, it's necessary to provide your own forward declaration of the method. The same is true of cpu_number(). Calling into the kernel method from user space requires use of the IOKit framework, but I'll leave the details of that as an exercise for the reader.

by Michael Barker (noreply@blogger.com) at January 14, 2012 10:15 AM

January 13, 2012

Concurrent Sequencing

A few weeks ago one of the users of the Disruptor posted some worrying benchmarks:

ThreePublisherToOneProcessorSequencedThroughputTest
run 0: BlockingQueue=645,161 Disruptor=1,772 ops/sec
run 1: BlockingQueue=1,250,000 Disruptor=20,000,000 ops/sec
run 2: BlockingQueue=1,250,000 Disruptor=56 ops/sec

It appears under heavy contention with fewer available cores than busy threads the Disruptor can perform terribly. After a bit of investigation I managed to isolate the problem. One of the most complex parts of the Disruptor is the multi-threaded claim strategy. It is the only place in the Disruptor where - out of necessity - we break the single-writer principle.

The approach that we used was very simple. Each thread claims a slot in the ring buffer using AtomicLong.incrementAndGet(). This ensures that each claim will return a unique sequential value. The complexity arrives when the multiple threads try to publish their sequence. We require that all events placed in the ring buffer must be made available to event processors in a strictly sequential order. To ensure this behaviour we have a method called serialisePublishing(). Our simple implementation would have the thread that is publishing busy spin until the last published sequence (the cursor) is one less than the value being published.

This works because each sequence published is unique and strictly ascending. For example if one thread wants to publish the value 8, it will spin until the cursor value reaches 7. Because no other thread will be trying to publish value 8 it can make progress and ensure the sequential publishing behaviour in the process. However, this busy spin loop causes problems when there are more threads than cores. The threads that need to wait for the prior sequences to be published can starve out the thread that should be updating the cursor. This leads to the unpredictable results shown above.

We need a better solution. In an ideal world there would be a Java API that would compile down to the Intel MONITOR/MWAIT instructions, unfortunately they're limited to Ring 0, so require a little kernel assistance to be useful. Another useful instruction (also unavailable in Java) would be the Intel PAUSE instruction, which could be used in the middle of the spin loop. One of the problems with busy loops on modern processors is that, in order to keep the pipeline full, the CPU may speculatively execute the condition at the top of the loop, causing an unnecessarily high number of instructions to fill the CPU pipeline. This can starve other logical threads of CPU resources. The PAUSE instruction on hyper-threaded Intel processors can improve this situation.

Java has neither of those, so we need to go back and address the shape of the serialisePublishing method. For the redesign I drew some inspiration from Cliff Click's non-blocking hash map. There are 2 aspects of his design that are very interesting:

  • No locks
  • No CAS retries

While the first is obvious, the second is trickier. Anyone familiar with CAS operations will have seen the traditional loop until success approach to handling concurrent updates. For example the incrementAndGet method inside the AtomicLong in Oracle's JVM uses a similar loop. It could look something like1:
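As a minimal sketch of that retry-until-success idiom (this is not the JDK's actual source, and the class name CasCounter is made up for illustration):

```java
import java.util.concurrent.atomic.AtomicLong;

final class CasCounter {
    private final AtomicLong value = new AtomicLong();

    long incrementAndGet() {
        for (;;) {
            long current = value.get();
            long next = current + 1;
            // Only succeeds if no other thread changed the value since we read it.
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // Another thread won the race: go round again and retry.
        }
    }
}
```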

While there are no locks here it is not necessarily wait-free. It is theoretically possible, if a number of threads are trying to increment the value, for one (or more) of the threads to get stuck unable to make progress if other threads are constantly winning the race to the atomic operation. For an algorithm to be wait free all calls must complete within a fixed number of operations. One way to get closer to a true wait free algorithm is to design the algorithm such that a failure of a CAS operation is a signal to exit rather than to retry the operation. Cliff Click's approach was to model the algorithm using a state machine, where all states are valid and a transition between states is typically a CAS operation. E.g. imagine a state machine with 3 states {A, B, C} and 2 transitions {A->B, B->C}. If an instance of the state machine is in state A and 2 threads try to apply the transition A->B only one will succeed. For the thread that fails to apply its CAS operation retrying the operation makes no sense. The instance has already transitioned to state B. In fact the failure of the CAS operation is an indication that the instance is already in the desired state. The thread can exit if B is the desired state of the action or try to apply the B->C transition if that is what's required.
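The {A, B, C} example can be sketched with an AtomicReference. The class below is purely illustrative (none of these names come from Cliff Click's code); the point is that each transition is attempted exactly once and a failed CAS is simply reported rather than retried.

```java
import java.util.concurrent.atomic.AtomicReference;

final class StateMachine {
    enum State { A, B, C }

    private final AtomicReference<State> state = new AtomicReference<>(State.A);

    /** Tries A->B once. False means the instance had already left state A. */
    boolean advanceToB() {
        return state.compareAndSet(State.A, State.B);
    }

    /** Tries B->C once. False means the instance was not in state B. */
    boolean advanceToC() {
        return state.compareAndSet(State.B, State.C);
    }

    State current() {
        return state.get();
    }
}
```

A caller whose advanceToB() returns false has nothing to retry: it can either stop or go on to attempt B->C.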

How does this apply to our concurrent sequencing problem? We could allow threads to continue to make progress while waiting for other threads to catch up by maintaining a list of sequences that are pending publication. If a thread tries to publish a sequence that is more than 1 higher than the current cursor (i.e. it would need to wait for another thread to publish its sequence) it could place that sequence into the pending list and return. The thread that is currently running behind would publish its own sequence, then check the pending list and publish those sequences before exiting.

To represent this as a state machine we would have 3 states {unpublished, pending, published} and 2 transitions {unpublished->pending, pending->published}. In recognition of the fact that computing resources are finite, we have a guard condition on the unpublished->pending transition. I.e. a limit on the number of sequences we allow in the pending state. Because each sequence is unique, the transition unpublished->pending does not require a CAS operation. The pending list is represented as an AtomicLongArray and the transition is a simple AtomicLongArray.set() where the index is the sequence modulo the size of the pending list2. The final transition pending->published is where the CAS operation comes in. The thread will first try to publish its own sequence number. If that passes then the thread will try to publish the next value from the pending list. If the CAS fails the thread leaves the method. The failure means that the value is already published or will be by some other thread.
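Put together, the two transitions might look roughly like the sketch below. It is a simplified illustration and not the Disruptor's actual code: it publishes one sequence at a time (no batching), uses a plain AtomicLong for the cursor, and PendingPublisher and its method names are made up here.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

final class PendingPublisher {
    private final AtomicLong cursor = new AtomicLong(-1);
    private final AtomicLongArray pending;
    private final int mask;

    PendingPublisher(int sizePowerOfTwo) {
        pending = new AtomicLongArray(sizePowerOfTwo);
        mask = sizePowerOfTwo - 1;
        for (int i = 0; i < sizePowerOfTwo; i++) {
            pending.set(i, -1); // no sequence is pending yet
        }
    }

    void publish(long sequence) {
        // Guard condition: limit how far ahead of the cursor a sequence may go.
        while (sequence - cursor.get() > pending.length()) {
            Thread.yield();
        }
        // unpublished -> pending: each sequence is unique, so no CAS is needed.
        pending.set((int) sequence & mask, sequence);

        // pending -> published: try our own sequence, then any that follow it.
        long expected = sequence - 1;
        long next = sequence;
        while (cursor.compareAndSet(expected, next)) {
            expected = next;
            next++;
            if (pending.get((int) next & mask) != next) {
                break; // the following sequence has not been marked pending
            }
        }
        // A failed CAS means another thread has published, or will publish, this sequence.
    }

    long cursor() {
        return cursor.get();
    }
}
```

Publishing 1 and 2 before 0 leaves the cursor untouched; the thread that finally publishes 0 then drags the cursor all the way to 2.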

Running the multi-publisher performance test on my 2-Core laptop (where at least 4 threads would normally be required):

ThreePublisherToOneProcessorSequencedThroughputTest
run 0: BlockingQueue=5,832,264 Disruptor=6,399,590 ops/sec
run 1: BlockingQueue=5,521,506 Disruptor=6,470,816 ops/sec
run 2: BlockingQueue=5,373,743 Disruptor=6,931,928 ops/sec

Sanity restored.

This update will be included in the 2.8 release of the Disruptor as the default implementation of the MultiThreadedClaimStrategy. The old implementation will still be available as MultiThreadedLowContentionClaimStrategy. Those who have plenty of cores where the publishers aren't often contended may find the old implementation faster, which it should be as it is simpler and requires fewer memory barriers. I'm going to continue to revise and work on this code. While improved, it is not truly wait free. It is possible for one of the threads to get stuck doing all of the publishing.

1 The AtomicLong.getAndIncrement() does use a slightly different loop structure but the semantics are the same.
2 Actually it's a mask of the sequence and the pending list size minus 1. This is equivalent when the size of the pending list is a power of 2.

by Michael Barker (noreply@blogger.com) at January 13, 2012 07:58 PM

January 11, 2012

Development Mode – Concatenating Scripts and CSS

HTML 5 Boilerplate reminded me of an old-school tool which is extremely useful for concatenating JS and CSS files on the fly – server side includes:

<FilesMatch ".*\.combined\.(js|css)$">
  Options +Includes
  SetOutputFilter INCLUDES
</FilesMatch>

Then you have a main scripts.combined.js (or css) which contains:

<!--#include file="libs/backbone-min.js" -->
<!--#include file="libs/underscore-min.js" -->
<!--#include file="libs/jquery-1.7.1.min.js" -->

Plus any other files you need, in the order you specify. This works great for development mode so you can change a file on the fly and just refresh the browser without running any kind of build script. When it comes time to push to production, it’s easy for a build script to process the file ahead of time and be confident that you’ll get exactly the same result.
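The build-time step can be as small as expanding the include directives yourself. The sketch below is one hypothetical way to do it: IncludeExpander and the readFile callback are made-up names, and it only handles the simple file="..." form shown above rather than everything Apache's include filter supports.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IncludeExpander {
    // Matches directives of the form <!--#include file="path" -->
    private static final Pattern INCLUDE =
        Pattern.compile("<!--#include\\s+file=\"([^\"]+)\"\\s*-->");

    /** Replaces each include directive with whatever readFile returns for its path. */
    static String expand(String combined, Function<String, String> readFile) {
        Matcher matcher = INCLUDE.matcher(combined);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String contents = readFile.apply(matcher.group(1));
            // quoteReplacement stops $ and \ in the file contents being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(contents));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

In a real build script readFile would read the named file relative to the combined file; taking it as a function keeps the sketch independent of the file system.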

by Adrian Sutton at January 11, 2012 09:37 PM

January 03, 2012

Cross Pairing

This evening I stumbled across two interesting posts about alternate layouts for pairing rather than sitting side by side. The first is Tom Dale talking about Tilde’s corner desk pairing setup (and some of the software setup they use), which was inspired by the second, Josh Susser’s face to face pairing setup at Pivotal Labs.

Both approaches require more floor space which makes them difficult to set up but I would expect the face to face pairing to be a heck of a lot better if done well. I’ve always preferred having separate screens in mirror configuration as well as separate keyboards and mice to allow the developers to sit a little further apart to be comfortable and to be able to look straight ahead at the screen. That said, I quite like having a second monitor for spreading out windows as we have at LMAX so it’s not clear cut which is better.

It’s also interesting to note the popularity of the flat screen iMacs as opposed to Mac Pros or laptops. The former being too expensive for extra power and extensibility that generally isn’t required and the latter often being a bit too individualised to be good as pairing machines. Plus laptops, while amazingly powerful these days still have less bang for the buck and the reduction in performance is just enough to matter for development.

by Adrian Sutton at January 03, 2012 09:43 PM

January 01, 2012

Bottlenecks in Programmer Productivity

Yan Pritzker recently posted “5 ways to get insane productivity boosts for coders” which provides some good tips on how to improve your usage of tools when doing technical work. To summarise:

  • Never look when you can search
  • Don’t repeat yourself (by setting up shortcuts and command line aliases)
  • Learn a scripting language
  • Learn an editor (and use that editor everywhere)
  • Learn regular expressions

However, nearly all of these tips really boil down to how to be more productive at writing text and the mechanics of writing code – editing actual source code files, jumping to different places in those files, executing commands more efficiently etc. Are these really the tasks that consume the vast majority of a developer’s time?

While often it feels like they are, after all we spend all day working with a text editor or the command line, are we really spending all that time struggling to keep up with the stream of code our brains are trying to output? Personally, I spend a lot more time thinking about what the best algorithm is, what direction we should be pushing the design in and how to do that and so on. Rarely do I need to type at full speed for extended periods of time.

Given that, I suspect the real bottlenecks in developer productivity are more along the lines of:

  • Comprehending an area of code and its design
  • Identifying, evaluating and selecting potential solutions to a problem (be that choice of algorithm, design choices or choice of classes/libraries etc)
  • Understanding and sharing a design vision within the team
  • Understanding and sharing requirements within the team

This isn’t an exhaustive list but it should give an idea of the types of things that really limit programmer productivity. In other words, someone who types with two fingers can be much more productive than someone who knows every shortcut in their editor if they are able to identify a simpler solution to the problem.

For example, recently LMAX needed to make a large and potentially quite risky change to how our system worked. The change affected a core concept in the system and so could have knock-on effects to a wide range of components and be very hard to test effectively. At the start of our two week iteration we already had a potential solution planned out but the developers were uncomfortable with the amount of risk involved with that plan. So we spent a full week discussing options, exploring the code and experimenting, plus talking with business experts to get a better understanding of the domain model. All that discussion led us to find a much simpler solution which, if something was missed, would be sure to cause failures in our existing acceptance tests. Even counting that initial week of discussion, finding the right solution meant we could deliver the change in about half the time we had expected.

So by all means, spend time learning to use your tools better, it avoids a lot of tedious boring work, but if you really want a step change improvement in productivity, work to improve your communication, design and thinking skills.

by Adrian Sutton at January 01, 2012 08:40 PM

Searching on Google Maps Does Nothing In Chrome

If you’re experiencing an issue when using Chrome with Google Maps where typing in a location and hitting enter either does nothing or just says “Loading…” forever, you’ve very likely hit upon a bug in the Skype click-to-call extension.

Skype automatically installs this “add-on” when you install Skype and it adds links to phone numbers so that you can click them to call on Skype. Unfortunately, Google Maps seems to be triggering a bug in the extension and so it is either corrupting the data returned to the Google Maps javascript or just preventing the request from ever returning.

To solve the issue, just uninstall the Skype click to call add-on. I did that by right clicking the Skype icon in the toolbar and selecting “Uninstall” but it should be possible to remove it from the about:extensions page.

I also found a Skype click-to-call plugin in about:plugins that I disabled as well – I’m not sure if it’s removed along with the extension.

by Adrian Sutton at January 01, 2012 07:58 PM

December 31, 2011

Broken Post

Some observant readers noticed that I had put up a post yesterday and it was gone by the time they went to read it.  This was because I accidentally pressed the publish button instead of save when I was only half way through writing the post.  I've since deleted it and I'm working on finishing it off in the next day or so.  Watch this space.

by Michael Barker (noreply@blogger.com) at December 31, 2011 08:04 PM

December 30, 2011

Concurrent Sequencing

A few weeks ago one of the users of the Disruptor posted some worrying benchmarks:

ThreePublisherToOneProcessorSequencedThroughputTest
run 0: BlockingQueue=645,161 Disruptor=1,772 ops/sec
run 1: BlockingQueue=1,250,000 Disruptor=20,000,000 ops/sec
run 2: BlockingQueue=1,250,000 Disruptor=56 ops/sec

It appears under heavy contention with fewer available cores than busy threads the Disruptor can perform terribly. After a bit of investigation I managed to isolate the problem.

One of the most complex parts of the Disruptor is the multi-threaded claim strategy. It is the only place in the Disruptor where - out of necessity - we break the single-writer principle. The approach that we used was very simple. Each thread claims a slot in the ring buffer using AtomicLong.incrementAndGet(). This ensures that each claim will return a unique sequential value. The complexity arrives when multiple threads publish their sequences. We require that all events placed in the ring buffer must be made available to event processors in a strictly sequential order. To ensure this behaviour we have a method called serialisePublishing(). Our simple implementation would have the thread that is publishing busy spin until the last published sequence (the cursor) is one less than the value being published.

    public void serialisePublishing(final long sequence,
                                    final Sequence cursor,
                                    final int batchSize)
    {
        final long expectedSequence = sequence - batchSize;
        while (expectedSequence != cursor.get())
        {
            // busy spin
        }

        cursor.set(sequence);
    }

This works because each sequence published is unique and strictly ascending. For example if one thread wants to publish the value 8, it will spin until the cursor value reaches 7. Because no other thread will be trying to publish value 8 it can make progress and ensure the sequential publishing behaviour in the process. A problem arises

by Michael Barker (noreply@blogger.com) at December 30, 2011 11:18 PM

December 23, 2011

More Complexity and Fork/Join

Firstly an apology.  On my previous blog, I mentioned that a string splitting algorithm implemented in Scala had a complexity of $O(n^2)$.  One commenter mentioned that they did not understand how I came to that calculation.  I thought I should revisit my guess work and actually do a more thorough analysis.  What I found was interesting, I had overstated the complexity, in reality it was $O(n.\log_{2}(n))$.  I've included my working here.

Complexity of the Scala Implementation

In the Scala implementation the list concatenation operation is $O(n)$.  I've simplified the model such that the factors applied are different, but that shouldn't have a bearing on the complexity.  Fork/Join algorithms work on a divide and conquer model, which in its simplest form looks much like a binary tree.  To calculate the complexity of the reduction part of a Fork/Join algorithm we need to sum the cost of all of the operations needed to reduce the dataset to a single result.  If we start with the base set of partial results, for the sake of an example assume there are 8, then the first step is to reduce them to 4.  The second step takes the 4 partial results to create 2.  The third and final step takes the 2 results to create the final solution.


So if we have a dataset of size $n$ and we are using Scala's default list implementation, the cost to perform the reduction is:
$$1\frac{n}{2} + 2\frac{n}{4} + 4\frac{n}{8} + ... + \frac{2^k}{2}.\frac{n}{2^k}$$
where $k = \log_{2}(n)$.  At step k, $\frac{n}{2^k}$ represents the number of operations, $\frac{2^k}{2}$ is the cost of each operation. We can eliminate $2^k$ and express the sum using sigma notation:
$$\sum_{k = 1}^{\log_{2}(n)} \frac{n}{2}$$
Applying the sum we get:
$$\frac{n}{2}.\log_{2}(n)$$
This gives a complexity of $O(n.\log_{2}(n))$.  It is not nearly as bad as the $O(n^{2})$ I mentioned in the previous blog.  It is still worth avoiding as the benefit of applying a fixed number of multiple cores (i.e. applying a constant factor to the cost) will be outweighed by the non-linear increase in cost as the dataset increases.
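As a quick sanity check of the formula, take the example of 8 partial results from earlier (so $n = 8$, a value picked only because the arithmetic stays exact).  There are $\log_{2}(8) = 3$ steps, with 4, 2 and 1 operations costing 1, 2 and 4 respectively, so each step contributes $\frac{8}{2} = 4$:
$$4.1 + 2.2 + 1.4 = 12 = \frac{8}{2}.\log_{2}(8)$$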

Complexity of the Fortress Implementation

However, the Fortress version presented by Guy Steele doesn't have $O(n)$ complexity for each of the reductions.  It uses a PureList based on finger trees which has $O(\log_{2}(n))$ mutation operations.


The cost of the computation at each step breaks down differently, summing the cost of the computation looks like the following:
$$\log_{2}(1)\frac{n}{2} + \log_{2}(2)\frac{n}{4} + \log_{2}(4)\frac{n}{8} + ... + \log_{2}(2^{k - 1})\frac{n}{2^k}$$
where $k = \log_{2}(n)$.  At step k, $\frac{n}{2^k}$ represents the number of operations and $\log_{2}(\frac{2^k}{2})$ is the cost of each operation, giving the sum:
$$T(n) = \sum_{k = 1}^{\log_{2}(n)} \log_{2}(2^{k - 1})\frac{n}{2^k}$$
Fortunately this simplifies:
$$T(n) = \sum_{k = 1}^{\log_{2}(n)} (k - 1)\frac{n}{2^k}$$
The following series $\displaystyle S = \sum_{k = 1}^{\infty} \frac{k - 1}{2^k}$ converges to 1.  So as $ n \rightarrow \infty$, $ T(n) \rightarrow nS$.  There is some math (see below) to show that this translates into $O(n)$ complexity, however I find that it is easier to represent visually.


The pink line is the upper bound on the complexity $nS$ and the blue line is the sum of $T(n)$.  So, contrary to my statement in the previous post, using an $O(\log_{2}(n))$ operation to perform the reduction part of a Fork/Join algorithm won't introduce any complexity issues.

I learnt quite a lot in working through the complexity calculations, the most important of which is not to jump to making a statement about the complexity of an algorithm.  Often it's not as simple as you may think and it requires a little bit of rigour.

As to the original problem, I'm continuing to experiment with different parallel implementations to see how they perform on a couple of different multi-core systems.  The fact remains that the simple imperative version is still much faster than the parallel implementation.  I am working on a couple of different approaches using mutable intermediate results.

For those who are interested, here is the math.  I would like to say I did this all myself, but had a lot of help from the Internet elves at http://math.stackexchange.com/.


$T_n=nS_i(x)$ for $i=\lfloor \log_2(n)\rfloor$ and $x=\frac12$, where, for every $i$ and $x$,

$$S_i(x)=\sum_{k=1}^i(k-1)x^k=\sum_{k=0}^{i-1}kx^{k+1}=x^2U'_i(x),\quad U_i(x)=\sum_{k=0}^{i-1}x^{k}.$$
The function $x\mapsto U_i(x)$ is the sum of a geometric series, hence, for every $x\ne1$,
$$U_i(x)=\frac{1-x^i}{1-x},\qquad U'_i(x)=\frac1{(1-x)^2}(1+ix^i-x^i-ix^{i-1}).$$
Using $x=\frac12$, this yields
$$T_n=n(1-(i+1)2^{-i}).$$
Since $i=\lfloor \log_2(n)\rfloor$, one gets
$$n-2(\log_2(n)+1)<T_n<n-\log_2(n),$$
hence $T_n=n-O(\log_2(n))$. The sequence of general term $(T_n-n)/\log_2(n)$ has the interval $[-2,-1]$ as limit set.
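For anyone who prefers to check the algebra numerically, here is a small sketch that compares the direct sum with the closed form $n(1-(i+1)2^{-i})$.  The class and method names are made up for illustration; nothing here comes from the original analysis beyond the two formulas.

```java
final class ForkJoinCost {
    /** floor(log2(n)) for n >= 1. */
    static int floorLog2(long n) {
        return 63 - Long.numberOfLeadingZeros(n);
    }

    /** Direct sum: T(n) = sum over k = 1..i of (k - 1) * n / 2^k, with i = floor(log2(n)). */
    static double directSum(long n) {
        int i = floorLog2(n);
        double total = 0.0;
        for (int k = 1; k <= i; k++) {
            total += (k - 1) * (double) n / Math.pow(2, k);
        }
        return total;
    }

    /** Closed form: T(n) = n * (1 - (i + 1) * 2^-i). */
    static double closedForm(long n) {
        int i = floorLog2(n);
        return n * (1.0 - (i + 1) / Math.pow(2, i));
    }
}
```

For $n = 1024$ both give 1013, i.e. $n$ minus a term that only grows with $\log_2(n)$.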




by Michael Barker (noreply@blogger.com) at December 23, 2011 11:56 AM

Blog Rename and Video Links

I've decided to rename my blog.  I plan to focus my blog efforts more on concurrency than any other particular subject and I thought it would be fun to pay homage to the bad science blog by Ben Goldacre.

It will probably kill my traffic for a while, until I get the few blog aggregators that carry my blog to update their feeds.  So welcome to my rebranded blog, I'll start posting again in the next few weeks.  In the mean time here are some links to videos from the various conferences that I spoke at over the past month or 2.


  • LJC @Playfish - Beginner's Guide to Concurrency (the trial run)
  • JAX London - Beginner's Guide to Concurrency
  • Devoxx - A tools in action session on the Disruptor (this was an experiment that didn't work very well, an attempt at live coding)

by Michael Barker (noreply@blogger.com) at December 23, 2011 07:15 AM

December 22, 2011

Video: Why we shouldn't target women

If you have a Parleys subscription, you can watch the whole "Why we shouldn't target women" panel from Devoxx 2011 a month or so ago.  Watch me attempt to monopolise the whole panel as if it was my idea or something...

by Trisha (noreply@blogger.com) at December 22, 2011 03:53 PM

Order and Execution Events

Quite recently we had a customer come to us with a question regarding the Execution events received on the API.  The question was: "How do I recognise the last execution event for a given order event?"  This is a little bit tricky in the Java API as it does not behave exactly the same way that FIX does.  E.g. if an individual order with a quantity of 30 aggressively matches multiple price points (quantity of 10 at each) on the exchange, a FIX user would expect:

MsgType(35)=8, ExecType(150)=New(0), CumQty(14)=0
MsgType(35)=8, ExecType(150)=Trade(F), CumQty(14)=10
MsgType(35)=8, ExecType(150)=Trade(F), CumQty(14)=20
MsgType(35)=8, ExecType(150)=Trade(F), CumQty(14)=30

As our API is based upon our XML protocol the information that we can return on the API is restricted to what we receive in those events. For a similar scenario our XML protocol only emits a single order event at the end of the matching cycle, therefore the data output would be:


So the Java API does not have the information about the individual filled quantities as a result of each execution, only the final state of the order. This can make it a little tricky to find which execution event represents the end of the matching cycle. The information seen by a user of the Java API for the same scenario would be:

getQuantity() = 10, getOrder().getFilledQuantity() = 30
getQuantity() = 10, getOrder().getFilledQuantity() = 30
getQuantity() = 10, getOrder().getFilledQuantity() = 30

However, it is possible to use some of the additional events that are available on the Java API to derive the same behaviour. By listening to both Execution and Order events we can track the cumulative quantities as we go. It does require a little bit more state management by the client, but the logic is fairly straightforward.
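The state management could look something like the sketch below. It deliberately avoids the real API types: ExecutionTracker, the order id and the plain long quantities are all assumptions made for illustration, and the caller is expected to pull the three values out of whatever events it receives.

```java
import java.util.HashMap;
import java.util.Map;

final class ExecutionTracker {
    // Running total of executed quantity, keyed by a (hypothetical) order id.
    private final Map<Long, Long> cumulativeByOrder = new HashMap<>();

    /**
     * Records one execution and returns true when the running total has
     * reached the order's filled quantity, i.e. this was the last execution
     * of the matching cycle.
     */
    boolean onExecution(long orderId, long executionQuantity, long orderFilledQuantity) {
        long cumulative = cumulativeByOrder.merge(orderId, executionQuantity, Long::sum);
        return cumulative == orderFilledQuantity;
    }
}
```

For the scenario above the three executions of 10 against a filled quantity of 30 would report false, false and then true.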

by Michael Barker (noreply@blogger.com) at December 22, 2011 03:35 PM

December 21, 2011

How to make your CV Not Suck

When you're applying for a job at LMAX, your CV (or résumé, for our American readers) usually comes through me and I decide whether to call you for a technical phone screen.

I'm going to let you into a secret.

I'm going to tell you the criteria I use when judging your CV.

Now, you could say this is a foolish thing for me to do, because now when you apply you'll be "cheating" and writing your CV to pass these guidelines.

Good.

LMAX isn't the only company that's going to judge your CV based on these criteria. I firmly believe that an increase in quality of the CVs in our industry can only be A Good Thing.  An increase in the quality of your CV is definitely A Good Thing for you.

Even more importantly, if I get CVs that do not pass these basic criteria, now I know you either don't read the LMAX blogs (shame on you), or you're not able to follow simple instructions (bodes poorly for your ability to learn within the company).

The thing that you have to keep in mind when you're writing your CV is that the reader really does spend less than a minute reading it.  It's not fair, true.  But it's the way humans are. I'm not in HR or recruitment, I have a proper job as a software developer, and I need to get back to that as soon as I can.  When I get CVs in batches of up to 12, as I regularly do, I'm not free to spend more than 10 minutes going through all of them.

The Easy Stuff
You must be able to spell
You really must.  There are things called Spell Checkers and they are amazing.  Some of these new-fangled pieces of software even show you your errors in this cool squiggly red underline in your document.

I'm reading your CV in Open Office, and if I see red squigglies under words that aren't technologies or acronyms I'm going to wonder how good your attention to detail is.

You must use capital letters in the appropriate places
It's traditional to start a sentence with a capital.  It's also traditional to use a capital "I" not "i" when referring to oneself.  We're not 14 years old, we're not writing an SMS to our mates.  We're applying for a proper job paying proper money.

Correct grammar is appreciated
Whether you're a native English-speaker or not, you need to get someone else who is a native English-speaker to check the prose in your CV to see if it scans correctly.  For me, it's not about being prejudiced against you because you're not a natural author, it's a) attention to detail again and b) your ability to make yourself understood.  If your sentence construction, choice of words or simple comma placement is off, I'll have to read that sentence a couple of times to parse it and it's going to trip me up and ruin my flow.  I want to get a good feel for you from reading your CV, so if I stumble a few times I'm not going to feel like I connected with you.

Harder and fluffier
I don't care which versions of Spring you've worked with
I know you need a checklist of technologies on your CV so it gets past the non-technical recruitment agents and gets picked up via automated searches.  This is a bigger problem with our industry than one I want to tackle right now.  So I'll let you off having buzzword bingo on your CV.  However, your CV needs to be more than just a list of technologies you have used vaguely, or perhaps once read about.

It's useful to me if a) you put the technology check list in a single place on your CV, b) you give an indication of your level of proficiency in that technology (novice/competent/master) or length of time you've used it in a commercial environment, and c) you organise them in some useful fashion - preferably the ones that are appropriate to the job you're applying for near the top, or at least those you're happiest with at the top.  Alternatively put the checklist of technologies next to the role you used them in.

Often I will completely ignore this section because I'm more interested in your ability to learn and your passion for what you do.

I want to know about your passions
In the old days I used to fast forward to your hobbies and interests, but these days we're encouraged not to put those on the CV in case you're judged against them.  Which seems like political correctness gone crazy, but then when you think about it you can infer a lot about a person from their hobbies and interests, and therefore you could be pre-judging them based on criteria that are not at all associated with their ability to do the job.  For example, if they have hobbies that take them all over England I might infer they have a car and can drive - OK, it's a dumb example, but you get the idea.

These days, given that I'm trying to find great team members to work with me at LMAX, I'm looking for things like: your blog; any contributions to open source software; your involvement in a Java User Group (or other extra-curricular activity).  I'm not going to discard you if you don't have any of these things, but if you do it's definitely extra brownie points for you.

I want to know if you worship at the altar of technology, or if you're business-value driven
Either of these things is fine - we need people who are very business-focussed and people who are rabid about technology, as well as all those in between, to build a good team.  Another axis of interest is people/process - are you passionate about people, about building a good team, about helping them to deliver?

Getting a feel for where you sit on these axes is not for me to discard you, but if you look like you're strongly in one of these camps and I feel like we need a team member to really push that area, then you stand a much better chance of getting a phone interview.

I'll get an indication of where you are by the way you talk about your roles and your achievements.  This does not help me:
Senior Developer on a web administration application.  Product was implemented using JavaScript, HTML, Spring, Hibernate, JMS, and MySQL.
This is much more useful:
I was part of a team of four developers implementing a web based administration application, commissioned to enable internal users to update the settings of our reporting tool.  This saved the support staff approximately 4 hours every week, as they no longer needed to manually update the database. We used agile techniques such as daily standups and weekly iterations in order to provide quick feedback to the business.
(I made both of those up, by the way, before anyone starts trying to sue me for stealing something off their CV).

Here I can see:
  1. The size of the team, and your ability to work in a team
  2. You understood the business need you were trying to fulfill
  3. You have worked in an agile environment and at least pay lip service to why you were working that way.
I don't really care about the specific technologies you used; the fact that you mentioned web-based and database gives me enough of a feel.

Sometimes prospective employers really do stalk you
Personally I think claims that prospective employers will check every facet of your web presence are somewhat exaggerated.  If I barely have 60 seconds to read your CV, I'm not going to check you out on Facebook; my life is too short.

However, if you claim to have written a book I will look it up on Amazon.  If you have a publication or example code, I will glance at those.  If you've worked for a company I've worked for in the past, I'll look you up on LinkedIn to see if we have any common connections (or worse, to see if I should remember you and simply don't).  I'll also use LinkedIn if your CV is not screaming yes or no, to see if there's an extra dimension in your profile which will tip me one way or the other.

So be aware of your web presence, particularly something that is aimed at your professional image like LinkedIn, and make sure it represents you the way you want it to.

In Conclusion
This post might be simply a good way to increase my own workload - every CV I get from now on may be an automatic pass, and then I have to call all of you before I can start weeding you out.

But I don't mind too much about that.  I get concerned sometimes that good people are not getting the interviews they deserve, not just at LMAX but across the industry, because they get almost no good CV advice.  Frequently the people who are the first to read CVs are agencies who are not technologists.  By all means, have words on there that will make your CV appear on their search results.  But you need to put something on there for me, a real developer, because strings of keywords tell me nothing about you.

If I can improve the quality of just one person's CV with this post, I'm happy.  If I have given you that first step towards that job you really want, then that's even better.

by Trisha (noreply@blogger.com) at December 21, 2011 05:46 PM

December 18, 2011

Might as well Jump!

Some years ago I was on a training course on lean thinking and one particular exercise has stuck in my mind ever since. Teams of six or so were given a piece of A4 paper and were told to arrange themselves so that none of them were in contact with the floor. The solution most people took was simply to get all of the team to stand on the paper at the same time. The tutor then folded the paper in half and told us to repeat the exercise. It was still possible for the team to stand on the paper, each on one foot. But the paper was folded again and again until people were resorting to climbing on one another or declaring the task impossible. Finally the tutor took the piece of paper away entirely. Suddenly one team realised that the paper was completely surplus to requirements and the goal could be achieved even with no paper at all by having the whole team jump on the count of three. The point of this exercise was to demonstrate “Muda”, or waste, and how the presence of a tool or resource tends to make people use it even if in practice it is not needed at all.

One of the tenets of lean production, which as you will know is one of the foundations of the approaches to software development collectively known as agile, is the reduction of waste in all its forms. Inventory that is not being used and work in progress (which is product that is not yet ready) are examples of one of the seven “wastes”. In software development code is inventory. If this code is not usefully in production, it is my conjecture that this is waste and we need to find ways to minimise it.

One way of addressing this waste will be familiar to any practitioner of agile development, namely the use of an iterative approach. One of the benefits this gives is that we deliver useful features into the production code every iteration, hence reducing the wasteful inventory of code that is work in progress. We also strive to prioritise our work so that the most important feature is delivered first, so that we do not put wasteful effort into less important features, and so that we do not “over deliver” excess inventory that is not yet needed.

You’ll hear it stated frequently that having a single product backlog that is ordered as a single queue is essential to a successful agile project. The idea is that work is done in strict priority order. In practice what often then happens is that each pair will grab the top available card in the backlog and begin work on it, the next pair will grab the second and so on. What this means is that in a team with six developers you suddenly have three cards running in parallel, typically all placing conflicting demands on your QAs, BAs and the customer at the same points in an iteration, which then leads to features being held up or bugs being found late in the day. Worse still, your top priority card could get held up because of some problem with a lower priority card that conflicts with it (either through necessity or some error in the solution).

A consequence is that some or all of the cards (and not necessarily the lower priority ones) end up incomplete at the time of cutting a release. Features that are ready, or perhaps not wanted yet, then have to be juggled. Methods to tackle this include “feature branching”, “branch-by-feature” and “feature switches”. These allow us to switch particular features on or off, either by clever use of version control systems or with a code switch to “turn off” code that is not ready. You will hear proponents of both approaches. But it seems to me that whichever approach is used the wrong problem is being addressed, in that they are all ways of dealing with excess code inventory, be it code that is not ready to put into production at the end of an iteration or code that delivers a feature that is not wanted yet. There are great tools to help with this, to be sure, but my point is that if this is needed at all:

(a) There is excess inventory either in the form of unfinished code or in the form of features delivered ahead of when they are wanted or needed

(b) A process has been introduced purely to deal with this excess inventory, which in itself is a form of waste.

Swarming

So what else can we do to deal with this excess inventory? Well the thing here is to eliminate the excess in the first place.

An alternative approach to the pair by pair grabbing of stories, and one which is getting some attention, is to have your entire team “swarm” on a story until it’s done. This basically means that the whole cross functional team (Devs/QA/BA/Customer) works on the top priority item together, breaking down the whole feature into small manageable tasks (often each of these takes less than a couple of hours for a pair to complete).

We have been swarming on stories for quite a while now and I would like to share my observations on some of the benefits we have seen:

  1. Our top priority story gets all the attention. This means it gets delivered first, and sooner than it would if the team were spread out across other stories. We have found that even unusually large stories, which we would previously have expected to take most of a two-week iteration, often get completed in a few days.

  2. The team delivers testable functionality early in an iteration, keeping QAs busy and revealing bugs quickly.

  3. The customer is focused on the item most important to them and has a clear view of the progress being made and what the team is working on.

  4. The team is more robust to absence, whether planned or unplanned, as everyone knows about the feature being developed. We have found, for example, that there is better pair rotation, and BAs and QAs get included more frequently in those pairs.

  5. The code delivered ends up coherent, as the context switching that can occur with pair rotation and multiple stories in play does not occur.

  6. And of course it encourages small, frequent commits and CI.

  7. We have found that we rarely need to “turn off” features with this approach. In the worst case you have a single story in play at the point you want to release. In practice this has not often occurred, but where it has, a relatively simple approach with feature toggling has sufficed. It’s only a single feature switched off, and it won’t be in production switched off for long; after all, it’s our top priority story for the next release.

In summary, I think that if you are finding the need to use advanced version control tools to manage feature branching, or regularly using coded feature toggles to manage multiple “in progress” features, you are probably suffering from Muda in your process, and I would highly recommend that you give swarming a try. After all, do you really need that piece of paper, or can you just jump?


by thinkfoo at December 18, 2011 08:30 PM

December 11, 2011

More Career Advice

Following on from my last post, Jason Adam Young has some excellent advice to help you continually get better and better at building software and, perhaps more importantly, be more and more valuable to whoever you happen to be working for.

by Adrian Sutton at December 11, 2011 05:44 PM

A Non-Blocking ConcurrentHash(Map|Trie)

Introducing ConcurrentHashTrie

While spending some time looking at Clojure's concurrency model, I did a little bit of research into their persistent collection implementation that uses Bagwell's Hash Array Mapped Tries.  After a little bit of thought it occurred to me that you could apply the persistent collection model to an implementation of ConcurrentMap.  The simple solution would be to maintain a reference to a persistent map within a single AtomicReference and apply a compareAndSet (CAS) operation each time an update to the map was performed.  In Clojure terms, an atom and a swap! operation.  However a typical use of ConcurrentMap is for a simple cache in a web application scenario.  With most web applications running with fairly significant numbers of threads (e.g. hundreds) it's conceivable that the level of contention on a single compare and set operation would cause issues.  Non-blocking concurrent operations like compare and set work really well at avoiding expensive user/kernel space transitions, but tend to break down when the number of mutator threads exceeds the number of available cores*.  In fact thread contention on any concurrent access control structure is a killer for performance.  The java.util.concurrent.ConcurrentHashMap uses lock-striping as a mechanism to reduce contention on writes.  If absolute consistency across the ConcurrentMap was sacrificed (more information below), then a form of "CAS striping" could be applied to the ConcurrentHashTrie to reduce contention.  This is implemented by replacing the AtomicReference with an AtomicReferenceArray and using a hash function to index into the array.
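To make the "CAS striping" idea concrete, here is a minimal sketch. It is not the ConcurrentHashTrie code: the class name CasStripedMap is made up, and each stripe holds a copy-on-write HashMap as a crude stand-in for a persistent trie (a real persistent map would share structure rather than copy everything). Only the striping and the CAS loop are the point.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical illustration of CAS striping over immutable per-stripe maps.
final class CasStripedMap<K, V> {
    private final AtomicReferenceArray<Map<K, V>> stripes;

    CasStripedMap(int stripeCount) {
        stripes = new AtomicReferenceArray<>(stripeCount);
        for (int i = 0; i < stripeCount; i++) {
            stripes.set(i, Collections.<K, V>emptyMap());
        }
    }

    // Pick a stripe from the key's hash; floorMod keeps the index non-negative.
    private int stripeFor(Object key) {
        return Math.floorMod(key.hashCode(), stripes.length());
    }

    V get(Object key) {
        return stripes.get(stripeFor(key)).get(key);
    }

    V put(K key, V value) {
        int index = stripeFor(key);
        while (true) {
            Map<K, V> current = stripes.get(index);
            // Stand-in for a persistent update: copy, modify, publish as immutable.
            Map<K, V> updated = new HashMap<>(current);
            V previous = updated.put(key, value);
            // Only threads hitting the same stripe compete on this CAS.
            if (stripes.compareAndSet(index, current, Collections.unmodifiableMap(updated))) {
                return previous;
            }
            // Another thread changed this stripe first: discard the copy and retry.
        }
    }
}
```

Writers to different stripes never invalidate each other's compareAndSet, which is the same trade that lock-striping makes for locks.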

The Implementation

There are a number of implementations of the Bagwell Trie in JVM languages, however I couldn't find an implementation in plain old Java, so I wrote one myself.  For the persistent collection it follows a fairly standard polymorphic tree design.

The core type of Node has 3 implementations.  The key one is the BitMappedNode, which handles the vast majority of branch nodes and implements the funky hash code partition and population count operations that make up the Bagwell Trie structure.  LeafNode holds the actual key/value pairs and ListNode is there to handle full collisions.  The key mutator methods of ConcurrentMap: put, putIfAbsent, remove and replace are implemented using CAS operations on the rootTable, which is an AtomicReferenceArray.
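For readers who haven't met the hash code partition and population count trick, this is a sketch of the usual arithmetic in a Hash Array Mapped Trie. The 5-bits-per-level split and the names here are the textbook layout, not necessarily what BitMappedNode does internally.

```java
// Hypothetical helper showing the standard HAMT index arithmetic.
final class BitmapIndexing {
    static final int BITS_PER_LEVEL = 5;                   // 32-way branching
    static final int MASK = (1 << BITS_PER_LEVEL) - 1;     // 0x1f

    // The slice of the hash code used at a given level (valid for levels 0 to 6).
    static int fragment(int hash, int level) {
        return (hash >>> (level * BITS_PER_LEVEL)) & MASK;
    }

    // The single bit in the node's bitmap that represents that slice.
    static int bitFor(int hash, int level) {
        return 1 << fragment(hash, level);
    }

    // Whether the node has a child for that bit.
    static boolean hasChild(int bitmap, int bit) {
        return (bitmap & bit) != 0;
    }

    // Position in the compact child array: count the set bits below ours.
    static int childIndex(int bitmap, int bit) {
        return Integer.bitCount(bitmap & (bit - 1));
    }
}
```

The population count is what lets a branch node store only the children that exist, rather than a 32-slot array that is mostly null.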

As mentioned earlier, the ConcurrentHashTrie is not consistent for operations that occur across the entire structure.  While at first look this seems terrible, in practical terms it is not really an issue.  The 2 main operations this impacts are size and iteration.  The size operation in the ConcurrentHashTrie is not O(1), but is O(n) where n is the size of the rootTable (or the number of stripes).  Because it has to iterate across all of the nodes in the rootTable (it doesn't need to traverse down the trees) and add all of their sizes, it is possible for the size of a Node to change after it has been read but before the result has been calculated.  Anyone that has worked with a concurrent structure before has probably found that the size value is basically useless.  It is only ever indicative, as the size could change right after the method call returns, so locking the entire structure while calculating the size doesn't bring any benefit.  Iteration follows the same pattern.  It is possible that a node could have changed after being iterated over or just before being reached.  However, it doesn't really matter as it is not really any different to the map changing just before iteration started or just after it completes (as long as iteration doesn't break part way through).  Note that Cliff Click's NonBlockingHashMap exhibits similar behaviour during iteration and size operations.

Performance, The Good, The Bad and the... well mostly Bad

Cliff Click kindly included a performance testing tool with his high scale library which I've shamelessly ripped off and used to benchmark my implementation.  Apparently he borrowed some code from Doug Lea to implement it.  I changed a sum total of 1 line.  Writing benchmarks, especially for concurrent code, is very tough (probably harder than writing the collection itself), so borrowing from the experts gives me some confidence that the numbers I produce will be useful.

So onto the results:




D'oh!!

Quite a bit slower than the java.util.concurrent.ConcurrentHashMap.  I didn't even bother comparing to the NonBlockingHashMap from the high scale library, the numbers would be too embarrassing.

The ConcurrentHashTrie2 is an alternative implementation that I experimented with.  I suspected the polymorphic behaviour of the tree nodes was causing a significant performance hit due to a high number of v-table calls.  The alternate implementation avoids the v-table overhead by packing all three behaviours into a single class (in a slightly dirty fashion).  I stole a couple of bits from the level variable to store a discriminator value.  The Node class switches on the discriminator to determine the appropriate behaviour.  As the results show, it didn't help much.  I ran a number of other permutations and the results were largely the same.
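The bit-stealing trick looks roughly like the sketch below. The layout (top two bits of an int for the discriminator, the rest for the level) and all of the names are assumptions for illustration; the actual ConcurrentHashTrie2 field layout may differ.

```java
// Hypothetical packing of a node-type discriminator into the level field.
final class PackedNodeHeader {
    static final int TYPE_BITMAPPED = 0;
    static final int TYPE_LEAF = 1;
    static final int TYPE_LIST = 2;

    private static final int TYPE_SHIFT = 30;                     // top two bits hold the type
    private static final int LEVEL_MASK = (1 << TYPE_SHIFT) - 1;  // low 30 bits hold the level

    static int pack(int type, int level) {
        return (type << TYPE_SHIFT) | (level & LEVEL_MASK);
    }

    static int type(int header) {
        return header >>> TYPE_SHIFT;   // unsigned shift, so type 2 is not read as negative
    }

    static int level(int header) {
        return header & LEVEL_MASK;
    }

    // One class, one switch, instead of three subclasses and virtual dispatch.
    static String describe(int header) {
        switch (type(header)) {
            case TYPE_BITMAPPED: return "branch at level " + level(header);
            case TYPE_LEAF:      return "leaf at level " + level(header);
            case TYPE_LIST:      return "collision list at level " + level(header);
            default:             throw new IllegalStateException("unknown node type");
        }
    }
}
```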

Conclusion

So a question remains, is the approach fundamentally flawed or is my code just crap?  I suspect the major cost is caused by the heavy amount of memory allocation, copying and churn through the CPU cache caused by the path-copy nature of mutation operations.  Also the read path isn't particularly quick.  With a reasonably full map, reads will likely need to travel a couple of levels down the tree.  With the traditional map, it's a single hash and index operation.

* I really need a proper citation for this.  However imagine 16 cores and 100 threads trying to apply a compare and set operation to the same reference.  Unlike a lock which will park the threads that fail to acquire the lock, non-blocking algorithms require the thread that fails to apply its change to discard its result and recompute it, effectively spinning until it succeeds.  With more CPU bound threads than cores, it's possible that the system will end up thrashing.
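The discard-and-recompute behaviour described in that footnote is just the standard CAS retry loop. The helper below is a hypothetical sketch (CasRetry is not from the trie code) that also counts how many computed results were thrown away, which is the work that grows under contention.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical helper: apply an update with CAS and report the wasted attempts.
final class CasRetry {
    static <T> int update(AtomicReference<T> ref, UnaryOperator<T> change) {
        int discarded = 0;
        while (true) {
            T current = ref.get();
            T next = change.apply(current);      // potentially expensive recomputation
            if (ref.compareAndSet(current, next)) {
                return discarded;                // how many results were thrown away
            }
            discarded++;                         // lost the race: result is discarded, spin again
        }
    }
}
```

With no competing writers the loop runs once and returns 0; every competing writer that gets in first costs the loser one full recomputation.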

by Michael Barker (noreply@blogger.com) at December 11, 2011 03:10 PM

December 09, 2011

Don’t Feature Branch

I recently attended the Devoxx conference. One of the speakers was talking on a topic close to my heart, Continuous Delivery. His presentation was essentially a tools demonstration, but one of the significant themes of his presentation was the use … Continue reading

December 09, 2011 03:10 AM