Manuel Aldana
http://www.aldana-online.de
Software Engineering: blog & .lessons_learned
Last updated: Wed, 30 May 2018 18:51:07 +0000

Notes on DSGVO / GDPR regarding comment function
http://www.aldana-online.de/2018/05/30/notes-on-dsvgo-gdpr-regarding-comment-function/
Wed, 30 May 2018 18:48:50 +0000, manuel aldana

Hi there,

My blog posts are quite old, but I still think (and hope) they might be helpful, so I am not shutting down the blog for now. To be DSGVO (GDPR) compliant, however, I removed all comments and disabled the comment function. Unfortunately this means that good discussion threads have vanished; a big sorry to all commenters about that! The cost of keeping comments DSGVO compliant would simply have been too high given my available spare time :(

In case you want to get in touch, just contact me directly.

In the long run I will very likely transfer the content to a blog hosting provider.

The evil path of Mobile-Apps (vs. Webstandards)…
http://www.aldana-online.de/2012/01/26/the-evil-path-of-mobile-apps-vs-webstandards/
Thu, 26 Jan 2012 00:06:13 +0000, manuel aldana

The so-called Apps are the usual End-User applications running locally on Smartphones and Tablets (similar to Desktop applications). In terms of usability and hipness factor they offer great moments. But as a Software Developer I am extremely skeptical… Well, what is wrong with all these Apps?

Step backwards from Web-Standards

I think web-standards like HTML, JavaScript and CSS are key technologies for the Internet. The Webapp-Provider can develop in a very agile way, as the application is released on servers and is immediately ready to be accessed by Webbrowser. The User on the other hand doesn’t need to install anything: the only “application-gate” is the Webbrowser. Simply try out a website, toss it away or revisit it. No installation/removal necessary. The URL is your application-hook.
On the Mobile App side this is a big step backwards: I now face the pain that every website has its own app, which basically duplicates the content. I need to go through the annoying search/install/upgrade/removal path to try things out. Instead of keeping simple URL-Bookmarks for several websites, the Desktop is now packed with tons of distracting App-Icons and alarming upgrade notifications.

High Implementation Effort

On top of your HTML/JavaScript based Webapplication and optional HTTP API you need to put effort into additional apps. Your whole application setup gets fragmented, which increases maintenance and development effort to a large extent. Today 2 major platforms exist (Android, iOS) and it seems that the Windows Mobile platform will follow. Yes, on the Webapplication side you indirectly also need to support multiple Browsers (Safari, FF, IE, Chrome etc.), but this effort is much lower than maintaining desktop-like Mobile Apps.

App-Store Drawbacks

Smartphone Apps are typically distributed through App-Stores (like Apple’s App-Store or Google’s Android Market). These App-Stores and their review process do have their pros: They act as a single entry point for end-users, who can search/browse and rate apps, which is really comfortable. Also the review-process and policies can more or less enforce a style guide, which can positively influence usability. Besides, malicious apps are easier to filter and kick out of the App-Store. On top of that it seems that users tend to be more willing to pay for certain apps than for website content, which is good for the App-Providers.

Nevertheless there are a lot of disadvantages which make Software Development for Mobile-Apps tough:

Missing/Non-transparent Auto-Upgrade

It can happen that your currently installed app is out of date compared to the latest released one (I see this through the circle icon at the top-left of the app or App-Store icon) and isn’t upgraded on the fly. This causes headaches both on the user and the App-Provider side. The user is annoyed because he/she needs to actively upgrade App-versions all the time and at some point will simply not do it. The App-Provider has to invest a lot of development and testing resources to keep all the backends (like APIs) compatible with very old App-releases, as there is no guarantee that very old Apps aren’t still “out there”. Backward compatibility and supporting multiple application versions is in my view one of the biggest cost drivers in the Software Industry.

Inefficient Packaging

When upgrading an App the whole binary package needs to be downloaded. This is extremely inefficient, as usually only a small part of the App changes between releases (source-code diff). Especially in areas where bandwidth isn’t good, downloading a 5MB upgrade is a big pain. A more sophisticated packaging build tool which makes binary diffs possible would ease this a lot and would go hand in hand with an Auto-Upgrade feature. To me it is a mystery why none of the current App-Stores has such a feature…

Release/Rollout delay

Due to the app-review process there is a delay between your internal approval of the app and the final downloadable app inside the App-Store. This usually takes a week but can also take longer (e.g. during peak times when many app-providers are offering new apps or versions at the same time). Thinking of being agile and releasing software often (see Continuous Delivery) this is a major drawback. You have to invest much more effort into testing your app-package, as major production issues could leave an app unusable without the possibility to react quickly: even major bug fix releases need to go through the review process again. In such an error-risk environment changes are done much more defensively and you will also meet all the other disadvantages of NOT RELEASING OFTEN.

Distribution dependency

App-Store owners (Apple, Google) have full control over whether an app is available or not. Apple even has policies disallowing apps which implement similar functions as preinstalled Apple ones (Email-Client, Webbrowser). You are dependent on the goodwill of the reviewers. These hard restrictions aren’t found for typical Desktop or webapplications. In the Desktop case you simply distribute a bundled application-package directly. As a webapplication provider you simply roll out your app and let users find you through the “application-gate”, the Webbrowser.

Future-Hope

My bet and hope (esp. as a Software Developer) is the HTML 5 standard + Responsive-Web-Design movement. It keeps the highly flexible Webapplication oriented approach and offers a single point for both Desktop Webbrowsers and Mobile devices (the frontend is “adapting” to the end-device). My gut-feeling is backed by the fact that big web-player Google also seems to go this way, e.g. there are hardly any dedicated Google Mobile Apps; rather they try to tackle the Mobile Usability problem directly on the Webapplication side.

Complexity drivers of Software-Systems
http://www.aldana-online.de/2011/09/30/complexity-drivers-of-software-systems/
Fri, 30 Sep 2011 15:24:38 +0000, manuel aldana

More complex software-systems correlate with higher lead-time (time-to-market from initial idea to user-available software) and fragility. They also tend to have a negative influence on usability. Therefore it must be a goal to reduce the following complexity factors to the lowest possible degree.

Codebase size

Independent of what the codebase does, it incurs maintenance effort: On big codebases it takes longer to implement changes, because more code needs to be read/comprehended. Codebase size also correlates with longer builds, deployment processes and startup times, which means a latency-increase of feedback loops. Encapsulation/Modularization can be applied, but it only weakens the impact: You won’t be able to reduce the effort of a 100M codebase to that of a 10K one just by having good code-quality.

Codebase quality

Decent code-quality is important for implementing changes quickly (readable code, separation of concerns, DRY etc.). Good quality also includes test-coverage, so regression-bugs are better caught and there is a safety-net during refactorings. Though business often neglects the less visible code-quality, it is essential to reduce lead-time. In severe cases you’re at a dead end: the code is in such bad shape that it is impossible to evolve (changes are too side-effect-risky and/or too expensive).

Tools/technology diversity

In itself, diversity of tools (programming-languages, frameworks, hardware etc.) is good because you can choose the right one to solve your specific problem. But it also increases risk: single-person-know-how, legacy-tech, learning-curves, beta/buggy-stability, upgrade/patching-efforts, transitive dependencies (see .dll, .jar nightmares).

Integration points

Though distributed systems are necessary (scalability, reliability, modularization, partner-integrations) they are more fragile: Monitoring and deployment efforts increase and security breaches are more likely. Also tracing, debugging and testing efforts are higher.

Organization Size

One of the biggest factors is the size of your organization, because efforts increase quadratically (see also Mythical Man Month). Bigger organizations try to fight this by introducing hierarchies and heavy-weight processes, but this structure can have a negative impact on “short/quick” decisions and slows things down. In some scenarios political games emerge, which block progress considerably.
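
The quadratic growth can be made concrete with the pairwise-communication formula from The Mythical Man-Month, n * (n - 1) / 2. The following is only a small sketch; the class and method names are made up for illustration:

```java
final class CommunicationPaths {
    // Number of distinct pairwise communication paths in a team of n people:
    // n * (n - 1) / 2, which grows quadratically with team size.
    static long pairwisePaths(int teamSize) {
        return (long) teamSize * (teamSize - 1) / 2;
    }
}
```

A team of 5 has 10 such paths, a team of 50 already has 1225.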

External partners

External-partner-integrations (most likely over APIs, batch processing) need more investment: Different release cycles need to be taken into account, compatibility must be offered and dedicated monitoring needs to be set up. Also communication is less direct (fixed calls, meetings, travelling).

User-Base + Traffic

A bigger User-Base means you have to spend more effort on support. Because of the sheer mass, statistically more edge-cases come up and need to be handled by the software. Also scalability requirements need to be met by more sophisticated production environments. Popular systems also attract criminal activity, to which you need to respond with higher security-investments, which again make the system less usable. Big applications also produce more data, which needs to be maintained (compatibility, migrations, analysis).

Being Complexity aware

The above factors cannot be reduced to Zero (you will always meet an Inherent Complexity), but you should fight back:

  • Prioritize, prioritize, prioritize: Featuritis has a bad impact on usability and codebase-size. Implement important features only. Remove unneeded features and clean up code.
  • Consolidate technologies: Don’t introduce a new technology just because it was praised in the last magazine. Evaluate it and maybe use it privately first.
  • Take Fowler’s advice seriously: “Don’t distribute, if you don’t have to”
  • Don’t always increase team-size, rather have a small team with A-players.
  • Quantify: Use metrics for business success, code-quality and production stability. Especially for bigger systems gut-feeling isn’t enough.
Accessing rrdtool-files Data with Java / Scala
http://www.aldana-online.de/2011/07/03/accessing-rrdtool-files-data-with-java/
Sun, 03 Jul 2011 10:52:36 +0000, manuel aldana

Though rrdtool is widely used, especially for monitoring, it took me a while to find a simple and compatible bridge for reading data from rrd-files with Java. The following hopefully helps others to save some time and to work around some pitfalls.

My requirement was:

  • Monitoring website showing trends of metrics, i.e. compare current with past values, both absolute and relative (hour over hour, day over day, etc.).
  • Data backend is a round-robin-database. The rrd-files are generated by munin (which uses rrdtool under the hood). Data should be accessed read-only.
  • Website’s technology stack is Java based (Play! framework with Scala integration).

Because I use the Play! framework I needed a way to access rrd-files with Java technology. rrd4j looked promising at first glance but failed, because rrd4j is a port and cannot read rrd-files created by the original rrdtool. After some time-consuming research and Google-digging I finally stumbled upon java-rrd.

Installation Steps java-rrd

I couldn’t find the library built and distributed in any Maven-repository, so you have to download + build yourself:

# download + unpack
wget http://oss.stamfest.net/java-rrd-hg/archive/tip.tar.gz
tar xfz <downloaded-tarball.tar.gz>
# build .jar library with Maven
cd <untarred-directory>
mvn package
cp target/*.jar <your-target-lib-folder>

Instead of dumb copying, and to be cleaner with your build-system, you might want to deploy the .jar file to a repository like Nexus.

Read-Access Usage java-rrd

As I use the Play! framework with Scala integration, I integrated it with Scala:

import net.stamfest.rrd._
// ...
val rrd = new RRDp("/tmp", "55555")
val command = Array("fetch", "your-rrd-file.rrd", "MAX", "-r", "1800", "-s", "-1d")
val result = rrd.command(command)

if (!result.ok)
  println(result.error)
else
  println(result.output)

For completeness, another snippet in Java:

import net.stamfest.rrd.CommandResult;
import net.stamfest.rrd.RRDp;
// ...
RRDp rrd = new RRDp("/tmp", "55555");
String[] command = {"fetch", "your-rrd-file.rrd", "MAX", "-r", "1800", "-s", "-1d"};
CommandResult result = rrd.command(command);

if (!result.ok)
    System.out.println(result.error);
else
    System.out.println(result.output);

With java-rrd you can also access rrdtool over sockets/network.

Parsing rrdtool output

rrdtool output after fetch is plain text, something like:

        speed


920804700: nan
920805000: 4.0000000000e+02
920805300: 2.0000000000e+03
920805600: 0.0000000000e+00
920806800: 3.3333333333e+01

The above needs to be processed further to be “structured enough” for your code. For example, do the following (I only show Scala, I skipped Java iterating syntax hell…, did I mention that I love passing functions ;):

def outputToNumberPairs(rrdFetchOutput: String) = {
    // filtering unneeded + empty lines
    val list = rrdFetchOutput.trim().split("\n").filter((n) => n.contains(":") && !n.contains("nan"))
    // parsing strings to numeric values and combine to pair
    for (i <- list) yield i.split(":")(0).trim().toLong -> i.split(":")(1).trim().toFloat.round
  }
GIVES BACK:
===
res34: Array[(Long, Int)] = Array((920805000,400), (920805300,2000), (920805600,0), (920806800,33))
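
For readers who stay on the Java side, the same parsing could look roughly like the sketch below. It assumes the single-data-source fetch output shown above (one value per line); the class name RrdFetchParser is hypothetical and not part of java-rrd:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RrdFetchParser {
    // Turns plain-text `rrdtool fetch` output into timestamp -> rounded value,
    // skipping the header line, blank lines and "nan" rows.
    static Map<Long, Integer> outputToNumberPairs(String rrdFetchOutput) {
        Map<Long, Integer> pairs = new LinkedHashMap<>();
        for (String line : rrdFetchOutput.split("\n")) {
            if (!line.contains(":") || line.contains("nan")) {
                continue; // header, empty line or unknown value
            }
            String[] parts = line.split(":");
            long timestamp = Long.parseLong(parts[0].trim());
            int value = Math.round(Float.parseFloat(parts[1].trim()));
            pairs.put(timestamp, value);
        }
        return pairs;
    }
}
```

A LinkedHashMap keeps the rows in the order rrdtool printed them, which is usually chronological.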
‘Embrace Failure’ approaches in Software Development
http://www.aldana-online.de/2011/05/15/embracing-failure-softwareapproaches/
Sun, 15 May 2011 17:22:22 +0000, manuel aldana

Software-Developers (and humans generally) should not try to be 100% perfect but should take imperfection for granted and “embrace” it. The following gives some thoughts on how to approach the ‘Embracing failure’ principle in the field of Software Development.

Log thoughtfully

The value of proper and insightful logging cannot be overstated: you will praise the developer who provides good logs, i.e. logs in which you can easily search for problem-symptoms. Within the team, agree on an appropriate logging convention for log-patterns + semantics of levels. Some short and abbreviated ideas:

  • DEBUG: Detailed information while going through integration points (like printing payload of XML for api call). Quantified information like statistics while refreshing/invalidating cache.
  • INFO: Bootstrapping/startup information, initialization routines, reoccurring asynchronous actions (including elapsed time).
  • WARN: Logging suspicious/erroneous application state, where a fallback is still possible and the routine/call can proceed.
  • ERROR: No fallback possible. For the user/customer this most likely means a crashed call/request (in HTTP jargon this would be a 500 status). Examples for such failures could be NullPointers or database unavailability.

NEVER EVER swallow Exceptions. I’ve seen several empty catch{} blocks which later caused major headaches in production: on user-level the system crashed on many calls but nothing could be seen inside the logs. At the topmost application layer, log uncaught Exceptions as ERROR.
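
The WARN/ERROR distinction and the “never swallow” rule can be sketched as follows. This is an illustration only: it uses java.util.logging to stay dependency-free (its WARNING and SEVERE levels stand in for WARN and ERROR), and the class and method names are invented:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LoggingSketch {
    private static final Logger LOG = Logger.getLogger(LoggingSketch.class.getName());

    // Suspicious input, but a fallback exists and the call can proceed: WARN.
    static int parseQuantity(String raw) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Not swallowed: log context plus the exception, then fall back.
            LOG.log(Level.WARNING, "Invalid quantity '" + raw + "', falling back to 1", e);
            return 1;
        }
    }

    // Topmost layer: anything uncaught is logged as ERROR before the
    // request is answered with a failure status.
    static int handleRequest(Runnable action) {
        try {
            action.run();
            return 200;
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE, "Request failed, no fallback possible", e);
            return 500;
        }
    }
}
```

With a framework like log4j the same structure applies, only with the level names from the list above.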

Also have a look at a longer blog-entry about appropriate logging.

Make deployment + upgrade easy

When a critical failure occurs you want to be able to react quickly. Reaction-time spans from the moment you learn about the issue until the deployment is finished and the customer sees the change. This includes getting the code (thank a fast SCM), understanding the problem, fixing/extending the code, verifying the change on a local and/or qa-environment and finally pushing to production. For building the deployment package and executing the deployment, automate and speed up as much as possible.

If you are lucky your app is a typical browser-server based webapp. It has the highest potential regarding instant deployment: As soon as the new app version has been rolled out to your server machine (or server farm, in case you scaled out) the upgrade is complete and will be instantly seen inside the client’s browser. Yes, webapp deployments have their own pitfalls (high traffic site’s rollout isn’t atomic, cache settings, browser state inside DOM or cookies), but other application classes like embedded-systems, desktop-app or mobile-apps have much tougher deployment requirements and upgrade process is more complicated. But Google’s Chrome Webbrowser shows that even for these applications a well designed and transparent upgrade mechanism is possible.

Monitor your application

At some point the system will fail, therefore you have to focus on notifications of the respective problems. A typical and annoying scenario is a ‘false negative’ production system: You think everything is alright but in fact the production behaviour is unstable and unreliable. Unstable production behaviour can be infrastructure or application related. Infrastructure problems include network problems, disk crashes, external-system/integration-point downtime or not enough capacity/resources to handle the load. Application related problems are the more “real”, usual application bugs like unchecked null-references, dead-locks, infinite loops, thread-unsafety or simply wrongly implemented requirements.

To sense + sanity-check such problems you need system-hooks which deliver data (e.g. Servlet-Container built-in monitoring, JMX). Sometimes you will also write a little extension to fulfill a specific monitoring requirement (e.g. ERROR log counter). Apart from data-provision you also need an entity which processes the numbers. To not hire monkeys you should use existing tools. For instance munin + rrdtool track the data over time and plot nice graphs, and Nagios alerts you actively in case of a peak of a metric and a potential incident.
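
Such a “little extension” can be very small. The following sketch of an ERROR log counter is built on java.util.logging (where ERROR corresponds to SEVERE); the class name is hypothetical, and exposing the counter to munin or JMX is left out:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

final class ErrorCountingHandler extends Handler {
    private final AtomicLong errors = new AtomicLong();

    // Called for every record that passes the logger's level check;
    // only records at SEVERE (ERROR) level are counted.
    @Override
    public void publish(LogRecord record) {
        if (record.getLevel().intValue() >= Level.SEVERE.intValue()) {
            errors.incrementAndGet();
        }
    }

    @Override
    public void flush() { }

    @Override
    public void close() { }

    // Value a monitoring hook could poll and plot over time.
    long errorCount() {
        return errors.get();
    }
}
```

The handler is attached with Logger.addHandler(...), typically on the root logger so that it sees the ERROR logs of the whole application.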

Make it visible

We as humans are also far from perfect at number crunching and interpreting masses of detailed numeric data. Often application “health” degrades over time, so you should deliver a comparison of ‘now’ to historic data (like hour over hour, day over day, week over week). Try to abstract away numbers behind meaningful graphs or charts. The signal light analogy (green, orange, red) also helps a lot to classify the criticality of numbers.

Enabling live diagnostics

Sometimes app behaviour isn’t understood well enough and you won’t be able to reproduce a scenario in a local isolated environment. You then have to try to get more insight into the production system. In case resources are the problem (like CPU, memory), several profilers support attaching to a live production machine-instance to get metrics like running threads, heap-stats or garbage collector info.

If you need more semantic application insights and have followed log conventions, you should also make it possible to enable DEBUG or TRACE logs for certain code sections. Make your logging-levels configurable at runtime. If you have to restart your app when changing log-settings, you can wipe away the current application state you want to analyse (issues are often not reproducible straight after a restart).

Another helpful option is live debugging, where you debug remotely and directly on the problem-server instance. You can directly sense the concrete problem as is and don’t need to waste effort on local reproduction. One of your users will have bad luck and will hit a connection timeout, but the request would have caused an error page anyway.
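
A minimal sketch of a runtime switch, again assuming java.util.logging (other frameworks offer comparable calls). The class name RuntimeLogLevel and the logger name in the usage note are made up; how the call is triggered (JMX operation, admin URL, ...) is left open:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Level;
import java.util.logging.Logger;

final class RuntimeLogLevel {
    // java.util.logging holds loggers only weakly, so keep a strong reference;
    // otherwise a changed level could get lost after garbage collection.
    private static final Map<String, Logger> PINNED = new ConcurrentHashMap<>();

    // Switches one logger (e.g. a package name) to a new level without a restart.
    static void setLevel(String loggerName, String levelName) {
        Logger logger = PINNED.computeIfAbsent(loggerName, Logger::getLogger);
        logger.setLevel(Level.parse(levelName));
    }
}
```

RuntimeLogLevel.setLevel("com.example.checkout", "FINE") would enable debug output for that code section only (FINE is the java.util.logging counterpart of DEBUG). Note that handlers have their own level, so the handler which writes the log file must let FINE records through as well.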

Use Users/Customers as feedback provider

Even if you are running a diagnostics-friendly webapp (the server is under the app-provider’s control) you may end up with very weird cases which happen on the client’s side. Therefore make it very easy for customers to give you feedback: either provide a free phone contact or an easy-to-find feedback form. In case you don’t run as a webapp, an option would also be that in case of an error the user just needs to click a button and some diagnostic data is sent to the server (I remember that, when reporting a bug, Ubuntu OS fetched a good deal of OS-configuration-data and attached it to the bug-report). Some embedded systems don’t need user action but automatically dump application-state to persistent storage before a reset. This data can then be retrieved later by a mechanic inside a car repair-shop.

Conclusion

Your production system will fail at some point. You can try to fight this by getting your QA as close as possible to production and investing A LOT into testing. Still you have to find the sweet spot of the cost/benefit ratio, because this effort increases exponentially (effort of providing the same hardware, 100% test-coverage + test-data, simulating real-user behaviour, etc.). There are software-system classes where this effort must be pushed to the limits, like money- or life-critical systems, but most applications we are building aren’t so sensitive that damage cannot be tolerated or reversed. Therefore build in as many diagnostics as sensible and have good tooling + process around a quick alert/fix/deploy roundtrip.

Top-Tools for Log-Analysis (CLI based)
http://www.aldana-online.de/2011/03/13/top-tools-for-logging-analysis-cli-based/
Sun, 13 Mar 2011 11:21:55 +0000, manuel aldana

After all the years it is still remarkable how powerful plain text is for software development (human readability, advanced SCM tools, expressiveness, no GUI necessary, etc.). This also applies to logging, where plain text makes a very nice source for doing adhoc analysis. The following samples show how bash, its interprocess-communication piping and the usual suspect tools (awk, grep, sed, tr, etc.) give you great control when doing analysis.

Format

Logs should have a standard format, so analysis can rely on structured output and operate on it; programs are less flexible than humans and need strict patterns. Most often such logs are line-based and whitespace-separated. Fortunately most logging frameworks make this a no-brainer (e.g. apache log-format settings, log4j).

In the usual log-pattern, rows/lines form single datasets and are most often ordered by time: because they are appended by the software over time, they often also contain timestamps. The lines are most often independent of each other, but within each line the data needs to follow a strict order (apart from loose text like error-messages or further stack traces):

dataset1(line1): data1 data2 data3
dataset2(line2): data1 data2 data3
dataset3(line3): data1 data2 data3

To make the examples easier to follow I refer to the apache-logging pattern (HTTP based web-traffic). Still, all the example adhoc queries can be applied to all other log-output targets/formats:


127.0.0.1 - - [29/Feb/2011:13:58:50 +0100] "GET /api/image/status/ok HTTP/1.1" 200 476 "-" "Jakarta Commons-HttpClient/3.1"

Log analysis in Action

Quick overview for further analysis steps:

# using 'less' so reading longer output is easier
grep " 200 " log-file | less

# only direct stuff to 'less' which is of interest, in this case show relative URI path (7th position)
grep " 200 " log-file | awk '{print $7}' | less

# plain text is a good candidate for compression, therefore on production
# environments files are often zipped to save disk space, therefore use zgrep
zgrep " 200 " log-file.gz | awk '{print $7}' | less

Counts:

# count of HTTP calls which had 404 status
grep -c " 404 " log-file
# more verbose equivalent
cat log-file | grep " 404 " | wc -l

# count of all other but 404 HTTP status (inverse grep)
grep -v -c " 404 " log-file

# piping (filtering step-by-step)
# showing number of failed (non 201 status) image uploads (POST)
cat log-file | grep -v " 201 " | grep POST | grep image | wc -l

Filtering:

# spit out all response times of HTTP 200 status
# response time is whitespace-separated on 10th position
grep " 200 " log-file | awk '{print $10}'

# show unique called URLs (stand on 7th position), append '| wc -l' for their number
# 'uniq' only removes adjacent duplicates, therefore sort first
cat log-file | awk '{print $7}' | sort | uniq

# show slowest response time
grep " 200 " log-file | awk '{print $10}' | sort -n | tail -n1
# show fastest response time
grep " 200 " log-file | awk '{print $10}' | sort -n | head -n1

# filter lines 'from,to' (e.g. analyse critical time section)
cat log-file | sed -n '20,30p'

Transformations:

# chain response times with ',' instead of linebreaks '\n'
grep " 200 " log-file | awk '{print $10}' | tr '\n' ','

# delete/omit lines 20-30
cat log-file | awk '{print $10}' | sed '20,30d'
# slightly shorter
awk '{print $10}' < log-file | sed '20,30d'

Side-Note: While running adhoc log-file queries it is often enough to go the pragmatic way, e.g. above I used the ” 404 ” in the grep command, which is not strictly correct (404 could also occur inside the response time part). Nevertheless it is quicker and shorter to write and a collision is unlikely.

Server farm approach (horizontal scaled)

How do you run log-analysis in server-farm environments (lots of machines are used to scale out horizontally)? In this situation logs are distributed across each machine’s log directory. An option here is to have a dedicated, more storage-focused server which runs a nightly batch/cron-job copying all the server logs to a single place. This of course is only practicable if the files aren’t too huge and can be processed in reasonable time by a single script. If this is not the case you may want to use a more scalable and sophisticated solution (like MapReduce with Hadoop). Another alternative is to analyse the log of a single machine and extrapolate. Most likely you use a load-balancer which distributes the traffic equally across the machines, i.e. running the same log-query on each machine would give back similar results. The big advantages of the single-machine approach are simplicity (no infrastructure necessary, terminal and log-file are sufficient; commands are simple one-liners) and speed (only one machine’s log-file needs to be analysed).

Summary

The CLI triangle (bash, its utilities, piping) forms a very convenient ad-hoc query-interface for logs. At first glance the commands look a bit cryptic, but you get used to them very quickly. Also you already achieve a lot by only knowing a small subset of the functionality (e.g. I use only a minimum subset of ‘awk’ and ‘sed’).

I am happy about any comments/hints on other lightweight tooling for log-analysis.

Unit-Testing: Situations when NOT to do it
http://www.aldana-online.de/2011/02/06/major-unit-testing-pitfalls-and-anti-patterns/
Sun, 06 Feb 2011 11:45:03 +0000, manuel aldana

I am a big fan and practitioner of automated unit-testing, but throughout the years I learned my lessons. Having started with “everything has to be tested automatically”, over the years I experienced situations where unit-testing is not the optimal approach.

The presented sections go along with my favorite test-smells:

  1. Brittle tests: Though functionality hasn’t been changed the test fails. Test should show green but in fact shows red (false positive).
  2. Inefficient tests: The effort of writing automated tests doesn’t pay off at all. The benefit/cost ratio (short + long term) is extremely low.

Unit-Test little scripts/tools

There is often no sense in writing unit-tests for little scripts or tools which are one- or two-liners. The script content is already so “declarative”, short and compact that the code is too simple to break. Furthermore stubbing or mocking the dependencies is often tough (e.g. writing to stdout/file, shutting down a machine, doing an HTTP call). You can end up writing an external system emulator, which is overkill in this situation. Surely testing is important, but for that I go the manual way (executing the script and smoke-testing/sanity-checking the outcome).

Unit-Test high level orchestration services

Orchestration services have many dependencies and chain-call lower services. The effort of writing such unit-tests is very high: Stubbing/Mocking all these outgoing dependencies is tough, and test setup logic can get very complex and make your test-code hard to read and understand. Furthermore these tests tend to be very brittle, e.g. minor refactoring changes to production code will break them. The main reason is that inside the test-code you have to put a lot of implementation detail knowledge to make stubbing/mocking work. You can argue that having many fan-out/outgoing dependencies is a bad smell and that you should refactor from the start. This is true in some cases, but higher order services often have the nature to orchestrate lower ones, so refactoring won’t change/simplify much and may make the design even more complicated. In the end I much prefer to cover such high level services by automated or non-automated acceptance tests.

Test-first during unclear Macro-Design

When implementing a feature or something from scratch the macro-design is often blurry; I like to call this “diving-in”. During diving-in development or quick prototyping you get a feeling for which design fits and which doesn’t. In this phase class structures/interactions change a lot, sometimes even big chunks of code are thrown away and you start over. Such wide code changes/deletions will often break your tests and you have to adapt or even delete them. In these situations the test-first approach doesn’t work for me; writing test-code even distracts me and slows me down. Yes, unit-tests and the test-first approach can and should guide your design, but in my experience this counts more once the bigger design decisions have been settled.

100% Code-Coverage

I can’t stress this enough: Code-Coverage != Test-Coverage. The Code-Coverage of unit-tests is a nice metric to see untested spots, but it is by far not enough. It just tells you that the code has been executed and simply misses the assert part of your test. Without proper asserts, which check the side-effects of your production code and the expected behaviour, the test gives zero value. You can reach 100% code-coverage without having tested anything at all. In the end this false feeling of security is much worse than having no test at all! Furthermore 100% code-coverage is inefficient because you will test a lot of code which is “too simple to break” (e.g. getters/setters, simple constructors + factory-methods).
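
The difference can be illustrated with a tiny, made-up example. To stay self-contained it uses plain Java instead of a test framework such as JUnit, and the discount rule is purely hypothetical:

```java
final class Discount {
    // Production code under test: 10% off for prices of 100 or more.
    static int priceAfterDiscount(int price) {
        return price >= 100 ? price - price / 10 : price;
    }
}

final class DiscountTests {
    // Executes both branches, so a coverage tool reports 100%,
    // yet nothing is verified: this "test" can never fail.
    static void coverageOnly() {
        Discount.priceAfterDiscount(50);
        Discount.priceAfterDiscount(200);
    }

    // Same calls, but the expected behaviour is asserted.
    static void withAsserts() {
        check(Discount.priceAfterDiscount(50) == 50);
        check(Discount.priceAfterDiscount(200) == 180);
    }

    private static void check(boolean condition) {
        if (!condition) {
            throw new AssertionError("unexpected price");
        }
    }
}
```

Both methods yield identical coverage numbers, but only withAsserts() would turn red if the discount rule were broken.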

Summary

The points above shouldn’t give you the impression that I speak against automated unit-tests. I think they are great: they guide you towards incremental changes and help you to focus, you get an affinity for green colors ;), they are cheap to execute, and regression testing gives you the security of not breaking things and more courage to refactor. Still, the attitude that you have to reach 100% code-coverage and test every code snippet will kill testing culture and end in Red-Green color blindness.

Static typed programming/languages won’t die! http://www.aldana-online.de/2011/01/09/static-typed-programminglanguages-wont-die/ http://www.aldana-online.de/2011/01/09/static-typed-programminglanguages-wont-die/#comments Sun, 09 Jan 2011 18:27:44 +0000 manuel aldana http://www.aldana-online.de/?p=253 In recent years dynamic typed languages like Ruby, Groovy, JavaScript or Python have rightly gained popularity. Some even said they would soon replace their static typed counterparts. Though I am a big fan of dynamic typed and interpreted languages (for smaller tools/tasks they make life so much easier), my current bet is that the future of languages is not a question of either/or but of a more diverse toolset. Static typed languages simply have a big plus when it comes to maintainability of big codebases.

Overview of differences

Because there is still some confusion about dynamic vs. static typing, here is a short overview of how I see it:

  1. Dynamic typed programming: The types of variables are bound at runtime. This happens if you use languages which have dynamic typing in their core design (PHP, Ruby, Python). Be aware that you can still do dynamic typing with languages focusing more on static typing (like C# or Java): simply have a look at the reflection APIs… Furthermore type safety is deferred to runtime: if a programmer introduces typing errors, they will only pop up at runtime (e.g. ClassCastException, NoSuchMethodException, TypeError).
  2. Static typed programming: The types are resolved/bound at compile time. For this check to be possible your code needs to carry typing information (e.g. when defining a variable or method signature). If you have obvious type errors in your code, the compiler will catch them and inform you.
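
The two points can be sketched in Java itself, since its reflection API allows dynamic method lookup inside a static typed language. This is only an illustrative example: TypingDemo, staticCall and dynamicCall are made-up names, while Class.getMethod and Method.invoke are the standard reflection calls.

```java
import java.lang.reflect.Method;

class TypingDemo {
    // Static typing: the compiler verifies that String has a length() method.
    static int staticCall(String s) {
        return s.length();
        // return s.lenght(); // a typo like this would be rejected at compile time
    }

    // Dynamic lookup via reflection: the method name is just a string,
    // so a typo is only detected when the lookup is executed.
    static Object dynamicCall(Object target, String methodName) throws Exception {
        Method method = target.getClass().getMethod(methodName);
        return method.invoke(target);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(staticCall("hello"));
        System.out.println(dynamicCall("hello", "length"));
        try {
            dynamicCall("hello", "lenght"); // the typo compiles without complaint
        } catch (NoSuchMethodException e) {
            System.out.println("failed only at runtime: " + e);
        }
    }
}
```

The misspelled name in staticCall never makes it past the compiler, while the same mistake in dynamicCall stays hidden until that code path actually runs.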

Pros dynamic typing

If designed correctly into the language, dynamic typing offers a lot of advantages:

  • Code is much more compact. Small tools and scripts are a charm to write and some nice syntactic sugar constructs are possible. Programmers need less time to understand and extend code. Furthermore the ratio of functionality to lines-of-code is higher: you need to produce less code to solve a problem. Smaller codebases are a plus for maintenance.
  • Sometimes dynamic typing is necessary for solving a problem (e.g. the calculated return value is dependent on the parameter’s type, which can differ for every caller).
  • The development feedback cycle tends to be shorter. Most languages offering dynamic typing are interpreted and a simple ‘Save file’ will do. In most cases you will see the effect instantly. For bigger server applications fewer restarts of the server are necessary.

Pros static typing: Maintainability

Above I mentioned compactness of code as a maintenance plus for dynamic typing. Still, on the same topic static typing scores significantly.

Analysis of codebases

Typing information bloats code but also serves as documentation. In many cases explicit typing helps a lot in reasoning about the code’s semantics. The IDE can also help you to jump around the codebase easily: it doesn’t have to reason or guess but can guide you directly to the target code. Explicit typing information also helps tools like Structure101 or JDepend to automatically show you relationships, which can point to design or layering flaws.

Feedback of errors

You get faster feedback when you make a typing-related coding error, e.g. you reference the wrong function/method or make a typo. The compilation phase gives you this check for free. Some argue that you should cover everything with automated tests anyway and can therefore instantly catch typing-related programming errors. In theory this is true, but in practice it doesn’t hold: I’ve never seen a convincing test suite which executes every code-path and instantly uncovers such errors. Often you hit such errors later by accident, in the testing phase or, worse, in production. Compile-time type safety is not to be neglected…

Safe refactorings/tooling

It is the “nature” of dynamic typing that you can’t determine types just by looking at the structure of the code. The opposite is true for static typing, so you can build tools for safe and automated refactorings (e.g. rename method/function, add/remove parameters, general restructuring of classes/modules). If many refactorings weren’t automated I would lose a lot(!) of time. Being able to refactor codebases within a sensible amount of time is a must-have for me. The larger the codebase, the higher the cost of manual, unsafe refactorings. Some people again argue that a 100% test-coverage safety-net would suffice; for my point of view see above ;)

I purposely left out the old performance comparison, which states that compiled binary code is faster than interpreted code. This may have been true in older times, but nowadays, also with all the JIT-compiling features, I haven’t seen convincing benchmarks showing that binaries of static typed languages generally perform better. This doesn’t imply that you won’t hit scaling problems which are indeed bound to a language (see also the podcast about Facebook getting problems with PHP at some point)!

Summary

Dynamic typed languages are great, but for bigger codebases I would still go for languages like Java or C# (I haven’t tried it yet, but Scala seems to be a new alternative here). For very specific modules solving highly generic/dynamic problems you can still code in the “right” language. Modern runtime environments (like the JVM or CLR) make it possible to deploy common libraries/APIs which are written in a dynamic typed language.

If you have case studies or other resources investigating dynamic vs. static typing and overall productivity or providing different opinions please comment.

IntelliJ IDEA rocks (revisited)! http://www.aldana-online.de/2010/12/12/intellij-idea-rocks-revisted-for-10/ http://www.aldana-online.de/2010/12/12/intellij-idea-rocks-revisted-for-10/#comments Sun, 12 Dec 2010 11:41:32 +0000 manuel aldana http://www.aldana-online.de/?p=235 Many people ask me why I prefer IntelliJ over any other IDE. I switched to IntelliJ about 3 years ago, so I cannot directly compare the current IntelliJ 10 with the other current IDEs Eclipse 3.6 or NetBeans 6.9. Still, when pair programming with colleagues who use a different IDE, and when I sometimes have to switch back to Eclipse, I feel confirmed that IntelliJ is the best IDE on the market (my opinion is primarily based on Java apps). IntelliJ version 10 has recently been released and improves a lot of things.

Performance

IntelliJ shines in performance. It keeps all files in an index, so access to and search through all files is extremely fast, and compilation is instant (you don’t even notice it). Only the initial indexing process sometimes feels a bit slow, but version 10 shows big performance improvements there. Subjectively, version 10 feels faster and the UI is more responsive.

Automatic in-memory compilation

Whereas in Eclipse you have to save a file manually, IntelliJ does it for you. Some people say that this is “just” another shortcut to press, but I remember it being a big relief not to have to do it. It really frees your fingers to move on to the next task.

Auto-Completion + Intentions

The autocompletions and intentions are very clever (general code improvements, refactorings, type-completion, variable names etc.). I sometimes feel that they read my mind. Version 10 brings another major improvement: instant auto-completion, where you get suggestions as you type. I liked this very much in Microsoft’s Visual Studio IDEs, and now it is finally available in IntelliJ.

Refactoring Support

Codebases should be continuously improved. Structural improvements carry the risk that you break code, therefore automatic safe refactorings are extremely important. Here IntelliJ has the best toolset. It also plays nicely with SCM support when moving files around or renaming packages.

Prepackaged tool support

On other IDEs you have to install many plugins manually. On IntelliJ the most important ones are already there (Maven, nearly all SCMs) and are well integrated. I can’t remember a single case where plugins conflicted with each other.

The small things…

IntelliJ offers many little gimmicks which, put together, make “the” big difference. Here is an excerpt:

  • Run Unit-Tests on package basis (package focus in Project Window and <Shift>+<Ctrl>+F10)
  • Instant code execution on breakpoint (Debug-Window <Alt>+F8)
  • Instant copy file path so you can quickly jump to path on command-line (focus file/directory and <Ctrl>+C)
  • File comparisons: Excellent Diff-View. Compare with clipboard. Compare with other branch.
  • Coloring of files on tabs and of lines inside the editor if they were changed by you without having been committed. Instant notification inside the editor when a file gets out of sync with the SCM repository (shows the commit-time and author of the change).
  • Show SCM history of selection/marked codelines.
  • Working with resource-bundles. Inside code, hover over a message-key and it shows you the translation instantly (e.g. foo.bar.logout would give you a little text-box “Logout”). Refactoring the message-keys is also safe (messages.properties gets updated).
  • Quick jump to Run/Debug settings (<Alt>+<Shift>+F10).
  • Automatic Code quality checks + report before SCM commit.
  • Intention ‘Create Test-Class’
  • Automatic file refresh. When switching to the command line and doing SCM or Maven actions, all files are refreshed automatically on switching back to IntelliJ. No danger of stale data inside the IDE.
  • General search for plain text or structural search.
  • Auto collapsing tool windows on losing focus. Very convenient on smaller notebook screens or generally increasing editor space.
  • Stable editor, even for very large files, e.g. it can show 5MB large XML-docs and even diffs between them (Eclipse always crashed here).
  • etc. (list goes on forever) ….

Minor annoyances

Of course, despite the praise above there are still some drawbacks. I often had problems with the SCM merging facilities (especially Subversion), so I now always merge on the command-line. When upgrading or changing plugins, the restart has to be done manually (at least in the Linux version). Also, some intentions could be added (when adding @Override above a method and pressing <Ctrl>+Enter, Pull-Up Method, Extract Superclass or Introduce Interface should be suggested). For converts from other IDEs the different types of autocompletion are a bit confusing (<Ctrl>+Space, <Ctrl>+<Shift>+Space, <Ctrl>+<Shift>+<Alt>+Space).

Price considerations

The Community Edition is free. For the Ultimate Edition, which I use, you have to pay some money, but considering the productivity boost this is simply peanuts. Budget people, please do the math: the licence costs ~1.50EUR (New User) or ~0.75EUR (Upgrade) per day, and how much does a developer cost per hour? Apart from making the developer happier, you will also save money if only a fraction of the IDE-related idle/waiting time is eliminated.
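
The math can be sketched as a tiny break-even calculation. The licence costs per day are the rough figures from above; the developer cost of 50 EUR per hour is purely an assumed number for illustration, and IdeBreakEven and breakEvenMinutesPerDay are made-up names.

```java
class IdeBreakEven {
    // Minutes of developer time that must be saved per day
    // so that the saved time pays for the licence.
    static double breakEvenMinutesPerDay(double licenceCostPerDay, double developerCostPerHour) {
        return licenceCostPerDay / developerCostPerHour * 60.0;
    }

    public static void main(String[] args) {
        double developerCostPerHour = 50.0; // assumption, adjust to your own numbers
        System.out.printf("New User: %.1f minutes per day%n",
                breakEvenMinutesPerDay(1.50, developerCostPerHour));
        System.out.printf("Upgrade: %.1f minutes per day%n",
                breakEvenMinutesPerDay(0.75, developerCostPerHour));
    }
}
```

Under this assumption the licence pays for itself once the IDE saves roughly one to two minutes of waiting time per developer per day.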

Top 4 Software-Metrics Antipatterns http://www.aldana-online.de/2010/11/14/top-4-software-metrics-antipatterns/ http://www.aldana-online.de/2010/11/14/top-4-software-metrics-antipatterns/#comments Sun, 14 Nov 2010 18:14:37 +0000 manuel aldana http://www.aldana-online.de/?p=227 Metrics are a way to quantify a specific view of a system. They occur in several areas such as source-code (e.g. LOC), process (e.g. number of production issues) or business (e.g. website page-views). The following lists my “favorite” metrics antipatterns.

1. Wrong target audience

Metrics don’t act as a feedback cycle for the people who produced the results, but merely end up in number-crunching reports for top management. These bottom-up-only metrics don’t add value because they don’t improve the work of the people who “produce” the results. Also, unrelated metrics are presented: why should top management be interested in test-coverage? Test-coverage is important, but the related metric “number of regression bugs” would be much more helpful for a management audience.

2. No Transparency/Agreement

The team doesn’t understand or hasn’t agreed on the metrics setup. The team is controlled by numbers instead of using them as guidance and as a feedback loop. Remember that software developers are specialists in workarounds! In most cases metrics can be bypassed: the number itself looks good but the truth is different, e.g. I can easily increase code-coverage by adding useless unit-tests which only execute code without doing any asserts.

3. Holy numbers

All metrics are taken at face value, and high and low peaks aren’t questioned. Everyone is panicking because the metrics look so bad, but the numbers were never evaluated as being realistic and never checked for plausibility. It is not that rare for numbers to be plain wrong and to cause not only peaks in the charts but also unnecessary adrenaline peaks…

4. Bad format/Data floods

The metrics aren’t presented well. Even if you have potentially interesting data, the insights/implications are simply washed away. For instance, instead of getting traffic-light colours or a focus on the most important metrics, you are overwhelmed by millions of bare numbers. In the end (as we are humans) everyone will ignore such reports.

Conclusion

The points above seem to be common sense, yet in practice metrics are misused a lot. Some observations even lead to the conclusion that this is done on purpose to fake good results or to give a false feeling of control. This bad reputation is a shame because many metrics, implemented correctly, are an objective, cost-effective and powerful part of your feedback-loop. They can also be highly motivating because you have a target for improvement (e.g. fewer code-structure warnings, fewer production bugs, higher test-coverage).
