Archive for the ‘Java’ Category
Posted by Jakub Holý on February 18, 2013
I was never sure what resources in JDBC must be explicitely closed and wasn’t able to find it anywhere explained. Finally my good colleague, Magne Mære, has explained it to me:
In JDBC there are several kinds of resources that ideally should be closed after use. Even though every Statement and PreparedStatement is specified to be implicitly closed when the Connection object is closed, you can’t be guaranteed when (or if) this happens, especially if it’s used with connection pooling. You should explicitly close your Statement and PreparedStatement objects to be sure. ResultSet objects might also be an issue, but as they are guaranteed to be closed when the corresponding Statement/PreparedStatement object is closed, you can usually disregard it.
Summary: Always close PreparedStatement/Statement and Connection. (Of course, with Java 7+ you’d use the try-with-resources idiom to make it happen automatically.)
PS: I believe that the close() method on pooled connections doesn’t actually close them but just returns them to the pool.
A request to my dear users: References to any good resources would be appreciate.
Posted in Java | Tagged: best practices, jdbc | 3 Comments »
Posted by Jakub Holý on January 31, 2013
Recommended Readings
Various
- Dustin Marx: Significant Software Development Developments of 2012 – Groovy 2.0 with static typing, rise of Git[Hub], NoSQL, mobile development (iOS etc.), Scala and Typesafe stack 2.0, big data, HTML5, security (Java issues etc.), cloud, DevOps.
- 20 Kick-ass programming quotes – including Bill Gates’ “Measuring programming progress by lines of code is like measuring aircraft building progress by weight.”, B.W. Kernighan’s “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”, Martin Golding’s “Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.” (my favorite)
- How to Have a Year that Matters (via @gbrindusa) – do you want to just survive and collect possessions or do you want to make a difference? Some questions everybody should pose to him/herself.
- Expression Language Injection – security defect in applications using JSP EL that can sometimes leads to double evaluation of the expressions and thus makes it possible to execute data supplied by the user in request parameters etc. as expressions, affects e.g. unpatched Spring 2.x and 3.
Languages etc.
- HN discussion about Scala 2.10 – compilation speed and whether it matters, comparison of the speed and type system with Haskell and OCaml, problems with incremental compilation (dependency cycles, fragile base class), some speed up tips such as factoring out subprojects, the pros and cons of implicits etc.
- Blog Mechanical Sympathy – interesting posts and performance tests regarding “writing software which works in harmony with the underlying hardware to gain great performance” such as Memory Access Patterns Are Important and Compact Off-Heap Structures/Tuples In Java.
- Neal Ford: Functional thinking: Why functional programming is on the rise – Why you should care about functional programming, even if you don’t plan to change languages any time soon – N. Ford explains the advantages of FP and why FP concepts are spreading into other languages (higher abstractions enabling focus on the results over steps and ceding control to the language, more reusability on a finer level (higher-order functions etc.), few generic data structures with many operations -> better composability, “new” and different tool such as lazy collections, shaping the language towards the problem instead of vice versa, aligning with trends such as immutability)
- Neal Ford: Java.next: The Java.next languages Leveraging Groovy, Scala, and Clojure in an increasingly polyglot world – a comparison of these languages with focus on what they are [not] suitable for, exploration of their paradigms (static vs. dynamic typing, imperative vs. functional)
SW development
- How to Completely Fail at BDD – a story of an enthusiastic developer who tried to make everyone’s life better by introducing automated BDD tests and failed due to differences in culture (and inability to change thinking from the traditional testing), a surprising lack of interest in the tool and learning how to write good tests: “Culturally, my current team just isn’t ready or interested in something like this.” Morale: It is hard to change people, good ideas are not enough.
- M. Feathers: Refactoring is Sloppy – refactoring is often prioritized out of regular development and refactoring sprints/stories aren’t popular due to past failures etc. An counter-intuitive way to get refactoring in is to imagine, during planning, what the code would need to be like to make it easy to implement a story. Then create a task for making it so before the story itself and assign it to somebody else then the story (to force a degree of scrutiny and communication). “Like anything else in process, this is medicine. It’s not meant to be ‘the way that people do things for all time’ [..]” – i.e. intended for use when you can’t fit refactoring in otherwise. It may also make the cost of the current bad code more visible. Read also the commits (f.ex. the mikado method case).
- Cyber-dojo: A great way to practice TDD together. Compare your read-green cycle and development over time with other teams. Purposefully minimalistic editor, a number of prepared tdd tasks.
- On the Dark Side of “Craftsmanship” – an interesting and provoking article. Some developers, the software labouers, want to get work done and go home, they haven’t the motivation and energy to continualy spend time improving themselves. There is nothing wrong with that and we shouldn’t disparge them because of that. We shouldn’t divide people into craftsmen and the bad ones. A summary of and response to the varied reactions follows up in More on “Craftsmanship”. The author is right that we can’t expect everybody to spend nights improving her/his programming skills. Still they should not produce code of poor quality (with few exceptions) since maintaining such code costs a lot. There should be time for enough quality in a 9-5 day and people should be provided with enough guidance and education to be able to write decent code. (Though I’m not sure how feasible it is, how much effort it takes to become an acceptable developer.) Does the increased cost of writing (an learning to write) good code overweight the cost of working with bad code? That is an eternal discussion.
Cloud, web, big data etc.
- Whom the Gods Would Destroy, They First Give Real-time Analytics (via Leon) – a very reasonable argument against real-time analytics: yes, we want real-time operational metrics but “analytics” only makes sense on a sensible amount of data (for the sake of statistical significance etc.) RT analytics could easily provide misguided results.
CAP Twelve Years Later: How the “Rules” Have Changed (tl;dr, via @_dagi) – an in-depth discussion of the CAP theorem and the simplification (2 out of 3) that it makes; there are many more nuances. By Eric Brewer, a professor of computer science at the University of California, Berkeley, and vice president of infrastructure at Google.
- ROCA: Resource-oriented Client Architecture – “A collection of simple recommendations for decent Web application frontends.” Server-side: true REST, no session state, working back/refresh etc. Client: semantic HTML independent of layout, progressive enhancement (usable with older browsers), usable without JS (all logic on the server) etc. Certainly not suitable for all types of apps but worthwile to consider the principles and compare them with your needs.
Clojure Corner
Tools
- Vaurien, the Chaos TCP Proxy (via @bsvingen) – an extensible proxy that you can control from your tests to simulate network failure or problems such as delays on 20% of the requests; great for testing how an application behaves when facing failures or difficulties with its dependencies. It supports the protocols tcp, http, redis, memcache.
- Wvanbergen’s request-log-analyzer for Apache, MySQL, PostgreSQL, Rails and more (via Zarko) – generates a performance report from a supported access log to point out requests that might need optimizing
- Working Effectively With iTerm2 (Mac) – good tips in the body and comments
Favorite Quotes
A very good (though not very scientific) definition of project success applicable for distinguishing truly agile from process-driven projects:
[..] a project is successful if:
- Something was delivered and put to use
- The project members, sponsors and users are basically happy with the outcome of the project
- Johannes Brodwall in “How do we become Agile?” and why it doesn’t matter, inspired by Alistair Cockburn
(Notice there isn’t a single word about being “on time and budget”.)
Posted in General, Java, Testing, Tools, Top links of month | Tagged: bigdata, clojure, cloud, fun, life, performance, SbE, scala, security, tdd, Testing | Leave a Comment »
Posted by Jakub Holý on December 31, 2012
Recommended Readings
Software development
- Kent Beck: When Worse Is Better: Incrementally Escaping Local Maxima – Kent reintroduces his Sprinting Centipede strategy (“reduce the cost of each change as much as possible so as to enable small changes to be chained together nearly continuously” => “From the outside it is clear that big changes are happening, even though from the inside it’s clear that no individual change is large or risky.”) and advices how to deal with situations where improvements have reached a local maxima by making the design temporarily intentionally worse (f.ex. by inlining all the ugly methods or writing to both the old and the new data store); strongly recommended
- Related: Efficient Incremental Change – transmute risk into time by doing small, safe steps, then optimize your ability to make these steps quickly and thus being able to achieve large changes
- Researchers: It is not profitable to outsource development – the Scandinavian research organisation SINTEF ICT has studied the effects of outsourcing and discovered that often it is more expensive than in-country development due to hidden costs caused by worse communication and cultural differences (f.ex. Indians tend not to ask questions and work based on their, often incomplete, understanding) and very high people turn-over; even after the true cost is discovered, companies irrationally stay there. However it is possible to succeed, in some cases.
- Bjørn Borud: Tractor pulling and software engineering – very valuable and pragmatic advices on producing good software (i.e. avoiding accumulating so much crap that the software just stops progressing). Don’t think only about the happy path. Simplify. Write for other developers, i.e. avoid too “smart” solutions, test & document, dp actually think about design and its implication w.r.t performance etc. Awake the scientist in you: “Do things because you know they work, not because it happens to be the hip thing to do.”
(Note: I see the good intention behind “design for the weakest programmer you can think of” but plase don’t take it too far! Software should be primarily simple, not necessarily easy.
- Know your feedback loop – why and how to optimize it – to succeed, we need to learn faster; the only way to do that is to optimize our feedback loops, i.e. shorten the path our assumptions travel before they are (in)validated, whether about our code, business functionality, or the whole project idea. Conscise, valuable.
- Code quality is the least important reason to pair program – the author argues, based on his experience, other benefits of pair programming are more important than code quality: “[..] the most important reasons why we pair: it contributes to an amazing company culture, it’s the best way to bring new developers up to speed, and it provides a great way to share knowledge across the development team.”
- You Can’t Refactor Your Way Out of Every Problem – refactoring can’t help you if the design is fundamentally wrong, you need to rewrite it; know when it can or cannot help and act accordingly (related to how much design is needed upfront since some design decision cannot be reverted/improved upon)
Languages
- Josh Bloch: Java – the good, bad and ugly parts (video, 15 min); summary: right design decisions (VM, GC, threads, dynamic linking, OOP, static typing, exceptions, …), some bad details (signed byte, lossy long-> double, == doesn’t cal .equals, ability to call overriden methods from constructors, …); Mr. Bloch has also given a longer talk examining the evolution of Java from 1.0 to 1.7 in The Evolution of Java: Past, Present, and Future.
- True Scala complexity – a thoughtful criticism of the complexity of Scala, based on code samples; “[it is true that] Scala is a language with a smaller number of orthogonal features than found in many other languages. [...] However, the problem is that each feature has substantial depth, intersecting in numerous ways that are riddled with nuances, limitations, exceptions and rules to remember. It doesn’t take long to bump into these edges, of which there are many.”; however, its possible to avoid many of the problems mentioned by resorting to less smart, more clumsy and verbose Java-like code; also, the author still likes Scala.
- Scala or Java? Exploring myths and facts (3/2012) – a balanced view of Scala’s strengths and weaknesses; “[..] the same features that makes Scala expressive can also lead to performance problems and complexity. This article details where this balance needs to be considered.” Topics: productivity, complexity, concurrency support, language extensibility, Java interoperability, quality of tooling, speed, backward compatibility. Plenty of useful links.
Big data & Cloud:
- Dean Wampler’s slides from Beyond Map Reduce – 1) Hadoop Map Reduce is the EJB 2 of big data but there are better APIs such as Cascading with Scala/Clojure wrappers; there are also “alternative” solutions like Spark and Storm; 2) functional/relational programming with simple data structures (lists, sets, maps etc.) is much more suitable for big data than OOP (for we do mostly stateless data transformations)
- Apache HBase vs Apache Cassandra – comparison sheet – if you want to decide between the two
- Optimizing MongoDB on AWS – 20 min talk about the current state of the art. Simplicity: Mongo AMIs by 10gen, Cloudformation template etc. Stability & perf.: new storage options – EBS with provisioned IOPS volumes (high I/O) + EBS Optimized Instances (dedicated throughput to EBS), High IO instances (hi1.4xlarge – SSD)); comparison of throughput (number of operations, MBs) of these storages; tips for filesystem config. Scalability: scale horizontally and vertically, shrink as needed.
- Getting Real About Distributed System Reliability by Jay Kreps, the author of the Voldemort DB: distributed software is NOT somehow innately reliable; a common mistake is to consider only probability of independent failures but failures typically are dependent (e.g. network problems affect the whole data center, not a single machine); the theoretical reliability “[..] is an upper bound on reliability but one that you could never, never approach in practice”; “For example Google has a fantastic paper that gives empirical numbers on system failures in Bigtable and GFS and reports empirical data on groups of failures that show rates several orders of magnitude higher than the independence assumption would predict. This is what one of the best system and operations teams in the world can get: your numbers may be far worse.” The new systems are far less mature (=> mor bugs, worse monitoring, less experience) and thus less reliable (it takes a decade for a FS to become mature, likely similar here). Distributed systems are of course more complex to configure and operate. “I have come around to the view that the real core difficulty of these systems is operations, not architecture or design.” Some nice examples of failures.
Other
- Talks To Help You Become A Better Front-End Engineer In 2013 (tl;dr) – topics such as mobile web development, modern web devel. workflow, current/upcoming featrues of CSS3, ECMAScript 6, CSS preprocessors (LESS etc.), how to write maintainable JS, modular CSS, responsive design, JS debugging, offline webapps, CSS profiling and speed tracer, JS testing
- On Being A Senior Engineer – valuable insights into what makes an engineer “senior” (i.e. mature; from the field of web operations but applies to IT in general): mature engineers seek out constructive criticism of their designs, understand the non-technical areas of how they are perceived (=> assertive, nice to work with etc.), understand that not all of their projects are filled with rockstar-on-stage work, anticipate the impact of their code (on resource usage, others’ ability to understand & extend it etc.), lift the skills and expertise of those around them, make their trade-offs explicit when making decisions, do not practice “Cover Your Ass Engineering,” are able to view the project from another person’s (stakeholder’s) perspective, are aware of cognitive biases (such as the Planning Fallacy), practice egoless programming, know the importance of (sometimes irrational) feelings people have.
Clojure Corner
- Polymorphism in Clojure – Tim Ewald’s 1h live coding talk at Øredev conference introducing mechanisms for polymorphism (and Java interoperability) in Clojure and explaining well the different use cases for them. Included: why records, protocols & polymorphism with them (shapes, area => open, not explicit switch) (also good for Java interop.: interfaces), reify, multimethods.
- Stuart Sierra: Thinking in Data (1h talk) – Sierra introduces data-oriented programming, i.e. programming with generic, immutable data structures (such as maps), pure functions, and isolated side-effects. Some other points: Records are an optimization, only for perforamnce (rarely) or polymorphism (ot often); the case for composable functions; testing using simulations (generative testing) etc.; visualization of state & process
Tools & Libs
- Netflix’ Hysterix: library to make distributed systems more resilitent by preventing a single slow/failing dependency from causing resource (thread etc.) exhaustion etc. by wrapping external calls in a separate thread with clear timeouts and support for fallbacks, with good monitoring etc. Read “Problem Definition” on the page to understand the problem it tries to solve.
Favorite Quotes
if you build something that is fundamentally broken it isn’t really interesting that you followed the plan or you followed some methodology — the thing you built is fundamentally broken.
- Bjørn Borud, Chief Architect at Comoyo.no, in an email 12/2012
The root of the Toyota Way is to be dissatisfied with the status quo; you have to ask constantly, “Why are we doing this?”
- Katsuaki Watanabe, Tyota President 2005 – 2009 (from the talk Deliberate Practice)
Posted in General, Java, Tools, Top links of month | Tagged: aws, bigdata, clojure, cloud, development, kent beck, methodology, outsourcing, scala, scaling, webapp, xp | Leave a Comment »
Posted by Jakub Holý on November 30, 2012
Recommended Readings
- James Roper: Scaling Scala vs Java (recommended by M. Odersky) – writing scalable apps in Scala is much easier then Java because idiomatic Scala uses immutable structures and lends itself naturally to asynchronous processing while doing these things in Java is possible but very unnatural and laborious. “It [Scala] is biased towards scaling, it encourages practices that help you scale.”
- Exception Handling Antipatterns (2006, still valuable) – Log and Throw, Throwing Exception (instead of a suitable subclass), Throwing the Kitchen Sink (declaring many exceptions in method signature), Catching Exception (instead of a particular subclass), Destructive Wrapping (not including the exception as cause), Log and Return Null, Catch and Ignore (swallowing the exception), Throw from Within Finally, Multi-Line Log Messages (via repeated log calls instead of \n), Ignoring InterruptedException (instead of breaking the loop etc.), Relying on getCause().
- The GitHub way: Your team should work like an open source project – a provocative article about the development process in GitHub that strongly prefers asynchronous and on-line communication over face-to-face meetings and communication, which, according to the author, leads to increased productivity. That is quite the opposite of what is usually practiced. I can think of situation where direct interaction is invaluable but, on the other hand, I could certainly live with less meetings. (Comments on Hacker News)
Clojure Corner
- Chas Emerick’s screencast Starting Clojure is a great example of Clojure development and interactive Clojure web development without restarts, with live code changes and direct access to the running app via REPL. It makes also a good job of introducing the Eclipse Clojure plugin Counterclockwise and the popular web framework Compojure with the template engine Enlive and HTTP abstraction Ring. Highly recommended! (I would however recommend to already know a little about the language.)
- Results of the 2012 State of Clojure survey (and, for comparison, 2010 results) – some interesting facts are what people use Clojure for (math / data analysis 35%, web development 70%), 60% people evaluating ClojureScript, answers to “What have been the biggest wins for you in using Clojure?”, the fact that ~ 20% use Eclipse, around 60% Emacs, only 10% IntelliJ, 23% vim. Also interesting is “What has been most frustrating for you in your use of Clojure” (with 30% mentions of documentation, being now improved by clojure-doc.org, 23% “future stuffing concerns”)
Favorite Quotes
You can reach a point with Lisp where, between the conceptual simplicity, the large libraries, and the customization of macros, you are able to write only code that matters.
- Rich Hickey in an interview
Lisp was a piece of theory that unexpectedly got turned into a programming language.
- Paul Graham in Revenge of the Nerds
Posted in General, Java, Top links of month | Tagged: clojure, github, logging, scala | Leave a Comment »
Posted by Jakub Holý on October 31, 2012
Recommended Readings
- David Veksler: Some lesser-known truths about programming – things newcomers into the field of IT don’t know and don’t expect, true and an interesting read. Not backed by good data but anyway. F.ex.: “[..] a programmer spends about 10-20% of his time writing code [..] much of the other 90% thinking, researching, and experimenting”. “A good programmer is ten times more productive than an average programmer. A great programmer is 20-100 times more productive than the average [..]” “Bad programmers write code which lacks conceptual integrity, non-redundancy, hierarchy, and patterns, and so is very difficult to refactor.” “Continuous change leads to software rot, which erodes the conceptual integrity of the original design.” “A 2004 study found that most software projects (51%) will fail in a critical aspect, and 15% will fail totally.”
- Brett L. Schuchert: Modern Mocking Tools and Black Magic – An example of power corrupting – interesting for two reasons: a good analysis of a poorly written piece of code and discussion of the code injection black magic (JMockIt) vs. actually breaking dependencies to enable tests. The author presents a typical example of low-quality method (mixing multiple concerns, mixing different levels of abstractions, untestable due to a hardcoded use of an external call) and discusses ways to improve it and to make it testable. Recommended to read.
- It’s Not About the Unit Tests – Learning from iOS Developers: iOS developers don’t do much testing yet they manage to produce high quality. How is that possible? The key isn’t testing itself, but caring for the code. (Of course, iOS is little special: small apps, no legacy, a powerful platform that does lot for the apps, very visual apps.) “It’s not about the practices. It’s about the spirit and intent behind them, and how they are applied.” (M. Fowler had a similar observation about a team that used mock-based testing exclusively and thus lacked integration tests yet all worked. [I've lost the link to the post and would be grateful for it])
- Java Code Quality Tools – Overview – brief descriptions of 44 quality-related tools including some interesting tools and Eclipse plugins I didn’t know or knew but forgot. F.ex. analysis of dependencies with JBoss Tattletale or JarAnalyzer, Clirr to check libraries for source and binary backwards compatibility, JDiff generates JavaDoc-based report of removed/added/changed in an API. Spoon – read and check or transform Java code. Java PathFinder (NASA) – special JVM capable of checking all execution path to discover concurrency defects etc.
Tools
- DirB, Directory Bookmarks for Bash (home) – moving efficiently among favourite directories (
s <name> to create a bookmark for pwd, g <bookmark | relative/abs dir path> to enter a dir (=> works both for bookmarks and as a replacement for cd); also support for relative path bookmarks & more; sl lists bookmakrs in the last used order) (You might also want to check out Autojump, described in Dec 11; bashmarks is another similar project. Another similar project is rupa’s z and j2 and the fish clone z-fish)
Clojure Corner
- Jon Pither: Clojure at a Bank – Moving from Java - the justification (productivity, dynamism, FP a better match for the domain) and process behind moving from Java to Clojure with a monolithic 1M LOC Spring/Hibernate app. (Random quotes: “I had used some dynamical languages before and it was quite obvious that we were essentially forcing lots of schema and type definition on to a problem domain that just didn’t want or need it.” “[..] it [dependency injection] just looks redundant in retrospect now that I’m working 95% with FP code.”) There is also a EuroClojure talk about their experiences one year later (35 min).
- Prismatic’s “Graph” at Strange Loop – an interesting desing problem, its solution, and a resulting OSS library. The problem: How to break a large function into independently usable small ones that might depend on each other without ever needing to recompute a value once the function producing is called. The solution: Graph – “Graph is a simple, declarative abstraction to express compositional structure.” (Enabling explicit declaration of data dependencies and pluging in different implementations.)
- The Oblong: Blog about 2/3 D game programming in Clojure, starting from scratch (w/o an engine); interesting experiences
- Ironclad: Steam Legions – Clojure game development battle report (the game on Github)
- Building the Wishlisted.org webapp in Clojure – experiences from learning Clojure for real by building a webapp in Noir
- Clojure vs. Scala smackdown (“Just kidding with the title of this post
“) – a short post with interesting discussion. Dmitri Sotnikov’s opinion resonates with me: “I found that for me Clojure wins on simplicity and consistency. While it looks more alien initially, once you learn the basics, you just reuse the same patterns everywhere.” Some more comments: “One major concern was maintainability, since it’s fairly easy to write very dense code. This turned out to not be a problem in practice. Because Clojure code is written as a tree, refactoring it is very easy.” REPL seems to be a big win (applies to Scala too). Scala’s type system might get tedious and learning its quirks takes time but there is lot of potential and both have they strong sides.
- Code Fatigue – discussion of the advantages of learning, using, and combining the (many) standard Clojure functions instead of a “basic solution” using recursion etc. The argument is in favor of higher-level code with less complexity in the form of branching, recursion, nested expressions etc. and thus less mental fatigue.
Favorite Quotes
A classic test only cares about the final state – not how that state was derived. Mockist tests are thus more coupled to the implementation [emphasis mine] of a method. Changing the nature of calls to collaborators usually cause a mockist test to break.
- Martin Fowler in his classical Mocks Aren’t Stubs
I’m afraid of code. When I see a big pile of code, I get scared
. Some classes and method make me cry. I had troubles explaining why I prefer short pieces of code keeping the same level of abstraction, cohesive and loosely coupled. The following quote captures the essence – improved communication.
One way to improve communication is to reduce the need for it and the same can be said for code. [...] Since we tend to read code more than write it, anything we can do to reduce the need to read code is time well invested in the life of a project.
- Brett L. Schuchert in Modern Mocking Tools and Black Magic – An example of power corrupting justifying extraction of code into another class or method
Posted in General, Java, Testing, Tools, Top links of month | Tagged: bash, clojure, legacy, productivity, quality, scala | Leave a Comment »
Posted by Jakub Holý on October 13, 2012
|
(By kristobalite)
Clojure:
Clean
Structured
Focused |
(By agiamba)
Scala:
Adorned
Overflowing
Magnificent
|
Clojure has a zen-like quality to it. There is extreme focus on simplicity, on defining few elementary orthogonal concepts that can be combined in powerful ways. For example it took 3 years for Clojure to get named parameters – but the result, destructuring, is something much richer and more applicable, that fits the language perfectly and has become a core part of idiomatic Clojure.
Scala feels as if trying to empower the programmer in any possible way, throwing in shortcuts, syntactic sugar, and plenty of methods so that anything can be done with the minimal amount of code. It is overwhelming.
Disclaimer: I have only a few weeks of experience with Scala and are in no position to judge it. This is just an impression I have gained so far and nothing more. I’m sure it is a great language.
Posted in General, Java | Tagged: clojure, opinion, scala | 1 Comment »
Posted by Jakub Holý on October 9, 2012
To load all fields from a tab-separated text file in Cascalog we need to use the generic hfs-tap and specify the “scheme” (notice that loading all fields and expecting tab as the separator is the default behavior of TextDelimited):
(hfs-tap
(cascading.scheme.hadoop.TextDelimited.)
"hdfs:///user/hive/warehouse/playerevents/epoch_week=2196/output_aewa-analytics-ada_1334697041_1.log")
With a custom separator and fields:
(hfs-tap
(cascading.scheme.hadoop.TextDelimited. (cascalog.workflow/fields ["?f1" "?f2"]) "\t") ; or cascading.tuple.Fields/ALL inst. of (fields ...)
"hdfs:///user/hive/warehouse/playerevents/epoch_week=2196/output_aewa-analytics-ada_1334697041_1.log")
Hadoop doesn’t manage to load data files from nested sub-directories (for example from a Hive partitioned table). To load them, you need to use a “glob pattern” to turn the standard Hfs tap into a GlobHfs tap. This is how we would match all the subdirectories (Hadoop will then handle loading the files in them):
(hfs-tap
(cascading.scheme.hadoop.TextDelimited.)
"hdfs:///user/hive/warehouse/playerevents/"
:source-pattern "epoch_week=*/")
Enjoy.
Posted in General, Java, Tools | Tagged: bigdata, cascalog, clojure | Leave a Comment »
Posted by Jakub Holý on September 25, 2012
This is a summary of the excellent JavaZone 2012 talk Going Native (vimeo) by Brian McCallister. Content: Using native libraries in Java and packaging them with Java apps, daemonization, trully executable JARs, powerful CLI, creating manpages, packaging natively as deb/rpm.
1. Using Native Libs in Java
Calling Native Libs
Calling native libraries such as C ones was hard and ugly with JNI but is very simple and nice with JNA (GPL) and JNR (Apache/LGPL)
Read the rest of this entry »
Posted in Java | Tagged: daemon, deployment, linux, native, ops | Leave a Comment »
Posted by Jakub Holý on September 21, 2012
(Disclaimer: Based on personal experience and little research, the information might be incomplete.)
VisualVM is a great tool for monitoring JVM (5.0+) regarding memory usage, threads, GC, MBeans etc. Let’s see how to use it over SSH to monitor (or even profile, using its sampler) a remote JVM either with JMX or without it.
This post is based on Sun JVM 1.6 running on Ubuntu 10 and VisualVM 1.3.3.
Read the rest of this entry »
Posted in General, Java, Tools | Tagged: jmx, monitoring, ops | 1 Comment »
Posted by Jakub Holý on September 12, 2012
Republished from blog.iterate.no with the permission of my co-authors Stig Bergestad and Krzysztof Grodzicki.
Three of us, namely Stig, Krzysztof, and Jakub, have had the pleasure of spending a week with Kent Beck during Iterate Code Camp 2012, working together on a project and learning programming best practices. We would like to share the valuable lessons that we have learnt and that made us better programmers (or so we would like to think at least).
Read the rest of this entry »
Posted in General, Java, Testing | Tagged: best practices, CleanCode, craftsmanship, kent beck, opinion, tdd | Leave a Comment »