<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[avanwyk]]></title><description><![CDATA[Deep Learning, Software Engineering, Machine Learning, Data Science]]></description><link>https://www.avanwyk.com/</link><image><url>https://www.avanwyk.com/favicon.png</url><title>avanwyk</title><link>https://www.avanwyk.com/</link></image><generator>Ghost 4.7</generator><lastBuildDate>Fri, 23 Feb 2024 08:06:35 GMT</lastBuildDate><atom:link href="https://www.avanwyk.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Revisiting Java in 2021 - II]]></title><description><![CDATA[A look at how Java 17 stacks up against Kotlin and Scala for development teams in 2021, and an overview of popular JVM technologies.]]></description><link>https://www.avanwyk.com/revisiting-java-in-2021-ii/</link><guid isPermaLink="false">61276cc90c9ab8157bcf270d</guid><category><![CDATA[software engineering]]></category><category><![CDATA[programming]]></category><dc:creator><![CDATA[Andrich van Wyk]]></dc:creator><pubDate>Sun, 19 Sep 2021 14:14:09 GMT</pubDate><media:content url="https://www.avanwyk.com/content/images/2021/09/charles-forerunner-gapYVvUg1M8-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.avanwyk.com/content/images/2021/09/charles-forerunner-gapYVvUg1M8-unsplash.jpg" alt="Revisiting Java in 2021 - II"><p><em>Cover image by <a href="https://unsplash.com/@charles_forerunner?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Charles Forerunner</a> on <a href="https://unsplash.com/?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a>.</em></p><p><a href="https://www.avanwyk.com/revisiting-java-in-2021-i/">In Part I</a>, I gave an overview of the major 
language features introduced between Java 11 and Java 17 - showing us what Java looks like in 2021.</p><p>I also argued that even if Java doesn&apos;t have feature parity with some of the more sophisticated JVM languages (and it definitely doesn&apos;t), Java is very deliberately moving forward and definitely in the right direction.</p><p>But that still leaves the question, where does Java fit in the modern JVM landscape? Are the newer, more feature-rich languages not the obvious choice for all development teams?</p><p>Below, I attempt to address these questions. I also discuss the JVM and give an overview of popular JVM technologies in a variety of contexts.</p><h2 id="goliath-vs-the-davids">Goliath vs the Davids</h2><p>I might make it sound like Kotlin or Scala are nipping at Java&apos;s heels, but assuredly, that is not the case. The <a href="https://insights.stackoverflow.com/survey/2021#most-popular-technologies-language">2021 Stack Overflow Survey results</a> again showed the continued popularity of Java: for every 1 reported Kotlin developer, there are almost 4 Java developers. It&apos;s much worse for Scala, with the ratio being 11:1 in favour of Java (Clojure has it the hardest, but <a href="https://insights.stackoverflow.com/survey/2021#technology-top-paying-technologies">at least Clojure developers are paid well for their efforts</a>).</p><p>Java similarly dominates the TIOBE index, <a href="https://www.tiobe.com/tiobe-index/">sitting pretty at third place</a>, an enviable spot compared to Scala&apos;s 32nd place and Kotlin&apos;s 37th.</p><p>What does this mean? Besides fuel for your favourite language flame-war, not much. 
There are, however, advantages to being popular.</p><p>Building large teams (think hundreds or thousands of developers) requires talent, and talent is easier to find if more people know the language (although, if you want the absolute best people, you might be better off choosing a more niche language).</p><p>Similarly, with so many people working in the language, the amount and quality of related resources are much higher. There are many excellent Java libraries, many well-written books on how to program it effectively and many high profile architects, thought leaders and evangelists that continue to refine and redefine how Java should be programmed.</p><p>Java also arguably has the best tooling in the business. I&apos;d argue that IntelliJ IDEA (and family) is the best IDE out there, more so if you are coding Java. Even if you prefer VS Code, <a href="https://code.visualstudio.com/docs/languages/java">Java is well supported</a>. Build tools such as Maven and Gradle are fast, stable and effective. There is also the JVM platform itself to consider, and more recently <a href="https://www.graalvm.org/docs/getting-started/">GraalVM</a>, but more on that later.</p><h2 id="is-less-more">Is Less More?</h2><p>When it comes to programming languages, is less, more? Go(lang) certainly thinks so. Go famously rejects complexity to serve its <a href="https://talks.golang.org/2012/splash.article">design goal</a> of <a href="https://golang.org/doc/faq#Why_doesnt_Go_have_feature_X">keeping things simple and fast</a>. 
The Go language designers deem this simplicity necessary to achieve scale, both in terms of program size and execution, and to <a href="https://talks.golang.org/2012/splash.article/#TOC_6.">enable large teams of programmers</a>.</p><p>So in terms of Kotlin and Scala, is Java perhaps a better choice precisely because it&apos;s a more straightforward language?</p><p>Scala specifically has so many features it&apos;s considered somewhat of an &apos;everything and the kitchen sink&apos; language, including <a href="https://dzone.com/articles/scala-ad-hoc-polymorphism-explained">ad-hoc polymorphism</a>, <a href="https://docs.scala-lang.org/overviews/macros/overview.html">macros</a>, <a href="https://nrinaudo.github.io/scala-best-practices/definitions/adt.html">ADTs</a>, a <a href="https://github.com/scalaz/scalaz">purely functional programming model</a> and much more.</p><p>Scala&apos;s additional complexity indeed enables powerful and elegant new solutions to programming problems. <a href="https://scalac.io/blog/typeclasses-in-scala/">Ad-hoc polymorphism and type classes</a> lead to beautifully flexible code. The benefits of functional programming have been widely evangelised and form the core of the <a href="https://blog.danlew.net/2017/07/27/an-introduction-to-functional-reactive-programming/">reactive programming model</a>. All of this is why Scala remains one of my favourite languages.</p><p>However, powerful new tools aren&apos;t free, and with the additional complexity, several ancillary issues are introduced downstream, such as <a href="https://stackoverflow.com/questions/30005124/why-is-compilation-very-slow-for-scala-programs">slow compilation</a>, <a href="https://stackoverflow.com/questions/39483325/why-scalas-sbt-is-too-slow">sluggish tooling</a> and a <a href="https://www.reddit.com/r/scala/comments/kjcwgf/why_is_scala_considered_hard/">steep learning curve</a>. 
These issues have nothing to do with the language features or how effectively and elegantly you can solve problems with Scala. Instead, they mar the developer experience, which, at least in my opinion, is perhaps the primary reason Scala has not achieved dominance on the JVM. <a href="https://docs.scala-lang.org/scala3/new-in-scala3.html">Scala 3 overhauls the language</a>, and I am curious to see how that pans out.</p><p>Like Scala, Kotlin also gives us new ways of solving problems that commonly occur in Java codebases or require third-party libraries to solve: compared to Java, Kotlin has better support for <a href="https://kotlinlang.org/docs/lambdas.html#instantiating-a-function-type">functional programming</a>, <a href="https://kotlinlang.org/docs/null-safety.html">null-safety</a>, an <a href="https://kotlinlang.org/api/latest/jvm/stdlib/">expanded standard library</a>, and has <a href="https://kotlinlang.org/docs/coroutines-overview.html">lightweight concurrency constructs</a> similar to Go.</p><p>However, although many see Kotlin as nearly a drop-in replacement for Java (which, at a technical level, it can be), it too has its <a href="https://medium.com/pinterest-engineering/the-case-against-kotlin-2c574cb87953">own learning curve</a>, a fact that is oft poorly acknowledged in my experience.</p><p>This is especially true when we move beyond basic knowledge of Kotlin (how to program in Kotlin) and start considering <em>idiomatic</em> and <em>effective</em> use of Kotlin (how you <em>should </em>program in Kotlin). In the absence of a Kotlin expert on your team, or standard, <a href="https://www.oreilly.com/library/view/effective-java/9780134686097/">widely accepted guidelines</a>, effective use of a language takes experience and is achieved through trial and error. Even experts will need to make choices specific to their team and team size.</p><figure class="kg-card kg-code-card"><pre><code class="language-kotlin">fun usingScoped() {
    val numVowels = getDTO()?.let { dto -&gt;
        dto.string?.let {
            countVowels(it)
        }
    } ?: 0
}

fun usingIf() {
    val dto = getDTO()
    val numVowels = if (dto?.string != null) countVowels(dto.string) else 0
}

fun usingWhenAndScoped() {
    val numVowels = when (val dto = getDTO()) {
        null -&gt; 0
        else -&gt; dto.string?.let { countVowels(it) } ?: 0
    }
}</code></pre><figcaption>Which of these is the best approach to deal with the nullability of a returned value and its properties? It could depend on your team.</figcaption></figure><p>Certainly, it&apos;s here that Java holds some advantage over the more powerful languages. Java, of course, has a learning curve, but it is flatter than either Scala or Kotlin since it&apos;s a simpler language. There are also many resources on how to program it effectively. It might not be exciting, but Java is a known quantity.</p><h2 id="the-jvm-platform">The JVM Platform</h2><p>Of course, Java is both a language and a platform through the JVM. James Ward recently gave <a href="https://jamesward.com/2021/03/16/the-modern-java-platform-2021-edition/">an excellent overview of the modern Java platform</a>, and I encourage you to check it out; I&apos;ll highlight some of it below.</p><p>As we saw above, three of the top 20 programming languages are JVM based. It doesn&apos;t really matter what your <a href="https://clojure.org/about/lisp">preferred style of programming is</a>; the JVM has you covered.</p><p>The ecosystem is also rich with frameworks and libraries to develop just about anything. Frameworks such as <a href="https://spring.io/projects/spring-boot">Spring Boot</a>, <a href="https://micronaut.io/">Micronaut</a> and <a href="https://quarkus.io/">Quarkus</a> are de facto standards for creating web applications, especially backends and microservices. More Rails- or Django-like, UI-rich full-stack options are available in the <a href="https://www.playframework.com/">Play Framework</a>, <a href="https://vaadin.com/">Vaadin</a> (with Boot) and Spring Boot itself.</p><p>Another area where the JVM has a significant presence is in Big Data and data streaming communities. 
Many of the dominant frameworks in this area are written in Java or Scala, including <a href="https://kafka.apache.org/">Apache Kafka</a>, <a href="https://hadoop.apache.org/">Hadoop</a>, <a href="https://spark.apache.org/">Spark</a>, <a href="https://beam.apache.org/">Beam</a>, <a href="https://flink.apache.org/">Flink</a>, and <a href="https://nifi.apache.org/">NiFi</a>. Of course, the implementation language might not be the language used to interface with the framework, but Java is always supported, and in many cases, the default API.</p><p>A lot of progress has also been made in the Deep Learning space on the JVM over the last couple of years. There are two major Java-based Deep Learning frameworks: <a href="https://djl.ai/">DJL (Deep Java Library)</a> from Amazon and <a href="https://deeplearning4j.org/">DL4J (Deep Learning for Java)</a>, now with the Eclipse foundation. The libraries differ somewhat in their approach. DJL provides interfaces to <a href="https://djl.ai/docs/engine.html">&apos;Engines&apos;, which provide the lower level n-dimensional arrays and automatic differentiation functionality</a>. Supported engines include MXNet, Pytorch and Tensorflow.<br>DL4J&apos;s approach is more fundamental <a href="https://deeplearning4j.konduit.ai/">with its own implementation of n-dimensional arrays</a> (<a href="https://github.com/deeplearning4j/nd4j">ND4J</a>) and related functionality.</p><p><a href="https://www.reactivemanifesto.org/">Reactive programming</a> is one of the hallmarks of modern systems. 
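</p><p>Notably, the Reactive Streams interfaces themselves have shipped with the JDK since Java 9, as <code>java.util.concurrent.Flow</code>. As a rough sketch of the publish/subscribe model using only the standard library (the class and variable names below are illustrative; real systems would typically use a framework&apos;s reactive types instead):</p><figure class="kg-card kg-code-card"><pre><code class="language-java">import java.util.List;
import java.util.concurrent.SubmissionPublisher;

public class FlowExample {
  public static void main(String[] args) {
    var publisher = new SubmissionPublisher&lt;String&gt;();
    // consume() subscribes a Consumer and returns a CompletableFuture that
    // completes once the publisher is closed and all items are delivered.
    var done = publisher.consume(event -&gt; System.out.println(&quot;Received: &quot; + event));
    List.of(&quot;tick&quot;, &quot;tock&quot;).forEach(publisher::submit);
    publisher.close();
    done.join(); // wait for all submitted items to be consumed
  }
}</code></pre><figcaption>A minimal publish/subscribe sketch using the JDK&apos;s built-in Flow API (<code>java.util.concurrent.Flow</code>, since Java 9). Names are illustrative.</figcaption></figure><p>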
Most major JVM frameworks <a href="https://spring.io/reactive">support</a> <a href="https://quarkus.io/guides/getting-started-reactive">reactive</a> coding, with full end-to-end reactive stacks now being possible, including the database layer via <a href="https://r2dbc.io/">R2DBC</a>.<br>It&apos;s also worth mentioning <a href="https://akka.io/">Akka</a>, an implementation of the Actor Model on the JVM, which is extremely mature and supports writing highly reactive, distributed systems.</p><p>Within the Cloud and Cloud-Native contexts, the JVM also excels. JVM based applications are well-supported by all <a href="https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_Java.html">major</a> <a href="https://cloud.google.com/appengine/docs/standard/java11/testing-and-deploying-your-app">cloud</a> <a href="https://azure.microsoft.com/en-us/develop/java/">providers</a>. The JVM is also easy to host in a <a href="https://cloud.google.com/blog/topics/developers-practitioners/comparing-containerization-methods-buildpacks-jib-and-dockerfile">container</a>, and containers are well <a href="https://docs.spring.io/spring-boot/docs/current/reference/html/features.html#features.container-images">supported by major frameworks</a> such as Spring. Applications run well regardless of whether they are developed in Java, Kotlin or Scala.</p><p>However, when running a Cloud-Native or Serverless Java application, there is some concern regarding JVM overhead. Quarkus addresses this<a href="https://quarkus.io/vision/container-first"> directly</a>; however, another major piece of the puzzle here is the <a href="https://www.graalvm.org/">GraalVM</a>.</p><h2 id="graalvm">GraalVM</h2><p>For those unfamiliar with the GraalVM project, it&apos;s a significant and impressive piece of engineering with several cutting edge features (such as <a href="https://www.graalvm.org/reference-manual/polyglot-programming/">polyglot programming</a>). 
Pertinently, it also includes <a href="https://www.graalvm.org/reference-manual/native-image/">GraalVM Native Image</a>, which enables the ahead-of-time compilation of Java applications to native binaries.</p><p>Having native binaries is, of course, game-changing in the Serverless and Cloud-Native contexts, with start-up time seeing reported improvements of <a href="https://medium.com/graalvm/lightweight-cloud-native-java-applications-35d56bc45673">50x</a>, and memory footprints decreasing by as much as <a href="https://medium.com/graalvm/lightweight-cloud-native-java-applications-35d56bc45673">5x</a>. I&apos;ve used Quarkus before, and though I didn&apos;t run benchmarks, I can report startup time was on the order of microseconds.</p><p>This does come with some <a href="https://www.graalvm.org/reference-manual/native-image/Limitations/">restrictions and caveats</a>; for example, it&apos;s not as straightforward to use Reflection in your Java code.</p><p>It may also <a href="https://www.youtube.com/watch?v=k2X1Rk1jk-E">not yet be that straightforward</a> to get your <a href="https://quarkus.io/guides/building-native-image">application working on the GraalVM</a> alongside your favourite framework. But progress is being <a href="https://spring.io/blog/2021/03/11/announcing-spring-native-beta">made rapidly</a>.</p><h2 id="the-future-of-java">The Future of Java</h2><p>There are several high profile Java projects in the pipeline aiming to further modernize the language. 
<a href="https://wiki.openjdk.java.net/display/loom">Project Loom</a> aims to introduce new lightweight concurrency constructs &#xE0; la coroutines.</p><p><a href="https://wiki.openjdk.java.net/display/valhalla">Valhalla</a> is introducing new inline types by updating the memory model, allowing better utilization of modern hardware architectures.</p><p>Finally, Project <a href="https://mail.openjdk.java.net/pipermail/discuss/2020-April/005429.html">Leyden</a> aims to address startup time and time to peak performance for Java.</p><p>Beyond Java, the issues mentioned above regarding Kotlin will be addressed over time, and the language itself <a href="https://kotlinlang.org/docs/roadmap.html">continues to evolve</a>. I also mentioned <a href="https://www.scala-lang.org/blog/2021/05/14/scala3-is-here.html">Scala 3</a>, which has a lot of potential.</p><h2 id="conclusion">Conclusion</h2><p>Whether Java is the right choice for you, I believe, depends on context.</p><p><a href="https://www.avanwyk.com/revisiting-java-in-2021-i/">Considered, predictable changes</a> are a significant advantage, if not an outright necessity, in specific contexts. Huge teams (thousands of developers) running projects for decades, or projects that have upgrade cycles measured in years, or codebases that work in an environment that necessitates safety and predictability (e.g. finance or medicine) thrive on this kind of roadmap.</p><p>Further, as detailed above, Java, as a platform, is certainly not lacking in capability. Regardless of your chosen JVM language, you&apos;re unlikely to find yourself in a corner where there isn&apos;t a framework, library, or tool to support and accelerate your development.</p><p>Am I advocating for Java? Maybe a little. Would I start a new project in Java? Probably not; I&apos;d prefer Kotlin. However, does it make sense if other teams choose Java instead in 2021? 
Absolutely.</p><p></p><p>Follow me on <a href="https://twitter.com/avanwykai">Twitter</a>.<br>Or email me at: <code>interesting</code> at <code>avanwyk.com</code></p>]]></content:encoded></item><item><title><![CDATA[Revisiting Java in 2021 - I]]></title><description><![CDATA[What does Java look like in 2021? An overview of Java 17, the latest LTS Java release, including Records, Sealed Classes, and Pattern Matching.]]></description><link>https://www.avanwyk.com/revisiting-java-in-2021-i/</link><guid isPermaLink="false">611f7a4f0c9ab8157bcf24a4</guid><category><![CDATA[software engineering]]></category><category><![CDATA[programming]]></category><dc:creator><![CDATA[Andrich van Wyk]]></dc:creator><pubDate>Wed, 01 Sep 2021 21:09:17 GMT</pubDate><media:content url="https://www.avanwyk.com/content/images/2021/09/michael-dziedzic-qDG7XKJLKbs-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.avanwyk.com/content/images/2021/09/michael-dziedzic-qDG7XKJLKbs-unsplash.jpg" alt="Revisiting Java in 2021 - I"><p><em>Cover image by <a href="https://unsplash.com/@lazycreekimages">Michael Dziedzic</a> on <a href="https://unsplash.com">Unsplash</a>.</em></p><h2 id="introduction">Introduction</h2><p>September marks the release of Java 17, the latest LTS Java release. Java 17 is also the culmination of many language and platform improvements that have steadfastly been introduced with every Java release since Java 11.</p><p>I&apos;ve been working on the JVM platform for well over a decade at this stage, and, as I am sure is the case for many others, this does not necessarily mean I have been programming Java.</p><p>I fully embraced Scala, an excellent language with a flawed developer experience, when it came along and drastically improved my functional and reactive programming skills in the process.</p><p>Recently, however, Kotlin has been my go-to language on the JVM. 
Although <a href="https://techcrunch.com/2019/05/07/kotlin-is-now-googles-preferred-language-for-android-app-development/">synonymous with Android development</a>, Kotlin has also been embraced by the JVM backend community and works well with several popular frameworks such as <a href="https://spring.io/guides/tutorials/spring-boot-kotlin/">Spring Boot</a>, <a href="https://quarkus.io/guides/kotlin">Quarkus</a>, and <a href="https://github.com/micronaut-projects/micronaut-kotlin">Micronaut</a>.</p><p>With many excellent languages to choose from on the JVM (shoutout to Clojure), where does Java itself fit in? If you haven&apos;t been keeping an eye on Java, what have you missed? Is Java still a viable, modern option when programming on the JVM?</p><h2 id="the-java-17-language">The Java 17 language</h2><p>To many (if not most) developers, <a href="https://snyk.io/jvm-ecosystem-report-2021/">Java still refers to Java 8</a>. So let&apos;s quickly recap the major features added in Java 9, 10, and 11 (the previous LTS release) for those that may not be familiar.</p><h3 id="catching-up-java-8-to-11">Catching-up: Java 8 to 11</h3><ul><li>Java 9 brought modularity to the Java platform. The Modules system (aka Project Jigsaw) was a significant change and affected much of the underlying systems and middleware running Java applications. I encourage you to <a href="https://openjdk.java.net/projects/jigsaw">dive deeper</a> for <a href="https://openjdk.java.net/projects/jigsaw/quick-start">yourself</a>.</li><li>Java 10 brought local type inference, allowing the use of <code>var</code> in declaring local variables. 
Examples are shown in the code below.</li><li>Java 11 didn&apos;t introduce any major language features but came along with the notorious <a href="https://www.oracle.com/za/java/technologies/javase/jdk-faqs.html">Oracle Java license change</a>.</li></ul><p>The above list is by no means comprehensive, and a plethora of JDK improvements, including new GCs and smaller language changes, were also introduced. <a href="https://www.baeldung.com/new-java-9">Complete</a> <a href="https://www.baeldung.com/java-10-overview">lists</a> are widely <a href="https://www.baeldung.com/java-11-new-features">available</a>.</p><h3 id="java-17">Java 17</h3><p>Back to Java 17 then: as mentioned above, it brings several related major features together. These are:</p><ul><li>Switch expressions (in preview since 12, standardized in 14)</li><li>Switch with yield (since 13)</li><li>Text blocks (since 12, standardized in 15)</li><li>Records (since 14, standardized in 16)</li><li>Sealed classes (since 15, standardized in 17)</li><li>Pattern matching using <code>instanceof</code> (since 14, standardized in 16)</li><li>Pattern matching in switch statements (still in preview in 17).</li></ul><p>Again, the list is not comprehensive; notably, the ZGC and Shenandoah garbage collectors have also since been standardized (since 15), but the features listed above have a significant impact on how we will program Java going forward, which is why I want to focus on them. Let&apos;s discuss each of these in more detail.</p><p>The code examples shown below are also <a href="https://github.com/avanwyk/java-17-examples">available on Github</a>.</p><p><strong>Switch Expression</strong><br>The <code>switch</code> can now be used as both a statement and an expression (meaning it returns a value). 
&#xA0;<code>case</code> labels are now also available in two forms: <code>case VALUE: ...</code> which works the same as always (allowing fall through) and the new arrow syntax: <code>case VALUE -&gt; ...</code> which does not fall through.<br><code>yield</code> has also been introduced as a new keyword. <code>yield</code> allows you to specify the &apos;return&apos; value of a case block but is not required if the case is a single expression. The code sample below illustrates the new <code>switch</code> expression.</p><figure class="kg-card kg-code-card"><pre><code class="language-java">public static String switchExpression(String day) {
  var dayType = switch (day) {
    case &quot;MON&quot;, &quot;TUE&quot;, &quot;WED&quot;, &quot;THUR&quot;, &quot;FRI&quot; -&gt; { // the arrow means we need no break, it will only match this case.
      System.out.println(&quot;Checking Week Day&quot;);
      yield &quot;Work day&quot;;  // if the case is a block, we can use yield to supply the value.
    }
    case &quot;SAT&quot;, &quot;SUN&quot; -&gt; &quot;Weekend day&quot;; // yield is not required for a single expression.
    default -&gt; throw new IllegalArgumentException(&quot;Unknown day&quot;);
  };
  return dayType;
}</code></pre><figcaption>A Java Switch Expression, yielding a value. Also shown is local type inference (the <code>var</code> keyword) introduced in Java 10.</figcaption></figure><p><strong>Text Blocks</strong><br>Text Blocks are a straightforward but exceedingly helpful feature. Text Blocks allow you to specify <code>Strings</code> over multiple lines, avoiding the need to awkwardly concatenate single-line strings and improving readability.</p><figure class="kg-card kg-code-card"><pre><code class="language-java">var json = &quot;&quot;&quot;
    {
      &quot;name&quot;: &quot;ext&quot;,
      &quot;systemKey&quot;: &quot;1234568&quot;,
      &quot;owner&quot;: {
        &quot;name&quot;: &quot;admin&quot;,
        &quot;adminCredentials&quot;: &quot;abcdef&quot;
      }
    }
    &quot;&quot;&quot;;</code></pre><figcaption>A Java Text Block. The JSON is now easy to read and update. Note, there is no need to escape quotation marks within the string.</figcaption></figure><p><strong>Records</strong><br>Records are immutable data classes. Using the new <code>record</code> keyword, only the data type and name of fields are specified, and Java generates a constructor, accessors, equals/hashcode, and toString methods. This should be familiar to you if you know <a href="https://kotlinlang.org/docs/data-classes.html">Kotlin</a> or <a href="https://docs.python.org/3/library/dataclasses.html">Python</a> data classes, Scala <a href="https://docs.scala-lang.org/tour/case-classes.html">case classes</a>, or use <a href="https://projectlombok.org/features/Data">Lombok&apos;s @Data annotation</a>.</p><p>Records remove the need to code boilerplate, verbose immutable data classes by hand, are especially well suited as DTOs and, work well to <a href="https://github.com/FasterXML/jackson-databind/issues/2709">represent JSON objects</a>.</p><figure class="kg-card kg-code-card"><pre><code class="language-java">record Admin(String name, String adminCredentials) implements Principal { }

record ExternalSystem(String name, String systemKey, Admin owner) implements Principal { }

static ExternalSystem readJSON() throws JsonProcessingException {
  var objectMapper = new ObjectMapper();
  var json = &quot;&quot;&quot;
      {
        &quot;name&quot;: &quot;ext&quot;,
        &quot;systemKey&quot;: &quot;1234568&quot;,
        &quot;owner&quot;: {
          &quot;name&quot;: &quot;admin&quot;,
          &quot;adminCredentials&quot;: &quot;abcdef&quot;
        }
      }
      &quot;&quot;&quot;;
  return objectMapper.readValue(json, ExternalSystem.class); // Record support needs Jackson 2.12+
}

// ...

final var externalSystem = readJSON();
System.out.println(&quot;System: &quot; + externalSystem.name()); // System: ext
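// Record equality is structural: the generated equals() and hashCode() compare component values.
System.out.println(externalSystem.owner().equals(new Admin(&quot;admin&quot;, &quot;abcdef&quot;))); // true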
System.out.println(&quot;Owner: &quot; + externalSystem.owner()); // Owner: Admin[name=admin, adminCredentials=abcdef]</code></pre><figcaption>Java Records implementing an interface <code>Principal</code> . Constructors, accessors, equals, hashcode and toString are generated by the compiler. Records make it trivial to write JSON DTOs.</figcaption></figure><p><strong>Sealed Classes and Interfaces</strong><br>Sealed Classes introduce the concept of fixed-sized class hierarchies to Java. Previously, similar functionality could have been achieved using Enum classes and package-private class hierarchies, both with their own caveats. Sealed Interfaces and Classes specify permitted sub-classes using the new <code>permits</code> keyword. <a href="https://openjdk.java.net/jeps/409">Three constraints are imposed on Sealed Classes</a>:</p><!--kg-card-begin: markdown--><ol>
<li>The sealed class and permitted sub-classes must be in the same module or, if declared in an unnamed module, in the same package.</li>
<li>Permitted sub-classes must directly extend the sealed class.</li>
<li>Every permitted sub-class must include a modifier that specifies how the seal is propagated:
<ul>
<li><code>final</code> to terminate the hierarchy.</li>
<li><code>sealed</code> thereby permitting further sub-classes.</li>
<li><code>non-sealed</code> thereby again opening up the sub-hierarchy for extension by unknown classes; the superclass cannot prevent this.</li>
</ul>
</li>
</ol>
<!--kg-card-end: markdown--><p>Records also work well as leaf nodes (values) in sealed hierarchies as they are implicitly final, terminating the hierarchy. An example is given below.</p><figure class="kg-card kg-code-card"><pre><code class="language-java">public sealed interface Principal permits User, Admin, ExternalSystem { // Sealed interface
  String name();
}

record Admin(String name, String adminCredentials) implements Principal { } // Records are implicitly final (required to implement sealed interface)

record ExternalSystem(String name, String systemKey, Admin owner) implements Principal { } // Implicitly implements name() from interface

abstract sealed class User implements Principal permits AnonymousUser, RegisteredUser {

  private final String username;

  protected User(String username) {
    this.username = username;
  }

  @Override
  public String name() {
    return username;
  }
}

final class AnonymousUser extends User {

  public static final String ANONYMOUS = &quot;ANONYMOUS&quot;;

  AnonymousUser() {
    super(ANONYMOUS);
  }
}</code></pre><figcaption>A sealed Java interface and class hierarchy. Sealed interfaces and classes limit the possible sub-classes: the permitted sub-classes are explicitly listed (`RegisteredUser` is omitted for brevity).</figcaption></figure><p><strong>Pattern Matching for <code>instanceof</code></strong><br>Java 16 introduces pattern matching via <code>instanceof</code>. In its current state, pattern matching is convenient but basic. However, it lays the foundation for more sophisticated pattern matching in future releases. With <code>instanceof</code>, Pattern Matching tests a type pattern against an object and automatically casts to the type while introducing a local variable. The code below illustrates this and highlights the convenience. I&apos;ll discuss Pattern Matching further in the context of <code>switch</code>.</p><figure class="kg-card kg-code-card"><pre><code class="language-java">if (p instanceof Admin a) { // matches type pattern (Admin a), and casts with a local variable
  return checkCredentials(a);
}

// ...
@Override
public boolean equals(Object o) {
  return (o instanceof RegisteredUser r) &amp;&amp;
      name().equals(r.name()) &amp;&amp;
      password().equals(r.password());
}</code></pre><figcaption>Pattern Matching for <code>instanceof</code>. The second example illustrates the convenience of the cast and local variable that is automatically created.</figcaption></figure><p><strong>Pattern Matching for <code>switch</code> expressions</strong><br>Finally, we have pattern matching for <code>switch</code> expressions. Pattern matching for <code>switch</code> allows checking expressions against multiple patterns similar to those seen with <code>instanceof</code>. Two examples are given below, one of which uses the sealed class hierarchy defined above to illustrate that the default case is not required since the hierarchy can be listed exhaustively. &#xA0;The feature is still in preview in Java 17 and is also limited in scope; future work includes the possibility of introducing destructuring in patterns and case guards akin to what&apos;s possible with <a href="https://docs.scala-lang.org/tour/pattern-matching.html">Scala&apos;s <code>match</code> based pattern matching</a>.</p><figure class="kg-card kg-code-card"><pre><code class="language-java">public static String switchPatterns(Object o) {  // matching type patterns
  return switch (o) {
    case Integer i -&gt; &quot;Integer type: &quot; + i;
    case Boolean b -&gt; &quot;Boolean type: &quot; + b;
    case String s -&gt; &quot;String type: &quot; + s;
    default -&gt; &quot;Unknown type&quot;;
  };
}

static boolean authenticate(Principal p) {
  return switch (p) { // pattern matching switch
    case AnonymousUser u -&gt; true;
    case RegisteredUser r -&gt; checkCredentials(r); // no casting required
    case Admin a -&gt; checkCredentials(a);
    case ExternalSystem s -&gt; checkCredentials(s);
  }; // compiler knows cases are exhaustive due to sealed interface/class
}</code></pre><figcaption>Pattern Matching in Java <code>switch</code> expressions.</figcaption></figure><h2 id="how-far-have-we-come-with-java-17">How far have we come with Java 17?</h2><p>Clearly, Java has made significant progress on the road to Java 17. The features shown above are certainly useful, well-engineered additions to the language. Records alone will save thousands of lines of code, and the enhancements to the <code>switch</code> statement are immediately useful and clearly lay the foundation for further, powerful data-driven programming via pattern matching.</p><p>However, the natural question is whether it has been enough to allow Java to &quot;catch up&quot; to its more feature-rich JVM competitors, aka Kotlin and Scala.<br>That answer would clearly be no in terms of language features, but are we asking the right question?</p><p>Instead, I would like to note two things that are clear to me with the changes between 11 and 17.</p><p>First, Java is absolutely leveraging its last-mover&apos;s advantage: the features shown above are well understood in other languages and have a proven track record of being useful. Developers already know how to use them effectively, with minimal foot-gunning risks. This keeps the learning curve low and gives Java projects an easy upgrade path. Java projects have powerful new tools without new threats to productivity.</p><p>Second, if Java seemed stagnant, it should be clear that the language is indeed moving forward, but in a predictable, well-considered way.
The 6-monthly release cycle is working and is certainly a vast improvement on the multi-year releases of years past.</p><p>In Part II (coming soon), I take a harder look at where Java fits in relative to Scala and Kotlin, delve into the other half of the equation: the JDK platform, and attempt to summarize where I see Java in 2021.</p>]]></content:encoded></item><item><title><![CDATA[Super-convergence in Tensorflow 2 with the 1Cycle Policy]]></title><description><![CDATA[Implementing super-convergence for deep neural network training in Tensorflow 2 with the 1Cycle learning rate policy.]]></description><link>https://www.avanwyk.com/tensorflow-2-super-convergence-with-the-1cycle-policy/</link><guid isPermaLink="false">5d6e36601ead39070efa3842</guid><category><![CDATA[deep learning]]></category><category><![CDATA[tensorflow]]></category><dc:creator><![CDATA[Andrich van Wyk]]></dc:creator><pubDate>Mon, 02 Sep 2019 21:45:00 GMT</pubDate><media:content url="https://www.avanwyk.com/content/images/2019/09/florian-olivo-V9lgdQX_K4I-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.avanwyk.com/content/images/2019/09/florian-olivo-V9lgdQX_K4I-unsplash.jpg" alt="Super-convergence in Tensorflow 2 with the 1Cycle Policy"><p><a href="https://arxiv.org/abs/1708.07120">Super-convergence in deep learning is a term coined by researcher Leslie N. Smith</a> in describing a phenomenon where deep neural networks are trained an order of magnitude faster than when using traditional techniques. The technique has led to some <a href="https://www.fast.ai/2018/04/30/dawnbench-fastai/">phenomenal results</a> in the <a href="https://dawn.cs.stanford.edu/benchmark/">Dawnbench project</a>, leading to the cheapest and fastest models at the time.</p><p>The basic idea of super-convergence is to make use of a much higher learning rate while still ensuring the network weights converge.</p><p>This is achieved through the use of the 1Cycle learning rate policy.
The 1Cycle policy is a specific schedule for adapting the learning rate and, if the optimizer supports it, the momentum parameters during training.</p><p>The policy can be described as follows:</p><ol><li><a href="https://www.avanwyk.com/finding-a-learning-rate-in-tensorflow-2/">Choose a high maximum learning rate</a> and a maximum and minimum momentum.</li><li>In phase 1, starting from a much lower learning rate (<code>lr_max / div_factor</code>, where <code>div_factor</code> is e.g. <code>25.</code>), gradually increase the learning rate to the maximum while gradually decreasing the momentum to the minimum.</li><li>In phase 2, reverse the process: decrease the learning rate back to the learning rate minimum while increasing the momentum to the maximum momentum.</li><li>In the final phase, decrease the learning rate further (e.g. <code>lr_max / (div_factor * 100)</code>), while keeping momentum at the maximum.</li></ol><p>Work from the FastAI team has shown that the policy can be improved by using just two phases:</p><ol><li>The same phase 1, however cosine annealing is used to increase the learning rate and decrease the momentum.</li><li>Similarly, the learning rate is decreased again using cosine annealing, to a value of approx. 0, while the momentum increases to the maximum momentum.</li></ol><p>Over the course of training, this leads to the following learning rate and momentum schedules:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="/content/images/2019/09/one_policy_lr_mom.png" class="kg-image" alt="Super-convergence in Tensorflow 2 with the 1Cycle Policy" loading="lazy"><figcaption>1Cycle learning rate and momentum schedules.</figcaption></figure><p>For a more in-depth analysis of the <a href="https://sgugger.github.io/the-1cycle-policy.html">1Cycle policy, see Sylvain Gugger&apos;s post on the topic</a>.</p><h3 id="tensorflow-2-implementation">Tensorflow 2 implementation</h3><p>The policy is straightforward to implement in Tensorflow 2.
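</p><p>Before looking at the implementation, the two cosine-annealed phases can be sketched as follows. This is a minimal illustration only: the phase split <code>phase_1_pct</code> and the momentum bounds are illustrative defaults, not necessarily the exact values the FastAI implementation uses.</p>

```python
import math

def one_cycle(step, total_steps, lr_max, div_factor=25.0,
              mom_min=0.85, mom_max=0.95, phase_1_pct=0.3):
    """Sketch of a two-phase 1Cycle schedule using cosine annealing."""
    def cos_anneal(start, end, pct):
        # interpolate from `start` (pct=0) to `end` (pct=1) along a half cosine
        return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))

    phase_1_steps = total_steps * phase_1_pct
    if step < phase_1_steps:  # phase 1: learning rate up, momentum down
        pct = step / phase_1_steps
        return (cos_anneal(lr_max / div_factor, lr_max, pct),
                cos_anneal(mom_max, mom_min, pct))
    # phase 2: learning rate down towards 0, momentum back up
    pct = (step - phase_1_steps) / (total_steps - phase_1_steps)
    return (cos_anneal(lr_max, lr_max / (div_factor * 100), pct),
            cos_anneal(mom_min, mom_max, pct))
```

<p>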
The implementation given below is <a href="https://docs.fast.ai/callbacks.one_cycle.html">based on the FastAI library implementation</a>.</p><!--kg-card-begin: html--><script src="https://gist.github.com/avanwyk/57724eb3cfff60a1451e4b422c73bfb7.js"></script><!--kg-card-end: html--><h3 id="application">Application</h3><p>Applying the 1Cycle callback is straightforward: simply add it as a callback when calling <code>model.fit(...)</code>:</p><pre><code class="language-python">epochs = 3
lr = 5e-3
steps = np.ceil(len(x_train) / batch_size) * epochs
lr_schedule = OneCycleScheduler(lr, steps)

model = build_model()
optimizer = tf.keras.optimizers.RMSprop(lr=lr)
model.compile(optimizer=optimizer, loss=&apos;sparse_categorical_crossentropy&apos;, metrics=[&apos;accuracy&apos;])

model.fit(train_ds, epochs=epochs, callbacks=[lr_schedule])</code></pre><h3 id="results">Results</h3><p>For a complete example of how the 1Cycle policy is applied to two CNN-based learning tasks, <a href="https://www.avanwyk.com/finding-a-learning-rate-in-tensorflow-2/">including how to find an appropriate maximum learning rate</a>, a <a href="https://www.kaggle.com/avanwyk/tf2-super-convergence-with-the-1cycle-policy">Kaggle notebook has been made available</a>.</p><h3 id="references">References</h3><ol><li><a href="https://arxiv.org/abs/1708.07120">Super-Convergence: Very Fast Training of Neural Networks Using <br>Large Learning Rates, Leslie N. Smith, Nicholay Topin</a></li><li><a href="https://sgugger.github.io/the-1cycle-policy.html">The 1cycle policy, Sylvain Gugger</a></li><li><a href="https://docs.fast.ai/callbacks.one_cycle.html">FastAI callbacks.one_cycle</a></li><li><a href="https://www.kaggle.com/avanwyk/tf2-super-convergence-with-the-1cycle-policy">https://www.kaggle.com/avanwyk/tf2-super-convergence-with-the-1cycle-policy</a></li></ol>]]></content:encoded></item><item><title><![CDATA[Finding a Learning Rate with Tensorflow 2]]></title><description><![CDATA[Implementing the technique in Tensorflow 2 is straightforward. Start from a low learning rate, increase the learning rate and record the loss. Stop when a very high learning rate is reached.
Plot the losses and learning rates, choosing a learning rate where the loss is decreasing at a rapid rate.]]></description><link>https://www.avanwyk.com/finding-a-learning-rate-in-tensorflow-2/</link><guid isPermaLink="false">5d3b0ebfc2081e0712959bbf</guid><category><![CDATA[deep learning]]></category><category><![CDATA[tensorflow]]></category><dc:creator><![CDATA[Andrich van Wyk]]></dc:creator><pubDate>Sun, 28 Jul 2019 11:05:37 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1543286386-2e659306cd6c?ixlib=rb-1.2.1&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1543286386-2e659306cd6c?ixlib=rb-1.2.1&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ" alt="Finding a Learning Rate with Tensorflow 2"><p>Choosing a good learning rate is <a href="https://arxiv.org/pdf/1506.01186.pdf">the most important hyper-parameter</a> choice when training a deep neural network (assuming a gradient based optimization algorithm is used).</p><p>Choosing a learning rate that&apos;s too small leads to extremely long training times, whereas a learning rate that&apos;s too large might miss the optimum and lead to training divergence.</p><p>Fortunately, there is a simple way to estimate a good learning rate. The technique was first described by Leslie Smith in <a href="https://arxiv.org/abs/1506.01186">Cyclical Learning Rates for Training Neural Networks</a> and then popularized by the <a href="https://docs.fast.ai/callbacks.lr_finder.html">FastAI</a> library, which has a <a href="https://docs.fast.ai/callbacks.lr_finder.html">first-class implementation of a learning rate finder</a>.</p><p>The technique can be described as follows:</p><!--kg-card-begin: markdown--><ol>
<li>Start with a very low learning rate, e.g. 1e-7.</li>
<li>After each batch, increase the learning rate and record the loss and learning rate.</li>
<li>Stop when a very high learning rate (10+) is reached, or the loss value explodes.</li>
<li>Plot the recorded losses and learning rates against each other and choose a learning rate where the loss is strictly decreasing at a rapid rate.</li>
</ol>
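<p>As a rough sketch of steps 1 to 3, an exponential sweep multiplies the learning rate by a constant factor after each batch; the values below are illustrative, not prescribed by the technique:</p>

```python
# Sweep from a very low to a very high learning rate in a fixed number of batches.
start_lr, end_lr, num_steps = 1e-7, 10.0, 300  # illustrative values
mult = (end_lr / start_lr) ** (1.0 / num_steps)  # constant per-batch multiplier
lrs = [start_lr * mult ** i for i in range(num_steps + 1)]
# lrs[0] is 1e-7 and lrs[-1] is, up to rounding, 10.0
```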
<!--kg-card-end: markdown--><p>For a more thorough explanation of the technique see <a href="https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html">Sylvain Gugger&apos;s post</a>.</p><h3 id="implementation">Implementation</h3><p>Implementing the technique in <a href="https://www.tensorflow.org/beta/">Tensorflow 2</a> is straightforward when implemented as a <a href="https://www.tensorflow.org/beta/guide/keras/custom_callback">Keras Callback</a>. A Tensorflow 2 compatible implementation is given below and is also available on <a href="https://github.com/avanwyk/tensorflow-projects/blob/master/lr-finder/lr_finder.py">Github</a>.</p><!--kg-card-begin: html--><script src="https://gist.github.com/avanwyk/f0f031c537d098e8fe721e952e6823d3.js"></script><!--kg-card-end: html--><p>The implementation uses an exponentially increasing learning rate, which means smaller learning rate regions will be explored more thoroughly than larger learning rate regions.</p><p>The losses are also smoothed using a smoothing factor to prevent sudden or erratic changes in the loss (due to the stochastic nature of the training) from stopping the search process prematurely.</p><h3 id="application">Application</h3><p>To use the LRFinder, instantiate and compile a model, adding the LRFinder as a callback. The model can then be fit as usual. The callback will record the losses and learning rates and stop training when the loss value diverges or the maximum learning rate is reached.</p><pre><code class="language-python">import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout

def build_model():
    return tf.keras.models.Sequential([
        Conv2D(32, 3, activation=&apos;relu&apos;),
        MaxPool2D(),
        Flatten(),
        Dense(128, activation=&apos;relu&apos;),
        Dropout(0.1),
        Dense(10, activation=&apos;softmax&apos;)
    ])

lr_finder = LRFinder()
model = build_model()
model.compile(optimizer=&apos;adam&apos;, loss=&apos;sparse_categorical_crossentropy&apos;)
_ = model.fit(train_ds, epochs=5, callbacks=[lr_finder], verbose=False)

lr_finder.plot()

</code></pre><p>The plot method will produce a graph of the results, allowing one to visually choose a learning rate:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.avanwyk.com/content/images/2019/07/lr_finder.png" class="kg-image" alt="Finding a Learning Rate with Tensorflow 2" loading="lazy"><figcaption>The results of the LRFinder. The losses are plotted against the log scaled learning rates. A good learning rate would be in the range where the loss is strictly decreasing at a rapid rate: [1e-3, 1e-2].</figcaption></figure><p>A value should be chosen in a region where the loss is rapidly, but strictly, decreasing. Examples of such graphs and how they are interpreted are also available in <a href="https://www.avanwyk.com/african-antelope-fastai-image-classifier/">previous</a> <a href="https://www.avanwyk.com/cdc-mortality-fastai-tabular/">posts</a>.</p><p>It is important to rebuild and recompile the model after the LRFinder is used in order to reset the weights that were updated during the mock training run.</p><p>A complete example of how the LRFinder is applied is available in <a href="https://github.com/avanwyk/tensorflow-projects/blob/master/lr-finder/tensorflow2_lr_finder.ipynb">this Jupyter notebook</a>.</p><h3 id="references">References</h3><ol><li><a href="https://arxiv.org/abs/1506.01186">Cyclical Learning Rates for Training Neural Networks, Leslie N.
Smith</a></li><li><a href="https://docs.fast.ai/callbacks.lr_finder.html">https://docs.fast.ai/callbacks.lr_finder.html</a></li><li><a href="https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html">How Do You Find a Good Learning Rate, Sylvain Gugger</a></li></ol>]]></content:encoded></item><item><title><![CDATA[African Antelope: A Case Study of Creating an Image Dataset with FastAI]]></title><description><![CDATA[An end-to-end example of how to create your own image dataset from scratch and train a ResNet50 convolutional neural network for image classification using the FastAI library.]]></description><link>https://www.avanwyk.com/african-antelope-fastai-image-classifier/</link><guid isPermaLink="false">5cb1adfb2f87c3098b88341a</guid><category><![CDATA[fastai]]></category><category><![CDATA[deep learning]]></category><category><![CDATA[data science]]></category><dc:creator><![CDATA[Andrich van Wyk]]></dc:creator><pubDate>Sun, 14 Apr 2019 11:19:17 GMT</pubDate><media:content url="https://www.avanwyk.com/content/images/2019/04/unsplash-antelope.jpeg" medium="image"/><content:encoded><![CDATA[<img src="https://www.avanwyk.com/content/images/2019/04/unsplash-antelope.jpeg" alt="African Antelope: A Case Study of Creating an Image Dataset with FastAI"><p>(Note: this post was updated on 2019-05-19 for clarity.)</p><p>In this post we will look at an end-to-end case study of how to create and clean your own small image dataset from scratch and then train a ResNet convolutional neural network to classify the images using the FastAI library.</p><p>Besides gathering the data, we will also illustrate how to perform model-assisted data cleaning to partially automate the cleaning of the data itself.</p><h3 id="contents">Contents</h3><ol><li><a href="#creating-an-image-dataset">Creating an Image Dataset</a><br>i. &#xA0; <a href="#downloading-the-data">Downloading the Data</a><br>ii.
&#xA0;<a href="#cleaning-the-data">Cleaning the Data</a></li><li><a href="#training-the-model">Training the Model</a><br>i. &#xA0; <a href="#building-the-dataset">Building the Dataset</a><br>ii. &#xA0;<a href="#creating-the-model">Creating the Model</a><br>iii. <a href="#fitting-the-model">Fitting the Model</a><br>iv. <a href="#initial-results">Initial Results</a></li><li><a href="#model-assisted-data-cleaning">Model Assisted Data Cleaning</a></li><li><a href="#full-model-training">Full Model Training</a><br>i. &#xA0; <a href="#results">Results</a></li><li><a href="#conclusion">Conclusion</a></li></ol><p>The classification problem we will be solving is <em>the classification of major species of African antelope in the wild.</em> The dataset we will create will consist of 13 African antelope species. As we will see this is an interesting challenge, as orientation, colour and very specific features of the antelope (e.g. the horns) are often necessary to distinguish each species. We will also see that it&apos;s not always necessary to have a very large dataset in order to use deep learning.</p><p>A <a href="https://github.com/avanwyk/fastai-projects/blob/master/antelope-image-classification/african_antelope_classification.ipynb">Jupyter notebook</a> and <a href="https://github.com/avanwyk/fastai-projects/blob/master/antelope-image-classification/antelope_classification.py">Python script</a> with the complete code for the example is available on <a href="https://github.com/avanwyk/fastai-projects/tree/master/antelope-image-classification">Github</a>. In order to run the notebook or script, <a href="https://www.avanwyk.com/fastai-installation/">ensure you have a FastAI environment setup.</a></p><h2 id="creating-an-image-dataset">Creating an Image Dataset</h2><p>We will start by downloading the images for our dataset. 
When creating your own dataset, carefully think of the use case you are building it for, and of the type of images that are representative of the actual problem you are trying to solve.</p><p>In the case of antelope, there are a few things to consider:</p><ul><li><em>Male </em>and <em>female </em>variants of the species have significant differences.</li><li>We are interested in pictures of the animals in the <em>wild</em> as opposed to captivity.</li><li>The <em>young</em> of each species could be very different from the adult.</li><li>The <em>colour</em> of a species could be a distinguishing factor. For example, photos taken at dawn or dusk might not be appropriate.</li></ul><p>In general, try to think of any <em>biases</em> or specific <em>contexts</em> present in your subject matter that might not be applicable to the problem being solved.</p><h3 id="downloading-the-data">Downloading the Data</h3><p>In order to download the actual images, we will use <a href="https://github.com/hardikvasa/google-images-download">google-images-download</a>, an open source tool that can download images from Google Images based on keyword search.</p><p>The code to download the images is as follows:</p><pre><code class="language-python">def download_antelope_images(output_path: Path, limit: int = 50) -&gt; None:
    &quot;&quot;&quot;Download images for each of the antelope to the output path.
    
    Each species is put in a separate sub-directory under output_path.
    &quot;&quot;&quot;
    response = google_images_download.googleimagesdownload()

    for antelope in ANTELOPE:
        for gender in [&apos;male&apos;, &apos;female&apos;]:
            output_directory = str(output_path/antelope).replace(&apos; &apos;, &apos;_&apos;)

            arguments = {
                &apos;keywords&apos;: f&apos;wild {antelope} {gender} -hunting -stock&apos;,
                &apos;output_directory&apos;: output_directory,
                &apos;usage_rights&apos;: &apos;labeled-for-nocommercial-reuse&apos;,
                &apos;no_directory&apos;: True,
                &apos;size&apos;: &apos;medium&apos;,
                &apos;limit&apos;: limit
            }
            response.download(arguments)</code></pre><p>The code above searches for images of each antelope species in the <code>ANTELOPE</code> list. For every species, we perform two searches: one for male examples and one for female examples. We add the keyword <code>wild</code> to look for examples of the antelope in the wild, while excluding the keywords <code>hunting</code> and <code>stock</code> to limit the search to images applicable to our use case. Also be sure to search for images with the appropriate <a href="https://support.google.com/websearch/answer/29508?hl=en">usage rights</a>.</p><p>The images are downloaded, putting each species in a separate folder named for the species, thereby building an &apos;<em>Imagenet-style&apos;</em> dataset. This is compatible with <a href="https://docs.fast.ai/vision.data.html#ImageDataBunch.from_folder">FastAI&apos;s <code>ImageDataBunch.from_folder</code> helper</a>, which we will use to load the dataset for training.</p><p>The download was limited to 50 examples each for the male and female of each species.</p><h3 id="cleaning-the-data">Cleaning the Data</h3><p>Even though Google does a very good job of finding the correct images for keyword searches, we still have to make sure the images are appropriate for our use case.</p><p>Unfortunately, this is a time-consuming process that is hard to automate (more on that later). Some checks can be automated, for instance, removing duplicates based on MD5 sums, or using the file names to check for labelling errors (as I do in the <a href="https://github.com/avanwyk/fastai-projects/blob/master/antelope-image-classification/antelope_classification.py">Python script</a>). However, I still had to manually inspect the images, removing examples I considered inappropriate.
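</p><p>The duplicate check mentioned above can be sketched as follows. This is a minimal sketch rather than the script&apos;s actual code; it assumes <code>.jpg</code> files and deletes duplicates in place:</p>

```python
import hashlib
from pathlib import Path

def remove_duplicate_images(data_path: Path) -> int:
    """Delete files whose MD5 digest was already seen; return the number removed."""
    seen, removed = set(), 0
    for image in sorted(data_path.rglob('*.jpg')):
        digest = hashlib.md5(image.read_bytes()).hexdigest()
        if digest in seen:  # exact byte-for-byte duplicate of an earlier file
            image.unlink()
            removed += 1
        else:
            seen.add(digest)
    return removed
```

<p>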
These images mostly involved photos of multiple species in a single example, images of predators hunting or feasting on the antelope, man-made illustrations of the antelope, or antelope in captivity.</p><p>After the data cleaning, I was left with between 60 and 100 images (with an average of 85) per species. This is not a large dataset; we will, however, see that the deep learning model still performs very well.</p><h2 id="training-the-model">Training the Model</h2><p>With the data prepared we can now build the training and validation datasets and train our model. We will be using transfer learning to train a <a href="https://arxiv.org/abs/1512.03385">ResNet</a> model that is pre-trained on the ImageNet dataset.</p><h3 id="building-the-dataset">Building the Dataset</h3><p>FastAI makes use of <a href="https://docs.fast.ai/basic_data.html#DataBunch"><code>DataBunch</code> objects</a> to group the training, validation and test datasets. The <code>DataBunch</code> object also makes sure the <a href="https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader">Pytorch <code>DataLoader</code></a> loads to the correct device (GPU/CPU) and supports applying <a href="https://docs.fast.ai/vision.transform.html">image transforms</a> for <a href="https://arxiv.org/pdf/1712.04621.pdf">data augmentation</a>. Further, the DataBunch normalizes the data using the ImageNet statistics, which is necessary, as the model is pre-trained on the ImageNet data. The <code>ImageDataBunch</code> can be created with:</p><pre><code class="language-python">image_data = ImageDataBunch.from_folder(DATA_PATH, valid_pct=VALID_PCT,\
                                            ds_tfms=get_transforms(),\
                                            size=IMAGE_SIZE,\
                                            bs=BATCH_SIZE)\
                                            .normalize(imagenet_stats)</code></pre><p>We specify the percentage of the data to use for the validation set with <code>VALID_PCT</code> ( <code>0.2</code> or 20% in this case), the <code>IMAGE_SIZE</code> (224 for ImageNet trained models) and a <code>BATCH_SIZE</code> (32 for this example, but you can use a smaller or larger batch size, depending on how much VRAM your GPU has).</p><h3 id="creating-the-model">Creating the Model</h3><p>Creating the ResNet model is very straightforward with FastAI. We use the <a href="https://docs.fast.ai/vision.learner.html#cnn_learner"><code>cnn_learner</code> helper method</a>, specifying our <code>ImageDataBunch</code> and chosen ResNet architecture:</p><pre><code class="language-python">learn = cnn_learner(image_data, models.resnet50, metrics=[error_rate, accuracy])</code></pre><p>Here we use a <a href="https://pytorch.org/docs/stable/torchvision/models.html#torchvision.models.resnet50">pre-trained <code>resnet50</code> model from the Pytorch Torchvision library.</a> If you have a smaller GPU, a pre-trained <code>resnet34</code> works equally well.</p><p>We also specify the <code>error_rate</code> as a metric that will be calculated during training.</p><h3 id="fitting-the-model">Fitting the Model</h3><p>We are now ready to fit the model to our data. The initial training will only fine-tune the top fully-connected layers of the model; the other layer weights being frozen.</p><p>Before starting the training, we have to choose an appropriate learning rate, which is perhaps the single most important choice for effective training. FastAI provides the supremely useful <a href="https://docs.fast.ai/basic_train.html#lr_find"><code>lr_find</code> method</a> for this purpose, which is based on the technique discussed in <a href="https://arxiv.org/abs/1506.01186">Cyclical Learning Rates for Training Neural Networks</a>.</p><pre><code class="language-python">learn.lr_find()
learn.recorder.plot()</code></pre><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.avanwyk.com/content/images/2019/05/fastai-stage1-lr_find-graph.png" class="kg-image" alt="African Antelope: A Case Study of Creating an Image Dataset with FastAI" loading="lazy"><figcaption>Loss vs Learning Rate graph produced by learn.recorder.plot() after lr_find().</figcaption></figure><p>We then simply choose a learning rate (or range) where the loss is strictly decreasing. It&apos;s beneficial to choose the largest learning rate that has a decreasing loss, as this will speed up training.</p><p>Having chosen a learning rate range (<code>[1e-3, 1e-2]</code>), we perform 5 training epochs using the <a href="https://sgugger.github.io/the-1cycle-policy.html">1cycle learning policy</a>.</p><pre><code class="language-python">learn.fit_one_cycle(5, max_lr=slice(1e-3, 1e-2))
learn.save(&apos;stage-1&apos;)</code></pre><pre><code class="language-csv">epoch	train_loss	valid_loss	error_rate	time
0	1.352547	0.909331	0.281369	00:14
1	1.032153	0.774388	0.205323	00:13
2	0.737094	0.570336	0.178707	00:13
3	0.476649	0.451232	0.129278	00:13</code></pre><h3 id="initial-results">Initial Results</h3><p>After the initial training we reach a validation accuracy of <code>87.07%</code>. We can use <a href="https://docs.fast.ai/vision.learner.html#ClassificationInterpretation">FastAI&apos;s <code>ClassificationInterpretation</code></a> to further interpret the model&apos;s performance:</p><pre><code class="language-python">interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)</code></pre><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.avanwyk.com/content/images/2019/05/fastai-stage1-confusion-matrix.png" class="kg-image" alt="African Antelope: A Case Study of Creating an Image Dataset with FastAI" loading="lazy"><figcaption>Confusion matrix produced after initial training of the model. Notably, the model struggles with distinguishing a Lichtenstein&apos;s Hartebeest and a Tsessebe, two antelope that are similar in appearance.</figcaption></figure><p>The interpreter also has a very useful feature that allows us to easily plot the examples that had the largest loss values.</p><pre><code class="language-python">interp.plot_top_losses(9, figsize=(12,12))</code></pre><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.avanwyk.com/content/images/2019/05/fastai-stage1-top-loss.png" class="kg-image" alt="African Antelope: A Case Study of Creating an Image Dataset with FastAI" loading="lazy"><figcaption>Top losses after the initial training.</figcaption></figure><p>One issue seems to be images of close-up views of the antelope&apos;s face or photos where the antelope is not presented in the typical broadside view. In both these cases, <em>distinguishing features</em> such as patterns on the animal&apos;s coat or its horns might be missing from the image.
This highlights a potential <strong>flaw in how we gather the data</strong>: having many examples of one perspective of the subject matter, but neglecting other, valid perspectives.</p><h2 id="model-assisted-data-cleaning">Model Assisted Data Cleaning</h2><p>The FastAI library provides an extremely useful Jupyter Notebook widget that aids in automating data clean-up by using the trained model itself: the <a href="https://docs.fast.ai/widgets.image_cleaner.html">ImageCleaner</a>.</p><p>Using an ImageDataBunch and the trained model, the dataset is indexed by which images lead to the highest losses. The <code>ImageCleaner</code> is then instantiated from the dataset and indices:</p><pre><code class="language-python">from fastai.widgets import *

images = (ImageList.from_folder(DATA_PATH)
                   .split_none()
                   .label_from_folder()
                   .transform(custom_transforms(), size=224)
                   .databunch())

ds, idxs = DatasetFormatter().from_toplosses(learn)

ImageCleaner(ds, idxs, DATA_PATH)</code></pre><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.avanwyk.com/content/images/2019/05/image-cleaner-example.png" class="kg-image" alt="African Antelope: A Case Study of Creating an Image Dataset with FastAI" loading="lazy"><figcaption>The FastAI ImageCleaner Jupyter notebook widget allowing re-labelling and removal of images.</figcaption></figure><p>The widget then allows you to remove images from the dataset or re-label any images that are incorrectly labelled.</p><p>Importantly, the <code>ImageCleaner</code> widget does not modify the data itself but instead creates a <code>.csv</code> file that contains the paths and labels of the cleaned data. We then need to construct an <code>ImageDataBunch</code> from the <code>.csv</code> file:</p><pre><code class="language-python">df = pd.read_csv(DATA_PATH/&apos;cleaned.csv&apos;, header=&apos;infer&apos;)
image_data = (ImageDataBunch.from_df(DATA_PATH, df,
                                     valid_pct=VALID_PCT,
                                     ds_tfms=custom_transforms(),
                                     size=IMAGE_SIZE,
                                     bs=BATCH_SIZE)
              .normalize(imagenet_stats))</code></pre><h2 id="full-model-training">Full Model Training</h2><p>Next we can look at training all the layers of the model instead of just the last, fully-connected layers. This is done by &apos;unfreezing&apos; the other layers of the model before training.</p><p>We also have to find a new learning rate as the optimisation landscape has now completely changed:</p><pre><code class="language-python">learn.unfreeze()

learn.lr_find()
learn.recorder.plot()</code></pre><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.avanwyk.com/content/images/2019/05/fastai-stage2-lr_find-graph-1.png" class="kg-image" alt="African Antelope: A Case Study of Creating an Image Dataset with FastAI" loading="lazy"><figcaption>Loss vs Learning Rate graph produced by learn.recorder.plot() after lr_find() for full model training.</figcaption></figure><p>Finally, we fit the model again with the 1cycle policy for 20 epochs using a small learning rate:</p><pre><code class="language-python">learn.fit_one_cycle(20, max_lr=7e-5)</code></pre><pre><code class="language-csv">epoch	train_loss	valid_loss	error_rate	time
0	0.154993	0.311427	0.127962	00:14
1	0.185968	0.302535	0.127962	00:14
2	0.167942	0.291734	0.109005	00:14
3	0.181434	0.298713	0.094787	00:14
4	0.190612	0.400196	0.118483	00:14
5	0.209943	0.414060	0.118483	00:14
6	0.226450	0.462790	0.132701	00:14
7	0.248497	0.382834	0.113744	00:14
8	0.189046	0.343103	0.113744	00:14
9	0.141687	0.378920	0.132701	00:14
10	0.133787	0.400326	0.099526	00:14
11	0.136122	0.366274	0.109005	00:14
12	0.114380	0.343331	0.094787	00:14
13	0.091698	0.364937	0.109005	00:14
14	0.083694	0.331757	0.113744	00:14
15	0.069167	0.309694	0.104265	00:14
16	0.064571	0.312528	0.094787	00:14
17	0.057514	0.316830	0.085308	00:14
18	0.060952	0.323746	0.104265	00:14
19	0.057364	0.298466	0.085308	00:14</code></pre><h3 id="results">Results</h3><p>Fitting all the layers of the neural network improves our training loss to <code>0.057</code>, our validation loss to <code>0.298</code>, and our validation accuracy to <code>91.4692%</code>.</p><p>As before, we can create an interpreter and visualise our top losses:</p><pre><code class="language-python">interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9, figsize=(12,12), heatmap=True)</code></pre><p>Here we pass the parameter <code>heatmap=True</code> to the <code>plot_top_losses</code> method, which will produce <a href="http://openaccess.thecvf.com/content_ICCV_2017/papers/Selvaraju_Grad-CAM_Visual_Explanations_ICCV_2017_paper.pdf">Grad-CAM (Gradient-weighted Class Activation Mapping)</a> heatmaps for the images. Grad-CAM visualisations highlight the regions of the image that were most important to the prediction.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.avanwyk.com/content/images/2019/05/fastai-stage2-top-loss.png" class="kg-image" alt="African Antelope: A Case Study of Creating an Image Dataset with FastAI" loading="lazy"><figcaption>Top losses after the initial training. Heatmaps show Grad-CAM visualisations of predictions.</figcaption></figure><p>The Grad-CAM visualisations show that the model does correctly identify the regions containing the antelope, and confirm that it tends to focus on the body and horns of the animals.</p><p>Finally, we can calculate our final <a href="https://en.wikipedia.org/wiki/F1_score">F1 score</a>, also making use of TTA (Test Time Augmentation). TTA applies the same augmenting transforms we used during training when making a prediction. The actual prediction is then the average of the predictions over the transformations of an example, increasing the chance that the model makes the correct prediction.</p><pre><code class="language-python">preds, targets = learn.TTA()
predicted_classes = np.argmax(preds, axis=1)

f1_score(targets, predicted_classes, average=&apos;micro&apos;)

0.9004739336492891</code></pre><p>We end up with a final F1 score of <code>0.9</code>.</p><h3 id="further-improvements">Further Improvements</h3><p>There are a number of things we can investigate to further improve the model performance:</p><ul><li>More data could be gathered, especially of specific edge cases the model is struggling with: front and rear views of the animals and close-ups of antelope faces.</li><li>Validate the transformations used to augment the dataset, especially colour distortion and image rotation/cropping. Some very specific features are sometimes required to distinguish one species from another, and as such we have to ensure the transformations we use don&apos;t discard this information.</li><li>Alternative architectures should be investigated that might perform better on this specific use case.</li></ul><h2 id="conclusion">Conclusion</h2><p>In this post we covered an end-to-end example of creating our own image dataset and using transfer learning to create an accurate deep learning image classifier for African antelope species. Our ResNet50 model reached an F1 score of <code>0.9</code> after only 24 epochs of training on roughly 880 examples spread over the 13 classes.</p><p>Unsurprisingly, the <a href="https://www.kdnuggets.com/2015/11/hardest-parts-data-science.html">hardest and most time-consuming part of the deep learning exercise</a> was not training the model; indeed, the FastAI code to do so is only 5 lines long:</p><pre><code class="language-python">image_data = ImageDataBunch.from_folder(DATA_PATH, valid_pct=VALID_PCT,\
                                            ds_tfms=get_transforms(),\
                                            size=IMAGE_SIZE,\
                                            bs=BATCH_SIZE)\
                                            .normalize(imagenet_stats)

learner = cnn_learner(image_data, architecture, metrics=error_rate)
learner.fit_one_cycle(5, max_lr=slice(1e-3, 1e-2))
learner.unfreeze()
learner.fit_one_cycle(5, 1e-4)</code></pre><p>Instead, the most difficult part is gathering and cleaning the data. Manual inspection of the data is tedious and time-consuming, and still resulted in some problems slipping through.</p><p>However, we also demonstrated how to use the model itself to aid in cleaning the dataset using the <code>ImageCleaner</code> widget from the FastAI library.</p><p>Furthermore, we found that the dataset is not fully representative of the problem we are trying to solve, as it is missing examples of some valid perspectives we might encounter in the real world.</p><p>There is no simple solution to creating a high-quality, error-free dataset (which is why open data initiatives are so valuable). However, an alternative to creating your own dataset from scratch is to find an existing dataset similar to the one you need and then modify it. In this case, we could have started with a dataset such as the <a href="https://www.nature.com/articles/sdata201526">Snapshot Serengeti dataset</a> and used only the images of antelope contained therein. An exercise left for next time.</p>]]></content:encoded></item><item><title><![CDATA[FastAI Installation and Setup]]></title><description><![CDATA[Instructions for installing FastAI v1 within a freshly created Anaconda virtual environment.]]></description><link>https://www.avanwyk.com/fastai-installation/</link><guid isPermaLink="false">5bc9f48707c73307530437c8</guid><category><![CDATA[fastai]]></category><dc:creator><![CDATA[Andrich van Wyk]]></dc:creator><pubDate>Mon, 08 Apr 2019 20:55:00 GMT</pubDate><content:encoded><![CDATA[<p>(Updated April 2019)</p><p><strong>The most up-to-date installation instructions are available on <a href="https://github.com/fastai/fastai">Github</a> and the <a href="https://docs.fast.ai/install.html">docs</a> site; I would recommend starting there. 
A list of <a href="https://docs.fast.ai/troubleshoot.html">common troubleshooting issues</a> is also available.</strong></p><p>I list the steps I followed for personal reference, which include solving some minor issues I encountered in setting up a full DL environment on a GPU-equipped laptop running Ubuntu 18.04.</p><p>If you are installing FastAI to do one of the deep learning courses, I recommend one of the various <a href="https://course.fast.ai/index.html#using-a-gpu">cloud solutions available</a> instead of setting up a CUDA/Anaconda environment as below.</p><p>The instructions listed below install FastAI v1 within a freshly created Anaconda virtual environment. They assume you have <a href="https://www.anaconda.com/download/">Anaconda</a> and <a href="https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html">NVIDIA</a> <a href="https://gist.github.com/zhanwenchen/e520767a409325d9961072f666815bb8">CUDA</a> (along with an appropriate NVIDIA driver) installed.</p><p>First, ensure conda is up to date; otherwise conda might complain about <code>PackagesNotFoundError</code>s.</p><!--kg-card-begin: markdown--><pre><code class="language-bash">conda update conda
</code></pre>
<!--kg-card-end: markdown--><p>I recommend installing into a virtual environment to prevent interference from other libraries and system packages. You can create a Python 3.7 virtual environment to install FastAI in as follows:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">conda create -n fastai python=3.7 mypy pylint jupyter scikit-learn pandas
source activate fastai
</code></pre>
<!--kg-card-end: markdown--><p>Next, if you are planning on installing the GPU version, verify which CUDA you have installed:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">nvcc --version # Cuda compilation tools, release 10.0, V10.0.130
</code></pre>
<!--kg-card-end: markdown--><p>You can find the corresponding conda package using:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">conda search cuda* -c pytorch
</code></pre>
<!--kg-card-end: markdown--><p>Look for the <code>cudaXX</code> packages that match your CUDA version as reported by <code>nvcc</code>.</p><p>You can now install <code>pytorch</code> and <code>fastai</code> using conda.</p><!--kg-card-begin: markdown--><pre><code class="language-bash">conda install cudatoolkit=10.0 -c pytorch -c fastai fastai
</code></pre>
<!--kg-card-end: markdown--><p><strong>A note on CUDA versions</strong>: I recommend installing the latest CUDA version supported by Pytorch if possible (10.0 at the time of writing); however, to avoid potential issues, stick with the same CUDA version you have a driver installed for.</p><p>You can verify that the CUDA installation went smoothly and that Pytorch is using your GPU using the following command:</p><!--kg-card-begin: markdown--><pre><code class="language-bash">python -c &quot;import torch; print(torch.cuda.get_device_name(torch.cuda.current_device()))&quot;
</code></pre>
<!--kg-card-end: markdown--><p>It should print the name of the device (GPU) you have attached to the machine.</p><p><strong>Note for NLP (using FastAI v1 for text):</strong> if you plan on using FastAI for NLP, I recommend also downloading the relevant language packages for <code>spacy</code>, otherwise you might hit some obscure errors when attempting to parse textual data.</p><!--kg-card-begin: markdown--><pre><code class="language-bash">python -m spacy download en
</code></pre>
<!--kg-card-end: markdown--><h2 id="cloud-environments">Cloud Environments</h2><p>A number of cloud services have first-class support for FastAI. I&apos;ve personally used <a href="https://www.paperspace.com/">https://www.paperspace.com/</a> a lot and can recommend it. There <a href="https://course.fast.ai/index.html#using-a-gpu">are a number of alternative options</a>. If you are looking for a VM-based option (which gives you a little more control over your environment), I recommend <a href="https://course.fast.ai/start_gcp.html">Google Cloud Platform</a> or <a href="https://course.fast.ai/start_azure.html">Microsoft Azure</a>.</p><h2 id="documentation">Documentation</h2><p>The FastAI v1 docs are really great; you can find them here: <a href="http://docs.fast.ai">http://docs.fast.ai</a>.</p>]]></content:encoded></item><item><title><![CDATA[CDC Mortality Prediction with FastAI for Tabular Data]]></title><description><![CDATA[This post will cover getting started with FastAI v1 using tabular data. It is aimed at people who are at least somewhat familiar with deep learning, but not necessarily with using the FastAI v1 library.]]></description><link>https://www.avanwyk.com/cdc-mortality-fastai-tabular/</link><guid isPermaLink="false">5bc88b8007c733075304361d</guid><category><![CDATA[fastai]]></category><category><![CDATA[deep learning]]></category><dc:creator><![CDATA[Andrich van Wyk]]></dc:creator><pubDate>Fri, 19 Oct 2018 20:57:06 GMT</pubDate><media:content url="https://www.avanwyk.com/content/images/2018/10/mika-baumeister-703680-unsplash-tabular.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.avanwyk.com/content/images/2018/10/mika-baumeister-703680-unsplash-tabular.jpg" alt="CDC Mortality Prediction with FastAI for Tabular Data"><p>The first major version of the FastAI deep learning library, <a href="http://www.fast.ai/2018/10/02/fastai-ai/">FastAI v1</a>, was recently released. 
For those unfamiliar with the FastAI library, it&apos;s built on top of Pytorch and aims to provide a consistent API for the major deep learning application areas: vision, text and tabular data. The library also focuses on making state-of-the-art deep learning techniques seamlessly available to its users.</p><p>This post will cover getting started with FastAI v1 using tabular data. It is aimed at people who are at least somewhat familiar with deep learning, but not necessarily with using the FastAI v1 library. For more technical details on the deep learning techniques used, I recommend <a href="http://www.fast.ai/2018/04/29/categorical-embeddings/">this post</a> by Rachel of FastAI.</p><p>For a guide on installing FastAI v1 on your own machine, or cloud environments you may use, see this <a href="https://www.avanwyk.com/getting-started-with-fastai-v1-installation/">post</a>.</p><h2 id="training-a-model-on-tabular-data">Training a model on Tabular Data</h2><p>Tabular data (referred to as structured data in the library before v1) refers to data that typically occurs in rows and columns, such as SQL tables and CSV files. Tabular data is extremely common in industry, and is the most common type of data used in <a href="https://www.kaggle.com/">Kaggle competitions</a>, but is somewhat neglected in other deep learning libraries. FastAI, in turn, provides first-class API support for tabular data, as shown below.</p><p>In the example below, we attempt to predict mortality using <a href="https://www.kaggle.com/cdc/mortality">CDC Mortality data</a> from Kaggle. 
The complete notebook, which includes the data pre-processing, is available here: <a href="https://github.com/avanwyk/fastai-projects/blob/master/cdc-mortality-tabular-prediction/cdc-mortality.ipynb">https://github.com/avanwyk/fastai-projects/blob/master/cdc-mortality-tabular-prediction/cdc-mortality.ipynb</a>.</p><h3 id="data-loading">Data loading</h3><p>The FastAI v1 tabular data API revolves around three types of variables in the dataset: categorical variables, continuous variables and the <em>dependent </em>variable.</p><!--kg-card-begin: markdown--><pre><code class="language-python">dep_var = &apos;age&apos;
categorical_names = [&apos;education&apos;, &apos;sex&apos;, &apos;marital_status&apos;]
</code></pre>
<!--kg-card-end: markdown--><p>Any variable that is not specified as a categorical variable will be assumed to be a continuous variable.</p><p>For tabular data, FastAI provides a special <a href="http://docs.fast.ai/tabular.data.html#TabularDataset">TabularDataset</a>. The simplest way to construct a <code>TabularDataset</code> is using the <code>tabular_data_from_df</code> helper. The helper also supports specifying a number of transforms that are applied to the dataframe before building the dataset.</p><!--kg-card-begin: markdown--><pre><code class="language-python">tfms = [FillMissing, Categorify]

tabular_data = tabular_data_from_df(&apos;output&apos;, train_df, valid_df, dep_var, tfms=tfms, cat_names=categorical_names)
</code></pre>
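<p>To make concrete what these transforms will do, here is a rough plain-pandas equivalent. This is only a sketch with made-up column names, not the FastAI implementation or the actual CDC schema: fill continuous gaps with the median, encode categories as sequential numeric IDs starting at 1 (with 0 reserved for missing values), and standardize the continuous variables.</p>

```python
import numpy as np
import pandas as pd

# Toy frame: a categorical column, a continuous column with a gap, and
# the dependent variable. Column names are illustrative only.
df = pd.DataFrame({
    "education": ["primary", "tertiary", "primary", None],
    "income": [100.0, np.nan, 300.0, 200.0],
    "age": [34, 58, 41, 47],
})

# FillMissing-style: fill gaps in continuous variables with the median.
df["income"] = df["income"].fillna(df["income"].median())

# Categorify-style: pandas category dtype, then 1-based numeric IDs
# (cat.codes yields -1 for missing values, so +1 maps missing to 0).
df["education"] = df["education"].astype("category").cat.codes + 1

# Standardization of continuous variables (the mean/std would normally
# be computed on the training set and reused for validation data).
df["income"] = (df["income"] - df["income"].mean()) / df["income"].std()
```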
<!--kg-card-end: markdown--><p>The <code>FillMissing</code> transform will fill in missing values for continuous variables <em>but not the categorical or dependent variables. </em>By default it uses the median, but this can be changed to use either a constant value or the most common value.</p><p>The <code>Categorify</code> transform will change the variables in the dataframe to <a href="https://pandas.pydata.org/pandas-docs/stable/categorical.html">Pandas category variables</a> for you.</p><p>The transforms are applied to the dataframe before being passed to the dataset object.</p><p>The <code>TabularDataset</code> then does some more pre-processing for you. It automatically converts category variables (which might be text) to sequential, numeric IDs starting at 1 (0 is reserved for NaN values). Further, it automatically normalizes the continuous variables using <a href="https://en.wikipedia.org/wiki/Feature_scaling#Standardization">standardization</a>. You can also pass in statistics for each variable to override the mean and standard deviation used for the normalization; otherwise, they will automatically be calculated from the training set.</p><h3 id="learner-and-model">Learner and model</h3><p>With the data ready to be used by a deep learning algorithm, we can create a <a href="http://docs.fast.ai/basic_train.html#Learner">Learner</a>:</p><!--kg-card-begin: markdown--><pre><code class="language-python">learn = get_tabular_learner(tabular_data,
                            layers=[100,50,1],
                            emb_szs={&apos;education&apos;: 6,
                                     &apos;sex&apos;: 5,
                                     &apos;marital_status&apos;: 8})
learn.loss_fn = F.mse_loss
</code></pre>
<!--kg-card-end: markdown--><p>We use a helper function <code>get_tabular_learner</code> to set up the tabular data learner for us. We also have to specify an MSE loss function since we are performing a regression task.</p><p>A FastAI Learner combines a model with data, a loss function and an optimizer. It also encapsulates the metric recorder and provides an API for saving and loading the model.</p><p>In our case, the helper function will build a <a href="http://docs.fast.ai/tabular.models.html#class-tabularmodel">TabularModel</a>. The model will consist of an Embedding Layer for each categorical variable (with optional sizes specified), with each layer having its own Dropout and Batchnormalization. These results are concatenated with the continuous input variables, followed by Linear and ReLU layers of the specified sizes. Batchnormalization is added between each layer pair, and the last layer pair only includes the Linear layer.</p><p>By default, an Adam optimizer will be used.</p><p>You can print a summary of the model using:</p><!--kg-card-begin: markdown--><pre><code class="language-python">learn.model
</code></pre>
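<p>To illustrate the data flow described above, here is a minimal NumPy sketch of the forward pass: one embedding (lookup) table per categorical variable, concatenated with the continuous inputs, followed by Linear/ReLU layers. All sizes are illustrative, dropout and batch normalization are omitted, and nothing is trained; this is not FastAI&apos;s <code>TabularModel</code>, just the shape of the computation.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: three categorical variables with chosen embedding
# sizes (mirroring the emb_szs above) plus ten continuous features.
cardinalities = {"education": 17, "sex": 3, "marital_status": 7}
emb_sizes = {"education": 6, "sex": 5, "marital_status": 8}
n_continuous, batch_size = 10, 4

# One embedding (lookup) table per categorical variable.
embeddings = {name: rng.normal(size=(card, emb_sizes[name]))
              for name, card in cardinalities.items()}

# A batch: integer category IDs plus standardized continuous values.
cat_batch = {name: rng.integers(0, card, size=batch_size)
             for name, card in cardinalities.items()}
cont_batch = rng.normal(size=(batch_size, n_continuous))

# Look up each embedding and concatenate with the continuous inputs.
emb_out = np.concatenate([embeddings[n][cat_batch[n]] for n in cardinalities], axis=1)
x = np.concatenate([emb_out, cont_batch], axis=1)  # (4, 6 + 5 + 8 + 10) = (4, 29)

def linear_relu(inp, n_out, relu=True):
    """An untrained Linear layer, optionally followed by ReLU."""
    w = rng.normal(size=(inp.shape[1], n_out)) * 0.01
    out = inp @ w
    return np.maximum(out, 0) if relu else out

# Layers of the specified sizes [100, 50, 1]; the last layer is Linear only.
h = linear_relu(x, 100)
h = linear_relu(h, 50)
y = linear_relu(h, 1, relu=False)  # regression output
```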
<!--kg-card-end: markdown--><h3 id="learning-rate">Learning rate</h3><p>Before we can start training the model, we have to choose a learning rate (LR). This is where one of the FastAI library&apos;s more useful and powerful tools comes in. The FastAI library has first-class support for a <a href="https://arxiv.org/abs/1506.01186">technique</a> to find an appropriate learning rate with <code><a href="http://docs.fast.ai/callbacks.lr_finder.html">lr_find</a></code>.</p><!--kg-card-begin: markdown--><pre><code class="language-python">learn.lr_find()
learn.recorder.plot()
</code></pre>
<!--kg-card-end: markdown--><p>Doing the above will (after some training) produce a graph such as this:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="/content/images/2018/10/lr_finder.png" class="kg-image" alt="CDC Mortality Prediction with FastAI for Tabular Data" loading="lazy"><figcaption>Result of running lr_find()</figcaption></figure><p>Another example:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="/content/images/2018/10/lr_finder_ex.png" class="kg-image" alt="CDC Mortality Prediction with FastAI for Tabular Data" loading="lazy"><figcaption>Another example of plotting the loss from lr_find()</figcaption></figure><p>An appropriate LR can then be selected <em>by choosing a value that is an order of magnitude lower than the minimum. </em>This learning rate will still be aggressive enough to ensure quick training, but is reasonably safe from exploding. For more details on the technique, see <a href="https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html">here</a> and <a href="http://docs.fast.ai/callbacks.lr_finder.html">here</a>.</p><h3 id="training">Training</h3><p>We are now ready to train the model:</p><!--kg-card-begin: markdown--><pre><code class="language-python">lr = 1e-1
learn.fit_one_cycle(1, lr)
</code></pre>
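<p>For intuition, the LR range test behind <code>lr_find</code> can be sketched without any library support: grow the learning rate exponentially over mini-batches, recording the loss, and stop once training diverges. The toy least-squares problem below only illustrates the technique; it is not FastAI&apos;s implementation.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
y = X @ rng.normal(size=5)  # noise-free linear target for simplicity

w = np.zeros(5)
lr, lr_mult = 1e-5, 1.5  # start tiny, multiply each mini-batch
lrs, losses = [], []
for step in range(80):
    i = (step * 32) % 256  # cycle through mini-batches
    xb, yb = X[i:i + 32], y[i:i + 32]
    pred = xb @ w
    loss = float(np.mean((pred - yb) ** 2))
    lrs.append(lr)
    losses.append(loss)
    if not np.isfinite(loss) or loss > 1e6 * (losses[0] + 1):
        break  # the loss has exploded; end the sweep
    w -= lr * (2 / len(xb)) * (xb.T @ (pred - yb))  # SGD step
    lr *= lr_mult
# One would then plot loss against learning rate on a log scale and pick
# a rate roughly an order of magnitude below the loss minimum.
```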
<!--kg-card-end: markdown--><p>The <code>fit_one_cycle</code> call fits the model for the specified number of epochs using the <a href="http://docs.fast.ai/callbacks.one_cycle.html#OneCycleScheduler">OneCycleScheduler</a> callback. The callback automatically applies a two phase learning rate schedule, first increasing the learning rate to <code>lr_max</code> (which is the learning rate we specify) and then annealing to 0 in the second phase.</p><p>Loss and metrics are recorded by the <a href="http://docs.fast.ai/basic_train.html#class-recorder">Recorder</a> callback and are accessible through <code>learn.recorder</code>. For example, to plot the training loss you can use:</p><!--kg-card-begin: markdown--><pre><code class="language-python">learn.recorder.plot_losses()
</code></pre>
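<p>For intuition, a heavily simplified version of such a two-phase schedule can be computed in a few lines. FastAI&apos;s actual scheduler also cycles momentum and anneals more smoothly; this linear sketch only shows the overall shape.</p>

```python
def one_cycle_lr(step, total_steps, lr_max, pct_warmup=0.3, div=25.0):
    """Learning rate at a given step of a simplified one-cycle schedule.

    Phase 1: linear increase from lr_max/div up to lr_max.
    Phase 2: linear annealing from lr_max down towards zero.
    """
    warmup = int(total_steps * pct_warmup)
    lr_min = lr_max / div
    if step < warmup:  # phase 1: increase towards lr_max
        return lr_min + (lr_max - lr_min) * step / warmup
    t = (step - warmup) / (total_steps - warmup)
    return lr_max * (1.0 - t)  # phase 2: anneal towards zero

schedule = [one_cycle_lr(s, 100, lr_max=1e-1) for s in range(100)]
```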
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card kg-card-hascaption"><img src="/content/images/2018/10/training_loss.png" class="kg-image" alt="CDC Mortality Prediction with FastAI for Tabular Data" loading="lazy"><figcaption>Training Loss</figcaption></figure><h2 id="initial-thoughts-on-fastai-v1">Initial thoughts on FastAI v1</h2><p>The FastAI v1 experience has so far been really great. The pre-v1 releases were usable, but definitely lacked some polish (particularly the documentation). The new <a href="http://docs.fast.ai">documentation site</a> is great, and thoroughly explains a lot of the API.</p><p>The API itself is incredibly terse and you can do a lot with very few lines of code. I look forward to diving deeper into the API and exploring its flexibility. Another great thing about the API is the consistent use of Python <a href="https://www.python.org/dev/peps/pep-0484/">Type Hints</a>, which makes it much easier to deduce what the API expects or does while working in notebook environments, in addition to catching obvious errors.</p><h2 id="references">References</h2><p>The documentation that was released with FastAI v1 is really great; you can check it out here: <a href="http://docs.fast.ai/">http://docs.fast.ai/</a></p><p>Then I also have to mention the really great <a href="https://forums.fast.ai/">FastAI forums</a>; it&apos;s very possibly the best deep learning forum in existence.</p><p>Lastly, if you haven&apos;t done so already, the <a href="http://course.fast.ai/">FastAI course</a> is strongly recommended. A new version of the course based on v1 of the library will launch in early 2019.</p>]]></content:encoded></item><item><title><![CDATA[How to Read a Paper]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>A few years ago I came across a method for reading academic papers which I&apos;ve kept coming back to as a reliable systematic approach to efficiently read important papers of varying complexity.</p>
<p>The method itself comes from a <a href="http://ccr.sigcomm.org/online/files/p83-keshavA.pdf">paper</a> by Prof. Srinivasan Keshav, an ACM Fellow and researcher</p>]]></description><link>https://www.avanwyk.com/how-to-read-a-paper/</link><guid isPermaLink="false">5b3cec22aeb9e4517f1355fc</guid><category><![CDATA[research]]></category><dc:creator><![CDATA[Andrich van Wyk]]></dc:creator><pubDate>Thu, 05 Jul 2018 17:59:15 GMT</pubDate><media:content url="https://www.avanwyk.com/content/images/2018/08/james-sutton-201910-unsplash_tiny.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://www.avanwyk.com/content/images/2018/08/james-sutton-201910-unsplash_tiny.jpg" alt="How to Read a Paper"><p>A few years ago I came across a method for reading academic papers which I&apos;ve kept coming back to as a reliable systematic approach to efficiently read important papers of varying complexity.</p>
<p>The method itself comes from a <a href="http://ccr.sigcomm.org/online/files/p83-keshavA.pdf">paper</a> by Prof. Srinivasan Keshav, an ACM Fellow and researcher at the University of Waterloo. I recommend reading his paper, but I summarise the system here.</p>
<h2 id="thethreepasssystem">The Three Pass System</h2>
<p>The system uses a top-down, three-pass approach, with each pass delving deeper into the details of the paper. Each pass has a specific goal. Depending on what you need to obtain from the paper, completing all three passes may not be necessary.</p>
<h3 id="thefirstpass">The First Pass</h3>
<p>The goal of the first pass is to get a high level overview of the paper:</p>
<ul>
<li>Read the <strong>title</strong>, <strong>abstract</strong>, <strong>introduction</strong>, <strong>section and subsection <em>headings</em></strong> and the <strong>conclusion</strong>.</li>
<li><strong>Glance</strong> at the <strong>references</strong>, noting whether you might have read any of them.</li>
</ul>
<p>After the first pass you should be able to <em>categorize</em> the paper, understand the paper&apos;s <em>context</em>, validate the basic assumptions for <em>correctness</em>, note the main <em>contributions</em> and be able to determine the paper&apos;s <em>clarity</em>.</p>
<p>The first pass is sufficient to determine whether you are interested in the paper, whether it is relevant to your research area and whether there are any questionable assumptions made which may deter your interest.</p>
<p>Also note, if you are writing a paper, a first pass is perhaps all a reviewer will give you. Pay special attention to the parts mentioned above. Strive to be clear and concise in your headings, introduction, conclusion and abstract.</p>
<h3 id="thesecondpass">The Second Pass</h3>
<p>With the second pass the goal is to understand the content of the paper to the point where you could explain it to someone else:</p>
<ul>
<li><strong>Carefully read the paper</strong>, but ignore details such as proofs or very technical details.</li>
<li><strong>Make comments and notes</strong> on important points.</li>
<li><strong>Study any figures or graphs</strong>, note details such as the axes, labeled points and whether statistical variance is indicated etc.</li>
<li>Note all <strong>unread references</strong> for further reading.</li>
</ul>
<p>Doing a second pass is appropriate for papers that you are interested in, but aren&apos;t necessarily directly related to your work. After the second pass you may or may not understand the paper. If it is critical to understand the work, or you are reviewing the paper, move on to the third pass.</p>
<h3 id="thethirdpass">The Third Pass</h3>
<p>The idea of the third pass is to understand the paper in such detail that you could <em>re-implement</em> it.</p>
<ul>
<li>Read the paper with <strong>great attention to detail</strong>, identifying and <strong>challenging every assumption</strong>.</li>
<li>Given the same assumptions, think about how you would <strong>reproduce and present the result</strong>.</li>
<li>If novel techniques or methods are used, make sure you understand them to the <strong>degree where you could use them yourself</strong>.</li>
</ul>
<p>Comparing your idea of implementing the paper with the actual paper will highlight areas where the paper excels or falls short. After the final pass you should be able to reconstruct the structure of the paper from memory, be familiar with the techniques used and identify implicit assumptions and missing references.</p>
<h2 id="conclusion">Conclusion</h2>
<p>For more detail on the system and its motivations and related work, please read Prof. Keshav&apos;s <a href="http://ccr.sigcomm.org/online/files/p83-keshavA.pdf">paper</a>. It also includes a step based approach for doing a literature survey.</p>
<h3 id="references">References</h3>
<ol>
<li>Keshav, S., 2007. How to read a paper. ACM SIGCOMM Computer Communication Review, 37(3), pp.83-84.</li>
</ol>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[An Overview of LightGBM]]></title><description><![CDATA[This post gives an overview of LightGBM and aims to serve as a practical reference. A brief introduction to gradient boosting is given, followed by a look at the LightGBM API and algorithm parameters.]]></description><link>https://www.avanwyk.com/an-overview-of-lightgbm/</link><guid isPermaLink="false">5aeee670c7e1df0928681e91</guid><category><![CDATA[data science]]></category><category><![CDATA[machine learning]]></category><dc:creator><![CDATA[Andrich van Wyk]]></dc:creator><pubDate>Wed, 16 May 2018 21:40:00 GMT</pubDate><media:content url="https://www.avanwyk.com/content/images/2018/08/alberto-tondo-328006-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><h2 id="contents">Contents</h2>
<ol>
<li><a href="#lightgbm">LightGBM Introduction</a></li>
<li><a href="#gradientboosting">Gradient Boosting</a><br>
i. <a href="#algorithm">Algorithm</a></li>
<li><a href="#lightgbmapi">LightGBM API</a><br>
i. <a href="#plotting">Plotting</a><br>
ii. <a href="#savingthemodel">Saving the model</a></li>
<li><a href="#lightgbmparameters">LightGBM Parameters</a><br>
i. <a href="#treeparameters">Tree parameters</a><br>
ii. <a href="#tuningforimbalanceddata">Tuning for imbalanced data</a><br>
iii. <a href="#tuningforoverfitting">Tuning for overfitting</a><br>
iv. <a href="#tuningforaccuracy">Tuning for accuracy</a></li>
<li><a href="#resources">Resources</a></li>
</ol>
<img src="https://www.avanwyk.com/content/images/2018/08/alberto-tondo-328006-unsplash.jpg" alt="An Overview of LightGBM"><p>Although perhaps not as fashionable as deep learning algorithms in 2018, tree-based and tree-ensemble learning methods remain remarkably effective. Across a variety of domains (<a href="https://www.kaggle.com/pureheart/1st-place-lgb-model-public-0-470-private-0-502?scriptVersionId=2372967/comments">restaurant visitor forecasting</a>, <a href="https://www.kaggle.com/c/kkbox-music-recommendation-challenge/discussion/45942">music recommendation</a>, <a href="https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/discussion/44629">safe driver prediction</a>, and <a href="https://github.com/dmlc/xgboost/blob/master/demo/README.md#machine-learning-challenge-winning-solutions">many more</a>), ensemble tree models - specifically gradient boosted trees - are widely used on Kaggle, often as part of the winning solution.</p>
<p>Decision trees also have certain advantages over deep learning methods: decision trees are more readily interpreted than deep neural networks, naturally better at learning from imbalanced data, often much faster to train, and able to work directly with un-encoded feature data (such as text).</p>
<p>This post gives an overview of LightGBM and aims to serve as a practical reference. A brief introduction to gradient boosting is given, followed by a look at the LightGBM API and algorithm parameters. The examples given in this post are taken from an end-to-end practical example of applying LightGBM to the problem of <a href="https://www.kaggle.com/mlg-ulb/creditcardfraud">credit card fraud detection</a>: <a href="https://www.kaggle.com/avanwyk/a-lightgbm-overview">https://www.kaggle.com/avanwyk/a-lightgbm-overview</a>.</p>
<h2 id="lightgbm">LightGBM</h2>
<p><a href="https://github.com/Microsoft/LightGBM">LightGBM</a> is an open-source framework for gradient boosted machines. By default, LightGBM will train a Gradient Boosted Decision Tree (GBDT), but it also supports random forests, <a href="https://arxiv.org/abs/1505.01866">Dropouts meet Multiple Additive Regression Trees (DART)</a>, and <a href="https://papers.nips.cc/paper/6579-gradient-based-sampling-an-adaptive-importance-sampling-for-least-squares.pdf">Gradient-based One-Side Sampling (GOSS)</a>.</p>
<p>The framework is fast and was designed for distributed training. It supports large-scale datasets and training on the GPU. In many cases LightGBM has been found to be more accurate and faster than XGBoost, though this is <a href="https://towardsdatascience.com/catboost-vs-light-gbm-vs-xgboost-5f93620723db">problem dependent</a>.</p>
<p>Both LightGBM and <a href="https://github.com/dmlc/xgboost">XGBoost</a> are widely used and provide highly optimized, scalable and fast implementations of gradient boosted machines (GBMs). I have previously used XGBoost for a number of applications, but have yet to take an in-depth look at LightGBM.</p>
<p>The section below gives some theoretical background on gradient boosting. The section <a href="#LightGBM-API">LightGBM API</a> continues with the practicalities of using LightGBM.</p>
<h2 id="gradientboosting">Gradient Boosting</h2>
<p>When considering ensemble learning, there are two primary methods: <em>bagging</em> and <em>boosting</em>. Bagging involves the training of many independent models and combines their predictions through some form of aggregation (averaging, voting etc.). An example of a bagging ensemble is a <a href="https://en.wikipedia.org/wiki/Random_forest">Random Forest</a>.</p>
<p><a href="https://en.wikipedia.org/wiki/Boosting_(machine_learning)">Boosting</a> instead trains models <em>sequentially</em>, where each model learns from the errors of the previous model. Starting with a weak base model, models are trained iteratively, each adding to the prediction of the previous model to produce a strong overall prediction.</p>
<p>In the case of gradient boosted decision trees, each successive model is a decision tree fit to the <em>pseudo-residuals</em>: the negative gradient of the loss function with respect to the previous model&apos;s predictions. Gradient descent is then applied by stepping in the direction of the average gradient at each leaf node of the fitted tree.</p>
<p>An <a href="http://blog.kaggle.com/2017/01/23/a-kaggle-master-explains-gradient-boosting/">excellent explanation</a> of gradient boosting is given by <a href="https://www.linkedin.com/in/ben-gorman-4a6b3650">Ben Gorman</a> over on the Kaggle Blog and I strongly advise reading the <a href="http://blog.kaggle.com/2017/01/23/a-kaggle-master-explains-gradient-boosting/">post</a> if you would like to understand gradient boosting. A summary is given here.</p>
<p>Considering decision trees, we proceed as follows. We start with an initial fit, \(F_0\), of our data: a constant value that minimizes our loss function \(L\):<br>
$$ F_0(x) = \underset{\gamma}{arg\ \min} \sum^{n}_{i=1} L(y_i, \gamma) $$<br>
in the case of optimizing the mean square error, we can take the mean of the target values:<br>
$$ F_0(x) = \frac{1}{n} \sum^{n}_{i=1} y_i $$</p>
<p>With our initial guess of \(F_0\), we can now calculate the gradient, or <em>pseudo</em> residuals, of \(L\) with respect to \(F_0\):<br>
$$ r_{i1} = -\frac{\partial L(y_i, F_{0}(x_i))}{\partial F_{0}(x_i)} $$</p>
<p>We now fit a decision tree \(h_{1}(x)\), to the residuals. Using a regression tree, this will yield the <strong>average gradient</strong> for each of the leaf nodes.</p>
<p>Now we can apply gradient descent to minimize the loss for each leaf by stepping in the direction of the <strong>average gradient</strong> for the leaf nodes as contained in our decision tree \(h_{1}(x)\). The step size is determined by a multiplier \(\gamma_{1}\) which can be optimized by performing a <a href="https://en.wikipedia.org/wiki/Line_search">line search</a>. The step size is further shrunk by a learning rate \(\lambda_{1}\), thus yielding a new boosted fit of the data:<br>
$$ F_{1}(x) = F_{0}(x) + \lambda_1 \gamma_1 h_1(x) $$</p>
<h3 id="algorithm">Algorithm</h3>
<p>Putting it all together, we have the following algorithm. For a number of boosting rounds \(M\) and a differentiable loss function \(L\):</p>
<p>Let \( F_0(x) = \underset{\gamma}{arg\ \min} \sum^{n}_{i=1} L(y_i, \gamma) \)<br>
For m = 1 to M:</p>
<ol>
<li>Calculate the <em>pseudo</em> residuals \( r_{im} = -\frac{\partial L(y_i, F_{m-1}(x_i))}{\partial F_{m-1}(x_i)} \)</li>
<li>Fit decision tree \( h_m(x) \) to \( r_{im} \)</li>
<li>Compute the step multiplier \( \gamma_m \) for each leaf of \( h_m(x) \)</li>
<li>Let \( F_m(x) = F_{m-1}(x) + \lambda_m \gamma_m h_m(x) \), where \( \lambda_m \) is the learning rate for iteration \(m\)</li>
</ol>
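<p>As an illustrative sketch of the algorithm above (not LightGBM&apos;s actual implementation), here is a from-scratch version for the mean squared error loss, using scikit-learn&apos;s <code>DecisionTreeRegressor</code> as the base learner. For MSE the pseudo-residuals are simply \( y - F_{m-1}(x) \), and the leaf means of the fitted regression tree already provide the optimal step multipliers:</p>
<pre><code class="language-python">import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbm_fit(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    f0 = y.mean()  # F_0: the constant fit minimizing MSE is the target mean
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred  # pseudo-residuals for MSE loss
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)  # h_m; the leaf means act as gamma_m
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def gbm_predict(X, f0, trees, learning_rate=0.1):
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred

# Toy regression problem: y = x^2 plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.01, size=200)
f0, trees = gbm_fit(X, y)
mse = np.mean((gbm_predict(X, f0, trees) - y) ** 2)
</code></pre>
<p>The training error drops well below that of the constant fit \(F_0\) after only a few dozen rounds, which is the essence of boosting.</p>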
<p>One caveat of the above explanation is that it neglects to incorporate a regularization term in the loss function. An overview of gradient boosting <a href="http://xgboost.readthedocs.io/en/latest/model.html">as given in the XGBoost documentation</a> pays special attention to the regularization term while deriving the objective function.</p>
<p>In terms of LightGBM specifically, a detailed overview of the LightGBM algorithm and its innovations is given in the NIPS <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2017/11/lightgbm.pdf">paper</a>.</p>
<h2 id="lightgbmapi">LightGBM API</h2>
<p>Fortunately the details of the gradient boosting algorithm are well abstracted by LightGBM, and using the library is very straightforward.</p>
<p>LightGBM requires you to wrap datasets in a LightGBM <a href="https://lightgbm.readthedocs.io/en/latest/Python-API.html#lightgbm.Dataset">Dataset</a> object:</p>
<pre><code class="language-python">import lightgbm as lgb

lgb_train = lgb.Dataset(X_train, y_train, free_raw_data=False)
lgb_val = lgb.Dataset(X_val, y_val, reference=lgb_train, free_raw_data=False)
</code></pre>
<p>The parameter <code>free_raw_data</code> controls whether the input data is freed after constructing the inner datasets.</p>
<p>LightGBM supports many parameters that control various aspects of the algorithm (more on that below). Some core parameters that should be defined are:</p>
<pre><code class="language-python">core_params = {
    &apos;boosting_type&apos;: &apos;gbdt&apos;, # rf, dart, goss
    &apos;objective&apos;: &apos;binary&apos;, # regression, multiclass, binary
    &apos;learning_rate&apos;: 0.05,
    &apos;num_leaves&apos;: 31,
    &apos;nthread&apos;: 4,
    &apos;metric&apos;: &apos;auc&apos; # binary_logloss, mse, mae
}
</code></pre>
<p>We can then call the <a href="https://lightgbm.readthedocs.io/en/latest/Python-API.html#lightgbm.train">training API</a> to train a model, specifying the number of boosting rounds and early stopping rounds as needed:</p>
<pre><code class="language-python">evals_result = {}
gbm = lgb.train(core_params, # parameter dict to use
                training_set,
                init_model=init_gbm, # enables continuous training.
                num_boost_round=boost_rounds, # number of boosting rounds.
                early_stopping_rounds=early_stopping_rounds,
                valid_sets=validation_set,
                evals_result=evals_result, # stores validation results.
                verbose_eval=False) # whether to print evaluations during training.
</code></pre>
<p>Early stopping occurs when neither the objective nor the metrics we defined, as calculated on the validation data, improve for the given number of rounds.</p>
<p>LightGBM also supports continuous training of a model through the <code>init_model</code> parameter, which can accept an already trained model.</p>
<p>A detailed overview of the Python API is available <a href="https://lightgbm.readthedocs.io/en/latest/Python-API.html">here</a>.</p>
<h3 id="plotting">Plotting</h3>
<p>LightGBM has a built-in plotting API which is useful for quickly plotting validation results and tree-related figures.</p>
<p>Given the <code>evals_result</code> dictionary from training, we can easily plot validation metrics:</p>
<pre><code class="language-python">_ = lgb.plot_metric(evals_result)
</code></pre>
<p><img src="/content/images/2018/05/training-1.png" alt="An Overview of LightGBM" loading="lazy"></p>
<p>Another very useful feature that contributes to the explainability of the model is relative feature importance:</p>
<pre><code class="language-python">_ = lgb.plot_importance(model)
</code></pre>
<p><img src="/content/images/2018/05/feature-1.png" alt="An Overview of LightGBM" loading="lazy"></p>
<p>It is also possible to visualize individual trees:</p>
<pre><code class="language-python">_ = lgb.plot_tree(model, figsize=(20, 20))
</code></pre>
<p><img src="/content/images/2018/05/tree-1.png" alt="An Overview of LightGBM" loading="lazy"></p>
<h3 id="savingthemodel">Saving the model</h3>
<p>Models can easily be saved to a file or JSON:</p>
<pre><code class="language-python">gbm.save_model(&apos;cc_fraud_model.txt&apos;)

loaded_model = lgb.Booster(model_file=&apos;cc_fraud_model.txt&apos;)

# Output to JSON
model_json = gbm.dump_model()
</code></pre>
<h2 id="lightgbmparameters">LightGBM Parameters</h2>
<p>A list of more advanced parameters for controlling the training of a GBDT is given below with a brief explanation of their effect on the algorithm.</p>
<pre><code class="language-python">advanced_params = {
    &apos;boosting_type&apos;: &apos;gbdt&apos;,
    &apos;objective&apos;: &apos;binary&apos;,
    &apos;metric&apos;: &apos;auc&apos;,
    
    &apos;learning_rate&apos;: 0.01,
    &apos;num_leaves&apos;: 41, # more increases accuracy, but may lead to overfitting.
    
    &apos;max_depth&apos;: 5, # shallower trees reduce overfitting.
    &apos;min_split_gain&apos;: 0, # minimal loss gain to perform a split.
    &apos;min_child_samples&apos;: 21, # specifies the minimum samples per leaf node.
    &apos;min_child_weight&apos;: 5, # minimal sum hessian in one leaf.
    
    &apos;lambda_l1&apos;: 0.5, # L1 regularization.
    &apos;lambda_l2&apos;: 0.5, # L2 regularization.
    
    # LightGBM can subsample the data for training (improves speed):
    &apos;feature_fraction&apos;: 0.5, # randomly select a fraction of the features.
    &apos;bagging_fraction&apos;: 0.5, # randomly bag or subsample training data.
    &apos;bagging_freq&apos;: 0, # perform bagging every Kth iteration, disabled if 0.
    
    &apos;scale_pos_weight&apos;: 99, # add a weight to the positive class examples.
    # this can account for highly skewed data.
    
    &apos;subsample_for_bin&apos;: 200000, # sample size to determine histogram bins.
    &apos;max_bin&apos;: 1000, # maximum number of bins to bucket feature values in.
    
    &apos;nthread&apos;: 4, # best set to number of actual cores.
}
</code></pre>
<h3 id="treeparameters">Tree parameters</h3>
<p>LightGBM builds its trees <a href="https://github.com/Microsoft/LightGBM/blob/master/docs/Features.rst#leaf-wise-best-first-tree-growth">leaf-wise</a> (best-first), in contrast to the level-wise growth used by most GBM implementations, including XGBoost by default.<br>
<img src="/content/images/2018/05/leaf-wise-1.png" alt="An Overview of LightGBM" loading="lazy"></p>
<p>Building the tree leaf-wise results in faster convergence, but may lead to overfitting if the parameters are not tuned accordingly. Important parameters for controlling the tree building are:</p>
<ul>
<li><code>num_leaves</code>: the number of leaf nodes to use. Having a large number of leaves will improve accuracy, but will also lead to overfitting.</li>
<li><code>min_child_samples</code>: the minimum number of samples (data) to group into a leaf. The parameter can greatly assist with overfitting: larger sample sizes per leaf will reduce overfitting (but may lead to under-fitting).</li>
<li><code>max_depth</code>: controls the depth of the tree explicitly. Shallower trees reduce overfitting.</li>
</ul>
<h3 id="tuningforimbalanceddata">Tuning for imbalanced data</h3>
<p>The simplest way to account for imbalanced or skewed data is to add a weight to the positive class examples:</p>
<ul>
<li><code>scale_pos_weight</code>: the weight can be calculated based on the number of negative and positive examples: <code>scale_pos_weight = number of negative samples / number of positive samples</code>.</li>
</ul>
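<p>As a minimal sketch, assuming a NumPy array of hypothetical binary labels with a 1% positive class, the weight can be computed directly from the class counts:</p>
<pre><code class="language-python">import numpy as np

# Hypothetical, highly imbalanced labels: 990 negatives, 10 positives.
y = np.array([0] * 990 + [1] * 10)

neg, pos = np.bincount(y)
scale_pos_weight = neg / pos  # 990 / 10 = 99.0
</code></pre>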
<h3 id="tuningforoverfitting">Tuning for overfitting</h3>
<p>In addition to the parameters mentioned above the following parameters can be used to control overfitting:</p>
<ul>
<li><code>max_bin</code>: the maximum number of bins that feature values are bucketed into. A smaller <code>max_bin</code> reduces overfitting.</li>
<li><code>min_child_weight</code>: the minimum sum hessian for a leaf. In conjunction with <code>min_child_samples</code>, larger values reduce overfitting.</li>
<li><code>bagging_fraction</code> and <code>bagging_freq</code>: enables bagging (subsampling) of the training data. Both values need to be set for bagging to be used. The frequency controls how often (iteration) bagging is used. Smaller fractions and frequencies reduce overfitting.</li>
<li><code>feature_fraction</code>: controls the subsampling of features used for training (as opposed to subsampling the actual training data in the case of bagging). Smaller fractions reduce overfitting.</li>
<li><code>lambda_l1</code> and <code>lambda_l2</code>: control L1 and L2 regularization.</li>
</ul>
<h3 id="tuningforaccuracy">Tuning for accuracy</h3>
<p>Accuracy may be improved by tuning the following parameters:</p>
<ul>
<li><code>max_bin</code>: a larger <code>max_bin</code> increases accuracy.</li>
<li><code>learning_rate</code>: using a smaller learning rate and increasing the number of iterations may improve accuracy.</li>
<li><code>num_leaves</code>: increasing the number of leaves increases accuracy with a high risk of overfitting.</li>
</ul>
<p>A great overview of both XGBoost and LightGBM parameters, their effect on various aspects of the algorithms and how they relate to each other is available <a href="https://sites.google.com/view/lauraepp/parameters">here</a>.</p>
<h2 id="resources">Resources</h2>
<ol>
<li>LightGBM project: <a href="https://github.com/Microsoft/LightGBM">https://github.com/Microsoft/LightGBM</a></li>
<li>LightGBM paper: <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2017/11/lightgbm.pdf">https://www.microsoft.com/en-us/research/wp-content/uploads/2017/11/lightgbm.pdf</a></li>
<li>Documentation: <a href="https://lightgbm.readthedocs.io/en/latest/index.html">https://lightgbm.readthedocs.io/en/latest/index.html</a></li>
<li>Parameters: <a href="https://lightgbm.readthedocs.io/en/latest/Parameters.html">https://lightgbm.readthedocs.io/en/latest/Parameters.html</a></li>
<li>Parameter explorer: <a href="https://sites.google.com/view/lauraepp/parameters">https://sites.google.com/view/lauraepp/parameters</a></li>
</ol>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Encoding Cyclical Features for Deep Learning]]></title><description><![CDATA[A key concern when dealing with cyclical features is how we can encode the values such that it is clear to the deep learning algorithm that the features occur in cycles.

This post looks at a strategy to encode cyclical features in order to clearly express their cyclical nature.]]></description><link>https://www.avanwyk.com/encoding-cyclical-features-for-deep-learning/</link><guid isPermaLink="false">5ad0d1b950937206f0fc69bd</guid><category><![CDATA[deep learning]]></category><category><![CDATA[data science]]></category><category><![CDATA[feature engineering]]></category><dc:creator><![CDATA[Andrich van Wyk]]></dc:creator><pubDate>Fri, 13 Apr 2018 15:50:24 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><h4 id="anupdatedversionofthisarticleisavailableathttpsmlengineersubstackcompencodingcyclicalfeaturesfordeeplearning"><em>An updated version of this article is available at <a href="https://mlengineer.substack.com/p/encoding-cyclical-features-for-deep-learning">https://mlengineer.substack.com/p/encoding-cyclical-features-for-deep-learning</a></em></h4>
<p>Many features commonly found in datasets are cyclical in nature. The most common of which are time attributes: months, days, weekdays, hours, minutes and seconds all occur in specific cycles. Other examples might include features such as seasonal, tidal or astrological data.</p>
<p>A key concern when dealing with cyclical features is how we can encode the values such that it is clear to the deep learning algorithm that the features occur in cycles. This is of particular concern in deep learning applications as it may have a significant effect on the convergence rate of the algorithm.</p>
<p>This post looks at a strategy to encode cyclical features in order to clearly express their cyclical nature.</p>
<p>A complete example of using the encoding on weather data, which includes illustrating the effect on a three layer deep neural network, is available as a <a href="https://www.kaggle.com/avanwyk/encoding-cyclical-features-for-deep-learning">Kaggle Kernel</a>.</p>
<h2 id="theproblemwithcyclicaldata">The Problem with Cyclical Data</h2>
<p>The data used below is hourly weather data for the city of Montreal. A complete description of the data is available <a href="https://www.kaggle.com/avanwyk/encoding-cyclical-features-for-deep-learning/data">here</a>. We will be looking at the <code>hour</code> attribute of the datetime feature to illustrate the problem with cyclical features.</p>
<pre><code data-language="python"># `data` is a pandas DataFrame with a parsed datetime column
data[&apos;hour&apos;] = data.datetime.dt.hour
sample = data[:168] # the first week of hourly data
ax = sample[&apos;hour&apos;].plot()
</code></pre>
<p><img src="https://www.avanwyk.com/content/images/2018/04/hour-unencoded.png" alt="hour-unencoded" loading="lazy"></p>
<p>Here we can see exactly what we would expect from an hour value for a week: a cycle from 0 to 23, repeating 7 times.</p>
<p>This graph illustrates the problem with presenting cyclical data to a deep learning algorithm: there are jump discontinuities in the graph at the end of each day when the hour value overflows to 0.</p>
<p>From 22:00 to 23:00 one hour has passed, which is adequately represented by the unencoded values: the absolute difference between 22 and 23 is 1. Between 23:00 and 00:00, however, the jump discontinuity occurs: even though only one hour has passed, the absolute difference in the unencoded feature is 23.</p>
<p>The same will occur for seconds at the end of each minute, for days at the end of each year and so forth.</p>
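<p>The discontinuity is easy to see numerically. A small sketch using two days of hourly values:</p>
<pre><code data-language="python">import numpy as np

hours = np.arange(48) % 24  # two days of hourly timestamps
diffs = np.abs(np.diff(hours))
# Consecutive hours are 1 apart, except at the midnight rollover,
# where the difference in the unencoded feature jumps to 23.
print(diffs.max())
</code></pre>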
<h2 id="encodingcyclicalfeatures">Encoding Cyclical Features</h2>
<p>One method for encoding a cyclical feature is to perform a sine and cosine transformation of the feature:<br>
$$x_{sin} = \sin{(\frac{2 * \pi * x}{\max(x)})}$$<br>
$$x_{cos} = \cos{(\frac{2 * \pi * x}{\max(x)})}$$</p>
<pre><code data-language="python">import numpy as np

data[&apos;hour_sin&apos;] = np.sin(2 * np.pi * data[&apos;hour&apos;]/24.0)
data[&apos;hour_cos&apos;] = np.cos(2 * np.pi * data[&apos;hour&apos;]/24.0)
</code></pre>
<p>Plotting this feature we now end up with a new feature that is cyclical, based on the sine graph:</p>
<pre><code data-language="python">
sample = data[:168]
ax = sample[&apos;hour_sin&apos;].plot()
</code></pre>
<p><img src="https://www.avanwyk.com/content/images/2018/04/hour-encoded-sin.png" alt="hour-encoded-sin" loading="lazy"></p>
<p>If we only use the sine encoding we would still have an issue, as two separate timestamps will have the same sine encoding within one cycle (24 hours in our case), as the graph is symmetrical around the turning points. This is why we also perform the cosine transformation, which is phase offset from sine, and leads to unique values within a cycle in two dimensions.</p>
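<p>To illustrate, take 03:00 and 09:00 (a pair chosen for their symmetry around 06:00): their sine encodings coincide, but their cosine encodings differ:</p>
<pre><code data-language="python">import numpy as np

def encode_hour(hour):
    angle = 2 * np.pi * hour / 24.0
    return np.sin(angle), np.cos(angle)

sin3, cos3 = encode_hour(3)
sin9, cos9 = encode_hour(9)

print(np.isclose(sin3, sin9))  # True: the sine values coincide
print(np.isclose(cos3, cos9))  # False: cosine disambiguates the pair
</code></pre>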
<p>Indeed, if we plot the two features against each other, we end up with a perfect circle:</p>
<pre><code data-language="python">ax = sample.plot.scatter(&apos;hour_sin&apos;, &apos;hour_cos&apos;)
ax.set_aspect(&apos;equal&apos;)
</code></pre>
<p><img src="https://www.avanwyk.com/content/images/2018/04/hour-encoded-two-dims.png" alt="hour-encoded-two-dims" loading="lazy"><br>
The features can now be used by our deep learning algorithm. As an added benefit, they are scaled to the range [-1, 1], which further aids the convergence of the network. A comparison of the effect of the encoding on a simple deep learning model is given in the <a href="https://www.kaggle.com/avanwyk/encoding-cyclical-features-for-deep-learning#Learning-from-Encoded-Data">Kaggle Kernel</a>.</p>
<h2 id="summary">Summary</h2>
<p>Other machine learning algorithms might be more robust towards raw cyclical features, particularly tree-based approaches. However, deep neural networks stand to benefit from the sine and cosine transformation of such features, particularly in terms of aiding the convergence speed of the network.</p>
<h2 id="furtherreading">Further Reading</h2>
<ol>
<li><a href="https://ianlondon.github.io/blog/encoding-cyclical-features-24hour-time">https://ianlondon.github.io/blog/encoding-cyclical-features-24hour-time</a></li>
<li><a href="https://datascience.stackexchange.com/questions/5990/what-is-a-good-way-to-transform-cyclic-ordinal-attributes">https://datascience.stackexchange.com/questions/5990/what-is-a-good-way-to-transform-cyclic-ordinal-attributes</a></li>
<li><a href="https://stats.stackexchange.com/questions/126230/optimal-construction-of-day-feature-in-neural-networks">https://stats.stackexchange.com/questions/126230/optimal-construction-of-day-feature-in-neural-networks</a></li>
</ol>
<!--kg-card-end: markdown-->]]></content:encoded></item></channel></rss>