Travel <code> Music

A place where I share my experiences with Travel, Programming and Music

Explanation of ExaFMM Learning Codes.

In this file I will write descriptions of the exafmm ‘learning’ codes and my understanding of them. I have been tasked with understanding the code and porting it to Ruby, my favorite language. We shall start from the first tutorial, i.e. 0_tree. You can find the full Ruby code here.

  • 0_tree
    • step1.cxx
    • step2.cxx
    • step3.cxx
    • step04.cxx
  • 1_traversal
    • step1.cxx
    • step2.cxx
  • 2_kernels
    • kernel.h
    • vector.h
    • exafmm.h
    • exafmm2d.h and step1.cxx
    • step2.cxx

0_tree

step1.cxx

This program simply populates some bodies with random numbers, creates hypothetical X and Y axes and figures out the quadrant of each of the bodies.

Each of the nodes of the tree has a maximum of 4 immediate children. We first initialize 100 struct Body objects, and then set the X and Y co-ordinates of each of them to a random number between 0 and 1.

In order to actually build the tree we follow these steps:

  • First get the bounds between which the random numbers lie. That is, we figure out the min and max random number present in the bodies.
  • We then get a ‘center’ and a ‘radius’. These are useful for creating ‘quadrants’ and partitioning points into different quadrants in later steps. The center is calculated by adding the min and max numbers (which we treat as the diameter) and dividing by 2 (see the sketch after this list). This step is necessary since there is no ‘square’ space that can be partitioned into multiple spaces like there was in the lecture series. The way of calculating the radius r0 is a little peculiar. It does not use the distance formula, its main purpose is….
  • And then simply count the bodies in each quadrant and display them.
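
Here is a minimal Ruby sketch (a simplified illustration, not the actual exafmm code) of getting the bounds and the center described above; bodies is assumed to be an Array of Body objects like the one defined just below:

coords = bodies.flat_map(&:x)
x_min, x_max = coords.min, coords.max

center = (x_min + x_max) / 2.0
x0 = [center, center]       # same center for both dimensions
r0 = (x_max - x_min) / 2.0  # half the 'diameter' spanned by the bodies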

Ruby code: The body is represented as the Ruby class Body:

class Body
  attr_reader :x

  def initialize
    @x = [0.0, 0.0]
  end
end

There is an interesting way of knowing the quadrant in this code. It goes like this:

a = body.x[0] > x0[0] ? 1 : 0
b = body.x[1] > x0[1] ? 1 : 0
quadrant = a + (b << 1)

The above code plays with 0 and 1 to return a number between 0 and 3 as the quadrant number: a is 1 if the body lies to the right of the center along X, and b is 1 if it lies above the center along Y. For example, a body at (0.7, 0.8) with a center of (0.5, 0.5) gives a = 1 and b = 1, i.e. quadrant 3.

step2.cxx

This code basically takes the bodies created in the previous step, counts the number of bodies in each quadrant and sorts them by quadrant.

The new steps introduced in this program can be summarized as follows:

  • Count the bodies in each quadrant and store the count in an array (the size array in the Ruby implementation).
  • In the next step we successively add the number of elements in each quadrant so that it gives us the offset at which elements from a new quadrant will start in the bodies Array (of course, after it is sorted).
  • We then sort the bodies according to the quadrant that they belong to. Something peculiar that I noticed about this part is that counter[quadrant] also gets incremented after each iteration while sorting. Why is this the case even though the counters have been set to the correct offsets previously?

step3.cxx

This program introduces a new method called buildTree, inside of which we will actually build the tree. It removes some of the sorting logic from main and puts it inside buildTree. The buildTree function performs the following steps:

  • Most of the logic relating to sorting etc. is the same. The only difference is that the bodies array is now sorted in place and the buffer array does not store elements anymore.
  • A new addition is that we re-calculate the center and the radius based on the sorted co-ordinates. This is done because we want a new center and radius for each of the children.
  • The buildTree function is called recursively, so that the quadrants keep being divided until the innermost quadrant in the hierarchy does not contain more than 4 elements.

Implementation:

There is an interesting piece of code in the part for calculating new center and radius:

 # i is quadrant number
center[d] = x0[d] + radius * (((i & 1 << d) >> d) * 2 - 1)

In the above code, there is some bit shifting taking place whose purpose is to split the quadrant number into its X and Y bits and then use those bits to shift the center by +radius or -radius in each dimension, which gives the center of the child cell.
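
As a small illustration (not taken from the exafmm code), here is the same expression evaluated for all four quadrants with a hypothetical parent center of (0.5, 0.5) and a child radius of 0.25:

x0     = [0.5, 0.5]
radius = 0.25

0.upto(3) do |i|
  center = Array.new(2) do |d|
    x0[d] + radius * (((i & (1 << d)) >> d) * 2 - 1)
  end
  puts "quadrant #{i}: child center #{center.inspect}"
end
# quadrant 0: [0.25, 0.25]   quadrant 1: [0.75, 0.25]
# quadrant 2: [0.25, 0.75]   quadrant 3: [0.75, 0.75]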

Another piece of code is this:

counter = Array.new 4, start
1.upto(3) do |i|
  counter[i] = size[i-1] + counter[i-1]
end

# sort bodies and store them in buffer
buffer = bodies.dup
start.upto(finish-1) do |n|
  quadrant = quadrant_of x0, buffer[n]
  bodies[counter[quadrant]] = buffer[n]
  counter[quadrant] += 1
end

In the above code, the counter variable is first used to store the offsets at which each quadrant's elements begin. In the next loop it acts as a running write cursor: counter[quadrant] holds the index at which the next body belonging to that quadrant should be placed, which is why it is incremented after each body is copied.

step04.cxx

In this step we use the code written in the previous steps and actually build the tree. The tree is built recursively by splitting into quadrants and then assigning them to cells based on the quadrant. The ‘tree’ is actually stored in an array.

The cells are stored in a C++ vector called cells.

In the Cell struct, I wonder why the body is stored as a pointer and not a variable.

The implementation in the Ruby code is slightly different in places (for example, saving the size of an Array during a recursive call), since Ruby does not support pointers, but the data structures and overall code are more or less a direct port.

1_traversal

These codes are for traversal of the tree that was created in the previous step. The full code can be found in 1_traversal.rb file.

step1.cxx

This step implements the P2M and M2M passes of the FMM.

One major difference between the C++ and Ruby implementations is that since Ruby does not have pointers, I have used the array indices of the elements instead. For this purpose there are two attributes in the Cell class: first_child_index, which holds the index (in the cells array) of the first child of this cell, and first_body_index, which holds the index (in the bodies array) of the first body belonging to this cell.
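
A minimal sketch of what such a Cell class could look like in Ruby. The two index attributes are the ones mentioned above; the remaining attributes (counts, center, radius and the expansion arrays) are my assumptions about what a port needs, not a verbatim copy of the actual code:

class Cell
  attr_accessor :x, :r,              # center and radius of the cell
                :first_body_index,   # index into the bodies array
                :nbodies,            # number of bodies in this cell
                :first_child_index,  # index into the cells array
                :nchildren,          # number of immediate children (0..4)
                :multipole, :local   # expansion coefficients

  def initialize
    @multipole = []
    @local     = []
    @nchildren = 0
    @nbodies   = 0
  end

  def leaf?
    @nchildren.zero?
  end
end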

It does this by introducing a method called upwardPass, which iterates through the nodes and their children and computes the P2M and M2M kernels.

step2.cxx

This step implements the rest of the kernels, i.e. M2L, L2L, L2P and P2P. It also introduces two new methods: downward_pass, which computes local expansions from parent local expansions (L2L) and evaluates them at the particles (L2P), and horizontal_pass, which computes the direct inter-particle interactions (P2P) and the M2L translations.

No special code as such over here; it's just the regular FMM machinery.

2_kernels

This code is quite different from the previous two. While the previous programs were mostly restricted to a single file, this program substantially increases the complexity and spreads the implementation across several files. We also start using 3-dimensional co-ordinates.

In this code, we start to move towards a spherical co-ordinate system to represent the particles in 3D. A few notable algorithms taken from research papers have been implemented in this code.

Let's describe each file and see what implementation lies inside.

kernel.h

The kernel.h header file implements all the FMM kernels. It also implements two special functions called evalMultipole and evalLocal that evaluate the multipole and local expansions in spherical co-ordinates using the algorithm that is actually used in exafmm. An implementation of this algorithm can be found on page 16 of the paper “Treecode and fast multipole method for N-body simulation with CUDA” by Yokota sensei. A preliminary implementation of this algorithm can be found in “A Fast Adaptive Multipole Algorithm in Three Dimensions” by Cheng.

The Ruby implementation of this file is in kernel.rb.

I will now describe this algorithm here as best I can:

Preliminaries

Ynm vector

This is a vector that defines the spherical harmonics of degree n and order m. A primitive version for computing this exists in the paper by Cheng and a newer, faster version in the paper by Yokota.

Spherical harmonics allow us to define a series expansion of a function in 3D, rather than in 1D as is usually the case for things like the expansion of sin(x). They are representations of functions on the surface of a sphere instead of on a circle, which is usually the case with other 2D expansion functions. They are like the Fourier series of the sphere. This article explains the notation used nicely.

The degree (n) and order (m) correspond to the degree and order of the Legendre polynomial that is used for obtaining the spherical harmonic. n is a non-negative integer and m goes from 0 to n.

For optimization purposes, the values stored inside Ynm are not the raw spherical harmonic values, but values that yield optimized results when the actual computation happens.

Historical origins of kernel.h

This file is a new and improved version of the laplace.h file from the exafmm-alpha repo. Due to the enhancements made, the code in this file performs calculations that are significantly more accurate than those in laplace.h.

laplace.h consists of a C++ class inside which all the functions reside, along with a constructor that computes pre-determined values for subsequent computation of the kernels. For example, in the constructor of the Kernel class, there is a line like so:

Anm[nm] = oddOrEven(n)/std::sqrt(fnmm*fnpm);

This line is computing the value of A_n^m as given in Cheng’s paper (equation 14). This value is used in the M2L and L2L kernels later. However, this value is never directly computed in the new and optimized kernel.h file. Instead, kernel.h modifies the computation of the Ynm vector such that it is no longer necessary to involve the Anm term in any kernel computation.
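
Assuming fnmm and fnpm in the surrounding constructor code hold (n − m)! and (n + m)! respectively, the quantity being computed here is

A_n^m = (-1)^n / √((n - m)! (n + m)!)

which is the normalization constant that appears in the translation operators of Cheng’s paper.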

Functions

cart2sph

This function converts cartesian co-ordinates (X, Y, Z) to spherical co-ordinates consisting of a radius, theta and phi. The radius is simply the square root of the norm of the co-ordinates (norm is defined as the sum of squares of the co-ordinates in vec.h).
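
A rough Ruby sketch of what cart2sph does, assuming dX is the [x, y, z] vector from the expansion center to the point. This is just the standard cartesian-to-spherical conversion, not the verbatim exafmm code:

def cart2sph dX
  r     = Math.sqrt(dX[0]**2 + dX[1]**2 + dX[2]**2) # radius
  theta = r.zero? ? 0.0 : Math.acos(dX[2] / r)      # polar angle
  phi   = Math.atan2(dX[1], dX[0])                  # azimuthal angle
  [r, theta, phi]
end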

evalMultipole simple implementation

This algorithm calculates the multipole expansion of a cell. It uses spherical harmonics so that the net effect of the bodies inside a sphere can be approximated on the surface of the sphere, which can then be treated as a single body when estimating forces.

The optimizations that are presented in the kernel.h version of this file are quite complex to understand since they look quite different from the original equation.

For code that is still sane and easier to read, head over to the laplace.h file in exafmm-alpha. The explanations that follow for now are from this file. We will see how the same functions in kernel.h have been modified to make computation faster and less dependent on large number divisions which reduce the accuracy of the system.

The evalMultipole function basically tries to populate the Ynm array with data computed from the following equation (in the convention used in Cheng’s paper):

ρ^n Y_n^m(α, β) = ρ^n √((n - |m|)! / (n + |m|)!) P_n^|m|(cos α) e^{imβ}

It starts by evaluating terms that need not be computed for every iteration of n, and computes those terms in the outer loop itself. The terms in the outer loop correspond to the condition m = n. The first of these is the exponential term e^{imβ}.

After this is a curious case of computation of some indexes called npn and nmn. These are computed as follows:

npn = m * m + 2 * m # case Y n  n
nmn = m * m         # case Y n -n

The corresponding index calculation for the inner loop is like this:

npm = n * n + n + m # case Y n  m
nmm = n * n + n - m # case Y n -m

This indexes the Ynm array. This is done because we are visualizing the Ynm array as a pyramid whose base spans from -m to m and whose height is n. A rough visualization of this pyramid would be like so:

   -m ---------- m
n  10 11 12 13  14
|    6  7  8  9
|     3  4   5  
|      1   2
V        0

The above formulas give the indexes for each half of the pyramid. Since the values of one half of the pyramid are the complex conjugates of the other half, we need only iterate from m=0 to m<P and use this indexing method to obtain the index of the corresponding entry in the other half of the pyramid.

Now let us talk about the evaluation of the Associated Legendre Polynomial P_n^m(x), where m is the order of the differential equation and n is the degree. The Associated Legendre Polynomial is the solution to the Associated Legendre Equation. It can be expressed in terms of the Rodrigues formula for computation without dependence on the simple Legendre Polynomial P_n(x). However, due to the factorials and rather large divisions that need to be performed to compute the Associated Legendre Polynomial in this form, computing it for large values of m and n quickly becomes unstable. Therefore, we use a recurrence relation of the polynomial in order to compute the different values.

The recurrence relation looks like so:

(n - m + 1) P_{n+1}^m(x) = (2n + 1) x P_n^m(x) - (n + m) P_{n-1}^m(x)

This is expressed in the code with the following line:

p = (x * (2 * n + 1) * p1 - (n + m) * p2) / (n - m + 1)

It can be seen that p is equivalent to P_{n+1}^m, p1 is equivalent to P_n^m and p2 is equivalent to P_{n-1}^m. This convention is followed everywhere in the code.

Observe that the above recurrence requires the values of P at degrees n and n-1 in order to compute the value at degree n+1, so we need starting values. Therefore, for n = m+1 we compute P_{m+1}^m, which can be expressed like this:

P_{m+1}^m(x) = x (2m + 1) P_m^m(x)

The above equation is expressed by the following line in the code:

p = x * (2 * m + 1) * p1

If you read the code closely, you will see that just at the beginning of the evalMultipole function, we initialize p1 = 1 for the first pass through the loop. This is because p1 at that first instance corresponds to P_m^m with m = 0, and if we substitute m = 0 in this equation:

P_m^m(x) = (-1)^m (2m - 1)!! (1 - x^2)^{m/2}

we get P_0^0(x) = 1.

When you look at the code initially, there might be some confusion regarding the significance of having two rho terms, rhom and rhon. This is done because each term of Ynm depends on rho raised to the power n. So just before the inner loop you can see the line rhon = rhom, which basically reduces the number of times that rho needs to be multiplied, since the outer loop's value of rho is already raised to the power it should be for that particular iteration.

Finally, see that there is a line right after the inner loop which reads like this:

pn = -pn * fact * y

This line calculates the value of p1 (that is, P_m^m) for the next iteration of the outer loop. Since the second factorial term in the equation, (2m - 1)!!, only involves odd numbers, its calculation can be simplified by simply incrementing fact by 2 with fact += 2. The y term in the above line is in fact sin(alpha) (defined at the top of this function). This is because, if you look at the original equation, the third term is (1 - x^2)^{1/2}, and x is cos(alpha). Therefore, using the identity sin^2(alpha) + cos^2(alpha) = 1, we can simply substitute that entire term with y.
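
To tie the pieces above together, here is a rough Ruby transliteration of the loop structure described in this section. It follows my reading of the laplace.h version; the prefactor array (holding √((n − |m|)!/(n + |m|)!)) and the function signature are assumptions, not the exact exafmm code:

def eval_multipole rho, alpha, beta, prefactor, order
  ynm  = Array.new(order * order, Complex(0, 0))
  x    = Math.cos(alpha)
  y    = Math.sin(alpha)
  fact = 1.0
  pn   = 1.0   # P_m^m(x), built up incrementally
  rhom = 1.0   # rho**m
  0.upto(order - 1) do |m|
    eim = Complex(Math.cos(m * beta), Math.sin(m * beta)) # e^(i m beta)
    p   = pn
    npn = m * m + 2 * m                 # index of Y_m^m
    nmn = m * m                         # index of Y_m^-m
    ynm[npn] = rhom * p * prefactor[npn] * eim
    ynm[nmn] = ynm[npn].conjugate
    p1 = p
    p  = x * (2 * m + 1) * p1           # P_{m+1}^m from P_m^m
    rhom *= rho
    rhon  = rhom                        # rho**n, continued in the inner loop
    (m + 1).upto(order - 1) do |n|
      npm = n * n + n + m               # index of Y_n^m
      nmm = n * n + n - m               # index of Y_n^-m
      ynm[npm] = rhon * p * prefactor[npm] * eim
      ynm[nmm] = ynm[npm].conjugate
      p2 = p1
      p1 = p
      p  = (x * (2 * n + 1) * p1 - (n + m) * p2) / (n - m + 1) # recurrence
      rhon *= rho
    end
    pn = -pn * fact * y                 # next P_m^m via the (2m - 1)!! term
    fact += 2
  end
  ynm
end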

evalMultipole optimized implementation

Now that a background of the basic implementation of evalMultipole has been established, we can move over to understanding the code that is placed inside the kernel.h file of the exafmm/learning branch. This code is more optimized and can compute results with much higher accuracy than the code present in the exafmm-alpha repo that we previously saw. The main inspiration for this code comes from the Treecode paper mentioned above.

In this code, most of the stuff relating to indexing and calculation of the powers of rho is pretty much the same. However, there are some important changes with regards to the computation of the values that go inside the Ynm array. This change is also reflected in the subsequent kernels.

The simplification in the computation is based on the notion that a P2M kernel will eventually be expanded by M2M, and therefore it makes sense to compute some terms that are required for M2M inside P2M itself. In order to see how exactly this works, consider the line in laplace.h that is used for computing the M2M:

M += Cj->M[jnkms] * std::pow(I,real_t(m-abs(m))) * Ynm[nm] * real_t(oddOrEven(n) * Anm[nm] * Anm[jnkm] / Anm[jk]);

The above line computes the M2M as given by eq.13 in Cheng’s paper. Now, division and multiplication of such large numbers makes the M2M calculation very unstable if the order and/or degree of the equations is large. Therefore, the new evalMultipole simplifies this computation by computing some terms in the P2M stage itself.

In order to understand this, let us see the equation given by Cheng. Let us call this equation 1:

In the new M2M kernel, the A terms are clubbed together with other terms so that no explicit division or multiplication involving them takes place the way it does in the laplace.h code. In this regard, the A terms are absorbed into the multipole term O and the spherical harmonic term Y. You will notice that O is actually the multipole that is computed in the P2M stage (the Cj->M[jnkms] term in the code sample above). Therefore, the actual equation of the P2M kernel using the new evalMultipole method becomes like this:

This part of eq. (1) will be clubbed with the spherical harmonic Y and will be calculated inside the evalMultipole method, for every particle in the case of P2M and for every multipole in the case of M2M. Thus, since P2M and M2M have similar behaviour (i.e. grouping many particles into fewer particles), we can use the same function for both.

To recap: inside the evalMultipole method, the part of the above P2M equation after the charge q is calculated. This equation, upon expansion of the spherical harmonic into its constituents and cancellation of common terms, can be simplified to the following equation. Note that the computed value of this equation is what gets stored inside the Ynm array.

The implementation of this equation inside evalMultipole is a little funny. It has been optimized in such a way that the division operations never happen between numbers that are too big. The factorials are calculated on the fly while the loop is in progress. This can get a little confusing at first since it is not very obvious. The factorial is mainly calculated in two lines of code, here and here.

The first line of code reads rhon /= -(n + m). The origin of this is obvious as can be seen from the above equation that is being calculated inside evalMultipole. The division happens after each iteration and there is no stored factorial that is used the way it was in laplace.h for reducing the number that needs to be used in the division.

The second line of code reads rhom /= -(2 * m + 2) * (2 * m + 1). In this case, notice that the LHS has the variable rhom. This variable is used only in the outer loop for the computation of Ynm (i.e. the case n = m). To explain this, consider that the inner loop starts from n = m+1, so its first division rhon /= -(n + m) can be rewritten as rhon /= -(2*m + 1) (at least for the first iteration). When the Ynm value needs to be calculated in the outer loop, rhom must already carry the factor for the current value of m as well as the one for the next iteration, which is why 2*m + 2 appears as well.

To understand in somewhat more detail, see the following Ruby code that recreates the values of the above indices:

P = 5

prod1 = 1 # builds up (2m + 2)! incrementally across the outer iterations
0.upto(P-1) do |m|
  prod = prod1 # builds up (n + m)! incrementally in the inner loop
  (m+1).upto(P-1) do |n|
    puts "m: #{m} n: #{n} 2m+1: #{m+n}"
    prod *= (m + n)
    puts "P: #{prod}"
  end
  prod1 *= (2*m + 1)*(2*m + 2)
  puts "P1: #{prod1} 2m+1: #{2*m+1} 2m+2: #{2*m+2}"
end

The above code produces the following output:

m: 0 n: 1 2m+1: 1
P: 1
m: 0 n: 2 2m+1: 2
P: 2
m: 0 n: 3 2m+1: 3
P: 6
m: 0 n: 4 2m+1: 4
P: 24
P1: 2 2m+1: 1 2m+2: 2
m: 1 n: 2 2m+1: 3
P: 6
m: 1 n: 3 2m+1: 4
P: 24
m: 1 n: 4 2m+1: 5
P: 120
P1: 24 2m+1: 3 2m+2: 4
m: 2 n: 3 2m+1: 5
P: 120
m: 2 n: 4 2m+1: 6
P: 720
P1: 720 2m+1: 5 2m+2: 6
m: 3 n: 4 2m+1: 7
P: 5040
P1: 40320 2m+1: 7 2m+2: 8
P1: 3628800 2m+1: 9 2m+2: 10

If you observe the output of the code, you can see that the factors multiplied in during each P1 phase continue exactly one beyond where the previous phase stopped, which indicates that the factorial value is being calculated properly, incrementally.

vector.h

This file defines a new custom type for storing 1D vectors called vec as a C++ class. It also defines various functions that can be used on vectors like norm, exp and other simple arithmetic.

The Ruby implementation of this file is in vector.rb.
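
A minimal sketch of what a vec-like class could look like in Ruby (the real vector.rb may well differ); norm here is the sum of squares, as described earlier:

class Vec < Array
  # sum of squares of the components, matching the description of vec.h above
  def norm
    inject(0.0) { |sum, v| sum + v * v }
  end

  def +(other)
    Vec[*zip(other).map { |a, b| a + b }]
  end

  def -(other)
    Vec[*zip(other).map { |a, b| a - b }]
  end
end

Vec[1.0, 2.0, 2.0].norm # => 9.0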

exafmm.h

exafmm2d.h and step1.cxx

Shows a very simple preliminary implementation of the actual exafmm code. Mostly useful for understanding purposes only.

step2.cxx

GSOC 2016 Wrap Up for SciRuby

In the summer of 2016 I was chosen by the SciRuby core team to be admin for SciRuby for Google Summer of Code 2016. GSOC is an important yearly event for us as an organization since it provides a great platform for an upcoming organization like SciRuby and helps us get more users and contributors for the various libraries that we maintain.

This blog post is meant to be a summary of the work that SciRuby did over the summer and also of my experience at the GSOC 2016 mentor’s summit.

GSOC student work

For the 2016 edition of GSOC we had 4 students - Lokesh Sharma, Prasun Anand, Gaurav Tamba and Rajith Vidanaarachchi. All four were undergraduate computer engineering students from colleges in India or Sri Lanka at the time of GSOC 2016.

Lokesh worked on making improvements to daru, a Ruby DataFrame library. He made very significant contributions to daru by adding functionality for storing and performing operations on categorical data, and also significantly sped up the sorting and grouping functionality of daru. His work has now been successfully integrated into the main branch and has also been released on rubygems. Lokesh has remained active as a daru contributor and regularly contributes code and replies to Pull Requests and issues. You can find a wrap up of the work he did throughout the summer in this blog post.

Prasun worked on creating a Java backend for NMatrix, a Ruby library for performing linear algebra operations similar to numpy in Python. This project opened the doors for scientific computation on JRuby. Prasun was able to complete all his project objectives, and his work is currently awaiting review because of the sheer size of the Pull Request and the variety of changes to the library that he had to make in order to accomplish his project goals. You can read about his summer’s work here. Prasun will also be speaking at Ruby Conf India 2017 about his GSOC work and scientific computing on JRuby in general.

Gaurav worked on creating a Ruby wrapper for NASA’s SPICE toolkit. A need for this was felt since Gaurav’s mentor John is a rocket scientist and was keen on having a Ruby wrapper for a library that he used regularly in his work. This resulted in the spice_rub gem. It exposes a very intuitive Ruby interface to the SPICE toolkit. Gaurav also gave a lightning talk about his work at Deccan Ruby Conf (Pune, India). Blog posts summarizing his work can be found here, here and here.

Rajith worked on growing the Ruby wrapper over symengine. His mentor Abinash was a student with SciRuby for GSOC 2015 and volunteered to mentor Rajith so that Rajith could build upon the work that he had done the previous summer. This resulted in a huge increase in functionality for the symengine.rb ruby gem.

To summarize, all four of our students could execute their chosen tasks within the stipulated time and we did not have to fail anyone. All in all, we mentors had a great time working with the students and hope to keep doing this year on year!

GSOC 2016 mentor’s summit

The GSOC 2016 mentor’s summit was fantastic. It was great meeting all the contributors and listening to ideas from projects that I had never heard about previously. I also had the opportunity to conduct an unconference session and share my ideas on Scientific Computation in Ruby with like minded people from other organizations.

Here are some photos that I took at the summit:

'ID card'

'A visit to the Computer History Museum'

'The (now discontinued) self driving car'

'Chocolate table at the GSOC summit'

'Attendees from India'

Advice for Future GSOC Students and Mentors Based on My Experience.

This year I was the admin for the GSOC projects of the SciRuby foundation. It’s also the first time I mentored a student, having been a student myself last year. Being a mentor is a pretty tough task, and for some reason it is underestimated by many. I was lucky to have the experience and support of co-admins @mohawkjohn and @pjotrp throughout the GSOC period.

GSOC has now come to a close. I have learned a great deal myself in the past 3 months, and thought I would share some of my learnings in this blog post in the interest of future GSOC students and mentors.

Advice for future students

Writing a proposal

Research your ideas for at least a day before asking your first question. Mentors are volunteers and it’s important to respect the time and effort that they’re putting into FOSS. When you do propose an idea, you should also have a good knowledge of why you’re working on that idea in the first place and what kind of long term impact the realization of that idea can have. Putting this across in your proposal can have a positive impact on your selection. Know how to ask questions on mailing lists. A properly researched question should show that you have first taken the effort to understand the topic and are asking the question only because you got stuck somewhere.

Community bonding

Make sure you figure out what exactly you have to research and learn during the community bonding phase. There are a lot of things out there that can be learned, but only a few specific things will be helpful for your project. Focus only on them. Ask your mentor specific questions.

Coding

Set up a daily schedule for coding and stick to it. Constantly keep in touch with your mentor and make sure they know your progress as well as you do. If you run into previously unseen problems (frequent in programming), tell your mentor about this ASAP and work out a solution together.

Don’t burn yourself out in your enthusiasm. Take regular breaks. Overworking does more harm than good.

Advice for future mentors

Student selection

Short story: if you’re unsure about a student, don’t select them. It’s better to have quality than quantity.

Long story: First and foremost, it is very important to establish some organization-wide procedure that will be followed when selecting a student. As a start, consider making a proposal template that contains all the information and details that the student needs to fill up when submitting the proposal. Have a look at the SciRuby application template as an example.

When students start asking questions on the mailing list, it is important for the org admins to keep a watch on the students and get a rough idea of who asks the better questions and who doesn’t. Community participation is a great measure for understanding whether a student will live up to your expectations or not. A proposal with a bad first draft might just turn out to be great simply because the student is open to feedback and is willing to put in the effort to work on it.

We have 3 rounds. In the first round every mentor rates only their own student. In the next round all mentors rate all students (students without a mentor and bad proposals drop off).

In each case, when rating a student, mentors put in a comment, making sure to describe how the student has interacted during the proposal phase, what their current coding looks like, and how responsive they are. Mentors can still push their students to do stuff. We like it when students stay responsive in this phase.

In the 3rd round the org admins make the final ranking and set the number of slots to request. By this stage we are pretty clear about the individuals involved (and note that mentor activity counts). When Google allocates the slots, the top-ranked students get in.

Coding period

Make sure you communicate with your student that they are supposed to send you daily updates of their progress. One paragraph about their work that particular day should suffice.

Setting Up a Lexical Analyser and Parser in Ruby

I wrote this post as I was setting up the lexer and parser for Rubex, a new superset of Ruby that I’m developing.

Let’s see the basic workings of a lexical analyser and parser in action with a very simple addition program. Before you start, please make sure rake, oedipus_lex and racc are installed on your computer.

Configuring the lexical analyser

The most fundamental need of any parser is that it needs string tokens to work with, which we will provide by way of lexical analysis by using the oedipus_lex gem (the logical successor of rexical). Go ahead and create a file lexer.rex with the following code:

class AddLexer
macro
  DIGIT         /\d+/
rule
  /#{DIGIT}/    { [:DIGIT, text.to_i] }
  /.|\n/        { [text, text] }
inner
  def do_parse; end # this is a stub.
end # AddLexer

In the above code, we have defined the lexical analyser using Oedipus Lex’s syntax inside the AddLexer class. Let’s go over each element of the lexer one by one:

macro

The macro keyword lets you define macros for certain regular expressions that you might need to write repeatedly. In the above lexer, the macro DIGIT is a regular expression (\d+) for detecting one or more digits. We place the regular expression inside forward slashes (/../) because oedipus_lex requires it that way. The lexer can handle any valid Ruby regular expression. See the Ruby docs for details on Ruby regexps.

rule

The section under the rule keyword defines your rules for the lexical analysis. Now it so happens that we’ve defined a macro for detecting digits, and in order to use that macro in the rules, it must appear inside a Ruby string interpolation (#{..}). The code to the right of /#{DIGIT}/ states the action that must be taken when such a regular expression is matched. Here the lexer will return a Ruby Array with :DIGIT as the first element and the matched number as the second. The second element uses the text variable, a reserved variable in the lexer that holds the text it has just matched. Similarly, the second rule will match any other character (.) or a newline (\n) and return an Array with [text, text] inside it.
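
For instance, with the input 2+2 that we use later in this post, the lexer would emit tokens shaped roughly like this (a sketch of the output format, not a captured run):

[:DIGIT, 2]
["+", "+"]
[:DIGIT, 2]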

inner

Under the inner keyword you can specify any code that you want to appear inside your lexer class. This can be any logic that you want your lexer to execute. The Ruby code under the inner section is copied as-is into the final lexer class. In the above example, we’ve written an empty method called do_parse inside this section. This method is mandatory if you want your lexer to execute successfully. We’ll be coupling the lexer with racc shortly, so unless you want to write your own parsing logic, you should leave this method empty.

Configuring the parser

In order for our addition program to be successful, it needs to know what to do with the tokens that are generated by the lexer. For this purpose we need racc, an LALR(1) parser generator for Ruby. It is similar to yacc or bison and lets you specify grammars easily.

Go ahead and create a file called parser.racc in the same folder as the previous lexer.rex, and put the following code inside it:

class AddParser
rule
  target: exp { result = 0 }
  
  exp: exp '+' exp { result += val[2]; puts result }
     | DIGIT
end

---- header
require_relative 'lexer.rex.rb'

---- inner
def next_token
  @lexer.next_token
end

def prepare_parser file_name
  @lexer = AddLexer.new
  @lexer.parse_file file_name
end

As you can see, we’ve put the logic for the parser inside the AddParser class. Yacc’s $$ is result; $1, $2… correspond to an array called val, and $0, $-1… to an array called _values. For example, in the rule exp: exp '+' exp, val[0] and val[2] hold the values of the two exp non-terminals and val[1] holds the '+' literal. Notice that in racc, only the parsing logic exists inside the class and everything else (i.e. under header and inner) exists outside the class. Let’s go over each part of the parser one by one:

class AddParser

This is the core class that contains the parsing logic for the addition parser. Similar to oedipus_lex, it contains a rule section that specifies the grammar. The parser expects tokens in the form of [:TOKEN_NAME, matched_text]. The :TOKEN_NAME must be a symbol, and this token name is matched against the terminals in the grammar (DIGIT in the above case). target and exp are non-terminal variables. Have a look at this introduction to LALR(1) grammars for further information.

header

The header keyword tells racc what code should be put at the top of the parser that it generates. You usually put your require statements here. In this case, we load the lexer class so that the parser can use it for accessing the tokens generated by the lexer. Notice that header has 4 hyphens (-) and a space before it. This is mandatory if your program is not to malfunction.

inner

The inner keyword tells racc what should be put inside the generated parser class. As you can see there are two methods in the above example - next_token and prepare_parser. The next_token method is mandatory for the parser to function and you must include it in your code. It should contain logic that returns the next token for the parser to consider. Moving on to the prepare_parser method: it takes the name of the file to be parsed as an argument (how we pass that argument in will be seen later) and initializes the lexer. It then calls the parse_file method, which is present in the lexer class by default.

The next_token method in turn uses the @lexer object’s next_token method to get a token generated by the lexer so that it can be used by the parser.

Putting it all together

Our lexical analyser and parser are now coupled to work with each other, and we now use them in a Ruby program to parse a file. Create a new file called adder.rb and put the following code in it:

require_relative 'parser.racc.rb'

file_name = ARGV[0]
parser = AddParser.new
parser.prepare_parser(file_name)
parser.do_parse

The prepare_parser method is the same one that was defined in the inner section of parser.racc above. The do_parse method called on the parser signals the parser to start doing its job.

In a separate file called text.txt put the following text:

2+2

Oedipus Lex does not have a command line tool like rexical for generating a lexer from the logic specified, but rather has a bunch of rake tasks defined for doing this job. So now create a Rakefile in the same folder and put this code inside it:

require 'oedipus_lex'

Rake.application.rake_require "oedipus_lex"

desc "Generate Lexer"
task :lexer  => "lexer.rex.rb"

desc "Generate Parser"
task :parser => :lexer do
  `racc parser.racc -o parser.racc.rb`
end

Running rake parser will generate two new files - lexer.rex.rb and parser.racc.rb - which house the classes and logic for the lexer and parser, respectively. You can use your newly written lexer + parser with the command ruby adder.rb text.txt. It should output 4 as the answer.

You can find all the code in this blogpost here.

Random Thoughts on Music Theory.

Title explains what this is about.

16 August 2016

Was checking out this video (Contortionist - Language 1) and learned about standard C# tuning on a 6 string bass guitar today. He’s used tuning G# C# F# B E A. Killer bass tone. This wiki says something different about C# standard, though.

30 November 2016

Trying out some interval training with this video today. Supposed to be really good.

So there are two types of intervals: harmonic and melodic. Harmonic is when two or more notes are played at the same time, and melodic is when two or more notes are played separately, one after the other.

Intervals are described by some properties:

  • Quality: Whether it is perfect, major, minor, augmented or diminished. Perfect intervals, if they’re raised by a half step, become augmented; if they are lowered by a half step they become diminished. If perfect intervals are inverted, they remain perfect intervals. So a perfect fifth inverted becomes a perfect fourth, and vice versa a perfect fourth inverted becomes a perfect fifth. Minor or major intervals can become augmented or diminished but never perfect.
  • Number: Unison, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, etc. The number of the interval is the number of letter names that the interval spans. For example, C to G is a fifth because it spans 5 letter names: C-D-E-F-G.

A dyad is a two note chord.

Aural characteristics of intervals: Consonant category: Perfect fifths and octaves are open consonances. Major and minor thirds and sixths are called soft consonances.

Dissonant category: Minor sevenths (C-Bb) and major seconds (C-D) are called mild dissonances. Minor seconds (like C-Db) and major sevenths (C-B) are called sharp dissonances.

The perfect fourth is characterized as a consonant or dissonant interval depending on how it is used in context. If a perfect 4th is part of a second inversion major triad…

The major 6th interval can be remembered with ‘My Bonnie Lies…’.

To identify a minor 6th interval, play the first inversion of the triad and then play the 1st and 3rd of the inversion.

To identify a major 6th, play the second inversion of the triad so you get the 1st and 3rd notes at a major 6th interval.

11 December 2016

Songs for remembering ascending intervals:

  • Major 2nd - Happy Birthday to You.
  • Major 3rd - Oh when the saints go marching.
  • Perfect 4th - Star Trek Theme (TNG).
  • Perfect 5th - Scarborough Fair. (are we ^going….)
  • Major 6th - My Bonnie Lies Over…
  • Major 7th - Superman theme
  • Octave - The Christmas Song

Searching for Graduate Degree Courses in USA and Japan.

I’m currently searching for master’s degree courses in various colleges in Japan and USA. I want to pursue a Computer Science degree specializing in distributed systems. Searching for the right graduate degree courses can get depressing. Here I’m posting various links and leads that I came across through the course of my search.

5 August 2016

Searching for options in Japan and started with University of Tokyo. Most of their courses seem to be in Japanese but there are a few in English as well. This page has some starting info about the English courses. Also found a collection of colleges here.

So apparently the process for getting into a Japanese college for Master’s can take two paths. The first is like so:

  1. Talk to a professor and gain a research assistantship with him/her.
  2. Give an exam and enroll for a 2 year master’s course if you pass that exam.

The second is to directly give the exam, but I’m not sure how that can be done since they all appear to be written examinations that are conducted in Japan.

16 August 2016

Having a look at the graduate schools of the University of Tokyo, Tokyo Institute of Technology and Kyoto University today.

University of Tokyo

UoT seems to have some special selection process for international applicants (link), though it’s not useful for me. There’s a decent contact page here. They’ve also put up a check list for applications here.

Tokyo Inst. of Technology

This also has a good graduate program. Tokyo Inst. of Technology has an international graduate program for overseas applicants. The courses seem to be mostly in English. The school of computer science has also participated in the IGP and accepts the IGP(A), IGP(B)3 and IGP(C) types of applicants. I seem to be most qualified for the IGP(A) and IGP(C) applications.

The ‘Education Program of Advanced Information Technology Leaders’ seems to be most relevant to my case. This looks like a good PDF to brief about the program.

All the courses require students to arrange for a Tokyo Tech faculty member to serve as their academic supervisor. This handy web application allows you to do that. They also have the MEXT scholarship for outstanding students.

University of Kyoto

Page of dept. of information science.

17 August 2016

Continuing my research on Tokyo Inst. of Technology. The PDF I pointed to yesterday brought out an interesting observation - IGP(A) students and IGP(C) students seem to have different course work.

18 August 2016

It seems the IGP(C) program at Tokyo Tech. is best for me. I will research that further today. Most probably I’ll need to do a 6 month research assistantship first. Here’s a list of the research groups of the Computer Sci. department at Tokyo Tech.

20 August 2016

Tokyo Inst. of Technology

Found a list of faculties under the IGP(C) program here.

23 August 2016

Had a look at Kyushu Inst. of Technology today. The program for international students looks good.

Also check out scholarship opportunities at Tokyo Inst. of Technology. Links - 1, 2, 3. There are a bunch of scholarships that can be applied to before you enrol in university. Have a look here.

There’s also the MEXT scholarship from the Japanese government.

24 August 2016

Found an interesting FAQ on the UoT website.

Also having a look at JASSO scholarships. Found some great scholarships here.

25 August 2016

Found some scholarships. I can also enrol as a privately funded research student at Tokyo Tech.

This is a PDF that talks about privately funded research students.

Also checking out Keio University today. They have a program for international graduate students. Have a look here.

I also had a look at the Kyoto University IGP. Here’s a listing of Japanese universities.

28 August 2016

Found a Computer Engineering IGP at Kyoto University, though I still can’t find anything related to HPC. This is a link that has some details on admissions.

More details on Tokyo Tech.’s IGP(A) can be found here. This looks like a good resource for curriculum. This has resources for scholarships without recommendation.

29 August 2016

Found a good resource on IGP programs at Tokyo Tech here. Here’s a PPT about IGP(A) in particular. IGP(A) coursework can be found here.

4 November 2016

Posting after quite a while!

I’m currently having a look at Linz University, Austria. I came to know that one of the research groups there is really good and is making some solid progress in high-performance software.

Here’s the admissions page of the dept. of computer science. Here’s more info on admissions. This is a PDF on the Computer Science degree.

The System Software group looks nice.

25 December 2016

Checking out the Computer Science program at UIC and that at University of Houston.

This is UIC’s website. This is the detailed PDF of the MS in CS requirements.

Random Thoughts on Bass Tone

This post is about my learnings about bass tone. I’m currently using the following rig:

  • Laney RB2 amplifier
  • Tech 21 Sansamp Bass Driver Programmable DI
  • Fender Mexican Standard Jazz Bass (4 string)

I will be updating this post as and when I learn something new that I’d like to document or share. Suggestions are welcome. You can email me (see the ‘about’ section) or post a comment below.

26 July 2016

As of now I’m tweaking the sansamp and trying to achieve a good tone that will complement the post/prog rock sound of my band Cat Kamikazee. I’m also reading up on different terminologies and use cases on the internet. For instance, I found this explanation of DI boxes quite useful. I also learned that the ‘XLR Out Pad’ button on the sansamp actually provides a 20 dB cut to the signal sent to the soundboard if your signal is too hot.

I am trying to couple the sansamp with a basic overdrive pedal I picked up from a friend. This thread on talkbass is pretty useful for that. The guy who answered the question states that it’s better to place the sansamp last in the chain so that the DI can deliver the output of the sound chain.

So the BLEND knob on the sansamp modulates how much of the dry signal is mixed with the sansamp’s tube amplifier emulation circuitry. It can be useful when chaining effects pedals with the sansamp, by reducing the blend and letting more of the dry signal pass through. Btw, the bass, treble and level controls remain active irrespective of the position of BLEND.

One thing that was a little confusing was the whole thing about ‘harmonic partials’. I found a pretty informative thread about the same on this TalkBass thread.

Here’s an interesting piece on compressors.

Some more useful links I came across over the course of the past few days:

  • https://theproaudiofiles.com/amp-overdrive-vs-pedal-overdrive/
  • http://www.offbeatband.com/2009/08/the-difference-between-gain-volume-level-and-loudness/

28 July 2016

Found an interesting and informative piece on bass pedals here. It’s a good walkthrough of different pedal types and their functionality and purpose.

I wanted to check out some overdrive pedals today but was soon sinking in a sea of terminologies. One thing that intrigued me is the difference between an overdrive, distortion and fuzz. I found a pretty informative article on this topic. The author has the following to say about these 3 different but seemingly similar things.

I had a look at the Darkglass b3k and b7k pedals too. They look like promising overdrive pedals. I’ll explore the b3k more since the only difference between the 3 and the 7 is that the 7 also functions as a DI box and has an EQ, while the 3 doesn’t. I already have a DI with a 2 band EQ in the sansamp.

29 July 2016

One thing that I noticed when tweaking my sansamp is that the level of ‘distortion’ in the tone varies a LOT when you change the bass or treble while keeping the drive at the same level. Why does this happen?

2 August 2016

Trying to dive further into distortion today. Found this article kind of useful. It relates mostly to lead guitar tones, but I think it applies in a general case too. I learned about symmetric and asymmetric clipping in that article.

According to the article, symmetric clipping is more focused and clear, because it only generates one set of harmonic overtones. Since asymmetric clipping can be hard-clipped on one side and soft-clipped on the other, it has the potential to create very thick, complex sounds. This means that if you want plenty of overtones but do not want a lot of gain, asymmetric clipping can be useful. For full-blown distortion, symmetric clipping is usually more suitable, since high-gain tones are already very harmonically complex. Typically asymmetric clipping will have a predominant first harmonic, which symmetric clipping will not (that’s probably why in this video the SD1 sounds brighter than the TS-9). High gain distortion tones sound best with most of the distortion coming from the pre-amp, so try to use a fairly neutral pickup or even a slightly ‘bright’ pickup.

The follow up to the above post talks about EQ in relation to distortion. It has stuff on pre- and post-distortion EQ and how it can affect the overall tone. If you place the EQ before the distortion, you can actually shape which frequencies will be clipped. However, if you place it after the distortion, then the EQ only shapes the already distorted tone. Pre-distortion EQ is more useful in most cases since it lets you control the frequencies that get clipped.

It also says that humbucking pickups have a mid-boost that is focused more on the lower part of the frequency range. Single coil pickups, on the other hand, have a mid-boost focused on the upper part of the frequency range. Single coils generally have a clearer, more articulate bass end.

3 August 2016

Read something about bass DI in this article today.

10 October 2016

Posting after quite a while!

Reading about the use of compression for bass guitars. Found this article which explains why we need compression in the first place.

Also, my band’s installation of Main Stage 3 has started giving some really weird problems. More about that soon.

11 October 2016

Coming back to Main Stage. For some reason, pressing the space bar for play/pause reduces the default sampling rate and makes the tracks sound weird. We need to go to preferences and increase the sampling rate to 48 kHz again (that’s what our backing tracks are recorded at). I think it’s something to do with the key mappings, but I’m not sure. Will need to check it out.

It also so happens that after the space bar has been pressed and the issue with the sampling rate is resolved, the samples (which come from an M-Audio M-Track) start emitting a strange crackling sound. This sound persists only if headphones are plugged into the audio jack (we use the onboard Mac sound card too). The sound goes away if the headphones are unplugged. Restarting the Mac resolves the issue. I suspect there might be a way to fix it without having to restart. Will investigate.

Turns out you just restart and it solves the problem (and be careful about what keys you press when on stage!). Not worth scratching your head too much.

9 November 2016

I just got a new EHX Micro POG octaver pedal and a TC electronic booster pedal. Also got a TC electronics Polytune. Finally on my way to creating a pedal chain :)

So for now I’m using the pedals in this order:

Tuner -> Octaver -> Booster -> Sansamp

I think this works fine for me for now, though I might change something later on.

I read in this thread that using one octave down with an overdrive (on the sansamp) works wonders. Gonna try that now!

I am also having a look at this guide on setting up a pedal board.

18 November 2016

Also found an interesting rig rundown by Tim Commerford (RATM).

Making Screencasts in Debian

Overview

I thought I’d try something new by recording screencasts for some of my work on Ruby open source libraries.

This is quite a change for me since I’m primarily focused on the programming and designing side of things. Creating documentation is something I’ve not ventured into a lot except the usual YARD markup for Ruby methods and classes.

In this blog post (which I will keep updating as time progresses) I hope to document my efforts in creating screencasts. Mind you this is the first time I’m creating a screencast so if you find any potential improvements in my methods please point them out in the comments.

Creating the video

My first ever screencast will be for my benchmark-plot gem. For creating the video I’m mainly using two tools - Kdenlive for video editing and Kazam for recording screen activity. I initially tried using Pitivi and OpenShot for video editing, but the former did not seem user friendly and the latter kept crashing on my system. For the desktop recording I first tried using RecordMyDesktop but gave up on it since it’s too heavy on resources and recorded poor quality screencasts with not too many customization options.

For creating informative visuals, I’m using LibreOffice Impress so that I can create a slide, take a screenshot of it in slideshow mode and put it in the screencast. However, I’ve generally found that using slides does not serve content delivery in a screencast well, so I will probably not feature too many slides in future screencasts.

Sublime Text 3 is my primary text editor. I use its in-built code execution functionality (by pressing Ctrl + Shift + B) to execute a code snippet and display the results immediately.

Creating the audio

I am using Audacity for recording sound. Sadly my mic produces a lot of noise, so for removing that noise in Audacity, I use the inbuilt noise reduction tools.

Noise reduction in Audacity can be achieved by first selecting a small part of the sound that does not contain speech, then going to Effects -> Noise Reduction and clicking on ‘Get Noise Profile’. Then select the whole sound wave with Ctrl + A, go to Effects -> Noise Reduction again and click ‘OK’. It should considerably reduce static noise from your sound file.

All files are exported to Ogg Vorbis.

Putting it all together

I did some research on the screencasting process and found this article by Avdi Grimm and this one by Sayanee Basu extremely helpful.

I first started by writing the transcript along with any code samples that I had to show. I made it a point to describe the code being typed/displayed on the screen since it’s generally more useful to have a voice over explaining the code than having to pause the video and go over it yourself.

Then I recorded the voice over just for the part that featured slides. I imported the screenshots of the slides in kdenlive and adjusted them such that they fit the voice over. Recording the code samples was a bit of a challenge. I started typing out the code and talking about it into the mic. This was more difficult than I thought, almost like playing a Guitar and singing at the same time. I ended up recording the screencast in 4 separate takes, with several retakes for each take.

After importing the screencast with voice over into kdenlive and separating the audio and video components, I did some cuts to reduce redundancy or imperfections in my VO. Some of the parts of the video where there was a lot of typing had to be sped up by using kdenlive’s Speed tool.

Once this was up to my satisfaction, I exported it to mp4.

The video of my first screencast is now up on YouTube in the video below. Have a look and leave your feedback in the comments!

Summary of Work This Summer for GSOC 2015

Over this summer, as a part of Google Summer of Code 2015, daru received a lot of upgrades and new features which have made it a pretty robust tool for data analysis in pure Ruby. Of course, a lot of work still remains for bringing daru on par with the other data analysis solutions on offer today, but I feel the work done this summer has put daru on that path.

The new features led to the inclusion of daru in many of SciRuby’s gems, which use daru’s data storage, access and indexing features for storing and carrying around data. Statsample, statsample-glm, statsample-timeseries, statsample-bivariate-extensions are all now compatible with daru and use Vector and DataFrame as their primary data structures. Daru’s plotting functionality, that interfaced with nyaplot for creating interactive plots directly from the data was also significantly overhauled.

Also, new gems developed by other GSOC students, notably Ivan’s GnuplotRB gem and Alexej’s mixed_models gem both accept data from daru data structures. Do see their repo pages for seeing interesting ways of using daru.

The work on daru is also proving to be quite useful for other people, which led to a talk/presentation at DeccanRubyConf 2015, one of the three major Ruby conferences in India. You can see the slides and notebooks presented at the talk here. Given the current interest in data analysis and the need for a viable solution in Ruby, I plan to take daru much further. Keep watching the repo for interesting updates :)

In the rest of this post I’ll elaborate on all the work done this summer.

Pre-mid term submissions

Daru as a gem before GSOC was not exactly user friendly. There were many cases, particularly the iterators, that required some thinking before anybody could use them. This is against the design philosophy of daru, or even Ruby in general, where surprising programmers with unintuitive constructs is usually frowned upon by the community. So the first thing that I did mainly concerned overhauling daru’s many iterators, for both Vector and DataFrame.

For example, the #map iterator from Enumerable returns an Array no matter what object you call it on. This was not the case before, where #map would return a Daru::Vector or Daru::DataFrame. This behaviour was changed, and now #map returns an Array. If you want a Vector or a DataFrame of the modified values, you should call #recode on the Vector or DataFrame.

Each of these iterators also accepts an optional argument, :row or :vector, which will define the axis over which iteration is supposed to be carried out. So now there are the #each, #map, #map!, #recode, #recode!, #collect, #collect_matrix, #all?, #any?, #keep_vector_if and #keep_row_if. To iterate over elements along with their respective indexes (or labels), you can likewise use #each_row_with_index, #each_vector_with_index, #map_rows_with_index, #map_vector_with_index, #collect_rows_with_index, #collect_vector_with_index or #each_index. I urge you to go over the docs of each of these methods to utilize the full power of daru.
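
A short example of the difference (a sketch based on the description above; the exact daru API may vary slightly between versions):

require 'daru'

v = Daru::Vector.new([1, 2, 3])

v.map    { |e| e * 2 } # => [2, 4, 6] -- a plain Ruby Array
v.recode { |e| e * 2 } # => a Daru::Vector containing 2, 4, 6

# DataFrame iterators take the axis as an argument:
df = Daru::DataFrame.new(a: [1, 2], b: [3, 4])
df.map(:row) { |row| row[:a] + row[:b] } # iterate over rows instead of vectors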

Apart from this there was also quite a bit of refactoring involved for many methods (courtesy Alexej). This has made daru much faster than previous versions.

The next (major) thing to do was making daru compatible with statsample. This was very essential since statsample is a very important tool for statistics in Ruby, and it was using its own Vector and Dataset classes, which weren’t very robust as computation tools and were very difficult to use when it came to cleaning or munging data. So I replaced statsample’s Vector and Dataset classes with Daru::Vector and Daru::DataFrame. It involved a significant amount of work on both statsample and daru: statsample because many constructs had to be changed to make them compatible with daru, and daru because there was a lot of essential functionality in these classes that had to be ported to daru.

Porting code from statsample to daru improved daru significantly. There were a whole lot of statistics methods in statsample that were imported into daru and you can now use all of them from daru. Statsample also works well with rubyvis, a great tool for visualization. You can now do that with daru as well.

Many new methods for reading and writing data to and from files were also added to daru. You can now read and write data to and from CSV, Excel, plain text files or even SQL databases.

In effect, daru is now completely compatible with statsample (and all the other statsample extensions). You can use daru data structures for storing data and pass them to statsample for performing computations. The biggest advantage of this approach is that the analysed data can be passed around to other scientific ruby libraries (some of which listed above) that use daru as well. Since daru offers in-built functions to better ‘see’ your data, better visualization is possible.

See these blogs and notebooks for a complete overview of daru’s new features.

Also see the notebooks in the statsample README for using daru with statsample.

Post-mid term submissions

Most of the time after the mid-term submissions was spent in implementing the time series functions for daru.

I implemented a new index, the DateTimeIndex, which can be used for indexing data on time stamps. It enables users to query data based on time stamps. Time stamps can either be specified with precise Ruby DateTime objects or as strings, in which case all the data falling under that time is retrieved. For example, specifying ‘2012’ returns all data that falls in the year 2012. See detailed usage of DateTimeIndex in conjunction with other daru constructs in the daru README.
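
A small usage sketch based on the description above (see the daru README for the exact API; the option names here are from memory):

require 'daru'

index  = Daru::DateTimeIndex.date_range(start: '2012-01-01', periods: 365, freq: 'D')
vector = Daru::Vector.new((1..365).to_a, index: index)

vector['2012']    # all data falling in the year 2012
vector['2012-03'] # all data falling in March 2012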

An essential utility in implementing DateTimeIndex was DateOffset, which is a new set of classes that offsets dates based on certain rules or business logic. It can advance or lag a ruby DateTime to the nearest day or any day of the week or the end or beginning of the month etc. DateOffset is an essential part of DateTimeIndex and can also be used as a standalone utility for advancing/lagging DateTime objects. This blog post elaborates more on the nuances of DateOffset and its usage.

The last thing done during the post-mid-term period was complete compatibility with statsample-timeseries, which was created by Ankur Goel during GSOC 2013. It offers many useful functions for analysis of time series data. It now works with daru containers. See some use cases here.

That’s all, as far as I can remember.

Elaboration on Certain Internals of Daru

In this blog post I will elaborate on how a few of the features in daru were implemented. Notably, I will stress what spurred the need for that particular design of the code.

This post is primarily intended to serve as documentation for me and future contributors. If readers have any inputs on improving this post, I’d be happy to accept new contributions :)

Index factory architecture

Daru currently supports three types of indexes, Index, MultiIndex and DateTimeIndex.

It became very tedious to write if statements in the Vector or DataFrame codebase whenever a new data structure was to be created, since there were 3 possible indexes that could be attached with every data set. This mainly depended on what kind of data was present in the index, i.e. tuples would create a MultiIndex, DateTime objects or date-like strings would create a DateTimeIndex, and everything else would create a Daru::Index.

This looked something like the perfect use case for the factory pattern, the only hurdle being that the factory pattern in the pure sense of the term would be a superclass, something called Daru::IndexFactory that created an Index, DateTimeIndex or MultiIndex index using some methods and logic. The problem is that I did not want to call a separate class for creating Indexes. This would break existing code and possibly cause problems in libraries that were already using daru (viz. statsample), not to mention confusing users about which class they’re actually supposed to be using.

The solution came after I read this blog post, which demonstrates that the .new method for any class can be overridden. Thus, instead of calling initialize directly for creating the instance of a class, Ruby calls the overridden new, which can then call initialize for instantiating an instance of that class. It so happens that you can make new return any object you want, unlike the default new, which always returns an instance of the class it is defined in. Thus, for the factory pattern implementation of Daru::Index, we over-ride the .new method of Daru::Index and write logic such that it manufactures the appropriate kind of index based on the data that is passed to Daru::Index.new(data). The pseudo code for doing this looks something like this:

class Daru::Index
  # some stuff...

  def self.new *args, &block
    source = args[0]

    if source_looks_like_a_multi_index
      create_multi_index_and_return
    elsif source_looks_like_date_time_index
      create_date_time_index_and_return
    else # Create the Daru::Index by calling initialize
      i = self.allocate
      i.send :initialize, *args, &block
      i
    end
  end

  # more stuff...
end

Also, since over-riding .new tampers with the subclasses of the class as well, an inherited hook that replaces the over-ridden .new of the inheriting class with the original one was added to Daru::Index.
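
A sketch of what such a hook could look like (my reconstruction of the idea, not the exact code in daru):

class Daru::Index
  # When a class inherits from Daru::Index, give it back the default
  # allocate-then-initialize behaviour, so the factory logic above does
  # not hijack instantiation of the subclass.
  def self.inherited subclass
    class << subclass
      def new *args, &block
        instance = allocate
        instance.send :initialize, *args, &block
        instance
      end
    end
  end
end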

Working of the where clause

The where clause in daru lets users query data with an Array of booleans. Whenever you call where on a Daru::Vector or DataFrame and pass in an Array containing true or false values, all the rows corresponding to true will be returned as a Vector or DataFrame respectively.

Since the where clause works in conjunction with the comparator methods of Daru::Vector (which return a boolean Array), it was essential for these boolean arrays to be combinable such that piecewise AND and OR operations could be performed between multiple boolean arrays. Hence, the Daru::Core::Query::BoolArray class was created, which is specialized for handling boolean arrays and performing piecewise boolean operations.

The BoolArray defines the #& method for piecewise AND operations and it defines the #| method for piecewise OR operations. They work as follows:

require 'daru'

a = Daru::Core::Query::BoolArray.new([true,false,false,true,false,true])
#=> (Daru::Core::Query::BoolArray:84314110 bool_arry=[true, false, false, true, false, true])
b = Daru::Core::Query::BoolArray.new([false,true,false,true,false,true])
#=> (Daru::Core::Query::BoolArray:84143650 bool_arry=[false, true, false, true, false, true])
a & b
#=> (Daru::Core::Query::BoolArray:83917880 bool_arry=[false, false, false, true, false, true])
a | b
#=> (Daru::Core::Query::BoolArray:83871560 bool_arry=[true, true, false, true, false, true])