About Me

My photo
I am passionate .NET developer who believes in the power of collaboration and craftsmanship.

Blog Archive

Monday, January 9, 2017

Star Problem

Here is the problem. Let say you have a large CSV file that can not be opened completely in memory. Each line is a record that represents a star. The star record has a name and it's coordinates x, y and z. From this file you need to determine the 10 closest stars to a given star by the distance between the given star and the ones in the file using a function that will return the distance value as an double. How would you solve this problem?

Now that I have set up the question I am going to show you how I solved it.

To start off I am going to build the Star object. The star object has some important properties including X,Y,Z and Name that are defined in the question plus a property call DistanceToCompare. This property will hold the value of the distance between the star and the given star. To populate this value their is a method called CalculateDistanceToCompare which takes in the given star object as a parameter. This function follows the standard calculation for determining the distance between two points via their x,y,z coordinates.

In addition to mentioned properties and methods there is a factory method called Create that takes in a string value which will be a line value from CSV file and creates a new star. The Star object also implements IComparable which is important part to the solution it makes it's comparison based on the property value DistanceToCompare. The complete listing is below.

public class Star : IComparable<Star>
{
    public Star(int x, int y, int z, string name)
    {
        this.X = x;
        this.Y = y;
        this.Z = z;
        this.Name = name;
    }
    public int X { get; private set; }
    public int Y { get; private set; }
    public int Z { get; private set; }

    public double? DistanceToCompare { get; private set; }
    public string Name { get; private set; }
    public void CalculateDistanceToCompare(Star point2)
    {
        Func<int, double> squared = (value) =>
        Convert.ToDouble(value) * Convert.ToDouble(value);

        this.DistanceToCompare = Math.Sqrt(
            (squared(this.X - point2.X)) +
            (squared(this.Y - point2.Y)) +
            (squared(this.Z - point2.Z)));
    }

    public static Star Create(string value)
    {
        var parts = value.Split(new string[] { "," },
            StringSplitOptions.RemoveEmptyEntries);

        return new Star(
            int.Parse(parts[0]),
            int.Parse(parts[1]),
            int.Parse(parts[2]),
            parts[3]);
    }
    public override string ToString()
    {
        return string.Format("{0},{1},{2},{3}", X, Y, Z, Name);
    }

    public int CompareTo(Star other)
    {
        if (!this.DistanceToCompare.HasValue ||
                !other.DistanceToCompare.HasValue)
            throw new ArgumentException("No DistanceFromBaseStart to compare exception");

        if (this.DistanceToCompare.Value < other.DistanceToCompare.Value) return -1;
        if (this.DistanceToCompare.Value > other.DistanceToCompare.Value) return 1;
        return 0;
    }
}

Now we can create a given star and compare that star against any other star. But we still need a way to track the top 10 closest so what is the other part of the solution. What is needed is a data structure that can hold 10 items and each time a new item is added the maximum value moves to the top spot and if that structure has more then 10 items the maximum value can be removed and the new max value will move to the top spot.

The data structure that meets this criteria is a Max Heap AKA Priority Queue. If you do some searching in Google you can easily find implementations. Since I was recently reading "Problem Solving in Data Structures & Algorithms Using C#" by Hemant Jain I decided to use his implementation.

As you can see from the example code below it's quite easy to enumerate each line of the file and create a new star. Then compare that star to the given star and check the priority queue if the new star is less than the max value. If it is enqueue the new star and if the priority queue size is bigger then ten then execute the method dequeue which will remove the max value.

Once each line has been read you need to sort the values left in the queue if you want the shortest first and voila you got it.

    private static void StarQuestion()
    {
        System.Diagnostics.Stopwatch watch = new System.Diagnostics.Stopwatch();
        watch.Start();

        Star startToBeCompared = new Star(1, 2, 3, "My Little Star");
        PriorityQueue<Star> queue = new PriorityQueue<Star>(false);

        using (var reader = new System.IO.StreamReader(".\\dump.csv"))
        {
            while (!reader.EndOfStream)
            {
                var newStar = Star.Create(reader.ReadLine());
                newStar.CalculateDistanceToCompare(startToBeCompared);
                if (queue.Count < 10)
                {
                    queue.Enqueue(newStar);
                }
                else
                {
                    if (queue.First().DistanceToCompare > newStar.DistanceToCompare)
                    {
                        queue.Enqueue(newStar);
                    }
                }

                if (queue.Count > 10) queue.Dequeue();
            }
        }
        queue.
            Take(10).
            OrderBy(k => k.DistanceToCompare).
            Select(item => string.Format("{0} - {1}", item.Name, item.DistanceToCompare)).
            ToList().
            ForEach(item => Console.WriteLine(item));


        watch.Stop();
        Console.WriteLine(watch.Elapsed.ToString());
        Console.WriteLine("Complete");

    }

The priority queue code listed below is from the Hemant Jain's book I mentioned earlier in the post with some minor naming modifications to make it clearer to me. If you are interested in Algorithms in C# I highly recommend the book. I find it complements MIT's Open Course Ware 6-006 Introduction To Algorithms which is from a Python perspective.

I hope you enjoyed this post.

class PriorityQueue<T> : ICollection<T> where T: IComparable<T>
{
    private const int CAPACITY = 16;
    private int size; // Number of elements in PriorityQueue
    private T[] queueStorage; // The PriorityQueue array
    bool isMinHeap;

    public PriorityQueue(bool minHeap = true)
    {
        queueStorage = new T[CAPACITY];
        size = 0;
        isMinHeap = minHeap;
    }
    public PriorityQueue(T[] array,bool minHeap = true)
    {
        size = array.Length;
        queueStorage = new T[array.Length + 1];
        isMinHeap = minHeap;
        Array.Copy(array, 0, queueStorage, 1, array.Length);

        for (int i = (size / 2); i > 0;i--)
        {
            PercolateDown(i);
        }
    }
    private int Compare(int first, int second)
    {
        if (isMinHeap)
            return queueStorage[first].CompareTo(queueStorage[second]);
        else
            return queueStorage[second].CompareTo(queueStorage[first]);

    }
    private void PercolateDown(int position)
    {
        int leftChild = 2 * position;
        int rightChild = leftChild + 1;
        int small = -1;
        T temp;

        if (leftChild <= size)
        {
            small = leftChild;
        }
        if(rightChild <= size && Compare(rightChild,leftChild)< 0)
        {
            small = rightChild;
        }
        if(small != -1 && Compare(small,position) <0)
        {
            temp = queueStorage[position];
            queueStorage[position] = queueStorage[small];
            queueStorage[small] = temp;
            PercolateDown(small);
        }
    }
    private void PercolateUp(int currentPosition)
    {
        int parentPostion = currentPosition / 2;
        if(parentPostion == 0) return;

        if(Compare(parentPostion,currentPosition) > 0) //parent greater that child.
        {
            T temp = queueStorage[currentPosition];
            queueStorage[currentPosition] = queueStorage[parentPostion];
            queueStorage[parentPostion] = temp;
            PercolateUp(parentPostion);
        }
    }
    public virtual void Enqueue(T value)
    {
        if(size == queueStorage.Length -1)
        {
            DoubleStorageCapacity();
        }

        queueStorage[++size] = value;
        PercolateUp(size);
    }
    private void DoubleStorageCapacity()
    {
        T[] oldQueueStorage = queueStorage;
        queueStorage = new T[queueStorage.Length * 2];
        Array.Copy(oldQueueStorage, 1, queueStorage, 1, size);
    }
    public virtual T Dequeue()
    {
        if (IsEmpty())
        {
            throw new InvalidOperationException("HeapEmptyException");
        }

        T value = queueStorage[1];
        queueStorage[1] = queueStorage[size];
        size--;
        PercolateDown(1);
        return value;
    }
    public virtual bool IsEmpty()
    {
        return (size == 0);
    }
    public virtual void Print()
    {
        for (int index = 1; index <= size + 1; index++)
        {
            Console.WriteLine("value is ::" + queueStorage[index]);
        }
    }
    public int Count
    {
        get { return size; }
    }
    public static void Sort(T[] array)
    {
        PriorityQueue<T> hp = new PriorityQueue<T>(array);
        for(int i =0; i < array.Length; i++)
        {
            array[i] = hp.Dequeue();
        }
    }
    public void Add(T item)
    {
        Enqueue(item);
    }

    public bool IsReadOnly
    {
        get { return false; }
    }
    public void Clear()
    {
        size = 0;
    }
    public T GetItem(int index)
    {
        return queueStorage[index];
    }
    public bool Contains(T item)
    {
        for(int i = 1;i <= size; i++)
        {
            if (queueStorage[i].Equals(item)) return true;

        }
        return false;
    }
    public void CopyTo(T[] array,int arrayIndex)
    {
        Array.Copy(queueStorage, 0, array, arrayIndex, size);
    }
    public bool Remove(T item)
    {
        for(int i =1; i <= size; i++)
        {
            if (queueStorage[i].Equals(item))
            {
                queueStorage[i] = queueStorage[size];
                size--;
                PercolateDown(i);
                PercolateUp(i);
                return true;
            }
        }
        return false;
    }
    public IEnumerator<T> GetEnumerator()
    {
        return new PriorityQueueEnumerator<T>(this);
    }
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    internal class PriorityQueueEnumerator<T> : IEnumerator<T> where T : IComparable<T>
    {
        private PriorityQueue<T> priorityQueue;
        int index;
        T current;

        public PriorityQueueEnumerator(PriorityQueue<T> pq)
        {
            this.priorityQueue = pq;
            current = default(T);
            index = 1;
        }
        public T Current
        {
            get { return current; }
        }
        object IEnumerator.Current
        {
            get
            {
                return this.Current;
            }
        }
        public void Dispose()
        {
            priorityQueue = null;
            current = default(T);
            index = 1;
        }
        public bool MoveNext()
        {
            //Console.WriteLine("hit");
            if (index <= priorityQueue.Count)
            {
                current = priorityQueue.GetItem(index);
                index++;
                return true;
            }
            return false;
        }
        public void Reset()
        {
            current = default(T);
            index = 1;
        }

    }
  }