Codes, Packages, and Scripts

Naeem Khoshnevis
Collaborative Filter - Recommendation System Based on Breese et al. (1998)

Keyword(s): Collaborative filter, Recommendation system, Matlab


Summary:

This package implements the collaborative filtering algorithm of Breese et al. (1998) in Matlab. From the input data, I compute the unique vectors of users and movies. The movie-rating matrix (movie_rate_mat), whose rows represent users and whose columns represent movies, is populated from the input data. I define a logical matrix (rated_mat) in which 1 indicates that a given user rated the movie and 0 otherwise. The mean of all ratings is then calculated for each user. To compute the predicted vote (Equation 1 of the paper), I generate the user weight matrix (user_weight_mat) according to Equation 2. This is the most time-consuming part of the code. Therefore, the code first looks for a saved data structure for the input file in the same folder and uses it if it is found; otherwise, it generates the necessary variables, which takes time. The correlation between users is defined over the movies for which both users have recorded votes. In practice, I compute the weights over all movies and multiply them by the rated vector of each user, so that only the commonly rated items remain. After starting the program (and allowing some time for the data to load), the user can enter a user ID and the number of recommended movies in the following format:

> userId, number of recommended movies

If the database contains fewer movies than requested, the program returns the available items. Enter 0,0 to terminate.
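
The prediction step described in the summary can be sketched as follows. This is a minimal illustration written in Python, not the package's Matlab code: the function names (user_means, weight, predict, recommend) are hypothetical, unrated items are marked with None instead of a separate logical matrix, and the normalizing factor is assumed to be the sum of absolute weights. Every user is assumed to have rated at least one movie.

```python
from math import sqrt

def user_means(ratings):
    """Mean of the votes each user actually cast (None marks 'not rated')."""
    means = []
    for row in ratings:
        votes = [v for v in row if v is not None]
        means.append(sum(votes) / len(votes))  # assumes every user rated something
    return means

def weight(ratings, means, a, i):
    """Correlation between users a and i over the items both have rated."""
    common = [j for j in range(len(ratings[a]))
              if ratings[a][j] is not None and ratings[i][j] is not None]
    num = sum((ratings[a][j] - means[a]) * (ratings[i][j] - means[i]) for j in common)
    den = sqrt(sum((ratings[a][j] - means[a]) ** 2 for j in common)
               * sum((ratings[i][j] - means[i]) ** 2 for j in common))
    return num / den if den > 0 else 0.0

def predict(ratings, means, a, j):
    """Predicted vote of user a on item j: a's mean plus the weighted,
    normalized deviations of the other users who rated item j."""
    total, norm = 0.0, 0.0
    for i in range(len(ratings)):
        if i == a or ratings[i][j] is None:
            continue
        w = weight(ratings, means, a, i)
        total += w * (ratings[i][j] - means[i])
        norm += abs(w)
    return means[a] + (total / norm if norm > 0 else 0.0)

def recommend(ratings, a, n):
    """Indices of up to n items user a has not rated, best prediction first."""
    means = user_means(ratings)
    unrated = [j for j, v in enumerate(ratings[a]) if v is None]
    unrated.sort(key=lambda j: predict(ratings, means, a, j), reverse=True)
    return unrated[:n]  # fewer than n if the user has few unrated items
```

Calling recommend(ratings, a, n) with a list of per-user rating rows returns at most n item indices, which mirrors the behavior of returning only the available items when fewer than n exist.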

Go to the GitHub page of the package.

Reference: Breese, John S., David Heckerman, and Carl Kadie. "Empirical analysis of predictive algorithms for collaborative filtering." Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., 1998.



1D Finite Element Site Response Analysis (Linear + Equivalent Linear)

Keyword(s): Site Response, Finite Element, Linear, Equivalent Linear


Summary:

A 1D site response analysis code, written in Matlab, for linear and equivalent linear site response analysis. I developed the code to have an open-source platform for testing different damping models and for comparing a simplified 3D equivalent linear method (including DRM elements) with 1D solutions. The code solves the elastic wave equation by approximating the spatial variability of the displacements with finite elements (1D linear elements) and the time evolution with central differences. The Newmark implicit solution in the time domain is also implemented. Here are some features of the program:

  • FE (1D linear element) + Central differences (time)
  • No limit on the number of soil layers or their properties
  • Equivalent linear method with user-defined damping and shear modulus degradation curves
  • Rayleigh, Extended Rayleigh, and BKT damping models
  • Different plotting options for comparison purposes
  • Wave traveling animation generator
  • Wave traveling snapshot plots
  • Automatic name and serial number generator for saving the results
  • Verified against Deepsoil and Seismosoil

Go to the GitHub page of the package.
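
The combination of 1D linear elements in space and central differences in time can be illustrated with a short sketch. It is written in Python rather than Matlab and is not taken from the package: the function names are hypothetical, and it assumes a lumped (diagonal) mass matrix, no damping, a free surface at the top node, and a prescribed displacement at a rigid base.

```python
def assemble(h, G, rho):
    """Lumped nodal masses and element stiffnesses for a column of 2-node
    linear elements. h, G, rho are per-element lists, surface to base."""
    k = [G[e] / h[e] for e in range(len(h))]   # shear stiffness per unit area
    m = [0.0] * (len(h) + 1)
    for e in range(len(h)):
        m[e] += 0.5 * rho[e] * h[e]            # half of each element's mass
        m[e + 1] += 0.5 * rho[e] * h[e]        # goes to each of its two nodes
    return m, k

def internal_force(k, u):
    """K*u, accumulated element by element (K is tridiagonal)."""
    f = [0.0] * len(u)
    for e, ke in enumerate(k):
        s = ke * (u[e] - u[e + 1])
        f[e] += s
        f[e + 1] -= s
    return f

def run(h, G, rho, base_disp, dt):
    """Surface displacement history for a prescribed base displacement.
    Starts from rest; base_disp[n] is the base displacement at step n."""
    m, k = assemble(h, G, rho)
    n_nodes = len(m)
    u_prev = [0.0] * n_nodes
    u = [0.0] * n_nodes
    u[-1] = base_disp[0]
    surface = [u[0]]
    for step in range(1, len(base_disp)):
        f = internal_force(k, u)
        # central difference: u_next = 2u - u_prev - dt^2 * M^-1 * K * u
        u_next = [2.0 * u[i] - u_prev[i] - dt * dt * f[i] / m[i]
                  for i in range(n_nodes)]
        u_next[-1] = base_disp[step]           # impose the base motion
        u_prev, u = u, u_next
        surface.append(u[0])
    return surface
```

The explicit scheme is only conditionally stable, so dt should not exceed the travel time of a shear wave across the thinnest element (h divided by the shear wave velocity). For a uniform column run exactly at that limit, the surface motion reproduces the base motion doubled and delayed by the one-way travel time until the first reflection returns from the base.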



    Prediction Application for Easier Typing

    Keyword(s): NLP, R, Quanteda, ShinyApp, Katz back-off method


    Summary:

    As part of the Data Science Specialization capstone project (by Johns Hopkins University and Coursera), I developed a prediction application for easier typing. The application receives a word or a sequence of words as input and predicts the most probable next word. To generate the n-grams, I used a corpus of formal and informal contemporary American English (including news, blogs, and Twitter). The probability of unseen words is assigned using the Katz back-off method. The application is hosted on the Shiny server at the following link:
    https://naeem.shinyapps.io/shinyapp-NLP/
    Please take a look at the application and let me know your thoughts. If you are interested in the details, please refer to the application repository on my GitHub account:
    https://github.com/Naeemkh/DataScienceCapstone
    I used the quanteda package in R to process the corpus. If you are interested in R programming, natural language processing (NLP), regular expressions, or developing a Shiny application, I encourage you to take a look at the source code. To run the application on your computer, clone the repository and source the myapp.r file.
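
The back-off idea behind the predictor can be sketched on a bigram model. The example below is in Python, not the application's R code, and it is simplified: a fixed discount d stands in for the Good-Turing discounting of the original Katz method, only bigrams and unigrams are used, and all function names are hypothetical.

```python
from collections import Counter

def train(tokens):
    """Unigram and bigram counts from a list of tokens."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))

def backoff_prob(word, prev, unigrams, bigrams, d=0.5):
    """P(word | prev): discounted bigram estimate if the bigram was seen,
    otherwise the freed probability mass spread by unigram frequency."""
    seen = {w2: c for (w1, w2), c in bigrams.items() if w1 == prev}
    if not seen:                       # prev never seen as context: use unigrams
        return unigrams[word] / sum(unigrams.values())
    prev_count = sum(seen.values())
    if word in seen:
        return (seen[word] - d) / prev_count
    alpha = d * len(seen) / prev_count  # mass removed by discounting
    unseen_mass = sum(c for w, c in unigrams.items() if w not in seen)
    return alpha * unigrams[word] / unseen_mass if unseen_mass > 0 else 0.0

def predict_next(prev, unigrams, bigrams, d=0.5):
    """Most probable next word after prev."""
    return max(unigrams, key=lambda w: backoff_prob(w, prev, unigrams, bigrams, d))

# Toy corpus for illustration only
tokens = "the cat sat on the mat the cat ran".split()
unigrams, bigrams = train(tokens)
print(predict_next("the", unigrams, bigrams))  # prints "cat"
```

With this scheme the probabilities over the vocabulary still sum to one for any seen context, because the mass taken from seen bigrams equals the mass handed to unseen ones.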



    Matlab Program in Unix Platform

    Keyword(s): Matlab, Unix, Shell script, Data Processing

    Summary:

    Data processing is an important step in many scientific studies. The type and size of the data, as well as the type of processing, can determine the processing method. In seismological studies, we mostly deal with numerical data that require customized processing. In many cases, users write a function in a programming language (e.g., MATLAB, Python, C, Fortran) to process the data. Depending on the type of data and processing, a project may need several processing steps. We may also have to change a parameter and repeat the processing several times. Writing down the processing steps is good practice: it avoids confusion, keeps track of the processing, and makes it possible to go back and look for possible bugs when the results are wrong. However, since this is not automated, you may unintentionally skip a processing step or forget to document it properly. Even with complete documentation, once you find a problem in the processing steps, you need to repeat the whole process and go through all the steps again.


    Go to the GitHub page of the package.

    Probabilistic Seismic Hazard Analysis (PSHA)

    Keyword(s): PSHA, GMPE, Hazard Curve, MATLAB

    Summary:

    A Matlab code for conducting Probabilistic Seismic Hazard Analysis. The seismic source can be a point, line, or area source, or any number and combination of these. The Tavakoli and Pezeshk (2005) GMPE is implemented; however, users can add their own GMPE. The final results are a plot of the study region, the source-to-site distance distribution, and the hazard curve.
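
How a hazard curve is assembled can be shown for the simplest case of a single point source at a fixed distance. The sketch below is in Python, not the package's Matlab code. Its GMPE (toy_gmpe) is a hypothetical placeholder with made-up coefficients, not the Tavakoli and Pezeshk (2005) model, and the magnitude distribution is assumed to be a truncated Gutenberg-Richter law. Line and area sources would add a sum over the source-to-site distance distribution.

```python
from math import erfc, exp, log, sqrt

def magnitude_pmf(m_min, m_max, b, n_bins=50):
    """Discretized truncated Gutenberg-Richter magnitude distribution,
    returned as (bin-center magnitude, probability) pairs."""
    beta = b * log(10.0)
    dm = (m_max - m_min) / n_bins

    def cdf(m):
        return (1 - exp(-beta * (m - m_min))) / (1 - exp(-beta * (m_max - m_min)))

    return [(m_min + (k + 0.5) * dm, cdf(m_min + (k + 1) * dm) - cdf(m_min + k * dm))
            for k in range(n_bins)]

def toy_gmpe(m, r_km):
    """Hypothetical GMPE with invented coefficients: returns the mean and
    standard deviation of ln(PGA in g). Replace with a real model."""
    return -3.5 + 0.9 * m - 1.2 * log(r_km + 10.0), 0.6

def prob_exceed(a, m, r_km, gmpe=toy_gmpe):
    """P(PGA > a | m, r), assuming PGA is lognormally distributed."""
    mu, sigma = gmpe(m, r_km)
    z = (log(a) - mu) / sigma
    return 0.5 * erfc(z / sqrt(2.0))

def hazard_curve(levels, rate, r_km, m_min=5.0, m_max=7.5, b=1.0, gmpe=toy_gmpe):
    """Annual rate of exceedance for each PGA level (in g). `rate` is the
    annual rate of events with magnitude at or above m_min on the source."""
    pmf = magnitude_pmf(m_min, m_max, b)
    return [rate * sum(p * prob_exceed(a, m, r_km, gmpe) for m, p in pmf)
            for a in levels]
```

A call such as hazard_curve([0.01, 0.1, 0.5], rate=0.05, r_km=20.0) returns exceedance rates that decrease as the acceleration level increases. A user-supplied GMPE can be passed through the gmpe argument as long as it returns the same (mean, sigma) pair.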


    Go to the GitHub page of the package.