Python

In Python, pip (short for “Pip Installs Packages”) is a package management system that allows users to easily install and manage libraries and dependencies for Python projects.

With pip, you can install packages from the Python Package Index (PyPI) or from local package files. PyPI is the official public repository of Python packages; it hosts hundreds of thousands of open-source packages for purposes such as data analysis, machine learning, and web development.

A wheel in Python is a package format for distributing Python libraries. It is a built distribution format, which means that it contains pre-built and pre-compiled versions of the library, making installation faster and more efficient.

A wheel file has the extension .whl and contains the library code along with metadata such as the version and dependencies. When you install a package, pip looks for a wheel that is compatible with your system (Python version and platform) and installs it directly instead of building the package from source.
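The compatibility information pip relies on is encoded in the wheel's filename, which follows the pattern name-version-pythontag-abitag-platformtag.whl (with an optional build tag after the version). The following is a minimal sketch of splitting such a filename; the function name parse_wheel_filename and the example filename are made up for illustration, and this is not how pip itself is implemented.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its naming-convention components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


# Hypothetical pure-Python wheel: works on any Python 3, any ABI, any platform.
info = parse_wheel_filename("example_pkg-1.0.0-py3-none-any.whl")
print(info)
```

A platform tag of "any" indicates a pure-Python wheel; wheels with compiled code carry a specific platform tag instead.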

This is particularly useful for large libraries or libraries with compiled extensions, as building them from source can take a long time and require additional build tools, such as a C compiler, to be installed.

Wheel files are also useful when you want to distribute a package to other users, because they make the installation process faster and easier.

Here is a collection of useful pip commands for Python:

$: pip3 install --upgrade pip

$: pip3 cache purge

$: pip3 install --upgrade numpy

$: pip3 install scikit-learn

$: pip3 uninstall scipy

$: pip3 install --upgrade scipy

$: pip3 install --upgrade scikit-learn

$: pip3 install pandas

$: pip3 install nltk
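
After running commands like the ones above, a short Python check can confirm which versions ended up installed. This is a minimal sketch using the standard-library importlib.metadata module (Python 3.8 or later); the helper name installed_version is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report on the packages mentioned in the commands above.
for name in ("pip", "numpy", "scipy", "scikit-learn", "pandas", "nltk"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```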

Python Pandas

Pandas is a Python library that provides data structures and data analysis tools. The two main data structures in pandas are the Series and DataFrame. A Series is a one-dimensional array-like object that can hold any data type, while a DataFrame is a two-dimensional table of data with rows and columns. Pandas provides a variety of functions and methods for manipulating and analyzing data, including reading and writing data to/from various file formats (such as CSV, Excel, and JSON), filtering, aggregation, and more. It is a very powerful and widely used library for data manipulation and analysis.
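
As a rough illustration of the two data structures and of filtering and aggregation, here is a small sketch. The data is made up, and the variable names are arbitrary.

```python
import pandas as pd

# A Series: one-dimensional, labelled values.
temps = pd.Series([21.5, 23.0, 19.8], index=["mon", "tue", "wed"])

# A DataFrame: a table with named columns (made-up sample data).
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "sales": [10, 15, 7, 12],
})

# Filtering: keep only the rows where sales exceed 8.
big = df[df["sales"] > 8]

# Aggregation: total sales per city.
totals = df.groupby("city")["sales"].sum()

print(temps.mean())
print(big)
print(totals)
```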

Scikit-learn

Scikit-learn, also known as sklearn, is a Python library for machine learning. It provides a wide range of tools for tasks such as classification, regression, clustering, and dimensionality reduction. It is built on top of other popular Python libraries such as NumPy and SciPy, and is designed to be easy to use and consistent across different algorithms.

The library includes a wide range of supervised and unsupervised learning algorithms, including popular ones such as linear regression, k-means, decision trees, and random forests. It also includes tools for model evaluation and selection, such as cross-validation and metrics for classification and regression.
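
To give a feel for the consistent interface, here is a minimal sketch that evaluates a decision tree with 5-fold cross-validation. The choice of the bundled iris dataset and of a decision tree is purely illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Small example dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Every estimator exposes the same fit/predict interface, so the model
# can be swapped for another classifier without changing the rest.
model = DecisionTreeClassifier(random_state=0)

# 5-fold cross-validation returns one accuracy score per fold.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```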

Scikit-learn is a widely used library in the data science and machine learning community and is considered to be one of the most comprehensive libraries for machine learning in Python.

Tf-Idf Vectorizer

In scikit-learn, a Tf-Idf Vectorizer is a class that can be used to convert a collection of raw documents (i.e., a list of strings) into a numerical representation, called a Tf-Idf matrix. This matrix can then be used as input to a machine learning model.

Tf-Idf stands for “term frequency-inverse document frequency”. It is a numerical statistic that is intended to reflect how important a word is to a document in a collection of documents.

The term frequency (tf) is the number of times a word appears in a document. The inverse document frequency (idf) is a measure of how rare a word is across all documents. The product of these two values is the Tf-Idf value for a given word in a given document.
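
The definition above can be worked through by hand. The sketch below uses the plain textbook form, tf multiplied by log(N / df), on three made-up documents. Note that scikit-learn's own defaults differ (it uses a smoothed idf and normalizes each row), so the numbers here will not match its output.

```python
import math
from collections import Counter

# Made-up sample documents.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: the number of documents that contain each word.
df = Counter(word for tokens in tokenized for word in set(tokens))


def tf_idf(word, tokens):
    """Plain tf * log(N / df); only valid for words present in the corpus."""
    tf = tokens.count(word)
    idf = math.log(n_docs / df[word])
    return tf * idf


# "the" occurs twice in the first document but is common across documents;
# "cat" occurs once but appears in only one document.
print(tf_idf("the", tokenized[0]))
print(tf_idf("cat", tokenized[0]))
```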

The Tf-Idf Vectorizer in scikit-learn converts a collection of raw documents into a Tf-Idf matrix by:

  1. Tokenizing the documents (i.e., splitting them into individual words)
  2. Building a vocabulary of all the words in the documents
  3. Counting the number of occurrences of each word in each document
  4. Computing the Tf-Idf values for each word in each document
  5. Representing each document as a vector of Tf-Idf values

The resulting matrix has one row for each document and one column for each word in the vocabulary. The value at the intersection of a row and a column is the Tf-Idf value for the corresponding word in the corresponding document.

The Tf-Idf Vectorizer can also be used in text classification, clustering, and information retrieval tasks, as it provides a way to convert text into numerical features that can be used as input to machine learning algorithms.

Category : Knowledge Base


Crontab Examples

A crontab (short for "cron table") is a configuration file that specifies shell commands to run periodically on a given schedule. Each scheduled command in a crontab file is known as a "cron job". Cron jobs are executed by the cron daemon, a standard Linux service that runs commands on your system at scheduled times.

Each line in a crontab file represents a separate cron job and follows a specific format, consisting of six fields separated by spaces: five schedule fields followed by the command to run:

* * * * * command

The fields represent the following:

  1. Minute (0-59)
  2. Hour (0-23)
  3. Day of the month (1-31)
  4. Month (1-12)
  5. Day of the week (0-6, with 0 being Sunday)
  6. The command to be run

The asterisks in the first five fields indicate that the command should be run every minute, every hour, every day of the month, every month, and every day of the week, respectively.

To edit your crontab file, you can use the crontab -e command, which will open the file in a text editor. When you are finished editing the file, save and exit the editor to activate the changes.

Crontab Examples

To run a command at 4:00am every day, the entry in the crontab file would be:

  
0 4 * * * command
  

To run a command every Monday at 6:00am, the entry would be:

  
0 6 * * 1 command
  

To run a command every 15 minutes using a crontab, you can use the following entry:

  
*/15 * * * * command
  

To run a command every 4 hours using a crontab, you can use the following entry:

  
0 */4 * * * command
  

To run a command at 3:15am every day using a crontab, you can use the following entry:

  
15 3 * * * command
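
To check how the schedule fields in the entries above are interpreted, here is a minimal Python sketch that tests whether a given time matches a schedule. It is a simplification: it only understands "*", "*/n", and single numbers, and it requires all five fields to match, whereas real cron also supports lists, ranges, and names, and treats the two day fields specially when both are restricted. The function names are made up for illustration.

```python
from datetime import datetime


def field_matches(field, value, start=0):
    """Match one schedule field; supports '*', '*/n', and a single number."""
    if field == "*":
        return True
    if field.startswith("*/"):
        # Steps are counted from the start of the field's range.
        return (value - start) % int(field[2:]) == 0
    return int(field) == value


def cron_matches(schedule, when):
    """Return True if `when` matches the five schedule fields."""
    minute, hour, dom, month, dow = schedule.split()
    # Python counts Monday as 0; cron counts Sunday as 0.
    cron_dow = (when.weekday() + 1) % 7
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(dom, when.day, start=1)
        and field_matches(month, when.month, start=1)
        and field_matches(dow, cron_dow)
    )


monday_6am = datetime(2024, 1, 1, 6, 0)  # 1 January 2024 was a Monday
print(cron_matches("0 6 * * 1", monday_6am))
print(cron_matches("*/15 * * * *", datetime(2024, 1, 1, 6, 45)))
print(cron_matches("0 */4 * * *", datetime(2024, 1, 1, 9, 0)))
```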
  



Linux Disk Usage

To display disk usage in Linux, you can use the df command. It lists all mounted filesystems on the system, along with the total size of each filesystem, the amount of space used, and the available space. The -h option stands for “human-readable”, and it causes the df command to pick a readable unit for each size (e.g. “1G” for 1 gigabyte). To force all sizes into megabytes with GNU df, use the -BM option instead of -h.

You can also use the --total option to display a grand total of all used and available space on all filesystems.

  
df -h --total
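
The same figures can be read from Python for a single filesystem. This is a minimal sketch using the standard-library shutil.disk_usage function; the path "/" is only an example, and the values are converted to mebibytes (1024 * 1024 bytes).

```python
import shutil

MIB = 1024 * 1024

# Query the filesystem that contains the given path.
usage = shutil.disk_usage("/")

print(f"total: {usage.total // MIB} MiB")
print(f"used:  {usage.used // MIB} MiB")
print(f"free:  {usage.free // MIB} MiB")
```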
  



Linux Memory Usage

To display the memory used by applications in Linux, you can use the ps command with the aux options. To sort the list by memory usage, use the --sort option and specify the rss (resident set size) field. The resident set size is the amount of physical memory (RAM), in kilobytes, that a process currently occupies; it does not include memory that has been swapped out.

Here is an example of the ps command with the --sort option:

  
ps aux --sort -rss
  

This will display a list of processes sorted in descending order by memory usage.
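
If you want to post-process this output in a script, the RSS value is the sixth column of the ps aux listing. The following is a minimal sketch that parses captured output and returns the largest processes. The sample text is made up, and the function name top_by_rss is hypothetical.

```python
# Made-up sample of `ps aux` output.
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1 167000 11000 ?        Ss   10:00   0:01 /sbin/init
mysql      812  1.2  5.0 1800000 410000 ?      Ssl  10:00   1:30 /usr/sbin/mariadbd
www-data  1033  0.3  0.9 250000 72000 ?        S    10:01   0:05 /usr/sbin/apache2 -k start
"""


def top_by_rss(ps_output, limit=5):
    """Return (rss_kb, pid, command) tuples, largest RSS first."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        # Split into at most 11 fields so COMMAND keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        rows.append((int(fields[5]), int(fields[1]), fields[10]))
    rows.sort(reverse=True)
    return rows[:limit]


for rss_kb, pid, command in top_by_rss(SAMPLE):
    print(f"{rss_kb:>8} kB  pid {pid:<6} {command}")
```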

You can also use the top command to display a real-time view of the memory usage of processes on the system. The top command displays a list of processes, along with their CPU and memory usage, and updates the list periodically.

To sort the list by memory usage, press Shift+M (an uppercase M).



MySQL

MariaDB is a fork of the MySQL database management system. It was created as a community-driven alternative to MySQL after concerns arose over Oracle Corporation's acquisition of MySQL.

MySQL Configuration Settings

On Debian and Ubuntu systems, the MySQL configuration settings are located at: /etc/mysql/my.cnf

If MariaDB is installed, the my.cnf file likely references: /etc/mysql/mariadb.conf.d/50-server.cnf

MySQL Memory Configuration

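To increase the amount of memory available to MySQL on Linux, you can set the innodb_buffer_pool_size option, which controls how much memory InnoDB uses to cache data and indexes. Set it under the [mysqld] section of the MySQL configuration file (e.g. /etc/my.cnf or /etc/mysql/mariadb.conf.d/50-server.cnf), written without leading dashes, and then restart the MySQL server to apply the change.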

innodb_buffer_pool_size=256M
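
To double-check what a size such as 256M means in bytes, here is a small Python sketch that reads the setting from configuration text and converts it. The sample text is made up. Real configuration files may contain directives such as !includedir that the standard-library configparser module cannot read, so treat this as an illustration rather than a general my.cnf parser.

```python
import configparser

# Made-up configuration text; real files usually contain more settings.
SAMPLE_CNF = """\
[mysqld]
innodb_buffer_pool_size=256M
"""

# MySQL size suffixes are multiples of 1024.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def size_to_bytes(text):
    """Convert a value such as '256M' to a number of bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


parser = configparser.ConfigParser(allow_no_value=True, strict=False)
parser.read_string(SAMPLE_CNF)
pool = parser.get("mysqld", "innodb_buffer_pool_size")

print(pool, "=", size_to_bytes(pool), "bytes")
```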

MySQL Tuner

MySQLTuner is a script written in Perl that allows you to review a MySQL installation quickly and make adjustments to increase performance and stability.

To install: sudo apt-get install mysqltuner

To run: sudo mysqltuner

To list all server variables from the MariaDB client:

MariaDB [(none)]> SHOW VARIABLES;

To list only the variables whose current global value differs from a non-zero default value:

MariaDB [(none)]> SELECT sv.variable_name, sv.default_value, gv.variable_value FROM information_schema.system_variables sv, information_schema.global_variables gv WHERE sv.variable_name = gv.variable_name AND sv.default_value <> gv.variable_value AND sv.default_value <> 0;



Linux Error Logs

To view the error log file on Linux and filter for errors related to a specific website, you can use the tail command in combination with the grep command.

You can also use the -i option with grep to perform a case-insensitive search, so that lines containing “www.example.com”, “WWW.EXAMPLE.COM”, and “WwW.ExAmPlE.CoM” would all be displayed.

Here is an example of the command with the -i option:

  
tail -f /path/to/error.log | grep -i "www.example.com"
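
The same case-insensitive filtering can be sketched in Python, for example when processing a log file that has already been read into memory. The log lines below are made up, and the function name filter_lines is hypothetical. Unlike grep, which treats "." in the pattern as a wildcard, this sketch does a plain substring match.

```python
def filter_lines(lines, needle):
    """Return the lines that contain `needle`, ignoring case."""
    needle = needle.lower()
    return [line for line in lines if needle in line.lower()]


# Made-up log lines for illustration.
log_lines = [
    "[error] client denied by server configuration: WWW.EXAMPLE.COM",
    "[error] File does not exist: /var/www/other-site/favicon.ico",
    "[warn] www.example.com: slow response",
]

for line in filter_lines(log_lines, "www.example.com"):
    print(line)
```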
  

On Linux systems, there are several common error logs that you might want to check when troubleshooting issues. Here are some examples:

  • /var/log/syslog: This is the general system log file, where various system messages are logged.

  • /var/log/auth.log: This log file contains messages related to authentication and authorization, such as login and logout messages.

  • /var/log/kern.log: This log file contains messages related to the Linux kernel, such as system startup and shutdown messages, as well as hardware and driver-related messages.

  • /var/log/cron.log: This log file contains messages related to the execution of cron jobs.

  • /var/log/messages: This log file contains miscellaneous messages that are not logged elsewhere.

  • /var/log/apache2/access_log: If you are running the Apache web server, this log file contains the web server’s access log, which records incoming HTTP requests.

  • /var/log/apache2/error_log: If you are running the Apache web server, this log file contains the web server’s error log, which records errors and problems encountered by the web server.

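Keep in mind that these log files are just examples, and the actual log files on your system may be named differently or may be located in a different directory. For example, Debian and Ubuntu name the Apache logs access.log and error.log, while Red Hat-based systems store them under /var/log/httpd/.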
