100,000 Online Courses

Explore a variety of fresh topics

Expert Instruction

Find the right instructor for you

Lifetime Access

Learn on your schedule

Enroll, Learn, Grow, Repeat! Get ready to achieve your learning goals with Gyansetu


Gyansetu Advantages

Logic building that brings real transformation

Problem solving is an essential skill for any programmer. A good coder combines strong analytical thinking with logical and mathematical skills.

Instructor-led Classroom training experience

Take live, structured classroom and online classes from wherever is convenient, with instant, one-on-one help.

Faculty with experience at top companies

We deliver training by experts from top companies like Microsoft, Amazon, American Express, McKinsey, Barclays and more.

Career Support

We connect our students to software companies through our placement assistance program.

Master Course


Job Profiles & Responsibilities of Data Scientist at Microsoft, Google, Amazon

Data scientists are in huge demand for the skills they bring to the table. Businesses, medium or large, expect to grow with the insights data scientists provide; analyzing the business from every angle becomes possible with their help. So what is a data scientist's job about?

Data Scientist Responsibilities
They work hand in hand with stakeholders to understand their objectives and how data can help achieve them. The job requirements include the following:
1. Gathering data and asking the proper questions for further data cleaning and processing.
2. Conducting data investigation and exploratory analysis after storing and integrating data.
3. Implementing techniques including machine learning, artificial intelligence and statistical modeling, after selecting suitable algorithms and models.
4. Showcasing the final outcome after measuring and improving the results, making adjustments where required and repeating the procedure.

Different Job Profiles in Data Science
These are some common career paths that include the data scientist role:
1. Data Architect: responsible for creating, designing and managing an organization's data architecture.
2. Data Analyst: works through massive data sets to recognize trends and draw conclusions that inform business decisions.
3. Business Intelligence Expert: gathers patterns from the data.
4. Data Scientist: performs data modeling to create predictive models and algorithms, and also handles customized analysis.
5. Data Engineer: organizes, cleans and aggregates data from diverse sources, then moves it into a data warehouse.

Data Scientist at Microsoft
Microsoft has a sub-department named Applied Sciences and Data, classified under engineering. Teams are divided based on the major titles, which include machine learning engineer, data scientist and applied scientist.
These are some general functions:
1. Writing code to be used by data scientists for machine learning algorithms, and for moving models into production.
2. Dealing with experimentation, product features, metrics, customers (direct or indirect) and technical issues.
Data scientist jobs at Microsoft are team-based; for instance, one team is dedicated to machine learning while another deals with analytics.

Data Scientist Skills Required at Microsoft
General requirements are a Bachelor's or Master's degree in a quantitative field. For a mid-level role, two years of experience is preferred.
1. Prior experience in reinforcement learning, causal inference, DNNs, time series, network analysis, NLP or related fields.
2. Substantial experience with cloud-based architectures such as Azure or AWS.
3. Proficiency in R, Python, SQL, NumPy, SciPy, Spark, C# or a similar numerical programming language.

Data Scientist at Amazon
Like any other global firm, Amazon has departments set up for everything. Data scientists join a specific team, but regardless of the team they share some common ground: a background in statistics, programming, analytics, mathematics and computer science, plus scripting languages like Java and Python. They also possess a thorough understanding of artificial intelligence and machine learning algorithms. Some available specializations include:
1. Amazon Web Services: the data scientist assists AWS customers by creating ML models that address their business requirements.
2. Alexa: here the data scientist is expected to be proficient in natural language processing and information retrieval, needed for training the AI to comprehend commands in several languages.
3. Demand forecasting: the data scientist develops algorithms that learn from large data sets (product attributes, prices, similar products, promotions) to predict the demand for millions of products on Amazon.
They are expected to collaborate with information architects, marketers, data engineers, designers and software developers.

Levels of Data Scientist at Amazon
1. Entry-level: often held by those who are still studying or are interning. They need proficiency in one language (PHP, Java or Python) plus working knowledge of SQL, and should be adept at tackling analytical problems with a quantitative approach.
2. Senior-level: apart from management duties, this level requires a degree in statistics, engineering, computer science, economics or mathematics. For a specialized role, expertise in computer vision and natural language processing may be expected, along with work experience in analytics.

Data Scientist at Google
The data scientist is either product-oriented or analysis-oriented.
Product Analyst
They usually have domain and other specialized knowledge. They work on:
1. Consumers' differing choices and sentiments
2. Product popularity and reasons for failure
3. Target market statistics
4. Datasets (external and internal)
Quantitative Analyst
They generally have degrees in mathematics, quantitative studies or statistics. They work on:
1. Product research
2. Forecasting customer lifetime value
3. A/B experiments
4. Modifying search algorithms
5. Estimating future internet reach in different countries
6. Statistical modeling

Data Scientist Responsibilities at Google
1. Inspecting and improving the products; collaborating with engineers and analysts; working on big data sets
2. Handling requirements specifications, ongoing deliverables, data gathering, processing, presentations and analysis
3. Implementing optimization, forecasting and R&D analysis; making cost-benefit recommendations; communicating across functions
4. Presenting findings from experimental analysis and displaying quantitative information to stakeholders
5. Understanding metrics and data structures, recommending necessary product development changes
6. Prototyping analysis pipelines and building them iteratively for insights

Data Scientist Salaries
These are the mean salaries for data scientists at different firms:
1. The average data scientist salary at Microsoft is around 25 lakhs + stock annually
2. The average data scientist salary at Amazon is around 23 lakhs + stock annually
3. The average data scientist salary at Google is around 24 lakhs + stock annually

Want a one-on-one session with an instructor to clear doubts and help you crack interviews? Contact us on +91-9999201478 or fill in the Enquiry Form. Data Science Instructor Profile | Check Machine Learning Course Content

Machine Learning Project Ideas for 2022

Machine learning is expected to take over large-scale production in almost every field as it is constantly evolving. There are hundreds of machine learning project suggestions which, when implemented, can save a ton of time via automation. A practical machine learning project can open the doors to newer horizons and improve productivity; with these project ideas, businesses get smoother and operations more optimal.

Online Fake Logo Detection
This idea is great as it helps in:
1. Assisting customers with product verification before making a purchase, thus preventing them from being swindled.
2. A user-friendly design that ordinary people can use.
3. Piracy and logo copycats can cause confusion among customers; this system gives firms control over forgeries.
However, incorrect input can yield wrong output.

Plagiarism Checker
Copied content is a common problem, which makes this project worthwhile. A detector can be built this way:
1. Load a corpus of plagiarized data. Explore the data distribution and existing features, then preprocess and clean the data.
2. Define and extract features for similarity comparison of answer and source text. Analyze correlation, select features and create .csv files.
3. Upload test features to S3 and define a training script plus a binary classification model. Train and deploy the model via SageMaker, then evaluate.

Uber Pickup Analysis
This project can help in identifying patterns such as which hour is the busiest, or when trips/pickups peak. It can be done as follows:
1. Import the dataset and libraries. Categorize into hours and days. The number of pickups should likely increase on weekdays.
2. The hourly data should show fewer pickups from midnight to 6 am, increasing from then on, making 6 in the evening a peak hour.
3. The data would show Saturday with the fewest pickups, Sunday with substantially more for leisure, and Monday with the most work-related pickups.

Stock Price Prediction
This project estimates the near-future value of a stock using ML algorithms and long short-term memory (LSTM) networks:
1. Import libraries and start by visualizing the stock data. Print the DataFrame shape and check for null values.
2. Select features and set target variables. Make test and training sets, process the data for LSTM, then build and train the model for predictions.
3. Compare the true adjusted values with the predicted ones.

Sentiment Analyzer
This idea is useful for businesses, as many users express their views about a product, service or company online. Analyzing sentiment reveals whether a user is satisfied with the product, showing what's hot and what's not in terms of demand. It can be done as follows:
1. Choose a classifier model and import the data.
2. Train the analysis classifier by tagging tweets if needed, then test the classifier.

Customer Segmentation
An evergreen project that maximizes clarity for businesses and also benefits customers. Customer segmentation has numerous types, ranging from demographic to psychographic, technographic, behavioral and geographic. With these steps, customer segmentation can be done:
1. Design and categorize a business case. Collect and prepare the data, then segment via k-means clustering.
2. Tune the model's hyperparameters and visualize the results.

Recommendation System
A recommendation system is time-saving and efficient for customers: correlated items and other varieties become easily accessible. A movie recommendation system can be built in this manner:
1. Collect the data needed for building the model. Build a reverse map of titles and indices.
2. Test the content-based recommendation system.
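The content-based recommendation idea above can be sketched in plain Python: represent each item description as a bag of words and rank the other items by cosine similarity. The toy catalogue below is invented purely for illustration; a real system would use TF-IDF or embeddings over thousands of titles.

```python
from collections import Counter
from math import sqrt

# Toy catalogue; titles and descriptions are made up for illustration.
movies = {
    "Space Quest": "astronaut crew explores a distant planet",
    "Star Voyage": "astronaut crew travels to a distant star",
    "Baking Time": "a chef bakes bread and cakes at home",
}

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(title):
    """Return the other titles, most similar description first."""
    vectors = {t: Counter(d.lower().split()) for t, d in movies.items()}
    target = vectors[title]
    scores = {t: cosine(target, v) for t, v in vectors.items() if t != title}
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("Space Quest"))  # most similar title comes first
```

Both space films share several description words, so they rank each other highest.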
Churn Prediction
The churn rate is the pace at which entities leave an organization over a period of time. Churn prediction helps identify customers' issues and pain points, and which customers are at the highest risk. The prediction requires a workflow, which can go as follows:
1. Define the issues and the objective; gather data sources such as CRM systems or customer feedback.
2. Prepare and explore the data, then preprocess it for modeling and tests. Deploy the model and monitor it as required.

YouTube Video Classification
Plenty of videos exist on YouTube, and without proper classification they would not be found in searches. Categorization helps index the videos by relevance. A classification system can be built by:
1. Collecting data and setting up, then defining the hyperparameters and preparing the data.
2. Using a pre-trained network to extract relevant features, then feeding the data into a sequence model.

Text Summarizer
This is also an evergreen project. Extracting the gist of a long article can be time-consuming, which creates the need for a text summarizer that delivers quick results. A summarizer can be created this way:
1. Start with data preparation, then process and clean the text. Tokenize the article into sentences.
2. Compute each word's weighted frequency, calculate a score threshold, and generate the summary.

Image Regeneration for Old and Damaged Reels
Repairing damaged images manually is a cumbersome task that requires skill. With deep learning, defects in images and reels can be corrected via inpainting algorithms. It can help in:
1. Colorizing black-and-white pictures in areas where pigment has eroded. Anomalies handled include tears, holes and scuffs.
2. Altering pixel values so that old photos can be transformed into a newer edit.
Techniques used for restoration:
1. SC-FEGAN: useful in face restoration, filling voids with the most probable pixels.
2. EdgeConnect: utilizes adversarial edge learning to paint over minor imperfections.
3. Pluralistic image completion: yields various outcomes when dealing with huge gaps.

Music Generator
Music is a creative human pursuit; however, it can also be generated using LSTM neural networks. This can be done as follows:
1. Collect data (royalty-free MIDI music); use the Python toolkit music21 and Keras for extracting data from the MIDI files.
2. Implement an LSTM and set the sequence length to 100. Use further 500-note sequences to extend the music duration; the recurrence in this neural network will generate the music.
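The weighted-frequency summarizer outlined earlier (tokenize into sentences, weight words by frequency, keep the highest-scoring sentences) can be sketched in a few lines of standard-library Python. This is a minimal sketch: a real summarizer would also strip stop words and use a proper tokenizer such as NLTK's.

```python
import re
from collections import Counter

def summarize(text, max_sentences=1):
    """Extractive summary via normalized weighted word frequencies."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    words = re.findall(r'[a-z]+', text.lower())
    freq = Counter(words)
    top = max(freq.values())
    weights = {w: c / top for w, c in freq.items()}  # weighted frequency in [0, 1]
    # score each sentence by the sum of its word weights
    scores = {s: sum(weights.get(w, 0) for w in re.findall(r'[a-z]+', s.lower()))
              for s in sentences}
    best = sorted(scores, key=scores.get, reverse=True)[:max_sentences]
    # emit the selected sentences in their original order
    return ' '.join(s for s in sentences if s in best)

text = "Cats sleep a lot. Cats chase mice and cats chase birds. Dogs bark."
print(summarize(text))  # the sentence richest in frequent words wins
```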

15 Machine Learning Interview Questions (with Answers) for Data Scientists

Data science is a progressive field that deals with handling large chunks of data that normal software cannot. Although machine learning is a vast field in itself, machine learning interview questions are a common occurrence in a data scientist's job interview. Basic data scientist interview questions cover its various aspects, including statistics and programming; here, we focus on the machine learning part of data science.

Machine Learning Interview Questions

1. Differentiate between supervised learning and unsupervised learning
These are some notable differences between the two:
Supervised Learning                          | Unsupervised Learning
Trained on a labeled dataset                 | Trained on an unlabeled dataset
Algorithms: regression and classification    | Algorithms: clustering, association, density estimation
Suited for predictions                       | Suited for analysis
Maps input to known output labels            | Finds hidden patterns and discovers the output

2. Define logistic regression with an example
Also known as the logit model, it is used for predicting a binary outcome from a linear combination of predictor variables. For instance, predicting a politician's victory or defeat in an election is binary; the predictor variables could be time spent campaigning and total money spent on the campaign.

3. How do classification and regression machine learning techniques differ?
These are the key differences:
Classification                               | Regression
Target variables take discrete values        | Target variables take continuous values, usually real numbers
Evaluated by measuring accuracy              | Evaluated by measuring root mean square error

4. What is meant by collaborative filtering?
The kind of filtering done by recommender systems to fetch information or patterns by integrating data sources, agents and viewpoints is called collaborative filtering, for example, predicting a user's rating of a movie based on their ratings of other movies.
This technique is commonly used on sites like BookMyShow, IMDb, Amazon, Snapdeal, Flipkart, Netflix, YouTube, etc.

5. What are the steps in an analytics project?
These are the steps taken in an analytics project:
1. Comprehending the business problem.
2. Preparing the data for modeling: transforming variables, detecting outliers, checking missing values.
3. Running the model and analyzing the outcome, tweaking the approach until a good outcome is achieved.
4. Validating the model on held-out data sets, then implementing the model and tracking its performance over a specific duration.

6. Explain in brief a few types of ensemble learning
There are several types of ensemble learning; below are two of the more common ones.
Boosting: an iterative technique that adjusts the weight of an observation based on the previous classification. If an observation is classified incorrectly, its weight is increased. Boosting reduces bias error and helps build strong predictive models, but it can overfit the training data.
Bagging: implements learners on one bootstrap sample at a time, then averages their outputs. In generalized bagging, different learners can be applied to different samples, which reduces variance error.

7. Describe the Box-Cox transformation
In a regression analysis, the dependent variable might not satisfy the assumptions of ordinary least squares regression; the residuals could be skewed or could curve as the prediction increases. In such scenarios, transforming the response variable becomes necessary for the data to meet the assumptions. The Box-Cox transformation is a statistical technique for transforming a non-normal dependent variable into a conventional shape, since many statistical techniques assume normality.
Numerous tests can be run once the Box-Cox transformation is applied. The transformation is named after its developers, statisticians George Box and Sir David Roxbee Cox, who collaborated on a 1964 paper developing the technique.

8. What is Random Forest, and how does it work?
It is a versatile machine learning method that can perform both classification and regression. It is also used for handling outlier values, dimensionality reduction and treating missing values. It is an ensemble learning method wherein clusters of weak models combine to build a powerful model. In a random forest, numerous decision trees are created instead of a single tree. To classify a new object by its attributes, every tree provides a classification, and the class with the maximum votes across all trees in the forest is selected; for regression, the average output of the trees is taken.
Working of Random Forest
The main principle is that many weak learners combine to make a strong learner. The steps include:
1. Randomly pick k records from the dataset.
2. Build a decision tree on these k records.
3. Repeat the above two steps for each decision tree you want to build.
4. Base predictions on the majority rule: in a regression problem the forest predicts a value, whereas in a classification problem it predicts the class.

9. If you had to train a model on a 10 GB data set with only 4 GB of RAM, how would you approach the problem?
To start, it is best to ask what type of ML model requires training.
For an SVM, partial fitting suits best. Follow these steps:
1. Start by dividing the large data set into smaller sets.
2. Call the SVM's partial_fit method on one subset of the data.
3. Repeat the second step for the remaining subsets.
For neural networks, a memory-mapped NumPy array plus a small batch size will do. Follow these measures:
1. Load the full data as a NumPy memory-mapped array; it maps the full data set without loading it all into memory.
2. To obtain the required data, pass an index into the NumPy array.
3. Pass this data to the neural network, maintaining a small batch size.

10. How are missing values treated in an analysis?
Once the variables with missing values are identified, the extent of the missing values is measured. If any patterns are picked up, the analyst must pay attention, as these could yield significant and valuable business insights. If no patterns are discovered, the missing values can be replaced with the mean or median, or simply ignored. A default value such as the maximum, minimum or mean can also be assigned; if the variable is categorical, a default category is assigned to the missing values, and for normally distributed data the mean is assigned. Also, if around 80% of a variable's values are missing, it is more reasonable to drop the variable than to treat the missing values.

11. How are outlier values treated?
Outlier values can be detected with graphical analysis or univariate methods. If there are many outliers, they can be replaced with the 1st or 99th percentile value; if there are only a few, they can be assessed individually. Note that not all outlier values are extreme values. To treat outliers, the values can either be modified to bring them within range or discarded.

12. Which cross-validation technique can be used on a time-series dataset?
Rather than reaching for the k-fold technique, one should note that a time series has an inherent chronological order and is not randomly distributed data.
For time-series data, one can implement the forward-chaining technique, where one trains on previous data and tests on the data that follows:
fold 1: training [1], test [2]
fold 2: training [1 2], test [3]
fold 3: training [1 2 3], test [4]
fold 4: training [1 2 3 4], test [5]

13. How often does an algorithm require updating?
An algorithm should be updated when:
1. The model must evolve as data flows through the infrastructure.
2. The underlying data source is not constant.
3. A case of non-stationarity shows up.
4. The algorithm underperforms and results lack precision and accuracy.

14. List some drawbacks of the linear model
These are a few drawbacks of the linear model:
1. It is not usable for binary or count outcomes.
2. Its assumption of the linearity of errors is often violated.
3. It has overfitting problems that cannot be solved.

15. Describe the SVM algorithm
SVM (Support Vector Machine) is a supervised machine learning algorithm used for classification and regression. If the training data set has n features, SVM plots each sample in an n-dimensional space, where every feature's value is the value of a specific coordinate. SVM then finds hyperplanes that segregate the distinct classes.
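The forward-chaining folds listed above can be generated mechanically. A minimal sketch follows; scikit-learn's TimeSeriesSplit implements the same expanding-window idea with more options.

```python
def forward_chaining_folds(n_periods):
    """Expanding-window splits for time-series cross-validation:
    train on periods 1..k, test on period k+1."""
    periods = list(range(1, n_periods + 1))
    return [(periods[:k], [periods[k]]) for k in range(1, n_periods)]

# Reproduce the folds from the answer above for 5 time periods.
for i, (train, test) in enumerate(forward_chaining_folds(5), start=1):
    print(f"fold {i}: training {train}, test {test}")
```

Because each test period lies strictly after its training window, no fold ever trains on the future, which is exactly what k-fold shuffling would violate.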

Top NLP (Natural Language Processing) Interview Questions with Answers

An introduction to natural language processing is a good start for students who wish to bridge the gap between what is human-like and what is mechanical. Natural language processing is widely utilized in artificial intelligence and is also implemented in machine learning. Its use is expected to grow in the coming years, along with the job opportunities it brings. Students preparing for natural language processing (NLP) interviews should have a decent understanding of the type of questions that get asked.

1. Discuss real-life apps based on Natural Language Processing (NLP).
Chatbot: businesses have realized the importance of chatbots, as they help maintain good communication with customers; any query a chatbot fails to resolve gets forwarded to a human. Available 24/7, they keep the business moving. This feature makes use of natural language processing.
Google Translate: spoken words or written text can be converted into another language, and proper pronunciation of words is also available; Google Translate uses advanced NLP to make all of this possible.

2. What is meant by NLTK?
The Natural Language Toolkit is a Python library for processing human language; techniques including tokenization, stemming, parsing and lemmatization are used for understanding text. It is also used for text classification and assessing documents. Some of its modules include DefaultTagger, wordnet, patterns, treebank, etc.

3. Explain part-of-speech tagging (POS tagging).
POS tagging, or part-of-speech tagging, is implemented to assign tags to words such as verbs, nouns or adjectives. It allows software to understand the text and recognize word differences using algorithms; the purpose is to make the machine comprehend sentences correctly.
Example:

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

stop_words = set(stopwords.words('english'))
txt = "A, B, C are longtime classmates."

## Tokenize into sentences
tokenized_text = sent_tokenize(txt)

## Use word_tokenize to identify the words and punctuation in each sentence, then remove stop words
for sentence in tokenized_text:
    wordsList = nltk.word_tokenize(sentence)
    wordsList = [w for w in wordsList if w not in stop_words]

## Apply the POS tagger
tagged_words = nltk.pos_tag(wordsList)
print(tagged_words)

Output:
[('A', 'NNP'), ('B', 'NNP'), ('C', 'NNP'), ('longtime', 'JJ'), ('classmates', 'NNS')]

4. Define pragmatic analysis
Human language data carries different meanings, so pragmatic analysis is used to discover the different facets of the data or document. It is deployed so that systems can understand the actual meaning of words and sentences.

5. Elaborate on the components of natural language processing
These are the major NLP components:
1. Lexical/morphological analysis: word structure is made comprehensible via analysis through parsing.
2. Syntactic analysis: the meaning of a specific text is assessed.
3. Entity extraction: information such as places, institutions and individuals is retrieved via sentence dissection; the entities present in a sentence get identified.
4. Pragmatic analysis: helps in finding the real meaning and relevancy behind sentences.

6. List the steps in NLP problem-solving
The steps in NLP problem-solving include:
1. Web scraping or collecting the texts from the dataset.
2. Cleaning the text using lemmatization and stemming.
3. Applying feature engineering.
4. Embedding with word2vec.
5. Training models using machine learning techniques or neural networks.
6. Assessing the performance.
7. Making the required model modifications and deploying.

7. Elaborate on stemming with examples
When a root word is obtained by detaching the prefixes and suffixes involved, that process is known as stemming. For instance, the word 'playing' can be reduced to 'play' by removing the rest. Different algorithms are deployed to implement stemming, for example PorterStemmer, which can be imported from NLTK as follows:

from nltk.stem import PorterStemmer
pst = PorterStemmer()
pst.stem("running"), pst.stem("cookies"), pst.stem("flying")

Output:
('run', 'cooki', 'fly')

8. Define and implement named entity recognition
NER (named entity recognition) retrieves information and identifies the entities present in the data, for instance locations, times, figures, things, objects and individuals. Used in AI, NLP and machine learning, it is implemented to make software understand what a text means; chatbots are a real-life example that makes use of NER. Implementing NER with the spacy package:

import spacy
nlp = spacy.load('en_core_web_sm')
text = "The head office of Tesla is in California"
document = nlp(text)
for ent in document.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Output:
Tesla 19 24 ORG
California 31 41 GPE

9. Explain checking word similarity with the spacy package
The spacy library allows the implementation of word similarity techniques for detecting similar words. The evaluation is a number between 0 and 1, where 0 tends towards less similar and 1 tends towards highly similar.

import spacy
nlp = spacy.load('en_core_web_md')
print("Enter the words:")
input_words = input()
tokens = nlp(input_words)
for i in tokens:
    print(i.text, i.has_vector, i.vector_norm, i.is_oov)
token_1, token_2 = tokens[0], tokens[1]
print("Similarity between words:", token_1.similarity(token_2))

Output:
hot True 5.6898586 False
cold True 6.5396233 False
Similarity between words: 0.597265
This implies that the similarity between the words 'cold' and 'hot' is about 59%.

10. Describe recall and precision.
Also, explain TF-IDF.
Precision and recall
Precision, recall, F1 and accuracy are metrics for testing NLP models. A model's accuracy is the ratio of correct predictions to all predictions.
Precision: the ratio of true positive instances to all predicted positive instances.
Recall: the ratio of true positive instances to all actual positive instances.
TF-IDF
Term frequency-inverse document frequency is a numerical statistic used in information retrieval. It helps identify the keywords present in any document, and its practical use revolves around extracting information from important documents. It is also useful for filtering out stop words, and for text summarization and classification. TF calculates the ratio of a term's frequency in a document to the total terms in the document, whereas IDF captures the significance of the term across documents.
TF-IDF calculation formula:
TF = frequency of term 'W' in a document / total terms in the document
IDF = log(total documents / total documents containing the term 'W')
A higher TF*IDF means the term is frequent in the document but rare across the corpus. Google implements TF-IDF when indexing search results, which helps rank relevant, quality content higher.
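The TF and IDF formulas above can be checked with a short pure-Python sketch; the three-document corpus is invented for illustration.

```python
from math import log

def tf(term, document):
    """TF = frequency of the term in the document / total terms in the document."""
    words = document.lower().split()
    return words.count(term) / len(words)

def idf(term, corpus):
    """IDF = log(total documents / documents containing the term)."""
    containing = sum(1 for doc in corpus if term in doc.lower().split())
    return log(len(corpus) / containing) if containing else 0.0

corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the dog barked",
]
# 'cat' appears in 2 of 3 documents; 'the' appears in all 3, so its IDF is 0.
for term in ("cat", "the"):
    print(term, round(tf(term, corpus[0]) * idf(term, corpus), 4))
```

Note how the ubiquitous stop word "the" scores zero, which is exactly why TF-IDF is useful for filtering stop words.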

Top SQL Interview Questions with Answers for a Data Analyst Interview

Data analysts perform a variety of roles: providing reports using statistical methods, analyzing data, implementing systems for data collection, developing databases, identifying trends, and interpreting patterns in complex data sets. SQL is the industry-standard language data analysts use for providing data insights, and since SQL is a major component of data analysis, it features heavily in job interviews. These are some frequently asked SQL query interview questions for data analysts.

Data Analyst interview questions and answers for freshers

Consider the following tables.

Employee table
employee_id | full_name       | manager_id | date_of_joining | city
121         | Shanaya Gupta   | 321        | 1/31/2014       | Bangalore
321         | Snehil Aggarwal | 986        | 1/30/2015       | Delhi

Salary table
employee_id | project | salary | variable
121         | P1      | 8000   | 500
321         | P2      | 10000  | 1000
421         | P1      | 12000  | 0

1. Write a query fetching the available projects from the Salary table.
Looking at the Salary table, every employee has a project value associated with it, and duplicate values exist, so the DISTINCT clause is used to get unique values.
SELECT DISTINCT(project) FROM Salary;

2. Write a query fetching the full name and employee ID of workers operating under the manager with ID 986.
From the Employee table, we can fetch the employees working under manager 986 using a WHERE clause.
SELECT employee_id, full_name FROM Employee WHERE manager_id=986;

3. Write a query to find the employee IDs with a salary between 9000 and 15000.
In this case, we use a WHERE clause with the BETWEEN operator.
SELECT employee_id, salary FROM Salary WHERE salary BETWEEN 9000 AND 15000;

4. Write a query for employees who reside in Delhi or work under the manager with ID 321.
Here one of the two conditions needs to be satisfied: either working under manager 321 or residing in Delhi.
In this scenario, we use the OR operator.

SELECT employee_id, city, manager_id FROM Employee WHERE manager_id = 321 OR city = 'Delhi';

5. Write a query displaying each employee's net salary (salary plus variable).

Here we use the + operator. Note that an alias containing a space would need quoting, so net_salary is used instead.

SELECT employee_id, salary + variable AS net_salary FROM Salary;

6. Write a query fetching the employee IDs present in both tables.

We make use of a subquery.

SELECT employee_id FROM Employee WHERE employee_id IN (SELECT employee_id FROM Salary);

7. Write a query fetching each employee's first name (the string before the space) from the full_name column of the Employee table.

First we find the position of the space character in full_name, then extract the substring before it: LOCATE in MySQL, CHARINDEX in SQL Server, with MID or SUBSTRING doing the extraction. Subtracting 1 from the space position keeps the trailing space out of the result.

Via MID (MySQL):

SELECT MID(full_name, 1, LOCATE(' ', full_name) - 1) FROM Employee;

Via SUBSTRING (SQL Server):

SELECT SUBSTRING(full_name, 1, CHARINDEX(' ', full_name) - 1) FROM Employee;

8. Write a query fetching the employees who work on projects other than P1.

The NOT operator fetches the rows that do not satisfy the stated condition.

SELECT employee_id FROM Salary WHERE NOT project = 'P1';

Alternatively, use the not-equal-to operator:

SELECT employee_id FROM Salary WHERE project <> 'P1';

9. Write a query fetching the names of employees whose salary is at least 5000 and at most 10000.

Here BETWEEN is used in a WHERE clause to return the employee IDs satisfying the salary condition, and that query is used as a subquery to fetch the full names from the Employee table.

SELECT full_name FROM Employee WHERE employee_id IN (SELECT employee_id FROM Salary WHERE salary BETWEEN 5000 AND 10000);

10. Write a query fetching details of the employees who joined in 2020 from the Employee table.
For this, we can use BETWEEN over the range 2020-01-01 to 2020-12-31.

SELECT * FROM Employee WHERE date_of_joining BETWEEN '2020-01-01' AND '2020-12-31';

Alternatively, the year can be extracted from date_of_joining using the YEAR function in MySQL.

SELECT * FROM Employee WHERE YEAR(date_of_joining) = 2020;

11. Write a query fetching employee names along with their salary data, displaying an employee even if no salary record exists.

Here the interviewer is gauging your knowledge of SQL JOINs. A LEFT JOIN is used, with the Employee table on the left side of the Salary table.

SELECT E.full_name, S.salary FROM Employee E LEFT JOIN Salary S ON E.employee_id = S.employee_id;

Advanced SQL, DBMS interview questions

These SQL interview questions for candidates with around 6 years of experience can help you in your job application.

12. Write a query removing duplicate rows from a table without using a temporary table.

A DELETE with an INNER JOIN is used here. Rows whose remaining columns all match are compared, and the row with the higher employee ID is discarded.

DELETE E1 FROM Employee E1 INNER JOIN Employee E2 WHERE E1.employee_id > E2.employee_id AND E1.full_name = E2.full_name AND E1.manager_id = E2.manager_id AND E1.date_of_joining = E2.date_of_joining AND E1.city = E2.city;

13. Write a query fetching only the even rows of the Salary table.

If there is an auto-increment field, for instance employee_id, the following query can be used.

SELECT * FROM Salary WHERE MOD(employee_id, 2) = 0;

If no auto-increment field is available, the following queries can be used instead.
Using ROW_NUMBER (SQL Server) and keeping the rows whose row number leaves remainder 0 when divided by 2:

SELECT E.employee_id, E.project, E.salary
FROM (
      SELECT *, ROW_NUMBER() OVER (ORDER BY employee_id) AS RowNumber
      FROM Salary
     ) E
WHERE E.RowNumber % 2 = 0;

Using a user-defined variable in MySQL:

SELECT *
FROM (
      SELECT *, @rowNumber := @rowNumber + 1 AS RowNo
      FROM Salary
      JOIN (SELECT @rowNumber := 0) r
     ) t
WHERE RowNo % 2 = 0;

14. Write a query fetching duplicate rows from the Employee table without referring to employee_id (the primary key).

Here we GROUP BY all the remaining fields and use a HAVING clause to return the groups whose count is greater than one, i.e. the duplicates.

SELECT full_name, manager_id, date_of_joining, city, COUNT(*) FROM Employee GROUP BY full_name, manager_id, date_of_joining, city HAVING COUNT(*) > 1;

15. Write a query creating an empty table with the same structure as an existing table.

Here a WHERE condition that is always false is used, so the column definitions are copied but no rows are.

CREATE TABLE NewTable SELECT * FROM Salary WHERE 1 = 0;

The above are some of the most common SQL data analyst interview questions for entry-level, intermediate, and advanced roles. Check the SQL Training Program
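To make the answers above easy to experiment with, here is a minimal sketch that rebuilds the two sample tables in an in-memory SQLite database and runs a few of the queries. SQLite is an assumption chosen for portability: its string functions are substr/instr rather than MySQL's MID/LOCATE, and ROW_NUMBER requires SQLite 3.25 or newer; the table and column names match the article.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL/SQL Server tables above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Employee (employee_id INT, full_name TEXT, manager_id INT,
                       date_of_joining TEXT, city TEXT);
CREATE TABLE Salary (employee_id INT, project TEXT, salary INT, variable INT);
INSERT INTO Employee VALUES
  (121, 'Shanaya Gupta', 321, '2014-01-31', 'Bangalore'),
  (321, 'Snehil Aggarwal', 986, '2015-01-30', 'Delhi');
INSERT INTO Salary VALUES
  (121, 'P1', 8000, 500), (321, 'P2', 10000, 1000), (421, 'P1', 12000, 0);
""")

# Q1: distinct projects
projects = [r[0] for r in cur.execute(
    "SELECT DISTINCT project FROM Salary ORDER BY project")]
print(projects)     # ['P1', 'P2']

# Q6: employee IDs present in both tables (subquery)
both = [r[0] for r in cur.execute(
    "SELECT employee_id FROM Employee WHERE employee_id IN "
    "(SELECT employee_id FROM Salary) ORDER BY employee_id")]
print(both)         # [121, 321]

# Q7: first name, using SQLite's substr/instr in place of MID/LOCATE
first = [r[0] for r in cur.execute(
    "SELECT substr(full_name, 1, instr(full_name, ' ') - 1) "
    "FROM Employee ORDER BY employee_id")]
print(first)        # ['Shanaya', 'Snehil']

# Q13: even rows via the ROW_NUMBER window function
even = [r[0] for r in cur.execute("""
    SELECT employee_id FROM (
        SELECT employee_id, ROW_NUMBER() OVER (ORDER BY employee_id) AS rn
        FROM Salary) WHERE rn % 2 = 0""")]
print(even)         # [321]
```

The same script is a handy way to check your own answers to the remaining questions before an interview, keeping in mind the dialect differences noted above.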

How Is Power BI Better than Excel?

Analysis of business data is essential to make it big in commerce, whether you are a small enterprise or a multinational looking to widen its reach, and many businesses are waking up to its significance. Two of the most commonly used tools are Power BI and Excel, and choosing the right one to work with can be a bit cumbersome.

What is the Power BI tool?

Power BI is a Microsoft product focused on processing business data. More specifically, it is a toolset that caters to the deeper demographics of a business and its functional operations. It is often compared to Excel, as the two overlap in what they do, but there are notable differences: visualization in Power BI is far more appealing, for instance, and reports are more concise.

Power BI Advantages

Power BI has several advantages over Excel:
1. Dedicated data visualization tool
2. Designed with business intelligence as the focus
3. Handles large volumes of data easily
4. Can be used on mobile devices
5. Connects to several data sources
6. Quicker processing
7. Customizable dashboards
8. Better interactivity
9. Appealing visualization
10. In-depth comparison of data files and reports
11. User friendly
12. Actionable insights thanks to strong data visualization
13. Data exploration via natural language queries

Excel and its most common uses

Microsoft Excel is ideal in many ways:
1. Faster calculations: building formulas and doing calculations is quick work in Excel.
2. Versatility: its versatility means users rarely have to switch to another app.
3. Table creation: complex tables can be created for advanced calculations.

Why is Power BI highly preferred?

One reason it is the go-to tool is that a Power BI dashboard can be accessed on a mobile device and shared among co-workers. While a dashboard is limited to a single page, a Power BI report can span multiple pages.
Dashboards support data interrogation, and Power BI uses a combination of dashboards and reports for specific usage. Monitoring a business gets easier as various metrics are available to analyze and query for answers. Integration of cloud and on-premises data gives a unified view regardless of where the data lives. Beyond its appealing looks, the tiles are dynamic and update alongside the incoming data. Prebuilt reports are also available for SaaS solutions. A secure environment, quick deployment, and hybrid configuration are a big plus of Power BI. Start Learning Power BI

Packed with versatile tools

Power BI ships with several tools that enable better interactivity:
1. Data gateway: installed by an admin, it acts as a bridge between on-premises data sources (including live-query connections) and the Power BI service.
2. Service: an online software service where sharing is administered via the cloud; dashboards, data models, and reports are hosted here.
3. Desktop: the primary authoring and publishing tool, used by developers to create reports and data models.
4. Report server: hosts several report types, including mobile reports, Power BI reports, paginated reports, and KPIs. Managed by IT professionals and updated every four months.
5. Mobile apps: available for Windows, Android, and iOS; users can view dashboards and reports hosted on the report server.

Power BI filters and data sorting

Filters in Power BI refine the displayed results based on the selected values. Some commonly used filter types are:
1. Report-level
2. Visual-level
3. Automatic
4. Page-level
5. Drill-through
6. Cross-drill
Better still, both basic and advanced filtering modes are available for getting the desired results. Check Business Analytics Course Content

More factors that make Power BI the first choice
1. Q&A and custom visuals
2. Quick spotting of data trends
3. On-the-go access
4. Scheduled data refresh
5. Intuitive, polished UX features
6. Storing, analyzing, and accessing huge amounts of data without hassle
7. Data integration into a centralized dashboard
8. Forecasting via built-in predictive models
9. Row-level security features
10. Integration with various cloud services
11. Access control

Beyond the points listed above, the Power BI REST API can push data into a dataset, adding rows to a table; the data then shows up in dashboard tiles and as visuals in reports. Advanced Excel Crash Course

Conclusion

Power BI is the right choice compared to Excel when the goal is:
1. Working with large data sets for insights
2. Creating complex, graphically interactive visualizations
3. Producing tabular-format reports
4. Collaborative teamwork
5. Business intelligence and deep data analysis
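As a rough illustration of the push-data workflow mentioned above, the sketch below constructs the URL and JSON body a client would send to the Power BI REST API's add-rows endpoint for a push dataset. The dataset ID, table name, and row fields are hypothetical placeholders; a real call additionally needs an Azure AD bearer token and an HTTP client, both omitted here.

```python
import json

# Hypothetical identifiers -- substitute your own dataset GUID and table name.
DATASET_ID = "3d9b93c6-7b6d-4801-a491-1738910904fd"   # placeholder GUID
TABLE = "SalesFigures"                                 # hypothetical table

# The documented "add rows" endpoint for Power BI push datasets.
url = (f"https://api.powerbi.com/v1.0/myorg/datasets/"
       f"{DATASET_ID}/tables/{TABLE}/rows")

# Rows are sent as a JSON object with a top-level "rows" array; each row is
# a flat object whose keys match the table's column names.
payload = {"rows": [
    {"Region": "North", "Units": 120},
    {"Region": "South", "Units": 95},
]}
body = json.dumps(payload)

print(url)
print(body)
```

In practice this body would be POSTed with an `Authorization: Bearer <token>` header, after which the pushed rows appear in any dashboard tile or report visual built on that table.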

Corporate Clients