Question


Data Analytics

In [ ]:
    import time
    time.ctime()

In [ ]:
    name = "ABC"  # Enter your name
    user = "ABC"  # Enter your BC username
    print("Final submission for {} ({})".format(name, user))

In [ ]:
    # You might find them helpful
    import math
    import statistics
    import pandas
    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.stats as stats  # Basic package for basic univariate regressions
    import statsmodels.api as statsmod  # More sophisticated package for univariate and multivariate regressions

In [ ]:
    # list all files in the current directory
    %ls

1. FOMC Minutes

Background: The Federal Open Market Committee (FOMC), a committee within the Federal Reserve System, is charged under United States law with overseeing the nation's open market operations. This Federal Reserve committee makes key decisions about interest rates and the growth of the United States money supply. Committee members meet regularly (typically once a month or once every other month) to make these decisions. Their meeting minutes are made public shortly after the meeting.

Goal: Your research team aims to (a) measure and quantify central bankers' positive and negative sentiment expressed during these high-level meetings, and then (b) examine how the two sentiment measures evolve over time on a yearly basis.

Suggested steps:

1.1 Download and unzip folder "ALL_FOMC_MINUTES_1990_2020". This folder provides all 242 FOMC meeting minutes from February 7, 1990 (see file '19900207.txt') to the latest meeting on March 15, 2020 (see file '20200315.txt').

1.2 Create a "negative" sentiment score of the FOMC meeting on February 7, 1990, using the following equation:

$$\text{NegativeScore}_{19900207} = \frac{\text{CountsofNegativeWords}_{19900207}}{\text{TotalWordCounts}_{19900207}} \times 100,$$

where file "1.6_LM_negative.txt" provides a collection of negative words suggested by Loughran and McDonald (https://sraf.nd.edu/textual-analysis/resources/).

Similarly, we can create a "positive" sentiment score of the FOMC meeting on February 7, 1990, using the following equation:

$$\text{PositiveScore}_{19900207} = \frac{\text{CountsofPositiveWords}_{19900207}}{\text{TotalWordCounts}_{19900207}} \times 100,$$

where file "1.6_LM_positive.txt" provides a collection of positive words from the same academic source.

You can pick any text file that you think would be more interesting to you. In other words, you don't have to use the 19900207 file. I will help you start with several (but not all) input codes.

In [ ]:
    # Input function
    def Input(filename):
        # If "mbcs" doesn't work, you can try changing the encoding from "mbcs" to "utf8".
        # "mbcs" works on some computer systems but not all; same for "utf8"; but the goal of both
        # encoding options is to clean up possible special characters in text files.
        with open(filename, 'r', encoding="mbcs") as f:  # or "utf8"
            lines = f.readlines()
        # output will be a list of strings where each string corresponds to each line
        lines = [l.strip() for l in lines]
        return lines

In [ ]:
    # Import the negative and positive words
    list_neg = Input('1.6_LM_negative.txt')
    list_pos = Input('1.6_LM_positive.txt')

In [ ]:
    # Import the FOMC meeting file
    # Your codes here

In [ ]:
    # Calculate the negative score
    # Your codes here

In [ ]:
    print(("During this FOMC meeting, central bankers and officials expressed slightly more {} sentiment. "
           "To be specific, {} of {} words (or {:2.2f}%) are related to positive sentiment, "
           "while {:2.2f}% are related to negative sentiment.").format(YOUR ANSWERS HERE))

1.3 Repeat the steps in 1.2 for all 242 FOMC meeting minutes files, and obtain a list of negative scores (call this list "NEGSCORE") and a list of positive scores (call this list "POSSCORE"). For instance, in list "NEGSCORE", the first element corresponds to the negative score calculated using the 19900207 FOMC minutes, and the last element corresponds to the negative score calculated using the 20200315 FOMC minutes.

In [ ]:
    # Here is a complete list of file names, which might be useful in a for loop
    TITLE = [
        '19900207.txt', '19900327.txt', '19900515.txt', '19900703.txt', '19900821.txt', '19901002.txt', '19901113.txt', '19901218.txt',
        '19910206.txt', '19910326.txt', '19910514.txt', '19910703.txt', '19910820.txt', '19911001.txt', '19911105.txt', '19911217.txt',
        '19920205.txt', '19920331.txt', '19920519.txt', '19920701.txt', '19920818.txt', '19921006.txt', '19921117.txt', '19921222.txt',
        '19930203.txt', '19930323.txt', '19930518.txt', '19930707.txt', '19930817.txt', '19930921.txt', '19931116.txt', '19931221.txt',
        '19940204.txt', '19940322.txt', '19940517.txt', '19940706.txt', '19940816.txt', '19940927.txt', '19941115.txt', '19941220.txt',
        '19950201.txt', '19950328.txt', '19950523.txt', '19950706.txt', '19950822.txt', '19950926.txt', '19951115.txt', '19951219.txt',
        '19960130.txt', '19960326.txt', '19960521.txt', '19960702.txt', '19960820.txt', '19960924.txt', '19961113.txt', '19961217.txt',
        '19970204.txt', '19970325.txt', '19970520.txt', '19970701.txt', '19970819.txt', '19970930.txt', '19971112.txt', '19971216.txt',
        '19980203.txt', '19980331.txt', '19980519.txt', '19980630.txt', '19980818.txt', '19980929.txt', '19981117.txt', '19981222.txt',
        '19990202.txt', '19990330.txt', '19990518.txt', '19990629.txt', '19990824.txt', '19991005.txt', '19991116.txt', '19991221.txt',
        '20000202.txt', '20000321.txt', '20000516.txt', '20000628.txt', '20000822.txt', '20001003.txt', '20001115.txt', '20001219.txt',
        '20010131.txt', '20010320.txt', '20010515.txt', '20010627.txt', '20010821.txt', '20011002.txt', '20011106.txt', '20011211.txt',
        '20020130.txt', '20020319.txt', '20020507.txt', '20020626.txt', '20020813.txt', '20020924.txt', '20021106.txt', '20021210.txt',
        '20030129.txt', '20030318.txt', '20030506.txt', '20030625.txt', '20030812.txt', '20030916.txt', '20031028.txt', '20031209.txt',
        '20040128.txt', '20040316.txt', '20040504.txt', '20040630.txt', '20040810.txt', '20040921.txt', '20041110.txt', '20041214.txt',
        '20050202.txt', '20050322.txt', '20050503.txt', '20050630.txt', '20050809.txt', '20050920.txt', '20051101.txt', '20051213.txt',
        '20060131.txt', '20060328.txt', '20060510.txt', '20060629.txt', '20060808.txt', '20060920.txt', '20061025.txt', '20061212.txt',
        '20070131.txt', '20070321.txt', '20070509.txt', '20070628.txt', '20070807.txt', '20070918.txt', '20071031.txt', '20071211.txt',
        '20080130.txt', '20080318.txt', '20080430.txt', '20080625.txt', '20080805.txt', '20080916.txt', '20081029.txt', '20081216.txt',
        '20090128.txt', '20090318.txt', '20090429.txt', '20090624.txt', '20090812.txt', '20090923.txt', '20091104.txt', '20091216.txt',
        '20100127.txt', '20100316.txt', '20100428.txt', '20100623.txt', '20100810.txt', '20100921.txt', '20101103.txt', '20101214.txt',
        '20110126.txt', '20110315.txt', '20110427.txt', '20110622.txt', '20110809.txt', '20110921.txt', '20111102.txt', '20111213.txt',
        '20120125.txt', '20120313.txt', '20120425.txt', '20120620.txt', '20120801.txt', '20120913.txt', '20121024.txt', '20121212.txt',
        '20130130.txt', '20130320.txt', '20130501.txt', '20130619.txt', '20130731.txt', '20130918.txt', '20131030.txt', '20131218.txt',
        '20140129.txt', '20140319.txt', '20140430.txt', '20140618.txt', '20140730.txt', '20140917.txt', '20141029.txt', '20141217.txt',
        '20150128.txt', '20150318.txt', '20150429.txt', '20150617.txt', '20150729.txt', '20150917.txt', '20151028.txt', '20151216.txt',
        '20160127.txt', '20160316.txt', '20160427.txt', '20160615.txt', '20160727.txt', '20160921.txt', '20161102.txt', '20161214.txt',
        '20170201.txt', '20170315.txt', '20170503.txt', '20170614.txt', '20170726.txt', '20170920.txt', '20171101.txt', '20171213.txt',
        '20180131.txt', '20180321.txt', '20180502.txt', '20180613.txt', '20180801.txt', '20180926.txt', '20181108.txt', '20181219.txt',
        '20190130.txt', '20190320.txt', '20190501.txt', '20190619.txt', '20190731.txt', '20190918.txt', '20191030.txt', '20191211.txt',
        '20200129.txt', '20200315.txt',
    ]
    print(TITLE[0])    # FIRST FILE NAME
    print(TITLE[241])  # LAST FILE NAME
    print(type(TITLE), type(TITLE[0]))

In [ ]:
    # Your codes here

In [ ]:
    # print the following
    print(NEGSCORE[0], NEGSCORE[-1])
    print(POSSCORE[25], POSSCORE[-25])

1.4 Obtain yearly positive and negative scores. That is, take an average of positive scores from all FOMC meetings within the same year; similarly, take an average of negative scores from all FOMC meetings within the same year. You should obtain 2 lists (or arrays), each with 31 numbers corresponding to the 31 years in the data (i.e., from Year 1990 to Year 2020). Finally, print them out.

Hint: You have probably realized by now that the number of FOMC meetings differs across years. Therefore, you cannot simply take the average of 12 numbers when calculating a yearly average. Instead, notice that the first 4 digits of each element in "TITLE" (a list variable that was defined in 1.3) already indicate the year. You can use this information; for example, to obtain the yearly negative score of Year 1990, one can keep all negative scores from meetings where the first 4 digits of the corresponding meeting filenames are "1990", and then take the average. This question can be answered using list or numpy.

In [ ]:
    # Your codes here

In [ ]:
    # print them out

1.5 Plot the time series of positive and negative scores in one plot. Appropriate legend labels, line coloring, and line shapes are expected to help differentiate the two lines. Save your plot to "sentiment_1990_2020_YOURBCID.png" (e.g., in my case, "sentiment_1990_2020_xuaeh.png"). Hint: plt.plot()

In [ ]:
    # Your codes here
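The score formula from step 1.2 and the yearly grouping hinted at in step 1.4 can be sketched as follows. This is only an illustrative sketch on made-up data, not the assignment's reference solution: the helper names `sentiment_score` and `yearly_average`, the toy sentences, and the toy word list are all hypothetical, and the tokenization (split on whitespace, strip surrounding punctuation, compare in lower case) is an assumption the assignment does not specify.

```python
from collections import defaultdict


def sentiment_score(lines, word_list):
    """Percentage of words in `lines` that appear in `word_list` (hypothetical helper)."""
    # Compare in lower case so the match works whatever case the dictionary file uses.
    vocab = {w.strip().lower() for w in word_list if w.strip()}
    # Assumed tokenization: whitespace split, then strip common punctuation.
    words = [w.strip('.,;:!?()[]"\'').lower() for line in lines for w in line.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in vocab)
    return 100.0 * hits / len(words)


def yearly_average(titles, scores):
    """Average the scores of all meetings sharing the same 4-digit year prefix."""
    by_year = defaultdict(list)
    for title, score in zip(titles, scores):
        by_year[title[:4]].append(score)  # first 4 characters of the file name are the year
    years = sorted(by_year)
    return years, [sum(by_year[y]) / len(by_year[y]) for y in years]


# Toy inputs standing in for one minutes file and a negative-word dictionary.
toy_lines = ["Growth slowed and risks of loss increased.", "Conditions remained weak."]
toy_negative = ["LOSS", "WEAK", "SLOWED"]
score = sentiment_score(toy_lines, toy_negative)
print(score)  # 3 of 10 words match, so 30.0

# Toy per-meeting scores; years may have different numbers of meetings.
toy_titles = ['19900207.txt', '19900327.txt', '19910206.txt', '20200129.txt']
toy_scores = [1.0, 3.0, 2.0, 4.0]
years, averages = yearly_average(toy_titles, toy_scores)
print(years, averages)
```

Grouping by the file-name prefix avoids assuming a fixed number of meetings per year, which matters here because the last year in the list contains only two files.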
2. Regression Analysis

Motivation and question: In Question 1, we obtained yearly negative sentiment scores. By construction, the higher the negative score is, the more negative sentiment there was during FOMC meetings of that year. However, it is still unclear: what is negative sentiment? Because the purpose of these meetings is to discuss recent market and economic conditions and future interest rate strategies, it would be interesting to understand what fundamental events or indicators explain negative sentiment. This is what we are after in this question.

Suggested steps: Please download "QUESTION2_DATA.csv".

Column 1: year, 1990-2020
Column 2: yearly negative sentiment score, constructed from Question 1 [You may use my answers to check your solutions in Q1 :)]
Column 3: yearly litigation score, constructed using FOMC meeting minutes and a "litigious" dictionary. The higher the score is, the more discussions or concerns about business lawsuits there were at the meeting.
Column 4: yearly ambiguity score, constructed using FOMC meeting minutes and an "ambiguity" dictionary. It captures how uncertain central bankers are about their own comments, opinions, and speeches made during the meeting. This score is measured by the percentage of ambiguity words. The higher the score is, the more ambiguity there was at the meeting. In short, the methodology is exactly the same as in Question 1.
Column 5: yearly VIX index (source: CBOE). This measure indicates stock market fear and anxiety from investors. It is often dubbed the "fear index" by mainstream newspapers. The higher the index is, the higher the anxiety in the market.

As a result, in a regression framework, our dependent variable is the negative sentiment score (Column 2), and the three potential explanatory variables capture three different perspectives of the economy:

Business litigation concerns (Column 3)
Ambiguity and belief (Column 4)
Stock market anxiety (Column 5)

Below, you can choose your own regression framework; for instance, you can run 1 regression with all three explanatory variables, or you may want to run univariate regressions first and then include more variables. To receive full points, you need to write at least 5 full sentences (1) describing your regression results (i.e., from the regression table) and (2) making useful inferences (i.e., what can we learn from your results?).

In [ ]:
    def reg_m(y, x):
        X = np.hstack((np.ones((len(x), 1)), x))  # adds column of ones to X
        results = statsmod.OLS(y, X).fit()  # creates object containing regression results
        return results

2.1 Import and understand the csv dataset using numpy

In [ ]:
    # Import data
    Q2DATA = np.genfromtxt("./QUESTION2_DATA.csv", delimiter=',')  # import data
    Q2DATA_n = np.array(["YEAR", "NEGSCORE", "LITSCORE", "ABGSCORE", "VIX"])

In [ ]:
    print(Q2DATA[0:2])

In [ ]:
    Q2DATA = Q2DATA[1:, :]

In [ ]:
    Q2DATA.shape

2.2 Regressions

In [ ]:
    # Your codes here

2.3 Please enter your discussions below. The box below is what we call a markdown box; please double click on it, and you will see that "Type Markdown and LaTeX: α²" becomes a blank box where you can type in your discussions. Once you finish composing, you can use shift-enter to compile it as usual. If you want to change your writing after you have compiled it, you can double click on the same area and it will change back to its editing mode.

In [ ]:
    import time
    time.ctime()
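To make the design matrix in `reg_m` concrete, here is a minimal sketch on synthetic data. It deliberately swaps in `np.linalg.lstsq` for `statsmod.OLS` so that it runs with numpy alone; the point estimates come from the same least-squares problem, but this version reports no standard errors, t-statistics, or p-values, which the statsmodels results object would provide. The data, the "true" coefficients, and the variable names are invented for illustration and have nothing to do with QUESTION2_DATA.csv.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 31                          # one observation per year, mirroring 1990-2020

# Two made-up explanatory variables and a made-up linear relationship plus small noise.
x = rng.normal(size=(n, 2))
noise = rng.normal(scale=0.05, size=n)
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 1] + noise

# Same design matrix that reg_m builds: a column of ones (intercept) followed by x.
X = np.hstack((np.ones((n, 1)), x))

# Least-squares fit; beta[0] is the intercept, beta[1:] are the slopes.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# R-squared: share of the variance of y explained by the fitted values.
fitted = X @ beta
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print("coefficients:", beta)
print("R-squared:", r_squared)
```

With the real data, and assuming the column order described above, the dependent variable would be `Q2DATA[:, 1]` and the explanatory variables `Q2DATA[:, 2:5]` (or a single column such as `Q2DATA[:, [4]]`, kept two-dimensional, for a univariate regression).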


Python Programming: Data and Regression Analysis