Tagged: robots.txt
July 17, 2020 at 8:34 am #1231134
I looked at this thread https://kriesi.at/support/topic/google-indexing-wp-contentthemes-content-even-after-blocking-with-robotx-txt/ and saw that it was discouraged to disallow these files.
When I google site:raleighteletherapy.com I see the pages I want to see, but I am also seeing 9 or 10 wp-content/theme files that don’t need to be there.
My current bots file is this (I do see y’all’s comment in the thread to use htaccess to block directories, but I don’t know if maybe that has changed in the past 4 years?):
User-agent: *
Disallow: /wp-admin/
crawl-delay: 10
Disallow: /feed/
Disallow: /*/feed/
Disallow: /xmlrpc
Disallow: /?p=
Disallow: /*trackback
Allow: /wp-admin/admin-ajax.php
Allow: /wp-content/uploads/
Allow: /wp-content/plugins/
Allow: /wp-*/*.css
Allow: /wp-*/*.js
Sitemap: https://raleighteletherapy.com/sitemap_index.xml
Do I need to add anything, move anything, remove anything in order to get those wp-content/theme pages off of google’s index? My site is very new and still being crawled. I’m looking at what is allowed in this and am not sure why plugins need to be indexed/crawled?
Thanks y’all
Jon
July 20, 2020 at 5:06 am #1231614
Google has now indexed a good number of other files that are not in my sitemap. All of them are theme files etc. Things that I don’t think need to be indexed, but I’ll defer to y’all’s expertise on it.
J
July 21, 2020 at 8:11 pm #1232084
Can anybody (Enfold support OR general community here) tell me if these need to be indexed, or if they could dilute my rankings? In Google Search Console, they show up under ‘Coverage > Indexed, not submitted in sitemap’:
https://raleighteletherapy.com/wp-content/themes/enfold/framework/php/avia_shortcodes/tinymce/img/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/php/avia_shortcodes/tinymce/js/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/php/avia_shortcodes/sc/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/php/avia_shortcodes/tinymce/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/php/avia_shortcodes/img/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/images/icons/new/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/php/avia_shortcodes/css/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/php/wordpress-importer/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/images/icons/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/php/avia_shortcodes/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/php/font-management/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/images/misc/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/images/colorpicker/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/images/layout/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/js/conditional_load/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/php/auto-updates/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/images/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/php/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/js/
https://raleighteletherapy.com/wp-content/themes/enfold/framework/
Most of these at first glance seem to be unrelated to my services and the content of my website, but I know that there may be elements in there that may have a positive impact that I may not be aware of.
July 23, 2020 at 1:59 pm #1232593
Hi,
Sorry for the late reply. Try adding “Disallow: /framework/” to your robots.txt
These don’t need to be indexed.
Best regards,
Mike
July 29, 2020 at 8:59 am #1233910
Thanks Mike. The folks over at Google Webmaster Forum said that I ought to leave them? I’m very confused as I tend to agree with you, but I admittedly am not a tech expert, and certainly not a Google expert. I’m lucky if I don’t crash the entire internet when I tie my shoes.
Here are the 2 responses I get from the folks at Google Search Console Community Help Forum:
____________________________________________________________________________________________
1) Barry is shown to be a Platinum Product Expert, and I have had some good help from him in the past:
He says,
“Those pages appear to be generated by mod_autoindex
https://httpd.apache.org/docs/2.4/mod/mod_autoindex.html
Couple of options
1) Ignore them. These are not important pages – doesn’t really matter that Google has indexed them.
2) Disable them. Do you really need mod_autoindex? Most of the time it is enabled by accident, rather than something wilful. Doesn’t really harm for the most part, but doesn’t help anything either.
https://www.google.com/search?q=disable+mod_autoindex
3) Noindex them. As they are not important pages, you could just prevent Google indexing them (while keeping them available for users), e.g. could inject the noindex HTTP header. Can’t be done with robots.txt
https://support.google.com/webmasters/answer/93710?hl=en
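Barry’s third option (a noindex HTTP header) could look roughly like the following in the site’s root .htaccess. This is only a sketch, assuming Apache 2.4 with mod_headers enabled and a host that allows these directives in .htaccess. The path pattern is taken from the URLs listed earlier in the thread and only targets URLs ending in a slash, i.e. the directory listings.

```apache
# Sketch only: assumes Apache 2.4+ (for <If>) and mod_headers.
<IfModule mod_headers.c>
    # Match directory-style URLs (ending in "/") under the Enfold framework folder.
    <If "%{REQUEST_URI} =~ m#^/wp-content/themes/enfold/framework/(.*/)?$#">
        # Ask search engines not to index these listing pages.
        Header set X-Robots-Tag "noindex"
    </If>
</IfModule>
```

For the header to be seen, Google has to be able to crawl these URLs, so they should not also be disallowed in robots.txt.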
____________________________________________________________________________________________
2) William does not show to be “expert”, but does seem to know what he is talking about . . . but that has backfired on me before.
He says,
“Hi Jonathan, most of those folders contain resources (JS, CSS and images) but they do not have any content, so they cannot rank on their own for any keywords. Having said that, Googlebot needs to be allowed to access all resources that are required for rendering the page. Those files will not hurt your ranking, but if you want to block them to reduce server load, make sure you do not block resources that are required to render the page. You can use the Google Mobile Friendly tool to find out if you are blocking any resources that are required for rendering.”
____________________________________________________________________________________________
I am barely able to understand these things. Are you able to make heads or tails of it? If having those pages indexed won’t hurt anything, then I’ll just leave them (unless they start showing up when people search for Anxiety Counseling in Austin and other related searches).
No problem on the delay. I know Covid has us all on wonky schedules. Thanks!
Jon
July 29, 2020 at 1:04 pm #1233939
Hi,
Thank you for the feedback. I recommend Barry’s approach: see if you can disable “mod_autoindex”, and please ask your webhost for assistance. I would also check your .htaccess file for
Options +Indexes
and change it to, or add,
Options -Indexes
as this link pointed to.
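As a rough illustration of that change, the line below could go in the .htaccess file in the site root. It is a minimal sketch, assuming the host lets .htaccess override Options; if it does not, the webhost would need to make the change in the server config instead.

```apache
# Sketch only: assumes the host permits Options overrides in .htaccess
# (AllowOverride Options or All); otherwise this line can cause a 500 error.
# Turn off automatic directory listings (the pages generated by mod_autoindex).
Options -Indexes
```

With listings turned off, a request for a folder that has no index file typically returns 403 Forbidden, while the individual CSS, JS and image files inside it still load normally.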
As for just ignoring them, you can do that, but I believe someday they may show up next to your keywords. I’m not an Apache expert, and I had not heard of “mod_autoindex” before, but after reading the links this does seem to be the answer.
So if you can try this and report back in a few weeks to see if these Google results have dropped, it would be great.
BTW, William is correct that Google needs access to all assets like CSS and JS, but within the /framework/ directory the assets are for the backend admin view, and not the frontend of your site.
PS, after trying this you will probably need to go to your Google Webmaster account and request a new index or submit a “sitemap” to force a re-index to drop the other results, and then wait a few weeks.
Best regards,
Mike
September 1, 2023 at 4:12 am #1417740
Not sure how I missed this last reply. Looks like it was around the start of Covid stuff, so my practice went haywire with going all virtual somewhere in 2020. I am going to review this thread now and go see if these things are still showing up or not. I’ll check back soon. If y’all have any new info in the past few years on it, please feel free to link to those threads and I’ll check them out as well.
Thanks
Jon
September 1, 2023 at 4:19 am #1417741
Quick update after Googling site:raleighteletherapy.com
The only 2 things that are now listed that shouldn’t be are:
https://raleighteletherapy.com/wp-content/themes/enfold/framework/images/colorpicker/
and
https://raleighteletherapy.com/wp-content/themes/enfold/framework/js/deprecated/
September 1, 2023 at 11:58 am #1417792